Hi all,
so with the help of Guangchuang (ChIPseeker's developer) the problem
was
indirectly solved by reading in the file directly in the function
annotatePeak. I also works with the function readPeakFile btw.
The issue itself, is due to the different chomosome notations when
reading with my function or directly - one makes ensembl like
notations,
whereas ChIPseeker creates UCSC type. This seems to conflict
internally
in the annotatePeak.
For future reference, see bellow for the full discussion and debbuging
between myself and Guangchuang.
António
-------- Original Message --------
Subject: Re: ChIPseeker: Error in annotatePeak
Date: Fri, 9 May 2014 09:58:18 +0800
From: Yu, Guangchuang <gcyu@connect.hku.hk>
To: António domingues <amjdomingues@gmail.com>
Dear António,
Great! Problem solved. I will add a testing condition on seqname, and
give user useful information when this case occurred.
Thanks for your feedback!
Best Regards,
Guangchuang
On Fri, May 9, 2014 at 12:51 AM, António domingues
<amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote:
Dear Guangchuang,
I did:
peaks_cs <- readPeakFile(peaks_file)
compare(peaks_gr, peaks_cs)
and the main difference is:
Warning message:
In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the
other:
- in 'x': 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21,
22, 3, 4, 5, 6, 7, 8, 9, X, Y
- in 'y': chr1, chr10, chr11, chr12, chr13, chr14, chr15,
chr16,
chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5,
chr6, chr7, chr8, chr9, chrX, chrY
Make sure to always combine/compare objects based on the same
reference
so it seems that the object created with readPeakFile have "chr"
but
not the one I created. I would assume that since my transcriptbd
comes from ensembl, via biomart, this would not be a problem.
António
--
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
FetscherstraÃe 105
01307 Dresden
Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362>
Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec="" .tu-dresden.de="">
--
The Unbearable Lightness of Molecular Biology
On 05/08/2014 06:32 PM, Yu, Guangchuang wrote:
> Dear António,
>
> Can you compare the GRanges object you created and the one from
> readPeakFile provided by ChIPseeker?
>
>
> Bests,
> Guangchuang
On Fri, May 9, 2014 at 12:30 AM, António domingues
<amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote:
Thanks for looking into this.
Here it goes the result of traceback:
> peakAnno <- annotatePeak(peaks_gr,
+ tssRegion=c(-3000, 0),
+ as="GRanges",
+ TranscriptDb=txdb_hg19,
+ annoDb="org.Hs.eg.db"
+ )
>> preparing features information... 2014-05-08 06:28:40 PM
>> identifying nearest features... 2014-05-08 06:28:40 PM
Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
subscript contains NAs or out of bounds indices
> traceback()
8: stop("subscript contains NAs or out of bounds indices")
7: IRanges:::normalizeSingleBracketSubscript(i, x)
6: IRanges:::extractROWS(x, i)
5: IRanges:::extractROWS(x, i)
4: features[ps.idx]
3: features[ps.idx]
2: getNearestFeatureIndicesAndDistancespeak.gr <http: peak.gr="">,
features)
1: annotatePeak(peaks_gr, tssRegion = c(-3000, 0), as = "GRanges",
TranscriptDb = txdb_hg19, annoDb = "org.Hs.eg.db")
António
--
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
FetscherstraÃe 105
01307 Dresden
Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362>
Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec.tu-="" dresden.de="">
--
The Unbearable Lightness of Molecular Biology
On 05/08/2014 06:26 PM, Yu, Guangchuang wrote:
when the error was occurred, run the traceback() function, and show me
the output.
On Fri, May 9, 2014 at 12:22 AM, António domingues
<amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote:
Hi Guangchuang,
(skip to the end if want only the conclusion to the story)
I might be missing something but the error tells me that there might
me
NAs in my GRanges (which there aren't):
summary(as.data.frame(peaks_gr))
seqnames start end width
1 :1063 Min. : 50200 Min. : 52449 Min. : 200
6 :1036 1st Qu.: 36269338 1st Qu.: 36270812 1st Qu.: 1100
2 : 992 Median : 70613075 Median : 70614399 Median : 1500
3 : 884 Mean : 80254593 Mean : 80256357 Mean : 1766
4 : 754 3rd Qu.:115672575 3rd Qu.:115674324 3rd Qu.: 2150
5 : 745 Max. :249167000 Max. :249169199 Max. :13300
(Other):7362
strand score
+: 0 Min. :1
-: 0 1st Qu.:1
*:12836 Median :1
Mean :1
3rd Qu.:1
Max. :1
One other alternative would be some confusion with the strand, since
by
default the function BED2RangedData makes strand as "+" in the absence
of information. Very unlikely, but still I tried it:
strand(peaks_gr) <- "*"
and...
>> preparing features information... 2014-05-08 05:59:14 PM
>> identifying nearest features... 2014-05-08 05:59:14 PM
Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
subscript contains NAs or out of bounds indices
so at this point I was none the wiser. Then I read the file directly
from the function:
peakAnno <-
annotatePeak("Tox-W50-G250-islands-summary-FDR0.01_50tags_2fold.bed",
tssRegion=c(-300, 0),
# as="GRanges",
TranscriptDb=txdb_hg19,
annoDb="org.Hs.eg.db"
)
it worked without a glitch! it still does not explain the issue when
creating a GRanges outside annotatePeak.
Best,
António
--
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
FetscherstraÃe 105
01307 Dresden
Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362>
Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec.tu-="" dresden.de="">
--
The Unbearable Lightness of Molecular Biology
Hi,
Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
subscript contains NAs or out of bounds indices
This line already point out the issue. Please check your bed file.
Bests,
Guangchuang
On Thu, May 8, 2014 at 11:08 PM, António domingues
<amjdomingues@gmail.com <mailto:amjdomingues@gmail.com="">> wrote:
Hi all,
I was trying to ChIPseeker for the first time to annotate some
peaks
with the genomic features, but got the following error:
peakAnno <- annotatePeak(peaks_gr,
tssRegion=c(-300, 0),
as="GRanges",
TranscriptDb=txdb_hg19,
annoDb="org.Hs.eg.db")
Error in IRanges:::normalizeSingleBracketSubscript(i, x) :
subscript contains NAs or out of bounds indices
The peaks originated from Sicer which I converted to GRanges and
then add the seqlengths independently. A tad convoluted but got
the
job done (or so I thought). Bellow the full code. I searched for
the
error but could not find out what is going wrong. Help is
appreciated.
Alternatively, would any one point me to another way to annotate a
list of peaks with genomic features (utr's, exons, introns,
extragenic, ..)?
Cheers,
António
######################
## Peaks
######################
sicer2RangedData <-function(path.to.file) {
# load the file
bed = read.table(path.to.file, header=F, sep="\t",
stringsAsFactors=FALSE)[,c(1:3)]
# generate GRanges object
myrange <- BED2RangedData(bed)
return(myrange)
}
peaks_file="Tox-W50-G250-islands-summary-FDR0.01_50tags_2fold.bed"
peaks <- sicer2RangedData(peaks_file)
#########################
## ChIPseeker likes a proper GRanges obj
#########################
## chr size:
hg19_exons <- exons(txdb_ens)
peaks_gr <- as(peaks, "GRanges")
idx <- c( # where are the proper chromosomes?
grep("^\\d", names(seqlengths(hg19_exons))),
grep("^X", names(seqlengths(hg19_exons))),
grep("^Y", names(seqlengths(hg19_exons)))
)
chrlen <- seqlengths(hg19_exons)[idx] # here they are
chrlen_sorted <- chrlen[sort(names(seqlengths(hg19_exons)[idx]))]
seqlengths(peaks_gr) <- chrlen_sorted
######################
## Annotate
######################
peakAnno <- annotatePeak(peaks_gr,
tssRegion=c(-300, 0),
as="GRanges",
TranscriptDb=txdb_hg19,
annoDb="org.Hs.eg.db")
###########
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel grid stats graphics grDevices utils
datasets
[8] methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
[2] ChIPseeker_1.0.2
[3] org.Hs.eg.db_2.14.0
[4] GenomicFeatures_1.16.0
[5] GenomicRanges_1.16.2
[6] ChIPpeakAnno_2.12.1
[7] AnnotationDbi_1.26.0
[8] GenomeInfoDb_1.0.2
[9] Biobase_2.24.0
[10] RSQLite_0.11.4
[11] DBI_0.2-7
[12] Biostrings_2.32.0
[13] XVector_0.4.0
[14] IRanges_1.21.43
[15] BiocGenerics_0.10.0
[16] VennDiagram_1.6.5
[17] biomaRt_2.20.0
[18] RColorBrewer_1.0-5
[19] plyr_1.8.1
[20] ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.0
[4] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0
[7] caTools_1.17 codetools_0.2-8 colorspace_1.2-4
[10] digest_0.6.4 fail_1.2 foreach_1.4.2
[13] gdata_2.13.3 GenomicAlignments_1.0.0 GO.db_2.14.0
[16] gplots_2.13.0 gtable_0.1.2 gtools_3.4.0
[19] iterators_1.0.7 KernSmooth_2.23-12 lattice_0.20-29
[22] limma_3.20.1 MASS_7.3-33 Matrix_1.1-3
[25] multtest_2.20.0 munsell_0.4.2 proto_0.3-10
[28] Rcpp_0.11.1 RCurl_1.95-4.1 reshape2_1.4
[31] Rsamtools_1.16.0 rtracklayer_1.23.22 scales_0.2.4
[34] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0
[37] stringr_0.6.2 survival_2.37-7 tools_3.1.0
[40] XML_3.98-1.1 zlibbioc_1.10.0
--
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
FetscherstraÃe 105
01307 Dresden
Phone:+49 (351) 458 82362 <tel:%2b49%20%28351%29%20458%2082362>
Email: antonio.domingues(at)biotec.tu-dresden.de <http: biotec="" .tu-dresden.de="">
--
The Unbearable Lightness of Molecular Biology
--
--~--~---------~--~----~------------~-------~--~----~
Guangchuang Yu, PhD Candidate
School of Biological Sciences
Rm 7N-07, Kadoorie Biological Sciences Bldg
The University of Hong Kong
Hong Kong SAR, China
www:
http://ygc.name
-~----------~----~----~----~------~----~------~--~---
[[alternative HTML version deleted]]