I've been playing with mapToTranscripts(), which is a great addition BTW. It's already useful for me.
It would be nice to give a little more description in the help file on the behavior regarding the positions in the returned object and the strand of 'transcripts'. I scanned the man page and found:
"Transcriptome-based coordinates start counting at 1 at the beginning of the ‘transcripts’ range and return positions where ‘x’ was aligned."
But from this it was not totally obvious that 1 starts from the left regardless of the strand. E.g.:
transcripts <- GRangesList(foo=GRanges("1", IRanges(c(101,201),width=50), strand="-")) x <- GRanges("1", IRanges(248,width=1)) mapToTranscripts(x, transcripts) GRanges object with 1 range and 2 metadata columns: seqnames ranges strand | xHits transcriptsHits <Rle> <IRanges> <Rle> | <integer> <integer> [1] foo [98, 98] - | 1 1 ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
I believe this is not the same behavior as getSeq() for example, which would consider the rightmost position of a negative strand GRanges(List) to be position 1 in the returned DNAStringSet(List).
just to be clear, I'm not suggesting a change in behavior, just a bit more description in the man page.
R version 3.2.0 (2015-04-16) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.3 (Yosemite) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats4 parallel stats graphics grDevices datasets utils methods [10] base other attached packages: [1] alpine_0.1 BiocParallel_1.2.1 [3] speedglm_0.2-1.0 Matrix_1.2-0 [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2 GenomicFeatures_1.20.1 [7] AnnotationDbi_1.30.1 Biobase_2.28.0 [9] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.36.0 [11] rtracklayer_1.28.2 GenomicAlignments_1.4.1 [13] Rsamtools_1.20.1 Biostrings_2.36.1 [15] XVector_0.8.0 GenomicRanges_1.20.3 [17] GenomeInfoDb_1.4.0 IRanges_2.2.1 [19] S4Vectors_0.6.0 BiocGenerics_0.14.0 [21] testthat_0.9.1 devtools_1.8.0 [23] knitr_1.10.5 BiocInstaller_1.18.2 loaded via a namespace (and not attached): [1] Rcpp_0.11.6 compiler_3.2.0 git2r_0.10.1 futile.logger_1.4.1 [5] bitops_1.0-6 futile.options_1.0.0 tools_3.2.0 zlibbioc_1.14.0 [9] biomaRt_2.24.0 digest_0.6.8 memoise_0.2.1 RSQLite_1.0.0 [13] lattice_0.20-31 DBI_0.3.1 stringr_1.0.0 roxygen2_4.1.1 [17] rversions_1.0.0 grid_3.2.0 XML_3.98-1.1 magrittr_1.5 [21] lambda.r_1.1.7 stringi_0.4-1 RCurl_1.95-4.6
thanks!
bringing up getSeq, I just meant that getSeq,BSgenome on a GRanges(List) does not ignore strand by default.
Hi again,
After talking with Herve, we thought the default for 'ignore.strand' should be changed to FALSE for the coordinate mapping functions. This would make them consistent with the (many) other functions in GenomicRanges and GenomicAlignments. While there are reasons for having the default be TRUE, at this point in the game it is probably just confusing. I've made the change in GenomicFeatures 1.21.9.
We also thought it's probably more useful to report position starting from the TSS regardless of strand.
Returns an empty GRanges object.
Currently
mapToTranscripts
returns 1 range at position 7 because it counts from TES instead of TSS. The proposed change would return 1 range at position 2.Let me know if you have a preference for counting from TSS vs TES.
Thanks.
Valerie
This default (ignore.strand=FALSE) does make more sense to me. I typically want to know where am I in the transcript with respect to the TSS. And this way, mapped ranges will correspond to the same positions in the DNAStringSet.
thanks,
Mike
Changes are in GenomicFeatures 1.21.11 and apply to all coordinate mapping functions (mapToTranscripts(), mapFromTranscripts(), pmapToTranscripts() and pmapFromTranscripts()).
Summary of changes:
Thanks for bringing these issues up. Fixing them has made the code cleaner and more consistent.
Valerie