mapFromTranscripts strand problem
1
0
Entering edit mode
@lorenzocalviello-10122
Last seen 8.5 years ago

Hi Everybody,
I am using the mapFromTranscripts function to have genomic coordinate of transcript elements, which is absolutely great !
However, I am having some trouble with one specific transcript (same coordinates and ID of an annotated transcript in Gencode23):

library("GenomicFeatures")
tx_pos<-GRanges(seqnames="ENST00000417659.1",ranges=IRanges(start=278,end=286),strand="+")
tx_pos
GRanges object with 1 range and 0 metadata columns:
               seqnames     ranges strand
                  <Rle>  <IRanges>  <Rle>
  [1] ENST00000417659.1 [278, 286]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

As you see, my position of interest starts at 278 and ends at 286, on the plus strand (of the transcript).

Here I define the exons of this transcript (which map on the minus strand):

exon1<-GRanges(seqnames=1,ranges=IRanges(start=764723,end=764925),strand="-",exon_rank=1)
exon2<-GRanges(seqnames=1,ranges=IRanges(start=759032,end=759123),strand="-",exon_rank=2)
exons<-GRangesList(c(exon1,exon2))
names(exons)<-seqnames(tx_pos)

exons
GRangesList object of length 1:
$ENST00000417659.1  
GRanges object with 2 ranges and 1 metadata column:
      seqnames           ranges strand | exon_rank
         <Rle>        <IRanges>  <Rle> | <numeric>
  [1]        1 [764723, 764925]      - |         1
  [2]        1 [759032, 759123]      - |         2

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

Now I would like to have the genomic coordinates of my tx_pos element:

As expected, when we take the strand into account with ignore.strand=F in the mapFromTranscripts function, we do not see any result (gene on the minus strand and element on plus strand of the transcript).

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     xHits transcriptsHits
      <Rle> <IRanges>  <Rle> | <integer>       <integer>
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

When we put ignore.strand=T now we do have an output:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

but unfortunately is not what I expected, as this position actually does not map to any of my exons.
My transcript element lies in the second exon (as the first one is 203 nt long), thus what I expect as starting genomic position is 759123-(278-203-1) = 759049

If I put the strand of my tx_pos element to be "*" and ignore.strand=F I the function outputs the correct coordinates.

strand(tx_pos)<-"*"
mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [759041, 759049]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

while the ignore.strand=T still gives me the wrong coordinates:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

Am I doing something very wrong which I can't see ? In the help page it was not clear to me how we should set the strand of the transcript element (so I guess it's set to "*").

Should we always put the strand coordinates on the transcript to "*" and  ignore.strand=F to have the correct genomic coordinates of a transcript element?

If that's the case I hope other confused researchers can look at this and clear their minds a bit : )

Thank you so much for the great work, it is really making my life so much easier !

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.22.13 AnnotationDbi_1.32.3    Biobase_2.30.0          GenomicRanges_1.22.4   
[5] GenomeInfoDb_1.6.3      IRanges_2.4.8           S4Vectors_0.8.11        BiocGenerics_0.16.1    

loaded via a namespace (and not attached):
 [1] XVector_0.10.0             zlibbioc_1.16.0            GenomicAlignments_1.6.3   
 [4] BiocParallel_1.4.3         tools_3.2.2                SummarizedExperiment_1.0.2
 [7] DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1       
[10] rtracklayer_1.30.4         futile.options_1.0.0       bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.26.1             RSQLite_1.0.0             
[16] Biostrings_2.38.4          Rsamtools_1.22.0           XML_3.98-1.4   

 

mapFromTranscripts genomicfeatures • 1.6k views
ADD COMMENT
0
Entering edit mode

Thanks for the report. I'll have a look.

Valerie

ADD REPLY
0
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States

Hi,

I think you've got your exons reversed, i.e., it's the exon with width 92 that comes first when ignore.strand=TRUE.

When ignore.strand=TRUE the results are the same as if both 'x' and 'transcripts' were on the positive strand. mapFromTranscripts() concatenates the ranges in 'transcripts' as if they were touching so a visual would be something like this:

759032 --------> 759123 764723 --------> 764925

To find the location of tx_pos we need to move 278 from the transcription start; on the positive strand this is on the left at the lowest number 759032. Because the width of the first exon is only 92 we must go into the second exon:

start(second exon) + (start(tx_pos) - width(first exon) -1)

> 764723 + (278 - 92 - 1)
[1] 764908

Your example where the strand of tx_pos is '*', exons are '-' and ignore.strand=FALSE is returning the expected coordinates for a negative strand match (transcription starts at the right with the largest value). This is the same result you'd get if both tx_pos and exons were '-'. Note the strand of the result will always match the strand of 'exons'. When ignore.strand=TRUE the strand of the result is '*'.

Valerie

ADD COMMENT

Login before adding your answer.

Traffic: 572 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6