Get Intron Seq
1
0
Entering edit mode
Lillian • 0
@c209ec01
Last seen 17 months ago
United States

Hello,

I'm wondering how to intron sequence by coordinates.

For example, I know that there might be an intron in chr20 start=50941210,end=50942030,

I'm wondering if I could use the BSgenome package in R to derive the DNAString object to get the sequence of intron, wondering if it is possible.

Thank you so much

intron BSgenome.Hsapiens.UCSC.hg38 BSgenome • 1.0k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 19 hours ago
United States

Yes, quite easily in fact.

> library(BSgenome.Hsapiens.UCSC.hg38)
> getSeq(Hsapiens, GRanges("chr20:50941210-50942030"))
DNAStringSet object of length 1:
    width seq
[1]   821 CTACAAAAAAAAAAGTGAATATATATATATATAT...TACTAATAAAGAAGTTGTAACACATGTACCTAC

You could wrap that in as.character if you want the entire sequence, or write to a FASTA file using writeXStringSet.

ADD COMMENT
0
Entering edit mode

Hi!

Thank you so much for providing this.

I just have a question about it that I have tried to create a Granges object with the same coordinates as yours, however, why I feel like the sequences were different?

> gr <- GRanges(gr)
> gr
GRanges object with 1 range and 3 metadata columns:
    seqnames            ranges strand |            ENSG      pval       dPSI
       <Rle>         <IRanges>  <Rle> |     <character> <numeric>  <numeric>
  2    chr20 50941210-50942030      - | ENSG00000000419   0.17493 -0.0109049
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> getSeq(BSgenome.Hsapiens.UCSC.hg38, gr)
DNAStringSet object of length 1:
    width seq                                                                                                       names               
[1]   821 GTAGGTACATGTGTTACAACTTCTTTATTAGTATAACTACTGGAAATGTAT...TATATATATATATATATATATATATATATATATTCACTTTTTTTTTTGTAG 2



 > getSeq(BSgenome.Hsapiens.UCSC.hg38, GRanges("chr20:50941210-50942030"))
DNAStringSet object of length 1:
    width seq
[1]   821 CTACAAAAAAAAAAGTGAATATATATATATATATATATATATATATATATATATAAATAAAA...CTTGCTATGAATACATTTCCAGTAGTTATACTAATAAAGAAGTTGTAACACATGTACCTAC

I just feel a bit confused because I use the same coordinates and BSgenome.

Thank you so much

ADD REPLY
0
Entering edit mode

Your gr is stranded, the sequence is the reverse complement of James version due to strand-aware extraction. Unstranded defaults to positive strand.

ADD REPLY

Login before adding your answer.

Traffic: 473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6