How to make a combined annotation of 5' UTRs and CDSs?
anmej • 0

Hello everyone.

I want to extract the combined 5' UTR + CDS region of every transcript in the hg19 annotation, in order to search for alternative ORFs. This is what I've managed to do so far:

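# Attaching the TxDb package also attaches GenomicFeatures
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# 5' UTRs and CDS as GRangesLists, one element per transcript, named by transcript
fiveUTRs = fiveUTRsByTranscript(txdb, use.names = TRUE)
names5UTR = names(fiveUTRs)
cds = cdsBy(txdb, "tx", use.names = TRUE)
namesCDS = names(cds)

# Keep only transcripts that have both a 5' UTR and a CDS
names5UTRCDS = intersect(namesCDS, names5UTR)

# Subsetting by the same names puts both lists in the same order
fiveUTRs = fiveUTRs[names5UTRCDS]
cds = cds[names5UTRCDS]

# Concatenate the two lists one transcript at a time (slow: the result is re-copied on every iteration)
fiveUTRCDS = GRangesList()
for (i in seq_along(names5UTRCDS)){
    x = GRangesList(c(unlist(fiveUTRs[i]), unlist(cds[i])))
    names(x) = names(fiveUTRs[i])
    fiveUTRCDS = c(fiveUTRCDS, x)
}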

I'm basically looping over both lists and concatenating each pair of elements. It works, but it is very slow and inelegant. Surely there must be a better, functional way to do it? Some way to "zip" the two lists?

Thanks.

annotation concatenation GRangesList GenomicRanges • 1.1k views
anmej • 0

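I found the answer, and it is embarrassingly simple. Somehow I failed to notice the existence of parallel (element-wise) functions for list-like objects. With fiveUTRs and cds already subset to the same transcripts in the same order, as above:

# pc() is a parallel c(): it combines the two lists element by element
fiveUTRCDS = pc(fiveUTRs, cds)
# reduce() merges overlapping or adjacent ranges within each transcript
fiveUTRCDS = reduce(fiveUTRCDS)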
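
To see what the two calls do without loading the full hg19 annotation, here is a minimal sketch on toy data. The transcript names (txA, txB) and all coordinates are made up for illustration, and the sketch assumes a Bioconductor release recent enough to provide pc().

```r
library(GenomicRanges)  # also attaches S4Vectors and IRanges

# Hypothetical toy transcripts; these are not real hg19 coordinates.
utr <- GRangesList(
  txA = GRanges("chr1", IRanges(start = c(100, 200), end = c(150, 220)), strand = "+"),
  txB = GRanges("chr2", IRanges(start = 500, end = 540), strand = "-")
)
cds <- GRangesList(
  txA = GRanges("chr1", IRanges(start = c(221, 300), end = c(260, 350)), strand = "+"),
  txB = GRanges("chr2", IRanges(start = c(400, 450), end = c(420, 499)), strand = "-")
)

# Element-wise concatenation: element i of the result holds the ranges of
# utr[[i]] followed by the ranges of cds[[i]].
combined <- pc(utr, cds)

# Within each element, merge ranges that overlap or are directly adjacent,
# e.g. the part of an exon that is 5' UTR and the part that is CDS.
merged <- reduce(combined)

elementNROWS(combined)  # number of ranges per transcript before merging
elementNROWS(merged)    # number of ranges per transcript after merging
sum(width(merged))      # total 5' UTR + CDS length per transcript
```

Two things are worth keeping in mind. pc() matches elements by position, not by name, so both lists need to be in the same order before combining. And reduce() returns the ranges of each element sorted by start coordinate, so for minus-strand transcripts the exons are no longer listed in 5' to 3' order.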