Ok, so the essence of the question is: I have a big GRangesList of open reading frames (ORFs) in the 5' leaders, and I want to convert them from genomic to transcript coordinates.
I know I can use mapToTranscripts, but there is a catch!
So here is my problem: I used redefined leaders that were extended at both the start and the end, and now I need the transcript coordinates back again, because I don't have the redefined leaders anymore. This is a lot of data, so I don't want to recompute the tx ranges from scratch; I want to be clever about it.
Usually I could have done something like this:
# ORFs: a list of ORFs in the 5' leaders
# fiveUTRs: the GRangesList of 5' leaders
txRanges = mapToTranscripts(x = ORFs, transcripts = fiveUTRs)
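To make concrete what I mean by transcript coordinates, here is a minimal toy sketch. The coordinates and the name tx1 are made up for illustration, and I assume here that x is a plain GRanges (an unlisted version of my ORFs) and that the leaders are a named GRangesList:

```r
library(GenomicRanges)
library(GenomicFeatures)

# Hypothetical 5' leader with two exons on the + strand:
# exon 1 = 101..150 (50 nt), exon 2 = 201..250 (50 nt)
toyLeaders = GRangesList(
  tx1 = GRanges("chr1", IRanges(start = c(101, 201), end = c(150, 250)), strand = "+")
)

# Hypothetical ORF lying entirely inside exon 2
toyORFs = GRanges("chr1", IRanges(start = 211, end = 220), strand = "+")

# Map genomic positions to positions along the spliced leader
toyTx = mapToTranscripts(x = toyORFs, transcripts = toyLeaders)

start(toyTx)  # genomic 211 is the 11th base of exon 2, after 50 bases of exon 1
end(toyTx)
```

In this toy case the ORF should come back as positions 61 to 70 on tx1, since exon 1 contributes the first 50 transcript positions.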
But I extend my ORFs into the first exon, so I need to redefine fiveUTRs to include the first exon for each gene. The plan is now like this:
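# cds: assumed to be a GRangesList of CDS ranges per transcript, named like fiveUTRs
insertFirstCDS = function(fiveTemp, x){
  # first CDS range of the transcript with the same name as the x-th leader
  firstExon = unlist(cds[names(cds) == names(fiveUTRs)[x]])[1]
  return( sort(c(fiveTemp, firstExon)) ) # return sorted combination
}
# define the helper first, then apply it to every leader by index
fiveUTRsWithExon = lapply(seq_along(fiveUTRs), function(x) insertFirstCDS(unlist(fiveUTRs[x]), x))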
This is terribly slow even for just a few hundred MB of data, and I need to process several TB, so any ideas?