Dear all,
I'm trying to calculate the distance between genomic loci and the start / end of the 3' UTR (for coding genes).
I've used "GenomicFeatures":
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library("GenomicFeatures")
#1.UTRs size
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
tx_lens <- transcriptLengths(txdb, with.utr5_len = F, with.utr3_len = T, with.cds_len=F)
coding.gene <- tx_lens[tx_lens$cds_len !=0,]
# Genomic loci in UTR 3
loci.GR<-makeGRangesFromDataFrame(loci ,ignore.strand = F,seqnames.field = "chromosome",
strand.field="strand",start.field = "start", keep.extra.columns=T, end.field = "stop",starts.in.df.are.0based = TRUE)
# Position in each transcript
ucsc3UTRbytx <- threeUTRsByTranscript(txdb)
ucsctx<-transcripts(txdb)
loci.t <- mapToTranscripts(loci.GR, ucsc3UTRbytx)
loci.df = data.frame(loci.t)
# retrieve transcript id
loci.df$txname<-ucsctx@elementMetadata[loci.t@elementMetadata$transcriptsHits,"tx_name"]
# identification of loci position in UTRs
loci.df.annotated <- merge(loci.df, coding.gene , by.x =c("tx_name"), by.y =c("tx_name"),all.x = F , all.y = F)
But when I look at my results, they are quite confusing. For example:
- UTRs present in "loci.df.annotated" are not always in "coding.gene"
- the UTR position in "loci.df.annotated" is sometimes outside the range of the 3' UTR size in "coding.gene"
- the UTR position in "loci.df.annotated" is sometimes outside the range of the transcript length in "coding.gene"
...
Do you think this is an issue in "GenomicFeatures", or an issue in my script?
I'm having trouble reproducing the code you provided.
Under "#1.UTRs size", tx_lens$cds_len would be NULL unless, in the previous call
tx_lens <- transcriptLengths(txdb, with.utr5_len = F, with.utr3_len = T, with.cds_len=F)
the argument with.cds_len is changed to TRUE.
Under "# Genomic loci in UTR 3", you have not yet defined loci when you pass it to makeGRangesFromDataFrame.
Could you please update the code provided so I can reproduce what you are experiencing?
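To illustrate the first point without needing the annotation package, here is a minimal base-R sketch. The small data frame is a hypothetical stand-in for the table returned by transcriptLengths() (the tx names and lengths are made up), so treat it as an illustration of the subsetting behaviour rather than real output.

```r
# Toy stand-in for the result of transcriptLengths() called with
# with.cds_len = FALSE (hypothetical values, one row per transcript).
tx_lens <- data.frame(
  tx_id    = 1:3,
  tx_name  = c("txA", "txB", "txC"),
  tx_len   = c(1200L, 800L, 1500L),
  utr3_len = c(300L, 0L, 450L),
  stringsAsFactors = FALSE
)

# The cds_len column was never computed, so `$` returns NULL ...
is.null(tx_lens$cds_len)

# ... and `NULL != 0` is logical(0), which selects zero rows without an error.
coding.gene <- tx_lens[tx_lens$cds_len != 0, ]
nrow(coding.gene)

# Once the column exists (as with.cds_len = TRUE would provide it),
# the same filter keeps only transcripts with a non-zero CDS length.
tx_lens$cds_len <- c(900L, 0L, 1050L)
coding.gene <- tx_lens[tx_lens$cds_len != 0, ]
coding.gene$tx_name
```

Because the filter fails silently, it may be worth checking "cds_len" %in% colnames(tx_lens) before subsetting.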