How to build a GRangesList where each GRanges element is a CDS coordinate of gene transcripts?
Hello Bioconductor Community,

I originally posted this on Biostars, but I have removed it from there and am posting it here instead.

How do I build a GRangesList where each GRanges element is a CDS coordinate of gene transcripts? Basically, I am trying to overlap CDS coordinates from a TxDb object with CpG loci from a GRanges object, while keeping the CDS coordinates grouped by gene transcript.

The reproducible data is in the sesameData package, which is used by the sesame package.

I am trying to create a txns GRangesList similar to the one below (txns.reference):

genomeInfo.mm10 <- sesameData::sesameDataGet('genomeInfo.mm10')
txns.reference <- genomeInfo.mm10$txns

I am trying to do this for the mm39 assembly, but for the sake of providing a reproducible example, I'll only include an mm10 working example.

This is how far I have gotten:

library(GenomicFeatures)  # provides cdsByOverlaps(); also attaches GenomeInfoDb for seqlevelsStyle()
MM285.mm10.manifest <- sesameData::sesameDataGet('MM285.mm10.manifest')
mm10.txdb <- GenomicFeatures::makeTxDbFromEnsembl(organism = "Mus musculus", release = 102)
seqlevelsStyle(mm10.txdb) <- "UCSC"
txns.reproducible.example <- cdsByOverlaps(x = mm10.txdb, ranges = MM285.mm10.manifest, columns = c("CDSSTART","CDSEND"))

The txns.reproducible.example is a GRanges object, not a GRangesList, and it does not contain the names of the gene transcripts as txns.reference does. I have tried many approaches, but with no success yet.

I would appreciate help from anyone. Thank you in advance!


Is this moving in the direction of what you'd like?

seqlevelsStyle(mm10.txdb) = "UCSC"
txns.reproducible.example <- cdsByOverlaps(x = mm10.txdb, 
   ranges =  MM285.mm10.manifest, columns = c("CDSSTART","CDSEND", "TXNAME", "CDSNAME"))
zz = split(txns.reproducible.example, unlist( txns.reproducible.example$TXNAME))
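
To make the split step concrete, here is a minimal toy sketch. The coordinates and the transcript names (txA, txB) are made up for illustration and are not real mm10 data. It also adds a check that is an assumption on my part rather than something from the code above: unlist() only lines up with the ranges when every CDS carries exactly one transcript name, so it is worth verifying that before splitting.

```r
# Toy illustration with made-up coordinates and transcript names (not real mm10 data)
library(GenomicRanges)

# Three CDS ranges on one chromosome
cds.toy <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(100, 300, 500), end = c(200, 400, 600)),
  strand = "+"
)

# TXNAME is stored as a list column, one element per range
mcols(cds.toy)$TXNAME <- CharacterList("txA", "txA", "txB")

# unlist() matches the ranges one-to-one only if each range has exactly one name
stopifnot(all(lengths(cds.toy$TXNAME) == 1))

# Splitting by transcript name turns the GRanges into a named GRangesList
txns.toy <- split(cds.toy, unlist(cds.toy$TXNAME))

class(txns.toy)
names(txns.toy)
```

If the check fails on real data, some CDS ranges belong to several transcripts and the grouping vector would be longer than the GRanges, so the ranges would need to be expanded per transcript first.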


Thank you very much! This is perfect, specifically this line:

zz = split(txns.reproducible.example, unlist( txns.reproducible.example$TXNAME))

I then realized that txns.reference was most likely created with cds() rather than cdsByOverlaps(). Regardless, you brought me all the way. :) Thank you!

For reference, in case anyone needs this in the future, this is what accomplished what I needed:

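library(GenomicFeatures)  # provides cds(); also attaches GenomeInfoDb and S4Vectors
MM285.mm10.manifest <- sesameData::sesameDataGet('MM285.mm10.manifest')
mm10.txdb <- GenomicFeatures::makeTxDbFromEnsembl(organism = "Mus musculus", release = 102)
seqlevelsStyle(mm10.txdb) <- "UCSC"
# Extract all CDS ranges with their start/end and transcript names
txns.reproducible.example <- GenomicFeatures::cds(x = mm10.txdb, columns = c("CDSSTART","CDSEND", "TXNAME"))
# Group the CDS ranges by transcript name to get a GRangesList
txns <- split(txns.reproducible.example, unlist(txns.reproducible.example$TXNAME))

# Rename the inner metadata columns to match txns.reference, then keep only those two
mcols(txns, level="within")[, "cdsStart"] <- mcols(txns, level="within")[, "CDSSTART"]
mcols(txns, level="within")[, "cdsEnd"] <- mcols(txns, level="within")[, "CDSEND"]
txns <- txns[, c("cdsStart", "cdsEnd")]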
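
As a possible alternative for the grouping step only, and not the approach used above, GenomicFeatures also has cdsBy(), which returns the CDS ranges already grouped by transcript as a GRangesList. The sketch below is an assumption-laden illustration: it uses the small hg19 sample TxDb that ships with GenomicFeatures (not the mm10 Ensembl TxDb from this thread) purely so it can run without a network connection. To my knowledge the inner metadata columns it returns are cds_id, cds_name and exon_rank, so the cdsStart/cdsEnd columns would still have to be added separately if they are needed.

```r
# Sketch only: uses the hg19 sample TxDb bundled with GenomicFeatures,
# not the mm10 Ensembl TxDb from this thread, so that it runs offline.
library(GenomicFeatures)

sample.txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                                  package = "GenomicFeatures"))

# cdsBy() groups CDS ranges by transcript and returns a GRangesList;
# use.names = TRUE names each element by transcript name rather than internal id.
cds.by.tx <- cdsBy(sample.txdb, by = "tx", use.names = TRUE)

class(cds.by.tx)
head(names(cds.by.tx))
```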

