Hi,
I have a question about preparing input data for DEXSeq. I'm working with level 3 TCGA exon_quantification data, which looks like this:
Hybridization REF TCGA-3C-AAAU-01A-11R-A41B-07 TCGA-3C-AALI-01A-11R-A41B-07 TCGA-3C-AALJ-01A-31R-A41B-07
exon raw_counts raw_counts raw_counts raw_counts
chr1:11874-12227:+ 29 23 18 2
chr1:12595-12721:+ 7 10 1 0
chr1:12613-12721:+ 7 10 1 0
chr1:12646-12697:+ 6 5 1 0
I have the raw counts for each exon, but I'm having trouble turning them into something I can use in DEXSeq. I don't have access to any of the original SAM/BAM files. My dataset contains only raw counts, which I think DEXSeq can take, but I don't know how to convert the exon coordinates into transcript IDs. I have a GTF file downloaded. Does anyone have suggestions on how to do this, or know of any Bioconductor tools that would give me the output I need?
Thanks in advance.
Why not download transcript-level data instead, e.g. from https://jhubiostatistics.shinyapps.io/recount/ ? It has all already been compiled, and you can be confident that there aren't any missing exons, etc.
I probably should have mentioned that the reason I'm working with exon-level data is that I'm looking at a novel transcript. I have the chromosome coordinates for the novel transcript, so the idea was to add it to the GTF file as a new annotation and then use DEXSeq to look at expression levels.
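For what it's worth, the coordinate-to-ID lookup asked about in the original question can be sketched in plain Python. This is only a minimal sketch, not a tested pipeline: it assumes the TCGA exon labels use the same genome build and the same 1-based inclusive coordinates as the GTF, and that exon boundaries match the GTF exactly. The function names and the toy IDs (geneA, txA1, txA2) are made up for illustration. A novel transcript could be handled the same way by appending its exons to the GTF lines before indexing.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "X"; transcript_id "Y";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def parse_exon_id(exon_id):
    """Split a label like 'chr1:11874-12227:+' into (chrom, start, end, strand)."""
    chrom, span, strand = exon_id.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end), strand

def index_gtf_exons(gtf_lines):
    """Map (chrom, start, end, strand) to a set of (gene_id, transcript_id) pairs."""
    index = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon features only
        attrs = dict(ATTR_RE.findall(fields[8]))
        key = (fields[0], int(fields[3]), int(fields[4]), fields[6])
        index[key].add((attrs.get("gene_id", ""), attrs.get("transcript_id", "")))
    return index

def annotate(exon_ids, index):
    """Return {exon label: sorted (gene_id, transcript_id) pairs}; empty list if no exact match."""
    return {e: sorted(index.get(parse_exon_id(e), set())) for e in exon_ids}

# Toy GTF records with hypothetical IDs; real input would come from open("your.gtf").
toy_gtf = [
    'chr1\ttoy\texon\t11874\t12227\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";',
    'chr1\ttoy\texon\t12613\t12721\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";',
    'chr1\ttoy\texon\t12613\t12721\t.\t+\t.\tgene_id "geneA"; transcript_id "txA2";',
]
exon_labels = ["chr1:11874-12227:+", "chr1:12613-12721:+", "chr1:12646-12697:+"]
result = annotate(exon_labels, index_gtf_exons(toy_gtf))
for label, hits in result.items():
    print(label, hits)
```

Exons with no exact match come back with an empty list, which makes it easy to spot rows that would need an overlap-based lookup instead. The gene_id is kept alongside the transcript_id because, as far as I know, DEXSeq groups exon counts by gene (the groupID argument of DEXSeqDataSet) rather than by transcript.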