For a while I used quantile normalization followed by t-test or cuffdiff for analysis of RNASeq. I would like to try DESeq2 given its better normalization method based on the readings. I am using your R package summarizeOverlaps to create count tables as I found it to be the easiest. However, I am encountering couple of issues. First,the ensemble-based gtf file that I used to align all my mouse RNAseq data has the chromosomes named as “1, 2 , 3”… Rather than “chr1 , chr2 ..”. My Testing R code for summarizeOverlaps is as follows:
library(TxDb.Mmusculus.UCSC.mm10.ensGene) TxDb_mm10=TxDb.Mmusculus.UCSC.mm10.ensGene saveDb(TxDb_mm10, file="mm10.sqlite”) mm10<-loadDb("mm10.sqlite”) exbyGene<-exonsBy(mm10,by="gene”) #I assume this is assigning things by GENES ids (ENSG..) exbyTx<-exonsBy(mm10,by="tx”) #I assume this is assigning things by Transcript IDS (ENST…) fls <- list.files(“Path/To/BamFiles", pattern=".accepted_hits.bam", full= TRUE) grp1<- fls[1] grp2 <- fls[c(2,3)] bamLst_grp1<-BamFileList(grp1,yieldSize=100000) bamLst_grp2<-BamFileList(grp2,yieldSize=100000) head(seqlevels(TxDb_mm10)) # this shows that the mm10 sqlite I am using has chr1, chr2 .. Format seqinfo(bamLst_grp1) #this shows that the chromosome names for my testing bam files are 1, 2, 3… se_grp1<-summarizeOverlaps(exbyGene,bamLst_grp1,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE) se_grp2<-summarizeOverlaps(exbyGene,bamLst_grp2,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE) ##Saving environment saveRDS(se_grp1, file="se_grp1.Rdata") saveRDS(se_grp2, file="se_grp2.Rdata”)
While the script runs nicely (and please, advice me if I should optimize anything as I have basic understanding of each of the commands), when I tried to load the environment I am getting the following error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded In addition: Warning message: file ‘se_grp2.Rdata’ has magic number 'X' Use of save versions prior to 2 is deprecated
Also, I am not sure if the error is because the chromosome names are incompatible between the reference and bam file. If that’s the case, how can I correct for this?
Lastly, can i add features to the table counts being generated? For instance, information regarding TSS, exact locus, gene biotype/function that are usually in the gtf, how can we add those to the count tables if I successfully generated a table.
First, you should respond to an answer by clicking on ADD COMMENT and then typing in the box that comes up. You only type in the box under 'Add your answer' if you are actually adding an answer.
That ran without an error. But is that what you would expect to correct for things?
A better question is whether or not that actually 'corrected for things' in your mind. Did you check the seqlevels after doing that? Did you look at the examples for seqlevels? Did you look at the help for seqlevelsStyle? While it is convenient to get help on this support site, the best recourse is to learn how to figure out things for yourself, which is why I pointed you to the help file.
Also, cant i add more features for the count table other than just the geneId or name? Like the function, TSS...
Probably. Here is a pointer. You used
summarizeOverlaps()
to do something, right? If you look at the help page for that function, you can see this, under the 'Value' section:Which will likely not mean much to you, but there is a help page for RangedSummarizedExperiment objects as well. Note that you sometimes have to add '-class' for objects (as compared to functions) to get the help. That's not true in this situation, but it's something to remember.
You can see the help page by doing ?RangedSummarizedExperiment. Or by reading the vignette for the SummarizedExperiment package.
I am purposefully not answering your questions here, because that won't help you. If you are planning to use R/Bioconductor to analyze data, you need to find out how to answer questions for yourself, as you cannot rely on the kindness of strangers. Figuring out how to figure things out is the best skill you can acquire.
I understand where you are coming from and you are right. I should have tried seqlevels on BamLst after the change before posting. I did now and that didn't seem to fix the problem. I did go over the vignette in summarizedOverlaps but that didn't seem to help me fix the issue. I will look more into it.
Thanks-