Hi all,
I've recently been doing an RNA-seq analysis and have run into a problem. The data I have is a matrix of RPKM values, not read counts, so is there any way to find differentially expressed genes (DEGs)? As the DESeq2 documentation states, methods like DESeq2 only accept a matrix of read counts. I tried edgeR, but it seems edgeR is not meant for RPKM either, right? The original RNA-seq data was deleted by the person who gave me the RPKM data, so I'm wondering whether there is some way to analyze the RPKM matrix and find DEGs between groups within my data. (The species is cotton; the RPKM matrix is 37000 x 50 and can be divided into 6 groups, each with a different number of samples.)
Looking forward to your reply and many thanks for reading this email from a stranger :)
Best,
Yue
If the RPKM values were calculated by Cufflinks, they cannot be back-translated to integer counts. RPKM is not the ideal normalization, but it's not horrible (except for very low expression genes, which you should filter out anyway). If that's all you have, I would suggest using standard limma, without the voom normalization, to find DEGs. You could also try going back to the center that did your sequencing to see whether they have a copy of the original .fastq files.
Good luck!
Jenny
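The preprocessing described above (dropping very low expression genes before any testing) is usually paired with moving the RPKM values to a log scale before a linear-model fit. Below is a minimal Python sketch of that step only, not of limma itself. The function name, the gene IDs, the thresholds (`min_rpkm`, `min_samples`), and the `offset` added before taking log2 are all illustrative assumptions, not values from this thread; choose them for your own data (for example, setting `min_samples` to the size of the smallest group).

```python
import math


def filter_and_log_rpkm(rpkm_by_gene, min_rpkm=1.0, min_samples=3, offset=1.0):
    """Drop low-expression genes, then log2-transform the rest.

    rpkm_by_gene: dict mapping a gene ID to a list of RPKM values,
                  one value per sample (hypothetical input layout).
    min_rpkm:     a sample counts as "expressed" at or above this RPKM (assumed).
    min_samples:  keep a gene only if at least this many samples are expressed (assumed).
    offset:       small constant added before log2 so that zeros stay finite (assumed).
    """
    kept = {}
    for gene, values in rpkm_by_gene.items():
        n_expressed = sum(1 for v in values if v >= min_rpkm)
        if n_expressed >= min_samples:
            kept[gene] = [math.log2(v + offset) for v in values]
    return kept


# Toy data with made-up gene IDs and four samples.
toy = {
    "geneA": [0.0, 0.2, 0.1, 0.0],  # expressed in no sample, filtered out
    "geneB": [5.0, 7.0, 3.0, 9.0],  # expressed in all four samples, kept
    "geneC": [0.5, 2.0, 3.0, 1.5],  # expressed in three samples, kept
}
log_expr = filter_and_log_rpkm(toy, min_rpkm=1.0, min_samples=3, offset=1.0)
print(sorted(log_expr))
```

The filtered, log-scale matrix would then be the input to an ordinary limma fit in R (a design matrix for the 6 groups, then `lmFit` and `eBayes`), in place of the voom step that requires counts.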
Thanks, Ben and Jenny! I asked the person who gave me the data, and he finally found the read count data somewhere... If he hadn't, I think I would have used voom and limma instead, because the RPKM values were calculated by Cufflinks. I hadn't realized before that edgeR does not support RPKM, so my GO analysis results were a total mess. I really appreciate your kind responses!
Best wishes,
Yue