Question

Finding DEGs from RNA-seq: what if I only have RPKM data?

Yue Zhao

China

Hi all,

I'm currently doing an RNA-seq analysis and have run into a problem. The data I have is a matrix of RPKM values, not read counts. Is there any way to find DEGs from it? As mentioned in the DESeq2 documentation, methods like DESeq2 only accept a matrix of read counts. I tried edgeR, but it seems edgeR is not meant for RPKM either, right? Because the original RNA-seq data was deleted by the person who gave me the RPKM data, I'm wondering whether there is some way to analyze the RPKM matrix and find the DEGs between subgroups of my samples. (The species is cotton; the RPKM matrix is 37,000 genes × 50 samples, and the samples fall into 6 groups, each with a different number of samples.)

Looking forward to your reply, and many thanks for reading this post from a stranger :)

Best,

yue

deg rna-seq rpkm


score 1 · Answer 1 · 2015-03-11

b.nota

Netherlands

Hi Yue,

It is not advisable to use RPKM data for statistical analysis in DESeq2 or edgeR. I don't know what you mean by the original data (FASTQ or BAM?), but I would strongly recommend not deleting raw data before you publish your study.

I don't know how that person calculated the RPKM values, but you might want to ask them to reverse the calculation.

RPKM is usually calculated as:

RPKM = number of mapped reads / (transcript length in bp / 1000) / (total mapped reads / 10^6)

Correct me if I am wrong.
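The formula above can be written as a small Python function. This is a minimal sketch assuming the standard definition, in which the library size is the total number of mapped reads; the function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(mapped_reads: float, length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    length_kb = length_bp / 1_000  # transcript length in kilobases
    library_millions = total_mapped_reads / 1_000_000  # library size in millions
    return mapped_reads / length_kb / library_millions


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# in a library of 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))  # 500 / 2 / 20 = 12.5
```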

So if you know the total number of mapped reads in each sample (library) and the length of each transcript, you can calculate the number of mapped reads back.
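A minimal NumPy sketch of that inversion, assuming the RPKM matrix was produced by the simple formula above and that you have each gene's length and each sample's total mapped reads (the function names and example values are hypothetical). Because tools such as Cufflinks use effective transcript lengths and split multi-mapping reads fractionally, their values generally will not invert to whole numbers, so the sketch also checks whether the recovered values are close to integers.

```python
import numpy as np


def rpkm_to_counts(rpkm, lengths_bp, library_sizes):
    """Invert RPKM: reads = RPKM * (length / 1e3) * (library size / 1e6).

    rpkm:          genes x samples matrix of RPKM values
    lengths_bp:    one transcript length (bp) per gene
    library_sizes: one total mapped read count per sample
    """
    rpkm = np.asarray(rpkm, dtype=float)
    # Outer product gives a genes x samples matrix of scaling factors.
    scale = np.outer(np.asarray(lengths_bp, dtype=float) / 1e3,
                     np.asarray(library_sizes, dtype=float) / 1e6)
    return rpkm * scale


def looks_like_counts(values, tol=1e-3):
    """True if every recovered value is within `tol` of a whole number."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(np.abs(values - np.rint(values)) <= tol))


# Hypothetical example: 2 genes x 2 samples.
rpkm = np.array([[12.5, 5.0],
                 [1.0, 0.0]])
counts = rpkm_to_counts(rpkm, lengths_bp=[2_000, 500], library_sizes=[20e6, 10e6])
print(counts)
print(looks_like_counts(counts))
```

RPKM values in real files are often stored rounded to a few decimals, so even a correct inversion may not land exactly on whole numbers; values that are far from integers suggest the simple formula was not what produced the matrix.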

Hope this helps!

Ben

If the RPKM values were calculated by Cufflinks, then they CANNOT be back-translated to integer counts. While RPKM is not the ideal normalization, it's not horrible (except for very lowly expressed genes, but you should filter those out anyway). If that's all you have, then I would suggest running standard limma on the log-scale RPKM values, rather than using the voom transformation, to find DEGs. You could also ask the center that did your sequencing whether they still have a copy of the original FASTQ files.
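limma itself is an R/Bioconductor package, so the following Python sketch only illustrates the preprocessing mentioned above: dropping very lowly expressed genes and moving RPKM to a log scale before fitting a linear model. The thresholds (`min_rpkm`, `min_samples`) and the pseudocount of 1 are assumptions for illustration, not values from this thread; one common choice is to set `min_samples` to the size of the smallest group. The resulting log-scale matrix is what you would then pass to limma's `lmFit` and `eBayes` together with a design matrix for the six groups.

```python
import numpy as np


def filter_and_log_rpkm(rpkm, min_rpkm=1.0, min_samples=3, pseudocount=1.0):
    """Keep genes with RPKM >= min_rpkm in at least min_samples samples,
    then return log2(RPKM + pseudocount) for the kept genes.

    rpkm: genes x samples matrix. Returns (boolean keep mask, log matrix).
    Thresholds and pseudocount are illustrative assumptions.
    """
    rpkm = np.asarray(rpkm, dtype=float)
    # Number of samples in which each gene reaches the expression threshold.
    n_expressed = (rpkm >= min_rpkm).sum(axis=1)
    keep = n_expressed >= min_samples
    log_expr = np.log2(rpkm[keep] + pseudocount)
    return keep, log_expr


# Hypothetical 3 genes x 4 samples matrix.
rpkm = np.array([[0.0, 0.2, 0.1, 0.0],   # essentially unexpressed: dropped
                 [5.0, 7.0, 3.0, 15.0],  # kept
                 [0.5, 2.0, 1.5, 1.0]])  # kept (3 samples >= 1)
keep, log_expr = filter_and_log_rpkm(rpkm, min_rpkm=1.0, min_samples=2)
print(keep)  # [False  True  True]
```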

Good luck!

Jenny

Jenny Drnevich

Thanks, Ben and Jenny! I asked the person who gave me the data, and he finally found the read count data somewhere. If he hadn't, I would probably have used voom and limma instead, since the RPKM values were calculated by Cufflinks. I hadn't realized before that edgeR doesn't support RPKM, so my earlier GO analysis results are a total mess. I really appreciate your kind responses!

Best wishes,

Yue
