Calculating FPKM from raw counts data using fpkm() in DESeq2
Beginner ▴ 60
@beginner-15939

I have raw count data from featureCounts, and my goal is survival analysis. For a specific gene, I want to classify the samples into Low and High groups based on an expression cutoff, which I am determining with the maxstat package.

First, I would like to convert the raw counts to FPKM. This is what I did.

              sample1   sample2   sample3   sample4   sample5
A1BG-AS1          195       612       145       131       300
A2M-AS1           373       445       573      1388      1386
A2ML1-AS1          75        27        45        18        35
A2ML1-AS2           0         0         0         0         0
AA06                0         0         0         0         0

The counts are in a matrix like the one above, with genes as rows and samples as columns.

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ Type)
dds
dds <- estimateSizeFactors(dds)

I then used the fpkm() function to calculate FPKM from the counts data.

fpkm_data <- fpkm(dds)
Error in fpkm(dds) : rowRanges(object) has all ranges of zero width.
the user should instead supply a column, mcols(object)$basepairs,
which will be used to produce FPKM values

I'm not sure what to do next. Could you please tell me what I need to do to the data to calculate FPKM? Thank you.
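
For orientation, the error makes sense once the FPKM definition is written out: counts are scaled per million fragments in the library and per kilobase of gene length, so a length is needed for every gene. Below is a minimal base R sketch of that arithmetic. The toy counts, gene names and lengths are made-up values for illustration, and `fpkm_manual` is a hypothetical helper, not a DESeq2 function.

```r
# Toy data (made up): 3 genes x 2 samples, plus one length per gene in basepairs.
counts <- matrix(c(100, 300, 600,
                   200, 300, 500),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))
lengths_bp <- c(geneA = 1000, geneB = 2000, geneC = 500)

# Hypothetical helper: plain FPKM using the column sums as library sizes.
fpkm_manual <- function(counts, lengths_bp) {
  stopifnot(nrow(counts) == length(lengths_bp))
  # Fragments per million: divide each column by (library size / 1e6).
  fpm <- sweep(counts, 2, colSums(counts) / 1e6, "/")
  # Per kilobase: divide each row by (gene length / 1e3).
  sweep(fpm, 1, lengths_bp / 1e3, "/")
}

fpkm_values <- fpkm_manual(counts, lengths_bp)
print(fpkm_values)
```

As far as I can tell, this plain version corresponds to fpkm(dds, robust = FALSE); the default robust = TRUE derives the library sizes from the size factors rather than the raw column sums.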


Hello,

I have the same question. I have a matrix of gene lengths that I calculated from the GFF file, and I want to add it to the dds object so that I can run fpkm(dds). Can you help me with this?


Just assign the number of basepairs for each gene here:

mcols(dds)$basepairs = ...
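
To make the assignment above concrete, here is a minimal sketch, assuming the lengths live in a separate table that may not be in the same row order as the counts. The `gene_lengths` data frame, its column names, the length values and the `Type` labels are all hypothetical. The important step is matching by gene name before assigning, because mcols(dds)$basepairs is taken in row order.

```r
library("DESeq2")

# Toy counts shaped like the matrix in the question (genes x samples).
counts <- matrix(c(195L, 612L, 145L,  131L,  300L,
                   373L, 445L, 573L, 1388L, 1386L,
                    75L,  27L,  45L,   18L,   35L),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A1BG-AS1", "A2M-AS1", "A2ML1-AS1"),
                                 paste0("sample", 1:5)))
# Hypothetical sample annotation.
coldata <- data.frame(Type = factor(c("A", "A", "B", "B", "B")),
                      row.names = colnames(counts))

# Hypothetical lengths table (made-up values), deliberately in a different order.
gene_lengths <- data.frame(gene   = c("A2ML1-AS1", "A1BG-AS1", "A2M-AS1"),
                           length = c(1200, 2500, 3100))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ Type)
dds <- estimateSizeFactors(dds)

# Align the lengths to the rows of dds by gene name, and fail if any are missing.
idx <- match(rownames(dds), gene_lengths$gene)
stopifnot(!anyNA(idx))
mcols(dds)$basepairs <- gene_lengths$length[idx]

fpkm_data <- fpkm(dds)
head(fpkm_data)
```

Genes without a length would need to be dropped from dds (or given a length) before this step; the stopifnot() call is there so that a mismatch is caught instead of silently producing NA values.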
@mikelove

It’s not trivial to calculate the gene lengths. I don’t have any simple generic code to get the gene lengths (exonic basepairs).
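
One common convention for "exonic basepairs" is the union of all annotated exons of a gene, so that overlapping exons from different transcripts are counted once. The base R sketch below illustrates only that counting rule. The `exons` table and its coordinates are made up, `union_width` is a hypothetical helper, and real annotations raise further choices (which transcripts to include, how to treat genes on multiple contigs) that this does not address.

```r
# Hypothetical exon table: one row per exon, 1-based inclusive coordinates.
exons <- data.frame(gene  = c("geneA", "geneA", "geneA", "geneB"),
                    start = c(100, 150, 400, 10),
                    end   = c(200, 250, 450, 59))

# Hypothetical helper: total width of the union of possibly overlapping intervals.
union_width <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end) {
      # Overlaps the current merged interval: extend it.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current interval and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# One length per gene, named by gene.
exonic_bp <- vapply(split(exons, exons$gene),
                    function(df) union_width(df$start, df$end),
                    numeric(1))
print(exonic_bp)
```

With Bioconductor range objects, the same idea is usually expressed with reduce() followed by summing width() per gene, which avoids hand-written interval merging.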

I collaborated on Salmon which is a sophisticated model for estimating abundance, so that’s my solution if users want to work with expression values, rather than making arbitrary calculations on counts (arbitrary because you don’t know the gene length).


I can actually get the gene lengths from the annotation GTF file. If I add a gene_length column to the matrix, will it work?


Try following the advice given in the message from the software and see how it works. If you get stuck then it’s appropriate to post.
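
On the gene_length question: the error message asks for the lengths in mcols(object)$basepairs, and every column of the count matrix is treated as a sample, so a length column would presumably need to be separated from the counts first. A minimal base R sketch of that split follows, assuming a combined table with a column literally named `gene_length`; the table and its values are hypothetical.

```r
# Hypothetical combined table: sample counts plus a gene_length column (made-up lengths).
tab <- data.frame(sample1 = c(195L, 373L, 75L),
                  sample2 = c(612L, 445L, 27L),
                  gene_length = c(2500, 3100, 1200),
                  row.names = c("A1BG-AS1", "A2M-AS1", "A2ML1-AS1"))

# Keep only the sample columns as the count matrix.
sample_cols <- setdiff(colnames(tab), "gene_length")
counts <- as.matrix(tab[, sample_cols])

# Keep the lengths as a separate vector, named by gene.
basepairs <- setNames(tab$gene_length, rownames(tab))

# Sanity checks before building the DESeqDataSet.
stopifnot(identical(names(basepairs), rownames(counts)))
stopifnot(all(basepairs > 0))
```

The `counts` matrix can then go into DESeqDataSetFromMatrix() as countData, and `basepairs` can be assigned to mcols(dds)$basepairs afterwards, since both are in the same row order.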