Question

Normalisation: DESeq2 vs StringTie-Ballgown

JindrichK

Hi

I have a normalisation question. I am seeing some inconsistencies between the mRNA abundance estimates from DESeq2 and StringTie-Ballgown. I understand that there are many differences between the two, but if you consider one gene that has one transcript, and use the same BAM input file, the main difference between the two algorithms is the normalisation, correct?

Attached is the bamCoverage track for such a gene [image: read coverage]. Below are the FPKM values estimated by DESeq2 (gene level) and StringTie-Ballgown (transcript level); the commands used are at the end of this post:

                              AMP (blue track)   DLM (green track)
FPKM by StringTie-Ballgown          40.6               5.1
FPKM by DESeq2                      21.3              13.1
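As a quick check on the size of the discrepancy, the AMP/DLM fold changes implied by the table can be computed directly. This is plain arithmetic on the reported values; the dictionary keys are only labels chosen here:

```python
import math

# FPKM values from the table (single-transcript gene, same BAM input).
fpkm_values = {
    "StringTie-Ballgown": {"AMP": 40.6, "DLM": 5.1},
    "DESeq2": {"AMP": 21.3, "DLM": 13.1},
}

for method, v in fpkm_values.items():
    fold_change = v["AMP"] / v["DLM"]
    print(f"{method}: AMP/DLM = {fold_change:.2f} "
          f"(log2 = {math.log2(fold_change):.2f})")
```

This gives roughly 7.96 (log2 about 2.99) for StringTie-Ballgown versus 1.63 (log2 about 0.70) for DESeq2, so the two pipelines disagree by almost a factor of five on the fold change itself, not just on the absolute values.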

The fold change between the two conditions according to StringTie is much closer to what you see in the pile-up. Is that because StringTie and bamCoverage use the same kind of normalisation algorithm? And if so, which is closer to the "biological truth": DESeq2 or StringTie/bamCoverage?

Thanks!

Commands used:

StringTie:
    stringtie -e -B -G ${GTF} -o transcripts.gtf -A gene_abundances.tsv input.rmdup.bam

DESeq2 (using featureCounts counts):
    featureCounts -T $threads -p -F GTF -t exon -g gene_id -s 2 -a ${GTF} -o out.featurecount input.rmdup.bam

FPKM values calculated in DESeq2 with:
    fpkmNormalisedCounts <- as.data.frame(fpkm(analysisObject, robust = TRUE))

bigWig (deepTools bamCoverage):
    bamCoverage -b input.rmdup.bam --ignoreDuplicates --effectiveGenomeSize 142573017 --normalizeUsing RPKM --filterRNAstrand forward -of bigwig -o output.bw
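For comparison with the FPKM values, deepTools documents bamCoverage's RPKM as the number of reads per bin divided by (number of mapped reads in millions times bin length in kb). A minimal sketch of that scaling, with hypothetical bin counts and library sizes, shows that the track divides by a total read count rather than by a size-factor estimate. The function name is illustrative and not part of deepTools; the 50 bp bin is assumed to match bamCoverage's default --binSize:

```python
def rpkm_per_bin(bin_read_counts, bin_length_bp, total_mapped_reads):
    """Reads per bin / (mapped reads in millions * bin length in kb)."""
    scale = (total_mapped_reads / 1e6) * (bin_length_bp / 1e3)
    return [n / scale for n in bin_read_counts]


# Hypothetical read counts in five consecutive 50 bp bins over one gene.
bins = [10, 20, 40, 20, 10]

# The same raw bin counts in two libraries of different (hypothetical) depth.
shallow = rpkm_per_bin(bins, 50, 20e6)
deep = rpkm_per_bin(bins, 50, 40e6)
print([round(x, 2) for x in shallow])
print([round(x, 2) for x in deep])
```

Doubling the number of mapped reads halves every bin value, so the coverage track depends directly on each library's total mapped read count.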

Tags: deseq2, StringTie, RNA-seq, normalisation

Answer · 2019-12-17

Michael Love

The fpkm function in DESeq2 uses whatever gene length you provide. So it's not really a question of StringTie vs DESeq2, but of featureCounts vs StringTie. You can import StringTie output directly into DESeq2 using tximport (which supports type="stringtie"), which would give a one-to-one comparison.

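To illustrate the point about gene length, here is a minimal sketch of the plain FPKM formula applied with two different, hypothetical length estimates for the same gene (standing in for the lengths featureCounts and StringTie might report). The function is the generic formula, not DESeq2's fpkm(), and the counts and library sizes are made up:

```python
def fpkm(count, length_bp, library_size):
    """Plain FPKM formula: count / (length in kb) / (library size in millions)."""
    return count / (length_bp / 1e3) / (library_size / 1e6)


# Hypothetical single-transcript gene in two samples.
counts = {"AMP": 800, "DLM": 200}
library_size = {"AMP": 20e6, "DLM": 25e6}

# Two hypothetical length estimates for the same gene.
for length_bp in (1800, 2400):
    values = {s: fpkm(counts[s], length_bp, library_size[s]) for s in counts}
    fold_change = values["AMP"] / values["DLM"]
    print(length_bp, {s: round(v, 2) for s, v in values.items()},
          f"AMP/DLM = {fold_change:.2f}")
```

Changing the length rescales both samples by the same factor, so the FPKM values change but the AMP/DLM ratio stays at 5 in both cases. For a single-transcript gene, a length difference alone moves the absolute values, not the fold change.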
Thanks for the fast reply, Michael. I understand that, but I'm not concerned about getting different values. I'm concerned that I'm getting different fold changes, since the gene/transcript length would be the same for both conditions (AMP vs DLM).

I'm still confused as to why the DESeq2 FPKM values don't match the read coverage. I guess I'm back to my original question: which is closer to the biological truth?

Reply by JindrichK

FPKM is read counts scaled by gene length and library size. StringTie and featureCounts don't agree on gene length, and the DESeq2 part is then just the library size. If you use robust=FALSE, you get the classic division by the total sum of counts. If you use robust=TRUE, we adapt to provide a better estimate of library size than the total sum. So there are several different components, and several ways to compute FPKM, involved here.

Reply by Michael Love
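The library-size part can be sketched as follows. Going by the DESeq2 documentation for fpm() as I understand it, robust=FALSE divides by the column sums, while robust=TRUE uses the median-of-ratios size factors multiplied by the geometric mean of the column sums. The code below is a stdlib Python re-implementation under that assumption, not DESeq2 itself. The toy counts and gene lengths are hypothetical, with one gene strongly up in sample 2 so that it inflates that sample's total:

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of genes x samples."""
    n = len(counts[0])
    # Log geometric mean per gene; genes with any zero count are excluded.
    log_geo = [sum(math.log(c) for c in row) / n if all(c > 0 for c in row)
               else -math.inf for row in counts]
    factors = []
    for j in range(n):
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(counts, log_geo) if math.isfinite(lg)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def library_sizes(counts, robust):
    n = len(counts[0])
    col_sums = [sum(row[j] for row in counts) for j in range(n)]
    if not robust:
        return col_sums  # classic: total counts per sample
    # Assumed robust variant: size factors times geometric mean of column sums.
    geo_mean_sum = math.exp(sum(math.log(s) for s in col_sums) / n)
    return [sf * geo_mean_sum for sf in size_factors(counts)]


def fpkm(counts, lengths_bp, robust=True):
    libs = library_sizes(counts, robust)
    return [[c / lib * 1e6 / (length / 1e3) for c, lib in zip(row, libs)]
            for row, length in zip(counts, lengths_bp)]


# Hypothetical counts: rows are genes, columns are samples 1 and 2.
counts = [
    [400, 100],     # gene of interest
    [1000, 1000],
    [2000, 2000],
    [600, 20000],   # strongly up in sample 2, inflating its total
]
lengths_bp = [1500, 2000, 3000, 1200]

for robust in (False, True):
    gene = fpkm(counts, lengths_bp, robust)[0]
    print(f"robust={robust}: fold change {gene[0] / gene[1]:.2f}")
```

With the classic total-count scaling, the gene of interest shows a 23.10-fold difference between the two samples; with the size-factor-based library sizes it shows 4.00-fold, which matches its raw count ratio here because both size factors come out as 1. Gene length cancels in both ratios.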