Validity of using DESeq2 with directly normalized EDASeq counts rather than offset
1
0
Entering edit mode
@vizio12341337-22616
Last seen 4.9 years ago

Hi,

I realize that when correcting for GC bias via EDAseq and doing DE in DESeq2, the recommended method is to pass EDAseq's offsets to your DESeq object as NormalizationFactors as per the DESeq2 manual. However, we have a previous analysis that was done a few years ago, where the EDAseq normalized counts were passed directly into tximports' count slot prior to creation of the DESeq object. While I know this is not the current recommended pipeline, is this completely wrong or just less ideal (I know this was recommended in original DESeq prior to NormalizationFactors)? I only ask because we have since validated a number of these targets that are significant in the old analysis but are not when doing it the recommended way.

Thanks, I appreciate your time!

deseq2 edaseq • 1.3k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 6 days ago
United States

Can you give more details? There’s not enough here for me to say if it’s valid. Why is tximport involved? What was passed to EDASeq? Please post the code so I can look it over.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thanks for the quick reply. I realize how the above could be a bit confusing without some code:

Tximport was used to import RSEM genes.results to txi.rsem and uCovar contains the gc percents.

> head(uCovar)
                gccontent
ENSG00000000003     40.40
ENSG00000000419     39.85
ENSG00000000457     40.14
ENSG00000000460     39.22
ENSG00000000938     52.97
ENSG00000000971     35.20

> roundedCounts <- as.matrix(round(txi.rsem$counts,0))
> eda <- newSeqExpressionSet(roundedCounts,featureData=uCovar, phenoData=data.frame(group = sampleTableFull$group, row.names = row.names(sampleTableFull)))
> dataWithin <- withinLaneNormalization(eda,"gccontent", which="full")
> #Replace RSEM object's counts with EDASeq normalized counts
> txi.rsem$counts <- dataWithin@assayData$normalizedCounts
> #Run initial DESeq2 analysis
> dds36 <- DESeqDataSetFromTximport(txi.rsem, sampleTableFull, ~group)
using counts and average transcript lengths from tximport
> dds36 <- DESeq(dds36)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

The total library size changed only very slightly with what was done:

colSums(originalCounts)/colSums(NormalizedCounts)
  CTR-A   CTR-B   CTR-C   CTR-D   CTR-E   CTR-F   MUT-A   MUT-B   MUT-C   MUT-D   MUT-E   MUT-F
0.9997960 0.9998470 0.9995003 0.9994305 0.9989726 0.9989360 0.9997949 0.9999131 0.9995712 0.9997287 0.9991146 0.9992895

These data just seem to be finicky, but this analysis being run this way lines up much better with our more recent scRNA-Seq datasets and nanostring validation. Thanks again!

ADD REPLY
0
Entering edit mode

And how did you run the quantification tool and tximport — this is also relevant.

ADD REPLY
0
Entering edit mode

For running RSEM, STAR transcriptome aligned BAMs were quantified with:

rsem-calculate-expression --bam --no-bam-output -p 8 --paired-end \
        --forward-prob 0.5  ${bamList[$counter]} /annotation/human-genome-rsem91/human \
        RSEM-out/$currentFile

The genes.results files were imported with tximport via:

> #Read in RSEM gene output
> temp = list.files(pattern="genes.results")
> txi.rsem <- tximport(temp, type = "rsem")
It looks like you are importing RSEM genes.results files, setting txIn=FALSE
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 
> txi.rsem$countsFromAbundance
[1] "no"

Thanks again and sorry for lacking important details. Let me know if anything else would be helpful.

ADD REPLY
0
Entering edit mode

Ok this looks fine. Tximport here is just calculating a transcript-usage based offset, and EDASeq is fixing GC in place.

So i dont have any issue with your pipeline that doesn’t use EDASeq offsets.

For other users with other cases, its a bit more complicated, e.g. depending on whether counts are generated from abundance, whether GC bias correction is applied by Salmon, what input goes into the R based bias modeler, etc.

ADD REPLY
0
Entering edit mode

Perfect. Thanks a ton for the info here and elsewhere!

ADD REPLY

Login before adding your answer.

Traffic: 563 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6