WGCNA following salmon/DESeq2
maya.kappil

Hello!

If I wanted to conduct a WGCNA analysis following a salmon/DESeq2 workflow, would it be appropriate to use the matrix generated after applying the vst function to the dds object? Something akin to the following script:

dds <- DESeqDataSetFromTximport(txi, coldata, design = ~ batch + Sex + BW)

keep <- rowSums(counts(dds) >= 1) >= 30  # perform some prefiltering
dds <- dds[keep, ]

dds <- DESeq(dds)

vsd <- vst(dds, blind = FALSE)  # transform while accounting for the design

Thanks!

deseq2 wgcna salmon
@mikelove

Yes, that would be the appropriate way to provide scaled, transformed data to a downstream method. I prefer blind=FALSE as you have here because it reduces the amount of shrinkage. It doesn't use the design when applying the transformation, only when estimating the (global) trend of within-group dispersion.
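For a downstream tool like WGCNA, the transformed values can be pulled out of the vsd object with assay() and transposed so that samples are in rows, which is the orientation WGCNA expects. A minimal sketch of this hand-off (my own illustration, not code from this thread; the power setting in blockwiseModules is a placeholder that would normally be chosen with pickSoftThreshold()):

library(WGCNA)

mat <- assay(vsd)        # variance-stabilized values, genes x samples
datExpr <- t(mat)        # WGCNA expects samples in rows, genes in columns

# Illustrative module detection call; power = 6 is only a placeholder
net <- blockwiseModules(datExpr, power = 6, TOMType = "signed")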


Thanks for the quick response!  

@peter-langfelder-4469

I'll second Michael's opinion, and that's also pretty much what I do, except I filter genes using a somewhat different condition. I require that a gene has relatively high expression (e.g., 0.5 to 1 count per million reads; this translates to counts in the low tens for a typical data set with 30-50M reads per sample) in at least 1/4 of the samples (or whatever fraction corresponds to the smallest experimental group in the design). The rationale is that the typical correlation analysis in WGCNA assumes (approximately) continuous data; using correlation on counts below, say, 5-10, which tend to be mostly zero, can really lead to spurious results.
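A minimal sketch of such a filter, applied before running DESeq()/vst() (my own illustration, not code from the thread; the 1-CPM cutoff and the group-size variable are placeholders to adapt to the actual design):

cts <- counts(dds)
cpm <- t(t(cts) / colSums(cts)) * 1e6      # counts per million, per sample
n_min_group <- ceiling(ncol(cts) / 4)      # 1/4 of samples, or the smallest group size
keep <- rowSums(cpm >= 1) >= n_min_group   # expressed at >= 1 CPM in enough samples
dds <- dds[keep, ]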


Thanks! Ah, ok - that makes sense regarding the filtering. In the code line for the filtering step, the 30 does refer to the sample size of my smallest comparison group. Counts in the low tens for at least this number of samples makes sense, and we do have roughly 50M reads/sample, so I can adjust this part of the code to require about 1 CPM in at least 30 samples.
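Continuing the illustrative sketch above (with cpm computed from counts(dds) before filtering), the adjusted line would look roughly like:

keep <- rowSums(cpm >= 1) >= 30   # ~1 CPM in at least 30 samples (smallest group)
dds <- dds[keep, ]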
