Question

RNseq data for WGCNA

0

Entering edit mode

joseph ▴ 50

@joseph-5658

Last seen 7.5 years ago

I face some problem when I apply WGCNA in the RNseq data. Here’s the thing,

I have the raw court data, which looks like as:

head(RawCount_all_Filter)

P6_A1 P6_A2 P6_A3 P6_A4 P6_A5 P6_A6 P6_A7 P6_A8 P6_A9 P6_A10 P6_A11 P6_A12 P6_B1 P6_B2 P6_B3 P6_B4 P6_B5 P6_B6 P6_B7 P6_B8

ENSG00000000419 0 375 527 337 0 607 595 149 0 0 0 0 0 0 567 0 322 1 0 42

ENSG00000000457 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 283 0

Then, I directly apply the count data to vsd = varianceStabilizingTransformation(RawCount_all_Filter), which looks fine.

I got the Expr = as.data.frame(t(vsd)) and Bench = t(Phenotype_all).

Then, checking the data, and cluster the sample as usual.

https://bri.box.com/s/solx4maup2g46jbcg1qe14s9vnszzsok

The height looks not like as usual in the microarray data, which I don’t if that’s correct.

But the scale free topology figure looks fine:

https://bri.box.com/s/9tn9frq2asajqgk48haf806av4j0vkk6

I select power = 10.

Then calculate the Co-expression similarity and adjacency matrix and TOM,

softPower = beta1

adjacency = adjacency(Expr, power = softPower,type = "signed")

TOM = TOMsimilarity(adjacency,TOMType = "signed")

dissTOM = 1-TOM

I get modules like this:

https://bri.box.com/s/35br575uafusinti45a0e9nuywghn8lk

> table(dynamicMods)

dynamicMods

   0    1    2

  14 8041  305

I pretty shock about the results, and I don’t know how to interrupt or modify the analysis.

Later around, the module-trait relationships look awful,

https://bri.box.com/s/ofawxu03hlibrcik3totja66qn6s40z1

Could you please point out what’s going on my data? And how to improve it?

Thank you so much!

WGCNA • 1.7k views

ADD COMMENT • link updated 7.6 years ago by Peter Langfelder ★ 3.0k • written 7.6 years ago by joseph ▴ 50

0

Entering edit mode

I don't find the problem. What do you want to improve? You would like to have more modules?

ADD REPLY • link 7.6 years ago Lluís Revilla Sancho ▴ 760

0

Entering edit mode

Thanks. I'm not sure if few modules that I get looks "normal". Yes, I would like to have more modules. Meanwhile, the module-trait correlations are pretty low, I wonder to figure it out whether my data processing problem or it's the real of the data.

ADD REPLY • link 7.6 years ago joseph ▴ 50

0

Entering edit mode

I think you can add a limit on your module size, or given the TOM distance you could build different networks by cutting at different heights. Maybe there isn't correlation between the group and the traits you are testing. You could also check if building other type of networks adjacency(..., type = "unsigned") or "signed hybrid" would change that, but that depends on what are you looking for in the network.

ADD REPLY • link 7.6 years ago Lluís Revilla Sancho ▴ 760

score 0 · Answer 1 · 2017-09-20

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 6 months ago

United States

It is impossible to diagnose issues with your data or analysis without seeing the data (the code looks reasonable but it obviously isn't complete). One usually sees at least a few distinct branches in WGCNA analysis of expression data, but again, without knowing what your data represent and seeing the actual data it is impossible to say anything of substance.

ADD COMMENT • link 7.6 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Here's the data link:

https://bri.box.com/s/412z4zvjerypzffk51ptv368u6l5pxlf

load("Filtered and annotated Count data.Rdata")

RawCount_all_new, gene_annotation_new, Phenotype_all

#------------------------------------------------------------------------------------------------#
# Removing genes that are lowly expressed
#------------------------------------------------------------------------------------------------#
# Keep samples with cpm > 5,and more than 10 samples
keep.exprs = rowSums(cpm(RawCount_all_new) > 5) >= 10

# Filter the Count
RawCount_all_Filter = RawCount_all_new[keep.exprs,]
dim(RawCount_all_Filter)

#------------------------------------------------------------------------------------------------#
# Normalising gene expression distributions
#------------------------------------------------------------------------------------------------#
# Varicance Stabilization
vsd = varianceStabilizingTransformation (RawCount_all_Filter)

# Important: transpose the expression data to rows for samples and columns for genes
Expr <- as.data.frame(t(vsd))

Bench <- t(Phenotype_all)

ADD REPLY • link 7.6 years ago joseph ▴ 50

0

Entering edit mode

The link to data is invalid (box reports the file has been removed or is unavailable).

Plot a sample tree from the "Expr" data; you may have a monstrous batch effect.

I hope the raw counts are really raw counts, not some transformed version. Your filter seems too stringent to me, I would filter on raw counts being greater than 5, not CPM (which are typically much less since each sample has many millions of reads).

ADD REPLY • link 7.6 years ago Peter Langfelder ★ 3.0k