WGCNA: differences between two groups with 10 replicates each
@nickydriedonks-10918

Dear all,

I'm new to WGCNA and interested in the differences in expression between 10 tolerant and 10 sensitive plants, using RNA-seq data. 

What I understood so far is that one can start from a full dataset (e.g. 20 samples) and look for module preservation in either the tolerant or the sensitive plants. So it seems to me that you look at the overall correlation of expression across all samples and then try to determine which modules are specific to either tolerant or sensitive plants. However, as you start from a combined dataset, I am wondering about the biological relevance of the correlation analysis of the full dataset. For example, what happens to genes that are highly expressed in the tolerant group but not (or only lowly) expressed in the sensitive group? Will they still end up in a module in the full dataset?

Intuitively, I'd like to assess the datasets separately (tolerant, 10 samples; sensitive, 10 samples) and see which modules change in the sensitive group compared to the tolerant group. However, this approach would use only 10 replicates per analysis, and I've read that the recommended minimum is 15 samples.

Can anyone provide some advice/explain what type of analysis will be useful? 

Much appreciated,

Nicky

 


I don't think WGCNA is the best type of analysis here, because you are looking at just one trait with two groups: tolerant and sensitive. You could correlate modules with a dummy variable, but that seems a bit of a clumsy style of analysis. You're right that you lack the power to compare modules between the plant groups. I'd start with a conventional limma analysis and then cluster: you could simply cluster the probes/genes and cut out modules using R.
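
To make the "dummy variable" option concrete: correlating a module eigengene with a 0/1 trait indicator (a point-biserial correlation) gives exactly the same test as a pooled-variance two-sample t-test. A minimal Python sketch on simulated data, assuming numpy and scipy are available; the eigengene values and the `trait_correlation` helper are hypothetical:

```python
import numpy as np
from scipy import stats


def trait_correlation(eigengene, is_tolerant):
    """Hypothetical helper: Pearson correlation of an eigengene with a 0/1 dummy.

    Returns r, the t statistic implied by r (n - 2 df), and the two-sided p-value.
    """
    x = np.asarray(is_tolerant, dtype=float)
    y = np.asarray(eigengene, dtype=float)
    r, p = stats.pearsonr(x, y)
    n = len(y)
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return r, t, p


rng = np.random.default_rng(0)
group = np.array([1] * 10 + [0] * 10)  # 10 tolerant (1), 10 sensitive (0)
eigengene = rng.normal(size=20) + 0.8 * group  # simulated (hypothetical) eigengene

r, t, p = trait_correlation(eigengene, group)
# Student's t-test with pooled variance (equal_var=True is scipy's default).
t_test = stats.ttest_ind(eigengene[group == 1], eigengene[group == 0])
print(f"r = {r:.3f}, t from r = {t:.3f}, p = {p:.4f}")
print(f"Student t-test: t = {t_test.statistic:.3f}, p = {t_test.pvalue:.4f}")
```

Because the two statistics coincide, the dummy-variable correlation amounts to a t-test on the eigengene, written in the same form used for continuous traits.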

@peter-langfelder-4469
United States

I would start with an analysis of all samples and find modules that relate to the trait (sensitive vs. tolerant). Not sure why the commenter above thinks this is a "clumsy" analysis. Also run a differential expression analysis (I use DESeq2 for this purpose, but other packages could be used as well). Please read the WGCNA FAQ (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html) for recommendations regarding WGCNA analysis of RNA-seq data.

Running WGCNA on 10 samples is indeed not advisable. If there's a public data set you could use to compare your modules against, I would try going that route; other than that I'm afraid I don't have any bright ideas on what other analyses to do with the data.
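
One way to see why 10 samples is thin: every network connection is built from a single correlation, and correlations estimated from few samples are very uncertain. A rough sketch using Fisher's z approximation for the 95% confidence interval of a Pearson correlation (the value r = 0.5 is arbitrary and only for illustration):

```python
import numpy as np


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z transform."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)  # standard error of z for n samples
    return float(np.tanh(z - z_crit * se)), float(np.tanh(z + z_crit * se))


for n in (10, 15, 20):
    lo, hi = correlation_ci(0.5, n)
    print(f"n = {n:2d}: r = 0.5, 95% CI ({lo:+.2f}, {hi:+.2f}), width {hi - lo:.2f}")
```

With these settings the interval for r = 0.5 still includes zero at n = 10 but excludes it at n = 20, so a network built from one 10-sample group alone would rest on very noisy correlations.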

Peter


Dear Mr. Langfelder,

Thank you for your quick response. Indeed, I'm also using DESeq2 to find differentially expressed genes. However, I'd like to get some idea of the function of these genes. Using GSEA or GO analysis, I do not find many interesting things. This is why I was thinking of WGCNA, as it is not biased: no pre-selection of genes is needed.

So if I understand correctly, I can identify modules from the full dataset and then relate them to the (binary) phenotype, i.e. look for the preservation of, or differences between, the full-dataset modules and the 10 samples within each group? Comparable to Weston et al. 2008, Figs. 2 and 3 (http://download.springer.com/static/pdf/236/art%253A10.1186%252F1752-0509-2-16.pdf?originUrl=http%3A%2F%2Fbmcsystbiol.biomedcentral.com%2Farticle%2F10.1186%2F1752-0509-2-16&token2=exp=1469101943~acl=%2Fstatic%2Fpdf%2F236%2Fart%25253A10.1186%25252F1752-0509-2-16.pdf*~hmac=8c6ae9a3bfdfda1cb4c92c72c7c8d702ab15596047b7ad72e1603b22296601a8), but less extensive?

What I still don't understand, though, is what happens to genes that are differentially expressed (i.e. have a high variance between the two treatments/groups) when they are included in the full dataset to find the modules. How will those genes be assigned to modules? Could you comment on that? I'm sorry if I'm asking basic questions, but this type of analysis is rather difficult for me.

Much appreciated. 

Nicky

 


Weston et al. ran WGCNA on the entire data set (in your case, sensitive and tolerant together). They identified modules and related the module eigengenes to the various traits. In your case, you could plot the eigengenes vs. the trait either as a boxplot (more informative for a statistician) or as a barplot of the individual samples (probably more intuitive to a biologist).
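
For the boxplot/barplot suggestion, the module eigengene is the first principal component of the module's standardized expression. A minimal numpy sketch, similar in spirit to WGCNA's `moduleEigengenes()` but not a reimplementation of it; the scaling, the sign convention (aligned with average expression) and the simulated module are all assumptions:

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a samples x genes module expression matrix.

    Genes are standardized (mean 0, variance 1) before the SVD; the sign is
    flipped if needed so the eigengene correlates positively with the module's
    average standardized expression. Returns a unit-norm vector (one value per sample).
    """
    x = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    eigengene = u[:, 0]
    if np.corrcoef(eigengene, x.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


rng = np.random.default_rng(1)
group = np.array(["tolerant"] * 10 + ["sensitive"] * 10)
shift = np.where(group == "tolerant", 2.0, 0.0)
# Simulated (hypothetical) module: 30 genes sharing a per-sample signal plus a group shift.
signal = rng.normal(size=20) + shift
expr = signal[:, None] + 0.5 * rng.normal(size=(20, 30))

me = module_eigengene(expr)
for g in ("tolerant", "sensitive"):
    print(g, "median eigengene:", round(float(np.median(me[group == g])), 3))
```

The per-sample values in `me` are what the barplot would show; grouping them by `group` gives the boxplot.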

Modules in WGCNA are groups of co-expressed genes. If genes relate strongly to a single trait, they will also be co-expressed, so many differentially expressed genes will probably end up in modules whose module eigengenes are associated with the trait. Sometimes there are groups of genes that relate to the trait but not necessarily to each other; these will end up as different modules whose eigengenes will be associated with the trait.
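
A small simulation of this mechanism, which also bears on the earlier question of what happens to DE genes in the pooled data. Two hypothetical genes are both higher in tolerant plants but have independent noise, so they are not co-expressed within either group; pooling all 20 samples still gives them a strong correlation, driven by the group difference (numpy assumed, data simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_group = 10
is_tolerant = np.repeat([1.0, 0.0], n_per_group)  # 10 tolerant, then 10 sensitive

# Two hypothetical genes shifted upward in tolerant plants, with independent noise:
# within each group they are NOT co-expressed.
gene_a = 4.0 * is_tolerant + rng.normal(size=2 * n_per_group)
gene_b = 4.0 * is_tolerant + rng.normal(size=2 * n_per_group)

pooled_r = np.corrcoef(gene_a, gene_b)[0, 1]
within_r = [
    float(np.corrcoef(gene_a[is_tolerant == g], gene_b[is_tolerant == g])[0, 1])
    for g in (1.0, 0.0)
]
print(f"pooled correlation (all 20 samples): {pooled_r:.2f}")
print("within-group correlations:", [round(r, 2) for r in within_r])
```

In a pooled analysis such genes therefore tend to land in the same module, and that module's eigengene tracks the tolerant/sensitive split.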

In terms of running WGCNA, Weston et al. used a by now very old version (10+ years) of the WGCNA code, predating the WGCNA R package. I suggest that you work through the WGCNA tutorials if you haven't done so yet, and read the WGCNA FAQ regarding the choice of arguments/parameters for the analysis.


Because you can simply cluster the DE genes and samples with hierarchical clustering and then look for co-expressed modules in the heatmap, which I think is a more parsimonious solution in this case. I think the correlation matrix looks and works better when you have multiple continuous traits. That is just my opinion.


Setting aside the technical details of "looking for co-expressed modules from the heatmap", people have found WGCNA useful precisely because it clusters all (or at least a representative subset of) genes and provides information about the transcriptional organization of the genome, relating the modules to the trait only after the modules have been identified. This means that the DE genes are put in the context of a network representing the entire genome.

Clustering of DE genes usually leads to 2 large clusters (one for each direction of DE), perhaps with some rather less well-defined subclusters. This is because significantly DE genes are rather strongly correlated with the trait, and hence are strongly correlated among each other. The trouble is that you lose what I call the context - the other genes that are coexpressed with your DE genes but may not pass the DE threshold.
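
A sketch of the two-cluster effect on simulated DE genes only (15 up and 15 down in tolerant plants, independent noise otherwise), using scipy's average-linkage clustering on 1 - correlation as a stand-in for the usual heatmap clustering. All data and parameter choices are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)
is_tolerant = np.repeat([1.0, 0.0], 10)  # 10 tolerant, then 10 sensitive samples

# Simulated (hypothetical) DE genes: +1 = up in tolerant, -1 = down in tolerant.
direction = np.array([1.0] * 15 + [-1.0] * 15)
expr = 2.0 * is_tolerant[:, None] * direction[None, :] + rng.normal(size=(20, 30))

# Correlation-based dissimilarity between genes (columns), then average linkage.
dissim = 1.0 - np.corrcoef(expr, rowvar=False)
np.fill_diagonal(dissim, 0.0)
tree = linkage(squareform(dissim, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")  # force two flat clusters

print("cluster labels of up genes:  ", labels[direction > 0])
print("cluster labels of down genes:", labels[direction < 0])
```

With a strong group effect the split follows the direction of DE; genes that are co-expressed with these but fall below a DE cutoff never enter the picture, which is the lost "context".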

All of the above is independent of whether you have one or multiple, categorical or continuous traits.


Interesting points.

Although, I would have thought that modules created with WGCNA would be pretty similar to those created by unsupervised hierarchical clustering and cutting the dendrogram?


WGCNA does module identification using an unsupervised clustering and branch cutting. The details are different between WGCNA and a standard cor/dist/hclust/cutree combination, but the results will be broadly similar. (Of course I like to think that all the fancy code in WGCNA and Dynamic Tree Cut brings some real benefits to module identification.) The bigger difference is whether you start with all genes, or with just the DE genes.
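
For comparison, a rough Python analogue of the "standard cor/dist/hclust/cutree combination" mentioned above: 1 - Pearson correlation, average linkage, and a fixed cut height (the equivalent of cutree with h). WGCNA's own pipeline, including Dynamic Tree Cut, differs in the details. The `cut_modules` helper, the simulated data and the height of 0.4 are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cut_modules(expr, height):
    """Hypothetical helper: cor -> 1 - cor -> average-linkage hclust -> cut at height.

    expr: samples x genes matrix. Returns one integer cluster label per gene.
    """
    dissim = 1.0 - np.corrcoef(expr, rowvar=False)
    np.fill_diagonal(dissim, 0.0)
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=height, criterion="distance")


rng = np.random.default_rng(4)
n_samples = 20
# Simulated (hypothetical) data: two co-expressed groups of 20 genes plus 20 unrelated genes.
signal_1 = rng.normal(size=(n_samples, 1))
signal_2 = rng.normal(size=(n_samples, 1))
expr = np.hstack([
    signal_1 + 0.3 * rng.normal(size=(n_samples, 20)),
    signal_2 + 0.3 * rng.normal(size=(n_samples, 20)),
    rng.normal(size=(n_samples, 20)),
])

labels = cut_modules(expr, height=0.4)
print("labels, group 1:", labels[:20])
print("labels, group 2:", labels[20:40])
```

None of these simulated groups is related to any trait; starting from all genes, as here, is what lets such modules show up at all, whereas a DE-filtered gene list would not contain them.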


OK thanks for the info, I've enjoyed playing around with WGCNA.


Dear Mr. Langfelder,

Thank you for your clear explanation. I think I'll give it a go and see what comes out of the analysis.

Indeed, I'm aware of all the updates and will use R for the analysis.

Many thanks for your help.

Best,

Nicky
