I am a bit lost while analyzing with WGCNA. I want to correlate mRNA data to lncRNA.
Which tutorial should I follow? The tutorial on the official page is divided into 3 parts. which one do I follow?
Can I put both lncRNA + mRNA normalized counts in 1 file and analyze using the 1st tutorial?
After I followed the 1st tutorial and I generated a file "geneinfo.csv". I see lot of values but how I do I know which genes are co-related?
I think that most use cases, including that of yours, are covered by the first tutorial, 'I. Network analysis of liver expression data from female mice: finding modules related to body weight'. You would have a dataset that contains both the protein coding and long non-coding RNAs (lncRNAs) together, and you would use this as input to WGCNA. Some of the derived modules should then contain a mixture of both types of RNAs, but not necessarily all [modules].
As lncRNAs are typically lowly expressed, I would encourage you to rigorously filter these out of your dataset prior to doing anything. By this, I mean filter them out at the raw data stage, prior to normalisation. Leaving any lowly expressed RNAs in your analysis could greatly bias the formation of the network that is ultimately produced.
I do not know what you mean by your second point. If you still require help with that part, please elaborate further. For example, please show some of the contents of the file, as we cannot see your screen from here.
Finally, keep in mind that WGCNA is not actually a Bioconductor package.
Thank you so much. I have been struggling to figure it out from the last few days.
So, I did prefiltering by removing counts of less than 10.
But now, as I was analyzing the data, I am seeing the interaction among different LncRNA's.
None of them are between lnc-mrna
What could be the problems?
I observed a similar pattern with CEMItool
May I ask how you have processed the data prior to using WGCNA? If possible, you could show actual code used? Describing the general experiment may also help.
datExpr = read.csv("/Users/rajpa/Desktop/trancscript_id_lnc/nodules_vs_control/normalized.csv")
head(datExpr)
# Manipulate file so it matches the format WGCNA needs
row.names(datExpr) = datExpr$X
datExpr$X = NULL
datExpr = as.data.frame(t(datExpr)) # now samples are rows and genes are columns
dim(datExpr)
gsg = goodSamplesGenes(datExpr, verbose = 3)
gsg$allOK
#Create an object called "datTraits" that contains your trait data
datTraits = read.csv("/Users/rajpa/Desktop/sample_anno.csv")
head(datTraits)
#form a data frame analogous to expression data that will hold the clinical traits.
rownames(datTraits) = datTraits$Sample
datTraits$Sample = NULL
table(rownames(datTraits)==rownames(datExpr)) #should return TRUE if datasets align correctly, otherwise your names are out of order
head(datTraits)
# You have finished uploading and formatting expression and trait data
# Expression data is in datExpr, corresponding traits are datTraits
save(datExpr, datTraits, file="SamplesAndTraits.RData")
#load("SamplesAndTraits.RData")
# Choose a soft threshold power-
powers = c(c(1:10), seq(from =10, to=30, by=1)) #choosing a set of soft-thresholding powers
sft = pickSoftThreshold(datExpr, powerVector=powers, verbose =5, networkType="signed") #call network topology analysis function
sizeGrWindow(9,5)
par(mfrow= c(1,2))
cex1=0.9
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab= "Soft Threshold (power)", ylab="Scale Free Topology Model Fit, signed R^2", type= "n", main= paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], labels=powers, cex=cex1, col="red")
abline(h=0.90, col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab= "Soft Threshold (power)", ylab="Mean Connectivity", type="n", main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1, col="red")
#from this plot, we would choose a power of 18 becuase it's the lowest power for which the scale free topology index reaches 0.90
#build a adjacency "correlation" matrix
enableWGCNAThreads()
softPower = 18
adjacency = adjacency(datExpr, power = softPower, type = "signed") #specify network type
head(adjacency)
# Construct Networks-
#translate the adjacency into topological overlap matrix and calculate the corresponding dissimilarity:
TOM = TOMsimilarity(adjacency, TOMType="signed") # specify network type
dissTOM = 1-TOM
# Generate Modules --------------------------------------------------------
# Generate a clustered gene tree
geneTree = flashClust(as.dist(dissTOM), method="average")
plot(geneTree, xlab="", sub="", main= "Gene Clustering on TOM-based dissimilarity", labels= FALSE, hang=0.04)
#This sets the minimum number of genes to cluster into a module
minModuleSize = 30
dynamicMods = cutreeDynamic(dendro= geneTree, distM= dissTOM, deepSplit=2, pamRespectsDendro= FALSE, minClusterSize = minModuleSize)
dynamicColors= labels2colors(dynamicMods)
MEList= moduleEigengenes(datExpr, colors= dynamicColors,softPower = softPower)
MEs= MEList$eigengenes
MEDiss= 1-cor(MEs)
METree= flashClust(as.dist(MEDiss), method= "average")
save(dynamicMods, MEList, MEs, MEDiss, METree, file= "Network_allSamples_signed_RLDfiltered.RData")
#plots tree showing how the eigengenes cluster together
#INCLUE THE NEXT LINE TO SAVE TO FILE
pdf(file="clusterwithoutmodulecolors.pdf")
plot(METree, main= "Clustering of module eigengenes", xlab= "", sub= "")
#set a threhold for merging modules. In this example we are not merging so MEDissThres=0.0
MEDissThres = 0.0
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight= MEDissThres, verbose =3)
mergedColors = merge$colors
mergedMEs = merge$newMEs
#INCLUE THE NEXT LINE TO SAVE TO FILE
#dev.off()
#plot dendrogram with module colors below it
#INCLUE THE NEXT LINE TO SAVE TO FILE
pdf(file="cluster.pdf")
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels= FALSE, hang=0.03, addGuide= TRUE, guideHang=0.05)
moduleColors = mergedColors
colorOrder = c("grey", standardColors(50))
moduleLabels = match(moduleColors, colorOrder)-1
MEs = mergedMEs
#INCLUE THE NEXT LINE TO SAVE TO FILE
#dev.off()
save(MEs, moduleLabels, moduleColors, geneTree, file= "Network_allSamples_signed_nomerge_RLDfiltered.RData")
Thanks - that's quite a lot to sift though. I am sure that you have already gone back and forth through it, modifying parameters, in order to try to bring in more modules that have both lncRNAs and protein coding mRNAs, correct? Parts on which to focus include the value of minModuleSize and the cutreeDynamic command. These, among many other parameters, can control the 'resolution' of the modules that are identified.
Other key points:
your dataset is small and unbalanced between your two classes
you should use the variance-stabilised or regularised log expression
levels from DESeq2, not the normalised counts
Thank you so much. I have been struggling to figure it out from the last few days.
So, I did prefiltering by removing counts of less than 10. But now, as I was analyzing the data, I am seeing the interaction among different LncRNA's. None of them are between lnc-mrna What could be the problems? I observed a similar pattern with CEMItool
Big thanks again!!!
May I ask how you have processed the data prior to using WGCNA? If possible, you could show actual code used? Describing the general experiment may also help.
So, this is the complete code from the beginning.
This was my "trait" data:
and this from deseq to WGCNA:
Then i merged the normalized data from RNA-seq
library(WGCNA) library(flashClust)
Thanks - that's quite a lot to sift though. I am sure that you have already gone back and forth through it, modifying parameters, in order to try to bring in more modules that have both lncRNAs and protein coding mRNAs, correct? Parts on which to focus include the value of
minModuleSize
and thecutreeDynamic
command. These, among many other parameters, can control the 'resolution' of the modules that are identified.Other key points:
Thanks, Kevin... As you said i cleaned both the data sets(lnc & RNA) and applied "vst" with CEMItool. It worked!!!
Now I see coexpression modules with lnc & RNA I will try that with WGCNA as well.
Thank you !
Great to hear!