Question

how to analyze WGCNA results?

0

Entering edit mode

rajpal22288 ▴ 10

@rajpal22288-22395

Last seen 3.8 years ago

Italy

I am a bit lost while analyzing with WGCNA. I want to correlate mRNA data to lncRNA.

Which tutorial should I follow? The tutorial on the official page is divided into 3 parts. which one do I follow? Can I put both lncRNA + mRNA normalized counts in 1 file and analyze using the 1st tutorial?
After I followed the 1st tutorial and I generated a file "geneinfo.csv". I see lot of values but how I do I know which genes are co-related?

Thank you in advance!

cancer probe deseq2 annotation normalization • 3.7k views

ADD COMMENT • link updated 4.9 years ago by Kevin Blighe ★ 4.0k • written 4.9 years ago by rajpal22288 ▴ 10

score 1 · Answer 1 · 2020-05-19

1

Entering edit mode

Kevin Blighe ★ 4.0k

@kevin

Last seen 5 weeks ago

Republic of Ireland

I think that most use cases, including that of yours, are covered by the first tutorial, 'I. Network analysis of liver expression data from female mice: finding modules related to body weight'. You would have a dataset that contains both the protein coding and long non-coding RNAs (lncRNAs) together, and you would use this as input to WGCNA. Some of the derived modules should then contain a mixture of both types of RNAs, but not necessarily all [modules].

As lncRNAs are typically lowly expressed, I would encourage you to rigorously filter these out of your dataset prior to doing anything. By this, I mean filter them out at the raw data stage, prior to normalisation. Leaving any lowly expressed RNAs in your analysis could greatly bias the formation of the network that is ultimately produced.

I do not know what you mean by your second point. If you still require help with that part, please elaborate further. For example, please show some of the contents of the file, as we cannot see your screen from here.

Finally, keep in mind that WGCNA is not actually a Bioconductor package.

Kevin

ADD COMMENT • link 4.9 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

Thank you so much. I have been struggling to figure it out from the last few days.

So, I did prefiltering by removing counts of less than 10. But now, as I was analyzing the data, I am seeing the interaction among different LncRNA's. None of them are between lnc-mrna What could be the problems? I observed a similar pattern with CEMItool

Big thanks again!!!

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

0

Entering edit mode

May I ask how you have processed the data prior to using WGCNA? If possible, you could show actual code used? Describing the general experiment may also help.

ADD REPLY • link 4.9 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

So, this is the complete code from the beginning.

This was my "trait" data:

 Samplename Class
X3  0
X4  0
X5  0
X6  0
X8  0
X9  0
X10 0
X11 0
X15 0
X16 0
X17 0
X128    1
X129    1
X130    1

and this from deseq to WGCNA:

setwd("/Users/rajpa/Desktop/trancscript_id_lnc/nodules_vs_control/")
logs <- dir(pattern=".counts")
 lab_names <- gsub(".*/(.*)/.*", "\\1", logs)
sample_list <- read.table("/Users/rajpa/Desktop/trancscript_id_lnc/reheader.tsv", header=T, sep="\t", as.is=T)
names(logs) <- sample_list$Client[match(lab_names, sample_list$LIMS)]
tmp_file <- read.table(file=logs[1], header=F, sep="\t")
htseq_data <- matrix(nrow=nrow(tmp_file), ncol=length(logs))
dim(htseq_data)

for (i in 1:length(logs)){
  tmp <- read.table(logs[i], sep="\t", header=F)
  htseq_data[,i] <- tmp[,2]
  cat("Processing file", i,"\n")
}

# Add row and column names
colnames(htseq_data) <- gsub(".counts","",names(logs))
rownames(htseq_data) <- tmp[,1]


# Run DeSeq2
sampleTable <- data.frame(condition = sample_list$group[match(names(logs), sample_list$Client)])
rownames(sampleTable) <- colnames(htseq_data)

# Measure impact of condition
dds <- DESeqDataSetFromMatrix(htseq_data, sampleTable, ~ condition)
dds <- DESeq(dds)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

normalized_counts <- counts(dds, normalized=TRUE)
write.csv(x=normalized_counts, file="normalized.csv", row.names=TRUE)

Then i merged the normalized data from RNA-seq

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

0

Entering edit mode

library(WGCNA) library(flashClust)

 datExpr = read.csv("/Users/rajpa/Desktop/trancscript_id_lnc/nodules_vs_control/normalized.csv")
head(datExpr)

# Manipulate file so it matches the format WGCNA needs
row.names(datExpr) = datExpr$X
datExpr$X = NULL
datExpr = as.data.frame(t(datExpr)) # now samples are rows and genes are columns
dim(datExpr) 


gsg = goodSamplesGenes(datExpr, verbose = 3)
gsg$allOK


#Create an object called "datTraits" that contains your trait data
datTraits = read.csv("/Users/rajpa/Desktop/sample_anno.csv")
head(datTraits)
#form a data frame analogous to expression data that will hold the clinical traits.
rownames(datTraits) = datTraits$Sample
datTraits$Sample = NULL
table(rownames(datTraits)==rownames(datExpr)) #should return TRUE if datasets align correctly, otherwise your names are out of order
head(datTraits)


# You have finished uploading and formatting expression and trait data
# Expression data is in datExpr, corresponding traits are datTraits


save(datExpr, datTraits, file="SamplesAndTraits.RData")
#load("SamplesAndTraits.RData")


# Choose a soft threshold power- 

powers = c(c(1:10), seq(from =10, to=30, by=1)) #choosing a set of soft-thresholding powers
sft = pickSoftThreshold(datExpr, powerVector=powers, verbose =5, networkType="signed") #call network topology analysis function

sizeGrWindow(9,5)
par(mfrow= c(1,2))
cex1=0.9
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab= "Soft Threshold (power)", ylab="Scale Free Topology Model Fit, signed R^2", type= "n", main= paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], labels=powers, cex=cex1, col="red")
abline(h=0.90, col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab= "Soft Threshold (power)", ylab="Mean Connectivity", type="n", main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1, col="red")

#from this plot, we would choose a power of 18 becuase it's the lowest power for which the scale free topology index reaches 0.90

#build a adjacency "correlation" matrix
enableWGCNAThreads()
softPower = 18
adjacency = adjacency(datExpr, power = softPower, type = "signed") #specify network type
head(adjacency)

# Construct Networks- 
#translate the adjacency into topological overlap matrix and calculate the corresponding dissimilarity:
TOM = TOMsimilarity(adjacency, TOMType="signed") # specify network type
dissTOM = 1-TOM

# Generate Modules --------------------------------------------------------


# Generate a clustered gene tree
geneTree = flashClust(as.dist(dissTOM), method="average")
plot(geneTree, xlab="", sub="", main= "Gene Clustering on TOM-based dissimilarity", labels= FALSE, hang=0.04)
#This sets the minimum number of genes to cluster into a module
minModuleSize = 30
dynamicMods = cutreeDynamic(dendro= geneTree, distM= dissTOM, deepSplit=2, pamRespectsDendro= FALSE, minClusterSize = minModuleSize)
dynamicColors= labels2colors(dynamicMods)
MEList= moduleEigengenes(datExpr, colors= dynamicColors,softPower = softPower)
MEs= MEList$eigengenes
MEDiss= 1-cor(MEs)
METree= flashClust(as.dist(MEDiss), method= "average")
save(dynamicMods, MEList, MEs, MEDiss, METree, file= "Network_allSamples_signed_RLDfiltered.RData")


#plots tree showing how the eigengenes cluster together
#INCLUE THE NEXT LINE TO SAVE TO FILE
pdf(file="clusterwithoutmodulecolors.pdf")
plot(METree, main= "Clustering of module eigengenes", xlab= "", sub= "")
#set a threhold for merging modules. In this example we are not merging so MEDissThres=0.0
MEDissThres = 0.0
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight= MEDissThres, verbose =3)
mergedColors = merge$colors
mergedMEs = merge$newMEs
#INCLUE THE NEXT LINE TO SAVE TO FILE
#dev.off()

#plot dendrogram with module colors below it
#INCLUE THE NEXT LINE TO SAVE TO FILE
pdf(file="cluster.pdf")
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels= FALSE, hang=0.03, addGuide= TRUE, guideHang=0.05)
moduleColors = mergedColors
colorOrder = c("grey", standardColors(50))
moduleLabels = match(moduleColors, colorOrder)-1
MEs = mergedMEs
#INCLUE THE NEXT LINE TO SAVE TO FILE
#dev.off()

save(MEs, moduleLabels, moduleColors, geneTree, file= "Network_allSamples_signed_nomerge_RLDfiltered.RData")

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

0

Entering edit mode

# Correlate traits --------------------------------------------------------


#Define number of genes and samples
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
#Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use= "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)


#Print correlation heatmap between modules and traits
textMatrix= paste(signif(moduleTraitCor, 2), "\n(",
                  signif(moduleTraitPvalue, 1), ")", sep= "")
dim(textMatrix)= dim(moduleTraitCor)
par(mar= c(6, 8.5, 3, 3))

#display the corelation values with a heatmap plot
#INCLUE THE NEXT LINE TO SAVE TO FILE
#pdf(file="heatmap.pdf")
labeledHeatmap(Matrix= moduleTraitCor,
               xLabels= names(datTraits),
               yLabels= names(MEs),
               ySymbols= names(MEs),
               colorLabels= FALSE,
               colors= blueWhiteRed(50),
               textMatrix= textMatrix,
               setStdMargins= FALSE,
               cex.text= 0.5,
               zlim= c(-1,1),
               main= paste("Module-trait relationships"))
#INCLUE THE NEXT LINE TO SAVE TO FILE
#dev.off()


# Define variable weight containing the weight column of datTrait
weight = as.data.frame(datTraits$Class);
names(weight) = "weight"
# names (colors) of the modules
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));


names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
geneTraitSignificance = as.data.frame(cor(datExpr, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="");

module = "brown"
column = match(module, modNames);
moduleGenes = moduleColors==module;
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, 1]),
                   xlab = paste("Module Membership in", module, "module"),
                   ylab = "Gene significance for body weight",
                   main = paste("Module membership vs. gene significance\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

names(datExpr)
names(datExpr)[moduleColors=="brown"]

annot = read.csv(file = "normalised - Copy.csv")
dim(annot)
names(annot)
probes = names(datExpr)
probes2annot = match(probes, annot$Y)
sumis.na(probes2annot))


geneInfo0 = data.frame(substanceBXH = probes,
                       geneSymbol = annot$X[probes2annot],
                       moduleColor = moduleColors,
                       geneTraitSignificance,
                       GSPvalue)

modOrder = order(-abs(cor(MEs, weight, use = "p")));

for (mod in 1:ncol(geneModuleMembership))
{
  oldNames = names(geneInfo0)
  geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]],
                         MMPvalue[, modOrder[mod]]);
  names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
                       paste("p.MM.", modNames[modOrder[mod]], sep=""))
}
# Order the genes in the geneInfo variable first by module color, then by geneTraitSignificance
geneOrder = order(geneInfo0$moduleColor, -abs(geneInfo0$GS.weight));
geneInfo = geneInfo0[geneOrder, ]

write.csv(geneInfo, file = "geneInfo.csv")

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

0

Entering edit mode

# Recalculate topological overlap
TOM = TOMsimilarityFromExpr(datExpr, power = 6);
# Read in the annotation file
annot = read.csv(file = "normalised - Copy.csv");
# Select module
module = "brown";
# Select module probes
probes = names(datExpr)
inModule = (moduleColors==module);
modProbes = probes[inModule];
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule];
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into an edge list file VisANT can read
vis = exportNetworkToVisANT(modTOM,
                            file = paste("VisANTInput-", module, ".txt", sep=""),
                            weighted = TRUE,
                            threshold = 0,
                            probeToGene = data.frame(annot$Y, annot$X) )

nTop = 30;
IMConn = softConnectivity(datExpr[, modProbes]);
top = (rank(-IMConn) <= nTop)
vis = exportNetworkToVisANT(modTOM[top, top],
                            file = paste("VisANTInput-", module, "-top30.txt", sep=""),
                            weighted = TRUE,
                            threshold = 0,
                            probeToGene = data.frame(annot$Y, annot$X) )


TOM = TOMsimilarityFromExpr(datExpr, power = 6);

annot = read.csv(file = "normalised - Copy.csv");

modules = c("brown", "red");

probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules));
modProbes = probes[inModule];
modGenes = annot$Y[match(modProbes, annot$X)];
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule];


dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into an edge list file VisANT can read
vis = exportNetworkToVisANT(modTOM,
                            file = paste("VisANTInput-", module, ".txt", sep=""),
                            weighted = TRUE,
                            threshold = 0,
                            probeToGene = data.frame(annot$Y, annot$X) )

nTop = 30;
IMConn = softConnectivity(datExpr[, modProbes]);
top = (rank(-IMConn) <= nTop)
vis = exportNetworkToVisANT(modTOM[top, top],
                            file = paste("VisANTInput-", module, "-top30.txt", sep=""),
                            weighted = TRUE,
                            threshold = 0,
                            probeToGene = data.frame(annot$Y, annot$X) )


TOM = TOMsimilarityFromExpr(datExpr, power = 6);

annot = read.csv(file = "normalised - Copy.csv");

modules = c("brown", "red");

probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules));
modProbes = probes[inModule];
modGenes = annot$Y[match(modProbes, annot$X)];
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule];


dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files Cytoscape can read
cyt = exportNetworkToCytoscape(modTOM,
                               edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),
                               nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),
                               weighted = TRUE,
                               threshold = 0.02,
                               nodeNames = modProbes,
                               altNodeNames = modGenes,
                               nodeAttr = moduleColors[inModule]);

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

1

Entering edit mode

Thanks - that's quite a lot to sift though. I am sure that you have already gone back and forth through it, modifying parameters, in order to try to bring in more modules that have both lncRNAs and protein coding mRNAs, correct? Parts on which to focus include the value of minModuleSize and the cutreeDynamic command. These, among many other parameters, can control the 'resolution' of the modules that are identified.

Other key points:

your dataset is small and unbalanced between your two classes
you should use the variance-stabilised or regularised log expression levels from DESeq2, not the normalised counts

ADD REPLY • link 4.9 years ago Kevin Blighe ★ 4.0k

1

Entering edit mode

Thanks, Kevin... As you said i cleaned both the data sets(lnc & RNA) and applied "vst" with CEMItool. It worked!!!

Now I see coexpression modules with lnc & RNA I will try that with WGCNA as well.

Thank you !

ADD REPLY • link 4.9 years ago rajpal22288 ▴ 10

1

Entering edit mode

Great to hear!

ADD REPLY • link 4.9 years ago Kevin Blighe ★ 4.0k