This is my first time posting. Hi everyone!
I'm working on RNAseq data from 59 samples and starting to do the WGCNA analysis. My counts were normalized to TPM, and I kept the top 10% of genes (~3600) by variance for WGCNA. I originally had 60 samples but removed a clear outlier based on my dendrogram and inspected PCA to make sure there were no batch effects. When I went to pick my beta using the pickSoftThreshold() function and subequent plot, the y-values after a power of 7 went above 1. When I expanded the graph window, I saw that the signed R^2 asymptoted near 1.5.
In the tutorial by Horvath's group, it suggests picking a power between 0.8 to 0.9 but also suggests picking this value based on when the "the lowest power for which the scale free topology fit curve flattens out upon reaching a high value." If I pick the 0.8 to 0.9, my beta is 6 but if I picked it out based on the way the curve flattens, it's closer to 9 or 10.
My questions are as follows: 1) Why does my curve go above 1? 2) Is it ok if my curve goes above 1? 3) Should I pick my value based on the 0.8 to 0.9 or where the curve flattens? 4) Is all this because I didn't transform my data from TPM to a TPM that's further normalized (like Log2)?
My table from the pickSoftThreshold() function is below:
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.858 1.730 0.960 1370.0 1430.00 1990.0 2 2 0.240 0.312 0.685 729.0 749.00 1360.0 3 3 0.150 -0.236 0.565 445.0 439.00 996.0 4 4 0.434 -0.554 0.679 293.0 274.00 764.0 5 5 0.558 -0.739 0.763 204.0 179.00 603.0 6 6 0.628 -0.893 0.816 147.0 121.00 489.0 7 7 0.652 -1.020 0.828 110.0 83.50 403.0 8 8 0.678 -1.110 0.854 83.5 59.20 336.0 9 9 0.699 -1.180 0.873 64.9 42.90 284.0 10 10 0.719 -1.230 0.896 51.2 31.70 242.0 11 12 0.750 -1.300 0.922 33.2 18.70 179.0 12 14 0.767 -1.360 0.937 22.5 11.40 136.0 13 16 0.760 -1.420 0.918 15.8 7.10 105.0 14 18 0.779 -1.440 0.936 11.4 4.45 83.0 15 20 0.816 -1.430 0.961 8.4 2.92 66.2
The code used to make the graphs is here (note that this version has y-axis only up until 1 but I made another graph extending it to 2 to see the whole curve):
powers = c(c(1:10), seq(from = 12, to = 20, by = 2)) sft = pickSoftThreshold(df.BPSD.clean, powerVector = powers, verbose = 5) sizeGrWindow(9,5) par(mfrow = c(1,2)) cex1 = 0.9
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])sft$fitIndices[,2], xlab="Soft Threshold (power)", ylab="Scale-Free Topology Fit, signed R^2", type="n", main = paste("Scale Independence"), ylim=range(0:1.1) text(sft$fitIndices[,1], -sign(sft$fitIndices[,2])sft$fitIndices[,3], labels=powers, cex=cex1, col="red") abline(h=0.9, col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)", ylab="Mean Connectivity", type="n", main=paste("Mean Connectivity")) text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1, col="red")
Thanks in advance!
Thanks for the response!
I just realized the table came out all weird when I posted. The signed R^2 value used for the y-axis is a product of the sft.R.sq and slope. To be honest, I'm not sure what the slope is, but when the two are multiplied together, that gives the signed R^2 value. That's what the tutorial suggested using as the y-axis, so that's what I did. It's the product of these two values that pushes things above 1. Hopefully this helps explain how I got the y-axis.
I've reprinted the table more readably below:
The signed R squared equals the sign of the slope times the R squared. The slope is the coefficient of the linear model fit of log (frequency) on log (connectivity). Your code in your original post shows the sign as well (the comment submission interpreted the multiplication star as start and end of italicing; you need to press the code button above to have the system display your code as code)
Ah, ok. So my code was this then:
Unfortunately, I still can't figure out why the resulting graph has values over 1. I'm not sure how to paste the graph, but I feel like the values on it don't correspond to the sign(slope) * R^2. They also don't correspond with slope * R^2 either.
If I were to just use the table, would I pick the R^2 or truncated R^2 to determine an appropriate beta? If I use the R^2, does that really mean I should be picking a beta of 20 or do you think there's something I've done wrong upstream?
Thanks for all your help; as I mentioned, I'm very new to this.
Figured out the problem. In the line of code that goes:
I accidentally switched the columns 2 and 3, which were correct in the line above's plot. Now I just need to figure out which power to use, as only the 20th power goes above 0.8.
You may want to read WGCNA FAQ point 6 (I can't get a good scale-free topology index no matter how high I set the soft-thresholding power).
Thanks for the recommendation! I checked out the FAQ there, and since I had 60 samples and were trying to implement a signed network analysis, I went with a power of 12. The resulting data were fuzzier than I would have liked when compared to my variables of interest, so I'm now trying to use SVA to make sure there are no hidden batch effects that could be artificially increasing the heterogeneity of the data.
I think I got the hang of how to post pictures, and hopefully my signed and unsigned pickSoftThreshold() graphs show up. Any guidance based on their results would always be super helpful.