Question

WGCNA Soft Thresholding: Why does my y-axis for signed R^2 (R-squared) go above 1 and asymptote at 1.5?

0

Entering edit mode

DWF • 0

@dwf-23181

Last seen 4.9 years ago

Seattle, WA

This is my first time posting. Hi everyone!

I'm working on RNAseq data from 59 samples and starting to do the WGCNA analysis. My counts were normalized to TPM, and I kept the top 10% of genes (~3600) by variance for WGCNA. I originally had 60 samples but removed a clear outlier based on my dendrogram and inspected PCA to make sure there were no batch effects. When I went to pick my beta using the pickSoftThreshold() function and subequent plot, the y-values after a power of 7 went above 1. When I expanded the graph window, I saw that the signed R^2 asymptoted near 1.5.

In the tutorial by Horvath's group, it suggests picking a power between 0.8 to 0.9 but also suggests picking this value based on when the "the lowest power for which the scale free topology fit curve flattens out upon reaching a high value." If I pick the 0.8 to 0.9, my beta is 6 but if I picked it out based on the way the curve flattens, it's closer to 9 or 10.

My questions are as follows: 1) Why does my curve go above 1? 2) Is it ok if my curve goes above 1? 3) Should I pick my value based on the 0.8 to 0.9 or where the curve flattens? 4) Is all this because I didn't transform my data from TPM to a TPM that's further normalized (like Log2)?

My table from the pickSoftThreshold() function is below:

Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.858 1.730 0.960 1370.0 1430.00 1990.0 2 2 0.240 0.312 0.685 729.0 749.00 1360.0 3 3 0.150 -0.236 0.565 445.0 439.00 996.0 4 4 0.434 -0.554 0.679 293.0 274.00 764.0 5 5 0.558 -0.739 0.763 204.0 179.00 603.0 6 6 0.628 -0.893 0.816 147.0 121.00 489.0 7 7 0.652 -1.020 0.828 110.0 83.50 403.0 8 8 0.678 -1.110 0.854 83.5 59.20 336.0 9 9 0.699 -1.180 0.873 64.9 42.90 284.0 10 10 0.719 -1.230 0.896 51.2 31.70 242.0 11 12 0.750 -1.300 0.922 33.2 18.70 179.0 12 14 0.767 -1.360 0.937 22.5 11.40 136.0 13 16 0.760 -1.420 0.918 15.8 7.10 105.0 14 18 0.779 -1.440 0.936 11.4 4.45 83.0 15 20 0.816 -1.430 0.961 8.4 2.92 66.2

The code used to make the graphs is here (note that this version has y-axis only up until 1 but I made another graph extending it to 2 to see the whole curve):

powers = c(c(1:10), seq(from = 12, to = 20, by = 2)) sft = pickSoftThreshold(df.BPSD.clean, powerVector = powers, verbose = 5) sizeGrWindow(9,5) par(mfrow = c(1,2)) cex1 = 0.9

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])sft$fitIndices[,2], xlab="Soft Threshold (power)", ylab="Scale-Free Topology Fit, signed R^2", type="n", main = paste("Scale Independence"), ylim=range(0:1.1) text(sft$fitIndices[,1], -sign(sft$fitIndices[,2])sft$fitIndices[,3], labels=powers, cex=cex1, col="red") abline(h=0.9, col="red")

plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)", ylab="Mean Connectivity", type="n", main=paste("Mean Connectivity")) text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1, col="red")

Thanks in advance!

WGCNA Soft Thresholding • 4.3k views

ADD COMMENT • link updated 4.9 years ago by Peter Langfelder ★ 3.0k • written 4.9 years ago by DWF • 0

score 0 · Answer 1 · 2020-03-26

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 3 months ago

United States

Not sure I understand your question. Although the table you posted is not really readable, I am able to figure out which numbers are R^2 and truncated R^2 and none of them seem to be above 1.

In terms of gene filtering, I would suggest reading WGCNA FAQ and the relevant sections about filtering genes and transforming the data.

ADD COMMENT • link 4.9 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Thanks for the response!

I just realized the table came out all weird when I posted. The signed R^2 value used for the y-axis is a product of the sft.R.sq and slope. To be honest, I'm not sure what the slope is, but when the two are multiplied together, that gives the signed R^2 value. That's what the tutorial suggested using as the y-axis, so that's what I did. It's the product of these two values that pushes things above 1. Hopefully this helps explain how I got the y-axis.

I've reprinted the table more readably below:

Power   SFT.R.sq   slope  truncated.R.sq    mean.k.  median.k.   .max.k.
1   0.857686972 1.725987096 0.959957022 1368.576391 1427.456397 1988.453593
2   0.240065025 0.312117489 0.685150541 729.0913356 748.8438291 1355.26853
3   0.149837198 -0.236019548    0.565207767 444.6535609 438.604915  995.6315552
4   0.434160096 -0.554470695    0.67915113  293.3622061 274.1298842 763.753279
5   0.557863219 -0.738689861    0.762578079 203.9963212 179.4915785 603.1017391
6   0.627882189 -0.893103073    0.815774558 147.3815432 121.0758076 488.6139395
7   0.65165072  -1.020332062    0.828311747 109.653702  83.47702964 402.5893476
8   0.677973116 -1.111163696    0.854099224 83.52282408 59.165775   336.1099291
9   0.699080802 -1.182519844    0.872625996 64.86176557 42.91936901 283.6653891
10  0.719202395 -1.23178329 0.896166641 51.19790301 31.73314414 241.5969029
12  0.750127154 -1.299112213    0.921745337 33.20638069 18.73637534 179.2247616
14  0.766881867 -1.355287208    0.937090424 22.48884148 11.44349214 136.1922377
16  0.760108047 -1.416977826    0.917954061 15.76561445 7.099742959 105.4897124
18  0.779028097 -1.437508623    0.935625431 11.37118465 4.447718671 82.99701214
20  0.816120412 -1.426771757    0.960963157 8.400533997 2.922222408 66.16028593

ADD REPLY • link 4.9 years ago DWF • 0

0

Entering edit mode

The signed R squared equals the sign of the slope times the R squared. The slope is the coefficient of the linear model fit of log (frequency) on log (connectivity). Your code in your original post shows the sign as well (the comment submission interpreted the multiplication star as start and end of italicing; you need to press the code button above to have the system display your code as code)

ADD REPLY • link 4.9 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Ah, ok. So my code was this then:

powers = c(c(1:10), seq(from = 12, to = 20, by = 2))
sft = pickSoftThreshold(df.BPSD.clean, powerVector = powers, verbose = 5)
sizeGrWindow(9,5)
par(mfrow = c(1,2))
cex1 = 0.9

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], 
                                    xlab="Soft Threshold (power)",
                                    ylab="Scale-Free Topology Fit, signed R^2",
                                    type="n", main = paste("Scale Independence"), ylim=range(0:2))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,2])*sft$fitIndices[,3],
     labels=powers, cex=cex1, col="red")
abline(h=0.9, col="red")

plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",
     ylab="Mean Connectivity", type="n", main=paste("Mean Connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], 
     labels=powers, cex=cex1, col="red")

Unfortunately, I still can't figure out why the resulting graph has values over 1. I'm not sure how to paste the graph, but I feel like the values on it don't correspond to the sign(slope) * R^2. They also don't correspond with slope * R^2 either.

If I were to just use the table, would I pick the R^2 or truncated R^2 to determine an appropriate beta? If I use the R^2, does that really mean I should be picking a beta of 20 or do you think there's something I've done wrong upstream?

Thanks for all your help; as I mentioned, I'm very new to this.

ADD REPLY • link 4.9 years ago DWF • 0

0

Entering edit mode

Figured out the problem. In the line of code that goes:

text(sft$fitIndices[,1], -sign(sft$fitIndices[,2])*sft$fitIndices[,3],
 labels=powers, cex=cex1, col="red")

I accidentally switched the columns 2 and 3, which were correct in the line above's plot. Now I just need to figure out which power to use, as only the 20th power goes above 0.8.

ADD REPLY • link 4.9 years ago DWF • 0

0

Entering edit mode

You may want to read WGCNA FAQ point 6 (I can't get a good scale-free topology index no matter how high I set the soft-thresholding power).

ADD REPLY • link 4.9 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Thanks for the recommendation! I checked out the FAQ there, and since I had 60 samples and were trying to implement a signed network analysis, I went with a power of 12. The resulting data were fuzzier than I would have liked when compared to my variables of interest, so I'm now trying to use SVA to make sure there are no hidden batch effects that could be artificially increasing the heterogeneity of the data.

I think I got the hang of how to post pictures, and hopefully my signed and unsigned pickSoftThreshold() graphs show up. Any guidance based on their results would always be super helpful.

ADD REPLY • link 4.9 years ago DWF • 0

0

Entering edit mode

Unsigned Analysis

Signed Analysis

ADD REPLY • link 4.9 years ago DWF • 0