Computationally Singular Matrices - DESeq
1
0
Entering edit mode
flippy23 • 0
@flippy23-18925
Last seen 5.5 years ago

Hello,

I have a question about computationally singular matrices in DESeq2 when using surrogate variables. After including all the SVs in my design formula for an analysis of paired samples (before/after treatment within each subject), the DESeq() function returns an error that my matrix is computationally singular. When I reduce the number of surrogate variables to 3, based on the num.sv() function with the "be" method, I no longer receive this error and can run the analysis. Why does reducing the number of SVs in my design formula allow the analysis to run?

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = phenotype, design = ~ study_id + SV1 + SV2 + SV3 + SV4 + SV5 + SV6 + SV7 + SV8 + SV9 + SV10 + SV11 + treatment)
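
For context, the SVA steps leading up to this call look roughly like the following sketch (the exact model matrices here are an assumption, not taken verbatim from my script):

library(DESeq2)
library(sva)

# build an initial dataset without SVs and get normalized counts
dds0 <- DESeqDataSetFromMatrix(countData = countdata, colData = phenotype,
                               design = ~ study_id + treatment)
dds0 <- estimateSizeFactors(dds0)
norm <- counts(dds0, normalized = TRUE)

# full model with the variable of interest, null model without it
mod  <- model.matrix(~ study_id + treatment, data = phenotype)
mod0 <- model.matrix(~ study_id, data = phenotype)

n.sv  <- num.sv(norm, mod, method = "be")   # estimate the number of SVs
svobj <- svaseq(norm, mod, mod0, n.sv = n.sv)

# add each SV to the sample table so it can be used in the design above
for (i in seq_len(n.sv)) phenotype[[paste0("SV", i)]] <- svobj$sv[, i]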

deseq2 rnaseq sva • 2.9k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 5 days ago
United States

What is the correlation of the model matrix?

cor(model.matrix(design(dds), colData(dds)))

ADD COMMENT
0
Entering edit mode

I'm not getting a single value; it's a matrix of the pairwise study IDs and their respective correlations. Because I'm looking at before/after treatment within the same subject, some of these pairwise correlations are 1. Not all of them, just the ones within the same study ID.

ADD REPLY
0
Entering edit mode

I was expecting you would get a matrix of pairwise correlations between all the covariates in the design. There shouldn't be any covariates with correlation 1. Can you give an example?

ADD REPLY
0
Entering edit mode
            (Intercept) study_id121 study_id123 study_id124 study_id125 study_id126 study_id127 study_id128 study_id130 study_id131
(Intercept)           1          NA          NA          NA          NA          NA          NA          NA          NA          NA
study_id121          NA  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303
study_id123          NA -0.03030303  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303
study_id124          NA -0.03030303 -0.03030303  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303
study_id125          NA -0.03030303 -0.03030303 -0.03030303  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303

Warning message:
In cor(model.matrix(design(dds), colData(dds))) :
  the standard deviation is zero

This is a portion of the matrix, after the surrogate variables have been reduced to 3 (using num.sv(method = "be") from the sva package). I'm confused as to why this is happening. I suspect I may have set up my design formula incorrectly?

ADD REPLY
0
Entering edit mode

Oh, the diagonal is always 1: that's the correlation of a variable with itself. The NAs in the intercept row and column are what triggered the warning, since the intercept column is constant and its standard deviation is zero. The concern would be high off-diagonal correlations.
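
For example, to list the high off-diagonal pairs (an illustrative check, with 0.9 as an arbitrary threshold):

cm <- cor(model.matrix(design(dds), colData(dds)))
diag(cm) <- 0                          # ignore the self-correlations
which(abs(cm) > 0.9, arr.ind = TRUE)   # pairs that are nearly collinear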

ADD REPLY
0
Entering edit mode

So there is no concern with the output of this correlation? Were the 11 SVs just extremely correlated with the variation across the samples? Was the model overcorrecting?

ADD REPLY
0
Entering edit mode

I don't know, but I would guess that some of those were too highly correlated with some other covariates.
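
One quick way to check is to compare the numerical rank of the model matrix to its number of columns, e.g.:

mm <- model.matrix(design(dds), colData(dds))
qr(mm)$rank   # if this is less than ncol(mm), the matrix is rank deficient
ncol(mm)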

ADD REPLY
0
Entering edit mode

Thank you. It does seem that the first couple of SVs were highly correlated with cell type proportions.
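
(For reference, one way to check this, assuming a samples-by-cell-type proportion matrix cell_props, which is hypothetical and not part of this thread:)

# cell_props: hypothetical samples x cell-type proportion matrix
round(cor(svobj$sv, cell_props), 2)   # svobj as returned by svaseq()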

ADD REPLY
0
Entering edit mode

Hi Michael,

I hope you don't mind if I ask another question on this topic. I have a similar error, where the vst() function tells me my DESeq2 object is computationally singular. The output from cor(model.matrix(design(dds), colData(dds))) shows that the correlation between some of my variables is 0.6 or -0.6, so I thought that might be the problem. However, I later read this post: https://stats.stackexchange.com/questions/76488/error-system-is-computationally-singular-when-running-a-glm, where one of the answers suggests that the error can arise when the variables in the design formula are on very different scales.

The output of my metadata object looks like this:

            Patient.group  libsize_lncRNAs  libsize_ispMRNAs            sv1
sample 1  BrainMetastases          1024699            462969   2.870817e-02
sample 2  BrainMetastases          1010253            393701   2.227226e-02
sample 3  BrainMetastases          1081656            405867   4.160032e-02
sample 4  BrainMetastases           925001            637201   1.654134e-03
sample 5  BrainMetastases          1147630           1149775   1.823269e-02
sample 6  BrainMetastases          2369589           1326551  -7.526675e-02
sample 7  BrainMetastases           865998            196750   1.101019e-01
sample 8  BrainMetastases          3281444            604129  -1.556478e-02

As you can see, there is a pretty large difference in scale between the libsize values and the sv1 values. The correlation between the two libsize factors is 0.6, and the correlation between the libsize factors and sv1 is also 0.6 or -0.6, which seems pretty high. I tried the suggestion from Stack Exchange and divided the libsize values by 10^6 so that they would be closer in scale to the sv1 values. Of course this does not change the correlations, but it does solve my error! What's strange to me is that dividing the libsize values by different factors (10^6 or 10^4) gives different outcomes in my heatmaps.

So my question is: how should I proceed? I can see that the libsize and sv1 variables have a pretty large confounding effect on my dataset, so I want to add them to my design formula and then remove their effect with limma's removeBatchEffect function. However, I don't know how to transform the values of my libsize and sv1 variables correctly. Could you help me with this problem?

ADD REPLY
1
Entering edit mode

Center and scale numeric values, or at the least put them roughly on a scale between -10 and 10.

If you are just using vst, you can also use a design of ~1 which won't use the design at all.

Note that current versions of DESeq2 also have a note about scaling numeric design variables, at object construction.
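
For example, a minimal sketch (meta stands in for the metadata object shown above):

# center and scale the numeric covariates before building the dds
# (scale() returns a one-column matrix; as.numeric() keeps a plain vector)
meta$libsize_lncRNAs  <- as.numeric(scale(meta$libsize_lncRNAs))
meta$libsize_ispMRNAs <- as.numeric(scale(meta$libsize_ispMRNAs))
meta$sv1              <- as.numeric(scale(meta$sv1))

# if only the VST is needed, the design can be ignored entirely
design(dds) <- ~ 1
vsd <- vst(dds)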

ADD REPLY
0
Entering edit mode

I centered and scaled the numeric values, but now I'm getting an error that the metadata is not full rank. I believe this is because some samples have very similar values in columns 2 and 3, although I don't think they are ever exactly the same. Is there a way to solve this error? I used the following line for centering and scaling, and an example of the metadata is below:

z <- scale(newmetadata, center = TRUE, scale = TRUE)


    [,1] [,2]                   [,3]                 
  [1,] "PR" "1.07445329834218"     "0.150079637458238"  
  [2,] "PD" "0.229775299854301"    "0.522757029906995"  
  [3,] "PD" "-0.741792318990576"   "-1.44789287875001"  
  [4,] "PR" "0.328620920254315"    "0.127247665613223"  
  [5,] "PR" "1.02102712752095"     "1.02311831179494"   
  [6,] "PD" "1.33391110778732"     "2.48419106996971"   
  [7,] "PR" "-0.278709892442993"   "0.455864898195633"  
  [8,] "PD" "2.41830387908215"     "2.22075940607133"   
  [9,] "PR" "0.144411730055561"    "-0.805111739099011" 
 [10,] "PR" "1.44027658496325"     "1.29039348966329"   
 [11,] "PR" "-0.668761132298932"   "-1.48888679237402"  
 [12,] "PR" "1.86788009063503"     "0.322487469120705"  
 [13,] "PD" "0.412830272423673"    "0.716211044750991"  
 [14,] "PD" "-0.555832843451078"   "0.483525351378967"  
 [15,] "PR" "0.223285290345598"    "0.741723840824643"  
 [16,] "PR" "0.180990266938334"    "0.995198769369062"  
 [17,] "PR" "0.933188284172737"    "-0.266032674701307" 
 [18,] "PR" "-0.629883260427799"   "-0.908546160176125" 
 [19,] "PR" "0.904162971724024"    "-0.0596520337644327"
 [20,] "PD" "0.0802199393481943"   "0.0666143591576966" 
 [21,] "PD" "0.236421530671579"    "1.68909543297056"   
 [22,] "PD" "0.395733897760879"    "2.96124930866256"   
 [23,] "PR" "0.740634146014908"    "0.750984675320534"  
ADD REPLY
1
Entering edit mode

Check the correlation of the variables in your design.

E.g.:

# check the variables in the design
design(dds)
# pull them out of colData to make sure you get the right version
vars <- all.vars(design(dds))               # names of the design variables
data <- as.data.frame(colData(dds)[, vars])
cor(data[sapply(data, is.numeric)])         # cor() needs numeric columns
ADD REPLY
0
Entering edit mode

In order to use design(dds) or the cor() check on its variables, the dds object has to be created first:

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = centered_and_scaled_metadata, design = ~ variable_1 + variable_2 + variable_3)

However, creating the dds object gives me the following error, so I cannot check the correlations between the variables this way.

converting counts to integer mode
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

I've tried to use the cor() function on my centered_and_scaled_metadata directly, but then I receive an error saying the argument to cor() must be numeric. To make the metadata numeric, I first have to delete the first column, which contains characters (as can be seen in the previous post). After that, cor(centered_and_scaled_metadata) gives me the following:

                 libsize_lncRNAs libsize_ispMRNAs
libsize_lncRNAs      1.000000000      0.008076225
libsize_ispMRNAs     0.008076225      1.000000000

So there is no collinearity between these variables, and I do not understand why I'm receiving the not-full-rank error. Could you help?

ADD REPLY
0
Entering edit mode

I'd recommend fixing the class of these columns ahead of time. Set the columns that are supposed to be numeric to numeric, likewise for the factor columns.

E.g., if you have a data.frame x, at the top of the script, set the proper classes:

x$variable <- as.numeric(x$variable)
x$other <- factor(x$other)
etc.

Do this before you start running DESeq2 or other diagnostics.
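
Applied to the metadata in this thread, that could look like the following sketch (column names taken from the earlier posts; newmetadata is the object from the scale() call above):

meta <- as.data.frame(newmetadata)
meta$Patient.group    <- factor(meta$Patient.group)
meta$libsize_lncRNAs  <- as.numeric(meta$libsize_lncRNAs)
meta$libsize_ispMRNAs <- as.numeric(meta$libsize_ispMRNAs)

# scale only the numeric columns; scaling with the character column included
# is what produced the all-character matrix shown earlier
num <- sapply(meta, is.numeric)
meta[num] <- lapply(meta[num], function(v) as.numeric(scale(v)))
sapply(meta, class)   # confirm classes before building the dds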

ADD REPLY
0
Entering edit mode

Hi Michael, I'd love to hear your input on this issue.

ADD REPLY
0
Entering edit mode

I would love some feedback on the error, if you have the time.

ADD REPLY
