Dear all,
I have a large metabolomics data set. About 6000 samples were run over a few months in 5 batches (or 61 batches, depending on how a batch is defined).
At the moment for each sample I have the intensity for 21 peaks (metabolites).
> head(df$dat)
peak1 peak2 peak3 peak4 peak5 peak6 peak7
PA14_EM_14-4_E-3_P1-E-3_01_11213.mzXML 5.440897 6.505249 5.251598 7.206793 7.467628 10.759801 9.294075
PA14_EM_1-2_D-4_P1-D-4_01_2002.mzXML 5.108385 6.556920 5.543050 7.652522 6.748898 9.606819 9.019394
PA14_EM_10-4_H-5_P1-H-5_01_8890.mzXML 5.591401 6.766761 5.381610 7.471881 8.100680 10.481429 9.689601
PA14_EM_2-3_A-12_P1-A-12_01_2500.mzXML 4.618323 6.485310 4.498478 7.309714 8.813708 9.658948 9.379349
PA14_EM_9-2_C-5_P1-C-5_01_6835.mzXML 5.836406 7.378964 6.446740 8.505912 7.779362 10.045803 9.704689
PA14_EM_2_B-6_P1-B-6_01_11723.mzXML 5.231878 6.639438 5.473027 7.712421 7.425328 10.343695 9.246132
peak8 peak9 peak10 peak11 peak12 peak13 peak14
PA14_EM_14-4_E-3_P1-E-3_01_11213.mzXML 9.130252 9.879932 10.441853 8.277511 7.258236 8.837902 4.522068
PA14_EM_1-2_D-4_P1-D-4_01_2002.mzXML 9.058104 8.606485 9.272817 8.047970 6.825918 7.924373 4.738949
PA14_EM_10-4_H-5_P1-H-5_01_8890.mzXML 9.744228 9.416476 9.936105 8.577914 7.534848 8.511881 4.592875
PA14_EM_2-3_A-12_P1-A-12_01_2500.mzXML 9.455950 8.490708 8.859265 8.305719 7.257221 7.529841 4.244724
PA14_EM_9-2_C-5_P1-C-5_01_6835.mzXML 9.779392 9.128307 9.420374 8.487148 7.413307 7.872341 4.345545
PA14_EM_2_B-6_P1-B-6_01_11723.mzXML 9.358493 9.539392 10.017200 8.228972 7.089368 8.362185 4.132186
peak15 peak16 peak17 peak18 peak19 peak20 peak21
PA14_EM_14-4_E-3_P1-E-3_01_11213.mzXML 7.503102 8.519748 7.118348 6.301519 4.083066 5.801221 9.971810
PA14_EM_1-2_D-4_P1-D-4_01_2002.mzXML 7.843904 8.123712 6.916606 6.550114 4.741928 6.003363 9.010882
PA14_EM_10-4_H-5_P1-H-5_01_8890.mzXML 7.618536 8.226453 6.932789 6.565171 4.615487 5.906193 8.728420
PA14_EM_2-3_A-12_P1-A-12_01_2500.mzXML 7.341069 8.136234 6.191456 6.195770 4.499564 5.737833 8.022057
PA14_EM_9-2_C-5_P1-C-5_01_6835.mzXML 7.287684 8.216923 6.457609 6.364861 4.857282 5.839109 8.712801
PA14_EM_2_B-6_P1-B-6_01_11723.mzXML 7.123842 8.229959 6.656468 6.620781 4.688067 5.586544 9.881702
Here is my command:
df_cmB<-ComBat(dat=as.matrix(df$dat),batch=df$MSBa,mod=NULL)
And here is the error message:
Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) :
non-conformable arguments
I read a similar post on Stack Overflow where the problem was solved by removing variables with near-zero variance, but I don't have any such variables.
Any help is appreciated.
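A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: ComBat documents `dat` as a features-by-samples matrix, so a samples-by-peaks table such as `df$dat` would need to be transposed before the call. The toy example below reuses the expression from the error message with a stand-in design matrix built by `model.matrix` (an assumption about how the design is formed). The names `dat`, `batch`, and `fit_batch_means` are hypothetical and the data are simulated.

```r
# Simulated stand-in for df$dat: samples in rows, peaks in columns.
set.seed(1)
n_samples <- 12
n_peaks <- 4
dat <- matrix(rnorm(n_samples * n_peaks, mean = 7), nrow = n_samples,
              dimnames = list(paste0("sample", seq_len(n_samples)),
                              paste0("peak", seq_len(n_peaks))))
batch <- factor(rep(c("MSB1", "MSB2", "MSB3"), each = 4))

# Assumed design: one row per sample, one indicator column per batch.
design <- model.matrix(~ -1 + batch)

# Same expression as in the error message, wrapped in a helper.
fit_batch_means <- function(m) {
  solve(t(design) %*% design) %*% t(design) %*% t(m)
}

# Samples x peaks: the matrix product fails, so the error text is captured.
wrong <- tryCatch(fit_batch_means(dat), error = function(e) conditionMessage(e))

# Peaks x samples (transposed): the product is defined.
right <- fit_batch_means(t(dat))

print(wrong)
print(dim(right))
```

With features in rows, the number of columns of the input equals the number of rows of the design matrix, which is what the matrix product requires.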
Thanks. I now get a new error, and ComBat reports 10 batches when I only have 5.
> df_cmB<-ComBat(dat=t(as.matrix(df$dat)),batch=df$MSBa,mod=NULL)
Found 10 batches
Found 0 categorical covariate(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Error in apply(s.data[, i], 1, var, na.rm = T) :
dim(X) must have a positive length
Okay, oops: I had a few extra entries in my df$MSBa.
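For anyone hitting the same "Found 10 batches" message, here is a minimal sketch of how stray batch labels could be spotted with base R. The labels in `batch_raw` are invented for illustration, and the expected names `MSB1` to `MSB5` are an assumption based on the factor levels shown in this thread.

```r
# Invented batch labels: five intended batches plus a few stray entries.
batch_raw <- c("MSB1", "MSB2", "MSB3", "MSB4", "MSB5",
               "MSB1 ", " MSB2", "MSB3", NA, "MSB5")
expected <- paste0("MSB", 1:5)  # assumed naming scheme

# Count every distinct label, including missing values.
print(table(batch_raw, useNA = "ifany"))

# Positions whose label is missing or not one of the expected batches.
bad <- which(is.na(batch_raw) | !(batch_raw %in% expected))
print(bad)

# One possible clean-up: strip surrounding whitespace, then re-check.
batch_clean <- trimws(batch_raw)
still_bad <- which(is.na(batch_clean) | !(batch_clean %in% expected))
print(still_bad)
```

Any positions left in `still_bad` need a decision by hand, for example dropping those samples from both the data matrix and the batch vector so the two stay aligned.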
Now I'm getting a different error about non-conformable arguments:
> df_cmB<-ComBat(dat=t(as.matrix(df$dat)),batch=as.factor(df$MSBa),mod=NULL)
Found 5 batches
Found 0 categorical covariate(s)
Standardizing Data across genes
Error in ((dat - t(design %*% B.hat))^2) %*% rep(1/n.array, n.array) :
non-conformable arguments
> str(df$dat)
'data.frame': 5864 obs. of 21 variables:
$ peak1 : num 5.44 5.11 5.59 4.62 5.84 ...
$ peak2 : num 6.51 6.56 6.77 6.49 7.38 ...
$ peak3 : num 5.25 5.54 5.38 4.5 6.45 ...
$ peak4 : num 7.21 7.65 7.47 7.31 8.51 ...
$ peak5 : num 7.47 6.75 8.1 8.81 7.78 ...
$ peak6 : num 10.76 9.61 10.48 9.66 10.05 ...
$ peak7 : num 9.29 9.02 9.69 9.38 9.7 ...
$ peak8 : num 9.13 9.06 9.74 9.46 9.78 ...
$ peak9 : num 9.88 8.61 9.42 8.49 9.13 ...
$ peak10: num 10.44 9.27 9.94 8.86 9.42 ...
$ peak11: num 8.28 8.05 8.58 8.31 8.49 ...
$ peak12: num 7.26 6.83 7.53 7.26 7.41 ...
$ peak13: num 8.84 7.92 8.51 7.53 7.87 ...
$ peak14: num 4.52 4.74 4.59 4.24 4.35 ...
$ peak15: num 7.5 7.84 7.62 7.34 7.29 ...
$ peak16: num 8.52 8.12 8.23 8.14 8.22 ...
$ peak17: num 7.12 6.92 6.93 6.19 6.46 ...
$ peak18: num 6.3 6.55 6.57 6.2 6.36 ...
$ peak19: num 4.08 4.74 4.62 4.5 4.86 ...
$ peak20: num 5.8 6 5.91 5.74 5.84 ...
$ peak21: num 9.97 9.01 8.73 8.02 8.71 ...
> str(as.factor(df$MSBa))
Factor w/ 5 levels "MSB1","MSB2",..: 5 1 4 1 3 5 3 5 2 1 ...
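The thread does not identify the cause of this last error, so the following is only a pre-flight sketch that rules out common input mismatches before calling ComBat. It does not diagnose the error above. The helper `check_combat_inputs` is hypothetical, and the rule that each batch should hold at least two samples is an assumption about what the batch adjustment needs.

```r
# Hypothetical helper: returns a character vector of detected problems,
# or an empty vector if none of the checks fire.
check_combat_inputs <- function(dat, batch) {
  problems <- character(0)
  if (!is.matrix(dat) || !is.numeric(dat)) {
    problems <- c(problems, "dat is not a numeric matrix")
  }
  if (NCOL(dat) != length(batch)) {
    problems <- c(problems, sprintf("dat has %d columns but batch has %d entries",
                                    NCOL(dat), length(batch)))
  }
  if (anyNA(batch)) {
    problems <- c(problems, "batch contains missing values")
  }
  # table() on a factor also counts unused levels as zero.
  if (any(table(batch) < 2)) {
    problems <- c(problems, "a batch level has fewer than 2 samples")
  }
  problems
}

# Simulated inputs: 4 peaks x 12 samples, three batches of four samples.
set.seed(2)
dat_ok <- matrix(rnorm(4 * 12, mean = 7), nrow = 4)
batch_ok <- factor(rep(c("MSB1", "MSB2", "MSB3"), each = 4))
problems_ok <- check_combat_inputs(dat_ok, batch_ok)

# Deliberately broken batch vector: one entry short and one missing value.
batch_bad <- batch_ok[-1]
batch_bad[3] <- NA
problems_bad <- check_combat_inputs(dat_ok, batch_bad)

print(problems_ok)
print(problems_bad)
```

For the data in this thread, the equivalent call would be `check_combat_inputs(t(as.matrix(df$dat)), as.factor(df$MSBa))`.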