Hi All
I'm strugglin with the analysis of DEP in a protein dataset i have. I have managed to analyse to this point in R using variants of MSGF etc and quantified via SI. Having run the samples via DAPAR and Prostar, i have knowldege of the number (or at least the region) of DEP to expect in this particular comparison (eg. CTRL to 0.01 mg/L). Using msmsTests, i create the hypothesis tests (H0 and H1), subset the data set to only these two groups and then run msms.EdgeR() with div = to the colSums().
This all runs fine but when i get to the portion or adding FDR to the data via test.results, something weird happens where instead of getting a dataset (so to speak) of protein id, raw spectral counts for control condition, raw spectral conts for comparitive condition, logfold change (LFC.AV) (logfc, LR, p.value etc), the second column (which is the raw spectral counts for the ctrl condition is always NA irrespective of which concentration i choose. I''m unsure why this is the case, when the msms.EdgeR runs ok with the comparisons. Has this happened to anyone else? Is there a particular reason this might occur that i can check to see if my data has been entered incorrectly?
I'm scratching my head over what would cause this. I did wonder whether it had anything to do with my estimates of dispersion used in the msms.EdgeR model build but i'm unsure how to proceed now.
Any help would be much appreciated. I can of course provide the code but i am hoping someone else has stumbled across this.
Many thanks :)
No, this is what is frustrating me. As part of the preparation of my experimental MSnSet, i remove any lines with NA (pNA=1/3 and then impute any remaining NA values with knn). When combining the data set, i use imputation QRILC to ensure that there are no NA values and then check with is.na(exprs(e2)) || exprs(e2) < 0) which is always False.
Many thanks for your quick reply. I've attached the code that i have used but i'm unsure if it is in the correct format? Any help would be greatly appreicated
BaP_Exp consists of 12 samples, 3 biological replicates of 4 conditions. The data below as been subsetted to two comparisons. BaP_exp is a large MSnSet.
Code for analysis
mix1=BaP_exp[,BaP_exp$Group %in% c("Ctrl","5")]
new=pp.msms.data(mix1)
pData(new) ##6 samples
H0="y~"
H1="y~Group"
dispersion=colSums(exprs(new))
check for na values
is.na(exprs(new))|| exprs(new) < 0
#always FALSE
mixed=msms.edgeR(new,H0,H1,div=dispersion, fnm="Group")
returns
NA
values or more recently error: negative counts not allowedIt's difficult to say much without the actual code and it's output, as it is executed in R. Two things though
there are warnings at the end of your
sessionInfo
; I would recommend fixing them befoe doing anything else. Re-installing the packages light help. If you have issues or more questions about this step, please ask them in a new question on the forum.Following exactly what is presented in the
msmsTests
vignette, your H0 should be defined asH0 = "y~1"
. I don't know if this is the reason for your issue though.How can i provide the code as its produced in R? Do i just copy it under the line of run code?
I've never been able to attach code before on this forum and am slightly lost. Sorry for the inconvenience
I didn't mean to attach it, but run all the commands in your R session, then copy the whole lot and their output and paste it here. For example:
Many thanks for your quick reply. I've ran the code again and am now getting a new error message. I've updated my packages since i first asked my question, but i don't think that is what is going wrong. Library(MSGFplus, MSnbase, MSnID,vsn, imputeLCMD)
> t1=c("20171124VS37.mzML")
> q1=c("20171124VS37.mzid")
> bap_0a=readMSData(t1, mode="onDisk")
> bap_0a=addIdentificationData(bap_0a,q1)
> t2=c("20171124VS38.mzML")
> q2=c("20171124VS38.mzid")
> bap_0b=readMSData(t2, mode="onDisk")
> bap_0b=addIdentificationData(bap_0b,q2)
> bap_0c=readMSData(t3, mode="onDisk") .... (3 MORE SAMPLES)
> nms=c(paste0("bap_",0,c("a","b","c")),
+ paste0("bap_",5,c("a","b","c")))
> tmp <- sapply(nms, function(.bap) {
+ cat("Processing", .bap, "... ")
+ x <- get(.bap, envir = .GlobalEnv)
+ x=quantify(x,method="SI")
+ x <- topN(x, groupBy = fData(x)$DatabaseAccess, n = 3)
+ nPeps <- nQuants(x, fData(x)$DatabaseAccess)
+ x <- combineFeatures(x, fData(x)$DatabaseAccess,
+ redundancy.handler = "unique",
+ fun="sum", na.rm = TRUE, CV=FALSE)
+ exprs(x) <- exprs(x) * (3/nPeps)
+ x <- updateFvarLabels(x, .bap)
+ varnm <- sub("bap", "bap", .bap)
+ assign(varnm, x, envir = .GlobalEnv)
+ cat("done\n")
+ })
> ctrl=combine(bap_0a,bap_0b)
> ctrl=combine(ctrl, bap_0c)
> ctrl=filterNA(ctrl, pNA=1/3)
> ctrl=impute(ctrl, method="knn")
> sampleNames(ctrl)=c("CTRL.a","CTRL.b","CTRL.c")
> bap5=combine(bap_5a,bap_5b)
> bap5=combine(bap5,bap_5c)
> bap5=filterNA(bap5,pNA=1/3)
> bap5=impute(bap5, method="knn")
> sampleNames(bap5)=c("BaP.5a","BaP.5b","BaP.5c")
> BaP_exp=combine(ctrl,bap5)
> BaP_exp=impute(BaP_exp, "QRILC")
> BaP_exp
MSnSet (storageMode: lockedEnvironment)
assayData: 2265 features, 6 samples
element names: exprs
protocolData: none
phenoData
sampleNames: CTRL.a CTRL.b ... BaP.5c (6 total)
varLabels: sampleNames
varMetadata: labelDescription
featureData
featureNames: sp|P56839|PEPM_MYTED sp|P91958|TPM_MYTGA ... tr|U3PWG3|U3PWG3_9BIVA (2265 total)
fvarLabels: fileIdx.bap_0a spIdx.bap_0a ... CV.20171124VS42.mzML.1.bap_5c (390 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Combined [1461,3] and [1285,3] MSnSets Thu Oct 18 15:33:34 2018
Data imputation using QRILC Thu Oct 18 15:33:41 2018
Using default parameters
MSnbase version: 2.6.1
DEP (load msmsTests & msmsEDA)
pData(BaP_exp)$Group=rep(c("Ctrl","5"),each=3)
> pData(BaP_exp)
sampleNames Group
CTRL.a 20171124VS37.mzML Ctrl
CTRL.b 20171124VS38.mzML Ctrl
CTRL.c 20171124VS39.mzML Ctrl
BaP.5a 20171124VS40.mzML 5
BaP.5b 20171124VS41.mzML 5
BaP.5c 20171124VS42.mzML 5
> bap_data=pp.msms.data(BaP_exp)
> processingData(BaP_exp)
- - - Processing information - - -
Combined [1461,3] and [1285,3] MSnSets Thu Oct 18 15:33:34 2018
Data imputation using QRILC Thu Oct 18 15:33:41 2018
Using default parameters
MSnbase version: 2.6.1
> H0="y~1"
> H1="y~Group"
> disp2=colSums(exprs(bap_data))
> disp2
CTRL.a CTRL.b CTRL.c BaP.5a BaP.5b BaP.5c
40684.87 40461.76 40564.24 38956.89 39035.39 38819.56
> is.na(exprs(bap_data)) || exprs(bap_data) < 0
[1] FALSE
mixed=msms.edgeR(bap_data,H0,H1,div=disp2, facs=facs)
Error in glmLRT(fit, coef = vc) :
Need at least two columns for design, usually the first is the intercept column
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] imputeLCMD_2.0 impute_1.54.0 pcaMethods_1.72.0 norm_1.0-9.5 tmvtnorm_1.4-10 gmm_1.6-2
[7] sandwich_2.5-0 Matrix_1.2-14 mvtnorm_1.0-8 vsn_3.48.1 MSnID_1.14.0 MSGFplus_1.14.0
[13] msmsTests_1.18.0 msmsEDA_1.18.0 BiocInstaller_1.30.0 MSnbase_2.6.1 ProtGenerics_1.12.0 BiocParallel_1.14.1
[19] mzR_2.14.0 Rcpp_0.12.19 Biobase_2.40.0 BiocGenerics_0.26.0