warning with lmFit command in limma
3
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Dear List & Gordon, Firstly thanks to you and your team for the wonderful limma package and its well dcoumented manual. I have used it earlier for analysing both one colored and two colored data. But with one of the datasets, I get the following warning with lmFit command in limma package. > fit <- lmFit(gset38932, design38932) Warning message: Partial NA coefficients for 28837 probe(s) Background of experiment: It is a two color experiment with common reference. Paired samples (cancerous and adjacent noncancerous tissues were obtained from 12 patients) I have followed the instructions under section 9.4.1 "Paired Samples" of limma manual (last revised on 17 june 2014) and also goggled for this warning but I am unable to understand it. My another concern is that if I ignore the warning and proceed with limma analysis, I get only 84 differentially expressed genes (genes with adj.P.Value < 0.05), which appears too small a number for a microarray experiment. Complete code: > library(Biobase) > library(GEOquery) > library(limma) > gset38932 <- getGEO("GSE38932", GSEMatrix =TRUE) > GPLid <- levels((gset38932[[1]])$platform_id) > if (length(gset38932) > 1) + { + #idx <- grep("GPL5936", attr(gset38932, "names")) + idx <- grep(GPLid, attr(gset38932, "names")) + } else + { + idx <- 1 + } > gset38932 <- gset38932[[idx]] > ConventionalMicroarrayData <- which(pData(gset38932)$characteristics_ch1.10 == "strategy: conventional microarray") > ## remove data with "Indirect strategy." and keep ones with "Direct strategy." > gset38932 <- gset38932[,ConventionalMicroarrayData] > # log2 transform > ex38932 <- exprs(gset38932) > qx <- as.numeric(quantile(ex38932, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) > LogC <- (qx[5] > 100) || + (qx[6]-qx[1] > 50 && qx[2] > 0) || + (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2) > if (LogC) { ex38932[which(ex38932 <= 0)] <- NaN + exprs(gset38932) <- log2(ex38932) } > PatientInfo <- pData(gset38932)[,c(2,10:20,27)] > ## remove anything that is not a digit > PatientID <- as.numeric(sub(pattern="\D+", replacement="", x=PatientInfo$characteristics_ch1.9,ignore.case=T)) > TissueType <- sub(pattern="tissue type: ", replacement="", x=PatientInfo$characteristics_ch1.8,ignore.case=T) > TissueType <- make.names(TissueType) > Block <- factor(PatientID) > Treatment <- factor(TissueType,levels=c("adjacent.non.tumor","tumor")) ## order of levels is important > design38932 <- model.matrix(~Block+Treatment) > fit <- lmFit(gset38932, design38932) Warning message: Partial NA coefficients for 28837 probe(s) > fit2 <- eBayes(fit, proportion=0.01) #proportion: assumed proportion of genes which are differentially expressed > > ## to get the no.of genes in each array > nrow38932 <- nrow(ex38932) > LastColumn <- dim(design38932)[2] > tT <- topTable(fit2, coef=colnames(design38932)[LastColumn], adjust="fdr", sort.by="B",number=nrow38932) > dim(subset(tT,adj.P.Val<0.05,)) [1] 84 15 -- output of sessionInfo(): > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.20.8 GEOquery_2.30.1 Biobase_2.24.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] RCurl_1.95-4.3 tools_3.1.0 XML_3.98-1.1 -- Sent via the guest posting facility at bioconductor.org.
Microarray limma Microarray limma • 4.5k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia
Dear Shruti Marwaha, The problem is that more than half of the GSE38932 data consists of missing values. This seems to me to be very unfortunate, but limma can only analyse the data that it is given and give you a warning to keep you informed of what is happening. Given the enormous number of missing values, there may be no available data for many of the blocks, and hence the coefficients for the blocks become NA. Another problem is the log2 transformation code that you are using, which is unnecessary and incorrect. I realize that you are using R code from the GEO site itself: http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE38932 However it is clear from the description of the data processing that the expression values are already log2-ratios produced by limma, and hence do not need to be further processed. After the step, gset38932 <- gset38932[,ConventionalMicroarrayData] you can already pass the data object to lmFit() without any extra processing except for row filtering. If you the unnecessary re-processing, the tests of the treatment effect from lmFit() etc should still be correct despite the warnings. limma is making the best use of the non-missing data. However, personally, I would download the raw gpr files from the GEO site and re-normalize the data without introducing missing values. This is a dataset where I think that somewhat less processing would have resulted in better results. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth > From: "Shruti Marwaha [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, marwahsi at gmail.com > Sent: Wednesday, 6 August, 2014 7:31:13 AM > Subject: [BioC] warning with lmFit command in limma > > Dear List & Gordon, > > Firstly thanks to you and your team for the wonderful limma package and > its well dcoumented manual. > I have used it earlier for analysing both one colored and two colored > data. But with one of the datasets, I get the following warning with > lmFit command in limma package. > >> fit <- lmFit(gset38932, design38932) > Warning message: > Partial NA coefficients for 28837 probe(s) > > Background of experiment: > > It is a two color experiment with common reference. > Paired samples (cancerous and adjacent noncancerous tissues were obtained from 12 patients) > > I have followed the instructions under section 9.4.1 "Paired Samples" of > limma manual (last revised on 17 june 2014) and also goggled for this > warning but I am unable to understand it. My another concern is that if > I ignore the warning and proceed with limma analysis, I get only 84 > differentially expressed genes (genes with adj.P.Value < 0.05), which > appears too small a number for a microarray experiment. > > Complete code: >> library(Biobase) >> library(GEOquery) >> library(limma) > >> gset38932 <- getGEO("GSE38932", GSEMatrix =TRUE) >> GPLid <- levels((gset38932[[1]])$platform_id) >> if (length(gset38932) > 1) > + { > + #idx <- grep("GPL5936", attr(gset38932, "names")) > + idx <- grep(GPLid, attr(gset38932, "names")) > + } else > + { > + idx <- 1 > + } >> gset38932 <- gset38932[[idx]] > >> ConventionalMicroarrayData <- which(pData(gset38932)$characteristics_ch1.10 == "strategy: conventional microarray") >> ## remove data with "Indirect strategy." and keep ones with "Direct strategy." >> gset38932 <- gset38932[,ConventionalMicroarrayData] > >> # log2 transform >> ex38932 <- exprs(gset38932) >> qx <- as.numeric(quantile(ex38932, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) >> LogC <- (qx[5] > 100) || > + (qx[6]-qx[1] > 50 && qx[2] > 0) || > + (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2) >> if (LogC) { ex38932[which(ex38932 <= 0)] <- NaN > + exprs(gset38932) <- log2(ex38932) } > >> PatientInfo <- pData(gset38932)[,c(2,10:20,27)] >> ## remove anything that is not a digit >> PatientID <- as.numeric(sub(pattern="\D+", replacement="", x=PatientInfo$characteristics_ch1.9,ignore.case=T)) >> TissueType <- sub(pattern="tissue type: ", replacement="", x=PatientInfo$characteristics_ch1.8,ignore.case=T) >> TissueType <- make.names(TissueType) >> Block <- factor(PatientID) >> Treatment <- factor(TissueType,levels=c("adjacent.non.tumor","tumor")) ## order of levels is important >> design38932 <- model.matrix(~Block+Treatment) > >> fit <- lmFit(gset38932, design38932) > Warning message: > Partial NA coefficients for 28837 probe(s) > >> fit2 <- eBayes(fit, proportion=0.01) #proportion: assumed proportion of genes which are differentially expressed >> >> ## to get the no.of genes in each array >> nrow38932 <- nrow(ex38932) >> LastColumn <- dim(design38932)[2] >> tT <- topTable(fit2, coef=colnames(design38932)[LastColumn], adjust="fdr", sort.by="B",number=nrow38932) >> dim(subset(tT,adj.P.Val<0.05,)) > [1] 84 15 > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.20.8 GEOquery_2.30.1 Biobase_2.24.0 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] RCurl_1.95-4.3 tools_3.1.0 XML_3.98-1.1 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
@marwaha-shruti-marwahsi-5821
Last seen 2.3 years ago
United States
Thanks Smith for your quick response. Thanks for pointing out the log2 transformation problem in my code. You are correct, I was following GEO2R code. I will also like to confirm with you if my design matrix is correct for a two color experiment with common reference where the samples are paired (cancerous and adjacent noncancerous tissues were obtained from same patient). Thanks & Regards, Shruti Marwaha Graduate Student Systems Biology and Physiology University of Cincinnati Medical Science Building, 231 Albert Sabin Way Cincinnati, OH, USA 45267 [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia
Your design matrix looks correct on my quick look. Gordon > Date: Thu, 7 Aug 2014 15:05:12 +0000 > From: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Cc: "smyth at wehi.edu.au" <smyth at="" wehi.edu.au=""> > Subject: Re: [BioC] warning with lmFit command in limma > > Thanks Smith for your quick response. Thanks for pointing out the log2 > transformation problem in my code. You are correct, I was following > GEO2R code. > > I will also like to confirm with you if my design matrix is correct for > a two color experiment with common reference where the samples are > paired (cancerous and adjacent noncancerous tissues were obtained from > same patient). > > Thanks & Regards, > Shruti Marwaha > > Graduate Student > Systems Biology and Physiology > University of Cincinnati > Medical Science Building, > 231 Albert Sabin Way > Cincinnati, OH, USA 45267 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Thanks Smith. -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: 09 August 2014 05:02 To: Marwaha, Shruti (marwahsi) Cc: Bioconductor mailing list Subject: warning with lmFit command in limma Your design matrix looks correct on my quick look. Gordon > Date: Thu, 7 Aug 2014 15:05:12 +0000 > From: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Cc: "smyth at wehi.edu.au" <smyth at="" wehi.edu.au=""> > Subject: Re: [BioC] warning with lmFit command in limma > > Thanks Smith for your quick response. Thanks for pointing out the log2 > transformation problem in my code. You are correct, I was following > GEO2R code. > > I will also like to confirm with you if my design matrix is correct > for a two color experiment with common reference where the samples are > paired (cancerous and adjacent noncancerous tissues were obtained from > same patient). > > Thanks & Regards, > Shruti Marwaha > > Graduate Student > Systems Biology and Physiology > University of Cincinnati > Medical Science Building, > 231 Albert Sabin Way > Cincinnati, OH, USA 45267 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY

Login before adding your answer.

Traffic: 529 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6