Question

warning with lmFit command in limma

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Dear List & Gordon, Firstly thanks to you and your team for the wonderful limma package and its well dcoumented manual. I have used it earlier for analysing both one colored and two colored data. But with one of the datasets, I get the following warning with lmFit command in limma package. > fit <- lmFit(gset38932, design38932) Warning message: Partial NA coefficients for 28837 probe(s) Background of experiment: It is a two color experiment with common reference. Paired samples (cancerous and adjacent noncancerous tissues were obtained from 12 patients) I have followed the instructions under section 9.4.1 "Paired Samples" of limma manual (last revised on 17 june 2014) and also goggled for this warning but I am unable to understand it. My another concern is that if I ignore the warning and proceed with limma analysis, I get only 84 differentially expressed genes (genes with adj.P.Value < 0.05), which appears too small a number for a microarray experiment. Complete code: > library(Biobase) > library(GEOquery) > library(limma) > gset38932 <- getGEO("GSE38932", GSEMatrix =TRUE) > GPLid <- levels((gset38932[[1]])$platform_id) > if (length(gset38932) > 1) + { + #idx <- grep("GPL5936", attr(gset38932, "names")) + idx <- grep(GPLid, attr(gset38932, "names")) + } else + { + idx <- 1 + } > gset38932 <- gset38932[[idx]] > ConventionalMicroarrayData <- which(pData(gset38932)$characteristics_ch1.10 == "strategy: conventional microarray") > ## remove data with "Indirect strategy." and keep ones with "Direct strategy." > gset38932 <- gset38932[,ConventionalMicroarrayData] > # log2 transform > ex38932 <- exprs(gset38932) > qx <- as.numeric(quantile(ex38932, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) > LogC <- (qx[5] > 100) || + (qx[6]-qx[1] > 50 && qx[2] > 0) || + (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2) > if (LogC) { ex38932[which(ex38932 <= 0)] <- NaN + exprs(gset38932) <- log2(ex38932) } > PatientInfo <- pData(gset38932)[,c(2,10:20,27)] > ## remove anything that is not a digit > PatientID <- as.numeric(sub(pattern="\D+", replacement="", x=PatientInfo$characteristics_ch1.9,ignore.case=T)) > TissueType <- sub(pattern="tissue type: ", replacement="", x=PatientInfo$characteristics_ch1.8,ignore.case=T) > TissueType <- make.names(TissueType) > Block <- factor(PatientID) > Treatment <- factor(TissueType,levels=c("adjacent.non.tumor","tumor")) ## order of levels is important > design38932 <- model.matrix(~Block+Treatment) > fit <- lmFit(gset38932, design38932) Warning message: Partial NA coefficients for 28837 probe(s) > fit2 <- eBayes(fit, proportion=0.01) #proportion: assumed proportion of genes which are differentially expressed > > ## to get the no.of genes in each array > nrow38932 <- nrow(ex38932) > LastColumn <- dim(design38932)[2] > tT <- topTable(fit2, coef=colnames(design38932)[LastColumn], adjust="fdr", sort.by="B",number=nrow38932) > dim(subset(tT,adj.P.Val<0.05,)) [1] 84 15 -- output of sessionInfo(): > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.20.8 GEOquery_2.30.1 Biobase_2.24.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] RCurl_1.95-4.3 tools_3.1.0 XML_3.98-1.1 -- Sent via the guest posting facility at bioconductor.org.

Microarray limma Microarray limma • 4.6k views

ADD COMMENT • link updated 10.7 years ago by Gordon Smyth 52k • written 10.7 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-08-07

Dear Shruti Marwaha, The problem is that more than half of the GSE38932 data consists of missing values. This seems to me to be very unfortunate, but limma can only analyse the data that it is given and give you a warning to keep you informed of what is happening. Given the enormous number of missing values, there may be no available data for many of the blocks, and hence the coefficients for the blocks become NA. Another problem is the log2 transformation code that you are using, which is unnecessary and incorrect. I realize that you are using R code from the GEO site itself: http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE38932 However it is clear from the description of the data processing that the expression values are already log2-ratios produced by limma, and hence do not need to be further processed. After the step, gset38932 <- gset38932[,ConventionalMicroarrayData] you can already pass the data object to lmFit() without any extra processing except for row filtering. If you the unnecessary re-processing, the tests of the treatment effect from lmFit() etc should still be correct despite the warnings. limma is making the best use of the non-missing data. However, personally, I would download the raw gpr files from the GEO site and re-normalize the data without introducing missing values. This is a dataset where I think that somewhat less processing would have resulted in better results. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth > From: "Shruti Marwaha [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, marwahsi at gmail.com > Sent: Wednesday, 6 August, 2014 7:31:13 AM > Subject: [BioC] warning with lmFit command in limma > > Dear List & Gordon, > > Firstly thanks to you and your team for the wonderful limma package and > its well dcoumented manual. > I have used it earlier for analysing both one colored and two colored > data. But with one of the datasets, I get the following warning with > lmFit command in limma package. > >> fit <- lmFit(gset38932, design38932) > Warning message: > Partial NA coefficients for 28837 probe(s) > > Background of experiment: > > It is a two color experiment with common reference. > Paired samples (cancerous and adjacent noncancerous tissues were obtained from 12 patients) > > I have followed the instructions under section 9.4.1 "Paired Samples" of > limma manual (last revised on 17 june 2014) and also goggled for this > warning but I am unable to understand it. My another concern is that if > I ignore the warning and proceed with limma analysis, I get only 84 > differentially expressed genes (genes with adj.P.Value < 0.05), which > appears too small a number for a microarray experiment. > > Complete code: >> library(Biobase) >> library(GEOquery) >> library(limma) > >> gset38932 <- getGEO("GSE38932", GSEMatrix =TRUE) >> GPLid <- levels((gset38932[[1]])$platform_id) >> if (length(gset38932) > 1) > + { > + #idx <- grep("GPL5936", attr(gset38932, "names")) > + idx <- grep(GPLid, attr(gset38932, "names")) > + } else > + { > + idx <- 1 > + } >> gset38932 <- gset38932[[idx]] > >> ConventionalMicroarrayData <- which(pData(gset38932)$characteristics_ch1.10 == "strategy: conventional microarray") >> ## remove data with "Indirect strategy." and keep ones with "Direct strategy." >> gset38932 <- gset38932[,ConventionalMicroarrayData] > >> # log2 transform >> ex38932 <- exprs(gset38932) >> qx <- as.numeric(quantile(ex38932, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) >> LogC <- (qx[5] > 100) || > + (qx[6]-qx[1] > 50 && qx[2] > 0) || > + (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2) >> if (LogC) { ex38932[which(ex38932 <= 0)] <- NaN > + exprs(gset38932) <- log2(ex38932) } > >> PatientInfo <- pData(gset38932)[,c(2,10:20,27)] >> ## remove anything that is not a digit >> PatientID <- as.numeric(sub(pattern="\D+", replacement="", x=PatientInfo$characteristics_ch1.9,ignore.case=T)) >> TissueType <- sub(pattern="tissue type: ", replacement="", x=PatientInfo$characteristics_ch1.8,ignore.case=T) >> TissueType <- make.names(TissueType) >> Block <- factor(PatientID) >> Treatment <- factor(TissueType,levels=c("adjacent.non.tumor","tumor")) ## order of levels is important >> design38932 <- model.matrix(~Block+Treatment) > >> fit <- lmFit(gset38932, design38932) > Warning message: > Partial NA coefficients for 28837 probe(s) > >> fit2 <- eBayes(fit, proportion=0.01) #proportion: assumed proportion of genes which are differentially expressed >> >> ## to get the no.of genes in each array >> nrow38932 <- nrow(ex38932) >> LastColumn <- dim(design38932)[2] >> tT <- topTable(fit2, coef=colnames(design38932)[LastColumn], adjust="fdr", sort.by="B",number=nrow38932) >> dim(subset(tT,adj.P.Val<0.05,)) > [1] 84 15 > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.20.8 GEOquery_2.30.1 Biobase_2.24.0 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] RCurl_1.95-4.3 tools_3.1.0 XML_3.98-1.1 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

score 0 · Answer 2 · 2014-08-07

Thanks Smith for your quick response. Thanks for pointing out the log2 transformation problem in my code. You are correct, I was following GEO2R code. I will also like to confirm with you if my design matrix is correct for a two color experiment with common reference where the samples are paired (cancerous and adjacent noncancerous tissues were obtained from same patient). Thanks & Regards, Shruti Marwaha Graduate Student Systems Biology and Physiology University of Cincinnati Medical Science Building, 231 Albert Sabin Way Cincinnati, OH, USA 45267 [[alternative HTML version deleted]]

score 0 · Answer 3 · 2014-08-09

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Your design matrix looks correct on my quick look. Gordon > Date: Thu, 7 Aug 2014 15:05:12 +0000 > From: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Cc: "smyth at wehi.edu.au" <smyth at="" wehi.edu.au=""> > Subject: Re: [BioC] warning with lmFit command in limma > > Thanks Smith for your quick response. Thanks for pointing out the log2 > transformation problem in my code. You are correct, I was following > GEO2R code. > > I will also like to confirm with you if my design matrix is correct for > a two color experiment with common reference where the samples are > paired (cancerous and adjacent noncancerous tissues were obtained from > same patient). > > Thanks & Regards, > Shruti Marwaha > > Graduate Student > Systems Biology and Physiology > University of Cincinnati > Medical Science Building, > 231 Albert Sabin Way > Cincinnati, OH, USA 45267 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 10.7 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks Smith. -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: 09 August 2014 05:02 To: Marwaha, Shruti (marwahsi) Cc: Bioconductor mailing list Subject: warning with lmFit command in limma Your design matrix looks correct on my quick look. Gordon > Date: Thu, 7 Aug 2014 15:05:12 +0000 > From: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Cc: "smyth at wehi.edu.au" <smyth at="" wehi.edu.au=""> > Subject: Re: [BioC] warning with lmFit command in limma > > Thanks Smith for your quick response. Thanks for pointing out the log2 > transformation problem in my code. You are correct, I was following > GEO2R code. > > I will also like to confirm with you if my design matrix is correct > for a two color experiment with common reference where the samples are > paired (cancerous and adjacent noncancerous tissues were obtained from > same patient). > > Thanks & Regards, > Shruti Marwaha > > Graduate Student > Systems Biology and Physiology > University of Cincinnati > Medical Science Building, > 231 Albert Sabin Way > Cincinnati, OH, USA 45267 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD REPLY • link 10.7 years ago Marwaha, Shruti marwahsi ▴ 40