EdgeR - problems running estimateCRDisp
2
0
Entering edit mode
@josquintibbitsdpivicgovau-4311
Last seen 10.4 years ago
I am running the new EdgeR package 1.8 and have updated R to 1.12.0 and updated all the packages I have installed. I have an RNAseq experiment with 24 samples and am wanting to run a glm analysis using the Cox-Reid common dispersion (and tagwise) paramaters. I have created a DGEList object using a targets file and this successfully ran in the calcNormFactors() and in the plotMDS.dge() functions. I have also created what appears to be a sensible and ok design matrix using model.matrix(). These then are being input to estimateCRDisp() to estimate the common dispersion. The code I have run is as follows (in red) with output (blue) which is pretty much exactly the same as the worked example in the manual on pp 51-- and does not seem abnormal at all...... targets <- read.delim("Lp_NUE_DGEbymSEQ_TARGETS_EdgeRIN.txt", stringsAsFactors = FALSE, row.names = "label") targets$Nitrogen <- factor(targets$Nitrogen) targets$Endophyte <- factor(targets$Endophyte) targets$Tissue <- factor(targets$Tissue) targets ## CREATE THE DGEList Object DGEList.object <- readDGE(targets, skip = 1, comment.char = '#') colnames(DGEList.object) <- row.names(targets) ## CALCULATE THE NORMALISATION (f) FACTORS DGEList.object <- calcNormFactors(DGEList.object) ## Exclude data where rowsums are < 30 YOU CAN CHANGE THIS (there being 24 samples in the experiment) DGEList.object <- DGEList.object[rowSums(DGEList.object$counts) > 30, ] DGEList.object An object of class "DGEList" $samples files Endophyte Tissue Nitrogen lib.size norm.factors Lp_NUE_LFFULL Lp_NUE_LFFULL_ER_RAWREADS.txt Free Leaf Full 3786614 0.9244799 Lp_NUE_LFNO3 Lp_NUE_LFNO3_ER_RAWREADS.txt Free Leaf Partial 5311856 0.8120410 Lp_NUE_LFK Lp_NUE_LFK_ER_RAWREADS.txt Free Leaf Full 3482772 0.7224960 Lp_NUE_LFPO4 Lp_NUE_LFPO4_ER_RAWREADS.txt Free Leaf Full 6208397 0.7367309 Lp_NUE_LFCa Lp_NUE_LFCa_ER_RAWREADS.txt Free Leaf Full 3597004 0.8147619 19 more rows ... $counts Lp_NUE_LFFULL Lp_NUE_LFNO3 Lp_NUE_LFK Lp_NUE_LFPO4 Lp_NUE_LFCa Lp_NUE_LFNH4 Lp_NUE_RFFULL Lp_NUE_RFNO3 Lp_NUE_RFK Lp_NUE_RFPO4 >prg_cds_44 345 382 120 261 141 215 888 717 698 1112 >prg_cds_141 290 451 126 422 235 459 676 559 474 879 >prg_cds_2 233 277 131 333 239 376 246 185 124 282 >prg_cds_16 433 608 184 614 346 423 924 756 595 913 >prg_cds_84 287 353 90 241 248 394 823 612 989 1061 Lp_NUE_RFCa Lp_NUE_RFNH4 Lp_NUE_LSFULL Lp_NUE_LSNO3 Lp_NUE_LSK Lp_NUE_LSPO4 Lp_NUE_LSCa Lp_NUE_LSNH4 Lp_NUE_RSFULL Lp_NUE_RSNO3 >prg_cds_44 1107 439 954 257 205 107 868 106 316 1147 >prg_cds_141 583 1052 773 325 230 193 600 264 414 1046 >prg_cds_2 235 454 177 232 177 203 138 239 334 228 >prg_cds_16 817 1040 2373 464 220 276 895 291 714 2655 >prg_cds_84 890 1653 894 268 169 123 714 209 311 1078 Lp_NUE_RSK Lp_NUE_RSPO4 Lp_NUE_RSCa Lp_NUE_RSNH4 >prg_cds_44 622 903 114 474 >prg_cds_141 505 711 205 1166 >prg_cds_2 93 136 157 340 >prg_cds_16 1398 1660 311 1258 >prg_cds_84 893 771 154 1269 63905 more rows ... ## Produce an MDS plot of the data #plotMDS.dge(DGEList.object, main = "MDS plot for Lp_NUE_DGEbymSEQ") ## DEFINE THE DESIGN MATRIX design <- model.matrix(~ Endophyte + Tissue + Nitrogen , data = targets) print("This is the design matrix") design (Intercept) EndophyteSTWT TissueRoot NitrogenPartial Lp_NUE_LFFULL 1 0 0 0 Lp_NUE_LFNO3 1 0 0 1 Lp_NUE_LFK 1 0 0 0 Lp_NUE_LFPO4 1 0 0 0 Lp_NUE_LFCa 1 0 0 0 Lp_NUE_LFNH4 1 0 0 1 Lp_NUE_RFFULL 1 0 1 0 Lp_NUE_RFNO3 1 0 1 1 Lp_NUE_RFK 1 0 1 0 Lp_NUE_RFPO4 1 0 1 0 Lp_NUE_RFCa 1 0 1 0 Lp_NUE_RFNH4 1 0 1 1 Lp_NUE_LSFULL 1 1 0 0 Lp_NUE_LSNO3 1 1 0 1 Lp_NUE_LSK 1 1 0 0 Lp_NUE_LSPO4 1 1 0 0 Lp_NUE_LSCa 1 1 0 0 Lp_NUE_LSNH4 1 1 0 1 Lp_NUE_RSFULL 1 1 1 0 Lp_NUE_RSNO3 1 1 1 1 Lp_NUE_RSK 1 1 1 0 Lp_NUE_RSPO4 1 1 1 0 Lp_NUE_RSCa 1 1 1 0 Lp_NUE_RSNH4 1 1 1 1 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Endophyte [1] "contr.treatment" attr(,"contrasts")$Tissue [1] "contr.treatment" attr(,"contrasts")$Nitrogen [1] "contr.treatment" ## ESTIMATE THE Cox-Reid common dispersion DGEList.object <- estimateCRDisp(DGEList.object, design) names(DGEList.object) This step is giving the following error::: Loading required package: MASS Error: NA/NaN/Inf in foreign function call (arg 1) In addition: Warning messages: 1: glm.fit: algorithm did not converge 2: step size truncated due to divergence > names(DGEList.object) [1] "samples" "counts" Clearly the function is not working. I have tried two different computers and a run it in 64 bit but always with the same result. I have also tried using a balanced design and a simple design all with the same results. I have used the DGEList object to analyse the data using the original estimateCommonDisp() and exactTest() which worked fine. I was hoping someone might be able to help resolve this for me???? Thanks in advance....................... Josquin The sessionInfo() and traceback() are below: > sessionInfo() R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C [5] LC_TIME=English_Australia.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] MASS_7.3-8 edgeR_1.8.1 loaded via a namespace (and not attached): [1] limma_3.6.0 > traceback() 4: .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars, y = w * z, ny = 1L, tol = min(1e-07, control$epsilon/1000), coefficients = double(nvars), residuals = double(ngoodobs), effects = double(ngoodobs), rank = integer(1L), pivot = 1L:nvars, qraux = double(nvars), work = double(2 * nvars), PACKAGE = "base") 3: glm.fit(design, y[i, ], offset = offset[i, ], family = f) 2: adjustedProfileLik(spline.disp[i], y.select, design = design, offset = offset.mat.select + lib.size.mat.select) 1: estimateCRDisp(DGEList.object, design) Notice: This email and any attachments may contain information t...{{dropped:14}}
RNASeq edgeR RNASeq edgeR • 1.3k views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States
Hi, I'm going to cut out most of your email to highlight the relevant part: On Fri, Oct 22, 2010 at 1:44 AM, <josquin.tibbits at="" dpi.vic.gov.au=""> wrote: > I am running the new EdgeR package 1.8 and have updated R to 1.12.0 and > updated all the packages I have installed. > > I have an RNAseq experiment with 24 samples and am wanting to run a glm > analysis using the Cox-Reid common dispersion (and tagwise) paramaters. > ... > DGEList.object <- estimateCRDisp(DGEList.object, design) > names(DGEList.object) > > This step is giving ?the following error::: > > Loading required package: MASS > Error: NA/NaN/Inf in foreign function call (arg 1) > In addition: Warning messages: > 1: glm.fit: algorithm did not converge > 2: step size truncated due to divergence >> names(DGEList.object) > [1] "samples" "counts" > > Clearly the function is not working. I have tried two different computers > and a run it in 64 bit but always with the same result. Perhaps this is too simple a suggestion on my part, but do you have any "such" values in your DGEList.object$counts matrix? And by "such" I mean NA, NaN, or Inf values for some counts? What does range(DGEList.object$counts) give you? There's another description from the R-help archives of why this might also happen, but I'm not sure how helpful that is to you at the moment: http://article.gmane.org/gmane.comp.lang.r.general/54844 -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Josquin, estimateCDDisp() has worked on all the datasets we have tested it out on so far (although that's admitedly a very limited number), so we are not able to reproduce your problems. The function is liable to convergence issues however because it needs to fit a very large number of generalized linear models, one or more of which could conceivably fail to converge. That might be the problem with your data. I don't see any problems with the code example that you give, you seem to be using the functions correctly, so the issue must be with the count data itself. Would you be willing to share your data with us offline so that we diagnose and perhaps solve the problem? To see estimateCDDisp() working as intended, you can run the example: library(edgeR) example(glmFit) CR <- estimateCRDisp(d, design) fit <- glmFit(d, design, dispersion=CR$CR.common.dispersion) results <- glmLRT(d, fit, coef=2) topTags(results) Best wishes Gordon ----- ORIGINAL MESSAGE ------------- [BioC] EdgeR - problems running estimateCRDisp josquin.tibbits at dpi.vic.gov.au josquin.tibbits at dpi.vic.gov.au Fri Oct 22 07:44:00 CEST 2010 I am running the new EdgeR package 1.8 and have updated R to 1.12.0 and updated all the packages I have installed. I have an RNAseq experiment with 24 samples and am wanting to run a glm analysis using the Cox-Reid common dispersion (and tagwise) paramaters. I have created a DGEList object using a targets file and this successfully ran in the calcNormFactors() and in the plotMDS.dge() functions. I have also created what appears to be a sensible and ok design matrix using model.matrix(). These then are being input to estimateCRDisp() to estimate the common dispersion. The code I have run is as follows (in red) with output (blue) which is pretty much exactly the same as the worked example in the manual on pp 51-- and does not seem abnormal at all...... targets <- read.delim("Lp_NUE_DGEbymSEQ_TARGETS_EdgeRIN.txt", stringsAsFactors = FALSE, row.names = "label") targets$Nitrogen <- factor(targets$Nitrogen) targets$Endophyte <- factor(targets$Endophyte) targets$Tissue <- factor(targets$Tissue) targets ## CREATE THE DGEList Object DGEList.object <- readDGE(targets, skip = 1, comment.char = '#') colnames(DGEList.object) <- row.names(targets) ## CALCULATE THE NORMALISATION (f) FACTORS DGEList.object <- calcNormFactors(DGEList.object) ## Exclude data where rowsums are < 30 YOU CAN CHANGE THIS (there being 24 samples in the experiment) DGEList.object <- DGEList.object[rowSums(DGEList.object$counts) > 30, ] DGEList.object An object of class "DGEList" $samples files Endophyte Tissue Nitrogen lib.size norm.factors Lp_NUE_LFFULL Lp_NUE_LFFULL_ER_RAWREADS.txt Free Leaf Full 3786614 0.9244799 Lp_NUE_LFNO3 Lp_NUE_LFNO3_ER_RAWREADS.txt Free Leaf Partial 5311856 0.8120410 Lp_NUE_LFK Lp_NUE_LFK_ER_RAWREADS.txt Free Leaf Full 3482772 0.7224960 Lp_NUE_LFPO4 Lp_NUE_LFPO4_ER_RAWREADS.txt Free Leaf Full 6208397 0.7367309 Lp_NUE_LFCa Lp_NUE_LFCa_ER_RAWREADS.txt Free Leaf Full 3597004 0.8147619 19 more rows ... $counts Lp_NUE_LFFULL Lp_NUE_LFNO3 Lp_NUE_LFK Lp_NUE_LFPO4 Lp_NUE_LFCa Lp_NUE_LFNH4 Lp_NUE_RFFULL Lp_NUE_RFNO3 Lp_NUE_RFK Lp_NUE_RFPO4 >prg_cds_44 345 382 120 261 141 215 888 717 698 1112 >prg_cds_141 290 451 126 422 235 459 676 559 474 879 >prg_cds_2 233 277 131 333 239 376 246 185 124 282 >prg_cds_16 433 608 184 614 346 423 924 756 595 913 >prg_cds_84 287 353 90 241 248 394 823 612 989 1061 Lp_NUE_RFCa Lp_NUE_RFNH4 Lp_NUE_LSFULL Lp_NUE_LSNO3 Lp_NUE_LSK Lp_NUE_LSPO4 Lp_NUE_LSCa Lp_NUE_LSNH4 Lp_NUE_RSFULL Lp_NUE_RSNO3 >prg_cds_44 1107 439 954 257 205 107 868 106 316 1147 >prg_cds_141 583 1052 773 325 230 193 600 264 414 1046 >prg_cds_2 235 454 177 232 177 203 138 239 334 228 >prg_cds_16 817 1040 2373 464 220 276 895 291 714 2655 >prg_cds_84 890 1653 894 268 169 123 714 209 311 1078 Lp_NUE_RSK Lp_NUE_RSPO4 Lp_NUE_RSCa Lp_NUE_RSNH4 >prg_cds_44 622 903 114 474 >prg_cds_141 505 711 205 1166 >prg_cds_2 93 136 157 340 >prg_cds_16 1398 1660 311 1258 >prg_cds_84 893 771 154 1269 63905 more rows ... ## Produce an MDS plot of the data #plotMDS.dge(DGEList.object, main = "MDS plot for Lp_NUE_DGEbymSEQ") ## DEFINE THE DESIGN MATRIX design <- model.matrix(~ Endophyte + Tissue + Nitrogen , data = targets) print("This is the design matrix") design (Intercept) EndophyteSTWT TissueRoot NitrogenPartial Lp_NUE_LFFULL 1 0 0 0 Lp_NUE_LFNO3 1 0 0 1 Lp_NUE_LFK 1 0 0 0 Lp_NUE_LFPO4 1 0 0 0 Lp_NUE_LFCa 1 0 0 0 Lp_NUE_LFNH4 1 0 0 1 Lp_NUE_RFFULL 1 0 1 0 Lp_NUE_RFNO3 1 0 1 1 Lp_NUE_RFK 1 0 1 0 Lp_NUE_RFPO4 1 0 1 0 Lp_NUE_RFCa 1 0 1 0 Lp_NUE_RFNH4 1 0 1 1 Lp_NUE_LSFULL 1 1 0 0 Lp_NUE_LSNO3 1 1 0 1 Lp_NUE_LSK 1 1 0 0 Lp_NUE_LSPO4 1 1 0 0 Lp_NUE_LSCa 1 1 0 0 Lp_NUE_LSNH4 1 1 0 1 Lp_NUE_RSFULL 1 1 1 0 Lp_NUE_RSNO3 1 1 1 1 Lp_NUE_RSK 1 1 1 0 Lp_NUE_RSPO4 1 1 1 0 Lp_NUE_RSCa 1 1 1 0 Lp_NUE_RSNH4 1 1 1 1 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Endophyte [1] "contr.treatment" attr(,"contrasts")$Tissue [1] "contr.treatment" attr(,"contrasts")$Nitrogen [1] "contr.treatment" ## ESTIMATE THE Cox-Reid common dispersion DGEList.object <- estimateCRDisp(DGEList.object, design) names(DGEList.object) This step is giving the following error::: Loading required package: MASS Error: NA/NaN/Inf in foreign function call (arg 1) In addition: Warning messages: 1: glm.fit: algorithm did not converge 2: step size truncated due to divergence > names(DGEList.object) [1] "samples" "counts" Clearly the function is not working. I have tried two different computers and a run it in 64 bit but always with the same result. I have also tried using a balanced design and a simple design all with the same results. I have used the DGEList object to analyse the data using the original estimateCommonDisp() and exactTest() which worked fine. I was hoping someone might be able to help resolve this for me???? Thanks in advance....................... Josquin The sessionInfo() and traceback() are below: > sessionInfo() R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C [5] LC_TIME=English_Australia.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] MASS_7.3-8 edgeR_1.8.1 loaded via a namespace (and not attached): [1] limma_3.6.0 > traceback() 4: .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars, y = w * z, ny = 1L, tol = min(1e-07, control$epsilon/1000), coefficients = double(nvars), residuals = double(ngoodobs), effects = double(ngoodobs), rank = integer(1L), pivot = 1L:nvars, qraux = double(nvars), work = double(2 * nvars), PACKAGE = "base") 3: glm.fit(design, y[i, ], offset = offset[i, ], family = f) 2: adjustedProfileLik(spline.disp[i], y.select, design = design, offset = offset.mat.select + lib.size.mat.select) 1: estimateCRDisp(DGEList.object, design) Notice: This email and any attachments may contain information t...{{dropped:9}}
ADD COMMENT

Login before adding your answer.

Traffic: 457 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6