Problem with topTable function (after Bioconductor update)
1
0
Entering edit mode
@christian-de-santis-6143
Last seen 10.2 years ago
Hi to everybody, I hope someone can help me on this as I am losing my health. I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error. > x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object. The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release. I really hope someone can help me understand. Regards, Christian De Santis > sessionInfo() R version 3.0.1 (2013-05-16) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 [16] xtable_1.7-1 -- The University of Stirling has been ranked in the top 12 of UK universities for graduate employment*. 94% of our 2012 graduates were in work and/or further study within six months of graduation. *The Telegraph The University of Stirling is a charity registered in Scotland, number SC 011159. [[alternative HTML version deleted]]
• 978 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States
Hi Christian, On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote: > Hi to everybody, > > I hope someone can help me on this as I am losing my health. Not sure if you were intending humor, but this cracked me up! > > I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error. > >> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000, p.value = 0.05) > Error in `row.names<-.data.frame`(`*tmp*`, value = value) : > duplicate 'row.names' are not allowed > In addition: Warning message: > non-unique values when setting 'row.names': > > I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object. > > The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release. > > I really hope someone can help me understand. There appears to be a bug in the function topTableF. When you run topTable() on multiple coefficients, it actually uses topTableF(). In that function there is a bit of code where the data are subsetted according to the p.value and lfc values you submitted: if (lfc > 0 || p.value < 1) { if (lfc > 0) big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0 else big <- TRUE if (p.value < 1) { sig <- adj.P.Value <= p.value sig[is.na(sig)] <- FALSE } else sig <- TRUE keep <- big & sig if (!all(keep)) { M <- M[keep, , drop = FALSE] rn <- rn[keep] Amean <- Amean[keep] Fstat <- Fstat[keep] Fp <- p.value[keep] genelist <- genelist[keep, , drop = FALSE] adj.P.Value <- adj.P.Value[keep] } So now all the data have been subsetted (in your case) to just those genes with an unadjusted p-value < 0.05. And presumably that is fewer than the 20000 genes that you started with. But then there is some sorting of the resulting data: if sort.by == "F") o <- order(Fp, decreasing = FALSE)[1:number] else o <- 1:number if (is.null(genelist)) tab <- data.frame(M[o, , drop = FALSE]) else tab <- data.frame(genelist[o, , drop = FALSE], M[o, , drop = FALSE]) tab$AveExpr = fit$Amean[o] tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val = adj.P.Value[o]) rownames(tab) <- rn[o] The problem here is that 1:number is _longer_ than Fp, which has been subsetted already, based on the p-value criterion. So the vector 'o' will have a bunch of NAs on the end. As an example: (1:5)[1:8] [1] 1 2 3 4 5 NA NA NA And when you get to the last line there, where the rownames are set, since there are multiple NAs, you will have repeated rownames, which R won't allow. In the interest of your health, you can get around this for now by doing num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000)$P.value < 0.05) and then x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=num, p.value = 0.05) Best, Jim > > Regards, > Christian De Santis > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 > [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 > [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 > [16] xtable_1.7-1 > > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Hi Jim, Thank for the detailed explanation and yes, there was some humour together with some truth! :-) It really drove me crazy as it was working up until yesterday. That's why I thought it must have come with the Bioconductor update. I tried what you said and it worked just fine!!! Just a note for someone else might have the same problem and read this post, there was a misspelling in your code (P.Value instead of P.value). Thanks so much for taking the time to answer and rescue me. I really appreciated it! Regards, Christian -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 30 October 2013 18:08 To: Christian De Santis Cc: bioconductor at r-project.org Subject: Re: [BioC] Problem with topTable function (after Bioconductor update) Hi Christian, On Wednesday, October 30, 2013 12:23:46 PM, Christian De Santis wrote: > Hi to everybody, > > I hope someone can help me on this as I am losing my health. Not sure if you were intending humor, but this cracked me up! > > I was re running my script today which worked fine till last time I did it. When I run the topTable function on more than one coefficient it gives me the following error. > >> x.contrast.REG <- topTable(fit.contrast, adjust.method="none", >> coef=c(1:4), number=20000, p.value = 0.05) > Error in `row.names<-.data.frame`(`*tmp*`, value = value) : > duplicate 'row.names' are not allowed In addition: Warning message: > non-unique values when setting 'row.names': > > I have tried to run the same script for each coefficient separately and it works fine. I can't explain the error since if there were duplicated row.names then I should get the same error when doing the analysis on individual contrasts since I run topTable on the fit object. > > The only thing I have changed was the fact that this morning I updated Bioconductor to the latest release. > > I really hope someone can help me understand. There appears to be a bug in the function topTableF. When you run topTable() on multiple coefficients, it actually uses topTableF(). In that function there is a bit of code where the data are subsetted according to the p.value and lfc values you submitted: if (lfc > 0 || p.value < 1) { if (lfc > 0) big <- rowSums(abs(M) > lfc, na.rm = TRUE) > 0 else big <- TRUE if (p.value < 1) { sig <- adj.P.Value <= p.value sig[is.na(sig)] <- FALSE } else sig <- TRUE keep <- big & sig if (!all(keep)) { M <- M[keep, , drop = FALSE] rn <- rn[keep] Amean <- Amean[keep] Fstat <- Fstat[keep] Fp <- p.value[keep] genelist <- genelist[keep, , drop = FALSE] adj.P.Value <- adj.P.Value[keep] } So now all the data have been subsetted (in your case) to just those genes with an unadjusted p-value < 0.05. And presumably that is fewer than the 20000 genes that you started with. But then there is some sorting of the resulting data: if sort.by == "F") o <- order(Fp, decreasing = FALSE)[1:number] else o <- 1:number if (is.null(genelist)) tab <- data.frame(M[o, , drop = FALSE]) else tab <- data.frame(genelist[o, , drop = FALSE], M[o, , drop = FALSE]) tab$AveExpr = fit$Amean[o] tab <- data.frame(tab, F = Fstat[o], P.Value = Fp[o], adj.P.Val = adj.P.Value[o]) rownames(tab) <- rn[o] The problem here is that 1:number is _longer_ than Fp, which has been subsetted already, based on the p-value criterion. So the vector 'o' will have a bunch of NAs on the end. As an example: (1:5)[1:8] [1] 1 2 3 4 5 NA NA NA And when you get to the last line there, where the rownames are set, since there are multiple NAs, you will have repeated rownames, which R won't allow. In the interest of your health, you can get around this for now by doing num <- sum(topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=20000)$P.value < 0.05) and then x.contrast.REG <- topTable(fit.contrast, adjust.method="none", coef=c(1:4), number=num, p.value = 0.05) Best, Jim > > Regards, > Christian De Santis > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] reshape2_1.2.2 ellipse_0.3-8 genefilter_1.44.0 limma_3.18.1 BiocInstaller_1.12.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 DBI_0.2-7 > [6] IRanges_1.20.3 parallel_3.0.1 plyr_1.8 RSQLite_0.11.4 splines_3.0.1 > [11] stats4_3.0.1 stringr_0.6.2 survival_2.37-4 tools_3.0.1 XML_3.98-1.1 > [16] xtable_1.7-1 > > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 -- The University of Stirling has been ranked in the top 12 of UK universities for graduate employment*. 94% of our 2012 graduates were in work and/or further study within six months of graduation. *The Telegraph The University of Stirling is a charity registered in Scotland, number SC 011159.
ADD REPLY

Login before adding your answer.

Traffic: 1320 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6