limma multiple rows per gene in output (write.fit)
2
0
Entering edit mode
Anne • 0
@anne-8108
Last seen 9.5 years ago
Netherlands

Hi all,

I'm working with a RNA-seq (tags) dataset, having some trouble getting my limma results to a file. I use write.fit for this. Seemed to be doing its job, but after I changed the contrasts slightly, I now get a warning. (changing to my old script also doesn't help). The analysis seems to be running (example part of the dataset below, with script). However, when i try to output the results to a .tsv file with write.fit, a problem occurs. First I get this warning:

> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6,  :
  row names were found from a short variable and have been discarded

Which seems to suggest to me that something is wrong with the row names (duplicated)/combining the different coefficients. When I look at the .tsv ouput I get a table with each gene row "multiplied" by the number of contrasts. Example (part of the table):

Genes.Gene.name A Coef.group Coef.groupxtime Coef.group2 Coef.group2xtime
Cops5 5.56 -0.688 -0.216 0.539 -0.106
Cops5 5.56 -0.688 -0.216 0.539 -0.106
Cops5 5.56 -0.688 -0.216 0.539 -0.106
Cops5 5.56 -0.688 -0.216 0.539 -0.106
Arfgef1 7.45 0.545 -0.012 -0.233 -0.026
Arfgef1 7.45 0.545 -0.012 -0.233 -0.026
Arfgef1 7.45 0.545 -0.012 -0.233 -0.026
Arfgef1 7.45 0.545 -0.012 -0.233 -0.026

I tried running everything again, also the script that previously worked, thinking something had gone wrong in my tables (they all look ok when I check this though). I also updated my packages. Using topTable to access the results to write a table for each coeffient does work, but as I'm doing a lot of comparisons (the above is a simplified example, but also doesn't work) I'd rather have them in the one table using write.fit.

I really can't figure out why I'm having this problem (sorry if this is a stupid question, as a biologist my understanding of R is very basic).  So I could do with some pointers on how to solve this problem. Thanks in advance! 

Anne

R code:

> head(dataset[,1:8]) # table with annotated counts per sample 

                                      S1   S2  S3   S4   S6   S7   S8   S9
TC_chr1_-_10037945_Cops5    807  733 255  397  591  727  611  888
TC_chr1_-_10232984_Arfgef1 1855 1775 668 1088 1263 1630 1477 1664
TC_chr1_-_105356670_Rnf152  117   52  27   35   72   64   40   79
TC_chr1_-_106714001_Bcl2     74   72  19   32   34   34   42   48
TC_chr1_-_106759669_Kdsr    475  550 206  352  370  461  318  334
TC_chr1_-_106796644_Vps4b  1138 1162 430  679  848 1181  788  943

> names(genes)
 [1] "TCID"                 "gene_region"          "CHROMOSOME"           "START"                "END"                  "STRAND"         
 [7] "START.ENS"               "END.ENS"                "ENSEMBL.TRANSCRIPTID" "STRAND."              "GENE.ENSEMBLE.ID"     "Source" 
[13] "GENE.TYPE"            "Status"               "Gene.name"            "ENTREZGENEID"             


> group <- factor(c("D1","A1","B2","C2","B2","D2","B1","D2","A2",
+                   "C1","A2","D1","B1","B2","C1","B1","C1",
+                   "C2","C2","D1","A1","A2","B1","B2","C2","D2",
+                   "C1","A2","D2","B2","B1","B2","C1","C2","A1",
+                   "D1","D1","A2","B1","A2"))
> design = model.matrix(~0+group)
> g.name <- c("A1","A2","B1","B2","C1","C2","D1","D2")
> colnames(design) <- g.name

> # DGEList
> d <- DGEList(counts=dataset, group=group, genes=genes)

> # filter for low counts and reset lib.sizes
> fcpm <- rowSums(cpm(d) > 1) >= 20
> d <- d[fcpm,]
> d$samples$lib.size <- colSums(d$counts)
> dim(d)
[1] 16314    40
> # library scale normalisation factors
> d <- calcNormFactors(d, method="TMM")
> nf <- d$samples$norm.factors

> # normalise, voom with quality weights
> y <- voomWithQualityWeights(d, design=design, lib.size=(d$samples$lib.size*nf), normalization="none", plot=TRUE)

> fit <- lmFit(y,design)
>
> my.contrast = makeContrasts(group = (B1+B2)-(A1+A2),
+                             groupxtime = (B1-B2)-(A1-A2),
+                             group2 = (D1+D2)-(C1+C2),
+                             group2xtime = (D1-D2)-(C1-C2),levels=design)
> fit1 <- contrasts.fit(fit,my.contrast)
> fit1 <- eBayes(fit1)
>
> P1 <- decideTests(fit1, adjust="none", p.value=0.005)
> FDR1 <- decideTests(fit1,adjust="fdr",method="global",p.value=0.1)
> DEG_group <- rbind(summary(P1),summary(FDR1))
> print(DEG_group)
   group groupxtime group2 group2xtime
-1   479         33     42          33
0  15576      16265  16266       16263
1    259         16      6          18
-1     0          0      0           0
0  16314      16314  16314       16314
1      0          0      0           0

> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6,  :
  row names were found from a short variable and have been discarded

> sessionInfo()

R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.10.2    limma_3.24.8    sparcl_1.0.3    calibrate_1.7.2 MASS_7.3-40     plyr_1.8.2     

loaded via a namespace (and not attached):
[1] tools_3.2.0 Rcpp_0.11.6

limma results • 3.7k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 6 minutes ago
WEHI, Melbourne, Australia

Aaron's diagnosis of the problem is correct. Amazing we haven't tripped this bug before in the 12 years that write.fit() has existed.

I have committed a bug fix to Bioconductor. It will be available for installation using biocLite() in a couple of days. In the meantime, here is a workaround:

FDR <- fit1$p.value
FDR[] <- p.adjust(FDR, method="BH")
fit1$genes <- data.frame(fit1$genes, FDR)
write.fit(fit1,results=P1,file="results.tsv",digits=3,sep="\t")
ADD COMMENT
0
Entering edit mode

Thank you for the workaround! I have my file - one happy biologist! Will keep an eye out for the update when it arrives, too.

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 20 hours ago
The city by the bay

This appears to be an issue with write.fit. When the BH adjustment is applied with method="global", the dimension attributes are lost when the p-value matrix is processed with p.adjust. This means that the adjusted p-values are stored as a vector instead of a matrix, such that the output table is expanded row-wise by four-fold to accommodate it (as each entry of the vector is given a new row, and there are four times as many entries in the vector as there are rows in the original matrix when you have four contrasts). I'm pretty sure this is not intentional; however, until the patch comes through for limma, just set method="separate" to get sensible results.

ADD COMMENT
0
Entering edit mode

Thank you for the fast response! Reassuring to know that it is not a problem with my code, just the method that I'm trying with the writing out results. I've now tried Gordon's workaround, that seems to do the job :)

ADD REPLY

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6