Hi all,
I'm working with a RNA-seq (tags) dataset, having some trouble getting my limma results to a file. I use write.fit for this. Seemed to be doing its job, but after I changed the contrasts slightly, I now get a warning. (changing to my old script also doesn't help). The analysis seems to be running (example part of the dataset below, with script). However, when i try to output the results to a .tsv file with write.fit, a problem occurs. First I get this warning:
> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6, :
row names were found from a short variable and have been discarded
Which seems to suggest to me that something is wrong with the row names (duplicated)/combining the different coefficients. When I look at the .tsv ouput I get a table with each gene row "multiplied" by the number of contrasts. Example (part of the table):
Genes.Gene.name | A | Coef.group | Coef.groupxtime | Coef.group2 | Coef.group2xtime |
Cops5 | 5.56 | -0.688 | -0.216 | 0.539 | -0.106 |
Cops5 | 5.56 | -0.688 | -0.216 | 0.539 | -0.106 |
Cops5 | 5.56 | -0.688 | -0.216 | 0.539 | -0.106 |
Cops5 | 5.56 | -0.688 | -0.216 | 0.539 | -0.106 |
Arfgef1 | 7.45 | 0.545 | -0.012 | -0.233 | -0.026 |
Arfgef1 | 7.45 | 0.545 | -0.012 | -0.233 | -0.026 |
Arfgef1 | 7.45 | 0.545 | -0.012 | -0.233 | -0.026 |
Arfgef1 | 7.45 | 0.545 | -0.012 | -0.233 | -0.026 |
I tried running everything again, also the script that previously worked, thinking something had gone wrong in my tables (they all look ok when I check this though). I also updated my packages. Using topTable to access the results to write a table for each coeffient does work, but as I'm doing a lot of comparisons (the above is a simplified example, but also doesn't work) I'd rather have them in the one table using write.fit.
I really can't figure out why I'm having this problem (sorry if this is a stupid question, as a biologist my understanding of R is very basic). So I could do with some pointers on how to solve this problem. Thanks in advance!
Anne
R code:
> head(dataset[,1:8]) # table with annotated counts per sample
S1 S2 S3 S4 S6 S7 S8 S9
TC_chr1_-_10037945_Cops5 807 733 255 397 591 727 611 888
TC_chr1_-_10232984_Arfgef1 1855 1775 668 1088 1263 1630 1477 1664
TC_chr1_-_105356670_Rnf152 117 52 27 35 72 64 40 79
TC_chr1_-_106714001_Bcl2 74 72 19 32 34 34 42 48
TC_chr1_-_106759669_Kdsr 475 550 206 352 370 461 318 334
TC_chr1_-_106796644_Vps4b 1138 1162 430 679 848 1181 788 943
> names(genes)
[1] "TCID" "gene_region" "CHROMOSOME" "START" "END" "STRAND"
[7] "START.ENS" "END.ENS" "ENSEMBL.TRANSCRIPTID" "STRAND." "GENE.ENSEMBLE.ID" "Source"
[13] "GENE.TYPE" "Status" "Gene.name" "ENTREZGENEID"
> group <- factor(c("D1","A1","B2","C2","B2","D2","B1","D2","A2",
+ "C1","A2","D1","B1","B2","C1","B1","C1",
+ "C2","C2","D1","A1","A2","B1","B2","C2","D2",
+ "C1","A2","D2","B2","B1","B2","C1","C2","A1",
+ "D1","D1","A2","B1","A2"))
> design = model.matrix(~0+group)
> g.name <- c("A1","A2","B1","B2","C1","C2","D1","D2")
> colnames(design) <- g.name
>
> # DGEList
> d <- DGEList(counts=dataset, group=group, genes=genes)
>
> # filter for low counts and reset lib.sizes
> fcpm <- rowSums(cpm(d) > 1) >= 20
> d <- d[fcpm,]
> d$samples$lib.size <- colSums(d$counts)
> dim(d)
[1] 16314 40
> # library scale normalisation factors
> d <- calcNormFactors(d, method="TMM")
> nf <- d$samples$norm.factors
>
> # normalise, voom with quality weights
> y <- voomWithQualityWeights(d, design=design, lib.size=(d$samples$lib.size*nf), normalization="none", plot=TRUE)
>
> fit <- lmFit(y,design)
>
> my.contrast = makeContrasts(group = (B1+B2)-(A1+A2),
+ groupxtime = (B1-B2)-(A1-A2),
+ group2 = (D1+D2)-(C1+C2),
+ group2xtime = (D1-D2)-(C1-C2),levels=design)
> fit1 <- contrasts.fit(fit,my.contrast)
> fit1 <- eBayes(fit1)
>
> P1 <- decideTests(fit1, adjust="none", p.value=0.005)
> FDR1 <- decideTests(fit1,adjust="fdr",method="global",p.value=0.1)
> DEG_group <- rbind(summary(P1),summary(FDR1))
> print(DEG_group)
group groupxtime group2 group2xtime
-1 479 33 42 33
0 15576 16265 16266 16263
1 259 16 6 18
-1 0 0 0 0
0 16314 16314 16314 16314
1 0 0 0 0
> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6, :
row names were found from a short variable and have been discarded
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.10.2 limma_3.24.8 sparcl_1.0.3 calibrate_1.7.2 MASS_7.3-40 plyr_1.8.2
loaded via a namespace (and not attached):
[1] tools_3.2.0 Rcpp_0.11.6
Thank you for the workaround! I have my file - one happy biologist! Will keep an eye out for the update when it arrives, too.