DESeq2 missing columns
1
0
Entering edit mode
@kuenne-carsten-6658
Last seen 10.2 years ago
Dear mailing list, I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? [...] data = read.table("counts.matrix", header=T, row.names=1, com='') col_ordering = c(1,2,3,4) rnaseqMatrix = data[,col_ordering] rnaseqMatrix = round(rnaseqMatrix) rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix( countData = rnaseqMatrix, colData = conditions, design = ~ conditions) dds = DESeq(ddsFullCountTable) res = results(dds) write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) [...] [deseq2_nha_nhc.txt] baseMean log2FoldChange lfcSE pvalue padj comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 [...] Anyway thanks for the great update! Yours hopefully, Carsten [[alternative HTML version deleted]]
PROcess DESeq PROcess DESeq • 2.5k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 7 days ago
United States
hi Carsten, As DESeq2 is a more general model, the results tables don't necessarily involve only two conditions A and B, so to have the results table have consistent columns, we excluded these. Here is the code to reproduce the base means per condition: baseMeanA = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "A"]) baseMeanB = rowMeans(counts(dds,normalized=TRUE)[,dds$condition == "B"]) res = cbind(baseMeanA, baseMeanB, as.data.frame(res)) To force a header for the first column when writing out, you could add a column explicitly: res = cbind(genes=rownames(res), as.data.frame(res)) And then use row.names=FALSE when writing out. Mike On Mon, Jul 21, 2014 at 4:59 AM, Kuenne, Carsten <carsten.kuenne at="" mpi-bn.mpg.de=""> wrote: > Dear mailing list, > > > > I recently found out about the new DESeq version and am currently in the process of switching my pipelines to use it. You included some sweet new features and those may simplify my life a lot. Now my questions relate to the output tables bearing the final differently expressed genes. > > > > 1. Is there a plan to reintroduce the old DESeq 1 columns baseMeanA and baseMeanB? Or do you have some R code available to create a complete table DESeq-1-style? These two columns may be implicitly included in other values but it was just terribly useful to have them separately. > > 2. If the output table is written as a tab-delimited file instead of a CSV, the first column headline is removed (since it is empty) resulting in wrong headers for all columns. See the last line of the code below. Maybe you could include some header for this column ("id" or "feature" or something similar) to prevent this behaviour? > > > > [...] > > data = read.table("counts.matrix", header=T, row.names=1, com='') > > col_ordering = c(1,2,3,4) > > rnaseqMatrix = data[,col_ordering] > > rnaseqMatrix = round(rnaseqMatrix) > > rnaseqMatrix = rnaseqMatrix[rowSums(rnaseqMatrix)>=2,] > > conditions = data.frame(conditions=factor(c(rep("nha", 2), rep("nhc", 2)))) > > rownames(conditions) = colnames(rnaseqMatrix) > > ddsFullCountTable <- DESeqDataSetFromMatrix( > > countData = rnaseqMatrix, > > colData = conditions, > > design = ~ conditions) > > dds = DESeq(ddsFullCountTable) > > res = results(dds) > > write.table(as.data.frame(res[order(res$pvalue),]), file=deseq2_nha_nhc.txt', sep=' ', quote=FALSE, row.names=T) > > [...] > > > > [deseq2_nha_nhc.txt] > > baseMean log2FoldChange lfcSE pvalue padj > > comp277602_c0_seq1 438.902806642428 5.58864608712407 0.302717563529963 4.20808210669443e-76 3.08662822526037e-71 > > [...] > > > > Anyway thanks for the great update! > > > > Yours hopefully, > > Carsten > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 713 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6