Question

Number of rows in DESeq2 output (.csv) is not the same as number of rows in the results(dds) dataframe

Jamie • 0

Hi all,

I am new to DESeq2, but ran this analysis together with a colleague who has done some transcriptomics analyses before. However, neither of us could figure out why the output of results() reports "DataFrame with 14235 rows and 6 columns" while our .csv file (imported into Excel) shows 31415 rows and 6+1 columns (the extra column is simply the gene names, which were row names before).

Can anyone tell us why we have so many more rows suddenly? The code we used is below.

# read in counts table with gene names as rownames
counts <- read.table("mt_mapped_paired_readcounts.tsv.txt", sep = "\t", header = FALSE, row.names = 1)
# filter out rows with only zeros
counts.nozero <- counts[rowSums(counts) != 0, ]
dim(counts.nozero)
# removing the last row, which contains NA
counts.nozero.nona <- counts.nozero[1:14235, ]

# file to explain which column is which
columndata <- read.table("columndata", sep = ",", header = TRUE)

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = counts.nozero.nona,
                              colData = columndata,
                              design = ~ cables)
dds <- DESeq(dds)

res <- results(dds, name="cables_yes_cables_vs_no_cables")
res

#res output
log2 fold change (MLE): cables yes cables vs no cables 
Wald test p-value: cables yes cables vs no cables 
DataFrame with 14235 rows and 6 columns

write.csv(as.data.frame(res), file="deseq2results.csv")

Are you sure you checked the right file? There is no reason for as.data.frame(res) to add lines. Please check with: library(readr); deseq2results <- read_csv("deseq2results.csv"); dim(deseq2results)

Basti ▴ 780 • 2.4 years ago

score 0 · Answer 1 · 2022-07-04

I think I found the problem: DESeq2 or R could not deal with the 5' and 3' notation or with commas in the gene names. After converting those characters to '_' or letters, the number of rows in our .csv file matches the number of rows returned by results(dds).
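For anyone hitting the same thing, here is a minimal sketch of the fix in base R. It uses a small made-up counts file and invented gene names (geneA to geneD), not our real data. One likely mechanism, which is an assumption on my part and not something confirmed in this thread, is that read.table() treats both " and ' as quote characters by default, so an apostrophe in a name such as 5' can open a "quoted string" that runs across several lines. The sketch shows two options: reading with quoting disabled, and replacing the problematic characters before writing the results.

```r
# Toy tab-separated counts file with hypothetical gene names that
# contain apostrophes and commas (stand-in for the real counts table).
toy_file <- tempfile(fileext = ".tsv")
writeLines(c("geneA 5' UTR\t10\t12",
             "geneB, putative\t0\t3",
             "geneC 3' end\t7\t1",
             "geneD\t4\t4"),
           toy_file)

# Option 1: disable quote and comment handling so that ' and # in
# gene names are read as ordinary characters.
counts <- read.table(toy_file, sep = "\t", header = FALSE, row.names = 1,
                     quote = "", comment.char = "")

# Option 2: replace apostrophes and commas with underscores.
# make.unique() guards against duplicates created by the substitution.
clean_names <- make.unique(gsub("[',]", "_", rownames(counts)))
rownames(counts) <- clean_names

# Round trip: write to .csv, read back, and compare row counts.
# 'counts' stands in here for as.data.frame(res).
out_file <- tempfile(fileext = ".csv")
write.csv(counts, file = out_file)
check <- read.csv(out_file, row.names = 1)
stopifnot(nrow(check) == nrow(counts))
```

Comparing nrow() of the re-read file against the object in R is a quicker sanity check than counting rows in Excel, since Excel may split or merge lines differently when names contain quotes or commas.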

Sorry for bothering people with such a basic problem! I will leave the post up so others can find how to fix it if they make the same mistake.