Question

DESeq2 output with multiple Ensembl gene IDs per row: how to resolve?


Nicholas Owen


Dear All,

We have been using DESeq2 on our RNA-seq data to look for differential expression of genes and it works well.

One issue that keeps cropping up is the assignment of Ensembl IDs to rows in results(dds): we frequently get multiple IDs in a single row, for example:

                                     baseMean  log2FoldChange       lfcSE          stat        pvalue          padj
ENSG00000001084+ENSG00000231683  6.325517e+02   -0.2914254001  0.10554200  7.586041e+00  5.882200e-03  0.0813763865

Obviously this interferes with annotation, so we have split these IDs on "+" and annotated both for gene names etc.

However, with many of these per dataset, I wondered how best to handle them quickly and easily. I know I could check each one manually, but with 100 or so per dataset that does not seem a good use of time.
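A minimal Python sketch of the split-and-annotate step described above, assuming the results have been exported so that each row is a dict whose row name sits under a hypothetical `id` key. It keeps every statistic and records how many genes shared the row, so ambiguous rows remain easy to flag:

```python
def split_merged_ids(rows, id_key="id"):
    """Expand rows whose ID holds several '+'-joined Ensembl IDs.

    `rows` is a list of dicts (hypothetical export format). Each output row
    copies the original statistics, adds the single `gene_id`, and records
    `n_genes_in_row` so ambiguous (merged) rows can be filtered or flagged.
    """
    expanded = []
    for row in rows:
        ids = row[id_key].split("+")
        for gene_id in ids:
            new_row = dict(row)  # shallow copy keeps the original row intact
            new_row["gene_id"] = gene_id
            new_row["n_genes_in_row"] = len(ids)
            expanded.append(new_row)
    return expanded


# Usage with the example row from the question.
example = [{
    "id": "ENSG00000001084+ENSG00000231683",
    "log2FoldChange": -0.2914254001,
    "padj": 0.0813763865,
}]
for r in split_merged_ids(example):
    print(r["gene_id"], r["n_genes_in_row"], r["padj"])
```

Note that the statistics in each expanded row describe the combined count for all genes in the merged row, not either gene on its own, so the duplicated rows are not independent gene-level results.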

Searching around, I have found little posted about this, although one post suggested simply ignoring them, which seems strange.

How do people handle this? Thanks in advance for any advice.

Cheers


Answer 1 · 2017-05-11


Michael Love


The annotation of the rows of the DESeqDataSet happens upstream of DESeq2. How do you create the counts matrix? You can try to fix things earlier on in your pipeline.


Thanks for the information, appreciate it.

From the sorted BAM files we removed the marked duplicates using samtools view -F 0x0400 and piped the output into dexseq_count.py to get the counts for each sample, then prepared the matrix from this data. I can see that this output has multiple ENSG IDs per line. We have paired-end data, but it is not stranded. What would be the best way to handle this? Again, thanks in advance.
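For reference, `samtools view -F 0x0400` drops every alignment whose SAM FLAG has bit 0x400 (PCR or optical duplicate) set. The same test applied to a FLAG value, written as a small illustrative Python helper (the name `keep_alignment` is hypothetical):

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def keep_alignment(flag: int) -> bool:
    """Return True if an alignment with this FLAG survives `-F 0x0400`."""
    return (flag & DUPLICATE_FLAG) == 0


# 99 = paired + proper pair + mate reverse + first in pair; 1123 = 99 + 0x400
print(keep_alignment(99), keep_alignment(1123))  # -> True False
```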

Reply by Nicholas Owen


As far as I know, the dexseq_count.py script from DEXSeq is for exon-level analysis, not for preparing count matrices for gene-level analysis with DESeq2. Please take a look at our workflow, which describes several ways to prepare count matrices for gene-level analysis with DESeq2:

http://www.bioconductor.org/help/workflows/rnaseqGene/

Or is there more to it than dexseq_count.py?
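To illustrate where the "+" IDs likely come from: dexseq_count.py reports counts per exon bin rather than per gene. Assuming bin IDs of the form `GENE:E001`, genes that share exons can appear as one aggregate `GENE` made of several Ensembl IDs joined by "+". The hypothetical sketch below sums bins into genes and shows that the aggregate survives into the gene-level matrix, because reads in shared bins were never assigned to a single gene:

```python
from collections import defaultdict


def sum_bins_to_genes(bin_counts):
    """Sum exon-bin counts into one total per (possibly aggregate) gene ID.

    Assumes bin IDs look like 'GENE:E001', where GENE may be several
    Ensembl IDs joined by '+'. IDs without a ':' suffix are skipped.
    """
    totals = defaultdict(int)
    for bin_id, count in bin_counts.items():
        gene_id, sep, _bin = bin_id.rpartition(":")
        if not sep:
            continue  # no bin suffix, so not an exon-bin row
        totals[gene_id] += count
    return dict(totals)


# Toy counts for illustration only, not real data; "_ambiguous" stands in
# for a hypothetical summary line without a bin suffix.
toy = {
    "ENSG00000001084+ENSG00000231683:E001": 10,
    "ENSG00000001084+ENSG00000231683:E002": 5,
    "ENSG00000000003:E001": 7,
    "_ambiguous": 3,
}
print(sum_bins_to_genes(toy))
# -> {'ENSG00000001084+ENSG00000231683': 15, 'ENSG00000000003': 7}
```

This is why splitting on "+" afterwards cannot recover separate per-gene counts; building the matrix with one of the gene-level approaches in the workflow avoids the merged rows in the first place.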

Reply by Michael Love