Question

Extracting results for a single chromosome

0

Entering edit mode

Charlotte • 0

@charlotte-23646

Last seen 4.5 years ago

Hello,

I have run DESeq2 for genome wide ATAC-seq data. At the moment genomic locations are reported in the DESeq results table in this style: "Chr112421625" where 1245 and 1625 are the start and end positions along the chromosome. I would like to obtain the peaks for just one chromosome for downstream analysis. Is there any way of subsetting the results of DESeq2 by providing it with a chromosome number?

I am reluctant to subset the dataset and just run the analysis on the single chromosome as I assume that would reduce the power of the analysis,

I hope this makes sense,

Thank you!

deseq2 • 934 views

ADD COMMENT • link 4.6 years ago Charlotte • 0

score 1 · Answer 1 · 2020-06-04

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 17 hours ago

United States

This is more about the container object that DESeq2 builds on:

https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html

The dds object is a SummarizedExperiment, and you can subset a RangedSummarizedExperiment with square brackets, e.g.:

dds[dds %over% x,]

where x is a GRanges object.

The ranges of the dds are stored as rowRanges(dds). If you use tximeta to build the dds object then ranges are automatically added by the software. Otherwise, you may need to manually add the ranges for the genes in the dds.

ADD COMMENT • link 4.6 years ago Michael Love 43k

0

Entering edit mode

By the way, if you just want to subset the dds by gene name, you can do:

dds[genes,]

Where genes is a character vector of the names of genes on a particular chromosome.

ADD REPLY • link 4.6 years ago Michael Love 43k

0

Entering edit mode

Thank you. I don't think I can use tximeta as I work on a non-model organism. How do you manually add the gene ranges into the dds?

ADD REPLY • link 4.6 years ago Charlotte • 0

0

Entering edit mode

Ok, the part where you annotate the gene ranges is up to you.

You may want to ask a bioinformatics collaborator about how to annotate your data, as this is out of scope for the DESeq2 software. There are various annotation packages on Bioconductor that you can use to get the information you need.

Another option would be to post to Biostars (if you do, link from here to there so others can see your post).

ADD REPLY • link 4.6 years ago Michael Love 43k

0

Entering edit mode

I tried to use dds[dds %over% x,] where x is a vector containing the list of coordinates of the chromosome of interest. However, this returned the error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'overlapsAny' for signature ''"DESeqDataSet", "character" '

Instead I used this:

ddsObjchr1 <- ddsObj[subrows, ] where sub_rows contains the row names corresponding to Chr1

Is this a valid way? I am not sure as I noticed when I check the table of padj < 0.05 for chr1 for the full dataset I retrieve less hits than when I check the table of padj < 0.05 for chr1 for the subsetted results (ddsObj_chr1).

ADD REPLY • link 4.6 years ago Charlotte • 0

0

Entering edit mode

The %over% trick would only worked if you had a RangedSummarizedExperiment.

The second way is how to go if you don't have ranges, it is perfectly valid.

"when I check the table of padj < 0.05 for chr1 for the full dataset I retrieve less hits than when I check the table of padj < 0.05 for chr1"

You will get different p-values depending on if you give DESeq2 different rows of data. It is a hierarchical Bayesian model, so it depends on information shared across genes. I can't even say if you will get more or less with subsetting, there are many complex interdependencies in the data and the model.

ADD REPLY • link 4.6 years ago Michael Love 43k

0

Entering edit mode

Ok thank you, I thought it would be due to that. I just thought it was worth double checking the validity as the number of hits for the chr rose from 4 to 13 which was surprising - I didn't want to proceed with the method simply because it gave more more hits without it being statistically sound

ADD REPLY • link 4.6 years ago Charlotte • 0