Question

How can I delete specific rows (by gene name) from a DESeq2 dataset (or even just a counts file)?

rattray56 • 0

@rattray56-15737

Last seen 5.7 years ago

I want to delete about 9,000 rows (genes) out of 27,000. The rationale is that 20 of our samples are "identical" biological replicates: each has been deleted for the same essential gene (made possible by a drug pretreatment), yet the 9,000 genes vary strongly among these 20 independently generated samples. We think we can get rid of some of the noise by focusing on just the differences between the controls and the samples. One problem is that the larger table is either the original RNA counts file or the DESeqDataSet, whereas the other table is a results table. So I can't just search for identical rows, but I can use gene_ID to point to the correct rows. I have tried things like %in% and intersect, but with no luck (I am guessing because the rows differ beyond the name). I did read the section on filtering reads (and I have already filtered low counts), but I couldn't figure out how to filter by comparing one list to another. Any advice is appreciated!

deseq2 • 2.7k views

updated 5.7 years ago by Steve Lianoglou ★ 13k • written 5.7 years ago by rattray56 • 0

score 1 · Answer 1 · 2019-08-16

Putting aside the rationale for wanting to remove these genes, there are many ways to do what you're after.

Let's say your DESeqDataSet object is named dds and your "results table" is called res.

Take a minute to familiarize yourself with the type of entries stored in the columns of res by taking a quick peek: head(res).

Your dds should have some type of row-level identifiers. You can find this out by looking at the output of head(rownames(dds)). Can you match those identifiers with any of the entries in the columns of res? If you can't, you've got larger fish to fry, but let's press on ...
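To make the matching check concrete, here is a minimal base-R sketch. The objects are toy stand-ins, not your data: counts is a small matrix with gene names as row names, and res is a plain data frame with a hypothetical gene_id column. The trailing space in one identifier is an assumed example of why %in% can appear to fail; your actual cause may be different.

```r
# Toy stand-ins (hypothetical data, for illustration only)
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2", "s3")))
res <- data.frame(gene_id = c("geneB ", "geneD", "geneX"),  # note the stray space
                  padj    = c(0.01, 0.20, 0.03),
                  stringsAsFactors = FALSE)

# Compare identifier vectors, not whole tables
hits_raw <- rownames(counts) %in% res$gene_id
table(hits_raw)

# Stray whitespace is one possible reason for a failed match; strip it and retry
res$gene_id <- trimws(res$gene_id)
hits_clean <- rownames(counts) %in% res$gene_id
table(hits_clean)

# Identifiers in the results table that have no counterpart in the counts
missing_ids <- setdiff(res$gene_id, rownames(counts))
missing_ids
```

The point is that %in% and setdiff work on character vectors of identifiers, so the other columns of the two tables do not matter.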

Depending on how you built dds, it should also have a DataFrame of meta information for the rows (genes) of your dds, which you can see by looking at rowData(dds). Do any of the entries there match the entries in the columns of res?

Once you've identified the column in res that has identifiers you can match to some gene information in dds, get the identifiers from res that you want to remove, store them in axe, and do something like dds2 <- dds[!rowData(dds)$some_identifier %in% axe,]
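As a sketch of that subsetting step, the code below uses a simulated dataset from DESeq2's makeExampleDESeqDataSet(), whose rows are named gene1, gene2, and so on. The gene_id column and the contents of axe are made up for the example; with real data, axe would come from whichever column of res holds your identifiers.

```r
library(DESeq2)

# Simulated dataset: 200 genes x 6 samples, row names "gene1" ... "gene200"
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Hypothetical identifier column in the row metadata (assumption for this sketch)
rowData(dds)$gene_id <- rownames(dds)

# Hypothetical list of genes to drop; in practice, pull these from res
axe <- paste0("gene", 1:50)

# Keep every row whose identifier is NOT in axe
dds2 <- dds[!rowData(dds)$gene_id %in% axe, ]

# Equivalent when the identifiers are simply the row names
dds3 <- dds[!rownames(dds) %in% axe, ]

dim(dds)
dim(dds2)
```

Note that ! applies to the result of %in%, so the expression reads as "not (identifier in axe)".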

A DESeqDataSet is a SummarizedExperiment (read that vignette if you haven't already), and both can be indexed like a 2D data structure. If you are having problems with the mechanics of subsetting and filtering 2D objects, then it'd be helpful to run through a couple of R tutorials before you get too frustrated by some R basics in your bioinformatics quest.
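The same [rows, columns] indexing works on a plain counts matrix, which covers the "just a counts file" case. This is a minimal sketch with made-up data; it assumes the gene names are the row names of the matrix, which may not hold if your file was read in with the names in an ordinary column.

```r
# Toy counts matrix (hypothetical data), gene names as row names
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2", "s3")))

# Hypothetical list of genes to remove
axe <- c("geneB", "geneD")

# Logical vector: TRUE for rows to keep
keep <- !rownames(counts) %in% axe

# Rows go before the comma, columns after; drop = FALSE keeps a matrix
# even if only one row survives
counts_kept <- counts[keep, , drop = FALSE]
counts_kept
```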