pre-filtering: removing rows with low gene counts

Question

RNA sequencing raw counts gene filtering

kcarey • 0

Hello All,

I am preparing to run normalization and differential expression analysis on my RNA sequencing raw counts. Before that, I want to add a gene filtering step ahead of batch correction with limma (samples were collected from 16 tissue sites) and DESeq2 (normalization and DE analysis). I started with ~60,000 genes and have already removed genes with zero counts in 50% or more of the samples, which brought the number down to 35,458 genes. This still seems a bit high. Downstream I will be running DESeq2 and WGCNA, and I want to make sure the genes I keep are robust, but I am not confident about the best way to filter further.

I have visualized the data with a histogram, which shows a bimodal distribution, and with a PCA plot. Can you offer any suggestions for filtering? Is the choice mainly technical or biological? On the technical side, I was told that a bimodal distribution is typical for RNA sequencing data and that the left bump corresponds to noise. Biologically, however, when I pulled some genes out and plotted their expression across samples as boxplots, the patterns made sense for my subtype grouping.

Are there any suggestions for filtering? I have seen people use this before DESeq2:

# pre-filtering: removing rows with low gene counts
# Calculate total read counts per gene
total_counts <- rowSums(counts(dds))
# Filter genes with at least 10 total read counts
dds_filtered <- dds[total_counts >= 10, ]

However, this threshold seems arbitrary and not data-specific, and I am not sure how to find a justified value in the literature. I am using high-grade serous ovarian cancer data. After filtering, I plan to batch correct with limma before DESeq2.

Any suggestions will be great!

DESeq2 limma RNAseq • 2.4k views

ADD COMMENT • link 13 months ago kcarey • 0

score 0 · Answer 1 · 2024-03-13

ATpoint ★ 4.8k

The limma and DESeq2 vignettes both have recommendations for pre-filtering. Please read them:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering

https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf (Section 15.3)
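As a rough illustration of the kind of sample-aware filter the DESeq2 vignette describes, here is a minimal sketch: keep a gene only if it reaches a minimum count in at least as many samples as the smallest group. It runs on a simulated base R matrix so that it is self-contained. The simulated data, the threshold of 10, and the group size of 3 are assumptions for illustration, not values tuned to the poster's dataset. On a real DESeqDataSet the matrix would come from counts(dds).

```r
set.seed(1)

# Simulated raw counts (hypothetical data): 1000 genes x 12 samples.
# Rows 1-500 are very lowly expressed, rows 501-1000 are well expressed.
n_genes <- 1000
n_samples <- 12
gene_means <- rep(c(0.5, 50), each = n_genes / 2)
counts_mat <- matrix(
  rnbinom(n_genes * n_samples, mu = gene_means, size = 1),
  nrow = n_genes
)
rownames(counts_mat) <- paste0("gene", seq_len(n_genes))

# Assumed values: the size of the smallest group in the design and
# the minimum count a gene must reach in that many samples.
smallest_group_size <- 3
min_count <- 10

# Keep genes with >= min_count reads in >= smallest_group_size samples.
keep <- rowSums(counts_mat >= min_count) >= smallest_group_size
filtered <- counts_mat[keep, , drop = FALSE]

# With a DESeqDataSet, the same logic would read:
#   keep <- rowSums(counts(dds) >= min_count) >= smallest_group_size
#   dds  <- dds[keep, ]
```

Unlike a filter on the row total, this rule cannot be satisfied by a single sample with a large count. Per the DESeq2 vignette, pre-filtering is mainly there to reduce memory use and run time; stricter filtering for power is applied automatically by independent filtering in results(), so the exact threshold matters less than it might seem. The limma user's guide covers edgeR's filterByExpr for the same purpose.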

> I plan to batch correct with limma before DESeq2.

No, don't. DESeq2 expects integer counts. Read the manual of the tools you use first. Don't reinvent the wheel.
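A common way to follow this advice, sketched here under assumptions, is to leave the raw integer counts untouched and put the batch variable in the DESeq2 design formula. Batch-removed values are then produced only on transformed data, for visualization or for tools such as WGCNA. The example dataset comes from makeExampleDESeqDataSet, and the column name batch and its three levels are hypothetical stand-ins for the 16 tissue sites.

```r
library(DESeq2)
library(limma)

set.seed(1)

# Example dataset shipped with DESeq2: 1000 genes, 12 samples,
# with a two-level 'condition' factor in colData.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical batch labels (stand-in for tissue site), balanced
# across the two conditions so batch and condition are not confounded.
dds$batch <- factor(rep(c("b1", "b2", "b3"), times = 4))

# Model batch in the design instead of correcting the counts.
design(dds) <- ~ batch + condition
dds <- DESeq(dds)
res <- results(dds)  # tests the last design variable (condition)

# For PCA, clustering or WGCNA: transform first, then remove batch
# from the transformed values, protecting the condition effect.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
design_keep <- model.matrix(~ condition, data = as.data.frame(colData(vsd)))
mat_corrected <- limma::removeBatchEffect(mat, batch = vsd$batch,
                                          design = design_keep)
```

Here mat_corrected is meant only for exploratory and network analyses. The differential expression results in res come from the integer counts with batch as a covariate, which is why no corrected matrix is passed to DESeq2.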

ADD COMMENT • link 13 months ago ATpoint ★ 4.8k

Thank you for pointing me in the right direction. I was overthinking it.

Kaylin

ADD REPLY • link 13 months ago kcarey • 0