Question about filter by expression (filterByExpr) in edgeR
@mohammedtoufiq91-17679
United States

Hi,

I am analyzing an RNA-Seq dataset using the edgeR package and have a question about filtering with filterByExpr, which keeps genes based on a grouping column of the sample metadata.

I previously worked with a dataset with only one timepoint (High dose vs. Control) and ran filterByExpr on the Treatment column. I am now working with a new dataset with the same Treatment column, but spanning three timepoints (see the example below). My question is: should I perform filtering on the Treatment column or the Treatment_Timepoint column? I assume the Treatment column is the right one, since it is the core of the experiment. Please advise.

Sample.info

#>           Donor Treatment Timepoint Treatment_Timepoint
#> Sample.1     P1   Control       6hr         Control_6hr
#> Sample.2     P2   Control       6hr         Control_6hr
#> Sample.3     P3   Control       6hr         Control_6hr
#> Sample.4     P4   Control       6hr         Control_6hr
#> Sample.5     P1      High       6hr            High_6hr
#> Sample.6     P2      High       6hr            High_6hr
#> Sample.7     P3      High       6hr            High_6hr
#> Sample.8     P4      High       6hr            High_6hr
#> Sample.9     P1   Control      24hr        Control_24hr
#> Sample.10    P2   Control      24hr        Control_24hr
#> Sample.11    P3   Control      24hr        Control_24hr
#> Sample.12    P4   Control      24hr        Control_24hr
#> Sample.13    P1      High      24hr           High_24hr
#> Sample.14    P2      High      24hr           High_24hr
#> Sample.15    P3      High      24hr           High_24hr
#> Sample.16    P4      High      24hr           High_24hr
#> Sample.17    P1   Control      48hr        Control_48hr
#> Sample.18    P2   Control      48hr        Control_48hr
#> Sample.19    P3   Control      48hr        Control_48hr
#> Sample.20    P4   Control      48hr        Control_48hr
#> Sample.21    P1      High      48hr           High_48hr
#> Sample.22    P2      High      48hr           High_48hr
#> Sample.23    P3      High      48hr           High_48hr
#> Sample.24    P4      High      48hr           High_48hr

Thank you in advance.

Best Regards,

Toufiq

@gordon-smyth
WEHI, Melbourne, Australia

Use filterByExpr with group=Treatment_Timepoint. Filtering should take into account all treatment factors, but doesn't need to account for blocking variables (Donor in this case).
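To see why the choice of grouping matters, here is a minimal sketch on simulated counts (not the poster's data). It assumes the 24-sample layout from the question; the count matrix, the seed, and the object names keep.tt and keep.t are made up for illustration. A gene expressed only in the four High_24hr samples is planted in row 1 so the two groupings can be compared.

```r
library(edgeR)

# Rebuild the sample table from the question:
# 4 donors x 2 treatments x 3 timepoints = 24 samples
Sample.info <- data.frame(
  Donor     = rep(paste0("P", 1:4), times = 6),
  Treatment = rep(rep(c("Control", "High"), each = 4), times = 3),
  Timepoint = rep(c("6hr", "24hr", "48hr"), each = 8),
  row.names = paste0("Sample.", 1:24)
)
Sample.info$Treatment_Timepoint <-
  paste(Sample.info$Treatment, Sample.info$Timepoint, sep = "_")

# Simulated counts (assumption: purely illustrative, 1000 genes)
set.seed(1)
counts <- matrix(rnbinom(1000 * 24, mu = 20, size = 5), nrow = 1000,
                 dimnames = list(paste0("Gene", 1:1000), rownames(Sample.info)))

# Plant a gene that is expressed only in the High_24hr group
counts[1, ] <- 0
counts[1, Sample.info$Treatment_Timepoint == "High_24hr"] <- 50

y <- DGEList(counts = counts)

# Smallest group size under each grouping
min(table(Sample.info$Treatment_Timepoint))  # treatment-by-time groups
min(table(Sample.info$Treatment))            # treatment only

# Filter with all treatment factors vs. with Treatment alone
keep.tt <- filterByExpr(y, group = Sample.info$Treatment_Timepoint)
keep.t  <- filterByExpr(y, group = Sample.info$Treatment)

# Compare what happens to the planted gene
keep.tt[1]
keep.t[1]
```

With group=Treatment_Timepoint the smallest group has 4 samples, so a gene expressed in just one treatment-by-time group can pass the filter. With group=Treatment the smallest group has 12 samples, and the same gene is expected to be dropped even though it is exactly the kind of gene the experiment is looking for.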


Gordon Smyth, thank you very much.

Then I would just use something like the code below:

Treatment_Timepoint.filter <- factor(Sample.info$Treatment_Timepoint)
design.Treatment.filter <- model.matrix(~ 0+Treatment_Timepoint.filter)
colnames(design.Treatment.filter) <- levels(Treatment_Timepoint.filter)

keep <- filterByExpr(y, design.Treatment.filter)
table(keep)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")

Your code is correct. It is equivalent to what I suggested, just somewhat longer and more complicated. Why not use the group argument, which saves you having to create an extra design matrix?
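As a rough check of that equivalence, the sketch below runs both forms on simulated counts and compares the results. The counts, the seed, and the object names keep.design and keep.group are assumptions for illustration, not the poster's data.

```r
library(edgeR)

# Six treatment-by-time groups of four samples each, as in the question
group <- factor(rep(c("Control_6hr", "High_6hr", "Control_24hr",
                      "High_24hr", "Control_48hr", "High_48hr"), each = 4))

# Simulated counts with low, medium and high expression genes so that
# the filter actually removes something (assumption: illustrative only)
set.seed(2)
mu <- rep(c(1, 5, 20), length.out = 500)
counts <- matrix(rnbinom(500 * 24, mu = mu, size = 2), nrow = 500)
y <- DGEList(counts = counts)

# Long form: build a one-way design matrix and pass it to filterByExpr
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
keep.design <- filterByExpr(y, design)

# Short form: pass the grouping factor directly
keep.group <- filterByExpr(y, group = group)

# The two logical vectors are expected to agree gene for gene
identical(keep.design, keep.group)
table(keep.group)
```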


Gordon Smyth, noted, thank you. I will write it as you suggested.


Gordon Smyth, I have a follow-up question. Let's say I am working with a multivariable experiment and perform the statistical analysis on each variable column separately (in the above case, comparing Treatment: High vs. Control, and Timepoint: 24hr vs. 6hr and 48hr vs. 6hr). Sometimes there are more variables, depending on the experiment, which leads to a complex set-up. In this scenario, which column should filterByExpr be based on? In the above case, I know the Treatment variable plays a crucial role, with the different incubation times forming the basis of the experiment. To avoid confusion, is it a good idea to simply use the rowSums function (below) if I am unsure about the right experimental conditions, or does the choice not change much? Sometimes I use public RNA-seq datasets from GEO for validation studies. For filtering, though, filterByExpr is my function of choice.

Let's take another example comparing septic patients vs. healthy controls. The septic patients are classified by severity as low, mild, moderate and severe, and further sub-classified by outcome status as Recovered or Non-Recovered. From this data, I am primarily interested in comparing the transcriptomic signatures of septic patients vs. healthy controls, and then proceeding to further levels of analysis involving Severity and Outcome Status. Would my filterByExpr column then be the Septic vs. Healthy column?

keep <- rowSums(y$counts) > 50
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

You enter the whole design matrix to filterByExpr(). You don't choose which experimental conditions to use.


Gordon Smyth, meaning something like the below?

filterByExpr(y)

Something like what? I advised you to use "all treatment factors" and use the "whole design matrix" but you've done the opposite, omitting the design matrix entirely.

Reading your previous comment, you're making this trickier than it actually is. In reality, there's nothing to think about. You don't have to decide which treatments to use, you don't use different filtering for different contrasts, you just input the complete design matrix to filterByExpr, same as you use for lmFit. The only change you might make for filtering purposes is to remove a blocking variable from the design matrix.
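A minimal sketch of this advice, using edgeR's quasi-likelihood pipeline rather than lmFit and simulated counts in place of real data. The counts, the seed, the contrast names, and the object names design.full, design.filter and y.filt are all assumptions for illustration. The analysis design includes Donor as a blocking variable; the filtering design is the same model with Donor removed.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

# Sample layout from the question: 6 treatment-by-time groups x 4 donors
Sample.info <- data.frame(
  Donor = factor(rep(paste0("P", 1:4), times = 6)),
  Treatment_Timepoint = factor(rep(c("Control_6hr", "High_6hr", "Control_24hr",
                                     "High_24hr", "Control_48hr", "High_48hr"),
                                   each = 4))
)

# Simulated counts (assumption: illustrative only)
set.seed(3)
counts <- matrix(rnbinom(500 * 24, mu = 10, size = 2), nrow = 500)
y <- DGEList(counts = counts)

# Complete design used for the analysis: all treatment groups plus the block
design.full <- model.matrix(~ 0 + Treatment_Timepoint + Donor, data = Sample.info)
colnames(design.full) <- sub("^Treatment_Timepoint", "", colnames(design.full))

# Design used only for filtering: the same model without the blocking variable
design.filter <- model.matrix(~ 0 + Treatment_Timepoint, data = Sample.info)

# Filter once, then normalize
keep <- filterByExpr(y, design.filter)
y.filt <- y[keep, , keep.lib.sizes = FALSE]
y.filt <- calcNormFactors(y.filt)

# Fit once on the filtered data with the complete design
y.filt <- estimateDisp(y.filt, design.full)
fit <- glmQLFit(y.filt, design.full)

# Every contrast is tested on the same filtered gene set
con <- makeContrasts(
  High_vs_Control_6hr  = High_6hr  - Control_6hr,
  High_vs_Control_24hr = High_24hr - Control_24hr,
  levels = design.full
)
res.6hr  <- glmQLFTest(fit, contrast = con[, "High_vs_Control_6hr"])
res.24hr <- glmQLFTest(fit, contrast = con[, "High_vs_Control_24hr"])
```

The filter is computed a single time from the treatment structure, and no per-contrast choice of column is involved.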


Hi Gordon Smyth, apologies for the confusion. I did use filterByExpr with group=Treatment_Timepoint for the data that I was working on earlier; I just had a separate, additional question about multi-condition experiments. Thank you very much for the input.
