Hi,
I am analyzing an RNA-Seq dataset using the edgeR package and have a question about filtering with filterByExpr, which keeps genes based on a variable column of the sample metadata.
I previously worked with a dataset that had only one timepoint (High dose vs. Control) and ran filterByExpr on the Treatment column. I am now working with a new dataset that has the same Treatment column but spans 3 timepoints (see example below). My question is: should I filter on the Treatment column or on the Treatment_Timepoint column? I assume the Treatment column is the right one, since this is the core of the experiment. Please advise.
dput(Sample.info)
#> Donor Treatment Timepoint Treatment_Timepoint
#> Sample.1 P1 Control 6hr Control_6hr
#> Sample.2 P2 Control 6hr Control_6hr
#> Sample.3 P3 Control 6hr Control_6hr
#> Sample.4 P4 Control 6hr Control_6hr
#> Sample.5 P1 High 6hr High_6hr
#> Sample.6 P2 High 6hr High_6hr
#> Sample.7 P3 High 6hr High_6hr
#> Sample.8 P4 High 6hr High_6hr
#> Sample.9 P1 Control 24hr Control_24hr
#> Sample.10 P2 Control 24hr Control_24hr
#> Sample.11 P3 Control 24hr Control_24hr
#> Sample.12 P4 Control 24hr Control_24hr
#> Sample.13 P1 High 24hr High_24hr
#> Sample.14 P2 High 24hr High_24hr
#> Sample.15 P3 High 24hr High_24hr
#> Sample.16 P4 High 24hr High_24hr
#> Sample.17 P1 Control 48hr Control_48hr
#> Sample.18 P2 Control 48hr Control_48hr
#> Sample.19 P3 Control 48hr Control_48hr
#> Sample.20 P4 Control 48hr Control_48hr
#> Sample.21 P1 High 48hr High_48hr
#> Sample.22 P2 High 48hr High_48hr
#> Sample.23 P3 High 48hr High_48hr
#> Sample.24 P4 High 48hr High_48hr
Thank you in advance.
Best Regards,
Toufiq
Gordon Smyth, thank you very much.
Then I would just use something like the below:
Your code is correct. It is equivalent to what I suggested, just somewhat longer and more complicated. Why not use the group argument, which saves you having to create an extra design matrix?

Gordon Smyth, this is noted, thank you. I will write it as you suggested.
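
For readers following along, the equivalence discussed here can be sketched as below. This is an illustration only, not the code that was originally posted: the counts are simulated, and the object names (counts, y, keep_group, keep_design) are assumptions. Only the sample layout (4 donors, 2 treatments, 3 timepoints) is taken from the table in the question.

```r
library(edgeR)

# Sample layout from the question: 4 donors x 2 treatments x 3 timepoints
Sample.info <- expand.grid(Donor = paste0("P", 1:4),
                           Treatment = c("Control", "High"),
                           Timepoint = c("6hr", "24hr", "48hr"))
Sample.info$Treatment_Timepoint <- factor(
  paste(Sample.info$Treatment, Sample.info$Timepoint, sep = "_"))

# Simulated counts (an assumption, purely for illustration):
# 2000 genes spanning very low to high expression
set.seed(1)
mu <- rep(c(0.5, 5, 50, 500), each = 500)
counts <- matrix(rnbinom(2000 * 24, mu = mu, size = 2), nrow = 2000)
y <- DGEList(counts = counts)

# Shorter form: pass the combined factor through the group argument
keep_group <- filterByExpr(y, group = Sample.info$Treatment_Timepoint)

# Longer form: build a one-way design matrix from the same factor
design <- model.matrix(~ 0 + Treatment_Timepoint, data = Sample.info)
keep_design <- filterByExpr(y, design = design)

# Compare which genes each form keeps
table(group = keep_group, design = keep_design)
```

With a balanced layout like this one, both calls should keep the same genes, which is why the group argument is the simpler choice here.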
Gordon Smyth, I have a follow-up question. Say I am working with a multivariable experiment and perform the statistical analysis on each variable column separately; in the above case, comparing Treatment (High vs. Control) and Timepoint (24hr vs. 6hr and 48hr vs. 6hr). At times there are more variables, depending on the experiment, which leads to a complex set-up. In this scenario, which column should filterByExpr be based on? In the above case, I know the Treatment variable plays a crucial role, with the different incubation times forming the basis of the experiment. To avoid confusion, is it a good idea to simply use the rowSums function (below) if I am unsure about the right experimental conditions, or does the choice not change much? Sometimes I use public RNA-seq datasets from GEO for validation studies. For filtering purposes, though, filterByExpr is my function of choice.

Let's assume another example comparing Septic patients vs. Healthy Controls. The Septic patients are classified as low, mild, moderate, or severe, and are further sub-classified by outcome status: Recovered or Non-Recovered. From this data, I am primarily interested in comparing the transcriptomic signatures of Septic patients vs. Healthy Controls, and then proceeding to further levels of analysis involving Severity and Outcome Status. Would my filterByExpr column be the Septic and Healthy column?

You enter the whole design matrix to filterByExpr(). You don't choose which experimental conditions to use.
Gordon Smyth, meaning something like the below?
Something like what? I advised you to use "all treatment factors" and use the "whole design matrix" but you've done the opposite, omitting the design matrix entirely.
Reading your previous comment, you're making this trickier than it actually is. In reality, there's nothing to think about. You don't have to decide which treatments to use, and you don't use different filtering for different contrasts. You just input the complete design matrix to filterByExpr, the same as you use for lmFit. The only change you might make for filtering purposes is to remove a blocking variable from the design matrix.

Hi Gordon Smyth, apologies for the confusion. I did use filterByExpr with group=Treatment_Timepoint for the data that I was working on earlier; I just had an additional question regarding multi-condition experiments. Thank you very much for the inputs.
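
As a rough sketch of the advice above, the filtering step for the dataset in the question might look like the following. Several things here are assumptions and not part of the thread: the counts are simulated, Donor is treated as the blocking variable, and the model formula with a Treatment by Timepoint interaction is only one plausible choice of design.

```r
library(edgeR)

# Sample layout from the question: 4 donors x 2 treatments x 3 timepoints
Sample.info <- expand.grid(Donor = paste0("P", 1:4),
                           Treatment = c("Control", "High"),
                           Timepoint = c("6hr", "24hr", "48hr"))

# Simulated counts (an assumption, purely for illustration)
set.seed(1)
mu <- rep(c(0.5, 5, 50, 500), each = 500)
counts <- matrix(rnbinom(2000 * 24, mu = mu, size = 2), nrow = 2000)
y <- DGEList(counts = counts)

# Design for the model fit: all treatment factors, plus Donor as a
# blocking variable (assumed here because each donor appears in every condition)
design_full <- model.matrix(~ Donor + Treatment * Timepoint, data = Sample.info)

# Design for filtering: the same treatment factors with the blocking
# variable removed
design_filter <- model.matrix(~ Treatment * Timepoint, data = Sample.info)

# One filtering step for the whole experiment, whatever contrasts follow
keep <- filterByExpr(y, design = design_filter)
y <- y[keep, , keep.lib.sizes = FALSE]

# design_full would then be passed on to the model-fitting step
```

The same filtered object is used for every later comparison (Treatment, Timepoint, or their combinations), so there is no per-contrast filtering decision to make.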