edgeR - design a complex matrix
Karla • 0
@1312eda8
Brazil

Dear all,

I'm new to edgeR. I have read multiple tutorials and the edgeR manual but still have not figured out how to solve this problem.

I'm running a DGE analysis with 27 samples. Unfortunately, there are no replicates (yes, I know the problems with this). My data is divided into something like this (a subset of my lib_spec):

[image: lib_spec]

The same structure continues through to the 27th sample.

The df is a data frame output from featureCounts. Its sample columns are in the same order as lib_spec.

Although the obvious comparison would be between timepoints, I also want to run further comparisons between animals, which is why I'm trying to design a complex matrix (following page 42 of the edgeR manual).

I tried using cbind to combine tp and animal into a single column in lib_spec, but that will not help when I run other kinds of comparisons.

My code is:


library(edgeR)

group <- lib_spec$Class   # Grouping variable for the DGEList

y <- DGEList(counts = df, group = group) # Stores the counts together with the group of each sample
bfr_filter <- y$samples # Library sizes before filtering

keep <- filterByExpr(y) # Logical vector marking the genes with enough expression to keep
y <- y[keep, , keep.lib.sizes = FALSE] # Keeps only the genes flagged TRUE in "keep" and recomputes library sizes
aft_filter <- y$samples # Library sizes after filtering

y <- normLibSizes(object = y, method = "TMM") # Computes TMM normalization factors


group <- lib_spec$Class
tp <- factor(lib_spec$Timepoint)    # Covariate 1
animal <- factor(lib_spec$Animal)    # Covariate 2

design <- model.matrix(~ 0 + group + tp + animal)
design

However, when I run it, I get the following:

[image: design]

It seems that I "lost" some data in my design because (1) I don't see tpD1 and (2) I don't see animalA.

This error makes estimateDisp a nightmare. And I don't want to subset my df into possible comparisons (like different timepoints and animals).

Does anyone know why I'm getting this design problem?

Also, does anyone know how to order the columns of the design as:

classbaseline classZIKV tpDm14 tpD1 tpD3 tpD5 tpD7 tpD10 tpD14 animalA animalB animalC animalD

Thank you in advance!

Karla

@gordon-smyth
WEHI, Melbourne, Australia

Actually there is no error. The design matrix has been created correctly according to the model you have specified. You might want to read Section 4.4 of Law et al (2020) to understand why the first level of each factor doesn't appear in the design matrix. (Why this is so has been discussed many times on this forum.)
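As a small illustration of that point, here is a sketch on a hypothetical toy layout (2 animals by 3 timepoints, not the data from the question). It only uses base R and shows which columns model.matrix keeps, and how relevel changes the reference level:

```r
# Hypothetical toy layout (NOT the poster's data): 2 animals x 3 timepoints
toy <- expand.grid(Timepoint = c("D1", "D3", "Dm14"), Animal = c("A", "B"))
tp <- factor(toy$Timepoint)      # levels: D1, D3, Dm14
animal <- factor(toy$Animal)     # levels: A, B

# With an intercept, the first level of every factor is absorbed into it
design <- model.matrix(~ tp + animal)
colnames(design)
# "(Intercept)" "tpD3" "tpDm14" "animalB"

# Without an intercept, only the first factor in the formula keeps all its
# levels; later factors still lose their first level (here animalA)
design0 <- model.matrix(~ 0 + tp + animal)
colnames(design0)
# "tpD1" "tpD3" "tpDm14" "animalB"

# relevel() chooses which level acts as the reference (the dropped one)
tp_ref <- relevel(tp, ref = "Dm14")
design_ref <- model.matrix(~ tp_ref + animal)
colnames(design_ref)
# "(Intercept)" "tp_refD1" "tp_refD3" "animalB"
```

The missing columns are not lost samples. Their effect is part of the baseline against which the other coefficients are measured.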

However, I think you are not approaching the analysis correctly. Your experiment does in fact have replication, with each animal being one biological replicate. I suggest that you combine Class and Timepoint into one factor (call it group), then use

design <- model.matrix(~0 + group + animal)
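A minimal sketch of how that combined factor might be built is shown here. The sample sheet lib_toy is hypothetical (3 animals by 3 timepoints), and so is the assumption that Class is "baseline" only at Dm14. The column names Class, Timepoint and Animal follow the question. Setting the factor levels explicitly also controls the order of the design columns:

```r
# Hypothetical sample sheet (NOT the poster's data): 3 animals x 3 timepoints
lib_toy <- data.frame(
  Animal    = rep(c("A", "B", "C"), each = 3),
  Timepoint = rep(c("Dm14", "D1", "D3"), times = 3)
)
# Assumption for illustration: only the Dm14 samples are "baseline"
lib_toy$Class <- ifelse(lib_toy$Timepoint == "Dm14", "baseline", "ZIKV")

# Combine Class and Timepoint into one factor; the explicit levels fix
# the column order of the design matrix
group <- factor(
  paste(lib_toy$Class, lib_toy$Timepoint, sep = "."),
  levels = c("baseline.Dm14", "ZIKV.D1", "ZIKV.D3")
)
animal <- factor(lib_toy$Animal)

design <- model.matrix(~ 0 + group + animal)
colnames(design) <- sub("^group", "", colnames(design))  # shorter names
colnames(design)
# "baseline.Dm14" "ZIKV.D1" "ZIKV.D3" "animalB" "animalC"

# The design should be of full column rank before dispersion estimation
full_rank <- qr(design)$rank == ncol(design)

# A contrast is a numeric vector over the design columns,
# e.g. ZIKV at D3 versus baseline
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["ZIKV.D3"] <- 1
contrast["baseline.Dm14"] <- -1
```

A vector like contrast could then be supplied to the contrast argument of glmQLFTest after fitting with glmQLFit, so different comparisons can be run from the same fit without subsetting the data.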

Reference

Law CW, Zeglinski K, Dong X, Alhamdoosh M, Smyth GK, Ritchie ME (2020). A guide to creating design matrices for gene expression experiments. F1000Research 9, 1444. https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html
