EdgeR GLM using blocking analysis
1
0
Entering edit mode
@catalina-aguilar-hurtado-6554
Last seen 4.1 years ago
United States

Hi All,

I am trying to do a DE analysis in edgeR. I want to block the Subject effect (PCA shows they are grouped by subject rather than treatment) and will like to know which is the effect of LPS (Treatment) under each condition (co and c2). The idea is to get rid of the control PBS effect over LPS Treatment.

I am have tried a couple of things, but keep getting an error after estimateGLMTrendedDisp. Also will like to know if my design is ok.

Thanks,


x=as.matrix(read.table("2015_LPS_6h_col3.csv",header=T, sep=",", row.names=1))

y<- DGEList (counts=x )

> colnames(y)
 [1] "B1" "B2" "B4" "B5" "F1" "F2" "F4" "F5" "D1" "D2" "D4" "D5" "H1" "H2" "H4" "H5"

#Filtering

keep <- rowSums(cpm (y)>5)>=4
nkeep <- sum(keep)
y <- y[keep,]
dim(y)

#rest the library sizes

y$samples$lib.size <- colSums (y$counts)

#Normalisation

y <-calcNormFactors(y)

y$samples

 

targets= read.table ("targets3.csv", header = T, sep=",",row.names=1)

targets
   Subject Condition Treatment
B1       1        co       PBS
B2       2        co       PBS
B4       4        co       PBS
B5       5        co       PBS
F1       1        co       LPS
F2       2        co       LPS
F4       4        co       LPS
F5       5        co       LPS
D1       1        c2       PBS
D2       2        c2       PBS
D4       4        c2       PBS
D5       5        c2       PBS
H1       1        c2       LPS
H2       2        c2       LPS
H4       4        c2       LPS
H5       5        c2       LPS


Subject <- factor(targets$Subject)

Condition <- factor(targets$Condition, levels=c("co","c2"))

Treatment <- factor (targets$Treatment, levels=c("PBS","LPS"))

design <- model.matrix (~0 + Subject + Condition:Treatment, data=y$samples)

design

> colnames(design)
[1] "Subject1"                 "Subject2"                 "Subject4"                 "Subject5"                 "Conditionco:TreatmentPBS"
[6] "Conditionc2:TreatmentPBS" "Conditionco:TreatmentLPS" "Conditionc2:TreatmentLPS"
>

 

y <- estimateCommonDisp(y,design)


y <- estimateGLMTrendedDisp(y,design)

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
  Design matrix not of full rank.  The following coefficients not estimable:
 Conditionc2:TreatmentLPS
 
 > sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] arrayQualityMetrics_3.20.0 DESeq_1.16.0               lattice_0.20-29            locfit_1.5-9.1             Biobase_2.24.0            
 [6] RcppArmadillo_0.4.650.1.1  Rcpp_0.11.5                GenomicRanges_1.16.4       GenomeInfoDb_1.0.2         IRanges_1.22.10           
[11] BiocGenerics_0.10.0        genefilter_1.46.1          edgeR_3.6.8                limma_3.20.9              

 

EdgeR rnaseq • 2.4k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 5 hours ago
The city by the bay

Well, as the error suggests, your design matrix is not of full rank. Consider that the first four columns of design, when added together, give an all-ones vector; similarly, the last four columns yield an all-ones vector. This means that the two sets of columns (first four and last four) are redundant. For any given set of fitted values, there are an infinite number of solutions for the coefficients, i.e., the coefficients cannot be estimated.

To get around this, we need to drop one coefficient from the design matrix. Any one of the interaction terms would do, but I'd suggest dropping the column named Conditionco:TreatmentPBS as this seems like a "baseline" group. This means that each of the Subject columns now represents the expression of the co/PBS sample in the corresponding subject. The other interaction terms represent the log-fold change of the corresponding condition/treatment combination over the co/PBS samples, averaged across all subjects.

Edit: Another possibility is to add condition-specific blocking factors for each subject:

> design2 <- model.matrix (~0 + Subject:Condition + Treatment:Condition)
> colnames(design2)
 [1] "Subject1:Conditionco"     "Subject2:Conditionco"    
 [3] "Subject4:Conditionco"     "Subject5:Conditionco"    
 [5] "Subject1:Conditionc2"     "Subject2:Conditionc2"    
 [7] "Subject4:Conditionc2"     "Subject5:Conditionc2"    
 [9] "Conditionco:TreatmentLPS" "Conditionc2:TreatmentLPS"

The first eight coefficients represent the expression of the PBS sample in each subject/condition combination. The last two coefficients represent the effect of LPS in each condition, and can be dropped to test for DE. This design allows the baseline expression in each patient to vary with condition, which is a bit more flexible than the model in the original post. However, it means that you can no longer compare expression directly between conditions, as any DE between conditions is absorbed in the patient/condition-specific blocking factors. Of course, you can still test for a differential effect of LPS between conditions, by comparing the LPS/PBS log-fold changes between conditions (i.e., test for equality in the last two coefficients, using contrast in glmLRT).

ADD COMMENT
0
Entering edit mode

Hi Aaron, trying to understand what you say, I need to simplify my design? not sure how to drop a column.

ADD REPLY
0
Entering edit mode

In your case, you can just do design[,-5].

ADD REPLY
0
Entering edit mode

Still get the error after dropping column 5.

Do the rest of the commands look fine?

> y <- estimateGLMTrendedDisp(y,design)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
  Design matrix not of full rank.  The following coefficients not estimable:
 Conditionc2:TreatmentLPS

ADD REPLY
0
Entering edit mode

To be absolutely clear, you need to assign the reduced matrix back to design.

design <- design[,-5]

If that still doesn't work... well, I've no idea. Run qr(design)$rank and see what you get (should be 7).

ADD REPLY
0
Entering edit mode

Thanks Aaron, my fault didn't put the design <-

but if I can ask you further.. what I wanted to know was the effects of LPS in co and c2 Conditions. but if I drop "Conditionco:TreatmentPBS" how to get the effect of LPS - PBS in Condition:co?

Same for the second design2, if I do :

LPSeffect <- glmLRT(fit,contrast=c(0,0,0,0,0,0,0,0,-1,1)) that will give me the LPS effect between the two but I will like to know for co and c2 separately.

Hope it makes sense.

ADD REPLY
1
Entering edit mode

The coefficient Conditionco:TreatmentLPS directly represents the effect of LPS over PBS in condition co. All you have to do is to drop this coefficient to get the LPS effect in this condition. For example, you should be able to set coef="Conditionco:TreatmentLPS" in glmLRT to do this comparison.

For design2, the contrast that you've written above will test for a differential LPS effect between conditions, as you've realized. However, if you want to get the LPS effect in each condition, you need to drop the 9th or 10th coefficients by themselves, e.g., with coef=9 (for co) or coef=10 (for c2) in glmLRT.

ADD REPLY
0
Entering edit mode

Thank you so much, now I understand.

ADD REPLY

Login before adding your answer.

Traffic: 737 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6