Error in running Dispersion estimation analysis in EdgeR
2
0
Entering edit mode
@mohammedtoufiq91-17679
Last seen 12 weeks ago
United States

Hi,

I am working with a multi variable RNAseq dataset with a total of 18 samples (experimental design below). I am interested in comparing different groups under Treatment_Category variable and add co-variates as Gender and Age, however, even before adding these covariates, I encounter an error message while estimating dispersion estimateDisp(see below).

Thank you,

Toufiq

Sample_metadata:

dput(Sample_metadata)
#>    Samples Donor Treatment Age_years Gender Category Treatment_Category
#> 1  D1_mock    D1      mock        65      F        O             mock_O
#> 2  D2_mock    D2      mock        62      F        O             mock_O
#> 3  D3_mock    D3      mock        20      F        Y             mock_Y
#> 4  D4_mock    D4      mock        20      F        Y             mock_Y
#> 5  D5_mock    D5      mock        24      M        Y             mock_Y
#> 6  D6_mock    D6      mock        21      M        Y             mock_Y
#> 7   D1_Low    D1       Low        65      F        O              Low_O
#> 8   D2_Low    D2       Low        62      F        O              Low_O
#> 9   D3_Low    D3       Low        20      F        Y              Low_Y
#> 10  D4_Low    D4       Low        20      F        Y              Low_Y
#> 11  D5_Low    D5       Low        24      M        Y              Low_Y
#> 12  D6_Low    D6       Low        21      M        Y              Low_Y
#> 13   D1_Hi    D1        Hi        65      F        O               Hi_O
#> 14   D2_Hi    D2        Hi        62      F        O               Hi_O
#> 15   D3_Hi    D3        Hi        20      F        Y               Hi_Y
#> 16   D4_Hi    D4        Hi        20      F        Y               Hi_Y
#> 17   D5_Hi    D5        Hi        24      M        Y               Hi_Y
#> 18   D6_Hi    D6        Hi        21      M        Y               Hi_Y
y.Treatment <- calcNormFactors(y.Treatment, method = "TMM")

Design

Patient_ID_v1 <- factor(Sample_metadata$Donor)
Treatment.Category <- factor(Sample_metadata$Treatment_Category, levels=c("mock_Y", "mock_O", "Low_Y", "Hi_Y", "Low_O", "Hi_O"))
design.Treatment.Category <- model.matrix(~Patient_ID_v1+Treatment.Category)
colnames(design.Treatment.Category)
rownames(design.Treatment.Category)
colnames(design.Treatment.Category)
design.Treatment.Category

Dispersion estimation

y.Treatment.v1 <- estimateDisp(y.Treatment,design.Treatment.Category, robust=TRUE)

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 Treatment.CategoryHi_O
edgeR DifferentialExpression limma design RNASeq • 1.8k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States

You can't adjust for the subject and the subject's age at the same time (by blocking on subject you are automatically adjusting for age as well as sex). Just use subject and treatment:

design <- model.matrix(~ Donor + Treatment, Sample_metadata)
ADD COMMENT
0
Entering edit mode

James W. MacDonald thank you very much for the suggestions. The design suggested by you, I have used it earlier for comparing the Treatment column. In addition to this, we are also interested in comparing pairwise groups under the Treatment_Category column in the analysis. Is there a way to modify the same design formula and run on the same or create another design like below:

design.mod <- model.matrix(~Treatment_Category, Sample_metadata) ## This design works
design.mod.1 <- model.matrix(~Gender+Treatment_Category, Sample_metadata) ## This design works
design.mod.2 <- model.matrix(~Age+Treatment_Category, Sample_metadata) ## throws error
design.mod.1.2 <- model.matrix(~Age+Gender+Treatment_Category, Sample_metadata) ## throws error

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, : Design matrix not of full rank. The following coefficients not estimable:

ADD REPLY
0
Entering edit mode

If you just want to compare e.g., mock_o vs mock_y, you should just fit

design <- model.matrix(~0 + Treatment_category + Gender)

And compute the contrasts of interest. That's not a great model though, as you have only females in the old group and both males and females in the young group, so gender and Treatment_category are somewhat correlated.

You don't want to include Age, as it's already part of the Treatment_category. And you don't want/need to include the Donor because you are not making any comparisons that use each donor more than once. FYI, your design.mod.2 is likely failing because you have Age coded as a factor. It won't fail if Age is continuous, but you are including age in that model twice (continuous and as a factor as part of Treatment_category), which I wouldn't do.

ADD REPLY
1
Entering edit mode
@gordon-smyth
Last seen 37 minutes ago
WEHI, Melbourne, Australia

This is a multi-level experiment, with Treatment the within-patient factor and Category the between-patient factor. See the relevant sections of the edgeR and limma User's Guides.

ADD COMMENT
0
Entering edit mode

Thank you Gordon Smyth I tried as you suggested, following section 3.5 edgeR manual but got an error message (see below) while Dispersion estimation. Is there anything that I am missing?

3.5 Comparisons both between and within subjects

Donor <- factor(Sample_metadata$Donor)
Treatment <- factor(Sample_metadata$Treatment, levels=c("mock","Low","Hi"))
Category <- factor(Sample_metadata$Category, levels=c("Y", "O"))

design <- model.matrix(~Donor)

mock_Y.vs.O <- Treatment== "mock" & Category=="Y"
Low_Y.vs.O <- Treatment=="Low" & Category=="Y"
Hi_Y.vs.O <- Treatment=="Hi" & Category=="Y"

design <- cbind(design, mock_Y.vs.O, Low_Y.vs.O, Hi_Y.vs.O)
colnames(design)

?estimateDisp
y.Treatment.Category <- estimateDisp(y.Treatment, design, robust=TRUE) 
y.Treatment.Category

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, : Design matrix not of full rank. The following coefficients not estimable:

ADD REPLY
1
Entering edit mode

Compared to Section 3.5 of the edgeR User's Guide, you have interchanged the between and within factors. You have treated Treatment as the between-factor and Category as the within-factor whereas they are actually the other way around. The equivalent of Section 3.5 for your experiment would be:

> design <- model.matrix(~Donor)
> Low.Y <- Treatment=="Low" & Category=="Y"
> Hi.Y <- Treatment=="Hi" & Category=="Y"
> design <- cbind(design,Low.Y,Hi.Y)

There are only 2 df for interaction between Treatment and Category so there should be only 2 covariates other than Donor in the design.

You might consider using the limma approach with duplicateCorrelation(), which you may find much easier and more flexible.

ADD REPLY
0
Entering edit mode

Gordon Smyth many thanks. I could use the limma duplicateCorrelation() approach (below). One question, is there need to add Gender or (by blocking on Donor I am automatically adjusting for Age as well as Gender).

library(limma)

Treatment.Category <- paste(Sample_metadata$Treatment, Sample_metadata$Category, sep=".")
Treatment.Category <- factor(Treatment.Category)
design <- model.matrix(~0+Treatment.Category)
colnames(design) <- levels(Treatment.Category)


## estimate the correlation between measurements made on the same subject:
corfit <- duplicateCorrelation(cpm_with_log2,design,block=Sample_metadata$Donor) 

fit <- lmFit(cpm_with_log2, design, block=Sample_metadata$Donor,correlation=corfit$consensus) 

Cont.matrix <- makeContrasts(
  "mock.Old_vs_Young" = mock.Old - mock.Young,
  "hi.Old_vs_Young" = hi.Old - hi.Young,
  "lo.Old_vs_Young" = lo.Old - lo.Young,
  levels=design)


fit_2 <- contrasts.fit(fit, Cont.matrix)
fit_2 <- eBayes(fit_2)
tt <- topTable(fit_2, number=Inf, adjust="BH") 
ADD REPLY
1
Entering edit mode

It is impossible to adjust for Age, Gender or Donor in the design matrix. Age and Gender are confounded with Category and Donor is already handled by the blocking variable.

ADD REPLY
0
Entering edit mode

Gordon Smyth though I could use limma as a workaroud, I am just curious to know what was exactly interchanged? Furthermore maintain homogeneity of using the same edgeR workflow

ADD REPLY
1
Entering edit mode

I have edited my comment above.

ADD REPLY
0
Entering edit mode

Thank you Gordon Smyth I was including 3 comparisons earlier ie., mock, Low.Y, and Hi.Y. As , it is 2 df, I can include only Low.Y, and Hi.Y or mock and Low.Y but not all 3 of them. In this scenario limma looks flexible to me. Thank you again for your continuous support.

ADD REPLY

Login before adding your answer.

Traffic: 623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6