Entering edit mode
Miguel Gallach
▴
40
@miguel-gallach-5016
Last seen 10.2 years ago
Dear list,
I am using edgeR to analyze my RNA-Seq data. My data consist in two
factors
(Adaptation and Temperature) and two levels per factor, and I want fit
full
and reduced glm like the next ones:
Reduced model
fit0 = counts ~ Adaptation
Aditive model
fit1 = counts ~ Adaptation + Temperature
Interaction model
fit2 = counts ~ block + treatment + block:treatment
That is what I did until now:
d = raw.data[,1:8]
head (d)
R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000003 0 0 0 0 2 8 3
1
FBgn0000008 92 123 80 41 76 135 188
150
FBgn0000014 0 1 2 4 5 8 10
7
FBgn0000015 2 1 0 0 2 3 9
5
FBgn0000017 2000 2393 1799 1290 1029 973 2441
1665
FBgn0000018 36 43 10 17 6 16 20
30
Adaptation =
factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotAda
pted","HotAdapted","ColdAdapted","ColdAdapted"))
Temperature =
factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCage"
,"ColdCage","ColdCage"))
design = model.matrix(~Adaptation + Temperature +
Adaptation:Temperature)
design
(Intercept) AdaptationHotAdapted TemperatureHot
1 1 1 1
2 1 1 1
3 1 0 1
4 1 0 1
5 1 1 0
6 1 1 0
7 1 0 0
8 1 0 0
AdaptationHotAdapted:TemperatureHot
1 1
2 1
3 0
4 0
5 0
6 0
7 0
8 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Adaptation
[1] "contr.treatment"
attr(,"contrasts")$Temperature
[1] "contr.treatment"
d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage",
"ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage",
"HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage"))
Calculating library sizes from column totals.
d.GLM = calcNormFactors(d.GLM)
d.GLM
An object of class "DGEList"
$samples
group lib.size norm.factors
R4.Hot HotAdaptedHot 17409289 0.9881635
R5.Hot HotAdaptedHot 17642552 1.0818144
R9.Hot ColdAdaptedHot 20010974 0.8621807
R10.Hot ColdAdaptedHot 14064143 0.8932791
R4.Cold HotAdaptedCold 11968317 1.0061084
R5.Cold HotAdaptedCold 11072832 1.0523857
R9.Cold ColdAdaptedCold 22386103 1.0520949
R10.Cold ColdAdaptedCold 17408532 1.0903311
$counts
R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000003 0 0 0 0 2 8 3
1
FBgn0000008 92 123 80 41 76 135 188
150
FBgn0000014 0 1 2 4 5 8 10
7
FBgn0000015 2 1 0 0 2 3 9
5
FBgn0000017 2000 2393 1799 1290 1029 973 2441
1665
14864 more rows ...
$all.zeros
FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017
FALSE FALSE FALSE FALSE FALSE
14864 more elements ...
#According to the manual, DGEList is now ready to be passed to the
functions that do the calculations to determine differential
expression
levels for the genes. However is not possible to achieve statistical
significance with fewer than ten counts in total for a tag/gene and we
also
do not want to waste effort finding spurious DE (such as when a gene
is
only expressed in one library), so we filter out tags/genes with fewer
than
1 count per million in SIX or more libraries.
cpm.d.GLM = cpm(d.GLM)
d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ]
nrow(d.GLM)
[1] 9418
#Now the dataset is ready to be analysed for differential expression
#The design matrix
design
(Intercept) AdaptationHotAdapted TemperatureHotCage
1 1 1 1
2 1 1 1
3 1 0 1
4 1 0 1
5 1 1 0
6 1 1 0
7 1 0 0
8 1 0 0
AdaptationHotAdapted:TemperatureHotCage
1 1
2 1
3 0
4 0
5 0
6 0
7 0
8 0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Adaptation
[1] "contr.treatment"
attr(,"contrasts")$Temperature
[1] "contr.treatment"
#Estimate CR common/trended/tagwise dispersion
d.GLM = estimateGLMCommonDisp(d.GLM, design)
names(d.GLM)
[1] "samples" "counts" "all.zeros"
[4] "common.dispersion"
d.GLM$common.dispersion
[1] 0.04634741
sqrt(d.GLM$common.dispersion)
[1] 0.2152845
d.GLM = estimateGLMTrendedDisp(d.GLM, design)
Loading required package: splines
summary(d.GLM$trended.dispersion)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.03672 0.03843 0.04326 0.05267 0.06134 0.11870
d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
d.GLM$prior.n
NULL
d$prior.n
NULL
ls()
[1] "Adaptation" "cpm.d" "cpm.d.GLM" "d" "d.GLM"
[6] "design" "raw.data" "Temperature"
d.GLM
An object of class "DGEList"
$samples
group lib.size norm.factors
R4.Hot HotAdaptedHotCage 17409289 0.9881635
R5.Hot HotAdaptedHotCage 17642552 1.0818144
R9.Hot ColdAdaptedHotCage 20010974 0.8621807
R10.Hot ColdAdaptedHotCage 14064143 0.8932791
R4.Cold HotAdaptedColdCage 11968317 1.0061084
R5.Cold HotAdaptedColdCage 11072832 1.0523857
R9.Cold ColdAdaptedColdCage 22386103 1.0520949
R10.Cold ColdAdaptedColdCage 17408532 1.0903311
$counts
R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000008 92 123 80 41 76 135 188
150
FBgn0000017 2000 2393 1799 1290 1029 973 2441
1665
FBgn0000018 36 43 10 17 6 16 20
30
FBgn0000024 51 68 73 42 118 118 242
129
FBgn0000032 1640 1942 2328 1342 755 674 2110
1307
9413 more rows ...
$all.zeros
FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
FALSE FALSE FALSE FALSE FALSE
9413 more elements ...
$common.dispersion
[1] 0.04634741
$trended.dispersion
[1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067
9413 more elements ...
$abundance
FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
-11.915246 -9.183744 -13.519050 -11.966242 -9.300038
9413 more elements ...
$bin.dispersion
[1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388
[7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138
[13] 0.03724316 0.03980808 0.04726670
$bin.abundance
[1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957
[7] -10.769579 -10.430591 -10.119638 -9.821485 -9.532052 -9.228034
[13] -8.853866 -8.376654 -7.322181
$tagwise.dispersion
[1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193
9413 more elements ...
getPriorN(d.GLM, design=design)
[1] 5
#Fit model with tagwise dispersion
glmfit.tgw = glmFit(d.GLM, design,
dispersion=d.GLM$tagwise.dispersion)
lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
topTags(lrt.tgw)
Coefficient: AdaptationHotAdapted:TemperatureHotCage
logConc logFC LR P.Value FDR
FBgn0032285 -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11
FBgn0026089 -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07
FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06
FBgn0051641 -10.291549 2.976952 37.46208 9.320768e-10 2.194575e-06
FBgn0058042 -9.665534 2.267317 33.71276 6.388032e-09 1.203250e-05
FBgn0010333 -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05
FBgn0030094 -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05
FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05
FBgn0028863 -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05
FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05
If I understand, glmfit.tgw would be the interaction model (f2) I was
talking about, wright? My problem is that I do not know now how to
define
the additive and reduced models to test Additive vs. Reduced and
Interaction vs. Additive.
Any suggestion?
Many thanks,
Miguel Gallach
[[alternative HTML version deleted]]