Entering edit mode
Nick Schurch
▴
60
@nick-schurch-4014
Last seen 10.7 years ago
Dear Bioconductors,
I'm trying to use edgeR to compute the differential expression from
some
RNA-Seq data, but it seems to be generating some odd results.
I'm running edgeR on data on about 6000 genes, across 5 experimental
conditions. When I compute the differential expression for any two of
these
conditions edgeR it is returning a 'nan' fold-change for about 1500
out of
the 6000 genes being tested. Amazingly, it is also returning p-values
for
these fold-changes, and the p-values cover a range of values from
total
insignificant (>0.5) to very significant (<1E-4)! At first I thought
if
might be because there is sometimes no signal for a gene in a given
condition, but 1) other cases of this nature produce '-inf' or 'inf'
fold-changes, not 'nan' fold-hanges, and 2) in some cases edgeR is
calculating a 'nan' fold-change for something that has signal in every
replicate of both conditions! I've checked everything I can think
of... I
don't get any errors or warnings, the Norm Factors are sensible, the
raw
signal is sensible, all the objects look well-formed (i.e. like they
contain
all the bits they should contain).... its very confusing and
frustrating.
So, am I doing something wrong? Or is there a deeper problem with this
package?
I'm using R v 2.13.1, edgeR 2.2.5 and the commands I'm using are:
> # create the design matrix
> model.groups<-groups
> model.factors<-as.factor(model.groups)
> model<-model.matrix(~model.factors)
>
> # build DGElist and calculate normalization factors
> x=as.data.frame(data)
> rownames(x)=genenames
> y=DGEList(x,group=groups)
> y=calcNormFactors(y)
> str(y)
Formal class 'DGEList' [package "edgeR"] with 1 slots
..@ .Data:List of 3
.. ..$ :'data.frame': 17 obs. of 3 variables:
.. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3
3 4 4
4 3 3 1 1 ...
.. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920
2780054
...
.. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ...
.. ..$ : num [1:6351, 1:17] 7 187 44 0 98 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021"
"cond07.rep0022" "cond08.rep0023" ...
.. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538"
"38539"
...
>
> # estimate dispersion and fit models
> z=estimateGLMCommonDisp(y, design))
> str(z)
Formal class 'DGEList' [package "edgeR"] with 1 slots
..@ .Data:List of 4
.. ..$ :'data.frame': 17 obs. of 3 variables:
.. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3
3 4 4
4 3 3 1 1 ...
.. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920
2780054
...
.. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ...
.. ..$ : num [1:6351, 1:17] 7 187 44 0 98 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021"
"cond07.rep0022" "cond08.rep0023" ...
.. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538"
"38539"
...
.. ..$ : num 0.101
>
> fit<-glmFit(z,model,dispersion=z$common.dispersion)
> str(fit)
Formal class 'DGEGLM' [package "edgeR"] with 1 slots
..@ .Data:List of 12
.. ..$ : num [1:6351, 1:5] -12.89 -9.22 -10.72 -11.54 -9.56 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06"
"model.factorscond07" "model.factorscond08" ...
.. ..$ : int [1:6351] 12 12 12 12 12 12 12 12 12 12 ...
.. ..$ : Named num [1:6351] 23.3 3.45 30.57 25.89 3.07 ...
.. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538"
"38539"
...
.. ..$ : num [1:17, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06"
"model.factorscond07" "model.factorscond08" ...
.. .. ..- attr(*, "assign")= int [1:5] 0 1 1 1 1
.. .. ..- attr(*, "contrasts")=List of 1
.. .. .. ..$ model.factors: chr "contr.treatment"
.. ..$ : num [1:6351, 1:17] 14.4 14.4 14.4 14.4 14.4 ...
.. ..$ :'data.frame': 17 obs. of 3 variables:
.. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3
3 4 4
4 3 3 1 1 ...
.. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920
2780054
...
.. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ...
.. ..$ : NULL
.. ..$ : num 0.101
.. ..$ : num [1:17] 1851365 2313681 1716797 2882487 2703330 ...
.. ..$ : NULL
.. ..$ : num [1:6351, 1:17] 6.32 160.53 59.13 12.7 105.68 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021"
"cond07.rep0022" "cond08.rep0023" ...
.. ..$ : Named num [1:6351] -12.67 -9.33 -10.36 -11.66 -9.64 ...
.. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538"
"38539"
...
>
> # liklihood ratio statistics
> results=glmLRT(z, fit, coef = 4)
> str(results)
Formal class 'DGELRT' [package "edgeR"] with 1 slots
..@ .Data:List of 10
.. ..$ :'data.frame': 17 obs. of 3 variables:
.. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3
3 4 4
4 3 3 1 1 ...
.. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920
2780054
...
.. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ...
.. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538"
"38539"
...
.. ..$ : num 0.101
.. ..$ :'data.frame': 6351 obs. of 4 variables:
.. .. ..$ logConc : num [1:6351] -12.67 -9.33 -10.36 -11.66
-9.64 ...
.. .. ..$ logFC : num [1:6351] 0.7057 -0.0157 0.5583 -0.3311
-0.1913
...
.. .. ..$ LR.statistic: num [1:6351] 1.38671 0.00165 1.83369 0.46732
0.24042 ...
.. .. ..$ p.value : num [1:6351] 0.239 0.968 0.176 0.494 0.624
...
.. ..$ : num [1:6351, 1:5] -12.89 -9.22 -10.72 -11.54 -9.56 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06"
"model.factorscond07" "model.factorscond08" ...
.. ..$ : num [1:6351, 1:4] -12.63 -9.22 -10.52 -11.63 -9.62 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ...
.. .. .. ..$ : chr [1:4] "(Intercept)" "model.factorscond07"
"model.factorscond08" "model.factorscond10"
.. ..$ : num [1:17, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06"
"model.factorscond07" "model.factorscond08" ...
.. .. ..- attr(*, "assign")= int [1:5] 0 1 1 1 1
.. .. ..- attr(*, "contrasts")=List of 1
.. .. .. ..$ model.factors: chr "contr.treatment"
.. ..$ : num [1:17, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:4] "(Intercept)" "model.factorscond07"
"model.factorscond08" "model.factorscond10"
.. ..$ : num 0.101
.. ..$ : chr "model.factorscond06"
These are some examples of the results:
--------------------
edgeR out:
LR.statistic: 0.975075487675
logConc: -13.3931239073
p.value: 0.323417620577
logFC: NaN
geneID: 38956
Raw data:
Condition 1:
replicate 1: 2,372,532 good reads, female: 8.0
replicate 2: 3,966,968 good reads, female: no signal
replicate 3: 1,389,571 good reads, male: no signal
Condition 2:
replicate 1: 3,102,608 good reads, male: 5.0
replicate 2: 3,451,983 good reads, male: no signal
replicate 3: 2,892,192 good reads, male: no signal
replicate 4: 3,620,124 good reads, male: 5.0
replicate 5: 2,640,968 good reads, female: no signal
--------------------
--------------------
edgeR out:
LR.statistic: 3.57045101322
logConc: -13.6684814046
p.value: 0.0588163345523
logFC: NaN
geneID: 38959
Raw data:
Condition 1:
replicate 1: 2,372,532 good reads, female: 5.0
replicate 2: 3,966,968 good reads, female: 5.0
replicate 3: 1,389,571 good reads, male: no signal
Condition 2:
replicate 1: 3,102,608 good reads, male: no signal
replicate 2: 3,451,983 good reads, male: no signal
replicate 3: 2,892,192 good reads, male: 7.0
replicate 4: 3,620,124 good reads, male: no signal
replicate 5: 2,640,968 good reads, female: no signal
--------------------
--------------------
edgeR out:
LR.statistic: 1.81602638091
logConc: -11.265720531
p.value: 0.177786996458
logFC: NaN
geneID: 38965
Raw data:
Condition 1:
replicate 1: 2,372,532 good reads, female: 22.0
replicate 2: 3,966,968 good reads, female: 27.0
replicate 3: 1,389,571 good reads, male: 5.0
Condition 2:
replicate 1: 3,102,608 good reads, male: 26.0
replicate 2: 3,451,983 good reads, male: 26.0
replicate 3: 2,892,192 good reads, male: 9.0
replicate 4: 3,620,124 good reads, male: 36.0
replicate 5: 2,640,968 good reads, female: 55.0
--------------------
--
Cheers,
Nick Schurch
Data Analysis Group (The Barton Group),
School of Life Sciences,
University of Dundee,
Dow St,
Dundee,
DD1 5EH,
Scotland,
UK
Tel: +44 1382 388707
Fax: +44 1382 345 893
[[alternative HTML version deleted]]