Entering edit mode
You had some problem with your kegg gene set data. What you did:
kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg")
cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp =
samp.idx, compare ="paired")
kegg.gsets produces kegg gene sets with some meta-data. You need to
process the right gene set data out before calling gage:
kg.aga=kegg.gsets("aga")
kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx]
cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp =
samp.idx, compare ="paired")
?
You can check the details on your gene set data:
names(kg.aga)
lapply(kg.aga, head, 3)
lapplykegg.gs[1:3], head, 3)
you may also check the function documentation:
?kegg.gsets
Actually you will find everything in the pacakge tutorials and
documentations for functions you work with like gage or pathview etc:
http://bioconductor.org/packages/release/bioc/html/gage.html
http://bioconductor.org/packages/release/bioc/html/pathview.html
----------------------------------------------
Saul Lozano wrote:
???Dear bioconductor members,
? I have completed the "RNA-Seq Data Pathway and Gene-set
Analysis
Workflows" tutorial and have obtained the output png files
??? for the human example datasets???.
Having completed
???the tutorial, I started working with my mosquito
data.
???I aligned my reads using gmap-gsnap using "
https://www.vectorbase.org/download/anopheles-gambiae-
pestchromosomesagamp4fagz"
and created the Transcription Database object using its
corresponding
annotation files that can be find at "
https://www.vectorbase.org/download/anopheles-gambiae-
pestbasefeaturesagamp41gff3gz
"
??? ???
? I have completed the pre-processing and normalization,
the list of
normalized reads per gene looks like this
???>cnts.norm???
...
AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03?
? ? ? 8.552872e+02
AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02?
? ? ? 7.808745e+01
AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01?
? ? ? 7.441275e+01
AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03?
? ? ? 1.940243e+03
AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03?
? ? ? 9.646097e+02
AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00?
? ? ? 1.837352e+00
AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00?
? ? ? 0.000000e+00
AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00?
? ? ? 0.000000e+00
AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00?
? ? ? 0.000000e+00
AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00?
? ? ? 0.000000e+00
...
???the structure of the object is:
> str(cnts.norm)
num [1:12489, 1:12] 1857 3076 1390 4953 5027 ...
- attr(*, "dimnames")=List of 2
? ..$ : chr [1:12489] "AGAP000002" "AGAP000005"
"AGAP000007" "AGAP000008"
...
? ..$ : chr [1:12] "library10.sorted.bam"
"library11.sorted.bam"
"library12.sorted.bam" "library1.sorted.bam" ...
I load the kegg pathway database using kegg.gsets like this
>kegg.gs <- kegg.gsets(species = "aga", id.type =
"kegg")
but the accession number are different in the Transcription
database and
the kegg.
> kegg.gs
...
$kg.sets$`aga04745 Phototransduction - fly`
[1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651"
"AgaP_AGAP000945"
[5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145"
"AgaP_AGAP005079"
[9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475"
"AgaP_AGAP008435"
[13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953"
"AgaP_AGAP010630"
[17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414"
"AgaP_AGAP011516"
[21] "AgaP_AGAP012026" "AgaP_AGAP012252"
...
The gage tutorial mentions that the names of the genes in
the counts and
the kegg object should be the same, so I changed the
rownames of the counts
by adding "Agap_"
> x <- data.frame(rownames(cnts.norm),
stringsAsFactors=FALSE)
> x$new <- paste("AgaP_", x[,1],sep="")
> row.names(cnts.norm) <- x$new
so the list now looks like this
...
AgaP_AGAP009385? ? ? ? 1.729138e+01? ? ? ?
2.222712e+03? ? ? ? 8.552872e+02
AgaP_AGAP009386? ? ? ? 3.094247e+01? ? ? ?
1.681387e+02? ? ? ? 7.808745e+01
AgaP_AGAP009387? ? ? ? 1.274102e+02? ? ? ?
6.561511e+01? ? ? ? 7.441275e+01
AgaP_AGAP009388? ? ? ? 1.695465e+03? ? ? ?
3.033674e+03? ? ? ? 1.940243e+03
AgaP_AGAP009389? ? ? ? 1.071155e+03? ? ? ?
1.202602e+03? ? ? ? 9.646097e+02
AgaP_AGAP009390? ? ? ? 0.000000e+00? ? ? ?
2.050472e+00? ? ? ? 1.837352e+00
AgaP_AGAP009391? ? ? ? 0.000000e+00? ? ? ?
0.000000e+00? ? ? ? 0.000000e+00
AgaP_AGAP009392? ? ? ? 0.000000e+00? ? ? ?
0.000000e+00? ? ? ? 0.000000e+00
AgaP_AGAP009393? ? ? ? 1.820145e+00? ? ? ?
0.000000e+00? ? ? ? 0.000000e+00
AgaP_AGAP009394? ? ? ? 0.000000e+00? ? ? ?
2.050472e+00? ? ? ? 0.000000e+00
...
In theory everything is set for the gage analysis like this
> cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref
= ref.idx, samp =
samp.idx, compare ="paired")
but I get
> cnts.kegg.p
$greater
? ? ? ? ???p.geomean stat.mean p.val q.val set.size
library2.sorted.bam
kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 2? ? ? ? ? ? ? ? ? NA
sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ?
? ? 0? ? ? ? ? ? ? ? ? NA
sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
? ? ? ? ???library5.sorted.bam
kg.sets? ? ? ? ? ? ? ? ? ???NA
sigmet.idx? ? ? ? ? ? ? ? ? NA
sig.idx? ? ? ? ? ? ? ? ? ???NA
met.idx? ? ? ? ? ? ? ? ? ???NA
dise.idx? ? ? ? ? ? ? ? ? ? NA
$less
? ? ? ? ???p.geomean stat.mean p.val q.val set.size
library2.sorted.bam
kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 2? ? ? ? ? ? ? ? ? NA
sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ?
? ? 0? ? ? ? ? ? ? ? ? NA
sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA?
? ? ? 0? ? ? ? ? ? ? ? ? NA
? ? ? ? ???library5.sorted.bam
kg.sets? ? ? ? ? ? ? ? ? ???NA
sigmet.idx? ? ? ? ? ? ? ? ? NA
sig.idx? ? ? ? ? ? ? ? ? ???NA
met.idx? ? ? ? ? ? ? ? ? ???NA
dise.idx? ? ? ? ? ? ? ? ? ? NA
$stats
? ? ? ? ???stat.mean library2.sorted.bam
library5.sorted.bam
kg.sets? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ?
? ? ? ? ? ? ? NA
sigmet.idx? ? ???NaN? ? ? ? ? ? ? ? ? NA? ?
? ? ? ? ? ? ? NA
sig.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ?
? ? ? ? ? ? ? NA
met.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ?
? ? ? ? ? ? ? NA
dise.idx? ? ? ???NaN? ? ? ? ? ? ? ? ? NA? ?
? ? ? ? ? ? ? NA
My questions are, how do I debug this problem? is the gene
name (accession
number) change correct? Are "NaN"s the result of zeros in
the datasets? any
help will be greatly appreciated.
Best regards, Saul
??? [[alternative HTML version deleted]]