Hi everybody,
I'm trying to perform a pathway analysis with the R Bioconductor packages Gage/Pathview on honey bee RNA-seq data. I think I got to more or less write a script that would work, except it doesn't because my gene expression data uses gene IDs from BeeBase, while the Kegg package of the honey bee uses ncbi gene ids, so I would need to convert the gene names of my differential expression results. Here's the code I wrote so far:
`res_ut_ins.fc=res_ut_ins$log2FoldChange
names(res_ut_ins.fc)=rownames(res_ut_ins)
exp.fc=res_ut_ins.fc
out.suffix="deseq2"
require(gage)
datakegg.gs)
kg.ame=kegg.gsets("ame")
names(kg.ame)
lapply(kg.ame[1:3], head)
lapply(exp.fc[1:3],head)
#Use the signaling pathways and metabolic pathways in the analysis. Extract those pathways:
kegg.gs=kg.ame$kg.sets[kg.ame$sigmet.idx]
#now call gage with your data like:
fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL, gene.idtype="RefSeq" )
sel <- fc.kegg.p$greater[, "q.val"] < 1 &
!is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 1 &
!is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
lapplykegg.gs[1:3], head, 3)
lengthkegg.gs)
head(exp.fc)
str(exp.fc)
head(path.ids)
head(path.ids2)
head(fc.kegg.p$greater)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2, function(pid) pathview(
gene.data= exp.fc, pathway.id=pid,
species="ame", out.suffix=outsuffix, gene.idtype="Kegg"))`
So I think the problem is that my kg.sets look like:
$kg.sets$`ame04512 ECM-receptor interaction`
[1] "100576742" "406097" "408393" "408551" "408552" "408826" "409448"
[8] "409474" "409924" "410038" "410775" "412663" "412941" "413739"
[15] "413782" "724950" "726736" "726871"
While my exp.fc looks like:
GB40001-RA GB40002-RA GB40003-RA GB40004-RA GB40005-RA GB40006-RA
0.0000000 NA -0.1594682 NA -0.1337054 -0.1054865
Looking around for related posts I've found this solution (http://seqanswers.com/forums/archive/index.php/t-35472.html) to create a mapping table for conversion:
"You can download the gene_info data file from NCBI ftp site:ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Invertebrates/All_Invertebrates.gene_info.gz under unix/linux shell, do: gunzip All_Invertebrates.gene_info.gz egrep '(^7460)' All_Invertebrates.gene_info >>am.gene_info.txt
Column 2-6 are (Entrez) GeneID, Symbol, LocusTag, Synonyms, dbXrefs. You see GB numbers in Column 5 and 6. After get the ID mapping, you can use mol.sum in pathview package to merge redundant IDs into a unified expression level. ?pathview::mol.sum "
But my problem is that I don't know how to do the last bit of what is suggested here, getting the ID mapping in r and using mol.sum to merge the redundant IDs. Could someone check my script so far and help writing the missing lines of the script?
Thanks a lot in advance for your kind help.
Best, Joanito