Hello,
I am working on RNA-seq data from Oryza sativa Japonica (TaxID: 39947).
There is only one genome sequence for this rice, but there are two main gene models. One is known as 'osa' in KEGG. It comes from MSU-RGAP (http://rice.uga.edu; IDs like 'LOC_Os03g02939'), and NCBI and Phytozome (DOE) use this gene model. The other is known as 'dosa' in KEGG. It comes from RAP-DB (https://rapdb.dna.affrc.go.jp/; IDs like 'Os03g0121300'), and Ensembl also uses this gene model.
There are many resources for the MSU gene model, since NCBI Entrez also uses it. KEGG added the 'osa' pathways in 2003 and the 'dosa' pathways in 2012.
Genome info for dosa: https://www.genome.jp/kegg-bin/show_organism?org=dosa
Pathway maps for dosa: https://www.genome.jp/kegg-bin/show_organism?menu_type=pathway_maps&org=dosa
As you can guess, I chose the RAP-DB gene model, so I use 'dosa' for KEGG. I am trying to use pathview to visualize a list of interesting genes on 'dosa' KEGG pathway maps.
When I searched for 'dosa' in the 'korg' table of the 'pathview' package, it returned no record, meaning that 'korg' does not contain 'dosa'. The korg table in pathview_1.38.0 has 8282 records in total, so I think pathview simply does not include 'dosa'.
> data(korg, package="pathview")
> korg[korg[,3]=="dosa",]
ktax.id tax.id kegg.code scientific.name common.name entrez.gnodes kegg.geneid ncbi.geneid ncbi.proteinid uniprot
> korg[korg[,3]=="osa",]
ktax.id tax.id kegg.code scientific.name common.name entrez.gnodes
"T01015" "4530" "osa" "Oryza sativa japonica" "Japanese rice" "1"
kegg.geneid ncbi.geneid ncbi.proteinid uniprot
"4351353" "4351353" "XP_015620368" "Q6ATB4"
> dim(korg)
[1] 8282 10
I also tried building my own korg table, following pathview's advice for non-model species that it reports as unknown ("species invalid"). However, it did not work.
> korg <- cbind("ktax.id" = "T02163", "tax.id" = "39947", "kegg.code" = "dosa",
"scientific.name" = "Oryza sativa japonica", "common.name" = "Japanese rice",
"entrez.gnodes" = NA, "kegg.geneid" = NA, "ncbi.geneid" = NA,
"ncbi.proteinid" = NA, "uniprot" = NA)
> dosa00940 <- pathview(gene.data = diff_cdsList,
pathway.id = "dosa00940", species = "dosa",
gene.idtype="KEGG",
limit = list(gene=max(abs(diff_cdsList)), cpd=1))
Error in pathview(gene.data = diff_cdsList, pathway.id = "dosa00940", :
This species is not annotated in KEGG!
> dosa00940 <- pathview(gene.data = diff_cdsList,
pathway.id = "00940", species = "dosa",
gene.idtype="KEGG",
limit = list(gene=max(abs(diff_cdsList)), cpd=1))
Error in pathview(gene.data = diff_cdsList, pathway.id = "00940", species = "dosa", :
This species is not annotated in KEGG!
>
> osa00940 <- pathview(gene.data = diff_cdsList,
pathway.id = "00940", species = "osa",
gene.idtype="KEGG",
limit = list(gene=max(abs(diff_cdsList)), cpd=1))
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
Warning: No annotation package for the species osa, gene symbols not mapped!
Info: Working in directory /xxxx/xxxx
Info: Writing image file osa00940.pathview.png
>
Before I dig into the pathview source code, I would like some help with this case, where an additional genome annotation for a model species is not included in pathview.
More information about the dosa pathways is available here:
https://www.genome.jp/kegg-bin/show_organism?menu_type=pathway_maps&org=dosa
https://rest.kegg.jp/list/pathway/dosa
https://rest.kegg.jp/link/dosa/pathway
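To get dosa pathway IDs and their member genes into R, the two REST URLs above can be read with readLines() and split on tabs. This is a sketch under assumptions: each response line is taken to be "<entry>\t<entry or name>", database prefixes such as "path:" or "dosa:" are stripped from ID-like fields, and link/dosa/pathway is assumed to list the pathway first and the gene second. parse_kegg_rest() is a hypothetical helper name.

```r
# Hypothetical helper: parse tab-separated KEGG REST output into a data frame.
# Assumption: each line is "<id>\t<value>"; ID-like fields (no spaces) may
# carry a prefix such as "path:" or "dosa:", which is removed.
parse_kegg_rest <- function(lines) {
  lines <- lines[nzchar(lines)]                    # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  df <- data.frame(from = vapply(fields, `[`, character(1), 1),
                   to   = vapply(fields, `[`, character(1), 2),
                   stringsAsFactors = FALSE)
  # Strip "prefix:" only when the whole field is a single token (an ID),
  # so pathway names containing spaces are left untouched
  df[] <- lapply(df, function(x) sub("^[a-z]+:(\\S+)$", "\\1", x))
  df
}

# Network access is needed, so only run interactively
if (interactive()) {
  pathways <- parse_kegg_rest(readLines("https://rest.kegg.jp/list/pathway/dosa", warn = FALSE))
  links    <- parse_kegg_rest(readLines("https://rest.kegg.jp/link/dosa/pathway", warn = FALSE))
  head(links[links$from == "dosa00940", ])         # genes on dosa00940
}
```

The links table can then be used to check which of your RAP-DB IDs actually appear on a given map before plotting.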
Thank you for your help,
Jiyoung
I noticed from your earlier posts in this thread that you have meanwhile accomplished your task in other ways, but this is also possible by applying the pathview 'hack'. The key is that you still have to set "entrez.gnodes" = "1", even though the input IDs are obviously not Entrez IDs!

Hello Dr. Hooiveld,
Thank you so much for your help. I modified the korg table as in your answer, and everything works perfectly! Your R code and comments are clear. My problem is solved.
Thank you again, Jiyoung
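For reference, here is a minimal sketch of the korg workaround discussed above. The reply's own code is not reproduced in this thread, so this is a reconstruction under stated assumptions: "entrez.gnodes" = "1" is the key setting named in the reply; "kegg.geneid" is filled with one dosa gene ID taken from this thread because, as far as I can tell from the pathview source (not verified for every version), a row with both kegg.geneid and ncbi.geneid set to NA is what triggers the "This species is not annotated in KEGG!" error shown earlier; and pathview is assumed to pick up a korg object from the global environment instead of its bundled copy. diff_cdsList is the named vector from the question, assumed to be keyed by the same RAP-DB IDs that appear on the dosa maps.

```r
# Sketch of a one-row korg table for 'dosa' (RAP-DB gene model).
# Assumptions (see lead-in): entrez.gnodes = "1" as advised in the reply;
# kegg.geneid only needs to be a non-NA example dosa ID; other IDs stay NA.
korg <- cbind("ktax.id" = "T02163", "tax.id" = "39947", "kegg.code" = "dosa",
              "scientific.name" = "Oryza sativa japonica",
              "common.name" = "Japanese rice",
              "entrez.gnodes" = "1",              # key setting from the reply
              "kegg.geneid" = "Os02t0626600-00",  # assumption: any non-NA dosa ID
              "ncbi.geneid" = NA, "ncbi.proteinid" = NA, "uniprot" = NA)

# Only run pathview if it is installed and the input vector exists
if (requireNamespace("pathview", quietly = TRUE) && exists("diff_cdsList")) {
  dosa00940 <- pathview::pathview(gene.data = diff_cdsList,
                                  pathway.id = "00940", species = "dosa",
                                  gene.idtype = "KEGG",
                                  limit = list(gene = max(abs(diff_cdsList)), cpd = 1))
}
```

Because korg is a plain character matrix, the same pattern should extend to other organisms missing from pathview's bundled table, provided their KEGG code and T number are looked up on the KEGG organism page.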
Meanwhile, I found an alternative method in clusterProfiler: browseKEGG() opens a KEGG pathway in the web browser with browseURL(url), so we can directly visualize our genes on a desired pathway map. The URL format is http://www.kegg.jp/kegg-bin/show_pathway?[map_id]/[gene list separated by "/"], e.g. https://www.kegg.jp/kegg-bin/show_pathway?dosa00940/Os02t0626600-00/Os07t0638300-01. Genes in the URL are highlighted on the map.
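The URL can also be assembled in R rather than by hand. A small sketch: kegg_pathway_url() is a hypothetical helper that simply follows the URL pattern above, and utils::browseURL() is only called in an interactive session.

```r
# Hypothetical helper: build a KEGG show_pathway URL that highlights genes.
kegg_pathway_url <- function(pathway_id, genes) {
  paste0("https://www.kegg.jp/kegg-bin/show_pathway?",
         paste(c(pathway_id, genes), collapse = "/"))
}

url <- kegg_pathway_url("dosa00940", c("Os02t0626600-00", "Os07t0638300-01"))
if (interactive()) utils::browseURL(url)  # open in the default web browser
```

With a long gene list the URL can get very long, so it may be worth restricting the genes to those that are actually on the chosen map.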
Still, the problem is that I cannot directly save the map PNG. I copied the link address of the "Download" icon (https://www.kegg.jp/kegg-bin/show_pathway?dosa00940/Os02t0626600-00/Os07t0638300-01#downloadImage1x) and fetched it with wget in a terminal (wget "https://www.kegg.jp/kegg-bin/show_pathway?dosa00940/Os02t0626600-00/Os07t0638300-01#downloadImage1x"), but I got an HTML file instead of the image. (The #downloadImage1x part is a URL fragment, which is handled by the browser and never sent to the server, so wget receives the same HTML page.)
I also tried the wget -r -p options, but that downloaded many other files, and the PNG file ended up in 'www.kegg.jp/tmp/mark_pathway168688157088208/'. Too much trouble!
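Rather than mirroring the site with wget -r -p, one option is to fetch only the HTML page and then download the single PNG it points to. A sketch in R, assuming (based on the tmp path observed above) that the page references the generated image with a path like /tmp/mark_pathway<digits>/<name>.png; find_kegg_png() is a hypothetical helper, and KEGG may change its page layout at any time.

```r
# Hypothetical helper: find the generated pathway PNG in the page HTML.
# Assumption: the image path looks like /tmp/mark_pathway<digits>/<file>.png
find_kegg_png <- function(html, base = "https://www.kegg.jp") {
  m <- regmatches(html, regexpr("/tmp/mark_pathway[0-9]+/[^\"' >]+\\.png", html))
  if (length(m) == 0) return(NA_character_)       # no image reference found
  paste0(base, m[1])                                # first match, made absolute
}

# Network access is needed, so only run interactively
if (interactive()) {
  page <- readLines(paste0("https://www.kegg.jp/kegg-bin/show_pathway?",
                           "dosa00940/Os02t0626600-00/Os07t0638300-01"),
                    warn = FALSE)
  png_url <- find_kegg_png(page)
  if (!is.na(png_url)) download.file(png_url, "dosa00940.png", mode = "wb")
}
```

Note that the tmp directory is generated per request, so the PNG has to be fetched right after the page, not from a saved link.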
I still think pathview::pathview would be the easiest solution if 'dosa' could be included in pathview!
Additionally, I found this information at http://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-kegg.html
It still downloads other files, but I found a better version using wget options.
Example: downloading the dosa00940.png map with two highlighted genes (Os02t0626600-00/Os07t0638300-01).