Question

ChIPseeker/Question about ''annoatePeak' function and for Bioconductor AnnotationData Packages for L.japonicus

0

Entering edit mode

egbastakis • 0

@egbastakis-17559

Last seen 6 months ago

Germany

Hello,

I am currently trying to perform part of my ChIPseq analysis with the ChIPseeker program.

I would like to use the ''annotatePeak'' function but for that apart for the bed file with my peaks (derived from MACS), which I have it, I need two more files, the ''MeSH.Miy.eg.db'', which I also have it (from https://www.bioconductor.org/packages/devel/data/annotation/), and a ``TxDb.Lotus japonicus.knownGene`` file that I cannot find it somewhere.

Could anyone help with that?

Best,

Manolis

commands:

This is an example of the command that I should follow (from the manual of the ChIPseeker):

>peakAnno <- annotatePeak(files[[4]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")

# txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

I would like to do the same but with Lotus japonicus.

chipseeker Bioconductor AnnotationData Packages Lotus japonicus ChIPseq • 3.0k views

ADD COMMENT • link updated 6.3 years ago by James W. MacDonald 67k • written 6.3 years ago by egbastakis • 0

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 17 hours ago

United States

The MeSH.Miy.eg.db package isn't a conventional OrgDb package, which is what ChIPseeker is expecting. That package only has one table that maps gendoo IDs to MESH IDs, so won't be particularly helpful.

The IDs you have in your GFF appear to be LotusBase IDs, so for ChIPseeker to annotate things you will probably need an OrdDb that maps those IDs to whatever annotation ChIPseeker is normally parsing from the OrgDb package. Unfortunately there doesn't seem to be much data out there for L. japonicus (NCBI has somewhere around 780 Gene IDs, but probably more RefSeq or GI numbers), so you are in the position where there isn't much existing infrastructure, so to a large extent you are on your own.

But do note that the annoDb argument is optional, and according to the help page all it adds is the Gene ID, Symbol, and Gene Name for the nearest gene. You could probably get that yourself by hand.

ADD COMMENT • link 6.3 years ago James W. MacDonald 67k

score 3 · Accepted Answer · 2018-09-27

3

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 17 hours ago

United States

The TxDb packages contain information about where the genes for your species are found in the genome. You presumably have a genome, as you would have needed one to get this far. At this point you will need a GFF file that describes the genomic locations, based on the genome you used. You can then use makeTxDbFromGFF in the GenomicFeatures package to make the TxDb.

ADD COMMENT • link 6.3 years ago James W. MacDonald 67k

0

Entering edit mode

Hi James,

Many thanks, it did work great with your advise.

Nevertheless, I stuck again little bit later...

After I executed the annotatePeak command this is what I got:

-----------------------------------------

History session starts:

> peakAnno <- annotatePeak(files[[1]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="MeSH.Miy.eg.db")

>> loading peak file... 2018-09-28 11:09:22

>> preparing features information... 2018-09-28 11:09:23

>> identifying nearest features... 2018-09-28 11:09:24

>> calculating distance from peak to TSS... 2018-09-28 11:09:25

>> assigning genomic annotation... 2018-09-28 11:09:25

>> adding gene annotation... 2018-09-28 11:09:35

>> assigning chromosome lengths 2018-09-28 11:09:35

>> done... 2018-09-28 11:09:35

Warning message:

In annotatePeak(files[[1]], tssRegion = c(-3000, 3000), TxDb = txdb, :

Unknown ID type, gene annotation will not be added...

> plotAnnoPie(peakAnno)

> plotAnnoBar(peakAnno)

> plotAnnoPie(peakAnno)

> vennpie(peakAnno)

> upsetplot(peakAnno)

> upsetplot(peakAnno, vennpie=TRUE)

History session ends:

--------------------------------------------------

So, I have gotten this a warning message, which actually tells that I have no list of my annotated peaks!

Though, I was able afterwords to execute several commands (), in order to produce the corresponding graphs.

I do need this list as a csv file to see the affiliation of my peaks with the corresponding genes but also for the continuation of my analysis as well (when I start using the package ''ReactomePA'' and the command

>pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) from of the ChIPseeker manual till the end of it).

Could you please help me with that issue as well?

Greetings,

Manolis

P.S.: Please see below part of the gff3, which I have used for my annotation:

V1 V2 V3 V4 V5 V6 V7 V8
1 Lj3.0_chrC protein_coding gene 519 1371 . . .
2 Lj3.0_chrC protein_coding mRNA 519 1371 . - .
3 Lj3.0_chrC protein_coding exon 519 1371 100 - .
4 Lj3.0_chrC protein_coding CDS 520 1365 . - 0
5 Lj3.0_chrC processed_transcript gene 1323 1431 . . .
6 Lj3.0_chrC processed_transcript mRNA 1323 1431 . + .
V9
1 ID=Ljchlorog3v0000020;Name=NODE_65561_length_799_cov_134.377975.path1;Type=protein_coding
2 ID=Ljchlorog3v0000020.1;Parent=Ljchlorog3v0000020;Name=Ljchlorog3v0000020.1;Type=protein_coding
3 ID=Ljchlorog3v0000020.1.exon.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
4 ID=Ljchlorog3v0000020.1.CDS.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
5 ID=Ljchlorog3v0000030;Name=NODE_120091_length_55_cov_10.472727.path1;Type=NULL
6 ID=Ljchlorog3v0000030.1;Parent=Ljchlorog3v0000030;Name=Ljchlorog3v0000030.1;Type=NULL

ADD REPLY • link 6.3 years ago egbastakis • 0

0

Entering edit mode

Hi James,

Many thanks, it did work great with your advise.

Nevertheless, I stuck again little bit later...

After I executed the annotatePeak command this is what I got:

-----------------------------------------

History session starts:

> peakAnno <- annotatePeak(files[[1]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="MeSH.Miy.eg.db")

>> loading peak file... 2018-09-28 11:09:22

>> preparing features information... 2018-09-28 11:09:23

>> identifying nearest features... 2018-09-28 11:09:24

>> calculating distance from peak to TSS... 2018-09-28 11:09:25

>> assigning genomic annotation... 2018-09-28 11:09:25

>> adding gene annotation... 2018-09-28 11:09:35

>> assigning chromosome lengths 2018-09-28 11:09:35

>> done... 2018-09-28 11:09:35

Warning message:

In annotatePeak(files[[1]], tssRegion = c(-3000, 3000), TxDb = txdb, :

Unknown ID type, gene annotation will not be added...

> plotAnnoPie(peakAnno)

> plotAnnoBar(peakAnno)

> plotAnnoPie(peakAnno)

> vennpie(peakAnno)

> upsetplot(peakAnno)

> upsetplot(peakAnno, vennpie=TRUE)

History session ends:

--------------------------------------------------

So, I have gotten this a warning message, which actually tells that I have no list of my annotated peaks!

Though, I was able afterwords to execute several commands (), in order to produce the corresponding graphs.

I do need this list as a csv file to see the affiliation of my peaks with the corresponding genes but also for the continuation of my analysis as well (when I start using the package ''ReactomePA'' and the command

>pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId) from of the ChIPseeker manual till the end of it).

Could you please help me with that issue as well?

Greetings,

Manolis

P.S.: Please see below part of the gff3, which I have used for my annotation:

V1 V2 V3 V4 V5 V6 V7 V8
1 Lj3.0_chrC protein_coding gene 519 1371 . . .
2 Lj3.0_chrC protein_coding mRNA 519 1371 . - .
3 Lj3.0_chrC protein_coding exon 519 1371 100 - .
4 Lj3.0_chrC protein_coding CDS 520 1365 . - 0
5 Lj3.0_chrC processed_transcript gene 1323 1431 . . .
6 Lj3.0_chrC processed_transcript mRNA 1323 1431 . + .
V9
1 ID=Ljchlorog3v0000020;Name=NODE_65561_length_799_cov_134.377975.path1;Type=protein_coding
2 ID=Ljchlorog3v0000020.1;Parent=Ljchlorog3v0000020;Name=Ljchlorog3v0000020.1;Type=protein_coding
3 ID=Ljchlorog3v0000020.1.exon.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
4 ID=Ljchlorog3v0000020.1.CDS.1;Parent=Ljchlorog3v0000020.1;Type=protein_coding
5 ID=Ljchlorog3v0000030;Name=NODE_120091_length_55_cov_10.472727.path1;Type=NULL
6 ID=Ljchlorog3v0000030.1;Parent=Ljchlorog3v0000030;Name=Ljchlorog3v0000030.1;Type=NULL

ADD REPLY • link 6.3 years ago egbastakis • 0