ChIPpeakAnno, MACS format annotation
@khademul-islam-3826

Hi,

I have just installed the latest ChIPpeakAnno and tried the example code and data, but got an error. I get the same error with my own data. How can I solve this?

Another question: when it annotates peaks to the nearest TSS, does it use the summit or the start position from the MACS file?


https://bioconductor.org/packages/devel/bioc/vignettes/ChIPpeakAnno/inst/doc/ChIPpeakAnno.html

macs <- system.file("extdata", "MACS_peaks.xls", package="ChIPpeakAnno")

macsOutput <- toGRanges(macs, format="MACS")

duplicated or NA names found. Rename all the names by numbers.

Many thanks,

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 24 (Workstation Edition)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils   
 [8] datasets  methods   base    

other attached packages:
 [1] EnsDb.Hsapiens.v75_2.1.0 ensembldb_1.6.2          GenomicFeatures_1.26.0 
 [4] AnnotationDbi_1.36.0     Biobase_2.34.0           ChIPpeakAnno_3.8.9     
 [7] VennDiagram_1.6.17       futile.logger_1.4.3      GenomicRanges_1.26.1   
[10] GenomeInfoDb_1.10.1      Biostrings_2.42.1        XVector_0.14.0         
[13] IRanges_2.8.1            S4Vectors_0.12.1         BiocGenerics_0.20.0    

Ou, Jianhong
@ou-jianhong-4539

Hi,

Thanks for selecting ChIPpeakAnno as your annotation tool.

Regarding your first question: that is a warning, not an error, and I am considering changing it to a message. It tells you that the function could not find peak names, or that some peak names are duplicated. In that case toGRanges automatically assigns a name to each peak.
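A minimal base-R sketch of the renaming behaviour described above, assuming peak names arrive as a plain character vector. The helper rename_peaks_if_needed is hypothetical and is not ChIPpeakAnno's internal code; it only mirrors the outcome the warning text describes (all names replaced by numbers when any name is NA or duplicated).

```r
# Hypothetical helper, not part of ChIPpeakAnno: mimics the behaviour the
# warning describes for a vector of peak names.
rename_peaks_if_needed <- function(peak_names) {
  if (anyNA(peak_names) || anyDuplicated(peak_names) > 0) {
    warning("duplicated or NA names found. Rename all the names by numbers.")
    peak_names <- as.character(seq_along(peak_names))
  }
  peak_names
}

# Two peaks share the name "MACS_peak_1", so every name is replaced
ids <- suppressWarnings(
  rename_peaks_if_needed(c("MACS_peak_1", "MACS_peak_1", "MACS_peak_3"))
)
print(ids)
# [1] "1" "2" "3"
```

Because the replacement names are simply 1, 2, 3, ..., the warning is harmless unless you rely on the original peak IDs downstream.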

When annotating to the nearest TSS, it uses the peak start position for the distance calculation by default.
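To show why the choice of peak position matters, here is a base-R sketch of a strand-aware distance from a peak to a gene TSS, using either the peak start or the peak middle. The helper distance_to_tss and its sign convention (negative means upstream of the TSS) are assumptions for illustration only; ChIPpeakAnno computes its own distances. In recent versions, annotatePeakInBatch has a PeakLocForDistance argument for choosing the peak location; check ?annotatePeakInBatch for the options your version supports.

```r
# Hypothetical helper (not ChIPpeakAnno code): signed distance from a chosen
# peak position to a gene's TSS. For "+" genes the TSS is the gene start, for
# "-" genes it is the gene end. Negative values mean the peak position lies
# upstream of the TSS in the gene's orientation.
distance_to_tss <- function(peak_start, peak_end, gene_start, gene_end, strand,
                            use = c("start", "middle")) {
  use <- match.arg(use)
  peak_pos <- if (use == "start") peak_start else (peak_start + peak_end) %/% 2
  tss <- ifelse(strand == "+", gene_start, gene_end)
  ifelse(strand == "+", peak_pos - tss, tss - peak_pos)
}

# A peak spanning 1000-1400 next to a "+" gene that starts at 1500
distance_to_tss(1000, 1400, 1500, 3000, "+", use = "start")   # -500
distance_to_tss(1000, 1400, 1500, 3000, "+", use = "middle")  # -300
```

A summit-based distance would follow the same idea, with the summit coordinate used as peak_pos.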

Let me know if you still have any questions.


Hi,

I am trying to make a custom annotation file to use with ChIPpeakAnno. I am starting with an Ensembl GTF file. The following command gives the error: duplicated or NA names found. Rename all the names by numbers.

annoData <- toGRanges(gff, format="GFF")

Which part of the GTF file does it not like?

If I run annotatePeakInBatch using this file:

annotatedPeak <- annotatePeakInBatch(myPeakList=peaks, AnnotationData=annoData, ignore.strand=TRUE)

I get the error:

Error in rownames<-(tmp, value = c("(-73.9,5e+03]", "(5e+03,9.99e+03]", : invalid rownames length
In addition: Warning message:
In annotatePeakInBatch(myPeakList = peaks, AnnotationData = annoData, : not all the seqnames of myPeakList is in the AnnotationData.

Could someone please explain what this means and what I need to change?

Thank you!
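The warning (not the rownames error) means that some chromosome names in the peak list do not occur in the annotation; naming-style differences such as "chr1" versus "1", or scaffolds missing from the annotation, are common causes. A quick base-R check is sketched below; the helper missing_seqnames and the example names are hypothetical. With GRanges objects the inputs could be seqlevels(peaks) and seqlevels(annoData), and GenomeInfoDb's seqlevelsStyle() can switch naming styles.

```r
# Hypothetical helper: chromosome names used by the peaks but absent from
# the annotation.
missing_seqnames <- function(peak_seqnames, anno_seqnames) {
  setdiff(unique(peak_seqnames), unique(anno_seqnames))
}

peak_chr <- c("chr1", "chr2", "chrM")  # UCSC-style names (illustrative)
anno_chr <- c("1", "2", "MT")          # Ensembl-style names (illustrative)

# All three peak chromosomes are reported as missing
missing_seqnames(peak_chr, anno_chr)

# After adding the "chr" prefix, only "chrM" remains: Ensembl calls it "MT"
missing_seqnames(peak_chr, paste0("chr", anno_chr))
```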

Julie Zhu
@julie-zhu-3596

Lucy,

You mentioned that you downloaded the annotation file in GTF format from Ensembl. If this is correct, toGRanges with format = "GFF" is not appropriate, since the GTF format differs from the GFF format. If you prefer not to change your code, you could download the annotation file in GFF format instead. Alternatively, if you are interested in human gene annotation, you can use the following code to get the annotation.

library(EnsDb.Hsapiens.v86)
annoData <- toGRanges(EnsDb.Hsapiens.v86, feature="gene")
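Whatever toGRanges does internally, GTF and GFF3 are not interchangeable at the text level: in GTF (GFF2-style) the ninth column holds key "value"; pairs, whereas GFF3 uses key=value pairs separated by semicolons. The base-R sketch below, with hypothetical helpers parse_gtf_attributes and parse_gff3_attributes, parses each style and shows that a GFF3-style parser finds no values in a GTF attribute string. The GFF3 string is illustrative, only modelled on Ensembl's GFF3 output.

```r
# Hypothetical helper: split a GTF attribute column into named values.
# GTF stores pairs as: key "value"; key "value"; ...
parse_gtf_attributes <- function(x) {
  fields <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]
  keys <- sub(" .*$", "", fields)
  values <- gsub('"', "", sub("^[^ ]+ ", "", fields), fixed = TRUE)
  setNames(values, keys)
}

# Hypothetical helper: split a GFF3 attribute column into named values.
# GFF3 stores pairs as key=value;key=value (percent-encoding is not decoded here)
parse_gff3_attributes <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  fields <- fields[nzchar(fields)]
  kv <- strsplit(fields, "=", fixed = TRUE)
  setNames(vapply(kv, `[`, character(1), 2), vapply(kv, `[`, character(1), 1))
}

gtf_attr <- 'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1";'
gff3_attr <- "ID=gene:ENSG00000223972;Name=DDX11L1;version=5"  # illustrative

parse_gtf_attributes(gtf_attr)[["gene_name"]]   # "DDX11L1"
parse_gff3_attributes(gff3_attr)[["Name"]]      # "DDX11L1"
# The GFF3-style parser finds no key=value pairs in a GTF string
all(is.na(parse_gff3_attributes(gtf_attr)))     # TRUE
```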

Best regards, Julie


Thank you Julie.

I wasn't sure whether I could use the GFF option as Ensembl states that "The GTF (General Transfer Format) is identical to GFF version 2" https://www.ensembl.org/info/website/upload/gff.html

I have a matched RNA-seq dataset for which I used the Ensembl GTF file for annotation, so I would like to use the exact same annotation version for my peak data. If I download the equivalent GFF file, does this contain all of the same information as the GTF file?


Lucy,

Thanks for the clarification!

Could you please post a few lines of the gtf annotation you used for analyzing your RNA-seq dataset? Thanks!

Best regards,

Julie

#!genome-build GRCh38.p12
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.27
#!genebuild-last-updated 2018-07
chr1    havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
chr1    havana  transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    11869   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    12613   12721   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  exon    13221   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
chr1    havana  transcript  12010   13670   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; tag "basic"; transcript_support_level "NA";
chr1    havana  exon    12010   12057   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
chr1    havana  exon    12179   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001671638"; exon_version "2"; tag "basic"; transcript_support_level "NA";

Sorry, that isn't very easy to read! I would be happy to send you the file if that is easier.
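If installing refGenome is not an option, a base-R sketch like the one below can pull gene-level records out of a GTF file. It is an illustration under stated assumptions, not the approach recommended in this thread: it assumes a standard nine-column, tab-separated GTF whose attribute column uses key "value"; pairs, and the helper get_attr is hypothetical. The output column names (names, gene_name, space, start, end, strand) were chosen to match the colNames used with toGRanges(format="others") in this thread.

```r
# Two records modelled on the lines above (attributes abridged); real GTF
# files separate the nine columns with tabs.
gtf_lines <- c(
  paste("chr1", "havana", "gene", 11869, 14409, ".", "+", ".",
        'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1";',
        sep = "\t"),
  paste("chr1", "havana", "transcript", 11869, 14409, ".", "+", ".",
        'gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1";',
        sep = "\t")
)

# quote = "" keeps the double quotes inside the attribute column intact;
# comment.char = "#" skips header lines such as "#!genome-build GRCh38.p12"
gtf <- read.table(text = gtf_lines, sep = "\t", quote = "", comment.char = "#",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attribute"))

# Hypothetical helper: pull the value of one key out of the attribute column
get_attr <- function(attr, key) {
  pattern <- paste0('(^|.*; *)', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attr, perl = TRUE),
         sub(pattern, "\\2", attr, perl = TRUE), NA_character_)
}

# Keep gene records only and build one row per gene
genes <- gtf[gtf$feature == "gene", ]
gene_table <- data.frame(names = get_attr(genes$attribute, "gene_id"),
                         gene_name = get_attr(genes$attribute, "gene_name"),
                         space = genes$seqname,
                         start = genes$start,
                         end = genes$end,
                         strand = genes$strand,
                         stringsAsFactors = FALSE)
print(gene_table)
```

For a real file, read.table(file = ..., ...) would replace the text argument; on a full genome GTF this is slow, which is one reason to prefer a dedicated importer.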


Lucy,

Please send me the GTF file (julie.zhu@umassmed.edu). Thanks!

By the way, I just noticed that you continued an old thread about the MACS format. Could you please start a new thread titled "ChIPpeakAnno::toGRanges GTF format" instead, to facilitate future searches? Thanks!

Best,

Julie

Julie Zhu
@julie-zhu-3596

Lucy,

Please try the following code snippet to import the GTF file hg38_200000.gtf:

library(ChIPpeakAnno)
library(refGenome)

# Read the GTF file into an ensemblGenome object
gtf = ensemblGenome()

read.gtf(gtf, filename = "hg38_200000.gtf")

# Keep the gene-level columns needed for annotation
genes = gtf@ev$genes[ ,c("geneid","genename", "start", "end", "strand", "seqid")]

# Map these columns to the names toGRanges expects; "space" holds the chromosome
annoData <- toGRanges(genes, format="others", colNames=c("names", "gene_name", "start", "end", "strand", "space"))

# Convert the peaks file to a GRanges object
peaks <- toGRanges("peaks_counts.bed", format="BED", header=FALSE)

# Drop zero-width peaks before annotation
peaks <- peaks[width(peaks) > 0]

annotatedPeak <- annotatePeakInBatch(myPeakList=peaks, AnnotationData=annoData, ignore.strand=TRUE)
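One likely reason for the peaks[width(peaks) > 0] step: BED intervals are 0-based and half-open, so a record whose start equals its end covers no bases. Assuming the import shifts the BED start by one (the usual convention when moving to 1-based GRanges coordinates), such a record ends up with width 0. The base-R sketch below, with a hypothetical bed_to_one_based helper and made-up intervals, shows the conversion and the filter.

```r
# Hypothetical helper: convert 0-based, half-open BED intervals to 1-based,
# closed coordinates and keep only intervals that cover at least one base.
bed_to_one_based <- function(bed) {
  out <- data.frame(seqname = bed$chrom,
                    start = bed$chromStart + 1L,
                    end = bed$chromEnd,
                    stringsAsFactors = FALSE)
  out$width <- out$end - out$start + 1L
  out[out$width > 0, , drop = FALSE]
}

bed <- data.frame(chrom = c("chr1", "chr1"),
                  chromStart = c(100L, 500L),
                  chromEnd = c(250L, 500L),   # second record has zero length
                  stringsAsFactors = FALSE)
bed_to_one_based(bed)   # one row: chr1, 101-250, width 150
```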

Best regards, Julie
