Oh, thanks for this fix. I forgot to remove the chr*_random rows when
I loaded the CpG Island BED file into R.
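
For anyone else hitting this, a minimal sketch of dropping the chr*_random rows straight after loading. The small data frame here is a hypothetical stand-in for the table read from the BED file, and the column names are assumed to match the ones used in my script:

```r
# Hypothetical stand-in for the table read from the CpG island BED file.
cpgIslands <- data.frame(
  chr   = c("chr1", "chr1_random", "chr2", "chrX_random", "chrX"),
  start = c(100, 200, 300, 400, 500),
  end   = c(150, 250, 350, 450, 550),
  stringsAsFactors = FALSE
)

# Flag rows whose chromosome name ends in "_random" and drop them.
isRandom <- grepl("_random$", cpgIslands$chr)
cpgIslandsClean <- cpgIslands[!isRandom, ]
```
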
One more point, though. I found that after chromosome 1, the
annotated peaks and features were on different chromosomes in the
spreadsheet you sent me. I suppose this is because the CpG islands
file is in numeric order (chr1, chr2, chr3, ...), whereas the genes
file is in ASCII order (chr1, chr10, chr11, ...), and the overlaps are
merged by list position. It would be worth either stating this
requirement in the documentation (annotatePeakInBatch.Rd) or removing
the dependence on the two tables having the same chromosome ordering.
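
To illustrate the ordering mismatch, here is a minimal base R sketch. It is not how ChIPpeakAnno works internally (I am only guessing at that); the two data frames and the helper splitInFileOrder are hypothetical, and are only meant to show pairing per-chromosome lists by name instead of by position:

```r
# Hypothetical peaks in numeric chromosome order (chr1, chr2, chr10).
peaks <- data.frame(
  chr   = c("chr1", "chr2", "chr10"),
  start = c(100, 300, 500),
  end   = c(200, 400, 600),
  stringsAsFactors = FALSE
)
# Hypothetical features in ASCII chromosome order (chr1, chr10, chr2).
features <- data.frame(
  name  = c("geneA", "geneB", "geneC"),
  chr   = c("chr1", "chr10", "chr2"),
  start = c(50, 450, 250),
  end   = c(250, 650, 450),
  stringsAsFactors = FALSE
)

# Split a table by chromosome, keeping the order in which the
# chromosomes first appear in the table (illustrative helper).
splitInFileOrder <- function(tab) {
  split(tab, factor(tab$chr, levels = unique(tab$chr)))
}
peaksByChr    <- splitInFileOrder(peaks)
featuresByChr <- splitInFileOrder(features)

# Pairing by list position mixes chromosomes after chr1.
positionalMismatch <- names(peaksByChr) != names(featuresByChr)

# Pairing by name: index both lists with the shared chromosome names.
sharedChrs     <- intersect(names(peaksByChr), names(featuresByChr))
pairedPeaks    <- peaksByChr[sharedChrs]
pairedFeatures <- featuresByChr[sharedChrs]
```

With name-based indexing, the element for each chromosome in pairedPeaks lines up with the same chromosome in pairedFeatures, whatever order the input files were in.
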
- Dario.
---- Original message ----
>Date: Thu, 27 May 2010 14:26:12 -0400
>From: "Zhu, Julie" <julie.zhu at umassmed.edu>
>Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
>To: "D.Strbenac at garvan.org.au" <d.strbenac at garvan.org.au>,
> "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>
> Hi Dario,
>
> Thanks for the vigorous test of the new feature!
>
> The peak dataset contains chrX_random, which is not in
> the feature dataset. I added an is.na check on the
> strand, which should fix the problem. I have also attached
> the annotated dataset. Please let me know if you
> encounter any problems.
>
> Best regards,
>
> Julie
>
> On 5/26/10 11:00 PM, "Dario Strbenac"
> <d.strbenac at garvan.org.au> wrote:
>
> Hello,
>
> Yes, I encountered the same problem again. This
> time I tried the code on my full table of data.
> This is my script. All the files it refers to are
> web accessible, so that you can replicate it too.
> I am definitely using version 1.5.3 of the
> package.
>
> CpGIslandsTable <-
> read.table("http://129.94.136.7/file_dump/dario/hg18_CpG_Islands.bed",
> sep = '\t', stringsAsFactors = FALSE)
> genesTable <-
> read.csv("http://129.94.136.7/file_dump/dario/humanGenomeAnnotation.csv",
> stringsAsFactors = FALSE)
> colnames(CpGIslandsTable) <- c("chr", "start",
> "end", "name")
>
> peaksRangedData <- RangedData(space =
> CpGIslandsTable$chr, ranges = IRanges(start =
> CpGIslandsTable$start, end = CpGIslandsTable$end))
> featuresRangedData <- RangedData(name =
> genesTable$name, space = genesTable$chr, strand =
> genesTable$strand, ranges = IRanges(start =
> genesTable$start, end = genesTable$end))
> featureLoc <- "TSS"
>
> annotatePeakInBatch(peaksRangedData,
> AnnotationData = featuresRangedData,
> PeakLocForDistance = "middle")
>
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_Australia.1252
> LC_CTYPE=English_Australia.1252
> LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
> LC_TIME=English_Australia.1252
>
>
> attached base packages:
> [1] stats graphics grDevices utils
> datasets methods base
>
> other attached packages:
> [1] ChIPpeakAnno_1.5.3 limma_3.4.0 org.Hs.eg.db_2.4.1
> GO.db_2.4.1 RSQLite_0.9-0
> [6] DBI_0.2-5 AnnotationDbi_1.10.1
> BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
> GenomicRanges_1.0.1
> [11] Biostrings_2.16.0 IRanges_1.6.0 multtest_2.4.0
> Biobase_2.8.0 biomaRt_2.4.0
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-5 RCurl_1.3-1 splines_2.11.0
> survival_2.35-8 XML_2.8-1
>
> ---- Original message ----
> >Date: Mon, 24 May 2010 22:57:47 -0400
> >From: "Zhu, Julie" <julie.zhu at umassmed.edu>
> >Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
> >To: "D.Strbenac at garvan.org.au" <d.strbenac at garvan.org.au>,
> > "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> >
> > Hi Dario,
> >
> > Please download the dev version 1.5.3 of ChIPpeakAnno
> > and let me know if you encounter any problems.
> > Thanks!
> >
> > Best regards,
> >
> > Julie
> >
> > annotatePeakInBatch(peaksRangedData,
> > AnnotationData = featuresRangedData,
> > PeakLocForDistance = "middle")
> > RangedData with 6 rows and 9 value columns across 2 spaces
> >           space               ranges |        peak      strand     feature
> >     <character>            <IRanges> | <character> <character> <character>
> > 1 1        chr1 [ 2000010,  2000310] |           1           +           1
> > 2 2        chr1 [19000000, 19000300] |           2           -           2
> > 3 2        chr1 [30000000, 30000300] |           3           -           2
> > 4 4        chr2 [     300,      600] |           4           +           4
> > 6 6        chr2 [  100000,   100300] |           6           +           6
> > 5 5        chr2 [    5500,     5800] |           5           -           5
> >     start_position end_position insideFeature distancetoFeature
> >          <numeric>    <numeric>   <character>         <numeric>
> > 1 1          1e+06      2.0e+06    downstream           1000160
> > 2 2          1e+07      2.0e+07        inside            999850
> > 3 2          1e+07      2.0e+07      upstream         -10000150
> > 4 4          1e+03      5.0e+03      upstream              -550
> > 6 6          1e+04      1.5e+04    downstream             90150
> > 5 5          6e+03      7.0e+03    downstream              1350
> >     shortestDistance fromOverlappingOrNearest
> >            <numeric>              <character>
> > 1 1               10             NearestStart
> > 2 2           999700             NearestStart
> > 3 2         10000000             NearestStart
> > 4 4              400             NearestStart
> > 6 6            85000             NearestStart
> > 5 5              200             NearestStart
> >
> > > sessionInfo()
> > R version 2.11.0 (2010-04-22)
> > i386-apple-darwin9.8.0
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets
> > methods base
> >
> > other attached packages:
> > [1] ChIPpeakAnno_1.5.3 limma_3.4.0 org.Hs.eg.db_2.4.1
> > [4] GO.db_2.4.1 RSQLite_0.9-0 DBI_0.2-5
> > [7] AnnotationDbi_1.10.1 BSgenome.Ecoli.NCBI.20080805_1.3.16
> > BSgenome_1.16.1
> > [10] GenomicRanges_1.0.1 Biostrings_2.16.0 IRanges_1.6.1
> > [13] multtest_2.4.0 Biobase_2.8.0 biomaRt_2.4.0
> >
> >
> > On 5/24/10 5:10 AM, "Dario Strbenac"
> > <d.strbenac at garvan.org.au> wrote:
> >
> > Hello,
> >
> > I made another small example of using
> > annotatePeakInBatch to demonstrate to a friend, but it
> > has crashed. It's similar to the other example but
> > with different data. I'm not sure why it is
> > happening.
> >
> > Here is my small example:
> >
> > peaksT <- data.frame(chr = c("chr1", "chr1",
> > "chr1", "chr2", "chr2", "chr2"), start =
> > c(2000010, 19000000, 30000000, 300, 5500, 100000),
> > end = c(2000310, 19000300, 30000300, 600, 5800,
> > 100300))
> > featuresT <- data.frame(name = c("gene1", "gene2",
> > "gene3", "gene4", "gene5", "gene6"), chr =
> > c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
> > start = c(1000000, 10000000, 15000000, 1000, 6000,
> > 10000), end = c(2000000, 20000000, 22000000, 5000,
> > 7000, 15000), strand = c('+', '-', '+', '+', '-',
> > '+'))
> >
> > require(ChIPpeakAnno)
> >
> > peaksRangedData <- RangedData(space = peaksT$chr,
> > ranges = IRanges(start = peaksT$start, end =
> > peaksT$end))
> > featuresRangedData <- RangedData(name =
> > featuresT$name, space = featuresT$chr, strand =
> > featuresT$strand, ranges = IRanges(start =
> > featuresT$start, end = featuresT$end))
> > featureLoc <- "TSS"
> >
> > annotatePeakInBatch(peaksRangedData,
> > AnnotationData = featuresRangedData,
> > PeakLocForDistance = "middle")
> >
> > Error in if (as.character(r.n$strand[i]) == "1" ||
> > as.character(r.n$strand[i]) == :
> > missing value where TRUE/FALSE needed
> >
> > My sessionInfo is :
> >
> > R version 2.11.0 (2010-04-22)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
> > [5] LC_MONETARY=C LC_MESSAGES=en_AU.UTF-8
> > [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils
> > datasets methods base
> >
> > other attached packages:
> > [1] ChIPpeakAnno_1.5.2 limma_3.4.0
> > [3] org.Hs.eg.db_2.4.1 GO.db_2.4.1
> > [5] RSQLite_0.9-0 DBI_0.2-5
> > [7] AnnotationDbi_1.10.0 BSgenome.Ecoli.NCBI.20080805_1.3.16
> > [9] BSgenome_1.16.1 GenomicRanges_1.0.1
> > [11] Biostrings_2.16.0 IRanges_1.6.2
> > [13] multtest_2.4.0 Biobase_2.8.0
> > [15] biomaRt_2.4.0
> >
> > loaded via a namespace (and not attached):
> > [1] MASS_7.3-6 RCurl_1.4-2 splines_2.11.0 survival_2.35-8
> > [5] XML_3.1-0
> >
> > Thanks,
> > Dario.
> >
> > --------------------------------------
> > Dario Strbenac
> > Research Assistant
> > Cancer Epigenetics
> > Garvan Institute of Medical Research
> > Darlinghurst NSW 2010
> > Australia
> >
> >
>
>________________
>ForDarioStrbenac.xls (4489k bytes)
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia