Entering edit mode
On 07/16/2013 05:40 PM, Michael Lawrence wrote:
> The necessary update to VariantTools will propagate soon to devel.
Or you
> can grab it from svn.
This isn't building in devel; does it require something special with
gmapR? Dan
and I looked at this (??) for the conference, but were not quite able
to master
the automake foo required to get off the ground.
Martin
>
> Michael
>
>
> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> wrote:
>
>> In order to work through some of the code, I installed the devel
version
>> of R and updated all the packages. Now when I run tallyVariants() I
get the
>> following error message:****
>>
>> ** **
>>
>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
****
>>
>> object 'castList' not found****
>>
>> ** **
>>
>>> traceback()****
>>
>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)****
>>
>> 11: IRanges:::castList****
>>
>> 10: safe_mclapply(ind, function(i, ...) {****
>>
>> FUN(gr[i], ...)****
>>
>> }, ...)****
>>
>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)****
>>
>> 8: .local(x, ...)****
>>
>> 7: tallyVariants(x, tally.param)****
>>
>> 6: tallyVariants(x, tally.param)****
>>
>> 5: .local(x, ...)****
>>
>> 4: callVariants(BamFile(x), ...)****
>>
>> 3: callVariants(BamFile(x), ...)****
>>
>> 2: callVariants(destination, tally.param)****
>>
>> 1: callVariants(destination, tally.param)****
>>
>> ** **
>>
>> Perhaps this is something missing/changed in the devel version of
IRanges?
>> ****
>>
>> ** **
>>
>> Updated sessionInfo() below:****
>>
>> ** **
>>
>> R version 3.0.1 (2013-05-16)****
>>
>> Platform: x86_64-unknown-linux-gnu (64-bit)****
>>
>> ** **
>>
>> locale:****
>>
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ****
>>
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ****
>>
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ****
>>
>> [7] LC_PAPER=C LC_NAME=C ****
>>
>> [9] LC_ADDRESS=C LC_TELEPHONE=C ****
>>
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ****
>>
>> ** **
>>
>> attached base packages:****
>>
>> [1] grid parallel stats graphics grDevices utils
datasets *
>> ***
>>
>> [8] methods base ****
>>
>> ** **
>>
>> other attached packages:****
>>
>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24
>> lattice_0.20-15 ****
>>
>> [4] RColorBrewer_1.0-5 genoPlotR_0.8
>> ade4_1.5-2 ****
>>
>> [7] gmapR_1.3.2 VariantTools_1.3.2
>> VariantAnnotation_1.7.34****
>>
>> [10] Rsamtools_1.13.24 Biostrings_2.29.13
>> GenomicRanges_1.13.33 ****
>>
>> [13] XVector_0.1.0 IRanges_1.19.18
>> BiocGenerics_0.7.3 ****
>>
>> ** **
>>
>> loaded via a namespace (and not attached):****
>>
>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2
>> ****
>>
>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7
>> ****
>>
>> [7] GenomicFeatures_1.13.19 graph_1.39.3
>> Matrix_1.0-12 ****
>>
>> [10] RBGL_1.37.2 RCurl_1.95-4.1
>> RSQLite_0.11.4 ****
>>
>> [13] rtracklayer_1.20.2 stats4_3.0.1
>> tools_3.0.1 ****
>>
>> [16] XML_3.96-1.1 zlibbioc_1.6.0 ****
>>
>>> ** **
>>
>> ** **
>>
>> Thanks again,****
>>
>> Sean****
>>
>> ** **
>>
>> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
>> *Sent:* Saturday, July 13, 2013 1:59 PM
>> *To:* Taylor, Sean D
>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J;
>> bioconductor at r-project.org
>>
>> *Subject:* Re: [BioC] Filtering BAM files by start position for
>> VariantTools****
>>
>> ** **
>>
>> ** **
>>
>> ** **
>>
>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org="">
>> wrote:****
>>
>> Thanks Michael, ****
>>
>> This is an interesting idea. Usually, we resolve PCR/optical
duplicates
>> with the Picard MarkDuplicates command, which just chooses the read
with
>> the highest average quality. This approximates what you want, I
think. ***
>> *
>>
>> Is the Picard MarkDuplicates command run by default? Or is this a
separate
>> command from a different package?****
>>
>> ** **
>>
>> It's a separate package, basically the Java implementation of
samtools;
>> you might that first and see how it improves your error rates,
before
>> taking a more complicated approach.****
>>
>> Also, I find it hard to believe that sequence errors would be
causing
>> you trouble at 10%, as long as you're filtering for quality and
have decent
>> coverage. PCR might in limited cases, and Picard effectively takes
care of
>> that.****
>>
>> 10% is fine, but we have problems at 1% and lower. I think even the
>> defaults in tallyVariants() use 1% as a cutoff if I?m not
mistaken.****
>>
>> ****
>>
>> ** **
>>
>> It uses a ~4% cutoff. And yes, you'll start running into issues
around 1%.
>> Calling at such a low frequency is kind of ambitious.****
>>
>> ****
>>
>> What you want is to specify the cycleBreaks argument to
>> VariantTallyParam. It allows you to define bins by cycle. So if you
made
>> bins for all 100bp, you could do things like generate a consensus
by read
>> family. A family corresponding to position X would be cycle bin #1
at X,
>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the
highest
>> count. Might be tricky to implement efficiently for the whole
genome. Of
>> course, this assumes you've got single end reads, otherwise this
won't
>> work, because the mate is not considered.****
>>
>> Cool, I will try that out, that may be just what I was looking for.
It
>> might be tricky for the whole genome, but we are restricting
ourselves to
>> just the mitochondrial genome. Still gives us several thousand
start sites
>> to work from but should be easier than the whole genome. As for
single end
>> reads, can I just filter on the strand (i.e. ?+? or ?-?)? Or will I
have to
>> perform separate alignments for each of the paired reads?****
>>
>> ****
>>
>> ** **
>>
>> Now that I think about it, you should be fine just considering the
>> alignments individually (as if they were unpaired).****
>>
>> ****
>>
>> ** **
>>
>
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793