I need to extract the range coordinates from a VCF file. I've been working with a very large VCF file and have filtered it on several parameters, so that it is now a refined dataset. However, now that I want to extract the coordinates of the deletions, I find that the IRanges show up with a width of 1, almost as though they were SNPs rather than large deletion events. How can I get IRanges to recognize the true end coordinate of each deletion?
Here is a sample from the data; my session info is shown below.
> rowData(delly.no11.depth10_50.377_5044.precise.pass)
GRanges object with 373 ranges and 5 metadata columns:
seqnames ranges strand | paramRangeID REF ALT QUAL FILTER
<Rle> <IRanges> <Rle> | <factor> <DNAStringSet> <CharacterList> <numeric> <character>
DEL00119550 1 [ 9903713, 9903713] * | <NA> N <DEL> <NA> PASS
DEL00139228 1 [ 11524865, 11524865] * | <NA> N <DEL> <NA> PASS
DEL00085052 1 [ 20398921, 20398921] * | <NA> N <DEL> <NA> PASS
DEL00051725 1 [117858481, 117858481] * | <NA> N <DEL> <NA> PASS
DEL00033442 1 [130125517, 130125517] * | <NA> N <DEL> <NA> PASS
-------
seqinfo: 33 sequences from the genome; no seqlengths
> info(delly.no11.depth10_50.377_5044.precise.pass)
DataFrame with 373 rows and 15 columns
CIEND CIPOS CHR2 END PE MAPQ SR SRQ
<IntegerList> <IntegerList> <character> <integer> <integer> <integer> <integer> <numeric>
DEL00119550 -18,18 -18,18 1 9905275 12 60 3 0.975248
DEL00139228 -121,121 -121,121 1 11525275 17 45 7 0.897321
DEL00085052 -10,10 -10,10 1 20399442 28 60 6 1.000000
DEL00051725 -8,8 -8,8 1 117861275 20 60 2 0.974227
DEL00033442 -103,103 -103,103 1 130126131 23 28 11 0.938710
CT IMPRECISE PRECISE SVLEN SVTYPE SVMETHOD
<character> <logical> <logical> <integer> <character> <character>
DEL00119550 3to5 FALSE TRUE 1562 DEL EMBL.DELLYv0.5.6
DEL00139228 3to5 FALSE TRUE 410 DEL EMBL.DELLYv0.5.6
DEL00085052 3to5 FALSE TRUE 521 DEL EMBL.DELLYv0.5.6
DEL00051725 3to5 FALSE TRUE 2794 DEL EMBL.DELLYv0.5.6
DEL00033442 3to5 FALSE TRUE 614 DEL EMBL.DELLYv0.5.6
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.18.2 AnnotationDbi_1.28.1 Biobase_2.26.0 ggplot2_1.0.0
[5] VariantAnnotation_1.12.4 Rsamtools_1.18.2 GenomicRanges_1.18.3 GenomeInfoDb_1.2.3
[9] Biostrings_2.34.0 XVector_0.6.0 IRanges_2.0.0 S4Vectors_0.4.0
[13] BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0 biomaRt_2.22.0
[6] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.0 checkmate_1.5.0 codetools_0.2-9
[11] colorspace_1.2-4 DBI_0.3.1 digest_0.6.4 fail_1.2 foreach_1.4.2
[16] GenomicAlignments_1.2.1 grid_3.1.2 gtable_0.1.2 iterators_1.0.7 MASS_7.3-35
[21] munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.3 RCurl_1.95-4.3
[26] reshape2_1.4 RSQLite_1.0.0 rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1
[31] stringr_0.6.2 tools_3.1.2 XML_3.98-1.1 zlibbioc_1.12.0
Can you also show the code that you used to create/import the SV calls as a VCF in R?
Hi Tiffanie, I see deletions of width 1 in your VCF object. Why do you think the end coordinates are wrong and need to be corrected? What do you mean by "true end coordinate of the deletion"? Thanks. H.
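For reference, the info() output in the question has an END column, while the ranges in rowData() only hold the start position (POS). One possible way to get full-length ranges is to pair the two. The sketch below is a minimal illustration, not a confirmed answer from this thread: it hard-codes the first three deletions from the output above so that it runs on its own, and the commented line showing how this might be applied to the poster's VCF object (called `vcf` here, a hypothetical name) is an assumption.

```r
library(GenomicRanges)  # also attaches IRanges

# Start positions (from rowData) and END values (from info) copied from the
# first three deletions shown in the question.
ids    <- c("DEL00119550", "DEL00139228", "DEL00085052")
starts <- c(9903713L, 11524865L, 20398921L)
ends   <- c(9905275L, 11525275L, 20399442L)

# Build ranges spanning start..END instead of the width-1 ranges.
del <- GRanges(seqnames = "1",
               ranges   = IRanges(start = starts, end = ends, names = ids),
               strand   = "*")

# Widths are END - start + 1.
width(del)

# Assumption: on the actual object the same idea might look like
#   rd  <- rowData(vcf)
#   del <- GRanges(seqnames(rd), IRanges(start(rd), info(vcf)$END))
```

With these values the widths come out as 1563, 411 and 522, i.e. one more than the SVLEN values reported by DELLY (1562, 410, 521), because the ranges are inclusive at both ends.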