VariantAnnotation - MatrixToSnpMatrix - only returns NAs
@lavinia-gordon-2959
Last seen 10.3 years ago
Hi,

I have just started working with VCF files and have discovered the VariantAnnotation package, many thanks for making these functions available.

Following the code outlined in the reference manual for MatrixToSnpMatrix, my VCF returns only NA values:

> head(geno(vcf)$GT)
           GHS008 GHS015 GHS025 GHS026 GHS027 GHS031 GHS033 GHS034 GHS036
chrM:73    "1/1"  "0/0"  "1/1"  "0/0"  "0/0"  "1/1"  "0/0"  "0/0"  "0/0"
chrM:119   "0/0"  "0/0"  "0/0"  "1/1"  "1/1"  "0/0"  "0/0"  "0/0"  "0/0"
rs72619361 "0/0"  "1/1"  "0/0"  "0/0"  "0/0"  "0/0"  "1/1"  "1/1"  "1/1"
chrM:150   "1/1"  "1/1"  "1/1"  "1/1"  "1/1"  "1/1"  "1/1"  "1/1"  "1/1"
chrM:189   "0/0"  "0/0"  "0/0"  "1/1"  "1/1"  "0/0"  "0/0"  "0/0"  "0/0"
chrM:195   "1/1"  "1/1"  "1/1"  "0/0"  "0/0"  "1/1"  "1/1"  "1/1"  "1/1"

> head(t(as(mat$genotype, "character")))
           GHS008 GHS015 GHS025 GHS026 GHS027 GHS031 GHS033 GHS034 GHS036
chrM:73    "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"
chrM:119   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"
rs72619361 "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"
chrM:150   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"
chrM:189   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"
chrM:195   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"   "NA"

I have run the reference manual code with the supplied VCF and it all looks good. I have no reason to suspect that there is anything wrong with my VCF. Could anyone give me any tips as to how I can troubleshoot this and work out why all the NAs are appearing?

Many thanks,

Lavinia Gordon
Senior Research Officer
Quantitative Sciences Core, Bioinformatics
Murdoch Childrens Research Institute
The Royal Children's Hospital
Flemington Road Parkville Victoria 3052 Australia
T 03 8341 6221
www.mcri.edu.au

> vcf
class: VCF
dim: 4665545 9
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(19): AC AF ... SB EFF
geno(5): AD DP GQ GT PL
rownames(4665545): chrM:73 chrM:119 ... chrUn_gl000249:14244 chrUn_gl000249:16222
rowData values names(1): paramRangeID
colnames(9): GHS008 GHS015 ... GHS034 GHS036
colData names(1): Samples

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
 [1] snpStats_1.8.1          Matrix_1.0-10        lattice_0.20-13
 [4] survival_2.37-2         VariantAnnotation_1.4.6 Rsamtools_1.10.2
 [7] Biostrings_2.26.2       GenomicRanges_1.10.6 IRanges_1.16.4
[10] BiocGenerics_0.4.0      BiocInstaller_1.8.3

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.3   Biobase_2.18.0  biomaRt_2.14.0
 [4] bitops_1.0-5           BSgenome_1.26.1 DBI_0.2-5
 [7] GenomicFeatures_1.10.1 grid_2.15.2     parallel_2.15.2
[10] RCurl_1.95-3           RSQLite_0.11.2  rtracklayer_1.18.2
[13] stats4_2.15.2          tools_2.15.2    XML_3.95-0.1
[16] zlibbioc_1.4.0

This email has been scanned by the Symantec Email Security.cloud service. For more information please visit http://www.symanteccloud.com
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Lavinia,

If you can use the development branch, MatrixToSnpMatrix() has been replaced there by genotypeToSnpMatrix(), which is a much more full-featured and robust function. However, if you are using the release branch you still need to work with MatrixToSnpMatrix(). If this is the case, please read the man page at

?MatrixToSnpMatrix

This page outlines the cases for which the values will be NA. You should be seeing warnings such as 'only diploid calls are included', 'only single nucleotide variants are included' or 'variants with >1 ALT allele are set to NA'. If you are not seeing such warnings, please send me a small sample of your VCF so I can reproduce this problem.

Valerie
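One way to check whether the three documented NA conditions could plausibly account for every row is to flag them per variant before converting. The following is only a rough base-R sketch on made-up data: `ref`, `alt` and `calls` are toy stand-ins for ref(vcf), alt(vcf) and geno(vcf)$GT, and the checks are an approximation of the conditions named in the warnings, not the package's own logic.

```r
# Toy stand-ins for ref(vcf), alt(vcf) and geno(vcf)$GT; all values are invented.
ref   <- c(v1 = "A", v2 = "AT", v3 = "G", v4 = "C")
alt   <- list(v1 = "G", v2 = "A", v3 = c("A", "T"), v4 = "T")
calls <- matrix(c("0/1", "0/0",
                  "0/1", "1/1",
                  "1/2", "0/1",
                  "1",   "0/1"),
                nrow = 4, byrow = TRUE,
                dimnames = list(names(ref), c("s1", "s2")))

# More than one ALT allele at the site.
multi_alt <- vapply(alt, length, integer(1)) > 1

# Single nucleotide variant: REF and every ALT allele are one base long.
snv <- nchar(ref) == 1 &
  vapply(alt, function(a) all(nchar(a) == 1), logical(1))

# Diploid call: two alleles separated by "/" (unphased) or "|" (phased).
diploid <- apply(calls, 1, function(g) all(grepl("^[0-9.]+[/|][0-9.]+$", g)))

reasons <- data.frame(multi_alt   = multi_alt,
                      not_snv     = !snv,
                      not_diploid = !diploid)
reasons$expect_na <- multi_alt | !snv | !diploid
print(reasons)
```

If most rows come back with `expect_na` equal to FALSE and the converted matrix is still entirely NA, the documented conditions alone do not explain the result.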
Hi Valerie,

Thank you for your reply. I'll investigate the development branch option. I did see the warnings; however, I find it hard to believe that they should apply to every single entry in my VCF.

With regards,
Lavinia Gordon
@lavinia-gordon-2959
Last seen 10.3 years ago
Hi Valerie,

Just one other thought, is it possible that MatrixToSnpMatrix() cannot work with unphased data? My VCF is unphased. If I change the file VariantAnnotation\extdata\chr22.vcf to unphased that gives all NA values.

> vcf <- readVcf("chr22.vcf", "hg19")
> calls <- geno(vcf)$GT
> a0 <- ref(vcf)
> a1 <- alt(vcf)
> mat <- MatrixToSnpMatrix(calls, a0, a1)
> head(t(as(mat$genotype, "character")))
            HG00096 HG00097 HG00099 HG00100 HG00101
rs7410291   "NA"    "NA"    "NA"    "NA"    "NA"
rs147922003 "NA"    "NA"    "NA"    "NA"    "NA"
rs114143073 "NA"    "NA"    "NA"    "NA"    "NA"
rs141778433 "NA"    "NA"    "NA"    "NA"    "NA"
rs182170314 "NA"    "NA"    "NA"    "NA"    "NA"
rs115145310 "NA"    "NA"    "NA"    "NA"    "NA"

With regards,
Lavinia Gordon
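A quick way to confirm which kind of calls a file contains is to tabulate the allele separator in the GT strings: "/" marks an unphased call and "|" a phased one in the VCF format. This is a small base-R sketch on invented genotypes; `classify_phase` is a hypothetical helper, and `gt` stands in for geno(vcf)$GT.

```r
# Invented genotype strings standing in for geno(vcf)$GT.
gt <- matrix(c("0/0", "1/1", "0|1", "1|1", "./.", "1"),
             nrow = 2,
             dimnames = list(c("v1", "v2"), c("s1", "s2", "s3")))

# Hypothetical helper: label each call by its allele separator.
classify_phase <- function(gt) {
  out <- rep("other", length(gt))
  out[grepl("/", gt, fixed = TRUE)] <- "unphased"
  out[grepl("|", gt, fixed = TRUE)] <- "phased"
  out
}

phase_counts <- table(classify_phase(gt))
print(phase_counts)
```

On a real object the same call would be `table(classify_phase(geno(vcf)$GT))`; a table dominated by "unphased" would match the situation described in this post.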
Hello,

On 01/23/2013 06:29 PM, Lavinia Gordon wrote:
> Just one other thought, is it possible that MatrixToSnpMatrix() cannot work with unphased data?

Yes, you are correct. This is my oversight. The man page states that 'no distinction is made between phased and unphased genotypes' but there certainly is one. Thanks for reporting this. This is now fixed in release version 1.4.7, which should be available through biocLite() Friday ~9am PST.

The better solution would still be to use genotypeToSnpMatrix(). We've actually deprecated MatrixToSnpMatrix() in devel because of its shortcomings. You might also check out snpSummary() in devel. These are new contributions and we'd be interested in any feedback. Credit for these functions goes to Stephanie Gogarten and Chris Wallace.

Thanks,
Valerie
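To illustrate what "no distinction between phased and unphased genotypes" would mean in practice, here is a base-R sketch that counts ALT alleles for biallelic diploid calls while ignoring the separator. `count_alt` is a hypothetical helper written for this illustration; it is not how VariantAnnotation implements the conversion, and it returns NA for anything that is not a two-allele call made of 0 and 1.

```r
# Hypothetical helper: number of ALT alleles (0, 1 or 2) per call,
# treating "/" (unphased) and "|" (phased) identically.
count_alt <- function(gt) {
  alleles <- strsplit(as.vector(gt), "[/|]")
  n <- vapply(alleles, function(a) {
    # Anything other than two alleles drawn from {0, 1} is set to NA.
    if (length(a) != 2 || !all(a %in% c("0", "1"))) return(NA_integer_)
    sum(a == "1")
  }, integer(1))
  # Keep the variants-by-samples layout when a matrix is supplied.
  if (is.matrix(gt)) n <- matrix(n, nrow = nrow(gt), dimnames = dimnames(gt))
  n
}

# Invented calls mixing phased, unphased, missing and haploid genotypes.
example_calls <- c("0/0", "0|1", "1/0", "1|1", "./.", "1")
print(count_alt(example_calls))
```

Applied to a genotype matrix, the result keeps the original row and column names, so phased and unphased versions of the same file can be compared cell by cell.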
Many thanks for this Valerie. I will look at genoTypeToSnpMatrix. With regards, Lavinia Gordon Senior Research Officer Quantitative Sciences Core, Bioinformatics Murdoch Childrens Research Institute The Royal Children's Hospital Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221 www.mcri.edu.au -----Original Message----- From: Valerie Obenchain [mailto:vobencha@fhcrc.org] Sent: Friday, 25 January 2013 4:16 AM To: Lavinia Gordon Cc: bioconductor at r-project.org Subject: Re: [BioC] VariantAnnotation - MatrixToSnpMatrix - only returns NAs Hello, On 01/23/2013 06:29 PM, Lavinia Gordon wrote: > Hi Valerie, > Just one other thought, is it possible that MatrixToSnpMatrix() cannot work with unphased data? Yes, you are correct. This is my oversight. The man page states that 'no distinction is made between phased and unphased genotypes' but certainly there is. Thanks for reporting this. Now fixed in release version 1.4.7 which should be available through biocLite() Friday ~9am PST. The better solution would still be to use genoTypeToSnpMatrix(). We've actually deprecated MatrixToSnpMatrix() in devel because of it's shortcomings. You might also check out snpSummary() in devel. These are new contributions and we'd be interested in any feedback. Credit for these functions go to Stephanie Gogarten and Chris Wallace. Thanks, Valerie My VCF is unphased. If I change the file VariantAnnotation\extdata\chr22.vcf to unphased that gives all NA values. 
> >> vcf<- readVcf("chr22.vcf", "hg19") >> calls<- geno(vcf)$GT >> a0<- ref(vcf) >> a1<- alt(vcf) >> mat<- MatrixToSnpMatrix(calls, a0, a1) head(t(as(mat$genotype, >> "character"))) > HG00096 HG00097 HG00099 HG00100 HG00101 > rs7410291 "NA" "NA" "NA" "NA" "NA" > rs147922003 "NA" "NA" "NA" "NA" "NA" > rs114143073 "NA" "NA" "NA" "NA" "NA" > rs141778433 "NA" "NA" "NA" "NA" "NA" > rs182170314 "NA" "NA" "NA" "NA" "NA" > rs115145310 "NA" "NA" "NA" "NA" "NA" > > With regards, > > Lavinia Gordon > Senior Research Officer > Quantitative Sciences Core, Bioinformatics > > Murdoch Childrens Research Institute > The Royal Children's Hospital > Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221 > www.mcri.edu.au > > > -----Original Message----- > From: Lavinia Gordon > Sent: Thursday, 24 January 2013 8:47 AM > To: 'Valerie Obenchain' > Cc: bioconductor at r-project.org > Subject: RE: [BioC] VariantAnnotation - MatrixToSnpMatrix - only > returns NAs > > Hi Valerie, > > Thank you for your reply. I'll investigate the development branch option. > I did see the warnings however I find it hard to believe that these should apply to every single entry in my VCF. > With regards, > > Lavinia Gordon > Senior Research Officer > Quantitative Sciences Core, Bioinformatics > > Murdoch Childrens Research Institute > The Royal Children's Hospital > Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221 > www.mcri.edu.au > > -----Original Message----- > From: Valerie Obenchain [mailto:vobencha at fhcrc.org] > Sent: Wednesday, 23 January 2013 4:14 PM > To: Lavinia Gordon > Cc: bioconductor at r-project.org > Subject: Re: [BioC] VariantAnnotation - MatrixToSnpMatrix - only > returns NAs > > Hi Lavinia, > > If you can use the development branch MatrixToSnpMatrix() has been > replaced by genotypeToSnpMatrix(). This is much more full featured and > robust function. However if you are using the release branch you still > need to work with MatrixToSnpMatrix(). 
If this is the case, please > read the man page at > > ?MatrixToSnpMatrix > > This page outlines the cases for which the values will be NA. You should be seeing warnings such as 'only diploid calls are included', 'only single nucleotide variants are included' or 'variants with>1 ALT allele are set to NA'. If you are not seeing such warnings, please send me a small sample of your VCF so I can reproduce this problem. > > Valerie > > On 01/22/13 17:35, Lavinia Gordon wrote: >> Hi, I have just started working with VCF files and have discovered the VariantAnnotation package, many thanks for making these functions available. >> Following the code outlined in the reference manual for MatrixToSnpMatrix, my VCF returns only NA values: >>> head(geno(vcf)$GT) >> GHS008 GHS015 GHS025 GHS026 GHS027 GHS031 GHS033 GHS034 GHS036 >> chrM:73 "1/1" "0/0" "1/1" "0/0" "0/0" "1/1" "0/0" "0/0" "0/0" >> chrM:119 "0/0" "0/0" "0/0" "1/1" "1/1" "0/0" "0/0" "0/0" "0/0" >> rs72619361 "0/0" "1/1" "0/0" "0/0" "0/0" "0/0" "1/1" "1/1" "1/1" >> chrM:150 "1/1" "1/1" "1/1" "1/1" "1/1" "1/1" "1/1" "1/1" "1/1" >> chrM:189 "0/0" "0/0" "0/0" "1/1" "1/1" "0/0" "0/0" "0/0" "0/0" >> chrM:195 "1/1" "1/1" "1/1" "0/0" "0/0" "1/1" "1/1" "1/1" "1/1" >>> head(t(as(mat$genotype, "character"))) >> GHS008 GHS015 GHS025 GHS026 GHS027 GHS031 GHS033 GHS034 GHS036 >> chrM:73 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> chrM:119 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> rs72619361 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> chrM:150 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> chrM:189 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> chrM:195 "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" >> >> I have run the reference manual code with the supplied VCF and it all looks good. >> I have no reason to suspect that there is anything wrong with my VCF. >> Could anyone give me any tips as to how I can troubleshoot this and work out why all the NAs are appearing? 
>>
>> Many thanks,
>>
>> Lavinia Gordon
>>
>>> vcf
>> class: VCF
>> dim: 4665545 9
>> genome: hg19
>> exptData(1): header
>> fixed(4): REF ALT QUAL FILTER
>> info(19): AC AF ... SB EFF
>> geno(5): AD DP GQ GT PL
>> rownames(4665545): chrM:73 chrM:119 ... chrUn_gl000249:14244 chrUn_gl000249:16222
>> rowData values names(1): paramRangeID
>> colnames(9): GHS008 GHS015 ... GHS034 GHS036
>> colData names(1): Samples
>>
>>> sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] splines stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>>  [1] snpStats_1.8.1     Matrix_1.0-10           lattice_0.20-13
>>  [4] survival_2.37-2    VariantAnnotation_1.4.6 Rsamtools_1.10.2
>>  [7] Biostrings_2.26.2  GenomicRanges_1.10.6    IRanges_1.16.4
>> [10] BiocGenerics_0.4.0 BiocInstaller_1.8.3
>>
>> loaded via a namespace (and not attached):
>>  [1] AnnotationDbi_1.20.3   Biobase_2.18.0 biomaRt_2.14.0
>>  [4] bitops_1.0-5           BSgenome_1.26.1 DBI_0.2-5
>>  [7] GenomicFeatures_1.10.1 grid_2.15.2    parallel_2.15.2
>> [10] RCurl_1.95-3           RSQLite_0.11.2 rtracklayer_1.18.2
>> [13] stats4_2.15.2          tools_2.15.2   XML_3.95-0.1
>> [16] zlibbioc_1.4.0
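One way to check whether the warnings quoted in the reply above really could apply to every row is to tally, per variant, which of the three named conditions holds (non-diploid call, non-SNV, more than one ALT allele). The following is only a rough base-R sketch of that idea. `na_reason()` is a hypothetical helper, not part of VariantAnnotation, and it approximates the conditions as worded in the warnings rather than reproducing the package's actual logic. It assumes the genotypes are a plain character matrix, REF is a character vector, and ALT is a list of character vectors; converting the objects returned by `ref(vcf)` and `alt(vcf)` into that form is left out and would need to be done first.

```r
# Hypothetical helper (NOT a VariantAnnotation function): classify each
# variant by the first of the NA conditions named in the warnings.
na_reason <- function(gt, ref, alt) {
  # gt:  character matrix of GT calls, variants in rows, samples in columns
  # ref: character vector of REF alleles, one per variant
  # alt: list of character vectors of ALT alleles, one element per variant
  stopifnot(nrow(gt) == length(ref), length(ref) == length(alt))

  # More than one ALT allele at the site
  multi_alt <- vapply(alt, length, integer(1)) > 1

  # Single nucleotide variant: REF and every ALT allele are one base long
  snv <- nchar(ref) == 1 &
    vapply(alt, function(a) all(nchar(a) == 1), logical(1))

  # Diploid: every call in the row is two alleles separated by "/" or "|"
  diploid <- apply(gt, 1, function(calls) {
    all(grepl("^[0-9.]+[/|][0-9.]+$", calls))
  })

  reason <- ifelse(!diploid, "non-diploid",
            ifelse(multi_alt, "multi-ALT",
            ifelse(!snv, "non-SNV", "ok")))
  table(factor(reason, levels = c("ok", "non-diploid", "multi-ALT", "non-SNV")))
}

# Toy data, made up for illustration only
gt <- matrix(c("0/0", "1/1",
               "0/1", "1/1",
               "0/1", "0/2",
               "1",   "0"),
             ncol = 2, byrow = TRUE,
             dimnames = list(c("v1", "v2", "v3", "v4"), c("S1", "S2")))
ref <- c("A", "AT", "G", "C")
alt <- list("G", "A", c("A", "T"), "T")

res <- na_reason(gt, ref, alt)
print(res)
```

If a tally like this reports most variants as "ok" while the converted matrix is still entirely NA, that would suggest the NAs are not explained by the documented cases, which is the situation where a small sample of the VCF was requested for reproduction.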