Question

Translating AB/BB/AA into a SNP with Illumina data

Lavinia Gordon ▴ 480

@lavinia-gordon-2959

Last seen 10.6 years ago

Dear all,

I am working with Illumina Human Omni1 Quad data. I only have access to processed data, e.g.:

ID_REF  VALUE  Score      Theta      R         B Allele Freq  Log R Ratio
200006  AB     0.8273118  0.4800678  2.651576  0.5337635      0.1516016

I would like to know what the SNP is at this position, and wondered whether any of the Bioconductor packages can deal with this data, taking into account the TOP/BOT strand approach that Illumina uses. I have previously had great success with crlmm, but that was working from the raw IDAT files.

With thanks for your time,

Lavinia Gordon
Senior Research Officer, Quantitative Sciences Core, Bioinformatics
Murdoch Childrens Research Institute

SNP crlmm • 5.0k views

updated 12.8 years ago by Stephanie M. Gogarten ▴ 890 • written 12.8 years ago by Lavinia Gordon ▴ 480

score 0 · Answer 1 · 2012-07-16

Stephanie M. Gogarten ▴ 890

@stephanie-m-gogarten-5121

Last seen 9 months ago

University of Washington

Hi Lavinia,

The GWASTools package was designed to work with this type of data.

You can download annotation for Illumina arrays from their website: https://icom.illumina.com/. They now require that you register with their site to download files. Once you have logged in, click "Downloads" in the menu on the left and then "Genotyping/LOH/CNV" in the menu on the right, and look for the Human Omni1 Quad link. The file that you want is called HumanOmni1-Quad_v1-0_H_csv.zip, and looks like this:

IlmnID,Name,IlmnStrand,SNP,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,GenomeBuild,Chr,MapInfo,Ploidy,Species,Source,SourceVersion,SourceStrand,SourceSeq,TopGenomicSeq,BeadSetID,Exp_Clusters,Intensity_Only,RefStrand
200006-0_T_R_1853021091,200006,TOP,[A/G],0060702346,AGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA,,,37.1,9,139926402,diploid,Homo sapiens,ILLUMINA,0,BOT,ACATGCCCCACTCAGCGCCACCCCCGTCCTCCCCTCCCAGGTTGCCTAGCTGTCCCCAGC[T/C]TGGGCCTCCCCGAGGGCCAGACACTCACCAGCATTATTCATCCACAGTCTCCCAGGATCA,TGATCCTGGGAGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA[A/G]GCTGGGGACAGCTAGGCAACCTGGGAGGGGAGGACGGGGGTGGCGCTGAGTGGGGCATGT,163,3,0,-

The "SNP" column tells you the A/B allele designation for a particular SNP (format [A/B]) and the "IlmnStrand" column tells you whether that SNP is on the TOP or BOT strand. (See here for a useful article on how to convert between different strand designations: http://www.sciencedirect.com/science/article/pii/S0168952512000704)

Stephanie Gogarten
Research Scientist, Biostatistics
University of Washington
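The A/B lookup described in this answer can be sketched in a few lines. This is only an illustration in Python, not a GWASTools function: the helper name `decode_ab` is hypothetical, and it assumes the SNP column always has the form `[A/B]` with single-nucleotide alleles, so other marker types and no-calls are rejected rather than handled.

```python
import re

# Hypothetical helper (not part of GWASTools): translate an Illumina A/B
# genotype call into nucleotides using the manifest's "SNP" column.
# Assumption: the SNP column looks like "[A/G]", allele A first, allele B second.
_SNP_PATTERN = re.compile(r"^\[([ACGT])/([ACGT])\]$")


def decode_ab(call, snp):
    """Return the nucleotide genotype for an A/B call, e.g. 'AB' + '[A/G]' -> 'AG'.

    The alleles are returned exactly as written in the SNP column, i.e. on the
    strand named by IlmnStrand; no strand conversion is done here.
    """
    match = _SNP_PATTERN.match(snp)
    if match is None:
        raise ValueError(f"unsupported SNP format: {snp!r}")
    allele = {"A": match.group(1), "B": match.group(2)}
    if len(call) != 2 or any(c not in allele for c in call):
        raise ValueError(f"unsupported genotype call: {call!r}")
    return "".join(allele[c] for c in call)


print(decode_ab("AB", "[A/G]"))  # AG
```

For SNP 200006 the call AB therefore reads as A/G on the TOP strand; expressing it on another strand is a separate step.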

12.8 years ago • Stephanie M. Gogarten ▴ 890

Hi Stephanie,

Thank you so much for the useful link and article. With the annotation data I have something like this:

  Name    VALUE  IlmnStrand  SNP    GenomeBuild  Chr  MapInfo
1 200006  AB     TOP         [A/G]  36           9    139046223

So I know that AB/TOP/[A/G] means the Illumina data is the reverse complement of the source sequence, so the reference is T and the SNP is actually C/T (confirmed by checking the location on the UCSC Genome Browser, where it is annotated as dbSNP rs7469569).

But as far as I can see, the GWASTools package has no tools to do this for me, i.e. to use the VALUE/IlmnStrand/SNP info to determine the SNP, and ideally the GenomeBuild and chromosome data to confirm the dbSNP info?

With thanks for your time,

Lavinia Gordon
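The strand reasoning in this reply can be written out as a small sketch. It is an illustration in Python, not a Bioconductor function: the name `to_source_strand` is hypothetical, and it assumes that when IlmnStrand and SourceStrand differ (TOP versus BOT, as in the manifest row for 200006) the source-strand alleles are the complements of the Illumina alleles. It does not decide which strand is the genome forward strand; that still needs the manifest's RefStrand column or a dbSNP lookup.

```python
# Complement table for single-nucleotide alleles.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_source_strand(genotype, ilmn_strand, source_strand):
    """Express a genotype called on IlmnStrand in terms of SourceStrand.

    Hypothetical helper. Assumption: both strands use Illumina's TOP/BOT
    designation, and differing designations mean the two sequences are
    reverse complements, so each allele is complemented.
    """
    strands = {"TOP", "BOT"}
    if ilmn_strand not in strands or source_strand not in strands:
        raise ValueError("strands must be 'TOP' or 'BOT'")
    if ilmn_strand == source_strand:
        return genotype
    return genotype.translate(COMPLEMENT)


# SNP 200006: A/G on TOP (IlmnStrand), source sequence on BOT.
print(to_source_strand("AG", "TOP", "BOT"))  # TC
```

This reproduces the [T/C] shown in the SourceSeq column for this SNP.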

12.8 years ago • Lavinia Gordon ▴ 480

Hi Lavinia,

Sorry my response was not clearer. I wasn't sure if you wanted just information on that particular SNP or how to work with that data format in general. The functions in GWASTools operate on genotype data in A/B format and on BAlleleFreq/LogRRatio data, but they don't help you set up the SNP annotation, which is why I directed you to the Illumina file.

To query dbSNP from within R, you might try biomaRt. You can also look at rtracklayer, which interacts with the UCSC genome browser.

Stephanie
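Setting up the SNP annotation amounts to joining the processed genotype table to the Illumina manifest on the SNP name. Below is a minimal sketch of that join in Python, using only the standard library. The inline strings stand in for the real files, the manifest is trimmed to four of its columns, the genotype table is assumed to be tab-separated, and the function name `annotate_calls` is hypothetical. A real manifest has many more columns and may need extra header lines skipped first.

```python
import csv
import io

# Stand-ins for the real files (assumptions for illustration only):
# a manifest trimmed to four columns, and a tab-separated genotype table.
MANIFEST = """Name,IlmnStrand,SNP,SourceStrand
200006,TOP,[A/G],BOT
"""
CALLS = "ID_REF\tVALUE\n200006\tAB\n"


def annotate_calls(calls_text, manifest_text):
    """Attach manifest annotation to each genotype call, keyed on SNP name."""
    manifest = {
        row["Name"]: row for row in csv.DictReader(io.StringIO(manifest_text))
    }
    annotated = []
    for call in csv.DictReader(io.StringIO(calls_text), delimiter="\t"):
        info = manifest.get(call["ID_REF"])
        if info is None:
            continue  # SNP not in the manifest; skipped in this sketch
        annotated.append({**call, **info})
    return annotated


for row in annotate_calls(CALLS, MANIFEST):
    print(row["ID_REF"], row["VALUE"], row["IlmnStrand"], row["SNP"])
```

The merged rows carry the VALUE, IlmnStrand and SNP fields side by side, which is the layout shown in the earlier reply.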

12.8 years ago • Stephanie M. Gogarten ▴ 890