Question

Affy's 500K SNP arrays - retrieval of probe info

De Bondt, An-7114 [PRDBE]

The original message body was scrubbed by the mailing-list archive (embedded text attachment, name not available): https://stat.ethz.ch/pipermail/bioconductor/attachments/20070525/b8c5fb65/attachment.pl

score 0 · Answer 1 · 2007-05-25

Hi An,

I'm assuming you want the offset and GC content for the PM probes, ok?

Say your probe-level data (SnpFeature object) is called "rawData".

    library(oligo)               # pmPosition(), pmSequence()
    library(pd.mapping250k.sty)  # or pd.mapping250k.nsp, matching annotation(rawData)

    theOffset    <- pmPosition(get(annotation(rawData)))
    theSequences <- pmSequence(get(annotation(rawData)))

    centralSnps <- which(theOffset == 0)
    # gregexpr() returns -1 for a probe with no G/C, so count only positive
    # match positions; probes are 25-mers, hence the division by 25
    percentGC <- sapply(gregexpr("G|C", theSequences), function(m) sum(m > 0))/25

b

On May 25, 2007, at 8:00 AM, De Bondt, An-7114 [PRDBE] wrote:

> Dear,
>
> From the raw probe-level data, we would like to select only the central SNP probes (position 0, with the SNP position exactly in the middle) from the sense as well as from the antisense strand. How can we do this?
>
> We know we can get the GC content of that central probe from the 'Mapping250K_Nsp snp info.txt' file. How can we get %GC for each of the other probes as well? Is there a CDF for the Nsp and Sty arrays? Or can we get this info out of pd.mapping250k.nsp/pd.mapping250k.sty? Or is there another way to get that info?
>
> Thanks in advance for your help!
>
> Regards,
> An

--
Benilton Carvalho
PhD Candidate
Department of Biostatistics
Bloomberg School of Public Health
Johns Hopkins University
bcarvalh at jhsph.edu
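The calls above need the array annotation package, so here is a base-R sketch on made-up 25-mer sequences (hypothetical values, not real probes) showing what the offset and GC vectors look like and how the central probes are picked out. It counts G/C with nchar(gsub()) rather than gregexpr(); this is a swapped-in technique that sidesteps a gregexpr() pitfall: for a sequence with no G or C, gregexpr() returns -1, a vector of length 1, so taking length() of each match reports one G/C instead of zero.

```r
# Hypothetical stand-ins for the vectors returned by pmPosition() and
# pmSequence(); real values come from the pd.mapping250k.* annotation package.
theOffset    <- c(0L, -1L, 4L, 0L)
theSequences <- c(paste0(strrep("ACGT", 6), "A"),        # 12 G/C of 25
                  strrep("A", 25),                        #  0 G/C of 25
                  paste0(strrep("G", 5), strrep("C", 5),
                         strrep("A", 5), strrep("T", 5),
                         strrep("G", 5)),                 # 15 G/C of 25
                  paste0(strrep("AT", 11), "GCA"))        #  2 G/C of 25

# Fraction of G/C bases per probe: delete every other character and count
# what is left. Dividing by nchar() rather than a fixed 25 keeps the result
# right even if probe lengths differ.
gcFraction <- function(seqs) {
  nchar(gsub("[^GCgc]", "", seqs)) / nchar(seqs)
}

centralSnps <- which(theOffset == 0)  # indices of central (offset 0) probes
percentGC   <- gcFraction(theSequences)

percentGC                # 0.48 0.00 0.60 0.08
percentGC[centralSnps]   # GC fraction of the central probes only
```

With real data, theOffset and theSequences would be the outputs of the two annotation calls, and the last line gives %GC for the central probes while percentGC covers every PM probe.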

score 0 · Answer 2 · 2007-05-29

Exactly, Ben, thanks a lot!

Applying this to the Sty-based feature set (6553600 rows) gives three vectors, each of length 3201544 (the other 3352056 rows correspond to MM), and a centralSnps vector of length 454224.

What I do not understand yet: the number of rows after snprma() is 238304 for Sty. How is that number related to the length of centralSnps? I had expected the length of centralSnps to be 4 times the number of rows after snprma:

- one central probe for allele A on the sense strand
- one central probe for allele B on the sense strand
- one central probe for allele A on the antisense strand
- one central probe for allele B on the antisense strand

Kind regards,
An
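Putting the figures from this message side by side makes the mismatch concrete. The sketch below only redoes the arithmetic on the numbers quoted above (the PM/MM split is as An reported it); it does not touch the annotation package.

```r
# Figures reported in the thread for the Sty array.
totalFeatures <- 6553600   # rows in the raw feature set
nPM           <- 3201544   # PM probes (length of the offset/sequence/GC vectors)
nMM           <- 3352056   # remaining features, reported as MM
nCentral      <- 454224    # length(centralSnps): PM probes with offset 0
nSnps         <- 238304    # rows returned by snprma()

stopifnot(nPM + nMM == totalFeatures)  # the PM/MM split accounts for every row

naiveExpected <- 4 * nSnps  # 2 alleles x 2 strands, one central probe each
naiveExpected               # 953216
nCentral / nSnps            # average central PM probes per SNP, about 1.91
```

The observed count is less than half of the four-per-SNP expectation, i.e. fewer than two central PM probes per SNP on average, which fits the later explanation that probes are spread unevenly across offsets.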

score 0 · Answer 3 · 2007-05-30

Hi Ben,

I did not realise that there are SNPs that are not represented by a central 'position 0' probe. Thanks for this clarification!

An

-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh@jhsph.edu]
Sent: Tuesday, 29 May 2007 17:50
To: De Bondt, An-7114 [PRDBE]
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Affy's 500K SNP arrays - retrieval of probe info

Hi An,

There is no direct association between the number of SNPs and the number of probes whose offset is zero. On the 250K designs, given a SNP, the number of probes across offsets is usually unbalanced. Here is a little piece of (ugly) code to clarify what I mean:

    library(oligo)               # db() accessor for the annotation object
    library(DBI)                 # dbGetQuery()
    library(pd.mapping250k.sty)  # annotation package named in 'ann'

    ann = "pd.mapping250k.sty"
    fields = "man_fsetid, pmfeature.strand, allele, offset"
    tbls = "pmfeature, sequence, featureSet"
    conditions = "pmfeature.fid = sequence.fid AND featureSet.fsetid=pmfeature.fsetid"
    sql = paste("SELECT", fields, "FROM", tbls, "WHERE", conditions)
    tmp = dbGetQuery(db(get(ann)), sql)
    # rows: SNPs (man_fsetid); columns: offsets; cells: number of PM probes
    table(tmp[["man_fsetid"]], tmp[["offset"]])[1:10,]

Hope this helps,
b
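The same cross-tabulation can be tried offline to see why the counts do not line up. The sketch below builds a hypothetical data frame with the columns Ben's query returns (man_fsetid, strand, allele, offset); the SNP IDs, strand codes and offsets are invented for illustration. It tabulates SNPs by offset, counts central (offset 0) probes per SNP, and lists SNPs that have none.

```r
# Hypothetical toy data shaped like Ben's query result; values are invented.
tmp <- data.frame(
  man_fsetid = rep(c("SNP_A-1", "SNP_A-2", "SNP_A-3"), each = 4),
  strand     = rep(c(0, 0, 1, 1), times = 3),
  allele     = rep(c("A", "B"), times = 6),
  offset     = c(0, 0, 0, 0,     # SNP_A-1: central probes on both strands
                 -1, -1, 3, 3,   # SNP_A-2: no central probe at all
                 0, 0, 2, 2),    # SNP_A-3: central probes on one strand only
  stringsAsFactors = FALSE
)

# SNP-by-offset probe counts, as in Ben's table() call
table(tmp[["man_fsetid"]], tmp[["offset"]])

# Number of offset-0 probes per SNP
centralPerSnp <- tapply(tmp$offset == 0, tmp$man_fsetid, sum)

# SNPs with no central probe
noCentral <- names(centralPerSnp)[centralPerSnp == 0]

sum(centralPerSnp)          # total central probes in the toy data
4 * length(centralPerSnp)   # naive four-per-SNP expectation
```

Here the toy array has 6 central probes for 3 SNPs rather than 12, the same kind of shortfall An saw. Assuming the real query returns these columns, the tapply() line applied to the real tmp would give per-SNP central-probe counts for the whole array.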