Sequences appear to be incorrectly cataloged in package "pd.genomewidesnp.6"

li lilingdu ▴ 450

@li-lilingdu-1884

Last seen 6.9 years ago

Dear List,

I used the following R code to extract the sequences of the PM probes in a given probeset on the Affymetrix SNP6 array. However, for each of the 100 probesets I tested, there were only 2 unique PM sequences. It appears that the PM sequences are not correctly catalogued.

===============
library(pd.genomewidesnp.6)

db(pd.genomewidesnp.6)->kao

dbGetQuery(kao,"SELECT * from featureSet limit 100")$"man_fsetid"->probesets
result<-vector("list",length(probesets))
names(result)<-probesets

for(ind in 1:length(result)){

  dbGetQuery(kao,paste("SELECT * from featureSet where man_fsetid='",names(result)[ind],"'",sep=""))$fsetid->fsetid

  dbGetQuery(kao, paste("select * from pmfeature where fsetid=",fsetid,sep=""))->pm.100

  c(pm.100$fid)->totiao
  paste("fid=",paste(totiao,collapse=" or fid="),sep="")->totiao
  paste("SELECT * from sequence where ",totiao,sep="")->totiao
  dbGetQuery(kao,totiao)->seq

  result[[ind]]<-seq

}

sapply(result, function(xxx) length(unique(xxx$seq)))

===================

sessionInfo()

R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] pd.genomewidesnp.6_0.4.2 oligoClasses_1.4.0 Biobase_2.2.1 RSQLite_0.7-1
[5] DBI_0.2-4
==============

LiGang

written 16.3 years ago by li lilingdu ▴ 450 • updated 16.3 years ago by Vincent J. Carey, Jr. 6.7k

Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 6 weeks ago

United States

what is "not correct"?

working directly from affymetrix annotation probe_tab text file, we have

    PROBESET_ID PROBE_X_POS PROBE_Y_POS PROBE_INTERROGATION_POSITION
1 SNP_A-1780270        2380        1757                            3
2 SNP_A-1780270        2381        1757                            3
3 SNP_A-1780270        2626         421                            3
4 SNP_A-1780270        2627         421                            3
5 SNP_A-1780270         540        1827                            3
6 SNP_A-1780270         541        1827                            3
7 SNP_A-1780270         694         338                            3
8 SNP_A-1780270         695         338                            3
             PROBE_SEQUENCE TARGET_STRANDEDNESS PROBE_TYPE ALLELE
1 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
2 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
3 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
4 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
5 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
6 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
7 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
8 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C

there are 4 replicates of each sequence

checking the pd.genomewide package, following your code snippet, we have

> dbGetQuery(kao, "select man_fsetid, fsetid from featureSet where man_fsetid = 'SNP_A-1780270'")
     man_fsetid fsetid
1 SNP_A-1780270 326067

> dbGetQuery(kao, "select * from pmfeature where fsetid = '326067'")
      fid strand allele fsetid pos    x    y
1  906535      0      1 326067   6  694  338
2  906536      0      0 326067   5  695  338
3 1130907      0      1 326067   8 2626  421
4 1130908      0      0 326067   7 2627  421
5 4711141      0      1 326067   4 2380 1757
6 4711142      0      0 326067   3 2381 1757
7 4896901      0      1 326067   2  540 1827
8 4896902      0      0 326067   1  541 1827

now we know the fids of the probes we looked at in the original data

> dbGetQuery(kao, "select * from sequence where fid = '4711141'")
      fid offset tstrand tallele                       seq
1 4711141      3       f       G TTGTTAAGCAAGTGACTTATTTTAT

> dbGetQuery(kao, "select * from sequence where fid = '4711142'")
      fid offset tstrand tallele                       seq
1 4711142      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT

> dbGetQuery(kao, "select * from sequence where fid = '1130907'")
      fid offset tstrand tallele                       seq
1 1130907      3       f       G TTGTTAAGCAAGTGACTTATTTTAT

> dbGetQuery(kao, "select * from sequence where fid = '1130908'")
      fid offset tstrand tallele                       seq
1 1130908      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT

what is incorrect?
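For anyone who wants to reproduce this tally without looping over probesets, here is a minimal sketch. It does not touch the real package: it builds a toy in-memory SQLite database that mirrors only the table and column names quoted above (featureSet, pmfeature, sequence), filled with the eight probes of SNP_A-1780270. The sequences for the four fids that were not queried individually above are taken from the probe_tab rows with matching x/y positions, which is an assumption that the package agrees with that file. The `params` argument assumes a reasonably recent DBI/RSQLite, not the versions in the sessionInfo above.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for db(pd.genomewidesnp.6); only the columns quoted in this
# thread are included, the real tables hold more columns and rows.
con <- dbConnect(SQLite(), ":memory:")

fid    <- c(906535L, 906536L, 1130907L, 1130908L,
            4711141L, 4711142L, 4896901L, 4896902L)
allele <- rep(c(1L, 0L), times = 4)   # allele column as shown in pmfeature

seq_g <- "TTGTTAAGCAAGTGACTTATTTTAT"  # tallele G in the output above
seq_c <- "TTGTTAAGCAAGTGAGTTATTTTAT"  # tallele C in the output above

dbWriteTable(con, "featureSet",
             data.frame(fsetid = 326067L, man_fsetid = "SNP_A-1780270",
                        stringsAsFactors = FALSE))
dbWriteTable(con, "pmfeature",
             data.frame(fid = fid, allele = allele, fsetid = 326067L))
# Assumption: fids not queried one by one above get the probe_tab sequence
# found at the same x/y position.
dbWriteTable(con, "sequence",
             data.frame(fid = fid,
                        tallele = ifelse(allele == 1L, "G", "C"),
                        seq = ifelse(allele == 1L, seq_g, seq_c),
                        stringsAsFactors = FALSE))

# One JOIN instead of three queries per probeset; the probeset name is
# bound as a parameter rather than pasted into the SQL string.
sql <- "
  SELECT s.tallele, s.seq, COUNT(*) AS n_probes
  FROM featureSet f
  JOIN pmfeature p ON p.fsetid = f.fsetid
  JOIN sequence  s ON s.fid    = p.fid
  WHERE f.man_fsetid = ?
  GROUP BY s.tallele, s.seq
  ORDER BY s.tallele"

tally <- dbGetQuery(con, sql, params = list("SNP_A-1780270"))
print(tally)

dbDisconnect(con)
```

With this toy data the tally has one row per allele, each with n_probes equal to 4, so seeing only 2 unique PM sequences per probeset is what the annotation file implies. Against the real package the same query should work with `con` replaced by `db(pd.genomewidesnp.6)`, though that has not been checked here.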
On Wed, Jan 7, 2009 at 2:22 PM, LiGang <luzifer.li@gmail.com> wrote:
> Dear List,
>
> I used the following R code to extract sequence information of a particular
> probeset for PM probes of Affymetrix SNP6 array. However, for 100 probesets
> I tested there were only 2 unique PM sequences for each probeset. It
> appears that the PM sequences were not correctly catalogued.
>
> [...]

written 16.3 years ago by Vincent J. Carey, Jr. 6.7k