blast probe clusters when using Affymetrix Gene Array Strips

Joao Sollari Lopes ▴ 80

@joao-sollari-lopes-6122


Hi,

I am using the Zebrafish Gene 1.1 ST Array Strip and have found some transcript clusters that are differentially expressed but not annotated (although they belong to the "main" design of the array). I would like to blast them, but I am not sure what to blast, since each transcript cluster has several probes associated with it. Should I blast each probe individually?

I have read about the "probe set target sequence" (https://stat.ethz.ch/pipermail/bioconductor/2004-March/004250.html), but I am not sure whether it applies to the Gene Array Strip. If it does, how can I obtain these sequences?

Thanks,
Joao
Instituto Gulbenkian de Ciencia


ADD COMMENT • link updated 11.7 years ago by James W. MacDonald 68k • written 11.7 years ago by Joao Sollari Lopes ▴ 80

James W. MacDonald 68k

@james-w-macdonald-5106


United States

Hi Joao,

It depends on what you want to learn. You can download the transcript cluster sequences here:

http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.transcript_cluster.fa.zip

and then extract the FASTA sequences you want to blast. This might not be exactly what you want, because the transcripts in that file are the very long sequences that a given probeset is designed to interrogate. As an example, probeset 12943944 is intended to interrogate a 2500 nt transcript, but uses 19 probes (25-mers) to do so. If you blast the transcript, you will see where that 2500 nt transcript lies in the genome, but you won't learn anything about the individual probes.

Alternatively, you could use the probe tab file, found here:

http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.probe.tab.zip

extract the 19 probes for that particular probeset, and then align them with Jim Kent's blat program at the UCSC genome browser. I have a small function I have used in the past to convert these data to a FASTA file that you can upload to blat, but it requires the probe tab data to be in a probe package.

I will give you the code, but you will have to make your own probe package using makeProbePackage() in the AnnotationForge package. There is a vignette here:

http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/makeProbePackage.pdf

as well as a help page, so you shouldn't have any problems with that.

If you decide to go in that direction, here is the function you will need to make FASTA files:

```r
blatGene <- function(affyid, probe, filename){
    ## affyid == Affy probeset ID
    ## probe == BioC probe package name
    ## filename == output file name
    require(probe, quietly = TRUE, character.only = TRUE)
    tmp <- data.frame(get(probe))
    ## collect the probe sequences (column 1) for each requested probeset
    if(length(affyid) > 1){
        seqnc <- vector()
        for(i in seq(along = affyid))
            seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
    }else{
        seqnc <- tmp[tmp$Probe.Set.Name == affyid, 1]
    }
    out <- vector()
    if(length(seqnc) > 25)
        warning("Blat will only return values for 25 or fewer sequences!",
                call. = FALSE)
    ## build FASTA records: a ">" header line followed by the sequence line
    for(i in seq(along = seqnc))
        out <- rbind(out, rbind(paste(">Probe", i, sep=""), seqnc[i]))
    write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
                col.names=FALSE)
}
```

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
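For readers who only need a FASTA file for a handful of transcript clusters, the probe tab route described in the answer can probably also be followed without building a probe package. The following is a minimal sketch in base R and is not part of the original answer. The function name `probeTabToFasta` is hypothetical, and the column names `Transcript.Cluster.ID` and `probe.sequence` are assumptions about the probe tab header (as it would look after `read.delim` converts spaces to dots), so check them against your own file. The toy input uses made-up sequences, not real array content.

```r
## Sketch: write the probe sequences of selected transcript clusters to FASTA,
## reading a tab-delimited probe file directly (base R only).
## ASSUMPTION: the default column names below; check your file's header.
probeTabToFasta <- function(tabfile, cluster_id, outfile,
                            id_col = "Transcript.Cluster.ID",
                            seq_col = "probe.sequence") {
    tab <- read.delim(tabfile, stringsAsFactors = FALSE)
    stopifnot(id_col %in% names(tab), seq_col %in% names(tab))
    ## keep the sequences belonging to the requested cluster(s)
    seqs <- tab[tab[[id_col]] %in% cluster_id, seq_col]
    if (length(seqs) == 0L) stop("no probes found for the requested ID(s)")
    if (length(seqs) > 25L)
        warning("more than 25 sequences; blat may not return results for all")
    ## interleave ">ProbeN" header lines with the sequence lines
    fasta <- as.vector(rbind(paste0(">Probe", seq_along(seqs)), seqs))
    writeLines(fasta, outfile)
    invisible(fasta)
}

## Toy input (made-up sequences) to show the call
toy <- tempfile(fileext = ".tab")
writeLines(c("Transcript Cluster ID\tprobe sequence",
             "12943944\tACGTACGTACGTACGTACGTACGTA",
             "12943944\tTTGCATTGCATTGCATTGCATTGCA",
             "99999999\tGGGGGCCCCCAAAAATTTTTGGGGG"), toy)
out <- tempfile(fileext = ".fa")
res <- probeTabToFasta(toy, 12943944, out)
```

The resulting file alternates header and sequence lines, which is the layout blat expects for pasted or uploaded FASTA input. Passing a vector of IDs as `cluster_id` collects the probes of several clusters into one file, in the row order of the tab file.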

ADD COMMENT • link 11.7 years ago James W. MacDonald 68k

Hi Jim,

Thanks for your help once again!

Joao

ADD REPLY • link 11.7 years ago Joao Sollari Lopes ▴ 80
