FW: Does combineAffyBatch of package matchprobes elimi nate the wrong probe sets?
0
0
Entering edit mode
@christianstratowavieboehringer-ingelheimcom-545
Last seen 10.2 years ago
Dear Wolfgang Thank you for your reply, I understand that you are busy, this seems to be a problem of everybody:-( In the case of "412_s_at" only one of 16 probe sequences are identical, i.e. GGAGCCAGGACACCTGCTCTCCGGC at position (202,259) vs (359,471). This means that one match is sufficient to define the probe set as "412_s_at". A worthwhile expansion of function combineAffyBatch() would be to add a parameter setting the minimum number of oligos necessary for creating a probe set. Sorrowly, I do not have time to implement this expansion. Best regards Christian ============================================== Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2782 email: christian.stratowa at vie.boehringer-ingelheim.com -----Original Message----- From: Wolfgang Huber [mailto:huber@ebi.ac.uk] Sent: Tuesday, March 14, 2006 13:51 To: Stratowa,Dr.,Christian FEX BIG-AT-V; bioconductor at stat.math.ethz.ch Subject: Re: FW: [BioC] Does combineAffyBatch of package matchprobes eliminate the wrong probe sets? Christian.Stratowa at vie.boehringer-ingelheim.com wrote: > Dear Wolfgang > > Sorrowly, I did not get any response to my question below. I am very > interested what might be > the explanation for the behavior described below. > Maybe, I made a simple mistake? > I would appreciate your explanation very much. > Dear Christian, I apologize - everyone is very busy, we are doing this in our spare time and are not paid for the consulting. So please excuse if I don't reply to every possible mail. The matching is done by _probe sequence_, the combined probe sets and their naming are taken from the first AffyBatch in the argument list. The probe set assignments and names of the other AffyBatches are ignored, just the probe sequences are used. I think your question might be sufficiently detailed that you will benefit from having a look at the code of that function (it is not that complicated), and use the debugger to look at some of the intermediate results. Please also have a look at the man page combineAffyBatch. This is one way of combining different Affymetrix chip types, there might be other schemes that are more suitable for what you are after (since 95A and 95Av2 chips are so similar), please feel free to implement and contribute them! Cheers Wolfgang ------------------------------------- Wolfgang Huber European Bioinformatics Institute European Molecular Biology Laboratory Cambridge CB10 1SD England Phone: +44 1223 494642 Fax: +44 1223 494486 Http: www.ebi.ac.uk/huber ------------------------------------ > -----Original Message----- > From: Stratowa,Dr.,Christian FEX BIG-AT-V > Sent: Friday, March 10, 2006 9:17 > To: 'bioconductor at stat.math.ethz.ch' > Subject: [BioC] Does combineAffyBatch of package matchprobes > eliminate the wrong probe sets? > > Sorowly, until now I did not receive an answer to my question, so here > is some more information: > > The main problem is that combineAffyBatch() adds probe set "412_s_at" > to the list of common probe sets on U95A and U95Av2 although this > probe set does NOT exist on U95Av2 but was > replaced by probe set "160033_s_at". Analogously, other probe sets were > replaced on U95Av2 > too, but are not listed as common probe sets, e.g. "1829_at" is replaced by > "160020_s_at". > For this reason I was not able to subset the mas5 expression values for U95A > and U95Av2 > to the common probe sets. > > So why is probe set "412_s_at" added to the list of common probe sets > but e.g. "1829_at" not, although both probe sets are replaced by > oligos with the same (x,y)-coordinates but with > different oligonucleotide sequences? > > Here is the addition to the code below to show these issues: > > # setdiff as vectors > >>a21 <- >>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]])) >>a12 <- unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]])) > > >>a2 <- unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(dat.r2)[[1]])) >>a1 <- unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(dat.r2)[[1]])) > > >>setdiff(a1,a2) > > [1] "119_at" "1215_at" "1216_at" "124_i_at" "125_r_at" "127_at" > [7] "1301_s_at" "1302_s_at" "132_at" "1429_at" "1502_s_at" "1829_at" > [13] "1864_at" "1889_s_at" "1982_s_at" "36969_at" "383_at" "397_at" > [19] "426_at" "439_at" "787_at" "788_s_at" "972_s_at" "985_s_at" > [25] "997_at" > > >>setdiff(a2,a1) > > [1] "160020_at" "160021_r_at" "160022_at" "160023_at" "160024_at" > [6] "160025_at" "160026_at" "160027_s_at" "160028_s_at" "160029_at" > [11] "160030_at" "160031_at" "160032_at" "160033_s_at" "160034_s_at" > [16] "160035_at" "160036_at" "160037_at" "160038_s_at" "160039_at" > [21] "160040_at" "160041_at" "160042_s_at" "160043_at" "160044_g_at" > > # Problem: probeset 412_s_at does NOT exist on U95Av2 and thus should > not be listed on combined probe set! > >>setdiff(a21, setdiff(a2,a1)) > > character(0) > >>setdiff(a12, setdiff(a1,a2)) > > [1] "412_s_at" > > Here are the informations from the respective probe.tab files: > #HG-U95A_probe.tab > 412_s_at 286 579 1795 AGCCACGGGCGTCAGAGAGACCCGG > Antisense > 412_s_at 494 633 1798 CACGGGCGTCAGAGAGACCCGGGAA > Antisense > 412_s_at 446 145 1807 CAGAGAGACCCGGGAAGGAAGGCTC > Antisense > 412_s_at 77 573 1810 AGAGACCCGGGAAGGAAGGCTCTCG > Antisense > 412_s_at 202 259 1841 GGAGCCAGGACACCTGCTCTCCGGC > Antisense > 412_s_at 141 403 1842 GAGCCAGGACACCTGCTCTCCGGCG > Antisense > 412_s_at 302 119 1846 CAGGACACCTGCTCTCCGGCGCAGA > Antisense > 412_s_at 354 285 1854 CTGCTCTCCGGCGCAGACAGCGGGG > Antisense > 412_s_at 133 489 1855 TGCTCTCCGGCGCAGACAGCGGGGC > Antisense > 412_s_at 134 489 1856 GCTCTCCGGCGCAGACAGCGGGGCC > Antisense > 412_s_at 206 383 1858 TCTCCGGCGCAGACAGCGGGGCCCA > Antisense > 412_s_at 206 385 1859 CTCCGGCGCAGACAGCGGGGCCCAG > Antisense > 412_s_at 192 237 1864 GCGCAGACAGCGGGGCCCAGCGCTC > Antisense > 412_s_at 191 237 1865 CGCAGACAGCGGGGCCCAGCGCTCT > Antisense > 412_s_at 227 191 1868 AGACAGCGGGGCCCAGCGCTCTCCT > Antisense > 412_s_at 364 57 1870 ACAGCGGGGCCCAGCGCTCTCCTGG > Antisense > > #HG-U95Av2_probe.tab > 160033_s_at 1 364 57 1504 GTGCTCCAGGAAGATATAGACATTG > Antisense > 160033_s_at 2 302 119 1533 GGTACAGTCAGAAGGACAGGACAAT > Antisense > 160033_s_at 3 446 145 1534 GTACAGTCAGAAGGACAGGACAATG > Antisense > 160033_s_at 4 227 191 1568 ATTCTGGGGACACAGAGGATGAGCT > Antisense > 160033_s_at 5 191 237 1648 GGGGAAGACCCGTATGCAGGCTCCA > Antisense > 160033_s_at 6 192 237 1700 ACCAGGAGCCTCCTGATCTGCCAGT > Antisense > 160033_s_at 7 202 259 1718 TGCCAGTCCCTGAGCTCCCAGATTT > Antisense > 160033_s_at 8 354 285 1752 CAAGCACTTCTTTCTTTACGGGGAG > Antisense > 160033_s_at 9 206 383 1787 ACGAGCGGCGGAAACTCATCCGATA > Antisense > 160033_s_at 10 206 385 1797 GAAACTCATCCGATACGTCACAGCC > Antisense > 160033_s_at 11 141 403 1901 AGGAGGCCCTGATGGACAACCCCTC > Antisense > 160033_s_at 12 133 489 1926 CCTGGCATTCGTTCGTCCCCGATGG > Antisense > 160033_s_at 13 134 489 1952 TCTACAGTTGCAATGAGAAGCAGAA > Antisense > 160033_s_at 14 77 573 1969 AAGCAGAAGTTACTTCCTCACCAGC > Antisense > 160033_s_at 15 286 579 1980 ACTTCCTCACCAGCTCTATGGGGTG > Antisense > 160033_s_at 16 494 633 2006 TGCCGCAAGCCTGAAGTATGTGCTA > Antisense > > P.S.: > Looking at the Affymetrix "xx_annot.cvs" and "xx_probe.tab" files I realized > that the oligo sequence is > missing in the probe.tab file for all probe-sets which are annotated as > "Sequence Source" = TIGR! > It is not clear while the TIGR sequences are not listed in the probe.tab > files. However, this explains > why function combineAffyBatch() deletes 197 probe sets and not only the 51 > different probe sets. > > Best regards > Christian > > ============================================== > Christian Stratowa, PhD > Boehringer Ingelheim Austria > Dept NCE Lead Discovery - Bioinformatics > Dr. Boehringergasse 5-11 > A-1121 Vienna, Austria > Tel.: ++43-1-80105-2470 > Fax: ++43-1-80105-2782 > email: christian.stratowa at vie.boehringer-ingelheim.com > > Maybe I am doing something wrong but when combining U95A and U95Av2 > files using function combineAffyBatch() it seems that 197 probe sets > from U95A and from U95Av2 are deleted although there are only 51 > different probe sets. Furthermore, it seems that probe sets which are > available on both U95A and U95Av2 are deleted and not the different > probe sets. > > Here is my code, which supports my statements: > # call libraries > >>library(matchprobes) >>library(affy) >>library(hgu95acdf) >>library(hgu95aprobe) >>library(hgu95av2cdf) >>library(hgu95av2probe) > > > # load CEL files from folders HG-U95A and HG-U95Av2 > >>x95A <- ReadAffy(celfile.path="HG-U95A") >>x95Av2 <- ReadAffy(celfile.path="HG-U95Av2") > > > # combine data > >>res <- combineAffyBatch(list(x95A,x95Av2), > > c("hgu95aprobe","hgu95av2probe"), newcdf="comb95") > >>comb95 <- res$cdf > > > # rma normalization for combined data > >>dat.rma <- rma(res$dat) > > > # rma normalization for U95A and U95Av2 separately > >>x95A.rma <- rma(x95A) >>x95Av2.rma <- rma(x95Av2) > > > # store log2 of expression data as matrix, sort for AffyIDs, and > export > >>dat.r2 <- exprs(dat.rma) >>d <- dimnames(dat.r2)[[1]] >>dat.r2 <- dat.r2[order(d),] >>write.table(dat.r2,file="dat_rma.txt",sep="\t",quote=F,col.names=NA) > > >>x95A.r2 <- exprs(x95A.rma) >>d <- dimnames(x95A.r2)[[1]] >>x95A.r2 <- x95A.r2[order(d),] >>write.table(x95A.r2,file="x95A_rma.txt",sep="\t",quote=F,col.names=N A) > > >>x95Av2.r2 <- exprs(x95Av2.rma) >>d <- dimnames(x95Av2.r2)[[1]] >>x95Av2.r2 <- x95Av2.r2[order(d),] >>write.table(x95Av2.r2,file="x95Av2_rma.txt",sep="\t",quote=F,col.nam es =NA) > > > The following 25 probe sets are on U95Av2 but NOT on U95A: > >>unlist(setdiff(dimnames(x95Av2.r2)[[1]],dimnames(x95A.r2)[[1]])) > > [1] "160020_at" "160021_r_at" "160022_at" "160023_at" "160024_at" > [6] "160025_at" "160026_at" "160027_s_at" "160028_s_at" "160029_at" > [11] "160030_at" "160031_at" "160032_at" "160033_s_at" "160034_s_at" > [16] "160035_at" "160036_at" "160037_at" "160038_s_at" "160039_at" > [21] "160040_at" "160041_at" "160042_s_at" "160043_at" "160044_g_at" > > The following 26 probe sets are on U95A but NOT on U95Av2: > >>unlist(setdiff(dimnames(x95A.r2)[[1]],dimnames(x95Av2.r2)[[1]])) > > [1] "119_at" "1215_at" "1216_at" "124_i_at" "125_r_at" "127_at" > [7] "1301_s_at" "1302_s_at" "132_at" "1429_at" "1502_s_at" "1829_at" > [13] "1864_at" "1889_s_at" "1982_s_at" "36969_at" "383_at" "397_at" > [19] "412_s_at" "426_at" "439_at" "787_at" "788_s_at" "972_s_at" > [25] "985_s_at" "997_at" > > This is corrrect, since I checked this also manually! > > Thus alltogether 51 probe sets which are not on both chips, should be > eliminated by function combineAffyBatch(). Is this correct? > > However, these are the differences between the combined table and > U95Av2: > > <<...... SNIP >>>>
Normalization probe oligo Normalization probe oligo • 767 views

Login before adding your answer.

Traffic: 489 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6