Combining HGU133A & HGU133B data
@adaikalavan-ramasamy-437
Dear all,

I have been asked to analyze data where the samples were hybridized on both HGU133A and HGU133B Affymetrix chips. One option is to analyze the A and B chips separately, but this is not desirable.

The other option is to combine the two data sets (using something akin to "rbind"). I think it is better to combine the results after RMA, since the background correction needs to be applied to each chip type separately.

This approach does, however, have a problem with the genes that are redundant between the A and B chips (more than 2000 genes are represented on both chips).

Can anyone suggest the best way to deal with this problem? Does anyone have experience with, or know of publications on, combining data from two different array formats?

Thank you.

--
Adaikalavan Ramasamy                  ramasamyA@gis.a-star.edu.sg
Research Assistant                    http://giscompute.gis.a-star.edu.sg/~adai
Microarray & Expression Genomics      Tel: 65-6478 8043
Information & Mathematical Sciences   Fax: 65 6478 9058
Genome Institute of Singapore         http://www.gis.a-star.edu.sg/
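A minimal R sketch of the "combine after RMA" option, assuming each chip type has already been processed on its own (e.g. with rma() from affy) and that exprs.a and exprs.b are the resulting probeset-by-sample matrices with the same samples in the same column order. The matrices and probeset IDs below are made-up toy values standing in for exprs() of the two exprSets. Rows are tagged with their chip of origin so that any probeset ID occurring on both chips does not produce duplicated row names after rbind.

```r
# Toy stand-ins (hypothetical IDs and values) for exprs(eset.a) and
# exprs(eset.b): log2 expression, rows = probesets, columns = samples.
exprs.a <- matrix(c(7.1, 8.2, 5.0, 7.3, 8.0, 5.2), nrow = 3,
                  dimnames = list(c("a_1_at", "a_2_at", "AFFX-ctrl_at"),
                                  c("s1", "s2")))
exprs.b <- matrix(c(6.9, 4.1, 5.1, 7.0, 4.3, 5.3), nrow = 3,
                  dimnames = list(c("b_1_at", "b_2_at", "AFFX-ctrl_at"),
                                  c("s1", "s2")))

# rbind only makes sense if both matrices describe the same samples
# in the same order.
stopifnot(identical(colnames(exprs.a), colnames(exprs.b)))

# Prefix every probeset ID with its chip so IDs present on both chips
# stay distinguishable after stacking.
rownames(exprs.a) <- paste0("A:", rownames(exprs.a))
rownames(exprs.b) <- paste0("B:", rownames(exprs.b))

# Union of all probesets: A rows followed by B rows, same samples.
exprs.ab <- rbind(exprs.a, exprs.b)
print(exprs.ab)
```

Genes that are measured by different probesets on both chips are still present twice after this step; whether to average them, pick one, or keep both is a separate decision.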
hgu133a hgu133b
Laurent Gautier
@laurent-gautier-29
On Mon, Sep 15, 2003 at 06:26:15PM +0800, Adaikalavan RAMASAMY wrote:
> Can anyone suggest what is the best way to deal with this problem?

Wolfgang Huber and Robert Gentleman certainly have something to say about that. Did you check the function 'combine' in the package 'matchprobes' (in the 'devel' section)?

L.
Hi,

On Mon, 15 Sep 2003, Laurent Gautier wrote:
> Did you check the function 'combine' in the package 'matchprobes'
> (section 'devel')?

The combine function in the matchprobes package is useful for combining data from different chip types. The combination is done at the probe level, before normalization, and it requires an appreciable overlap in probe sequences (as, for example, between hu6800 and hgu95av2, or between mgu74a and mgu74av2). It is based on the INTERSECTION of probes that have the same sequence, and, from the point of view of the expression matrix, it corresponds, loosely speaking, to a CBIND.

What Adaikalavan is looking for is much simpler: something that works on the UNION of all probes/genes on HGU133A and HGU133B and, from the point of view of the expression matrix, corresponds to an RBIND. I am not aware of a simpler way to do this than calling new("exprSet", ....) with the arguments patched together from the two individual HGU133A and HGU133B exprSets.

Best regards
Wolfgang

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/mga/whuber
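A toy base-R illustration of the two shapes described above. It does not call matchprobes::combine (which matches probes by sequence); the made-up row names simply stand in for probes or probesets, and the columns stand in for arrays.

```r
# Case 1 (INTERSECTION + CBIND): different samples on two chip types
# whose features partly overlap. Keep only the shared features and put
# all arrays side by side: fewer features, more arrays.
old.chip <- matrix(1:6, nrow = 3,
                   dimnames = list(c("p1", "p2", "p3"), c("old1", "old2")))
new.chip <- matrix(7:12, nrow = 3,
                   dimnames = list(c("p2", "p3", "p4"), c("new1", "new2")))
common <- intersect(rownames(old.chip), rownames(new.chip))
by.intersection <- cbind(old.chip[common, , drop = FALSE],
                         new.chip[common, , drop = FALSE])

# Case 2 (UNION + RBIND): the same samples hybridised to two chip types
# with (mostly) different features, as with HGU133A and HGU133B.
# Keep every feature and stack the rows: more features, same arrays.
chip.a <- matrix(1:4, nrow = 2,
                 dimnames = list(c("a1", "a2"), c("s1", "s2")))
chip.b <- matrix(5:8, nrow = 2,
                 dimnames = list(c("b1", "b2"), c("s1", "s2")))
by.union <- rbind(chip.a, chip.b)

print(dim(by.intersection))
print(dim(by.union))
```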
On Mon, Sep 15, 2003 at 02:19:01PM +0200, w.huber@dkfz-heidelberg.de wrote:
> What Adaikalavan is looking for is much simpler: something that works on
> the UNION of all probes/genes on HGU133A and HGU133B, and from the point
> of view of the expression matrix corresponds to an RBIND.

Oops... sorry for the confusion (I have never used combine (...yet)). In this case, the union of expression values is a straightforward 'rbind', as Wolfgang suggests. The probe-level part is slightly trickier because of the cdfenvs. The following scheme should do it (more or less; I did not test it):

    ## abatch.a and abatch.b are the AffyBatch objects (affy package)
    ## stack the probe intensities: the rows of abatch.b follow those of abatch.a
    abatch.ab <- new("AffyBatch",
                     exprs = rbind(exprs(abatch.a), exprs(abatch.b)),
                     cdfName = "cdfenv.ab")

    ## make a cdfenv for the union-combined chips
    cdfenv.ab <- new.env(hash = TRUE)
    cdfenv.a <- getCdfInfo(abatch.a)
    for (i in ls(cdfenv.a)) {
      assign(i, get(i, envir = cdfenv.a), envir = cdfenv.ab)
    }

    ## indices of the B probes are shifted by the number of rows of chip A
    offset <- nrow(exprs(abatch.a))
    cdfenv.b <- getCdfInfo(abatch.b)
    for (i in ls(cdfenv.b)) {
      ## inherits = FALSE: only look inside cdfenv.a, not in enclosing environments
      if (exists(i, envir = cdfenv.a, inherits = FALSE))
        stop(paste(i, ": id already in use !"))
      assign(i, get(i, envir = cdfenv.b) + offset, envir = cdfenv.ab)
    }

    ## from now on, this should behave like a regular AffyBatch
    ## (expect quirks with methods/functions dealing with the
    ## spatial features of the probes, e.g. image)

Hopin' it helps,

L.

--
Laurent Gautier          CBS, Building 208, DTU
PhD. Student             DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89     http://www.cbs.dtu.dk/laurent
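The offset idea in the scheme above can be checked without any chip data. Below is a toy base-R sketch: the two environments stand in for the objects returned by getCdfInfo(), each mapping a made-up probeset ID to an integer matrix of row indices into that chip's intensity matrix, and merge.cdfenvs is a hypothetical helper, not part of affy.

```r
# Toy cdf-like environments: probeset ID -> matrix of row indices.
cdfenv.a <- new.env(hash = TRUE)
assign("gA_at", matrix(c(1L, 2L, 3L, 4L), ncol = 2), envir = cdfenv.a)
cdfenv.b <- new.env(hash = TRUE)
assign("gB_at", matrix(c(1L, 2L, 3L, 4L), ncol = 2), envir = cdfenv.b)
n.a <- 4L  # hypothetical: number of rows in chip A's intensity matrix

# Hypothetical helper: copy A's entries unchanged, then add B's entries
# with their indices shifted past A's rows (B rows follow A rows after rbind).
merge.cdfenvs <- function(env.a, env.b, offset) {
  merged <- new.env(hash = TRUE)
  for (id in ls(env.a))
    assign(id, get(id, envir = env.a), envir = merged)
  for (id in ls(env.b)) {
    # refuse silent overwrites when an ID exists on both chips
    if (exists(id, envir = merged, inherits = FALSE))
      stop(id, ": id already in use!")
    assign(id, get(id, envir = env.b) + offset, envir = merged)
  }
  merged
}

cdfenv.ab <- merge.cdfenvs(cdfenv.a, cdfenv.b, n.a)
print(get("gB_at", envir = cdfenv.ab))
```

Merging an environment with itself triggers the stop(), which is the same guard as in the original loop.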
> In this case, the union of expression values is a straightforward 'rbind'
> as Wolfgang suggests. The probe business is slightly more tricky because
> of the cdfenvs. The following scheme should make it (more or less I
> did not test it): ....(some code)...

But while it is technically possible in software, it may not be the best idea to patch the data together at the probe (AffyBatch) level: the data from the individual array types (HGU133A and HGU133B) will very likely need to be normalized separately.

Best regards
Wolfgang
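To illustrate normalising each chip type on its own before stacking, here is a small base-R sketch with a hand-written quantile normalisation (the kind of normalisation rma() applies by default). The helper quantile.normalize is hypothetical, not an affy function, and the simulated intensities simply assume that chip B has a lower overall level than chip A.

```r
# Quantile normalisation of one chip type: every array (column) is given
# the same distribution, namely the mean of the sorted columns.
# Ties are broken by order of appearance, which is enough for a sketch.
quantile.normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated log2 intensities: same two samples, different chip types.
raw.a <- matrix(rnorm(20, mean = 8), nrow = 10,
                dimnames = list(paste0("a", 1:10), c("s1", "s2")))
raw.b <- matrix(rnorm(20, mean = 5), nrow = 10,
                dimnames = list(paste0("b", 1:10), c("s1", "s2")))

# Normalise A and B separately, then stack the rows. Normalising the
# stacked matrix instead would force A and B values into one shared
# distribution, which the chips are not expected to have.
norm.ab <- rbind(quantile.normalize(raw.a), quantile.normalize(raw.b))
summary(norm.ab)
```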
Crispin Miller
@crispin-miller-264
Hi,

A more serious issue is that normalisation (almost certainly) assumes that the average expression level on each chip is the same. This is clearly not the case between A and B chips, so combining each sample's A and B chips before normalisation is almost certainly a bad idea.

Normalising the A and B chips separately is probably much more sensible, and it then allows you to use the 2000+ genes represented on both chips to see how well your normalisation has worked: their signals come from the same hybridisation cocktail, so they should produce the same expression levels. Seen this way, the repeated probes are a Good Thing(TM) :-)

Crispin

> -----Original Message-----
> From: Adaikalavan RAMASAMY [mailto:ramasamya@gis.a-star.edu.sg]
> Sent: 15 September 2003 11:26
> To: bioconductor@stat.math.ethz.ch
> Cc: Mark.Reimers@biosci.ki.se
> Subject: [BioC] Combining HGU133A & HGU133B data
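A minimal sketch of the check suggested above, assuming you already have separately normalised log2 matrices for the A and B chips (same samples, same column order) and a probeset-to-gene lookup for each chip, for example derived from the hgu133a and hgu133b annotation packages. All IDs, gene labels and values are toy data, and the one-probeset-per-gene matching is a simplification.

```r
# Toy separately normalised log2 matrices (hypothetical values).
norm.a <- matrix(c(7.0, 9.1, 5.2, 7.4, 8.8, 5.0), nrow = 3,
                 dimnames = list(c("a1_at", "a2_at", "a3_at"), c("s1", "s2")))
norm.b <- matrix(c(7.2, 4.0, 5.1, 7.5, 4.2, 5.3), nrow = 3,
                 dimnames = list(c("b1_at", "b2_at", "b3_at"), c("s1", "s2")))
# Hypothetical probeset -> gene lookups, one per chip.
gene.a <- c(a1_at = "G1", a2_at = "G2", a3_at = "G3")
gene.b <- c(b1_at = "G1", b2_at = "G4", b3_at = "G3")

# Genes represented on both chips.
shared <- intersect(gene.a, gene.b)
# First matching probeset per gene on each chip (real data may have several).
probes.a <- names(gene.a)[match(shared, gene.a)]
probes.b <- names(gene.b)[match(shared, gene.b)]

# Same hybridisation cocktail, so after good normalisation the A and B
# signals for these genes should agree: inspect the A - B differences.
diffs <- norm.a[probes.a, , drop = FALSE] - norm.b[probes.b, , drop = FALSE]
rownames(diffs) <- shared
print(round(colMeans(diffs), 2))  # per-sample mean offset between chips
```

A per-sample mean difference far from zero, or a weak correlation between the paired values, would suggest that the separate normalisations are not comparable.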