Hello,
I am currently trying to extract data from a GEO dataset that was generated on the Affymetrix HG-U133 platform, meaning that every sample was hybridized on two different chips: hgu133a and hgu133b.
Following the advice given here:
Combining HGU133A & HGU133B data
A: Methods to combine U133A and B?
https://www.biostars.org/p/56657/
I ran the RMA normalization separately for each chip, so I now have two expression sets:
data.rmaA = rma(dataA)
data.rmaB = rma(dataB)
The next step would be to combine these two expression sets into one before continuing my analysis with limma.
What is the best way to do this? What about redundant genes? Will I be able to compare expression levels between hgu133a and hgu133b even though the normalization was done separately?
Thanks for your help.
EDIT:
Actually, this is apparently a common question that is not specific to my case. The trouble is that the solutions I found are always for combining different chips, whereas in my case the two are "sister" chips that were run in parallel (same samples, at the same time).
So what is the simplest way to do this? Is it better to use new("exprSet", ....) with the arguments patched together from the two individual HGU133A and HGU133B exprSets, as suggested by Wolfgang Huber in the first link I provided?
I also found this post where they use the a4Base package: Combining ExpressionSet objects : error with function merge() in "inSilicoMerging"
There is also the combine.eSet() method: Concatenating or merging two or more ExpressionSet objects
And finally I found the inSilicoMerging package, which seems more complete but is designed for combining different studies, so I am not sure it applies here (Merge different datasets and perform differential expression analysis in limma).
Any advice?
> data.rmaA
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 15 samples
  element names: exprs
protocolData
  sampleNames: GSM115046_M0_D1_chipA.CEL GSM115047_M0_D2_chipA.CEL ... GSM115060_M2_D3_chipA.CEL (15 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: GSM115046_M0_D1_chipA.CEL GSM115047_M0_D2_chipA.CEL ... GSM115060_M2_D3_chipA.CEL (15 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133a

> data.rmaB
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22645 features, 15 samples
  element names: exprs
protocolData
  sampleNames: GSM115061_M0_D1_chipB.CEL GSM115062_M0_D2_chipB.CEL ... GSM115075_M2_D3_chipB.CEL (15 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: GSM115061_M0_D1_chipB.CEL GSM115062_M0_D2_chipB.CEL ... GSM115075_M2_D3_chipB.CEL (15 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133b
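For illustration, here is a minimal sketch of the "patch the two matrices together" idea, written in base R on toy matrices that stand in for exprs(data.rmaA) and exprs(data.rmaB). The probe names, the values, and the sample_key() helper are made up for this example; only the sample-name pattern comes from the output above. It assumes that the part of the file name between the GSM accession and "_chipA"/"_chipB" identifies the biological sample, and that keeping chip A's copy of any probe ID present on both chips is acceptable. Neither assumption is established by the posts linked above.

```r
# Toy stand-ins for exprs(data.rmaA) and exprs(data.rmaB); values are invented.
exprsA <- matrix(c(7.1, 8.2, 10.0, 7.3, 8.0, 10.1), nrow = 3,
                 dimnames = list(c("A_only_1", "A_only_2", "AFFX_ctrl"),
                                 c("GSM115046_M0_D1_chipA.CEL",
                                   "GSM115047_M0_D2_chipA.CEL")))
# Columns of the B matrix are deliberately in a different order.
exprsB <- matrix(c(5.5, 6.1, 10.4, 5.2, 6.3, 10.6), nrow = 3,
                 dimnames = list(c("B_only_1", "B_only_2", "AFFX_ctrl"),
                                 c("GSM115062_M0_D2_chipB.CEL",
                                   "GSM115061_M0_D1_chipB.CEL")))

# Hypothetical helper: reduce a CEL file name to its sample key, e.g. "M0_D1".
sample_key <- function(x) sub("^GSM[0-9]+_(.*)_chip[AB]\\.CEL$", "\\1", x)

colnames(exprsA) <- sample_key(colnames(exprsA))
colnames(exprsB) <- sample_key(colnames(exprsB))

# Both chips must cover exactly the same samples before stacking.
stopifnot(setequal(colnames(exprsA), colnames(exprsB)))

# Put chip B's columns in chip A's sample order.
exprsB <- exprsB[, colnames(exprsA), drop = FALSE]

# Probe IDs present on both chips: keep chip A's copy (an arbitrary choice).
shared <- intersect(rownames(exprsA), rownames(exprsB))
combined <- rbind(exprsA,
                  exprsB[!rownames(exprsB) %in% shared, , drop = FALSE])

print(dim(combined))
```

The resulting matrix has one column per sample and one row per distinct probe ID, which is the shape limma expects. This only stacks the values; it does not make intensities from the two chips comparable with each other.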
Well, neither a4Base::combineTwoExpressionSet nor the inSilicoMerging methods work for me.
The first one gives an error, probably because it is not meant to merge expression sets from different chips with different numbers of probes.
inSilicoMerging does not work for me either, because it looks for common probes between the expression sets, and it fails because there are only 168 of them (the Affymetrix QC probes).
Still, plotMDS and plotRLE from this same package point to a batch effect between the hgu133a and hgu133b expression sets. So I still need to perform some kind of normalization between the two, right?
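One rough way to put a number on that shift is to compare the probes that both chips share, sample by sample. The sketch below uses invented values and hypothetical probe names (ctrl_1, ctrl_2), and it assumes the shared control probes behave the same way on both chips, which is not established here. It is a diagnostic only, not a normalization method.

```r
# Toy values for control probes shared by both chips (rows = probes, cols = samples).
ctrlA <- matrix(c(8.0, 9.0, 8.2, 9.1), nrow = 2,
                dimnames = list(c("ctrl_1", "ctrl_2"), c("M0_D1", "M0_D2")))
# Pretend chip B reads 0.5 log2 units higher on the same probes.
ctrlB <- ctrlA + 0.5

# Mean difference (B minus A) across the shared probes, for each sample.
offset <- colMeans(ctrlB - ctrlA)
print(offset)
```

A consistent non-zero offset across samples would be in line with what the MDS and RLE plots suggest, but the 168 shared probes are controls, so they may not reflect how the other probes behave.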