Question

Analysis of an HGU133 set / Merging ExpressionSets

giroudpaul (France)

Hello,

I am trying to extract data from a GEO dataset that was run on the Affymetrix HGU133 platform, meaning that every sample was hybridized to two different chips: hgu133a and hgu133b.

Following the advice given here:

Combining HGU133A & HGU133B data

A: Methods to combine U133A and B?

https://www.biostars.org/p/56657/

I ran RMA normalization separately on each chip type, so I now have two ExpressionSets:

data.rmaA = rma(dataA)
data.rmaB = rma(dataB)

The next step would be to combine these two ExpressionSets into one before continuing my analysis with limma.

What is the best way to do this? What about redundant genes? Will I be able to compare expression levels between hgu133a and hgu133b even though the normalization was done separately?

Thanks for your help.

EDIT:

This is apparently a common question, not specific to my case. The trouble is that the solutions I found are always for combining different chips, whereas in my case there are two "sister" chips run in parallel (same samples, at the same time).

So what is the simplest way to do this? Is it better to use new("exprSet", ....) with the arguments patched together from the two individual HGU133A and HGU133B exprSets, as suggested by Wolfgang Huber in the first link I provided?

I also found this post, which uses the a4Base package: Combining ExpressionSet objects : error with function merge() in "inSilicoMerging"

There is also the combine.eSet() method: Concatenating or merging two or more ExpressionSet objects

Finally, I also found the inSilicoMerging package, which seems more complete but is designed for combining different studies, so I am not sure it applies here (Merge different datasets and perform differential expression analysis in limma).

Any advice?

> data.rmaA
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 15 samples 
  element names: exprs 
protocolData
  sampleNames: GSM115046_M0_D1_chipA.CEL GSM115047_M0_D2_chipA.CEL ...
    GSM115060_M2_D3_chipA.CEL (15 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: GSM115046_M0_D1_chipA.CEL GSM115047_M0_D2_chipA.CEL ...
    GSM115060_M2_D3_chipA.CEL (15 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133a 
> data.rmaB
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22645 features, 15 samples 
  element names: exprs 
protocolData
  sampleNames: GSM115061_M0_D1_chipB.CEL GSM115062_M0_D2_chipB.CEL ...
    GSM115075_M2_D3_chipB.CEL (15 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: GSM115061_M0_D1_chipB.CEL GSM115062_M0_D2_chipB.CEL ...
    GSM115075_M2_D3_chipB.CEL (15 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133b
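If a single object is still wanted, what is needed is to stack the two chips' probe sets (rows) while keeping the samples (columns) aligned. Because the chip A and chip B files carry different GSM IDs, the samples have to be paired by the shared label in their names (M0_D1, M0_D2, ...). Below is a minimal base-R sketch on toy matrices; stack_chips() and sample_key() are hypothetical helpers, and the file-name pattern they rely on is an assumption based on the sample names printed above.

```r
# Hypothetical helpers (not from any package): stack the probe sets of two
# "sister" chips run on the same samples into one features x samples matrix.

# Shared sample label, e.g. "M0_D1" from "GSM115046_M0_D1_chipA.CEL".
# Assumes every column name follows the GSM<id>_<label>_chip<A|B>.CEL pattern.
sample_key <- function(x) sub("^GSM[0-9]+_(.*)_chip[AB]\\.CEL$", "\\1", x)

stack_chips <- function(a, b) {
  ka <- sample_key(colnames(a))
  kb <- sample_key(colnames(b))
  # Each sample must appear exactly once on each chip.
  stopifnot(!anyDuplicated(ka), !anyDuplicated(kb), setequal(ka, kb))
  b <- b[, match(ka, kb), drop = FALSE]  # put chip B columns in chip A order
  # Prefix probe IDs with the chip so IDs present on both chips
  # (e.g. AFFX control probes) remain distinct rows.
  rownames(a) <- paste0("A:", rownames(a))
  rownames(b) <- paste0("B:", rownames(b))
  colnames(a) <- ka
  colnames(b) <- ka
  rbind(a, b)  # stack features; samples stay aligned
}

# Toy matrices: values and probe IDs are made up, sample names are from the post.
set.seed(1)
a <- matrix(rnorm(6), nrow = 3,
            dimnames = list(c("probeA1", "probeA2", "AFFX-ctrl"),
                            c("GSM115046_M0_D1_chipA.CEL",
                              "GSM115047_M0_D2_chipA.CEL")))
b <- matrix(rnorm(4), nrow = 2,
            dimnames = list(c("probeB1", "AFFX-ctrl"),
                            c("GSM115062_M0_D2_chipB.CEL",
                              "GSM115061_M0_D1_chipB.CEL")))
ab <- stack_chips(a, b)
```

With the real objects the call would be stack_chips(exprs(data.rmaA), exprs(data.rmaB)), and Biobase's ExpressionSet(assayData = ...) constructor could wrap the resulting matrix if an ExpressionSet is required. Stacking only puts the values in one object; it does not make intensities from the separately normalized chips comparable to each other.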

Tags: affy, hgu133a, hgu133b

Well, neither a4Base's combineTwoExpressionSet nor the inSilicoMerging method works for me.

The first one gives this error, probably because it is not meant to merge ExpressionSets from different chips with different numbers of probes:

data.rmaAB = combineTwoExpressionSet(data.rmaA, data.rmaB)
Error in cbind(assayData(x)$exprs, assayData(y)$exprs) : 
  number of rows of matrices must match (see arg 2)
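The cbind() in the error message suggests that combineTwoExpressionSet joins the two sets side by side, adding the second set's samples as extra columns, which only works when both sets have the same number of features (22283 vs 22645 here). A toy base-R illustration with zero-filled matrices of made-up size:

```r
# cbind() places the second matrix's columns (samples) next to the first,
# so both matrices must have the same number of rows (features).
a <- matrix(0, nrow = 3, ncol = 2)  # toy stand-in for chip A: 3 probes x 2 samples
b <- matrix(0, nrow = 4, ncol = 2)  # toy stand-in for chip B: 4 probes x 2 samples

by_samples <- tryCatch(cbind(a, b), error = function(e) conditionMessage(e))
print(by_samples)  # the same "number of rows of matrices must match" error

# Stacking features with rbind() only needs the same number of columns (samples).
by_features <- rbind(a, b)
print(dim(by_features))  # 7 rows, 2 columns
```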

inSilicoMerging doesn't work for me either, because it looks for probes common to the two ExpressionSets, and there are only 168 of them (the Affymetrix QC probes).
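To see exactly which probe sets the two chips share, and whether they are all control probes, the feature names can be intersected. A minimal base-R sketch; shared_probes() is a hypothetical helper, the toy IDs are made up, and the "AFFX" prefix test assumes the usual naming of Affymetrix control probe sets:

```r
# Count probe IDs present on both chips and check whether all of them
# look like Affymetrix control probes (IDs beginning with "AFFX").
shared_probes <- function(ids_a, ids_b) {
  common <- intersect(ids_a, ids_b)
  list(n = length(common),
       all_affx = all(startsWith(common, "AFFX")),
       ids = common)
}

# Toy IDs (made up); with real data the inputs would be
# featureNames(data.rmaA) and featureNames(data.rmaB).
res <- shared_probes(c("probeA1", "probeA2", "AFFX-ctrl1", "AFFX-ctrl2"),
                     c("probeB1", "AFFX-ctrl1", "AFFX-ctrl2"))
res$n         # 2
res$all_affx  # TRUE
```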

Still, plotMDS and plotRLE from this same package point out a batch effect between the hgu133A and hgu133B ExpressionSets. So I still need to perform some kind of normalization between the two, right?

(Reply by giroudpaul.)

Accepted answer (2016-04-07)

James W. MacDonald (United States)

You should note that there are almost no overlapping genes measured on those two arrays, so there is no need to combine them. In other words, processing and analyzing the two arrays separately is going to give you the same results that you would get if you were able to combine them, so there is nothing to gain by combining.
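A base-R sketch of this point on made-up data, using an ordinary pooled-variance two-group t-statistic computed one probe set at a time: each statistic depends only on its own row, so computing it per chip or on the stacked matrix gives the same values. row_t() is a hypothetical helper, and the M0/M2 grouping is only borrowed from the sample names for illustration. In practice the analysis would use limma (lmFit, then eBayes) on each ExpressionSet with the same design matrix; there, eBayes estimates its variance prior across all probe sets in the fit, so moderated statistics from separate and combined fits could differ slightly even though each per-gene fit is unchanged.

```r
# Pooled-variance two-sample t-statistic for every row of x.
row_t <- function(x, group) {
  g <- factor(group)
  stopifnot(nlevels(g) == 2, length(group) == ncol(x))
  i1 <- g == levels(g)[1]
  i2 <- !i1
  n1 <- sum(i1)
  n2 <- sum(i2)
  m1 <- rowMeans(x[, i1, drop = FALSE])
  m2 <- rowMeans(x[, i2, drop = FALSE])
  v1 <- apply(x[, i1, drop = FALSE], 1, var)
  v2 <- apply(x[, i2, drop = FALSE], 1, var)
  sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
  (m2 - m1) / sqrt(sp2 * (1 / n1 + 1 / n2))
}

set.seed(42)
grp <- rep(c("M0", "M2"), each = 3)  # hypothetical grouping of 6 samples
a <- matrix(rnorm(4 * 6), nrow = 4, dimnames = list(paste0("A", 1:4), NULL))
b <- matrix(rnorm(5 * 6), nrow = 5, dimnames = list(paste0("B", 1:5), NULL))

t_sep  <- c(row_t(a, grp), row_t(b, grp))  # each chip analysed on its own
t_comb <- row_t(rbind(a, b), grp)          # both chips stacked first
all.equal(t_sep, t_comb)                   # TRUE: same per-probe statistics
```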