Heidi Dvinge
Hello Balbine,
(I'm forwarding this reply to the Bioconductor list, since it might
be of use to people there. Fluidigm seems to be increasing in
popularity. A recent discussion at
http://article.gmane.org/gmane.science.biology.informatics.conductor/29566/match=htqpcr+fluidigm
might also be of interest to you.)
On 24 Jun 2010, at 18:03, ROUSSEL BALBINE wrote:
> Hello,
> I need advice on normalizing my data.
>
> With the Fluidigm chips, we can measure expression of 96 genes in
> 96 samples on one plate.
>
> We have 15 plates concerning samples so 1536 samples and 3 sets
> concerning genes so we have finally 288 genes.
>
> Totally, we have 15*3=45 plates
>
> The problem is that we have not housekeeping genes and no
> calibrator samples for each set or for each plate. But we have a
> test sample on all plates and a gene on all sets so we can verify
> if the normalization takes away much of the impact plates and if we
> keep the same information for the same sample or the same gene.
>
Just to make sure I understand you correctly: you have 3 different
plate types (let's call them A, B and C) with genes geneA1->geneA96,
geneB1->geneB96 and geneC1->geneC96. Sample X is present on all 45
plates. A single gene, geneY, is present on plate types A, B and C. So
you should have a single value out of the 96x96 that is identical on
all 45 plates. So do you have 15x96 = 1440 or 1536 samples in total?
For normalising within each plate type you have several options. You
can use e.g. rank-invariant normalisation as you suggest, but given
that you have an entire sample, i.e. 96 Ct values, that should be the
same for all 15 plates, you can also select these 96 values and do a
deltaCt normalisation. That corresponds to using these 96 values as
"housekeeping" genes, since they should be identical across all
plates of the same type (A, B or C).
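
To make the deltaCt idea concrete, here is a minimal base R sketch on
simulated data. The row layout (shared sample in the first 96 rows), the
simulated plate shifts and all object names are assumptions made for
illustration only. In HTqPCR the corresponding step would, as far as I
know, be normalizeCtData() with norm = "deltaCt" and the reference
features passed via deltaCt.genes, but check the help page for your
version.

```r
set.seed(1)
n_genes <- 96; n_samples <- 96; n_plates <- 15

# Hypothetical layout: one column per plate, 96 x 96 = 9216 rows per plate,
# ordered sample by sample (rows 1-96 = shared sample, rows 97-192 = next, ...)
true_ct     <- rnorm(n_genes, mean = 25, sd = 3)   # Ct values of the shared sample
plate_shift <- rnorm(n_plates, mean = 0, sd = 1)   # simulated plate effect

ct <- sapply(seq_len(n_plates), function(p) {
  shared <- true_ct + rnorm(n_genes, sd = 0.1)
  others <- rnorm(n_genes * (n_samples - 1), mean = 25, sd = 3)
  c(shared, others) + plate_shift[p]
})

# Assumption: the sample loaded on every plate occupies the first 96 rows
ref_rows <- seq_len(n_genes)

# deltaCt: subtract each plate's mean reference Ct from every well on that plate
ref_mean <- colMeans(ct[ref_rows, ], na.rm = TRUE)
ct_norm  <- sweep(ct, 2, ref_mean, FUN = "-")

# The reference wells now average zero on every plate
range(colMeans(ct_norm[ref_rows, ]))
```

Using the mean of 96 reference wells rather than a single well makes the
offset for each plate much less sensitive to one failed reaction.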
Normalising across plates A, B and C is a bit trickier. In principle
you can designate the single sample x gene value common across all 45
plates as a pseudo-housekeeping gene and normalise against that using
deltaCt. But because there are no replicates within each plate, if
that single reaction didn't work well for whatever reason, it will
affect the entire plate after normalisation. Risky! What's the
correlation you see for this one Ct value across all 45 plates?
Within each of the 3 groups of 15 plates?
Alternatively you can also use quantile normalisation as you suggest.
Note though that this is quite a "harsh" procedure. No matter what
your data look like to begin with, it will force them into having the
same Ct-value distribution. That might be okay if all your genes and
samples are completely randomised across all 45 plates. But what if
for example 10 samples on one plate (e.g. a particular treatment) all
give very high Ct values, whereas another 10 samples (a different
treatment) on another plate all give very low Ct values? Then you
can't assume that the Ct value distribution on each plate should be
identical. In that case a rank-invariant normalisation is probably
the safest bet.
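
The effect can be seen in a small base R sketch. The function
quantile_normalise() below is a hypothetical helper written for this
example (it is not the HTqPCR implementation), and the two plates are
simulated so that one really does have lower Ct values than the other.

```r
# Minimal quantile normalisation for a numeric matrix without NAs,
# one column per plate (illustrative helper, not a library function)
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)              # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(2)
plate1 <- rnorm(200, mean = 22, sd = 2)   # e.g. a treatment with low Ct values
plate2 <- rnorm(200, mean = 30, sd = 2)   # e.g. a treatment with high Ct values
x <- cbind(plate1, plate2)

xn <- quantile_normalise(x)

colMeans(x)    # clearly different before normalisation
colMeans(xn)   # identical afterwards: the real difference is gone
```

If the difference between the plates is biological rather than
technical, this is exactly the information you would not want removed.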
If you're not going to compare Ct values directly across plate types,
such as sampleAA-gene1 versus sampleBA-gene4, then technically you
wouldn't even have to normalise between plate types A, B and C.
Presumably you want to find differential expression of samples across
each individual gene, right? Since the same type of gene will always
be present on the same type of plate, regardless of sample, you
should be okay with just normalising within each A/B/C set.
I can't give you any solid advice on what normalisation to do, since
it will depend on the distribution of your data, how the samples have
been grouped together on plates and other factors. I would probably
first spend a lot of time on initial data QC and comparison, and
depending on how the data looks do something along these lines:
- Load the three different plate types, A, B and C into separate
qPCRset objects. Each object would then consist of 9216 rows (96
genes x 96 samples) and 15 columns (individual plates).
- Normalise each of these objects separately, using either quantile
normalisation (strictest choice), rank-invariant normalisation or
deltaCt normalisation based on the 96 rows corresponding to the
sample that has been loaded on all plates.
- Combine the three objects together (cbind/rbind), and potentially
change the layout (changeCtLayout) so that you have 1 gene per row
and 1 sample per column, such that the object can be used for
statistical testing.
or perhaps:
- Load all the plates into a single object with 9216 rows (one 96x96
plate) and one column per individual plate = 45.
- Do e.g. rank-invariant normalisation across all these.
I would probably use some of the diagnostics functions, like
clusterCt and plotCtCor both before and after normalisation, to see
if the samples group together as expected based on the biology.
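
As a rough illustration of the layout change and the correlation check,
here is a base R sketch on simulated data. It only mimics what
changeCtLayout, plotCtCor and clusterCt do on a qPCRset. The row order
within a plate and all names are assumptions, and only 3 plates are
simulated to keep it small.

```r
set.seed(3)
n_genes <- 96; n_samples <- 96; n_plates <- 3

# Assumed row order within a plate: all 96 genes for sample 1, then sample 2, ...
long <- matrix(rnorm(n_genes * n_samples * n_plates, mean = 25, sd = 3),
               nrow = n_genes * n_samples, ncol = n_plates)

# Reshape each plate column into a 96 x 96 block (genes x samples)
# and bind the blocks side by side
wide <- do.call(cbind, lapply(seq_len(n_plates), function(p) {
  m <- matrix(long[, p], nrow = n_genes, ncol = n_samples)
  colnames(m) <- paste0("plate", p, "_sample", seq_len(n_samples))
  m
}))
rownames(wide) <- paste0("gene", seq_len(n_genes))
dim(wide)   # 96 genes in rows, 3 x 96 = 288 samples in columns

# Sample-to-sample correlation, then hierarchical clustering on 1 - correlation
sample_cor <- cor(wide, use = "pairwise.complete.obs")
hc <- hclust(as.dist(1 - sample_cor), method = "average")
# plot(hc) would show whether samples cluster by plate or by biology
```

Running the same check before and after normalisation shows whether
the plate grouping weakens while the biological grouping remains.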
HTH
\Heidi
> So I thought of normalizing by quantile, or by rank-invariant.
>
> But I do not know which strategy to use because:
>
> - I can have a plate effect on the 3 sets of genes
> - I can also have a plate effect concerning the different samples
>
> Is it necessary that I start combining all "qPCRset objects" or not?
>
> For plate 1 and 3 gene sets:
>
> An object of class "qPCRset"
> Size: 288 features, 96 samples
> Feature types: P1
> Feature names: AACS AADACL1 ABHD5 ...
> Feature classes:
> Feature categories: OK
> Sample names: Sample1 Sample2 Sample3 ...
>
> For plate 2 and 3 gene sets:
>
> An object of class "qPCRset"
> Size: 288 features, 96 samples
> Feature types: P1
> Feature names: AACS AADACL1 ABHD5 ...
> Feature classes:
> Feature categories: OK
> Sample names: Sample1 Sample2 Sample3 ...
>
>
> .... up to the plate 15 and 3 gene sets
>
> > q.features=cbind(qPCRset1,qPCRset2,.....,qPCRset15)
>
> > q.features
> An object of class "qPCRset"
> Size: 288 features, 1536 samples
> Feature types:
> Feature names: AACS AADACL1 ABHD5 ...
> Feature classes:
> Feature categories: OK
> Sample names: Sample1 Sample2 Sample3 ...
>
>
> >group=read.table(file="group 1536 samples.csv",h=T,sep=";",dec=".")
> >attach(group)
> >groupCID=c(as.character(group$CID))
> >sample=c(as.character(group$Subject))
> >sampleNames(q.features)=sample
> >q.features2=setCategory(q.features,groups=groupCID,flag=TRUE,flag.out="Failed")
>
>q.features3=filterCategory(q.features2,na.categories="Undetermined")
>
>
>
> Would the following strategy be good?
>
> - to do the quantile normalisation on the 96 samples and 96*3
> genes (g=288)
> - then to do the global quantile normalisation on all samples
> and all genes
> (n=1536, g=288)
>
> With the function "normalizeCtData()", are the two steps
> performed simultaneously?
> If not, how can I do it? Do I have to do a normalization for each
> plate with 3 gene sets? Or do I have to specify in my script that
> there are these 15 different plates?
>
> Do you see another strategy more suitable for my data to realize
> normalization?
>
> How would you do?
>
> Perhaps if we combine the two methods (quantile and rank-invariant):
>
> - to do the quantile normalisation on the 96 samples and 96*3
> genes (g=288)
> - then to do rank-invariant normalization on all samples and
> all genes
> (n=1536, g=288)
>
> How would you do?
>
> Thanks for your response,
>
> Balbine