Hello,
The limma user's guide states that:
The linear model and differential expression functions are applicable to data from any quantitative gene expression technology including microarrays, RNA-seq and quantitative PCR.
However, that is the only place qPCR is mentioned, and I haven't found any examples of how to work with this type of data.
For background, qPCR data comes in the form of cycle threshold (Ct) values, which are inversely related to gene expression: the more highly a gene is expressed, the fewer cycles it needs to reach the threshold.
These Ct values are then typically normalized by subtracting the Ct values of reference/housekeeping genes to derive ∆Ct values:
∆Ct = Ct (gene of interest) – Ct (housekeeping gene)
Then one can calculate the difference across conditions of interest as ∆∆Ct values:
∆∆Ct = ∆Ct (treated sample) – ∆Ct (untreated sample)
And fold changes are calculated as:
2^-∆∆Ct
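A minimal Python sketch of the ∆Ct → ∆∆Ct → fold-change calculation above may help. The Ct values are made up for illustration. Averaging the Ct values of several housekeeping genes is an assumption (a common convention), not something prescribed in this thread. The helper names `delta_ct`, `delta_delta_ct` and `fold_change` are hypothetical.

```python
from statistics import mean


def delta_ct(ct_goi, ct_housekeeping):
    """Delta Ct = Ct(gene of interest) - Ct(housekeeping).

    Assumption: with several housekeeping genes, their Ct values are
    averaged before subtracting (a common convention, not a rule).
    """
    return ct_goi - mean(ct_housekeeping)


def delta_delta_ct(dct_treated, dct_untreated):
    """Delta-delta Ct = Delta Ct(treated) - Delta Ct(untreated)."""
    return dct_treated - dct_untreated


def fold_change(ddct):
    """Fold change = 2^(-Delta-delta Ct)."""
    return 2 ** (-ddct)


# Hypothetical Ct values (gene of interest, three housekeeping genes).
untreated = delta_ct(26.0, [18.0, 17.5, 18.5])  # 26 - 18 = 8.0
treated = delta_ct(24.0, [18.0, 17.5, 18.5])    # 24 - 18 = 6.0
ddct = delta_delta_ct(treated, untreated)        # 6 - 8 = -2.0
print(ddct, fold_change(ddct))                   # -2.0 4.0
```

Each call handles a single two-group comparison. This is the limitation that the next paragraph points out for more complex designs.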
But this only works for very simple experimental designs/contrasts (e.g. treated vs. untreated), whereas I have a more complex study design that would benefit from the statistical modelling capabilities of limma.
So here's my question: how should I process the qPCR Ct data for use in limma? Ultimately I'd like to transform the Ct values into expression values that I can store in an ExpressionSet object and feed to lmFit() and downstream functions.
Thank you!
Thank you Gordon!
A couple follow-up questions:
1. Does y represent log2-expression values (and not linear expression values that still need to be log2-transformed)?
2. Should I use normalizeCyclicLoess() for normalization of Ct values? I do have Ct values for 3 housekeeping genes, but I'm not sure how I would feed these to the function.
3. Should I use trend = TRUE and robust = TRUE as arguments in eBayes() to control heteroscedasticity and outliers?
Expression doubles during each cycle of PCR. Ct counts the number of cycles needed to reach a threshold. It follows from a couple of lines of math that differences in Ct are on the log2 scale.
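The "couple of lines of math" behind this can be sketched as follows. The derivation assumes perfect doubling each cycle (100% amplification efficiency). It also assumes that each gene's threshold is the same across samples. The symbols N_0 (starting amount), N_c (amount after c cycles) and T (threshold) are introduced here for illustration only. The last line recovers the 2^-∆∆Ct fold-change formula from the question.

```latex
\begin{align*}
N_c &= N_0\,2^{c}
  && \text{amount after $c$ cycles}\\
N_0\,2^{C_t} &= T
  && \text{$C_t$ is the cycle at which threshold $T$ is reached}\\
\log_2 N_0 &= \log_2 T - C_t
  && \text{so $-C_t$ is log2-expression up to a constant}\\
\log_2\frac{N_{0,\mathrm{gene}}}{N_{0,\mathrm{hk}}}
  &= \log_2\frac{T_{\mathrm{gene}}}{T_{\mathrm{hk}}} - \Delta C_t
  && \Delta C_t = C_{t,\mathrm{gene}} - C_{t,\mathrm{hk}}\\
\log_2\frac{\left(N_{0,\mathrm{gene}}/N_{0,\mathrm{hk}}\right)_{\mathrm{treated}}}
           {\left(N_{0,\mathrm{gene}}/N_{0,\mathrm{hk}}\right)_{\mathrm{untreated}}}
  &= -\left(\Delta C_t^{\mathrm{treated}} - \Delta C_t^{\mathrm{untreated}}\right)
   = -\Delta\Delta C_t
\end{align*}
```

The threshold constants cancel in the final line. As a result, the log2 fold change is simply -∆∆Ct, and the fold change is 2^-∆∆Ct.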
For cyclic loess, x is the y matrix including the house-keeping genes. weights is a vector of length nrow(x), equal to 50 for the house-keeping probes and 1 otherwise.
I would probably not use trend=TRUE or robust=TRUE unless you have PCR for a lot of genes.
Dear Dr. Smyth:
I would like to follow up on this topic. I have used limma on many microarray datasets in the past. Besides microarrays, RNA-seq (with voom) and quantitative PCR data as mentioned here, what other types of high-throughput data can be analyzed with limma? Is there a general principle or rule for assessing whether a data type (its distribution, etc.) is suitable for limma analysis to find features that differ between the contrasting groups of interest? For example, one dataset I wish to analyze with limma was derived from high-throughput assays and has been scaled to lie roughly between -2 and 1. We can essentially treat these data as log2 intensities, as for microarrays: after normalization, higher values mean higher expression for the same probe, and more positive and more negative values have opposite biological meanings. Another dataset comes from high-throughput compound-labeling assays for a protein of interest at intended sites/pockets, recorded as the percentage of successful labeling and ranging from 0 to 100%. A high labeling percentage is favored for good compounds. Should I log-transform the percentage data so that it can be used in a limma analysis?
I was wondering about the possibility of using limma on these datasets. Any advice would be highly appreciated, and thanks a lot in advance! I should say that I have received great advice from you in the past, which I really appreciated!
Best,
Ming