Question

questions concerning quantile normalization in HTqPCR

0

Entering edit mode

Andreia Fonseca ▴ 810

@andreia-fonseca-3796

Last seen 8.0 years ago

Dear all, I am analyzing data of qPCR plates in which each well correponds to a miRNA. The system does not have replicates within the same plate. the data consistes in 3 types of cells: I have 3 replicates for cell type one; 2 replicates for cell type 2 and only one sampled cell type 3. The data of each cell type and each replicate is in a separate file where the data has 5 columns miRNA ID Target or Endogenous Position on plate Passed or Failed Ct value miR-18a Target A1 Passed 26.99 this means that I have 6 files: File Treatment sample1.txt cell type1 sample2.txt cell type1 sample3.txt cell type1 sample1.txt cell type2 sample2.txt cell type2 sample1.txt cell type3 For the miRNA which did not have expression in a biological replicate, I have replaced NA by 0. question 1 -So after normalizing using quantile, several miRNA which did not have expression in sample 1 cell type 3 the value were normalized to 23.1191666666667. I notice that this happened only for this cell type, for which I do not have replicates. Can someone explain me why does this happen? question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be the same, that there is no expression, or levels of expression under capacity of detection? question 3- I notice that miRNAs which are flagged by failed are being included on the analysis. Shouldn't these miRNAs be excluded? I thought that these would be, and I could find a filter for miRNA classified as undetermined but not for miRNAs flagged as failed. thanks in advance for the help. Kind regards, Andreia the code that I have used is bellow: library(HTqPCR) source("http://www.bioconductor.org/biocLite.R") biocLite("statmod") library(statmod) setwd("G:/margarida_miRNA") path<-("G:/margarida_miRNA") files <- read.delim(file.path(path, "Data.txt")) raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) pdf("plotCtOverview_raw_data.pdf") plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, conf.int =TRUE) dev.off() raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8, flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8) pdf("plot_Category.pdf") plotCtCategoryraw.cat) plotCtCategoryraw.cat, by.feature=TRUE, cexRow=0.1) dev.off() q.norm<-normalizeCtDataraw.cat,norm="quantile") library(HTqPCR) qFilt<-filterCtData(q.norm, remove.type="Endogenous") iqr.values<-apply(exprs(q.norm),1,IQR) iqr.filt<-filterCtData(q.norm, remove.IQR=1.5, remove.category="Undetermined") pdf("severalgraphs.pdf") plotCtCor(q.norm, main="Ct correlation") plotCtDensity(q.norm) plotCtDensity(iqr.filt) plotCtBoxes(q.norm) plotCtBoxes(iqr.filt) plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE) plotCtPairs(q.norm, col="type", diag=TRUE) dev.off() design<-model.matrix(~0+files$Treatment) colnames(design)<-c("CT1","CT2","CT3") contrasts<-makeContrasts(CT1-CT2, levels=design) qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts, ndups=1, spacing=1) qDE.limma dim(qDE.limma) names(qDE.limma) pdf("testingandheatmap.pdf") plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142") g<-featureNames(iqr.filt) plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4) cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean", n.cluster=8, hang=-1,cex=0.5) dev.off() sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252 [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C [5] LC_TIME=Portuguese_Portugal.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] statmod_1.4.6 HTqPCR_1.2.0 limma_3.4.4 RColorBrewer_1.0-2 [5] Biobase_2.8.0 loaded via a namespace (and not attached): [1] affy_1.26.1 affyio_1.16.0 gdata_2.8.0 [4] gplots_2.8.0 gtools_2.6.2 preprocessCore_1.10.0 [7] tools_2.11.1 -- -------------------------------------------- Andreia J. Amaral Unidade de Imunologia Clínica Instituto de Medicina Molecular Universidade de Lisboa email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com [[alternative HTML version deleted]]

miRNA qPCR miRNA qPCR • 1.4k views

ADD COMMENT • link updated 14.4 years ago by Heidi Dvinge ★ 2.0k • written 14.4 years ago by Andreia Fonseca ▴ 810

score 0 · Answer 1 · 2010-09-17

Hello Andreia, sorry for the late reply, this email slipped under my radar. > > I am analyzing data of qPCR plates in which each well correponds to a > miRNA. > The system does not have replicates within the same plate. > the data consistes in 3 types of cells: I have 3 replicates for cell type > one; 2 replicates for cell type 2 and only one sampled cell type 3. > The data of each cell type and each replicate is in a separate file where > the data has 5 columns > miRNA ID Target or Endogenous Position on plate Passed or Failed Ct > value > miR-18a Target A1 Passed 26.99 this means that I have 6 files: > File Treatment > sample1.txt cell type1 > sample2.txt cell type1 > sample3.txt cell type1 > sample1.txt cell type2 > sample2.txt cell type2 > sample1.txt cell type3 > > > > For the miRNA which did not have expression in a biological replicate, I > have replaced NA by 0. > > question 1 -So after normalizing using quantile, several miRNA which did > not > have expression in sample 1 cell type 3 the value were normalized to > 23.1191666666667. I notice that this happened only for this cell type, for > which I do not have replicates. Can someone explain me why does this > happen? > This is probably all the values that were replaced with 0, no? This is an artefact of the quantile normalisation, which will force all the distributions of Ct values to be identical - whether this is appropriate or not! Values within each sample are sorted according to Ct value, the average Ct value for rank 1, 2, ... n are calculated across samples, and that value is assigned to the genes in that order. So if for example 20 out of 96 genes are 0 in one sample, these will be ranked 77-96 (ties are treated equally) and are assigned the mean value of genes ranked 77-96 in all samples. In your case that mean value is apparently 23.12. Quantile normalisation is in general quite a harsh normalisation method, and shouldn't be used if you have e.g. many missing values, or if you expect a genuine, i.e. biologically real, higher or lower gene expression level in some samples. > question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be > the > same, that there is no expression, or levels of expression under capacity > of > detection? > Ct > 35 is generally taken to mean expression under capacity of detection, whereas Ct=0 in the output of many qPCR instruments is the same as NA, i.e. a value that is completely missing for any number of reasons, not necessarily because of low gene expression level. > question 3- I notice that miRNAs which are flagged by failed are being > included on the analysis. Shouldn't these miRNAs be excluded? I thought > that > these would be, and I could find a filter for miRNA classified as > undetermined but not for miRNAs flagged as failed. > They're not excluded per default, since the user might not necessarily agree with the parameters set by the qPCR instrument (or HTqPCR functions) for failed Ct measurements. Removing them does make sense in some cases though. Does it work to say something like: q.filt <- filterCtData(q.norm, remove.category="failed") You might have to run filterCtData twice. Once to remove category="Undetermined", and a second time to remove category="failed". Depends on how many types of classes you have in featureCategory(q.norm). Per default I think HTqPCR just expect "OK", "Undetermined" and "Unreliable". HTH \Heidi > thanks in advance for the help. Kind regards, > Andreia > > the code that I have used is bellow: > > library(HTqPCR) > source("http://www.bioconductor.org/biocLite.R") > biocLite("statmod") > library(statmod) > setwd("G:/margarida_miRNA") > path<-("G:/margarida_miRNA") > files <- read.delim(file.path(path, "Data.txt")) > raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) > pdf("plotCtOverview_raw_data.pdf") > plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, conf.int > =TRUE) > dev.off() > raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8, > flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8) > pdf("plot_Category.pdf") > plotCtCategoryraw.cat) > plotCtCategoryraw.cat, by.feature=TRUE, cexRow=0.1) > dev.off() > q.norm<-normalizeCtDataraw.cat,norm="quantile") > library(HTqPCR) > qFilt<-filterCtData(q.norm, remove.type="Endogenous") > iqr.values<-apply(exprs(q.norm),1,IQR) > iqr.filt<-filterCtData(q.norm, remove.IQR=1.5, > remove.category="Undetermined") > pdf("severalgraphs.pdf") > plotCtCor(q.norm, main="Ct correlation") > plotCtDensity(q.norm) > plotCtDensity(iqr.filt) > plotCtBoxes(q.norm) > plotCtBoxes(iqr.filt) > plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE) > plotCtPairs(q.norm, col="type", diag=TRUE) > dev.off() > > > > design<-model.matrix(~0+files$Treatment) > > colnames(design)<-c("CT1","CT2","CT3") > > > > > contrasts<-makeContrasts(CT1-CT2, levels=design) > qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts, > ndups=1, spacing=1) > qDE.limma > dim(qDE.limma) > names(qDE.limma) > pdf("testingandheatmap.pdf") > plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142") > g<-featureNames(iqr.filt) > plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4) > cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean", > n.cluster=8, hang=-1,cex=0.5) > dev.off() > > sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252 > > [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C > > [5] LC_TIME=Portuguese_Portugal.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] statmod_1.4.6 HTqPCR_1.2.0 limma_3.4.4 > RColorBrewer_1.0-2 > [5] Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] affy_1.26.1 affyio_1.16.0 gdata_2.8.0 > [4] gplots_2.8.0 gtools_2.6.2 preprocessCore_1.10.0 > [7] tools_2.11.1 > > > > > -- > -------------------------------------------- > Andreia J. Amaral > Unidade de Imunologia Cl?nica > Instituto de Medicina Molecular > Universidade de Lisboa > email: andreiaamaral at fm.ul.pt > andreia.fonseca at gmail.com >