questions concerning quantile normalization in HTqPCR
1
0
Entering edit mode
@andreia-fonseca-3796
Last seen 7.8 years ago
Dear all, I am analyzing data of qPCR plates in which each well correponds to a miRNA. The system does not have replicates within the same plate. the data consistes in 3 types of cells: I have 3 replicates for cell type one; 2 replicates for cell type 2 and only one sampled cell type 3. The data of each cell type and each replicate is in a separate file where the data has 5 columns miRNA ID Target or Endogenous Position on plate Passed or Failed Ct value miR-18a Target A1 Passed 26.99 this means that I have 6 files: File Treatment sample1.txt cell type1 sample2.txt cell type1 sample3.txt cell type1 sample1.txt cell type2 sample2.txt cell type2 sample1.txt cell type3 For the miRNA which did not have expression in a biological replicate, I have replaced NA by 0. question 1 -So after normalizing using quantile, several miRNA which did not have expression in sample 1 cell type 3 the value were normalized to 23.1191666666667. I notice that this happened only for this cell type, for which I do not have replicates. Can someone explain me why does this happen? question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be the same, that there is no expression, or levels of expression under capacity of detection? question 3- I notice that miRNAs which are flagged by failed are being included on the analysis. Shouldn't these miRNAs be excluded? I thought that these would be, and I could find a filter for miRNA classified as undetermined but not for miRNAs flagged as failed. thanks in advance for the help. Kind regards, Andreia the code that I have used is bellow: library(HTqPCR) source("http://www.bioconductor.org/biocLite.R") biocLite("statmod") library(statmod) setwd("G:/margarida_miRNA") path<-("G:/margarida_miRNA") files <- read.delim(file.path(path, "Data.txt")) raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) pdf("plotCtOverview_raw_data.pdf") plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, conf.int =TRUE) dev.off() raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8, flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8) pdf("plot_Category.pdf") plotCtCategoryraw.cat) plotCtCategoryraw.cat, by.feature=TRUE, cexRow=0.1) dev.off() q.norm<-normalizeCtDataraw.cat,norm="quantile") library(HTqPCR) qFilt<-filterCtData(q.norm, remove.type="Endogenous") iqr.values<-apply(exprs(q.norm),1,IQR) iqr.filt<-filterCtData(q.norm, remove.IQR=1.5, remove.category="Undetermined") pdf("severalgraphs.pdf") plotCtCor(q.norm, main="Ct correlation") plotCtDensity(q.norm) plotCtDensity(iqr.filt) plotCtBoxes(q.norm) plotCtBoxes(iqr.filt) plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE) plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE) plotCtPairs(q.norm, col="type", diag=TRUE) dev.off() design<-model.matrix(~0+files$Treatment) colnames(design)<-c("CT1","CT2","CT3") contrasts<-makeContrasts(CT1-CT2, levels=design) qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts, ndups=1, spacing=1) qDE.limma dim(qDE.limma) names(qDE.limma) pdf("testingandheatmap.pdf") plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142") g<-featureNames(iqr.filt) plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4) cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean", n.cluster=8, hang=-1,cex=0.5) dev.off() sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252 [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C [5] LC_TIME=Portuguese_Portugal.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] statmod_1.4.6 HTqPCR_1.2.0 limma_3.4.4 RColorBrewer_1.0-2 [5] Biobase_2.8.0 loaded via a namespace (and not attached): [1] affy_1.26.1 affyio_1.16.0 gdata_2.8.0 [4] gplots_2.8.0 gtools_2.6.2 preprocessCore_1.10.0 [7] tools_2.11.1 -- -------------------------------------------- Andreia J. Amaral Unidade de Imunologia Clínica Instituto de Medicina Molecular Universidade de Lisboa email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com [[alternative HTML version deleted]]
miRNA qPCR miRNA qPCR • 1.4k views
ADD COMMENT
0
Entering edit mode
Heidi Dvinge ★ 2.0k
@heidi-dvinge-2195
Last seen 10.3 years ago
Hello Andreia, sorry for the late reply, this email slipped under my radar. > > I am analyzing data of qPCR plates in which each well correponds to a > miRNA. > The system does not have replicates within the same plate. > the data consistes in 3 types of cells: I have 3 replicates for cell type > one; 2 replicates for cell type 2 and only one sampled cell type 3. > The data of each cell type and each replicate is in a separate file where > the data has 5 columns > miRNA ID Target or Endogenous Position on plate Passed or Failed Ct > value > miR-18a Target A1 Passed 26.99 this means that I have 6 files: > File Treatment > sample1.txt cell type1 > sample2.txt cell type1 > sample3.txt cell type1 > sample1.txt cell type2 > sample2.txt cell type2 > sample1.txt cell type3 > > > > For the miRNA which did not have expression in a biological replicate, I > have replaced NA by 0. > > question 1 -So after normalizing using quantile, several miRNA which did > not > have expression in sample 1 cell type 3 the value were normalized to > 23.1191666666667. I notice that this happened only for this cell type, for > which I do not have replicates. Can someone explain me why does this > happen? > This is probably all the values that were replaced with 0, no? This is an artefact of the quantile normalisation, which will force all the distributions of Ct values to be identical - whether this is appropriate or not! Values within each sample are sorted according to Ct value, the average Ct value for rank 1, 2, ... n are calculated across samples, and that value is assigned to the genes in that order. So if for example 20 out of 96 genes are 0 in one sample, these will be ranked 77-96 (ties are treated equally) and are assigned the mean value of genes ranked 77-96 in all samples. In your case that mean value is apparently 23.12. Quantile normalisation is in general quite a harsh normalisation method, and shouldn't be used if you have e.g. many missing values, or if you expect a genuine, i.e. biologically real, higher or lower gene expression level in some samples. > question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be > the > same, that there is no expression, or levels of expression under capacity > of > detection? > Ct > 35 is generally taken to mean expression under capacity of detection, whereas Ct=0 in the output of many qPCR instruments is the same as NA, i.e. a value that is completely missing for any number of reasons, not necessarily because of low gene expression level. > question 3- I notice that miRNAs which are flagged by failed are being > included on the analysis. Shouldn't these miRNAs be excluded? I thought > that > these would be, and I could find a filter for miRNA classified as > undetermined but not for miRNAs flagged as failed. > They're not excluded per default, since the user might not necessarily agree with the parameters set by the qPCR instrument (or HTqPCR functions) for failed Ct measurements. Removing them does make sense in some cases though. Does it work to say something like: q.filt <- filterCtData(q.norm, remove.category="failed") You might have to run filterCtData twice. Once to remove category="Undetermined", and a second time to remove category="failed". Depends on how many types of classes you have in featureCategory(q.norm). Per default I think HTqPCR just expect "OK", "Undetermined" and "Unreliable". HTH \Heidi > thanks in advance for the help. Kind regards, > Andreia > > the code that I have used is bellow: > > library(HTqPCR) > source("http://www.bioconductor.org/biocLite.R") > biocLite("statmod") > library(statmod) > setwd("G:/margarida_miRNA") > path<-("G:/margarida_miRNA") > files <- read.delim(file.path(path, "Data.txt")) > raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) > pdf("plotCtOverview_raw_data.pdf") > plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, conf.int > =TRUE) > dev.off() > raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8, > flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8) > pdf("plot_Category.pdf") > plotCtCategoryraw.cat) > plotCtCategoryraw.cat, by.feature=TRUE, cexRow=0.1) > dev.off() > q.norm<-normalizeCtDataraw.cat,norm="quantile") > library(HTqPCR) > qFilt<-filterCtData(q.norm, remove.type="Endogenous") > iqr.values<-apply(exprs(q.norm),1,IQR) > iqr.filt<-filterCtData(q.norm, remove.IQR=1.5, > remove.category="Undetermined") > pdf("severalgraphs.pdf") > plotCtCor(q.norm, main="Ct correlation") > plotCtDensity(q.norm) > plotCtDensity(iqr.filt) > plotCtBoxes(q.norm) > plotCtBoxes(iqr.filt) > plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE) > plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE) > plotCtPairs(q.norm, col="type", diag=TRUE) > dev.off() > > > > design<-model.matrix(~0+files$Treatment) > > colnames(design)<-c("CT1","CT2","CT3") > > > > > contrasts<-makeContrasts(CT1-CT2, levels=design) > qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts, > ndups=1, spacing=1) > qDE.limma > dim(qDE.limma) > names(qDE.limma) > pdf("testingandheatmap.pdf") > plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142") > g<-featureNames(iqr.filt) > plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4) > cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean", > n.cluster=8, hang=-1,cex=0.5) > dev.off() > > sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252 > > [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C > > [5] LC_TIME=Portuguese_Portugal.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] statmod_1.4.6 HTqPCR_1.2.0 limma_3.4.4 > RColorBrewer_1.0-2 > [5] Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] affy_1.26.1 affyio_1.16.0 gdata_2.8.0 > [4] gplots_2.8.0 gtools_2.6.2 preprocessCore_1.10.0 > [7] tools_2.11.1 > > > > > -- > -------------------------------------------- > Andreia J. Amaral > Unidade de Imunologia Cl?nica > Instituto de Medicina Molecular > Universidade de Lisboa > email: andreiaamaral at fm.ul.pt > andreia.fonseca at gmail.com >
ADD COMMENT
0
Entering edit mode
Hello Heidi, Sorry for my delayed response, but I was on vacations. Thanks for your response. Concerning point two, my problem is that I would like to make a heatmap which could be interpreted as if I had microarray data, where the expression increases with the change of color. With qpcr data, we get a gradient of colors which would look like a cilinder, so low values for no expression, then values 20-24 high expression and then values in the range 30-35, which means low expression. In my data when I have substituted the NA by 40, the clustering of the samples made much more sense with the biology than when I had values of zero for the NA values. And my question is in spite that the biological reasons for having 38 or NA are different, shouldn't NA values be treated as ct=40? Or then how could we represent better the clustering? thanks for the help. kind regards, Andreia 7, 2010 at 6:29 PM, Heidi Dvinge <heidi@ebi.ac.uk> wrote: > Hello Andreia, > > sorry for the late reply, this email slipped under my radar. > > > > I am analyzing data of qPCR plates in which each well correponds to a > > miRNA. > > The system does not have replicates within the same plate. > > the data consistes in 3 types of cells: I have 3 replicates for cell type > > one; 2 replicates for cell type 2 and only one sampled cell type 3. > > The data of each cell type and each replicate is in a separate file where > > the data has 5 columns > > miRNA ID Target or Endogenous Position on plate Passed or Failed Ct > > value > > miR-18a Target A1 Passed 26.99 this means that I have 6 files: > > File Treatment > > sample1.txt cell type1 > > sample2.txt cell type1 > > sample3.txt cell type1 > > sample1.txt cell type2 > > sample2.txt cell type2 > > sample1.txt cell type3 > > > > > > > > For the miRNA which did not have expression in a biological replicate, I > > have replaced NA by 0. > > > > question 1 -So after normalizing using quantile, several miRNA which did > > not > > have expression in sample 1 cell type 3 the value were normalized to > > 23.1191666666667. I notice that this happened only for this cell type, > for > > which I do not have replicates. Can someone explain me why does this > > happen? > > > This is probably all the values that were replaced with 0, no? This is an > artefact of the quantile normalisation, which will force all the > distributions of Ct values to be identical - whether this is appropriate > or not! Values within each sample are sorted according to Ct value, the > average Ct value for rank 1, 2, ... n are calculated across samples, and > that value is assigned to the genes in that order. So if for example 20 > out of 96 genes are 0 in one sample, these will be ranked 77-96 (ties are > treated equally) and are assigned the mean value of genes ranked 77-96 in > all samples. In your case that mean value is apparently 23.12. > > Quantile normalisation is in general quite a harsh normalisation method, > and shouldn't be used if you have e.g. many missing values, or if you > expect a genuine, i.e. biologically real, higher or lower gene expression > level in some samples. > > > > question 2- the interpretation of values of Ct=0 and Ct>35. Shouldn't be > > the > > same, that there is no expression, or levels of expression under capacity > > of > > detection? > > > Ct > 35 is generally taken to mean expression under capacity of detection, > whereas Ct=0 in the output of many qPCR instruments is the same as NA, > i.e. a value that is completely missing for any number of reasons, not > necessarily because of low gene expression level. > > > question 3- I notice that miRNAs which are flagged by failed are being > > included on the analysis. Shouldn't these miRNAs be excluded? I thought > > that > > these would be, and I could find a filter for miRNA classified as > > undetermined but not for miRNAs flagged as failed. > > > They're not excluded per default, since the user might not necessarily > agree with the parameters set by the qPCR instrument (or HTqPCR functions) > for failed Ct measurements. Removing them does make sense in some cases > though. Does it work to say something like: > > q.filt <- filterCtData(q.norm, remove.category="failed") > > You might have to run filterCtData twice. Once to remove > category="Undetermined", and a second time to remove category="failed". > Depends on how many types of classes you have in featureCategory(q.norm). > Per default I think HTqPCR just expect "OK", "Undetermined" and > "Unreliable". > > HTH > \Heidi > > > thanks in advance for the help. Kind regards, > > Andreia > > > > the code that I have used is bellow: > > > > library(HTqPCR) > > source("http://www.bioconductor.org/biocLite.R") > > biocLite("statmod") > > library(statmod) > > setwd("G:/margarida_miRNA") > > path<-("G:/margarida_miRNA") > > files <- read.delim(file.path(path, "Data.txt")) > > raw <- readCtData(files=files$File, path=path, n.features = 96, flag = 4, > > feature = 1, type = 2, position = 3, Ct = 5, header = FALSE, SDS = FALSE) > > pdf("plotCtOverview_raw_data.pdf") > > plotCtOverview(raw,genes=g, xlim=c(0,50), groups=files$Treatment, > conf.int > > =TRUE) > > dev.off() > > raw.cat<-setCategory(raw, groups=files$Treatment, Ct.max=38, Ct.min=8, > > flag=TRUE, flag.out="Failed",verbose=TRUE, quantile=0.8) > > pdf("plot_Category.pdf") > > plotCtCategoryraw.cat) > > plotCtCategoryraw.cat, by.feature=TRUE, cexRow=0.1) > > dev.off() > > q.norm<-normalizeCtDataraw.cat,norm="quantile") > > library(HTqPCR) > > qFilt<-filterCtData(q.norm, remove.type="Endogenous") > > iqr.values<-apply(exprs(q.norm),1,IQR) > > iqr.filt<-filterCtData(q.norm, remove.IQR=1.5, > > remove.category="Undetermined") > > pdf("severalgraphs.pdf") > > plotCtCor(q.norm, main="Ct correlation") > > plotCtDensity(q.norm) > > plotCtDensity(iqr.filt) > > plotCtBoxes(q.norm) > > plotCtBoxes(iqr.filt) > > plotCtScatter(q.norm, cards=c(1,2), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(1,3), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(2,3), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(4,5), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(1,6), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(2,6), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(3,6), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(4,6), col="type", diag=TRUE) > > plotCtScatter(q.norm, cards=c(5,6), col="type", diag=TRUE) > > plotCtPairs(q.norm, col="type", diag=TRUE) > > dev.off() > > > > > > > > design<-model.matrix(~0+files$Treatment) > > > > colnames(design)<-c("CT1","CT2","CT3") > > > > > > > > > > contrasts<-makeContrasts(CT1-CT2, levels=design) > > qDE.limma<-limmaCtData(iqr.filt, design=design, contrasts=contrasts, > > ndups=1, spacing=1) > > qDE.limma > > dim(qDE.limma) > > names(qDE.limma) > > pdf("testingandheatmap.pdf") > > plotCtRQ(qDE.limma, p.val=0.05, transform="log10", col="#9E0142") > > g<-featureNames(iqr.filt) > > plotCtHeatmap(iqr.filt, gene.names=g, dist="euclidean", cexRow=0.4) > > cluster.list<-clusterCt(iqr.filt, type="genes", dist="euclidean", > > n.cluster=8, hang=-1,cex=0.5) > > dev.off() > > > > sessionInfo() > > R version 2.11.1 (2010-05-31) > > i386-pc-mingw32 > > > > locale: > > [1] LC_COLLATE=Portuguese_Portugal.1252 > LC_CTYPE=Portuguese_Portugal.1252 > > > > [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C > > > > [5] LC_TIME=Portuguese_Portugal.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] statmod_1.4.6 HTqPCR_1.2.0 limma_3.4.4 > > RColorBrewer_1.0-2 > > [5] Biobase_2.8.0 > > > > loaded via a namespace (and not attached): > > [1] affy_1.26.1 affyio_1.16.0 gdata_2.8.0 > > [4] gplots_2.8.0 gtools_2.6.2 preprocessCore_1.10.0 > > [7] tools_2.11.1 > > > > > > > > > > -- > > -------------------------------------------- > > Andreia J. Amaral > > Unidade de Imunologia Clínica > > Instituto de Medicina Molecular > > Universidade de Lisboa > > email: andreiaamaral@fm.ul.pt > > andreia.fonseca@gmail.com > > > > > -- -------------------------------------------- Andreia J. Amaral Unidade de Imunologia Clínica Instituto de Medicina Molecular Universidade de Lisboa email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6