Question

negative correlation from limma's duplicateCorrelation

0

Entering edit mode

Ian Sudbery ▴ 30

@ian-sudbery-5757

Last seen 10.6 years ago

Hi all, I might be misunderstanding something here, but I think I'm having trouble with Limma's duplicateCorrelation function. I am analysing a set of custom spotted two color microarrays. Each comparison contains two arrays. Each array contains two copies of each spot. The design table for each comparison looks something like this: >targets SlideNumber FileName Cy3 Cy5 Date Name yeast_wt_v_dd_rep1 823 s823_bwp17y+ddy.txt wty ddy 17/12/2012 yeast_wt_v_dd_rep1 yeast_wt_v_dd_rep2 828 s828_wty+ddy.txt wty ddy 18/01/2013 yeast_wt_v_dd_rep2 After background correction and normalisation, I run duplicateCorrelation before fitting the model: >corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) When I look at the results, I find that the consensus correlation is negative: > corfit$consensus.correlation [1] -0.4163954 Surely this is wrong? Examining the correlation for duplicate spots on each slide manually suggests a good correlation, particularly if one ignores flagged spots: > good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1 > cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][good _spots[,1]]) [1] 0.8769604 > cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][good _spots[,2]]) [1] 0.858891 Even without removing the flagged spots, the correlation is still positive: > cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1]) [1] 0.2669379 > cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2]) [1] 0.1843577 I know that this isn't the way that duplicateCorrelation calculates rho, but one would expect that a good correlation in this way might indicate at least a positive correlation from duplicateCorrelation? Or am I interpreting the consensus correlation incorrectly? Here is the rest of my analysis script (I've left out some diagnostic plot generation): >targets <- readTargets(row.names="Name") >RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50)) >spottypes <- readSpotTypes() >RG$genes$Status <- controlStatus(spottypes,RG) >anno <- read.delim("anno.txt", comment.char="!", header = T) >RG_background <- backgroundCorrect(RG,method = "normexp", offset =10) > >#flag out spots where intentity isn't much above background in both channels >RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0 >MA <- normalizeWithinArrays(RG_background) >MA_norm <- MA[MA$genes$Status == "cDNA",] >MA_norm$genes$Status <- NULL >corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) >fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324) >fit<- eBayes(fit) >sesssionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] dichromat_2.0-0 gplots_2.11.0 MASS_7.3-18 KernSmooth_2.23-7 caTools_1.14 gdata_2.12.0 gtools_2.7.0 [8] reshape2_1.2.2 ggplot2_0.9.3 statmod_1.4.16 limma_3.14.4 loaded via a namespace (and not attached): [1] bitops_1.0-5 colorspace_1.2-1 digest_0.6.2 gtable_0.1.2 labeling_0.1 munsell_0.4 plyr_1.8 [8] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15.1 Any advice appreciated Ian [[alternative HTML version deleted]]

• 1.5k views

ADD COMMENT • link updated 12.1 years ago by Gordon Smyth 52k • written 12.1 years ago by Ian Sudbery ▴ 30

score 0 · Answer 1 · 2013-03-13

Dear Ian, The help page for duplicateCorrelation() says "For this function to return statistically useful results, there must be at least two more arrays than the number of coefficients to be estimated, i.e., two more than the column rank of design." You have just two arrays, only one more than the number of columns in the design matrix. So there is not enough meaningful information from which to estimate the duplicate correlation. The correlations that you are computing using cor() are not comparable to the quantity estimated by duplicateCorrelation. You are computing correlations between pairs of spots for the same gene across different genes. This correlation arises from the fact that different genes have different expression levels. You would get a positive value for this correlation even if there were no spot duplicates. The duplicateCorrelation() function on the other hand computes a correlation across technical replicates for the same gene. These two correlations are not related. Best wishes Gordon > Date: Mon, 11 Mar 2013 18:27:25 +0000 > From: Ian Sudbery <ian.sudbery at="" dpag.ox.ac.uk=""> > To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] negative correlation from limma's duplicateCorrelation > > Hi all, > > I might be misunderstanding something here, but I think I'm having > trouble with Limma's duplicateCorrelation function. > > I am analysing a set of custom spotted two color microarrays. Each > comparison contains two arrays. Each array contains two copies of each > spot. The design table for each comparison looks something like this: > >> targets > SlideNumber FileName Cy3 Cy5 Date Name > yeast_wt_v_dd_rep1 823 s823_bwp17y+ddy.txt wty ddy 17/12/2012 yeast_wt_v_dd_rep1 > yeast_wt_v_dd_rep2 828 s828_wty+ddy.txt wty ddy 18/01/2013 yeast_wt_v_dd_rep2 > > > After background correction and normalisation, I run > duplicateCorrelation before fitting the model: > >> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) > > When I look at the results, I find that the consensus correlation is > negative: > >> corfit$consensus.correlation > [1] -0.4163954 > > Surely this is wrong? Examining the correlation for duplicate spots on > each slide manually suggests a good correlation, particularly if one > ignores flagged spots: > >> good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1 >> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][goo d_spots[,1]]) > [1] 0.8769604 > >> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][goo d_spots[,2]]) > [1] 0.858891 > > Even without removing the flagged spots, the correlation is still positive: >> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1]) > [1] 0.2669379 > >> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2]) > [1] 0.1843577 > > I know that this isn't the way that duplicateCorrelation calculates rho, > but one would expect that a good correlation in this way might indicate > at least a positive correlation from duplicateCorrelation? Or am I > interpreting the consensus correlation incorrectly? > > Here is the rest of my analysis script (I've left out some diagnostic plot generation): > >> targets <- readTargets(row.names="Name") >> RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50)) >> spottypes <- readSpotTypes() >> RG$genes$Status <- controlStatus(spottypes,RG) >> anno <- read.delim("anno.txt", comment.char="!", header = T) >> RG_background <- backgroundCorrect(RG,method = "normexp", offset =10) >> >> #flag out spots where intentity isn't much above background in both channels >> RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0 >> MA <- normalizeWithinArrays(RG_background) >> MA_norm <- MA[MA$genes$Status == "cDNA",] >> MA_norm$genes$Status <- NULL >> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) >> fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324) >> fit<- eBayes(fit) > >> sesssionInfo() > R version 2.15.1 (2012-06-22) > > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] grid stats graphics grDevices utils datasets methods base > > other attached packages: > [1] dichromat_2.0-0 gplots_2.11.0 MASS_7.3-18 KernSmooth_2.23-7 caTools_1.14 gdata_2.12.0 gtools_2.7.0 > [8] reshape2_1.2.2 ggplot2_0.9.3 statmod_1.4.16 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 colorspace_1.2-1 digest_0.6.2 gtable_0.1.2 labeling_0.1 munsell_0.4 plyr_1.8 > [8] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15.1 > > Any advice appreciated > > Ian > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}