Question

Strong upwards correlation between log fold change and average log expression in MD plot of Limma

0

Entering edit mode

gil.hornung • 0

@gilhornung-9503

Last seen 5.4 years ago

Hi,

I am trying to analyse my proteomics data (intensity based label free quantification) using Limma.

When I look at the MD plot after limma fit, I see a strong correlation between log-fold-change and average-log-expression. What could cause this odd behaviour and how can I fix it?

Thanks,

Gil

plotmd

The code I used is:

> imat <- read.delim("matrix.tsv", sep = "\t", row.names = 1) %>% 
+   as.matrix()
> head(imat)
          X51      X56      X57      X61      X65      X67      X72      X74      X77      X83      X85      X88
7966  1670300  1057800  1261700  1319300  1273000  1371400   950630  1920300   656870   706280  1157600   860140
7580  2727900  1536000  2675700  2168800  2246800  2530700  7548400  3874900  1761800  2203100  2815500  1867300
5068 27261000 19086000 19027000 16980000 24905000 29381000 27716000 24395000 15292000 16111000 13618000 14989000
4178   124790   372080        0   240420        0        0   492620   495800  2050400   408840  5791100  4327800
1038  1076100   962340  1384800        0  1395300        0   910440        0   714210        0        0   465820
1117        0   154830   159230        0        0        0   221510        0        0   133880   243150   881710
> sample_info <- read.delim("sample_desc.tsv",sep = "\t",stringsAsFactors = T)
> head(sample_info)
  Sample Group
1    X51     A
2    X56     A
3    X57     A
4    X61     A
5    X65     A
6    X67     A
> 
> design <- model.matrix(~0+sample_info$Group)
> colnames(design) <- levels(sample_info$Group)
> 
> limma_fit = lmFit(log2(imat+1), design = design)
> plotMD(limma_fit)

session info:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ecoliLeucine_1.10.0  ecolicdf_2.18.0      affy_1.48.0          Biobase_2.30.0       BiocGenerics_0.16.1  BiocInstaller_1.20.3
 [7] plotly_4.5.6         ggdendro_0.1-20      gridExtra_2.2.1      purrr_0.2.2          ggplot2_2.2.1        dplyr_0.5.0         
[13] tidyr_0.6.0          tibble_1.2           limma_3.26.9         knitr_1.15.1        

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2      amap_0.8-14           htmltools_0.3.5       stats4_3.2.1          viridisLite_0.1.3     yaml_2.1.14          
 [7] base64enc_0.1-3       DBI_0.6               bit64_0.9-5           RColorBrewer_1.1-2    affyio_1.40.0         matrixStats_0.51.0   
[13] plyr_1.8.4            stringr_1.2.0         zlibbioc_1.16.0       munsell_0.4.3         gtable_0.2.0          htmlwidgets_0.8      
[19] memoise_1.0.0         evaluate_0.10         labeling_0.3          IRanges_2.4.8         AnnotationDbi_1.32.3  preprocessCore_1.32.0
[25] highr_0.6             Rcpp_0.12.10          edgeR_3.12.1          scales_0.4.1          S4Vectors_0.8.11      jsonlite_1.0         
[31] bit_1.1-12            digest_0.6.12         stringi_1.1.2         grid_3.2.1            tools_3.2.1           magrittr_1.5         
[37] lazyeval_0.2.0        RSQLite_1.1-2         MASS_7.3-45           data.table_1.10.4     assertthat_0.1        httr_1.2.1           
[43] R6_2.2.0

limma proteomics • 2.1k views

ADD COMMENT • link updated 7.5 years ago by Gordon Smyth 52k • written 7.5 years ago by gil.hornung • 0

score 3 · Answer 1 · 2017-11-09

From ?plotMD, specifically the Details section

     If 'object' is an 'MArrayLM' object, then the plot is an fitted
     model MD-plot in which the estimated coefficient is on the y-axis
     and the average A-value is on the x-axis.

Since you have fit a cell means model (e.g., your coefficients estimate the mean of each group), what you are plotting as the 'logFC' is actually just the mean of the group, rather than the fold change. You need instead to either define a contrasts matrix and use contrasts.fit to compute the contrasts, or use a design with an intercept so your coefficients are already inherently contrasts.

score 2 · Answer 2 · 2017-11-10

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Just to reinforce James' answer, the problem is that you haven't actually computed any log-fold-changes yet. So you're just plotting one mean against another.

Why did you include the "0+" in your model? Presumably you got that from reading a suggested analysis somewhere. If so, then go back to to the analysis and read what the next step was. If you don't already have some advice to read, then try Section 9.2 of the limma User's Guide.

ADD COMMENT • link 7.5 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you Gordon and James for the fast and helpful answers.

I used "1+" in my model and it worked fine.

Al the best ,

Gil

ADD REPLY • link 7.5 years ago gil.hornung • 0