Reading and normalizing two-color Agilent microarrays
2
0
Entering edit mode
dr ▴ 10
@dr-9473
Last seen 20 months ago
United States

Hi,

 

I have a mouse cell-line time-course gene expression data measured on two-color Agilent microarrays.

At time = 0, RNA was purified from 4 cell-lines. Following that, all cell lines were treated with a drug, and then RNA was purified over 10 time points. The reference channel for each cell line (Cy5 - red) is the time = 0 respective sample.

 

I'm trying to use limma for analyzing the data.

 

I've followed these steps:

1. Created a targets data.frame (targets.df), in which a FileName column species the microarray file path, a Cy5 column specifies the time = 0 sample names , a Cy3 column specifies all other time > 0 sample names, and a Date column which specifies the scan date.

2. I then read the microarrays using this command:

array.list <- limma::read.maimages(targets.df, source = "agilent.median")

3. I then do the background correction using this command:

bg.corrected.array.list <- limma::backgroundCorrect(array.list, method = "normexp", offset = 16)

4. I then do within array normalization using this command:

within.normalized.array.list <- limma::normalizeWithinArrays(bg.corrected.array.list, method = "loess")

5.  I then do between array normalization using this command:

between.normalized.array.list <- limma::normalizeBetweenArrays(within.normalized.array.list, method = "Aquantile")

6. I then read my GEO format annotations for the microarray (using simple read.table), for adding the genes annotation data.frame to the between.normalized.array.list. During this step I filter out the control probes (annotated as CONTROL_TYPE = TRUE in my annotations file).

7. I then collapse the probes to genes using WGCNA::collapseRows. For this I set the rownames of between.normalized.array.list$M and between.normalized.array.list$A to between.normalized.array.list$genes$ID (probe ID), and run:

collapsed.probes.list <- WGCNA::collapseRows(datET = between.normalized.array.list$M, rowGroup = between.normalized.array.list$genes$gene_id, rowID = between.normalized.array.list$genes$ID)

 

6. I then override between.normalized.array.list$Mbetween.normalized.array.list$A, and between.normalized.array.list$genes, with the collapsed.probes.list output :

  between.normalized.array.list$M <- between.normalized.array.list$M[which(collapsed.probes.list$selectedRow),]
  between.normalized.array.list$A <- between.normalized.array.list$A[which(collapsed.probes.list$selectedRow),]
  between.normalized.array.list$genes <- between.normalized.array.list$genes[which(collapsed.probes.list$selectedRow),] %>%
    dplyr::select(gene_id, symbol)))
  rownames(between.normalized.array.list$M) <- NULL
  rownames(between.normalized.array.list$A) <- NULL

 

 

I then looked at a specific gene of interest - plotting its between.normalized.array.list$M values vs. time and it looks the opposite from what I expected - the expression goes up with time rather than down.

Is because I swapped the dyes in my microarrays and the expected reference channel should be Cy3 rather than Cy5?

 

As far as I understand this cannot be corrected by limma::read.maimages. Any idea how to deal with this?

 

 

Thanks a lot,

 

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /sw/R/R-3.4.3-install/lib64/R/lib/libRblas.so
LAPACK: /sw/R/R-3.4.3-install/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.5

loaded via a namespace (and not attached):
  [1] bitops_1.0-6               matrixStats_0.51.0
  [3] bit64_0.9-7                RColorBrewer_1.1-2
  [5] doParallel_1.0.11          httr_1.3.1
  [7] prabclus_2.2-6             GenomeInfoDb_1.13.4
  [9] dynamicTreeCut_1.63-1      backports_1.1.1
 [11] tools_3.4.3                R6_2.2.2
 [13] rpart_4.1-13               Hmisc_4.0-3
 [15] DBI_0.7                    lazyeval_0.2.1
 [17] BiocGenerics_0.23.4        colorspace_1.2-7
 [19] trimcluster_0.1-2          nnet_7.3-12
 [21] tidyselect_0.2.4           gridExtra_2.3
 [23] parsingUtils_0.1.0         preprocessCore_1.40.0
 [25] bit_1.1-12                 compiler_3.4.3
 [27] WGCNA_1.51                 Biobase_2.38.0
 [29] htmlTable_1.9              DelayedArray_0.4.1
 [31] plotly_4.7.1.9000          rtracklayer_1.37.3
 [33] checkmate_1.8.5            diptest_0.75-7
 [35] scales_0.5.0.9000          DEoptimR_1.0-8
 [37] mvtnorm_1.0-6              robustbase_0.92-8
 [39] stringr_1.3.0              digest_0.6.15
 [41] Rsamtools_1.26.1           foreign_0.8-67
 [43] XVector_0.17.0             base64enc_0.1-3
 [45] pkgconfig_2.0.1            htmltools_0.3.6
 [47] limma_3.34.9               htmlwidgets_1.2
 [49] rlang_0.2.1.9000           impute_1.52.0
 [51] RSQLite_2.0                VGAM_1.0-5
 [53] bindr_0.1.1                jsonlite_1.5
 [55] mclust_5.4                 BiocParallel_1.10.1
 [57] acepack_1.4.1              dendextend_1.5.2
 [59] analysisUtils_0.0.0.9000   RCurl_1.95-4.8
 [61] magrittr_1.5               modeltools_0.2-21
 [63] GO.db_3.5.0                GenomeInfoDbData_0.99.1
 [65] Formula_1.2-2              coneproj_1.12
 [67] Matrix_1.2-12              Rcpp_0.12.18
 [69] munsell_0.4.3              S4Vectors_0.15.5
 [71] viridis_0.5.1              stringi_1.2.2
 [73] whisker_0.3-2              yaml_2.1.19
 [75] MASS_7.3-50                SummarizedExperiment_1.8.0
 [77] zlibbioc_1.20.0            flexmix_2.3-13
 [79] plyr_1.8.4                 grid_3.4.3
 [81] blob_1.1.0                 parallel_3.4.3
 [83] lattice_0.20-34            Biostrings_2.45.3
 [85] splines_3.4.3              knitr_1.17
 [87] pillar_1.1.0               fastcluster_1.1.22
 [89] GenomicRanges_1.29.15      fpc_2.1-10
 [91] codetools_0.2-15           stats4_3.4.3
 [93] XML_3.98-1.9               glue_1.2.0
 [95] latticeExtra_0.6-28        data.table_1.11.4
 [97] foreach_1.4.4              gtable_0.2.0
 [99] purrr_0.2.4                tidyr_0.8.1
[101] kernlab_0.9-25             assertthat_0.2.0
[103] ggplot2_3.0.0.9000         class_7.3-14
[105] survival_2.40-1            viridisLite_0.3.0
[107] tibble_1.4.2               iterators_1.0.10
[109] GenomicAlignments_1.10.0   AnnotationDbi_1.40.0
[111] memoise_1.1.0              IRanges_2.11.19
[113] cluster_2.0.6

 

 

limma agilent microarrays • 1.6k views
ADD COMMENT
4
Entering edit mode
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia

limma defines M-values to be Cy5-Cy3 so, naturally, if you put your reference samples on Cy5 instead of Cy3 then all the M-values will be minus what they would have been had you used Cy3 for the reference samples.

I don't see any reason why that should be a problem. Why does it cause you a problem that the M-values decrease over time for some genes instead of going up? You'll still get the same results in the end. If it does bother you, then just multiple the M matrix by -1.

By the way, WGCNA::collapseRows is not a good way to collapse two-color data down to genes because it's designed for single-channel data. A much better way is:

x <- between.normalized.array.list
Amean <- rowMeans(x$A)
o <- order(Amean, decreasing=TRUE)
x <- x[o,]
dup <- duplicated(x$genes$gene_id)
x <- x[!dup]

 

ADD COMMENT
1
Entering edit mode
@peter-langfelder-4469
Last seen 6 weeks ago
United States

Regarding WGCNA::collapseRows for two-color data, Gordon is correct that the default setting (method = "MaxMean") is a poor choice for two-color data, or, for that matter, any data that represent deviation from a reference sample. Using argument method="Average" would be a better choice, but Gordon's solution is best (basically does what collapseRows would do on single-channel data - pick the probe with the maximum mean in the A channel).

ADD COMMENT

Login before adding your answer.

Traffic: 727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6