Hi,
I have a mouse cell-line time-course gene expression data measured on two-color Agilent microarrays.
At time = 0, RNA was purified from 4 cell-lines. Following that, all cell lines were treated with a drug, and then RNA was purified over 10 time points. The reference channel for each cell line (Cy5 - red) is the time = 0 respective sample.
I'm trying to use limma
for analyzing the data.
I've followed these steps:
1. Created a targets data.frame (targets.df)
, in which a FileName
column species the microarray file path, a Cy5
column specifies the time = 0 sample names , a Cy3
column specifies all other time > 0 sample names, and a Date
column which specifies the scan date.
2. I then read the microarrays using this command:
array.list <- limma::read.maimages(targets.df, source = "agilent.median")
3. I then do the background correction using this command:
bg.corrected.array.list <- limma::backgroundCorrect(array.list, method = "normexp", offset = 16)
4. I then do within array normalization using this command:
within.normalized.array.list <- limma::normalizeWithinArrays(bg.corrected.array.list, method = "loess")
5. I then do between array normalization using this command:
between.normalized.array.list <- limma::normalizeBetweenArrays(within.normalized.array.list, method = "Aquantile")
6. I then read my GEO
format annotations for the microarray (using simple read.table
), for adding the genes
annotation data.frame
to the between.normalized.array.list
. During this step I filter out the control probes (annotated as CONTROL_TYPE = TRUE
in my annotations file).
7. I then collapse the probes to genes using WGCNA::collapseRows
. For this I set the rownames
of between.normalized.array.list$M
and between.normalized.array.list$A
to between.normalized.array.list$genes$ID
(probe ID), and run:
collapsed.probes.list <- WGCNA::collapseRows(datET = between.normalized.array.list$M, rowGroup = between.normalized.array.list$genes$gene_id, rowID = between.normalized.array.list$genes$ID)
6. I then override between.normalized.array.list$M
, between.normalized.array.list$A
, and between.normalized.array.list$genes
, with the collapsed.probes.list
output :
between.normalized.array.list$M <- between.normalized.array.list$M[which(collapsed.probes.list$selectedRow),]
between.normalized.array.list$A <- between.normalized.array.list$A[which(collapsed.probes.list$selectedRow),]
between.normalized.array.list$genes <- between.normalized.array.list$genes[which(collapsed.probes.list$selectedRow),] %>%
dplyr::select(gene_id, symbol)))
rownames(between.normalized.array.list$M) <- NULL
rownames(between.normalized.array.list$A) <- NULL
I then looked at a specific gene of interest - plotting its between.normalized.array.list$M
values vs. time
and it looks the opposite from what I expected - the expression goes up with time rather than down.
Is because I swapped the dyes in my microarrays and the expected reference channel should be Cy3 rather than Cy5?
As far as I understand this cannot be corrected by limma::read.maimages.
Any idea how to deal with this?
Thanks a lot,
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /sw/R/R-3.4.3-install/lib64/R/lib/libRblas.so
LAPACK: /sw/R/R-3.4.3-install/lib64/R/lib/libRlapack.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.5
loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.51.0
[3] bit64_0.9-7 RColorBrewer_1.1-2
[5] doParallel_1.0.11 httr_1.3.1
[7] prabclus_2.2-6 GenomeInfoDb_1.13.4
[9] dynamicTreeCut_1.63-1 backports_1.1.1
[11] tools_3.4.3 R6_2.2.2
[13] rpart_4.1-13 Hmisc_4.0-3
[15] DBI_0.7 lazyeval_0.2.1
[17] BiocGenerics_0.23.4 colorspace_1.2-7
[19] trimcluster_0.1-2 nnet_7.3-12
[21] tidyselect_0.2.4 gridExtra_2.3
[23] parsingUtils_0.1.0 preprocessCore_1.40.0
[25] bit_1.1-12 compiler_3.4.3
[27] WGCNA_1.51 Biobase_2.38.0
[29] htmlTable_1.9 DelayedArray_0.4.1
[31] plotly_4.7.1.9000 rtracklayer_1.37.3
[33] checkmate_1.8.5 diptest_0.75-7
[35] scales_0.5.0.9000 DEoptimR_1.0-8
[37] mvtnorm_1.0-6 robustbase_0.92-8
[39] stringr_1.3.0 digest_0.6.15
[41] Rsamtools_1.26.1 foreign_0.8-67
[43] XVector_0.17.0 base64enc_0.1-3
[45] pkgconfig_2.0.1 htmltools_0.3.6
[47] limma_3.34.9 htmlwidgets_1.2
[49] rlang_0.2.1.9000 impute_1.52.0
[51] RSQLite_2.0 VGAM_1.0-5
[53] bindr_0.1.1 jsonlite_1.5
[55] mclust_5.4 BiocParallel_1.10.1
[57] acepack_1.4.1 dendextend_1.5.2
[59] analysisUtils_0.0.0.9000 RCurl_1.95-4.8
[61] magrittr_1.5 modeltools_0.2-21
[63] GO.db_3.5.0 GenomeInfoDbData_0.99.1
[65] Formula_1.2-2 coneproj_1.12
[67] Matrix_1.2-12 Rcpp_0.12.18
[69] munsell_0.4.3 S4Vectors_0.15.5
[71] viridis_0.5.1 stringi_1.2.2
[73] whisker_0.3-2 yaml_2.1.19
[75] MASS_7.3-50 SummarizedExperiment_1.8.0
[77] zlibbioc_1.20.0 flexmix_2.3-13
[79] plyr_1.8.4 grid_3.4.3
[81] blob_1.1.0 parallel_3.4.3
[83] lattice_0.20-34 Biostrings_2.45.3
[85] splines_3.4.3 knitr_1.17
[87] pillar_1.1.0 fastcluster_1.1.22
[89] GenomicRanges_1.29.15 fpc_2.1-10
[91] codetools_0.2-15 stats4_3.4.3
[93] XML_3.98-1.9 glue_1.2.0
[95] latticeExtra_0.6-28 data.table_1.11.4
[97] foreach_1.4.4 gtable_0.2.0
[99] purrr_0.2.4 tidyr_0.8.1
[101] kernlab_0.9-25 assertthat_0.2.0
[103] ggplot2_3.0.0.9000 class_7.3-14
[105] survival_2.40-1 viridisLite_0.3.0
[107] tibble_1.4.2 iterators_1.0.10
[109] GenomicAlignments_1.10.0 AnnotationDbi_1.40.0
[111] memoise_1.1.0 IRanges_2.11.19
[113] cluster_2.0.6