I have mouse cell-line time-course gene expression data measured on two-color Agilent microarrays.
At time = 0, RNA was purified from 4 cell lines. Following that, all cell lines were treated with a drug, and RNA was then purified at 10 time points. The reference channel for each cell line (Cy5 - red) is that cell line's time = 0 sample.
I'm trying to use limma for analyzing the data.
I've followed these steps:
1. Created a targets data.frame (targets.df), in which a FileName column specifies the microarray file path, a Cy5 column specifies the time = 0 sample names, a Cy3 column specifies all other time > 0 sample names, and a Date column specifies the scan date.
2. I then read the microarrays using this command:
array.list <- limma::read.maimages(targets.df, source = "agilent.median")
3. I then do the background correction using this command:
bg.corrected.array.list <- limma::backgroundCorrect(array.list, method = "normexp", offset = 16)
4. I then do within array normalization using this command:
within.normalized.array.list <- limma::normalizeWithinArrays(bg.corrected.array.list, method = "loess")
5. I then do between array normalization using this command:
between.normalized.array.list <- limma::normalizeBetweenArrays(within.normalized.array.list, method = "Aquantile")
6. I then read my GEO format annotations for the microarray (using a simple read.table) and add them as the genes annotation data.frame of between.normalized.array.list. During this step I filter out the control probes (annotated as CONTROL_TYPE = TRUE in my annotations file).
7. I then collapse the probes to genes using WGCNA::collapseRows. For this I set the rownames of between.normalized.array.list$M and between.normalized.array.list$A to between.normalized.array.list$genes$ID (probe ID), and run:
collapsed.probes.list <- WGCNA::collapseRows(datET = between.normalized.array.list$M, rowGroup = between.normalized.array.list$genes$gene_id, rowID = between.normalized.array.list$genes$ID)
8. I then overwrite between.normalized.array.list$M, between.normalized.array.list$A, and between.normalized.array.list$genes with the rows selected in the collapsed.probes.list output:
between.normalized.array.list$M <- between.normalized.array.list$M[which(collapsed.probes.list$selectedRow),]
between.normalized.array.list$A <- between.normalized.array.list$A[which(collapsed.probes.list$selectedRow),]
between.normalized.array.list$genes <- between.normalized.array.list$genes[which(collapsed.probes.list$selectedRow),] %>%
  dplyr::select(gene_id, symbol)
rownames(between.normalized.array.list$M) <- NULL
rownames(between.normalized.array.list$A) <- NULL
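To make steps 7 and 8 concrete, here is a minimal base-R sketch on made-up data. It does not call WGCNA: the selected.row vector is a hypothetical stand-in for collapsed.probes.list$selectedRow, built by keeping the probe with the highest mean per gene, and all probe IDs, gene IDs, and values are invented for illustration.

```r
# Toy M-value matrix: 3 probes (rows) x 2 arrays (columns); values are made up.
M <- matrix(c(1.0, 1.2,
              2.0, 2.4,
              0.5, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("p1", "p2", "p3"), c("array1", "array2")))

# Toy annotation: probes p1 and p2 map to the same gene.
genes <- data.frame(ID      = c("p1", "p2", "p3"),
                    gene_id = c("g1", "g1", "g2"),
                    symbol  = c("GeneA", "GeneA", "GeneB"),
                    stringsAsFactors = FALSE)

# Hypothetical stand-in for collapsed.probes.list$selectedRow:
# TRUE for the probe with the highest mean M within each gene.
row.means <- rowMeans(M)
selected.row <- ave(row.means, genes$gene_id,
                    FUN = function(x) x == max(x)) == 1

# Subset the matrix and the annotation with the same logical vector so they
# stay aligned; drop = FALSE keeps a matrix even if only one row survives.
M.collapsed <- M[selected.row, , drop = FALSE]
genes.collapsed <- genes[selected.row, c("gene_id", "symbol")]
```

Using one logical vector for M, A, and genes is what keeps the rows in register after collapsing.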
I then looked at a specific gene of interest, plotting its between.normalized.array.list$M values vs. time, and it looks the opposite of what I expected: the expression goes up with time rather than down.
Is this because I swapped the dyes in my microarrays, and the expected reference channel should be Cy3 rather than Cy5?
As far as I understand this cannot be corrected by limma::read.maimages.
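A minimal sketch of the suspected sign issue, using made-up intensities for a single gene. It relies on limma's convention that M = log2(R) - log2(G), with R the Cy5 (red) channel and G the Cy3 (green) channel. Whether negating M is the right fix for the real data is an assumption to be confirmed, not an established answer.

```r
# Toy intensities (invented): the time-0 reference is in Cy5 (red),
# the treated samples are in Cy3 (green), and expression falls over time.
R <- c(1000, 1000, 1000)  # Cy5 = time-0 reference
G <- c(800, 400, 200)     # Cy3 = treated samples at three successive time points

# limma's convention: M = log2(R / G), here log2(reference / treated).
M <- log2(R) - log2(G)

# Negating M gives log2(G / R), i.e. log2(treated / reference).
M.flipped <- -M

# M rises over time although the treated expression falls;
# M.flipped falls, matching the expected direction.
print(rbind(M = M, M.flipped = M.flipped))
```

If this is indeed what is happening, the equivalent operation on my data would presumably be negating between.normalized.array.list$M, which leaves the A values unchanged since A is the average of the two log-intensities.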
Any idea how to deal with this?
Thanks a lot,
