Question

filtering agilent microarray data using limma in R

1

Entering edit mode

maximilian.collatz ▴ 10

@maximiliancollatz-11373

Last seen 8.5 years ago

For the comparison of fold changes between a microarray and an RNA-seq experiment i need one fold change per gene from the microarray data. I'm working with agilent microarray data using the limma package. Question one is, is the normalisation afffected by the filtering and hence has to be done before? Question two, how to filter the data. The data is read and stored using the following command:

raw <- read.maimages(targets, source = 'agilent',green.only = TRUE, other.columns = c('gPValFeatEqBG'))

The raw object then looks like this:

> str(raw)
Formal class 'EListRaw' [package "limma"] with 1 slot
  ..@ .Data:List of 6
  .. ..$ : num [1:42303, 1:26] 49018 1472 1471 1440 1405 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ...
  .. ..$ : num [1:42303, 1:26] 25 26 26 26 26 26 26 26 26 27 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ...
  .. ..$ :'data.frame':	26 obs. of  3 variables:
  .. .. ..$ SampleNumber: int [1:26] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..$ FileName    : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1.txt" "GSM1150976_48h_LDLDL_CT1_1.txt" "GSM1150977_48h_LDLDL_CT3_1.txt" "GSM1150965_48h_LDLDL_CT5_1.txt" ...
  .. .. ..$ Condition   : chr [1:26] "CT0" "CT1" "CT3" "CT5" ...
  .. ..$ :'data.frame':	42303 obs. of  10 variables:
  .. .. ..$ Row           : int [1:42303] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Col           : int [1:42303] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..$ Start         : int [1:42303] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ Sequence      : chr [1:42303] "" "" "" "" ...
  .. .. ..$ ProbeUID      : int [1:42303] 0 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ ControlType   : int [1:42303] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ ProbeName     : chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ...
  .. .. ..$ GeneName      : chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ...
  .. .. ..$ SystematicName: chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ...
  .. .. ..$ Description   : chr [1:42303] "" "" "" "" ...
  .. ..$ : chr "agilent"
  .. ..$ :List of 1
  .. .. ..$ gPValFeatEqBG: num [1:42303, 1:26] 0.00 6.52e-03 5.08e-02 1.76e-04 1.32e-08 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ...

How can i filter this object to only keep the entry for each gene with the best pValue?(keep only the best probe)

Greetings Max

microarray limma • 1.3k views

ADD COMMENT • link updated 8.5 years ago by Gordon Smyth 52k • written 8.5 years ago by maximilian.collatz ▴ 10

0

Entering edit mode

Maybe you need to explain more of the design and your strategy. e.g., is this single or dual channel data? What kind of normalization do you need to do? Loess? And why do you need to filter?

If you want fold changes, just compare the two groups with each other. If you want to use limma, these (log2) FC will apear in the toptables.

ADD REPLY • link 8.5 years ago b.nota ▴ 370

score 0 · Answer 1 · 2016-08-30

Hi Max,

First off, the customized show function for an EListRaw object would be much easier to read:

show(raw)

You should normalize the data before you start filter any probes, except possibly you might remove control probes. For example you might use:

x <- backgroundCorrect(raw, method="normexp")
y <- normalizeBetweenArrays(x, method="quantile")

Then you might have a look at your data:

plotMD(y, column=1, status=y$genes$ControlType)

You can't choose the probe with the best p-value for each gene until you compute a p-value. At the moment, you have read a detection p-value for each observation, but this gives a series of detection p-values for each probe instead of one summary p-value. What were you hoping to do?