For the comparison of fold changes between a microarray and an RNA-seq experiment i need one fold change per gene from the microarray data. I'm working with agilent microarray data using the limma package. Question one is, is the normalisation afffected by the filtering and hence has to be done before? Question two, how to filter the data. The data is read and stored using the following command:
raw <- read.maimages(targets, source = 'agilent',green.only = TRUE, other.columns = c('gPValFeatEqBG'))
The raw object then looks like this:
> str(raw) Formal class 'EListRaw' [package "limma"] with 1 slot ..@ .Data:List of 6 .. ..$ : num [1:42303, 1:26] 49018 1472 1471 1440 1405 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : NULL .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ... .. ..$ : num [1:42303, 1:26] 25 26 26 26 26 26 26 26 26 27 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : NULL .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ... .. ..$ :'data.frame': 26 obs. of 3 variables: .. .. ..$ SampleNumber: int [1:26] 1 2 3 4 5 6 7 8 9 10 ... .. .. ..$ FileName : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1.txt" "GSM1150976_48h_LDLDL_CT1_1.txt" "GSM1150977_48h_LDLDL_CT3_1.txt" "GSM1150965_48h_LDLDL_CT5_1.txt" ... .. .. ..$ Condition : chr [1:26] "CT0" "CT1" "CT3" "CT5" ... .. ..$ :'data.frame': 42303 obs. of 10 variables: .. .. ..$ Row : int [1:42303] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ Col : int [1:42303] 1 2 3 4 5 6 7 8 9 10 ... .. .. ..$ Start : int [1:42303] 0 0 0 0 0 0 0 0 0 0 ... .. .. ..$ Sequence : chr [1:42303] "" "" "" "" ... .. .. ..$ ProbeUID : int [1:42303] 0 1 1 1 1 1 1 1 1 1 ... .. .. ..$ ControlType : int [1:42303] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ ProbeName : chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ... .. .. ..$ GeneName : chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ... .. .. ..$ SystematicName: chr [1:42303] "GE_BrightCorner" "DarkCorner" "DarkCorner" "DarkCorner" ... .. .. ..$ Description : chr [1:42303] "" "" "" "" ... .. ..$ : chr "agilent" .. ..$ :List of 1 .. .. ..$ gPValFeatEqBG: num [1:42303, 1:26] 0.00 6.52e-03 5.08e-02 1.76e-04 1.32e-08 ... .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. ..$ : NULL .. .. .. .. ..$ : chr [1:26] "GSM1150975_48h_LDLDL_CT0_1" "GSM1150976_48h_LDLDL_CT1_1" "GSM1150977_48h_LDLDL_CT3_1" "GSM1150965_48h_LDLDL_CT5_1" ...
How can i filter this object to only keep the entry for each gene with the best pValue?(keep only the best probe)
Greetings Max
Maybe you need to explain more of the design and your strategy. e.g., is this single or dual channel data? What kind of normalization do you need to do? Loess? And why do you need to filter?
If you want fold changes, just compare the two groups with each other. If you want to use limma, these (log2) FC will apear in the toptables.