Dear Community,
I would like to raise an "issue" I discovered while importing and pre-processing an Agilent microarray dataset in R. More specifically, part of my code is the following:
...
SDRF <- read.delim("GSE12435.sdrf.txt", check.names=FALSE, stringsAsFactors=FALSE)
files <- list.files(pattern = "GSM")
dat <- read.maimages(files, source="agilent", green.only=TRUE)
y <- backgroundCorrect(dat, method="normexp")
y <- normalizeBetweenArrays(y, method="quantile")
But then, when inspecting the range of values in my normalized dataset, I noticed something strange:
> range(y$E)
[1] -1.54555 18.77689
> mat <- y$E
> length(mat[mat<0])
[1] 39
> mat[mat<0]
[1] -0.797841119 -1.545549855 -0.138353557 -0.797841119 -0.797841119 -0.797841119 -1.123788523
[8] -0.138353557 -1.123788523 -1.545549855 -0.138353557 -0.797841119 -0.138353557 -1.545549855
....(the rest of the negative values)
So, how should I deal with these negative values? Do they have something to do with the background correction? Maybe a naive solution is to add an offset, but of what value? In other words, is there a "general" approach to choosing the offset, so that I do not have to change it for other datasets with similar negative values? I also share a histogram of my normalized expression values:
https://www.dropbox.com/s/hokwmhh1ib2jo8n/histogram.png?dl=0
Any help would be great!
Konstantinos
Dear James,
thank you for your quick answer.
So these values should not pose a problem in downstream analysis? And I should not use an offset anyway?
Well, I think Gordon Smyth's group likes to use an offset of 50, IIRC, for this type of data. The problem with low expressing genes like that is you start to have noise dominating any signal that may be there.
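To make the effect of such an offset concrete, here is a minimal sketch in base R. The intensities are invented for illustration and are not taken from GSE12435; the offset of 50 is simply the value mentioned above. In the pipeline from the original post, the offset would presumably be passed through the `offset` argument of limma's `backgroundCorrect`, as shown in the comment.

```r
# In the original pipeline the offset would be supplied like this
# (not run here, since it needs the raw Agilent files):
#   y <- backgroundCorrect(dat, method = "normexp", offset = 50)

# Invented background-corrected intensities; three of them are below 1,
# which is what produces negative values after a log2 transform.
corrected <- c(0.3, 0.6, 0.9, 5, 40, 300, 2500, 60000)
offset <- 50  # value mentioned in the reply; an assumption, not a rule

log_no_offset   <- log2(corrected)
log_with_offset <- log2(corrected + offset)

# Number of negative log2 values without and with the offset
n_negative_before <- sum(log_no_offset < 0)
n_negative_after  <- sum(log_with_offset < 0)

# Spread of the three lowest probes: the offset shrinks it sharply,
# so differences among near-background probes are damped.
spread_before <- diff(range(log_no_offset[1:3]))
spread_after  <- diff(range(log_with_offset[1:3]))

print(c(before = n_negative_before, after = n_negative_after))
print(c(before = spread_before, after = spread_after))
```

With an offset, every value is at least log2(50) on the log scale, so the negative values disappear and the noisy low-intensity probes are pulled together, while high intensities are almost unchanged.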
You might consider excluding any genes that are consistently low expressing, on the assumption that they aren't really being expressed (genes that are really low in a subset of samples shouldn't be excluded, because those may well be very interesting). But from a statistical standpoint the negative values don't pose any problem at all.
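The filtering idea can be sketched as follows, again in base R with a toy matrix. The probe names, the cutoff of 6 and the minimum of 3 arrays are all hypothetical choices made for this example; in practice the cutoff might be derived from negative control probes and the minimum from the smallest group size.

```r
# Toy log2 expression matrix: 3 probes x 6 arrays (values are invented)
E <- rbind(
  always_low  = c(-0.8,  0.2, -1.1,  0.5, -0.1,  0.3),
  always_high = c(10.2,  9.8, 10.5, 10.1,  9.9, 10.3),
  low_in_some = c( 0.1, -0.5,  0.4,  8.7,  9.1,  8.9)
)

cutoff     <- 6  # hypothetical "expressed" threshold on the log2 scale
min_arrays <- 3  # hypothetical: size of the smallest experimental group

# Keep a probe if it exceeds the cutoff on at least min_arrays arrays.
# A probe that is low everywhere is dropped; a probe that is low in only
# a subset of samples is kept, since it may be differentially expressed.
keep <- rowSums(E > cutoff) >= min_arrays
E_filtered <- E[keep, , drop = FALSE]

print(rownames(E_filtered))
```

Here `always_low` is removed, while `low_in_some` survives because it is clearly expressed in three of the six arrays.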