Entering edit mode
Breuk102
▴
50
@breuk102-836
Last seen 10.2 years ago
Dear Limma users,
In our lab we print our own microarray slides. We print both dedicated
and
whole genome arrays. With dedicated arrays there is the possibility to
find
large subsets of data which is up or down regulated, rendering it
difficult
to normalize this data with loess or printtip loess. Recently we have
a
slide containing tags for plant as well as pathogen together with
spikes and
other control spots. It now happens that if you have a time course
experiment with the pathogen on the plants you get a very large subset
of
genes that seem to be upregulated however this is is not true. This is
because in our control (not infected plants) there is almost no RNA
present
of our pathogen. In time this pathogen grows so relatively more
pathogen RNA
is produced. To follow these gene products trough time I needed a way
to
normalize on a subset of genes (pathogen) only. If have written a
"little"
function and extension for the normalization function that can be used
in
limma to render it possible to normalize on a subset of genes only.
The
first function is called flags()
RG <- flags(RG, "regex") where regex is eg. "genesX|genesY|genesZ"
The
regex uses the RG$genes$Name column and searches for names similar to
the
ones in your regex list. e.g conrol1, control2 can be regex as "con"
After this I added to the normalizeWithinArrays() function an extra
option:
MA<-normalizeWithinArrays(RG, method="subsetloess")
Basically the method flags gives all selected genes a 1 and the others
a 0
than it uses this as weights in the normalization (loess). If you woul
like
to also use the spotweights themselves than you can multiply the
RG$flags
column with the weights column. After normalization you'll see that
only
your subset of genes have been used to calculate the loess fit. In our
case
this works very well. I have validated my data against other
experiments in
which we compensate for the mRNA level difference (which is time
consuming)
and the results are similar. I fit the data lmFit only on the genes of
interest and voila great results.
If you would like to know more or see some scatterplots I'm happy to
e-mail
them.
I'm well aware that other possibilities for normalization exists
however
this function is simple and fast (at least for us) and therefore it
might be
usefull for others too.
Yours,
Bas van Breukelen
=== functions ===
normalizeWithinArrays
// the whole function is long... only the last part containing the
addition
is shown here.
}, robustspline = {
if (is.null(layout))
stop("Layout argument not specified")
for (j in 1:narrays) object$M[, j] <-
normalizeRobustSpline(object$M[,
j], object$A[, j], layout, df = df, method = robust)
}, subsetloess = {
controlspots = TRUE
for (j in 1:narrays){
y <- object$M[,j]
x <- object$A[,j]
w <- object$flags
object$M[,j] <- loessFit(y,x,w, span=span, iterations
=
iterations)$residuals
}
})
object
}
flags <- function (object, regex)
{
object$flags <- regexpr(regex, as.vector(object$genes[,6]))
rows <- NROW(object$flags)
for (j in 1:rows){
if (object$flags[j] < 0){
object$flags[j] <- 0
}else{
object$flags[j] <- 1
}
}
return(object)
}
===========
=================================
Dr. Ir. B. van Breukelen
PostDoc, Bioinformatics, Molecular genetics
Dept. of Biology.
Room N407: H.R. Kruytgebouw
Padualaan 8
3574 CH Utrecht
Tel: +31(0)30 253 3355
Mobile: +31(0)6 24 996046
e-mail: b.vanbreukelen@bio.uu.nl
website: http://genomics.bio.uu.nl/