First time use of limma for two-color agilent array analysis
1
0
Entering edit mode
@calin-jageman-robert-6431
Last seen 7.7 years ago
I've been working on a two-color analysis of a custom-printed agilent array. I'm examining gene expression in CNS samples from Aplysia californica that have undergone unilateral sensitization training. I've got 8 animals, each with a sample from the trained and untrained side of the CNS. Each pair is hybridized to one two-color array. With the limma user guide and Google, I came up with the following script which seems to give quite reasonable results: library(limma) targets <- readTargets("targets.txt") RG <- read.maimages(targets, source="agilent") RG <- backgroundCorrect(RG, method="normexp", offset=16) MA <- normalizeWithinArrays(RG, method="loess") MA2 <- MA[MA$genes$ControlType==0,] MA2.avg <- avereps(MA2, ID=MA2$genes$GeneName) fit <- lmFit(MA2.avg) fit2 <- eBayes(fit) allgenes <- topTable(fit2,number=nrow(fit2)) write.table(allgenes,file="all_genes.txt") # I know topTable is not the only output to examine; this is just a start. Couple of quick questions for anyone out there who is reading: * I ended up with a much larger list of regulated transcripts than when I conducted a similar analysis in GeneSpring. In GeneSpring I ended up with ~400 transcripts regulated at with a FDR of 0.05. With limma, though, I get nearly 1,000 at FDR = 0.01! Many of these, though, have fairly small fold-changes. In fact, when I filter for at least 1.5x change, the list shrinks dramatically to about 60 transcripts. Is the much larger list I'm initially getting due to the enhanced power of using the bayes approach for estimating sampling error, or have I failed to properly adjust for multiple comparisons in this script? * My qPCR data from the same samples fits the limma analysis MUCH better. Over 15 transcripts where I've done qPCR on the same samples (mix of predicted up/down/stable), the r2 with GeneSpring estimates is about 0.5; with the limma estimates it's 0.83! Is limma really this much better? * I'm not sure if I fully understand the choice of offset in background correction. I've seen no offset, offsets of 16, 30.... Despite reading the documentation on the function, it's not clear to me how users select an offset or if it's actually worth using. Any feedback? Cheers, Bob ======== Robert Calin-Jageman Associate Professor, Psychology Neuroscience Program Director Dominican University rcalinjageman at dom.edu 708.524.6581
qPCR limma GeneSpring qPCR limma GeneSpring • 2.3k views
ADD COMMENT
0
Entering edit mode

Through painful experience I found that the script I quoted is NOT correct (in most cases).  Specifically,

fit <- lmFit(MA2.avg)

should have been:

design <- modelMatrix(targets, ref="Control")
fit <- lmFit(MA2.avg, design)

What's tricky about this bug is that lmFit doesn't complain about not receiving the design array...instead it simply proceeds assuming each dye is 1 condition.  If you have no dye swaps, this will produce the expected results.  But if you have dye swaps..disaster. 

I should have posted a correction long ago, but forgot I had posted this.  Just thought I'd add this comment just in case anyone finds this code via Google.

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States
Hi Bob, On 2/26/2014 10:50 PM, Calin-Jageman, Robert wrote: > I've been working on a two-color analysis of a custom-printed agilent array. I'm examining gene expression in CNS samples from Aplysia californica that have undergone unilateral sensitization training. I've got 8 animals, each with a sample from the trained and untrained side of the CNS. Each pair is hybridized to one two-color array. > > With the limma user guide and Google, I came up with the following script which seems to give quite reasonable results: > > library(limma) > targets <- readTargets("targets.txt") > RG <- read.maimages(targets, source="agilent") > RG <- backgroundCorrect(RG, method="normexp", offset=16) > MA <- normalizeWithinArrays(RG, method="loess") > MA2 <- MA[MA$genes$ControlType==0,] > MA2.avg <- avereps(MA2, ID=MA2$genes$GeneName) > fit <- lmFit(MA2.avg) > fit2 <- eBayes(fit) > allgenes <- topTable(fit2,number=nrow(fit2)) > write.table(allgenes,file="all_genes.txt") > # I know topTable is not the only output to examine; this is just a start. > > Couple of quick questions for anyone out there who is reading: > * I ended up with a much larger list of regulated transcripts than when I conducted a similar analysis in GeneSpring. In GeneSpring I ended up with ~400 transcripts regulated at with a FDR of 0.05. With limma, though, I get nearly 1,000 at FDR = 0.01! Many of these, though, have fairly small fold-changes. In fact, when I filter for at least 1.5x change, the list shrinks dramatically to about 60 transcripts. Is the much larger list I'm initially getting due to the enhanced power of using the bayes approach for estimating sampling error, or have I failed to properly adjust for multiple comparisons in this script? I don't know what GeneSpring uses for their univariate tests, but if they aren't doing something like the eBayes step, their statistics will be underpowered. Note that the default for topTable is adjust = "BH", so you are adjusting for multiplicity. I'll leave it to others to decide if that is the correct way to do so. ;-D You might also consider using treat() rather than a post hoc fold change adjustment. The difference being that currently you are testing against the null hypothesis that the log fold change == 0, and then filtering on fold change (so there is no statistical evidence that the |fold| is actually larger than zero). The treat function incorporates the fold change into the test, so you are directly testing that the |fold| is greater than some value. This is much more conservative, so you will likely not want a 1.5x fold change. > > * My qPCR data from the same samples fits the limma analysis MUCH better. Over 15 transcripts where I've done qPCR on the same samples (mix of predicted up/down/stable), the r2 with GeneSpring estimates is about 0.5; with the limma estimates it's 0.83! Is limma really this much better? Like I said above, I have almost no experience with GeneSpring. But note that the target audience for the two platforms are quite divergent. GeneSpring is primarily intended for people who are willing to pay for a product that makes it easier for them to get their work done. Limma is the product of a lab that uses it for their own research, and while I am sure ease of use comes into play, the overarching goal is to produce software that improves power in a statistically rigorous way. > > * I'm not sure if I fully understand the choice of offset in background correction. I've seen no offset, offsets of 16, 30.... Despite reading the documentation on the function, it's not clear to me how users select an offset or if it's actually worth using. Any feedback? I went to the mailing list page http://www.bioconductor.org/help/mailing-list/ and then searched using backgroundCorrect offset smyth and the first hit I got was https://stat.ethz.ch/pipermail/bioconductor/2011-August/040885.html Best, Jim > > Cheers, > > Bob > > > > ======== > Robert Calin-Jageman > Associate Professor, Psychology > Neuroscience Program Director > Dominican University > rcalinjageman at dom.edu > 708.524.6581 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT

Login before adding your answer.

Traffic: 586 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6