stuck with affy/ limma

0

Entering edit mode

Maxim ▴ 170

@maxim-3843

Last seen 10.3 years ago

Hi, thank you, that solved most my problems. But of course this is where new problems arise. Anyhow, this aspect works nicely, thanks for pointing me at the right direction. Best regards Maxim 2010/3/31 James MacDonald <jmacdon@med.umich.edu> > > > Maxim wrote: > >> Hi, >> >> I have a question concerning the analysis of some affymetrix chips. I >> downloaded some of the data from GEO GSE11324 (see below). In doing so I'm >> stuck after I identified the probesets with significant changes. I have >> problems in assigning probeset specific gene names as well as getting the >> genomic coordinates. Furthermore I have no clue how to deal with the fact, >> that most genes have different probesets with differential transcriptional >> outcomes. >> >> >> I did this based on the affy and limma manuals like: >> >> targets file: >> Name FileName Target >> 0h1 GSM286031.CEL control >> 0h2 GSM286032.CEL control >> 0h3 GSM286033.CEL control >> 3h1 GSM286034.CEL three >> 3h2 GSM286035.CEL three >> 3h3 GSM286036.CEL three >> 6h1 GSM286037.CEL six >> 6h2 GSM286038.CEL six >> 6h3 GSM286039.CEL six >> >> >> library(affy) >> library(limma) >> library(vsn) >> >> pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names >> = >> 2) >> pData(pd) >> #### load >> a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose = >> TRUE) >> #### normalize >> x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn", >> normalize.param >> = list(subsample = 1000), pmcorrect.method = "pmonly", summary.method = >> "medianpolish") >> ### genes with highest variation >> library(hgu133plus2.db) >> rsd <- apply(exprs(x), 1, sd) >> sel <- order(rsd, decreasing = TRUE)[1:250] >> >> >> u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL) >> heatmap(exprs(x)[sel, ], labRow=u) >> ### in this case it works to extract the gene symbol >> >> >> ### limma contrasts >> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3))) >> colnames(design) <- c("control", "three", "six") >> fit <- lmFit(x, design) >> meanSdPlot(x) >> contrast.matrix <- makeContrasts(three-control, six-control, >> levels=design) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) >> #### top list >> topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M") >> library(hgu133plus2.db) >> u<-mget(row.names(fit2),hgu133plus2SYMBOL) >> >> How can I produce a topTable result with according gene names, somehow I >> do >> not understand the genelist argument? >> > > The genelist argument expects you to supply a vector of gene names (which > is what it says in the help page for this function). > > genelist <- unlist(mget(featureNames(x), hgu133plus2SYMBOL)) > > topTable(fit2, coef = 1, number = 20, sort.by = "M", genelist = genelist) > > > >> Next, I would like to produce a standard clustering of the "fold changes" >> observed within (averaged) contrasts 1 (three - control) and 2 (six - >> control) and a heatmap presentation of the results. How to extract for >> example all fold-changes of those genes with a p-value<0.001 in at least >> one >> of the two contrasts? >> > > rslt <- decideTests(fit2, p.value = 0.001) > ind <- apply(rslt, 1, function(x) any(x != 0)) > fit2$coef[ind,] > > > >> The coordinates of the genes I seem to get with >> v<-mget(row.names(fit2),hgu133plus2CHRLOC) >> v<-mget(row.names(fit2),hgu133plus2CHRLOCEND) >> But again I do not know, how to implement it into my fit2 object or >> topTable >> results. Furthermore there are many probesets with multiple mappings, >> should >> these not be excluded from the analysis? >> > > That's up to you. > > > >> Actually, in the end I'd like to get the corresponding genes' coordinates >> in >> a way, that the maximum region size is given, eg in case of genes with >> multiple TSSs. >> >> As mentioned above, I do not know how to deal with the fact, that many >> genes >> are represented with mutliple probesets, often with different fold changes >> - >> is there a general recipe to deal with this question? Furthermore there >> are >> many probesets with multiple mappings, should these not be excluded from >> the >> analysis? >> > > I believe the BioC case studies cover some of these points. One method you > might consider is to select the reporters with the largest value for > whatever test statistic you are using. See the findLargest() function in the > genefilter package. > > Another alternative is to use the MBNI remapped cdfs, which essentially > removes these redundancies. > > Best, > > Jim > > > >> I know it's a lot of questions, so is there a general source of >> information, >> that may help me to overcome the hurdles? >> >> Maxim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > James W. MacDonald, M.S. > Biostatistician > Hildebrandt Lab > 8220D MSRB III > 1150 W. Medical Center Drive > Ann Arbor MI 48109-5646 > 734-936-8662 > > ********************************************************** > Electronic Mail is not secure, may not be read every day, and should not be > used for urgent or sensitive issues > [[alternative HTML version deleted]]

Clustering genefilter affy limma Clustering genefilter affy limma • 958 views

ADD COMMENT • link 14.8 years ago Maxim ▴ 170

Login before adding your answer.