Maxim
Hi,
Thank you, that solved most of my problems. But of course this is where
new problems arise.
Anyhow, this aspect works nicely, thanks for pointing me in the right
direction.
Best regards
Maxim
2010/3/31 James MacDonald <jmacdon@med.umich.edu>
>
>
> Maxim wrote:
>
>> Hi,
>>
>> I have a question concerning the analysis of some affymetrix chips. I
>> downloaded some of the data from GEO GSE11324 (see below). In doing so
>> I'm stuck after I identified the probesets with significant changes. I
>> have problems in assigning probeset specific gene names as well as
>> getting the genomic coordinates. Furthermore I have no clue how to deal
>> with the fact, that most genes have different probesets with
>> differential transcriptional outcomes.
>>
>>
>> I did this based on the affy and limma manuals like:
>>
>> targets file:
>> Name FileName Target
>> 0h1 GSM286031.CEL control
>> 0h2 GSM286032.CEL control
>> 0h3 GSM286033.CEL control
>> 3h1 GSM286034.CEL three
>> 3h2 GSM286035.CEL three
>> 3h3 GSM286036.CEL three
>> 6h1 GSM286037.CEL six
>> 6h2 GSM286038.CEL six
>> 6h3 GSM286039.CEL six
>>
>>
>> library(affy)
>> library(limma)
>> library(vsn)
>>
>> pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE,
>>                               row.names = 2)
>> pData(pd)
>> #### load
>> a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd,
>>               verbose = TRUE)
>> #### normalize
>> x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn",
>>               normalize.param = list(subsample = 1000),
>>               pmcorrect.method = "pmonly",
>>               summary.method = "medianpolish")
>> ### genes with highest variation
>> library(hgu133plus2.db)
>> rsd <- apply(exprs(x), 1, sd)
>> sel <- order(rsd, decreasing = TRUE)[1:250]
>>
>>
>> u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL)
>> heatmap(exprs(x)[sel, ], labRow=u)
>> ### in this case it works to extract the gene symbol
>>
>>
>> ### limma contrasts
>> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3)))
>> colnames(design) <- c("control", "three", "six")
>> fit <- lmFit(x, design)
>> meanSdPlot(x)
>> contrast.matrix <- makeContrasts(three-control, six-control,
>> levels=design)
>> fit2 <- contrasts.fit(fit, contrast.matrix)
>> fit2 <- eBayes(fit2)
>> #### top list
>> topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M")
>> library(hgu133plus2.db)
>> u<-mget(row.names(fit2),hgu133plus2SYMBOL)
>>
>> How can I produce a topTable result with according gene names, somehow
>> I do not understand the genelist argument?
>>
>
> The genelist argument expects you to supply a vector of gene names
> (which is what it says in the help page for this function).
>
> genelist <- unlist(mget(featureNames(x), hgu133plus2SYMBOL))
>
> topTable(fit2, coef = 1, number = 20, sort.by = "M",
>          genelist = genelist)
>
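A self-contained sketch of the genelist idea above, using simulated data in place of the CEL files. The probe names and symbols are hypothetical placeholders; with the real data they would come from hgu133plus2SYMBOL as shown in the reply. It assumes only that limma is installed.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 100 hypothetical probesets x 9 arrays
expr <- matrix(rnorm(900), nrow = 100,
               dimnames = list(paste0("probe", 1:100), NULL))
# Hypothetical symbols; two probesets share each symbol
symbols <- paste0("GENE", rep(1:50, each = 2))

group  <- factor(rep(c("control", "three", "six"), each = 3),
                 levels = c("control", "three", "six"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit  <- lmFit(expr, design)
cm   <- makeContrasts(three - control, six - control, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# genelist must be in the same row order as the rows of the fit object
tt <- topTable(fit2, coef = 1, number = 20, genelist = symbols)
head(tt)
```

The key point is the ordering: the vector passed as genelist is matched to the fit by position, not by name, so it should be built from the same feature names that went into lmFit().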
>
>
>> Next, I would like to produce a standard clustering of the "fold
>> changes" observed within (averaged) contrasts 1 (three - control) and 2
>> (six - control) and a heatmap presentation of the results. How to
>> extract for example all fold-changes of those genes with a
>> p-value<0.001 in at least one of the two contrasts?
>>
>
> rslt <- decideTests(fit2, p.value = 0.001)
> ind <- apply(rslt, 1, function(x) any(x != 0))
> fit2$coef[ind,]
>
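A toy illustration of the selection step above, in base R only. The two matrices are hypothetical stand-ins: `calls` plays the role of the decideTests() result (entries -1, 0 or 1) and `coefs` the role of fit2$coef. The clustering at the end is one possible way to group the selected rows, not a recommendation from the thread.

```r
set.seed(2)
contrasts <- c("three - control", "six - control")
probes    <- paste0("probe", 1:10)

# Hypothetical stand-in for fit2$coef (log fold changes)
coefs <- matrix(rnorm(20), nrow = 10, dimnames = list(probes, contrasts))

# Hypothetical stand-in for decideTests(): 0 = not significant
calls <- matrix(0, nrow = 10, ncol = 2, dimnames = list(probes, contrasts))
calls[c(2, 5), 1] <- 1
calls[7, 2] <- -1

# Keep rows that are significant in at least one contrast
keep <- apply(calls, 1, function(x) any(x != 0))
sel  <- coefs[keep, , drop = FALSE]

# Hierarchical clustering of the selected fold changes
hc <- hclust(dist(sel))
rownames(sel)[hc$order]
```

With real data, the same `sel` matrix could be handed to heatmap(), as in the earlier heatmap call on exprs(x).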
>
>
>> The coordinates of the genes I seem to get with
>> v<-mget(row.names(fit2),hgu133plus2CHRLOC)
>> v<-mget(row.names(fit2),hgu133plus2CHRLOCEND)
>> But again I do not know, how to implement it into my fit2 object or
>> topTable results. Furthermore there are many probesets with multiple
>> mappings, should these not be excluded from the analysis?
>>
>
> That's up to you.
>
>
>
>> Actually, in the end I'd like to get the corresponding genes'
>> coordinates in a way, that the maximum region size is given, eg in case
>> of genes with multiple TSSs.
>>
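One way the "maximum region" could be computed is sketched below in base R. The two lists are hypothetical stand-ins for the results of mget() on hgu133plus2CHRLOC and hgu133plus2CHRLOCEND: one named numeric vector per probeset, with names giving the chromosome and negative values marking the antisense strand. Restricting to the first listed chromosome is an assumption made here for simplicity, as is the helper name max_region().

```r
# Hypothetical stand-ins for the CHRLOC / CHRLOCEND lookups
starts <- list(probeA = c("1" = 1000, "1" = 1500),
               probeB = c("7" = -20000),
               probeC = NA)
ends   <- list(probeA = c("1" = 5000, "1" = 5200),
               probeB = c("7" = -26000),
               probeC = NA)

# Widest span per probeset: smallest start to largest end
max_region <- function(id) {
  s <- starts[[id]]
  e <- ends[[id]]
  if (all(is.na(s))) {
    return(data.frame(probe = id, chr = NA_character_,
                      start = NA_real_, end = NA_real_,
                      strand = NA_character_, stringsAsFactors = FALSE))
  }
  chr    <- names(s)[1]          # assumption: first listed chromosome only
  on_chr <- names(s) == chr
  data.frame(probe = id, chr = chr,
             start  = min(abs(s[on_chr])),
             end    = max(abs(e[on_chr])),
             strand = if (s[on_chr][1] < 0) "-" else "+",
             stringsAsFactors = FALSE)
}

regions <- do.call(rbind, lapply(names(starts), max_region))
regions
```

Probesets that map to more than one chromosome are not handled specially here; whether to drop them is the judgment call discussed above.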
>> As mentioned above, I do not know how to deal with the fact, that many
>> genes are represented with multiple probesets, often with different
>> fold changes - is there a general recipe to deal with this question?
>> Furthermore there are many probesets with multiple mappings, should
>> these not be excluded from the analysis?
>>
>
> I believe the BioC case studies cover some of these points. One method
> you might consider is to select the reporters with the largest value for
> whatever test statistic you are using. See the findLargest() function in
> the genefilter package.
>
> Another alternative is to use the MBNI remapped cdfs, which essentially
> removes these redundancies.
>
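The "largest test statistic per gene" idea can be illustrated in base R with toy values. This is a sketch in the spirit of that suggestion, not the implementation of findLargest(); the probe names, gene names and statistics are all hypothetical.

```r
# Hypothetical moderated t-statistics, one per probeset
stat <- c(p1 = 2.1, p2 = -5.3, p3 = 0.4, p4 = 3.0, p5 = -1.2)
# Hypothetical probeset-to-gene mapping; p5 has no annotation
gene <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B", p4 = "GENE_B", p5 = NA)

# Drop unannotated probesets, then group probe IDs by gene
mapped   <- !is.na(gene)
by_gene  <- split(names(stat)[mapped], gene[mapped])

# For each gene keep the probeset with the largest absolute statistic
best <- sapply(by_gene, function(ids) ids[which.max(abs(stat[ids]))])
best
```

Using the absolute value keeps strongly down-regulated probesets in play; whether that is appropriate depends on the statistic being used.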
> Best,
>
> Jim
>
>
>
>> I know it's a lot of questions, so is there a general source of
>> information, that may help me to overcome the hurdles?
>>
>> Maxim
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-5646
> 734-936-8662