Hello,
Is there an easy way to query an annotated ExpressionSet by gene symbol (or a similar identifier) to extract the expression values for a single gene?
I tried to replace the row names of the expression matrix with the gene symbols taken from the annotation file, but because some gene symbols are duplicated, this causes problems.
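One way around the duplicate-row-name problem is to keep the probe IDs as row names and look genes up through a separate symbol-to-rows index, so that a query for a duplicated symbol returns all of its probe sets instead of clashing. A minimal Python sketch, assuming the matrix is a list of rows and the symbols from the annotation file are in the same row order; the function names and the toy data are hypothetical:

```python
from collections import defaultdict


def index_by_symbol(symbols):
    """Map each gene symbol to the list of row indices that carry it."""
    index = defaultdict(list)
    for row, symbol in enumerate(symbols):
        index[symbol].append(row)
    return dict(index)


def query_gene(matrix, probe_ids, symbols, gene):
    """Return {probe_id: values} for every probe row annotated with `gene`.

    A duplicated symbol simply yields several entries; an unknown one yields {}.
    """
    rows = index_by_symbol(symbols).get(gene, [])
    return {probe_ids[r]: matrix[r] for r in rows}


# Hypothetical toy data: 4 probe sets x 3 samples.
probe_ids = ["probe_1", "probe_2", "probe_3", "probe_4"]
symbols = ["GENE_A", "GENE_B", "GENE_A", "GENE_C"]
matrix = [
    [7.1, 7.3, 6.9],
    [5.0, 5.2, 4.8],
    [8.4, 8.1, 8.6],
    [3.3, 3.0, 3.5],
]

print(query_gene(matrix, probe_ids, symbols, "GENE_A"))
```

Keeping the probe IDs as keys means nothing is lost at query time; deciding how to merge several probe sets into one value per gene is a separate step.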
Thanks in advance!
Hey Pawel,
thanks a lot for the answer. I looked into this function and it seems to be exactly what I was looking for.
However, I have two questions regarding the function:
1. The function's documentation states that the collapsing methods "maxMean", "Average", etc. are unreliable for 5 or fewer samples. Why is that?
2. The function output has almost half as many rows as the original expression matrix. Why do so many _at probe sets map to the same gene? Is there any advantage to this?
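To see why the output shrinks, here is a minimal Python sketch of two per-gene collapsing rules, loosely in the spirit of the "maxMean" and "Average" options mentioned above. It is a simplified illustration, not the collapseRows implementation; the function name, method labels, and data are hypothetical. Every group of probe sets sharing a symbol becomes a single row, so a platform with several probe sets per gene ends up with far fewer rows.

```python
from statistics import mean


def collapse_rows(matrix, symbols, method="max_mean"):
    """Collapse probe rows to one row per gene symbol.

    method="max_mean": keep the probe row whose mean across samples is highest.
    method="average":  average the probe rows sample by sample.
    """
    groups = {}
    for row, symbol in zip(matrix, symbols):
        groups.setdefault(symbol, []).append(row)

    collapsed = {}
    for symbol, rows in groups.items():
        if method == "max_mean":
            collapsed[symbol] = max(rows, key=mean)
        elif method == "average":
            # zip(*rows) walks the samples column by column.
            collapsed[symbol] = [mean(values) for values in zip(*rows)]
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


# Hypothetical toy data: 4 probe sets (two for GENE_A) x 3 samples.
symbols = ["GENE_A", "GENE_B", "GENE_A", "GENE_C"]
matrix = [
    [7.1, 7.3, 6.9],
    [5.0, 5.2, 4.8],
    [8.4, 8.1, 8.6],
    [3.3, 3.0, 3.5],
]

by_max = collapse_rows(matrix, symbols, "max_mean")
by_avg = collapse_rows(matrix, symbols, "average")
print(len(matrix), "probe rows ->", len(by_max), "gene rows")  # 4 probe rows -> 3 gene rows
```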
Hey bi_Scholar,
Regarding question 2: usually two or more probe sets target different regions of the same gene transcript.
Nevertheless, to help you more efficiently, I would like to know which array platform you are using and how many samples and sample groups (phenotypes) you have.
Cheers,
Pawel
Hey Pawel,
thanks for taking time to answer my questions, I really appreciate it!
I'm working with an Affymetrix dataset downloaded from GEO, on the GPL96 platform (HG-U133A).
The samples are grouped by disease state (healthy/infected), with 5 samples each.
I want to analyze gene coexpression within a given group and therefore need to be able to query the expression values by gene symbol.
So far, your proposed method has worked perfectly; I was just curious about the points above.
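Once the matrix has one row per gene symbol, coexpression within a group reduces to correlating two genes over that group's samples only. A minimal Python sketch using Pearson correlation, assuming the collapsed data is a dict from symbol to per-sample values and that a list of group labels is given in sample order; the names and the toy data are hypothetical, and Pearson is just one common choice of coexpression measure:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression(expr, groups, group, gene1, gene2):
    """Correlate two genes using only the samples labelled `group`."""
    cols = [i for i, g in enumerate(groups) if g == group]
    x = [expr[gene1][i] for i in cols]
    y = [expr[gene2][i] for i in cols]
    return pearson(x, y)


# Hypothetical toy data: 10 samples, 5 healthy followed by 5 infected.
groups = ["healthy"] * 5 + ["infected"] * 5
expr = {
    "GENE_A": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "GENE_B": [3, 5, 7, 9, 11, 5, 4, 3, 2, 1],
}

print(round(coexpression(expr, groups, "healthy", "GENE_A", "GENE_B"), 3))   # 1.0
print(round(coexpression(expr, groups, "infected", "GENE_A", "GENE_B"), 3))  # -1.0
```

With only five samples per group, a single outlying sample can swing such a coefficient considerably, so results are worth checking against the raw values.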
Cheers!
bi_Scholar,
Since the collapseRows function has some kind of restriction for small sample groups, I can propose an even better solution.
Normalize your data from the raw .CEL files using a custom annotation package (I recommend the Brainarray package). Here is an explanation of why:
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-48
Link to the custom CDFs (use the latest version, v.20):
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
In most cases, after normalisation and annotation of probe sets to gene symbols, you will get a matrix (or data frame) without duplicated gene names (there might, however, be a few duplicates; I used to remove them by hand).
All of the above removes the duplicates (as you wished) and yields more reliable expression values than the collapseRows function.
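For the few duplicates that may remain, a small helper can list them for inspection before you remove rows by hand. A minimal Python sketch, assuming the normalized matrix and its gene-symbol row labels are available as parallel lists; which row to drop stays a manual decision, and the function names and data below are hypothetical:

```python
from collections import Counter


def duplicated_symbols(symbols):
    """Gene symbols that label more than one row, with their row counts."""
    counts = Counter(symbols)
    return {s: n for s, n in counts.items() if n > 1}


def drop_rows(matrix, symbols, rows_to_drop):
    """Return (matrix, symbols) without the rows at the given indices."""
    drop = set(rows_to_drop)
    kept = [i for i in range(len(symbols)) if i not in drop]
    return [matrix[i] for i in kept], [symbols[i] for i in kept]


# Hypothetical toy data: 4 rows x 2 samples, GENE_B appears twice.
symbols = ["GENE_A", "GENE_B", "GENE_C", "GENE_B"]
matrix = [[6.2, 6.0], [4.1, 4.4], [9.0, 8.7], [2.2, 2.5]]

print(duplicated_symbols(symbols))  # {'GENE_B': 2}
# After inspecting the duplicates, drop row 3 (chosen by hand here).
matrix, symbols = drop_rows(matrix, symbols, [3])
print(duplicated_symbols(symbols))  # {}
```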
Cheers,
Pawel