Get gene expression values across samples for one gene given by its gene symbol
bi_Scholar

Hello,

Is there an easy way to query an annotated ExpressionSet by gene symbol (or similar identifiers) to extract the expression values for a single gene?

I tried to set the row names of the expression matrix to a list of gene symbols obtained from the annotation file, but because some gene symbols are duplicated, this causes problems.

Thanks in advance!
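One way around the duplicate row-name problem is to keep the probe-set IDs as the row keys and build a separate index from each symbol to all of its probe sets. Below is a minimal Python sketch, assuming the matrix is a dict of probe ID to per-sample values and the annotation is a dict of probe ID to symbol; the probe IDs, symbols, and the helpers `build_symbol_index` and `query_gene` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: probe-set ID -> expression values across samples.
expression = {
    "probe1_at": [7.1, 7.3, 6.9],
    "probe2_at": [5.2, 5.0, 5.4],
    "probe3_at": [6.0, 6.2, 5.8],
    "probe4_s_at": [8.1, 8.0, 8.3],
}

# Hypothetical annotation: probe-set ID -> gene symbol.
# GENE_B is targeted by two probe sets, which is what breaks symbol row names.
annotation = {
    "probe1_at": "GENE_A",
    "probe2_at": "GENE_C",
    "probe3_at": "GENE_B",
    "probe4_s_at": "GENE_B",
}


def build_symbol_index(annotation):
    """Map each gene symbol to the list of probe-set IDs annotated with it."""
    index = defaultdict(list)
    for probe, symbol in annotation.items():
        index[symbol].append(probe)
    return dict(index)


def query_gene(symbol, expression, index):
    """Return {probe_id: values} for every probe set annotated with `symbol`."""
    return {probe: expression[probe] for probe in index.get(symbol, [])}


index = build_symbol_index(annotation)
print(query_gene("GENE_B", expression, index))
# {'probe3_at': [6.0, 6.2, 5.8], 'probe4_s_at': [8.1, 8.0, 8.3]}
```

Keeping the probe IDs as keys loses no information; how the several rows of one gene are combined can be decided afterwards.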

Tags: expressionset, bioconductor
polemiraza

Hello bi_Scholar,

I think that collapsing (aggregating) the several probe measurements that correspond to a single gene would be the best solution.

I recommend the collapseRows function in the WGCNA package (it accepts a matrix or a data frame as input). It has a number of useful options for aggregating your data, e.g. collapseRows can pick the probe with the highest mean value or the maximum variance across the samples. It can, of course, also take the average expression value of all probes corresponding to a gene.
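The three collapsing strategies mentioned above can be sketched in a few lines. This is a minimal Python illustration of the idea only, not WGCNA's code: the function `collapse_rows` and its method names `"max_mean"`, `"max_variance"` and `"average"` are hypothetical (loosely mirroring WGCNA's options), and WGCNA's real collapseRows has its own argument names and additional options.

```python
from statistics import mean, pvariance


def collapse_rows(expression, annotation, method="max_mean"):
    """Collapse probe-set rows to one row per gene symbol.

    method (hypothetical names):
      "max_mean"     keep the probe set with the highest mean expression
      "max_variance" keep the probe set with the highest variance across samples
      "average"      average all probe sets of the gene, sample by sample
    """
    groups = {}
    for probe, values in expression.items():
        symbol = annotation.get(probe)
        if not symbol:
            continue  # skip probe sets without a gene symbol
        groups.setdefault(symbol, []).append(values)

    collapsed = {}
    for symbol, rows in groups.items():
        if method == "max_mean":
            collapsed[symbol] = max(rows, key=mean)
        elif method == "max_variance":
            collapsed[symbol] = max(rows, key=pvariance)
        elif method == "average":
            collapsed[symbol] = [mean(column) for column in zip(*rows)]
        else:
            raise ValueError(f"unknown method: {method!r}")
    return collapsed


# Hypothetical toy data: two probe sets for GENE_A, one for GENE_B.
expression = {
    "p1_at": [1.0, 2.0, 3.0],
    "p2_at": [4.0, 4.0, 4.0],
    "p3_at": [0.0, 10.0, 2.0],
}
annotation = {"p1_at": "GENE_A", "p2_at": "GENE_A", "p3_at": "GENE_B"}

print(collapse_rows(expression, annotation, "max_mean"))
# {'GENE_A': [4.0, 4.0, 4.0], 'GENE_B': [0.0, 10.0, 2.0]}
print(collapse_rows(expression, annotation, "average"))
# {'GENE_A': [2.5, 3.0, 3.5], 'GENE_B': [0.0, 10.0, 2.0]}
```

After collapsing, each symbol appears exactly once, so it can safely be used as a row key.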

Cheers,

Pawel


Hey Pawel,

Thanks a lot for the answer. I looked into this function and it seems to be exactly what I was looking for.
However, I have two questions regarding the function:

1. The documentation states that the collapsing methods "MaxMean", "Average", etc. are unreliable for 5 or fewer samples. Why is that?
2. The function output has almost half as many rows as the original expression matrix. Why are there so many _at probe sets mapping to the same gene? Is there any advantage to this?


Hey bi_Scholar,

Regarding question 2: usually two or more probe sets target different regions of the same gene transcript.
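To see how much of the reduction in rows comes from genes with several probe sets, one can simply count probe sets per symbol. A minimal Python sketch on a hypothetical toy annotation (probe IDs and symbols are made up):

```python
from collections import Counter

# Hypothetical toy annotation: probe-set ID -> gene symbol.
annotation = {
    "p1_at": "GENE_A",
    "p2_s_at": "GENE_A",
    "p3_x_at": "GENE_A",
    "p4_at": "GENE_B",
    "p5_at": "GENE_C",
    "p6_s_at": "GENE_C",
}

# Number of probe sets annotated with each symbol.
counts = Counter(annotation.values())
# Symbols targeted by more than one probe set.
multi = {symbol: n for symbol, n in counts.items() if n > 1}

print(f"{len(annotation)} probe sets -> {len(counts)} gene symbols")
# 6 probe sets -> 3 gene symbols
print(multi)
# {'GENE_A': 3, 'GENE_C': 2}
```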

Nevertheless, to help you more effectively, could you tell me which array platform you are using, and how many samples and sample groups (phenotypes) you have?

Cheers,

Pawel


Hey Pawel,

Thanks for taking the time to answer my questions, I really appreciate it!
I'm working with an Affymetrix dataset downloaded from GEO, on the GPL96 platform (HG-U133A).

The samples are grouped by disease state (healthy/infected), with 5 samples each.
I want to analyze the data for gene co-expression within a given group and therefore need to be able to query the expression values by gene symbol.
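Once the matrix has one row per symbol, co-expression of two genes within one group can be estimated by restricting both profiles to that group's samples and correlating them. A minimal Python sketch, assuming Pearson correlation as the co-expression measure (Spearman or other measures are also common) and a hypothetical collapsed matrix with made-up values; with only five samples per group, such an estimate will be very noisy.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # ZeroDivisionError if a profile is constant


def samples_in_group(values, labels, group):
    """Keep only the values whose sample label equals `group`."""
    return [v for v, label in zip(values, labels) if label == group]


# Hypothetical collapsed matrix: gene symbol -> values for 10 samples.
collapsed = {
    "GENE_A": [5.1, 5.4, 5.0, 5.8, 5.3, 7.2, 7.9, 7.5, 8.1, 7.7],
    "GENE_B": [3.2, 3.5, 3.1, 3.9, 3.3, 4.0, 4.4, 4.1, 4.6, 4.3],
}
labels = ["healthy"] * 5 + ["infected"] * 5

a = samples_in_group(collapsed["GENE_A"], labels, "infected")
b = samples_in_group(collapsed["GENE_B"], labels, "infected")
print(round(pearson(a, b), 3))
```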
So far, your proposed method worked perfectly, I was just curious about the things above.

Cheers!


bi_Scholar,

If collapseRows has some kind of restriction for small sample groups, I can propose an even better solution.

Normalize your data from the raw .CEL files using a custom annotation (CDF) package; I recommend the Brainarray ones. Here is an explanation of why:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-48

Link to the custom CDFs (use the latest version, v.20):

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

In most cases, after normalisation and annotation of probe sets to gene symbols, you will get a matrix (or data frame) without duplicated gene names. There might, however, be a few duplicates; I used to remove them by hand.
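For the few duplicates that remain, a simple automated alternative to removing them by hand is to keep the first row per symbol and report which symbols were affected, so they can still be reviewed. A minimal Python sketch; the keep-first rule and the helper `drop_duplicate_symbols` are assumptions for illustration, and inspecting the duplicates by hand lets you choose a better row to keep.

```python
def drop_duplicate_symbols(rows):
    """Keep the first row seen for each gene symbol.

    rows: iterable of (symbol, values) pairs in matrix order.
    Returns (kept, duplicated) where kept maps symbol -> values and
    duplicated is the sorted list of symbols that occurred more than once.
    """
    kept = {}
    duplicated = set()
    for symbol, values in rows:
        if symbol in kept:
            duplicated.add(symbol)  # later rows for this symbol are dropped
        else:
            kept[symbol] = values
    return kept, sorted(duplicated)


# Hypothetical rows of a gene-level matrix after custom-CDF annotation.
rows = [
    ("GENE_A", [6.1, 6.3, 6.0]),
    ("GENE_B", [4.2, 4.0, 4.4]),
    ("GENE_A", [5.9, 6.0, 5.7]),
]
kept, duplicated = drop_duplicate_symbols(rows)
print(duplicated)  # ['GENE_A']
print(kept)  # {'GENE_A': [6.1, 6.3, 6.0], 'GENE_B': [4.2, 4.0, 4.4]}
```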

All of the above will remove the duplicates (as you wished) and give more reliable expression values than the collapseRows function.

Cheers,

Pawel
