Filtering probes without annotation prior to statistical test
@seungwoo-hwang-2520
Dear all,

I am analyzing data from the Affymetrix Human Gene 1.0 ST Array. After inspecting its probe annotation file, I noticed that it contains many probesets without transcript annotation:

Total number of probesets: 33,298
(1) Probesets with annotation: 24,409 (73%)
(2) Control probesets: 4,201 (13%)
(3) Probesets without any annotation: 4,688 (14%)

I am thinking about filtering out probesets (2) and (3) before running statistical tests, in order to reduce the number of probesets that are tested. Doing so would make a substantial difference to the multiple testing correction, compared with testing all probesets (1), (2), and (3) and then removing probesets (2) and (3) from the DEG list.

Is this type of filtering prior to statistical tests valid? Also, has anyone encountered a similar situation (dealing with array data with a lot of non-gene probes)?

Thanks,
Seungwoo

------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center (http://www.kobic.re.kr)
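To make the multiple-testing point concrete, here is a minimal sketch in plain Python of how the number of tests changes a Benjamini-Hochberg (BH) adjustment. The probeset counts are the ones listed above; the p-values are entirely hypothetical (50 small values, everything else set to 0.5), and `bh_adjust` is an illustrative helper written for this post, not a function from any Bioconductor package. It also assumes the filter is based on annotation only and never looks at the expression data.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Probeset counts taken from the post.
N_ANNOTATED = 24409
N_CONTROL = 4201
N_UNANNOTATED = 4688
N_TOTAL = N_ANNOTATED + N_CONTROL + N_UNANNOTATED

# Hypothetical p-values: 50 annotated probesets with small p-values,
# all remaining probesets set to an uninteresting 0.5.
N_SMALL = 50
small = [k * 2e-6 for k in range(1, N_SMALL + 1)]
annotated_p = small + [0.5] * (N_ANNOTATED - N_SMALL)
other_p = [0.5] * (N_CONTROL + N_UNANNOTATED)

ALPHA = 0.05

# Option A: test everything, then look at the annotated probesets.
adj_full = bh_adjust(annotated_p + other_p)[:N_ANNOTATED]
n_sig_full = sum(p < ALPHA for p in adj_full)

# Option B: drop groups (2) and (3) first, then adjust.
adj_filtered = bh_adjust(annotated_p)
n_sig_filtered = sum(p < ALPHA for p in adj_filtered)

print("tests without filtering:", N_TOTAL, "significant:", n_sig_full)
print("tests with filtering:   ", N_ANNOTATED, "significant:", n_sig_filtered)
```

With these made-up p-values, the same 50 probesets land on different sides of the 0.05 cutoff depending only on how many tests enter the correction. Real data will not be this clean, so treat it as an illustration of the mechanism rather than a prediction of the effect size.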
Mark Cowley
@mark-cowley-2951
Hi Seungwoo,

That type of filtering is definitely valid, and I have seen very similar proportions of probesets with no annotation. However, the number of probesets in group (3) changes with each new transcript.csv file (the latest being labelled na26), implying that some of the probesets may have had annotation in a previous version, and some that did have annotations no longer do.

The only caveat with removing (3) is that there may be differentially expressed "genes/somethings" which, with a little effort in the form of aligning probe sequences, could reveal some interesting novel biology.

cheers,
Mark

On 29/07/2008, at 12:37 PM, Seungwoo Hwang wrote:
> Is this type of filtering prior to statistical tests valid?