Filtering probes without annotation prior to statistical test


Seungwoo Hwang ▴ 80

@seungwoo-hwang-2520


Dear all,

I am analyzing data from the Affymetrix Human Gene 1.0 ST Array. After inspecting its probe annotation file, I noticed that it contains many probesets without transcript annotation, as follows:

Total number of probesets: 33,298
(1) Probesets with annotation: 24,409 (73%)
(2) Control probesets: 4,201 (13%)
(3) Probesets without any annotation: 4,688 (14%)

I am thinking about filtering out probesets (2) and (3) prior to statistical testing, in order to reduce the total number of probesets that are subject to the tests. Doing so will make a big difference in the multiple testing correction, compared to testing all probesets (1), (2), and (3) and then filtering probesets (2) and (3) out of the DEG list.

Is this type of filtering prior to statistical tests valid? Also, has anyone encountered a similar situation (dealing with array data with a lot of non-gene probes)?

Thanks,
Seungwoo

------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center (http://www.kobic.re.kr)
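To put rough numbers on the multiple testing burden described in the question, here is a minimal sketch that compares the per-test threshold with and without the filter. It uses a Bonferroni-style cutoff purely for illustration; the post does not say which correction is used, and the `alpha` of 0.05 is an assumption.

```python
# Probeset counts as reported in the post.
counts = {"annotated": 24409, "control": 4201, "unannotated": 4688}
total = sum(counts.values())

alpha = 0.05  # assumed family-wise error rate, not stated in the post

# Bonferroni per-test threshold = alpha / number of tests performed.
threshold_all = alpha / total
threshold_filtered = alpha / counts["annotated"]

# How much more lenient the per-test cutoff becomes after filtering.
ratio = threshold_filtered / threshold_all

print(f"total probesets:            {total}")
print(f"threshold, all probesets:   {threshold_all:.3e}")
print(f"threshold, annotated only:  {threshold_filtered:.3e}")
print(f"ratio (filtered / all):     {ratio:.3f}")
```

The ratio is simply 33,298 / 24,409, so dropping groups (2) and (3) relaxes a Bonferroni-type cutoff by a bit more than a third.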

Annotation probe • 1.2k views

updated 16.7 years ago by Mark Cowley ▴ 910 • written 16.7 years ago by Seungwoo Hwang ▴ 80


Mark Cowley ▴ 910

@mark-cowley-2951


Hi Seungwoo,

that type of filtering is definitely valid, and I have seen very similar proportions of probesets with no annotation. However, the number of probesets in group (3) changes with each new transcript.csv file (the latest being labelled na26), implying that some of the probesets may have had annotation in a previous version, and that some that did have annotation no longer do.

The only caveat with removing (3) is that there may be differentially expressed "genes/somethings" among them which, with a little effort in the form of aligning probe sequences, could reveal some interesting novel biology.

cheers,
Mark

On 29/07/2008, at 12:37 PM, Seungwoo Hwang wrote:
> Is this type of filtering prior to statistical tests valid? Also,
> has anyone encountered a similar situation (dealing with array data
> with a lot of non-gene probes).
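The two routes discussed in this thread (adjust on everything and then drop groups (2) and (3), versus filter first and then adjust) can be compared on simulated data. The sketch below is only an illustration under loud assumptions: the p-values are simulated, the 300 "truly changed" probesets and the 0.05 cutoff are made up, and Benjamini-Hochberg is assumed as the correction since the thread does not name one. Only the group sizes come from the original post.

```python
import random


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


random.seed(0)

# Group sizes from the post; everything below is simulated.
labels = ["annotated"] * 24409 + ["control"] * 4201 + ["unannotated"] * 4688
pvals = [random.random() for _ in labels]  # null probesets: uniform p-values

# Hypothetical: the first 300 annotated probesets carry a real signal.
for i in range(300):
    pvals[i] = random.uniform(0.0, 0.0005)

cutoff = 0.05  # assumed FDR level

# Route A: adjust over all probesets, then keep only annotated hits.
adj_all = bh_adjust(pvals)
hits_a = sum(
    1 for lab, q in zip(labels, adj_all) if lab == "annotated" and q < cutoff
)

# Route B: drop groups (2) and (3) first, then adjust.
kept = [p for lab, p in zip(labels, pvals) if lab == "annotated"]
adj_kept = bh_adjust(kept)
hits_b = sum(1 for q in adj_kept if q < cutoff)

print("annotated hits, adjust then filter:", hits_a)
print("annotated hits, filter then adjust:", hits_b)
```

The filter here depends only on annotation status, not on the expression values or the test statistics, so it does not change the p-values of the probesets that remain; it only changes how many tests the correction has to account for.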

16.7 years ago • Mark Cowley ▴ 910
