Manhong Dai
Dear Bioconductor users,
Since version 11, the MBNI (Molecular & Behavioral Neuroscience
Institute, University of Michigan) Custom CDF repository has been
available through Bioconductor. Bioconductor users can therefore use
biocLite() to install those R packages, while the traditional
Download/R CMD INSTALL {} route still works.
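
As a minimal sketch of the biocLite() route, the snippet below wraps
the install in a small helper. The package name used here
(hgu133plus2hsentrezgcdf, an Entrez Gene based CDF for the HG-U133
Plus 2.0 chip) and the helper install_custom_cdf() are assumptions
chosen for illustration; substitute the package that matches your
chip.

```r
# Assumed example package name; replace it with the custom CDF
# package for your own chip and annotation source.
cdf_pkg <- "hgu133plus2hsentrezgcdf"

# Hypothetical helper: loads the biocLite() installer script and
# installs one package from the Bioconductor repositories.
install_custom_cdf <- function(pkg) {
  source("http://bioconductor.org/biocLite.R")
  biocLite(pkg)
}

# Install only in an interactive session, and only if the package
# is not already available.
if (interactive() && !requireNamespace(cdf_pkg, quietly = TRUE)) {
  install_custom_cdf(cdf_pkg)
}
```

The guard keeps the snippet from touching the network when it is
sourced from a script, which seems the safer default for a sketch.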
The custom CDF is designed to make the GeneChip probe set definitions
consistent with the latest versions of the genome and transcriptome
databases. Several systematic analyses show that the updated probe set
definitions can lead to 30%-50% differences in the list of
differentially expressed genes, and to more pronounced changes in gene
set-based analysis approaches. We currently support most Affymetrix
GeneChips and generate probe set definitions based on several major
gene and transcript definitions for each species.
The following is a list of commonly used custom CDFs and their pros
and cons:
1. Entrez Gene-based CDF: most widely used, excellent for gene-based
target definitions. One probe set for each unique gene in the
corresponding database.
2. RefSeq-based CDF: most stable. Good for transcript-based analysis.
The shortcoming is that probe sets representing different transcripts
from the same gene may be identical or highly similar, due to the lack
of transcript-specific probes on the GeneChip.
3. UniGene-based CDF: used to be the preferred choice if the goal is
to represent as many genes as possible. We recommend Entrez Gene for
species that have a similar gene-based probe set count, which now
include human and mouse.
4. ENSEMBL/VEGA Gene/Transcript/Exon: for researchers who prefer the
ENSEMBL/VEGA system. VEGA is supposed to be expert curated, thus its
gene/transcript/exon definitions are conceivably more accurate. The
exon-based probe sets can be used to detect some alternative splicing
events even in GeneChips not designed for splicing analysis.
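
To illustrate how a choice from the list above might translate into a
package name, here is a rough sketch. It assumes the packages are
named by concatenating chip, species code, and an annotation source
code, followed by "cdf". The short codes in source_codes and the
helper custom_cdf_names() are assumptions for illustration, so check
them against the packages actually listed in the repository.

```r
# Assumed short codes for each annotation source (not taken from the
# announcement itself).
source_codes <- c(
  entrez             = "entrezg",
  refseq             = "refseq",
  unigene            = "ug",
  ensembl_gene       = "ensg",
  ensembl_transcript = "enst",
  ensembl_exon       = "ense"
)

# Hypothetical helper: returns the CDF name and the package name for
# a given chip, species code, and annotation source.
custom_cdf_names <- function(chip, species, db) {
  if (!db %in% names(source_codes)) {
    stop("unknown annotation source: ", db)
  }
  base <- tolower(paste0(chip, species, source_codes[[db]]))
  list(cdfname = base, package = paste0(base, "cdf"))
}

nm <- custom_cdf_names("hgu133plus2", "hs", "entrez")

# With the affy package and the CDF package installed, the CDF name
# could then be passed when reading CEL files, for example:
#   library(affy)
#   abatch <- ReadAffy(cdfname = nm$cdfname)
```

Switching from a gene-based to a transcript-based or exon-based
analysis would then only require changing the db argument, provided
the corresponding package exists for the chip.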
We want to thank James MacDonald, Marc Carlson and Patrick Aboyoun for
helping us to set up custom CDFs on the Bioconductor system. We also
want to thank the many users whose suggestions are essential for the
continuous improvement of custom CDFs. Our work is supported by the
Pritzker Neuropsychiatric Disorders Research Consortium and the
National Center for Integrated Biomedical Informatics.
Best,
Manhong Dai
Molecular & Behavioral Neuroscience Institute
University of Michigan