Human Gene array data analysis workflow

Javier Pérez Florido (28 April 2011)

Dear list,

A possible data analysis workflow for EXON arrays could be as follows (extracted from "Exon Array data analysis using Affymetrix Power Tools and R statistical software", Briefings in Bioinformatics):

* Normalization and summarization (at exon or gene level) of the array set.
* Quality control of the summarization results (to remove possible outliers).
* Specific filtering steps, for example:
  o Restrict the analysis to core probesets.
  o Filter out undetected probesets (i.e., undetected exons) using DABG (Detected Above Background) analysis.
  o Filter out cross-hybridizing probesets (exons).
  o Filter out genes that are undetected in all groups.

I'm running a gene-level analysis on Human Gene 1.0 ST (not Exon) arrays, which are, in principle, designed for gene expression profiling, that is, for gene-level analysis. My question concerns the filtering step. Once normalization and summarization have been run at the transcript level (core), giving 33297 transcripts, can the following filtering be applied before differential expression analysis?

* Remove control transcripts such as other_spike, AFFX, pos_control (normgene->exon) and neg_control (normgene->intron). This step removes around 4156 transcripts.
* Remove transcripts with very low variability using the varFilter function (genefilter package).

These were the steps recommended in the "Bioconductor Case Studies" book for 3'IVT arrays (where the controls are different). Can these two filtering steps also be used on Human Gene arrays for gene-level analysis, or do I have to run the filtering steps described above for Exon arrays instead?

Thanks,
Javier

P.S. If you know of any data analysis workflow document for HuGene arrays, please let me know.
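Both filtering steps in the question operate on the summarized transcript-by-sample expression matrix. The following is a minimal Python sketch, not the genefilter API: control transcripts are dropped by a category label, then transcripts whose interquartile range (IQR) is not above the median IQR are removed. As far as we know, this is roughly what varFilter does with its defaults (IQR as the variability measure, cutoff at the 0.5 quantile), but treat that as an assumption. The category labels, transcript IDs and values are hypothetical.

```python
import math

# Hypothetical category labels mirroring the control types named in the post;
# real HuGene annotation files use their own category strings.
CONTROL_CATEGORIES = {"other_spike", "AFFX", "pos_control", "neg_control"}


def quantile7(values, p):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def iqr(values):
    """Interquartile range of one transcript's expression values."""
    return quantile7(values, 0.75) - quantile7(values, 0.25)


def remove_controls(transcripts):
    """Drop transcripts whose category marks them as controls.

    transcripts: dict mapping transcript ID -> (category, list of log2 values)
    """
    return {tid: rec for tid, rec in transcripts.items()
            if rec[0] not in CONTROL_CATEGORIES}


def variance_filter(transcripts, cutoff_quantile=0.5):
    """Keep transcripts whose IQR is above the given quantile of all IQRs."""
    spreads = {tid: iqr(vals) for tid, (_, vals) in transcripts.items()}
    threshold = quantile7(list(spreads.values()), cutoff_quantile)
    return {tid: transcripts[tid] for tid, s in spreads.items() if s > threshold}


# Toy log2 expression values on four arrays (made up for illustration).
data = {
    "T1": ("main", [5.0, 5.1, 9.0, 9.2]),
    "T2": ("main", [7.0, 7.0, 7.1, 7.0]),
    "T3": ("AFFX", [12.0, 3.0, 12.0, 3.0]),
    "T4": ("main", [4.0, 6.0, 8.0, 10.0]),
    "T5": ("main", [6.0, 6.2, 6.1, 6.3]),
}

filtered = variance_filter(remove_controls(data))
print(sorted(filtered))  # ['T1', 'T4']
```

Removing controls first matters: otherwise a control with large spread (T3 here) would both survive the variance filter and shift the IQR threshold.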


cstrato (Christian Stratowa, Austria)

Dear Javier,

In principle, every workflow for Exon arrays can also be applied to Gene arrays.

One more note: in principle you could use the package "xps" for all these steps:

- rma(.., exonlevel="core") will use only the core genes, not the AFFX or control genes.
- PreFilter(mad=c(0.5,0.01)) etc. will eliminate all transcripts with low variability.

For more details see, e.g., the example script "xps/examples/script4exon.R", which shows the workflow for HuExon and HuGene arrays.

Best regards,
Christian
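The MAD-based prefilter mentioned above can be illustrated with a minimal Python sketch. It is not the xps implementation: we assume the first value in mad=c(0.5,0.01) acts as a lower bound on each transcript's median absolute deviation (MAD), and we ignore the second value, whose exact role in xps we do not reproduce here. Check the xps documentation for the real semantics. The transcript IDs and values are hypothetical.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation; scale=1.4826 mirrors R's stats::mad default."""
    center = statistics.median(values)
    return scale * statistics.median(abs(v - center) for v in values)


def mad_filter(expr, cutoff=0.5):
    """Keep transcripts whose MAD across samples exceeds `cutoff`.

    Assumed semantics only; xps's PreFilter may define the cutoff differently.
    """
    return {tid: vals for tid, vals in expr.items() if mad(vals) > cutoff}


# Hypothetical log2 expression values for two transcripts on four arrays.
expr = {
    "TC_A": [5.0, 5.1, 9.0, 9.2],  # varies between arrays -> kept
    "TC_B": [7.0, 7.0, 7.1, 7.0],  # nearly constant -> removed
}

print(sorted(mad_filter(expr)))  # ['TC_A']
```

MAD is more robust to a single outlying array than the variance, which is one reason it is a common choice for this kind of nonspecific filter.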


Javier Pérez Florido

Thanks Christian,

But are the other filtering steps (the ones applied to 3'IVT arrays) also correct for Gene arrays?

Thanks,
Javier


cstrato (Christian Stratowa)

I am not sure which steps you mean, but in principle, yes: after preprocessing you have a data frame of gene expression levels, and it does not matter which chip was used.

Christian

