RNA-seq differentially expressed gene finding methods
Son Pham ▴ 40
@son-pham-6721
Last seen 10.3 years ago
Dear all,

I know that there are very good packages (edgeR, DESeq) that compute the list of differentially expressed genes between two conditions (with replicates) from raw counts. But I do not see what is wrong with the following simple approach, and I wonder whether other people have been using it:

1. Get the (estimated) TPM/FPKM for each gene in each sample.
2. Do a t-test between the two groups on each gene.
3. Adjust the p-values for multiple testing (p-adj).

Thanks,
Son.
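The three steps in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended pipeline. The function names (`welch_t`, `two_sided_p_normal`, `bh_adjust`, `naive_de`) are hypothetical, and only the standard library is used. The post does not say which t-test or which adjustment was meant, so the sketch assumes a Welch t statistic and the Benjamini-Hochberg adjustment. The p-value uses a normal approximation to the t distribution, which is only reasonable for large groups.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch t statistic for two groups of expression values (step 2).

    Assumes each group has at least two values and non-zero variance overall.
    """
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (assumption: large groups)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order (step 3)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def naive_de(tpm_a, tpm_b):
    """tpm_a, tpm_b: dict mapping gene -> list of TPM/FPKM values (step 1)."""
    genes = sorted(tpm_a)
    pvals = [two_sided_p_normal(welch_t(tpm_a[g], tpm_b[g])) for g in genes]
    return dict(zip(genes, bh_adjust(pvals)))
```

Note that every gene is tested in isolation here and the raw counts never enter the calculation.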
Son Pham ▴ 40
@son-pham-6721
Last seen 10.3 years ago
Thank you Richard, Devon and Paul for the very insightful answers.

I completely agree that the approach I raised above is inappropriate when the group size is small (3, 4, ...). But when the group size is large enough (> 20 or 30), the sampling distribution of the mean will be close to normal, and that is why I believe the t-test is OK.

-Son.
Son, of course you are right. Here's an excerpt from our 2010 Genome Biology paper:

Conclusions

Why is it necessary to develop new statistical methodology for sequence count data? If large numbers of replicates were available, questions of data distribution could be avoided by using non-parametric methods, such as rank-based or permutation tests. However, it is desirable (and possible) to consider experiments with smaller numbers of replicates per condition. In order to compare an observed difference with an expected random variation, we can improve our picture of the latter in two ways: first, we can use distribution families, such as normal, Poisson and negative binomial distributions, in order to determine the higher moments, and hence the tail behavior, of statistics for differential expression, based on observed low order moments such as mean and variance. Second, we can share information, for instance, distributional parameters, between genes, based on the notion that data from different genes follow similar patterns of variability. Here, we have described an instance of such an approach, ...

Btw, the t-test can be perfectly "valid" even if the data are non-Normal, in particular when they have fatter tails. The test then just loses power, sometimes badly so. I find it odd that so many people worry about that so much. Correlations between samples (e.g. "batch effects") are much more problematic.

Best wishes
Wolfgang
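The excerpt mentions permutation tests as a distribution-free option when many replicates are available. A minimal sketch of an exact permutation test for a difference in group means is given below; the function name `exact_permutation_p` is hypothetical, and the difference in means is just one possible test statistic chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(x, y):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    hits = 0
    total = 0
    # Enumerate every way of relabelling n_x of the pooled samples as group x.
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total
```

With three samples per group there are only 20 relabellings, so the smallest attainable two-sided p-value is 2/20 = 0.1 even for perfectly separated groups. That is one way to see why such tests need larger numbers of replicates. Full enumeration quickly becomes impractical for larger groups, where random relabellings are typically sampled instead.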
@richard-friedman-6273
Last seen 10.3 years ago
Dear Son,

The t-test assumes a normal distribution, which is appropriate for continuous variables. RNA-seq data deals with counts (discrete entities). A negative binomial distribution (edgeR, DESeq) or a mean-dependent variance (voom) is much more appropriate. Also, the three methods mentioned above estimate variability better, using information from all genes via empirical Bayesian methods, than does the one-gene-at-a-time frequentist t-test.

Best wishes,
Rich

Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
Columbia Department of Systems Biology
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at c2b2.columbia.edu
http://friedman.c2b2.columbia.edu/

"There is nothing in my Contemporary Jewish Literature course that is either contemporary, Jewish, or literature".

-Rose Friedman, age 17
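One way information can be shared across genes is to shrink each gene's sample variance toward a common prior value. The sketch below shows the moderated-variance formula used in limma-style empirical Bayes, as one illustration of the idea. The function name `moderated_variance` is hypothetical, and the prior variance `s0_sq` and prior degrees of freedom `d0` are simply assumed to be given here, whereas in practice they are estimated from all genes. edgeR and DESeq share information on negative binomial dispersions by other means, which are not shown.

```python
def moderated_variance(s_sq, d, s0_sq, d0):
    """Shrink a per-gene variance toward a prior variance.

    s_sq  : sample variance of this gene
    d     : residual degrees of freedom for this gene
    s0_sq : prior variance (assumed given; normally estimated from all genes)
    d0    : prior degrees of freedom (assumed given; larger means more shrinkage)
    """
    return (d0 * s0_sq + d * s_sq) / (d0 + d)


# Hypothetical example: three genes measured with d = 2 degrees of freedom each.
gene_variances = {"geneA": 0.05, "geneB": 1.0, "geneC": 4.0}
shrunk = {
    gene: moderated_variance(s_sq, d=2, s0_sq=1.0, d0=4)
    for gene, s_sq in gene_variances.items()
}
```

With few replicates a per-gene variance is very noisy, so an unusually small one can produce a large t statistic by chance. Shrinking toward the prior pulls such extreme values back toward what is typical across genes.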
Hi Son,

My understanding is that the approach you describe could be considered valid for large enough numbers of samples. However, RNA-seq experiments will typically have smaller numbers (<30) of samples per condition, meaning that a t-test is not valid (because RNA-seq data isn't normally distributed). And while I don't think that a t-test is "invalid" given enough samples, it's very difficult to justify using such a method when much better powered methods have been invented specifically for this type of data.

Paul

--
Dr. Paul Geeleher, PhD
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St.,
KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com
Devon Ryan ▴ 200
@devon-ryan-6054
Last seen 9.0 years ago
Germany
N.B., I forgot to CC the list originally.

Hi Son,

To add a bit to Richard's response, there's also the issue that conversion to FPKM/RPKM/TPM loses precision information. For example, suppose two samples in a group produce values of 1.0 and 1.2 for some gene (these can be any of the aforementioned metrics). It's rarely the case that the number of mapped reads (or even the number aligning to genes) is constant across samples, so it's quite likely that one of those numbers was derived from more data than the other, meaning that we'd like to weight estimates of the group measure toward it. That is impossible with only FPKM/etc. values, since this information has been lost.

Best,
Devon

____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Tel: +49 (0)178 298-6067
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany
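The point about lost precision can be made concrete with a small sketch. The counts and library sizes below are hypothetical, chosen only so that the scaled values come out at the 1.0 and 1.2 used in the post. Gene length is ignored because it is the same for a given gene in every sample, and a Poisson model for the counts is assumed for simplicity.

```python
from math import sqrt


def per_million(count, library_size):
    """Scale a raw read count to reads per million mapped reads."""
    return count / library_size * 1e6


def poisson_relative_se(count):
    """Approximate relative standard error of a Poisson count: 1/sqrt(count)."""
    return 1.0 / sqrt(count)


# Hypothetical samples: same gene, very different sequencing depths.
samples = [
    {"count": 10, "library_size": 10_000_000},  # scaled value about 1.0
    {"count": 60, "library_size": 50_000_000},  # scaled value about 1.2
]

scaled = [per_million(s["count"], s["library_size"]) for s in samples]
rel_se = [poisson_relative_se(s["count"]) for s in samples]

# A plain mean of the scaled values treats both samples as equally precise.
unweighted = sum(scaled) / len(scaled)

# Pooling the raw counts weights each sample by its library size instead.
pooled = per_million(
    sum(s["count"] for s in samples),
    sum(s["library_size"] for s in samples),
)
```

The second sample rests on six times as many reads, so its relative error is much smaller and the pooled estimate sits closer to it than the plain mean does. Once only the scaled values 1.0 and 1.2 are kept, nothing remains to tell the two samples apart in this respect.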
@gordon-smyth
Last seen 42 minutes ago
WEHI, Melbourne, Australia
Dear Son,

The problem has little to do with normality or group size and more to do with the fact that FPKM values can have very different variances depending on the size of the original count. This creates a problem for the t-test, which assumes equal variances.

See the voom paper for discussion of this:

http://genomebiology.com/2014/15/2/R29

Best wishes
Gordon
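The dependence of the variance on the size of the count can be illustrated with a first-order (delta-method) approximation. If a count has mean mu and variance mu + phi * mu^2, then the variance of its logarithm is roughly 1/mu + phi. The negative binomial form of the variance and the dispersion value used below are assumptions made for illustration, and the function name `approx_var_log_count` is hypothetical. The voom paper should be consulted for the actual modelling.

```python
from math import sqrt


def approx_var_log_count(mu, dispersion=0.0):
    """Delta-method approximation to Var(log(count)).

    Var(log Y) ~ Var(Y) / mu^2 = (mu + dispersion * mu^2) / mu^2
               = 1 / mu + dispersion
    dispersion = 0 corresponds to a Poisson count.
    """
    return 1.0 / mu + dispersion


# Approximate standard deviation of log-expression at several expected counts,
# using a hypothetical dispersion of 0.01.
expected_counts = [5, 50, 500, 5000]
sd_by_mean = {
    mu: sqrt(approx_var_log_count(mu, dispersion=0.01)) for mu in expected_counts
}
```

Dividing a count by library size and gene length only shifts its logarithm by a constant, so under this approximation the same pattern carries over to log-FPKM values: observations derived from small counts are far noisier than those derived from large counts.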
For previous discussion on this list see

https://stat.ethz.ch/pipermail/bioconductor/2013-May/052802.html

This and the voom paper discuss what one needs to do to make t-tests work well in the RNA-seq context.

Gordon