calcNormFactors - normalization


Lana Schaffer ★ 1.3k

@lana-schaffer-1056

Last seen 10.6 years ago

Greetings,

Using d <- calcNormFactors(d) I get the following normalization factors. Why are the factors so similar when the library size of the 4th sample is about 1/20 that of the others?

> d$samples
           group lib.size norm.factors
HCV_45d_1    d45  7812615    1.0471701
HCV_45d_2    d45  9728373    1.0004453
HCV_100d_1  d100  8606449    0.9516424
HCV_100d_2  d100   446991    1.0030340

Lana Schaffer

Normalization Normalization • 2.2k views

ADD COMMENT • link updated 13.9 years ago by Davis McCarthy ▴ 260 • written 13.9 years ago by Lana Schaffer ★ 1.3k


Mark Robinson ★ 1.1k

@mark-robinson-2171

Last seen 10.6 years ago

Hi Lana, The factor (offset) that gets used in the statistical model is actually the *product* of lib.size and norm.factors, so the lower depth of library HCV_100d_2 is taken into account. Mark
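Mark's point can be checked with the numbers from the question. The following is a minimal sketch in base R, not edgeR's internal code: the column names `eff.lib.size` and `log.offset` are made up for illustration, and taking the log of the product as the offset is an assumption about how such a factor typically enters a log-linear count model.

```r
# Values copied from the d$samples table in the question.
samples <- data.frame(
  row.names    = c("HCV_45d_1", "HCV_45d_2", "HCV_100d_1", "HCV_100d_2"),
  lib.size     = c(7812615, 9728373, 8606449, 446991),
  norm.factors = c(1.0471701, 1.0004453, 0.9516424, 1.0030340)
)

# Effective library size: the product of lib.size and norm.factors.
# (eff.lib.size is a hypothetical column name, used only in this sketch.)
samples$eff.lib.size <- samples$lib.size * samples$norm.factors

# On the log scale the product becomes an additive offset
# (assumption: a log-link model, as is usual for count data).
samples$log.offset <- log(samples$eff.lib.size)

print(samples)
```

The norm.factors all sit close to 1, yet the effective library size of HCV_100d_2 remains far smaller than the others, so the difference in sequencing depth is carried by lib.size rather than by the normalization factor.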

ADD COMMENT • link 13.9 years ago Mark Robinson ★ 1.1k


Mark, My gene library contains only 141 genes. Is such a low number all right for this model? Also, is it correct that gene length is not accounted for in edgeR? Lana

ADD REPLY • link 13.9 years ago Lana Schaffer ★ 1.3k


On 2011-05-07, at 9:57 AM, Lana Schaffer wrote:

> My gene library contains only 141 genes. Is this low number alright in this model?

Yes, this is alright. For one thing, you pay a much smaller multiple testing penalty.

> The length on the genes are not accounted for in this package edgeR?

Correct, not accounted for. But you are comparing each gene with itself across samples, so its length is the same on both sides of the comparison.

Mark

ADD REPLY • link 13.9 years ago Mark Robinson ★ 1.1k


Mark, Thanks. Can this package be used with only 1 sample in each group? Lana

ADD REPLY • link 13.9 years ago Lana Schaffer ★ 1.3k


Davis McCarthy ▴ 260

@davis-mccarthy-4138

Last seen 10.6 years ago

Hi Lana

The package can handle more than one sample per group, and indeed the full utility of the methods in edgeR is unlocked when there is replication in at least one group.

Cheers
Davis

ADD COMMENT • link 13.9 years ago Davis McCarthy ▴ 260


Davis, Translated, does this mean that the package doesn't handle just 1 sample per group, for both groups? Lana

ADD REPLY • link 13.9 years ago Lana Schaffer ★ 1.3k


Davis McCarthy ▴ 260

@davis-mccarthy-4138

Last seen 10.6 years ago

Lana

The package can handle just 1 sample per group, but in that case you do not have sufficient degrees of freedom to estimate the dispersion parameter, which is what enables the model to account for biological variability between samples. There are a couple of workarounds that you can use:

1) Use a dispersion value of zero, which is equivalent to the Poisson model. This is the default if edgeR detects no replication in your groups.
2) Treat the samples as members of one group to estimate a value for the common dispersion, and plug that value in when you look for DE between groups.

The first approach will likely overstate the amount of DE between the samples. The second approach will tend to overestimate the dispersion parameter, so it is conservative. In an ideal world you would have biological replicate samples.

Cheers
Davis
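The two workarounds Davis describes might look roughly like this in code. This is only a sketch under stated assumptions: the counts are simulated (hypothetical data, not Lana's), the object names `et.pois`, `et.nb` and `d1` are made up, and the dispersion is passed explicitly to `exactTest()` because the default behaviour without replication may differ between edgeR versions.

```r
library(edgeR)

# Simulated counts (hypothetical data): 141 genes, one sample per group.
set.seed(1)
counts <- matrix(rnbinom(141 * 2, mu = 200, size = 10), ncol = 2,
                 dimnames = list(paste0("gene", 1:141), c("d45", "d100")))
d <- DGEList(counts = counts, group = c("d45", "d100"))
d <- calcNormFactors(d)

# Workaround 1: a dispersion of zero, i.e. a Poisson model.
et.pois <- exactTest(d, dispersion = 0)

# Workaround 2: treat both samples as members of one group, estimate a
# common dispersion, then plug that value in when testing between the
# real groups.
d1 <- d
d1$samples$group <- factor(c(1, 1))
d1 <- estimateCommonDisp(d1)
et.nb <- exactTest(d, dispersion = d1$common.dispersion)

# Compare the top-ranked genes under the two approaches.
topTags(et.pois, n = 5)
topTags(et.nb, n = 5)
```

Because the simulated data contain no true differences between the groups, any gene the Poisson test reports as DE here illustrates the overstatement Davis warns about, while the plugged-in common dispersion gives the more conservative result.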

ADD COMMENT • link 13.9 years ago Davis McCarthy ▴ 260
