Question

uneven counts for edgeR

Lana Schaffer ★ 1.3k

Hi,

I have replicate sample counts for 2 groups, but one sample has roughly 4x as many mapped reads as the other samples:

528,428
625,889
498,569
2,328,333

I divided all the mapped transcript reads of the 4th sample by 4 and then did the normalization and analysis with edgeR. What do you recommend doing with the 4th sample's counts?

Lana Schaffer
Biostatistics, Informatics
DNA Array Core Facility
858-784-2263

updated 13.2 years ago by Gordon Smyth 52k • written 13.2 years ago by Lana Schaffer ★ 1.3k

Answer 1 · 2011-10-23

Gordon Smyth 52k

WEHI, Melbourne, Australia

Dear Lana,

edgeR has no difficulty with uneven library sizes and will adjust for them automatically during the analysis. There is no need for you to do anything other than follow a standard analysis pipeline.

You do not need to standardize the 4th sample by dividing its counts by 4. In fact, you must not do this, since it changes the mean-variance relationship for your data and invalidates the subsequent analysis. You need to input the true read counts into edgeR.

Best wishes
Gordon
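The effect of dividing by 4 on the mean-variance relationship can be checked numerically. The following is a minimal sketch in plain Python (standard library only), assuming a single hypothetical gene whose count in the deep 4th library is Poisson with mean 100; the function names are illustrative. For Poisson counts the variance equals the mean, but after dividing by 4 the variance is only a quarter of the mean, so any count-based model would see the 4th sample as noisier than it really is. Run as written, it should print a depth ratio of 4.23 (from the library sizes in the question), a raw variance-to-mean ratio close to 1 and a scaled one close to 0.25.

```python
import math
import random
import statistics

# Total mapped reads per sample, as listed in the question.
lib_sizes = [528_428, 625_889, 498_569, 2_328_333]


def poisson_sample(mu, rng):
    """Draw one Poisson(mu) variate using Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def dispersion_ratio(values):
    """Sample variance divided by sample mean (about 1 for Poisson counts)."""
    return statistics.variance(values) / statistics.mean(values)


rng = random.Random(1)
# Hypothetical gene with an expected count of 100 reads in the deep 4th library.
raw = [poisson_sample(100, rng) for _ in range(20_000)]
# What "divide the 4th sample by 4" does to those counts.
scaled = [x / 4 for x in raw]

depth_ratio = lib_sizes[3] / statistics.mean(lib_sizes[:3])
print(f"sample 4 depth vs mean of samples 1-3: {depth_ratio:.2f}")
print(f"variance/mean, raw counts: {dispersion_ratio(raw):.3f}")
print(f"variance/mean, counts / 4: {dispersion_ratio(scaled):.3f}")
```

The Poisson model is a simplification for illustration (edgeR fits negative binomial models), but the quarter-scaling holds for any distribution, since dividing by 4 divides the mean by 4 and the variance by 16. Dividing also produces non-integer values; edgeR instead keeps the raw counts and accounts for the library sizes within the model.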

written 13.2 years ago by Gordon Smyth 52k

Gordon,

Thank you for this information. Is the same true for DESeq?

Lana

written 13.2 years ago by Lana Schaffer ★ 1.3k

Gordon Smyth wrote:
> edgeR has no difficulty with uneven library sizes and will adjust for
> them automatically during the analysis. There is no need for you to do
> anything other than follow a standard analysis pipeline.

Lana Schaffer wrote:
> Is the same true for DESeq?

Yes, it is.

Simon
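DESeq makes its adjustment through per-sample size factors rather than by altering the counts. Below is a minimal sketch of the median-of-ratios estimator that DESeq uses, written in plain Python rather than taken from the package (the function name is illustrative). Each gene's geometric mean across samples serves as a pseudo-reference, genes with a zero in any sample are skipped, and a sample's size factor is the median over genes of its count divided by that reference (computed on the log scale). The count table is hypothetical, with a 4th sample sequenced roughly four times deeper, as in the question.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of rows, one row per gene, one raw count per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        # Log of the gene's geometric mean across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of the log ratios, transformed back to the count scale.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Hypothetical raw counts for 5 genes; sample 4 was sequenced about 4x deeper.
counts = [
    [10, 12, 9, 44],
    [100, 110, 95, 420],
    [50, 60, 48, 210],
    [7, 6, 8, 30],
    [200, 230, 190, 850],
]
print([round(s, 2) for s in size_factors(counts)])
```

The deep 4th sample receives a size factor about four times those of the other samples, while the raw counts themselves are left untouched for the model to use.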

written 13.2 years ago by Simon Anders ★ 3.8k

Gordon,

An unnamed company is claiming that the RPKM counts, and/or some transformation of the RPKM counts, are 90% normal, 5% NB (negative binomial) and 5% Poisson distributed, using the Akaike Information Criterion. Can you explain why this is or is not plausible?

Lana

written 13.2 years ago by Lana Schaffer ★ 1.3k

Dear Lana,

It sounds strange, but it would be unwise for me to comment without knowing what they mean. It is of course technically impossible for RPKM to be negative binomial or Poisson, because RPKM values are not integers.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Wed, 26 Oct 2011, Lana Schaffer wrote:
> Gordon,
> An unnamed company is claiming that the RPKM counts, and/or
> some transformation of the RPKM counts, are 90% normal, 5% NB
> and 5% Poisson distributed, using the Akaike Information Criterion.
> Can you explain why this is or is not plausible?
>
> Lana
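The point that RPKM values cannot be Poisson or negative binomial follows from how RPKM is defined. A minimal sketch, assuming the usual definition (reads per kilobase of transcript per million mapped reads) and a hypothetical 1,800 bp gene with 250 reads in the 528,428-read library from the question; the function names are illustrative. The resulting RPKM is not an integer, while the Poisson (and negative binomial) probability mass functions are defined only on the non-negative integers, so the likelihood cannot even be evaluated.

```python
import math


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads."""
    return count / (gene_length_bp / 1e3) / (library_size / 1e6)


def poisson_log_pmf(k, mu):
    """log P(X = k) for X ~ Poisson(mu); defined only for integers k >= 0."""
    if k < 0 or k != int(k):
        raise ValueError(f"Poisson support is the non-negative integers, got {k}")
    k = int(k)
    return k * math.log(mu) - mu - math.lgamma(k + 1)


# Hypothetical gene: 250 reads on a 1,800 bp transcript in the first library.
value = rpkm(250, 1_800, 528_428)
print(f"RPKM = {value:.3f}")
try:
    poisson_log_pmf(value, mu=value)
except ValueError as err:
    print("cannot evaluate the Poisson likelihood:", err)
```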

written 13.2 years ago by Gordon Smyth 52k