Behaviour of weights in limma
Paul Harrison
Hello,

I have some data from a variant of RNA-seq on which I am hoping to do some moderated t-test differential testing with limma. In this data, many of the reads have sequenced through into the poly(A) tail, and we believe this gives us information about changes in poly(A) tail length.

For each gene and sample, we can calculate an average observed tail length. It seems easy enough to calculate a standard error for this average as well. In some cases we have few reads and the standard error is high; in others we have quite a lot of reads and the standard error is low.

What I'm hoping is that this can be translated into weights that can be fed to limma to make it behave correctly. Do weights have some specific meaning in terms of measurement variance? And how does this interact with moderation between genes? For example, could including highly noisy measurements from some genes detract from the significance of other genes where the measurement is more precise?

regards,
Paul Harrison
Victorian Bioinformatics Consortium / Monash University
@gordon-smyth
WEHI, Melbourne, Australia
Dear Paul,

> Date: Wed, 4 Jun 2014 17:30:59 +1000
> From: Paul Harrison <paul.harrison at monash.edu>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] Behaviour of weights in limma
>
> Hello,
>
> I have some data from a variant of RNA-seq on which I am hoping to do some
> moderated t-test differential testing with limma. In this data, many
> of the reads have sequenced through into the poly(A) tail, and we
> believe this gives us information about changes in poly(A) tail length.
>
> For each gene and sample, we can calculate an average observed tail
> length. It seems easy enough to calculate a standard error for this
> average as well.

I don't think that you can actually calculate a meaningful standard error. The total error depends on both biological and technical components. You can predict how the measurement error depends on the number of reads, but you don't know what proportion of the total error the measurement error makes up.

> In some cases we have few reads and the standard error is high, in
> others we have quite a lot of reads and the standard error is low.
>
> What I'm hoping is that this can be translated into weights that can be
> fed to limma to make it behave correctly. Do weights have some specific
> meaning in terms of measurement variance?

They have a specific meaning, but it is in terms of total variance, not in terms of measurement variance.

The meaning of weights in limma is the same as for any linear modelling or regression procedure, which is that the total variance is assumed inversely proportional to the weight.

> And how does this interact with moderation between genes,

Intimately.

> for example could including highly noisy measurements from some genes
> detract from the significance of other genes where the measurement is
> more precise?

Yes.

Could you not simply use voom or edgeR, both of which already do what you seem to be asking, which is to take the number of reads into account when estimating variability and assessing DE?

Best wishes
Gordon

> regards,
> Paul Harrison
>
> Victorian Bioinformatics Consortium / Monash University
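To make the statement about weights concrete, here is a minimal sketch in plain Python (not limma code, and not limma's implementation) of a weighted fit for an intercept-only model, assuming Var(y[i]) = sigma2 / w[i] with sigma2 unknown. The function name `weighted_fit` and the numbers are made up for illustration. Because the variance is only assumed proportional to 1/weight, multiplying every weight by the same constant leaves the estimate and its standard error unchanged; only the relative sizes of the weights matter.

```python
import math

def weighted_fit(y, w):
    """Weighted least squares for an intercept-only model.

    Assumes Var(y[i]) = sigma2 / w[i]: total variance is inversely
    proportional to the weight, and sigma2 is estimated from the data.
    """
    sw = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    df = len(y) - 1
    # Weighted residual variance: the estimate of sigma2.
    sigma2 = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / df
    # Standard error of the weighted mean.
    se = math.sqrt(sigma2 / sw)
    return mean, sigma2, se

# Made-up average tail lengths and weights (illustrative only).
y = [62.0, 58.5, 71.0, 60.0]
w = [40.0, 35.0, 2.0, 50.0]

mean1, sigma2_1, se1 = weighted_fit(y, w)
# Rescale all weights by the same constant.
mean2, sigma2_2, se2 = weighted_fit(y, [10.0 * wi for wi in w])
```

Here `mean2` and `se2` match `mean1` and `se1`, while `sigma2_2` is ten times `sigma2_1`: the scale of the weights is absorbed into the estimated variance. This is why weights built from the technical standard error alone do not describe the total variance unless the biological component is negligible.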
On Thu, Jun 5, 2014 at 10:04 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> Dear Paul,
>
>> Date: Wed, 4 Jun 2014 17:30:59 +1000
>> From: Paul Harrison <paul.harrison at monash.edu>
>> To: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: [BioC] Behaviour of weights in limma
>>
>> Hello,
>>
>> I have some data from a variant of RNA-seq on which I am hoping to do some
>> moderated t-test differential testing with limma. In this data, many of
>> the reads have sequenced through into the poly(A) tail, and we believe this
>> gives us information about changes in poly(A) tail length.
>>
>> For each gene and sample, we can calculate an average observed tail
>> length. It seems easy enough to calculate a standard error for this average
>> as well.
>
> I don't think that you can actually calculate a meaningful standard error.
> The total error depends on both biological and technical components. You
> can predict how the measurement error depends on the number of reads, but
> you don't know what proportion of the total error the measurement error
> makes up.

Yes. Sorry, I meant that the technical error is fairly accurately known.

>> In some cases we have few reads and the standard error is high, in others
>> we have quite a lot of reads and the standard error is low.
>>
>> What I'm hoping is that this can be translated into weights that can be
>> fed to limma to make it behave correctly. Do weights have some specific
>> meaning in terms of measurement variance?
>
> They have a specific meaning, but it is in terms of total variance, not in
> terms of measurement variance.
>
> The meaning of weights in limma is the same as for any linear modelling or
> regression procedure, which is that the total variance is assumed inversely
> proportional to the weight.

Ah.

>> And how does this interact with moderation between genes,
>
> Intimately.
>
>> for example could including highly noisy measurements from some genes
>> detract from the significance of other genes where the measurement is more
>> precise?
>
> Yes.

That would also mean that even in conventional RNA-seq data it could be worthwhile to filter out low-coverage genes before applying voom and limma?

> Could you not simply use voom or edgeR, both of which already do what you
> seem to be asking, which is to take the number of reads into account when
> estimating variability and assessing DE?

To use voom I would need to alter the voom function to take the average tail lengths as a parameter in addition to counts. This looks fairly straightforward.

Given what you've said above, an alternative would be, just for the purpose of calculating weights, to come up with a constant value for the biological variance, for example by maximum likelihood.

Thank you,
Paul Harrison
Victorian Bioinformatics Consortium / Monash University
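The alternative mentioned at the end of this reply could be sketched roughly as follows, in plain Python rather than R. This is only an illustration under stated assumptions, not a tested method: each observation is modelled as y ~ N(mu_gene, tau2 + s2), where s2 is the known technical variance and tau2 is a single biological variance shared by all genes; each gene has an intercept-only model (a real design would have groups); and tau2 is found by a crude grid search over the profile likelihood. The names `profile_loglik` and `estimate_tau2`, the grid bounds, and all numbers are hypothetical.

```python
import math

def profile_loglik(tau2, genes):
    """Log-likelihood (up to a constant) summed over genes, with each
    gene's mean replaced by its weighted estimate for this tau2."""
    total = 0.0
    for y, s2 in genes:
        # Total variance = biological (tau2) + technical (s2).
        w = [1.0 / (tau2 + v) for v in s2]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        total += -0.5 * sum(
            math.log(tau2 + v) + (yi - mu) ** 2 / (tau2 + v)
            for yi, v in zip(y, s2)
        )
    return total

def estimate_tau2(genes, upper=100.0, steps=2000):
    """Grid search for the tau2 in [0, upper] maximising the likelihood."""
    grid = [upper * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: profile_loglik(t, genes))

# Made-up data: per gene, (average tail lengths, technical variances).
genes = [
    ([62.0, 58.5, 66.0, 60.0], [1.0, 1.5, 9.0, 0.8]),
    ([30.0, 35.5, 28.0, 33.0], [4.0, 6.0, 5.0, 3.5]),
]

tau2 = estimate_tau2(genes)
# Weights inversely proportional to the assumed total variance.
weights = [[1.0 / (tau2 + v) for v in s2] for _, s2 in genes]
```

The resulting `weights` matrix has one row per gene and one column per sample, which is the shape one would pass as observation weights to a linear-model fit. Plain maximum likelihood tends to underestimate variance components with few samples per gene, so a restricted-likelihood variant may be worth considering.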