Hello,
I have some data from a variant of RNA-seq which I am hoping to do some
moderated t-test differential testing on with limma. In this data,
many of the reads have sequenced through into the poly(A) tail, and we
believe this gives us information about changes in poly(A) tail
length.
For each gene and sample, we can calculate an average observed tail
length. It seems easy enough to calculate a standard error for this
average as well. In some cases we have few reads and the standard
error is high, in others we have quite a lot of reads and the standard
error is low.
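As a rough illustration of the per-gene, per-sample summary described above, here is a minimal Python sketch using only the standard library. The function name `tail_length_summary` and the read values are hypothetical, and the standard error computed this way reflects only read-to-read variation within one sample.

```python
import math
import statistics


def tail_length_summary(tail_lengths):
    """Return (mean, standard error of the mean) for the observed
    poly(A) tail lengths of one gene in one sample."""
    n = len(tail_lengths)
    if n < 2:
        raise ValueError("need at least two reads to estimate a standard error")
    mean = statistics.mean(tail_lengths)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(tail_lengths) / math.sqrt(n)
    return mean, se


# Hypothetical tail lengths: the same spread, observed with few or many reads.
few_reads = [30, 45, 60]
many_reads = [30, 45, 60] * 20

mean_few, se_few = tail_length_summary(few_reads)
mean_many, se_many = tail_length_summary(many_reads)

print("few reads :", mean_few, se_few)
print("many reads:", mean_many, se_many)
```

With the same spread of tail lengths, the gene with more reads gets the smaller standard error, which is the pattern described in the paragraph above.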
What I'm hoping is that this can be translated into weights that can
be fed to limma to make it behave correctly. Do weights have some
specific meaning in terms of measurement variance? And how does this
interact with moderation between genes, for example could including
highly noisy measurements from some genes detract from the
significance of other genes where the measurement is more precise?
regards,
Paul Harrison
Victorian Bioinformatics Consortium / Monash University
Dear Paul,
> Date: Wed, 4 Jun 2014 17:30:59 +1000
> From: Paul Harrison <paul.harrison at monash.edu>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] Behaviour of weights in limma
>
> Hello,
>
> I have some data from a variant of RNA-seq which I am hoping to do some
> moderated t-test differential testing on with limma. In this data, many
> of the reads have sequenced through into the poly(A) tail, and we
> believe this gives us information about changes in poly(A) tail length.
>
> For each gene and sample, we can calculate an average observed tail
> length. It seems easy enough to calculate a standard error for this
> average as well.
I don't think that you can actually calculate a meaningful standard error.
The total error depends on both biological and technical components. You
can predict how the measurement error depends on the number of reads, but
you don't know what proportion of the total error the measurement error
makes up.
> In some cases we have few reads and the standard error is high, in
> others we have quite a lot of reads and the standard error is low.
>
> What I'm hoping is that this can be translated into weights that can be
> fed to limma to make it behave correctly. Do weights have some specific
> meaning in terms of measurement variance?
They have a specific meaning, but it is in terms of total variance not in
terms of measurement variance.
The meaning of weights in limma is the same as for any linear modelling or
regression procedures, which is that the total variance is assumed
inversely proportional to the weight.
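To make the "variance inversely proportional to weight" convention concrete, here is a minimal Python sketch of weighted least squares for an intercept-only model, assuming var(y_i) = sigma2 / w_i. It is a generic illustration and not limma's code; the function name `weighted_fit` and the numbers are hypothetical.

```python
import math


def weighted_fit(y, w):
    """Weighted least squares for an intercept-only model,
    assuming var(y_i) = sigma2 / w_i."""
    sum_w = sum(w)
    # Weighted mean: observations with larger weights count for more.
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    df = len(y) - 1
    # Residual variance on the scale of an observation with weight 1.
    sigma2 = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / df
    # Standard error of the weighted mean.
    se = math.sqrt(sigma2 / sum_w)
    return mu, sigma2, se


# Hypothetical values for one gene across four samples.
y = [50.0, 54.0, 47.0, 60.0]
w = [4.0, 4.0, 1.0, 1.0]

mu, sigma2, se = weighted_fit(y, w)
# Multiply every weight by 10: only the scale of sigma2 changes.
mu10, sigma2_10, se10 = weighted_fit(y, [10 * wi for wi in w])

print(mu, sigma2, se)
print(mu10, sigma2_10, se10)
```

Within a single gene only the relative weights matter: rescaling them leaves the estimate and its standard error unchanged and rescales the residual variance by the same factor. That residual variance is the quantity that gets compared across genes, which is one reason the weights need to describe total variance on a common scale.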
> And how does this interact with moderation between genes,
Intimately.
> for example could including highly noisy measurements from some genes
> detract from the significance of other genes where the measurement is
> more precise?
Yes.
Could you not simply use voom or edgeR, both of which already do what you
seem to be asking, which is to take the number of reads into account when
estimating variability and assessing DE?
Best wishes
Gordon
> regards,
> Paul Harrison
>
> Victorian Bioinformatics Consortium / Monash University
On Thu, Jun 5, 2014 at 10:04 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Paul,
>
>> Date: Wed, 4 Jun 2014 17:30:59 +1000
>> From: Paul Harrison <paul.harrison at monash.edu>
>> To: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: [BioC] Behaviour of weights in limma
>>
>> Hello,
>>
>> I have some data from a variant of RNA-seq which I am hoping to do some
>> moderated t-test differential testing on with limma. In this data, many of
>> the reads have sequenced through into the poly(A) tail, and we believe this
>> gives us information about changes in poly(A) tail length.
>>
>> For each gene and sample, we can calculate an average observed tail
>> length. It seems easy enough to calculate a standard error for this average
>> as well.
>
>
> I don't think that you can actually calculate a meaningful standard error.
> The total error depends on both biological and technical components. You
> can predict how the measurement error depends on the number of reads, but
> you don't know what proportion of the total error the measurement error
> makes up.
>
Yes. Sorry, I meant that the technical error is fairly accurately
known.
>> In some cases we have few reads and the standard error is high, in others
>> we have quite a lot of reads and the standard error is low.
>>
>> What I'm hoping is that this can be translated into weights that can be
>> fed to limma to make it behave correctly. Do weights have some specific
>> meaning in terms of measurement variance?
>
>
> They have a specific meaning, but it is in terms of total variance not in
> terms of measurement variance.
>
> The meaning of weights in limma is the same as for any linear modelling or
> regression procedures, which is that the total variance is assumed inversely
> proportional to the weight.
>
Ah.
>> And how does this interact with moderation between genes,
>
>
> Intimately.
>
>> for example could including highly noisy measurements from some genes
>> detract from the significance of other genes where the measurement is more
>> precise?
>
>
> Yes.
>
That would also mean that even in conventional RNA-seq data it could
be worthwhile to filter out low coverage genes before applying voom
and limma?
> Could you not simply use voom or edgeR, both of which already do what you
> seem to be asking, which is to take the number of reads into account when
> estimating variability and assessing DE?
>
To use voom I would need to alter the voom function to take the
average tail lengths as a parameter in addition to counts. This looks
fairly straightforward.
Given what you've said above, an alternative would be, just for the
purpose of calculating weights, to come up with a constant value for
the biological variance, for example by Maximum Likelihood.
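A minimal Python sketch of that alternative, assuming total variance is the sum of a constant biological variance and the squared technical standard error. The function name `total_variance_weights` and all values are hypothetical; `bio_var` stands for whatever constant is chosen, and how it is estimated is not shown.

```python
def total_variance_weights(tech_se, bio_var):
    """Weights inversely proportional to an assumed total variance:
    a constant biological variance plus the squared technical
    standard error of each observation."""
    if bio_var < 0:
        raise ValueError("biological variance cannot be negative")
    return [1.0 / (bio_var + se ** 2) for se in tech_se]


# Hypothetical technical standard errors for three observations.
tech_se = [1.0, 2.0, 8.0]

# With no biological variance the weights follow the technical error alone.
w_small = total_variance_weights(tech_se, bio_var=0.0)
# With a large biological variance the weights become nearly equal.
w_large = total_variance_weights(tech_se, bio_var=100.0)

ratio_small = w_small[0] / w_small[2]
ratio_large = w_large[0] / w_large[2]

print(w_small, ratio_small)
print(w_large, ratio_large)
```

The larger the assumed biological variance is relative to the technical error, the less the read depth affects the weights, so the choice of that constant largely determines how strongly low-coverage observations are down-weighted.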
Thank you,
Paul Harrison
Victorian Bioinformatics Consortium / Monash University