I am trying to work out the best way to represent weights in the newer SummarizedExperiment and SingleCellExperiment classes.
By weights I mean inverse variances (up to some scaling factor), for example as accepted by the lm() function. Weights are a somewhat simplistic but general way of accounting for varying noise levels between observations in many types of data. I'm interested in developing generic approaches for things like visualization and principal components analysis based on this.
limma closely follows the approach to linear modelling in the book "Statistical Models in S", and so the limma EList class has support for weights. This has been a quite convenient data type to use, but I'd like to follow modern Bioconductor standards if possible. limma also has an internal getEAWP function for extracting data, including weights, from a variety of classes, but this doesn't look like it supports SummarizedExperiment.
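To make the meaning of "inverse variance, up to some scaling factor" concrete, here is a minimal sketch with lm() on simulated data. The noise levels and variable names are invented for illustration only.

```r
# Simulated data: each observation has a known (here, invented) noise level.
set.seed(1)
n <- 50
x <- seq_len(n)
noise_sd <- rep(c(1, 5), length.out = n)   # alternating low and high noise
y <- 2 + 0.5 * x + rnorm(n, sd = noise_sd)

# Weights are inverse variances, defined only up to a scaling factor.
w <- 1 / noise_sd^2

fit_weighted   <- lm(y ~ x, weights = w)
fit_rescaled   <- lm(y ~ x, weights = 10 * w)  # rescaled weights, same coefficients
fit_unweighted <- lm(y ~ x)                    # treats all observations as equally noisy

coef(fit_weighted)
coef(fit_unweighted)
```

Rescaling the weights leaves the fitted coefficients unchanged, which is why only the relative sizes of the weights need to be stored.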
Are there any other packages with support for weights, and what is their preferred representation?
Would a naming convention for assays be a good way to do this? For example a "log2CPM" assay could have an associated "log2CPMWeights" assay.
There is possibly also a need to clearly distinguish technical variability alone from technical+biological variability.
It looks like SingleCellExperiment has a weights() method, which is promising, but I'm not clear on which assay the weights are meant to be associated with.
It is not possible for limma to support SummarizedExperiment objects because it is not a sufficiently well-defined class. The assays can contain anything, so it's impossible for a downstream package like limma to know what sort of analysis might be appropriate.
OK, it's important not to guess wrong. I'll require the user to explicitly mark which assays are appropriate to use.
I think my solution will be to add some metadata fields to a SummarizedExperiment defining the appropriate pair of assays to use. I'll then write my code to produce an error if these fields are not present.
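A rough sketch of how this could look is below. The helper functions and the metadata field names ("weighted_assay" and "weights_assay") are hypothetical, made up for this example; they are not part of SummarizedExperiment or limma. Only the standard accessors assay(), assayNames() and metadata() are assumed.

```r
library(SummarizedExperiment)

# Hypothetical helper: record which assay holds the values and which holds
# the matching weights. The metadata field names are invented for this sketch.
mark_weighted_assays <- function(se, values, weights) {
    stopifnot(values %in% assayNames(se), weights %in% assayNames(se))
    metadata(se)$weighted_assay <- values
    metadata(se)$weights_assay <- weights
    se
}

# Hypothetical helper: fetch the marked pair, with an error if nothing is marked.
get_weighted_assays <- function(se) {
    md <- metadata(se)
    if (is.null(md$weighted_assay) || is.null(md$weights_assay)) {
        stop("No weighted assay pair marked; call mark_weighted_assays() first.")
    }
    # Component names E and weights mirror those of a limma EList.
    list(E = assay(se, md$weighted_assay),
         weights = assay(se, md$weights_assay))
}

# Toy data: 4 features by 3 samples.
set.seed(1)
E <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
W <- matrix(runif(12, 0.5, 2), nrow = 4, dimnames = dimnames(E))
se <- SummarizedExperiment(assays = list(log2CPM = E, log2CPMWeights = W))

se <- mark_weighted_assays(se, "log2CPM", "log2CPMWeights")
pair <- get_weighted_assays(se)
```

Because the marking lives in metadata rather than in a new class, the same helpers should also work on subclasses such as SingleCellExperiment, as long as they keep the metadata slot intact.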
(An alternative would be to define a subclass of SummarizedExperiment that is sufficiently well defined. However, there's already an ecosystem of subclasses of SummarizedExperiment which I want to be able to build on without subclassing each of them.)