Question

Using scran normalization within edgeR

mnaymik ▴ 10

@mnaymik-7522

United States

I am trying to use custom size factors (computed using the scran package) in edgeR as part of the differential expression workflow, and I get the following error:

d2=estimateDisp(d2,design.mat,mixed.df=T,offset=sce$size_factor)
Error in estimateDisp.default(y = y$counts, design = design, group = group, :
formal argument "offset" matched by multiple actual arguments

Here d2 is the raw counts DGEList and sce$size_factor is the size factors computed by scran for my single cell data. Is this the correct way to use the Scran size factors?

If I instead load an already scran-normalized counts table, keep the default offset and the default size factors of 1, and then call the dispersion estimation function, it seems to work. I guess my real question is: are these two approaches equivalent? My end goal is just to run edgeR differential expression on single-cell data with the scran normalization method.

edger estimatedisp scran sizefactors • 2.5k views

updated 8.8 years ago by Aaron Lun ★ 28k • written 8.8 years ago by mnaymik ▴ 10

score 2 · Accepted Answer · 2016-07-18

Aaron Lun ★ 28k

@alun

The city by the bay

It's a lot easier to just use the convertTo function to convert your SCESet to a DGEList.

d2 <- convertTo(sce, type="edgeR")

This is the recommended approach, as it converts the size factors properly for use in edgeR. Doing it manually has several gotchas that you must be wary of. For starters, the size factors are functionally equivalent to the effective library sizes, which means that the offsets should be the log-size factors, not the size factors themselves. Secondly, if you call estimateDisp on a DGEList, it will automatically try to pull the offsets out of the DGEList; this means that also specifying offset in the function call will result in the "matched by multiple actual arguments" error. If you must do it manually, you should instead assign the offsets to d2$offset prior to calling estimateDisp.
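For anyone who does want the manual route, here is a minimal sketch of it, not a replacement for convertTo. The toy counts, the size.factors vector (standing in for the scran size factors) and the two-group design are all made up for illustration, and the edgeR step only runs if edgeR is installed.

```r
# Hypothetical toy data: 200 genes x 6 cells of raw counts.
set.seed(42)
counts <- matrix(rnbinom(200 * 6, mu = 20, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:6)))

# Hypothetical size factors, standing in for the values computed by scran.
size.factors <- c(0.8, 1.1, 0.9, 1.2, 1.0, 1.05)

# Hypothetical two-group design.
group <- factor(rep(c("a", "b"), each = 3))
design.mat <- model.matrix(~group)

# Offsets live on the log scale: one log-size factor per cell,
# repeated across all genes so the matrix matches the count matrix.
offset.mat <- matrix(log(size.factors), nrow = nrow(counts), ncol = ncol(counts),
                     byrow = TRUE, dimnames = dimnames(counts))

if (requireNamespace("edgeR", quietly = TRUE)) {
  d2 <- edgeR::DGEList(counts)
  # Store the offsets in the DGEList first ...
  d2$offset <- offset.mat
  # ... and do NOT pass offset= here, since estimateDisp pulls it from d2.
  d2 <- edgeR::estimateDisp(d2, design.mat)
}
```

The point of the sketch is only the order of operations: the offsets go into the object, and the estimateDisp call itself carries no offset argument.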

As for using the "normalized counts": scran does not report normalized counts. If you look carefully, the values in exprs(sce) are actually normalized log-expression values. Treating them as counts in edgeR would not be correct. Even if scran did report normalized counts, I would still use the raw counts for the edgeR analysis to ensure that the mean-variance relationship is modelled correctly.

written 8.8 years ago by Aaron Lun ★ 28k

Thank you very much!

written 8.8 years ago by mnaymik ▴ 10

I also noticed that if I change the order of the samples in the data frame and normalize via the quick cluster method, the size factors change slightly. Is there a random seed or something similar that is causing this?

written 8.8 years ago by mnaymik ▴ 10

There are no random seeds, and I don't think it's an issue with quickCluster. One other possibility is that you have some cells with the same library size, which means that the ordering of cells to be used for pooling will change upon reordering of the samples in computeSumFactors. This will affect the pools that are formed and the size factors that are calculated. The slight changes shouldn't be a major problem for downstream analyses, so I wouldn't worry about it.
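To make the tie effect concrete, here is a small base-R illustration. It is not scran's internal code, just a demonstration of how sorting cells by library size behaves when two cells are tied; the cell names and library sizes are made up.

```r
# Hypothetical library sizes; cells B and C are tied.
lib.sizes <- c(A = 100, B = 250, C = 250, D = 400)

# order() breaks ties by input position, so the relative rank of
# tied cells depends on the order in which the cells were supplied.
original <- names(lib.sizes)[order(lib.sizes)]

# The same cells with the same library sizes, supplied in reverse order.
shuffled.sizes <- lib.sizes[c("D", "C", "B", "A")]
shuffled <- names(shuffled.sizes)[order(shuffled.sizes)]

print(original)
print(shuffled)

# FALSE: B and C swap places, so pools built from neighbouring
# cells in this ordering would contain different cells.
print(identical(original, shuffled))
```

If the pools are formed from cells that are adjacent in this ordering, swapping two tied cells changes which cells end up pooled together, which is enough to nudge the resulting size factors.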

written 8.8 years ago by Aaron Lun ★ 28k

Yeah, I noticed that the change is very small. I went through the differential expression workflow in edgeR on the same dataset ordered differently and the results are almost identical. Thanks again for the help!

written 8.8 years ago by mnaymik ▴ 10