I was wondering whether it makes sense (or is even possible) to use the SC3 and scater packages to analyse bulk RNA-seq data. We have a data set from human cancer with ~350 samples. Among those I have three risk groups, with multiple abnormalities within each, which I would like to treat the way cell types and sub-types are treated in scRNA-seq.
As I don't have spike-ins in the dataset, I was thinking either to use the normalised values from a previous DESeq2 analysis I did, or to normalise with size factors calculated from the geometric mean.
Hi Assa
It is certainly possible, and it makes sense, to use scater for bulk
RNA-seq data. I expect that SC3 would also work well on bulk RNA-seq
data. The authors of that package could offer more insight, but a priori
I think it would perform well in this setting.
Normalised values from a previous DESeq2 analysis (provided they are on
the log scale) should be fine as expression values for input to scater
and SC3.
However, if you were to start with raw count data, then I would
construct an SCESet object in scater from the count matrix and then
normalise with size-factor methods designed for bulk RNA-seq data.
In the "normaliseExprs" function in scater you can apply TMM
normalisation (from edgeR) or the DESeq size-factor normalisation
approach to obtain log2-scale normalised expression values that would
provide appropriate input for SC3. The TMM and DESeq size factor methods
designed for bulk RNA-seq should be a little better than using the
geometric mean, though the difference might be small if your libraries
are similar.
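To make the size-factor idea concrete, here is a minimal base-R sketch of median-of-ratios size factors, the idea behind the DESeq approach mentioned above. The toy count matrix and all variable names are made up for illustration, and the pseudo-count of 1 is an arbitrary choice. This is not the scater or DESeq2 code itself; in practice you would let those packages compute the size factors.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
# Sample 2 is sequenced 2x as deeply as sample 1, and sample 3 is 4x as deep.
counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20,
     50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# 1. Per-gene geometric mean across samples, computed on the log scale.
log_geo_means <- rowMeans(log(counts))

# 2. Per-sample median of the log ratios to that reference, using only
#    genes with no zero counts (a zero makes the log geometric mean -Inf).
usable <- is.finite(log_geo_means)
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col[usable]) - log_geo_means[usable]))
})

# 3. Divide each sample by its size factor, then move to the log2 scale.
norm_counts <- sweep(counts, 2, size_factors, "/")
log_expr <- log2(norm_counts + 1)

print(size_factors)
```

In this toy example the samples differ only in sequencing depth, so after dividing by the size factors the three columns of `norm_counts` are identical.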
Best
Davis
On 10/03/2017 12:27, Assa Yeroslaviz [bioc] wrote:
> Question: using SC3 and scater for bulk-RNAseq
--
Davis McCarthy
NHMRC Early Career Fellow
Stegle Group
EMBL-EBI, Cambridge, UK
www.ebi.ac.uk
Is it possible? Yes. Counts are counts are counts, and scater will process them regardless of their origin.
Is it sensible? Well, I guess so; most clustering algorithms don't care where the counts came from. However, a lot of the subtleties in scater (e.g., QC on single cells, normalization) are not relevant for bulk data. You might as well just run cpm with log=TRUE (from edgeR) and feed the result directly into the clustering algorithms.
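A rough sketch of that shortcut, using base R only so it runs without any Bioconductor packages. With edgeR installed, the transformation is the single call `edgeR::cpm(counts, log = TRUE)`. The manual line below is a simplified stand-in that adds a fixed pseudo-count of 0.5, whereas edgeR scales its prior count by library size, so the values will differ slightly. The toy counts, the `risk_group` labels, and the choice of average-linkage hierarchical clustering are all assumptions made for illustration.

```r
# Toy counts: 6 genes x 6 samples, three made-up groups of two samples each.
counts <- matrix(
  c(500, 520,  20,  25,  22,  18,   # high in group A
     30,  28, 400, 420,  25,  35,   # high in group B
     15,  20,  18,  22, 450, 430,   # high in group C
    100, 110,  95, 105,  98, 102,   # roughly flat
    200, 190, 210, 205, 195, 198,   # roughly flat
     50,  55,  48,  52,  51,  49),  # roughly flat
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:6))
)
risk_group <- rep(c("A", "B", "C"), each = 2)  # hypothetical labels

# Simplified log2 counts-per-million with a fixed pseudo-count of 0.5.
lib_sizes <- colSums(counts)
log_cpm <- log2(t(t(counts + 0.5) / lib_sizes) * 1e6)

# Cluster samples (columns) on Euclidean distance and cut into 3 clusters.
sample_dist <- dist(t(log_cpm))
tree <- hclust(sample_dist, method = "average")
clusters <- cutree(tree, k = 3)

# Cross-tabulate the clusters against the known groups.
print(table(cluster = clusters, group = risk_group))
```

On real data the cross-tabulation at the end is a quick way to see whether the unsupervised clusters line up with the risk groups or abnormalities you already know about.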
Regarding scater, Davis and Aaron have already replied extensively above.
Regarding SC3, I will second Davis: using SC3 in general should be OK. But keep in mind that we optimised the range of eigenvectors used for clustering (4%-7% of N, where N is the number of cells; see the paper for details: http://biorxiv.org/content/early/2016/09/02/036558 ) specifically for scRNA-seq data. For bulk data this range may no longer be optimal, but it should still be OK, because we cut all the noisy eigenvectors anyway.
You can change the range of eigenvectors by using the d_region_min and d_region_max parameters.
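As a back-of-the-envelope illustration of what that range means for ~350 samples, the base-R sketch below converts the two fractions into eigenvector counts. The helper `eigen_range` is a hypothetical name, and rounding the lower bound down and the upper bound up is an assumption about how the fractions become integers. The 0.10 upper bound is only an example value, not a recommendation.

```r
# Hypothetical helper: turn the d_region_min / d_region_max fractions into a
# range of eigenvector counts for n samples. The floor/ceiling rounding is an
# assumption made for this sketch.
eigen_range <- function(n, d_region_min = 0.04, d_region_max = 0.07) {
  floor(d_region_min * n):ceiling(d_region_max * n)
}

n_samples <- 350

# Default 4%-7% region.
default_dims <- eigen_range(n_samples)
print(range(default_dims))

# A wider region, e.g. up to 10% of N (example value only).
wider_dims <- eigen_range(n_samples, d_region_max = 0.10)
print(range(wider_dims))

# With SC3 installed, the same fractions would be passed to the clustering
# call, along the lines of (not run here):
#   sce <- SC3::sc3(sce, ks = 3, d_region_min = 0.04, d_region_max = 0.10)
```

Whether a different region actually helps on bulk data is an empirical question; comparing the resulting clusterings against the known risk groups would be one way to check.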