Quality control in SingleCellExperiment
1
0
Entering edit mode
Linda • 0
@linda-23123
Last seen 3.6 years ago
United Kingdom

Hi all,

I am new to R and SingleCellExperiment.

I am currently working through the following tutorial (Chapter 6):

https://osca.bioconductor.org/ but it is not working! Here is what I type in the RStudio console:

BiocManager::install("scRNAseq")

library(scRNAseq)

sce.416b <- LunSpikeInData(which="416b")

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")

but I get the following error:

> sce.416b <- LunSpikeInData(which="416b")
snapshotDate(): 2019-10-22
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
loading from cache
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
loading from cache
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
loading from cache
> 
> library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
loading from cache
require(“ensembldb”)
Error: failed to load resource
  name: AH73905
  title: Ensembl 97 EnsDb for Mus musculus
  reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘ensembldb’
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Error in mapIds(ens.mm.v97, keys = rownames(sce.416b), keytype = "GENEID",  : 
  could not find function "mapIds"
> is.mito <- which(location=="MT")
Error in which(location == "MT") : object 'location' not found

Can anyone tell me why I am unable to follow this tutorial?

Also, can someone tell me what the heck this code is actually doing:

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")
is.mito <- which(location=="MT")

What is "AH73905" what is "ens.mm.v97" and how exactly is this finding mitochondrial genes in sce.416b?

Any help with this would be really appreciated.

Kind regards, Linford

SingleCellExperiment scater sce quality control • 2.9k views
ADD COMMENT
0
Entering edit mode

OK so I have followed the error messages and realised I needed to additional packages:

BiocManager::install("ensembldb")

Now it appears to work, but I am getting an error message:

library(AnnotationHub)
> ens.mm.v97 <- AnnotationHub()[["AH73905"]]
snapshotDate(): 2019-10-29
loading from cache
> location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
+                    keytype="GENEID", column="SEQNAME")
Warning message:
Unable to map 563 of 46604 requested IDs. 
> is.mito <- which(location=="MT")

Which I assume is not such a big issue?

library(scater)
df <- perCellQCMetrics(sce.416b, subsets=list(Mito=is.mito))
df

However, if someone could explain exactly how this code is identifying mitochondrial transcripts I would be very grateful.

ADD REPLY
0
Entering edit mode

I am glad you have solved the first problem.

I think you just have a warning message. I think it doesn't matter. This message is common. The reason is that there are 563 IDs that can't map to the reference annotation dataset ens.mm.v97.

The content of location is like that:

GRangesList object of length 46604: $ENSMUSG00000102693 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle>

[1] 1 3073253-3074322 +

seqinfo: 61 sequences from GRCm38 genome

$ENSMUSG00000064842 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle>

[1] 1 3102016-3102125 +

seqinfo: 61 sequences from GRCm38 genome

$ENSMUSG00000051951 GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle>

[1] 1 3205901-3671498 -

seqinfo: 61 sequences from GRCm38 genome

...

<46601 more elements>

The seqnames is the chr number.The genes are located on MT are marked by MT. That defines the MT genes.

ADD REPLY
0
Entering edit mode

library(scRNAseq) sce.416b <- LunSpikeInData(which="416b") sce.416b$block <- factor(sce.416b$block)

如果以上不能用,可以使用如下方法自行构建测试数据集

构建所需的数据:1.colData 2.edb 3.rowData 4.sceData 5.spikein

1.colData下载途径:http://s3.amazonaws.com/experimenthub/scRNAseq/lun-spikein/2.0.0/coldata-416b.rds

colData <- readRDS("d:/analysis-learn/mms_mt_analysis/coldata-416b.rds")

2.edb下载途径:

BiocManager::install("EnsDb.Mmusculus.v79")此处没有使用EnsDb.Mmusculus.v97数据

library(EnsDb.Mmusculus.v79)

edb <- EnsDb.Mmusculus.v79

3.rowData下载途径:http://s3.amazonaws.com/experimenthub/scRNAseq/lun-spikein/2.0.0/rowdata-416b.rds

rowData <- readRDS("d:/analysis-learn/mms_mt_analysis/rowdata-416b.rds")

4.sceData下载途径:http://s3.amazonaws.com/experimenthub/scRNAseq/lun-spikein/2.0.0/counts-416b.rds

sceData <- readRDS("d:/analysis-learn/mms_mt_analysis/counts-416b.rds")

5.spikein下载途径:

此处直接使用教程中的方法,并将数据转换为文档spikein数据

spikedata <- read.table("spikedata.txt")

构建测试所用数据集:

sce <- SingleCellExperiment(assays = list(counts = sceData),colData = colData, rowData = rowData)

spike.type <- rep("endogenous", nrow(sce))

spike.type[grep("ERCC", rownames(sce))] <- "ERCC"

spike.type[grep("SIRV", rownames(sce))] <- "SIRV"

sce <- splitAltExps(sce, spike.type, ref="endogenous")

spike.exp <- altExp(sce, "ERCC")

spikedata <- spikedata[rownames(spike.exp), ]

rowData(spike.exp) <- cbind(rowData(spike.exp), spikedata)

altExp(sce, "ERCC") <- spike.exp

ginfo <- genes(edb, columns=character(0))

mcols(ginfo) <- NULL

m <- match(rownames(sce), names(ginfo))

present <- !is.na(m)

if (!all(present)) {

replacement <- rowRanges(sce)

mcols(replacement) <- NULL

ginfo <- as(ginfo, "GRangesList")

replacement[present] <- ginfo[m[present]]

} else {

replacement <- ginfo[m]

}

mcols(replacement) <- rowData(sce)

rowRanges(sce) <- replacement

sce.416b <- sce

此时sce.416b数据集与示例中的数据相似度很高,可以进行后续的步骤

if you have any questions,please send e-mails to 1498554536@qq.com

Best wishes!

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 7 hours ago
The city by the bay

Also, can someone tell me what the heck this code is actually doing:

First, breath deeply. And then look at each step.

library(AnnotationHub)

This loads the AnnotationHub package, which provides easy access to various annotation files in the AnnotationHub. This is a central hub for all sorts of things, e.g., gene models for various organisms, peak calls from various epigenomics projects, dbSNP variants, and so on. It would be a pain to embed each of these files into a separate package, so instead we use a one-stop-shop for all of them.

ens.mm.v97 <- AnnotationHub()[["AH73905"]]

This grabs the file corresponding to the mouse Ensembl annotation, version 97. The "AH73905" is just the identifier for this file in the AnnotationHub (AH, get it?). If that seems a bit cryptic, you could just do query(AnnotationHub(), "Ensembl 97") and dig down to "Mus musculus" to get the same number.

location <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
                   keytype="GENEID", column="SEQNAME")

Pretty simple here. We're just taking all the Ensembl IDs in the row names of sce.416b, and we're mapping them to the SEQNAME, i.e., we're getting the chromosome on which each gene is found. The keytype= just specifies that the character vector contains... well, gene IDs. If you had, say, gene symbols, you could replace that with keytype="SYMBOL".

I should note that the latest version of LunSpikeInData will load location information by default, but that hasn't made its way into the book yet. So when that happens, you don't need to do all of this, but it's good to learn in case you want to annotate your own objects.

is.mito <- which(location=="MT")

Nothing much to say here. Is it mitochondrial or not?

Which I assume is not such a big issue?

That's correct. For the sake of simplicity, I didn't bother to find the exact annotation that I used to obtain the counts, and there are slight differences between Ensembl versions (usually around genes that no one cares about, e.g., GmXXXX).

ADD COMMENT

Login before adding your answer.

Traffic: 832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6