I have a human and a mouse SingleCellExperiment object and would like to concatenate them. I have converted the mouse IDs into human IDs and subset each object to the intersection of IDs. I get an error message because Symbol ends up with duplicate values in it.
head(rowData(SCEmouse))
DataFrame with 6 rows and 3 columns
                                ID      Symbol            Type
                       <character> <character>     <character>
ENSMUSG00000061195 ENSG00000186092       OR4F5 Gene Expression
ENSMUSG00000093804 ENSG00000284733      OR4F29 Gene Expression
ENSMUSG00000096351 ENSG00000187634      SAMD11 Gene Expression
ENSMUSG00000095567 ENSG00000188976       NOC2L Gene Expression
ENSMUSG00000078485 ENSG00000187583     PLEKHN1 Gene Expression
ENSMUSG00000078486 ENSG00000187642       PERM1 Gene Expression
head(rowData(SCEhuman))
DataFrame with 6 rows and 3 columns
                             ID      Symbol            Type
                    <character> <character>     <character>
ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
ENSG00000284733 ENSG00000284733      OR4F29 Gene Expression
ENSG00000187634 ENSG00000187634      SAMD11 Gene Expression
ENSG00000188976 ENSG00000188976       NOC2L Gene Expression
ENSG00000187583 ENSG00000187583     PLEKHN1 Gene Expression
ENSG00000187642 ENSG00000187642       PERM1 Gene Expression
cbind(SCEhuman, SCEmouse) # Error
range(table(mcols(SCEhuman)[, "Symbol"]))
1 2
range(table(mcols(SCEmouse)[, "Symbol"]))
1 19
Is there a better way to do this task? To match the genes, I replaced the mouse ID and symbol in the rowData of the mouse object with the corresponding human values, based on an orthologs table from ENSEMBL.
What is the error message? My guess is that it is rowname duplication, because one mouse gene might have more than one human ortholog.
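If the mapping is the culprit, one way to check is to look for IDs that occur more than once on either side of the orthologs table. The following is only a sketch in base R: the `orthologs` data frame, its column names, and the shortened IDs are made up for illustration and are not the ENSEMBL table used above.

```r
# Hypothetical ortholog table: one row per mouse-human pair (IDs are made up).
orthologs <- data.frame(
  mouse = c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03", "ENSMUSG03", "ENSMUSG04"),
  human = c("ENSG01",    "ENSG02",    "ENSG03",    "ENSG04",    "ENSG02"),
  stringsAsFactors = FALSE
)

# Flag every row whose ID occurs more than once on its side of the mapping.
dup_mouse <- orthologs$mouse %in% orthologs$mouse[duplicated(orthologs$mouse)]
dup_human <- orthologs$human %in% orthologs$human[duplicated(orthologs$human)]

# Keep only pairs that are unique in both directions (one-to-one orthologs).
one_to_one <- orthologs[!dup_mouse & !dup_human, ]
one_to_one
```

Restricting both objects to one-to-one pairs like this discards genes, so whether it is acceptable depends on the analysis.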
It is treating Symbol as some kind of primary key, bizarrely. I had to change the commas to get the support website to accept it. You can see from my tabulation of the Symbol column that some values occur more than once.
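Since the exact error text is not shown, this is a guess: as far as I know, cbind() on SummarizedExperiment-derived objects requires rowData columns that share a name to be identical across the objects, so a Symbol column that differs anywhere between the two would stop it, regardless of repeats within one object. The sketch below uses toy objects built by a hypothetical helper `make_sce()`, not the real SCEhuman and SCEmouse, and assumes both objects already have the same row names in the same order.

```r
library(SingleCellExperiment)

# Toy stand-ins for the human and mouse objects (hypothetical data).
ids <- c("ENSG00000186092", "ENSG00000284733", "ENSG00000187634")
make_sce <- function(symbols, n_cells, prefix) {
  counts <- matrix(1L, nrow = length(ids), ncol = n_cells,
                   dimnames = list(ids, paste0(prefix, seq_len(n_cells))))
  SingleCellExperiment(
    assays = list(counts = counts),
    rowData = DataFrame(ID = ids, Symbol = symbols)
  )
}
sce_human <- make_sce(c("OR4F5", "OR4F29", "SAMD11"), 2, "h")
sce_mouse <- make_sce(c("OR4F5", "OR4F29", "Samd11"), 3, "m")  # one Symbol differs

# Same-named rowData columns that do not match are expected to make cbind() fail.
res <- tryCatch(cbind(sce_human, sce_mouse), error = function(e) e)
inherits(res, "error")

# Check that the rows line up, copy the human annotation over, then combine.
stopifnot(identical(rownames(sce_human), rownames(sce_mouse)))
rowData(sce_mouse) <- rowData(sce_human)
combined <- cbind(sce_human, sce_mouse)
dim(combined)
```

Copying the annotation only makes sense once the row names really do correspond to the same genes; dropping the conflicting column with rowData(sce_mouse)$Symbol <- NULL on both objects would be an alternative.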