I am trying to annotate my genes with Ensembl and get an error when merging the annotation back into my object:
library(AnnotationHub)
library(magrittr)  # provides %>% and set_names()

ah <- AnnotationHub()
ens.hs.107 <- query(ah, c("Homo sapiens", "EnsDb", 107))[[1]]

genes <- rowData(sce)$ID
head(genes)

# Look up the chromosome for each Ensembl gene ID
gene_annot <- AnnotationDbi::select(ens.hs.107,
                                    keys = genes,
                                    keytype = "GENEID",
                                    columns = c("GENEID", "SEQNAME")) %>%
  set_names(c("ID", "Chromosome"))
head(gene_annot)

rowData(sce) <- merge(rowData(sce), gene_annot, by = "ID", sort = FALSE)
Error in .local(x, ..., value = value) :
  26554 rows in value to replace 26664 rows
## gene_annot has fewer GENEIDs than there are genes in the sce object
rownames(rowData(sce)) <- rowData(sce)$ID
I have tried using by.x and all.x, but I'm not sure it's correct, as I don't know why I have fewer gene IDs:
by.x, all.x= T
#and
by.x,
Thank you!
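One way to avoid the row-count mismatch altogether is to look the annotation up with match() instead of merge(), so the result always has one entry per gene, in the original order. This is a minimal sketch under stated assumptions: row_data and gene_annot below are small hypothetical data frames standing in for rowData(sce) and the select() result, and the gene IDs are made up.

```r
# Hypothetical stand-ins: `row_data` plays the role of rowData(sce),
# `gene_annot` the role of the select() output (fewer rows, different order).
row_data <- data.frame(ID = c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
                       Symbol = c("A", "B", "C", "D"),
                       stringsAsFactors = FALSE)
gene_annot <- data.frame(ID = c("ENSG03", "ENSG01", "ENSG04"),
                         Chromosome = c("2", "1", "X"),
                         stringsAsFactors = FALSE)

# Position of each row_data ID in gene_annot (NA where there is no annotation)
idx <- match(row_data$ID, gene_annot$ID)

# Add the column in place: row count and row order stay unchanged
row_data$Chromosome <- gene_annot$Chromosome[idx]

# IDs that could not be annotated, for closer inspection
missing_ids <- row_data$ID[is.na(idx)]
```

Unannotated genes end up with NA in the Chromosome column rather than being dropped, and missing_ids shows exactly which ones they are.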
I have retrieved the chromosomes but still have the same problem: the number of Ensembl IDs returned does not match the number of gene IDs in my object.
What do you get from
You might have mismatched versions.
Or it might be PAR genes.
I know that Ensembl hard masks the Y PAR region in their genome FASTA files, and maybe they do the same for the genes that occur in those regions? UCSC doesn't appear to.
It gives
I have found that Cell Ranger used GRCh38-2020-A for mapping, so I am using Ensembl 101 now. I have 30 GENEIDs that don't match. Maybe they are the Y PAR genes, but I wonder why more people don't have the same issue. I am not sure how to get around it, or whether it will affect the downstream analysis.
I think people do have the same issue. I often get data from the sequencing core that has already been aligned to whatever version of the Ensembl genome they are using, and I then have to iterate through the AnnotationHub, testing for consistency in available Ensembl Gene IDs from a given Ensembl version and the data I have in hand until I can find the version that matches.
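The version hunt described above can be sketched roughly as follows. This is only an illustration: count_unmatched() is a hypothetical helper, and ids_by_release is a made-up named list of gene ID vectors, one per Ensembl release. In practice each element could be filled from an EnsDb fetched through AnnotationHub, for example with keys(edb, keytype = "GENEID").

```r
# Hypothetical helper: for each release, count how many of `ids` are absent
count_unmatched <- function(ids, ids_by_release) {
  vapply(ids_by_release, function(ref) sum(!ids %in% ref), integer(1))
}

# Toy data standing in for the IDs in hand and the IDs of each release
my_ids <- c("ENSG01", "ENSG02", "ENSG03")
ids_by_release <- list(
  "101" = c("ENSG01", "ENSG02", "ENSG03", "ENSG09"),
  "107" = c("ENSG01", "ENSG03")
)

unmatched <- count_unmatched(my_ids, ids_by_release)

# The release with the fewest unmatched IDs is the most likely match
best_release <- names(unmatched)[which.min(unmatched)]
```

A release with zero unmatched IDs is the best candidate; a small nonzero remainder for every release suggests the reference was built from something else (for example a GENCODE GTF).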
Thank you!
I have been looking further into it, and it looks like it's due to PAR, as Cell Ranger uses GENCODE v32. The IDs are versioned, so I cannot map them directly using AnnotationHub; I will try using UCSC.
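If the IDs carry GENCODE-style suffixes, one possible workaround is to strip them before the lookup. The sketch below assumes IDs of the form "ENSG00000182378.14" or "ENSG00000182378.14_PAR_Y"; the gencode_ids vector is only an example, so check what your own IDs actually look like first.

```r
# Example IDs in the assumed GENCODE style: versioned, versioned + PAR_Y, plain
gencode_ids <- c("ENSG00000223972.5",
                 "ENSG00000182378.14_PAR_Y",
                 "ENSG00000141510")

# Flag the Y-chromosome PAR copies so they can be handled separately
is_par_y <- grepl("_PAR_Y$", gencode_ids)

# Drop the ".version" suffix (and a trailing "_PAR_Y", if present)
stable_ids <- sub("\\.[0-9]+(_PAR_Y)?$", "", gencode_ids)
```

Note that after stripping, a PAR_Y entry has the same stable ID as its X-chromosome counterpart, so duplicated IDs may need to be resolved before they are used as keys.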
Hi, I have exactly the same problem here. May I ask how you solved this in the end? I have around 200 genes that cannot be annotated by ens.hs.107, and I downloaded the reference (human (GRCh38)) from Cell Ranger directly. Thank you!