I am trying to get gene information for the MCA genes.
I used:
library(biomaRt)
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mca_filter <- mca@var.genes
attr <- c("ensembl_gene_id", "hgnc_symbol","chromosome_name",'entrezgene', "start_position", "end_position")
Info <- getBM(attributes = attr,
filters = "hgnc_symbol",
values = mca_filter,
mart = human)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
The problem is that I cannot get information for multiple genes at once using biomaRt (querying a single gene works fine).
For example, 'mca_filter' contains the "Selenop" gene:
> which(mca_filter == "Selenop")
[1] 400
and I can also get its gene information using this code:
Info <- getBM(attributes = attr,
filters = "hgnc_symbol",
values = "Selenop",
mart = human)
which gives this result:
> Info
ensembl_gene_id hgnc_symbol chromosome_name entrezgene start_position end_position
1 ENSG00000250722 SELENOP 5 6414 42799880 42887392
However, if I pass mca_filter instead of a single gene:
Info <- getBM(attributes = attr,
filters = "hgnc_symbol",
values = mca_filter,
mart = human)
I cannot find the information for many of the individual genes:
> which(Info$hgnc_symbol == "Selenop")
integer(0)
Do you know why? Please let me know. Thank you!
Before digging any deeper, can you check that this isn't due to case-sensitive matching? The command
which(Info$hgnc_symbol == "Selenop")
will only match entries that look like Selenop, but your query returns SELENOP. Your first example will fail this too. You can use a function like grep to perform a case-insensitive search.
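A minimal sketch of that check, assuming a one-row stand-in for Info built from the output posted above (the real object would come from getBM). The names exact_hits, grep_hits and upper_hits are only illustrative:

```r
# Stand-in for the getBM() result; values copied from the output in the question
Info <- data.frame(
  ensembl_gene_id = "ENSG00000250722",
  hgnc_symbol = "SELENOP",
  stringsAsFactors = FALSE
)

# Exact comparison is case sensitive, so "Selenop" does not match "SELENOP"
exact_hits <- which(Info$hgnc_symbol == "Selenop")

# grep() with ignore.case = TRUE; the ^ and $ anchors avoid partial matches
grep_hits <- grep("^Selenop$", Info$hgnc_symbol, ignore.case = TRUE)

# Alternative: normalise the case on both sides before comparing
upper_hits <- which(toupper(Info$hgnc_symbol) == toupper("Selenop"))
```

Either of the last two forms should locate the row regardless of how the symbol is capitalised in the query or in the result.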
If this doesn't resolve the issue then please include the output of
is(mca_filter)
and
head(mca_filter)
so we can see examples of what values are present.
Thanks for the comment, Mike. I am afraid that it is not a case-sensitive matching problem.
mca_filter contains the "Selenop" gene, so I tried both values = mca_filter and values = c("Selenop").
But only values = c("Selenop") gives the correct result.
=================================================================
> Info <- getBM(attributes = attr,
+ filters = "hgnc_symbol",
+ values = c("Selenop", "CDC6"),
+ mart = human)
> Info
ensembl_gene_id hgnc_symbol chromosome_name entrezgene start_position end_position
1 ENSG00000094804 CDC6 17 990 40287633 40304657
2 ENSG00000250722 SELENOP 5 6414 42799880 42887392
This also works for me, but when I pass mca_filter rather than one or a few gene names, only one gene from the result matches my list.
> intersect(Info$hgnc_symbol, mca_filter)
[1] "H19"
This means only the "H19" gene is found in both lists.
> length(Info$hgnc_symbol)
[1] 697
> length(mca_filter)
[1] 1000
The lengths above are the number of genes in each list.
==========================================================
I will also give you the information that you asked for.
> is(mca_filter)
[1] "character" "vector" "data.frameRowLabels" "SuperClassMethod" "index" "atomicVector" "kfunction"
[8] "EnumerationValue" "characterORconnection" "characterORMIAME" "character_OR_NULL" "atomic" "listI" "output"
[15] "vector_OR_factor"
> head(mca_filter, 10)
[1] "Spink1" "Gast" "Sbp" "Wap" "Csn1s2a" "Ins2" "Igha" "Igkc" "Sftpc" "Scgb1a1"
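One possible reading of the intersect() result above, offered as an assumption rather than a diagnosis: the symbols in mca_filter are capitalised like "Spink1" and "Selenop", while the hgnc_symbol column comes back in upper case (as with SELENOP), so an exact comparison would only keep names that are already all upper case, such as "H19". The sketch below uses small hypothetical stand-ins (mca_subset, returned_symbols) rather than the real query result:

```r
# Hypothetical stand-ins: a few symbols in the style of mca_filter,
# and upper-case symbols in the style of Info$hgnc_symbol
mca_subset <- c("Selenop", "H19", "Spink1")
returned_symbols <- c("SELENOP", "H19")

# Case-sensitive overlap: only names with identical capitalisation survive
exact_overlap <- intersect(returned_symbols, mca_subset)

# Case-insensitive overlap: compare upper-cased copies of both vectors
caseless_overlap <- intersect(toupper(returned_symbols), toupper(mca_subset))

# Input symbols with no case-insensitive match in the result
unmatched <- mca_subset[!toupper(mca_subset) %in% toupper(returned_symbols)]
```

Running the same three comparisons on the real Info$hgnc_symbol and mca_filter would show how many of the 697 returned rows correspond to input symbols once capitalisation is ignored.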
=============================================================================
For the pbmc data from Seurat this worked perfectly fine, but with the mca data from Seurat (https://satijalab.org/seurat/mca_loom.html) it doesn't work. I took mca_filter from hv.genes as defined on that page.
Hence,
mca_filter <- hv.genes
================================================================
Thank you so much again.