First, let's address why you're seeing the error. You're getting the "Timeout was reached" error because BioMart limits queries to 5 minutes of run time, and the more data you ask for, the longer a query takes. In your case, attributes$name actually contains 11 attributes. Combine that with ~25,000 genes and you're asking for a lot of data. BioMart isn't designed as a bulk data provider, so it times out. You can improve the chances of your query completing by reducing either the number of attributes or the number of genes. You can also submit multiple smaller queries and then stitch the results back together.
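As a rough illustration of the "multiple smaller queries" idea, here is a minimal sketch. The helper name query_in_batches, its arguments, and the batch size of 500 are my own assumptions for illustration; they are not part of biomaRt, and you may need a different batch size for your query.

```r
# Hypothetical helper (not part of biomaRt): split a vector of IDs into
# batches, run one query per batch, and row-bind the results.
query_in_batches <- function(ids, query_fun, batch_size = 500) {
  # Assign each ID to a batch number: 1, 1, ..., 2, 2, ...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  # query_fun must take a vector of IDs and return a data.frame
  results <- lapply(batches, query_fun)
  # Stitch the per-batch data.frames back together
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}

# Stand-in query so the sketch runs without a network connection
fake_query <- function(batch) {
  data.frame(ensembl_gene_id = batch, stringsAsFactors = FALSE)
}
demo <- query_in_batches(sprintf("GENE%02d", 1:7), fake_query, batch_size = 3)
nrow(demo)
```

With biomaRt you would pass a function that wraps getBM() instead of fake_query, for example function(batch) getBM(filters = "ensembl_gene_id", values = batch, attributes = ..., mart = mouse), where the attributes are whichever ones you need.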
Regarding finding the homologs, I think what you've already tried looks like a reasonable strategy. However, we can actually combine your two queries into one. We'll use the with_mmusculus_paralog filter to restrict our results to only those genes that have paralogs, and then ask for the gene ID and the paralog information in the attributes argument. Here's an example:
library(biomaRt)

# Connect to the Ensembl mouse gene dataset
mouse <- useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")

# Keep only genes that have a paralog, and return each gene/paralog pair
res <- getBM(filters = "with_mmusculus_paralog",
             values = TRUE,
             attributes = c("ensembl_gene_id",
                            "mmusculus_paralog_ensembl_gene",
                            "mmusculus_paralog_orthology_type",
                            "mmusculus_paralog_perc_id"),
             mart = mouse)
dim(res)
#> [1] 2395592       4
head(res, n = 8)
#>      ensembl_gene_id mmusculus_paralog_ensembl_gene
#> 1 ENSMUSG00000064345             ENSMUSG00000064367
#> 2 ENSMUSG00000064345             ENSMUSG00000064363
#> 3 ENSMUSG00000064363             ENSMUSG00000064345
#> 4 ENSMUSG00000064363             ENSMUSG00000064367
#> 5 ENSMUSG00000064367             ENSMUSG00000064345
#> 6 ENSMUSG00000064367             ENSMUSG00000064363
#> 7 ENSMUSG00002074970             ENSMUSG00002075052
#> 8 ENSMUSG00002074970             ENSMUSG00002076483
#>   mmusculus_paralog_orthology_type mmusculus_paralog_perc_id
#> 1                    other_paralog                   20.5797
#> 2                    other_paralog                   18.5507
#> 3                    other_paralog                   13.9434
#> 4                    other_paralog                   16.9935
#> 5                    other_paralog                   11.6969
#> 6                    other_paralog                   12.8501
#> 7           within_species_paralog                   79.6040
#> 8           within_species_paralog                   75.6436
I've selected three of the paralog-related attributes; you can pick different ones if they're more appropriate for whatever you're trying to do. Note that the mmusculus_paralog_orthology_type column distinguishes between paralogs that appear only in mouse and those that are also homologous across species (that's the "other_paralog" type). There's also a lot of duplication in here, because for every set of homologs all possible pairings are listed. You can see that in the first 6 lines of this output.
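If the duplication gets in the way, one possible approach is to keep each unordered gene pair only once, and optionally subset by paralog type. The sketch below uses a toy data frame copied from the eight rows shown above rather than the full query result. The variable names key, unique_pairs and within_species are my own, and this is just one way to do it. Be aware that the percent identity differs depending on the direction of the pair (e.g. 20.5797 vs 11.6969), so dropping the reciprocal row discards that second value.

```r
# Toy data copied from the head(res, n = 8) output above
res <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000064345", "ENSMUSG00000064345",
                      "ENSMUSG00000064363", "ENSMUSG00000064363",
                      "ENSMUSG00000064367", "ENSMUSG00000064367",
                      "ENSMUSG00002074970", "ENSMUSG00002074970"),
  mmusculus_paralog_ensembl_gene = c("ENSMUSG00000064367", "ENSMUSG00000064363",
                                     "ENSMUSG00000064345", "ENSMUSG00000064367",
                                     "ENSMUSG00000064345", "ENSMUSG00000064363",
                                     "ENSMUSG00002075052", "ENSMUSG00002076483"),
  mmusculus_paralog_orthology_type = c(rep("other_paralog", 6),
                                       rep("within_species_paralog", 2)),
  mmusculus_paralog_perc_id = c(20.5797, 18.5507, 13.9434, 16.9935,
                                11.6969, 12.8501, 79.6040, 75.6436),
  stringsAsFactors = FALSE
)

# Build an order-independent key so A/B and B/A are treated as the same pair
key <- paste(pmin(res$ensembl_gene_id, res$mmusculus_paralog_ensembl_gene),
             pmax(res$ensembl_gene_id, res$mmusculus_paralog_ensembl_gene))

# Keep only the first occurrence of each unordered pair
unique_pairs <- res[!duplicated(key), ]

# Optionally keep only the paralogs flagged as within-species
within_species <- res[res$mmusculus_paralog_orthology_type ==
                        "within_species_paralog", ]
```

The same two steps should work unchanged on the full res returned by getBM(), since they only rely on the column names requested in the query.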
@James W. MacDonald Thanks for the reply. The approach above will give orthologous gene info, but I want "Paralog genes" within species.
This user recently posted here: Using R BioMart getting orthologus genes between two species