STRINGdb server response time for mapping
@39f0103f

Hello, I have noticed that mapping 15-20 genes/proteins with STRINGdb takes 40-70 seconds. Is there a way to speed up the process?

# Initialize the STRINGdb object
string_db <- STRINGdb$new(version = "12.0", species = OMcode, score_threshold = score_threshold, network_type = network_type, input_directory = "")

# Map genes in the merged_tab_reaz dataframe to STRING IDs
my_df <- string_db$map(reazionifinale, "GENE", removeUnmappedRows = TRUE)

platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 3.1
year 2023
month 06
day 16
svn rev 84548
language R
version.string R version 4.3.1 (2023-06-16 ucrt)
nickname Beagle Scouts


It should not take that long.

That said, the method loads all aliases every time you call it, which takes 5 to 10s, so it is not particularly efficient at mapping small protein sets repeatedly.
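Because every call to the mapping method pays the alias-loading cost, one way to limit the overhead is to map several small gene sets in a single call and split the result afterwards. The following is only a sketch of that idea: `map_sets_once` and its `map_fn` argument are hypothetical helpers written for illustration, not part of STRINGdb.

```r
# Hypothetical helper (not part of STRINGdb): map many small gene sets with
# one call to the expensive mapping function, then split the result per set.
map_sets_once <- function(gene_sets, map_fn, id_col = "gene_name") {
  # Collect every distinct identifier across all sets.
  all_ids <- unique(unlist(gene_sets, use.names = FALSE))
  all_df <- data.frame(all_ids, stringsAsFactors = FALSE)
  names(all_df) <- id_col

  # Single expensive call: the aliases are loaded only once.
  mapped <- map_fn(all_df)

  # Give each set the mapped rows that belong to it.
  lapply(gene_sets, function(ids) {
    mapped[mapped[[id_col]] %in% ids, , drop = FALSE]
  })
}
```

With a STRINGdb object at hand, `map_fn` could be something like `function(df) string_db$map(df, "gene_name", removeUnmappedRows = TRUE)`.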

40-70s is way too long though.

If it is a new species (one you are mapping for the first time), the STRINGdb package must download the alias file, which may take tens of seconds. Make sure the alias file with all the names is already downloaded by setting input_directory to a directory that does not change between runs. The second time you run the species, mapping should take less than 10s, more or less independently of your input size.
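A minimal sketch of pointing input_directory at a stable location, reusing the constructor arguments from the question. The cache location chosen through `tools::R_user_dir()` and the wrapper name `make_string_db` are assumptions made for illustration; any directory that survives between R sessions should do.

```r
# Assumed cache location: a per-user directory that persists across sessions.
# tools::R_user_dir() is available in R >= 4.0.
cache_dir <- tools::R_user_dir("STRINGdb", which = "cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical wrapper: build the STRINGdb object so that downloaded files
# are stored in (and later reused from) the persistent directory.
make_string_db <- function(input_directory = cache_dir) {
  STRINGdb::STRINGdb$new(
    version = "12.0",
    species = 9606,
    score_threshold = 600,
    network_type = "physical",
    input_directory = input_directory
  )
}
```

Calling `string_db <- make_string_db()` in each session should then trigger the download only on the first run for a given species and version.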

Hope that solves the problem.


Thank you for your prompt reply. Below is the code I am working with; I have inserted commands to measure the duration of the most time-consuming steps. Mapping took 33 seconds. I should point out that getting the interactions is also quite time-consuming:

# Start time monitoring
start_time <- Sys.time()

# Initialize the STRINGdb object
string_db <- STRINGdb$new(version = "12.0", species = 9606, score_threshold = 600, network_type = "physical", input_directory = "")

end_time1 <- Sys.time()
print(paste("Initialize the STRINGdb object:", as.numeric(difftime(end_time1, start_time, units = "secs")), "seconds"))
[1] "Initialize the STRINGdb object: 0.366113185882568 seconds"

input <- c("PLD2", "PLD1", "PLD4", "PLD3", "EPT1", "FAM73B", "FAM73A", "CEPT1")
input_df <- data.frame(gene_name = input)

# Map genes in the dataframe merged_tab_reaz to STRING IDs
input_mapped <- string_db$map(input_df, my_data_frame_id_col_names = c("gene_name"), removeUnmappedRows = TRUE)

end_time2 <- Sys.time()
print(paste("Map genes in the dataframe:", as.numeric(difftime(end_time2, end_time1, units = "secs")), "seconds"))
[1] "Map genes in the dataframe: 33.4210779666901 seconds"

# Display input_mapped
print(input_mapped)
  gene_name            STRING_id
1      PLD2 9606.ENSP00000263088
2      PLD1 9606.ENSP00000342793
3      PLD4 9606.ENSP00000438677
4      PLD3 9606.ENSP00000387050
5      EPT1 9606.ENSP00000260585
6    FAM73B 9606.ENSP00000351138
7    FAM73A 9606.ENSP00000393675
8     CEPT1 9606.ENSP00000441980


As you can see, I have included commands to measure the response time of the STRINGdb commands. The server takes 33 seconds to map 8 genes. The same group of genes is mapped on the STRING website in less than a second...

The question I ask is: is there any way to reduce the response time for mapping?

You can try it yourself using the code below:

# Start time monitoring
start_time <- Sys.time()

# Initialize the STRINGdb object
string_db <- STRINGdb$new(version = "12.0", species = 9606, score_threshold = 600, network_type = "physical", input_directory = "")

end_time1 <- Sys.time()
print(paste("Initialize the STRINGdb object:", as.numeric(difftime(end_time1, start_time, units = "secs")), "seconds"))

input <- c("PLD2", "PLD1", "PLD4", "PLD3", "EPT1", "FAM73B", "FAM73A", "CEPT1")
input_df <- data.frame(gene_name = input)

# Map genes in the dataframe merged_tab_reaz to STRING IDs
input_mapped <- string_db$map(input_df, my_data_frame_id_col_names = c("gene_name"), removeUnmappedRows = TRUE)

end_time2 <- Sys.time()
print(paste("Map genes in the dataframe:", as.numeric(difftime(end_time2, end_time1, units = "secs")), "seconds"))

# Display input_mapped
print(input_mapped)

By the way, the mapping in the STRINGdb Bioconductor package is done locally. The R package does not communicate with the STRING server to map the proteins, so you should be able to run it without an internet connection.

Again, make sure it does not redownload the mapping data every time you run this script. I cannot tell this from the provided code.
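One way to check this is to look at what the package left in input_directory after a run. Below is a small base R sketch; `list_cached_files` is a hypothetical helper, and the file names you will see depend on the STRINGdb version and species. It also rests on an assumption worth verifying for your installed version: that an empty input_directory makes the package fall back to R's temporary directory, which is discarded when the session ends.

```r
# Hypothetical helper: summarise the files found in a cache directory so you
# can see whether previously downloaded data is still there between runs.
list_cached_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  data.frame(
    file = basename(files),
    size_mb = round(info$size / 1024^2, 2),
    modified = info$mtime,
    stringsAsFactors = FALSE
  )
}
```

If `list_cached_files("path/to/your/input_directory")` returns the same files with unchanged modification times across sessions, the data is being reused rather than downloaded again.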
