I have isolated this hang in my R code to rtracklayer. I see it on my Amazon Linux (AWS) box but not a local server. The AWS box usually, but not always, hangs on either the browserSession() call or the genome(session) call. Here is the strace output of the AWS box:
> library(rtracklayer)
(...snip library load spam...)
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6982, si_status=0, si_utime=0, si_stime=0} ---

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2016.03

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] rtracklayer_1.30.4   GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
[4] IRanges_2.4.8        S4Vectors_0.8.11     BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] XML_3.98-1.4               Rsamtools_1.22.0
 [3] Biostrings_2.38.4          GenomicAlignments_1.6.3
 [5] bitops_1.0-6               futile.options_1.0.0
 [7] zlibbioc_1.16.0            XVector_0.10.0
 [9] futile.logger_1.4.1        lambda.r_1.1.7
[11] BiocParallel_1.4.3         tools_3.2.2
[13] Biobase_2.30.0             RCurl_1.95-4.8
[15] SummarizedExperiment_1.0.2

> session = browserSession()
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.114.119.133")}, 16) = -1 EINPROGRESS (Operation now in progress)
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.114.119.133")}, [16]) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(34220), sin_addr=inet_addr("10.0.7.56")}, [16]) = 0
sendto(3, "GET /cgi-bin/hgGateway HTTP/1.1\r"..., 96, MSG_NOSIGNAL, NULL, 0) = 96
recvfrom(3, "HTTP/1.1 200 OK\r\nDate: Fri, 22 A"..., 16384, 0, NULL, NULL) = 8332
recvfrom(3, "1000\r\nenu -->\n</div><!-- end mai"..., 16384, 0, NULL, NULL) = 4104
recvfrom(3, "4000\r\nnatee</OPTION>\n<OPTION VAL"..., 16384, 0, NULL, NULL) = 16384
recvfrom(3, "ZE=-1>\r\n8e1\r\n25</FONT></TD></TR>"..., 16384, 0, NULL, NULL) = 2293

> genome(session) = "hg19"
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.114.119.132")}, 16) = -1 EINPROGRESS (Operation now in progress)
getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.114.119.132")}, [16]) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(52026), sin_addr=inet_addr("10.0.7.56")}, [16]) = 0
sendto(4, "GET /cgi-bin/hgGateway?db=hg19 H"..., 158, MSG_NOSIGNAL, NULL, 0) = 158
(...hang indefinitely...)
The strace output of the same code on my local gateway is too long to post here. It never hangs as far as I can tell. Here's the gist:
Here's the sessionInfo:
> sessionInfo()
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=25320, si_status=0, si_utime=0, si_stime=0} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=25322, si_status=0, si_utime=0, si_stime=0} ---
R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] rtracklayer_1.26.3   GenomicRanges_1.18.4 GenomeInfoDb_1.2.5
[4] IRanges_2.0.1        S4Vectors_0.4.0      BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] magrittr_1.5            XVector_0.6.0           zlibbioc_1.12.0
 [4] GenomicAlignments_1.2.2 BiocParallel_1.0.3      brew_1.0-6
 [7] foreach_1.4.3           stringr_1.0.0           sendmailR_1.2-1
[10] tools_3.2.3             fail_1.3                checkmate_1.7.4
[13] DBI_0.3.1               iterators_1.0.8         BatchJobs_1.6
[16] digest_0.6.9            base64enc_0.1-3         bitops_1.0-6
[19] codetools_0.2-14        RCurl_1.95-4.8          RSQLite_1.0.0
[22] stringi_1.0-1           BBmisc_1.9              backports_1.0.2
[25] Biostrings_2.34.1       Rsamtools_1.18.3        XML_3.98-1.4
I don't see how this could be a general networking issue with my AWS box, since I have no trouble with other internet tasks. I'm hosting a Galaxy server on the box and trying to get Galaxy to run this R script.
It appears to be related to the AWS region. The same AMI in a different region (us-east-1) works fine, while us-west-2 hangs.
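One way to check whether the hang is at the network level rather than inside rtracklayer might be to time the same kind of request directly with RCurl (already loaded via the namespace on both machines) and give it a hard timeout, so a stalled response turns into an error instead of hanging forever. This is only a rough sketch: `probe_url` is a hypothetical helper name, the 10 s and 30 s limits are arbitrary, and the hostname in the usage note is my assumption, since the strace output shows only IP addresses.

```r
# Hypothetical helper: time a single HTTP GET and report whether it completed.
# `fetch` defaults to RCurl::getURL; it is a parameter only so the helper can
# be exercised without network access.
probe_url <- function(url, timeout_s = 30,
                      fetch = function(u, ...) RCurl::getURL(u, ...)) {
  started <- Sys.time()
  result <- tryCatch(
    {
      # connecttimeout and timeout are standard curl options, in seconds;
      # timeout caps the whole transfer, so a stalled read raises an error.
      body <- fetch(url, .opts = list(connecttimeout = 10, timeout = timeout_s))
      list(ok = TRUE, detail = sprintf("%d characters received", nchar(body)))
    },
    error = function(e) list(ok = FALSE, detail = conditionMessage(e))
  )
  result$seconds <- as.numeric(difftime(Sys.time(), started, units = "secs"))
  result
}
```

Running something like probe_url("http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19") in both regions (assuming that is the host behind 128.114.119.132) should show whether us-west-2 times out where us-east-1 returns promptly. It bypasses rtracklayer entirely, so it only says something about the network path and the server, not about the package.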