How to loop over two files ...and calculate R^2 between SNPs
anamaria ▴ 10
@anamaria-21976
Last seen 4.4 years ago

Hello,

I have two files (each with 300 lines) like this:

head 1g.txt
rs6792369
rs1414517
rs16857712
rs16857703
rs12239392
...

head 1n.txt
rs1042779
rs2360630
rs10753597
rs7549096
rs2343491
...

For each pair of rs# from those two files, I can run this command in R:

library(httr)
library(jsonlite)
library(xml2)

server <- "http://rest.ensembl.org"
ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"

r <- GET(paste(server, ext, sep = ""), content_type("application/json"))

stop_for_status(r)
head(fromJSON(toJSON(content(r))))
   d_prime       r2 variation1 variation2         population_name
 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV

What I would like to do is run this command for every SNP in one list (1g.txt) against every SNP in the other list (1n.txt), where each SNP is identified by its rs#, and write every line of the result to list.txt.
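One way to enumerate the pairs described above is `expand.grid()`, which produces one row per combination of the two lists. The sketch below uses only base R and a few of the IDs shown earlier in place of the real files (with the actual files, `readLines("1g.txt")` and `readLines("1n.txt")` would supply the vectors). The names `g_snps`, `n_snps`, `pairs` and `population` are illustrative, not part of the original code.

```r
# A handful of IDs from the two files above, used here as stand-ins for
# readLines("1g.txt") and readLines("1n.txt").
g_snps <- c("rs6792369", "rs1414517", "rs16857712")
n_snps <- c("rs1042779", "rs2360630")

# One row per combination of a SNP from each list.
pairs <- expand.grid(variation1 = g_snps, variation2 = n_snps,
                     stringsAsFactors = FALSE)

# Build the request URL for each pair, following the pattern in the question.
server <- "http://rest.ensembl.org"
population <- "1000GENOMES:phase_3:KHV"
pairs$url <- sprintf("%s/ld/human/pairwise/%s/%s?population_name=%s",
                     server, pairs$variation1, pairs$variation2, population)

nrow(pairs)    # 3 * 2 = 6 pairs here; two 300-line files give 300 * 300
pairs$url[1]   # the same URL as the single request in the question
```

Each row of `pairs` then corresponds to one GET request of the kind shown in the question.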

The process is illustrated in the attachment. https://imgur.com/a/adpCskU

loops R • 955 views
@james-w-macdonald-5106
Last seen 22 minutes ago
United States

This isn't a Bioconductor question, since you aren't actually using any Bioconductor packages. For random questions about R you could try r-help@r-project.org. Or for random bioinformatics questions you can try biostars.org.

Thanks! I was hoping that someone who uses Bioconductor packages had encountered the same problem, or that someone knows of a Bioconductor package that does this.

All you are doing is repeated GET requests using the Ensembl API. I suppose somebody might have coded that up in a package, but I have no idea why one might do such a thing.

Do note that you are contemplating somewhere around 90,000 GET requests (300 * 300), which is an exceedingly large number, and which, if you don't space the requests out in time, will almost surely get your IP address banned by somebody at Ensembl. Put a different way, there has to be a way to get these data that doesn't involve something as inefficient as what you propose.

Which is why I suggested Biostars.org, which is a better venue for questions like this. I would be surprised if Kevin Blighe or ATpoint haven't already answered something very similar over there.
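For anyone who does go the one-request-per-pair route despite the caveats above, a minimal sketch of a spaced-out loop might look like the following. It assumes the httr and jsonlite packages are installed; `fetch_ld()`, `collect_ld()` and the `pause` argument are hypothetical names introduced here, and the one-second default is only a placeholder, not Ensembl's documented limit, which should be checked before running anything at this scale.

```r
# Fetch LD for one pair. Returns a data frame, or NULL if the request
# fails or the server returns an empty result for the pair.
fetch_ld <- function(v1, v2, population = "1000GENOMES:phase_3:KHV",
                     server = "http://rest.ensembl.org") {
  url <- sprintf("%s/ld/human/pairwise/%s/%s?population_name=%s",
                 server, v1, v2, population)
  r <- httr::GET(url, httr::content_type("application/json"))
  if (httr::http_error(r)) return(NULL)
  res <- jsonlite::fromJSON(httr::content(r, as = "text", encoding = "UTF-8"))
  if (length(res) == 0) NULL else as.data.frame(res)
}

# Loop over every pair, sleeping `pause` seconds after each request.
# `fetch` is an argument so the loop can be tried with a stand-in
# function before any request is sent to the server.
collect_ld <- function(g_snps, n_snps, fetch = fetch_ld, pause = 1) {
  results <- list()
  for (g in g_snps) {
    for (n in n_snps) {
      res <- tryCatch(fetch(g, n), error = function(e) NULL)
      if (!is.null(res)) results[[length(results) + 1]] <- res
      Sys.sleep(pause)
    }
  }
  do.call(rbind, results)  # NULL if no pair returned a result
}

# Intended use with the files from the question (not run here):
# ld <- collect_ld(readLines("1g.txt"), readLines("1n.txt"))
# write.table(ld, "list.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

With a one-second pause, 90,000 requests spend about 25 hours in `Sys.sleep()` alone, which is a fair illustration of why a per-pair loop is a poor fit for lists of this size.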

anamaria, have you not contacted Ensembl directly about this? They have a great support team.

Yes, I do understand this is a terrible solution. I will ask at Biostars.org.
