Hello, I have two files (each has 300 lines)like this: head 1g.txt rs6792369 rs1414517 rs16857712 rs16857703 rs12239392 ... head 1n.txt rs1042779 rs2360630 rs10753597 rs7549096 rs2343491 ... For each pair of rs# from those two files I can run this command in R library(httr) library(jsonlite) library(xml2) server <- "http://rest.ensembl.org" ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" r <- GET(paste(server, ext, sep = ""), content_type("application/json")) stop_for_status(r) head(fromJSON(toJSON(content(r)))) d_prime r2 variation1 variation2 population_name 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV What I would like to do is to do is to run this command for every SNP in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# is rs# and output every line of result in list.txt The process is illustrated in the attachment. Please help, Ana -------------- next part -------------- A non-text attachment was scrubbed... Name: lists.pdf Type: application/pdf Size: 27519 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200619/45aca40e/attachment.pdf>
cpoiw@rt m@iii@g oii chemo@org@uk
2020-Jun-19 20:28 UTC
[R] How to loop over two files ...
so (untested) if you did something like f1 <- read.text("1g.txt") f2 <- read.text("1n.txt") for ( a in as.list(f1) ) { for ( b in as.list(f2) ) { ext <- paste0( "/ld/human/pairwise/", a, "/", b, "?population_name=1000GENOMES:phase_3:KHV") r <- GET(paste(server, ext, sep = ""), content_type("application/json")) # You presumably need to do something with 'r' at the moment its over written by the next loop.. were # you appending it to list.txt? Possibly its just a bit of the R output you want.? write(r,file="list.txt",append=TRUE) } } Are we doing your PhD for you ;-) Do we get to share ;-) On 2020-06-19 20:34, Ana Marija wrote:> Hello, > > I have two files (each has 300 lines)like this: > > head 1g.txt > rs6792369 > rs1414517 > rs16857712 > rs16857703 > rs12239392 > ... > > head 1n.txt > rs1042779 > rs2360630 > rs10753597 > rs7549096 > rs2343491 > ... > > For each pair of rs# from those two files I can run this command in R > > library(httr) > library(jsonlite) > library(xml2) > > server <- "http://rest.ensembl.org" > ext <- > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > r <- GET(paste(server, ext, sep = ""), > content_type("application/json")) > > stop_for_status(r) > head(fromJSON(toJSON(content(r)))) > d_prime r2 variation1 variation2 population_name > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > What I would like to do is to do is to run this command for every SNP > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > is rs# and output every line of result in list.txt > > The process is illustrated in the attachment. > > Please help, > Ana > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
On 2020-06-19 14:34 -0500, Ana Marija wrote:> > server <- "http://rest.ensembl.org" > ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > r <- GET(paste(server, ext, sep = ""), content_type("application/json")) > > stop_for_status(r) > head(fromJSON(toJSON(content(r)))) > d_prime r2 variation1 variation2 population_name > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > What I would like to do is to do is to run this command for every SNP > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > is rs# and output every line of result in list.txtDear Ana, I tried, but for some reason I get only a response for the first URL you supplied. I wrote this: files <- c("1g.txt", "1n.txt") files <- lapply(files, readLines) server <- "http://rest.ensembl.org" population.name <- "1000GENOMES:phase_3:KHV" ext <- apply(expand.grid(files), 1, function(x) { return(paste0(server, "/ld/human/pairwise/", x[1], "/", x[2], "?population_name=", population.name)) }) # r <- lapply(ext, function(x) { # httr::GET(x, httr::content_type("application/json")) # }) # names(r) <- ext # file <- paste0(population.name, ".rds") # saveRDS(object=r, compress="xz", file=file) r <- readRDS(paste0(population.name, ".rds")) lapply(r[1:4], function(x) { jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) }) Which if you are able to run it (saving the output in that rds file), yields this: $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV` variation2 population_name d_prime r2 variation1 1 rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626 rs6792369 $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() For some reason, only the first url works ... I am a bit unfamiliar working with REST API's. Or web scraping in general. Daniel Cegie?ka knows something in this thread some days ago, where it might be similar to the API of borsaitaliana.it, where you can supply headers with curl like he quickly did [2]. You might be able to supply the list of SNPs in a header to Ensemble in httr::GET somehow if you read some docs on their API? Best, Rasmus [1] https://marc.info/?t=159249246100002&r=1&w=2 [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200619/65f5d896/attachment.sig>
Hi, thanks for getting back to me, it is just for my job :) so I tried it: library(httr) library(jsonlite) library(xml2) library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))) sparkR.session(master = "local[*]", sparkConfig list(spark.driver.memory = "2g")) server <- "http://rest.ensembl.org" f1 <- read.text("1g.txt") f2 <- read.text("1n.txt") for ( a in as.list(f1) ) { for ( b in as.list(f2) ) { ext <- paste0( "/ld/human/pairwise/", a, "/", b, "?population_name=1000GENOMES:phase_3:KHV") r <- GET(paste(server, ext, sep = ""), content_type("application/json")) write(r,file="list.txt",append=TRUE) } } and I got this error: Error in as.list.default(f1) : no method for coercing this S4 class to a vector Please advise On Fri, Jun 19, 2020 at 3:28 PM <cpolwart at chemo.org.uk> wrote:> > so (untested) if you did something like > > f1 <- read.text("1g.txt") > f2 <- read.text("1n.txt") > > for ( a in as.list(f1) ) { > > for ( b in as.list(f2) ) { > > ext <- paste0( "/ld/human/pairwise/", > a, > "/", > b, > "?population_name=1000GENOMES:phase_3:KHV") > > r <- GET(paste(server, ext, sep = ""), > content_type("application/json")) > > # You presumably need to do something with 'r' at the > moment its over written by the next loop.. were > # you appending it to list.txt? Possibly its just a bit > of the R output you want.? > > write(r,file="list.txt",append=TRUE) > > > } > > } > > > Are we doing your PhD for you ;-) Do we get to share ;-) > > > On 2020-06-19 20:34, Ana Marija wrote: > > Hello, > > > > I have two files (each has 300 lines)like this: > > > > head 1g.txt > > rs6792369 > > rs1414517 > > rs16857712 > > rs16857703 > > rs12239392 > > ... > > > > head 1n.txt > > rs1042779 > > rs2360630 > > rs10753597 > > rs7549096 > > rs2343491 > > ... > > > > For each pair of rs# from those two files I can run this command in R > > > > library(httr) > > library(jsonlite) > > library(xml2) > > > > server <- "http://rest.ensembl.org" > > ext <- > > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > > > r <- GET(paste(server, ext, sep = ""), > > content_type("application/json")) > > > > stop_for_status(r) > > head(fromJSON(toJSON(content(r)))) > > d_prime r2 variation1 variation2 population_name > > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > > > What I would like to do is to do is to run this command for every SNP > > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > > is rs# and output every line of result in list.txt > > > > The process is illustrated in the attachment. > > > > Please help, > > Ana > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code.
HI Rasmus, I tried it: library(base) files <- c("1g.txt", "1n.txt") files <- lapply(files, readLines) server <- "http://rest.ensembl.org" population.name <- "1000GENOMES:phase_3:KHV" ext <- apply(expand.grid(files), 1, function(x) { return(paste0(server, "/ld/human/pairwise/", x[1], "/", x[2], "?population_name=", population.name)) }) r <- readRDS(paste0(population.name, ".rds")) lapply(r[1:4], function(x) { jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) }) and I got this error:> r <- readRDS(paste0(population.name, ".rds"))Error in gzfile(file, "rb") : cannot open the connection In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable reason 'No such file or directory'> lapply(r[1:4], function(x) {+ jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) + }) Error in lapply(r[1:4], function(x) { : object 'r' not found Am I am doing here something wrong? Do I need any other libraries loaded? Thanks Ana On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland <jral at posteo.no> wrote:> > On 2020-06-19 14:34 -0500, Ana Marija wrote: > > > > server <- "http://rest.ensembl.org" > > ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > > > r <- GET(paste(server, ext, sep = ""), content_type("application/json")) > > > > stop_for_status(r) > > head(fromJSON(toJSON(content(r)))) > > d_prime r2 variation1 variation2 population_name > > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > > > What I would like to do is to do is to run this command for every SNP > > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > > is rs# and output every line of result in list.txt > > Dear Ana, > > I tried, but for some reason I get only a > response for the first URL you supplied. > > I wrote this: > > files <- c("1g.txt", "1n.txt") > files <- lapply(files, readLines) > server <- "http://rest.ensembl.org" > population.name <- "1000GENOMES:phase_3:KHV" > ext <- apply(expand.grid(files), 1, function(x) { > return(paste0(server, "/ld/human/pairwise/", > x[1], "/", x[2], > "?population_name=", population.name)) > }) > > # r <- lapply(ext, function(x) { > # httr::GET(x, httr::content_type("application/json")) > # }) > # names(r) <- ext > # file <- paste0(population.name, ".rds") > # saveRDS(object=r, compress="xz", file=file) > > r <- readRDS(paste0(population.name, ".rds")) > lapply(r[1:4], function(x) { > jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) > }) > > > Which if you are able to run it (saving the > output in that rds file), yields this: > > $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV` > variation2 population_name d_prime r2 variation1 > 1 rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626 rs6792369 > > $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > For some reason, only the first url works ... > > I am a bit unfamiliar working with REST > API's. Or web scraping in general. Daniel > Cegie?ka knows something in this thread some > days ago, where it might be similar to the > API of borsaitaliana.it, where you can supply > headers with curl like he quickly did [2]. > > You might be able to supply the list of SNPs > in a header to Ensemble in httr::GET somehow > if you read some docs on their API? > > Best, > Rasmus > > [1] https://marc.info/?t=159249246100002&r=1&w=2 > [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2
On 2020-06-19 14:34 -0500, Ana Marija wrote:> > I have two files (each has 300 lines)like this:The example looks quite similar to the R example in https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra ... The question becomes: how did you query the 600 variant names in 1g.txt and 1n.txt? curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 'Content-type:application/json' shows the 26 population_names for the rs6792369/rs1042779 combination ... -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200620/07eaf84e/attachment.sig>
Hi Rasmus, I got those SNPs from two GWAS-es which I run with different phenotypes and I would like to compare weather the top SNPs in both of them are in LD. So 1n.txt and 1g.txt are just top SNPs from those two GWAS-es. Unfortunately https://ldlink.nci.nih.gov/?tab=ldpair works for only two SNPs at the time and I need to do that for 300 pairs On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland <jral at posteo.no> wrote:> > On 2020-06-19 14:34 -0500, Ana Marija wrote: > > > > I have two files (each has 300 lines)like this: > > The example looks quite similar to the R example in > https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra > ... > > The question becomes: how did you query the > 600 variant names in 1g.txt and 1n.txt? > > curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 'Content-type:application/json' > > shows the 26 population_names for the > rs6792369/rs1042779 combination ...