Hello,
I have two files (each has 300 lines)like this:
head 1g.txt
rs6792369
rs1414517
rs16857712
rs16857703
rs12239392
...
head 1n.txt
rs1042779
rs2360630
rs10753597
rs7549096
rs2343491
...
For each pair of rs# from those two files I can run this command in R
library(httr)
library(jsonlite)
library(xml2)
server <- "http://rest.ensembl.org"
ext <-
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))
stop_for_status(r)
head(fromJSON(toJSON(content(r))))
d_prime r2 variation1 variation2 population_name
1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV
What I would like to do is to do is to run this command for every SNP
in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
is rs# and output every line of result in list.txt
The process is illustrated in the attachment.
Please help,
Ana
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lists.pdf
Type: application/pdf
Size: 27519 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20200619/45aca40e/attachment.pdf>
cpoiw@rt m@iii@g oii chemo@org@uk
2020-Jun-19 20:28 UTC
[R] How to loop over two files ...
so (untested) if you did something like
f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")
for ( a in as.list(f1) ) {
for ( b in as.list(f2) ) {
ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")
r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))
# You presumably need to do something with 'r' at the
moment its over written by the next loop.. were
# you appending it to list.txt? Possibly its just a bit
of the R output you want.?
write(r,file="list.txt",append=TRUE)
}
}
Are we doing your PhD for you ;-) Do we get to share ;-)
On 2020-06-19 20:34, Ana Marija wrote:> Hello,
>
> I have two files (each has 300 lines)like this:
>
> head 1g.txt
> rs6792369
> rs1414517
> rs16857712
> rs16857703
> rs12239392
> ...
>
> head 1n.txt
> rs1042779
> rs2360630
> rs10753597
> rs7549096
> rs2343491
> ...
>
> For each pair of rs# from those two files I can run this command in R
>
> library(httr)
> library(jsonlite)
> library(xml2)
>
> server <- "http://rest.ensembl.org"
> ext <-
>
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> stop_for_status(r)
> head(fromJSON(toJSON(content(r))))
> d_prime r2 variation1 variation2 population_name
> 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV
>
> What I would like to do is to do is to run this command for every SNP
> in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
> is rs# and output every line of result in list.txt
>
> The process is illustrated in the attachment.
>
> Please help,
> Ana
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On 2020-06-19 14:34 -0500, Ana Marija wrote:> > server <- "http://rest.ensembl.org" > ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > r <- GET(paste(server, ext, sep = ""), content_type("application/json")) > > stop_for_status(r) > head(fromJSON(toJSON(content(r)))) > d_prime r2 variation1 variation2 population_name > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > What I would like to do is to do is to run this command for every SNP > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > is rs# and output every line of result in list.txtDear Ana, I tried, but for some reason I get only a response for the first URL you supplied. I wrote this: files <- c("1g.txt", "1n.txt") files <- lapply(files, readLines) server <- "http://rest.ensembl.org" population.name <- "1000GENOMES:phase_3:KHV" ext <- apply(expand.grid(files), 1, function(x) { return(paste0(server, "/ld/human/pairwise/", x[1], "/", x[2], "?population_name=", population.name)) }) # r <- lapply(ext, function(x) { # httr::GET(x, httr::content_type("application/json")) # }) # names(r) <- ext # file <- paste0(population.name, ".rds") # saveRDS(object=r, compress="xz", file=file) r <- readRDS(paste0(population.name, ".rds")) lapply(r[1:4], function(x) { jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) }) Which if you are able to run it (saving the output in that rds file), yields this: $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV` variation2 population_name d_prime r2 variation1 1 rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626 rs6792369 $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV` list() For some reason, only the first url works ... I am a bit unfamiliar working with REST API's. Or web scraping in general. Daniel Cegie?ka knows something in this thread some days ago, where it might be similar to the API of borsaitaliana.it, where you can supply headers with curl like he quickly did [2]. You might be able to supply the list of SNPs in a header to Ensemble in httr::GET somehow if you read some docs on their API? Best, Rasmus [1] https://marc.info/?t=159249246100002&r=1&w=2 [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200619/65f5d896/attachment.sig>
Hi,
thanks for getting back to me, it is just for my job :)
so I tried it:
library(httr)
library(jsonlite)
library(xml2)
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"),
"R", "lib")))
sparkR.session(master = "local[*]", sparkConfig
list(spark.driver.memory = "2g"))
server <- "http://rest.ensembl.org"
f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")
for ( a in as.list(f1) ) {
for ( b in as.list(f2) ) {
ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")
r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))
write(r,file="list.txt",append=TRUE)
}
}
and I got this error:
Error in as.list.default(f1) :
no method for coercing this S4 class to a vector
Please advise
On Fri, Jun 19, 2020 at 3:28 PM <cpolwart at chemo.org.uk>
wrote:>
> so (untested) if you did something like
>
> f1 <- read.text("1g.txt")
> f2 <- read.text("1n.txt")
>
> for ( a in as.list(f1) ) {
>
> for ( b in as.list(f2) ) {
>
> ext <- paste0( "/ld/human/pairwise/",
> a,
> "/",
> b,
> "?population_name=1000GENOMES:phase_3:KHV")
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> # You presumably need to do something with 'r' at
the
> moment its over written by the next loop.. were
> # you appending it to list.txt? Possibly its just a bit
> of the R output you want.?
>
> write(r,file="list.txt",append=TRUE)
>
>
> }
>
> }
>
>
> Are we doing your PhD for you ;-) Do we get to share ;-)
>
>
> On 2020-06-19 20:34, Ana Marija wrote:
> > Hello,
> >
> > I have two files (each has 300 lines)like this:
> >
> > head 1g.txt
> > rs6792369
> > rs1414517
> > rs16857712
> > rs16857703
> > rs12239392
> > ...
> >
> > head 1n.txt
> > rs1042779
> > rs2360630
> > rs10753597
> > rs7549096
> > rs2343491
> > ...
> >
> > For each pair of rs# from those two files I can run this command in R
> >
> > library(httr)
> > library(jsonlite)
> > library(xml2)
> >
> > server <- "http://rest.ensembl.org"
> > ext <-
> >
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""),
> > content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r))))
> > d_prime r2 variation1 variation2 population_name
> > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is to do is to run this command for every SNP
> > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
> > is rs# and output every line of result in list.txt
> >
> > The process is illustrated in the attachment.
> >
> > Please help,
> > Ana
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
HI Rasmus,
I tried it:
library(base)
files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
return(paste0(server, "/ld/human/pairwise/",
x[1], "/", x[2],
"?population_name=", population.name))
})
r <- readRDS(paste0(population.name, ".rds"))
lapply(r[1:4], function(x) {
jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
})
and I got this error:> r <- readRDS(paste0(population.name, ".rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable
reason 'No such file or directory'> lapply(r[1:4], function(x) {
+ jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
+ })
Error in lapply(r[1:4], function(x) { : object 'r' not found
Am I am doing here something wrong?
Do I need any other libraries loaded?
Thanks
Ana
On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland <jral at posteo.no>
wrote:>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > server <- "http://rest.ensembl.org"
> > ext <-
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r))))
> > d_prime r2 variation1 variation2 population_name
> > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is to do is to run this command for every SNP
> > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
> > is rs# and output every line of result in list.txt
>
> Dear Ana,
>
> I tried, but for some reason I get only a
> response for the first URL you supplied.
>
> I wrote this:
>
> files <- c("1g.txt", "1n.txt")
> files <- lapply(files, readLines)
> server <- "http://rest.ensembl.org"
> population.name <- "1000GENOMES:phase_3:KHV"
> ext <- apply(expand.grid(files), 1, function(x) {
> return(paste0(server, "/ld/human/pairwise/",
> x[1], "/", x[2],
> "?population_name=", population.name))
> })
>
> # r <- lapply(ext, function(x) {
> # httr::GET(x, httr::content_type("application/json"))
> # })
> # names(r) <- ext
> # file <- paste0(population.name, ".rds")
> # saveRDS(object=r, compress="xz", file=file)
>
> r <- readRDS(paste0(population.name, ".rds"))
> lapply(r[1:4], function(x) {
> jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
> })
>
>
> Which if you are able to run it (saving the
> output in that rds file), yields this:
>
>
$`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> variation2 population_name d_prime r2 variation1
> 1 rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626 rs6792369
>
>
$`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
>
$`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
>
$`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> For some reason, only the first url works ...
>
> I am a bit unfamiliar working with REST
> API's. Or web scraping in general. Daniel
> Cegie?ka knows something in this thread some
> days ago, where it might be similar to the
> API of borsaitaliana.it, where you can supply
> headers with curl like he quickly did [2].
>
> You might be able to supply the list of SNPs
> in a header to Ensemble in httr::GET somehow
> if you read some docs on their API?
>
> Best,
> Rasmus
>
> [1] https://marc.info/?t=159249246100002&r=1&w=2
> [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2
On 2020-06-19 14:34 -0500, Ana Marija wrote:> > I have two files (each has 300 lines)like this:The example looks quite similar to the R example in https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra ... The question becomes: how did you query the 600 variant names in 1g.txt and 1n.txt? curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 'Content-type:application/json' shows the 26 population_names for the rs6792369/rs1042779 combination ... -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200620/07eaf84e/attachment.sig>
Hi Rasmus, I got those SNPs from two GWAS-es which I run with different phenotypes and I would like to compare weather the top SNPs in both of them are in LD. So 1n.txt and 1g.txt are just top SNPs from those two GWAS-es. Unfortunately https://ldlink.nci.nih.gov/?tab=ldpair works for only two SNPs at the time and I need to do that for 300 pairs On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland <jral at posteo.no> wrote:> > On 2020-06-19 14:34 -0500, Ana Marija wrote: > > > > I have two files (each has 300 lines)like this: > > The example looks quite similar to the R example in > https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra > ... > > The question becomes: how did you query the > 600 variant names in 1g.txt and 1n.txt? > > curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 'Content-type:application/json' > > shows the 26 population_names for the > rs6792369/rs1042779 combination ...