thr3ads.net - R help - [R] How to loop over two files ... [Jun 2020]

Ana Marija

2020-Jun-19 21:07 UTC

[R] How to loop over two files ...

Hi Rasmus,

I tried it:

library(base)

files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
  return(paste0(server, "/ld/human/pairwise/",
    x[1], "/", x[2],
    "?population_name=", population.name))
})

r <- readRDS(paste0(population.name, ".rds"))
lapply(r[1:4], function(x) {
  jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
})

and I got this error:

> r <- readRDS(paste0(population.name, ".rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable reason 'No such file or directory'
> lapply(r[1:4], function(x) {
+   jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
+ })
Error in lapply(r[1:4], function(x) { : object 'r' not found

Am I doing something wrong here?
Do I need to load any other libraries?

Thanks
Ana

On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland <jral at posteo.no> wrote:
>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > server <- "http://rest.ensembl.org"
> > ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""), content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r))))
> >    d_prime       r2 variation1 variation2         population_name
> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is run this command for every SNP
> > in one list (1g.txt) against each SNP in another list (1n.txt), where
> > each SNP is given by its rs number, and write every line of the result
> > to list.txt
>
> Dear Ana,
>
> I tried, but for some reason I get only a
> response for the first URL you supplied.
>
> I wrote this:
>
>         files <- c("1g.txt", "1n.txt")
>         files <- lapply(files, readLines)
>         server <- "http://rest.ensembl.org"
>         population.name <- "1000GENOMES:phase_3:KHV"
>         ext <- apply(expand.grid(files), 1, function(x) {
>           return(paste0(server, "/ld/human/pairwise/",
>             x[1], "/", x[2],
>             "?population_name=", population.name))
>         })
>
>         # r <- lapply(ext, function(x) {
>         #   httr::GET(x, httr::content_type("application/json"))
>         # })
>         # names(r) <- ext
>         # file <- paste0(population.name, ".rds")
>         # saveRDS(object=r, compress="xz", file=file)
>
>         r <- readRDS(paste0(population.name, ".rds"))
>         lapply(r[1:4], function(x) {
>           jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
>         })
>
>
> Which if you are able to run it (saving the
> output in that rds file), yields this:
>
>         $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>           variation2         population_name  d_prime       r2 variation1
>         1  rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626  rs6792369
>
>         $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>         list()
>
>         $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>         list()
>
>         $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>         list()
>
> For some reason, only the first url works ...
>
> I am a bit unfamiliar with REST APIs, or
> web scraping in general.  Daniel Cegiełka
> knew something about this in a thread some
> days ago [1]; this might be similar to the
> API of borsaitaliana.it, where you can supply
> headers with curl like he quickly did [2].
>
> You might be able to supply the list of SNPs
> in a header to Ensembl in httr::GET somehow
> if you read some docs on their API?
>
> Best,
> Rasmus
>
> [1] https://marc.info/?t=159249246100002&r=1&w=2
> [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2
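
A minimal sketch of the pairing step discussed above, assuming the two SNP lists are already in memory as character vectors. The vectors g and n are illustrative stand-ins for readLines("1g.txt") and readLines("1n.txt"), filled with IDs that appear in the URLs above. It only builds the URLs; no request is sent.

```r
# Illustrative SNP IDs taken from the URLs in this thread; in practice
# these would come from readLines("1g.txt") and readLines("1n.txt").
g <- c("rs6792369", "rs1414517")
n <- c("rs1042779")

server <- "http://rest.ensembl.org"
population.name <- "1000GENOMES:phase_3:KHV"

# One row per (g, n) combination; keep the IDs as character, not factor.
pairs <- expand.grid(variation1 = g, variation2 = n,
                     stringsAsFactors = FALSE)

# paste0 is vectorised over the columns, so apply() is not needed here.
urls <- paste0(server, "/ld/human/pairwise/",
               pairs$variation1, "/", pairs$variation2,
               "?population_name=", population.name)
print(urls)
```

Because expand.grid varies its first argument fastest, the URLs come out grouped by the second list, which matches the order of the output shown above.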

Rasmus Liland

2020-Jun-19 21:31 UTC

On 2020-06-19 16:07 -0500, Ana Marija wrote:
> Hi Rasmus,
> 
> I tried it:
> 
> library(base)
> 
> > r <- readRDS(paste0(population.name, ".rds"))
> Error in gzfile(file, "rb") : cannot open the connection
> In addition: Warning message:
> In gzfile(file, "rb") :
>   cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable
> reason 'No such file or directory'

Because I rerun my script after every small 
change using the program entr [1], as opposed 
to using Emacs Speaks Statistics or RStudio, 
I find it useful to save partial outputs in 
rds files; it also makes sense not to call 
ensembl.org again and again ...

Right, so you would run the commented bit 
first, then save the output list to the rds 
file so as not to send too many requests to 
the server.  I have attached my rds here.  

	files <- c("1g.txt", "1n.txt")
	files <- lapply(files, readLines)
	server <- "http://rest.ensembl.org"
	population.name <- "1000GENOMES:phase_3:KHV"
	ext <- apply(expand.grid(files), 1, function(x) {
	  return(paste0(server, "/ld/human/pairwise/",
	    x[1], "/", x[2],
	    "?population_name=", population.name))
	})
	
	r <- lapply(ext, function(x) {
	  httr::GET(x, httr::content_type("application/json"))
	})
	names(r) <- ext
	file <- paste0(population.name, ".rds")
	
	saveRDS(object=r, compress="xz", file=file)  # <--- Then save the list here for another time!
	# r <- readRDS(paste0(population.name, ".rds"))  # Read it back like this
	
	r <-
	sapply(r, function(x) {
	  x <- jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
	  length(x)
	})
	names(r) <- NULL
	r

[1] http://eradman.com/entrproject/
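
A minimal sketch of the save-once, read-later pattern described above, using only base R. The names cache_rds and slow_step are hypothetical; slow_step stands in for the httr::GET calls. A temporary file is used here, partly because a name containing colons, such as '1000GENOMES:phase_3:KHV.rds', is not a valid file name on Windows.

```r
# Hypothetical helper: return the cached object if the rds file exists,
# otherwise compute it once, save it, and return it.
cache_rds <- function(file, compute) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  value <- compute()
  saveRDS(value, file = file, compress = "xz")
  value
}

# Counter used only to show how often the slow step actually runs.
counter <- new.env()
counter$n <- 0

# Stand-in for the slow step (the requests to ensembl.org).
slow_step <- function() {
  counter$n <- counter$n + 1
  list(a = 1, b = 2)
}

file <- tempfile(fileext = ".rds")
first  <- cache_rds(file, slow_step)  # computes and saves
second <- cache_rds(file, slow_step)  # reads the saved copy from disk
```

With this shape there is nothing to comment in and out between runs: deleting the rds file is what forces a fresh download.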


Rasmus Liland

2020-Jun-19 22:17 UTC

Dear other list readers, 

On 2020-06-19 23:31 +0200, Rasmus Liland wrote:
> I have attached my rds here.

Only Ana received this because of a Mailman 
attachment policy, which is also why my 
signature was bad ...

Best,
Rasmus
