Hi,

I'm using H2O both locally and on an Amazon EC2 instance, but I must have something misconfigured. To start it, I do the following:

    library(h2o)
    conexion <- h2o.init()

It starts the cluster with the maximum number of cores and memory allowed.

Once that's done, I want to compute the distance between two data.frames:

    uno <- data.frame(matrix(rnorm(300000), ncol = 10))  # 30,000 rows x 10 columns
    dos <- data.frame(matrix(rnorm(500), ncol = 10))     # 50 rows x 10 columns
    uno <- as.h2o(uno)
    dos <- as.h2o(dos)
    matriz <- h2o.createFrame(rows = nrow(uno), cols = nrow(dos))
    # seq_len() visits every row of dos; the original `for(i in nrow(dos))` ran only once
    for (i in seq_len(nrow(dos))) {
      matriz[, i] <- h2o.distance(uno, dos[i, ])
    }

When I run it and watch htop, only one core is busy (out of the 4 on my PC, or the 16 on the EC2 instance). What's more, it takes longer on EC2 than on my PC, so I don't think it is parallelizing properly. Has this happened to anyone?

h2o.clusterStatus() reports that everything is OK.

    R version 3.4.1 (2017-06-30) -- "Single Candle"
    Platform: x86_64-pc-linux-gnu (64-bit)
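The loop builds, one column at a time, an nrow(uno) x nrow(dos) matrix whose column i holds the distances from every row of uno to row i of dos. A minimal base-R sketch of that same matrix, computed in one vectorized step with ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, can serve as a reference for checking H2O's output on a small sample. The helper pairwise_euclidean and the names uno_df/dos_df are hypothetical, and Euclidean distance is an assumption (h2o.distance() takes a measure argument; its help page describes how each measure is defined). If I read that help page correctly, h2o.distance() computes distances between all rows of x and all rows of y, so passing the whole dos frame in a single call might make the loop unnecessary; I have not checked this on 3.10.5.3.

```r
# Euclidean distances between every row of x and every row of y, in one step,
# using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
pairwise_euclidean <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  sq <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by floating-point rounding
}

set.seed(1)
uno_df <- data.frame(matrix(rnorm(300000), ncol = 10))  # 30,000 x 10, as in the question
dos_df <- data.frame(matrix(rnorm(500), ncol = 10))     # 50 x 10

d <- pairwise_euclidean(uno_df, dos_df)  # column i = distances to dos_df[i, ]
print(dim(d))

# Spot check one entry against a direct row-by-row computation
direct <- sqrt(sum((unlist(uno_df[123, ]) - unlist(dos_df[7, ]))^2))
print(all.equal(d[123, 7], direct))
```

This version holds the whole matrix in R's memory, so it is only meant for small samples; on the real data the computation would stay in H2O.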
    [Workspace loaded from ~/.RData]
    > library(h2o)
    > h2o.init()
    H2O is not running yet, starting it now...
    Note: In case of errors look at the following log files:
        /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out
        /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err
    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
    Starting H2O JVM and connecting: .. Connection successful!
    R is connected to the H2O cluster:
        H2O cluster uptime:         1 seconds 905 milliseconds
        H2O cluster version:        3.10.5.3
        H2O cluster version age:    1 month and 20 days
        H2O cluster name:           H2O_started_from_R_jesus_rqh095
        H2O cluster total nodes:    1
        H2O cluster total memory:   1.71 GB
        H2O cluster total cores:    4
        H2O cluster allowed cores:  4
        H2O cluster healthy:        TRUE
        H2O Connection ip:          localhost
        H2O Connection port:        54321
        H2O Connection proxy:       NA
        H2O Internal Security:      FALSE
        R Version:                  R version 3.4.1 (2017-06-30)
    > gc()
              used (Mb) gc trigger (Mb) max used (Mb)
    Ncells  679385 36.3    1168576 62.5   940480 50.3
    Vcells 1138497  8.7    1920143 14.7  1532430 11.7
    > rm(list=ls())
    > datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
    > uno<-datos[datos$InspectionReport == "ACCEPTED",]
    > dos<-datos[datos$InspectionReport != "ACCEPTED",]
    > uno$InspectionReport<-NULL
    > dos$InspectionReport<-NULL
    > uno2<-as.h2o(uno)
    > dos2<-as.h2o(dos)
    > h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
    Error in !missing(row) && !(base::is.character(row)) : object 'i' not found
    > i<-1
    > h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
    > t=Sys.time()
    > for(i in 1:10){
    +   h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
    + }
    ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = localhost:54321/3/Frames/RTMP_sid_8e47_4/export)
    water.exceptions.H2OIllegalArgumentException
     [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
     [2] " water.fvec.Frame.export(Frame.java:1370)"
     [3] " water.api.FramesHandler.export(FramesHandler.java:258)"
    Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
      ERROR MESSAGE:
      Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!
    > print(Sys.time()-t)
    Time difference of 0.218374 secs
    > ?h2o.exportFile
    > t=Sys.time()
    > for(i in 1:10){
    +   h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
    + }
    > print(Sys.time()-t)
    Time difference of 11.31977 secs
    > matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
    > t=Sys.time()
    > for(i in 1:10){
    +   matriz[,j]<-h2o.distance(uno2,dos2[i,])
    + }
    Error in !allCol && is.na(col) : object 'j' not found
    > print(Sys.time()-t)
    Time difference of 0.006168127 secs
    > t=Sys.time()
    > for(i in 1:10){
    +   matriz[,i]<-h2o.distance(uno2,dos2[i,])
    + }
    > print(Sys.time()-t)
    Time difference of 30.33803 secs
    > 10/30
    [1] 0.3333333
    > 30/10*nrow(dos)
    [1] 16068
    > 30/10*nrow(dos)/3600
    [1] 4.463333
    > library(data.table)
    > t=Sys.time()
    > for(i in 1:10){
    +   fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
    + }
    Error: is.list(x) is not TRUE
    > print(Sys.time()-t)
    Time difference of 0.1015148 secs
    > ?fwrite
    > matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
    > t=Sys.time()
    > for(i in 1:10){
    +   matriz[,i]<-h2o.distance(uno2,dos2[i,])
    + }
    > print(Sys.time()-t)
    Time difference of 28.89684 secs
    > 30/10
    [1] 3
    > 30/10*nrow(dos)
    [1] 16068
    > 30/10*nrow(dos)/3600
    [1] 4.463333
    > t=Sys.time()
    > for(i in 1:50){
    +   matriz[,i]<-h2o.distance(uno2,dos2[i,])
    + }
    > print(Sys.time()-t)
    Time difference of 2.506209 mins
    > 2*60+30
    [1] 150
    > 150/50
    [1] 3
    > h2o.cluster_sizes()
    Error in .model.parts(object) : argument "object" is missing, with no default
    > h2o.clusterStatus()
    Version: 3.10.5.3
    Cluster name: H2O_started_from_R_jesus_rqh095
    Cluster size: 1
    Cluster is locked

                            h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
    1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
      pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
    1        0        0 6571425792 20121124864 8553    16417           0       38           0
    > ?h2o.init()
    > h2o.clusterIsUp()
    [1] TRUE
    > h2o.clusterInfo()
    R is connected to the H2O cluster:
        H2O cluster uptime:         1 hours 31 minutes
        H2O cluster version:        3.10.5.3
        H2O cluster version age:    1 month and 20 days
        H2O cluster name:           H2O_started_from_R_jesus_rqh095
        H2O cluster total nodes:    1
        H2O cluster total memory:   1.05 GB
        H2O cluster total cores:    4
        H2O cluster allowed cores:  4
        H2O cluster healthy:        TRUE
        H2O Connection ip:          localhost
        H2O Connection port:        54321
        H2O Connection proxy:       NA
        H2O Internal Security:      FALSE
        R Version:                  R version 3.4.1 (2017-06-30)
    > h2o.cluster_sizes(dos2)
    Error in .model.parts(object) : trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object
    > h2o.clusterInfo()
    R is connected to the H2O cluster:
        H2O cluster uptime:         1 hours 37 minutes
        H2O cluster version:        3.10.5.3
        H2O cluster version age:    1 month and 20 days
        H2O cluster name:           H2O_started_from_R_jesus_rqh095
        H2O cluster total nodes:    1
        H2O cluster total memory:   1.05 GB
        H2O cluster total cores:    4
        H2O cluster allowed cores:  4
        H2O cluster healthy:        TRUE
        H2O Connection ip:          localhost
        H2O Connection port:        54321
        H2O Connection proxy:       NA
        H2O Internal Security:      FALSE
        R Version:                  R version 3.4.1 (2017-06-30)

Thanks,
Jesus
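The back-of-the-envelope estimate in the session (seconds per column times the number of rows of dos, divided by 3600) can be wrapped in a small helper and applied to each measured run. The helper name estimate_total_hours is hypothetical, and the value 5356 for nrow(dos) is inferred from the session, where 30/10*nrow(dos) printed 16068.

```r
# Project total wall-clock time from a partial timing run:
# (seconds per iteration) x (total iterations), converted to hours.
estimate_total_hours <- function(elapsed_secs, iterations_done, iterations_total) {
  secs_per_iteration <- elapsed_secs / iterations_done
  secs_per_iteration * iterations_total / 3600
}

# nrow(dos) implied by the session: 30/10 * nrow(dos) printed 16068
n_dos <- 16068 / 3

h_rounded <- estimate_total_hours(30, 10, n_dos)             # the session's rounded 3 s/column
h_10      <- estimate_total_hours(30.33803, 10, n_dos)       # measured run: 10 columns
h_50      <- estimate_total_hours(2.506209 * 60, 50, n_dos)  # measured run: 50 columns
print(round(c(rounded = h_rounded, ten = h_10, fifty = h_50), 2))
```

All three runs point to roughly 4.5 hours for the full matrix, because each column costs about 3 seconds regardless of how many columns are done.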
Hi,

Have you tried forcing h2o to use the maximum number of cores when you start it?

    h2o.init(nthreads = -1)

From the "h2o.init" help page:

    nthreads: (Optional) Number of threads in the thread pool. This relates very closely to the number of CPUs used. -1 means use all CPUs on the host (Default). A positive integer specifies the number of CPUs directly. This value is only used when R starts H2O.

Something else I notice in your code, although you don't ask about it: you write files with "fwrite", when h2o has its own function, h2o.exportFile(), which parallelizes the writing...

Regards,
Carlos Ortega
qualityexcellence.es
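The session already shows "H2O cluster allowed cores: 4" on the PC, which suggests H2O did pick up every core there. A quick way to compare what the operating system reports with what H2O reports is base R's parallel::detectCores(). The commented lines are a sketch, based on the help text quoted in the reply ("This value is only used when R starts H2O"), of restarting an already-running cluster so that a new nthreads value would take effect; they are not run here because they need the h2o package and Java.

```r
library(parallel)

# Number of CPU cores the operating system reports for this host.
host_cores <- detectCores()
print(host_cores)

# Compare with "H2O cluster allowed cores" from h2o.init() / h2o.clusterInfo():
# the thread shows 4 on the PC; 16 would be expected on the EC2 instance.

# Not run here (requires the h2o package and Java). nthreads only applies when
# R itself starts H2O, so a running cluster would have to be stopped first:
# h2o::h2o.shutdown(prompt = FALSE)
# h2o::h2o.init(nthreads = -1)
```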
Nada, lo he probado y ni con esas...aunque creo que algo si esta haciendo bien, ya que calcula la distancia de una manera increibilemente rapida (aunque no se nota hacerlo en local o en una instancia ec2...) Lo que me parece raro es que si grabo cada archivo uqe si lo almaceno en una matriz de h2o. Es decir, me va mas rapido este codigo: for(i in nrow(dos)){ h2o.exportFile(h2o.distance(uno,dos[i,]),paste0("/home/jesus/datos",i,".csv")) } que este: matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos)) for(i in nrow(dos)){ matriz[,i]<-h2o.distance(uno,dos[i,]) } Lo cual a primera vista no tendriamucho sentido, que se tarde menos en escribir en disco que en una matriz. (aunque la matriz tiene un peso de 2.7 gb...) ________________________________ De: Jesús Para Fernández <j.para.fernandez en hotmail.com> Enviado: sábado, 19 de agosto de 2017 21:26 Para: r-help-es en r-project.org Asunto: Problemas h2O Buenas Estoy usando H20 en local y tb en un ec2 de amazon, pero tengo que tener algo mal configurado seguro. Para iniciarlo, hago lo siguiente: conexion<-h2o.init() Me arranca el cluster con el maximo de cores y memoria que se permite. Una vez hech oesto, quiero calcular la distancia entre dos data.frames: uno<-data.frame(matrix(rnorm(300000),ncol=10)) dos<-data.frame(matrix(rnorm(500),ncol=10)) uno<-as.h2o(uno) dos<-as.h2o(dos) matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos)) for(i in nrow(dos)){ matriz[,i]<-h2o.distance(uno,dos[i,]) } Al hacerlo, y haciendo uso de htop veo que de lso 4 nucleos de mi pc o los 16 del ec2 de amazon, solo se usa uno, y es mas, en el ec2 esta tardando en ejecutarlo mas que en el pc. Por ello creo que no esta paralelizando bien. ¿A alguien le ha ocurrido? Si hago un h2o.clusterStatus() me aparece que esta todo OK R version 3.4.1 (2017-06-30) -- "Single Candle" Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) R es un software libre y viene sin GARANTIA ALGUNA. 
Usted puede redistribuirlo bajo ciertas circunstancias. Escriba 'license()' o 'licence()' para detalles de distribucion. R es un proyecto colaborativo con muchos contribuyentes. Escriba 'contributors()' para obtener más información y 'citation()' para saber cómo citar R o paquetes de R en publicaciones. Escriba 'demo()' para demostraciones, 'help()' para el sistema on-line de ayuda, o 'help.start()' para abrir el sistema de ayuda HTML con su navegador. Escriba 'q()' para salir de R. [Workspace loaded from ~/.RData]> library(h2o)---------------------------------------------------------------------- Your next step is to start H2O: > h2o.init() For H2O package documentation, ask for help: > ??h2o After starting H2O, you can use the Web UI at localhost:54321 For more information visit docs.h2o.ai ---------------------------------------------------------------------- Attaching package: ‘h2o’ The following objects are masked from ‘package:stats’: cor, sd, var The following objects are masked from ‘package:base’: ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, %in%, is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc> h2o.init()H2O is not running yet, starting it now... Note: In case of errors look at the following log files: /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err java version "1.8.0_144" Java(TM) SE Runtime Environment (build 1.8.0_144-b01) Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode) Starting H2O JVM and connecting: .. Connection successful! 
R is connected to the H2O cluster:
    H2O cluster uptime:         1 seconds 905 milliseconds
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.71 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  679385 36.3    1168576 62.5   940480 50.3
Vcells 1138497  8.7    1920143 14.7  1532430 11.7
> rm(list=ls())
> datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
> uno<-datos[datos$InspectionReport == "ACCEPTED",]
> dos<-datos[datos$InspectionReport != "ACCEPTED",]
> uno$InspectionReport<-NULL
> dos$InspectionReport<-NULL
> uno2<-as.h2o(uno)
|=======================================================================================| 100%
> dos2<-as.h2o(dos)
|=======================================================================================| 100%
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
Error in !missing(row) && !(base::is.character(row)) : 
  object 'i' not found
> i<-1
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+ h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+ 
+ }

ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = localhost:54321/3/Frames/RTMP_sid_8e47_4/export)

water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
[2] " water.fvec.Frame.export(Frame.java:1370)"
[3] " water.api.FramesHandler.export(FramesHandler.java:258)"
[4] " sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
[5] " sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
[6] " sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[7] " java.lang.reflect.Method.invoke(Method.java:498)"
[8] " water.api.Handler.handle(Handler.java:63)"
[9] " water.api.RequestServer.serve(RequestServer.java:448)"
[10] " water.api.RequestServer.doGeneric(RequestServer.java:297)"
[11] " water.api.RequestServer.doPost(RequestServer.java:223)"
[12] " javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[13] " javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[14] " org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[15] " org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
[16] " org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[17] " org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
[18] " org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[19] " org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[20] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[21] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[22] " water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:183)"
[23] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[24] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[25] " org.eclipse.jetty.server.Server.handle(Server.java:370)"
[26] " org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[27] " org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[28] " org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
[29] " org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
[30] " org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
[31] " org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
[32] " org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[33] " org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[34] " org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[35] " org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[36] " java.lang.Thread.run(Thread.java:748)"

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

ERROR MESSAGE:

Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!
> 
> print(Sys.time()-t)
Time difference of 0.218374 secs
> ?h2o.exportFile
> t=Sys.time()
> for(i in 1:10){
+ h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
+ 
+ }
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
> 
> print(Sys.time()-t)
Time difference of 11.31977 secs
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+ 
+ matriz[,j]<-h2o.distance(uno2,dos2[i,])
+ }
Error in !allCol && is.na(col) : object 'j' not found
> 
> print(Sys.time()-t)
Time difference of 0.006168127 secs
> t=Sys.time()
> for(i in 1:10){
+ 
+ matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> 
> print(Sys.time()-t)
Time difference of 30.33803 secs
> 10/30
[1] 0.3333333
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors): datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:h2o’:

    hour, month, week, year

> t=Sys.time()
> for(i in 1:10){
+ fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+ 
+ }
Error: is.list(x) is not TRUE
> print(Sys.time()-t)
Time difference of 0.1015148 secs
> ?fwrite
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+ 
+ matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> 
> print(Sys.time()-t)
Time difference of 28.89684 secs
> 30/10
[1] 3
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> t=Sys.time()
> for(i in 1:50){
+ 
+ matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> 
> print(Sys.time()-t)
Time difference of 2.506209 mins
> 2*60+30
[1] 150
> 150/50
[1] 3
> h2o.cluster_sizes()
Error in .model.parts(object) : 
  argument "object" is missing, with no default
> h2o.clusterStatus()
Version: 3.10.5.3
Cluster name: H2O_started_from_R_jesus_rqh095
Cluster size: 1
Cluster is locked

                        h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
  pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
1        0        0 6571425792 20121124864 8553    16417           0       38           0
> ?h2o.init()
> h2o.clusterIsUp()
[1] TRUE
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 31 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> h2o.cluster_sizes(dos2)
Error in .model.parts(object) : 
  trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object 
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 37 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)

Thanks,
Jesus
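Per the timings in the session, each h2o.distance() call against a single row of dos2 costs about 3 s when assigned into matriz (about 150 s for 50 rows). That suggests the loop is dominated by per-call overhead (one REST round trip and one small H2O job per row) rather than by arithmetic, which could also be why htop shows only one busy core. As I understand the h2o R documentation, h2o.distance(x, y, measure) already returns the distances between every row of x and every row of y (an nrow(x) x nrow(y) frame), so a single call such as h2o.distance(uno2, dos2, measure = "l2") may replace the whole loop; this is untested here. If the result fits in R's memory, the same matrix can also be computed without H2O. The sketch below is plain base R (not H2O), assumes the Euclidean (L2) distance is the measure you want, and uses a hypothetical helper name, cross_dist_l2.

```r
# Sketch (base R, not H2O): Euclidean distance between every row of `a`
# and every row of `b`, computed in one vectorized step instead of one
# call per row of `b`. Hypothetical helper; assumes all columns are numeric.
cross_dist_l2 <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  stopifnot(ncol(a) == ncol(b))
  # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y, for all row pairs at once
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sq[sq < 0] <- 0  # clamp tiny negative values caused by floating-point rounding
  sqrt(sq)         # nrow(a) x nrow(b) matrix of distances
}

# Usage with data shaped like the example in the question
uno <- data.frame(matrix(rnorm(300000), ncol = 10))  # 30000 rows x 10 columns
dos <- data.frame(matrix(rnorm(500), ncol = 10))     # 50 rows x 10 columns
matriz <- cross_dist_l2(uno, dos)
dim(matriz)  # 30000 x 50
```

With the real data nrow(dos) is about 5356, so the full matrix needs nrow(uno) x 5356 x 8 bytes. If that is too large, cross_dist_l2() can be called on blocks of rows of dos, writing each block to disk (for example with write.csv()) before computing the next one.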
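The session projects about 4.46 hours for the full loop (3 s per row times nrow(dos)). Applying the same arithmetic to both timed loops, using only numbers printed in the session, suggests that exporting each distance column with h2o.exportFile() (11.31977 s for 10 rows) was roughly 2.7 times cheaper per row than assigning it into matriz (2.506209 min for 50 rows). The value nrow(dos) = 5356 is inferred from 30/10*nrow(dos) = 16068; the variable names below are illustrative only.

```r
# Per-row cost and projected total, using only timings printed in the session.
n_dos <- 16068 / 3                  # nrow(dos), inferred from 30/10*nrow(dos) = 16068

secs_export <- 11.31977 / 10        # h2o.exportFile(h2o.distance(...)) loop, 10 rows
secs_assign <- 2.506209 * 60 / 50   # matriz[,i] <- h2o.distance(...) loop, 50 rows

hours_export <- secs_export * n_dos / 3600
hours_assign <- secs_assign * n_dos / 3600

round(c(export = hours_export, assign = hours_assign), 2)  # export 1.68, assign 4.47
round(secs_assign / secs_export, 2)                        # 2.66 times slower per row
```

Even the faster variant would need more than an hour and a half for all rows, so removing the per-row loop altogether is likely to matter more than choosing between the two.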