Hello,
I am using H2O both locally and on an Amazon EC2 instance, but I must have something misconfigured.
To start it, I run:
conexion<-h2o.init()
This starts the cluster with the maximum number of cores and memory allowed.
Once that is done, I want to compute the distance between two data.frames:
uno<-data.frame(matrix(rnorm(300000),ncol=10))
dos<-data.frame(matrix(rnorm(500),ncol=10))
uno<-as.h2o(uno)
dos<-as.h2o(dos)
matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
for(i in seq_len(nrow(dos))){
matriz[,i]<-h2o.distance(uno,dos[i,])
}
While it runs, htop shows that of the 4 cores on my PC, or the 16 on the Amazon EC2 instance, only one is in use. What is more, the EC2 instance takes longer to run it than the PC.
So I believe it is not parallelizing properly. Has this happened to anyone else?
h2o.clusterStatus() reports that everything is OK.
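A note on the loop above: each iteration issues a separate request for a single row of `dos`, so most of the elapsed time may be per-call overhead rather than computation. The h2o documentation describes `h2o.distance(x, y)` as computing a pairwise distance between all rows of two frames, so a single call such as `h2o.distance(uno, dos)` might replace the loop; this has not been checked against version 3.10.5.3. The sketch below illustrates the same "one call instead of one call per row" idea in base R only, assuming Euclidean distance and small synthetic matrices (the names `dist_loop` and `dist_all` are illustrative).

```r
set.seed(1)
uno <- matrix(rnorm(3000), ncol = 10)  # 300 rows, 10 columns
dos <- matrix(rnorm(50),   ncol = 10)  # 5 rows, 10 columns

# Row-by-row version: one pass over `uno` for every row of `dos`.
dist_loop <- matrix(NA_real_, nrow(uno), nrow(dos))
for (i in seq_len(nrow(dos))) {
  diffs <- sweep(uno, 2, dos[i, ])      # subtract row i of `dos` from every row of `uno`
  dist_loop[, i] <- sqrt(rowSums(diffs^2))
}

# All pairs at once, using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
sq <- outer(rowSums(uno^2), rowSums(dos^2), "+") - 2 * tcrossprod(uno, dos)
dist_all <- sqrt(pmax(sq, 0))           # clamp tiny negative rounding errors

all.equal(dist_loop, dist_all)
```

Both versions produce a matrix with one row per row of `uno` and one column per row of `dos`, which is the layout the loop in the message fills column by column.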
R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Workspace loaded from ~/.RData]
> library(h2o)
----------------------------------------------------------------------
Your next step is to start H2O:
    > h2o.init()
For H2O package documentation, ask for help:
    > ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
    cor, sd, var
The following objects are masked from ‘package:base’:
    ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, %in%,
    is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc
> h2o.init()
H2O is not running yet, starting it now...
Note:  In case of errors look at the following log files:
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
    H2O cluster uptime:         1 seconds 905 milliseconds
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.71 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  679385 36.3    1168576 62.5   940480 50.3
Vcells 1138497  8.7    1920143 14.7  1532430 11.7
> rm(list=ls())
> datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
> uno<-datos[datos$InspectionReport == "ACCEPTED",]
> dos<-datos[datos$InspectionReport != "ACCEPTED",]
> uno$InspectionReport<-NULL
> dos$InspectionReport<-NULL
> uno2<-as.h2o(uno)
|=======================================================================================| 100%
> dos2<-as.h2o(dos)
|=======================================================================================| 100%
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
Error in !missing(row) && !(base::is.character(row)) :
  object 'i' not found
> i<-1
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+ h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+
+ }
ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Frames/RTMP_sid_8e47_4/export)
water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
 [2] "    water.fvec.Frame.export(Frame.java:1370)"
 [3] "    water.api.FramesHandler.export(FramesHandler.java:258)"
 [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
 [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
 [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
 [7] "    java.lang.reflect.Method.invoke(Method.java:498)"
 [8] "    water.api.Handler.handle(Handler.java:63)"
 [9] "    water.api.RequestServer.serve(RequestServer.java:448)"
[10] "    water.api.RequestServer.doGeneric(RequestServer.java:297)"
[11] "    water.api.RequestServer.doPost(RequestServer.java:223)"
[12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
[16] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[17] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
[18] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[19] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[20] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[21] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[22] "    water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:183)"
[23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[25] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"
[26] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[27] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[28] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
[29] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
[30] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
[31] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
[32] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[33] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[34] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[35] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[36] "    java.lang.Thread.run(Thread.java:748)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  :
ERROR MESSAGE:
Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!
>
> print(Sys.time()-t)
Time difference of 0.218374 secs
> ?h2o.exportFile
> t=Sys.time()
> for(i in 1:10){
+ h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
+
+ }
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
|=======================================================================================| 100%
> print(Sys.time()-t)
Time difference of 11.31977 secs
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,j]<-h2o.distance(uno2,dos2[i,])
+ }
Error in !allCol && is.na(col) : object 'j' not found
> print(Sys.time()-t)
Time difference of 0.006168127 secs
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 30.33803 secs
> 10/30
[1] 0.3333333
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors):
https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and
browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
Attaching package: ‘data.table’
The following objects are masked from ‘package:h2o’:
    hour, month, week, year
> t=Sys.time()
> for(i in 1:10){
+ fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+
+ }
Error: is.list(x) is not TRUE
> print(Sys.time()-t)
Time difference of 0.1015148 secs
> ?fwrite
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
|=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 28.89684 secs
> 30/10
[1] 3
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> t=Sys.time()
> for(i in 1:50){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 2.506209 mins
> 2*60+30
[1] 150
> 150/50
[1] 3
> h2o.cluster_sizes()
Error in .model.parts(object) :
  argument "object" is missing, with no default
> h2o.clusterStatus()
Version: 3.10.5.3
Cluster name: H2O_started_from_R_jesus_rqh095
Cluster size: 1
Cluster is locked
                        h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
  pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
1        0        0 6571425792 20121124864 8553    16417           0       38           0
> ?h2o.init()
> h2o.clusterIsUp()
[1] TRUE
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 31 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> h2o.cluster_sizes(dos2)
Error in .model.parts(object) :
  trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 37 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
Thanks,
Jesus
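The arithmetic typed at the prompt in the session above (30/10*nrow(dos)/3600) extrapolates the total run time from a timed sample of 10 columns. A minimal sketch of that estimate follows; the function name `extrapolate_hours` is hypothetical, and the row count of `dos` is inferred from the printed value 16068 rather than stated anywhere in the thread.

```r
# Extrapolate total run time (in hours) from a timed sample of iterations.
extrapolate_hours <- function(sample_secs, sample_iters, total_iters) {
  secs_per_iter <- sample_secs / sample_iters
  secs_per_iter * total_iters / 3600
}

# Figures from the session: 10 columns took about 30 s, and
# 30/10*nrow(dos) printed 16068, which implies nrow(dos) = 16068 / 3.
total_cols <- 16068 / 3
est_hours <- extrapolate_hours(sample_secs = 30, sample_iters = 10,
                               total_iters = total_cols)
est_hours
```

The later sample of 50 columns (about 150 s) gives the same 3 s per column, so the estimate appears to scale linearly with the number of columns, at least over that range.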
Hi,
Have you tried forcing h2o to use the maximum number of cores when you start it?
h2o.init(nthreads = -1)
From the help for "h2o.init":
nthreads   (Optional) Number of threads in the thread pool. This relates very closely to the number of CPUs used. -1 means use all CPUs on the host (Default). A positive integer specifies the number of CPUs directly. This value is only used when R starts H2O.
Another thing I notice in your code, although you did not ask about it, is that you write files with "fwrite" when h2o has its own function, "h2o.exportFile()", which parallelizes the writing.
Regards,
Carlos Ortega
www.qualityexcellence.es
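Since the help text quoted above says the value is only used when R starts H2O, changing `nthreads` on an instance that is already running would presumably have no effect until that instance is restarted. A minimal sketch of such a restart follows. `restart_h2o` and its `wait_secs` argument are hypothetical names, not part of the h2o package; the sketch assumes `h2o.shutdown(prompt = FALSE)`, `h2o.clusterIsUp()` and the `max_mem_size` argument of `h2o.init()` behave as documented.

```r
# Hypothetical helper (not part of h2o): restart the local H2O instance so
# that start-up options such as nthreads are applied to a fresh JVM.
restart_h2o <- function(nthreads = -1, max_mem_size = NULL, wait_secs = 5) {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    stop("the h2o package is required")
  }
  # h2o.clusterIsUp() raises an error when there is no connection yet,
  # so treat any error as "not running".
  running <- isTRUE(tryCatch(h2o::h2o.clusterIsUp(), error = function(e) FALSE))
  if (running) {
    h2o::h2o.shutdown(prompt = FALSE)
    Sys.sleep(wait_secs)  # give the JVM a moment to exit
  }
  h2o::h2o.init(nthreads = nthreads, max_mem_size = max_mem_size)
}
```

It could be called as `restart_h2o(nthreads = -1)`; the "H2O cluster allowed cores" line printed by `h2o.init()` then shows how many cores the new instance may use.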
No luck, I tried it and even that did not help... although I think it must be doing something right, since it computes the distance incredibly fast (though there is no noticeable difference between running it locally and on an EC2 instance...)
What I find odd is that writing each result to a file is faster than storing it in an h2o matrix. That is, this code runs faster:
for(i in seq_len(nrow(dos))){
h2o.exportFile(h2o.distance(uno,dos[i,]),paste0("/home/jesus/datos",i,".csv"))
}
than this one:
matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
for(i in seq_len(nrow(dos))){
matriz[,i]<-h2o.distance(uno,dos[i,])
}
At first glance it makes little sense that writing to disk takes less time than writing into a matrix (although the matrix weighs 2.7 GB...).
________________________________
From: Jesús Para Fernández <j.para.fernandez en hotmail.com>
Sent: Saturday, 19 August 2017 21:26
To: r-help-es en r-project.org
Subject: Problemas h2O
Buenas
Estoy usando H20 en local y tb en un ec2 de amazon, pero tengo que tener algo
mal configurado seguro.
Para iniciarlo, hago lo siguiente:
conexion<-h2o.init()
Me arranca el cluster con el maximo de cores y memoria que se permite.
Una vez hech oesto, quiero calcular la distancia entre dos data.frames:
uno<-data.frame(matrix(rnorm(300000),ncol=10))
dos<-data.frame(matrix(rnorm(500),ncol=10))
uno<-as.h2o(uno)
dos<-as.h2o(dos)
matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
for(i in nrow(dos)){
matriz[,i]<-h2o.distance(uno,dos[i,])
}
Al hacerlo, y haciendo uso de htop veo que de lso 4 nucleos de mi pc o los 16
del ec2 de amazon, solo se usa uno, y es mas, en el ec2 esta tardando en
ejecutarlo mas que en el pc.
Por ello creo que no esta paralelizando bien. ¿A alguien le ha ocurrido?
Si hago un h2o.clusterStatus() me aparece que esta todo OK
R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R es un software libre y viene sin GARANTIA ALGUNA.
Usted puede redistribuirlo bajo ciertas circunstancias.
Escriba 'license()' o 'licence()' para detalles de distribucion.
R es un proyecto colaborativo con muchos contribuyentes.
Escriba 'contributors()' para obtener más información y
'citation()' para saber cómo citar R o paquetes de R en publicaciones.
Escriba 'demo()' para demostraciones, 'help()' para el sistema
on-line de ayuda,
o 'help.start()' para abrir el sistema de ayuda HTML con su navegador.
Escriba 'q()' para salir de R.
[Workspace loaded from ~/.RData]
> library(h2o)
----------------------------------------------------------------------
Your next step is to start H2O:
    > h2o.init()
For H2O package documentation, ask for help:
    > ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
    cor, sd, var
The following objects are masked from ‘package:base’:
    ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-,
ifelse, %in%,
    is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif,
trunc
> h2o.init()
H2O is not running yet, starting it now...
Note:  In case of errors look at the following log files:
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
    H2O cluster uptime:         1 seconds 905 milliseconds
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.71 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  679385 36.3    1168576 62.5   940480 50.3
Vcells 1138497  8.7    1920143 14.7  1532430 11.7
> rm(list=ls())
> datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
> uno<-datos[datos$InspectionReport == "ACCEPTED",]
> dos<-datos[datos$InspectionReport != "ACCEPTED",]
> uno$InspectionReport<-NULL
> dos$InspectionReport<-NULL
> uno2<-as.h2o(uno)
 
  |=======================================================================================| 100%
> dos2<-as.h2o(dos)
  |=======================================================================================| 100%
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
Error in !missing(row) && !(base::is.character(row)) :
  object 'i' not found
> i<-1
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+   h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+ }
ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Frames/RTMP_sid_8e47_4/export)
water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
 [2] "    water.fvec.Frame.export(Frame.java:1370)"
 [3] "    water.api.FramesHandler.export(FramesHandler.java:258)"
 [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
 [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
 [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
 [7] "    java.lang.reflect.Method.invoke(Method.java:498)"
 [8] "    water.api.Handler.handle(Handler.java:63)"
 [9] "    water.api.RequestServer.serve(RequestServer.java:448)"
[10] "    water.api.RequestServer.doGeneric(RequestServer.java:297)"
[11] "    water.api.RequestServer.doPost(RequestServer.java:223)"
[12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
[16] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[17] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
[18] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[19] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[20] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[21] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[22] "    water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:183)"
[23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[25] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"
[26] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[27] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[28] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
[29] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
[30] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
[31] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
[32] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[33] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[34] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[35] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[36] "    java.lang.Thread.run(Thread.java:748)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  :
ERROR MESSAGE:
Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!
>
> print(Sys.time()-t)
Time difference of 0.218374 secs
> ?h2o.exportFile
> t=Sys.time()
> for(i in 1:10){
+   h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
+ }
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
> print(Sys.time()-t)
Time difference of 11.31977 secs
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+   matriz[,j]<-h2o.distance(uno2,dos2[i,])
+ }
Error in !allCol && is.na(col) : object 'j' not found
> print(Sys.time()-t)
Time difference of 0.006168127 secs
> t=Sys.time()
> for(i in 1:10){
+   matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 30.33803 secs
> 10/30
[1] 0.3333333
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
Attaching package: ‘data.table’
The following objects are masked from ‘package:h2o’:
    hour, month, week, year
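
The extrapolation a few lines above rounds the 30.3 s loop down to 3 s per call. A minimal sketch of the same arithmetic with the unrounded timing follows. The row count of dos is never printed in the session; it is inferred here from 30/10*nrow(dos) = 16068, and the variable names are only illustrative.

```r
# Wall-clock time of the 10-iteration loop, copied from the session output.
secs_for_10_calls <- 30.33803

# Assumption: nrow(dos) is inferred from 30/10 * nrow(dos) == 16068.
n_rows_dos <- 16068 / (30 / 10)

# One h2o.distance() call per row of dos, so scale the per-call cost.
secs_per_call <- secs_for_10_calls / 10
est_secs      <- secs_per_call * n_rows_dos
est_hours     <- est_secs / 3600

cat(sprintf("%.2f s per call, %d calls, about %.1f hours in total\n",
            secs_per_call, as.integer(n_rows_dos), est_hours))
```

With the unrounded timing the estimate comes out at roughly four and a half hours, in line with the 4.46 computed in the session.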
> t=Sys.time()
> for(i in 1:10){
+   fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+ }
Error: is.list(x) is not TRUE
> print(Sys.time()-t)
Time difference of 0.1015148 secs
> ?fwrite
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+   matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 28.89684 secs
> 30/10
[1] 3
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> t=Sys.time()
> for(i in 1:50){
+   matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
> print(Sys.time()-t)
Time difference of 2.506209 mins
> 2*60+30
[1] 150
> 150/50
[1] 3
> h2o.cluster_sizes()
Error in .model.parts(object) :
  argument "object" is missing, with no default
> h2o.clusterStatus()
Version: 3.10.5.3
Cluster name: H2O_started_from_R_jesus_rqh095
Cluster size: 1
Cluster is locked
                        h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
  pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
1        0        0 6571425792 20121124864 8553    16417           0       38           0
> ?h2o.init()
> h2o.clusterIsUp()
[1] TRUE
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 31 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> h2o.cluster_sizes(dos2)
Error in .model.parts(object) :
  trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 37 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
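
The loops in this session fill the distance matrix one column per call. As a point of comparison, here is a minimal base-R sketch (not H2O code) that builds the same kind of rows-by-rows distance matrix in one vectorized step. The helper name pairwise_l2 and the toy data sizes are illustrative assumptions, and Euclidean distance is assumed as the measure, since the session does not show which measure h2o.distance applied.

```r
# Pairwise Euclidean distances between the rows of a and the rows of b,
# using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y (hypothetical helper).
pairwise_l2 <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

set.seed(1)
uno <- data.frame(matrix(rnorm(3000), ncol = 10))  # 300 rows, toy size
dos <- data.frame(matrix(rnorm(500), ncol = 10))   # 50 rows

# One row per row of uno, one column per row of dos.
matriz <- pairwise_l2(uno, dos)
dim(matriz)
```

Whether h2o.distance can likewise be given the full second frame in a single call, instead of one row at a time, is worth checking in the documentation for the installed H2O version.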
Thanks
Jesus