thr3ads.net - search: "sparkr"

Displaying 19 results from an estimated 19 matches for "sparkr".

2015 Sep 01

lazy loading in SparkR

Hi, I'm using SparkR and R won't read the promises from the SparkR package unless I run lazyLoad manually. .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) print(.libPaths()) # [1] "/private/tmp/spark-1.5/spark-1.5.0-SNAPSHOT-bin-hadoop2.6/R/lib"...

2017 Sep 04

Suggestion: Create On-Disk Dataframes

On 4 September 2017 at 11:35, Suzen, Mehmet wrote: | It is not needed. There is a large community of developers using SparkR. | https://spark.apache.org/docs/latest/sparkr.html | It does exactly what you want. I hope you are not going to mail a sparkr commercial to this list every day. As the count is now at two, this may be an excellent time to stop it. Dirk -- http://dirk.eddelbuettel.com | @eddelbuettel | edd...

2017 Jan 04

Big data with R

...Parallelization with OpenMP - h2o and its R package, - The sparklyr package as a wrapper for the Spark algorithms. And of course, using sampling or, if we have large volumes of data, even using several samples to fit the models. On top of all this, the Spark algorithms are now also available in SparkR (in Spark version 2.1, released less than a month ago) http://spark.apache.org/docs/latest/sparkr.html#machine-learning The trend seems to be toward Hadoop and Spark environments. What do you think? Is it a passing trend? Regards ...

2017 Sep 03

Suggestion: Create On-Disk Dataframes

Dear R Developers, I would like to suggest the creation of a new S4 object class for On-Disk data.frames which do not fit in RAM, which could be called disk.data.frame(). It could be based on RSQLite, for example (by translating R syntax to SQL syntax), and the syntax and way of working of the disk.data.frame() class could be exactly the same as with data.frame objects. When

2017 Oct 04

Reading parquet files from R

Hello everyone. I know that with sparkR or sparklyr I can easily read parquet-format files, but is there any way to read them without having to start Spark? My situation is that I have some parquet files in S3 and I want to read them from a tiny Amazon EC2 instance that I want to keep free of a Spark installation. ...

2016 Dec 05

Big data with R or Python?

Is it worth learning Python for big data with Spark, or is the library that just came out for R enough? What do you think?

2017 Oct 04

Reading parquet files from R

...k.rstudio.com/reference/sparklyr/latest/spark_read_json.html> > > Regards, > Carlos Ortega > www.qualityexcellence.es > > > On 4 October 2017 at 21:33, José Luis Cañadas <canadasreche at gmail.com> > wrote: > >> Hello everyone. >> I know that with sparkR or sparklyr I can easily read parquet-format files, >> but is there any way to read them without having to start Spark? >> >> My situation is that I have some parquet files in S3 and I want to read >> them from a tiny Amazon EC2 instance that I want...

2017 Sep 04

Suggestion: Create On-Disk Dataframes

It is not needed. There is a large community of developers using SparkR. https://spark.apache.org/docs/latest/sparkr.html It does exactly what you want. On 3 September 2017 at 20:38, Juan Telleria <jtelleriar at gmail.com> wrote: > Dear R Developers, > > I would like to suggest the creation of a new S4 object class for On-Disk > data.frames which do...

2018 Apr 13

SparksR

R-Help I'm working in my first large database (53,098,492,383 records). When I select the db via something like library(SparkR) mydata <- sql("SELECT * FROM <table name>") is "mydata" a SparkDataFrame, and do I work with SparkDataFrames like I would regular df (per se); because I can't imagine I would ever create a 53 billion record df. I'm starting to acquaint myself with e Spark...

2017 Jan 15

Is R losing the battle?

I recently got into big data, and honestly in this field the decision seems clear. Python is currently one step ahead of R, although sparklyr may level the contest. But what is starting to worry me is that, even away from big data, Python also seems to be gaining followers by leaps and bounds. Is R losing the battle?

2014 Feb 04

Revolutions Blog: January 2014 roundup

...book by Rachel Schutt and Cathy O'Neil http://bit.ly/1c1berp Hadley Wickham introduces the dplyr package, with its "grammar of data manipulation" http://bit.ly/1c1bcje The new choroplethr package makes it easier to create data maps in R: http://bit.ly/1c1bcjf A developer preview of SparkR, an interface between R and Apache Spark, is now available: http://bit.ly/1c1berq Joseph Rickert reviews the capabilities of R for topological data analysis: http://bit.ly/1c1berr In a recent survey of data scientists, R is the most-used software tool other than SQL: http://bit.ly/1c1bers A new...

2018 May 05

Discovering patterns in textual strings

...there could be Abc 123 could be a matching string > > This would not be considered a match ... > abc_something > this.is_a long stringwithabcinthemiddle > > The sequence(s) are always at the beginning (or so it appears). Out > of the 54 billion records I am able to pull (SparkR sql) 948,679 unique > strings. It is from these unique strings that I (if possible) want to > identify the "key" strings. > > 1. Abc_1232.niok7j9hd > 2. Abc > 3. Abc.2#348hfk2.njilo > 4. Abc.2 > 5. Abc.7 > 6. BAdfr_kajdhf98#kjsdh > 7. BAdrf_gofer >...

2015 Dec 11

SVM hadoop

Hi Mª Luz, Here is my view: First of all, be clear about what exactly you want to do in parallel; three scenarios come to mind: (1) Applying one model, SVM in this case, to very large data, which is why I need Hadoop/Spark (2) Fitting many SVM models on small data (for example, one per user), which is why I need Hadoop/Spark to parallelize these processes

2015 May 06

R-help-es Digest, Vol 75, Issue 7

Hi, I'm surprised to read your opinion ("(pure) R is not the ideal tool for handling 'big data' directly") when just this past April SparkR (see the description from its website below) was integrated into Apache Spark, and everyone involved in the "big data" scene (a buzzword if ever there was one) is keeping a close eye on the official release this summer. https://amplab-extras.github.io/SparkR-pkg/ SparkR is an R package that pro...

2018 May 07

Discovering patterns in textual strings

...s in 1.2.3_ABC ..... #3 Yes. So there could be Abc 123 could be a matching string This would not be considered a match ... abc_something this.is_a long stringwithabcinthemiddle The sequence(s) are always at the beginning (or so it appears). Out of the 54 billion records I am able to pull (SparkR sql) 948,679 unique strings. It is from these unique strings that I (if possible) want to identify the "key" strings. 1. Abc_1232.niok7j9hd 2. Abc 3. Abc.2#348hfk2.njilo 4. Abc.2 5. Abc.7 6. BAdfr_kajdhf98#kjsdh 7. BAdrf_gofer 948679 .... So I may have a thousand individuals s...

2016 Jun 15

Hadoop

Hello, I was wondering whether any of you use Hadoop Spark day to day and whether you could recommend a good course to get started. A few months ago I attended the Madrid meetup talk on Rspark, which was good, and now I was wondering whether it is possible to go deeper. I would also like recommendations for any material you can suggest: Coursera courses you have taken, books you have read,

2017 Sep 02

readLines() segfaults on large file & question on how to work around

Thank you for your suggestion. Unfortunately, while R doesn't segfault calling readr::read_file() on the test file I described, I get the error message: Error in read_file_(ds, locale) : negative length vectors are not allowed Jen On Sat, Sep 2, 2017 at 1:38 PM, Ista Zahn <istazahn at gmail.com> wrote: > As a work-around I suggest readr::read_file. > > --Ista > > >

2016 Jan 28

Statistics help!!!!!

Hello, I have a question. I have a group of 15,000 customers that I need to segment based on variables which, by their characteristics, can be grouped into 4 groups. The first thing I did was segment the variables within each group (because I need to run an analysis on this) using cluster analysis, and then perform a segmentation with all the variables, also using

2017 Nov 11

It's all Python

An off-topic post to start the weekend. I just logged into Kaggle, which I hadn't done in a while, and I see with 'astonishment' that practically everyone works in Python. It is true that most competitions involve neural networks, where Python does have an advantage over R, but it scares me to see how Python seems to be winning the game for machine learning...
