It is not needed. There is a large community of developers using SparkR.
https://spark.apache.org/docs/latest/sparkr.html
It does exactly what you want.
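For example, a minimal sketch with SparkR (the file name, column name, and
threshold below are made up for illustration):

  library(SparkR)
  sparkR.session()   # start or reuse a local Spark session

  # Read a large CSV straight from disk; the data is not loaded into R's memory
  df <- read.df("big_data.csv", source = "csv",
                header = "true", inferSchema = "true")

  # Filtering runs in Spark; only the (small) result is collected into R
  small <- collect(filter(df, df$value > 100))
  head(small)

Only the collected result has to fit in RAM; everything before collect() is
executed outside of R.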
On 3 September 2017 at 20:38, Juan Telleria <jtelleriar at gmail.com> wrote:
> Dear R Developers,
>
> I would like to suggest the creation of a new S4 object class for on-disk
> data.frames which do not fit in RAM, which could be called
> disk.data.frame().
>
> It could be based on RSQLite, for example (by translating R syntax to SQL
> syntax), and the syntax and way of working of the disk.data.frame() class
> could be exactly the same as with data.frame objects.
>
> When the session ends, such disk.data.frames are not saved, and an
> implicit DROP TABLE could be done on all the tables created in RSQLite.
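A minimal sketch of what such a wrapper could look like, assuming DBI and
RSQLite (the class, its slots, and the "[" method below are hypothetical
illustrations, not an existing API):

  library(DBI)
  library(RSQLite)

  # Hypothetical S4 class: a handle to a table stored in an on-disk SQLite file
  setClass("disk.data.frame",
           slots = c(con = "SQLiteConnection", table = "character"))

  disk.data.frame <- function(df, path = tempfile(fileext = ".sqlite")) {
    con <- dbConnect(RSQLite::SQLite(), path)
    dbWriteTable(con, "data", df)      # rows live on disk, not in RAM
    # a real implementation would register a finalizer here to DROP the
    # table / delete the file when the session ends
    new("disk.data.frame", con = con, table = "data")
  }

  # Illustrative column subsetting translated to SQL
  setMethod("[", "disk.data.frame", function(x, i, j, ..., drop = TRUE) {
    dbGetQuery(x@con, sprintf("SELECT %s FROM %s",
                              paste(j, collapse = ", "), x@table))
  })

  ddf <- disk.data.frame(mtcars)
  ddf[, c("mpg", "cyl")]               # runs a SELECT against the SQLite file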
>
> Nowadays, with SSD drives, such a new data.frame() class could make sense,
> especially when dealing with Big Data.
>
> It is true that this new class might be slower than the regular data.frame,
> data.table, or tibble classes, but we would be able to handle much more
> data, even if it is at the cost of speed.
>
> It is also true that data sampling and a regular ODBC connection could do
> all the work, but for people who do not know how to use an RDBMS or the
> special-purpose R packages for this job, this class could work.
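For reference, the existing route mentioned just above already gets quite
close to data.frame syntax with DBI plus dplyr/dbplyr (the file, table, and
column names here are made up):

  library(DBI)
  library(dplyr)
  library(dbplyr)

  con <- dbConnect(RSQLite::SQLite(), "big_data.sqlite")

  # tbl() gives a lazy table; the verbs are translated to SQL and run in the
  # database, so the full table never enters R's memory
  result <- tbl(con, "measurements") %>%
    filter(value > 100) %>%
    summarise(mean_value = mean(value, na.rm = TRUE)) %>%
    collect()        # only the one-row summary is brought into RAM

  dbDisconnect(con)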
>
> Another option would be to base this new S4 class on feather files, but
> maybe building it on RSQLite is simply easier.
>
> A GitHub project could be created for this purpose, so that the whole
> community can contribute (including me :D).
>
> Thank you,
> Juan
>