All,
I am using a pretty crude method to get data out of HDFS via Hive and into R and
was curious about alternatives that the group has explored.
Basically, I run a system command that runs a hive statement and writes the
returned data to a delimited file. Then, I read that file into an object and
continue.
For example:
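# Write the query to a script file for the Hive CLI
hive.script <- "select * from orders where date = '2011-05-06';"
write(hive.script, "script")
# Run the script and redirect the tab-delimited result to a file
system("hive -f script > 20110506_orders")
# Read the result back into a data frame
order.file <- read.table("20110506_orders", sep = "\t", header = FALSE)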
Does the group have experience with a better way to handle this?
The goal would be to pull data directly from Hive into the R session in order to
avoid unnecessary file clutter on a shared server. There are Hadoop packages,
such as Rhipe and hive
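
One way this might look, as a rough sketch only: read the Hive CLI's standard output through a pipe connection instead of a file on disk. This assumes the hive executable is on the PATH and accepts -e (inline query) and -S (silent mode); the helper names hive_command and read_command_output are made up for illustration.

```r
# Hypothetical helper: build a shell command that runs a query via the Hive CLI.
# Assumes `hive` is on the PATH and supports -S (silent) and -e (inline query).
hive_command <- function(query, hive_bin = "hive") {
  paste(hive_bin, "-S", "-e", shQuote(query))
}

# Hypothetical helper: run a shell command and parse its tab-delimited
# standard output into a data frame, without writing an intermediate file.
read_command_output <- function(cmd) {
  read.delim(pipe(cmd), header = FALSE, stringsAsFactors = FALSE)
}

# Usage (needs a working Hive installation, so it is left commented out):
# orders <- read_command_output(
#   hive_command("select * from orders where date = '2011-05-06'")
# )
```

Nothing is written to the shared server this way, but the whole result set still has to fit in memory on the R side, and any Hive log output that reaches stdout would end up in the data.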
Thanks.
Josh