On Sat, 2012-06-16 at 15:38 -0400, Vlad wrote:
> Greetings,
>
> I am not sure I understand fully the relationship b/w glusterfs
> project per se and https://github.com/gluster/hadoop-glusterfs, but
> I'd like to follow up on the "Hadoop Connector" mention here
> (http://www.gluster.org/community/documentation/index.php/Hadoop) and
> getFileBlockLocations API mention here
> (http://community.gluster.org/q/how-gluster-supports-map-reduce-in-absence-of-metadata-of-input-data-chunks-spread-over-multiple-machines-see-desciption/):
>
> - does 3.3 install any kind of a C lib that would allow for "where
> data actually landed" queries?
>
> This would be very useful to users who want to structure their HPC
> jobs in the map/reduce style, but without using Hadoop specifically.
> (Really, the main innovation in MapReduce is colocation of calculation
> and data and I'd rather use a fs that's mountable in the classic
> sense as opposed to HDFS).

The getFileBlockLocations API is implemented using a "magic"
extended-attribute request. The extended attribute is
trusted.glusterfs.pathinfo; if you try to fetch that, we dynamically
construct a reply describing where the data went. I suppose we could
provide a C/C++ library to parse that into some sort of structure that's
easier to use from within a program, but AFAIK that has not been done.
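
In the meantime, a program can fetch the attribute itself. Below is a
minimal sketch in Python, not an official interface. The helper names
(read_pathinfo, parse_bricks) are made up for illustration, and the reply
layout the regex expects, "<POSIX(brick):host:path>" entries, is an
assumption that may not hold across GlusterFS versions, so check it
against what your own mount returns.

```python
import os
import re

PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# ASSUMPTION: the reply contains entries shaped like
#   <POSIX(/export/b1):server1:/export/b1/data.bin>
# Verify against a real reply before relying on this pattern.
_BRICK_RE = re.compile(r"<POSIX(?:\([^)]*\))?:([^:>]+):([^>]+)>")


def read_pathinfo(path, getxattr=None):
    """Return the raw pathinfo reply for a file on a GlusterFS mount.

    getxattr is injectable so the function can be exercised without a
    mount; by default it uses os.getxattr, which exists only on Linux.
    """
    if getxattr is None:
        getxattr = os.getxattr
    raw = getxattr(path, PATHINFO_XATTR)
    return raw.decode("utf-8", errors="replace").rstrip("\x00")


def parse_bricks(pathinfo):
    """Extract (host, path_on_brick) pairs from a pathinfo reply."""
    return _BRICK_RE.findall(pathinfo)
```

Calling parse_bricks(read_pathinfo("/mnt/gluster/data.bin")) would then
give a list of (host, path) pairs to schedule work against; the mount
path here is only an example. Attributes in the trusted.* namespace
generally need root (CAP_SYS_ADMIN) to read on Linux, so expect an
OSError when running unprivileged or on a non-Gluster filesystem.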