Hi Nikhil:
The FUSE mount is what allows the filesystem to access distributed files in
Gluster: that is, GlusterFS has its own FUSE mount, and GlusterFileSystem
wraps that in Hadoop FileSystem semantics.
Meanwhile, the MapReduce jobs are invoked using custom core-site and
mapred-site XML entries which specify GlusterFileSystem as the default
filesystem.
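For reference, the wiring in core-site.xml typically looks something like the
sketch below. Treat the property values (server host, port) as placeholders,
and check the glusterfs-hadoop plugin's own README for the exact property
names it expects; only `fs.<scheme>.impl` and `fs.default.name` are standard
Hadoop conventions here:

```xml
<configuration>
  <!-- Map the glusterfs:// scheme to the plugin's FileSystem implementation -->
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <!-- Make GlusterFS the default filesystem instead of HDFS
       (host and port below are placeholders) -->
  <property>
    <name>fs.default.name</name>
    <value>glusterfs://server1:9000</value>
  </property>
</configuration>
```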
On Feb 22, 2013, at 3:17 AM, Nikhil Agarwal <nikagar17 at gmail.com>
wrote:
> Hi All,
>
>
>
> Thanks a lot for taking the time to answer my question.
>
>
>
> I am trying to implement a file system in Hadoop under the
> org.apache.hadoop.fs package, something similar to KFS, GlusterFS, etc. I
> wanted to ask about something mentioned in the README.txt of GlusterFS:
>
>
>
> >> # ./bin/start-mapred.sh
> If the map/reduce job/task trackers are up, all I/O will be done to
> GlusterFS.
>
>
>
> So, suppose my input files are scattered across different nodes (GlusterFS
> servers); how do I (a Hadoop client with GlusterFS plugged in) issue a
> MapReduce job?
>
> Moreover, after issuing a MapReduce command, would my Hadoop client fetch
> all the data from the different servers to my local machine and then do the
> MapReduce, or would it start the TaskTracker daemons on the machine(s) where
> the input file(s) are located and perform the MapReduce there?
>
> Please correct me if I am wrong, but I suppose that the location of the
> input files for MapReduce is returned by the function getFileBlockLocations
> (FileStatus file, long start, long len).
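Yes, that is the right hook: the job tracker asks the FileSystem for the
block locations of each input split and tries to schedule map tasks on (or
near) the hosts reported for that byte range. Here is a minimal,
self-contained sketch of the idea. The types below are simplified stand-ins,
not the real org.apache.hadoop.fs classes, and the two-block layout and
server names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockLocationSketch {

    // Simplified analogue of org.apache.hadoop.fs.BlockLocation:
    // which hosts hold which byte range of the file.
    static class BlockLocation {
        final String[] hosts;
        final long offset;
        final long length;
        BlockLocation(String[] hosts, long offset, long length) {
            this.hosts = hosts; this.offset = offset; this.length = length;
        }
    }

    static final long BLOCK = 64L * 1024 * 1024; // 64 MB blocks

    // Pretend the file is stored as two blocks on two different servers.
    static final String[][] BLOCK_HOSTS = { { "server1" }, { "server2" } };

    // Simplified analogue of getFileBlockLocations(FileStatus, start, len):
    // return one entry per block overlapping the requested range.
    static BlockLocation[] getFileBlockLocations(long start, long len) {
        List<BlockLocation> out = new ArrayList<>();
        for (long off = (start / BLOCK) * BLOCK; off < start + len; off += BLOCK) {
            int idx = (int) (off / BLOCK);
            out.add(new BlockLocation(BLOCK_HOSTS[idx], off, BLOCK));
        }
        return out.toArray(new BlockLocation[0]);
    }

    public static void main(String[] args) {
        // Ask where the first 100 MB lives: the range spans both blocks,
        // so a scheduler would place one map task near each server.
        for (BlockLocation b : getFileBlockLocations(0, 100L * 1024 * 1024)) {
            System.out.println(Arrays.toString(b.hosts) + " @ offset " + b.offset);
        }
    }
}
```

The point of the sketch is only the mapping from byte ranges to hosts; the
real implementation must return locations derived from where GlusterFS
actually stores the file, or MapReduce will lose data locality.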
>
>
>
> Thank you very much for your time and for helping me out.
>
>
>
> Regards,
>
> Nikhil
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users