Hi Wei Dong,
In the source code, I found xlators/cluster/map/. Is this what you
are looking for?
On Fri, Aug 21, 2009 at 12:15 AM, Wei Dong <wdong.pku at gmail.com> wrote:
> Hi All,
>
> We are using glusterfs on our lab cluster as shared storage for a
> large number of image files, about 30 million at the moment. We use Hadoop
> for distributed computing, but we are reluctant to store small files in
> Hadoop because of its low throughput on small files and its non-standard
> filesystem interface (e.g. we won't be able to run convert on each image to
> produce a thumbnail if the files are stored in Hadoop). What we do now is
> store a list of paths to all images in Hadoop, and use Hadoop streaming
> to pipe the paths to a script, which then reads the images from the
> glusterfs filesystem and does the processing. This has been working for a
> while, so long as glusterfs doesn't hang, but the problem is that we
> basically lose all data locality. We have 66 nodes and the chance that a
> needed file is on the local disk is only 1/66, and 65/66 of file I/O has to go
> through the network, which makes me very uncomfortable. I'm wondering if there's
> a better way of making glusterfs and Hadoop work together to take
> advantage of data locality.
>
> I know that there's a nufa translator which gives high preference to the local
> drive. This is good enough if the assignment of files to nodes is fixed.
> But if we want to assign files to nodes according to the location of the
> file, what interface should we use to get the physical location of the file?
>
> I appreciate all your suggestions.
>
> - Wei Dong
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>