linux freaker
2013-Mar-18 07:52 UTC
[Lustre-discuss] Need help on lustre filesystem setup..
Hi,

I am trying to run Apache Hadoop on a parallel filesystem like Lustre. I have 1 MDS, 2 OSS/OST and 1 Lustre client.

My Lustre client shows:

Code:
[root@lustreclient1 ~]# lfs df -h
UUID                    bytes     Used  Available  Use%  Mounted on
lustre-MDT0000_UUID      4.5G   274.3M       3.9G    6%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:0]
lustre-OST0001_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:1]
lustre-OST0002_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:2]
lustre-OST0003_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:3]
lustre-OST0004_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:4]
lustre-OST0005_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:5]
lustre-OST0006_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:6]
lustre-OST0007_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:7]
lustre-OST0008_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:8]
lustre-OST0009_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:9]
lustre-OST000a_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:10]
lustre-OST000b_UUID      5.9G   276.1M       5.3G    5%  /mnt/lustre[OST:11]
filesystem summary:     70.9G     3.2G      64.0G    5%  /mnt/lustre

As I was unsure which machine I should install the Hadoop software on, I decided to go ahead and install Hadoop on LustreClient1. I configured LustreClient1 with JAVA_HOME and the Hadoop parameters, using the following file entries:

File: conf/core-site.xml

Code:
<property>
  <name>fs.default.name</name>
  <value>file:///mnt/lustre</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>

I didn't make any changes in mapred-site.xml.

Now, when I run 'bin/start-mapred.sh', it tries to ssh to my own local machine. I am not sure if I am doing this right.

Doubt> Do I need to have two Lustre clients for this to work?
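Since the core-site.xml above points Hadoop at paths under /mnt/lustre, one basic sanity check is whether such a path really sits on a Lustre mount. The following is a minimal sketch, not a definitive procedure: it assumes a Linux client where Lustre mounts appear in /proc/mounts with filesystem type "lustre", and the helper names find_mount and is_on_lustre are hypothetical. It only shows that a directory lives on Lustre, not that Hadoop is actually writing to it.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is the content of /proc/mounts (one mount per line:
    device, mount point, filesystem type, options, ...).
    Symlinks are not resolved; the path is only normalized.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) matching mount point.
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_on_lustre(path, mounts_text):
    """True if the mount containing path has filesystem type 'lustre'."""
    found = find_mount(path, mounts_text)
    return found is not None and found[1] == "lustre"
```

On the client, this could be called as is_on_lustre("/mnt/lustre/hadoop_tmp/mapred/system", open("/proc/mounts").read()), using the mapred.system.dir location from the configuration above.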
Then I tried running the wordcount program as shown below:

Code:
bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /tmp/rahul /tmp/rahul/rahul-output
ied 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/03/14 18:12:29 INFO ipc.Client: Retrying connect to server: 10.94.214.188/10.94.214.188:54311. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/03/14 18:12:30 INFO ipc.Client: Retrying connect to server: 10.94.214.188/10.94.214.188:54311. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/03/14 18:12:31 INFO ipc.Client: Retrying connect to server: 10.94.214.188/10.94.214.188:54311. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/03/14 18:12:32 INFO ipc.Client: Retrying connect to server: 10.94.214.188/10.94.214.188:54311. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

Question 1: As I am comparing HDFS and Lustre for Hadoop, what would be the right number of hardware nodes to compare? Say I have 1 MDS, 2 OSS and 1 Lustre client on one side, and 1 NameNode and 3 DataNodes on the other. How can I compare the two filesystems?

Question 2: Do I really need 2 Lustre clients to set up Hadoop over Lustre? If it is possible, how can I use the OSS and MDS nodes for the Hadoop setup too?

Question 3: As I read regarding the wordcount example, we need to insert data into the HDFS filesystem. Do we need to do the same for Lustre too? What's the command?

Question 4: What are the steps to confirm that Hadoop is actually using the Lustre filesystem?
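The repeated "Retrying connect to server: 10.94.214.188/10.94.214.188:54311" lines in the log above suggest that nothing was accepting connections at that address when the job was submitted. A minimal sketch for checking this from the client follows. The function name can_connect is hypothetical, and treating 54311 as the JobTracker port is an assumption, since the mapred.job.tracker setting is not shown in this post.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For the address in the log, the call would be can_connect("10.94.214.188", 54311). A False result would be consistent with the MapReduce daemons not having started, for example because start-mapred.sh could not complete its ssh step.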