Peter Kjellstrom
2010-Jan-15 13:41 UTC
[Lustre-discuss] Unable to get lmt to work properly
I first posted this on lmt-discuss, but I'm not sure that list is still alive. Hopefully someone here can shed some light on it. The setup is fairly trivial: 2 OSTs per OSS, using Lustre 1.6.7.1 (more details below).

Tia,
Peter

--------------------------------------------------------------

I have two problems:

1) I only see one (out of two) OSTs per OSS when I run "cerebro-stat -m lmt_ost":

[root@oss44 ~]# cerebro-stat -m lmt_ost | grep oss44
oss44: 1.0;oss44;test1-OST0001;303628200;303628288;4779243244;4781815224;2147483648;2147483648
[root@oss44 ~]# ls /proc/fs/lustre/obdfilter/
num_refs test1-OST0000 test1-OST0001
[root@oss44 ~]# df -t lustre
Filesystem          1K-blocks    Used  Available Use% Mounted on
/dev/oss44_vg/ost1 4781819320 3620560 4729618276   1% /lustre/ost1
/dev/oss44_vg/ost2 4781815224 2571980 4730662800   1% /lustre/ost2
[root@oss44 ~]# uname -r
2.6.18-92.1.17.el5_lustre.1.6.7.1smp
[root@oss44 ~]# rpm -qa | grep lmt
lmt-client-2.6.3-1.ch4.2
lmt-server-agent-2.6.3-1.ch4.2
lmt-server-2.6.3-1.ch4.2
[root@oss44 ~]# rpm -qa | grep cerebro
cerebro-1.9-1

2) Something is probably wrong with the database, but I don't really know what (originally no OST_INFO was generated). Symptoms include "empty" ltop output (ltop -X shows all OSSes, but only the CPU usage is correct; all other fields show "***"). Also, the cron script writes log files like this:

#####################
# OST - test1
#####################
Updating hourly ost agg table for test1
Updating OST_AGGREGATE_HOUR for test1...
[LMT::connect] - Unknown filesystem: test1
Updating other ost agg tables for test1
[LMT::connect] - Unknown filesystem: test1
Updating filesys-level ost tables for test1
[LMT::connect] - Unknown filesystem: test1
#####################
# ROUTER - test1
#####################
Updating hourly router agg table for test1
Updating ROUTER_AGGREGATE_HOUR for test1...
[LMT::connect] - Unknown filesystem: test1
Updating other router agg tables for test1
[LMT::connect] - Unknown filesystem: test1
#####################
# MDS - test1
#####################
Updating hourly mds agg table for test1
Updating MDS_AGGREGATE_HOUR for test1...
[LMT::connect] - Unknown filesystem: test1
Updating other mds agg tables for test1
[LMT::connect] - Unknown filesystem: test1
*** Aggregate Table Update Complete ***

Here's the database configuration (I think it includes the important bits):

mysql> show tables;
+----------------------------+
| Tables_in_filesystem_test1 |
+----------------------------+
| EVENT_DATA                 |
| EVENT_INFO                 |
| FILESYSTEM_AGGREGATE_DAY   |
| FILESYSTEM_AGGREGATE_HOUR  |
| FILESYSTEM_AGGREGATE_MONTH |
| FILESYSTEM_AGGREGATE_WEEK  |
| FILESYSTEM_AGGREGATE_YEAR  |
| FILESYSTEM_INFO            |
| MDS_AGGREGATE_DAY          |
| MDS_AGGREGATE_HOUR         |
| MDS_AGGREGATE_MONTH        |
| MDS_AGGREGATE_WEEK         |
| MDS_AGGREGATE_YEAR         |
| MDS_DATA                   |
| MDS_INFO                   |
| MDS_OPS_DATA               |
| MDS_VARIABLE_INFO          |
| OPERATION_INFO             |
| OSS_DATA                   |
| OSS_INFO                   |
| OSS_INTERFACE_DATA         |
| OSS_INTERFACE_INFO         |
| OSS_VARIABLE_INFO          |
| OST_AGGREGATE_DAY          |
| OST_AGGREGATE_HOUR         |
| OST_AGGREGATE_MONTH        |
| OST_AGGREGATE_WEEK         |
| OST_AGGREGATE_YEAR         |
| OST_DATA                   |
| OST_INFO                   |
| OST_OPS_DATA               |
| OST_VARIABLE_INFO          |
| ROUTER_AGGREGATE_DAY       |
| ROUTER_AGGREGATE_HOUR      |
| ROUTER_AGGREGATE_MONTH     |
| ROUTER_AGGREGATE_WEEK      |
| ROUTER_AGGREGATE_YEAR      |
| ROUTER_DATA                |
| ROUTER_INFO                |
| ROUTER_VARIABLE_INFO       |
| TIMESTAMP_INFO             |
| VERSION                    |
+----------------------------+
42 rows in set (0.00 sec)

mysql> select * from FILESYSTEM_INFO;
+---------------+-----------------+-----------------------+----------------+
| FILESYSTEM_ID | FILESYSTEM_NAME | FILESYSTEM_MOUNT_NAME | SCHEMA_VERSION |
+---------------+-----------------+-----------------------+----------------+
|             1 | test1           |                       | 1.1            |
+---------------+-----------------+-----------------------+----------------+
1 row in set (0.00 sec)

mysql> select * from MDS_INFO;
+--------+---------------+---------------+----------+-------------+
| MDS_ID | FILESYSTEM_ID | MDS_NAME      | HOSTNAME | DEVICE_NAME |
+--------+---------------+---------------+----------+-------------+
|      1 |             1 | test1-MDT0000 | mds4     |             |
+--------+---------------+---------------+----------+-------------+
1 row in set (0.00 sec)

mysql> select * from OSS_INFO where HOSTNAME like "oss4%" LIMIT 4;
+--------+---------------+----------+--------------+
| OSS_ID | FILESYSTEM_ID | HOSTNAME | FAILOVERHOST |
+--------+---------------+----------+--------------+
|     18 |             1 | oss40    | NULL         |
|     19 |             1 | oss41    | NULL         |
|     20 |             1 | oss42    | NULL         |
|     21 |             1 | oss43    | NULL         |
+--------+---------------+----------+--------------+
4 rows in set (0.00 sec)

mysql> select * from OST_INFO where HOSTNAME like "oss44" LIMIT 4;
+--------+--------+---------------+----------+---------+-------------+
| OST_ID | OSS_ID | OST_NAME      | HOSTNAME | OFFLINE | DEVICE_NAME |
+--------+--------+---------------+----------+---------+-------------+
|     43 |     22 | test1-OST0000 | oss44    | NULL    | NULL        |
|     44 |     22 | test1-OST0001 | oss44    | NULL    | NULL        |
+--------+--------+---------------+----------+---------+-------------+
2 rows in set (0.01 sec)

mysql> select * from OST_DATA where OST_ID like "43" or OST_ID like "44" LIMIT 4;
+--------+-------+------------+-------------+---------+-------------+-------------+-------------+-------------+
| OST_ID | TS_ID | READ_BYTES | WRITE_BYTES | PCT_CPU | KBYTES_FREE | KBYTES_USED | INODES_FREE | INODES_USED |
+--------+-------+------------+-------------+---------+-------------+-------------+-------------+-------------+
|     44 | 51610 |          0 |           0 | NULL    |  4781340404 |      474820 |   303628200 |          88 |
|     43 | 51610 |          0 |           0 | NULL    |  4781344500 |      474820 |   303628200 |          88 |
|     44 | 51611 |          0 |           0 | NULL    |  4781340404 |      474820 |   303628200 |          88 |
|     43 | 51611 |          0 |           0 | NULL    |  4781344500 |      474820 |   303628200 |          88 |
+--------+-------+------------+-------------+---------+-------------+-------------+-------------+-------------+
4 rows in set (0.00 sec)

mysql> select * from MDS_DATA LIMIT 4;
+--------+-------+----------+-------------+-------------+-------------+-------------+
| MDS_ID | TS_ID | PCT_CPU  | KBYTES_FREE | KBYTES_USED | INODES_FREE | INODES_USED |
+--------+-------+----------+-------------+-------------+-------------+-------------+
|      1 |     2 |  0.24975 |    71211604 |      463084 |    17802901 |          86 |
|      1 |     3 |  0.29985 |    71211604 |      463084 |    17802901 |          86 |
|      1 |     4 |  0.29985 |    71211604 |      463084 |    17802901 |          86 |
|      1 |     5 | 0.549176 |    71211604 |      463084 |    17802901 |          86 |
+--------+-------+----------+-------------+-------------+-------------+-------------+
4 rows in set (0.02 sec)

mysql> select * from OSS_DATA where OSS_ID like "22" LIMIT 4;
+--------+-------+----------+------------+
| OSS_ID | TS_ID | PCT_CPU  | PCT_MEMORY |
+--------+-------+----------+------------+
|     22 |     2 |  0.62267 |    31.5018 |
|     22 |     3 |      0.1 |    31.5017 |
|     22 |     4 | 0.049975 |    31.5017 |
|     22 |     5 |     0.05 |    31.5017 |
+--------+-------+----------+------------+
4 rows in set (0.00 sec)

Any help/comments/pointers appreciated,
tia,
Peter
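The first symptom (only one lmt_ost record for oss44 although /proc/fs/lustre/obdfilter/ lists two OSTs) can be checked mechanically across all OSSes. The following is a minimal Python sketch, not part of LMT or cerebro: OstRecord, parse_line and missing_osts are hypothetical helpers. It assumes each line of cerebro-stat output looks like the oss44 line above ("node: version;host;ost;numbers..."). The names given to numeric fields 4 to 7 (inodes free/total, KB free/total) are an assumption inferred by matching the sample record against df and OST_DATA; the remaining fields are deliberately left unnamed.

```python
"""Hypothetical helper: compare `cerebro-stat -m lmt_ost` output against
the OSTs each OSS actually serves (e.g. from /proc/fs/lustre/obdfilter/)."""
from typing import Dict, List, NamedTuple, Set


class OstRecord(NamedTuple):
    version: str
    host: str
    ost_name: str
    # Assumed meanings, inferred by matching the sample record against
    # `df -t lustre` and OST_DATA; not taken from LMT documentation.
    inodes_free: int
    inodes_total: int
    kbytes_free: int
    kbytes_total: int
    # Remaining numeric fields, left unnamed on purpose.
    extra: List[int]


def parse_line(line: str) -> OstRecord:
    """Parse one 'node: v;host;ost;n1;n2;...' line printed by cerebro-stat."""
    _, sep, payload = line.partition(":")
    fields = payload.strip().split(";")
    if not sep or len(fields) < 7:
        raise ValueError(f"unrecognised lmt_ost line: {line!r}")
    version, host, ost_name = fields[:3]
    nums = [int(f) for f in fields[3:]]
    return OstRecord(version, host, ost_name, *nums[:4], extra=nums[4:])


def missing_osts(cerebro_output: str,
                 expected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Per host, list expected OST names that never appear in the output."""
    seen: Dict[str, Set[str]] = {}
    for line in cerebro_output.splitlines():
        if line.strip():  # skip blank lines
            rec = parse_line(line)
            seen.setdefault(rec.host, set()).add(rec.ost_name)
    result: Dict[str, List[str]] = {}
    for host, osts in expected.items():
        absent = [ost for ost in osts if ost not in seen.get(host, set())]
        if absent:
            result[host] = absent
    return result


if __name__ == "__main__":
    # The record and OST names below are copied from the post above.
    sample = ("oss44: 1.0;oss44;test1-OST0001;303628200;303628288;"
              "4779243244;4781815224;2147483648;2147483648")
    rec = parse_line(sample)
    print(rec.ost_name, rec.inodes_total - rec.inodes_free, rec.kbytes_total)
    print(missing_osts(sample, {"oss44": ["test1-OST0000", "test1-OST0001"]}))
```

On the oss44 record it prints `test1-OST0001 88 4781815224` and `{'oss44': ['test1-OST0000']}`. The derived 88 used inodes matches INODES_USED in OST_DATA, and the total matches the 1K-blocks column for /dev/oss44_vg/ost2, which supports the guessed field order. Feeding it the full, ungrepped cerebro-stat output together with the obdfilter listing from every OSS would show whether the missing OST is always OST0000.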