Hello--

Sture Lygren wrote:
>
> Mail and UML-mail can both access the SAN storage area now mounted as
> /mnt/lustre. Write performance is just fine (gige interconnects, but UML
> is somewhat slower). Everything looks just perfect - _BUT_ - if I try to
> read data off lustre from the uml-client node performance is just
> lousy, degrading to a crawl and ends up crashing lustre -> LBUG. The
> attached file contains the last lines from the lustre log file.
>
> Note: This has only happened for the uml client node. Further - it seems
> to happen only while reading/copying large files from lustre. Writes are
> always fine.
>
> I realise that my setup is a very low end one, but still - it would be
> nice to have lustre running. So question is - is there some fix to let
> UMLs read from lustre without crashing it or am I on a dead end here?

We have not spent any time optimizing Lustre for use in UML, but I have a
couple of thoughts:

First, Lustre 1.0.x has pretty aggressive readahead properties -- if your
UML is only configured to have 32 or 64 MB of RAM, it could easily be
doing enough readahead to push your UML into swapping.

Second, Lustre is very heavily threaded, and makes use of dozens of kernel
threads. I could imagine that these are not being scheduled optimally in
UML, but this would probably be the same for reads and writes.

The LBUG is fixed in Lustre 1.2.x, and is the unfortunate result of a
timeout during bulk I/O. If you solve the root cause that degrades things
to a crawl, you will likely not see the LBUG again.

Hope that helps--

-Phil
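One rough way to check the first point from inside the UML guest is to watch memory and swap while a large read is running. The following is a minimal sketch, not something from the Lustre developers. It assumes the guest exposes the usual "Name: value kB" lines in /proc/meminfo, and the helper names parse_meminfo, swap_used_kb and report are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Only lines of the form "Name:   12345 kB" are kept; header lines
    whose first field is not a number are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(info):
    """Swap currently in use (SwapTotal - SwapFree), in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def report(path="/proc/meminfo"):
    """Print total memory and swap in use for the machine this runs on."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print("MemTotal:  %d kB" % info.get("MemTotal", 0))
    print("Swap used: %d kB" % swap_used_kb(info))
```

Calling report() before and during a large copy from /mnt/lustre, and seeing swap use climb, would be consistent with the readahead explanation, though it would not prove it.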
Hi,

Lustre is completely new to me, so please bear with me if my questions
are way off or just plain stupid.

First, an explanation of my setup: 3 physical machines connected by HBAs
(qla2300) to a SAN; the three machines are dedicated ldap-, mail- and
web-servers. To make some sort of cheap failover solution for the three
servers, each of them is running one uml instance (ldap has web uml, mail
has ldap uml and web has mail uml). To allow for failover I needed a
cluster-like filesystem that allowed two or more servers to access the
same LUNs on the SAN read-write - here's where lustre comes in. It looked
very promising and somewhat simpler to set up than opengfs and others.

I got lustre up and running (1.0.4 on 2.4.24 (I've also tried lustre cvs
from 20.04)) and my setup for mail failover was:

Mail-server running as MST with local MST storage (/dev/sda8)
Web-server running OST (OST1) with OBD on the SAN (/dev/sdb1)
LDAP-server running OST (OST2) with OBD on the SAN (/dev/sdb1)
LOV1 is made up of OST1 and OST2
Mail-server is configured as a client node
UML on Web-server is configured as a client node

(Other configs also tried and I'm not sure this is what I'll end up with)

The result:

Mail and UML-mail can both access the SAN storage area now mounted as
/mnt/lustre. Write performance is just fine (gige interconnects, but UML
is somewhat slower). Everything looks just perfect - _BUT_ - if I try to
read data off lustre from the uml-client node performance is just
lousy, degrading to a crawl and ends up crashing lustre -> LBUG. The
attached file contains the last lines from the lustre log file.

Note: This has only happened for the uml client node. Further - it seems
to happen only while reading/copying large files from lustre. Writes are
always fine.

I realise that my setup is a very low end one, but still - it would be
nice to have lustre running. So question is - is there some fix to let
UMLs read from lustre without crashing it or am I on a dead end here?

Appreciate your responses!

Regards,
Sture

-- 
Sture Lygren
Computer Systems Administrator
Andoya Rocket Range
Work: +4776144451 / Fax: +4776144401

[Attachment: lustre-log (decoded from base64)]

000100:020000:0:1079814576.723473:1284:573:-1403520572:(client.c:812:ptlrpc_expire_one_request())
@@@ timeout req@ababb600 x22625/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref
000100:080000:0:1079814576.723575:1508:573:-1403520572:(recover.c:327:ptlrpc_fail_import())
osc: new state: DISCON
000100:080000:0:1079814576.723589:1540:573:-1403520572:(recover.c:239:ptlrpc_handle_failed_import())
import ost2_UUID@NID_b-ldap_UUID for
OSC_mail-b1.nordlysnett.no_ost2_MNT_b-mail-
000100:080000:0:1079814576.723601:1572:573:-1403520572:(recover.c:291:ptlrpc_set_import_active())
setting import ost2_UUID INVALID
000100:080000:0:1079814576.723616:1652:573:-1403520572:(client.c:1623:ptlrpc_abort_inflight())
@@@ inflight req@ababb600 x22625/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref 1
000100:080000:0:1079814576.723630:1652:573:-1403520572:(client.c:1623:ptlrpc_abort_inflight())
@@@ inflight req@afe62400 x22626/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref 1
000100:080000:0:1079814576.723643:1652:573:-1403520572:(client.c:1623:ptlrpc_abort_inflight())
@@@ inflight req@a95f8800 x22627/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref 1
000100:080000:0:1079814576.723655:1652:573:-1403520572:(client.c:1623:ptlrpc_abort_inflight())
@@@ inflight req@a95f8400 x22628/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref 1
000100:020000:0:1079814576.724893:1572:573:-1403521116:(recover.c:100:ptlrpc_run_failed_import_upcall())
Error invoking recovery upcall /usr/lib/lustre/lustre_upcall FAILED_IMPORT o
000100:020000:0:1079814576.724922:1284:573:-1403521116:(client.c:812:ptlrpc_expire_one_request())
@@@ timeout req@afe62400 x22626/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref
000100:020000:0:1079814576.724973:1284:573:-1403521116:(client.c:812:ptlrpc_expire_one_request())
@@@ timeout req@a95f8800 x22627/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref
000100:020000:0:1079814576.725015:1284:573:-1403521116:(client.c:812:ptlrpc_expire_one_request())
@@@ timeout req@a95f8400 x22628/t0 o3->ost2_UUID@NID_b-ldap_UUID:6 lens
288/240 ref
000800:020000:0:1079814585.783453:1188:540:-1412924892:(socknal_cb.c:2534:ksocknal_find_timed_out_conn())
Timed out TX to 0xc0a8011e 1780 ababa800 192.168.1.30
000800:020000:0:1079814585.783490:1140:540:-1412924892:(socknal_cb.c:2566:ksocknal_check_peer_timeouts())
Timeout out conn->0xc0a8011e ip 192.168.1.30:988
000800:020000:0:1079814585.784817:1140:540:-1412924892:(socknal.c:1009:ksocknal_destroy_conn())
Refusing to complete a partial receive from 0xc0a8011e, ip 192.168.1.30:988
000800:020000:0:1079814585.784853:1140:540:-1412924892:(socknal.c:1011:ksocknal_destroy_conn())
This may hang communications and prevent modules from unloading
000100:020000:0:1079814586.733459:1572:573:-1403521084:(niobuf.c:538:ptlrpc_unregister_bulk())
Unexpectedly long timeout: desc aa181000
000100:040000:0:1079814586.733493:1572:573:-1403521084:(niobuf.c:539:ptlrpc_unregister_bulk())
LBUG - trying to dump log to /tmp/lustre-log
@
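
For anyone who wants to go through the attached lustre-log mechanically, here is a small sketch that base64-decodes the first two lines of the attachment, splits the resulting header line into fields, and converts the hexadecimal peer id 0xc0a8011e that appears in the log into a dotted IPv4 address. The field names (subsys, mask, cpu, stack, pid, extra) are my own guesses at what the colon-separated columns mean and are not confirmed anywhere in this thread. The function names parse_header and nid_to_ip are likewise made up for illustration.

```python
import base64
import re
import socket
import struct
from datetime import datetime, timezone

# First two base64 lines of the attachment, copied from the message.
SAMPLE = (
    "MDAwMTAwOjAyMDAwMDowOjEwNzk4MTQ1NzYuNzIzNDczOjEyODQ6NTczOi0xNDAzNTIwNTcyOihj"
    "bGllbnQuYzo4MTI6cHRscnBjX2V4cGlyZV9vbmVfcmVxdWVzdCgpKQpAQEAgdGltZW91dCByZXFA"
)

# Assumed layout of a header line; the group names are guesses.
HEADER = re.compile(
    r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>\d+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<extra>-?\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)$"
)


def parse_header(line):
    """Return a dict of header fields, or None if the line does not match."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Treat the timestamp as seconds since the Unix epoch (an assumption).
    rec["when"] = datetime.fromtimestamp(float(rec["time"]), tz=timezone.utc)
    return rec


def nid_to_ip(hex_nid):
    """Convert a 32-bit hex value such as '0xc0a8011e' to dotted-quad form."""
    return socket.inet_ntoa(struct.pack("!I", int(hex_nid, 16)))


first_line = base64.b64decode(SAMPLE).decode("ascii").splitlines()[0]
rec = parse_header(first_line)
print(rec["when"].isoformat(), rec["file"], rec["line"], rec["func"])
print(nid_to_ip("0xc0a8011e"))
```

Message lines that follow a header (the "@@@ timeout req@..." lines, for example) do not match the pattern, so parse_header returns None for them and a caller can attach them to the preceding header record.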