nathan@clusterfs.com
2006-Dec-15 10:38 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10866

(In reply to comment #16)
> I agree the opinion without holding a reference from the obd_device. But i don't
> understand why we need to move lprocfs_obd_cleanup() to precleanup SELF_EXP.

By the time it gets to xxx_cleanup, much of the obd has already been destroyed: all imports, all exports (including the self-export), locks, and much of the private obd data (the lov target list, etc.). Some of the proc files may be trying to read some of that destroyed data, which could again lead to failures. By moving the proc cleanup to precleanup, we ensure that any such referenced structures are still around. The SELF_EXP stage of precleanup means the self-export is still around, and the obd is still "fully set up", but with no other exports. (The export-specific proc files (in b1_5) are already correctly cleaned up when the export is destroyed, so they're safe.)
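
To make the ordering argument concrete, here is a toy userspace model, not Lustre code. Every name in it (demo_obd, demo_precleanup_self_exp, and so on) is made up for the illustration, and the real obd teardown has more stages than the two shown. It assumes only what is said above: the proc files go away in precleanup while the private data is still valid, and the private data is freed later in cleanup.

```c
/* Toy model of obd teardown ordering. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_obd {
    int *tgt_list;        /* private data that a proc file reads */
    int  tgt_count;
    int  proc_registered; /* nonzero while the proc files exist */
};

static int demo_obd_setup(struct demo_obd *obd, int ntgts)
{
    obd->tgt_list = calloc((size_t)ntgts, sizeof(*obd->tgt_list));
    if (obd->tgt_list == NULL)
        return -ENOMEM;
    for (int i = 0; i < ntgts; i++)
        obd->tgt_list[i] = i + 1;
    obd->tgt_count = ntgts;
    obd->proc_registered = 1;
    return 0;
}

/* A proc read that dereferences the private data (sums the target ids). */
static int demo_proc_read_tgts(const struct demo_obd *obd)
{
    int sum = 0;

    if (!obd->proc_registered)
        return -ENODEV;
    for (int i = 0; i < obd->tgt_count; i++)
        sum += obd->tgt_list[i];
    return sum;
}

/* Precleanup (self-export stage): the obd is still fully set up, so this
 * is where the proc files are removed. */
static void demo_precleanup_self_exp(struct demo_obd *obd)
{
    obd->proc_registered = 0;
}

/* Cleanup: private data is destroyed; no proc file may exist by now. */
static void demo_cleanup(struct demo_obd *obd)
{
    assert(!obd->proc_registered);
    free(obd->tgt_list);
    obd->tgt_list = NULL;
    obd->tgt_count = 0;
}
```

In this model a read issued after demo_precleanup_self_exp() fails with -ENODEV instead of touching tgt_list, and the assert in demo_cleanup() catches any path that frees the data while the proc file is still registered.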
nathan@clusterfs.com
2006-Dec-15 15:46 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
nathan@clusterfs.com changed:

What                   |Removed                        |Added
----------------------------------------------------------------------------
Attachment #9148 Flag  |review?(nathan@clusterfs.com)  |review-

(From update of attachment 9148)

There are tons more proc entries than just these few. Many have their own fops already; see e.g. lprocfs_obd_seq_create. If you look in the b1_5 tree, I've added a LPROC_SEQ_FOPS macro that should be usable in many more places, but the _show routines are still going to need locks. Again, I'll say that macros to get and drop the lock will be useful, rather than copying the same check code into each one.
nathan@clusterfs.com
2006-Dec-19 10:28 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
From IRC discussion, the plan is this:

1. Declare a global rw_sem to cover all proc file manipulation.
2. Write our own lproc_fops (https://bugzilla.lustre.org/attachment.cgi?id=9148) that should be used to call all of our read_proc and write_proc routines. These fops will hold a read lock, check for non-null private data, call the read/write_proc, and drop the read lock.
3. Any proc file that uses seq_operations should be mostly replaced by a macro (LPROC_SEQ_FOPS in b1_5), which will also hold the read lock between the .open and .release fops.
4. Hold a write lock during proc file removal, and set the private data to null inside that lock.

The above ensures that no proc files are read or written after lprocfs_remove has been called (generally meaning the obd cleanup method was called). However, some proc files may still reference particular elements that are destroyed earlier than obd cleanup. This can be addressed in either or both of two ways:

1. Take appropriate locks and check inside the individual proc routine.
2. Move the lprocfs_obd_cleanup call to an earlier point in the obd cleanup (e.g. precleanup stage OBD_CLEANUP_SELF_EXPORT), where this data is known to still be safe.
jxiong@clusterfs.com
2006-Dec-22 07:19 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
Created an attachment (id=9210)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9210&action=view)
Patch for b1_4

I have checked the differences between b1_4 and b1_5 and haven't found any substantial ones, so I fixed it on b1_4 first. I think it can be ported to b1_5 without any difficulty.
jxiong@clusterfs.com
2007-Jan-02 07:40 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
What                         |Removed  |Added
----------------------------------------------------------------------------
Attachment #9148 is obsolete |0        |1
Attachment #9210 is obsolete |0        |1

Created an attachment (id=9254)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9254&action=view)
Patch for b1_4
jxiong@clusterfs.com
2007-Jan-11 08:14 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
What                         |Removed  |Added
----------------------------------------------------------------------------
Attachment #9269 is obsolete |0        |1

Created an attachment (id=9320)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9320&action=view)
Patch for b1_4

The final version of the patch.
nathan@clusterfs.com
2007-Jan-11 11:57 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
What                   |Removed  |Added
----------------------------------------------------------------------------
Attachment #9320 Flag  |         |review+

(From update of attachment 9320)

Looks good. Make sure the rw26.c change is removed, and make sure sanity, recovery-small, conf-sanity, and replay-single all pass before you sign it in.
jxiong@clusterfs.com
2007-Jan-14 00:15 UTC
[Lustre-devel] [Bug 10866] proc_file_read() causes MDS crash on stop
Created an attachment (id=9339)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9339&action=view)
Patch for b1_5

Patch for b1_5.