The following series of patches comprises the bulk of our outstanding changes for Ocfs2. Aside from the usual set of cleanups and fixes that were inappropriate for 2.6.25, there are a few highlights:

The '/sys/o2cb' directory has been moved to '/sys/fs/o2cb'. The new location meshes better with the modern sysfs layout. A symbolic link has been placed in the old location so as not to break old versions of ocfs2-tools. New versions of ocfs2-tools know to look in /sys/fs/o2cb. When an appropriate amount of time has passed (decided to be two years), we can remove the link. This change required a small patch to sysfs (entirely external to Ocfs2), which is included here with the appropriate 'Acked-by' lines.

Inode allocation in Ocfs2 has been modified to better handle an annoying corner case. When a node's local inode allocator fills up, it attempts to grow the allocator by adding an inode group. This might be impossible, though, if the main file system bitmap is too full or fragmented to provide the required space. This used to be treated as an ENOSPC condition, but with the addition of Tao's "inode stealing" patches, the allocation code will attempt to allocate from other nodes' inode allocators before returning an error.

Merging of unwritten extents has also undergone an incremental but significant improvement: extents can now be merged between leaf nodes. This ensures that the allocation btree stays as compact as possible, even if previous write patterns had caused it to fragment. Thanks again go to Tao for this improvement.

Sunil has improved our ability to debug the Ocfs2 DLM by allowing us to track DLM state via a set of debugfs files. It's now possible to get a point-in-time view of master list entries, lock resource states and more. Debugfs.ocfs2 has been patched to make this process even easier.

And finally, we have Joel's work to allow Ocfs2 to use userspace cluster stacks.
This series of patches is the last step in a multi-year process of separating the Ocfs2 file system code from the underlying cluster stack. The file system is now cluster stack agnostic. Users can choose between the "o2cb" stack, which consists of the traditional Ocfs2 cluster components (including fs/ocfs2/dlm, also referred to as "o2dlm"), or the new "user" cluster stack. The "user" cluster stack requires a userspace component to communicate node membership information to the file system via a misc device. In "user" cluster stack mode, Dave Teigland's dlm (fs/dlm) is used, as it already contains a cluster stack agnostic userspace API.

This all has several benefits. The most obvious is that we now get to share code and maintenance cost with other cluster-related projects instead of re-implementing cluster and dlm features in parallel. Additionally, Ocfs2 users can now run the cluster stack of their choice. For example, while we anticipate that some users will want to stick with o2cb for its simplicity of setup and use, many will want access to some of the advanced features (clustered volume management, hardware fencing, service failover, etc.) that are already provided by most userspace cluster stacks. These patches allow for that sort of decision to be made.

Of course, all of this is 100% backwards compatible with old versions of Ocfs2-tools. Users only need to download a new version of Ocfs2-tools if they want to take advantage of the userspace cluster stack feature. An ocfs2-tools tree with code to enable userspace cluster stacks can be found at:

http://oss.oracle.com/git/?p=ocfs2-tools.git;a=shortlog;h=stack-user

Right now, the sole stack interface implemented in the toolchain is to Red Hat's "cluster" project. In time, we anticipate that ocfs2-tools will grow support for other cluster stacks, including linux-ha. Our thanks go to the folks involved in the "cluster" project. Their help and advice were instrumental in getting this together.
Finally, my apologies for the large e-mail. There are a lot of patches here and I wanted to make sure any interested parties had a good idea of what they represent.

--Mark

Git branch with these changes:

git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

Diffstat and shortlog:

 Documentation/ABI/obsolete/o2cb            |   11 +
 Documentation/ABI/stable/o2cb              |   10 +
 Documentation/ABI/testing/sysfs-ocfs2      |   89 +++
 Documentation/feature-removal-schedule.txt |   10 +
 MAINTAINERS                                |    1 +
 fs/Kconfig                                 |   26 +
 fs/ocfs2/Makefile                          |   14 +-
 fs/ocfs2/alloc.c                           |  465 +++++++++++++--
 fs/ocfs2/aops.c                            |    6 +-
 fs/ocfs2/cluster/sys.c                     |    9 +
 fs/ocfs2/cluster/tcp.c                     |   96 ++--
 fs/ocfs2/cluster/tcp_internal.h            |    2 +
 fs/ocfs2/dlm/dlmcommon.h                   |   49 ++
 fs/ocfs2/dlm/dlmdebug.c                    |  911 +++++++++++++++++++++++++---
 fs/ocfs2/dlm/dlmdebug.h                    |   86 +++
 fs/ocfs2/dlm/dlmdomain.c                   |   70 ++-
 fs/ocfs2/dlm/dlmlock.c                     |   22 +-
 fs/ocfs2/dlm/dlmmaster.c                   |  200 ++-----
 fs/ocfs2/dlmglue.c                         |  645 ++++++++++++--------
 fs/ocfs2/dlmglue.h                         |    5 +-
 fs/ocfs2/file.c                            |    4 +-
 fs/ocfs2/heartbeat.c                       |  184 +------
 fs/ocfs2/heartbeat.h                       |   17 +-
 fs/ocfs2/ioctl.c                           |   13 +-
 fs/ocfs2/ioctl.h                           |    3 +-
 fs/ocfs2/journal.c                         |  211 ++++++-
 fs/ocfs2/journal.h                         |    4 +
 fs/ocfs2/localalloc.c                      |    4 +
 fs/ocfs2/namei.c                           |    4 +-
 fs/ocfs2/ocfs2.h                           |   77 ++-
 fs/ocfs2/ocfs2_fs.h                        |   79 +++-
 fs/ocfs2/ocfs2_lockid.h                    |    2 +-
 fs/ocfs2/slot_map.c                        |  454 +++++++++++---
 fs/ocfs2/slot_map.h                        |   32 +-
 fs/ocfs2/stack_o2cb.c                      |  420 +++++++++++++
 fs/ocfs2/stack_user.c                      |  883 +++++++++++++++++++++++++++
 fs/ocfs2/stackglue.c                       |  568 +++++++++++++++++
 fs/ocfs2/stackglue.h                       |  261 ++++++++
 fs/ocfs2/suballoc.c                        |  103 +++-
 fs/ocfs2/suballoc.h                        |    1 +
 fs/ocfs2/super.c                           |  208 ++++---
 fs/sysfs/symlink.c                         |    9 +-
 42 files changed, 5230 insertions(+), 1038 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/o2cb
 create mode 100644 Documentation/ABI/stable/o2cb
 create mode 100644 Documentation/ABI/testing/sysfs-ocfs2
 create mode 100644 fs/ocfs2/dlm/dlmdebug.h
 create mode 100644 fs/ocfs2/stack_o2cb.c
 create mode 100644 fs/ocfs2/stack_user.c
 create mode 100644 fs/ocfs2/stackglue.c
 create mode 100644 fs/ocfs2/stackglue.h

Andi Kleen (1):
      ocfs2: Convert ocfs2 over to unlocked_ioctl

David Teigland (2):
      ocfs2: handle async EAGAIN from NOQUEUE request
      ocfs2: add fsdlm to stackglue

Jan Kara (1):
      ocfs2: Improve rename locking

Jeff Mahoney (1):
      ocfs2/cluster: Get rid of arguments to the timeout routines

Joel Becker (33):
      ocfs2: Make ocfs2_slot_info private.
      ocfs2: Change the recovery map to an array of node numbers.
      ocfs2: slot_map I/O based on max_slots.
      ocfs2: De-magic the in-memory slot map.
      ocfs2: Define the contents of the slot_map file.
      ocfs2: New slot map format
      ocfs2: Separate out dlm lock functions.
      ocfs2: Use global DLM_ constants in generic code.
      ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.
      ocfs2: Create the lock status block union.
      ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.
      ocfs2: Abstract out node number queries.
      ocfs2: Move o2hb functionality into the stack glue.
      ocfs2: Remove CANCELGRANT from the view of dlmglue.
      ocfs2: Abstract out a debugging function for underlying dlms.
      ocfs2: Clean up stackglue initialization
      ocfs2: Split o2cb code from generic stack functions.
      ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.
      ocfs2: Break out stackglue into modules.
      ocfs2: Create stack glue sysfs files.
      ocfs2: Add the USERSPACE_STACK incompat bit.
      ocfs2: Add the 'cluster_stack' sysfs file.
      ocfs2: Add the user stack module.
      ocfs2: Add the ocfs2_control misc device.
      ocfs2: Start the ocfs2_control handshake.
      ocfs2: Introduce the DOWN message to ocfs2_control
      ocfs2: Add the local node id to the handshake.
      ocfs2: Add the 'set version' message to the ocfs2_control device.
      ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
      ocfs2: Add kbuild for ocfs2_stack_user.ko
      ocfs2: Allow selection of cluster plug-ins.
      ocfs2: Document /sys/fs/ocfs2
      ocfs2: Put tree in MAINTAINERS

Julia Lawall (2):
      fs/ocfs2/aops.c: test for IS_ERR rather than 0
      ocfs2: Use BUG_ON

Mark Fasheh (4):
      ocfs2: Move slot map access into slot_map.c
      ocfs2: Fill node number during cluster stack init
      sysfs: Allow removal of symlinks in the sysfs root
      ocfs2: Move /sys/o2cb to /sys/fs/o2cb

Sunil Mushran (12):
      ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle
      ocfs2/dlm: Create slabcaches for lock and lockres
      ocfs2/dlm: Link all lockres' to a tracking list
      ocfs2/dlm: Create debugfs dirs
      ocfs2/dlm: Dump the dlm state in a debugfs file
      ocfs2/dlm: Dumps the lockres' into a debugfs file
      ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
      ocfs2/dlm: Dumps the mles into a debugfs file
      ocfs2/dlm: Dumps the purgelist into a debugfs file
      ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
      ocfs2/dlm: Fix lockname in lockres print function
      ocfs2/dlm: Cleanup lockres print

Tao Ma (6):
      ocfs2: Reconnect after idle time out.
      ocfs2: Add support for cross extent block
      ocfs2: Enable cross extent block merge.
      ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
      ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
      ocfs2: Add inode stealing for ocfs2_reserve_new_inode
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 01/62] ocfs2: Move slot map access into slot_map.c
journal.c and dlmglue.c would refresh the slot map by hand. Instead, have the update and clear functions do the work inside slot_map.c. The eventual result is to make ocfs2_slot_info defined privately in slot_map.c.

Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/dlmglue.c  |    8 +-----
 fs/ocfs2/journal.c  |    3 +-
 fs/ocfs2/slot_map.c |   62 +++++++++++++++++++++++++++++++++++++++-----------
 fs/ocfs2/slot_map.h |   11 +++-----
 fs/ocfs2/super.c    |    3 +-
 5 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 1f1873b..1a80fa9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2132,8 +2132,6 @@ int ocfs2_super_lock(struct ocfs2_super *osb,
 	int status = 0;
 	int level = ex ? LKM_EXMODE : LKM_PRMODE;
 	struct ocfs2_lock_res *lockres = &osb->osb_super_lockres;
-	struct buffer_head *bh;
-	struct ocfs2_slot_info *si = osb->slot_info;

 	mlog_entry_void();

@@ -2159,11 +2157,7 @@ int ocfs2_super_lock(struct ocfs2_super *osb,
 		goto bail;
 	}
 	if (status) {
-		bh = si->si_bh;
-		status = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0,
-					  si->si_inode);
-		if (status == 0)
-			ocfs2_update_slot_info(si);
+		status = ocfs2_refresh_slot_info(osb);

 		ocfs2_complete_lock_res_refresh(lockres, status);

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index f31c7e8..c2e654e 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1123,8 +1123,7 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,

 	/* Likewise, this would be a strange but ultimately not so
 	 * harmful place to get an error... */
-	ocfs2_clear_slot(si, slot_num);
-	status = ocfs2_update_disk_slots(osb, si);
+	status = ocfs2_clear_slot(osb, slot_num);
 	if (status < 0)
 		mlog_errno(status);

diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 3a50ce5..f5727b8 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -49,7 +49,7 @@ static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 node_num);

 /* post the slot information on disk into our slot_info struct. */
-void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
+static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 {
 	int i;
 	__le16 *disk_info;
@@ -65,10 +65,27 @@ void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 	spin_unlock(&si->si_lock);
 }

+int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
+{
+	int ret;
+	struct ocfs2_slot_info *si = osb->slot_info;
+	struct buffer_head *bh;
+
+	if (si == NULL)
+		return 0;
+
+	bh = si->si_bh;
+	ret = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, si->si_inode);
+	if (ret == 0)
+		ocfs2_update_slot_info(si);
+
+	return ret;
+}
+
 /* post the our slot info stuff into it's destination bh and write it
  * out. */
-int ocfs2_update_disk_slots(struct ocfs2_super *osb,
-			    struct ocfs2_slot_info *si)
+static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
+				   struct ocfs2_slot_info *si)
 {
 	int status, i;
 	__le16 *disk_info = (__le16 *) si->si_bh->b_data;
@@ -135,6 +152,19 @@ s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 	return ret;
 }

+static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
+{
+	if (si == NULL)
+		return;
+
+	if (si->si_inode)
+		iput(si->si_inode);
+	if (si->si_bh)
+		brelse(si->si_bh);
+
+	kfree(si);
+}
+
 static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 slot_num,
 			      s16 node_num)
@@ -147,12 +177,18 @@ static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 	si->si_global_node_nums[slot_num] = node_num;
 }

-void ocfs2_clear_slot(struct ocfs2_slot_info *si,
-		      s16 slot_num)
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 {
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	if (si == NULL)
+		return 0;
+
 	spin_lock(&si->si_lock);
 	__ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT);
 	spin_unlock(&si->si_lock);
+
+	return ocfs2_update_disk_slots(osb, osb->slot_info);
 }

 int ocfs2_init_slot_info(struct ocfs2_super *osb)
@@ -202,18 +238,17 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 	osb->slot_info = si;
 bail:
 	if (status < 0 && si)
-		ocfs2_free_slot_info(si);
+		__ocfs2_free_slot_info(si);

 	return status;
 }

-void ocfs2_free_slot_info(struct ocfs2_slot_info *si)
+void ocfs2_free_slot_info(struct ocfs2_super *osb)
 {
-	if (si->si_inode)
-		iput(si->si_inode);
-	if (si->si_bh)
-		brelse(si->si_bh);
-	kfree(si);
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	osb->slot_info = NULL;
+	__ocfs2_free_slot_info(si);
 }

 int ocfs2_find_slot(struct ocfs2_super *osb)
@@ -285,7 +320,6 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	}

 bail:
-	osb->slot_info = NULL;
-	ocfs2_free_slot_info(si);
+	ocfs2_free_slot_info(osb);
 }

diff --git a/fs/ocfs2/slot_map.h b/fs/ocfs2/slot_map.h
index 1025872..b029ffd 100644
--- a/fs/ocfs2/slot_map.h
+++ b/fs/ocfs2/slot_map.h
@@ -30,7 +30,7 @@
 struct ocfs2_slot_info {
 	spinlock_t si_lock;

-	struct inode *si_inode;
+	struct inode *si_inode;
 	struct buffer_head *si_bh;
 	unsigned int si_num_slots;
 	unsigned int si_size;
@@ -38,19 +38,16 @@ struct ocfs2_slot_info {
 };

 int ocfs2_init_slot_info(struct ocfs2_super *osb);
-void ocfs2_free_slot_info(struct ocfs2_slot_info *si);
+void ocfs2_free_slot_info(struct ocfs2_super *osb);

 int ocfs2_find_slot(struct ocfs2_super *osb);
 void ocfs2_put_slot(struct ocfs2_super *osb);

-void ocfs2_update_slot_info(struct ocfs2_slot_info *si);
-int ocfs2_update_disk_slots(struct ocfs2_super *osb,
-			    struct ocfs2_slot_info *si);
+int ocfs2_refresh_slot_info(struct ocfs2_super *osb);

 s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 			   s16 global);
-void ocfs2_clear_slot(struct ocfs2_slot_info *si,
-		      s16 slot_num);
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);

 static inline int ocfs2_is_empty_slot(struct ocfs2_slot_info *si,
 				      int slot_num)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bec75af..fad37af 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1724,8 +1724,7 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)

 	/* This function assumes that the caller has the main osb resource */

-	if (osb->slot_info)
-		ocfs2_free_slot_info(osb->slot_info);
+	ocfs2_free_slot_info(osb);

 	kfree(osb->osb_orphan_wipes);
 	/* FIXME
--
1.5.4.1
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 02/62] ocfs2: Make ocfs2_slot_info private.
From: Joel Becker <joel.becker at oracle.com>

Just use osb_lock around the ocfs2_slot_info data. This allows us to make the ocfs2_slot_info structure private in slot_map.c. All access is now via accessors.

Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/journal.c  |   24 +++++++-------
 fs/ocfs2/ocfs2.h    |    1 +
 fs/ocfs2/slot_map.c |   81 ++++++++++++++++++++++++++++++++++++---------------
 fs/ocfs2/slot_map.h |   25 ++-------------
 4 files changed, 74 insertions(+), 57 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index c2e654e..ed0c6d0 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1079,7 +1079,6 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 {
 	int status = 0;
 	int slot_num;
-	struct ocfs2_slot_info *si = osb->slot_info;
 	struct ocfs2_dinode *la_copy = NULL;
 	struct ocfs2_dinode *tl_copy = NULL;

@@ -1092,8 +1091,8 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 	 * case we should've called ocfs2_journal_load instead. */
 	BUG_ON(osb->node_num == node_num);

-	slot_num = ocfs2_node_num_to_slot(si, node_num);
-	if (slot_num == OCFS2_INVALID_SLOT) {
+	slot_num = ocfs2_node_num_to_slot(osb, node_num);
+	if (slot_num == -ENOENT) {
 		status = 0;
 		mlog(0, "no slot for this node, so no recovery required.\n");
 		goto done;
@@ -1183,23 +1182,24 @@ bail:
  * slot info struct has been updated from disk. */
 int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 {
-	int status, i, node_num;
-	struct ocfs2_slot_info *si = osb->slot_info;
+	unsigned int node_num;
+	int status, i;

 	/* This is called with the super block cluster lock, so we
 	 * know that the slot map can't change underneath us. */

-	spin_lock(&si->si_lock);
-	for(i = 0; i < si->si_num_slots; i++) {
+	spin_lock(&osb->osb_lock);
+	for (i = 0; i < osb->max_slots; i++) {
 		if (i == osb->slot_num)
 			continue;
-		if (ocfs2_is_empty_slot(si, i))
+
+		status = ocfs2_slot_to_node_num_locked(osb, i, &node_num);
+		if (status == -ENOENT)
 			continue;

-		node_num = si->si_global_node_nums[i];
 		if (ocfs2_node_map_test_bit(osb, &osb->recovery_map, node_num))
 			continue;
-		spin_unlock(&si->si_lock);
+		spin_unlock(&osb->osb_lock);

 		/* Ok, we have a slot occupied by another node which
 		 * is not in the recovery map. We trylock his journal
@@ -1215,9 +1215,9 @@ int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 			goto bail;
 		}

-		spin_lock(&si->si_lock);
+		spin_lock(&osb->osb_lock);
 	}
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);

 	status = 0;
 bail:
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6546cef..ee3f675 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -179,6 +179,7 @@ enum ocfs2_mount_options
 #define OCFS2_DEFAULT_ATIME_QUANTUM	60

 struct ocfs2_journal;
+struct ocfs2_slot_info;
 struct ocfs2_super {
 	struct task_struct *commit_task;
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index f5727b8..762360d 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -42,13 +42,25 @@

 #include "buffer_head_io.h"

+struct ocfs2_slot_info {
+	struct inode *si_inode;
+	struct buffer_head *si_bh;
+	unsigned int si_num_slots;
+	unsigned int si_size;
+	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
+};
+
+
 static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 				    s16 global);
 static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 slot_num,
 			      s16 node_num);

-/* post the slot information on disk into our slot_info struct. */
+/*
+ * Post the slot information on disk into our slot_info struct.
+ * Must be protected by osb_lock.
+ */
 static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 {
 	int i;
@@ -56,13 +68,10 @@ static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)

 	/* we don't read the slot block here as ocfs2_super_lock
 	 * should've made sure we have the most recent copy. */
-	spin_lock(&si->si_lock);
 	disk_info = (__le16 *) si->si_bh->b_data;

 	for (i = 0; i < si->si_size; i++)
 		si->si_global_node_nums[i] = le16_to_cpu(disk_info[i]);
-
-	spin_unlock(&si->si_lock);
 }

 int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
@@ -76,8 +85,11 @@ int ocfs2_refresh_slot_info(struct ocfs2_super *osb)

 	bh = si->si_bh;
 	ret = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, si->si_inode);
-	if (ret == 0)
+	if (ret == 0) {
+		spin_lock(&osb->osb_lock);
 		ocfs2_update_slot_info(si);
+		spin_unlock(&osb->osb_lock);
+	}

 	return ret;
 }
@@ -90,10 +102,10 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 	int status, i;
 	__le16 *disk_info = (__le16 *) si->si_bh->b_data;

-	spin_lock(&si->si_lock);
+	spin_lock(&osb->osb_lock);
 	for (i = 0; i < si->si_size; i++)
 		disk_info[i] = cpu_to_le16(si->si_global_node_nums[i]);
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);

 	status = ocfs2_write_block(osb, si->si_bh, si->si_inode);
 	if (status < 0)
@@ -119,7 +131,8 @@ static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 	return ret;
 }

-static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si, s16 preferred)
+static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
+				   s16 preferred)
 {
 	int i;
 	s16 ret = OCFS2_INVALID_SLOT;
@@ -141,15 +154,36 @@ out:
 	return ret;
 }

-s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-			   s16 global)
+int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
 {
-	s16 ret;
+	s16 slot;
+	struct ocfs2_slot_info *si = osb->slot_info;

-	spin_lock(&si->si_lock);
-	ret = __ocfs2_node_num_to_slot(si, global);
-	spin_unlock(&si->si_lock);
-	return ret;
+	spin_lock(&osb->osb_lock);
+	slot = __ocfs2_node_num_to_slot(si, node_num);
+	spin_unlock(&osb->osb_lock);
+
+	if (slot == OCFS2_INVALID_SLOT)
+		return -ENOENT;
+
+	return slot;
+}
+
+int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
+				  unsigned int *node_num)
+{
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	assert_spin_locked(&osb->osb_lock);
+
+	BUG_ON(slot_num < 0);
+	BUG_ON(slot_num > osb->max_slots);
+
+	if (si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT)
+		return -ENOENT;
+
+	*node_num = si->si_global_node_nums[slot_num];
+	return 0;
 }

 static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
@@ -184,9 +218,9 @@ int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 	if (si == NULL)
 		return 0;

-	spin_lock(&si->si_lock);
+	spin_lock(&osb->osb_lock);
 	__ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT);
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);

 	return ocfs2_update_disk_slots(osb, osb->slot_info);
 }
@@ -206,7 +240,6 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 		goto bail;
 	}

-	spin_lock_init(&si->si_lock);
 	si->si_num_slots = osb->max_slots;
 	si->si_size = OCFS2_MAX_SLOTS;

@@ -235,7 +268,7 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 	si->si_inode = inode;
 	si->si_bh = bh;
-	osb->slot_info = si;
+	osb->slot_info = (struct ocfs2_slot_info *)si;
 bail:
 	if (status < 0 && si)
 		__ocfs2_free_slot_info(si);
@@ -261,9 +294,9 @@ int ocfs2_find_slot(struct ocfs2_super *osb)

 	si = osb->slot_info;

+	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);

-	spin_lock(&si->si_lock);
 	/* search for ourselves first and take the slot if it already
 	 * exists. Perhaps we need to mark this in a variable for our
 	 * own journal recovery? Possibly not, though we certainly
@@ -274,7 +307,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 	 * one. */
 	slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
 	if (slot == OCFS2_INVALID_SLOT) {
-		spin_unlock(&si->si_lock);
+		spin_unlock(&osb->osb_lock);
 		mlog(ML_ERROR, "no free slots available!\n");
 		status = -EINVAL;
 		goto bail;
@@ -285,7 +318,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 	__ocfs2_fill_slot(si, slot, osb->node_num);
 	osb->slot_num = slot;
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);

 	mlog(0, "taking node slot %d\n", osb->slot_num);

@@ -306,12 +339,12 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	if (!si)
 		return;

+	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);

-	spin_lock(&si->si_lock);
 	__ocfs2_fill_slot(si, osb->slot_num, OCFS2_INVALID_SLOT);
 	osb->slot_num = OCFS2_INVALID_SLOT;
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);

 	status = ocfs2_update_disk_slots(osb, si);
 	if (status < 0) {
diff --git a/fs/ocfs2/slot_map.h b/fs/ocfs2/slot_map.h
index b029ffd..5118e89 100644
--- a/fs/ocfs2/slot_map.h
+++ b/fs/ocfs2/slot_map.h
@@ -27,16 +27,6 @@
 #ifndef SLOTMAP_H
 #define SLOTMAP_H

-struct ocfs2_slot_info {
-	spinlock_t si_lock;
-
-	struct inode *si_inode;
-	struct buffer_head *si_bh;
-	unsigned int si_num_slots;
-	unsigned int si_size;
-	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
-};
-
 int ocfs2_init_slot_info(struct ocfs2_super *osb);
 void ocfs2_free_slot_info(struct ocfs2_super *osb);

@@ -45,17 +35,10 @@ void ocfs2_put_slot(struct ocfs2_super *osb);

 int ocfs2_refresh_slot_info(struct ocfs2_super *osb);

-s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-			   s16 global);
-int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);
+int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num);
+int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
+				  unsigned int *node_num);

-static inline int ocfs2_is_empty_slot(struct ocfs2_slot_info *si,
-				      int slot_num)
-{
-	BUG_ON(slot_num == OCFS2_INVALID_SLOT);
-	assert_spin_locked(&si->si_lock);
-
-	return si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT;
-}
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);

 #endif
--
1.5.4.1
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 34/62] ocfs2: Add kbuild for ocfs2_stack_user.ko
From: Joel Becker <joel.becker at oracle.com>

Add ocfs2_stack_user.ko to the Makefile so that it builds.

Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/Makefile |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index b734254..b8d6d02 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -2,7 +2,11 @@ EXTRA_CFLAGS += -Ifs/ocfs2

 EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES

-obj-$(CONFIG_OCFS2_FS) += ocfs2.o ocfs2_stackglue.o ocfs2_stack_o2cb.o
+obj-$(CONFIG_OCFS2_FS) += \
+	ocfs2.o \
+	ocfs2_stackglue.o \
+	ocfs2_stack_o2cb.o \
+	ocfs2_stack_user.o

 ocfs2-objs := \
 	alloc.o \
@@ -33,6 +37,7 @@ ocfs2-objs := \

 ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
+ocfs2_stack_user-objs := stack_user.o

 obj-$(CONFIG_OCFS2_FS) += cluster/
 obj-$(CONFIG_OCFS2_FS) += dlm/
--
1.5.4.1
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 35/62] ocfs2: Allow selection of cluster plug-ins.
From: Joel Becker <joel.becker at oracle.com>

ocfs2 now supports plug-ins for the classic O2CB stack as well as userspace cluster stacks in conjunction with fs/dlm. This patch allows zero, one, or both of the plug-ins to be selected in Kconfig. For local mounts (non-clustered), neither plug-in is needed. Both plug-ins can be loaded at one time; the runtime will select the one needed for the cluster system in use.

Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/Kconfig        |   26 ++++++++++++++++++++++++++
 fs/ocfs2/Makefile |   10 ++++++----
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index d731282..c7b50ce 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -444,6 +444,32 @@ config OCFS2_FS
 	  For more information on OCFS2, see the file
 	  <file:Documentation/filesystems/ocfs2.txt>.

+config OCFS2_FS_O2CB
+	tristate "O2CB Kernelspace Clustering"
+	depends on OCFS2_FS
+	default y
+	help
+	  OCFS2 includes a simple kernelspace clustering package, the OCFS2
+	  Cluster Base. It only requires a very small userspace component
+	  to configure it. This comes with the standard ocfs2-tools package.
+	  O2CB is limited to maintaining a cluster for OCFS2 file systems.
+	  It cannot manage any other cluster applications.
+
+	  It is always safe to say Y here, as the clustering method is
+	  run-time selectable.
+
+config OCFS2_FS_USERSPACE_CLUSTER
+	tristate "OCFS2 Userspace Clustering"
+	depends on OCFS2_FS && DLM
+	default y
+	help
+	  This option will allow OCFS2 to use userspace clustering services
+	  in conjunction with the DLM in fs/dlm. If you are using a
+	  userspace cluster manager, say Y here.
+
+	  It is safe to say Y, as the clustering method is run-time
+	  selectable.
+
 config OCFS2_DEBUG_MASKLOG
 	bool "OCFS2 logging support"
 	depends on OCFS2_FS
diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index b8d6d02..f6956de 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -4,9 +4,10 @@ EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES

 obj-$(CONFIG_OCFS2_FS) += \
 	ocfs2.o \
-	ocfs2_stackglue.o \
-	ocfs2_stack_o2cb.o \
-	ocfs2_stack_user.o
+	ocfs2_stackglue.o
+
+obj-$(CONFIG_OCFS2_FS_O2CB) += ocfs2_stack_o2cb.o
+obj-$(CONFIG_OCFS2_FS_USERSPACE_CLUSTER) += ocfs2_stack_user.o

 ocfs2-objs := \
 	alloc.o \
@@ -39,5 +40,6 @@ ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
 ocfs2_stack_user-objs := stack_user.o

+# cluster/ is always needed when OCFS2_FS for masklog support
 obj-$(CONFIG_OCFS2_FS) += cluster/
-obj-$(CONFIG_OCFS2_FS) += dlm/
+obj-$(CONFIG_OCFS2_FS_O2CB) += dlm/
--
1.5.4.1
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 41/62] ocfs2/dlm: Dump the dlm state in a debugfs file
From: Sunil Mushran <sunil.mushran at oracle.com>

This patch dumps the dlm state (dlm_ctxt) into a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>
Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |    1 +
 fs/ocfs2/dlm/dlmdebug.c  |  296 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdebug.h  |   20 +++
 fs/ocfs2/dlm/dlmdomain.c |    8 ++
 4 files changed, 325 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 6a49140..f7a51ca 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -123,6 +123,7 @@ struct dlm_ctxt
 	atomic_t remote_resources;
 	atomic_t unknown_resources;

+	struct dlm_debug_ctxt *dlm_debug_ctxt;
 	struct dentry *dlm_debugfs_subroot;

 	/* NOTE: Next three are protected by dlm_domain_lock */
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 62e2a4c..e335403 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -274,6 +274,294 @@ EXPORT_SYMBOL_GPL(dlm_errname);
 static struct dentry *dlm_debugfs_root = NULL;

 #define DLM_DEBUGFS_DIR "o2dlm"
+#define DLM_DEBUGFS_DLM_STATE "dlm_state"
+
+/* begin - utils funcs */
+static void dlm_debug_free(struct kref *kref)
+{
+	struct dlm_debug_ctxt *dc;
+
+	dc = container_of(kref, struct dlm_debug_ctxt, debug_refcnt);
+
+	kfree(dc);
+}
+
+void dlm_debug_put(struct dlm_debug_ctxt *dc)
+{
+	if (dc)
+		kref_put(&dc->debug_refcnt, dlm_debug_free);
+}
+
+static void dlm_debug_get(struct dlm_debug_ctxt *dc)
+{
+	kref_get(&dc->debug_refcnt);
+}
+
+static int stringify_nodemap(unsigned long *nodemap, int maxnodes,
+			     char *buf, int len)
+{
+	int out = 0;
+	int i = -1;
+
+	while ((i = find_next_bit(nodemap, maxnodes, i + 1)) < maxnodes)
+		out += snprintf(buf + out, len - out, "%d ", i);
+
+	return out;
+}
+
+static struct debug_buffer *debug_buffer_allocate(void)
+{
+	struct debug_buffer *db = NULL;
+
+	db = kzalloc(sizeof(struct debug_buffer), GFP_KERNEL);
+	if (!db)
+		goto bail;
+
+	db->len = PAGE_SIZE;
+	db->buf = kmalloc(db->len, GFP_KERNEL);
+	if (!db->buf)
+		goto bail;
+
+	return db;
+bail:
+	kfree(db);
+	return NULL;
+}
+
+static ssize_t debug_buffer_read(struct file *file, char __user *buf,
+				 size_t nbytes, loff_t *ppos)
+{
+	struct debug_buffer *db = file->private_data;
+
+	return simple_read_from_buffer(buf, nbytes, ppos, db->buf, db->len);
+}
+
+static loff_t debug_buffer_llseek(struct file *file, loff_t off, int whence)
+{
+	struct debug_buffer *db = file->private_data;
+	loff_t new = -1;
+
+	switch (whence) {
+	case 0:
+		new = off;
+		break;
+	case 1:
+		new = file->f_pos + off;
+		break;
+	}
+
+	if (new < 0 || new > db->len)
+		return -EINVAL;
+
+	return (file->f_pos = new);
+}
+
+static int debug_buffer_release(struct inode *inode, struct file *file)
+{
+	struct debug_buffer *db = (struct debug_buffer *)file->private_data;
+
+	if (db)
+		kfree(db->buf);
+	kfree(db);
+
+	return 0;
+}
+/* end - util funcs */
+
+/* begin - debug state funcs */
+static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
+{
+	int out = 0;
+	struct dlm_reco_node_data *node;
+	char *state;
+	int lres, rres, ures, tres;
+
+	lres = atomic_read(&dlm->local_resources);
+	rres = atomic_read(&dlm->remote_resources);
+	ures = atomic_read(&dlm->unknown_resources);
+	tres = lres + rres + ures;
+
+	spin_lock(&dlm->spinlock);
+
+	switch (dlm->dlm_state) {
+	case DLM_CTXT_NEW:
+		state = "NEW"; break;
+	case DLM_CTXT_JOINED:
+		state = "JOINED"; break;
+	case DLM_CTXT_IN_SHUTDOWN:
+		state = "SHUTDOWN"; break;
+	case DLM_CTXT_LEAVING:
+		state = "LEAVING"; break;
+	default:
+		state = "UNKNOWN"; break;
+	}
+
+	/* Domain: xxxxxxxxxx Key: 0xdfbac769 */
+	out += snprintf(db->buf + out, db->len - out,
+			"Domain: %s Key: 0x%08x\n", dlm->name, dlm->key);
+
+	/* Thread Pid: xxx Node: xxx State: xxxxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Thread Pid: %d Node: %d State: %s\n",
+			dlm->dlm_thread_task->pid, dlm->node_num, state);
+
+	/* Number of Joins: xxx Joining Node: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Number of Joins: %d Joining Node: %d\n",
+			dlm->num_joins, dlm->joining_node);
+
+	/* Domain Map: xx xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Domain Map: ");
+	out += stringify_nodemap(dlm->domain_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Live Map: xx xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Live Map: ");
+	out += stringify_nodemap(dlm->live_nodes_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Mastered Resources Total: xxx Locally: xxx Remotely: ... */
+	out += snprintf(db->buf + out, db->len - out,
+			"Mastered Resources Total: %d Locally: %d "
+			"Remotely: %d Unknown: %d\n",
+			tres, lres, rres, ures);
+
+	/* Lists: Dirty=Empty Purge=InUse PendingASTs=Empty ... */
+	out += snprintf(db->buf + out, db->len - out,
+			"Lists: Dirty=%s Purge=%s PendingASTs=%s "
+			"PendingBASTs=%s Master=%s\n",
+			(list_empty(&dlm->dirty_list) ? "Empty" : "InUse"),
+			(list_empty(&dlm->purge_list) ? "Empty" : "InUse"),
+			(list_empty(&dlm->pending_asts) ? "Empty" : "InUse"),
+			(list_empty(&dlm->pending_basts) ? "Empty" : "InUse"),
+			(list_empty(&dlm->master_list) ? "Empty" : "InUse"));
+
+	/* Purge Count: xxx Refs: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Purge Count: %d Refs: %d\n", dlm->purge_count,
+			atomic_read(&dlm->dlm_refs.refcount));
+
+	/* Dead Node: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Dead Node: %d\n", dlm->reco.dead_node);
+
+	/* What about DLM_RECO_STATE_FINALIZE? */
+	if (dlm->reco.state == DLM_RECO_STATE_ACTIVE)
+		state = "ACTIVE";
+	else
+		state = "INACTIVE";
+
+	/* Recovery Pid: xxxx Master: xxx State: xxxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Recovery Pid: %d Master: %d State: %s\n",
+			dlm->dlm_reco_thread_task->pid,
+			dlm->reco.new_master, state);
+
+	/* Recovery Map: xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Recovery Map: ");
+	out += stringify_nodemap(dlm->recovery_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Recovery Node State: */
+	out += snprintf(db->buf + out, db->len - out, "Recovery Node State:\n");
+	list_for_each_entry(node, &dlm->reco.node_data, list) {
+		switch (node->state) {
+		case DLM_RECO_NODE_DATA_INIT:
+			state = "INIT";
+			break;
+		case DLM_RECO_NODE_DATA_REQUESTING:
+			state = "REQUESTING";
+			break;
+		case DLM_RECO_NODE_DATA_DEAD:
+			state = "DEAD";
+			break;
+		case DLM_RECO_NODE_DATA_RECEIVING:
+			state = "RECEIVING";
+			break;
+		case DLM_RECO_NODE_DATA_REQUESTED:
+			state = "REQUESTED";
+			break;
+		case DLM_RECO_NODE_DATA_DONE:
+			state = "DONE";
+			break;
+		case DLM_RECO_NODE_DATA_FINALIZE_SENT:
+			state = "FINALIZE-SENT";
+			break;
+		default:
+			state = "BAD";
+			break;
+		}
+		out += snprintf(db->buf + out, db->len - out, "\t%u - %s\n",
+				node->node_num, state);
+	}
+
+	spin_unlock(&dlm->spinlock);
+
+	return out;
+}
+
+static int debug_state_open(struct inode *inode, struct file *file)
+{
+	struct dlm_ctxt *dlm = inode->i_private;
+	struct debug_buffer *db = NULL;
+
+	db = debug_buffer_allocate();
+	if (!db)
+		goto bail;
+
+	db->len = debug_state_print(dlm, db);
+
+	file->private_data = db;
+
+	return 0;
+bail:
+	return -ENOMEM;
+}
+
+static struct file_operations debug_state_fops = {
+	.open = debug_state_open,
+	.release = debug_buffer_release,
+	.read = debug_buffer_read,
+	.llseek = debug_buffer_llseek,
+};
+/* end - debug state funcs */
+
+/* files in subroot */
+int dlm_debug_init(struct dlm_ctxt
*dlm) +{ + struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt; + + /* for dumping dlm_ctxt */ + dc->debug_state_dentry = debugfs_create_file(DLM_DEBUGFS_DLM_STATE, + S_IFREG|S_IRUSR, + dlm->dlm_debugfs_subroot, + dlm, &debug_state_fops); + if (!dc->debug_state_dentry) { + mlog_errno(-ENOMEM); + goto bail; + } + + dlm_debug_get(dc); + return 0; + +bail: + dlm_debug_shutdown(dlm); + return -ENOMEM; +} + +void dlm_debug_shutdown(struct dlm_ctxt *dlm) +{ + struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt; + + if (dc) { + if (dc->debug_state_dentry) + debugfs_remove(dc->debug_state_dentry); + dlm_debug_put(dc); + } +} /* subroot - domain dir */ int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm) @@ -285,6 +573,14 @@ int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm) goto bail; } + dlm->dlm_debug_ctxt = kzalloc(sizeof(struct dlm_debug_ctxt), + GFP_KERNEL); + if (!dlm->dlm_debug_ctxt) { + mlog_errno(-ENOMEM); + goto bail; + } + kref_init(&dlm->dlm_debug_ctxt->debug_refcnt); + return 0; bail: dlm_destroy_debugfs_subroot(dlm); diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h index b969595..94cc10a 100644 --- a/fs/ocfs2/dlm/dlmdebug.h +++ b/fs/ocfs2/dlm/dlmdebug.h @@ -27,6 +27,19 @@ #ifdef CONFIG_DEBUG_FS +struct dlm_debug_ctxt { + struct kref debug_refcnt; + struct dentry *debug_state_dentry; +}; + +struct debug_buffer { + int len; + char *buf; +}; + +int dlm_debug_init(struct dlm_ctxt *dlm); +void dlm_debug_shutdown(struct dlm_ctxt *dlm); + int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm); void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm); @@ -35,6 +48,13 @@ void dlm_destroy_debugfs_root(void); #else +static int dlm_debug_init(struct dlm_ctxt *dlm) +{ + return 0; +} +static void dlm_debug_shutdown(struct dlm_ctxt *dlm) +{ +} static int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm) { return 0; diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index c137d69..63f8125 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c 
@@ -398,6 +398,7 @@ static void dlm_destroy_dlm_worker(struct dlm_ctxt *dlm) static void dlm_complete_dlm_shutdown(struct dlm_ctxt *dlm) { dlm_unregister_domain_handlers(dlm); + dlm_debug_shutdown(dlm); dlm_complete_thread(dlm); dlm_complete_recovery_thread(dlm); dlm_destroy_dlm_worker(dlm); @@ -1418,6 +1419,12 @@ static int dlm_join_domain(struct dlm_ctxt *dlm) goto bail; } + status = dlm_debug_init(dlm); + if (status < 0) { + mlog_errno(status); + goto bail; + } + status = dlm_launch_thread(dlm); if (status < 0) { mlog_errno(status); @@ -1485,6 +1492,7 @@ bail: if (status) { dlm_unregister_domain_handlers(dlm); + dlm_debug_shutdown(dlm); dlm_complete_thread(dlm); dlm_complete_recovery_thread(dlm); dlm_destroy_dlm_worker(dlm); -- 1.5.4.1
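The state dump in this patch builds its page with the pattern out += snprintf(db->buf + out, db->len - out, ...), and stringify_nodemap() does the same. Plain snprintf() returns the length the output would have had, not what it stored. So if the text ever outgrew the page, out could pass db->len, and db->len - out would go negative and reach snprintf() as a huge size. The userspace C sketch below is not part of the patch. It shows a bounded variant modelled on the kernel's scnprintf(), which returns only the characters actually stored, applied to a stringify_nodemap()-style loop. The helper names and the plain bit test standing in for the kernel-only find_next_bit() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Bounded append, modelled on the kernel's scnprintf(): returns the number
 * of characters actually stored in buf (excluding the NUL), never the
 * would-be length that plain snprintf() reports when it truncates.
 */
static int append_bounded(char *buf, int len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (len <= 0)
		return 0;		/* no room, not even for a NUL */

	va_start(ap, fmt);
	n = vsnprintf(buf, (size_t)len, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;		/* encoding error: count nothing */
	return n < len ? n : len - 1;	/* truncated: len - 1 chars stored */
}

/* Bit i of the map lives in word i / DEMO_BITS_PER_WORD (kernel layout). */
#define DEMO_BITS_PER_WORD ((int)(8 * sizeof(unsigned long)))

/*
 * stringify_nodemap()-style loop: emit "%d " for every set bit. A plain
 * bit test stands in for find_next_bit(), which only exists in the kernel.
 */
static int stringify_nodemap_bounded(const unsigned long *nodemap,
				     int maxnodes, char *buf, int len)
{
	int out = 0;
	int i;

	assert(len > 0);
	buf[0] = '\0';			/* an empty map still yields a string */

	for (i = 0; i < maxnodes; i++)
		if (nodemap[i / DEMO_BITS_PER_WORD] &
		    (1UL << (i % DEMO_BITS_PER_WORD)))
			out += append_bounded(buf + out, len - out, "%d ", i);

	return out;			/* always <= len - 1 */
}
```

Because each append returns at most room - 1, out never exceeds len - 1 and the string stays NUL-terminated. With a 5-byte buffer and nodes 0, 3 and 7 set, the sketch stores "0 3 " and returns 4 instead of running past the buffer.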
Mark Fasheh
2008-Apr-02 20:14 UTC
[Ocfs2-devel] [PATCH 44/62] ocfs2/dlm: Dumps the mles into a debugfs file
From: Sunil Mushran <sunil.mushran at oracle.com> This patch dumps all mles it can fit in one page into a debugfs file. Useful for debugging. Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com> Signed-off-by: Joel Becker <joel.becker at oracle.com> Signed-off-by: Mark Fasheh <mfasheh at suse.com> --- fs/ocfs2/dlm/dlmdebug.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/dlm/dlmdebug.h | 1 + 2 files changed, 120 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c index cccb1ce..6de326b 100644 --- a/fs/ocfs2/dlm/dlmdebug.c +++ b/fs/ocfs2/dlm/dlmdebug.c @@ -302,6 +302,7 @@ static int stringify_lockname(const char *lockname, int locklen, #define DLM_DEBUGFS_DIR "o2dlm" #define DLM_DEBUGFS_DLM_STATE "dlm_state" #define DLM_DEBUGFS_LOCKING_STATE "locking_state" +#define DLM_DEBUGFS_MLE_STATE "mle_state" /* begin - utils funcs */ static void dlm_debug_free(struct kref *kref) @@ -395,6 +396,112 @@ static int debug_buffer_release(struct inode *inode, struct file *file) } /* end - util funcs */ +/* begin - debug mle funcs */ +static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len) +{ + int out = 0; + unsigned int namelen; + const char *name; + char *mle_type; + + if (mle->type != DLM_MLE_MASTER) { + namelen = mle->u.name.len; + name = mle->u.name.name; + } else { + namelen = mle->u.res->lockname.len; + name = mle->u.res->lockname.name; + } + + if (mle->type == DLM_MLE_BLOCK) + mle_type = "BLK"; + else if (mle->type == DLM_MLE_MASTER) + mle_type = "MAS"; + else + mle_type = "MIG"; + + out += stringify_lockname(name, namelen, buf + out, len - out); + out += snprintf(buf + out, len - out, + "\t%3s\tmas=%3u\tnew=%3u\tevt=%1d\tuse=%1d\tref=%3d\n", + mle_type, mle->master, mle->new_master, + !list_empty(&mle->hb_events), + !!mle->inuse, + atomic_read(&mle->mle_refs.refcount)); + + out += snprintf(buf + out, len - out, "Maybe="); + out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES, + buf + out, 
len - out); + out += snprintf(buf + out, len - out, "\n"); + + out += snprintf(buf + out, len - out, "Vote="); + out += stringify_nodemap(mle->vote_map, O2NM_MAX_NODES, + buf + out, len - out); + out += snprintf(buf + out, len - out, "\n"); + + out += snprintf(buf + out, len - out, "Response="); + out += stringify_nodemap(mle->response_map, O2NM_MAX_NODES, + buf + out, len - out); + out += snprintf(buf + out, len - out, "\n"); + + out += snprintf(buf + out, len - out, "Node="); + out += stringify_nodemap(mle->node_map, O2NM_MAX_NODES, + buf + out, len - out); + out += snprintf(buf + out, len - out, "\n"); + + out += snprintf(buf + out, len - out, "\n"); + + return out; +} + +static int debug_mle_print(struct dlm_ctxt *dlm, struct debug_buffer *db) +{ + struct dlm_master_list_entry *mle; + int out = 0; + unsigned long total = 0; + + out += snprintf(db->buf + out, db->len - out, + "Dumping MLEs for Domain: %s\n", dlm->name); + + spin_lock(&dlm->master_lock); + list_for_each_entry(mle, &dlm->master_list, list) { + ++total; + if (db->len - out < 200) + continue; + out += dump_mle(mle, db->buf + out, db->len - out); + } + spin_unlock(&dlm->master_lock); + + out += snprintf(db->buf + out, db->len - out, + "Total on list: %ld\n", total); + return out; +} + +static int debug_mle_open(struct inode *inode, struct file *file) +{ + struct dlm_ctxt *dlm = inode->i_private; + struct debug_buffer *db; + + db = debug_buffer_allocate(); + if (!db) + goto bail; + + db->len = debug_mle_print(dlm, db); + + file->private_data = db; + + return 0; +bail: + return -ENOMEM; +} + +static struct file_operations debug_mle_fops = { + .open = debug_mle_open, + .release = debug_buffer_release, + .read = debug_buffer_read, + .llseek = debug_buffer_llseek, +}; + +/* end - debug mle funcs */ + /* begin - debug lockres funcs */ static int dump_lock(struct dlm_lock *lock, int list_type, char *buf, int len) { @@ -789,6 +896,16 @@ int dlm_debug_init(struct dlm_ctxt *dlm) goto bail; } + /* for dumping 
mles */ + dc->debug_mle_dentry = debugfs_create_file(DLM_DEBUGFS_MLE_STATE, + S_IFREG|S_IRUSR, + dlm->dlm_debugfs_subroot, + dlm, &debug_mle_fops); + if (!dc->debug_mle_dentry) { + mlog_errno(-ENOMEM); + goto bail; + } + dlm_debug_get(dc); return 0; @@ -802,6 +919,8 @@ void dlm_debug_shutdown(struct dlm_ctxt *dlm) struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt; if (dc) { + if (dc->debug_mle_dentry) + debugfs_remove(dc->debug_mle_dentry); if (dc->debug_lockres_dentry) debugfs_remove(dc->debug_lockres_dentry); if (dc->debug_state_dentry) diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h index 7c5b2b0..cbc69f2 100644 --- a/fs/ocfs2/dlm/dlmdebug.h +++ b/fs/ocfs2/dlm/dlmdebug.h @@ -31,6 +31,7 @@ struct dlm_debug_ctxt { struct kref debug_refcnt; struct dentry *debug_state_dentry; struct dentry *debug_lockres_dentry; + struct dentry *debug_mle_dentry; }; struct debug_buffer { -- 1.5.4.1
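debug_mle_print() keeps walking master_list after the page is nearly full. It counts every entry but only dumps one while at least 200 bytes remain, so the trailing "Total on list" line still reports the real list length. The simplified userspace sketch below shows that "count everything, print what fits" loop. struct fake_mle and dump_mles() are hypothetical stand-ins for struct dlm_master_list_entry and the dump_mle()/debug_mle_print() pair, and locking (master_lock) is omitted. The sketch prints the unsigned long counter with %lu, whereas the patch uses %ld.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct dlm_master_list_entry. */
struct fake_mle {
	const char *name;
	int master;
	struct fake_mle *next;
};

#define MLE_MIN_ROOM 200	/* headroom the patch requires before a dump */

/* Characters actually stored by an snprintf() call given `room` bytes. */
static int stored(int n, int room)
{
	if (n < 0)
		return 0;
	return n < room ? n : room - 1;
}

/*
 * Count every entry, but only print one while MLE_MIN_ROOM bytes remain,
 * so the final total is accurate even when the listing is cut short.
 */
static int dump_mles(const struct fake_mle *head, char *buf, int len,
		     unsigned long *total)
{
	const struct fake_mle *mle;
	int out = 0;

	assert(len > 0);
	buf[0] = '\0';
	*total = 0;

	for (mle = head; mle; mle = mle->next) {
		++*total;
		if (len - out < MLE_MIN_ROOM)
			continue;	/* keep counting, stop printing */
		out += stored(snprintf(buf + out, len - out, "%s\tmas=%3d\n",
				       mle->name, mle->master), len - out);
	}

	out += stored(snprintf(buf + out, len - out, "Total on list: %lu\n",
			       *total), len - out);
	return out;
}
```

With a 210-byte buffer and three entries, the sketch prints the first two, skips the third, and still ends with "Total on list: 3". The 200-byte headroom only protects entries shorter than 200 bytes; the clamp in stored() is what keeps a longer entry, or the total line, inside the buffer.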
Mark Fasheh
2008-Apr-02 20:24 UTC
[Ocfs2-devel] [PATCH 59/62] ocfs2: Convert ocfs2 over to unlocked_ioctl
From: Andi Kleen <andi-suse at firstfloor.org>

As far as I can see there is nothing in ocfs2_ioctl that requires the BKL,
so use unlocked_ioctl

Signed-off-by: Andi Kleen <ak at suse.de>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/file.c  |    4 ++--
 fs/ocfs2/ioctl.c |   12 +++---------
 fs/ocfs2/ioctl.h |    3 +--
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index ed5d523..9154c82 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2242,7 +2242,7 @@ const struct file_operations ocfs2_fops = {
 	.open		= ocfs2_file_open,
 	.aio_read	= ocfs2_file_aio_read,
 	.aio_write	= ocfs2_file_aio_write,
-	.ioctl		= ocfs2_ioctl,
+	.unlocked_ioctl	= ocfs2_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl   = ocfs2_compat_ioctl,
 #endif
@@ -2258,7 +2258,7 @@ const struct file_operations ocfs2_dops = {
 	.fsync		= ocfs2_sync_file,
 	.release	= ocfs2_dir_release,
 	.open		= ocfs2_dir_open,
-	.ioctl		= ocfs2_ioctl,
+	.unlocked_ioctl	= ocfs2_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl   = ocfs2_compat_ioctl,
 #endif
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index ab1c216..b413166 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -113,9 +113,9 @@ bail:
 	return status;
 }
 
-int ocfs2_ioctl(struct inode * inode, struct file * filp,
-	unsigned int cmd, unsigned long arg)
+long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
+	struct inode *inode = filp->f_path.dentry->d_inode;
 	unsigned int flags;
 	int new_clusters;
 	int status;
@@ -169,9 +169,6 @@ int ocfs2_ioctl(struct inode * inode, struct file * filp,
 #ifdef CONFIG_COMPAT
 long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
-	int ret;
-
 	switch (cmd) {
 	case OCFS2_IOC32_GETFLAGS:
 		cmd = OCFS2_IOC_GETFLAGS;
@@ -191,9 +188,6 @@ long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 		return -ENOIOCTLCMD;
 	}
 
-	lock_kernel();
-	ret = ocfs2_ioctl(inode, file, cmd, arg);
-	unlock_kernel();
-	return ret;
+	return ocfs2_ioctl(file, cmd, arg);
 }
 #endif
diff --git a/fs/ocfs2/ioctl.h b/fs/ocfs2/ioctl.h
index 4d6c4f4..cf9a5ee 100644
--- a/fs/ocfs2/ioctl.h
+++ b/fs/ocfs2/ioctl.h
@@ -10,8 +10,7 @@
 #ifndef OCFS2_IOCTL_H
 #define OCFS2_IOCTL_H
 
-int ocfs2_ioctl(struct inode * inode, struct file * filp,
-	unsigned int cmd, unsigned long arg);
+long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 
 #endif /* OCFS2_IOCTL_H */
--
1.5.4.1
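After this patch the native handler takes only (file, cmd, arg) and derives the inode from the file itself. The compat handler is reduced to "translate the 32-bit command, then call the native handler", with no lock_kernel()/unlock_kernel() pair around the call. The userspace sketch below mimics that shape. Every DEMO_* constant and demo_* type is hypothetical: the command values are arbitrary, and DEMO_ENOIOCTLCMD merely stands in for the kernel-internal ENOIOCTLCMD. GETFLAGS returning the flags through the return value, rather than through the user pointer in arg, is a simplification.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers; the values are arbitrary. */
enum {
	DEMO_IOC_GETFLAGS = 1,
	DEMO_IOC_SETFLAGS = 2,
	DEMO_IOC32_GETFLAGS = 101,
	DEMO_IOC32_SETFLAGS = 102,
};

/* Stand-in for the kernel-internal ENOIOCTLCMD (not a userspace errno). */
#define DEMO_ENOIOCTLCMD 515

struct demo_inode { unsigned int flags; };
struct demo_file { struct demo_inode *inode; };

/*
 * Native handler, unlocked_ioctl-style: there is no inode argument; it is
 * derived from the file, as the patch does with filp->f_path.dentry->d_inode.
 */
static long demo_ioctl(struct demo_file *filp, unsigned int cmd,
		       unsigned long arg)
{
	struct demo_inode *inode = filp->inode;

	switch (cmd) {
	case DEMO_IOC_GETFLAGS:
		return (long)inode->flags;	/* simplified: value via return */
	case DEMO_IOC_SETFLAGS:
		inode->flags = (unsigned int)arg;
		return 0;
	default:
		return -ENOTTY;
	}
}

/* Compat handler: translate the 32-bit command, then delegate directly. */
static long demo_compat_ioctl(struct demo_file *filp, unsigned int cmd,
			      unsigned long arg)
{
	switch (cmd) {
	case DEMO_IOC32_GETFLAGS:
		cmd = DEMO_IOC_GETFLAGS;
		break;
	case DEMO_IOC32_SETFLAGS:
		cmd = DEMO_IOC_SETFLAGS;
		break;
	default:
		return -DEMO_ENOIOCTLCMD;
	}

	/* No lock_kernel()/unlock_kernel() pair around the call any more. */
	return demo_ioctl(filp, cmd, arg);
}
```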
From: Julia Lawall <julia at diku.dk>

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia at diku.dk>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 fs/ocfs2/alloc.c   |    3 +--
 fs/ocfs2/journal.c |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index a268821..41f84c9 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -1029,8 +1029,7 @@ static void ocfs2_rotate_leaf(struct ocfs2_extent_list *el,
 	BUG_ON(!next_free);
 
 	/* The tree code before us didn't allow enough room in the leaf. */
-	if (el->l_next_free_rec == el->l_count && !has_empty)
-		BUG();
+	BUG_ON(el->l_next_free_rec == el->l_count && !has_empty);
 
 	/*
 	 * The easiest way to approach this is to just remove the
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index bffd2d7..9698338 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -717,8 +717,7 @@ int ocfs2_journal_load(struct ocfs2_journal *journal, int local)
 
 	mlog_entry_void();
 
-	if (!journal)
-		BUG();
+	BUG_ON(!journal);
 
 	osb = journal->j_osb;
 
--
1.5.4.1
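The point about side effects can be made concrete. If BUG_ON() were defined so that its condition is never evaluated, any work done inside the condition would silently vanish from such builds. That appears to be why the semantic patch leaves alone every if (...) BUG(); whose test contains a function call (the <... f(...) ...> branch). The sketch below uses a hypothetical DEMO_BUG_ON() macro and a hypothetical DEMO_DROP_BUG_ON build switch; neither is the kernel's own definition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical BUG_ON() in two flavours. With DEMO_DROP_BUG_ON defined the
 * condition is only type-checked inside sizeof, never evaluated, so any
 * side effect it contains is lost.
 */
#ifdef DEMO_DROP_BUG_ON
#define DEMO_BUG_ON(cond) ((void)sizeof(!!(cond)))
#else
#define DEMO_BUG_ON(cond)						\
	do {								\
		if (cond) {						\
			fprintf(stderr, "BUG at %s:%d\n",		\
				__FILE__, __LINE__);			\
			abort();					\
		}							\
	} while (0)
#endif

static int calls;

/* A test with a side effect: the semantic patch would not convert it. */
static int bump(void)
{
	return ++calls;
}

/* Side-effect-free test, like the one the patch moves into BUG_ON(). */
static void check_leaf(int next_free_rec, int count, int has_empty)
{
	DEMO_BUG_ON(next_free_rec == count && !has_empty);
}
```

In the default build, DEMO_BUG_ON(bump() > 10) runs bump() once and does not fire. Compiled with -DDEMO_DROP_BUG_ON, the same statement would leave calls at 0, while check_leaf() loses nothing because its test has no side effects.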
Mark Fasheh
2008-Apr-02 20:24 UTC
[Ocfs2-devel] [PATCH 61/62] ocfs2: Put tree in MAINTAINERS
From: Joel Becker <joel.becker at oracle.com>

The ocfs2 MAINTAINERS entry should have the git tree URL.

Signed-off-by: Joel Becker <joel.becker at oracle.com>
Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
 MAINTAINERS |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 90dcbbc..5c76ad3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2946,6 +2946,7 @@ P:	Joel Becker
 M:	joel.becker at oracle.com
 L:	ocfs2-devel at oss.oracle.com
 W:	http://oss.oracle.com/projects/ocfs2/
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git
 S:	Supported
 
 OMNIKEY CARDMAN 4000 DRIVER
--
1.5.4.1
Mark Fasheh
2008-Apr-02 20:24 UTC
[Ocfs2-devel] [PATCH 62/62] ocfs2/cluster: Get rid of arguments to the timeout routines
From: Jeff Mahoney <jeffm at suse.com> We keep seeing bug reports related to NULL pointer derefs in o2net_set_nn_state(). When I originally wrote up the configurable timeout patch, I had tried to plan for multiple clusters. This was silly. The timeout routines all use o2nm_single_cluster so there's no point in passing an argument at all. This patch removes the arguments and kills those bugs dead. Signed-off-by: Jeff Mahoney <jeffm at suse.com> Signed-off-by: Mark Fasheh <mfasheh at suse.com> --- fs/ocfs2/cluster/tcp.c | 47 ++++++++++++++++++++--------------------------- 1 files changed, 20 insertions(+), 27 deletions(-) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 4ea4b0a..1170918 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -142,23 +142,17 @@ static void o2net_idle_timer(unsigned long data); static void o2net_sc_postpone_idle(struct o2net_sock_container *sc); static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc); -/* - * FIXME: These should use to_o2nm_cluster_from_node(), but we end up - * losing our parent link to the cluster during shutdown. This can be - * solved by adding a pre-removal callback to configfs, or passing - * around the cluster with the node. 
-jeffm - */ -static inline int o2net_reconnect_delay(struct o2nm_node *node) +static inline int o2net_reconnect_delay(void) { return o2nm_single_cluster->cl_reconnect_delay_ms; } -static inline int o2net_keepalive_delay(struct o2nm_node *node) +static inline int o2net_keepalive_delay(void) { return o2nm_single_cluster->cl_keepalive_delay_ms; } -static inline int o2net_idle_timeout(struct o2nm_node *node) +static inline int o2net_idle_timeout(void) { return o2nm_single_cluster->cl_idle_timeout_ms; } @@ -444,9 +438,9 @@ static void o2net_set_nn_state(struct o2net_node *nn, /* delay if we're withing a RECONNECT_DELAY of the * last attempt */ delay = (nn->nn_last_connect_attempt + - msecs_to_jiffies(o2net_reconnect_delay(NULL))) + msecs_to_jiffies(o2net_reconnect_delay())) - jiffies; - if (delay > msecs_to_jiffies(o2net_reconnect_delay(NULL))) + if (delay > msecs_to_jiffies(o2net_reconnect_delay())) delay = 0; mlog(ML_CONN, "queueing conn attempt in %lu jiffies\n", delay); queue_delayed_work(o2net_wq, &nn->nn_connect_work, delay); @@ -460,7 +454,7 @@ static void o2net_set_nn_state(struct o2net_node *nn, * the connect_expired work will do anything. The rest will see * that it's already queued and do nothing. */ - delay += msecs_to_jiffies(o2net_idle_timeout(NULL)); + delay += msecs_to_jiffies(o2net_idle_timeout()); queue_delayed_work(o2net_wq, &nn->nn_connect_expired, delay); } @@ -1159,23 +1153,23 @@ static int o2net_check_handshake(struct o2net_sock_container *sc) * but isn't. This can ultimately cause corruption. */ if (be32_to_cpu(hand->o2net_idle_timeout_ms) != - o2net_idle_timeout(sc->sc_node)) { + o2net_idle_timeout()) { mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of " "%u ms, but we use %u ms locally. 
disconnecting\n", SC_NODEF_ARGS(sc), be32_to_cpu(hand->o2net_idle_timeout_ms), - o2net_idle_timeout(sc->sc_node)); + o2net_idle_timeout()); o2net_ensure_shutdown(nn, sc, -ENOTCONN); return -1; } if (be32_to_cpu(hand->o2net_keepalive_delay_ms) != - o2net_keepalive_delay(sc->sc_node)) { + o2net_keepalive_delay()) { mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of " "%u ms, but we use %u ms locally. disconnecting\n", SC_NODEF_ARGS(sc), be32_to_cpu(hand->o2net_keepalive_delay_ms), - o2net_keepalive_delay(sc->sc_node)); + o2net_keepalive_delay()); o2net_ensure_shutdown(nn, sc, -ENOTCONN); return -1; } @@ -1353,12 +1347,11 @@ static void o2net_initialize_handshake(void) { o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32( O2HB_MAX_WRITE_TIMEOUT_MS); - o2net_hand->o2net_idle_timeout_ms = cpu_to_be32( - o2net_idle_timeout(NULL)); + o2net_hand->o2net_idle_timeout_ms = cpu_to_be32(o2net_idle_timeout()); o2net_hand->o2net_keepalive_delay_ms = cpu_to_be32( - o2net_keepalive_delay(NULL)); + o2net_keepalive_delay()); o2net_hand->o2net_reconnect_delay_ms = cpu_to_be32( - o2net_reconnect_delay(NULL)); + o2net_reconnect_delay()); } /* ------------------------------------------------------------ */ @@ -1404,8 +1397,8 @@ static void o2net_idle_timer(unsigned long data) printk(KERN_INFO "o2net: connection to " SC_NODEF_FMT " has been idle for %u.%u " "seconds, shutting it down.\n", SC_NODEF_ARGS(sc), - o2net_idle_timeout(sc->sc_node) / 1000, - o2net_idle_timeout(sc->sc_node) % 1000); + o2net_idle_timeout() / 1000, + o2net_idle_timeout() % 1000); mlog(ML_NOTICE, "here are some times that might help debug the " "situation: (tmr %ld.%ld now %ld.%ld dr %ld.%ld adv " "%ld.%ld:%ld.%ld func (%08x:%u) %ld.%ld:%ld.%ld)\n", @@ -1433,10 +1426,10 @@ static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc) { o2net_sc_cancel_delayed_work(sc, &sc->sc_keepalive_work); o2net_sc_queue_delayed_work(sc, &sc->sc_keepalive_work, - msecs_to_jiffies(o2net_keepalive_delay(sc->sc_node)));
+ msecs_to_jiffies(o2net_keepalive_delay())); do_gettimeofday(&sc->sc_tv_timer); mod_timer(&sc->sc_idle_timeout, - jiffies + msecs_to_jiffies(o2net_idle_timeout(sc->sc_node))); + jiffies + msecs_to_jiffies(o2net_idle_timeout())); } static void o2net_sc_postpone_idle(struct o2net_sock_container *sc) @@ -1578,8 +1571,8 @@ static void o2net_connect_expired(struct work_struct *work) mlog(ML_ERROR, "no connection established with node %u after " "%u.%u seconds, giving up and returning errors.\n", o2net_num_from_nn(nn), - o2net_idle_timeout(NULL) / 1000, - o2net_idle_timeout(NULL) % 1000); + o2net_idle_timeout() / 1000, + o2net_idle_timeout() % 1000); o2net_set_nn_state(nn, NULL, 0, -ENOTCONN); } @@ -1634,7 +1627,7 @@ static void o2net_hb_node_up_cb(struct o2nm_node *node, int node_num, /* ensure an immediate connect attempt */ nn->nn_last_connect_attempt = jiffies - - (msecs_to_jiffies(o2net_reconnect_delay(node)) + 1); + (msecs_to_jiffies(o2net_reconnect_delay()) + 1); if (node_num != o2nm_this_node()) { /* believe it or not, accept and node hearbeating testing -- 1.5.4.1
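With the argument gone, each timeout helper simply reads the one global cluster configuration. The arithmetic those helpers feed into is easy to misread, so the userspace sketch below models two pieces of it. The first is the reconnect-delay computation from o2net_set_nn_state(), which relies on unsigned wraparound to turn "the window has already passed" into a zero delay. The second is the node-up callback's trick of back-dating nn_last_connect_attempt by one more than the delay to force an immediate attempt. It also shows a formatting detail: the "%u.%u" messages in this patch would print 1050 ms as "1.50", whereas "%u.%03u" gives "1.050". demo_single_cluster, its illustrative values and the demo_* helpers are hypothetical, and one tick is assumed to equal one millisecond so that no msecs_to_jiffies() conversion is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for o2nm_single_cluster; values are illustrative. */
struct demo_cluster {
	unsigned int reconnect_delay_ms;
	unsigned int idle_timeout_ms;
};

static struct demo_cluster demo_single_cluster = { 2000, 30000 };

/* Argument-free accessors, mirroring the helpers after this patch. */
static unsigned int demo_reconnect_delay(void)
{
	return demo_single_cluster.reconnect_delay_ms;
}

static unsigned int demo_idle_timeout(void)
{
	return demo_single_cluster.idle_timeout_ms;
}

/*
 * o2net_set_nn_state()-style delay, in ticks (assumed 1 tick == 1 ms).
 * If the reconnect window has already passed, the unsigned subtraction
 * wraps to a huge value, and the range check turns that into 0.
 */
static unsigned long demo_connect_delay(unsigned long last_attempt,
					unsigned long now)
{
	unsigned long window = demo_reconnect_delay();
	unsigned long delay = last_attempt + window - now;

	if (delay > window)
		delay = 0;
	return delay;
}

/* Milliseconds as seconds; %03u keeps 1050 ms from printing as "1.50". */
static int demo_format_secs(char *buf, size_t len, unsigned int ms)
{
	return snprintf(buf, len, "%u.%03u", ms / 1000, ms % 1000);
}
```

Back-dating the last attempt to now - (delay + 1) makes last_attempt + window - now equal to -1 modulo the word size, which the range check maps to 0, so the first connection attempt is queued with no delay.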