Jia Guo
2019-Jan-26 08:19 UTC
[Ocfs2-devel] [PATCH v2] ocfs2: fix a panic problem caused by o2cb_ctl
While creating a node, the kernel will hit a NULL pointer
dereference if o2cb_ctl fails in the interval
(mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.

The node number is initialized to 0 in o2nm_node_group_make_item, so
o2nm_node_group_drop_item mistakes node number 0 for a valid node
number when the node is deleted before its node number has been set.
If the local node number of the current host happens to be 0,
cluster->cl_local_node is then set to O2NM_INVALID_NODE_NUM while
o2hb_thread is still running. The panic stack is generated as follows:

o2hb_thread
    \-o2hb_do_disk_heartbeat
        \-o2hb_check_own_slot
            |-slot = &reg->hr_slots[o2nm_this_node()];
            //o2nm_this_node() returns O2NM_INVALID_NODE_NUM

We need to check whether the node number is set when we delete the node.

Signed-off-by: Jia Guo <guojia12 at huawei.com>
---
fs/ocfs2/cluster/nodemanager.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
index 0e4166c..989abad 100644
--- a/fs/ocfs2/cluster/nodemanager.c
+++ b/fs/ocfs2/cluster/nodemanager.c
@@ -620,14 +620,19 @@ static void o2nm_node_group_drop_item(struct config_group *group,
 {
 	struct o2nm_node *node = to_o2nm_node(item);
 	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
+	int nd_num_set = 0;
 
-	o2net_disconnect_node(node);
-
-	if (cluster->cl_has_local &&
-	    (cluster->cl_local_node == node->nd_num)) {
-		cluster->cl_has_local = 0;
-		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
-		o2net_stop_listening(node);
+	/* nd_num might be 0 if the node number hasn't been set.. */
+	if (cluster->cl_nodes[node->nd_num] == node) {
+		nd_num_set = 1;
+		o2net_disconnect_node(node);
+
+		if (cluster->cl_has_local &&
+		    (cluster->cl_local_node == node->nd_num)) {
+			cluster->cl_has_local = 0;
+			cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
+			o2net_stop_listening(node);
+		}
 	}
 
 	/* XXX call into net to stop this node from trading messages */
@@ -638,8 +643,7 @@ static void o2nm_node_group_drop_item(struct config_group *group,
 	if (node->nd_ipv4_address)
 		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);
 
-	/* nd_num might be 0 if the node number hasn't been set.. */
-	if (cluster->cl_nodes[node->nd_num] == node) {
+	if (nd_num_set) {
 		cluster->cl_nodes[node->nd_num] = NULL;
 		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
 	}
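
The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `model_node`, `model_cluster`, `MAX_NODES`, `INVALID_NODE_NUM`, `drop_unguarded` and `local_node_after_unguarded_drop` are hypothetical stand-ins that mirror just the fields the patch touches, and it assumes a node's number stays 0 until the attribute is written.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES        255   /* hypothetical table size for this model */
#define INVALID_NODE_NUM 255   /* stand-in for O2NM_INVALID_NODE_NUM */

struct model_node {
	unsigned char nd_num;      /* stays 0 until the node number is set */
};

struct model_cluster {
	struct model_node *cl_nodes[MAX_NODES];  /* registered nodes by number */
	int cl_has_local;
	unsigned char cl_local_node;
};

/* Pre-patch shape: trusts nd_num even if it was never set. */
static void drop_unguarded(struct model_cluster *c, struct model_node *n)
{
	assert(n->nd_num < MAX_NODES);

	if (c->cl_has_local && c->cl_local_node == n->nd_num) {
		c->cl_has_local = 0;
		c->cl_local_node = INVALID_NODE_NUM;
	}

	/* Only this part checked that the node really owns its slot. */
	if (c->cl_nodes[n->nd_num] == n)
		c->cl_nodes[n->nd_num] = NULL;
}

/*
 * Register a local node with number 0, then drop a second, half-created
 * node whose number was never set. Returns the resulting cl_local_node.
 */
static unsigned char local_node_after_unguarded_drop(void)
{
	static struct model_node local, half_created;  /* both nd_num == 0 */
	struct model_cluster c = { .cl_has_local = 1, .cl_local_node = 0 };

	c.cl_nodes[0] = &local;
	drop_unguarded(&c, &half_created);
	return c.cl_local_node;
}
```

In this model, `local_node_after_unguarded_drop()` returns `INVALID_NODE_NUM` even though the registered local node was never removed, which corresponds to the state in which o2hb_thread goes on to index hr_slots with an invalid node number.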
Joseph Qi
2019-Jan-28 01:09 UTC
[Ocfs2-devel] [PATCH v2] ocfs2: fix a panic problem caused by o2cb_ctl
Hi,

On 19/1/26 16:19, Jia Guo wrote:
> While creating a node, the kernel will hit a NULL pointer
> dereference if o2cb_ctl fails in the interval
> (mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
>
> The node number is initialized to 0 in o2nm_node_group_make_item, so
> o2nm_node_group_drop_item mistakes node number 0 for a valid node
> number when the node is deleted before its node number has been set.
> If the local node number of the current host happens to be 0,
> cluster->cl_local_node is then set to O2NM_INVALID_NODE_NUM while
> o2hb_thread is still running. The panic stack is generated as follows:
>
> o2hb_thread
>     \-o2hb_do_disk_heartbeat
>         \-o2hb_check_own_slot
>             |-slot = &reg->hr_slots[o2nm_this_node()];
>             //o2nm_this_node() returns O2NM_INVALID_NODE_NUM
>
> We need to check whether the node number is set when we delete the node.
>
> Signed-off-by: Jia Guo <guojia12 at huawei.com>
> ---
> fs/ocfs2/cluster/nodemanager.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
> index 0e4166c..989abad 100644
> --- a/fs/ocfs2/cluster/nodemanager.c
> +++ b/fs/ocfs2/cluster/nodemanager.c
> @@ -620,14 +620,19 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>  {
>  	struct o2nm_node *node = to_o2nm_node(item);
>  	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
> +	int nd_num_set = 0;

Could we just add the condition to the first part and leave the second part unchanged? Then we won't need to introduce the local nd_num_set.

With the above comments addressed, you can add:
Reviewed-by: Joseph Qi <jiangqi903 at gmail.com>

>
> -	o2net_disconnect_node(node);
> -
> -	if (cluster->cl_has_local &&
> -	    (cluster->cl_local_node == node->nd_num)) {
> -		cluster->cl_has_local = 0;
> -		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
> -		o2net_stop_listening(node);
> +	/* nd_num might be 0 if the node number hasn't been set.. */
> +	if (cluster->cl_nodes[node->nd_num] == node) {
> +		nd_num_set = 1;
> +		o2net_disconnect_node(node);
> +
> +		if (cluster->cl_has_local &&
> +		    (cluster->cl_local_node == node->nd_num)) {
> +			cluster->cl_has_local = 0;
> +			cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
> +			o2net_stop_listening(node);
> +		}
>  	}
>
>  	/* XXX call into net to stop this node from trading messages */
> @@ -638,8 +643,7 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>  	if (node->nd_ipv4_address)
>  		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);
>
> -	/* nd_num might be 0 if the node number hasn't been set.. */
> -	if (cluster->cl_nodes[node->nd_num] == node) {
> +	if (nd_num_set) {
>  		cluster->cl_nodes[node->nd_num] = NULL;
>  		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
>  	}
>
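
The shape suggested in the review (guard the first part with the ownership check, leave the second part as it was, no nd_num_set local) can be sketched in a userspace model. This is an illustration only, not the kernel function or any posted revision of the patch: `model_node`, `model_cluster`, `MAX_NODES`, `INVALID_NODE_NUM`, `drop_as_suggested` and the two checker functions are hypothetical names, and the model assumes nothing modifies cl_nodes between the two checks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES        255   /* hypothetical table size for this model */
#define INVALID_NODE_NUM 255   /* stand-in for O2NM_INVALID_NODE_NUM */

struct model_node {
	unsigned char nd_num;      /* stays 0 until the node number is set */
};

struct model_cluster {
	struct model_node *cl_nodes[MAX_NODES];  /* registered nodes by number */
	int cl_has_local;
	unsigned char cl_local_node;
};

/* Drop a node without a flag: the same condition is evaluated twice. */
static void drop_as_suggested(struct model_cluster *c, struct model_node *n)
{
	assert(n->nd_num < MAX_NODES);

	/* First part: newly guarded by the ownership check. */
	if (c->cl_nodes[n->nd_num] == n) {
		if (c->cl_has_local && c->cl_local_node == n->nd_num) {
			c->cl_has_local = 0;
			c->cl_local_node = INVALID_NODE_NUM;
		}
	}

	/* Second part: unchanged from the pre-patch code. */
	if (c->cl_nodes[n->nd_num] == n)
		c->cl_nodes[n->nd_num] = NULL;
}

/* Returns 1 if dropping a half-created node leaves local node 0 intact. */
static int half_created_drop_is_harmless(void)
{
	static struct model_node local, half_created;  /* both nd_num == 0 */
	struct model_cluster c = { .cl_has_local = 1, .cl_local_node = 0 };

	c.cl_nodes[0] = &local;
	drop_as_suggested(&c, &half_created);
	return c.cl_has_local == 1 && c.cl_local_node == 0 &&
	       c.cl_nodes[0] == &local;
}

/* Returns 1 if dropping the registered local node still cleans up. */
static int registered_drop_still_cleans_up(void)
{
	static struct model_node local;
	struct model_cluster c = { .cl_has_local = 1, .cl_local_node = 0 };

	c.cl_nodes[0] = &local;
	drop_as_suggested(&c, &local);
	return c.cl_has_local == 0 && c.cl_local_node == INVALID_NODE_NUM &&
	       c.cl_nodes[0] == NULL;
}
```

Both checker functions return 1 in this model: the half-created node no longer disturbs the local node, and a properly registered node is still torn down. The trade-off against the posted v2 is one repeated comparison in exchange for one fewer local variable.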