@@ -4979,7 +4979,6 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
ceph_metric_destroy(&mdsc->metric);
- flush_delayed_work(&mdsc->metric.delayed_work);
fsc->mdsc = NULL;
kfree(mdsc);
dout("mdsc_destroy %p done\n", mdsc);
@@ -302,6 +302,8 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
if (!m)
return;
+ cancel_delayed_work_sync(&m->delayed_work);
+
percpu_counter_destroy(&m->total_inodes);
percpu_counter_destroy(&m->opened_inodes);
percpu_counter_destroy(&m->i_caps_mis);
@@ -309,8 +311,6 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
percpu_counter_destroy(&m->d_lease_mis);
percpu_counter_destroy(&m->d_lease_hit);
- cancel_delayed_work_sync(&m->delayed_work);
-
ceph_put_mds_session(m->session);
}
The first thing metric_delayed_work does is check mdsc->stopping, and then return immediately if it's set. That's good since we would have already torn down the metric structures at this point, otherwise, but there is no locking around mdsc->stopping. It's possible that the ceph_metric_destroy call could race with the delayed_work, in which case we could end up with the delayed_work accessing destroyed percpu variables. At this point in the mdsc teardown, the "stopping" flag has already been set, so there's no benefit to flushing the work. Move the work cancellation in ceph_metric_destroy ahead of the percpu variable destruction, and eliminate the flush_delayed_work call in ceph_mdsc_destroy. Cc: Xiubo Li <xiubli@redhat.com> Fixes: 18f473b384a6 ("ceph: periodically send perf metrics to MDSes") Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/ceph/mds_client.c | 1 - fs/ceph/metric.c | 4 ++-- 2 files changed, 2 insertions(+), 3 deletions(-) v2: just drop the flush call altogether and move the cancel before the percpu variables are destroyed (per Xiubo's suggestion).