Subject: Re: [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB
From: David Disseldorp
To: Amitay Isaacs
Cc: Samba Technical, "ceph-devel@vger.kernel.org"
Date: Thu, 8 Dec 2016 19:39:54 +0100
Message-ID: <20161208193954.7ce6b896@suse.de>
References: <20161201151715.019228c1@suse.de> <20161206131404.38f737d0@suse.de> <20161206131827.273bd9b8@suse.de>
X-Patchwork-Id: 87333

Hi Amitay,

On Wed, 7 Dec 2016 13:32:34 +1100, Amitay Isaacs wrote:

> Hi David,
>
> On Tue, Dec 6, 2016 at 11:18 PM, David Disseldorp wrote:
>
> > This time with the patch-set attached...
> >
> > >  ctdb/doc/Makefile                             |   3 +-
> > >  ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml   |  90 +++++
> > >  .../utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++
> > >  ctdb/utils/ceph/test_ceph_rados_reclock.sh    | 151 ++++++++
> > >  ctdb/wscript                                  |  19 +
> > >  5 files changed, 596 insertions(+), 1 deletion(-)
>
> In patch 1, why do you need to include any of the CTDB files
> (protocol/protocol.h and common/system.h) and have a dependency on
> ctdb-system? I don't see you using any of the functions defined in
> common/system.h.
>
> Please include the manpage in the SAMBA_BINARY() definition. Also include
> it in the manpages[] list. It might be better to merge patch 1 and patch 2.

Thanks for the feedback.
Please find a new version attached (atop the etcd changes), attempting to
address your points above:

- drop unnecessary includes and ctdb-system dependency
  + add separate talloc and tevent deps
  + use tevent_timeval_current_ofs() instead of timeval_current_ofs()
- conditionally generate the man page

Cheers, David
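For anyone who wants to try the series, a rough usage sketch follows. It is not
part of the patches; the cluster/user/pool/object values (ceph, client.admin,
rbd, ctdb_reclock) are only examples, and the CTDB config file location varies
by install:

  # build CTDB with the new helper enabled (configure switch added in patch 1),
  # from the ctdb/ tree or via the top-level Samba build
  ./configure --enable-ceph-reclock
  make && make install

  # then point the CTDB recovery lock at the helper, using the syntax from the
  # patch 2 man page; the four arguments after the helper name are positional:
  # cluster name, Ceph user, RADOS pool, RADOS object
  CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper ceph client.admin rbd ctdb_reclock"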
>From 54c16ac1dfafb06111aeafb2377b06bd5db36994 Mon Sep 17 00:00:00 2001
From: David Disseldorp
Date: Thu, 1 Dec 2016 13:33:22 +0100
Subject: [PATCH 1/3] ctdb: cluster mutex helper using Ceph RADOS

ctdb_mutex_ceph_rados_helper implements the cluster mutex helper API
atop Ceph using the librados rados_lock_exclusive()/rados_unlock()
functionality.

Once configured, split brain avoidance during CTDB recovery will be
handled using locks against an object located in a Ceph RADOS pool.

Signed-off-by: David Disseldorp
---
 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c | 328 +++++++++++++++++++++++++
 ctdb/wscript                                   |  19 ++
 2 files changed, 347 insertions(+)
 create mode 100644 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c

diff --git a/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
new file mode 100644
index 0000000..326a0b0
--- /dev/null
+++ b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
@@ -0,0 +1,328 @@
+/*
+   CTDB mutex helper using Ceph librados locks
+
+   Copyright (C) David Disseldorp 2016
+
+   Based on ctdb_mutex_fcntl_helper.c, which is:
+   Copyright (C) Martin Schwenke 2015
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include "replace.h"
+
+#include "tevent.h"
+#include "talloc.h"
+#include "rados/librados.h"
+
+#define CTDB_MUTEX_CEPH_LOCK_NAME	"ctdb_reclock_mutex"
+#define CTDB_MUTEX_CEPH_LOCK_COOKIE	CTDB_MUTEX_CEPH_LOCK_NAME
+#define CTDB_MUTEX_CEPH_LOCK_DESC	"CTDB recovery lock"
+
+#define CTDB_MUTEX_STATUS_HOLDING "0"
+#define CTDB_MUTEX_STATUS_CONTENDED "1"
+#define CTDB_MUTEX_STATUS_TIMEOUT "2"
+#define CTDB_MUTEX_STATUS_ERROR "3"
+
+static char *progname = NULL;
+
+static int ctdb_mutex_rados_ctx_create(const char *ceph_cluster_name,
+				       const char *ceph_auth_name,
+				       const char *pool_name,
+				       rados_t *_ceph_cluster,
+				       rados_ioctx_t *_ioctx)
+{
+	rados_t ceph_cluster = NULL;
+	rados_ioctx_t ioctx = NULL;
+	int ret;
+
+	ret = rados_create2(&ceph_cluster, ceph_cluster_name, ceph_auth_name, 0);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to initialise Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		return ret;
+	}
+
+	/* path=NULL tells librados to use default locations */
+	ret = rados_conf_read_file(ceph_cluster, NULL);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to parse Ceph cluster config"
+			" - (%s)\n", progname, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	ret = rados_connect(ceph_cluster);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to connect to Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	ret = rados_ioctx_create(ceph_cluster, pool_name, &ioctx);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to create Ceph ioctx for pool %s"
+			" - (%s)\n", progname, pool_name, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	*_ceph_cluster = ceph_cluster;
+	*_ioctx = ioctx;
+
+	return 0;
+}
+
+static void ctdb_mutex_rados_ctx_destroy(rados_t ceph_cluster,
+					 rados_ioctx_t ioctx)
+{
+	rados_ioctx_destroy(ioctx);
+	rados_shutdown(ceph_cluster);
+}
+
+static int ctdb_mutex_rados_lock(rados_ioctx_t *ioctx,
+				 const char *oid)
+{
+	int ret;
+
+	ret = rados_lock_exclusive(ioctx, oid,
+				   CTDB_MUTEX_CEPH_LOCK_NAME,
+				   CTDB_MUTEX_CEPH_LOCK_COOKIE,
+				   CTDB_MUTEX_CEPH_LOCK_DESC,
+				   NULL, /* infinite duration */
+				   0);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		/* lock contention */
+		return ret;
+	} else if (ret < 0) {
+		/* unexpected failure */
+		fprintf(stderr,
+			"%s: Failed to get lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	/* lock obtained */
+	return 0;
+}
+
+static int ctdb_mutex_rados_unlock(rados_ioctx_t *ioctx,
+				   const char *oid)
+{
+	int ret;
+
+	ret = rados_unlock(ioctx, oid,
+			   CTDB_MUTEX_CEPH_LOCK_NAME,
+			   CTDB_MUTEX_CEPH_LOCK_COOKIE);
+	if (ret < 0) {
+		fprintf(stderr,
+			"%s: Failed to drop lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+struct ctdb_mutex_rados_state {
+	bool holding_mutex;
+	const char *ceph_cluster_name;
+	const char *ceph_auth_name;
+	const char *pool_name;
+	const char *object;
+	int ppid;
+	struct tevent_context *ev;
+	struct tevent_signal *sig_ev;
+	struct tevent_timer *timer_ev;
+	rados_t ceph_cluster;
+	rados_ioctx_t ioctx;
+};
+
+static void ctdb_mutex_rados_sigterm_cb(struct tevent_context *ev,
+					struct tevent_signal *se,
+					int signum,
+					int count,
+					void *siginfo,
+					void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Sigterm callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+static void ctdb_mutex_rados_timer_cb(struct tevent_context *ev,
+				      struct tevent_timer *te,
+				      struct timeval current_time,
+				      void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Timer callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	if ((kill(cmr_state->ppid, 0) == 0) || (errno != ESRCH)) {
+		/* parent still around, keep waiting */
+		cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+						       tevent_timeval_current_ofs(5, 0),
+						       ctdb_mutex_rados_timer_cb,
+						       cmr_state);
+		if (cmr_state->timer_ev == NULL) {
+			fprintf(stderr, "Failed to create timer event\n");
+			/* rely on signal cb */
+		}
+		return;
+	}
+
+	/* parent ended, drop lock and exit */
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+	struct ctdb_mutex_rados_state *cmr_state;
+
+	progname = argv[0];
+
+	if (argc != 5) {
+		fprintf(stderr, "Usage: %s <Ceph Cluster> <Ceph user> "
+				"<RADOS pool> <RADOS object>\n",
+			progname);
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	ret = setvbuf(stdout, NULL, _IONBF, 0);
+	if (ret != 0) {
+		fprintf(stderr, "Failed to configure unbuffered stdout I/O\n");
+	}
+
+	cmr_state = talloc_zero(NULL, struct ctdb_mutex_rados_state);
+	if (cmr_state == NULL) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	cmr_state->ceph_cluster_name = argv[1];
+	cmr_state->ceph_auth_name = argv[2];
+	cmr_state->pool_name = argv[3];
+	cmr_state->object = argv[4];
+
+	cmr_state->ppid = getppid();
+	if (cmr_state->ppid == 1) {
+		/*
+		 * The original parent is gone and the process has
+		 * been reparented to init. This can happen if the
+		 * helper is started just as the parent is killed
+		 * during shutdown. The error message doesn't need to
+		 * be stellar, since there won't be anything around to
+		 * capture and log it...
+		 */
+		fprintf(stderr, "%s: PPID == 1\n", progname);
+		ret = -EPIPE;
+		goto err_state_free;
+	}
+
+	cmr_state->ev = tevent_context_init(cmr_state);
+	if (cmr_state->ev == NULL) {
+		fprintf(stderr, "tevent_context_init failed\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* wait for sigterm */
+	cmr_state->sig_ev = tevent_add_signal(cmr_state->ev, cmr_state, SIGTERM, 0,
+					      ctdb_mutex_rados_sigterm_cb,
+					      cmr_state);
+	if (cmr_state->sig_ev == NULL) {
+		fprintf(stderr, "Failed to create signal event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* periodically check parent */
+	cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+					       tevent_timeval_current_ofs(5, 0),
+					       ctdb_mutex_rados_timer_cb,
+					       cmr_state);
+	if (cmr_state->timer_ev == NULL) {
+		fprintf(stderr, "Failed to create timer event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_ctx_create(cmr_state->ceph_cluster_name,
+					  cmr_state->ceph_auth_name,
+					  cmr_state->pool_name,
+					  &cmr_state->ceph_cluster,
+					  &cmr_state->ioctx);
+	if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_lock(cmr_state->ioctx, cmr_state->object);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_CONTENDED);
+		goto err_ctx_cleanup;
+	} else if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_ctx_cleanup;
+	}
+
+	cmr_state->holding_mutex = true;
+	fprintf(stdout, CTDB_MUTEX_STATUS_HOLDING);
+
+	/* wait for the signal / timer events to do their work */
+	ret = tevent_loop_wait(cmr_state->ev);
+	if (ret < 0) {
+		goto err_ctx_cleanup;
+	}
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+err_state_free:
+	talloc_free(cmr_state);
+err_out:
+	return ret ? 1 : 0;
+}
diff --git a/ctdb/wscript b/ctdb/wscript
index d7b1891..59bd8e2 100644
--- a/ctdb/wscript
+++ b/ctdb/wscript
@@ -79,6 +79,9 @@ def set_options(opt):
     opt.add_option('--enable-etcd-reclock',
                    help=("Enable etcd recovery lock helper (default=no)"),
                    action="store_true", dest='ctdb_etcd_reclock', default=False)
+    opt.add_option('--enable-ceph-reclock',
+                   help=("Enable Ceph CTDB recovery lock helper (default=no)"),
+                   action="store_true", dest='ctdb_ceph_reclock', default=False)
 
     opt.add_option('--with-logdir',
                    help=("Path to log directory"),
@@ -201,6 +204,15 @@ def configure(conf):
         Logs.info('Building with etcd support')
         conf.env.etcd_reclock = have_etcd_reclock
 
+    if Options.options.ctdb_ceph_reclock:
+        if (conf.CHECK_HEADERS('rados/librados.h', False, False, 'rados') and
+                conf.CHECK_LIB('rados', shlib=True)):
+            Logs.info('Building with Ceph librados recovery lock support')
+            conf.define('HAVE_LIBRADOS', 1)
+        else:
+            Logs.error("Missing librados for Ceph recovery lock support")
+            sys.exit(1)
+
     conf.env.CTDB_BINDIR = os.path.join(conf.env.EXEC_PREFIX, 'bin')
     conf.env.CTDB_ETCDIR = os.path.join(conf.env.SYSCONFDIR, 'ctdb')
     conf.env.CTDB_VARDIR = os.path.join(conf.env.LOCALSTATEDIR, 'lib/ctdb')
@@ -540,6 +552,13 @@ def build(bld):
         bld.INSTALL_FILES('${CTDB_PMDADIR}', 'utils/pmda/README',
                           destname='README')
 
+    if bld.env.HAVE_LIBRADOS:
+        bld.SAMBA_BINARY('ctdb_mutex_ceph_rados_helper',
+                         source='utils/ceph/ctdb_mutex_ceph_rados_helper.c',
+                         deps='talloc tevent rados',
+                         includes='include',
+                         install_path='${CTDB_HELPER_BINDIR}')
+
     sed_expr1 = 's|/usr/local/var/lib/ctdb|%s|g' % (bld.env.CTDB_VARDIR)
     sed_expr2 = 's|/usr/local/etc/ctdb|%s|g' % (bld.env.CTDB_ETCDIR)
     sed_expr3 = 's|/usr/local/var/log|%s|g' % (bld.env.CTDB_LOGDIR)
--
2.10.2
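Not part of the patch, but handy when reviewing: once the helper prints "0"
(CTDB_MUTEX_STATUS_HOLDING) on stdout, the lock it takes can be inspected from
any Ceph client node with the rados CLI. The pool/object names below are the
test-script defaults and may differ in your setup:

  rados -p rbd lock info ctdb_reclock ctdb_reclock_mutex
  # expect a single "exclusive" locker whose cookie is "ctdb_reclock_mutex" and
  # whose description is "CTDB recovery lock", matching the
  # CTDB_MUTEX_CEPH_LOCK_* constants defined in the helper above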
>From 35912b7dca417639615ad5662b5a76ee3e25a6ec Mon Sep 17 00:00:00 2001
From: David Disseldorp
Date: Thu, 1 Dec 2016 14:22:45 +0100
Subject: [PATCH 2/3] ctdb/doc: man page for Ceph RADOS cluster mutex helper

Signed-off-by: David Disseldorp
---
 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++++++++++++++++++++++++++
 ctdb/wscript                                | 12 +++-
 2 files changed, 100 insertions(+), 2 deletions(-)
 create mode 100644 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml

diff --git a/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
new file mode 100644
index 0000000..e5dedc7
--- /dev/null
+++ b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
@@ -0,0 +1,90 @@
(DocBook XML markup lost in the archive; the surviving man page text follows)
+Ceph RADOS Mutex
+7
+ctdb
+CTDB - clustered TDB database
+
+ctdb_mutex_ceph_rados_helper
+Ceph RADOS cluster mutex helper
+
+DESCRIPTION
+
+  ctdb_mutex_ceph_rados_helper can be used as a recovery lock provider
+  for CTDB. When configured, split brain avoidance during CTDB recovery
+  will be handled using locks against an object located in a Ceph RADOS
+  pool.
+  To enable this functionality, include the following line in your CTDB
+  config file:
+
+  CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper [Cluster] [User] [Pool] [Object]"
+
+  Cluster: Ceph cluster name (e.g. ceph)
+  User:    Ceph cluster user name (e.g. client.admin)
+  Pool:    Ceph RADOS pool name
+  Object:  Ceph RADOS object name
+
+  The Ceph cluster Cluster must be up and running, with a configuration
+  and keyring file for User located in a librados default search path
+  (e.g. /etc/ceph/). Pool must already exist.
+
+SEE ALSO
+
+  ctdb(7), ctdbd(1)
+
+  This documentation was written by David Disseldorp
+
+  Copyright 2016 David Disseldorp
+
+  This program is free software; you can redistribute it and/or
+  modify it under the terms of the GNU General Public License as
+  published by the Free Software Foundation; either version 3 of
+  the License, or (at your option) any later version.
+
+  This program is distributed in the hope that it will be
+  useful, but WITHOUT ANY WARRANTY; without even the implied
+  warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+  PURPOSE.  See the GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public
+  License along with this program; if not, see
+  <http://www.gnu.org/licenses/>.
diff --git a/ctdb/wscript b/ctdb/wscript
index 59bd8e2..d0e8ec7 100644
--- a/ctdb/wscript
+++ b/ctdb/wscript
@@ -58,6 +58,10 @@ manpages_etcd = [
     'ctdb-etcd.7',
 ]
 
+manpages_ceph = [
+    'ctdb_mutex_ceph_rados_helper.7',
+]
+
 def set_options(opt):
 
     opt.PRIVATE_EXTENSION_DEFAULT('ctdb')
@@ -273,7 +277,9 @@ def configure(conf):
         conf.env.ctdb_prebuilt_manpages = []
         manpages = manpages_binary + manpages_misc
         if conf.env.etcd_reclock:
-            manpages = manpages + manpages_etcd
+            manpages += manpages_etcd
+        if conf.env.HAVE_LIBRADOS:
+            manpages += manpages_ceph
         for m in manpages:
             if os.path.exists(os.path.join("doc", m)):
                 Logs.info(" %s: yes" % (m))
@@ -572,7 +578,9 @@ def build(bld):
 
     manpages_extra = manpages_misc
     if bld.env.etcd_reclock:
-        manpages_extra = manpages_extra + manpages_etcd
+        manpages_extra += manpages_etcd
+    if bld.env.HAVE_LIBRADOS:
+        manpages_extra += manpages_ceph
     for f in manpages_binary + manpages_extra:
         x = '%s.xml' % (f)
         bld.SAMBA_GENERATOR(x,
--
2.10.2
>From dbc411675b338ba755c4521a0d859e2c9d67bf87 Mon Sep 17 00:00:00 2001
From: David Disseldorp
Date: Tue, 6 Dec 2016 13:03:27 +0100
Subject: [PATCH 3/3] ctdb: add test script for ctdb_mutex_ceph_rados_helper

This standalone test script performs the following:
- using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object
  at CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
  + confirm that the lock is obtained, via ctdb_mutex_ceph_rados_helper "0"
    output
- check the RADOS object lock state, using the "rados lock info" command
- attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
  + confirm that the lock is not successfully taken
- tell the first locker to drop the lock and exit, via SIGTERM
- once the first locker has exited, attempt to get the lock again
  + confirm that this attempt succeeds

Signed-off-by: David Disseldorp
---
 ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 +++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)
 create mode 100755 ctdb/utils/ceph/test_ceph_rados_reclock.sh

diff --git a/ctdb/utils/ceph/test_ceph_rados_reclock.sh b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
new file mode 100755
index 0000000..1adacf6
--- /dev/null
+++ b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
@@ -0,0 +1,151 @@
+#!/bin/bash
+# standalone test for ctdb_mutex_ceph_rados_helper
+#
+# Copyright (C) David Disseldorp 2016
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# XXX The following parameters may require configuration:
+CLUSTER="ceph"			# Name of the Ceph cluster under test
+USER="client.admin"		# Ceph user - a keyring must exist
+POOL="rbd"			# RADOS pool - must exist
+OBJECT="ctdb_reclock"		# RADOS object: target for lock requests
+
+# test procedure:
+# - using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object at
+#   CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
+#   + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0" output
+# - check RADOS object lock state, using the "rados lock info" command
+# - attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
+#   + confirm that the lock is not successfully taken ("1" output=contention)
+# - tell the first locker to drop the lock and exit, via SIGTERM
+# - once the first locker has exited, attempt to get the lock again
+#   + confirm that this attempt succeeds
+
+function _fail() {
+	echo "FAILED: $*"
+	exit 1
+}
+
+# this test requires the Ceph "rados" binary, and "jq" json parser
+which jq > /dev/null || exit 1
+which rados > /dev/null || exit 1
+which ctdb_mutex_ceph_rados_helper || exit 1
+
+TMP_DIR="$(mktemp --directory)" || exit 1
+rados -p "$POOL" rm "$OBJECT"
+
+(ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \
+					> ${TMP_DIR}/first) &
+locker_pid=$!
+
+# TODO wait for ctdb_mutex_ceph_rados_helper to write one byte to stdout,
+# indicating lock acquisition success/failure
+sleep 1
+
+first_out=$(cat ${TMP_DIR}/first)
+[ "$first_out" == "0" ] \
+	|| _fail "expected lock acquisition (0), but got $first_out"
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+					> ${TMP_DIR}/lock_state_first
+
+# echo "with lock: `cat ${TMP_DIR}/lock_state_first`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_first)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_first)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_first)"
+[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT"
+LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_first)"
+[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected locker cookie: $LOCKER_COOKIE"
+LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_first)"
+[ "$LOCKER_DESC" == "CTDB recovery lock" ] \
+	|| _fail "unexpected locker description: $LOCKER_DESC"
+
+# second attempt while first is still holding the lock - expect failure
+ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \
+					> ${TMP_DIR}/second
+second_out=$(cat ${TMP_DIR}/second)
+[ "$second_out" == "1" ] \
+	|| _fail "expected lock contention (1), but got $second_out"
+
+# confirm lock state didn't change
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+					> ${TMP_DIR}/lock_state_second
+
+diff ${TMP_DIR}/lock_state_first ${TMP_DIR}/lock_state_second \
+	|| _fail "unexpected lock state change"
+
+# tell first locker to drop the lock and terminate
+kill $locker_pid || exit 1
+
+wait $locker_pid &> /dev/null
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+					> ${TMP_DIR}/lock_state_third
+# echo "without lock: `cat ${TMP_DIR}/lock_state_third`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_third)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_third)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_third)"
+[ $LOCK_COUNT -eq 0 ] \
+	|| _fail "didn't expect any locks in rados state, got $LOCK_COUNT"
+
+exec >${TMP_DIR}/third -- ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" &
+locker_pid=$!
+
+sleep 1
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+					> ${TMP_DIR}/lock_state_fourth
+# echo "with lock again: `cat ${TMP_DIR}/lock_state_fourth`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_fourth)"
+[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT"
+LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected locker cookie: $LOCKER_COOKIE"
+LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCKER_DESC" == "CTDB recovery lock" ] \
+	|| _fail "unexpected locker description: $LOCKER_DESC"
+
+kill $locker_pid || exit 1
+wait $locker_pid &> /dev/null
+
+third_out=$(cat ${TMP_DIR}/third)
+[ "$third_out" == "0" ] \
+	|| _fail "expected lock acquisition (0), but got $third_out"
+
+rm ${TMP_DIR}/*
+rmdir $TMP_DIR
+
+echo "$0: all tests passed"
--
2.10.2
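For reference, the test script above is self-contained and meant to be run
against a disposable cluster/pool, roughly like this; the install path shown is
purely illustrative, and the only requirement is that the built
ctdb_mutex_ceph_rados_helper binary is found on PATH:

  # edit CLUSTER/USER/POOL/OBJECT at the top of the script if the defaults
  # (ceph / client.admin / rbd / ctdb_reclock) don't match your environment
  PATH="/usr/local/samba/bin:$PATH" \
      ./ctdb/utils/ceph/test_ceph_rados_reclock.sh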