[0/4] ceph: new mount device syntax

Message ID 20210628075545.702106-1-vshankar@redhat.com

Venky Shankar June 28, 2021, 7:55 a.m. UTC
This series changes the Ceph File System mount device string.
The old mount device syntax (source) has the following problems:

- Mounts to the same cluster but with different fsnames
  and/or creds have identical device strings, which can
  confuse xfstests.

- The userspace mount helper tool resolves monitor addresses
  and fills in mon addrs automatically, but that means the
  device shown in /proc/mounts is different from what was
  used for mounting.

The new device syntax is as follows:

  cephuser@fsid.mycephfs2=/path

Note that there is no "monitor address" in the device string.
It is passed in as a mount option instead. This keeps the device
string the same when monitor addresses change (on remounts).

Also note that the userspace mount helper tool is backward
compatible, i.e., the mount helper will fall back to the
old syntax after trying to mount with the new syntax.

Venky Shankar (4):
  ceph: new device mount syntax
  ceph: validate cluster FSID for new device syntax
  ceph: record updated mon_addr on remount
  doc: document new CephFS mount device syntax

 Documentation/filesystems/ceph.rst |  23 ++++-
 fs/ceph/super.c                    | 132 ++++++++++++++++++++++++++---
 fs/ceph/super.h                    |   4 +
 include/linux/ceph/libceph.h       |   1 +
 net/ceph/ceph_common.c             |   3 +-
 5 files changed, 149 insertions(+), 14 deletions(-)
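
To make the proposed device string concrete, here is a minimal userspace
sketch (not the kernel parser from this series) of how a string such as
cephuser@fsid.mycephfs2=/path could be split into its parts. The names
struct ceph_dev_parts, copy_field() and parse_new_device() are hypothetical,
the buffer sizes are arbitrary, and treating an empty path as "/" is an
assumption made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the parsed pieces; not a kernel structure. */
struct ceph_dev_parts {
	char user[64];
	char fsid[64];
	char fs_name[64];
	char path[256];
};

/* Copy the byte range [start, end) into dst; fail if empty or too long. */
static int copy_field(char *dst, size_t dst_len,
		      const char *start, const char *end)
{
	size_t len = (size_t)(end - start);

	if (len == 0 || len >= dst_len)
		return -1;
	memcpy(dst, start, len);
	dst[len] = '\0';
	return 0;
}

/* Split "user@fsid.fsname=/path". Returns 0 on success, -1 if malformed. */
int parse_new_device(const char *dev, struct ceph_dev_parts *out)
{
	const char *at, *dot, *eq;

	assert(dev != NULL && out != NULL);

	at = strchr(dev, '@');		/* end of the user name */
	if (!at)
		return -1;
	dot = strchr(at + 1, '.');	/* end of the fsid */
	if (!dot)
		return -1;
	eq = strchr(dot + 1, '=');	/* end of the file system name */
	if (!eq)
		return -1;

	if (copy_field(out->user, sizeof(out->user), dev, at) ||
	    copy_field(out->fsid, sizeof(out->fsid), at + 1, dot) ||
	    copy_field(out->fs_name, sizeof(out->fs_name), dot + 1, eq))
		return -1;

	/* Assumption: an empty path after '=' means the root directory. */
	if (eq[1] == '\0') {
		strcpy(out->path, "/");
		return 0;
	}
	return copy_field(out->path, sizeof(out->path),
			  eq + 1, eq + 1 + strlen(eq + 1));
}
```

An old-style source has no '@', so this sketch rejects it, which is the
point at which a caller could try the old syntax instead.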

Comments

Jeff Layton June 28, 2021, 3:19 p.m. UTC | #1
On Mon, 2021-06-28 at 13:25 +0530, Venky Shankar wrote:
> Note that the new monitors are just shown in /proc/mounts.
> Ceph does not (re)connect to new monitors yet.
> 

I wasn't sure we'd want to do that anyway, but now I think it might be a
good idea. Being able to re-point a client to a new set of mons manually
seems like a reasonable thing to be able to do in a disaster recovery
situation or the like.

> Signed-off-by: Venky Shankar <vshankar@redhat.com>
> ---
>  fs/ceph/super.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 84bc06e51680..48493ac372fa 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1250,6 +1250,12 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
>  	else
>  		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
>  
> +	if (strcmp(fsc->mount_options->mon_addr, fsopt->mon_addr)) {
> +		kfree(fsc->mount_options->mon_addr);
> +		fsc->mount_options->mon_addr = kstrdup(fsopt->mon_addr,
> +						       GFP_KERNEL);
> +	}
> +

It's probably worth logging a KERN_NOTICE message or something that the
new monitor addresses were ignored. That way, if you implement
connecting to the new mons on remount later you'd have a way to tell.


>  	sync_filesystem(fc->root->d_sb);
>  	return 0;
>  }
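
As a rough illustration of the suggestion above, here is a userspace analogue
of the remount hunk that records the new address and emits a notice. It is a
sketch only: struct mount_opts, dup_string() and record_mon_addr() are
hypothetical names, malloc()/fprintf() stand in for kstrdup() and a
KERN_NOTICE-level message, and the allocation check is an addition that the
quoted hunk does not have.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mount options; only the field used here. */
struct mount_opts {
	char *mon_addr;
};

/* Portable string duplicate (stand-in for kstrdup() in this sketch). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Record new_addr in cur if it differs from the current value.
 * Returns 1 if the recorded address changed, 0 if it was already current,
 * and -1 on allocation failure (the old address is kept in that case).
 */
int record_mon_addr(struct mount_opts *cur, const char *new_addr)
{
	char *copy;

	assert(cur != NULL && new_addr != NULL);

	if (cur->mon_addr && strcmp(cur->mon_addr, new_addr) == 0)
		return 0;

	/* Allocate first so a failure cannot leave mon_addr dangling. */
	copy = dup_string(new_addr);
	if (!copy)
		return -1;

	free(cur->mon_addr);
	cur->mon_addr = copy;

	/* Tell the admin the address is only recorded, not acted upon. */
	fprintf(stderr,
		"notice: mon_addr recorded as %s; not reconnecting to new monitors\n",
		new_addr);
	return 1;
}
```

Allocating the copy before freeing the old string keeps the recorded address
valid even when the allocation fails.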
Jeff Layton June 28, 2021, 3:32 p.m. UTC | #2
On Mon, 2021-06-28 at 13:25 +0530, Venky Shankar wrote:
> This series introduces changes Ceph File System mount device string.
> Old mount device syntax (source) has the following problems:
> 
> mounts to the same cluster but with different fsnames
> and/or creds have identical device string which can
> confuse xfstests.
> 
> Userspace mount helper tool resolves monitor addresses
> and fill in mon addrs automatically, but that means the
> device shown in /proc/mounts is different than what was
> used for mounting.
> 
> New device syntax is as follows:
> 
>   cephuser@fsid.mycephfs2=/path
> 
> Note, there is no "monitor address" in the device string.
> That gets passed in as mount option. This keeps the device
> string same when monitor addresses change (on remounts).
> 
> Also note that the userspace mount helper tool is backward
> compatible. I.e., the mount helper will fallback to using
> old syntax after trying to mount with the new syntax.
> 
> Venky Shankar (4):
>   ceph: new device mount syntax
>   ceph: validate cluster FSID for new device syntax
>   ceph: record updated mon_addr on remount
>   doc: document new CephFS mount device syntax
> 
>  Documentation/filesystems/ceph.rst |  23 ++++-
>  fs/ceph/super.c                    | 132 ++++++++++++++++++++++++++---
>  fs/ceph/super.h                    |   4 +
>  include/linux/ceph/libceph.h       |   1 +
>  net/ceph/ceph_common.c             |   3 +-
>  5 files changed, 149 insertions(+), 14 deletions(-)
> 

Nice work, Venky. It needs a few minor changes, but this looks good
overall. Unless anyone has objections or other suggestions for changes,
we ought to get this into the testing branch soon and aim to merge it
in v5.15.

Thoughts?
Venky Shankar June 29, 2021, 4:42 a.m. UTC | #3
On Mon, Jun 28, 2021 at 8:34 PM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2021-06-28 at 13:25 +0530, Venky Shankar wrote:
> > The new device syntax requires the cluster FSID as part
> > of the device string. Use this FSID to verify if it matches
> > the cluster FSID we get back from the monitor, failing the
> > mount on mismatch.
> >
> > Signed-off-by: Venky Shankar <vshankar@redhat.com>
> > ---
> >  fs/ceph/super.c              | 9 +++++++++
> >  fs/ceph/super.h              | 1 +
> >  include/linux/ceph/libceph.h | 1 +
> >  net/ceph/ceph_common.c       | 3 ++-
> >  4 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index 950a28ad9c59..84bc06e51680 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -266,6 +266,9 @@ static int ceph_parse_new_source(const char *dev_name, const char *dev_name_end,
> >       if (!fs_name_start)
> >               return invalfc(fc, "missing file system name");
> >
> > +     if (parse_fsid(fsid_start, &fsopt->fsid))
> > +             return invalfc(fc, "invalid fsid format");
> > +
> >       ++fs_name_start; /* start of file system name */
> >       fsopt->mds_namespace = kstrndup(fs_name_start,
> >                                       dev_name_end - fs_name_start, GFP_KERNEL);
> > @@ -748,6 +751,12 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
> >       }
> >       opt = NULL; /* fsc->client now owns this */
> >
> > +     /* help learn fsid */
> > +     if (fsopt->new_dev_syntax) {
> > +             ceph_check_fsid(fsc->client, &fsopt->fsid);
> > +             fsc->client->have_fsid = true;
> > +     }
> > +
> >       fsc->client->extra_mon_dispatch = extra_mon_dispatch;
> >       ceph_set_opt(fsc->client, ABORT_ON_FULL);
> >
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 557348ff3203..cfd8ec25a9a8 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -100,6 +100,7 @@ struct ceph_mount_options {
> >       char *server_path;    /* default NULL (means "/") */
> >       char *fscache_uniq;   /* default NULL */
> >       char *mon_addr;
> > +     struct ceph_fsid fsid;
> >  };
> >
> >  struct ceph_fs_client {
> > diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h
> > index 409d8c29bc4f..24c1f4e9144d 100644
> > --- a/include/linux/ceph/libceph.h
> > +++ b/include/linux/ceph/libceph.h
> > @@ -296,6 +296,7 @@ extern bool libceph_compatible(void *data);
> >  extern const char *ceph_msg_type_name(int type);
> >  extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
> >  extern void *ceph_kvmalloc(size_t size, gfp_t flags);
> > +extern int parse_fsid(const char *str, struct ceph_fsid *fsid);
> >
> >  struct fs_parameter;
> >  struct fc_log;
> > diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
> > index 97d6ea763e32..db21734462a4 100644
> > --- a/net/ceph/ceph_common.c
> > +++ b/net/ceph/ceph_common.c
> > @@ -217,7 +217,7 @@ void *ceph_kvmalloc(size_t size, gfp_t flags)
> >       return p;
> >  }
> >
> > -static int parse_fsid(const char *str, struct ceph_fsid *fsid)
> > +int parse_fsid(const char *str, struct ceph_fsid *fsid)
> >  {
> >       int i = 0;
> >       char tmp[3];
> > @@ -247,6 +247,7 @@ static int parse_fsid(const char *str, struct ceph_fsid *fsid)
> >       dout("parse_fsid ret %d got fsid %pU\n", err, fsid);
> >       return err;
> >  }
> > +EXPORT_SYMBOL(parse_fsid);
>
> This function name is too generic. Maybe rename it to "ceph_parse_fsid"?

Makes sense. ACK.

>
> >
> >  /*
> >   * ceph options
>
> --
> Jeff Layton <jlayton@redhat.com>
>

-- 
Cheers,
Venky
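
For readers following the FSID check discussed in this message, the idea can
be sketched in userspace C as follows: parse the UUID-style FSID from the
device string into 16 bytes, then compare it with the FSID reported by the
cluster. This is not the kernel's parse_fsid() or ceph_check_fsid(); struct
fsid16, hex_val(), parse_fsid_sketch() and fsid_matches() are hypothetical
names, and skipping only '-' separators is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical 16-byte FSID holder; not the kernel's struct ceph_fsid. */
struct fsid16 {
	unsigned char b[16];
};

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse 32 hex digits (with optional '-' separators) into 16 bytes.
 * Parsing stops once 16 bytes are read, so trailing text such as
 * ".fsname=/path" is ignored. Returns 0 on success, -1 otherwise.
 */
int parse_fsid_sketch(const char *str, struct fsid16 *out)
{
	int i = 0;

	assert(str != NULL && out != NULL);

	while (*str && i < 16) {
		int hi, lo;

		if (*str == '-') {
			str++;
			continue;
		}
		hi = hex_val(str[0]);
		if (hi < 0)
			break;
		/* str[1] may be the terminator; hex_val() rejects it. */
		lo = hex_val(str[1]);
		if (lo < 0)
			break;
		out->b[i++] = (unsigned char)((hi << 4) | lo);
		str += 2;
	}
	return i == 16 ? 0 : -1;
}

/* Non-zero when both FSIDs are byte-for-byte identical. */
int fsid_matches(const struct fsid16 *a, const struct fsid16 *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}
```

A mount path built on this would fail the mount when fsid_matches() returns
zero for the FSID from the device string and the one learned from the monitor.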
Luis Henriques June 29, 2021, 11:22 a.m. UTC | #4
On Mon, Jun 28, 2021 at 01:25:41PM +0530, Venky Shankar wrote:
> This series introduces changes Ceph File System mount device string.
> Old mount device syntax (source) has the following problems:
> 
> mounts to the same cluster but with different fsnames
> and/or creds have identical device string which can
> confuse xfstests.
> 
> Userspace mount helper tool resolves monitor addresses
> and fill in mon addrs automatically, but that means the
> device shown in /proc/mounts is different than what was
> used for mounting.
> 
> New device syntax is as follows:
> 
>   cephuser@fsid.mycephfs2=/path
> 
> Note, there is no "monitor address" in the device string.
> That gets passed in as mount option. This keeps the device
> string same when monitor addresses change (on remounts).
> 
> Also note that the userspace mount helper tool is backward
> compatible. I.e., the mount helper will fallback to using
> old syntax after trying to mount with the new syntax.

I haven't fully reviewed this patchset yet.  I've started doing that (I'll
send a few comments in a bit), but stopped when I found some parsing
issues that need fixing.

I gave these patches a quick test (with a not-so-up-to-date mount.ceph)
and saw the splat below.  Does this patchset depend on anything on the
testing branch?  I've tried it on v5.13 mainline.

I also had a segmentation fault on the userspace mount.  I've used
something like:

mount -t ceph admin@ef274016-6131-4936-9277-946b535f5d03.a=/ /mnt/test

Cheers,
--
Luís

[    7.847565] ------------[ cut here ]------------
[    7.849322] kernel BUG at net/ceph/mon_client.c:209!
[    7.851151] invalid opcode: 0000 [#1] SMP PTI
[    7.852651] CPU: 1 PID: 188 Comm: mount Not tainted 5.13.0+ #32
[    7.854698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[    7.858555] RIP: 0010:__open_session+0x186/0x210 [libceph]
[    7.860517] Code: 50 01 00 00 e8 db a9 ff ff 48 8b b3 50 01 00 00 48 89 ef 5b 5d 41 5c e9 68 b6 ff ff e8 43 8e 48 e1 31 d2 f7 f5 e9 c9 fe ff ff <0f> 0b 48 8b 43 08 41 b91
[    7.866902] RSP: 0018:ffffc9000085fda0 EFLAGS: 00010246
[    7.868736] RAX: ffff888114396520 RBX: 0000000000003a98 RCX: 0000000000000000
[    7.871260] RDX: 0000000000000000 RSI: ffff88810eeec2a8 RDI: ffff88810eeec298
[    7.873653] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
[    7.875923] R10: ffffc9000085fdc0 R11: 0000000000000000 R12: 00000000ffffffff
[    7.878229] R13: ffff88811aef8500 R14: 00000000fffee289 R15: 7fffffffffffffff
[    7.880503] FS:  00007fda2ac9b800(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
[    7.883088] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.884915] CR2: 00007f837ba61e88 CR3: 000000010fefc000 CR4: 00000000000006a0
[    7.887174] Call Trace:
[    7.887920]  ceph_monc_open_session+0x43/0x60 [libceph]
[    7.889502]  __ceph_open_session+0x4b/0x250 [libceph]
[    7.891036]  ceph_get_tree+0x41b/0x880 [ceph]
[    7.892337]  vfs_get_tree+0x23/0x90
[    7.893315]  path_mount+0x73d/0xb20
[    7.894291]  __x64_sys_mount+0x103/0x140
[    7.895387]  do_syscall_64+0x45/0x80
[    7.896324]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    7.897738] RIP: 0033:0x7fda2aeb617e
[    7.898886] Code: 48 8b 0d f5 1c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 018
[    7.903441] RSP: 002b:00007fff4b0d3e58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[    7.905181] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fda2aeb617e
[    7.906813] RDX: 0000563e07b04c80 RSI: 0000563e07b04d20 RDI: 0000563e07b04ca0
[    7.908506] RBP: 0000563e07b04960 R08: 0000563e07b04be0 R09: 00007fda2af78a60
[    7.910176] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[    7.911830] R13: 0000563e07b04c80 R14: 0000563e07b04ca0 R15: 0000563e07b04960
[    7.913480] Modules linked in: ceph libceph
[    7.914492] ---[ end trace 2798408fec037d5a ]---
[    7.915582] RIP: 0010:__open_session+0x186/0x210 [libceph]
[    7.916853] Code: 50 01 00 00 e8 db a9 ff ff 48 8b b3 50 01 00 00 48 89 ef 5b 5d 41 5c e9 68 b6 ff ff e8 43 8e 48 e1 31 d2 f7 f5 e9 c9 fe ff ff <0f> 0b 48 8b 43 08 41 b91
[    7.921090] RSP: 0018:ffffc9000085fda0 EFLAGS: 00010246
[    7.922288] RAX: ffff888114396520 RBX: 0000000000003a98 RCX: 0000000000000000
[    7.923875] RDX: 0000000000000000 RSI: ffff88810eeec2a8 RDI: ffff88810eeec298
[    7.925323] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
[    7.926822] R10: ffffc9000085fdc0 R11: 0000000000000000 R12: 00000000ffffffff
[    7.928320] R13: ffff88811aef8500 R14: 00000000fffee289 R15: 7fffffffffffffff
[    7.929790] FS:  00007fda2ac9b800(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
[    7.931471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.932579] CR2: 00007f837ba61e88 CR3: 000000010fefc000 CR4: 00000000000006a0
Venky Shankar June 29, 2021, 12:13 p.m. UTC | #5
On Tue, Jun 29, 2021 at 4:53 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> On Mon, Jun 28, 2021 at 01:25:41PM +0530, Venky Shankar wrote:
> > This series introduces changes Ceph File System mount device string.
> > Old mount device syntax (source) has the following problems:
> >
> > mounts to the same cluster but with different fsnames
> > and/or creds have identical device string which can
> > confuse xfstests.
> >
> > Userspace mount helper tool resolves monitor addresses
> > and fill in mon addrs automatically, but that means the
> > device shown in /proc/mounts is different than what was
> > used for mounting.
> >
> > New device syntax is as follows:
> >
> >   cephuser@fsid.mycephfs2=/path
> >
> > Note, there is no "monitor address" in the device string.
> > That gets passed in as mount option. This keeps the device
> > string same when monitor addresses change (on remounts).
> >
> > Also note that the userspace mount helper tool is backward
> > compatible. I.e., the mount helper will fallback to using
> > old syntax after trying to mount with the new syntax.
>
> I haven't fully reviewed this patchset yet.  I've started doing that (I'll
> send a few comments in a bit), but stopped when I found some parsing
> issues that need fixing.
>
> I gave these patches a quick test (with a not-so-up-to-date mount.ceph)
> and saw the splat below.  Does this patchset depends on anything on the
> testing branch?  I've tried it on v5.13 mainline.

No.

>
> I also had a segmentation fault on the userspace mount.  I've used
> something like:
>
> mount -t ceph admin@ef274016-6131-4936-9277-946b535f5d03.a=/ /mnt/test

I will check and get back to you (the trace below seems odd -- it is
somewhere in opening the session with the mon).

>
> Cheers,
> --
> Luís
>
> [    7.847565] ------------[ cut here ]------------
> [    7.849322] kernel BUG at net/ceph/mon_client.c:209!
> [    7.851151] invalid opcode: 0000 [#1] SMP PTI
> [    7.852651] CPU: 1 PID: 188 Comm: mount Not tainted 5.13.0+ #32
> [    7.854698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [    7.858555] RIP: 0010:__open_session+0x186/0x210 [libceph]
> [    7.860517] Code: 50 01 00 00 e8 db a9 ff ff 48 8b b3 50 01 00 00 48 89 ef 5b 5d 41 5c e9 68 b6 ff ff e8 43 8e 48 e1 31 d2 f7 f5 e9 c9 fe ff ff <0f> 0b 48 8b 43 08 41 b91
> [    7.866902] RSP: 0018:ffffc9000085fda0 EFLAGS: 00010246
> [    7.868736] RAX: ffff888114396520 RBX: 0000000000003a98 RCX: 0000000000000000
> [    7.871260] RDX: 0000000000000000 RSI: ffff88810eeec2a8 RDI: ffff88810eeec298
> [    7.873653] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
> [    7.875923] R10: ffffc9000085fdc0 R11: 0000000000000000 R12: 00000000ffffffff
> [    7.878229] R13: ffff88811aef8500 R14: 00000000fffee289 R15: 7fffffffffffffff
> [    7.880503] FS:  00007fda2ac9b800(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
> [    7.883088] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    7.884915] CR2: 00007f837ba61e88 CR3: 000000010fefc000 CR4: 00000000000006a0
> [    7.887174] Call Trace:
> [    7.887920]  ceph_monc_open_session+0x43/0x60 [libceph]
> [    7.889502]  __ceph_open_session+0x4b/0x250 [libceph]
> [    7.891036]  ceph_get_tree+0x41b/0x880 [ceph]
> [    7.892337]  vfs_get_tree+0x23/0x90
> [    7.893315]  path_mount+0x73d/0xb20
> [    7.894291]  __x64_sys_mount+0x103/0x140
> [    7.895387]  do_syscall_64+0x45/0x80
> [    7.896324]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    7.897738] RIP: 0033:0x7fda2aeb617e
> [    7.898886] Code: 48 8b 0d f5 1c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 018
> [    7.903441] RSP: 002b:00007fff4b0d3e58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [    7.905181] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fda2aeb617e
> [    7.906813] RDX: 0000563e07b04c80 RSI: 0000563e07b04d20 RDI: 0000563e07b04ca0
> [    7.908506] RBP: 0000563e07b04960 R08: 0000563e07b04be0 R09: 00007fda2af78a60
> [    7.910176] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [    7.911830] R13: 0000563e07b04c80 R14: 0000563e07b04ca0 R15: 0000563e07b04960
> [    7.913480] Modules linked in: ceph libceph
> [    7.914492] ---[ end trace 2798408fec037d5a ]---
> [    7.915582] RIP: 0010:__open_session+0x186/0x210 [libceph]
> [    7.916853] Code: 50 01 00 00 e8 db a9 ff ff 48 8b b3 50 01 00 00 48 89 ef 5b 5d 41 5c e9 68 b6 ff ff e8 43 8e 48 e1 31 d2 f7 f5 e9 c9 fe ff ff <0f> 0b 48 8b 43 08 41 b91
> [    7.921090] RSP: 0018:ffffc9000085fda0 EFLAGS: 00010246
> [    7.922288] RAX: ffff888114396520 RBX: 0000000000003a98 RCX: 0000000000000000
> [    7.923875] RDX: 0000000000000000 RSI: ffff88810eeec2a8 RDI: ffff88810eeec298
> [    7.925323] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
> [    7.926822] R10: ffffc9000085fdc0 R11: 0000000000000000 R12: 00000000ffffffff
> [    7.928320] R13: ffff88811aef8500 R14: 00000000fffee289 R15: 7fffffffffffffff
> [    7.929790] FS:  00007fda2ac9b800(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
> [    7.931471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    7.932579] CR2: 00007f837ba61e88 CR3: 000000010fefc000 CR4: 00000000000006a0
>


-- 
Cheers,
Venky