[v9,00/12] ceph: support idmapped mounts

Message ID 20230804084858.126104-1-aleksandr.mikhalitsyn@canonical.com

Message

Aleksandr Mikhalitsyn Aug. 4, 2023, 8:48 a.m. UTC
Dear friends,

This patchset was originally developed by Christian Brauner, but I'll continue
to push it forward. Christian allowed me to do that :)

This feature is already actively used and tested with the LXD/LXC project.

Git tree (based on https://github.com/ceph/ceph-client.git testing):
v9: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v9
current: https://github.com/mihalicyn/linux/tree/fs.idmapped.ceph

In version 3 I changed only two commits:
- fs: export mnt_idmap_get/mnt_idmap_put
- ceph: allow idmapped setattr inode op
and added a new one:
- ceph: pass idmap to __ceph_setattr

In version 4 I reworked the ("ceph: stash idmapping in mdsc request")
commit. Now we take the idmap reference right at the place where
req->r_mnt_idmap is filled. This is a safer approach and prevents a possible
refcount underflow on error paths where __register_request wasn't called but
ceph_mdsc_release_request is called.
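
The refcounting rule from version 4 can be illustrated with a small userspace sketch. This is not the kernel code: toy_idmap, toy_request and the toy_* helpers are hypothetical stand-ins, and the counter is a plain int rather than an atomic refcount. The point is only that the reference is taken in the same place where the pointer is stored, so release can drop it whenever the pointer is set.

```c
#include <stddef.h>

/* Hypothetical stand-ins for an idmap object and an MDS request. */
struct toy_idmap {
	int refcount;			/* plain int, not atomic: sketch only */
};

struct toy_request {
	struct toy_idmap *idmap;	/* NULL until an idmap is stashed */
};

static struct toy_idmap *toy_idmap_get(struct toy_idmap *m)
{
	m->refcount++;
	return m;
}

static void toy_idmap_put(struct toy_idmap *m)
{
	m->refcount--;
}

/* Take the reference at the moment the pointer is stored... */
static void toy_request_set_idmap(struct toy_request *req, struct toy_idmap *m)
{
	req->idmap = toy_idmap_get(m);
}

/*
 * ...so release only drops a reference that was really taken,
 * even if the request was never registered.
 */
static void toy_request_release(struct toy_request *req)
{
	if (req->idmap) {
		toy_idmap_put(req->idmap);
		req->idmap = NULL;
	}
}
```

Releasing a request that never had an idmap stashed leaves the counter untouched, which is the underflow case the rework avoids.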

Changelog for version 5:
- a few commits were squashed into one (as suggested by Xiubo Li)
- started passing an idmapping everywhere (if possible), so the caller's
UID/GID will be mapped almost everywhere (as suggested by Xiubo Li)

Changelog for version 6:
- rebased on top of testing branch
- passed an idmapping in a few places (readdir, ceph_netfs_issue_op_inline)

Changelog for version 7:
- rebased on top of testing branch
- the series now requires a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
https://github.com/ceph/ceph/pull/52575

Changelog for version 8:
- rebased on top of testing branch
- added the enable_unsafe_idmap module parameter to make idmapped mounts
work with old MDS server versions
- properly handled the case when an old MDS is used with a new kernel client

Changelog for version 9:
- added a "struct_len" field to struct ceph_mds_request_head, as requested by Xiubo Li
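
Taken together, the version 7 to 9 changes mean the client picks a request head layout per MDS session. Below is a rough userspace sketch of that selection, assuming a heavily trimmed head structure. toy_request_head, toy_head_version and toy_head_len are hypothetical names, the two int flags stand in for the session feature bits, and the offsetofend() macro is redefined here because it is a kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/*
 * Hypothetical, heavily trimmed request head; the real
 * struct ceph_mds_request_head has many more fields.
 */
struct toy_request_head {
	uint16_t version;
	uint32_t ext_num_retry;
	uint32_t ext_num_fwd;		/* last field of head version 2 */
	uint32_t struct_len;		/* added in head version 3 */
	uint32_t owner_uid, owner_gid;	/* added in head version 3 */
} __attribute__((packed));

#define TOY_HEAD_VERSION 3

/* Pick the newest head version the peer understands. */
static uint16_t toy_head_version(int has_32bits_retry_fwd, int has_owner_uidgid)
{
	if (!has_32bits_retry_fwd)
		return 1;
	if (!has_owner_uidgid)
		return 2;
	return TOY_HEAD_VERSION;
}

/*
 * Number of head bytes to encode for versions 2 and 3.
 * Version 1 uses a separate, older struct and is not modeled here.
 */
static size_t toy_head_len(uint16_t version)
{
	if (version == 2)
		return offsetofend(struct toy_request_head, ext_num_fwd);
	return sizeof(struct toy_request_head);
}
```

A version 2 head stops right after ext_num_fwd, so an MDS without the new feature never sees the struct_len and owner fields.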

I can confirm that this version passes xfstests. It was tested
with an old MDS (without CEPHFS_FEATURE_HAS_OWNER_UIDGID)
and with a recent MDS version.

Links to previous versions:
v1: https://lore.kernel.org/all/20220104140414.155198-1-brauner@kernel.org/
v2: https://lore.kernel.org/lkml/20230524153316.476973-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v2
v3: https://lore.kernel.org/lkml/20230607152038.469739-1-aleksandr.mikhalitsyn@canonical.com/#t
v4: https://lore.kernel.org/lkml/20230607180958.645115-1-aleksandr.mikhalitsyn@canonical.com/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v4
v5: https://lore.kernel.org/lkml/20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v5
v6: https://lore.kernel.org/lkml/20230609093125.252186-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v6
v7: https://lore.kernel.org/all/20230726141026.307690-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v7
v8: https://lore.kernel.org/all/20230803135955.230449-1-aleksandr.mikhalitsyn@canonical.com/
tree: -

Kind regards,
Alex

Original description from Christian:
========================================================================
This patch series enables cephfs to support idmapped mounts, i.e. the
ability to alter ownership information on a per-mount basis.

Container managers such as LXD support sharing data via cephfs between
the host and unprivileged containers and between unprivileged containers.
They may all use different idmappings. Idmapped mounts can be used to
create mounts with the idmapping used for the container (or a different
one specific to the use-case).

There are in fact more use-cases such as remapping ownership for
mountpoints on the host itself to grant or restrict access to different
users or to make it possible to enforce that programs running as root
will write with a non-zero {g,u}id to disk.
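
To make the idea of an idmapping concrete, here is a minimal userspace sketch, assuming a mapping made of simple ID ranges. struct id_extent, map_id and INVALID_ID are hypothetical names for illustration and are not the kernel's idmapping structures or helpers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for one extent of an idmapping:
 * IDs [from, from + count) are mapped to [to, to + count).
 */
struct id_extent {
	uint32_t from;
	uint32_t to;
	uint32_t count;
};

#define INVALID_ID ((uint32_t)-1)

/* Map an ID through a list of extents; unmapped IDs yield INVALID_ID. */
static uint32_t map_id(const struct id_extent *ext, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= ext[i].from && id - ext[i].from < ext[i].count)
			return ext[i].to + (id - ext[i].from);
	}
	return INVALID_ID;
}
```

With a single extent { 0, 100000, 65536 }, a program running as root (ID 0) would write as ID 100000, and any ID outside the range is treated as unmapped.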

The patch series is simple overall and few changes are needed to cephfs.
There is one cephfs specific issue that I would like to discuss and
solve which I explain in detail in:

[PATCH 02/12] ceph: handle idmapped mounts in create_request_message()

It has to do with how to handle MDS servers which have id-based access
restrictions configured. I would ask you to please take a look at the
explanation in the aforementioned patch.

The patch series passes the vfs and idmapped mount testsuite as part of
xfstests. To run it you will need a config like:

[ceph]
export FSTYP=ceph
export TEST_DIR=/mnt/test
export TEST_DEV=10.103.182.10:6789:/
export TEST_FS_MOUNT_OPTS="-o name=admin,secret=$password"

and then simply call

sudo ./check -g idmapped

========================================================================

Alexander Mikhalitsyn (3):
  fs: export mnt_idmap_get/mnt_idmap_put
  ceph: add enable_unsafe_idmap module parameter
  ceph: pass idmap to __ceph_setattr

Christian Brauner (9):
  ceph: stash idmapping in mdsc request
  ceph: handle idmapped mounts in create_request_message()
  ceph: pass an idmapping to mknod/symlink/mkdir
  ceph: allow idmapped getattr inode op
  ceph: allow idmapped permission inode op
  ceph: allow idmapped setattr inode op
  ceph/acl: allow idmapped set_acl inode op
  ceph/file: allow idmapped atomic_open inode op
  ceph: allow idmapped mounts

 fs/ceph/acl.c                 |  6 +--
 fs/ceph/crypto.c              |  2 +-
 fs/ceph/dir.c                 |  3 ++
 fs/ceph/file.c                | 10 ++++-
 fs/ceph/inode.c               | 29 +++++++++------
 fs/ceph/mds_client.c          | 70 ++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h          |  8 +++-
 fs/ceph/super.c               |  7 +++-
 fs/ceph/super.h               |  3 +-
 fs/mnt_idmapping.c            |  2 +
 include/linux/ceph/ceph_fs.h  |  5 ++-
 include/linux/mnt_idmapping.h |  3 ++
 12 files changed, 121 insertions(+), 27 deletions(-)

Comments

Christian Brauner Aug. 4, 2023, 2:53 p.m. UTC | #1
On Fri, Aug 04, 2023 at 10:48:49AM +0200, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <brauner@kernel.org>
> 
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
> 
> In order to ensure that the correct {g,u}id is used, map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmapping. If this is a request message that was triggered from an inode
> operation that doesn't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
> 
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
> 
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> 
> Link: https://github.com/ceph/ceph/pull/52575
> Link: https://tracker.ceph.com/issues/62217
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---

I like the new extension,
Acked-by: Christian Brauner <brauner@kernel.org>
Xiubo Li Aug. 7, 2023, 1:24 a.m. UTC | #2
On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
>
> In order to ensure that the correct {g,u}id is used, map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmapping. If this is a request message that was triggered from an inode
> operation that doesn't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
>
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
>
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>
> Link: https://github.com/ceph/ceph/pull/52575
> Link: https://tracker.ceph.com/issues/62217
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v7:
> 	- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> v8:
> 	- properly handled case when old MDS used with new kernel client
> ---
>   fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
>   fs/ceph/mds_client.h         |  5 +++-
>   include/linux/ceph/ceph_fs.h |  5 +++-
>   3 files changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 8829f55103da..41e4bf3811c4 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>   	}
>   }
>   
> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> +{
> +	if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> +		return 1;
> +
> +	if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> +		return 2;
> +
> +	return CEPH_MDS_REQUEST_HEAD_VERSION;
> +}
> +
>   static struct ceph_mds_request_head_legacy *
>   find_legacy_request_head(void *p, u64 features)
>   {
> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>   {
>   	int mds = session->s_mds;
>   	struct ceph_mds_client *mdsc = session->s_mdsc;
> +	struct ceph_client *cl = mdsc->fsc->client;
>   	struct ceph_msg *msg;
>   	struct ceph_mds_request_head_legacy *lhead;
>   	const char *path1 = NULL;
> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>   	void *p, *end;
>   	int ret;
>   	bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> -	bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> +	u16 request_head_version = mds_supported_head_version(session);
>   
>   	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>   			      req->r_parent, req->r_path1, req->r_ino1.ino,
> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>   	 */
>   	if (legacy)
>   		len = sizeof(struct ceph_mds_request_head_legacy);
> -	else if (old_version)
> +	else if (request_head_version == 1)
>   		len = sizeof(struct ceph_mds_request_head_old);
> +	else if (request_head_version == 2)
> +		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>   	else
>   		len = sizeof(struct ceph_mds_request_head);
>   
> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>   	lhead = find_legacy_request_head(msg->front.iov_base,
>   					 session->s_con.peer_features);
>   
> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> +		pr_err_ratelimited_client(cl,
> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> +			" is not supported by MDS. Fail request with -EIO.\n");
> +
> +		ret = -EIO;
> +		goto out_err;
> +	}
> +
>   	/*
>   	 * The ceph_mds_request_head_legacy didn't contain a version field, and
>   	 * one was added when we moved the message version from 3->4.
> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>   	if (legacy) {
>   		msg->hdr.version = cpu_to_le16(3);
>   		p = msg->front.iov_base + sizeof(*lhead);
> -	} else if (old_version) {
> +	} else if (request_head_version == 1) {
>   		struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>   
>   		msg->hdr.version = cpu_to_le16(4);
>   		ohead->version = cpu_to_le16(1);
>   		p = msg->front.iov_base + sizeof(*ohead);
> +	} else if (request_head_version == 2) {
> +		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +
> +		msg->hdr.version = cpu_to_le16(6);
> +		nhead->version = cpu_to_le16(2);
> +
> +		p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>   	} else {
>   		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +		kuid_t owner_fsuid;
> +		kgid_t owner_fsgid;
>   
>   		msg->hdr.version = cpu_to_le16(6);
>   		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> +		nhead->struct_len = sizeof(struct ceph_mds_request_head);
> +
> +		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSUIDT_INIT(req->r_cred->fsuid));
> +		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSGIDT_INIT(req->r_cred->fsgid));
> +		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> +		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>   		p = msg->front.iov_base + sizeof(*nhead);
>   	}
>   
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e3bbf3ba8ee8..8f683e8203bd 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>   	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>   	CEPHFS_FEATURE_OP_GETVXATTR,
>   	CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>   
> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>   };
>   
>   #define CEPHFS_FEATURES_CLIENT_SUPPORTED {	\
> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>   	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,	\
>   	CEPHFS_FEATURE_OP_GETVXATTR,		\
>   	CEPHFS_FEATURE_32BITS_RETRY_FWD,	\
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,	\
>   }
>   
>   /*
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 5f2301ee88bc..b91699b08f26 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>   	union ceph_mds_request_args args;
>   } __attribute__ ((packed));
>   
> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
>   
>   struct ceph_mds_request_head_old {
>   	__le16 version;                /* struct version */
> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>   
>   	__le32 ext_num_retry;          /* new count retry attempts */
>   	__le32 ext_num_fwd;            /* new count fwd attempts */
> +
> +	__le32 struct_len;             /* to store size of struct ceph_mds_request_head */
> +	__le32 owner_uid, owner_gid;   /* used for OPs which create inodes */

Let's also initialize them to -1 for all the other requests as we do in 
your PR.

Thanks

- Xiubo



>   } __attribute__ ((packed));
>   
>   /* cap/lease release record */
Aleksandr Mikhalitsyn Aug. 7, 2023, 6:51 a.m. UTC | #3
On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> > From: Christian Brauner <brauner@kernel.org>
> >
> > Inode operations that create a new filesystem object such as ->mknod,
> > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > filesystem object.
> >
> > In order to ensure that the correct {g,u}id is used, map the caller's
> > fs{g,u}id for creation requests. This doesn't require complex changes.
> > It suffices to pass in the relevant idmapping recorded in the request
> > message. If this request message was triggered from an inode operation
> > that creates filesystem objects it will have passed down the relevant
> > idmapping. If this is a request message that was triggered from an inode
> > operation that doesn't need to take idmappings into account the initial
> > idmapping is passed down which is an identity mapping.
> >
> > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > which adds two new fields (owner_{u,g}id) to the request head structure.
> > So, we need to ensure that MDS supports it otherwise we need to fail
> > any IO that comes through an idmapped mount because we can't process it
> > in a proper way. MDS server without such an extension will use caller_{u,g}id
> > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> > values are unmapped. At the same time we can't map these fields with an
> > idmapping as it can break UID/GID-based permission checks logic on the
> > MDS side. This problem was described with a lot of details at [1], [2].
> >
> > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >
> > Link: https://github.com/ceph/ceph/pull/52575
> > Link: https://tracker.ceph.com/issues/62217
> > Cc: Xiubo Li <xiubli@redhat.com>
> > Cc: Jeff Layton <jlayton@kernel.org>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: ceph-devel@vger.kernel.org
> > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> > v7:
> >       - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> > v8:
> >       - properly handled case when old MDS used with new kernel client
> > ---
> >   fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
> >   fs/ceph/mds_client.h         |  5 +++-
> >   include/linux/ceph/ceph_fs.h |  5 +++-
> >   3 files changed, 52 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 8829f55103da..41e4bf3811c4 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >       }
> >   }
> >
> > +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> > +{
> > +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> > +             return 1;
> > +
> > +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> > +             return 2;
> > +
> > +     return CEPH_MDS_REQUEST_HEAD_VERSION;
> > +}
> > +
> >   static struct ceph_mds_request_head_legacy *
> >   find_legacy_request_head(void *p, u64 features)
> >   {
> > @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >   {
> >       int mds = session->s_mds;
> >       struct ceph_mds_client *mdsc = session->s_mdsc;
> > +     struct ceph_client *cl = mdsc->fsc->client;
> >       struct ceph_msg *msg;
> >       struct ceph_mds_request_head_legacy *lhead;
> >       const char *path1 = NULL;
> > @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >       void *p, *end;
> >       int ret;
> >       bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> > -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> > +     u16 request_head_version = mds_supported_head_version(session);
> >
> >       ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >                             req->r_parent, req->r_path1, req->r_ino1.ino,
> > @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >        */
> >       if (legacy)
> >               len = sizeof(struct ceph_mds_request_head_legacy);
> > -     else if (old_version)
> > +     else if (request_head_version == 1)
> >               len = sizeof(struct ceph_mds_request_head_old);
> > +     else if (request_head_version == 2)
> > +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >       else
> >               len = sizeof(struct ceph_mds_request_head);
> >
> > @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >       lhead = find_legacy_request_head(msg->front.iov_base,
> >                                        session->s_con.peer_features);
> >
> > +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > +             pr_err_ratelimited_client(cl,
> > +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > +                     " is not supported by MDS. Fail request with -EIO.\n");
> > +
> > +             ret = -EIO;
> > +             goto out_err;
> > +     }
> > +
> >       /*
> >        * The ceph_mds_request_head_legacy didn't contain a version field, and
> >        * one was added when we moved the message version from 3->4.
> > @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >       if (legacy) {
> >               msg->hdr.version = cpu_to_le16(3);
> >               p = msg->front.iov_base + sizeof(*lhead);
> > -     } else if (old_version) {
> > +     } else if (request_head_version == 1) {
> >               struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >
> >               msg->hdr.version = cpu_to_le16(4);
> >               ohead->version = cpu_to_le16(1);
> >               p = msg->front.iov_base + sizeof(*ohead);
> > +     } else if (request_head_version == 2) {
> > +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > +
> > +             msg->hdr.version = cpu_to_le16(6);
> > +             nhead->version = cpu_to_le16(2);
> > +
> > +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >       } else {
> >               struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > +             kuid_t owner_fsuid;
> > +             kgid_t owner_fsgid;
> >
> >               msg->hdr.version = cpu_to_le16(6);
> >               nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
> > +
> > +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > +                                       VFSUIDT_INIT(req->r_cred->fsuid));
> > +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > +                                       VFSGIDT_INIT(req->r_cred->fsgid));
> > +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> > +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >               p = msg->front.iov_base + sizeof(*nhead);
> >       }
> >
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index e3bbf3ba8ee8..8f683e8203bd 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >       CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >       CEPHFS_FEATURE_OP_GETVXATTR,
> >       CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >
> > -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >   };
> >
> >   #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
> > @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >       CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
> >       CEPHFS_FEATURE_OP_GETVXATTR,            \
> >       CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
> > +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
> >   }
> >
> >   /*
> > diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > index 5f2301ee88bc..b91699b08f26 100644
> > --- a/include/linux/ceph/ceph_fs.h
> > +++ b/include/linux/ceph/ceph_fs.h
> > @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >       union ceph_mds_request_args args;
> >   } __attribute__ ((packed));
> >
> > -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
> > +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
> >
> >   struct ceph_mds_request_head_old {
> >       __le16 version;                /* struct version */
> > @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >
> >       __le32 ext_num_retry;          /* new count retry attempts */
> >       __le32 ext_num_fwd;            /* new count fwd attempts */
> > +
> > +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
> > +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
>
> Let's also initialize them to -1 for all the other requests as we do in
> your PR.

They are always initialized already. As you can see from the code, we
don't have any extra conditions on filling these fields. We always fill
them with an appropriate UID/GID. If the mount is not idmapped then it's
just == caller_uid/gid; if the mount is idmapped then it's the idmapped
caller_uid/gid.
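
As a rough illustration of that behaviour (a userspace sketch, not the patch itself), the owner fields could be modeled as below. toy_idmap, toy_owner, toy_map and toy_fill_owner are hypothetical names, a single-range mapping is assumed, and count == 0 stands for a mount that is not idmapped. The -EIO branch mirrors the check described in the commit message for an idmapped mount talking to an MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical single-range idmapping; count == 0 means "not idmapped". */
struct toy_idmap {
	uint32_t from, to, count;
};

struct toy_owner {
	uint32_t uid, gid;
};

static uint32_t toy_map(const struct toy_idmap *m, uint32_t id)
{
	if (m->count == 0)
		return id;			/* identity mapping */
	if (id >= m->from && id - m->from < m->count)
		return m->to + (id - m->from);
	return (uint32_t)-1;			/* unmapped */
}

/*
 * Fill the owner fields unconditionally, as in the newest head layout.
 * Simplification: with an old MDS and no idmapping the patch sends an
 * older head without these fields; that case is not modeled here.
 */
static int toy_fill_owner(const struct toy_idmap *m, int mds_has_owner_uidgid,
			  uint32_t caller_uid, uint32_t caller_gid,
			  struct toy_owner *out)
{
	if (m->count != 0 && !mds_has_owner_uidgid)
		return -EIO;
	out->uid = toy_map(m, caller_uid);
	out->gid = toy_map(m, caller_gid);
	return 0;
}
```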

Kind regards,
Alex

>
> Thanks
>
> - Xiubo
>
>
>
> >   } __attribute__ ((packed));
> >
> >   /* cap/lease release record */
>
Xiubo Li Aug. 7, 2023, 10:28 a.m. UTC | #4
On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
>>> From: Christian Brauner <brauner@kernel.org>
>>>
>>> Inode operations that create a new filesystem object such as ->mknod,
>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>> filesystem object.
>>>
>>> In order to ensure that the correct {g,u}id is used, map the caller's
>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>> It suffices to pass in the relevant idmapping recorded in the request
>>> message. If this request message was triggered from an inode operation
>>> that creates filesystem objects it will have passed down the relevant
>>> idmapping. If this is a request message that was triggered from an inode
>>> operation that doesn't need to take idmappings into account the initial
>>> idmapping is passed down which is an identity mapping.
>>>
>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>> any IO that comes through an idmapped mount because we can't process it
>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
>>> values are unmapped. At the same time we can't map these fields with an
>>> idmapping as it can break UID/GID-based permission checks logic on the
>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>
>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>
>>> Link: https://github.com/ceph/ceph/pull/52575
>>> Link: https://tracker.ceph.com/issues/62217
>>> Cc: Xiubo Li <xiubli@redhat.com>
>>> Cc: Jeff Layton <jlayton@kernel.org>
>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>> Cc: ceph-devel@vger.kernel.org
>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>> ---
>>> v7:
>>>        - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>> v8:
>>>        - properly handled case when old MDS used with new kernel client
>>> ---
>>>    fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
>>>    fs/ceph/mds_client.h         |  5 +++-
>>>    include/linux/ceph/ceph_fs.h |  5 +++-
>>>    3 files changed, 52 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>> index 8829f55103da..41e4bf3811c4 100644
>>> --- a/fs/ceph/mds_client.c
>>> +++ b/fs/ceph/mds_client.c
>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>>>        }
>>>    }
>>>
>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
>>> +{
>>> +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
>>> +             return 1;
>>> +
>>> +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
>>> +             return 2;
>>> +
>>> +     return CEPH_MDS_REQUEST_HEAD_VERSION;
>>> +}
>>> +
>>>    static struct ceph_mds_request_head_legacy *
>>>    find_legacy_request_head(void *p, u64 features)
>>>    {
>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>    {
>>>        int mds = session->s_mds;
>>>        struct ceph_mds_client *mdsc = session->s_mdsc;
>>> +     struct ceph_client *cl = mdsc->fsc->client;
>>>        struct ceph_msg *msg;
>>>        struct ceph_mds_request_head_legacy *lhead;
>>>        const char *path1 = NULL;
>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>        void *p, *end;
>>>        int ret;
>>>        bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>> -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
>>> +     u16 request_head_version = mds_supported_head_version(session);
>>>
>>>        ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>>                              req->r_parent, req->r_path1, req->r_ino1.ino,
>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>         */
>>>        if (legacy)
>>>                len = sizeof(struct ceph_mds_request_head_legacy);
>>> -     else if (old_version)
>>> +     else if (request_head_version == 1)
>>>                len = sizeof(struct ceph_mds_request_head_old);
>>> +     else if (request_head_version == 2)
>>> +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>        else
>>>                len = sizeof(struct ceph_mds_request_head);
>>>
>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>        lhead = find_legacy_request_head(msg->front.iov_base,
>>>                                         session->s_con.peer_features);
>>>
>>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>> +             pr_err_ratelimited_client(cl,
>>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>> +                     " is not supported by MDS. Fail request with -EIO.\n");
>>> +
>>> +             ret = -EIO;
>>> +             goto out_err;
>>> +     }
>>> +
>>>        /*
>>>         * The ceph_mds_request_head_legacy didn't contain a version field, and
>>>         * one was added when we moved the message version from 3->4.
>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>        if (legacy) {
>>>                msg->hdr.version = cpu_to_le16(3);
>>>                p = msg->front.iov_base + sizeof(*lhead);
>>> -     } else if (old_version) {
>>> +     } else if (request_head_version == 1) {
>>>                struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>
>>>                msg->hdr.version = cpu_to_le16(4);
>>>                ohead->version = cpu_to_le16(1);
>>>                p = msg->front.iov_base + sizeof(*ohead);
>>> +     } else if (request_head_version == 2) {
>>> +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>> +
>>> +             msg->hdr.version = cpu_to_le16(6);
>>> +             nhead->version = cpu_to_le16(2);
>>> +
>>> +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>        } else {
>>>                struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>> +             kuid_t owner_fsuid;
>>> +             kgid_t owner_fsgid;
>>>
>>>                msg->hdr.version = cpu_to_le16(6);
>>>                nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>> +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
>>> +
>>> +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>> +                                       VFSUIDT_INIT(req->r_cred->fsuid));
>>> +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>> +                                       VFSGIDT_INIT(req->r_cred->fsgid));
>>> +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
>>> +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>>>                p = msg->front.iov_base + sizeof(*nhead);
>>>        }
>>>
>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>> --- a/fs/ceph/mds_client.h
>>> +++ b/fs/ceph/mds_client.h
>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>        CEPHFS_FEATURE_OP_GETVXATTR,
>>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>> +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>
>>> -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>> +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>    };
>>>
>>>    #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
>>>        CEPHFS_FEATURE_OP_GETVXATTR,            \
>>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
>>>    }
>>>
>>>    /*
>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>> index 5f2301ee88bc..b91699b08f26 100644
>>> --- a/include/linux/ceph/ceph_fs.h
>>> +++ b/include/linux/ceph/ceph_fs.h
>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>>        union ceph_mds_request_args args;
>>>    } __attribute__ ((packed));
>>>
>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
>>>
>>>    struct ceph_mds_request_head_old {
>>>        __le16 version;                /* struct version */
>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>>>
>>>        __le32 ext_num_retry;          /* new count retry attempts */
>>>        __le32 ext_num_fwd;            /* new count fwd attempts */
>>> +
>>> +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
>>> +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
>> Let's also initialize them to -1 for all the other requests as we do in
>> your PR.
> They are always initialized already. As you can see from the code we
> don't have any extra conditions
> on filling these fields. We always fill them with an appropriate
> UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> if mount idmapped then it's idmapped caller_uid/gid.
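
To make the quoted description concrete, here is a small userspace model of how the owner ID is derived from the caller's fsuid/fsgid. It is only a sketch under assumptions: `struct sketch_idmap` and `sketch_owner_id()` are hypothetical names, and the single-range mapping is a simplification of what the kernel's `from_vfsuid()`/`from_vfsgid()` helpers do, not their actual implementation.

```c
#include <stdint.h>

#define SKETCH_INVALID_ID ((uint32_t)-1)

/*
 * Hypothetical single-range idmapping: filesystem IDs
 * [fs_first, fs_first + count) show up on the mount as
 * [mnt_first, mnt_first + count). count == 0 models a mount
 * that is not idmapped (identity mapping).
 */
struct sketch_idmap {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/* Translate the caller's fsuid/fsgid into the ID sent as owner_{uid,gid}. */
static uint32_t sketch_owner_id(const struct sketch_idmap *map,
				uint32_t caller_fsid)
{
	/* Not idmapped: owner ID is simply the caller's ID. */
	if (map->count == 0)
		return caller_fsid;

	/* Caller ID has no mapping on this mount: report it as invalid. */
	if (caller_fsid < map->mnt_first ||
	    caller_fsid - map->mnt_first >= map->count)
		return SKETCH_INVALID_ID;

	/* Idmapped: shift the caller's ID back into the filesystem range. */
	return map->fs_first + (caller_fsid - map->mnt_first);
}
```

With an assumed mapping of 0..65535 onto 100000..165535, a caller with fsuid 101000 would produce an owner ID of 1000, while on a non-idmapped mount the owner ID equals the caller ID.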

Then in the kclient every request will always initialize
'owner_{uid/gid}' with 'caller_{uid/gid}', while userspace libcephfs
only sets them for 'create/mknod/mkdir/symlink'.

I'd prefer to keep the two consistent, which is what I am still focusing
on, so that they are easier to read and compare when troubleshooting.
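
For illustration, the behaviour being asked for could look roughly like the sketch below. The opcode enum, `op_creates_inode()` and `pick_owner_id()` are hypothetical names made up for this example and are not taken from fs/ceph or libcephfs; only the policy (a real owner ID for inode-creating operations, -1 for everything else) comes from this discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes; the real CEPH_MDS_OP_* values are not reproduced. */
enum sketch_op {
	SKETCH_OP_LOOKUP,
	SKETCH_OP_GETATTR,
	SKETCH_OP_CREATE,
	SKETCH_OP_MKNOD,
	SKETCH_OP_MKDIR,
	SKETCH_OP_SYMLINK,
};

/* Value meaning "no owner supplied", i.e. -1 in an unsigned 32-bit field. */
#define SKETCH_OWNER_UNSET ((uint32_t)-1)

/* True only for the operations that create a new inode. */
static bool op_creates_inode(enum sketch_op op)
{
	switch (op) {
	case SKETCH_OP_CREATE:
	case SKETCH_OP_MKNOD:
	case SKETCH_OP_MKDIR:
	case SKETCH_OP_SYMLINK:
		return true;
	default:
		return false;
	}
}

/*
 * Pick the value for owner_uid (or owner_gid): the already-mapped
 * caller ID for inode-creating ops, and -1 for all other requests.
 */
static uint32_t pick_owner_id(enum sketch_op op, uint32_t mapped_fsid)
{
	return op_creates_inode(op) ? mapped_fsid : SKETCH_OWNER_UNSET;
}
```

The point of the -1 default is that a request dump from the kernel client and one from libcephfs would then show the same owner fields for the same operation.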

Thanks

- Xiubo

> Kind regards,
> Alex
>
>> Thanks
>>
>> - Xiubo
>>
>>
>>
>>>    } __attribute__ ((packed));
>>>
>>>    /* cap/lease release record */
Aleksandr Mikhalitsyn Aug. 7, 2023, 10:34 a.m. UTC | #5
On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>> From: Christian Brauner <brauner@kernel.org>
> >>>
> >>> Inode operations that create a new filesystem object such as ->mknod,
> >>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>> filesystem object.
> >>>
> >>> In order to ensure that the correct {g,u}id is used map the caller's
> >>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>> It suffices to pass in the relevant idmapping recorded in the request
> >>> message. If this request message was triggered from an inode operation
> >>> that creates filesystem objects it will have passed down the relevant
> >>> idmapping. If this is a request message that was triggered from an inode
> >>> operation that doesn't need to take idmappings into account the initial
> >>> idmapping is passed down which is an identity mapping.
> >>>
> >>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>> any IO that comes through an idmapped mount because we can't process it
> >>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>> values are unmapped. At the same time we can't map these fields with an
> >>> idmapping as it can break UID/GID-based permission checks logic on the
> >>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>
> >>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>
> >>> Link: https://github.com/ceph/ceph/pull/52575
> >>> Link: https://tracker.ceph.com/issues/62217
> >>> Cc: Xiubo Li <xiubli@redhat.com>
> >>> Cc: Jeff Layton <jlayton@kernel.org>
> >>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>> Cc: ceph-devel@vger.kernel.org
> >>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> ---
> >>> v7:
> >>>        - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>> v8:
> >>>        - properly handled case when old MDS used with new kernel client
> >>> ---
> >>>    fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
> >>>    fs/ceph/mds_client.h         |  5 +++-
> >>>    include/linux/ceph/ceph_fs.h |  5 +++-
> >>>    3 files changed, 52 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>> index 8829f55103da..41e4bf3811c4 100644
> >>> --- a/fs/ceph/mds_client.c
> >>> +++ b/fs/ceph/mds_client.c
> >>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>>        }
> >>>    }
> >>>
> >>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>> +{
> >>> +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>> +             return 1;
> >>> +
> >>> +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>> +             return 2;
> >>> +
> >>> +     return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>> +}
> >>> +
> >>>    static struct ceph_mds_request_head_legacy *
> >>>    find_legacy_request_head(void *p, u64 features)
> >>>    {
> >>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>    {
> >>>        int mds = session->s_mds;
> >>>        struct ceph_mds_client *mdsc = session->s_mdsc;
> >>> +     struct ceph_client *cl = mdsc->fsc->client;
> >>>        struct ceph_msg *msg;
> >>>        struct ceph_mds_request_head_legacy *lhead;
> >>>        const char *path1 = NULL;
> >>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        void *p, *end;
> >>>        int ret;
> >>>        bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>> -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>> +     u16 request_head_version = mds_supported_head_version(session);
> >>>
> >>>        ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>>                              req->r_parent, req->r_path1, req->r_ino1.ino,
> >>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>         */
> >>>        if (legacy)
> >>>                len = sizeof(struct ceph_mds_request_head_legacy);
> >>> -     else if (old_version)
> >>> +     else if (request_head_version == 1)
> >>>                len = sizeof(struct ceph_mds_request_head_old);
> >>> +     else if (request_head_version == 2)
> >>> +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>        else
> >>>                len = sizeof(struct ceph_mds_request_head);
> >>>
> >>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        lhead = find_legacy_request_head(msg->front.iov_base,
> >>>                                         session->s_con.peer_features);
> >>>
> >>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>> +             pr_err_ratelimited_client(cl,
> >>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>> +                     " is not supported by MDS. Fail request with -EIO.\n");
> >>> +
> >>> +             ret = -EIO;
> >>> +             goto out_err;
> >>> +     }
> >>> +
> >>>        /*
> >>>         * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>>         * one was added when we moved the message version from 3->4.
> >>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        if (legacy) {
> >>>                msg->hdr.version = cpu_to_le16(3);
> >>>                p = msg->front.iov_base + sizeof(*lhead);
> >>> -     } else if (old_version) {
> >>> +     } else if (request_head_version == 1) {
> >>>                struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>
> >>>                msg->hdr.version = cpu_to_le16(4);
> >>>                ohead->version = cpu_to_le16(1);
> >>>                p = msg->front.iov_base + sizeof(*ohead);
> >>> +     } else if (request_head_version == 2) {
> >>> +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +
> >>> +             msg->hdr.version = cpu_to_le16(6);
> >>> +             nhead->version = cpu_to_le16(2);
> >>> +
> >>> +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>        } else {
> >>>                struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +             kuid_t owner_fsuid;
> >>> +             kgid_t owner_fsgid;
> >>>
> >>>                msg->hdr.version = cpu_to_le16(6);
> >>>                nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>> +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>> +
> >>> +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>> +                                       VFSUIDT_INIT(req->r_cred->fsuid));
> >>> +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>> +                                       VFSGIDT_INIT(req->r_cred->fsgid));
> >>> +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>> +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>>                p = msg->front.iov_base + sizeof(*nhead);
> >>>        }
> >>>
> >>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>> --- a/fs/ceph/mds_client.h
> >>> +++ b/fs/ceph/mds_client.h
> >>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>>        CEPHFS_FEATURE_OP_GETVXATTR,
> >>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>
> >>> -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>    };
> >>>
> >>>    #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
> >>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
> >>>        CEPHFS_FEATURE_OP_GETVXATTR,            \
> >>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
> >>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
> >>>    }
> >>>
> >>>    /*
> >>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>> index 5f2301ee88bc..b91699b08f26 100644
> >>> --- a/include/linux/ceph/ceph_fs.h
> >>> +++ b/include/linux/ceph/ceph_fs.h
> >>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>>        union ceph_mds_request_args args;
> >>>    } __attribute__ ((packed));
> >>>
> >>> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
> >>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
> >>>
> >>>    struct ceph_mds_request_head_old {
> >>>        __le16 version;                /* struct version */
> >>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>
> >>>        __le32 ext_num_retry;          /* new count retry attempts */
> >>>        __le32 ext_num_fwd;            /* new count fwd attempts */
> >>> +
> >>> +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
> >>> +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
> >> Let's also initialize them to -1 for all the other requests as we do in
> >> your PR.
> > They are always initialized already. As you can see from the code we
> > don't have any extra conditions
> > on filling these fields. We always fill them with an appropriate
> > UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> > if mount idmapped then it's idmapped caller_uid/gid.
>
> Then in the kclient every request will always initialize
> 'owner_{uid/gid}' with 'caller_{uid/gid}', while userspace libcephfs
> only sets them for 'create/mknod/mkdir/symlink'.
>
> I'd prefer to keep the two consistent, which is what I am still focusing
> on, so that they are easier to read and compare when troubleshooting.

Dear Xiubo,

Sure, I will do it.

Could you please review the rest of the patches before I fix
this particular thing?
That will allow me to send v10 with all the required fixes
incorporated.

Kind regards,
Alex

>
> Thanks
>
> - Xiubo
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>
> >>
> >>>    } __attribute__ ((packed));
> >>>
> >>>    /* cap/lease release record */
>
Xiubo Li Aug. 7, 2023, 11:21 a.m. UTC | #6
On 8/7/23 18:34, Aleksandr Mikhalitsyn wrote:
> On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
>>> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
>>>>> From: Christian Brauner <brauner@kernel.org>
>>>>>
>>>>> Inode operations that create a new filesystem object such as ->mknod,
>>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>>>> filesystem object.
>>>>>
>>>>> In order to ensure that the correct {g,u}id is used map the caller's
>>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>>>> It suffices to pass in the relevant idmapping recorded in the request
>>>>> message. If this request message was triggered from an inode operation
>>>>> that creates filesystem objects it will have passed down the relevant
>>>>> idmapping. If this is a request message that was triggered from an inode
>>>>> operation that doesn't need to take idmappings into account the initial
>>>>> idmapping is passed down which is an identity mapping.
>>>>>
>>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>>>> any IO that comes through an idmapped mount because we can't process it
>>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
>>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
>>>>> values are unmapped. At the same time we can't map these fields with an
>>>>> idmapping as it can break UID/GID-based permission checks logic on the
>>>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>>>
>>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>>>
>>>>> Link: https://github.com/ceph/ceph/pull/52575
>>>>> Link: https://tracker.ceph.com/issues/62217
>>>>> Cc: Xiubo Li <xiubli@redhat.com>
>>>>> Cc: Jeff Layton <jlayton@kernel.org>
>>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>> ---
>>>>> v7:
>>>>>         - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>>>> v8:
>>>>>         - properly handled case when old MDS used with new kernel client
>>>>> ---
>>>>>     fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
>>>>>     fs/ceph/mds_client.h         |  5 +++-
>>>>>     include/linux/ceph/ceph_fs.h |  5 +++-
>>>>>     3 files changed, 52 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>> index 8829f55103da..41e4bf3811c4 100644
>>>>> --- a/fs/ceph/mds_client.c
>>>>> +++ b/fs/ceph/mds_client.c
>>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>>>>>         }
>>>>>     }
>>>>>
>>>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
>>>>> +{
>>>>> +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
>>>>> +             return 1;
>>>>> +
>>>>> +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
>>>>> +             return 2;
>>>>> +
>>>>> +     return CEPH_MDS_REQUEST_HEAD_VERSION;
>>>>> +}
>>>>> +
>>>>>     static struct ceph_mds_request_head_legacy *
>>>>>     find_legacy_request_head(void *p, u64 features)
>>>>>     {
>>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>     {
>>>>>         int mds = session->s_mds;
>>>>>         struct ceph_mds_client *mdsc = session->s_mdsc;
>>>>> +     struct ceph_client *cl = mdsc->fsc->client;
>>>>>         struct ceph_msg *msg;
>>>>>         struct ceph_mds_request_head_legacy *lhead;
>>>>>         const char *path1 = NULL;
>>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>         void *p, *end;
>>>>>         int ret;
>>>>>         bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>>>> -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
>>>>> +     u16 request_head_version = mds_supported_head_version(session);
>>>>>
>>>>>         ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>>>>                               req->r_parent, req->r_path1, req->r_ino1.ino,
>>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>          */
>>>>>         if (legacy)
>>>>>                 len = sizeof(struct ceph_mds_request_head_legacy);
>>>>> -     else if (old_version)
>>>>> +     else if (request_head_version == 1)
>>>>>                 len = sizeof(struct ceph_mds_request_head_old);
>>>>> +     else if (request_head_version == 2)
>>>>> +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>>         else
>>>>>                 len = sizeof(struct ceph_mds_request_head);
>>>>>
>>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>         lhead = find_legacy_request_head(msg->front.iov_base,
>>>>>                                          session->s_con.peer_features);
>>>>>
>>>>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>>>> +             pr_err_ratelimited_client(cl,
>>>>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>>> +                     " is not supported by MDS. Fail request with -EIO.\n");
>>>>> +
>>>>> +             ret = -EIO;
>>>>> +             goto out_err;
>>>>> +     }
>>>>> +
>>>>>         /*
>>>>>          * The ceph_mds_request_head_legacy didn't contain a version field, and
>>>>>          * one was added when we moved the message version from 3->4.
>>>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>         if (legacy) {
>>>>>                 msg->hdr.version = cpu_to_le16(3);
>>>>>                 p = msg->front.iov_base + sizeof(*lhead);
>>>>> -     } else if (old_version) {
>>>>> +     } else if (request_head_version == 1) {
>>>>>                 struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>>>
>>>>>                 msg->hdr.version = cpu_to_le16(4);
>>>>>                 ohead->version = cpu_to_le16(1);
>>>>>                 p = msg->front.iov_base + sizeof(*ohead);
>>>>> +     } else if (request_head_version == 2) {
>>>>> +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>>> +
>>>>> +             msg->hdr.version = cpu_to_le16(6);
>>>>> +             nhead->version = cpu_to_le16(2);
>>>>> +
>>>>> +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>>         } else {
>>>>>                 struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>>> +             kuid_t owner_fsuid;
>>>>> +             kgid_t owner_fsgid;
>>>>>
>>>>>                 msg->hdr.version = cpu_to_le16(6);
>>>>>                 nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>>>> +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
>>>>> +
>>>>> +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>>>> +                                       VFSUIDT_INIT(req->r_cred->fsuid));
>>>>> +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>>>> +                                       VFSGIDT_INIT(req->r_cred->fsgid));
>>>>> +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
>>>>> +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>>>>>                 p = msg->front.iov_base + sizeof(*nhead);
>>>>>         }
>>>>>
>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>>>> --- a/fs/ceph/mds_client.h
>>>>> +++ b/fs/ceph/mds_client.h
>>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>>>>         CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>>>         CEPHFS_FEATURE_OP_GETVXATTR,
>>>>>         CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>>> +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>>>
>>>>> -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>>> +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>>>     };
>>>>>
>>>>>     #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
>>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>>>>         CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
>>>>>         CEPHFS_FEATURE_OP_GETVXATTR,            \
>>>>>         CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
>>>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
>>>>>     }
>>>>>
>>>>>     /*
>>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>>>> index 5f2301ee88bc..b91699b08f26 100644
>>>>> --- a/include/linux/ceph/ceph_fs.h
>>>>> +++ b/include/linux/ceph/ceph_fs.h
>>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>>>>         union ceph_mds_request_args args;
>>>>>     } __attribute__ ((packed));
>>>>>
>>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
>>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
>>>>>
>>>>>     struct ceph_mds_request_head_old {
>>>>>         __le16 version;                /* struct version */
>>>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>>>>>
>>>>>         __le32 ext_num_retry;          /* new count retry attempts */
>>>>>         __le32 ext_num_fwd;            /* new count fwd attempts */
>>>>> +
>>>>> +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
>>>>> +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
>>>> Let's also initialize them to -1 for all the other requests as we do in
>>>> your PR.
>>> They are always initialized already. As you can see from the code we
>>> don't have any extra conditions
>>> on filling these fields. We always fill them with an appropriate
>>> UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
>>> if mount idmapped then it's idmapped caller_uid/gid.
>> Then in the kclient every request will always initialize
>> 'owner_{uid/gid}' with 'caller_{uid/gid}', while userspace libcephfs
>> only sets them for 'create/mknod/mkdir/symlink'.
>>
>> I'd prefer to keep the two consistent, which is what I am still focusing
>> on, so that they are easier to read and compare when troubleshooting.
> Dear Xiubo,
>
> Sure, I will do it.
>
> Could you please review the rest of the patches before I fix
> this particular thing?
> That will allow me to send v10 with all the required fixes
> incorporated.

I have gone through them all and they LGTM.

Thanks

- Xiubo


> Kind regards,
> Alex
>
>> Thanks
>>
>> - Xiubo
>>
>>> Kind regards,
>>> Alex
>>>
>>>> Thanks
>>>>
>>>> - Xiubo
>>>>
>>>>
>>>>
>>>>>     } __attribute__ ((packed));
>>>>>
>>>>>     /* cap/lease release record */
Aleksandr Mikhalitsyn Aug. 7, 2023, 11:28 a.m. UTC | #7
On Mon, Aug 7, 2023 at 1:21 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 18:34, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> >>> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>>>> From: Christian Brauner <brauner@kernel.org>
> >>>>>
> >>>>> Inode operations that create a new filesystem object such as ->mknod,
> >>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>>>> filesystem object.
> >>>>>
> >>>>> In order to ensure that the correct {g,u}id is used map the caller's
> >>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>>>> It suffices to pass in the relevant idmapping recorded in the request
> >>>>> message. If this request message was triggered from an inode operation
> >>>>> that creates filesystem objects it will have passed down the relevant
> >>>>> idmapping. If this is a request message that was triggered from an inode
> >>>>> operation that doesn't need to take idmappings into account the initial
> >>>>> idmapping is passed down which is an identity mapping.
> >>>>>
> >>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>>>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>>>> any IO that comes through an idmapped mount because we can't process it
> >>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>>>> values are unmapped. At the same time we can't map these fields with an
> >>>>> idmapping as it can break UID/GID-based permission checks logic on the
> >>>>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>>>
> >>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>>>
> >>>>> Link: https://github.com/ceph/ceph/pull/52575
> >>>>> Link: https://tracker.ceph.com/issues/62217
> >>>>> Cc: Xiubo Li <xiubli@redhat.com>
> >>>>> Cc: Jeff Layton <jlayton@kernel.org>
> >>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>>>> Cc: ceph-devel@vger.kernel.org
> >>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>>>> ---
> >>>>> v7:
> >>>>>         - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>>>> v8:
> >>>>>         - properly handled case when old MDS used with new kernel client
> >>>>> ---
> >>>>>     fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
> >>>>>     fs/ceph/mds_client.h         |  5 +++-
> >>>>>     include/linux/ceph/ceph_fs.h |  5 +++-
> >>>>>     3 files changed, 52 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>>>> index 8829f55103da..41e4bf3811c4 100644
> >>>>> --- a/fs/ceph/mds_client.c
> >>>>> +++ b/fs/ceph/mds_client.c
> >>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>>>> +{
> >>>>> +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>>>> +             return 1;
> >>>>> +
> >>>>> +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>>>> +             return 2;
> >>>>> +
> >>>>> +     return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>>>> +}
> >>>>> +
> >>>>>     static struct ceph_mds_request_head_legacy *
> >>>>>     find_legacy_request_head(void *p, u64 features)
> >>>>>     {
> >>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>>     {
> >>>>>         int mds = session->s_mds;
> >>>>>         struct ceph_mds_client *mdsc = session->s_mdsc;
> >>>>> +     struct ceph_client *cl = mdsc->fsc->client;
> >>>>>         struct ceph_msg *msg;
> >>>>>         struct ceph_mds_request_head_legacy *lhead;
> >>>>>         const char *path1 = NULL;
> >>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>>         void *p, *end;
> >>>>>         int ret;
> >>>>>         bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>>>> -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>>>> +     u16 request_head_version = mds_supported_head_version(session);
> >>>>>
> >>>>>         ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>>>>                               req->r_parent, req->r_path1, req->r_ino1.ino,
> >>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>>          */
> >>>>>         if (legacy)
> >>>>>                 len = sizeof(struct ceph_mds_request_head_legacy);
> >>>>> -     else if (old_version)
> >>>>> +     else if (request_head_version == 1)
> >>>>>                 len = sizeof(struct ceph_mds_request_head_old);
> >>>>> +     else if (request_head_version == 2)
> >>>>> +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>>>         else
> >>>>>                 len = sizeof(struct ceph_mds_request_head);
> >>>>>
> >>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>>         lhead = find_legacy_request_head(msg->front.iov_base,
> >>>>>                                          session->s_con.peer_features);
> >>>>>
> >>>>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>>>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>>>> +             pr_err_ratelimited_client(cl,
> >>>>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>>>> +                     " is not supported by MDS. Fail request with -EIO.\n");
> >>>>> +
> >>>>> +             ret = -EIO;
> >>>>> +             goto out_err;
> >>>>> +     }
> >>>>> +
> >>>>>         /*
> >>>>>          * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>>>>          * one was added when we moved the message version from 3->4.
> >>>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>>         if (legacy) {
> >>>>>                 msg->hdr.version = cpu_to_le16(3);
> >>>>>                 p = msg->front.iov_base + sizeof(*lhead);
> >>>>> -     } else if (old_version) {
> >>>>> +     } else if (request_head_version == 1) {
> >>>>>                 struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>>>
> >>>>>                 msg->hdr.version = cpu_to_le16(4);
> >>>>>                 ohead->version = cpu_to_le16(1);
> >>>>>                 p = msg->front.iov_base + sizeof(*ohead);
> >>>>> +     } else if (request_head_version == 2) {
> >>>>> +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>>>> +
> >>>>> +             msg->hdr.version = cpu_to_le16(6);
> >>>>> +             nhead->version = cpu_to_le16(2);
> >>>>> +
> >>>>> +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>>>         } else {
> >>>>>                 struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>>>> +             kuid_t owner_fsuid;
> >>>>> +             kgid_t owner_fsgid;
> >>>>>
> >>>>>                 msg->hdr.version = cpu_to_le16(6);
> >>>>>                 nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>>>> +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>>>> +
> >>>>> +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>>>> +                                       VFSUIDT_INIT(req->r_cred->fsuid));
> >>>>> +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>>>> +                                       VFSGIDT_INIT(req->r_cred->fsgid));
> >>>>> +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>>>> +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>>>>                 p = msg->front.iov_base + sizeof(*nhead);
> >>>>>         }
> >>>>>
> >>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>>>> --- a/fs/ceph/mds_client.h
> >>>>> +++ b/fs/ceph/mds_client.h
> >>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>>>>         CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>>>>         CEPHFS_FEATURE_OP_GETVXATTR,
> >>>>>         CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>>>> +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>>>
> >>>>> -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>>>> +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>>>     };
> >>>>>
> >>>>>     #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
> >>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>>>>         CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
> >>>>>         CEPHFS_FEATURE_OP_GETVXATTR,            \
> >>>>>         CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
> >>>>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
> >>>>>     }
> >>>>>
> >>>>>     /*
> >>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>>>> index 5f2301ee88bc..b91699b08f26 100644
> >>>>> --- a/include/linux/ceph/ceph_fs.h
> >>>>> +++ b/include/linux/ceph/ceph_fs.h
> >>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>>>>         union ceph_mds_request_args args;
> >>>>>     } __attribute__ ((packed));
> >>>>>
> >>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
> >>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
> >>>>>
> >>>>>     struct ceph_mds_request_head_old {
> >>>>>         __le16 version;                /* struct version */
> >>>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>>>
> >>>>>         __le32 ext_num_retry;          /* new count retry attempts */
> >>>>>         __le32 ext_num_fwd;            /* new count fwd attempts */
> >>>>> +
> >>>>> +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
> >>>>> +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
> >>>> Let's also initialize them to -1 for all the other requests as we do in
> >>>> your PR.
> >>> They are always initialized already. As you can see from the code, we
> >>> don't have any extra conditions on filling these fields. We always
> >>> fill them with an appropriate UID/GID. If the mount is not idmapped
> >>> then it's just == caller_uid/gid; if the mount is idmapped then it's
> >>> the idmapped caller_uid/gid.
> >> Then in kclient all requests will always initialize
> >> 'owner_{uid/gid}' with 'caller_{uid/gid}', while userspace libcephfs
> >> only sets them for 'create/mknod/mkdir/symlink'.
> >>
> >> I'd prefer to make them consistent, which is what I am still focusing
> >> on, so that they are easier to read and compare when troubleshooting.
> > Dear Xiubo,
> >
> > Sure, I will do it.
> >
> > Could you please review the rest of the patches before I fix
> > this particular thing?
> > That will allow me to send v10 with all required fixes
> > incorporated.
>
> I have gone through them all and they LGTM.

Thanks!

Kind regards,
Alex

>
> Thanks
>
> - Xiubo
>
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>> Kind regards,
> >>> Alex
> >>>
> >>>> Thanks
> >>>>
> >>>> - Xiubo
> >>>>
> >>>>
> >>>>
> >>>>>     } __attribute__ ((packed));
> >>>>>
> >>>>>     /* cap/lease release record */
>
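The head-version selection in the quoted patch can be illustrated in isolation. What follows is only a minimal userspace sketch, not the kernel code: `struct demo_request_head` is a hypothetical, heavily reduced stand-in for `struct ceph_mds_request_head`, the two boolean parameters stand in for the `test_bit()` feature checks, and all `demo_*` names are assumptions made for illustration. It shows how a version 2 head is cut off right after `ext_num_fwd`, while a version 3 head also carries `struct_len` and `owner_{u,g}id`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct ceph_mds_request_head. */
struct demo_request_head {
	uint16_t version;        /* struct version */
	uint32_t ext_num_retry;  /* present since head version 2 */
	uint32_t ext_num_fwd;    /* present since head version 2 */
	uint32_t struct_len;     /* added with head version 3 */
	uint32_t owner_uid;      /* added with head version 3 */
	uint32_t owner_gid;      /* added with head version 3 */
} __attribute__((packed));

#define DEMO_HEAD_VERSION 3

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Mirrors the shape of mds_supported_head_version() from the patch. */
static uint16_t demo_head_version(bool has_retry_fwd, bool has_owner_uidgid)
{
	if (!has_retry_fwd)
		return 1;
	if (!has_owner_uidgid)
		return 2;
	return DEMO_HEAD_VERSION;
}

/*
 * Encoded head length for versions 2 and 3. Version 1 uses a separate
 * "old" struct in the real code and is deliberately left out here.
 */
static size_t demo_head_len(uint16_t version)
{
	assert(version >= 2 && version <= DEMO_HEAD_VERSION);
	if (version == 2)
		return demo_offsetofend(struct demo_request_head, ext_num_fwd);
	return sizeof(struct demo_request_head);
}
```

With this reduced layout, a version 2 head is 10 bytes and a version 3 head is 22 bytes; the real structure has many more fields, so only the difference of three `__le32` fields carries over.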
Aleksandr Mikhalitsyn Aug. 7, 2023, 11:45 a.m. UTC | #8
On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>> From: Christian Brauner <brauner@kernel.org>
> >>>
> >>> Inode operations that create a new filesystem object such as ->mknod,
> >>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>> filesystem object.
> >>>
> >>> In order to ensure that the correct {g,u}id is used map the caller's
> >>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>> It suffices to pass in the relevant idmapping recorded in the request
> >>> message. If this request message was triggered from an inode operation
> >>> that creates filesystem objects it will have passed down the relevant
> >>> idmapping. If this is a request message that was triggered from an inode
> >>> operation that doesn't need to take idmappings into account the initial
> >>> idmapping is passed down which is an identity mapping.
> >>>
> >>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>> any IO that comes through an idmapped mount because we can't process it
> >>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>> values are unmapped. At the same time we can't map these fields with an
> >>> idmapping as it can break UID/GID-based permission checks logic on the
> >>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>
> >>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>
> >>> Link: https://github.com/ceph/ceph/pull/52575
> >>> Link: https://tracker.ceph.com/issues/62217
> >>> Cc: Xiubo Li <xiubli@redhat.com>
> >>> Cc: Jeff Layton <jlayton@kernel.org>
> >>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>> Cc: ceph-devel@vger.kernel.org
> >>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> ---
> >>> v7:
> >>>        - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>> v8:
> >>>        - properly handled case when old MDS used with new kernel client
> >>> ---
> >>>    fs/ceph/mds_client.c         | 47 +++++++++++++++++++++++++++++++++---
> >>>    fs/ceph/mds_client.h         |  5 +++-
> >>>    include/linux/ceph/ceph_fs.h |  5 +++-
> >>>    3 files changed, 52 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>> index 8829f55103da..41e4bf3811c4 100644
> >>> --- a/fs/ceph/mds_client.c
> >>> +++ b/fs/ceph/mds_client.c
> >>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>>        }
> >>>    }
> >>>
> >>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>> +{
> >>> +     if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>> +             return 1;
> >>> +
> >>> +     if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>> +             return 2;
> >>> +
> >>> +     return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>> +}
> >>> +
> >>>    static struct ceph_mds_request_head_legacy *
> >>>    find_legacy_request_head(void *p, u64 features)
> >>>    {
> >>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>    {
> >>>        int mds = session->s_mds;
> >>>        struct ceph_mds_client *mdsc = session->s_mdsc;
> >>> +     struct ceph_client *cl = mdsc->fsc->client;
> >>>        struct ceph_msg *msg;
> >>>        struct ceph_mds_request_head_legacy *lhead;
> >>>        const char *path1 = NULL;
> >>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        void *p, *end;
> >>>        int ret;
> >>>        bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>> -     bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>> +     u16 request_head_version = mds_supported_head_version(session);
> >>>
> >>>        ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>>                              req->r_parent, req->r_path1, req->r_ino1.ino,
> >>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>         */
> >>>        if (legacy)
> >>>                len = sizeof(struct ceph_mds_request_head_legacy);
> >>> -     else if (old_version)
> >>> +     else if (request_head_version == 1)
> >>>                len = sizeof(struct ceph_mds_request_head_old);
> >>> +     else if (request_head_version == 2)
> >>> +             len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>        else
> >>>                len = sizeof(struct ceph_mds_request_head);
> >>>
> >>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        lhead = find_legacy_request_head(msg->front.iov_base,
> >>>                                         session->s_con.peer_features);
> >>>
> >>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>> +             pr_err_ratelimited_client(cl,
> >>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>> +                     " is not supported by MDS. Fail request with -EIO.\n");
> >>> +
> >>> +             ret = -EIO;
> >>> +             goto out_err;
> >>> +     }
> >>> +
> >>>        /*
> >>>         * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>>         * one was added when we moved the message version from 3->4.
> >>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>        if (legacy) {
> >>>                msg->hdr.version = cpu_to_le16(3);
> >>>                p = msg->front.iov_base + sizeof(*lhead);
> >>> -     } else if (old_version) {
> >>> +     } else if (request_head_version == 1) {
> >>>                struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>
> >>>                msg->hdr.version = cpu_to_le16(4);
> >>>                ohead->version = cpu_to_le16(1);
> >>>                p = msg->front.iov_base + sizeof(*ohead);
> >>> +     } else if (request_head_version == 2) {
> >>> +             struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +
> >>> +             msg->hdr.version = cpu_to_le16(6);
> >>> +             nhead->version = cpu_to_le16(2);
> >>> +
> >>> +             p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>        } else {
> >>>                struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +             kuid_t owner_fsuid;
> >>> +             kgid_t owner_fsgid;
> >>>
> >>>                msg->hdr.version = cpu_to_le16(6);
> >>>                nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>> +             nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>> +
> >>> +             owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>> +                                       VFSUIDT_INIT(req->r_cred->fsuid));
> >>> +             owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>> +                                       VFSGIDT_INIT(req->r_cred->fsgid));
> >>> +             nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>> +             nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>>                p = msg->front.iov_base + sizeof(*nhead);
> >>>        }
> >>>
> >>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>> --- a/fs/ceph/mds_client.h
> >>> +++ b/fs/ceph/mds_client.h
> >>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>>        CEPHFS_FEATURE_OP_GETVXATTR,
> >>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> +     CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>
> >>> -     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> +     CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>    };
> >>>
> >>>    #define CEPHFS_FEATURES_CLIENT_SUPPORTED {  \
> >>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>>        CEPHFS_FEATURE_NOTIFY_SESSION_STATE,    \
> >>>        CEPHFS_FEATURE_OP_GETVXATTR,            \
> >>>        CEPHFS_FEATURE_32BITS_RETRY_FWD,        \
> >>> +     CEPHFS_FEATURE_HAS_OWNER_UIDGID,        \
> >>>    }
> >>>
> >>>    /*
> >>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>> index 5f2301ee88bc..b91699b08f26 100644
> >>> --- a/include/linux/ceph/ceph_fs.h
> >>> +++ b/include/linux/ceph/ceph_fs.h
> >>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>>        union ceph_mds_request_args args;
> >>>    } __attribute__ ((packed));
> >>>
> >>> -#define CEPH_MDS_REQUEST_HEAD_VERSION  2
> >>> +#define CEPH_MDS_REQUEST_HEAD_VERSION  3
> >>>
> >>>    struct ceph_mds_request_head_old {
> >>>        __le16 version;                /* struct version */
> >>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>
> >>>        __le32 ext_num_retry;          /* new count retry attempts */
> >>>        __le32 ext_num_fwd;            /* new count fwd attempts */
> >>> +
> >>> +     __le32 struct_len;             /* to store size of struct ceph_mds_request_head */
> >>> +     __le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
> >> Let's also initialize them to -1 for all the other requests as we do in
> >> your PR.
> > They are always initialized already. As you can see from the code, we
> > don't have any extra conditions on filling these fields. We always
> > fill them with an appropriate UID/GID. If the mount is not idmapped
> > then it's just == caller_uid/gid; if the mount is idmapped then it's
> > the idmapped caller_uid/gid.
>
> Then in kclient all requests will always initialize
> 'owner_{uid/gid}' with 'caller_{uid/gid}', while userspace libcephfs
> only sets them for 'create/mknod/mkdir/symlink'.
>
> I'd prefer to make them consistent, which is what I am still focusing
> on, so that they are easier to read and compare when troubleshooting.

Fixed:
https://github.com/mihalicyn/linux/commit/5a5b590ca5aa9ec81d68ff60d77ea54fc86bf33a

I have also added appropriate checks in mkdir/atomic_open:
https://github.com/mihalicyn/linux/commit/bc1fa68f7143a58af8c181bbfab64edf0397dca5
https://github.com/mihalicyn/linux/commit/30e21387063710a10cdca15a5d840fcf8e1e6ccd

I will send v10 soon.

Kind regards,
Alex

>
> Thanks
>
> - Xiubo
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>
> >>
> >>>    } __attribute__ ((packed));
> >>>
> >>>    /* cap/lease release record */
>
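The behaviour Xiubo asks for above (set owner_{u,g}id only for requests that create inodes, and use -1 otherwise, as userspace libcephfs does) might look roughly like the sketch below. This is not the code from the linked commits: the `demo_*` names, the op enum, and the helper are hypothetical and exist only to show the idea; the mapped IDs are assumed to have been computed by the caller already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of request ops, for illustration only. */
enum demo_op {
	DEMO_OP_LOOKUP,
	DEMO_OP_GETATTR,
	DEMO_OP_CREATE,
	DEMO_OP_MKNOD,
	DEMO_OP_MKDIR,
	DEMO_OP_SYMLINK,
};

struct demo_owner {
	uint32_t uid;
	uint32_t gid;
};

/* True for the ops the thread lists as creating a new inode. */
static bool demo_op_creates_inode(enum demo_op op)
{
	switch (op) {
	case DEMO_OP_CREATE:
	case DEMO_OP_MKNOD:
	case DEMO_OP_MKDIR:
	case DEMO_OP_SYMLINK:
		return true;
	default:
		return false;
	}
}

/*
 * Use the (already idmapped) caller IDs only for inode-creating ops;
 * every other request carries -1 in both owner fields.
 */
static struct demo_owner demo_fill_owner(enum demo_op op,
					 uint32_t mapped_uid,
					 uint32_t mapped_gid)
{
	struct demo_owner owner = { (uint32_t)-1, (uint32_t)-1 };

	if (demo_op_creates_inode(op)) {
		owner.uid = mapped_uid;
		owner.gid = mapped_gid;
	}
	return owner;
}
```

Keeping the default at -1 makes a kernel-client request look the same on the wire as a libcephfs one for non-creating ops, which is the consistency argument made in the review.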