From patchwork Fri Sep 30 11:18:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 611004
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8005AC43219 for ; Fri, 30 Sep 2022 11:27:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232433AbiI3L06 (ORCPT ); Fri, 30 Sep 2022 07:26:58 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232307AbiI3L02 (ORCPT ); Fri, 30 Sep 2022 07:26:28 -0400
Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87D3929378; Fri, 30 Sep 2022 04:18:49 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id C1731B8280C; Fri, 30 Sep 2022 11:18:47 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8AA6AC43470; Fri, 30 Sep 2022 11:18:44 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664536726; bh=S/Dx1SlVR4nhbzfbebAjqPJrTxRJSD7/7YKwODgznpg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qqhBm8i+JoI2zMoaavTiAn1oZA2BRa2zQ73rhoHUGNk2X16KSZTAPGBSp45NIvjov sPmWO0td/2BEU9CvBurIUiLAmE0gfKdkstk5LGPWvCSUsvyTz7qMr1aCwoQcJU6K6J bXSLl771ipnsemKit3USDA8GfYnigLhyirbjYZ3yMXTWFfViPZiNUu+aMtwVfhDrMC elCka/qE+claAKyQDuyP4+xRHOCgTxrlM+gTI4bdzoVuKEBkUtpUvSq7H2N2lpRm0s 7V0DUa82mgFAwJt5InCu2EVTFKfpd+/o+dcLP5dieoaNLOyu3NZDhlR08hlA2JsR7h YM9w3zRGytFhQ==
From: Jeff Layton
To: tytso@mit.edu,
 adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v6 1/9] iversion: move inode_query_iversion to libfs.c
Date: Fri, 30 Sep 2022 07:18:32 -0400
Message-Id: <20220930111840.10695-2-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

There's no need to have such a large function forcibly inlined.

Signed-off-by: Jeff Layton
---
 fs/libfs.c               | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/iversion.h | 38 ++------------------------------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 682d56345a1c..5ae81466a422 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1566,3 +1566,39 @@ bool inode_maybe_inc_iversion(struct inode *inode, bool force)
 	return true;
 }
 EXPORT_SYMBOL(inode_maybe_inc_iversion);
+
+/**
+ * inode_query_iversion - read i_version for later use
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter. This should be used by callers that wish
+ * to store the returned i_version for later comparison. This will guarantee
+ * that a later query of the i_version will result in a different value if
+ * anything has changed.
+ *
+ * In this implementation, we fetch the current value, set the QUERIED flag and
+ * then try to swap it into place with a cmpxchg, if it wasn't already set.
+ * If that fails, we try again with the newly fetched value from the cmpxchg.
+ */
+u64 inode_query_iversion(struct inode *inode)
+{
+	u64 cur, new;
+
+	cur = inode_peek_iversion_raw(inode);
+	do {
+		/* If flag is already set, then no need to swap */
+		if (cur & I_VERSION_QUERIED) {
+			/*
+			 * This barrier (and the implicit barrier in the
+			 * cmpxchg below) pairs with the barrier in
+			 * inode_maybe_inc_iversion().
+			 */
+			smp_mb();
+			break;
+		}
+
+		new = cur | I_VERSION_QUERIED;
+	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
+	return cur >> I_VERSION_QUERIED_SHIFT;
+}
+EXPORT_SYMBOL(inode_query_iversion);
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index e27bd4f55d84..6755d8b4f20b 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -234,42 +234,6 @@ inode_peek_iversion(const struct inode *inode)
 	return inode_peek_iversion_raw(inode) >> I_VERSION_QUERIED_SHIFT;
 }
 
-/**
- * inode_query_iversion - read i_version for later use
- * @inode: inode from which i_version should be read
- *
- * Read the inode i_version counter. This should be used by callers that wish
- * to store the returned i_version for later comparison. This will guarantee
- * that a later query of the i_version will result in a different value if
- * anything has changed.
- *
- * In this implementation, we fetch the current value, set the QUERIED flag and
- * then try to swap it into place with a cmpxchg, if it wasn't already set. If
- * that fails, we try again with the newly fetched value from the cmpxchg.
- */
-static inline u64
-inode_query_iversion(struct inode *inode)
-{
-	u64 cur, new;
-
-	cur = inode_peek_iversion_raw(inode);
-	do {
-		/* If flag is already set, then no need to swap */
-		if (cur & I_VERSION_QUERIED) {
-			/*
-			 * This barrier (and the implicit barrier in the
-			 * cmpxchg below) pairs with the barrier in
-			 * inode_maybe_inc_iversion().
-			 */
-			smp_mb();
-			break;
-		}
-
-		new = cur | I_VERSION_QUERIED;
-	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
-	return cur >> I_VERSION_QUERIED_SHIFT;
-}
-
 /*
  * For filesystems without any sort of change attribute, the best we can
  * do is fake one up from the ctime:
@@ -283,6 +247,8 @@ static inline u64 time_to_chattr(struct timespec64 *t)
 	return chattr;
 }
 
+u64 inode_query_iversion(struct inode *inode);
+
 /**
  * inode_eq_iversion_raw - check whether the raw i_version counter has changed
  * @inode: inode to check

From patchwork Fri Sep 30 11:18:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 611993
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A14FC4321E for ; Fri, 30 Sep 2022 11:27:02 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231747AbiI3L1A (ORCPT ); Fri, 30 Sep 2022 07:27:00 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232155AbiI3L02 (ORCPT ); Fri, 30 Sep 2022 07:26:28 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDE7A17586; Fri, 30 Sep 2022 04:18:49 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5875A62299; Fri, 30 Sep 2022 11:18:49 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id B809DC4347C; Fri, 30 Sep 2022 11:18:46 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202;
 t=1664536728; bh=G3/t3BDJDz9EMHp0QOUkv/5jjrPwNlRd22Dc40GCbVw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=C25sKiNghUeIo8SwmYaYrlE+zqQ8SiqFO1YRyhXnVgyCIbnfvq9wmQYyiIqF4zFsv rSZh+fnIP5m63rqFcDKETiVW/X9XMxJme7LIGk8zgSkJCNKnKqPFNjX2YVRgqO0Xat pWCw1FipBPRWr9MCej2o/JrTvT7fANuSyH5vsOxqloZ99kutXIsgZTaKLP4MJneiPb 90+iOUXbBP+d+xXmnjJF3BjfZ0AmRT95opCBPYg28qO6uaZrehTC9U1CH6OnxdBMkq xg7JPVll9IPj7BrNU4AUfgIJ9QGSRvH+Tw/ZwggNIU1uq90lmtvvXkR1AUTBLtyBgB fhIcj1YFTc/aA==
From: Jeff Layton
To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org, Colin Walters
Subject: [PATCH v6 2/9] iversion: clarify when the i_version counter must be updated
Date: Fri, 30 Sep 2022 07:18:33 -0400
Message-Id: <20220930111840.10695-3-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

The i_version field in the kernel has had different semantics over
the decades, but NFSv4 has certain expectations. Update the comments
in iversion.h to describe when the i_version must change.

Cc: Colin Walters
Cc: NeilBrown
Cc: Trond Myklebust
Cc: Dave Chinner
Link: https://lore.kernel.org/linux-xfs/166086932784.5425.17134712694961326033@noble.neil.brown.name/#t
Signed-off-by: Jeff Layton
---
 include/linux/iversion.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 6755d8b4f20b..9925cac1fa94 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -9,8 +9,14 @@
  * ---------------------------
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
- * appear different to observers if there was a change to the inode's data or
- * metadata since it was last queried.
+ * appear larger to observers if there was an explicit change to the inode's
+ * data or metadata since it was last queried.
+ *
+ * An explicit change is one that would ordinarily result in a change to the
+ * inode status change time (aka ctime). i_version must appear to change, even
+ * if the ctime does not (since the whole point is to avoid missing updates due
+ * to timestamp granularity). If POSIX mandates that the ctime must change due
+ * to an operation, then the i_version counter must be incremented as well.
  *
  * Observers see the i_version as a 64-bit number that never decreases.
  * If it remains the same since it was last checked, then nothing has changed in the

From patchwork Fri Sep 30 11:18:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 611000
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D53EC4332F for ; Fri, 30 Sep 2022 11:27:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231459AbiI3L1Z (ORCPT ); Fri, 30 Sep 2022 07:27:25 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232589AbiI3L03 (ORCPT ); Fri, 30 Sep 2022 07:26:29 -0400
Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8174238467; Fri, 30 Sep 2022 04:18:53 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 39EEDB827BA; Fri, 30 Sep 2022 11:18:52 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 04497C43470; Fri, 30 Sep 2022 11:18:48 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664536731; bh=OiWqH9H96RIYc4sQV4NyZY0vkQDrR45eK/XVH8S1FtM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EdbjRPFWkSzB6MCZ18crcDxdhPQchA/8rL+mJpt+iG2f3ETidE58kob/wyZuk8Sh+ 30vZpPbChd95ytHLRm1uTjN1VA1H8u7Q7Bz7yvUX6pElumMlxhUhQynFthmKyTolsP nk5LpQbwiZlKnDNENa5+PGoxcm3rMS92gyLPnPTV5qSH2A5Yt+QkphTB9wiGj0bC9N PkMyqoCQ2/EBPVZLZqXPWF98f/lQEYHqi01ba2ICkR2MHMH7hCEE2EXrDeIiBYAYR9
 Ut80RrJ1iPXj1GVQOMtzlBpHwZH4AaKqu8L3n8TeQ7wwcdrqzzC1ufwbaWZYcLAcJX sg4uwlHjitrHg==
From: Jeff Layton
To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org, Jeff Layton
Subject: [PATCH v6 3/9] vfs: plumb i_version handling into struct kstat
Date: Fri, 30 Sep 2022 07:18:34 -0400
Message-Id: <20220930111840.10695-4-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

From: Jeff Layton

The NFS server has a lot of special handling for different types of
change attribute access, depending on what sort of inode we have. In
most cases, it's doing a getattr anyway and then fetching that value
after the fact.

Rather than do that, add a new STATX_VERSION flag that is a kernel-only
symbol (for now). If requested and getattr can implement it, it can fill
out this field. For IS_I_VERSION inodes, add a generic implementation in
vfs_getattr_nosec. Take care to mask STATX_VERSION off in requests from
userland and in the result mask.

Eventually if we decide to make this available to userland, we can just
designate a field for it in struct statx, and move the STATX_VERSION
definition to the uapi header.

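As a rough illustration of the masking described above, here is a minimal
userspace sketch (not kernel code). The demo_* functions, struct demo_kstat
and the DEMO_* constants are hypothetical names made up for this example;
only the 0x40000000U bit value is taken from this patch. It shows the bit
being dropped from a userland request, honoured for an in-kernel caller, and
hidden again in the mask handed back to userland.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions touched by this patch. */
#define DEMO_STATX_BASIC_STATS	0x000007ffU
#define DEMO_STATX_VERSION	0x40000000U	/* kernel-only request/result bit */

struct demo_kstat {
	uint32_t result_mask;
	uint64_t version;
};

/* Drop the kernel-only bit from a mask supplied by userland (cf. do_statx). */
static uint32_t demo_sanitize_request(uint32_t mask)
{
	return mask & ~DEMO_STATX_VERSION;
}

/*
 * Fill in the version only when it was requested and the filesystem keeps
 * an i_version counter (cf. the generic case in vfs_getattr_nosec).
 */
static void demo_getattr(struct demo_kstat *st, uint32_t request_mask,
			 int fs_has_iversion, uint64_t counter)
{
	assert(st != NULL);
	st->result_mask = DEMO_STATX_BASIC_STATS;
	st->version = 0;
	if ((request_mask & DEMO_STATX_VERSION) && fs_has_iversion) {
		st->result_mask |= DEMO_STATX_VERSION;
		st->version = counter;
	}
}

/* Mask reported back to userland: the kernel-only bit is hidden (cf. cp_statx). */
static uint32_t demo_user_mask(const struct demo_kstat *st)
{
	return st->result_mask & ~DEMO_STATX_VERSION;
}
```

An in-kernel caller such as nfsd would pass the bit straight to the getattr
path and check result_mask afterwards; a userland request goes through the
sanitize step first, so it never sees the bit set.
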
Signed-off-by: Jeff Layton
---
 fs/stat.c            | 17 +++++++++++++++--
 include/linux/stat.h |  9 +++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index a7930d744483..e7f8cd4b24e1 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -118,6 +119,11 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
 				  STATX_ATTR_DAX);
 
+	if ((request_mask & STATX_VERSION) && IS_I_VERSION(inode)) {
+		stat->result_mask |= STATX_VERSION;
+		stat->version = inode_query_iversion(inode);
+	}
+
 	mnt_userns = mnt_user_ns(path->mnt);
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(mnt_userns, path, stat,
@@ -587,9 +593,11 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 
 	memset(&tmp, 0, sizeof(tmp));
 
-	tmp.stx_mask = stat->result_mask;
+	/* STATX_VERSION is kernel-only for now */
+	tmp.stx_mask = stat->result_mask & ~STATX_VERSION;
 	tmp.stx_blksize = stat->blksize;
-	tmp.stx_attributes = stat->attributes;
+	/* STATX_ATTR_VERSION_MONOTONIC is kernel-only for now */
+	tmp.stx_attributes = stat->attributes & ~STATX_ATTR_VERSION_MONOTONIC;
 	tmp.stx_nlink = stat->nlink;
 	tmp.stx_uid = from_kuid_munged(current_user_ns(), stat->uid);
 	tmp.stx_gid = from_kgid_munged(current_user_ns(), stat->gid);
@@ -628,6 +636,11 @@ int do_statx(int dfd, struct filename *filename, unsigned int flags,
 	if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
 		return -EINVAL;
 
+	/* STATX_VERSION is kernel-only for now. Ignore requests
+	 * from userland.
+	 */
+	mask &= ~STATX_VERSION;
+
 	error = vfs_statx(dfd, filename, flags, &stat, mask);
 	if (error)
 		return error;
diff --git a/include/linux/stat.h b/include/linux/stat.h
index ff277ced50e9..4e9428d86a3a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -52,6 +52,15 @@ struct kstat {
 	u64		mnt_id;
 	u32		dio_mem_align;
 	u32		dio_offset_align;
+	u64		version;
 };
 
+/* These definitions are internal to the kernel for now. Mainly used by nfsd. */
+
+/* mask values */
+#define STATX_VERSION		0x40000000U	/* Want/got stx_change_attr */
+
+/* file attribute values */
+#define STATX_ATTR_VERSION_MONOTONIC	0x8000000000000000ULL /* version monotonically increases */
+
 #endif

From patchwork Fri Sep 30 11:18:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 611002
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C1C6C4332F for ; Fri, 30 Sep 2022 11:27:14 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230110AbiI3L1M (ORCPT ); Fri, 30 Sep 2022 07:27:12 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232575AbiI3L03 (ORCPT ); Fri, 30 Sep 2022 07:26:29 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30A063ECC4; Fri, 30 Sep 2022 04:18:54 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id BD7C662299; Fri, 30 Sep 2022 11:18:53 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA
 id 45083C4347C; Fri, 30 Sep 2022 11:18:51 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664536733; bh=RJmSmkbr0DPIG61xWI+MMXJapiGj8COfeLHhDCBUg/g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qSgdTK0OkVqwuPneLv8OETMMx+MgJGaIW8o8j/RrOd+1X4XklGgsgv+VO7ZCVpFzS m39M+7b3tW7h2s/S1s1HQslRMr/yZbSzTHLj4bItFJi16KX5RHuh4F7+MBpFNh3Om9 Be0JAf/us9KqWqmXjTtzI94l1o8fO+7YhUw9cHpm5/b1MdORIGcnuSWKIQKjmCwJSK CfkAOG9TrJ69UPkDSqjsb/UgFQTgW8XIEkD8Y5B5O8mbZbvchl7Z1eEbhsuzbYaxxD J2buF3KHDJbsuYGrh4rsD/11byM8hFPgLA5oWkhWWdeQV5sv0F6yf2k+TcqUPIQUJ+ g40PWwUX4LrpQ==
From: Jeff Layton
To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v6 4/9] nfs: report the inode version in getattr if requested
Date: Fri, 30 Sep 2022 07:18:35 -0400
Message-Id: <20220930111840.10695-5-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

Allow NFS to report the i_version in getattr requests. Since the cost to
fetch it is relatively cheap, do it unconditionally and just set the
flag if it looks like it's valid. Also, conditionally enable the
MONOTONIC flag when the server reports its change attr type as such.

Signed-off-by: Jeff Layton
Reviewed-by: NeilBrown
---
 fs/nfs/inode.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bea7c005119c..5cb7017e5089 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -830,6 +830,8 @@ static u32 nfs_get_valid_attrmask(struct inode *inode)
 		reply_mask |= STATX_UID | STATX_GID;
 	if (!(cache_validity & NFS_INO_INVALID_BLOCKS))
 		reply_mask |= STATX_BLOCKS;
+	if (!(cache_validity & NFS_INO_INVALID_CHANGE))
+		reply_mask |= STATX_VERSION;
 	return reply_mask;
 }
 
@@ -848,7 +850,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	request_mask &= STATX_TYPE | STATX_MODE | STATX_NLINK | STATX_UID |
 			STATX_GID | STATX_ATIME | STATX_MTIME | STATX_CTIME |
-			STATX_INO | STATX_SIZE | STATX_BLOCKS;
+			STATX_INO | STATX_SIZE | STATX_BLOCKS | STATX_VERSION;
 
 	if ((query_flags & AT_STATX_DONT_SYNC) && !force_sync) {
 		if (readdirplus_enabled)
@@ -877,7 +879,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 	/* Is the user requesting attributes that might need revalidation?
*/
 	if (!(request_mask & (STATX_MODE|STATX_NLINK|STATX_ATIME|STATX_CTIME|
 					STATX_MTIME|STATX_UID|STATX_GID|
-					STATX_SIZE|STATX_BLOCKS)))
+					STATX_SIZE|STATX_BLOCKS|STATX_VERSION)))
 		goto out_no_revalidate;
 
 	/* Check whether the cached attributes are stale */
@@ -915,6 +917,10 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+	stat->version = inode_peek_iversion_raw(inode);
+	stat->attributes_mask |= STATX_ATTR_VERSION_MONOTONIC;
+	if (server->change_attr_type != NFS4_CHANGE_TYPE_IS_UNDEFINED)
+		stat->attributes |= STATX_ATTR_VERSION_MONOTONIC;
 	if (S_ISDIR(inode->i_mode))
 		stat->blksize = NFS_SERVER(inode)->dtsize;
 out:

From patchwork Fri Sep 30 11:18:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 611989
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEAB1C4332F for ; Fri, 30 Sep 2022 11:27:35 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231284AbiI3L1d (ORCPT ); Fri, 30 Sep 2022 07:27:33 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232598AbiI3L03 (ORCPT ); Fri, 30 Sep 2022 07:26:29 -0400
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0743F18B09; Fri, 30 Sep 2022 04:18:57 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 9DC85B827BA; Fri, 30 Sep
 2022 11:18:56 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7186AC433B5; Fri, 30 Sep 2022 11:18:53 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664536735; bh=xnrewwvldKIjoeh5ebiZu+onJb8qn6gj1nIuV6NIss8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NcCbX8EemXumNyb6KT05tfi/yNz/uFtXUTZvXGxbnSuQ2ZLVjG52bwXYXh5fpucLL Axp4jUtiz64IRjTVjSGPMpgOGwmu8wvMssoRLFOIpJI522UPlfZVlcKmzveia0MsA0 QXEv6KjqFhnOQUm7yvMwpovx4fomvggXw6gH6kOyDiUirhpwVhUyKpaIclmtmJt7RQ kx5xoyorCoq1DbAdE8t5XYdq8PRHdXV81SrEz057Yr0UW5CY0v+c4HHNpWyI2ju5z3 XlPgr7T1s3XlyK0ebZK7J8b1c6Sx0BGR1KCu1Fdmob2dvqZZYmgMwUW2ECkfNDyAWo +oq2fJMT+cU2w==
From: Jeff Layton
To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v6 5/9] ceph: report the inode version in getattr if requested
Date: Fri, 30 Sep 2022 07:18:36 -0400
Message-Id: <20220930111840.10695-6-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

When getattr requests STATX_VERSION, request the full gamut of caps
(similarly to how ctime is handled). When the change attribute seems to
be valid, return it in the version field and set the flag in the reply
mask. Also, unconditionally enable STATX_ATTR_VERSION_MONOTONIC.

Reviewed-by: Xiubo Li
Signed-off-by: Jeff Layton
---
 fs/ceph/inode.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 42351d7a0dd6..bcab855bf1ae 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2415,10 +2415,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 {
 	int mask = 0;
 
-	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME))
+	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME|STATX_VERSION))
 		mask |= CEPH_CAP_AUTH_SHARED;
 
-	if (want & (STATX_NLINK|STATX_CTIME)) {
+	if (want & (STATX_NLINK|STATX_CTIME|STATX_VERSION)) {
 		/*
 		 * The link count for directories depends on inode->i_subdirs,
 		 * and that is only updated when Fs caps are held.
@@ -2429,11 +2429,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 			mask |= CEPH_CAP_LINK_SHARED;
 	}
 
-	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|
-		    STATX_BLOCKS))
+	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|STATX_BLOCKS|STATX_VERSION))
 		mask |= CEPH_CAP_FILE_SHARED;
 
-	if (want & (STATX_CTIME))
+	if (want & (STATX_CTIME|STATX_VERSION))
 		mask |= CEPH_CAP_XATTR_SHARED;
 
 	return mask;
@@ -2475,6 +2474,11 @@ int ceph_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		valid_mask |= STATX_BTIME;
 	}
 
+	if (request_mask & STATX_VERSION) {
+		stat->version = inode_peek_iversion_raw(inode);
+		valid_mask |= STATX_VERSION;
+	}
+
 	if (ceph_snap(inode) == CEPH_NOSNAP)
 		stat->dev = inode->i_sb->s_dev;
 	else
@@ -2498,6 +2502,8 @@ int ceph_getattr(struct user_namespace *mnt_userns, const struct path *path,
 			stat->nlink = 1 + 1 + ci->i_subdirs;
 	}
 
+	stat->attributes_mask |= STATX_ATTR_VERSION_MONOTONIC;
+	stat->attributes |= STATX_ATTR_VERSION_MONOTONIC;
 	stat->result_mask = request_mask & valid_mask;
 	return err;
 }

From patchwork Fri Sep 30 11:18:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id:
 611001
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2F1EC43217 for ; Fri, 30 Sep 2022 11:27:18 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232141AbiI3L1Q (ORCPT ); Fri, 30 Sep 2022 07:27:16 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51282 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232178AbiI3L0a (ORCPT ); Fri, 30 Sep 2022 07:26:30 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 825513ECDB; Fri, 30 Sep 2022 04:18:58 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 28C68622D1; Fri, 30 Sep 2022 11:18:58 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1DCDC433C1; Fri, 30 Sep 2022 11:18:55 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664536737; bh=K6suG7gA7G0HL4El6Nr54LKmYnHFtn8h7wvH1rQyVEo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mye0eefhon6zgf3rEMigQnADi4MYYy1SPa+0E3bOzjJBh0pxZdTjSPFVb60JuKxss mQKT8S9kyCLyqClu25FMnCYLyij4/7GkC8CMc9MW+mW6FFf8g61LhK0CJpZVwFvVRj dTnu//QHLUHOwo0WlvpvugAKjngDyn/pJLDHjwN0Ozc9vfvDQACePpHgto8n4yMQlZ lSAQiIVNJhA04nL2Pupzv9/XEHjOYi8US17R4LqjBNeX4l2HAi7AUsA9Q2P7apnXoe 5J68XCW0pVOrvjW46rMtbTXpwQtGC2LQBmjTp1VDU31UwgZ6oHvmx7jBpC1WpvIaf6 oUnYUPpQoZ8Uw==
From: Jeff Layton
To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com,
 lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com
Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v6 6/9] nfsd: use the getattr operation to fetch i_version
Date: Fri, 30 Sep 2022 07:18:37 -0400
Message-Id: <20220930111840.10695-7-jlayton@kernel.org>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org>
References: <20220930111840.10695-1-jlayton@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

Now that we can call into vfs_getattr to get the i_version field, use
that facility to fetch it instead of doing it in nfsd4_change_attribute.

Neil also pointed out recently that IS_I_VERSION directory operations
are always logged, and so we only need to mitigate the rollback problem
on regular files. Also, we don't need to factor in the ctime when
reexporting NFS or Ceph.

Set the STATX_VERSION (and BTIME) bits in the request when we're dealing
with a v4 request. Then, instead of looking at IS_I_VERSION when
generating the change attr, look at the result mask and only use it if
STATX_VERSION is set. With this change, we can drop the fetch_iversion
export operation as well.

Move nfsd4_change_attribute into nfsfh.c, and change it to only factor
in the ctime if it's a regular file and the fs doesn't advertise
STATX_ATTR_VERSION_MONOTONIC.

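To make the selection logic described above concrete, here is a small
userspace sketch (not the nfsd code itself). The demo_* names, struct
demo_kstat and the is_regular parameter are hypothetical; the two flag
values and the "<< 30" mixing of ctime are taken from the diffs in this
series, while the shift by 32 in the ctime-only fallback is an assumption
made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined earlier in this series; names are demo stand-ins. */
#define DEMO_STATX_VERSION		0x40000000U
#define DEMO_ATTR_VERSION_MONOTONIC	0x8000000000000000ULL

struct demo_kstat {
	uint32_t result_mask;
	uint64_t attributes;
	uint64_t version;
	int64_t  ctime_sec;
	int64_t  ctime_nsec;
};

/* Fallback when no version was returned: build a value from ctime alone. */
static uint64_t demo_ctime_only(const struct demo_kstat *st)
{
	/* The 32-bit shift is an assumption for this sketch. */
	return ((uint64_t)st->ctime_sec << 32) + (uint64_t)st->ctime_nsec;
}

/*
 * Pick the change attribute: use the version if getattr returned one, and
 * mix in ctime only for regular files on filesystems that do not advertise
 * a monotonic counter.
 */
static uint64_t demo_change_attribute(const struct demo_kstat *st,
				      bool is_regular)
{
	uint64_t chattr;

	assert(st != NULL);
	if (st->result_mask & DEMO_STATX_VERSION) {
		chattr = st->version;
		if (is_regular &&
		    !(st->attributes & DEMO_ATTR_VERSION_MONOTONIC)) {
			chattr += (uint64_t)st->ctime_sec << 30;
			chattr += (uint64_t)st->ctime_nsec;
		}
	} else {
		chattr = demo_ctime_only(st);
	}
	return chattr;
}
```

The point of keying off result_mask rather than IS_I_VERSION is that the
filesystem's own getattr decides whether a usable version exists, which is
what lets the NFS and Ceph reexport cases drop the ctime mixing.
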
Signed-off-by: Jeff Layton
---
 fs/nfs/export.c          |  7 -------
 fs/nfsd/nfs4xdr.c        |  4 +++-
 fs/nfsd/nfsfh.c          | 40 ++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsfh.h          | 29 +----------------------------
 fs/nfsd/vfs.h            |  7 ++++++-
 include/linux/exportfs.h |  1 -
 6 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 01596f2d0a1e..1a9d5aa51dfb 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -145,17 +145,10 @@ nfs_get_parent(struct dentry *dentry)
 	return parent;
 }
 
-static u64 nfs_fetch_iversion(struct inode *inode)
-{
-	nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
-	return inode_peek_iversion_raw(inode);
-}
-
 const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
-	.fetch_iversion = nfs_fetch_iversion,
 	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|
 		EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS|
 		EXPORT_OP_NOATOMIC_ATTR,
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1e9690a061ec..779c009314c6 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2869,7 +2869,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		goto out;
 	}
 
-	err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
+	err = vfs_getattr(&path, &stat,
+			  STATX_BASIC_STATS | STATX_BTIME | STATX_VERSION,
+			  AT_STATX_SYNC_AS_STAT);
 	if (err)
 		goto out_nfserr;
 	if (!(stat.result_mask & STATX_BTIME))
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index a5b71526cee0..9168bc657378 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -634,6 +634,10 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
 		stat.mtime = inode->i_mtime;
 		stat.ctime = inode->i_ctime;
 		stat.size = inode->i_size;
+		if (v4 && IS_I_VERSION(inode)) {
+			stat.version = inode_query_iversion(inode);
+			stat.result_mask |= STATX_VERSION;
+		}
 	}
 	if (v4)
 		fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
@@ -665,6 +669,8 @@ void fh_fill_post_attrs(struct svc_fh
*fhp) if (err) { fhp->fh_post_saved = false; fhp->fh_post_attr.ctime = inode->i_ctime; + if (v4 && IS_I_VERSION(inode)) + fhp->fh_post_attr.version = inode_query_iversion(inode); } else fhp->fh_post_saved = true; if (v4) @@ -754,3 +760,37 @@ enum fsid_source fsid_source(const struct svc_fh *fhp) return FSIDSOURCE_UUID; return FSIDSOURCE_DEV; } + +/* + * We could use i_version alone as the change attribute. However, i_version + * can go backwards on a regular file after an unclean shutdown. On its own + * that doesn't necessarily cause a problem, but if i_version goes backwards + * and then is incremented again it could reuse a value that was previously + * used before boot, and a client who queried the two values might incorrectly + * assume nothing changed. + * + * By using both ctime and the i_version counter we guarantee that as long as + * time doesn't go backwards we never reuse an old value. If the filesystem + * advertises STATX_ATTR_VERSION_MONOTONIC, then this mitigation is not needed. + * + * We only need to do this for regular files as well. For directories, we + * assume that the new change attr is always logged to stable storage in some + * fashion before the results can be seen. + */ +u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) +{ + u64 chattr; + + if (stat->result_mask & STATX_VERSION) { + chattr = stat->version; + + if (S_ISREG(inode->i_mode) && + !(stat->attributes & STATX_ATTR_VERSION_MONOTONIC)) { + chattr += (u64)stat->ctime.tv_sec << 30; + chattr += stat->ctime.tv_nsec; + } + } else { + chattr = time_to_chattr(&stat->ctime); + } + return chattr; +} diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index c3ae6414fc5c..4c223a7a91d4 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -291,34 +291,7 @@ static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) fhp->fh_pre_saved = false; } -/* - * We could use i_version alone as the change attribute. However, - * i_version can go backwards after a reboot. 
On its own that doesn't - * necessarily cause a problem, but if i_version goes backwards and then - * is incremented again it could reuse a value that was previously used - * before boot, and a client who queried the two values might - * incorrectly assume nothing changed. - * - * By using both ctime and the i_version counter we guarantee that as - * long as time doesn't go backwards we never reuse an old value. - */ -static inline u64 nfsd4_change_attribute(struct kstat *stat, - struct inode *inode) -{ - if (inode->i_sb->s_export_op->fetch_iversion) - return inode->i_sb->s_export_op->fetch_iversion(inode); - else if (IS_I_VERSION(inode)) { - u64 chattr; - - chattr = stat->ctime.tv_sec; - chattr <<= 30; - chattr += stat->ctime.tv_nsec; - chattr += inode_query_iversion(inode); - return chattr; - } else - return time_to_chattr(&stat->ctime); -} - +u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode); extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); extern void fh_fill_both_attrs(struct svc_fh *fhp); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index c95cd414b4bb..a905f59481ee 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -168,9 +168,14 @@ static inline void fh_drop_write(struct svc_fh *fh) static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) { + u32 request_mask = STATX_BASIC_STATS; struct path p = {.mnt = fh->fh_export->ex_path.mnt, .dentry = fh->fh_dentry}; - return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS, + + if (fh->fh_maxsize == NFS4_FHSIZE) + request_mask |= (STATX_BTIME | STATX_VERSION); + + return nfserrno(vfs_getattr(&p, stat, request_mask, AT_STATX_SYNC_AS_STAT)); } diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index fe848901fcc3..9f4d4bcbf251 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,7 +213,6 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, 
struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); - u64 (*fetch_iversion)(struct inode *); #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ From patchwork Fri Sep 30 11:18:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 611991
From: Jeff Layton To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org, Jeff Layton Subject: [PATCH v6 7/9] vfs: expose STATX_VERSION to userland Date: Fri, 30 Sep 2022 07:18:38 -0400 Message-Id: <20220930111840.10695-8-jlayton@kernel.org> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org> References: <20220930111840.10695-1-jlayton@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Jeff Layton Claim one of the spare fields in struct statx to hold a 64-bit inode version attribute. When userland requests STATX_VERSION, copy the value from the kstat struct there, and stop masking off STATX_ATTR_VERSION_MONOTONIC. Update the test-statx sample program to output the change attr and MountId.
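As a rough illustration of how userland might consume the new field, the sketch below compares two samples and only trusts the version when the kernel reported it in the result mask. Everything here is hypothetical scaffolding: struct demo_statx is a cut-down stand-in for struct statx, and DEMO_STATX_VERSION simply mirrors the bit value proposed in this patch, which should not be assumed to be a stable ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit value as proposed in this patch; an assumption, not a stable ABI. */
#define DEMO_STATX_VERSION 0x00004000U

/* Hypothetical subset of struct statx carrying the new field. */
struct demo_statx {
	uint32_t stx_mask;
	uint64_t stx_version;
};

enum demo_verdict { DEMO_UNKNOWN, DEMO_UNCHANGED, DEMO_CHANGED };

/* Compare two samples of the same inode taken at different times. */
static enum demo_verdict demo_compare(const struct demo_statx *before,
				      const struct demo_statx *after)
{
	assert(before != NULL && after != NULL);

	/* The field is only meaningful if the result mask says so. */
	if (!(before->stx_mask & DEMO_STATX_VERSION) ||
	    !(after->stx_mask & DEMO_STATX_VERSION))
		return DEMO_UNKNOWN;

	return before->stx_version == after->stx_version ?
		DEMO_UNCHANGED : DEMO_CHANGED;
}
```

A caller that gets DEMO_UNKNOWN would presumably fall back to comparing ctime or re-reading the file, since the filesystem did not supply a version.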
Signed-off-by: Jeff Layton Reviewed-by: NeilBrown --- fs/stat.c | 12 +++--------- include/linux/stat.h | 9 --------- include/uapi/linux/stat.h | 6 ++++-- samples/vfs/test-statx.c | 8 ++++++-- 4 files changed, 13 insertions(+), 22 deletions(-) diff --git a/fs/stat.c b/fs/stat.c index e7f8cd4b24e1..8396c372022f 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -593,11 +593,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer) memset(&tmp, 0, sizeof(tmp)); - /* STATX_VERSION is kernel-only for now */ - tmp.stx_mask = stat->result_mask & ~STATX_VERSION; + tmp.stx_mask = stat->result_mask; tmp.stx_blksize = stat->blksize; - /* STATX_ATTR_VERSION_MONOTONIC is kernel-only for now */ - tmp.stx_attributes = stat->attributes & ~STATX_ATTR_VERSION_MONOTONIC; + tmp.stx_attributes = stat->attributes; tmp.stx_nlink = stat->nlink; tmp.stx_uid = from_kuid_munged(current_user_ns(), stat->uid); tmp.stx_gid = from_kgid_munged(current_user_ns(), stat->gid); @@ -621,6 +619,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer) tmp.stx_mnt_id = stat->mnt_id; tmp.stx_dio_mem_align = stat->dio_mem_align; tmp.stx_dio_offset_align = stat->dio_offset_align; + tmp.stx_version = stat->version; return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0; } @@ -636,11 +635,6 @@ int do_statx(int dfd, struct filename *filename, unsigned int flags, if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE) return -EINVAL; - /* STATX_VERSION is kernel-only for now. Ignore requests - * from userland. - */ - mask &= ~STATX_VERSION; - error = vfs_statx(dfd, filename, flags, &stat, mask); if (error) return error; diff --git a/include/linux/stat.h b/include/linux/stat.h index 4e9428d86a3a..69c79e4fd1b1 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -54,13 +54,4 @@ struct kstat { u32 dio_offset_align; u64 version; }; - -/* These definitions are internal to the kernel for now. Mainly used by nfsd. 
*/ - -/* mask values */ -#define STATX_VERSION 0x40000000U /* Want/got stx_change_attr */ - -/* file attribute values */ -#define STATX_ATTR_VERSION_MONOTONIC 0x8000000000000000ULL /* version monotonically increases */ - #endif diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index 7cab2c65d3d7..4a0a1f27c059 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -127,7 +127,8 @@ struct statx { __u32 stx_dio_mem_align; /* Memory buffer alignment for direct I/O */ __u32 stx_dio_offset_align; /* File offset alignment for direct I/O */ /* 0xa0 */ - __u64 __spare3[12]; /* Spare space for future expansion */ + __u64 stx_version; /* Inode change attribute */ + __u64 __spare3[11]; /* Spare space for future expansion */ /* 0x100 */ }; @@ -154,6 +155,7 @@ struct statx { #define STATX_BTIME 0x00000800U /* Want/got stx_btime */ #define STATX_MNT_ID 0x00001000U /* Got stx_mnt_id */ #define STATX_DIOALIGN 0x00002000U /* Want/got direct I/O alignment info */ +#define STATX_VERSION 0x00004000U /* Want/got stx_version */ #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */ @@ -189,6 +191,6 @@ struct statx { #define STATX_ATTR_MOUNT_ROOT 0x00002000 /* Root of a mount */ #define STATX_ATTR_VERITY 0x00100000 /* [I] Verity protected file */ #define STATX_ATTR_DAX 0x00200000 /* File is currently in DAX state */ - +#define STATX_ATTR_VERSION_MONOTONIC 0x00400000 /* stx_version increases w/ every change */ #endif /* _UAPI_LINUX_STAT_H */ diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c index 49c7a46cee07..bdbc371c9774 100644 --- a/samples/vfs/test-statx.c +++ b/samples/vfs/test-statx.c @@ -107,6 +107,8 @@ static void dump_statx(struct statx *stx) printf("Device: %-15s", buffer); if (stx->stx_mask & STATX_INO) printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino); + if (stx->stx_mask & STATX_MNT_ID) + printf(" MountId: %llx", stx->stx_mnt_id); if (stx->stx_mask & STATX_NLINK) printf(" Links: %-5u", 
stx->stx_nlink); if (stx->stx_mask & STATX_TYPE) { @@ -145,7 +147,9 @@ static void dump_statx(struct statx *stx) if (stx->stx_mask & STATX_CTIME) print_time("Change: ", &stx->stx_ctime); if (stx->stx_mask & STATX_BTIME) - print_time(" Birth: ", &stx->stx_btime); + print_time("Birth: ", &stx->stx_btime); + if (stx->stx_mask & STATX_VERSION) + printf("Inode Version: 0x%llx\n", stx->stx_version); if (stx->stx_attributes_mask) { unsigned char bits, mbits; @@ -218,7 +222,7 @@ int main(int argc, char **argv) struct statx stx; int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; - unsigned int mask = STATX_BASIC_STATS | STATX_BTIME; + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME | STATX_MNT_ID | STATX_VERSION; for (argv++; *argv; argv++) { if (strcmp(*argv, "-F") == 0) { From patchwork Fri Sep 30 11:18:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 611990
From: Jeff Layton To: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org Subject: [PATCH v6 8/9] vfs: update times after copying data in __generic_file_write_iter Date: Fri, 30 Sep 2022 07:18:39 -0400 Message-Id: <20220930111840.10695-9-jlayton@kernel.org> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org> References: <20220930111840.10695-1-jlayton@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org The c/mtime and i_version currently get updated before the data is copied (or a DIO write is issued), which is problematic for NFS. READ+GETATTR can race with a write (even a local one) in such a way as to make the client associate the state of the file with the wrong change attribute.
That association can persist indefinitely if the file sees no further changes. Move the setting of times to the bottom of the function in __generic_file_write_iter and only update them if something was successfully written. If the time update fails, log a warning once, but don't fail the write. All of the existing callers use update_time functions that don't fail, so we should never trip this. Signed-off-by: Jeff Layton --- mm/filemap.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 15800334147b..72c0ceb75176 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3812,10 +3812,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (err) goto out; - err = file_update_time(file); - if (err) - goto out; - if (iocb->ki_flags & IOCB_DIRECT) { loff_t pos, endbyte; @@ -3868,6 +3864,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) iocb->ki_pos += written; } out: + if (written > 0) { + err = file_update_time(file); + /* + * There isn't much we can do at this point if updating the + * times fails after a successful write. The times and i_version + * should still be updated in the inode, and it should still be + * marked dirty, so hopefully the next inode update will catch it. + * Log a warning once so we have a record that something untoward + * has occurred. + */ + WARN_ONCE(err, "Failed to update m/ctime after write: %zd\n", err); + } + current->backing_dev_info = NULL; return written ?
written : err; } From patchwork Fri Sep 30 11:18:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 611992 From: Jeff Layton To: tytso@mit.edu,
adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com Cc: linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org Subject: [PATCH v6 9/9] ext4: update times after I/O in write codepaths Date: Fri, 30 Sep 2022 07:18:40 -0400 Message-Id: <20220930111840.10695-10-jlayton@kernel.org> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220930111840.10695-1-jlayton@kernel.org> References: <20220930111840.10695-1-jlayton@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org The times currently get updated before the data is copied (or the DIO is issued), which is problematic for NFSv4. A READ+GETATTR could race with a write in such a way as to make the client associate the state of the file with the wrong change attribute, and that association could persist indefinitely if the file sees no further changes. For this reason, it's better to bump the times and change attribute after the data has been copied or the DIO write issued.
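The ordering argued for above can be modelled in a few lines of userspace C. This is a toy sketch under stated assumptions, not the ext4 code: struct demo_file, demo_copy() and demo_update_time() are hypothetical stand-ins for the inode, generic_perform_write() and file_update_time(), and the fail_update field exists only to exercise the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file" with a change counter. */
struct demo_file {
	char data[64];
	size_t size;
	unsigned long version;	/* models i_version and the timestamps */
	int fail_update;	/* when nonzero, demo_update_time() returns it */
};

/* Stand-in for the data copy; returns the number of bytes stored. */
static long demo_copy(struct demo_file *f, const char *buf, size_t len)
{
	if (len > sizeof(f->data))
		len = sizeof(f->data);
	memcpy(f->data, buf, len);
	f->size = len;
	return (long)len;
}

/* Stand-in for the time/version update. */
static int demo_update_time(struct demo_file *f)
{
	if (f->fail_update)
		return f->fail_update;
	f->version++;
	return 0;
}

static long demo_write(struct demo_file *f, const char *buf, size_t len)
{
	long ret;

	assert(f != NULL && buf != NULL);

	ret = demo_copy(f, buf, len);		/* data first */
	if (ret > 0) {
		int ret2 = demo_update_time(f);	/* metadata afterwards */

		if (ret2)
			ret = ret2;
	}
	return ret;
}
```

With this ordering, a reader that observes the bumped counter is also guaranteed to observe the new data, which is the property the READ+GETATTR race depends on; a write that stored nothing leaves the counter alone.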
Signed-off-by: Jeff Layton --- fs/ext4/file.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 109d07629f81..1fa8e0239856 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -246,7 +246,7 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) if (count <= 0) return count; - ret = file_modified(iocb->ki_filp); + ret = file_remove_privs(iocb->ki_filp); if (ret) return ret; return count; @@ -269,7 +269,11 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, current->backing_dev_info = inode_to_bdi(inode); ret = generic_perform_write(iocb, from); current->backing_dev_info = NULL; - + if (ret > 0) { + ssize_t ret2 = file_update_time(iocb->ki_filp); + if (ret2) + ret = ret2; + } out: inode_unlock(inode); if (likely(ret > 0)) { @@ -455,7 +459,7 @@ static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from, goto restart; } - ret = file_modified(file); + ret = file_remove_privs(file); if (ret < 0) goto out; @@ -572,6 +576,11 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) if (extend) ret = ext4_handle_inode_extension(inode, offset, ret, count); + if (ret > 0) { + ssize_t ret2 = file_update_time(iocb->ki_filp); + if (ret2) + ret = ret2; + } out: if (ilock_shared) inode_unlock_shared(inode); @@ -653,6 +662,11 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) if (extend) ret = ext4_handle_inode_extension(inode, offset, ret, count); + if (ret > 0) { + ssize_t ret2 = file_update_time(iocb->ki_filp); + if (ret2) + ret = ret2; + } out: inode_unlock(inode); if (ret > 0)