From patchwork Tue Sep 1 15:10:48 2020
X-Patchwork-Submitter: Greg Kroah-Hartman
X-Patchwork-Id: 310343
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Martijn Coenen,
 Christoph Hellwig, Jan Kara
Subject: [PATCH 5.4 168/214] writeback: Avoid skipping inode writeback
Date: Tue, 1 Sep 2020 17:10:48 +0200
Message-Id: <20200901151001.023896654@linuxfoundation.org>
In-Reply-To: <20200901150952.963606936@linuxfoundation.org>
References: <20200901150952.963606936@linuxfoundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Jan Kara

commit 5afced3bf28100d81fb2fe7e98918632a08feaf5 upstream.

Inode's i_io_list list head is used to attach inode to several different
lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
to b_io list. Thus it is critical for sync(2) data integrity guarantees
that inode is not requeued to any other writeback list when inode is
queued for processing by flush worker. That's the reason why
writeback_single_inode() does not touch i_io_list (unless the inode is
completely clean) and why __mark_inode_dirty() does not touch i_io_list
if I_SYNC flag is set.
However there are two flaws in the current logic:

1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
can still move inode back to b_dirty list resulting in skipping writeback
of inode time stamps during sync(2).

2) When inode is on b_dirty_time list and writeback_single_inode() races
with __mark_inode_dirty() like:

writeback_single_inode()                __mark_inode_dirty(inode, I_DIRTY_PAGES)
  inode->i_state |= I_SYNC
  __writeback_single_inode()
                                        inode->i_state |= I_DIRTY_PAGES;
                                        if (inode->i_state & I_SYNC)
                                          bail
  if (!(inode->i_state & I_DIRTY_ALL))
  - not true so nothing done

We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus standard
background writeback will not writeback this inode leading to possible
dirty throttling stalls etc. (thanks to Martijn Coenen for this analysis).

Fix these problems by tracking whether inode is queued in b_io or
b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
know flush worker has queued inode and we should not touch i_io_list. On
the other hand we also know that once flush worker is done with the inode
it will requeue the inode to appropriate dirty list. When I_SYNC_QUEUED
is not set, __mark_inode_dirty() can (and must) move inode to appropriate
dirty list.

Reported-by: Martijn Coenen
Reviewed-by: Martijn Coenen
Tested-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman
---
 fs/fs-writeback.c  | 17 ++++++++++++-----
 include/linux/fs.h |  8 ++++++--
 2 files changed, 18 insertions(+), 7 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -146,6 +146,7 @@ static void inode_io_list_del_locked(str
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
 
+	inode->i_state &= ~I_SYNC_QUEUED;
 	list_del_init(&inode->i_io_list);
 	wb_io_lists_depopulated(wb);
 }
@@ -1187,6 +1188,7 @@ static void redirty_tail_locked(struct i
 			inode->dirtied_when = jiffies;
 	}
 	inode_io_list_move_locked(inode, wb, &wb->b_dirty);
+	inode->i_state &= ~I_SYNC_QUEUED;
 }
 
 static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
@@ -1262,8 +1264,11 @@ static int move_expired_inodes(struct li
 			break;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
+		spin_lock(&inode->i_lock);
 		if (flags & EXPIRE_DIRTY_ATIME)
-			set_bit(__I_DIRTY_TIME_EXPIRED, &inode->i_state);
+			inode->i_state |= I_DIRTY_TIME_EXPIRED;
+		inode->i_state |= I_SYNC_QUEUED;
+		spin_unlock(&inode->i_lock);
 		if (sb_is_blkdev_sb(inode->i_sb))
 			continue;
 		if (sb && sb != inode->i_sb)
@@ -1438,6 +1443,7 @@ static void requeue_inode(struct inode *
 	} else if (inode->i_state & I_DIRTY_TIME) {
 		inode->dirtied_when = jiffies;
 		inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		inode->i_state &= ~I_SYNC_QUEUED;
 	} else {
 		/* The inode is clean. Remove from writeback lists. */
 		inode_io_list_del_locked(inode, wb);
@@ -2301,11 +2307,12 @@ void __mark_inode_dirty(struct inode *in
 		inode->i_state |= flags;
 
 		/*
-		 * If the inode is being synced, just update its dirty state.
-		 * The unlocker will place the inode on the appropriate
-		 * superblock list, based upon its state.
+		 * If the inode is queued for writeback by flush worker, just
+		 * update its dirty state. Once the flush worker is done with
+		 * the inode it will place it on the appropriate superblock
+		 * list, based upon its state.
 		 */
-		if (inode->i_state & I_SYNC)
+		if (inode->i_state & I_SYNC_QUEUED)
 			goto out_unlock_inode;
 
 		/*
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2140,6 +2140,10 @@ static inline void init_sync_kiocb(struc
  *
  * I_CREATING		New object's inode in the middle of setting up.
  *
+ * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
+ *			Used to detect that mark_inode_dirty() should not move
+ *			inode between dirty lists.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2157,11 +2161,11 @@ static inline void init_sync_kiocb(struc
 #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
 #define I_DIRTY_TIME		(1 << 11)
-#define __I_DIRTY_TIME_EXPIRED	12
-#define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
+#define I_DIRTY_TIME_EXPIRED	(1 << 12)
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
+#define I_SYNC_QUEUED		(1 << 17)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
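
For readers following the backport, below is a minimal user-space sketch of the
invariant the patch establishes: while I_SYNC_QUEUED is set the flush worker
owns i_io_list, so a concurrent dirtying only updates i_state, and the inode is
requeued to the right list once the worker is done. The struct, list enum and
helper functions here are simplified stand-ins invented for illustration; they
are not the kernel's data structures or code, just a model of the rule.

/*
 * Sketch only: models the I_SYNC_QUEUED ownership rule in user space.
 * Build with e.g.: cc -o sketch sketch.c
 */
#include <stdio.h>

#define I_DIRTY_SYNC   (1 << 0)
#define I_DIRTY_PAGES  (1 << 2)
#define I_DIRTY_TIME   (1 << 11)
#define I_SYNC_QUEUED  (1 << 17)

enum wb_list { B_DIRTY, B_DIRTY_TIME, B_IO, B_CLEAN };

struct inode {
	unsigned int i_state;
	enum wb_list i_list;	/* stand-in for which list i_io_list is on */
};

/* Flush worker pulls the inode onto b_io (e.g. for sync(2)). */
static void queue_io(struct inode *inode)
{
	inode->i_list = B_IO;
	inode->i_state |= I_SYNC_QUEUED;
}

/* Simplified __mark_inode_dirty(): never touches the list while queued. */
static void mark_inode_dirty(struct inode *inode, unsigned int flags)
{
	inode->i_state |= flags;
	if (inode->i_state & I_SYNC_QUEUED)
		return;			/* flush worker owns i_io_list */
	inode->i_list = B_DIRTY;	/* otherwise requeue to b_dirty */
}

/* Flush worker is done: requeue based on what is still dirty. */
static void requeue_inode(struct inode *inode)
{
	inode->i_state &= ~I_SYNC_QUEUED;
	if (inode->i_state & (I_DIRTY_SYNC | I_DIRTY_PAGES))
		inode->i_list = B_DIRTY;
	else if (inode->i_state & I_DIRTY_TIME)
		inode->i_list = B_DIRTY_TIME;
	else
		inode->i_list = B_CLEAN;
}

int main(void)
{
	struct inode inode = { .i_state = I_DIRTY_TIME, .i_list = B_DIRTY_TIME };

	queue_io(&inode);			/* sync(2) queues the inode   */
	mark_inode_dirty(&inode, I_DIRTY_SYNC);	/* concurrent dirtying        */
	printf("while queued: list=%d (stays on b_io)\n", inode.i_list);

	requeue_inode(&inode);			/* worker picks the final list */
	printf("after requeue: list=%d (b_dirty, not lost)\n", inode.i_list);
	return 0;
}

In this toy model the concurrent mark_inode_dirty() call during the queued
window leaves the inode on b_io, which is the behaviour flaw 1 above needs for
sync(2) to write the timestamps; the real kernel logic is in the diff itself.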