From patchwork Wed Jul 21 13:44:40 2021 X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483389 Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No.
3798903 Subject: [RFC PATCH 01/12] afs: Sort out symlink reading From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:44:40 +0100 Message-ID: <162687508008.276387.6418924257569297305.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org afs_readpage() doesn't get a file pointer when called for a symlink, so separate it from regular file pointer handling. Signed-off-by: David Howells Reviewed-by: Jeff Layton --- fs/afs/file.c | 14 +++++++++----- fs/afs/inode.c | 6 +++--- fs/afs/internal.h | 3 ++- 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index ca0d993add65..c9c21ad0e7c9 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -19,6 +19,7 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); static int afs_readpage(struct file *file, struct page *page); +static int afs_symlink_readpage(struct file *file, struct page *page); static void afs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); static int afs_releasepage(struct page *page, gfp_t gfp_flags); @@ -46,7 +47,7 @@ const struct inode_operations afs_file_inode_operations = { .permission = afs_permission, }; -const struct address_space_operations afs_fs_aops = { +const struct address_space_operations afs_file_aops = { .readpage = afs_readpage, .readahead = afs_readahead, .set_page_dirty = afs_set_page_dirty, @@ -60,6 +61,12 @@ const struct address_space_operations afs_fs_aops = { .writepages = afs_writepages, }; +const struct address_space_operations afs_symlink_aops = { + .readpage = afs_symlink_readpage, + .releasepage = afs_releasepage, + .invalidatepage = afs_invalidatepage, +}; + static const struct vm_operations_struct afs_vm_ops = { .fault = filemap_fault, .map_pages = filemap_map_pages, @@ -321,7 +328,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq) afs_fetch_data(fsreq->vnode, fsreq); } -static int afs_symlink_readpage(struct page *page) +static int afs_symlink_readpage(struct file *file, struct page *page) { struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); struct afs_read *fsreq; @@ -386,9 +393,6 @@ const struct netfs_read_request_ops afs_req_ops = { static int afs_readpage(struct file *file, struct page *page) { - if (!file) - return afs_symlink_readpage(page); - return netfs_readpage(file, page, &afs_req_ops, NULL); } diff --git a/fs/afs/inode.c b/fs/afs/inode.c index bef6f5ccfb09..cf7b66957c6f 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -105,7 +105,7 @@ static int afs_inode_init_from_status(struct afs_operation *op, inode->i_mode = S_IFREG | (status->mode & S_IALLUGO); inode->i_op = &afs_file_inode_operations; inode->i_fop = &afs_file_operations; - inode->i_mapping->a_ops = &afs_fs_aops; + inode->i_mapping->a_ops = 
&afs_file_aops; break; case AFS_FTYPE_DIR: inode->i_mode = S_IFDIR | (status->mode & S_IALLUGO); @@ -123,11 +123,11 @@ static int afs_inode_init_from_status(struct afs_operation *op, inode->i_mode = S_IFDIR | 0555; inode->i_op = &afs_mntpt_inode_operations; inode->i_fop = &afs_mntpt_file_operations; - inode->i_mapping->a_ops = &afs_fs_aops; + inode->i_mapping->a_ops = &afs_symlink_aops; } else { inode->i_mode = S_IFLNK | status->mode; inode->i_op = &afs_symlink_inode_operations; - inode->i_mapping->a_ops = &afs_fs_aops; + inode->i_mapping->a_ops = &afs_symlink_aops; } inode_nohighmem(inode); break; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 791cf02e5696..ccdde00ada8a 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1050,7 +1050,8 @@ extern void afs_dynroot_depopulate(struct super_block *); /* * file.c */ -extern const struct address_space_operations afs_fs_aops; +extern const struct address_space_operations afs_file_aops; +extern const struct address_space_operations afs_symlink_aops; extern const struct inode_operations afs_file_inode_operations; extern const struct file_operations afs_file_operations; extern const struct netfs_read_request_ops afs_req_ops; From patchwork Wed Jul 21 13:45:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483388 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6433DC636C9 for ; Wed, 21 Jul 2021 13:46:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4DBE260FF4 for ; Wed, 21 Jul 2021 13:46:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238684AbhGUNFZ (ORCPT ); Wed, 21 Jul 2021 09:05:25 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:27336 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237994AbhGUNE6 (ORCPT ); Wed, 21 Jul 2021 09:04:58 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875125; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=l2ULAssdCvCuvxiyM6eqGFkpgO6JFY0hX9JtLUvVjhk=; b=SVIr604nd/jOLWcU/P+RUxMeg6x2JEI9WU9LhB+jbmud6ymBHf8ZsR2ncQZZx82Gu7NVlm SBRQNPCDWVOGgx7aQSzpMBW88upI2Py9msWBYTuqYi9eUVoONfRza1Z7sgDhcWxsQ5+SKo bFrcItALOhMgNHHd8gF/dgWAxGtVtGs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-109-REGEyuZqOumWlWTCvhfd0A-1; Wed, 21 Jul 2021 09:45:21 -0400 X-MC-Unique: REGEyuZqOumWlWTCvhfd0A-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 78771804140; 
Wed, 21 Jul 2021 13:45:19 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id A6E731036D14; Wed, 21 Jul 2021 13:45:11 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 03/12] netfs: Remove netfs_read_subrequest::transferred From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:45:11 +0100 Message-ID: <162687511125.276387.15493860267582539643.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Remove netfs_read_subrequest::transferred as it's redundant as the count on the iterator added to the subrequest can be used instead. Signed-off-by: David Howells --- fs/afs/file.c | 4 ++-- fs/netfs/read_helper.c | 26 ++++---------------------- include/linux/netfs.h | 1 - include/trace/events/netfs.h | 12 ++++++------ 4 files changed, 12 insertions(+), 31 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index ca529f23515a..82e945dbe379 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -315,8 +315,8 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq) return netfs_subreq_terminated(subreq, -ENOMEM, false); fsreq->subreq = subreq; - fsreq->pos = subreq->start + subreq->transferred; - fsreq->len = subreq->len - subreq->transferred; + fsreq->pos = subreq->start + subreq->len - iov_iter_count(&subreq->iter); + fsreq->len = iov_iter_count(&subreq->iter); fsreq->key = subreq->rreq->netfs_priv; fsreq->vnode = vnode; fsreq->iter = &subreq->iter; diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 715f3e9c380d..5e1a9be48130 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -148,12 +148,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, */ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) { - struct iov_iter iter; - - iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages, - subreq->start + subreq->transferred, - subreq->len - subreq->transferred); - iov_iter_zero(iov_iter_count(&iter), &iter); + iov_iter_zero(iov_iter_count(&subreq->iter), &subreq->iter); } static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, @@ -173,14 +168,9 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq, bool seek_data) { struct netfs_cache_resources *cres = &rreq->cache_resources; - struct iov_iter iter; netfs_stat(&netfs_n_rh_read); - iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, - subreq->start + subreq->transferred, - subreq->len - subreq->transferred); - - cres->ops->read(cres, subreq->start, &iter, seek_data, 
+ cres->ops->read(cres, subreq->start, &subreq->iter, seek_data, netfs_cache_read_terminated, subreq); } @@ -419,7 +409,7 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) if (pgend < iopos + subreq->len) break; - account += subreq->transferred; + account += subreq->len - iov_iter_count(&subreq->iter); iopos += subreq->len; if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { subreq = list_next_entry(subreq, rreq_link); @@ -635,15 +625,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, goto failed; } - if (WARN(transferred_or_error > subreq->len - subreq->transferred, - "Subreq overread: R%x[%x] %zd > %zu - %zu", - rreq->debug_id, subreq->debug_index, - transferred_or_error, subreq->len, subreq->transferred)) - transferred_or_error = subreq->len - subreq->transferred; - subreq->error = 0; - subreq->transferred += transferred_or_error; - if (subreq->transferred < subreq->len) + if (iov_iter_count(&subreq->iter)) goto incomplete; complete: @@ -667,7 +650,6 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, incomplete: if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { netfs_clear_unread(subreq); - subreq->transferred = subreq->len; goto complete; } diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 5e4fafcc9480..45d40c622205 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -116,7 +116,6 @@ struct netfs_read_subrequest { struct iov_iter iter; /* Iterator for this subrequest */ loff_t start; /* Where to start the I/O */ size_t len; /* Size of the I/O */ - size_t transferred; /* Amount of data transferred */ refcount_t usage; short error; /* 0 or error that occurred */ unsigned short debug_index; /* Index in list (for debugging output) */ diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 4d470bffd9f1..04ac29fc700f 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -190,7 +190,7 @@ TRACE_EVENT(netfs_sreq, __field(enum netfs_read_source, source ) __field(enum netfs_sreq_trace, what ) __field(size_t, len ) - __field(size_t, transferred ) + __field(size_t, remain ) __field(loff_t, start ) ), @@ -202,7 +202,7 @@ TRACE_EVENT(netfs_sreq, __entry->source = sreq->source; __entry->what = what; __entry->len = sreq->len; - __entry->transferred = sreq->transferred; + __entry->remain = iov_iter_count(&sreq->iter); __entry->start = sreq->start; ), @@ -211,7 +211,7 @@ TRACE_EVENT(netfs_sreq, __print_symbolic(__entry->what, netfs_sreq_traces), __print_symbolic(__entry->source, netfs_sreq_sources), __entry->flags, - __entry->start, __entry->transferred, __entry->len, + __entry->start, __entry->len - __entry->remain, __entry->len, __entry->error) ); @@ -230,7 +230,7 @@ TRACE_EVENT(netfs_failure, __field(enum netfs_read_source, source ) __field(enum netfs_failure, what ) __field(size_t, len ) - __field(size_t, transferred ) + __field(size_t, remain ) __field(loff_t, start ) ), @@ -242,7 +242,7 @@ TRACE_EVENT(netfs_failure, __entry->source = sreq ? sreq->source : NETFS_INVALID_READ; __entry->what = what; __entry->len = sreq ? sreq->len : 0; - __entry->transferred = sreq ? sreq->transferred : 0; + __entry->remain = sreq ? iov_iter_count(&sreq->iter) : 0; __entry->start = sreq ? 
sreq->start : 0; ), @@ -250,7 +250,7 @@ TRACE_EVENT(netfs_failure, __entry->rreq, __entry->index, __print_symbolic(__entry->source, netfs_sreq_sources), __entry->flags, - __entry->start, __entry->transferred, __entry->len, + __entry->start, __entry->len - __entry->remain, __entry->len, __print_symbolic(__entry->what, netfs_failures), __entry->error) ); From patchwork Wed Jul 21 13:45:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6344C6377A for ; Wed, 21 Jul 2021 13:46:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0C4A661362 for ; Wed, 21 Jul 2021 13:46:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238799AbhGUNGU (ORCPT ); Wed, 21 Jul 2021 09:06:20 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:24096 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238751AbhGUNFa (ORCPT ); Wed, 21 Jul 2021 09:05:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875166; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tJF6irQE00ShIEgld85BW9bY0f39qS1KE4GzM9lKwvI=; b=fWLA0O9z0sWG4/xLSEzvQDDnz13+qBIrzRR4RnRMXkCC5s44Zl+PVjI/V5ll36VRLhtfGf eXydgItOBswImpx85l4cd/3zOzz+OnxZDTbCWUcSKu7x8ZnoVQR7H2VzyINwsyOX+mR+3L iUL2D8xl1oigKHWFcZxV/9Y/M8ClCPs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-495-0ZU9Yv9pPY-NZ_7LkAyTLg-1; Wed, 21 Jul 2021 09:46:05 -0400 X-MC-Unique: 0ZU9Yv9pPY-NZ_7LkAyTLg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DFEBC93920; Wed, 21 Jul 2021 13:46:02 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id A515560C59; Wed, 21 Jul 2021 13:45:58 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [RFC PATCH 05/12] netfs: Add a netfs inode context From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:45:52 +0100 Message-ID: <162687515266.276387.1299416976214634692.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add a netfs_i_context struct that should be included in the network filesystem's own inode struct wrapper, directly after the VFS's inode struct, e.g.: struct my_inode { struct { struct inode vfs_inode; struct netfs_i_context netfs_ctx; }; }; The netfs_i_context struct contains two fields for the network filesystem to use: struct netfs_i_context { ... struct fscache_cookie *cache; unsigned long flags; #define NETFS_ICTX_NEW_CONTENT 0 }; There's a pointer to the cache cookie and a flag to indicate that the content in the file is locally generated and entirely new (ie. the file was just created locally or was truncated to nothing). Two functions are provided to help with this: (1) void netfs_i_context_init(struct inode *inode, const struct netfs_request_ops *ops); Initialise the netfs context and set the operations. (2) struct netfs_i_context *netfs_i_context(struct inode *inode); Find the netfs context from the inode struct. 
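As a rough usage sketch (not taken from this patch; the "myfs" names and ops table are hypothetical, and the usual <linux/fs.h>/<linux/netfs.h> includes are assumed), a network filesystem embeds the context directly after the VFS inode and attaches its netfs_request_ops table when it sets up a new inode:

	/* Hypothetical example filesystem ("myfs"); illustration only. */
	struct myfs_inode {
		struct {
			struct inode vfs_inode;			/* VFS inode, must come first */
			struct netfs_i_context netfs_ctx;	/* netfs context, directly after it */
		};
		/* ... filesystem-private fields ... */
	};

	static struct inode *myfs_iget(struct super_block *sb, unsigned long ino)
	{
		struct inode *inode = iget_locked(sb, ino);

		if (inode && (inode->i_state & I_NEW)) {
			/* Attach the ops table; netfs_readpage(), netfs_readahead()
			 * and netfs_write_begin() then find it through
			 * netfs_i_context() rather than taking it as an argument.
			 */
			netfs_i_context_init(inode, &myfs_req_ops);
			/* ... fill in the rest of the inode from server metadata ... */
			unlock_new_inode(inode);
		}
		return inode;
	}

The equivalent wiring for AFS is done by the netfs_i_context_init() calls added to afs_iget() and afs_root_iget() below.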
Signed-off-by: David Howells --- fs/afs/callback.c | 2 - fs/afs/dir.c | 2 - fs/afs/dynroot.c | 1 fs/afs/file.c | 29 ++--------- fs/afs/inode.c | 10 ++-- fs/afs/internal.h | 13 ++--- fs/afs/super.c | 2 - fs/afs/write.c | 7 +-- fs/ceph/addr.c | 2 - fs/netfs/internal.h | 11 ++++ fs/netfs/read_helper.c | 124 ++++++++++++++++++++++-------------------------- fs/netfs/stats.c | 1 include/linux/netfs.h | 66 +++++++++++++++++++++----- 13 files changed, 146 insertions(+), 124 deletions(-) diff --git a/fs/afs/callback.c b/fs/afs/callback.c index 7d9b23d981bf..0d4b9678ad22 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -41,7 +41,7 @@ void __afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reas { _enter(""); - clear_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + clear_bit(NETFS_ICTX_NEW_CONTENT, &netfs_i_context(&vnode->vfs_inode)->flags); if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) { vnode->cb_break++; afs_clear_permits(vnode); diff --git a/fs/afs/dir.c b/fs/afs/dir.c index ac829e63c570..a4c9cd6de622 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1350,7 +1350,7 @@ static void afs_vnode_new_inode(struct afs_operation *op) } vnode = AFS_FS_I(inode); - set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + set_bit(NETFS_ICTX_NEW_CONTENT, &netfs_i_context(&vnode->vfs_inode)->flags); if (!op->error) afs_cache_permit(vnode, op->key, vnode->cb_break, &vp->scb); d_instantiate(op->dentry, inode); diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c index db832cc931c8..f120bcb8bf73 100644 --- a/fs/afs/dynroot.c +++ b/fs/afs/dynroot.c @@ -76,6 +76,7 @@ struct inode *afs_iget_pseudo_dir(struct super_block *sb, bool root) /* there shouldn't be an existing inode */ BUG_ON(!(inode->i_state & I_NEW)); + netfs_i_context_init(inode, NULL); inode->i_size = 0; inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO; if (root) { diff --git a/fs/afs/file.c b/fs/afs/file.c index 82e945dbe379..1861e4ecc2ce 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -18,13 +18,11 @@ #include "internal.h" static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); -static int afs_readpage(struct file *file, struct page *page); static int afs_symlink_readpage(struct file *file, struct page *page); static void afs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); static int afs_releasepage(struct page *page, gfp_t gfp_flags); -static void afs_readahead(struct readahead_control *ractl); static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); const struct file_operations afs_file_operations = { @@ -48,8 +46,8 @@ const struct inode_operations afs_file_inode_operations = { }; const struct address_space_operations afs_file_aops = { - .readpage = afs_readpage, - .readahead = afs_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .set_page_dirty = afs_set_page_dirty, .launder_page = afs_launder_page, .releasepage = afs_releasepage, @@ -153,7 +151,8 @@ int afs_open(struct inode *inode, struct file *file) } if (file->f_flags & O_TRUNC) - set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + set_bit(NETFS_ICTX_NEW_CONTENT, + &netfs_i_context(&vnode->vfs_inode)->flags); fscache_use_cookie(afs_vnode_cache(vnode), file->f_mode & FMODE_WRITE); @@ -351,13 +350,6 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file) rreq->netfs_priv = key_get(afs_file_key(file)); } -static bool afs_is_cache_enabled(struct inode *inode) -{ - struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode)); - - return 
fscache_cookie_enabled(cookie) && cookie->cache_priv; -} - static int afs_begin_cache_operation(struct netfs_read_request *rreq) { struct afs_vnode *vnode = AFS_FS_I(rreq->inode); @@ -378,25 +370,14 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) key_put(netfs_priv); } -const struct netfs_read_request_ops afs_req_ops = { +const struct netfs_request_ops afs_req_ops = { .init_rreq = afs_init_rreq, - .is_cache_enabled = afs_is_cache_enabled, .begin_cache_operation = afs_begin_cache_operation, .check_write_begin = afs_check_write_begin, .issue_op = afs_req_issue_op, .cleanup = afs_priv_cleanup, }; -static int afs_readpage(struct file *file, struct page *page) -{ - return netfs_readpage(file, page, &afs_req_ops, NULL); -} - -static void afs_readahead(struct readahead_control *ractl) -{ - netfs_readahead(ractl, &afs_req_ops, NULL); -} - int afs_write_inode(struct inode *inode, struct writeback_control *wbc) { fscache_unpin_writeback(wbc, afs_vnode_cache(AFS_FS_I(inode))); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index cf7b66957c6f..3e9e388245a1 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -430,7 +430,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) struct afs_vnode_cache_aux aux; if (vnode->status.type != AFS_FTYPE_FILE) { - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; return; } @@ -440,7 +440,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) key.vnode_id_ext[1] = htonl(vnode->fid.vnode_hi); afs_set_cache_aux(vnode, &aux); - vnode->cache = fscache_acquire_cookie( + vnode->netfs_ctx.cache = fscache_acquire_cookie( vnode->volume->cache, vnode->status.type == AFS_FTYPE_FILE ? 0 : FSCACHE_ADV_SINGLE_CHUNK, &key, sizeof(key), @@ -479,6 +479,7 @@ struct inode *afs_iget(struct afs_operation *op, struct afs_vnode_param *vp) return inode; } + netfs_i_context_init(inode, &afs_req_ops); ret = afs_inode_init_from_status(op, vp, vnode); if (ret < 0) goto bad_inode; @@ -535,6 +536,7 @@ struct inode *afs_root_iget(struct super_block *sb, struct key *key) _debug("GOT ROOT INODE %p { vl=%llx }", inode, as->volume->vid); BUG_ON(!(inode->i_state & I_NEW)); + netfs_i_context_init(inode, &afs_req_ops); vnode = AFS_FS_I(inode); vnode->cb_v_break = as->volume->cb_v_break, @@ -803,9 +805,9 @@ void afs_evict_inode(struct inode *inode) } #ifdef CONFIG_AFS_FSCACHE - fscache_relinquish_cookie(vnode->cache, + fscache_relinquish_cookie(vnode->netfs_ctx.cache, test_bit(AFS_VNODE_DELETED, &vnode->flags)); - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; #endif afs_prune_wb_keys(vnode); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index ccdde00ada8a..e0204dde4b50 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -615,15 +615,15 @@ enum afs_lock_state { * leak from one inode to another. 
*/ struct afs_vnode { - struct inode vfs_inode; /* the VFS's inode record */ + struct { + struct inode vfs_inode; /* the VFS's inode record */ + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; struct afs_volume *volume; /* volume on which vnode resides */ struct afs_fid fid; /* the file identifier for this inode */ struct afs_file_status status; /* AFS status info for this file */ afs_dataversion_t invalid_before; /* Child dentries are invalid before this */ -#ifdef CONFIG_AFS_FSCACHE - struct fscache_cookie *cache; /* caching cookie */ -#endif struct afs_permits __rcu *permit_cache; /* cache of permits so far obtained */ struct mutex io_lock; /* Lock for serialising I/O on this mutex */ struct rw_semaphore validate_lock; /* lock for validating this vnode */ @@ -640,7 +640,6 @@ struct afs_vnode { #define AFS_VNODE_MOUNTPOINT 5 /* set if vnode is a mountpoint symlink */ #define AFS_VNODE_AUTOCELL 6 /* set if Vnode is an auto mount point */ #define AFS_VNODE_PSEUDODIR 7 /* set if Vnode is a pseudo directory */ -#define AFS_VNODE_NEW_CONTENT 8 /* Set if file has new content (create/trunc-0) */ #define AFS_VNODE_SILLY_DELETED 9 /* Set if file has been silly-deleted */ #define AFS_VNODE_MODIFYING 10 /* Set if we're performing a modification op */ @@ -666,7 +665,7 @@ struct afs_vnode { static inline struct fscache_cookie *afs_vnode_cache(struct afs_vnode *vnode) { #ifdef CONFIG_AFS_FSCACHE - return vnode->cache; + return vnode->netfs_ctx.cache; #else return NULL; #endif @@ -1054,7 +1053,7 @@ extern const struct address_space_operations afs_file_aops; extern const struct address_space_operations afs_symlink_aops; extern const struct inode_operations afs_file_inode_operations; extern const struct file_operations afs_file_operations; -extern const struct netfs_read_request_ops afs_req_ops; +extern const struct netfs_request_ops afs_req_ops; extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *); extern void afs_put_wb_key(struct afs_wb_key *); diff --git a/fs/afs/super.c b/fs/afs/super.c index 85e52c78f44f..29c1178beb72 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -692,7 +692,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb) vnode->lock_key = NULL; vnode->permit_cache = NULL; #ifdef CONFIG_AFS_FSCACHE - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; #endif vnode->flags = 1 << AFS_VNODE_UNSET; diff --git a/fs/afs/write.c b/fs/afs/write.c index 3be3a594124c..a244187f3503 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -49,8 +49,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, * file. We need to do this before we get a lock on the page in case * there's more than one writer competing for the same cache block. */ - ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata, - &afs_req_ops, NULL); + ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata); if (ret < 0) return ret; @@ -76,7 +75,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, * spaces to be merged into writes. If it's not, only write * back what the user gives us. 
*/ - if (!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags) && + if (!test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags) && (to < f || from > t)) goto flush_conflicting_write; } @@ -557,7 +556,7 @@ static ssize_t afs_write_back_from_locked_page(struct address_space *mapping, unsigned long priv; unsigned int offset, to, len, max_len; loff_t i_size = i_size_read(&vnode->vfs_inode); - bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + bool new_content = test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags); long count = wbc->nr_to_write; int ret; diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index a1e2813731d1..a8a41254e691 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -305,7 +305,7 @@ static void ceph_readahead_cleanup(struct address_space *mapping, void *priv) ceph_put_cap_refs(ci, got); } -static const struct netfs_read_request_ops ceph_netfs_read_ops = { +static const struct netfs_request_ops ceph_netfs_read_ops = { .init_rreq = ceph_init_rreq, .is_cache_enabled = ceph_is_cache_enabled, .begin_cache_operation = ceph_begin_cache_operation, diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index b7f2c4459f33..4805d9fc8808 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -5,6 +5,10 @@ * Written by David Howells (dhowells@redhat.com) */ +#include +#include +#include + #ifdef pr_fmt #undef pr_fmt #endif @@ -50,6 +54,13 @@ static inline void netfs_stat_d(atomic_t *stat) atomic_dec(stat); } +static inline bool netfs_is_cache_enabled(struct inode *inode) +{ + struct fscache_cookie *cookie = netfs_i_cookie(inode); + + return fscache_cookie_enabled(cookie) && cookie->cache_priv; +} + #else #define netfs_stat(x) do {} while(0) #define netfs_stat_d(x) do {} while(0) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index b03bc5b0da5a..aa98ecf6df6b 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -14,7 +14,6 @@ #include #include #include -#include #include "internal.h" #define CREATE_TRACE_POINTS #include @@ -38,26 +37,27 @@ static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, __netfs_put_subrequest(subreq, was_async); } -static struct netfs_read_request *netfs_alloc_read_request( - const struct netfs_read_request_ops *ops, void *netfs_priv, - struct file *file) +static struct netfs_read_request *netfs_alloc_read_request(struct address_space *mapping, + struct file *file) { static atomic_t debug_ids; + struct inode *inode = file ? 
file_inode(file) : mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); struct netfs_read_request *rreq; rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); if (rreq) { - rreq->netfs_ops = ops; - rreq->netfs_priv = netfs_priv; - rreq->inode = file_inode(file); - rreq->i_size = i_size_read(rreq->inode); + rreq->mapping = mapping; + rreq->inode = inode; + rreq->netfs_ops = ctx->ops; + rreq->i_size = i_size_read(inode); rreq->debug_id = atomic_inc_return(&debug_ids); xa_init(&rreq->buffer); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); - ops->init_rreq(rreq, file); + ctx->ops->init_rreq(rreq, file); netfs_stat(&netfs_n_rh_rreq); } @@ -971,8 +971,6 @@ static int netfs_rreq_set_up_buffer(struct netfs_read_request *rreq, /** * netfs_readahead - Helper to manage a read request * @ractl: The description of the readahead request - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Fulfil a readahead request by drawing data from the cache if possible, or * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O @@ -980,34 +978,31 @@ static int netfs_rreq_set_up_buffer(struct netfs_read_request *rreq, * readahead window can be expanded in either direction to a more convenient * alighment for RPC efficiency or to make storage in the cache feasible. * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. * * This is usable whether or not caching is enabled. */ -void netfs_readahead(struct readahead_control *ractl, - const struct netfs_read_request_ops *ops, - void *netfs_priv) +void netfs_readahead(struct readahead_control *ractl) { struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(ractl->mapping->host); unsigned int debug_index = 0; int ret; _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); if (readahead_count(ractl) == 0) - goto cleanup; + return; - rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); + rreq = netfs_alloc_read_request(ractl->mapping, ractl->file); if (!rreq) - goto cleanup; - rreq->mapping = ractl->mapping; + return; rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) goto cleanup_free; } @@ -1039,10 +1034,6 @@ void netfs_readahead(struct readahead_control *ractl, cleanup_free: netfs_put_read_request(rreq, false); return; -cleanup: - if (netfs_priv) - ops->cleanup(ractl->mapping, netfs_priv); - return; } EXPORT_SYMBOL(netfs_readahead); @@ -1050,43 +1041,34 @@ EXPORT_SYMBOL(netfs_readahead); * netfs_readpage - Helper to manage a readpage request * @file: The file to read from * @page: The page to read - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Fulfil a readpage request by drawing data from the cache if possible, or the * netfs if not. Space beyond the EOF is zero-filled. 
Multiple I/O requests * from different sources will get munged together. * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. * * This is usable whether or not caching is enabled. */ -int netfs_readpage(struct file *file, - struct page *page, - const struct netfs_read_request_ops *ops, - void *netfs_priv) +int netfs_readpage(struct file *file, struct page *page) { + struct address_space *mapping = page_file_mapping(page); struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); unsigned int debug_index = 0; int ret; _enter("%lx", page_index(page)); - rreq = netfs_alloc_read_request(ops, netfs_priv, file); - if (!rreq) { - if (netfs_priv) - ops->cleanup(netfs_priv, page_file_mapping(page)); - unlock_page(page); - return -ENOMEM; - } - rreq->mapping = page_file_mapping(page); + rreq = netfs_alloc_read_request(mapping, file); + if (!rreq) + goto nomem; rreq->start = page_file_offset(page); rreq->len = thp_size(page); - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { unlock_page(page); goto out; @@ -1128,6 +1110,9 @@ int netfs_readpage(struct file *file, out: netfs_put_read_request(rreq, false); return ret; +nomem: + unlock_page(page); + return -ENOMEM; } EXPORT_SYMBOL(netfs_readpage); @@ -1136,6 +1121,7 @@ EXPORT_SYMBOL(netfs_readpage); * @page: page being prepared * @pos: starting position for the write * @len: length of write + * @always_fill: T if the page should always be completely filled/cleared * * In some cases, write_begin doesn't need to read at all: * - full page write @@ -1145,14 +1131,24 @@ EXPORT_SYMBOL(netfs_readpage); * If any of these criteria are met, then zero out the unwritten parts * of the page and return true. Otherwise, return false. 
*/ -static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) +static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len, + bool always_fill) { struct inode *inode = page->mapping->host; loff_t i_size = i_size_read(inode); size_t offset = offset_in_thp(page, pos); + size_t plen = thp_size(page); + + if (unlikely(always_fill)) { + if (pos - offset + len <= i_size) + return false; /* Page entirely before EOF */ + zero_user_segment(page, 0, plen); + SetPageUptodate(page); + return true; + } /* Full page write */ - if (offset == 0 && len >= thp_size(page)) + if (offset == 0 && len >= plen) return true; /* pos beyond last page in the file */ @@ -1165,7 +1161,7 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) return false; zero_out: - zero_user_segments(page, 0, offset, offset + len, thp_size(page)); + zero_user_segments(page, 0, offset, offset + len, plen); return true; } @@ -1178,8 +1174,6 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) * @flags: AOP_* flags * @_page: Where to put the resultant page * @_fsdata: Place for the netfs to store a cookie - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Pre-read data for a write-begin request by drawing data from the cache if * possible, or the netfs if not. Space beyond the EOF is zero-filled. @@ -1198,17 +1192,19 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) * should go ahead; unlock the page and return -EAGAIN to cause the page to be * regot; or return an error. * + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. + * * This is usable whether or not caching is enabled. */ int netfs_write_begin(struct file *file, struct address_space *mapping, loff_t pos, unsigned int len, unsigned int flags, - struct page **_page, void **_fsdata, - const struct netfs_read_request_ops *ops, - void *netfs_priv) + struct page **_page, void **_fsdata) { struct netfs_read_request *rreq; struct page *page, *xpage; struct inode *inode = file_inode(file); + struct netfs_i_context *ctx = netfs_i_context(inode); unsigned int debug_index = 0; pgoff_t index = pos >> PAGE_SHIFT; int ret; @@ -1220,9 +1216,9 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, if (!page) return -ENOMEM; - if (ops->check_write_begin) { + if (ctx->ops->check_write_begin) { /* Allow the netfs (eg. ceph) to flush conflicts. */ - ret = ops->check_write_begin(file, pos, len, page, _fsdata); + ret = ctx->ops->check_write_begin(file, pos, len, page, _fsdata); if (ret < 0) { trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin); if (ret == -EAGAIN) @@ -1238,25 +1234,23 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, * within the cache granule containing the EOF, in which case we need * to preload the granule. 
*/ - if (!ops->is_cache_enabled(inode) && - netfs_skip_page_read(page, pos, len)) { + if (!netfs_is_cache_enabled(inode) && + netfs_skip_page_read(page, pos, len, false)) { netfs_stat(&netfs_n_rh_write_zskip); goto have_page_no_wait; } ret = -ENOMEM; - rreq = netfs_alloc_read_request(ops, netfs_priv, file); + rreq = netfs_alloc_read_request(mapping, file); if (!rreq) goto error; - rreq->mapping = page->mapping; rreq->start = page_offset(page); rreq->len = thp_size(page); rreq->no_unlock_page = page->index; __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); - netfs_priv = NULL; - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) goto error_put; } @@ -1314,8 +1308,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, if (ret < 0) goto error; have_page_no_wait: - if (netfs_priv) - ops->cleanup(netfs_priv, mapping); *_page = page; _leave(" = 0"); return 0; @@ -1325,8 +1317,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, error: unlock_page(page); put_page(page); - if (netfs_priv) - ops->cleanup(netfs_priv, mapping); _leave(" = %d", ret); return ret; } diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 9ae538c85378..5510a7a14a40 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -7,7 +7,6 @@ #include #include -#include #include "internal.h" atomic_t netfs_n_rh_readahead; diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 815001fe7a76..35bcd916c3a0 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -157,14 +157,25 @@ struct netfs_read_request { #define NETFS_RREQ_DONT_UNLOCK_PAGES 3 /* Don't unlock the pages on completion */ #define NETFS_RREQ_FAILED 4 /* The request failed */ #define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ - const struct netfs_read_request_ops *netfs_ops; + const struct netfs_request_ops *netfs_ops; +}; + +/* + * Per-inode description. This must be directly after the inode struct. + */ +struct netfs_i_context { + const struct netfs_request_ops *ops; +#ifdef CONFIG_FSCACHE + struct fscache_cookie *cache; +#endif + unsigned long flags; +#define NETFS_ICTX_NEW_CONTENT 0 /* Set if file has new content (create/trunc-0) */ }; /* * Operations the network filesystem can/must provide to the helpers. 
*/ -struct netfs_read_request_ops { - bool (*is_cache_enabled)(struct inode *inode); +struct netfs_request_ops { void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); @@ -218,20 +229,49 @@ struct netfs_cache_ops { }; struct readahead_control; -extern void netfs_readahead(struct readahead_control *, - const struct netfs_read_request_ops *, - void *); -extern int netfs_readpage(struct file *, - struct page *, - const struct netfs_read_request_ops *, - void *); +extern void netfs_readahead(struct readahead_control *); +extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, - void **, - const struct netfs_read_request_ops *, - void *); + void **); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); +/** + * netfs_i_context - Get the netfs inode context from the inode + * @inode: The inode to query + * + * This function gets the netfs lib inode context from the network filesystem's + * inode. It expects it to follow on directly from the VFS inode struct. + */ +static inline struct netfs_i_context *netfs_i_context(struct inode *inode) +{ + return (struct netfs_i_context *)(inode + 1); +} + +static inline void netfs_i_context_init(struct inode *inode, + const struct netfs_request_ops *ops) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + ctx->ops = ops; +} + +/** + * netfs_i_cookie - Get the cache cookie from the inode + * @inode: The inode to query + * + * Get the caching cookie (if enabled) from the network filesystem's inode. 
+ */ +static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) +{ +#ifdef CONFIG_FSCACHE + struct netfs_i_context *ctx = netfs_i_context(inode); + return ctx->cache; +#else + return NULL; +#endif +} + #endif /* _LINUX_NETFS_H */ From patchwork Wed Jul 21 13:46:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483385 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67B6EC63793 for ; Wed, 21 Jul 2021 13:47:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 44C8361241 for ; Wed, 21 Jul 2021 13:47:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238667AbhGUNGf (ORCPT ); Wed, 21 Jul 2021 09:06:35 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:28430 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238756AbhGUNFl (ORCPT ); Wed, 21 Jul 2021 09:05:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875177; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i721gV0aIu0rvLexN0pBqMbwBbkKzEhWVB3KEf7BlKk=; b=R/nCxhts+XWfzbH+GT3/TPTfIAKGJJfsISHiyOy4ho4Qb0Mo9wtc3cg8wdleGa5DpbO2KR SQL56LXj1wf0fnIDcp4a6GUGA9ootBWcvrx4PRlPpIEWZkPzjbxNoq/v/Nq5I2fKwp6Tj/ SHRPhZkkmp2INJkpWGN8SqeJfkf5p6U= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-199-KrkAGfI-N4-Yedd1kx3LBQ-1; Wed, 21 Jul 2021 09:46:15 -0400 X-MC-Unique: KrkAGfI-N4-Yedd1kx3LBQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B59C93920; Wed, 21 Jul 2021 13:46:13 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id E905A6EF4F; Wed, 21 Jul 2021 13:46:08 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [RFC PATCH 06/12] netfs: Keep lists of pending, active, dirty and flushed regions From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:08 +0100 Message-ID: <162687516812.276387.504081062999158040.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org This looks nice, in theory, and has the following features: (*) Things are managed with write records. (-) A WRITE is a region defined by an outer bounding box that spans the pages that are involved and an inner region that contains the actual modifications. (-) The bounding box must encompass all the data that will be necessary to perform a write operation to the server (for example, if we want to encrypt with a 64K block size when we have 4K pages). (*) There are four list of write records: (-) The PENDING LIST holds writes that are blocked by another active write. This list is in order of submission to avoid starvation and may overlap. (-) The ACTIVE LIST holds writes that have been granted exclusive access to a patch. This is in order of starting position and regions held therein may not overlap. (-) The DIRTY LIST holds a list of regions that have been modified. This is also in order of starting position and regions may not overlap, though they can be merged. (-) The FLUSH LIST holds a list of regions that require writing. This is in order of grouping. (*) An active region acts as an exclusion zone on part of the range, allowing the inode sem to be dropped once the region is on a list. (-) A DIO write creates its own exclusive region that must not overlap with any other dirty region. (-) An active write may overlap one or more dirty regions. (-) A dirty region may be overlapped by one or more writes. (-) If an active write overlaps with an incompatible dirty region, that region gets flushed, the active write has to wait for it to complete. (*) When an active write completes, the region is inserted or merged into the dirty list. (-) Merging can only happen between compatible regions. (-) Contiguous dirty regions can be merged. (-) If an inode has all new content, generated locally, dirty regions that have contiguous/ovelapping bounding boxes can be merged, bridging any gaps with zeros. (-) O_DSYNC causes the region to be flushed immediately. (*) There's a queue of groups of regions and those regions must be flushed in order. (-) If a region in a group needs flushing, then all prior groups must be flushed first. TRICKY BITS =========== (*) The active and dirty lists are O(n) search time. An interval tree might be a better option. (*) Having four list_heads is a lot of memory per inode. (*) Activating pending writes. (-) The pending list can contain a bunch of writes that can overlap. 
(-) When an active write completes, it is removed from the active queue and usually added to the dirty queue (except DIO, DSYNC). This makes a hole. (-) One or more pending writes can then be moved over, but care has to be taken not to misorder them to avoid starvation. (-) When a pending write is added to the active list, it may require part of the dirty list to be flushed. (*) A write that has been put onto the active queue may have to wait for flushing to complete. (*) How should an active write interact with a dirty region? (-) A dirty region may get flushed even whilst it is being modified on the assumption that the active write record will get added to the dirty list and cause a follow up write to the server. (*) RAM pinning. (-) An active write could pin a lot of pages, thereby causing a large write to run the system out of RAM. (-) Allow active writes to start being flushed whilst still being modified. (-) Use a scheduler hook to decant the modified portion into the dirty list when the modifying task is switched away from? (*) Bounding box and variably-sized pages/folios. (-) The bounding box needs to be rounded out to the page boundaries so that DIO writes can claim exclusivity on a series of pages so that they can be invalidated. (-) Allocation of higher-order folios could be limited in scope so that they don't escape the requested bounding box. (-) Bounding boxes could be enlarged to allow for larger folios. (-) Overlarge bounding boxes can be shrunk later, possibly on merging into the dirty list. (-) Ordinary writes can have overlapping bounding boxes, even if they're otherwise incompatible. --- fs/afs/file.c | 30 + fs/afs/internal.h | 7 fs/afs/write.c | 166 -------- fs/netfs/Makefile | 8 fs/netfs/dio_helper.c | 140 ++++++ fs/netfs/internal.h | 32 + fs/netfs/objects.c | 113 +++++ fs/netfs/read_helper.c | 94 ++++ fs/netfs/stats.c | 5 fs/netfs/write_helper.c | 908 ++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 98 +++++ include/trace/events/netfs.h | 180 ++++++++ 12 files changed, 1604 insertions(+), 177 deletions(-) create mode 100644 fs/netfs/dio_helper.c create mode 100644 fs/netfs/objects.c create mode 100644 fs/netfs/write_helper.c diff --git a/fs/afs/file.c b/fs/afs/file.c index 1861e4ecc2ce..8400cdf086b6 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -30,7 +30,7 @@ const struct file_operations afs_file_operations = { .release = afs_release, .llseek = generic_file_llseek, .read_iter = generic_file_read_iter, - .write_iter = afs_file_write, + .write_iter = netfs_file_write_iter, .mmap = afs_file_mmap, .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, @@ -53,8 +53,6 @@ const struct address_space_operations afs_file_aops = { .releasepage = afs_releasepage, .invalidatepage = afs_invalidatepage, .direct_IO = afs_direct_IO, - .write_begin = afs_write_begin, - .write_end = afs_write_end, .writepage = afs_writepage, .writepages = afs_writepages, }; @@ -370,12 +368,38 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) key_put(netfs_priv); } +static void afs_init_dirty_region(struct netfs_dirty_region *region, struct file *file) +{ + region->netfs_priv = key_get(afs_file_key(file)); +} + +static void afs_free_dirty_region(struct netfs_dirty_region *region) +{ + key_put(region->netfs_priv); +} + +static void afs_update_i_size(struct file *file, loff_t new_i_size) +{ + struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); + loff_t i_size; + + write_seqlock(&vnode->cb_lock); + i_size = 
i_size_read(&vnode->vfs_inode); + if (new_i_size > i_size) + i_size_write(&vnode->vfs_inode, new_i_size); + write_sequnlock(&vnode->cb_lock); + fscache_update_cookie(afs_vnode_cache(vnode), NULL, &new_i_size); +} + const struct netfs_request_ops afs_req_ops = { .init_rreq = afs_init_rreq, .begin_cache_operation = afs_begin_cache_operation, .check_write_begin = afs_check_write_begin, .issue_op = afs_req_issue_op, .cleanup = afs_priv_cleanup, + .init_dirty_region = afs_init_dirty_region, + .free_dirty_region = afs_free_dirty_region, + .update_i_size = afs_update_i_size, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) diff --git a/fs/afs/internal.h b/fs/afs/internal.h index e0204dde4b50..0d01ed2fe8fa 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1511,15 +1511,8 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *); * write.c */ extern int afs_set_page_dirty(struct page *); -extern int afs_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, - struct page **pagep, void **fsdata); -extern int afs_write_end(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); extern int afs_writepage(struct page *, struct writeback_control *); extern int afs_writepages(struct address_space *, struct writeback_control *); -extern ssize_t afs_file_write(struct kiocb *, struct iov_iter *); extern int afs_fsync(struct file *, loff_t, loff_t, int); extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf); extern void afs_prune_wb_keys(struct afs_vnode *); diff --git a/fs/afs/write.c b/fs/afs/write.c index a244187f3503..e6e2e924c8ae 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -27,152 +27,6 @@ int afs_set_page_dirty(struct page *page) return fscache_set_page_dirty(page, afs_vnode_cache(AFS_FS_I(page->mapping->host))); } -/* - * Prepare to perform part of a write to a page. Note that len may extend - * beyond the end of the page. - */ -int afs_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, - struct page **_page, void **fsdata) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct page *page; - unsigned long priv; - unsigned f, from; - unsigned t, to; - int ret; - - _enter("{%llx:%llu},%llx,%x", - vnode->fid.vid, vnode->fid.vnode, pos, len); - - /* Prefetch area to be written into the cache if we're caching this - * file. We need to do this before we get a lock on the page in case - * there's more than one writer competing for the same cache block. - */ - ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata); - if (ret < 0) - return ret; - - from = offset_in_thp(page, pos); - len = min_t(size_t, len, thp_size(page) - from); - to = from + len; - -try_again: - /* See if this page is already partially written in a way that we can - * merge the new write with. - */ - if (PagePrivate(page)) { - priv = page_private(page); - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - ASSERTCMP(f, <=, t); - - if (PageWriteback(page)) { - trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page); - goto flush_conflicting_write; - } - /* If the file is being filled locally, allow inter-write - * spaces to be merged into writes. If it's not, only write - * back what the user gives us. 
- */ - if (!test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags) && - (to < f || from > t)) - goto flush_conflicting_write; - } - - *_page = find_subpage(page, pos / PAGE_SIZE); - _leave(" = 0"); - return 0; - - /* The previous write and this write aren't adjacent or overlapping, so - * flush the page out. - */ -flush_conflicting_write: - _debug("flush conflict"); - ret = write_one_page(page); - if (ret < 0) - goto error; - - ret = lock_page_killable(page); - if (ret < 0) - goto error; - goto try_again; - -error: - put_page(page); - _leave(" = %d", ret); - return ret; -} - -/* - * Finalise part of a write to a page. Note that len may extend beyond the end - * of the page. - */ -int afs_write_end(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *subpage, void *fsdata) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct page *page = thp_head(subpage); - unsigned long priv; - unsigned int f, from = offset_in_thp(page, pos); - unsigned int t, to = from + copied; - loff_t i_size, write_end_pos; - - _enter("{%llx:%llu},{%lx}", - vnode->fid.vid, vnode->fid.vnode, page->index); - - len = min_t(size_t, len, thp_size(page) - from); - if (!PageUptodate(page)) { - if (copied < len) { - copied = 0; - goto out; - } - - SetPageUptodate(page); - } - - if (copied == 0) - goto out; - - write_end_pos = pos + copied; - - i_size = i_size_read(&vnode->vfs_inode); - if (write_end_pos > i_size) { - write_seqlock(&vnode->cb_lock); - i_size = i_size_read(&vnode->vfs_inode); - if (write_end_pos > i_size) - i_size_write(&vnode->vfs_inode, write_end_pos); - write_sequnlock(&vnode->cb_lock); - fscache_update_cookie(afs_vnode_cache(vnode), NULL, &write_end_pos); - } - - if (PagePrivate(page)) { - priv = page_private(page); - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - if (from < f) - f = from; - if (to > t) - t = to; - priv = afs_page_dirty(page, f, t); - set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page); - } else { - priv = afs_page_dirty(page, from, to); - attach_page_private(page, (void *)priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page); - } - - if (set_page_dirty(page)) - _debug("dirtied %lx", page->index); - -out: - unlock_page(page); - put_page(page); - return copied; -} - /* * kill all the pages in the given range */ @@ -812,26 +666,6 @@ int afs_writepages(struct address_space *mapping, return ret; } -/* - * write to an AFS file - */ -ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp)); - size_t count = iov_iter_count(from); - - _enter("{%llx:%llu},{%zu},", - vnode->fid.vid, vnode->fid.vnode, count); - - if (IS_SWAPFILE(&vnode->vfs_inode)) { - printk(KERN_INFO - "AFS: Attempt to write to active swap file!\n"); - return -EBUSY; - } - - return generic_file_write_iter(iocb, from); -} - /* * flush any dirty pages for this process, and check for write errors. 
* - the return status from this call provides a reliable indication of diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index c15bfc966d96..3e11453ad2c5 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,5 +1,11 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := read_helper.o stats.o +netfs-y := \ + objects.o \ + read_helper.o \ + write_helper.o +# dio_helper.o + +netfs-$(CONFIG_NETFS_STATS) += stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/dio_helper.c b/fs/netfs/dio_helper.c new file mode 100644 index 000000000000..3072de344601 --- /dev/null +++ b/fs/netfs/dio_helper.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level DIO support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" +#include + +/* + * Perform a direct I/O write to a netfs server. + */ +ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct address_space *mapping = file->f_mapping; + struct inode *inode = mapping->host; + loff_t pos = iocb->ki_pos, last; + ssize_t written; + size_t write_len; + pgoff_t end; + int ret; + + write_len = iov_iter_count(from); + last = pos + write_len - 1; + end = to >> PAGE_SHIFT; + + if (iocb->ki_flags & IOCB_NOWAIT) { + /* If there are pages to writeback, return */ + if (filemap_range_has_page(file->f_mapping, pos, last)) + return -EAGAIN; + } else { + ret = filemap_write_and_wait_range(mapping, pos, last); + if (ret) + return ret; + } + + /* After a write we want buffered reads to be sure to go to disk to get + * the new data. We invalidate clean cached page from the region we're + * about to write. We do this *before* the write so that we can return + * without clobbering -EIOCBQUEUED from ->direct_IO(). + */ + ret = invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); + if (ret) { + /* If the page can't be invalidated, return 0 to fall back to + * buffered write. + */ + return ret == -EBUSY ? 0 : ret; + } + + written = mapping->a_ops->direct_IO(iocb, from); + + /* Finally, try again to invalidate clean pages which might have been + * cached by non-direct readahead, or faulted in by get_user_pages() + * if the source of the write was an mmap'ed region of the file + * we're writing. Either one is a pretty crazy thing to do, + * so we don't support it 100%. If this invalidation + * fails, tough, the write still worked... + * + * Most of the time we do not need this since dio_complete() will do + * the invalidation for us. However there are some file systems that + * do not end up with dio_complete() being called, so let's not break + * them by removing it completely. + * + * Noticeable example is a blkdev_direct_IO(). + * + * Skip invalidation for async writes or if mapping has no pages. + */ + if (written > 0 && mapping->nrpages && + invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end)) + dio_warn_stale_pagecache(file); + + if (written > 0) { + pos += written; + write_len -= written; + if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) { + i_size_write(inode, pos); + mark_inode_dirty(inode); + } + iocb->ki_pos = pos; + } + if (written != -EIOCBQUEUED) + iov_iter_revert(from, write_len - iov_iter_count(from)); +out: +#if 0 + /* + * If the write stopped short of completing, fall back to + * buffered writes. 
Some filesystems do this for writes to + * holes, for example. For DAX files, a buffered write will + * not succeed (even if it did, DAX does not handle dirty + * page-cache pages correctly). + */ + if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) + goto out; + + status = netfs_perform_write(region, file, from, pos = iocb->ki_pos); + /* + * If generic_perform_write() returned a synchronous error + * then we want to return the number of bytes which were + * direct-written, or the error code if that was zero. Note + * that this differs from normal direct-io semantics, which + * will return -EFOO even if some bytes were written. + */ + if (unlikely(status < 0)) { + err = status; + goto out; + } + /* + * We need to ensure that the page cache pages are written to + * disk and invalidated to preserve the expected O_DIRECT + * semantics. + */ + endbyte = pos + status - 1; + err = filemap_write_and_wait_range(mapping, pos, endbyte); + if (err == 0) { + iocb->ki_pos = endbyte + 1; + written += status; + invalidate_mapping_pages(mapping, + pos >> PAGE_SHIFT, + endbyte >> PAGE_SHIFT); + } else { + /* + * We don't know how much we wrote, so just return + * the number of bytes which were direct-written + */ + } +#endif + return written; +} diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 4805d9fc8808..77ceab694348 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -15,11 +15,41 @@ #define pr_fmt(fmt) "netfs: " fmt +/* + * dio_helper.c + */ +ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *from); + +/* + * objects.c + */ +struct netfs_flush_group *netfs_get_flush_group(struct netfs_flush_group *group); +void netfs_put_flush_group(struct netfs_flush_group *group); +struct netfs_dirty_region *netfs_alloc_dirty_region(void); +struct netfs_dirty_region *netfs_get_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what); +void netfs_free_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region); +void netfs_put_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what); + /* * read_helper.c */ extern unsigned int netfs_debug; +int netfs_prefetch_for_write(struct file *file, struct page *page, loff_t pos, size_t len, + bool always_fill); + +/* + * write_helper.c + */ +void netfs_flush_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_dirty_trace why); + /* * stats.c */ @@ -42,6 +72,8 @@ extern atomic_t netfs_n_rh_write_begin; extern atomic_t netfs_n_rh_write_done; extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; +extern atomic_t netfs_n_wh_region; +extern atomic_t netfs_n_wh_flush_group; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c new file mode 100644 index 000000000000..ba1e052aa352 --- /dev/null +++ b/fs/netfs/objects.c @@ -0,0 +1,113 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Object lifetime handling and tracing. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include "internal.h" + +/** + * netfs_new_flush_group - Create a new write flush group + * @inode: The inode for which this is a flush group. 
+ * @netfs_priv: Netfs private data to include in the new group + * + * Create a new flush group and add it to the tail of the inode's group list. + * Flush groups are used to control the order in which dirty data is written + * back to the server. + * + * The caller must hold ctx->lock. + */ +struct netfs_flush_group *netfs_new_flush_group(struct inode *inode, void *netfs_priv) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + group = kzalloc(sizeof(*group), GFP_KERNEL); + if (group) { + group->netfs_priv = netfs_priv; + INIT_LIST_HEAD(&group->region_list); + refcount_set(&group->ref, 1); + netfs_stat(&netfs_n_wh_flush_group); + list_add_tail(&group->group_link, &ctx->flush_groups); + } + return group; +} +EXPORT_SYMBOL(netfs_new_flush_group); + +struct netfs_flush_group *netfs_get_flush_group(struct netfs_flush_group *group) +{ + refcount_inc(&group->ref); + return group; +} + +void netfs_put_flush_group(struct netfs_flush_group *group) +{ + if (group && refcount_dec_and_test(&group->ref)) { + netfs_stat_d(&netfs_n_wh_flush_group); + kfree(group); + } +} + +struct netfs_dirty_region *netfs_alloc_dirty_region(void) +{ + struct netfs_dirty_region *region; + + region = kzalloc(sizeof(struct netfs_dirty_region), GFP_KERNEL); + if (region) + netfs_stat(&netfs_n_wh_region); + return region; +} + +struct netfs_dirty_region *netfs_get_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what) +{ + int ref; + + __refcount_inc(®ion->ref, &ref); + trace_netfs_ref_region(region->debug_id, ref + 1, what); + return region; +} + +void netfs_free_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + if (region) { + trace_netfs_ref_region(region->debug_id, 0, netfs_region_trace_free); + if (ctx->ops->free_dirty_region) + ctx->ops->free_dirty_region(region); + netfs_put_flush_group(region->group); + netfs_stat_d(&netfs_n_wh_region); + kfree(region); + } +} + +void netfs_put_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what) +{ + bool dead; + int ref; + + if (!region) + return; + dead = __refcount_dec_and_test(®ion->ref, &ref); + trace_netfs_ref_region(region->debug_id, ref - 1, what); + if (dead) { + if (!list_empty(®ion->active_link) || + !list_empty(®ion->dirty_link)) { + spin_lock(&ctx->lock); + list_del_init(®ion->active_link); + list_del_init(®ion->dirty_link); + spin_unlock(&ctx->lock); + } + netfs_free_dirty_region(ctx, region); + } +} diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index aa98ecf6df6b..bfcdbbd32f4c 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -1321,3 +1321,97 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, return ret; } EXPORT_SYMBOL(netfs_write_begin); + +/* + * Preload the data into a page we're proposing to write into. + */ +int netfs_prefetch_for_write(struct file *file, struct page *page, + loff_t pos, size_t len, bool always_fill) +{ + struct address_space *mapping = page_file_mapping(page); + struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + struct page *xpage; + unsigned int debug_index = 0; + int ret; + + DEFINE_READAHEAD(ractl, file, NULL, mapping, page_index(page)); + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. 
+ */ + if (!netfs_is_cache_enabled(mapping->host)) { + if (netfs_skip_page_read(page, pos, len, always_fill)) { + netfs_stat(&netfs_n_rh_write_zskip); + ret = 0; + goto error; + } + } + + ret = -ENOMEM; + rreq = netfs_alloc_read_request(mapping, file); + if (!rreq) + goto error; + rreq->start = page_offset(page); + rreq->len = thp_size(page); + rreq->no_unlock_page = page_file_offset(page); + __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); + + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto error_put; + } + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_prefetch_for_write); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = thp_nr_pages(page); + netfs_rreq_expand(rreq, &ractl); + + /* Set up the output buffer */ + ret = netfs_rreq_set_up_buffer(rreq, &ractl, page, + readahead_index(&ractl), readahead_count(&ractl)); + if (ret < 0) { + while ((xpage = readahead_page(&ractl))) + if (xpage != page) + put_page(xpage); + goto error_put; + } + + netfs_get_read_request(rreq); + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) { + trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_write_begin); + ret = -EIO; + } + +error_put: + netfs_put_read_request(rreq, false); +error: + _leave(" = %d", ret); + return ret; +} diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 5510a7a14a40..7c079ca47b5b 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -27,6 +27,8 @@ atomic_t netfs_n_rh_write_begin; atomic_t netfs_n_rh_write_done; atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; +atomic_t netfs_n_wh_region; +atomic_t netfs_n_wh_flush_group; void netfs_stats_show(struct seq_file *m) { @@ -54,5 +56,8 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_write), atomic_read(&netfs_n_rh_write_done), atomic_read(&netfs_n_rh_write_failed)); + seq_printf(m, "WrHelp : R=%u F=%u\n", + atomic_read(&netfs_n_wh_region), + atomic_read(&netfs_n_wh_flush_group)); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c new file mode 100644 index 000000000000..a8c58eaa84d0 --- /dev/null +++ b/fs/netfs/write_helper.c @@ -0,0 +1,908 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level write support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include "internal.h" + +static atomic_t netfs_region_debug_ids; + +static bool __overlaps(loff_t start1, loff_t end1, loff_t start2, loff_t end2) +{ + return (start1 < start2) ? 
end1 > start2 : end2 > start1; +} + +static bool overlaps(struct netfs_range *a, struct netfs_range *b) +{ + return __overlaps(a->start, a->end, b->start, b->end); +} + +static int wait_on_region(struct netfs_dirty_region *region, + enum netfs_region_state state) +{ + return wait_var_event_interruptible(®ion->state, + READ_ONCE(region->state) >= state); +} + +/* + * Grab a page for writing. We don't lock it at this point as we have yet to + * preemptively trigger a fault-in - but we need to know how large the page + * will be before we try that. + */ +static struct page *netfs_grab_page_for_write(struct address_space *mapping, + loff_t pos, size_t len_remaining) +{ + struct page *page; + int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT; + + page = pagecache_get_page(mapping, pos >> PAGE_SHIFT, fgp_flags, + mapping_gfp_mask(mapping)); + if (!page) + return ERR_PTR(-ENOMEM); + wait_for_stable_page(page); + return page; +} + +/* + * Initialise a new dirty page group. The caller is responsible for setting + * the type and any flags that they want. + */ +static void netfs_init_dirty_region(struct netfs_dirty_region *region, + struct inode *inode, struct file *file, + enum netfs_region_type type, + unsigned long flags, + loff_t start, loff_t end) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + region->state = NETFS_REGION_IS_PENDING; + region->type = type; + region->flags = flags; + region->reserved.start = start; + region->reserved.end = end; + region->dirty.start = start; + region->dirty.end = start; + region->bounds.start = round_down(start, ctx->bsize); + region->bounds.end = round_up(end, ctx->bsize); + region->i_size = i_size_read(inode); + region->debug_id = atomic_inc_return(&netfs_region_debug_ids); + INIT_LIST_HEAD(®ion->active_link); + INIT_LIST_HEAD(®ion->dirty_link); + INIT_LIST_HEAD(®ion->flush_link); + refcount_set(®ion->ref, 1); + spin_lock_init(®ion->lock); + if (file && ctx->ops->init_dirty_region) + ctx->ops->init_dirty_region(region, file); + if (!region->group) { + group = list_last_entry(&ctx->flush_groups, + struct netfs_flush_group, group_link); + region->group = netfs_get_flush_group(group); + list_add_tail(®ion->flush_link, &group->region_list); + } + trace_netfs_ref_region(region->debug_id, 1, netfs_region_trace_new); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_new); +} + +/* + * Queue a region for flushing. Regions may need to be flushed in the right + * order (e.g. ceph snaps) and so we may need to chuck other regions onto the + * flush queue first. + * + * The caller must hold ctx->lock. + */ +void netfs_flush_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_dirty_trace why) +{ + struct netfs_flush_group *group; + + LIST_HEAD(flush_queue); + + kenter("%x", region->debug_id); + + if (test_bit(NETFS_REGION_FLUSH_Q, ®ion->flags) || + region->group->flush) + return; + + trace_netfs_dirty(ctx, region, NULL, why); + + /* If the region isn't in the bottom flush group, we need to flush out + * all of the flush groups below it. 
+ */ + while (!list_is_first(®ion->group->group_link, &ctx->flush_groups)) { + group = list_first_entry(&ctx->flush_groups, + struct netfs_flush_group, group_link); + group->flush = true; + list_del_init(&group->group_link); + list_splice_tail_init(&group->region_list, &ctx->flush_queue); + netfs_put_flush_group(group); + } + + set_bit(NETFS_REGION_FLUSH_Q, ®ion->flags); + list_move_tail(®ion->flush_link, &ctx->flush_queue); +} + +/* + * Decide if/how a write can be merged with a dirty region. + */ +static enum netfs_write_compatibility netfs_write_compatibility( + struct netfs_i_context *ctx, + struct netfs_dirty_region *old, + struct netfs_dirty_region *candidate) +{ + if (old->type == NETFS_REGION_DIO || + old->type == NETFS_REGION_DSYNC || + old->state >= NETFS_REGION_IS_FLUSHING || + /* The bounding boxes of DSYNC writes can overlap with those of + * other DSYNC writes and ordinary writes. + */ + candidate->group != old->group || + old->group->flush) + return NETFS_WRITES_INCOMPATIBLE; + if (!ctx->ops->is_write_compatible) { + if (candidate->type == NETFS_REGION_DSYNC) + return NETFS_WRITES_SUPERSEDE; + return NETFS_WRITES_COMPATIBLE; + } + return ctx->ops->is_write_compatible(ctx, old, candidate); +} + +/* + * Split a dirty region. + */ +static struct netfs_dirty_region *netfs_split_dirty_region( + struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + struct netfs_dirty_region **spare, + unsigned long long pos) +{ + struct netfs_dirty_region *tail = *spare; + + *spare = NULL; + *tail = *region; + region->dirty.end = pos; + tail->dirty.start = pos; + tail->debug_id = atomic_inc_return(&netfs_region_debug_ids); + + refcount_set(&tail->ref, 1); + INIT_LIST_HEAD(&tail->active_link); + netfs_get_flush_group(tail->group); + spin_lock_init(&tail->lock); + // TODO: grab cache resources + + // need to split the bounding box? + __set_bit(NETFS_REGION_SUPERSEDED, &tail->flags); + if (ctx->ops->split_dirty_region) + ctx->ops->split_dirty_region(tail); + list_add(&tail->dirty_link, ®ion->dirty_link); + list_add(&tail->flush_link, ®ion->flush_link); + trace_netfs_dirty(ctx, tail, region, netfs_dirty_trace_split); + return tail; +} + +/* + * Queue a write for access to the pagecache. The caller must hold ctx->lock. + * The NETFS_REGION_PENDING flag will be cleared when it's possible to proceed. + */ +static void netfs_queue_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *candidate) +{ + struct netfs_dirty_region *r; + struct list_head *p; + + /* We must wait for any overlapping pending writes */ + list_for_each_entry(r, &ctx->pending_writes, active_link) { + if (overlaps(&candidate->bounds, &r->bounds)) { + if (overlaps(&candidate->reserved, &r->reserved) || + netfs_write_compatibility(ctx, r, candidate) == + NETFS_WRITES_INCOMPATIBLE) + goto add_to_pending_queue; + } + } + + /* We mustn't let the request overlap with the reservation of any other + * active writes, though it can overlap with a bounding box if the + * writes are compatible. 
+ */ + list_for_each(p, &ctx->active_writes) { + r = list_entry(p, struct netfs_dirty_region, active_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + if (overlaps(&candidate->bounds, &r->bounds)) { + if (overlaps(&candidate->reserved, &r->reserved) || + netfs_write_compatibility(ctx, r, candidate) == + NETFS_WRITES_INCOMPATIBLE) + goto add_to_pending_queue; + } + } + + /* We can install the record in the active list to reserve our slot */ + list_add(&candidate->active_link, p); + + /* Okay, we've reserved our slot in the active queue */ + smp_store_release(&candidate->state, NETFS_REGION_IS_RESERVED); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_reserved); + wake_up_var(&candidate->state); + kleave(" [go]"); + return; + +add_to_pending_queue: + /* We get added to the pending list and then we have to wait */ + list_add(&candidate->active_link, &ctx->pending_writes); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_wait_pend); + kleave(" [wait pend]"); +} + +/* + * Make sure there's a flush group. + */ +static int netfs_require_flush_group(struct inode *inode) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + if (list_empty(&ctx->flush_groups)) { + kdebug("new flush group"); + group = netfs_new_flush_group(inode, NULL); + if (!group) + return -ENOMEM; + } + return 0; +} + +/* + * Create a dirty region record for the write we're about to do and add it to + * the list of regions. We may need to wait for conflicting writes to + * complete. + */ +static struct netfs_dirty_region *netfs_prepare_region(struct inode *inode, + struct file *file, + loff_t start, size_t len, + enum netfs_region_type type, + unsigned long flags) +{ + struct netfs_dirty_region *candidate; + struct netfs_i_context *ctx = netfs_i_context(inode); + loff_t end = start + len; + int ret; + + ret = netfs_require_flush_group(inode); + if (ret < 0) + return ERR_PTR(ret); + + candidate = netfs_alloc_dirty_region(); + if (!candidate) + return ERR_PTR(-ENOMEM); + + netfs_init_dirty_region(candidate, inode, file, type, flags, start, end); + + spin_lock(&ctx->lock); + netfs_queue_write(ctx, candidate); + spin_unlock(&ctx->lock); + return candidate; +} + +/* + * Activate a write. This adds it to the dirty list and does any necessary + * flushing and superceding there. The caller must provide a spare region + * record so that we can split a dirty record if we need to supersede it. + */ +static void __netfs_activate_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *candidate, + struct netfs_dirty_region **spare) +{ + struct netfs_dirty_region *r; + struct list_head *p; + enum netfs_write_compatibility comp; + bool conflicts = false; + + /* See if there are any dirty regions that need flushing first. 
*/ + list_for_each(p, &ctx->dirty_regions) { + r = list_entry(p, struct netfs_dirty_region, dirty_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + + if (list_empty(&candidate->dirty_link) && + r->dirty.start > candidate->dirty.start) + list_add_tail(&candidate->dirty_link, p); + + comp = netfs_write_compatibility(ctx, r, candidate); + switch (comp) { + case NETFS_WRITES_INCOMPATIBLE: + netfs_flush_region(ctx, r, netfs_dirty_trace_flush_conflict); + conflicts = true; + continue; + + case NETFS_WRITES_SUPERSEDE: + if (!overlaps(&candidate->reserved, &r->dirty)) + continue; + if (r->dirty.start < candidate->dirty.start) { + /* The region overlaps the beginning of our + * region, we split it and mark the overlapping + * part as superseded. We insert ourself + * between. + */ + r = netfs_split_dirty_region(ctx, r, spare, + candidate->reserved.start); + list_add_tail(&candidate->dirty_link, &r->dirty_link); + p = &r->dirty_link; /* Advance the for-loop */ + } else { + /* The region is after ours, so make sure we're + * inserted before it. + */ + if (list_empty(&candidate->dirty_link)) + list_add_tail(&candidate->dirty_link, &r->dirty_link); + set_bit(NETFS_REGION_SUPERSEDED, &r->flags); + trace_netfs_dirty(ctx, candidate, r, netfs_dirty_trace_supersedes); + } + continue; + + case NETFS_WRITES_COMPATIBLE: + continue; + } + } + + if (list_empty(&candidate->dirty_link)) + list_add_tail(&candidate->dirty_link, p); + netfs_get_dirty_region(ctx, candidate, netfs_region_trace_get_dirty); + + if (conflicts) { + /* The caller must wait for the flushes to complete. */ + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_wait_active); + kleave(" [wait flush]"); + return; + } + + /* Okay, we're cleared to proceed. */ + smp_store_release(&candidate->state, NETFS_REGION_IS_ACTIVE); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_active); + wake_up_var(&candidate->state); + kleave(" [go]"); + return; +} + +static int netfs_activate_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *spare; + + spare = netfs_alloc_dirty_region(); + if (!spare) + return -ENOMEM; + + spin_lock(&ctx->lock); + __netfs_activate_write(ctx, region, &spare); + spin_unlock(&ctx->lock); + netfs_free_dirty_region(ctx, spare); + return 0; +} + +/* + * Merge a completed active write into the list of dirty regions. The region + * can be in one of a number of states: + * + * - Ordinary write, error, no data copied. Discard. + * - Ordinary write, unflushed. Dirty + * - Ordinary write, flush started. Dirty + * - Ordinary write, completed/failed. Discard. + * - DIO write, completed/failed. Discard. + * - DSYNC write, error before flush. As ordinary. + * - DSYNC write, flushed in progress, EINTR. Dirty (supersede). + * - DSYNC write, written to server and cache. Dirty (supersede)/Discard. + * - DSYNC write, written to server but not yet cache. Dirty. + * + * Once we've dealt with this record, we see about activating some other writes + * to fill the activity hole. + * + * This eats the caller's ref on the region. 
+ */ +static void netfs_merge_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *p, *q, *front; + bool new_content = test_bit(NETFS_ICTX_NEW_CONTENT, &ctx->flags); + LIST_HEAD(graveyard); + + list_del_init(®ion->active_link); + + switch (region->type) { + case NETFS_REGION_DIO: + list_move_tail(®ion->dirty_link, &graveyard); + goto discard; + + case NETFS_REGION_DSYNC: + /* A DSYNC write may have overwritten some dirty data + * and caused the writeback of other dirty data. + */ + goto scan_forwards; + + case NETFS_REGION_ORDINARY: + if (region->dirty.end == region->dirty.start) { + list_move_tail(®ion->dirty_link, &graveyard); + goto discard; + } + goto scan_backwards; + } + +scan_backwards: + kdebug("scan_backwards"); + /* Search backwards for a preceding record that we might be able to + * merge with. We skip over any intervening flush-in-progress records. + */ + p = front = region; + list_for_each_entry_continue_reverse(p, &ctx->dirty_regions, dirty_link) { + kdebug("- back %x", p->debug_id); + if (p->state >= NETFS_REGION_IS_FLUSHING) + continue; + if (p->state == NETFS_REGION_IS_ACTIVE) + break; + if (p->bounds.end < region->bounds.start) + break; + if (p->dirty.end >= region->dirty.start || new_content) + goto merge_backwards; + } + goto scan_forwards; + +merge_backwards: + kdebug("merge_backwards"); + if (test_bit(NETFS_REGION_SUPERSEDED, &p->flags) || + netfs_write_compatibility(ctx, p, region) != NETFS_WRITES_COMPATIBLE) + goto scan_forwards; + + front = p; + front->bounds.end = max(front->bounds.end, region->bounds.end); + front->dirty.end = max(front->dirty.end, region->dirty.end); + set_bit(NETFS_REGION_SUPERSEDED, ®ion->flags); + list_del_init(®ion->flush_link); + trace_netfs_dirty(ctx, front, region, netfs_dirty_trace_merged_back); + +scan_forwards: + /* Subsume forwards any records this one covers. There should be no + * non-supersedeable incompatible regions in our range as we would have + * flushed and waited for them before permitting this write to start. + * + * There can, however, be regions undergoing flushing which we need to + * skip over and not merge with. + */ + kdebug("scan_forwards"); + p = region; + list_for_each_entry_safe_continue(p, q, &ctx->dirty_regions, dirty_link) { + kdebug("- forw %x", p->debug_id); + if (p->state >= NETFS_REGION_IS_FLUSHING) + continue; + if (p->state == NETFS_REGION_IS_ACTIVE) + break; + if (p->dirty.start > region->dirty.end && + (!new_content || p->bounds.start > p->bounds.end)) + break; + + if (region->dirty.end >= p->dirty.end) { + /* Entirely subsumed */ + list_move_tail(&p->dirty_link, &graveyard); + list_del_init(&p->flush_link); + trace_netfs_dirty(ctx, front, p, netfs_dirty_trace_merged_sub); + continue; + } + + goto merge_forwards; + } + goto merge_complete; + +merge_forwards: + kdebug("merge_forwards"); + if (test_bit(NETFS_REGION_SUPERSEDED, &p->flags) || + netfs_write_compatibility(ctx, p, front) == NETFS_WRITES_SUPERSEDE) { + /* If a region was partially superseded by us, we need to roll + * it forwards and remove the superseded flag. + */ + if (p->dirty.start < front->dirty.end) { + p->dirty.start = front->dirty.end; + clear_bit(NETFS_REGION_SUPERSEDED, &p->flags); + } + trace_netfs_dirty(ctx, p, front, netfs_dirty_trace_superseded); + goto merge_complete; + } + + /* Simply merge overlapping/contiguous ordinary areas together. 
*/ + front->bounds.end = max(front->bounds.end, p->bounds.end); + front->dirty.end = max(front->dirty.end, p->dirty.end); + list_move_tail(&p->dirty_link, &graveyard); + list_del_init(&p->flush_link); + trace_netfs_dirty(ctx, front, p, netfs_dirty_trace_merged_forw); + +merge_complete: + if (test_bit(NETFS_REGION_SUPERSEDED, ®ion->flags)) { + list_move_tail(®ion->dirty_link, &graveyard); + } +discard: + while (!list_empty(&graveyard)) { + p = list_first_entry(&graveyard, struct netfs_dirty_region, dirty_link); + list_del_init(&p->dirty_link); + smp_store_release(&p->state, NETFS_REGION_IS_COMPLETE); + trace_netfs_dirty(ctx, p, NULL, netfs_dirty_trace_complete); + wake_up_var(&p->state); + netfs_put_dirty_region(ctx, p, netfs_region_trace_put_merged); + } +} + +/* + * Start pending writes in a window we've created by the removal of an active + * write. The writes are bundled onto the given queue and it's left as an + * exercise for the caller to actually start them. + */ +static void netfs_start_pending_writes(struct netfs_i_context *ctx, + struct list_head *prev_p, + struct list_head *queue) +{ + struct netfs_dirty_region *prev = NULL, *next = NULL, *p, *q; + struct netfs_range window = { 0, ULLONG_MAX }; + + if (prev_p != &ctx->active_writes) { + prev = list_entry(prev_p, struct netfs_dirty_region, active_link); + window.start = prev->reserved.end; + if (!list_is_last(prev_p, &ctx->active_writes)) { + next = list_next_entry(prev, active_link); + window.end = next->reserved.start; + } + } else if (!list_empty(&ctx->active_writes)) { + next = list_last_entry(&ctx->active_writes, + struct netfs_dirty_region, active_link); + window.end = next->reserved.start; + } + + list_for_each_entry_safe(p, q, &ctx->pending_writes, active_link) { + bool skip = false; + + if (!overlaps(&p->reserved, &window)) + continue; + + /* Narrow the window when we find a region that requires more + * than we can immediately provide. The queue is in submission + * order and we need to prevent starvation. + */ + if (p->type == NETFS_REGION_DIO) { + if (p->bounds.start < window.start) { + window.start = p->bounds.start; + skip = true; + } + if (p->bounds.end > window.end) { + window.end = p->bounds.end; + skip = true; + } + } else { + if (p->reserved.start < window.start) { + window.start = p->reserved.start; + skip = true; + } + if (p->reserved.end > window.end) { + window.end = p->reserved.end; + skip = true; + } + } + if (window.start >= window.end) + break; + if (skip) + continue; + + /* Okay, we have a gap that's large enough to start this write + * in. Make sure it's compatible with any region its bounds + * overlap. + */ + if (prev && + p->bounds.start < prev->bounds.end && + netfs_write_compatibility(ctx, prev, p) == NETFS_WRITES_INCOMPATIBLE) { + window.start = max(window.start, p->bounds.end); + skip = true; + } + + if (next && + p->bounds.end > next->bounds.start && + netfs_write_compatibility(ctx, next, p) == NETFS_WRITES_INCOMPATIBLE) { + window.end = min(window.end, p->bounds.start); + skip = true; + } + if (window.start >= window.end) + break; + if (skip) + continue; + + /* Okay, we can start this write. */ + trace_netfs_dirty(ctx, p, NULL, netfs_dirty_trace_start_pending); + list_move(&p->active_link, + prev ? &prev->active_link : &ctx->pending_writes); + list_add_tail(&p->dirty_link, queue); + if (p->type == NETFS_REGION_DIO) + window.start = p->bounds.end; + else + window.start = p->reserved.end; + prev = p; + } +} + +/* + * We completed the modification phase of a write. 
We need to fix up the dirty + * list, remove this region from the active list and start waiters. + */ +static void netfs_commit_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *p; + struct list_head *prev; + LIST_HEAD(queue); + + spin_lock(&ctx->lock); + smp_store_release(®ion->state, NETFS_REGION_IS_DIRTY); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_commit); + wake_up_var(®ion->state); + + prev = region->active_link.prev; + netfs_merge_dirty_region(ctx, region); + if (!list_empty(&ctx->pending_writes)) + netfs_start_pending_writes(ctx, prev, &queue); + spin_unlock(&ctx->lock); + + while (!list_empty(&queue)) { + p = list_first_entry(&queue, struct netfs_dirty_region, dirty_link); + list_del_init(&p->dirty_link); + smp_store_release(&p->state, NETFS_REGION_IS_DIRTY); + wake_up_var(&p->state); + } +} + +/* + * Write data into a prereserved region of the pagecache attached to a netfs + * inode. + */ +static ssize_t netfs_perform_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *i) +{ + struct file *file = iocb->ki_filp; + struct netfs_i_context *ctx = netfs_i_context(file_inode(file)); + struct page *page; + ssize_t written = 0, ret; + loff_t new_pos, i_size; + bool always_fill = false; + + do { + size_t plen; + size_t offset; /* Offset into pagecache page */ + size_t bytes; /* Bytes to write to page */ + size_t copied; /* Bytes copied from user */ + bool relock = false; + + page = netfs_grab_page_for_write(file->f_mapping, region->dirty.end, + iov_iter_count(i)); + if (!page) + return -ENOMEM; + + plen = thp_size(page); + offset = region->dirty.end - page_file_offset(page); + bytes = min_t(size_t, plen - offset, iov_iter_count(i)); + + kdebug("segment %zx @%zx", bytes, offset); + + if (!PageUptodate(page)) { + unlock_page(page); /* Avoid deadlocking fault-in */ + relock = true; + } + + /* Bring in the user page that we will copy from _first_. + * Otherwise there's a nasty deadlock on copying from the + * same page as we're writing to, without it being marked + * up-to-date. + * + * Not only is this an optimisation, but it is also required + * to check that the address is actually valid, when atomic + * usercopies are used, below. + */ + if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + kdebug("fault-in"); + ret = -EFAULT; + goto error_page; + } + + if (fatal_signal_pending(current)) { + ret = -EINTR; + goto error_page; + } + + if (relock) { + ret = lock_page_killable(page); + if (ret < 0) + goto error_page; + } + +redo_prefetch: + /* Prefetch area to be written into the cache if we're caching + * this file. We need to do this before we get a lock on the + * page in case there's more than one writer competing for the + * same cache block. 
+ */ + if (!PageUptodate(page)) { + ret = netfs_prefetch_for_write(file, page, region->dirty.end, + bytes, always_fill); + kdebug("prefetch %zx", ret); + if (ret < 0) + goto error_page; + } + + if (mapping_writably_mapped(page->mapping)) + flush_dcache_page(page); + copied = copy_page_from_iter_atomic(page, offset, bytes, i); + flush_dcache_page(page); + kdebug("copied %zx", copied); + + /* Deal with a (partially) failed copy */ + if (!PageUptodate(page)) { + if (copied == 0) { + ret = -EFAULT; + goto error_page; + } + if (copied < bytes) { + iov_iter_revert(i, copied); + always_fill = true; + goto redo_prefetch; + } + SetPageUptodate(page); + } + + /* Update the inode size if we moved the EOF marker */ + new_pos = region->dirty.end + copied; + i_size = i_size_read(file_inode(file)); + if (new_pos > i_size) { + if (ctx->ops->update_i_size) { + ctx->ops->update_i_size(file, new_pos); + } else { + i_size_write(file_inode(file), new_pos); + fscache_update_cookie(ctx->cache, NULL, &new_pos); + } + } + + /* Update the region appropriately */ + if (i_size > region->i_size) + region->i_size = i_size; + smp_store_release(®ion->dirty.end, new_pos); + + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_modified); + set_page_dirty(page); + unlock_page(page); + put_page(page); + page = NULL; + + cond_resched(); + + written += copied; + + balance_dirty_pages_ratelimited(file->f_mapping); + } while (iov_iter_count(i)); + +out: + if (likely(written)) { + kdebug("written"); + iocb->ki_pos += written; + + /* Flush and wait for a write that requires immediate synchronisation. */ + if (region->type == NETFS_REGION_DSYNC) { + kdebug("dsync"); + spin_lock(&ctx->lock); + netfs_flush_region(ctx, region, netfs_dirty_trace_flush_dsync); + spin_unlock(&ctx->lock); + + ret = wait_on_region(region, NETFS_REGION_IS_COMPLETE); + if (ret < 0) + written = ret; + } + } + + netfs_commit_write(ctx, region); + return written ? written : ret; + +error_page: + unlock_page(page); + put_page(page); + goto out; +} + +/** + * netfs_file_write_iter - write data to a file + * @iocb: IO state structure + * @from: iov_iter with data to write + * + * This is a wrapper around __generic_file_write_iter() to be used by most + * filesystems. It takes care of syncing the file in case of O_SYNC file + * and acquires i_mutex as needed. 
+ * Return: + * * negative error code if no data has been written at all of + * vfs_fsync_range() failed for a synchronous write + * * number of bytes written, even for truncated writes + */ +ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct netfs_dirty_region *region = NULL; + struct file *file = iocb->ki_filp; + struct inode *inode = file->f_mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); + enum netfs_region_type type; + unsigned long flags = 0; + ssize_t ret; + + printk("\n"); + kenter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode)); + + inode_lock(inode); + ret = generic_write_checks(iocb, from); + if (ret <= 0) + goto error_unlock; + + if (iocb->ki_flags & IOCB_DIRECT) + type = NETFS_REGION_DIO; + if (iocb->ki_flags & IOCB_DSYNC) + type = NETFS_REGION_DSYNC; + else + type = NETFS_REGION_ORDINARY; + if (iocb->ki_flags & IOCB_SYNC) + __set_bit(NETFS_REGION_SYNC, &flags); + + region = netfs_prepare_region(inode, file, iocb->ki_pos, + iov_iter_count(from), type, flags); + if (IS_ERR(region)) { + ret = PTR_ERR(region); + goto error_unlock; + } + + trace_netfs_write_iter(region, iocb, from); + + /* We can write back this queue in page reclaim */ + current->backing_dev_info = inode_to_bdi(inode); + ret = file_remove_privs(file); + if (ret) + goto error_unlock; + + ret = file_update_time(file); + if (ret) + goto error_unlock; + + inode_unlock(inode); + + ret = wait_on_region(region, NETFS_REGION_IS_RESERVED); + if (ret < 0) + goto error; + + ret = netfs_activate_write(ctx, region); + if (ret < 0) + goto error; + + /* The region excludes overlapping writes and is used to synchronise + * versus flushes. + */ + if (iocb->ki_flags & IOCB_DIRECT) + ret = -EOPNOTSUPP; //netfs_file_direct_write(region, iocb, from); + else + ret = netfs_perform_write(region, iocb, from); + +out: + netfs_put_dirty_region(ctx, region, netfs_region_trace_put_write_iter); + current->backing_dev_info = NULL; + return ret; + +error_unlock: + inode_unlock(inode); +error: + if (region) + netfs_commit_write(ctx, region); + goto out; +} +EXPORT_SYMBOL(netfs_file_write_iter); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 35bcd916c3a0..fc91711d3178 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -165,17 +165,95 @@ struct netfs_read_request { */ struct netfs_i_context { const struct netfs_request_ops *ops; + struct list_head pending_writes; /* List of writes waiting to be begin */ + struct list_head active_writes; /* List of writes being applied */ + struct list_head dirty_regions; /* List of dirty regions in the pagecache */ + struct list_head flush_groups; /* Writeable region ordering queue */ + struct list_head flush_queue; /* Regions that need to be flushed */ #ifdef CONFIG_FSCACHE struct fscache_cookie *cache; #endif unsigned long flags; #define NETFS_ICTX_NEW_CONTENT 0 /* Set if file has new content (create/trunc-0) */ + spinlock_t lock; + unsigned int rsize; /* Maximum read size */ + unsigned int wsize; /* Maximum write size */ + unsigned int bsize; /* Min block size for bounding box */ + unsigned int inval_counter; /* Number of invalidations made */ +}; + +/* + * Descriptor for a set of writes that will need to be flushed together. 
+ */ +struct netfs_flush_group { + struct list_head group_link; /* Link in i_context->flush_groups */ + struct list_head region_list; /* List of regions in this group */ + void *netfs_priv; + refcount_t ref; + bool flush; +}; + +struct netfs_range { + unsigned long long start; /* Start of region */ + unsigned long long end; /* End of region */ +}; + +/* State of a netfs_dirty_region */ +enum netfs_region_state { + NETFS_REGION_IS_PENDING, /* Proposed write is waiting on an active write */ + NETFS_REGION_IS_RESERVED, /* Writable region is reserved, waiting on flushes */ + NETFS_REGION_IS_ACTIVE, /* Write is actively modifying the pagecache */ + NETFS_REGION_IS_DIRTY, /* Region is dirty */ + NETFS_REGION_IS_FLUSHING, /* Region is being flushed */ + NETFS_REGION_IS_COMPLETE, /* Region has been completed (stored/invalidated) */ +} __attribute__((mode(byte))); + +enum netfs_region_type { + NETFS_REGION_ORDINARY, /* Ordinary write */ + NETFS_REGION_DIO, /* Direct I/O write */ + NETFS_REGION_DSYNC, /* O_DSYNC/RWF_DSYNC write */ +} __attribute__((mode(byte))); + +/* + * Descriptor for a dirty region that has a common set of parameters and can + * feasibly be written back in one go. These are held in an ordered list. + * + * Regions are not allowed to overlap, though they may be merged. + */ +struct netfs_dirty_region { + struct netfs_flush_group *group; + struct list_head active_link; /* Link in i_context->pending/active_writes */ + struct list_head dirty_link; /* Link in i_context->dirty_regions */ + struct list_head flush_link; /* Link in group->region_list or + * i_context->flush_queue */ + spinlock_t lock; + void *netfs_priv; /* Private data for the netfs */ + struct netfs_range bounds; /* Bounding box including all affected pages */ + struct netfs_range reserved; /* The region reserved against other writes */ + struct netfs_range dirty; /* The region that has been modified */ + loff_t i_size; /* Size of the file */ + enum netfs_region_type type; + enum netfs_region_state state; + unsigned long flags; +#define NETFS_REGION_SYNC 0 /* Set if metadata sync required (RWF_SYNC) */ +#define NETFS_REGION_FLUSH_Q 1 /* Set if region is on flush queue */ +#define NETFS_REGION_SUPERSEDED 2 /* Set if region is being superseded */ + unsigned int debug_id; + refcount_t ref; +}; + +enum netfs_write_compatibility { + NETFS_WRITES_COMPATIBLE, /* Dirty regions can be directly merged */ + NETFS_WRITES_SUPERSEDE, /* Second write can supersede the first without first + * having to be flushed (eg. authentication, DSYNC) */ + NETFS_WRITES_INCOMPATIBLE, /* Second write must wait for first (eg. DIO, ceph snap) */ }; /* * Operations the network filesystem can/must provide to the helpers. 
*/ struct netfs_request_ops { + /* Read request handling */ void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); @@ -186,6 +264,17 @@ struct netfs_request_ops { struct page *page, void **_fsdata); void (*done)(struct netfs_read_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); + + /* Dirty region handling */ + void (*init_dirty_region)(struct netfs_dirty_region *region, struct file *file); + void (*split_dirty_region)(struct netfs_dirty_region *region); + void (*free_dirty_region)(struct netfs_dirty_region *region); + enum netfs_write_compatibility (*is_write_compatible)( + struct netfs_i_context *ctx, + struct netfs_dirty_region *old_region, + struct netfs_dirty_region *candidate); + bool (*check_compatible_write)(struct netfs_dirty_region *region, struct file *file); + void (*update_i_size)(struct file *file, loff_t i_size); }; /* @@ -234,9 +323,11 @@ extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, void **); +extern ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); +extern struct netfs_flush_group *netfs_new_flush_group(struct inode *, void *); /** * netfs_i_context - Get the netfs inode context from the inode @@ -256,6 +347,13 @@ static inline void netfs_i_context_init(struct inode *inode, struct netfs_i_context *ctx = netfs_i_context(inode); ctx->ops = ops; + ctx->bsize = PAGE_SIZE; + INIT_LIST_HEAD(&ctx->pending_writes); + INIT_LIST_HEAD(&ctx->active_writes); + INIT_LIST_HEAD(&ctx->dirty_regions); + INIT_LIST_HEAD(&ctx->flush_groups); + INIT_LIST_HEAD(&ctx->flush_queue); + spin_lock_init(&ctx->lock); } /** diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 04ac29fc700f..808433e6ddd3 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -23,6 +23,7 @@ enum netfs_read_trace { netfs_read_trace_readahead, netfs_read_trace_readpage, netfs_read_trace_write_begin, + netfs_read_trace_prefetch_for_write, }; enum netfs_rreq_trace { @@ -56,12 +57,43 @@ enum netfs_failure { netfs_fail_prepare_write, }; +enum netfs_dirty_trace { + netfs_dirty_trace_active, + netfs_dirty_trace_commit, + netfs_dirty_trace_complete, + netfs_dirty_trace_flush_conflict, + netfs_dirty_trace_flush_dsync, + netfs_dirty_trace_merged_back, + netfs_dirty_trace_merged_forw, + netfs_dirty_trace_merged_sub, + netfs_dirty_trace_modified, + netfs_dirty_trace_new, + netfs_dirty_trace_reserved, + netfs_dirty_trace_split, + netfs_dirty_trace_start_pending, + netfs_dirty_trace_superseded, + netfs_dirty_trace_supersedes, + netfs_dirty_trace_wait_active, + netfs_dirty_trace_wait_pend, +}; + +enum netfs_region_trace { + netfs_region_trace_get_dirty, + netfs_region_trace_get_wreq, + netfs_region_trace_put_discard, + netfs_region_trace_put_merged, + netfs_region_trace_put_write_iter, + netfs_region_trace_free, + netfs_region_trace_new, +}; + #endif #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ EM(netfs_read_trace_readpage, "READPAGE ") \ + EM(netfs_read_trace_prefetch_for_write, "PREFETCHW") \ E_(netfs_read_trace_write_begin, "WRITEBEGN") #define netfs_rreq_traces 
\ @@ -98,6 +130,46 @@ enum netfs_failure { EM(netfs_fail_short_write_begin, "short-write-begin") \ E_(netfs_fail_prepare_write, "prep-write") +#define netfs_region_types \ + EM(NETFS_REGION_ORDINARY, "ORD") \ + EM(NETFS_REGION_DIO, "DIO") \ + E_(NETFS_REGION_DSYNC, "DSY") + +#define netfs_region_states \ + EM(NETFS_REGION_IS_PENDING, "pend") \ + EM(NETFS_REGION_IS_RESERVED, "resv") \ + EM(NETFS_REGION_IS_ACTIVE, "actv") \ + EM(NETFS_REGION_IS_DIRTY, "drty") \ + EM(NETFS_REGION_IS_FLUSHING, "flsh") \ + E_(NETFS_REGION_IS_COMPLETE, "done") + +#define netfs_dirty_traces \ + EM(netfs_dirty_trace_active, "ACTIVE ") \ + EM(netfs_dirty_trace_commit, "COMMIT ") \ + EM(netfs_dirty_trace_complete, "COMPLETE ") \ + EM(netfs_dirty_trace_flush_conflict, "FLSH CONFL") \ + EM(netfs_dirty_trace_flush_dsync, "FLSH DSYNC") \ + EM(netfs_dirty_trace_merged_back, "MERGE BACK") \ + EM(netfs_dirty_trace_merged_forw, "MERGE FORW") \ + EM(netfs_dirty_trace_merged_sub, "SUBSUMED ") \ + EM(netfs_dirty_trace_modified, "MODIFIED ") \ + EM(netfs_dirty_trace_new, "NEW ") \ + EM(netfs_dirty_trace_reserved, "RESERVED ") \ + EM(netfs_dirty_trace_split, "SPLIT ") \ + EM(netfs_dirty_trace_start_pending, "START PEND") \ + EM(netfs_dirty_trace_superseded, "SUPERSEDED") \ + EM(netfs_dirty_trace_supersedes, "SUPERSEDES") \ + EM(netfs_dirty_trace_wait_active, "WAIT ACTV ") \ + E_(netfs_dirty_trace_wait_pend, "WAIT PEND ") + +#define netfs_region_traces \ + EM(netfs_region_trace_get_dirty, "GET DIRTY ") \ + EM(netfs_region_trace_get_wreq, "GET WREQ ") \ + EM(netfs_region_trace_put_discard, "PUT DISCARD") \ + EM(netfs_region_trace_put_merged, "PUT MERGED ") \ + EM(netfs_region_trace_put_write_iter, "PUT WRITER ") \ + EM(netfs_region_trace_free, "FREE ") \ + E_(netfs_region_trace_new, "NEW ") /* * Export enum symbols via userspace. 
@@ -112,6 +184,9 @@ netfs_rreq_traces; netfs_sreq_sources; netfs_sreq_traces; netfs_failures; +netfs_region_types; +netfs_region_states; +netfs_dirty_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -255,6 +330,111 @@ TRACE_EVENT(netfs_failure, __entry->error) ); +TRACE_EVENT(netfs_write_iter, + TP_PROTO(struct netfs_dirty_region *region, struct kiocb *iocb, + struct iov_iter *from), + + TP_ARGS(region, iocb, from), + + TP_STRUCT__entry( + __field(unsigned int, region ) + __field(unsigned long long, start ) + __field(size_t, len ) + __field(unsigned int, flags ) + ), + + TP_fast_assign( + __entry->region = region->debug_id; + __entry->start = iocb->ki_pos; + __entry->len = iov_iter_count(from); + __entry->flags = iocb->ki_flags; + ), + + TP_printk("D=%x WRITE-ITER s=%llx l=%zx f=%x", + __entry->region, __entry->start, __entry->len, __entry->flags) + ); + +TRACE_EVENT(netfs_ref_region, + TP_PROTO(unsigned int region_debug_id, int ref, + enum netfs_region_trace what), + + TP_ARGS(region_debug_id, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, region ) + __field(int, ref ) + __field(enum netfs_region_trace, what ) + ), + + TP_fast_assign( + __entry->region = region_debug_id; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("D=%x %s r=%u", + __entry->region, + __print_symbolic(__entry->what, netfs_region_traces), + __entry->ref) + ); + +TRACE_EVENT(netfs_dirty, + TP_PROTO(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + struct netfs_dirty_region *region2, + enum netfs_dirty_trace why), + + TP_ARGS(ctx, region, region2, why), + + TP_STRUCT__entry( + __field(ino_t, ino ) + __field(unsigned long long, bounds_start ) + __field(unsigned long long, bounds_end ) + __field(unsigned long long, reserved_start ) + __field(unsigned long long, reserved_end ) + __field(unsigned long long, dirty_start ) + __field(unsigned long long, dirty_end ) + __field(unsigned int, debug_id ) + __field(unsigned int, debug_id2 ) + __field(enum netfs_region_type, type ) + __field(enum netfs_region_state, state ) + __field(unsigned short, flags ) + __field(unsigned int, ref ) + __field(enum netfs_dirty_trace, why ) + ), + + TP_fast_assign( + __entry->ino = (((struct inode *)ctx) - 1)->i_ino; + __entry->why = why; + __entry->bounds_start = region->bounds.start; + __entry->bounds_end = region->bounds.end; + __entry->reserved_start = region->reserved.start; + __entry->reserved_end = region->reserved.end; + __entry->dirty_start = region->dirty.start; + __entry->dirty_end = region->dirty.end; + __entry->debug_id = region->debug_id; + __entry->type = region->type; + __entry->state = region->state; + __entry->flags = region->flags; + __entry->debug_id2 = region2 ? 
region2->debug_id : 0; + ), + + TP_printk("i=%lx D=%x %s %s dt=%04llx-%04llx bb=%04llx-%04llx rs=%04llx-%04llx %s f=%x XD=%x", + __entry->ino, __entry->debug_id, + __print_symbolic(__entry->why, netfs_dirty_traces), + __print_symbolic(__entry->type, netfs_region_types), + __entry->dirty_start, + __entry->dirty_end, + __entry->bounds_start, + __entry->bounds_end, + __entry->reserved_start, + __entry->reserved_end, + __print_symbolic(__entry->state, netfs_region_states), + __entry->flags, + __entry->debug_id2 + ) + ); + #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ From patchwork Wed Jul 21 13:46:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483386 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 919DCC636C9 for ; Wed, 21 Jul 2021 13:47:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8131D61241 for ; Wed, 21 Jul 2021 13:47:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238670AbhGUNGd (ORCPT ); Wed, 21 Jul 2021 09:06:33 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:41497 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238603AbhGUNFx (ORCPT ); Wed, 21 Jul 2021 09:05:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875189; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YIMMVaqdVhebN2c0WSSt5y4n7OTol1pb5VWKxV3dgW0=; b=cHwjCwoNHmvjLkd5ezvisiigvprYEqE33e3f66TxI01C/6BUpXgdXdHGMyQcHmAFN0pZb/ btBAeVg8moJCe+1e8G5qMofO948wcTbOkwGDdwtr5ViF5kGpNQYae6khJFcdUNbpqt1bAZ mf9uBRyWcEZXZHjDttxRoNb4we0TwCw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-455-2dRCJwMHMCuYSt_yp_EJGw-1; Wed, 21 Jul 2021 09:46:25 -0400 X-MC-Unique: 2dRCJwMHMCuYSt_yp_EJGw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5FEE993921; Wed, 21 Jul 2021 13:46:23 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 45A5360583; Wed, 21 Jul 2021 13:46:19 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [RFC PATCH 07/12] netfs: Initiate write request from a dirty region From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:18 +0100 Message-ID: <162687517832.276387.10765642135364197990.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Handle the initiation of writeback of a piece of the dirty list. The first region on the flush list is extracted and a write request is set up to manage it. The pages in the affected region are flipped from dirty to writeback-in-progress. The writeback is then dispatched (which currently just logs a "--- WRITE ---" message to dmesg and then abandons it). Notes: (*) A page may host multiple disjoint dirty regions, each with its own netfs_dirty_region, and a region may span multiple pages. Dirty regions are not permitted to overlap, though they may be merged if they would otherwise overlap. (*) A page may be involved in multiple simultaneous writebacks. Each one is managed by a separate netfs_dirty_region and netfs_write_request. (*) Multiple pages may be required to form a write (for crypto/compression purposes) and so adjacent non-dirty pages may also get marked for writeback. 
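For a filesystem picking this up, the visible change is in its address_space_operations: the netfs helpers take over writepages, releasepage and invalidatepage, as the afs conversion below shows.  A minimal sketch of that wiring, with the myfs_* names standing in for whatever the filesystem already provides (placeholders only, not part of this series):

	static const struct address_space_operations myfs_file_aops = {
		.readahead	= netfs_readahead,
		.set_page_dirty	= myfs_set_page_dirty,
		.launder_page	= myfs_launder_page,
		.releasepage	= netfs_releasepage,
		.invalidatepage	= netfs_invalidatepage,
		.direct_IO	= myfs_direct_IO,
		.writepage	= myfs_writepage,
		.writepages	= netfs_writepages,
	};

The VM then drives writeback through netfs_writepages(), which moves dirty regions intersecting the requested range onto the flush queue and calls netfs_begin_write() to dispatch the first region on that queue.
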
Signed-off-by: David Howells --- fs/afs/file.c | 128 ++---------------- fs/netfs/Makefile | 1 fs/netfs/internal.h | 16 ++ fs/netfs/objects.c | 78 +++++++++++ fs/netfs/read_helper.c | 34 +++++ fs/netfs/stats.c | 6 + fs/netfs/write_back.c | 306 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/xa_iterator.h | 85 ++++++++++++ include/linux/netfs.h | 35 +++++ include/trace/events/netfs.h | 72 ++++++++++ 10 files changed, 642 insertions(+), 119 deletions(-) create mode 100644 fs/netfs/write_back.c create mode 100644 fs/netfs/xa_iterator.h diff --git a/fs/afs/file.c b/fs/afs/file.c index 8400cdf086b6..a6d483fe4e74 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -19,9 +19,6 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); static int afs_symlink_readpage(struct file *file, struct page *page); -static void afs_invalidatepage(struct page *page, unsigned int offset, - unsigned int length); -static int afs_releasepage(struct page *page, gfp_t gfp_flags); static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); @@ -50,17 +47,17 @@ const struct address_space_operations afs_file_aops = { .readahead = netfs_readahead, .set_page_dirty = afs_set_page_dirty, .launder_page = afs_launder_page, - .releasepage = afs_releasepage, - .invalidatepage = afs_invalidatepage, + .releasepage = netfs_releasepage, + .invalidatepage = netfs_invalidatepage, .direct_IO = afs_direct_IO, .writepage = afs_writepage, - .writepages = afs_writepages, + .writepages = netfs_writepages, }; const struct address_space_operations afs_symlink_aops = { .readpage = afs_symlink_readpage, - .releasepage = afs_releasepage, - .invalidatepage = afs_invalidatepage, + .releasepage = netfs_releasepage, + .invalidatepage = netfs_invalidatepage, }; static const struct vm_operations_struct afs_vm_ops = { @@ -378,6 +375,11 @@ static void afs_free_dirty_region(struct netfs_dirty_region *region) key_put(region->netfs_priv); } +static void afs_init_wreq(struct netfs_write_request *wreq) +{ + //wreq->netfs_priv = key_get(afs_file_key(file)); +} + static void afs_update_i_size(struct file *file, loff_t new_i_size) { struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); @@ -400,6 +402,7 @@ const struct netfs_request_ops afs_req_ops = { .init_dirty_region = afs_init_dirty_region, .free_dirty_region = afs_free_dirty_region, .update_i_size = afs_update_i_size, + .init_wreq = afs_init_wreq, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) @@ -408,115 +411,6 @@ int afs_write_inode(struct inode *inode, struct writeback_control *wbc) return 0; } -/* - * Adjust the dirty region of the page on truncation or full invalidation, - * getting rid of the markers altogether if the region is entirely invalidated. - */ -static void afs_invalidate_dirty(struct page *page, unsigned int offset, - unsigned int length) -{ - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - unsigned long priv; - unsigned int f, t, end = offset + length; - - priv = page_private(page); - - /* we clean up only if the entire page is being invalidated */ - if (offset == 0 && length == thp_size(page)) - goto full_invalidate; - - /* If the page was dirtied by page_mkwrite(), the PTE stays writable - * and we don't get another notification to tell us to expand it - * again. 
- */ - if (afs_is_page_dirty_mmapped(priv)) - return; - - /* We may need to shorten the dirty region */ - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - - if (t <= offset || f >= end) - return; /* Doesn't overlap */ - - if (f < offset && t > end) - return; /* Splits the dirty region - just absorb it */ - - if (f >= offset && t <= end) - goto undirty; - - if (f < offset) - t = offset; - else - f = end; - if (f == t) - goto undirty; - - priv = afs_page_dirty(page, f, t); - set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page); - return; - -undirty: - trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page); - clear_page_dirty_for_io(page); -full_invalidate: - trace_afs_page_dirty(vnode, tracepoint_string("inval"), page); - detach_page_private(page); -} - -/* - * invalidate part or all of a page - * - release a page and clean up its private data if offset is 0 (indicating - * the entire page) - */ -static void afs_invalidatepage(struct page *page, unsigned int offset, - unsigned int length) -{ - _enter("{%lu},%u,%u", page->index, offset, length); - - BUG_ON(!PageLocked(page)); - - if (PagePrivate(page)) - afs_invalidate_dirty(page, offset, length); - - wait_on_page_fscache(page); - _leave(""); -} - -/* - * release a page and clean up its private state if it's not busy - * - return true if the page can now be released, false if not - */ -static int afs_releasepage(struct page *page, gfp_t gfp_flags) -{ - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - - _enter("{{%llx:%llu}[%lu],%lx},%x", - vnode->fid.vid, vnode->fid.vnode, page->index, page->flags, - gfp_flags); - - /* deny if page is being written to the cache and the caller hasn't - * elected to wait */ -#ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(page)) { - if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) - return false; - wait_on_page_fscache(page); - fscache_note_page_release(afs_vnode_cache(vnode)); - } -#endif - - if (PagePrivate(page)) { - trace_afs_page_dirty(vnode, tracepoint_string("rel"), page); - detach_page_private(page); - } - - /* indicate that the page can be released */ - _leave(" = T"); - return 1; -} - /* * Handle setting up a memory mapping on an AFS file. 
*/ diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 3e11453ad2c5..a201fd7b22cf 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -3,6 +3,7 @@ netfs-y := \ objects.o \ read_helper.o \ + write_back.o \ write_helper.o # dio_helper.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 77ceab694348..fe85581d8ac0 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -8,6 +8,7 @@ #include #include #include +#include "xa_iterator.h" #ifdef pr_fmt #undef pr_fmt @@ -34,6 +35,19 @@ void netfs_free_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_reg void netfs_put_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_region_trace what); +struct netfs_write_request *netfs_alloc_write_request(struct address_space *mapping, + bool is_dio); +void netfs_get_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what); +void netfs_free_write_request(struct work_struct *work); +void netfs_put_write_request(struct netfs_write_request *wreq, + bool was_async, enum netfs_wreq_trace what); + +static inline void netfs_see_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what) +{ + trace_netfs_ref_wreq(wreq->debug_id, refcount_read(&wreq->usage), what); +} /* * read_helper.c @@ -46,6 +60,7 @@ int netfs_prefetch_for_write(struct file *file, struct page *page, loff_t pos, s /* * write_helper.c */ +void netfs_writeback_worker(struct work_struct *work); void netfs_flush_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_dirty_trace why); @@ -74,6 +89,7 @@ extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; extern atomic_t netfs_n_wh_region; extern atomic_t netfs_n_wh_flush_group; +extern atomic_t netfs_n_wh_wreq; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index ba1e052aa352..6e9b2a00076d 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -111,3 +111,81 @@ void netfs_put_dirty_region(struct netfs_i_context *ctx, netfs_free_dirty_region(ctx, region); } } + +struct netfs_write_request *netfs_alloc_write_request(struct address_space *mapping, + bool is_dio) +{ + static atomic_t debug_ids; + struct inode *inode = mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_write_request *wreq; + + wreq = kzalloc(sizeof(struct netfs_write_request), GFP_KERNEL); + if (wreq) { + wreq->mapping = mapping; + wreq->inode = inode; + wreq->netfs_ops = ctx->ops; + wreq->debug_id = atomic_inc_return(&debug_ids); + xa_init(&wreq->buffer); + INIT_WORK(&wreq->work, netfs_writeback_worker); + refcount_set(&wreq->usage, 1); + ctx->ops->init_wreq(wreq); + netfs_stat(&netfs_n_wh_wreq); + trace_netfs_ref_wreq(wreq->debug_id, 1, netfs_wreq_trace_new); + } + + return wreq; +} + +void netfs_get_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what) +{ + int ref; + + __refcount_inc(&wreq->usage, &ref); + trace_netfs_ref_wreq(wreq->debug_id, ref + 1, what); +} + +void netfs_free_write_request(struct work_struct *work) +{ + struct netfs_write_request *wreq = + container_of(work, struct netfs_write_request, work); + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + struct page *page; + pgoff_t index; + + if (wreq->netfs_priv) + wreq->netfs_ops->cleanup(wreq->mapping, wreq->netfs_priv); + trace_netfs_ref_wreq(wreq->debug_id, 0, netfs_wreq_trace_free); + if (wreq->cache_resources.ops) + wreq->cache_resources.ops->end_operation(&wreq->cache_resources); 
+ if (wreq->region) + netfs_put_dirty_region(ctx, wreq->region, + netfs_region_trace_put_wreq); + xa_for_each(&wreq->buffer, index, page) { + __free_page(page); + } + xa_destroy(&wreq->buffer); + kfree(wreq); + netfs_stat_d(&netfs_n_wh_wreq); +} + +void netfs_put_write_request(struct netfs_write_request *wreq, + bool was_async, enum netfs_wreq_trace what) +{ + unsigned int debug_id = wreq->debug_id; + bool dead; + int ref; + + dead = __refcount_dec_and_test(&wreq->usage, &ref); + trace_netfs_ref_wreq(debug_id, ref - 1, what); + if (dead) { + if (was_async) { + wreq->work.func = netfs_free_write_request; + if (!queue_work(system_unbound_wq, &wreq->work)) + BUG(); + } else { + netfs_free_write_request(&wreq->work); + } + } +} diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index bfcdbbd32f4c..0b771f2f5449 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -1415,3 +1415,37 @@ int netfs_prefetch_for_write(struct file *file, struct page *page, _leave(" = %d", ret); return ret; } + +/* + * Invalidate part or all of a page + * - release a page and clean up its private data if offset is 0 (indicating + * the entire page) + */ +void netfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length) +{ + _enter("{%lu},%u,%u", page->index, offset, length); + + wait_on_page_fscache(page); +} +EXPORT_SYMBOL(netfs_invalidatepage); + +/* + * Release a page and clean up its private state if it's not busy + * - return true if the page can now be released, false if not + */ +int netfs_releasepage(struct page *page, gfp_t gfp_flags) +{ + struct netfs_i_context *ctx = netfs_i_context(page->mapping->host); + + kenter(""); + + if (PageFsCache(page)) { + if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) + return false; + wait_on_page_fscache(page); + fscache_note_page_release(ctx->cache); + } + + return true; +} +EXPORT_SYMBOL(netfs_releasepage); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 7c079ca47b5b..ac2510f8cab0 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -29,6 +29,7 @@ atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; atomic_t netfs_n_wh_region; atomic_t netfs_n_wh_flush_group; +atomic_t netfs_n_wh_wreq; void netfs_stats_show(struct seq_file *m) { @@ -56,8 +57,9 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_write), atomic_read(&netfs_n_rh_write_done), atomic_read(&netfs_n_rh_write_failed)); - seq_printf(m, "WrHelp : R=%u F=%u\n", + seq_printf(m, "WrHelp : R=%u F=%u wr=%u\n", atomic_read(&netfs_n_wh_region), - atomic_read(&netfs_n_wh_flush_group)); + atomic_read(&netfs_n_wh_flush_group), + atomic_read(&netfs_n_wh_wreq)); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c new file mode 100644 index 000000000000..9fcb2ac50ebb --- /dev/null +++ b/fs/netfs/write_back.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level write support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include "internal.h" + +/* + * Process a write request. 
+ */ +static void netfs_writeback(struct netfs_write_request *wreq) +{ + kdebug("--- WRITE ---"); +} + +void netfs_writeback_worker(struct work_struct *work) +{ + struct netfs_write_request *wreq = + container_of(work, struct netfs_write_request, work); + + netfs_see_write_request(wreq, netfs_wreq_trace_see_work); + netfs_writeback(wreq); + netfs_put_write_request(wreq, false, netfs_wreq_trace_put_work); +} + +/* + * Flush some of the dirty queue. + */ +static int netfs_flush_dirty(struct address_space *mapping, + struct writeback_control *wbc, + struct netfs_range *range, + loff_t *next) +{ + struct netfs_dirty_region *p, *q; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + + kenter("%llx-%llx", range->start, range->end); + + spin_lock(&ctx->lock); + + /* Scan forwards to find dirty regions containing the suggested start + * point. + */ + list_for_each_entry_safe(p, q, &ctx->dirty_regions, dirty_link) { + _debug("D=%x %llx-%llx", p->debug_id, p->dirty.start, p->dirty.end); + if (p->dirty.end <= range->start) + continue; + if (p->dirty.start >= range->end) + break; + if (p->state != NETFS_REGION_IS_DIRTY) + continue; + if (test_bit(NETFS_REGION_FLUSH_Q, &p->flags)) + continue; + + netfs_flush_region(ctx, p, netfs_dirty_trace_flush_writepages); + } + + spin_unlock(&ctx->lock); + return 0; +} + +static int netfs_unlock_pages_iterator(struct page *page) +{ + unlock_page(page); + put_page(page); + return 0; +} + +/* + * Unlock all the pages in a range. + */ +static void netfs_unlock_pages(struct address_space *mapping, + pgoff_t start, pgoff_t end) +{ + netfs_iterate_pages(mapping, start, end, netfs_unlock_pages_iterator); +} + +static int netfs_lock_pages_iterator(struct xa_state *xas, + struct page *page, + struct netfs_write_request *wreq, + struct writeback_control *wbc) +{ + int ret; + + /* At this point we hold neither the i_pages lock nor the + * page lock: the page may be truncated or invalidated + * (changing page->mapping to NULL), or even swizzled + * back from swapper_space to tmpfs file mapping + */ + if (wbc->sync_mode != WB_SYNC_NONE) { + xas_pause(xas); + rcu_read_unlock(); + ret = lock_page_killable(page); + rcu_read_lock(); + } else { + if (!trylock_page(page)) + ret = -EBUSY; + } + + return ret; +} + +/* + * Lock all the pages in a range and add them to the write request. + */ +static int netfs_lock_pages(struct address_space *mapping, + struct writeback_control *wbc, + struct netfs_write_request *wreq) +{ + pgoff_t last = wreq->last; + int ret; + + kenter("%lx-%lx", wreq->first, wreq->last); + ret = netfs_iterate_get_pages(mapping, wreq->first, wreq->last, + netfs_lock_pages_iterator, wreq, wbc); + if (ret < 0) + goto failed; + + if (wreq->last < last) { + kdebug("Some pages missing %lx < %lx", wreq->last, last); + ret = -EIO; + goto failed; + } + + return 0; + +failed: + netfs_unlock_pages(mapping, wreq->first, wreq->last); + return ret; +} + +static int netfs_set_page_writeback(struct page *page) +{ + /* Now we need to clear the dirty flags on any page that's not shared + * with any other dirty region. + */ + if (!clear_page_dirty_for_io(page)) + BUG(); + + /* We set writeback unconditionally because a page may participate in + * more than one simultaneous writeback. + */ + set_page_writeback(page); + return 0; +} + +/* + * Extract a region to write back. 
+ */ +static struct netfs_dirty_region *netfs_extract_dirty_region( + struct netfs_i_context *ctx, + struct netfs_write_request *wreq) +{ + struct netfs_dirty_region *region = NULL, *spare; + + spare = netfs_alloc_dirty_region(); + if (!spare) + return NULL; + + spin_lock(&ctx->lock); + + if (list_empty(&ctx->flush_queue)) + goto out; + + region = list_first_entry(&ctx->flush_queue, + struct netfs_dirty_region, flush_link); + + wreq->region = netfs_get_dirty_region(ctx, region, netfs_region_trace_get_wreq); + wreq->start = region->dirty.start; + wreq->len = region->dirty.end - region->dirty.start; + wreq->first = region->dirty.start / PAGE_SIZE; + wreq->last = (region->dirty.end - 1) / PAGE_SIZE; + + /* TODO: Split the region if it's larger than a certain size. This is + * tricky as we need to observe page, crypto and compression block + * boundaries. The crypto/comp bounds are defined by ctx->bsize, but + * we don't know where the page boundaries are. + * + * All of these boundaries, however, must be pow-of-2 sized and + * pow-of-2 aligned, so they never partially overlap + */ + + smp_store_release(®ion->state, NETFS_REGION_IS_FLUSHING); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_flushing); + wake_up_var(®ion->state); + list_del_init(®ion->flush_link); + +out: + spin_unlock(&ctx->lock); + netfs_free_dirty_region(ctx, spare); + kleave(" = D=%x", region ? region->debug_id : 0); + return region; +} + +/* + * Schedule a write for the first region on the flush queue. + */ +static int netfs_begin_write(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct netfs_write_request *wreq; + struct netfs_dirty_region *region; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + int ret; + + wreq = netfs_alloc_write_request(mapping, false); + if (!wreq) + return -ENOMEM; + + ret = 0; + region = netfs_extract_dirty_region(ctx, wreq); + if (!region) + goto error; + + ret = netfs_lock_pages(mapping, wbc, wreq); + if (ret < 0) + goto error; + + trace_netfs_wreq(wreq); + + netfs_iterate_pages(mapping, wreq->first, wreq->last, + netfs_set_page_writeback); + netfs_unlock_pages(mapping, wreq->first, wreq->last); + iov_iter_xarray(&wreq->source, WRITE, &wreq->mapping->i_pages, + wreq->start, wreq->len); + + if (!queue_work(system_unbound_wq, &wreq->work)) + BUG(); + + kleave(" = %lu", wreq->last - wreq->first + 1); + return wreq->last - wreq->first + 1; + +error: + netfs_put_write_request(wreq, wbc->sync_mode != WB_SYNC_NONE, + netfs_wreq_trace_put_discard); + kleave(" = %d", ret); + return ret; +} + +/** + * netfs_writepages - Initiate writeback to the server and cache + * @mapping: The pagecache to write from + * @wbc: Hints from the VM as to what to write + * + * This is a helper intended to be called directly from a network filesystem's + * address space operations table to perform writeback to the server and the + * cache. + * + * We have to be careful as we can end up racing with setattr() truncating the + * pagecache since the caller doesn't take a lock here to prevent it. + */ +int netfs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct netfs_range range; + loff_t next; + int ret; + + kenter("%lx,%llx-%llx,%u,%c%c%c%c,%u,%u", + wbc->nr_to_write, + wbc->range_start, wbc->range_end, + wbc->sync_mode, + wbc->for_kupdate ? 'k' : '-', + wbc->for_background ? 'b' : '-', + wbc->for_reclaim ? 'r' : '-', + wbc->for_sync ? 
's' : '-', + wbc->tagged_writepages, + wbc->range_cyclic); + + //dump_stack(); + + if (wbc->range_cyclic) { + range.start = mapping->writeback_index * PAGE_SIZE; + range.end = ULLONG_MAX; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + if (range.start > 0 && wbc->nr_to_write > 0 && ret == 0) { + range.start = 0; + range.end = mapping->writeback_index * PAGE_SIZE; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + } + mapping->writeback_index = next / PAGE_SIZE; + } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { + range.start = 0; + range.end = ULLONG_MAX; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + if (wbc->nr_to_write > 0 && ret == 0) + mapping->writeback_index = next; + } else { + range.start = wbc->range_start; + range.end = wbc->range_end + 1; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + } + + if (ret == 0) + ret = netfs_begin_write(mapping, wbc); + + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_writepages); diff --git a/fs/netfs/xa_iterator.h b/fs/netfs/xa_iterator.h new file mode 100644 index 000000000000..3f37827f0f99 --- /dev/null +++ b/fs/netfs/xa_iterator.h @@ -0,0 +1,85 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* xarray iterator macros for netfslib. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +/* + * Iterate over a range of pages. xarray locks are not held over the iterator + * function, so it can sleep if necessary. The start and end positions are + * updated to indicate the span of pages actually processed. + */ +#define netfs_iterate_pages(MAPPING, START, END, ITERATOR, ...) \ + ({ \ + unsigned long __it_index; \ + struct page *page; \ + pgoff_t __it_start = (START); \ + pgoff_t __it_end = (END); \ + pgoff_t __it_tmp; \ + int ret = 0; \ + \ + (END) = __it_start; \ + xa_for_each_range(&(MAPPING)->i_pages, __it_index, page, \ + __it_start, __it_end) { \ + if (xa_is_value(page)) { \ + ret = -EIO; /* Not a real page. */ \ + break; \ + } \ + if (__it_index < (START)) \ + (START) = __it_index; \ + ret = ITERATOR(page, ##__VA_ARGS__); \ + if (ret < 0) \ + break; \ + __it_tmp = __it_index + thp_nr_pages(page) - 1; \ + if (__it_tmp > (END)) \ + (END) = __it_tmp; \ + } \ + ret; \ + }) + +/* + * Iterate over a set of pages, getting each one before calling the iteration + * function. The iteration function may drop the RCU read lock, but should + * call xas_pause() before it does so. The start and end positions are updated + * to indicate the span of pages actually processed. + */ +#define netfs_iterate_get_pages(MAPPING, START, END, ITERATOR, ...) 
\ + ({ \ + unsigned long __it_index; \ + struct page *page; \ + pgoff_t __it_start = (START); \ + pgoff_t __it_end = (END); \ + pgoff_t __it_tmp; \ + int ret = 0; \ + \ + XA_STATE(xas, &(MAPPING)->i_pages, __it_start); \ + (END) = __it_start; \ + rcu_read_lock(); \ + for (page = xas_load(&xas); page; page = xas_next_entry(&xas, __it_end)) { \ + if (xas_retry(&xas, page)) \ + continue; \ + if (xa_is_value(page)) \ + break; \ + if (!page_cache_get_speculative(page)) { \ + xas_reset(&xas); \ + continue; \ + } \ + if (unlikely(page != xas_reload(&xas))) { \ + put_page(page); \ + xas_reset(&xas); \ + continue; \ + } \ + __it_index = page_index(page); \ + if (__it_index < (START)) \ + (START) = __it_index; \ + ret = ITERATOR(&xas, page, ##__VA_ARGS__); \ + if (ret < 0) \ + break; \ + __it_tmp = __it_index + thp_nr_pages(page) - 1; \ + if (__it_tmp > (END)) \ + (END) = __it_tmp; \ + } \ + rcu_read_unlock(); \ + ret; \ + }) diff --git a/include/linux/netfs.h b/include/linux/netfs.h index fc91711d3178..9f874e7ed45a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -242,6 +242,35 @@ struct netfs_dirty_region { refcount_t ref; }; +/* + * Descriptor for a write request. This is used to manage the preparation and + * storage of a sequence of dirty data - its compression/encryption and its + * writing to one or more servers and the cache. + * + * The prepared data is buffered here. + */ +struct netfs_write_request { + struct work_struct work; + struct inode *inode; /* The file being accessed */ + struct address_space *mapping; /* The mapping being accessed */ + struct netfs_dirty_region *region; /* The region we're writing back */ + struct netfs_cache_resources cache_resources; + struct xarray buffer; /* Buffer for encrypted/compressed data */ + struct iov_iter source; /* The iterator to be used */ + struct list_head write_link; /* Link in i_context->write_requests */ + void *netfs_priv; /* Private data for the netfs */ + unsigned int debug_id; + short error; /* 0 or error that occurred */ + loff_t i_size; /* Size of the file */ + loff_t start; /* Start position */ + size_t len; /* Length of the request */ + pgoff_t first; /* First page included */ + pgoff_t last; /* Last page included */ + refcount_t usage; + unsigned long flags; + const struct netfs_request_ops *netfs_ops; +}; + enum netfs_write_compatibility { NETFS_WRITES_COMPATIBLE, /* Dirty regions can be directly merged */ NETFS_WRITES_SUPERSEDE, /* Second write can supersede the first without first @@ -275,6 +304,9 @@ struct netfs_request_ops { struct netfs_dirty_region *candidate); bool (*check_compatible_write)(struct netfs_dirty_region *region, struct file *file); void (*update_i_size)(struct file *file, loff_t i_size); + + /* Write request handling */ + void (*init_wreq)(struct netfs_write_request *wreq); }; /* @@ -324,6 +356,9 @@ extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, void **); extern ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from); +extern int netfs_writepages(struct address_space *mapping, struct writeback_control *wbc); +extern void netfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); +extern int netfs_releasepage(struct page *page, gfp_t gfp_flags); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 808433e6ddd3..e70abb5033e6 
100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -63,6 +63,8 @@ enum netfs_dirty_trace { netfs_dirty_trace_complete, netfs_dirty_trace_flush_conflict, netfs_dirty_trace_flush_dsync, + netfs_dirty_trace_flush_writepages, + netfs_dirty_trace_flushing, netfs_dirty_trace_merged_back, netfs_dirty_trace_merged_forw, netfs_dirty_trace_merged_sub, @@ -82,11 +84,20 @@ enum netfs_region_trace { netfs_region_trace_get_wreq, netfs_region_trace_put_discard, netfs_region_trace_put_merged, + netfs_region_trace_put_wreq, netfs_region_trace_put_write_iter, netfs_region_trace_free, netfs_region_trace_new, }; +enum netfs_wreq_trace { + netfs_wreq_trace_free, + netfs_wreq_trace_put_discard, + netfs_wreq_trace_put_work, + netfs_wreq_trace_see_work, + netfs_wreq_trace_new, +}; + #endif #define netfs_read_traces \ @@ -149,6 +160,8 @@ enum netfs_region_trace { EM(netfs_dirty_trace_complete, "COMPLETE ") \ EM(netfs_dirty_trace_flush_conflict, "FLSH CONFL") \ EM(netfs_dirty_trace_flush_dsync, "FLSH DSYNC") \ + EM(netfs_dirty_trace_flush_writepages, "WRITEPAGES") \ + EM(netfs_dirty_trace_flushing, "FLUSHING ") \ EM(netfs_dirty_trace_merged_back, "MERGE BACK") \ EM(netfs_dirty_trace_merged_forw, "MERGE FORW") \ EM(netfs_dirty_trace_merged_sub, "SUBSUMED ") \ @@ -167,10 +180,19 @@ enum netfs_region_trace { EM(netfs_region_trace_get_wreq, "GET WREQ ") \ EM(netfs_region_trace_put_discard, "PUT DISCARD") \ EM(netfs_region_trace_put_merged, "PUT MERGED ") \ + EM(netfs_region_trace_put_wreq, "PUT WREQ ") \ EM(netfs_region_trace_put_write_iter, "PUT WRITER ") \ EM(netfs_region_trace_free, "FREE ") \ E_(netfs_region_trace_new, "NEW ") +#define netfs_wreq_traces \ + EM(netfs_wreq_trace_free, "FREE ") \ + EM(netfs_wreq_trace_put_discard, "PUT DISCARD") \ + EM(netfs_wreq_trace_put_work, "PUT WORK ") \ + EM(netfs_wreq_trace_see_work, "SEE WORK ") \ + E_(netfs_wreq_trace_new, "NEW ") + + /* * Export enum symbols via userspace. 
*/ @@ -187,6 +209,7 @@ netfs_failures; netfs_region_types; netfs_region_states; netfs_dirty_traces; +netfs_wreq_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -435,6 +458,55 @@ TRACE_EVENT(netfs_dirty, ) ); +TRACE_EVENT(netfs_wreq, + TP_PROTO(struct netfs_write_request *wreq), + + TP_ARGS(wreq), + + TP_STRUCT__entry( + __field(unsigned int, wreq ) + __field(unsigned int, cookie ) + __field(loff_t, start ) + __field(size_t, len ) + ), + + TP_fast_assign( + __entry->wreq = wreq->debug_id; + __entry->cookie = wreq->cache_resources.debug_id; + __entry->start = wreq->start; + __entry->len = wreq->len; + ), + + TP_printk("W=%08x c=%08x s=%llx %zx", + __entry->wreq, + __entry->cookie, + __entry->start, __entry->len) + ); + +TRACE_EVENT(netfs_ref_wreq, + TP_PROTO(unsigned int wreq_debug_id, int ref, + enum netfs_wreq_trace what), + + TP_ARGS(wreq_debug_id, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, wreq ) + __field(int, ref ) + __field(enum netfs_wreq_trace, what ) + ), + + TP_fast_assign( + __entry->wreq = wreq_debug_id; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("W=%08x %s r=%u", + __entry->wreq, + __print_symbolic(__entry->what, netfs_wreq_traces), + __entry->ref) + ); + #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ From patchwork Wed Jul 21 13:47:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483384 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70EA9C6377A for ; Wed, 21 Jul 2021 13:47:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5AFFF60FF4 for ; Wed, 21 Jul 2021 13:47:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238793AbhGUNGy (ORCPT ); Wed, 21 Jul 2021 09:06:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:49083 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238827AbhGUNGi (ORCPT ); Wed, 21 Jul 2021 09:06:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875234; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7b17LEAqphmyTt8gMhglRAPvylMkn7ZjLScOd5X+QhQ=; b=dyoKYpj758H7f2b/XB+/Rug4dsf0/hd3uY9dH48tyDhSlM/wRvD4aRe4PicgYAQm9RfFbX t6PveiW+jYGY1vEI3YKUjE7pnwGk3Wmg8KY/A/HMScY08EY1SGAyakUfD/7w4g9KfRldaZ yy6hSOaHKEdJpfER1gJsboktqIOZNkY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-479-LqENnw8hMdOglizr9-Q89w-1; Wed, 21 Jul 2021 09:47:13 -0400 X-MC-Unique: LqENnw8hMdOglizr9-Q89w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA 
(256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1C0C1800581; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 675D85D6D1; Wed, 21 Jul 2021 13:47:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 11/12] netfs: Put a list of regions in /proc/fs/netfs/regions From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:47:01 +0100 Message-ID: <162687522190.276387.10953470388038836276.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org --- fs/netfs/Makefile | 1 fs/netfs/internal.h | 24 +++++++++++ fs/netfs/main.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++ fs/netfs/objects.c | 6 ++- fs/netfs/write_helper.c | 4 ++ include/linux/netfs.h | 1 6 files changed, 139 insertions(+), 1 deletion(-) create mode 100644 fs/netfs/main.c diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index a7c3a9173ac0..62dad3d7bea0 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 netfs-y := \ + main.o \ objects.o \ read_helper.o \ write_back.o \ diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 381ca64062eb..a9ec6591f90a 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -22,6 +22,30 @@ ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, struct kiocb *iocb, struct iov_iter *from); +/* + * main.c + */ +extern struct list_head netfs_regions; +extern spinlock_t netfs_regions_lock; + +#ifdef CONFIG_PROC_FS +static inline void netfs_proc_add_region(struct netfs_dirty_region *region) +{ + spin_lock(&netfs_regions_lock); + list_add_tail_rcu(®ion->proc_link, &netfs_regions); + spin_unlock(&netfs_regions_lock); +} +static inline void netfs_proc_del_region(struct netfs_dirty_region *region) +{ + spin_lock(&netfs_regions_lock); + list_del_rcu(®ion->proc_link); + spin_unlock(&netfs_regions_lock); +} +#else +static inline void netfs_proc_add_region(struct netfs_dirty_region *region) {} +static inline void netfs_proc_del_region(struct netfs_dirty_region *region) {} +#endif + /* * objects.c */ diff --git a/fs/netfs/main.c b/fs/netfs/main.c new file mode 100644 index 000000000000..125b570efefd --- /dev/null +++ b/fs/netfs/main.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem library. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include "internal.h" + +#ifdef CONFIG_PROC_FS +LIST_HEAD(netfs_regions); +DEFINE_SPINLOCK(netfs_regions_lock); + +static const char netfs_proc_region_states[] = "PRADFC"; +static const char *netfs_proc_region_types[] = { + [NETFS_REGION_ORDINARY] = "ORD ", + [NETFS_REGION_DIO] = "DIOW", + [NETFS_REGION_DSYNC] = "DSYN", +}; + +/* + * Generate a list of regions in /proc/fs/netfs/regions + */ +static int netfs_regions_seq_show(struct seq_file *m, void *v) +{ + struct netfs_dirty_region *region; + + if (v == &netfs_regions) { + seq_puts(m, + "REGION REF TYPE S FL DEV INODE DIRTY, BOUNDS, RESV\n" + "======== === ==== = == ===== ======== ==============================\n" + ); + return 0; + } + + region = list_entry(v, struct netfs_dirty_region, proc_link); + seq_printf(m, + "%08x %3d %s %c %2lx %02x:%02x %8x %04llx-%04llx %04llx-%04llx %04llx-%04llx\n", + region->debug_id, + refcount_read(®ion->ref), + netfs_proc_region_types[region->type], + netfs_proc_region_states[region->state], + region->flags, + 0, 0, 0, + region->dirty.start, region->dirty.end, + region->bounds.start, region->bounds.end, + region->reserved.start, region->reserved.end); + return 0; +} + +static void *netfs_regions_seq_start(struct seq_file *m, loff_t *_pos) + __acquires(rcu) +{ + rcu_read_lock(); + return seq_list_start_head(&netfs_regions, *_pos); +} + +static void *netfs_regions_seq_next(struct seq_file *m, void *v, loff_t *_pos) +{ + return seq_list_next(v, &netfs_regions, _pos); +} + +static void netfs_regions_seq_stop(struct seq_file *m, void *v) + __releases(rcu) +{ + rcu_read_unlock(); +} + +const struct seq_operations netfs_regions_seq_ops = { + .start = netfs_regions_seq_start, + .next = netfs_regions_seq_next, + .stop = netfs_regions_seq_stop, + .show = netfs_regions_seq_show, +}; +#endif /* CONFIG_PROC_FS */ + +static int __init netfs_init(void) +{ + if (!proc_mkdir("fs/netfs", NULL)) + goto error; + + if (!proc_create_seq("fs/netfs/regions", S_IFREG | 0444, NULL, + &netfs_regions_seq_ops)) + goto error_proc; + + return 0; + +error_proc: + remove_proc_entry("fs/netfs", NULL); +error: + return -ENOMEM; +} +fs_initcall(netfs_init); + +static void __exit netfs_exit(void) +{ + remove_proc_entry("fs/netfs", NULL); +} +module_exit(netfs_exit); diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index 8926b4230d91..1149f12ca8c9 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -60,8 +60,10 @@ struct netfs_dirty_region *netfs_alloc_dirty_region(void) struct netfs_dirty_region *region; region = kzalloc(sizeof(struct netfs_dirty_region), GFP_KERNEL); - if (region) + if (region) { + INIT_LIST_HEAD(®ion->proc_link); netfs_stat(&netfs_n_wh_region); + } return region; } @@ -81,6 +83,8 @@ void netfs_free_dirty_region(struct netfs_i_context *ctx, { if (region) { trace_netfs_ref_region(region->debug_id, 0, netfs_region_trace_free); + if (!list_empty(®ion->proc_link)) + netfs_proc_del_region(region); if (ctx->ops->free_dirty_region) ctx->ops->free_dirty_region(region); netfs_put_flush_group(region->group); diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c index fa048e3882ea..b1fe2d4c0df6 100644 --- a/fs/netfs/write_helper.c +++ b/fs/netfs/write_helper.c @@ -86,10 +86,13 @@ static void netfs_init_dirty_region(struct netfs_dirty_region *region, group = list_last_entry(&ctx->flush_groups, struct netfs_flush_group, group_link); region->group = netfs_get_flush_group(group); + spin_lock(&ctx->lock); 
list_add_tail(®ion->flush_link, &group->region_list); + spin_unlock(&ctx->lock); } trace_netfs_ref_region(region->debug_id, 1, netfs_region_trace_new); trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_new); + netfs_proc_add_region(region); } /* @@ -198,6 +201,7 @@ static struct netfs_dirty_region *netfs_split_dirty_region( list_add(&tail->dirty_link, ®ion->dirty_link); list_add(&tail->flush_link, ®ion->flush_link); trace_netfs_dirty(ctx, tail, region, netfs_dirty_trace_split); + netfs_proc_add_region(tail); return tail; } diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 6acf3fb170c3..43d195badb0d 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -228,6 +228,7 @@ enum netfs_region_type { */ struct netfs_dirty_region { struct netfs_flush_group *group; + struct list_head proc_link; /* Link in /proc/fs/netfs/regions */ struct list_head active_link; /* Link in i_context->pending/active_writes */ struct list_head dirty_link; /* Link in i_context->dirty_regions */ struct list_head flush_link; /* Link in group->region_list or From patchwork Wed Jul 21 18:42:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 483383 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81EECC636CA for ; Wed, 21 Jul 2021 18:42:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65BBF60FF1 for ; Wed, 21 Jul 2021 18:42:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239059AbhGUSBn (ORCPT ); Wed, 21 Jul 2021 14:01:43 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:42984 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238936AbhGUSBm (ORCPT ); Wed, 21 Jul 2021 14:01:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626892938; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ev/WKl0u0g6blajjg/aIie6Q9S6Qwmj0bAxvObPEahw=; b=YaU16PYm3JY+eswLxPvn6f5y1NQyQtvXbLgAXsdUsPIJWFreGsB2QBtM9Eh8TuWPB2jm2O UH19MkxRJhFACSgbBTx0Ugrj4oxrJpVsuBWYk/5WKVyO8CvJhUkfu8gmI/tfGQr9dMIDV5 kAHPl+w+e5gwljEeFxt0NZ3z7Svey4c= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-543-buZuGVf6OgCQ8QBChvYCsA-1; Wed, 21 Jul 2021 14:42:14 -0400 X-MC-Unique: buZuGVf6OgCQ8QBChvYCsA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2C47B801B0A; Wed, 21 Jul 2021 18:42:12 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by 
smtp.corp.redhat.com (Postfix) with ESMTP id 843381970E; Wed, 21 Jul 2021 18:42:04 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH 13/12] netfs: Do copy-to-cache-on-read through VM writeback MIME-Version: 1.0 Content-ID: <297201.1626892923.1@warthog.procyon.org.uk> Date: Wed, 21 Jul 2021 19:42:03 +0100 Message-ID: <297202.1626892923@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org When data is read from the server and intended to be copied to the cache, offload the cache write to the VM writeback mechanism rather than scheduling it immediately. This allows the downloaded data to be superseded by local changes before it is written to the cache and means that we no longer need to use the PG_fscache flag. This is done by the following means: (1) The pages just downloaded into are marked dirty in netfs_rreq_unlock(). (2) A region of NETFS_REGION_CACHE_COPY type is added to the dirty region list. (3) If a region-to-be-modified overlaps the cache-copy region, the modifications supersede the download, moving the end marker over in netfs_merge_dirty_region(). (4) We don't really want to supersede in the middle of a region, so we may split a pristine region so that we can supersede forwards only. (5) We mark regions we're going to supersede with NETFS_REGION_SUPERSEDED to prevent them getting merged whilst we're superseding them. This flag is cleared when we're done and we may merge afterwards. (6) Adjacent download regions are potentially mergeable. (7) When being flushed, CACHE_COPY regions are intended only to be written to the cache, not the server, though they may contribute data to a cross-page chunk that has to be encrypted or compressed and sent to the server. 
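The supersede behaviour in (3)-(5) comes down to a simple rule added to netfs_write_compatibility(): when the existing region is of type NETFS_REGION_CACHE_COPY, another cache-only copy just merges with it, whereas any other write supersedes it so the new data goes to the server as well as the cache.  A condensed, illustrative-only rendering of that rule (the example_* helper is not part of the patch; the caller is assumed to have already checked that the old region is a cache copy):

	static enum netfs_write_compatibility
	example_ccopy_rule(const struct netfs_dirty_region *candidate)
	{
		/* Two copies of downloaded data destined only for the cache
		 * can simply be merged.
		 */
		if (candidate->type == NETFS_REGION_CACHE_COPY)
			return NETFS_WRITES_COMPATIBLE;

		/* A local modification supersedes the downloaded data, so it
		 * is written to the server as well as the cache.
		 */
		return NETFS_WRITES_SUPERSEDE;
	}

Point (7) is then handled at flush time: netfs_writeback() skips ->add_write_streams() for NETFS_REGION_CACHE_COPY regions, so such a region is written to the cache only.
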
Signed-off-by: David Howells --- fs/netfs/internal.h | 4 -- fs/netfs/main.c | 1 fs/netfs/read_helper.c | 126 ++-------------------------------------------------------------- fs/netfs/stats.c | 7 --- fs/netfs/write_back.c | 3 + fs/netfs/write_helper.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/netfs.h | 2 - include/trace/events/netfs.h | 3 + mm/filemap.c | 4 +- 9 files changed, 125 insertions(+), 137 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 6ae1eb55093a..ee83b81e4682 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -98,6 +98,7 @@ void netfs_writeback_worker(struct work_struct *work); void netfs_flush_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_dirty_trace why); +void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq); /* * write_prep.c @@ -121,10 +122,7 @@ extern atomic_t netfs_n_rh_read_done; extern atomic_t netfs_n_rh_read_failed; extern atomic_t netfs_n_rh_zero; extern atomic_t netfs_n_rh_short_read; -extern atomic_t netfs_n_rh_write; extern atomic_t netfs_n_rh_write_begin; -extern atomic_t netfs_n_rh_write_done; -extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; extern atomic_t netfs_n_wh_region; extern atomic_t netfs_n_wh_flush_group; diff --git a/fs/netfs/main.c b/fs/netfs/main.c index 125b570efefd..ad204dcbb5f7 100644 --- a/fs/netfs/main.c +++ b/fs/netfs/main.c @@ -21,6 +21,7 @@ static const char *netfs_proc_region_types[] = { [NETFS_REGION_ORDINARY] = "ORD ", [NETFS_REGION_DIO] = "DIOW", [NETFS_REGION_DSYNC] = "DSYN", + [NETFS_REGION_CACHE_COPY] = "CCPY", }; /* diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index e5c636acc756..7fa677d4c9ca 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -212,124 +212,6 @@ void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) netfs_put_read_request(rreq, was_async); } -/* - * Deal with the completion of writing the data to the cache. We have to clear - * the PG_fscache bits on the pages involved and release the caller's ref. - * - * May be called in softirq mode and we inherit a ref from the caller. - */ -static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, - bool was_async) -{ - struct netfs_read_subrequest *subreq; - struct page *page; - pgoff_t unlocked = 0; - bool have_unlocked = false; - - rcu_read_lock(); - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); - - xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) { - /* We might have multiple writes from the same huge - * page, but we mustn't unlock a page more than once. - */ - if (have_unlocked && page->index <= unlocked) - continue; - unlocked = page->index; - end_page_fscache(page); - have_unlocked = true; - } - } - - rcu_read_unlock(); - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_read_subrequest *subreq = priv; - struct netfs_read_request *rreq = subreq->rreq; - - if (IS_ERR_VALUE(transferred_or_error)) { - netfs_stat(&netfs_n_rh_write_failed); - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_copy_to_cache); - } else { - netfs_stat(&netfs_n_rh_write_done); - } - - trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); - - /* If we decrement nr_wr_ops to 0, the ref belongs to us. 
*/ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, was_async); - - netfs_put_subrequest(subreq, was_async); -} - -/* - * Perform any outstanding writes to the cache. We inherit a ref from the - * caller. - */ -static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - struct netfs_read_subrequest *subreq, *next, *p; - struct iov_iter iter; - int ret; - - trace_netfs_rreq(rreq, netfs_rreq_trace_write); - - /* We don't want terminating writes trying to wake us up whilst we're - * still going through the list. - */ - atomic_inc(&rreq->nr_wr_ops); - - list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { - if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { - list_del_init(&subreq->rreq_link); - netfs_put_subrequest(subreq, false); - } - } - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - /* Amalgamate adjacent writes */ - while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - next = list_next_entry(subreq, rreq_link); - if (next->start != subreq->start + subreq->len) - break; - subreq->len += next->len; - list_del_init(&next->rreq_link); - netfs_put_subrequest(next, false); - } - - ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, - rreq->i_size); - if (ret < 0) { - trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); - trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); - continue; - } - - iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, - subreq->start, subreq->len); - - atomic_inc(&rreq->nr_wr_ops); - netfs_stat(&netfs_n_rh_write); - netfs_get_read_subrequest(subreq); - trace_netfs_sreq(subreq, netfs_sreq_trace_write); - cres->ops->write(cres, subreq->start, &iter, - netfs_rreq_copy_terminated, subreq); - } - - /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. 
*/ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, false); -} - static void netfs_rreq_write_to_cache_work(struct work_struct *work) { struct netfs_read_request *rreq = @@ -390,19 +272,19 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) xas_for_each(&xas, page, last_page) { unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; unsigned int pgend = pgpos + thp_size(page); - bool pg_failed = false; + bool pg_failed = false, caching; for (;;) { if (!subreq) { pg_failed = true; break; } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) - set_page_fscache(page); pg_failed |= subreq_failed; if (pgend < iopos + subreq->len) break; + caching = test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + account += subreq->len - iov_iter_count(&subreq->iter); iopos += subreq->len; if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { @@ -420,6 +302,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) for (i = 0; i < thp_nr_pages(page); i++) flush_dcache_page(page); SetPageUptodate(page); + if (caching) + set_page_dirty(page); } if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) { diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index a02d95bba158..414c2fca6b23 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -22,10 +22,7 @@ atomic_t netfs_n_rh_read_done; atomic_t netfs_n_rh_read_failed; atomic_t netfs_n_rh_zero; atomic_t netfs_n_rh_short_read; -atomic_t netfs_n_rh_write; atomic_t netfs_n_rh_write_begin; -atomic_t netfs_n_rh_write_done; -atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; atomic_t netfs_n_wh_region; atomic_t netfs_n_wh_flush_group; @@ -59,10 +56,6 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_read), atomic_read(&netfs_n_rh_read_done), atomic_read(&netfs_n_rh_read_failed)); - seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n", - atomic_read(&netfs_n_rh_write), - atomic_read(&netfs_n_rh_write_done), - atomic_read(&netfs_n_rh_write_failed)); seq_printf(m, "WrHelp : R=%u F=%u wr=%u\n", atomic_read(&netfs_n_wh_region), atomic_read(&netfs_n_wh_flush_group), diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c index 7363c3324602..4433c3121435 100644 --- a/fs/netfs/write_back.c +++ b/fs/netfs/write_back.c @@ -263,7 +263,8 @@ static void netfs_writeback(struct netfs_write_request *wreq) if (test_bit(NETFS_WREQ_WRITE_TO_CACHE, &wreq->flags)) netfs_set_up_write_to_cache(wreq); - ctx->ops->add_write_streams(wreq); + if (wreq->region->type != NETFS_REGION_CACHE_COPY) + ctx->ops->add_write_streams(wreq); out: if (atomic_dec_and_test(&wreq->outstanding)) diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c index b1fe2d4c0df6..5e50b01527fb 100644 --- a/fs/netfs/write_helper.c +++ b/fs/netfs/write_helper.c @@ -80,6 +80,11 @@ static void netfs_init_dirty_region(struct netfs_dirty_region *region, INIT_LIST_HEAD(®ion->flush_link); refcount_set(®ion->ref, 1); spin_lock_init(®ion->lock); + if (type == NETFS_REGION_CACHE_COPY) { + region->state = NETFS_REGION_IS_DIRTY; + region->dirty.end = end; + } + if (file && ctx->ops->init_dirty_region) ctx->ops->init_dirty_region(region, file); if (!region->group) { @@ -160,6 +165,19 @@ static enum netfs_write_compatibility netfs_write_compatibility( return NETFS_WRITES_INCOMPATIBLE; } + /* Pending writes to the cache alone (ie. copy from a read) can be + * merged or superseded by a modification that will require writing to + * the server too. 
+ */ + if (old->type == NETFS_REGION_CACHE_COPY) { + if (candidate->type == NETFS_REGION_CACHE_COPY) { + kleave(" = COMPT [ccopy]"); + return NETFS_WRITES_COMPATIBLE; + } + kleave(" = SUPER [ccopy]"); + return NETFS_WRITES_SUPERSEDE; + } + if (!ctx->ops->is_write_compatible) { if (candidate->type == NETFS_REGION_DSYNC) { kleave(" = SUPER [dsync]"); @@ -220,8 +238,11 @@ static void netfs_queue_write(struct netfs_i_context *ctx, if (overlaps(&candidate->bounds, &r->bounds)) { if (overlaps(&candidate->reserved, &r->reserved) || netfs_write_compatibility(ctx, r, candidate) == - NETFS_WRITES_INCOMPATIBLE) + NETFS_WRITES_INCOMPATIBLE) { + kdebug("conflict %x with pend %x", + candidate->debug_id, r->debug_id); goto add_to_pending_queue; + } } } @@ -238,8 +259,11 @@ static void netfs_queue_write(struct netfs_i_context *ctx, if (overlaps(&candidate->bounds, &r->bounds)) { if (overlaps(&candidate->reserved, &r->reserved) || netfs_write_compatibility(ctx, r, candidate) == - NETFS_WRITES_INCOMPATIBLE) + NETFS_WRITES_INCOMPATIBLE) { + kdebug("conflict %x with actv %x", + candidate->debug_id, r->debug_id); goto add_to_pending_queue; + } } } @@ -451,6 +475,9 @@ static void netfs_merge_dirty_region(struct netfs_i_context *ctx, goto discard; } goto scan_backwards; + + case NETFS_REGION_CACHE_COPY: + goto scan_backwards; } scan_backwards: @@ -922,3 +949,84 @@ ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) goto out; } EXPORT_SYMBOL(netfs_file_write_iter); + +/* + * Add a region that's just been read as a region on the dirty list to + * schedule a write to the cache. + */ +static bool netfs_copy_to_cache(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + struct netfs_dirty_region *candidate, *r; + struct netfs_i_context *ctx = netfs_i_context(rreq->inode); + struct list_head *p; + loff_t end = subreq->start + subreq->len; + int ret; + + ret = netfs_require_flush_group(rreq->inode); + if (ret < 0) + return false; + + candidate = netfs_alloc_dirty_region(); + if (!candidate) + return false; + + netfs_init_dirty_region(candidate, rreq->inode, NULL, + NETFS_REGION_CACHE_COPY, 0, subreq->start, end); + + spin_lock(&ctx->lock); + + /* Find a place to insert. There can't be any dirty regions + * overlapping with the region we're adding. + */ + list_for_each(p, &ctx->dirty_regions) { + r = list_entry(p, struct netfs_dirty_region, dirty_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + } + + list_add_tail(&candidate->dirty_link, p); + netfs_merge_dirty_region(ctx, candidate); + + spin_unlock(&ctx->lock); + return true; +} + +/* + * If we downloaded some data and it now needs writing to the cache, we add it + * to the dirty region list and let that flush it. This way it can get merged + * with writes. + * + * We inherit a ref from the caller. 
+ */ +void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq, *next, *p; + + trace_netfs_rreq(rreq, netfs_rreq_trace_write); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start != subreq->start + subreq->len) + break; + subreq->len += next->len; + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false); + } + + netfs_copy_to_cache(rreq, subreq); + } + + netfs_rreq_completed(rreq, false); +} diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 43d195badb0d..527f08eb4898 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -145,7 +145,6 @@ struct netfs_read_request { void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ - atomic_t nr_wr_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -218,6 +217,7 @@ enum netfs_region_type { NETFS_REGION_ORDINARY, /* Ordinary write */ NETFS_REGION_DIO, /* Direct I/O write */ NETFS_REGION_DSYNC, /* O_DSYNC/RWF_DSYNC write */ + NETFS_REGION_CACHE_COPY, /* Data to be written to cache only */ } __attribute__((mode(byte))); /* diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index aa002725b209..136cc42263f9 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -156,7 +156,8 @@ enum netfs_write_stream_trace { #define netfs_region_types \ EM(NETFS_REGION_ORDINARY, "ORD") \ EM(NETFS_REGION_DIO, "DIO") \ - E_(NETFS_REGION_DSYNC, "DSY") + EM(NETFS_REGION_DSYNC, "DSY") \ + E_(NETFS_REGION_CACHE_COPY, "CCP") #define netfs_region_states \ EM(NETFS_REGION_IS_PENDING, "pend") \ diff --git a/mm/filemap.c b/mm/filemap.c index d1458ecf2f51..442cd767a047 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1545,8 +1545,10 @@ void end_page_writeback(struct page *page) * reused before the wake_up_page(). */ get_page(page); - if (!test_clear_page_writeback(page)) + if (!test_clear_page_writeback(page)) { + pr_err("Page %lx doesn't have wb set\n", page->index); BUG(); + } smp_mb__after_atomic(); wake_up_page(page, PG_writeback);