From patchwork Wed Mar 10 16:54:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396830 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D186C43381 for ; Wed, 10 Mar 2021 16:55:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E51CB64FCE for ; Wed, 10 Mar 2021 16:55:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233399AbhCJQyz (ORCPT ); Wed, 10 Mar 2021 11:54:55 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:54731 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229950AbhCJQys (ORCPT ); Wed, 10 Mar 2021 11:54:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395288; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A1kUfhyDJENd62dFanXyJmXmUPU5LOfWFPVlFCBfXuY=; b=b73bu1hb2P4FallXrGSKT42Bnr3/EA9xXtn4/cuNK6j32NwPgEiEgNlaIhB1021vaxF8J9 +JIzuvVEQgrwSHmudzSw+xrjJPtph+O6CBYMqe0kjDx9ngKAxtCa7TTGovlZQuEeCidsvf vWyATbSf/3Mp9oqqi9SQSJWiwq2X4sk= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-425-fKa2XOQ8P-uqTZcPTuM4MA-1; Wed, 10 Mar 2021 11:54:46 -0500 X-MC-Unique: fKa2XOQ8P-uqTZcPTuM4MA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DB738192FDA9; Wed, 10 Mar 2021 16:54:43 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 058B75D6D7; Wed, 10 Mar 2021 16:54:38 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 01/28] iov_iter: Add ITER_XARRAY From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Alexander Viro , "Matthew Wilcox (Oracle)" , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:54:38 +0000 Message-ID: <161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified amount of bytes. The iterator supports transparent huge pages. The iterate_xarray() macro calls the helper function with rcu_access() helped. I think that this is only a problem for iov_iter_for_each_range() - and that returns an error for ITER_XARRAY (also, this function does not appear to be called). The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed. This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O. Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages. Note that older versions of this patch implemented an ITER_MAPPING instead, which was almost the same. Signed-off-by: David Howells cc: Alexander Viro cc: Matthew Wilcox (Oracle) cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/uio.h | 11 ++ lib/iov_iter.c | 313 +++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 301 insertions(+), 23 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 27ff8eb786dc..5f5ffc45d4aa 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -10,6 +10,7 @@ #include struct page; +struct address_space; struct pipe_inode_info; struct kvec { @@ -24,6 +25,7 @@ enum iter_type { ITER_BVEC = 16, ITER_PIPE = 32, ITER_DISCARD = 64, + ITER_XARRAY = 128, }; struct iov_iter { @@ -39,6 +41,7 @@ struct iov_iter { const struct iovec *iov; const struct kvec *kvec; const struct bio_vec *bvec; + struct xarray *xarray; struct pipe_inode_info *pipe; }; union { @@ -47,6 +50,7 @@ struct iov_iter { unsigned int head; unsigned int start_head; }; + loff_t xarray_start; }; }; @@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i) return iov_iter_type(i) == ITER_DISCARD; } +static inline bool iov_iter_is_xarray(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_XARRAY; +} + static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->type & (READ | WRITE); @@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, size_t count); void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, + loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f66c62aa7154..f808c625c11e 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -76,7 +76,44 @@ } \ } -#define iterate_all_kinds(i, n, v, I, B, K) { \ +#define iterate_xarray(i, n, __v, skip, STEP) { \ + struct page *head = NULL; \ + size_t wanted = n, seg, offset; \ + loff_t start = i->xarray_start + skip; \ + pgoff_t index = start >> PAGE_SHIFT; \ + int j; \ + \ + XA_STATE(xas, i->xarray, index); \ + \ + rcu_read_lock(); \ + xas_for_each(&xas, head, ULONG_MAX) { \ + if (xas_retry(&xas, head)) \ + continue; \ + if (WARN_ON(xa_is_value(head))) \ + break; \ + if (WARN_ON(PageHuge(head))) \ + break; \ + for (j = (head->index < index) ? index - head->index : 0; \ + j < thp_nr_pages(head); j++) { \ + __v.bv_page = head + j; \ + offset = (i->xarray_start + skip) & ~PAGE_MASK; \ + seg = PAGE_SIZE - offset; \ + __v.bv_offset = offset; \ + __v.bv_len = min(n, seg); \ + (void)(STEP); \ + n -= __v.bv_len; \ + skip += __v.bv_len; \ + if (n == 0) \ + break; \ + } \ + if (n == 0) \ + break; \ + } \ + rcu_read_unlock(); \ + n = wanted - n; \ +} + +#define iterate_all_kinds(i, n, v, I, B, K, X) { \ if (likely(n)) { \ size_t skip = i->iov_offset; \ if (unlikely(i->type & ITER_BVEC)) { \ @@ -88,6 +125,9 @@ struct kvec v; \ iterate_kvec(i, n, v, kvec, skip, (K)) \ } else if (unlikely(i->type & ITER_DISCARD)) { \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)); \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -96,7 +136,7 @@ } \ } -#define iterate_and_advance(i, n, v, I, B, K) { \ +#define iterate_and_advance(i, n, v, I, B, K, X) { \ if (unlikely(i->count < n)) \ n = i->count; \ if (i->count) { \ @@ -121,6 +161,9 @@ i->kvec = kvec; \ } else if (unlikely(i->type & ITER_DISCARD)) { \ skip += n; \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)) \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -622,7 +665,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), memcpy_to_page(v.bv_page, v.bv_offset, (from += v.bv_len) - v.bv_len, v.bv_len), - memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len) + memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), + memcpy_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len) ) return bytes; @@ -738,6 +783,16 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i) bytes = curr_addr - s_addr - rem; return bytes; } + }), + ({ + rem = copy_mc_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len); + if (rem) { + curr_addr = (unsigned long) from; + bytes = curr_addr - s_addr - rem; + rcu_read_unlock(); + return bytes; + } }) ) @@ -759,7 +814,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -785,7 +842,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -805,7 +864,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -840,7 +901,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base, - v.iov_len) + v.iov_len), + memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -864,7 +927,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -901,7 +966,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, { if (unlikely(!page_copy_sane(page, offset, bytes))) return 0; - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = copy_to_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -924,7 +989,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, WARN_ON(1); return 0; } - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = _copy_from_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -968,7 +1033,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i) iterate_and_advance(i, bytes, v, clear_user(v.iov_base, v.iov_len), memzero_page(v.bv_page, v.bv_offset, v.bv_len), - memset(v.iov_base, 0, v.iov_len) + memset(v.iov_base, 0, v.iov_len), + memzero_page(v.bv_page, v.bv_offset, v.bv_len) ) return bytes; @@ -992,7 +1058,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) kunmap_atomic(kaddr); return bytes; @@ -1078,11 +1146,16 @@ void iov_iter_advance(struct iov_iter *i, size_t size) i->count -= size; return; } + if (unlikely(iov_iter_is_xarray(i))) { + i->iov_offset += size; + i->count -= size; + return; + } if (iov_iter_is_bvec(i)) { iov_iter_bvec_advance(i, size); return; } - iterate_and_advance(i, size, v, 0, 0, 0) + iterate_and_advance(i, size, v, 0, 0, 0, 0) } EXPORT_SYMBOL(iov_iter_advance); @@ -1126,7 +1199,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) return; } unroll -= i->iov_offset; - if (iov_iter_is_bvec(i)) { + if (iov_iter_is_xarray(i)) { + BUG(); /* We should never go beyond the start of the specified + * range since we might then be straying into pages that + * aren't pinned. + */ + } else if (iov_iter_is_bvec(i)) { const struct bio_vec *bvec = i->bvec; while (1) { size_t n = (--bvec)->bv_len; @@ -1163,9 +1241,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i) return i->count; // it is a silly place, anyway if (i->nr_segs == 1) return i->count; - if (unlikely(iov_iter_is_discard(i))) + if (unlikely(iov_iter_is_discard(i) || iov_iter_is_xarray(i))) return i->count; - else if (iov_iter_is_bvec(i)) + if (iov_iter_is_bvec(i)) return min(i->count, i->bvec->bv_len - i->iov_offset); else return min(i->count, i->iov->iov_len - i->iov_offset); @@ -1213,6 +1291,31 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_pipe); +/** + * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray + * @i: The iterator to initialise. + * @direction: The direction of the transfer. + * @xarray: The xarray to access. + * @start: The start file position. + * @count: The size of the I/O buffer in bytes. + * + * Set up an I/O iterator to either draw data out of the pages attached to an + * inode or to inject data into those pages. The pages *must* be prevented + * from evaporation, either by taking a ref on them or locking them by the + * caller. + */ +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, + struct xarray *xarray, loff_t start, size_t count) +{ + BUG_ON(direction & ~1); + i->type = ITER_XARRAY | (direction & (READ | WRITE)); + i->xarray = xarray; + i->xarray_start = start; + i->count = count; + i->iov_offset = 0; +} +EXPORT_SYMBOL(iov_iter_xarray); + /** * iov_iter_discard - Initialise an I/O iterator that discards data * @i: The iterator to initialise. @@ -1246,7 +1349,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i) iterate_all_kinds(i, size, v, (res |= (unsigned long)v.iov_base | v.iov_len, 0), res |= v.bv_offset | v.bv_len, - res |= (unsigned long)v.iov_base | v.iov_len + res |= (unsigned long)v.iov_base | v.iov_len, + res |= v.bv_offset | v.bv_len ) return res; } @@ -1268,7 +1372,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i) (res |= (!res ? 0 : (unsigned long)v.bv_offset) | (size != v.bv_len ? size : 0)), (res |= (!res ? 0 : (unsigned long)v.iov_base) | - (size != v.iov_len ? size : 0)) + (size != v.iov_len ? size : 0)), + (res |= (!res ? 0 : (unsigned long)v.bv_offset) | + (size != v.bv_len ? size : 0)) ); return res; } @@ -1318,6 +1424,75 @@ static ssize_t pipe_get_pages(struct iov_iter *i, return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start); } +static ssize_t iter_xarray_copy_pages(struct page **pages, struct xarray *xa, + pgoff_t index, unsigned int nr_pages) +{ + XA_STATE(xas, xa, index); + struct page *page; + unsigned int ret = 0; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + pages[ret] = find_subpage(page, xas.xa_index); + get_page(pages[ret]); + if (++ret == nr_pages) + break; + } + rcu_read_unlock(); + return ret; +} + +static ssize_t iter_xarray_get_pages(struct iov_iter *i, + struct page **pages, size_t maxsize, + unsigned maxpages, size_t *_start_offset) +{ + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size || !maxpages) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + if (count > maxpages) + count = maxpages; + + nr = iter_xarray_copy_pages(pages, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start) @@ -1327,6 +1502,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages(i, pages, maxsize, maxpages, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1353,7 +1530,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), + 0 ) return 0; } @@ -1397,6 +1575,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, return n; } +static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i, + struct page ***pages, size_t maxsize, + size_t *_start_offset) +{ + struct page **p; + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + p = get_pages_array(count); + if (!p) + return -ENOMEM; + *pages = p; + + nr = iter_xarray_copy_pages(p, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start) @@ -1408,6 +1631,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages_alloc(i, pages, maxsize, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages_alloc(i, pages, maxsize, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1440,7 +1665,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), 0 ) return 0; } @@ -1478,6 +1703,13 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1519,6 +1751,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1565,6 +1804,13 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate, (from += v.iov_len) - v.iov_len, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy(p + v.bv_offset, + (from += v.bv_len) - v.bv_len, + v.bv_len, sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) csstate->csum = sum; @@ -1615,6 +1861,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) npages = pipe_space_for_user(iter_head, pipe->tail, pipe); if (npages >= maxpages) return maxpages; + } else if (unlikely(iov_iter_is_xarray(i))) { + unsigned offset; + + offset = (i->xarray_start + i->iov_offset) & ~PAGE_MASK; + + npages = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + npages += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + npages++; + } + if (npages >= maxpages) + return maxpages; } else iterate_all_kinds(i, size, v, ({ unsigned long p = (unsigned long)v.iov_base; npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE) @@ -1631,7 +1892,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) - p / PAGE_SIZE; if (npages >= maxpages) return maxpages; - }) + }), + 0 ) return npages; } @@ -1644,7 +1906,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags) WARN_ON(1); return NULL; } - if (unlikely(iov_iter_is_discard(new))) + if (unlikely(iov_iter_is_discard(new) || iov_iter_is_xarray(new))) return NULL; if (iov_iter_is_bvec(new)) return new->bvec = kmemdup(new->bvec, @@ -1849,7 +2111,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes, kunmap(v.bv_page); err;}), ({ w = v; - err = f(&w, context);}) + err = f(&w, context);}), ({ + w.iov_base = kmap(v.bv_page) + v.bv_offset; + w.iov_len = v.bv_len; + err = f(&w, context); + kunmap(v.bv_page); + err;}) ) return err; } From patchwork Wed Mar 10 16:54:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396829 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15EBEC433E6 for ; Wed, 10 Mar 2021 16:55:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D2A0D64FCD for ; Wed, 10 Mar 2021 16:55:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233181AbhCJQzZ (ORCPT ); Wed, 10 Mar 2021 11:55:25 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:27230 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233498AbhCJQzE (ORCPT ); Wed, 10 Mar 2021 11:55:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395303; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3JZ+E12Bv/rbfH/p1hytCimW8jzaBIx4fcjnzeRaUec=; b=YKZkIZO/bGfMnnrzrYQtGpf2dHkHS7pc/+0zNKjlEhk/qQDr46qtGMq1IvevD7ZPCR+m18 DPiAKtUEO6cBixdIctkzgMeTPDUF6wwgpBI0bKiQw9ten6n32mRuhCjI/NsCLTCCjDM2fE RMuRRXejgOg/Q2QrcJHUfkTcA80J9TU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-169-hg6zAUlwNXOShk3xrouNGQ-1; Wed, 10 Mar 2021 11:55:02 -0500 X-MC-Unique: hg6zAUlwNXOShk3xrouNGQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9CD7E19324A9; Wed, 10 Mar 2021 16:54:59 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 854AA6062F; Wed, 10 Mar 2021 16:54:49 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Christoph Hellwig , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:54:49 +0000 Message-ID: <161539528910.286939.1252328699383291173.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add a function, unlock_page_private_2(), to unlock PG_private_2 analogous to that of PG_lock. Add a kerneldoc banner to that indicating the example usage case. A wrapper will need to be placed in the netfs header in the patch that adds that. [This implements a suggestion by Linus[1] to not mix the terminology of PG_private_2 and PG_fscache in the mm core function] Changes: - Remove extern from the declaration[2]. Suggested-by: Linus Torvalds Signed-off-by: David Howells Reviewed-by: Christoph Hellwig cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/1330473.1612974547@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/20210216102659.GA27714@lst.de/ [2] Link: https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/pagemap.h | 1 + mm/filemap.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 20225b067583..ac91b33f3873 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -591,6 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait); extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, unsigned int flags); extern void unlock_page(struct page *page); +void unlock_page_private_2(struct page *page); /* * Return true if the page was successfully locked diff --git a/mm/filemap.c b/mm/filemap.c index 43700480d897..925964b67583 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1432,6 +1432,26 @@ void unlock_page(struct page *page) } EXPORT_SYMBOL(unlock_page); +/** + * unlock_page_private_2 - Unlock a page that's locked with PG_private_2 + * @page: The page + * + * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in + * wait_on_page_private_2(). + * + * This is, for example, used when a netfs page is being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +void unlock_page_private_2(struct page *page) +{ + page = compound_head(page); + VM_BUG_ON_PAGE(!PagePrivate2(page), page); + clear_bit_unlock(PG_private_2, &page->flags); + wake_up_page_bit(page, PG_private_2); +} +EXPORT_SYMBOL(unlock_page_private_2); + /** * end_page_writeback - end writeback against a page * @page: the page From patchwork Wed Mar 10 16:55:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397631 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAAB3C43333 for ; Wed, 10 Mar 2021 16:55:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A994864FCA for ; Wed, 10 Mar 2021 16:55:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233528AbhCJQz2 (ORCPT ); Wed, 10 Mar 2021 11:55:28 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:44559 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233507AbhCJQzP (ORCPT ); Wed, 10 Mar 2021 11:55:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395314; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uVKrir/E2mYwb0N659VVvjNlhgvQd6dn5f4tWZkQLtk=; b=OO3aXp0ymWIOleIXfd876C+YEGD7/JlmTKnFG9Kv6DN309fUwa+t+tQKsK/U/9gsYWLFF9 XedW1w1KJyWU3EpBTNcRmdZtzpJW6iOgBAdiEOHmfIc0+1qHcCA2Bm3mCb77umARbGFdBp 1ImC7lgaLuThPXahm83N0tDOh6clXb0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-183-lW15TNIWNVCptwrkSDAfDw-1; Wed, 10 Mar 2021 11:55:12 -0500 X-MC-Unique: lW15TNIWNVCptwrkSDAfDw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7204084B7DF; Wed, 10 Mar 2021 16:55:10 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id C20C15945E; Wed, 10 Mar 2021 16:55:06 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 03/28] mm: Implement readahead_control pageset expansion From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: "Matthew Wilcox (Oracle)" , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:04 +0000 Message-ID: <161539530488.286939.18085961677838089157.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Provide a function, readahead_expand(), that expands the set of pages specified by a readahead_control object to encompass a revised area with a proposed size and length. The proposed area must include all of the old area and may be expanded yet more by this function so that the edges align on (transparent huge) page boundaries as allocated. The expansion will be cut short if a page already exists in either of the areas being expanded into. Note that any expansion made in such a case is not rolled back. This will be used by fscache so that reads can be expanded to cache granule boundaries, thereby allowing whole granules to be stored in the cache, but there are other potential users also. Changes: - Moved the declaration of readahead_expand() to a better place[1]. Suggested-by: Matthew Wilcox (Oracle) Signed-off-by: David Howells cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210217161358.GM2858050@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588479816.3465195.553952688795241765.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/pagemap.h | 2 + mm/readahead.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ac91b33f3873..bf05e99ce588 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -818,6 +818,8 @@ void page_cache_sync_ra(struct readahead_control *, struct file_ra_state *, unsigned long req_count); void page_cache_async_ra(struct readahead_control *, struct file_ra_state *, struct page *, unsigned long req_count); +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len); /** * page_cache_sync_readahead - generic file readahead diff --git a/mm/readahead.c b/mm/readahead.c index c5b0457415be..4446dada0bc2 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -638,3 +638,73 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count) { return ksys_readahead(fd, offset, count); } + +/** + * readahead_expand - Expand a readahead request + * @ractl: The request to be expanded + * @new_start: The revised start + * @new_len: The revised size of the request + * + * Attempt to expand a readahead request outwards from the current size to the + * specified size by inserting locked pages before and after the current window + * to increase the size to the new window. This may involve the insertion of + * THPs, in which case the window may get expanded even beyond what was + * requested. + * + * The algorithm will stop if it encounters a conflicting page already in the + * pagecache and leave a smaller expansion than requested. + * + * The caller must check for this by examining the revised @ractl object for a + * different expansion than was requested. + */ +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len) +{ + struct address_space *mapping = ractl->mapping; + pgoff_t new_index, new_nr_pages; + gfp_t gfp_mask = readahead_gfp_mask(mapping); + + new_index = new_start / PAGE_SIZE; + + /* Expand the leading edge downwards */ + while (ractl->_index > new_index) { + unsigned long index = ractl->_index - 1; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + + ractl->_nr_pages++; + ractl->_index = page->index; + } + + new_len += new_start - readahead_pos(ractl); + new_nr_pages = DIV_ROUND_UP(new_len, PAGE_SIZE); + + /* Expand the trailing edge upwards */ + while (ractl->_nr_pages < new_nr_pages) { + unsigned long index = ractl->_index + ractl->_nr_pages; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + ractl->_nr_pages++; + } +} +EXPORT_SYMBOL(readahead_expand); From patchwork Wed Mar 10 16:55:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396828 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13FA8C433E9 for ; Wed, 10 Mar 2021 16:56:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB13D64FE0 for ; Wed, 10 Mar 2021 16:56:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232677AbhCJQz5 (ORCPT ); Wed, 10 Mar 2021 11:55:57 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:35305 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233553AbhCJQzb (ORCPT ); Wed, 10 Mar 2021 11:55:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395331; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dQFazTNg8ZNuoMbVCQ8NYcIVhTkKfyT97DX5gOD7eo0=; b=a1fhTWu6ZCRQLo3ZmEPQDGXf7CvCLkvp5gF6YF+4V7ZULE0c23l+Ncsi5S9UPAN0rfBToi OrF4CiNyV4KKlfxFvYEbh9urZHl0OKPStsWRZvR/I/2haHdH9hcbKnaxm/sxAu2hruMyMh f0Wp9ISnSftkqDxdDuZ4yviQ+7UQZ4M= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-244-azPSCIZBP_eQ-riT_DCfOQ-1; Wed, 10 Mar 2021 11:55:27 -0500 X-MC-Unique: azPSCIZBP_eQ-riT_DCfOQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 63F12CC64E; Wed, 10 Mar 2021 16:55:25 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id CD7AB60C13; Wed, 10 Mar 2021 16:55:16 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 04/28] netfs: Make a netfs helper module From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:15 +0000 Message-ID: <161539531569.286939.18317119181653706665.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Make a netfs helper module to manage read request segmentation, caching support and transparent huge page support on behalf of a network filesystem. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588496284.3465195.10102643717770106661.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118135638.1232039.1622182202673126285.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161031028.2537118.1213974428943508753.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340391427.1303470.14884950716721956560.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/Kconfig | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 fs/netfs/Kconfig diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig new file mode 100644 index 000000000000..2ebf90e6ca95 --- /dev/null +++ b/fs/netfs/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config NETFS_SUPPORT + tristate "Support for network filesystem high-level I/O" + help + This option enables support for network filesystems, including + helpers for high-level buffered I/O, abstracting out read + segmentation, local caching and transparent huge page support. From patchwork Wed Mar 10 16:55:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397630 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F7D3C43331 for ; Wed, 10 Mar 2021 16:56:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1735D64FD4 for ; Wed, 10 Mar 2021 16:56:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233293AbhCJQz6 (ORCPT ); Wed, 10 Mar 2021 11:55:58 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:35556 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233384AbhCJQzr (ORCPT ); Wed, 10 Mar 2021 11:55:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395346; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=62z6k1rtJQ013YAJgGmkOZd4eEa+QoRXeNsYeZUSq4A=; b=MJwfW/nPRN7qkyEMKpUrB3JgMfgb6pTQ4knu/fWC3C0iVtKSMNJ42i2Pb1d6nSk4P9qGoZ 7gn7S6x6ZG89zXtabn5jmx/U67X38zeSr+KP9YZkDwLKh2nylyAyoQuugaw7jK65QKmm0P Q+/8oXRtv9SrXFfQXLch/V/xq6HnIzU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-333-pc0PttDoNa-UI0b-WXHwJA-1; Wed, 10 Mar 2021 11:55:42 -0500 X-MC-Unique: pc0PttDoNa-UI0b-WXHwJA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EB7E71934109; Wed, 10 Mar 2021 16:55:39 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 079A0694CA; Wed, 10 Mar 2021 16:55:33 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 05/28] netfs: Documentation for helper library From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:32 +0000 Message-ID: <161539533275.286939.6246011228676840978.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add interface documentation for the netfs helper library. Signed-off-by: David Howells --- Documentation/filesystems/index.rst | 1 Documentation/filesystems/netfs_library.rst | 526 +++++++++++++++++++++++++++ 2 files changed, 527 insertions(+) create mode 100644 Documentation/filesystems/netfs_library.rst diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 1f76b1cb3348..d4853cb919d2 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -53,6 +53,7 @@ filesystem implementations. journalling fscrypt fsverity + netfs_library Filesystems =========== diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst new file mode 100644 index 000000000000..57a641847818 --- /dev/null +++ b/Documentation/filesystems/netfs_library.rst @@ -0,0 +1,526 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================= +NETWORK FILESYSTEM HELPER LIBRARY +================================= + +.. Contents: + + - Overview. + - Buffered read helpers. + - Read helper functions. + - Read helper structures. + - Read helper operations. + - Read helper procedure. + - Read helper cache API. + + +Overview +======== + +The network filesystem helper library is a set of functions designed to aid a +network filesystem in implementing VM/VFS operations. For the moment, that +just includes turning various VM buffered read operations into requests to read +from the server. The helper library, however, can also interpose other +services, such as local caching or local data encryption. + +Note that the library module doesn't link against local caching directly, so +access must be provided by the netfs. + + +Buffered Read Helpers +===================== + +The library provides a set of read helpers that handle the ->readpage(), +->readahead() and much of the ->write_begin() VM operations and translate them +into a common call framework. + +The following services are provided: + + * Handles transparent huge pages (THPs). + + * Insulates the netfs from VM interface changes. + + * Allows the netfs to arbitrarily split reads up into pieces, even ones that + don't match page sizes or page alignments and that may cross pages. + + * Allows the netfs to expand a readahead request in both directions to meet + its needs. + + * Allows the netfs to partially fulfil a read, which will then be resubmitted. + + * Handles local caching, allowing cached data and server-read data to be + interleaved for a single request. + + * Handles clearing of bufferage that aren't on the server. + + * Handle retrying of reads that failed, switching reads from the cache to the + server as necessary. + + * In the future, this is a place that other services can be performed, such as + local encryption of data to be stored remotely or in the cache. + +From the network filesystem, the helpers require a table of operations. This +includes a mandatory method to issue a read operation along with a number of +optional methods. + + +Read Helper Functions +--------------------- + +Three read helpers are provided:: + + * void netfs_readahead(struct readahead_control *ractl, + const struct netfs_read_request_ops *ops, + void *netfs_priv);`` + * int netfs_readpage(struct file *file, + struct page *page, + const struct netfs_read_request_ops *ops, + void *netfs_priv); + * int netfs_write_begin(struct file *file, + struct address_space *mapping, + loff_t pos, + unsigned int len, + unsigned int flags, + struct page **_page, + void **_fsdata, + const struct netfs_read_request_ops *ops, + void *netfs_priv); + +Each corresponds to a VM operation, with the addition of a couple of parameters +for the use of the read helpers: + + * ``ops`` + + A table of operations through which the helpers can talk to the filesystem. + + * ``netfs_priv`` + + Filesystem private data (can be NULL). + +Both of these values will be stored into the read request structure. + +For ->readahead() and ->readpage(), the network filesystem should just jump +into the corresponding read helper; whereas for ->write_begin(), it may be a +little more complicated as the network filesystem might want to flush +conflicting writes or track dirty data and needs to put the acquired page if an +error occurs after calling the helper. + +The helpers manage the read request, calling back into the network filesystem +through the suppplied table of operations. Waits will be performed as +necessary before returning for helpers that are meant to be synchronous. + +If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to +deal with it. If some parts of the request are in progress when an error +occurs, the request will get partially completed if sufficient data is read. + +Additionally, there is:: + + * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, + ssize_t transferred_or_error, + bool was_async); + +which should be called to complete a read subrequest. This is given the number +of bytes transferred or a negative error code, plus a flag indicating whether +the operation was asynchronous (ie. whether the follow-on processing can be +done in the current context, given this may involve sleeping). + + +Read Helper Structures +---------------------- + +The read helpers make use of a couple of structures to maintain the state of +the read. The first is a structure that manages a read request as a whole:: + + struct netfs_read_request { + struct inode *inode; + struct address_space *mapping; + struct netfs_cache_resources cache_resources; + void *netfs_priv; + loff_t start; + size_t len; + loff_t i_size; + const struct netfs_read_request_ops *netfs_ops; + unsigned int debug_id; + ... + }; + +The above fields are the ones the netfs can use. They are: + + * ``inode`` + * ``mapping`` + + The inode and the address space of the file being read from. The mapping + may or may not point to inode->i_data. + + * ``cache_resources`` + + Resources for the local cache to use, if present. + + * ``netfs_priv`` + + The network filesystem's private data. The value for this can be passed in + to the helper functions or set during the request. The ->cleanup() op will + be called if this is non-NULL at the end. + + * ``start`` + * ``len`` + + The file position of the start of the read request and the length. These + may be altered by the ->expand_readahead() op. + + * ``i_size`` + + The size of the file at the start of the request. + + * ``netfs_ops`` + + A pointer to the operation table. The value for this is passed into the + helper functions. + + * ``debug_id`` + + A number allocated to this operation that can be displayed in trace lines + for reference. + + +The second structure is used to manage individual slices of the overall read +request:: + + struct netfs_read_subrequest { + struct netfs_read_request *rreq; + loff_t start; + size_t len; + size_t transferred; + unsigned long flags; + unsigned short debug_index; + ... + }; + +Each subrequest is expected to access a single source, though the helpers will +handle falling back from one source type to another. The members are: + + * ``rreq`` + + A pointer to the read request. + + * ``start`` + * ``len`` + + The file position of the start of this slice of the read request and the + length. + + * ``transferred`` + + The amount of data transferred so far of the length of this slice. The + network filesystem or cache should start the operation this far into the + slice. If a short read occurs, the helpers will call again, having updated + this to reflect the amount read so far. + + * ``flags`` + + Flags pertaining to the read. There are two of interest to the filesystem + or cache: + + * ``NETFS_SREQ_CLEAR_TAIL`` + + This can be set to indicate that the remainder of the slice, from + transferred to len, should be cleared. + + * ``NETFS_SREQ_SEEK_DATA_READ`` + + This is a hint to the cache that it might want to try skipping ahead to + the next data (ie. using SEEK_DATA). + + * ``debug_index`` + + A number allocated to this slice that can be displayed in trace lines for + reference. + + +Read Helper Operations +---------------------- + +The network filesystem must provide the read helpers with a table of operations +through which it can issue requests and negotiate:: + + struct netfs_read_request_ops { + void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + bool (*is_cache_enabled)(struct inode *inode); + int (*begin_cache_operation)(struct netfs_read_request *rreq); + void (*expand_readahead)(struct netfs_read_request *rreq); + bool (*clamp_length)(struct netfs_read_subrequest *subreq); + void (*issue_op)(struct netfs_read_subrequest *subreq); + bool (*is_still_valid)(struct netfs_read_request *rreq); + int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, + struct page *page, void **_fsdata); + void (*done)(struct netfs_read_request *rreq); + void (*cleanup)(struct address_space *mapping, void *netfs_priv); + }; + +The operations are as follows: + + * ``init_rreq()`` + + [Optional] This is called to initialise the request structure. It is given + the file for reference and can modify the ->netfs_priv value. + + * ``is_cache_enabled()`` + + [Required] This is called by netfs_write_begin() to ask if the file is being + cached. It should return true if it is being cached and false otherwise. + + * ``begin_cache_operation()`` + + [Optional] This is called to ask the network filesystem to call into the + cache (if present) to initialise the caching state for this read. The netfs + library module cannot access the cache directly, so the cache should call + something like fscache_begin_read_operation() to do this. + + The cache gets to store its state in ->cache_resources and must set a table + of operations of its own there (though of a different type). + + This should return 0 on success and an error code otherwise. If an error is + reported, the operation may proceed anyway, just without local caching (only + out of memory and interruption errors cause failure here). + + * ``expand_readahead()`` + + [Optional] This is called to allow the filesystem to expand the size of a + readahead read request. The filesystem gets to expand the request in both + directions, though it's not permitted to reduce it as the numbers may + represent an allocation already made. If local caching is enabled, it gets + to expand the request first. + + Expansion is communicated by changing ->start and ->len in the request + structure. Note that if any change is made, ->len must be increased by at + least as much as ->start is reduced. + + * ``clamp_length()`` + + [Optional] This is called to allow the filesystem to reduce the size of a + subrequest. The filesystem can use this, for example, to chop up a request + that has to be split across multiple servers or to put multiple reads in + flight. + + This should return 0 on success and an error code on error. + + * ``issue_op()`` + + [Required] The helpers use this to dispatch a subrequest to the server for + reading. In the subrequest, ->start, ->len and ->transferred indicate what + data should be read from the server. + + There is no return value; the netfs_subreq_terminated() function should be + called to indicate whether or not the operation succeeded and how much data + it transferred. The filesystem also should not deal with setting pages + uptodate, unlocking them or dropping their refs - the helpers need to deal + with this as they have to coordinate with copying to the local cache. + + Note that the helpers have the pages locked, but not pinned. It is possible + to use the ITER_XARRAY iov iterator to refer to the range of the inode that + is being operated upon without the need to allocate large bvec tables. + + * ``is_still_valid()`` + + [Optional] This is called to find out if the data just read from the local + cache is still valid. It should return true if it is still valid and false + if not. If it's not still valid, it will be reread from the server. + + * ``check_write_begin()`` + + [Optional] This is called from the netfs_write_begin() helper once it has + allocated/grabbed the page to be modified to allow the filesystem to flush + conflicting state before allowing it to be modified. + + It should return 0 if everything is now fine, -EAGAIN if the page should be + regrabbed and any other error code to abort the operation. + + * ``done`` + + [Optional] This is called after the pages in the request have all been + unlocked (and marked uptodate if applicable). + + * ``cleanup`` + + [Optional] This is called as the request is being deallocated so that the + filesystem can clean up ->netfs_priv. + + + +Read Helper Procedure +--------------------- + +The read helpers work by the following general procedure: + + * Set up the request. + + * For readahead, allow the local cache and then the network filesystem to + propose expansions to the read request. This is then proposed to the VM. + If the VM cannot fully perform the expansion, a partially expanded read will + be performed, though this may not get written to the cache in its entirety. + + * Loop around slicing chunks off of the request to form subrequests: + + * If a local cache is present, it gets to do the slicing, otherwise the + helpers just try to generate maximal slices. + + * The network filesystem gets to clamp the size of each slice if it is to be + the source. This allows rsize and chunking to be implemented. + + * The helpers issue a read from the cache or a read from the server or just + clears the slice as appropriate. + + * The next slice begins at the end of the last one. + + * As slices finish being read, they terminate. + + * When all the subrequests have terminated, the subrequests are assessed and + any that are short or have failed are reissued: + + * Failed cache requests are issued against the server instead. + + * Failed server requests just fail. + + * Short reads against either source will be reissued against that source + provided they have transferred some more data: + + * The cache may need to skip holes that it can't do DIO from. + + * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the + end of the slice instead of reissuing. + + * Once the data is read, the pages that have been fully read/cleared: + + * Will be marked uptodate. + + * If a cache is present, will be marked with PG_fscache. + + * Unlocked + + * Any pages that need writing to the cache will then have DIO writes issued. + + * Synchronous operations will wait for reading to be complete. + + * Writes to the cache will proceed asynchronously and the pages will have the + PG_fscache mark removed when that completes. + + * The request structures will be cleaned up when everything has completed. + + +Read Helper Cache API +--------------------- + +When implementing a local cache to be used by the read helpers, two things are +required: some way for the network filesystem to initialise the caching for a +read request and a table of operations for the helpers to call. + +The network filesystem's ->begin_cache_operation() method is called to set up a +cache and this must call into the cache to do the work. If using fscache, for +example, the cache would call:: + + int fscache_begin_read_operation(struct netfs_read_request *rreq, + struct fscache_cookie *cookie); + +passing in the request pointer and the cookie corresponding to the file. + +The netfs_read_request object contains a place for the cache to hang its +state:: + + struct netfs_cache_resources { + const struct netfs_cache_ops *ops; + void *cache_priv; + void *cache_priv2; + }; + +This contains an operations table pointer and two private pointers. The +operation table looks like the following:: + + struct netfs_cache_ops { + void (*end_operation)(struct netfs_cache_resources *cres); + + void (*expand_readahead)(struct netfs_cache_resources *cres, + loff_t *_start, size_t *_len, loff_t i_size); + + enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + loff_t i_size); + + int (*read)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + bool seek_data, + netfs_io_terminated_t term_func, + void *term_func_priv); + + int (*write)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); + }; + +With a termination handler function pointer:: + + typedef void (*netfs_io_terminated_t)(void *priv, + ssize_t transferred_or_error, + bool was_async); + +The methods defined in the table are: + + * ``end_operation()`` + + [Required] Called to clean up the resources at the end of the read request. + + * ``expand_readahead()`` + + [Optional] Called at the beginning of a netfs_readahead() operation to allow + the cache to expand a request in either direction. This allows the cache to + size the request appropriately for the cache granularity. + + The function is passed poiners to the start and length in its parameters, + plus the size of the file for reference, and adjusts the start and length + appropriately. It should return one of: + + * ``NETFS_FILL_WITH_ZEROES`` + * ``NETFS_DOWNLOAD_FROM_SERVER`` + * ``NETFS_READ_FROM_CACHE`` + * ``NETFS_INVALID_READ`` + + to indicate whether the slice should just be cleared or whether it should be + downloaded from the server or read from the cache - or whether slicing + should be given up at the current point. + + * ``prepare_read()`` + + [Required] Called to configure the next slice of a request. ->start and + ->len in the subrequest indicate where and how big the next slice can be; + the cache gets to reduce the length to match its granularity requirements. + + * ``read()`` + + [Required] Called to read from the cache. The start file offset is given + along with an iterator to read to, which gives the length also. It can be + given a hint requesting that it seek forward from that start position for + data. + + Also provided is a pointer to a termination handler function and private + data to pass to that function. The termination function should be called + with the number of bytes transferred or an error code, plus a flag + indicating whether the termination is definitely happening in the caller's + context. + + * ``write()`` + + [Required] Called to write to the cache. The start file offset is given + along with an iterator to write from, which gives the length also. + + Also provided is a pointer to a termination handler function and private + data to pass to that function. The termination function should be called + with the number of bytes transferred or an error code, plus a flag + indicating whether the termination is definitely happening in the caller's + context. + +Note that these methods are passed a pointer to the cache resource structure, +not the read request structure as they could be used in other situations where +there isn't a read request structure as well, such as writing dirty data to the +cache. From patchwork Wed Mar 10 16:55:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6305CC43332 for ; Wed, 10 Mar 2021 16:57:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3CDF764FC7 for ; Wed, 10 Mar 2021 16:57:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233406AbhCJQ4b (ORCPT ); Wed, 10 Mar 2021 11:56:31 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:24965 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233534AbhCJQ4B (ORCPT ); Wed, 10 Mar 2021 11:56:01 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395360; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RYVoXzqFKdR4U0WYTzkOXoOXi2/ID7WAnru9qnTxRg=; b=RBxfGvWctO/gqY2jK2QaGmYhKTX3dWEmGjR4GkwJk07i/2d1ezQkp0UV/5gD2wiuxGgxrL 9dcmLmu0b7NEi6dKG8w1BVeByawLZZYh9+PaQu/d0ynv1BNOITiYVYbVzeCJbakzDaMaNu JC4qz656/3KgaJwtAT5sgIDFMA9y678= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-380-ruTx7nFRORShHGyMDeCvOg-1; Wed, 10 Mar 2021 11:55:57 -0500 X-MC-Unique: ruTx7nFRORShHGyMDeCvOg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B423210449B3; Wed, 10 Mar 2021 16:55:55 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id F24D060C6E; Wed, 10 Mar 2021 16:55:45 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 06/28] netfs, mm: Move PG_fscache helper funcs to linux/netfs.h From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:45 +0000 Message-ID: <161539534516.286939.6265142985563005000.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Move the PG_fscache related helper funcs (such as SetPageFsCache()) to linux/netfs.h rather than linux/fscache.h as the intention is to move to a model where they're used by the network filesystem and the helper library, but not by fscache/cachefiles itself. Signed-off-by: David Howells cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/fscache.h | 11 +---------- include/linux/netfs.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 10 deletions(-) create mode 100644 include/linux/netfs.h diff --git a/include/linux/fscache.h b/include/linux/fscache.h index a1c928fe98e7..1f8dc72369ee 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -19,6 +19,7 @@ #include #include #include +#include #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE) #define fscache_available() (1) @@ -29,16 +30,6 @@ #endif -/* - * overload PG_private_2 to give us PG_fscache - this is used to indicate that - * a page is currently backed by a local disk cache - */ -#define PageFsCache(page) PagePrivate2((page)) -#define SetPageFsCache(page) SetPagePrivate2((page)) -#define ClearPageFsCache(page) ClearPagePrivate2((page)) -#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) -#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) - /* pattern used to fill dead space in an index entry */ #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79 diff --git a/include/linux/netfs.h b/include/linux/netfs.h new file mode 100644 index 000000000000..cc1102040488 --- /dev/null +++ b/include/linux/netfs.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support services. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * See: + * + * Documentation/filesystems/netfs_library.rst + * + * for a description of the network filesystem interface declared here. + */ + +#ifndef _LINUX_NETFS_H +#define _LINUX_NETFS_H + +#include + +/* + * Overload PG_private_2 to give us PG_fscache - this is used to indicate that + * a page is currently backed by a local disk cache + */ +#define PageFsCache(page) PagePrivate2((page)) +#define SetPageFsCache(page) SetPagePrivate2((page)) +#define ClearPageFsCache(page) ClearPagePrivate2((page)) +#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) +#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) + +#endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396827 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 245A7C432C3 for ; Wed, 10 Mar 2021 16:57:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0E13D64FCA for ; Wed, 10 Mar 2021 16:57:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233569AbhCJQ4d (ORCPT ); Wed, 10 Mar 2021 11:56:33 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:34923 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233571AbhCJQ4M (ORCPT ); Wed, 10 Mar 2021 11:56:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395372; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3RkMaKS84Erw5iqrERot0PwU2wMGkYNCFnxXlts7pQU=; b=cZvc+RPOpdqi2rwFzA+mVbQyFjOrgjS/c+M4dh25FVErPIzJmfH6/MxwFQweQ0+2Rx8URn MMfeNV/mwLl0+4eLIzChmWeWtzg1posGSsIn0kHXhmBCphlDnV/5u1eK4QxKFXQgJr+Ybo viZ25wYQojw7MIlVAKRn7LuF8HK7LPU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-596-w3Zuq-KbOdGbIIgAEsl6Sg-1; Wed, 10 Mar 2021 11:56:10 -0500 X-MC-Unique: w3Zuq-KbOdGbIIgAEsl6Sg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 83A101934106; Wed, 10 Mar 2021 16:56:08 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 537A010013C1; Wed, 10 Mar 2021 16:56:01 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 07/28] netfs, mm: Add unlock_page_fscache() and wait_on_page_fscache() From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:00 +0000 Message-ID: <161539536093.286939.5076448803512118764.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add unlock_page_fscache() as an alias of unlock_page_private_2(). This allows a page 'locked' with PG_fscache to be unlocked. Add wait_on_page_fscache() to wait for PG_fscache to be unlocked. [Linus suggested putting the fscache-themed functions into the caching-specific headers rather than pagemap.h[1]] Signed-off-by: David Howells cc: Linus Torvalds cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/1330473.1612974547@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/include/linux/netfs.h b/include/linux/netfs.h index cc1102040488..5ed8c6dc7453 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -26,4 +26,33 @@ #define TestSetPageFsCache(page) TestSetPagePrivate2((page)) #define TestClearPageFsCache(page) TestClearPagePrivate2((page)) +/** + * unlock_page_fscache - Unlock a page that's locked with PG_fscache + * @page: The page + * + * Unlocks a page that's locked with PG_fscache and wakes up sleepers in + * wait_on_page_fscache(). This page bit is used by the netfs helpers when a + * netfs page is being written to a local disk cache, thereby allowing writes + * to the cache for the same page to be serialised. + */ +static inline void unlock_page_fscache(struct page *page) +{ + unlock_page_private_2(page); +} + +/** + * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page + * @page: The page + * + * Wait for the PG_fscache (PG_private_2) page bit to be removed from a page. + * This is, for example, used to handle a netfs page being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +static inline void wait_on_page_fscache(struct page *page) +{ + if (PageFsCache(page)) + wait_on_page_bit(compound_head(page), PG_fscache); +} + #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397628 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC957C4332E for ; Wed, 10 Mar 2021 16:57:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7BF4C64FE5 for ; Wed, 10 Mar 2021 16:57:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233644AbhCJQ47 (ORCPT ); Wed, 10 Mar 2021 11:56:59 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:34682 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233633AbhCJQ41 (ORCPT ); Wed, 10 Mar 2021 11:56:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395386; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FPJX7b9dndGN393JTy0OyOUFEb/PbK+waIrfRf/dU0U=; b=I3BHJtpcy4yrI9qF/0MMZ/CxeYLQMq/TEDvfz6/XI5efEyAISkBJz0Ep9WFSebPLZ8KdlF wHAuIuKwO4g+tGrAdsaog4F6k9kPn1P7KFMp+ME34m1YUg7Uqk79A9iohFVklzKfFkgGRy 6hqCCehO/jhFemcBFbqo4W3DJKynf6E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-329-GGcgcgwdMEacjZs1atKNYw-1; Wed, 10 Mar 2021 11:56:23 -0500 X-MC-Unique: GGcgcgwdMEacjZs1atKNYw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AEF3210866A6; Wed, 10 Mar 2021 16:56:21 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1E91446; Wed, 10 Mar 2021 16:56:14 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 08/28] netfs: Provide readahead and readpage netfs helpers From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:13 +0000 Message-ID: <161539537375.286939.16642940088716990995.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add a pair of helper functions: (*) netfs_readahead() (*) netfs_readpage() to do the work of handling a readahead or a readpage, where the page(s) that form part of the request may be split between the local cache, the server or just require clearing, and may be single pages and transparent huge pages. This is all handled within the helper. Note that while both will read from the cache if there is data present, only netfs_readahead() will expand the request beyond what it was asked to do, and only netfs_readahead() will write back to the cache. netfs_readpage(), on the other hand, is synchronous and only fetches the page (which might be a THP) it is asked for. The netfs gives the helper parameters from the VM, the cache cookie it wants to use (or NULL) and a table of operations (only one of which is mandatory): (*) expand_readahead() [optional] Called to allow the netfs to request an expansion of a readahead request to meet its own alignment requirements. This is done by changing rreq->start and rreq->len. (*) clamp_length() [optional] Called to allow the netfs to cut down a subrequest to meet its own boundary requirements. If it does this, the helper will generate additional subrequests until the full request is satisfied. (*) is_still_valid() [optional] Called to find out if the data just read from the cache has been invalidated and must be reread from the server. (*) issue_op() [required] Called to ask the netfs to issue a read to the server. The subrequest describes the read. The read request holds information about the file being accessed. The netfs can cache information in rreq->netfs_priv. Upon completion, the netfs should set the error, transferred and can also set FSCACHE_SREQ_CLEAR_TAIL and then call fscache_subreq_terminated(). (*) done() [optional] Called after the pages have been unlocked. The read request is still pinning the file and mapping and may still be pinning pages with PG_fscache. rreq->error indicates any error that has been accumulated. (*) cleanup() [optional] Called when the helper is disposing of a finished read request. This allows the netfs to clear rreq->netfs_priv. Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built even if CONFIG_FSCACHE=n and in this case much of it should be optimised away, allowing the filesystem to use it even when caching is disabled. Changes: - Folded in a kerneldoc comment fix. - Folded in a fix for the error handling in the case that ENOMEM occurs. - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/Kconfig | 1 fs/Makefile | 1 fs/netfs/Makefile | 6 fs/netfs/internal.h | 61 ++++ fs/netfs/read_helper.c | 717 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 82 +++++ 6 files changed, 868 insertions(+) create mode 100644 fs/netfs/Makefile create mode 100644 fs/netfs/internal.h create mode 100644 fs/netfs/read_helper.c diff --git a/fs/Kconfig b/fs/Kconfig index 462253ae483a..eccbcf1e3f2e 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -125,6 +125,7 @@ source "fs/overlayfs/Kconfig" menu "Caches" +source "fs/netfs/Kconfig" source "fs/fscache/Kconfig" source "fs/cachefiles/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 3215fe205256..9c708e1fbe8f 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -67,6 +67,7 @@ obj-y += devpts/ obj-$(CONFIG_DLM) += dlm/ # Do not add any filesystems before this line +obj-$(CONFIG_NETFS_SUPPORT) += netfs/ obj-$(CONFIG_FSCACHE) += fscache/ obj-$(CONFIG_REISERFS_FS) += reiserfs/ obj-$(CONFIG_EXT4_FS) += ext4/ diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile new file mode 100644 index 000000000000..4b4eff2ba369 --- /dev/null +++ b/fs/netfs/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 + +netfs-y := \ + read_helper.o + +obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h new file mode 100644 index 000000000000..ee665c0e7dc8 --- /dev/null +++ b/fs/netfs/internal.h @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Internal definitions for network filesystem support + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "netfs: " fmt + +/* + * read_helper.c + */ +extern unsigned int netfs_debug; + +#define netfs_stat(x) do {} while(0) +#define netfs_stat_d(x) do {} while(0) + +/*****************************************************************************/ +/* + * debug tracing + */ +#define dbgprintk(FMT, ...) \ + printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__) + +#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__) + +#ifdef __KDEBUG +#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__) +#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__) +#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__) + +#elif defined(CONFIG_NETFS_DEBUG) +#define _enter(FMT, ...) \ +do { \ + if (netfs_debug) \ + kenter(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _leave(FMT, ...) \ +do { \ + if (netfs_debug) \ + kleave(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _debug(FMT, ...) \ +do { \ + if (netfs_debug) \ + kdebug(FMT, ##__VA_ARGS__); \ +} while (0) + +#else +#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__) +#endif diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c new file mode 100644 index 000000000000..c5ed0dcc9571 --- /dev/null +++ b/fs/netfs/read_helper.c @@ -0,0 +1,717 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem high-level read support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +MODULE_DESCRIPTION("Network fs support"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +unsigned netfs_debug; +module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); + +static void netfs_rreq_work(struct work_struct *); +static void __netfs_put_subrequest(struct netfs_read_subrequest *, bool); + +static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, + bool was_async) +{ + if (refcount_dec_and_test(&subreq->usage)) + __netfs_put_subrequest(subreq, was_async); +} + +static struct netfs_read_request *netfs_alloc_read_request( + const struct netfs_read_request_ops *ops, void *netfs_priv, + struct file *file) +{ + struct netfs_read_request *rreq; + + rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); + if (rreq) { + rreq->netfs_ops = ops; + rreq->netfs_priv = netfs_priv; + rreq->inode = file_inode(file); + rreq->i_size = i_size_read(rreq->inode); + INIT_LIST_HEAD(&rreq->subrequests); + INIT_WORK(&rreq->work, netfs_rreq_work); + refcount_set(&rreq->usage, 1); + __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + ops->init_rreq(rreq, file); + } + + return rreq; +} + +static void netfs_get_read_request(struct netfs_read_request *rreq) +{ + refcount_inc(&rreq->usage); +} + +static void netfs_rreq_clear_subreqs(struct netfs_read_request *rreq, + bool was_async) +{ + struct netfs_read_subrequest *subreq; + + while (!list_empty(&rreq->subrequests)) { + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + list_del(&subreq->rreq_link); + netfs_put_subrequest(subreq, was_async); + } +} + +static void netfs_free_read_request(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_clear_subreqs(rreq, false); + if (rreq->netfs_priv) + rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + kfree(rreq); +} + +static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) +{ + if (refcount_dec_and_test(&rreq->usage)) { + if (was_async) { + rreq->work.func = netfs_free_read_request; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_free_read_request(&rreq->work); + } + } +} + +/* + * Allocate and partially initialise an I/O request structure. + */ +static struct netfs_read_subrequest *netfs_alloc_subrequest( + struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + subreq = kzalloc(sizeof(struct netfs_read_subrequest), GFP_KERNEL); + if (subreq) { + INIT_LIST_HEAD(&subreq->rreq_link); + refcount_set(&subreq->usage, 2); + subreq->rreq = rreq; + netfs_get_read_request(rreq); + } + + return subreq; +} + +static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) +{ + refcount_inc(&subreq->usage); +} + +static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, + bool was_async) +{ + struct netfs_read_request *rreq = subreq->rreq; + + kfree(subreq); + netfs_put_read_request(rreq, was_async); +} + +/* + * Clear the unread part of an I/O request. + */ +static void netfs_clear_unread(struct netfs_read_subrequest *subreq) +{ + struct iov_iter iter; + + iov_iter_xarray(&iter, WRITE, &subreq->rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + iov_iter_zero(iov_iter_count(&iter), &iter); +} + +/* + * Fill a subrequest region with zeroes. + */ +static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_subreq_terminated(subreq, 0, false); +} + +/* + * Ask the netfs to issue a read request to the server for us. + * + * The netfs is expected to read from subreq->pos + subreq->transferred to + * subreq->pos + subreq->len - 1. It may not backtrack and write data into the + * buffer prior to the transferred point as it might clobber dirty data + * obtained from the cache. + * + * Alternatively, the netfs is allowed to indicate one of two things: + * + * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and + * make progress. + * + * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be + * cleared. + */ +static void netfs_read_from_server(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + rreq->netfs_ops->issue_op(subreq); +} + +/* + * Release those waiting. + */ +static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) +{ + netfs_rreq_clear_subreqs(rreq, was_async); + netfs_put_read_request(rreq, was_async); +} + +/* + * Unlock the pages in a read operation. We need to set PG_fscache on any + * pages we're going to write back before we unlock them. + */ +static void netfs_rreq_unlock(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + unsigned int iopos, account = 0; + pgoff_t start_page = rreq->start / PAGE_SIZE; + pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; + bool subreq_failed = false; + int i; + + XA_STATE(xas, &rreq->mapping->i_pages, start_page); + + if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { + __clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + } + } + + /* Walk through the pagecache and the I/O request lists simultaneously. + * We may have a mixture of cached and uncached sections and we only + * really want to write out the uncached sections. This is slightly + * complicated by the possibility that we might have huge pages with a + * mixture inside. + */ + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + iopos = 0; + subreq_failed = (subreq->error < 0); + + rcu_read_lock(); + xas_for_each(&xas, page, last_page) { + unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; + unsigned int pgend = pgpos + thp_size(page); + bool pg_failed = false; + + for (;;) { + if (!subreq) { + pg_failed = true; + break; + } + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + SetPageFsCache(page); + pg_failed |= subreq_failed; + if (pgend < iopos + subreq->len) + break; + + account += subreq->transferred; + iopos += subreq->len; + if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + subreq = list_next_entry(subreq, rreq_link); + subreq_failed = (subreq->error < 0); + } else { + subreq = NULL; + subreq_failed = false; + } + if (pgend == iopos) + break; + } + + if (!pg_failed) { + for (i = 0; i < thp_nr_pages(page); i++) + flush_dcache_page(page); + SetPageUptodate(page); + } + + if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) { + if (page->index == rreq->no_unlock_page && + test_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags)) + _debug("no unlock"); + else + unlock_page(page); + } + } + rcu_read_unlock(); + + task_io_account_read(account); + if (rreq->netfs_ops->done) + rreq->netfs_ops->done(rreq); +} + +/* + * Handle a short read. + */ +static void netfs_rreq_short_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); +} + +/* + * Resubmit any short or failed operations. Returns true if we got the rreq + * ref back. + */ +static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + WARN_ON(in_interrupt()); + + /* We don't want terminating submissions trying to wake us up whilst + * we're still going through the list. + */ + atomic_inc(&rreq->nr_rd_ops); + + __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->error) { + if (subreq->source != NETFS_READ_FROM_CACHE) + break; + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->error = 0; + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); + } else if (test_bit(NETFS_SREQ_SHORT_READ, &subreq->flags)) { + netfs_rreq_short_read(rreq, subreq); + } + } + + /* If we decrement nr_rd_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + return true; + + wake_up_var(&rreq->nr_rd_ops); + return false; +} + +/* + * Assess the state of a read request and decide what to do next. + * + * Note that we could be in an ordinary kernel thread, on a workqueue or in + * softirq context at this point. We inherit a ref from the caller. + */ +static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) +{ +again: + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && + test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { + if (netfs_rreq_perform_resubmissions(rreq)) + goto again; + return; + } + + netfs_rreq_unlock(rreq); + + clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_assess(rreq, false); +} + +/* + * Handle the completion of all outstanding I/O operations on a read request. + * We inherit a ref from the caller. + */ +static void netfs_rreq_terminated(struct netfs_read_request *rreq, + bool was_async) +{ + if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && + was_async) { + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_assess(rreq, was_async); + } +} + +/** + * netfs_subreq_terminated - Note the termination of an I/O operation. + * @subreq: The I/O request that has terminated. + * @transferred_or_error: The amount of data transferred or an error code. + * @was_async: The termination was asynchronous + * + * This tells the read helper that a contributory I/O operation has terminated, + * one way or another, and that it should integrate the results. + * + * The caller indicates in @transferred_or_error the outcome of the operation, + * supplying a positive value to indicate the number of bytes transferred, 0 to + * indicate a failure to transfer anything that should be retried or a negative + * error code. The helper will look after reissuing I/O operations as + * appropriate and writing downloaded data to the cache. + * + * If @was_async is true, the caller might be running in softirq or interrupt + * context and we can't sleep. + */ +void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, + ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_request *rreq = subreq->rreq; + int u; + + _enter("[%u]{%llx,%lx},%zd", + subreq->debug_index, subreq->start, subreq->flags, + transferred_or_error); + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + goto failed; + } + + if (WARN_ON(transferred_or_error > subreq->len - subreq->transferred)) + transferred_or_error = subreq->len - subreq->transferred; + + subreq->error = 0; + subreq->transferred += transferred_or_error; + if (subreq->transferred < subreq->len) + goto incomplete; + +complete: + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + +out: + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + u = atomic_dec_return(&rreq->nr_rd_ops); + if (u == 0) + netfs_rreq_terminated(rreq, was_async); + else if (u == 1) + wake_up_var(&rreq->nr_rd_ops); + + netfs_put_subrequest(subreq, was_async); + return; + +incomplete: + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { + netfs_clear_unread(subreq); + subreq->transferred = subreq->len; + goto complete; + } + + if (transferred_or_error == 0) { + if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { + subreq->error = -ENODATA; + goto failed; + } + } else { + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } + + __set_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + goto out; + +failed: + if (subreq->source == NETFS_READ_FROM_CACHE) { + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } else { + set_bit(NETFS_RREQ_FAILED, &rreq->flags); + rreq->error = subreq->error; + } + goto out; +} +EXPORT_SYMBOL(netfs_subreq_terminated); + +static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequest *subreq, + loff_t i_size) +{ + struct netfs_read_request *rreq = subreq->rreq; + + if (subreq->start >= rreq->i_size) + return NETFS_FILL_WITH_ZEROES; + return NETFS_DOWNLOAD_FROM_SERVER; +} + +/* + * Work out what sort of subrequest the next one will be. + */ +static enum netfs_read_source +netfs_rreq_prepare_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + enum netfs_read_source source; + + _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); + + source = netfs_cache_prepare_read(subreq, rreq->i_size); + if (source == NETFS_INVALID_READ) + goto out; + + if (source == NETFS_DOWNLOAD_FROM_SERVER) { + /* Call out to the netfs to let it shrink the request to fit + * its own I/O sizes and boundaries. If it shinks it here, it + * will be called again to make simultaneous calls; if it wants + * to make serial calls, it can indicate a short read and then + * we will call it again. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; + + if (rreq->netfs_ops->clamp_length && + !rreq->netfs_ops->clamp_length(subreq)) { + source = NETFS_INVALID_READ; + goto out; + } + } + + if (WARN_ON(subreq->len == 0)) + source = NETFS_INVALID_READ; + +out: + subreq->source = source; + return source; +} + +/* + * Slice off a piece of a read request and submit an I/O request for it. + */ +static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, + unsigned int *_debug_index) +{ + struct netfs_read_subrequest *subreq; + enum netfs_read_source source; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) + return false; + + subreq->debug_index = (*_debug_index)++; + subreq->start = rreq->start + rreq->submitted; + subreq->len = rreq->len - rreq->submitted; + + _debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + + /* Call out to the cache to find out what it can do with the remaining + * subset. It tells us in subreq->flags what it decided should be done + * and adjusts subreq->len down if the subset crosses a cache boundary. + * + * Then when we hand the subset, it can choose to take a subset of that + * (the starts must coincide), in which case, we go around the loop + * again and ask it to download the next piece. + */ + source = netfs_rreq_prepare_read(rreq, subreq); + if (source == NETFS_INVALID_READ) + goto subreq_failed; + + atomic_inc(&rreq->nr_rd_ops); + + rreq->submitted += subreq->len; + + switch (source) { + case NETFS_FILL_WITH_ZEROES: + netfs_fill_with_zeroes(rreq, subreq); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_read_from_server(rreq, subreq); + break; + default: + BUG(); + } + + return true; + +subreq_failed: + rreq->error = subreq->error; + netfs_put_subrequest(subreq, false); + return false; +} + +static void netfs_rreq_expand(struct netfs_read_request *rreq, + struct readahead_control *ractl) +{ + /* Give the netfs a chance to change the request parameters. The + * resultant request must contain the original region. + */ + if (rreq->netfs_ops->expand_readahead) + rreq->netfs_ops->expand_readahead(rreq); + + /* Expand the request if the cache wants it to start earlier. Note + * that the expansion may get further extended if the VM wishes to + * insert THPs and the preferred start and/or end wind up in the middle + * of THPs. + * + * If this is the case, however, the THP size should be an integer + * multiple of the cache granule size, so we get a whole number of + * granules to deal with. + */ + if (rreq->start != readahead_pos(ractl) || + rreq->len != readahead_length(ractl)) { + readahead_expand(ractl, rreq->start, rreq->len); + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + } +} + +/** + * netfs_readahead - Helper to manage a read request + * @ractl: The description of the readahead request + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readahead request by drawing data from the cache if possible, or + * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O + * requests from different sources will get munged together. If necessary, the + * readahead window can be expanded in either direction to a more convenient + * alighment for RPC efficiency or to make storage in the cache feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +void netfs_readahead(struct readahead_control *ractl, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page; + unsigned int debug_index = 0; + + _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); + + if (readahead_count(ractl) == 0) + goto cleanup; + + rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); + if (!rreq) + goto cleanup; + rreq->mapping = ractl->mapping; + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + + netfs_rreq_expand(rreq, ractl); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + while ((page = readahead_page(ractl))) + put_page(page); + + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + netfs_rreq_assess(rreq, false); + return; + +cleanup: + if (netfs_priv) + ops->cleanup(ractl->mapping, netfs_priv); + return; +} +EXPORT_SYMBOL(netfs_readahead); + +/** + * netfs_page - Helper to manage a readpage request + * @file: The file to read from + * @page: The page to read + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readpage request by drawing data from the cache if possible, or the + * netfs if not. Space beyond the EOF is zero-filled. Multiple I/O requests + * from different sources will get munged together. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +int netfs_readpage(struct file *file, + struct page *page, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + unsigned int debug_index = 0; + int ret; + + _enter("%lx", page->index); + + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) { + if (netfs_priv) + ops->cleanup(netfs_priv, page->mapping); + unlock_page(page); + return -ENOMEM; + } + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + + netfs_get_read_request(rreq); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + do { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + } while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)); + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq, false); + return ret; +} +EXPORT_SYMBOL(netfs_readpage); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 5ed8c6dc7453..d5453f625610 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -14,6 +14,8 @@ #ifndef _LINUX_NETFS_H #define _LINUX_NETFS_H +#include +#include #include /* @@ -55,4 +57,84 @@ static inline void wait_on_page_fscache(struct page *page) wait_on_page_bit(compound_head(page), PG_fscache); } +enum netfs_read_source { + NETFS_FILL_WITH_ZEROES, + NETFS_DOWNLOAD_FROM_SERVER, + NETFS_READ_FROM_CACHE, + NETFS_INVALID_READ, +} __mode(byte); + +/* + * Descriptor for a single component subrequest. + */ +struct netfs_read_subrequest { + struct netfs_read_request *rreq; /* Supervising read request */ + struct list_head rreq_link; /* Link in rreq->subrequests */ + loff_t start; /* Where to start the I/O */ + size_t len; /* Size of the I/O */ + size_t transferred; /* Amount of data transferred */ + refcount_t usage; + short error; /* 0 or error that occurred */ + unsigned short debug_index; /* Index in list (for debugging output) */ + enum netfs_read_source source; /* Where to read from */ + unsigned long flags; +#define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ +#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ +#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */ +#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ +#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ +}; + +/* + * Descriptor for a read helper request. This is used to make multiple I/O + * requests on a variety of sources and then stitch the result together. + */ +struct netfs_read_request { + struct work_struct work; + struct inode *inode; /* The file being accessed */ + struct address_space *mapping; /* The mapping being accessed */ + struct list_head subrequests; /* Requests to fetch I/O from disk or net */ + void *netfs_priv; /* Private data for the netfs */ + atomic_t nr_rd_ops; /* Number of read ops in progress */ + size_t submitted; /* Amount submitted for I/O so far */ + size_t len; /* Length of the request */ + short error; /* 0 or error that occurred */ + loff_t i_size; /* Size of the file */ + loff_t start; /* Start position */ + pgoff_t no_unlock_page; /* Don't unlock this page after read */ + refcount_t usage; + unsigned long flags; +#define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ +#define NETFS_RREQ_WRITE_TO_CACHE 1 /* Need to write to the cache */ +#define NETFS_RREQ_NO_UNLOCK_PAGE 2 /* Don't unlock no_unlock_page on completion */ +#define NETFS_RREQ_DONT_UNLOCK_PAGES 3 /* Don't unlock the pages on completion */ +#define NETFS_RREQ_FAILED 4 /* The request failed */ +#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ + const struct netfs_read_request_ops *netfs_ops; +}; + +/* + * Operations the network filesystem can/must provide to the helpers. + */ +struct netfs_read_request_ops { + void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + void (*expand_readahead)(struct netfs_read_request *rreq); + bool (*clamp_length)(struct netfs_read_subrequest *subreq); + void (*issue_op)(struct netfs_read_subrequest *subreq); + bool (*is_still_valid)(struct netfs_read_request *rreq); + void (*done)(struct netfs_read_request *rreq); + void (*cleanup)(struct address_space *mapping, void *netfs_priv); +}; + +struct readahead_control; +extern void netfs_readahead(struct readahead_control *, + const struct netfs_read_request_ops *, + void *); +extern int netfs_readpage(struct file *, + struct page *, + const struct netfs_read_request_ops *, + void *); + +extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); + #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396816 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BDF3C433E0 for ; Wed, 10 Mar 2021 17:04:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B67664FB0 for ; Wed, 10 Mar 2021 17:04:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233653AbhCJQ5B (ORCPT ); Wed, 10 Mar 2021 11:57:01 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:20136 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233590AbhCJQ4k (ORCPT ); Wed, 10 Mar 2021 11:56:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395400; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4GCv4I1iASJ1LZas4NpXejo/8oEgSQY2RrQvJ/ewopk=; b=epxHjKn2RSsucdi0ppSmCHi1/fC4lIx8wljQyBIdWVEWp/iqJhA/R1BceQOlug0729puU9 9gjHEb+4CYYELL3cTnn+mfPCUREOXcjh0thaGHwfWVkKHrT/T1A9zWBSZMJCn7dLgbzO4q pgIR8oU+BNXZglSx0nPNd+nEWo6nijQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-214-HTL4YWLlOEuKZYNz0HIkig-1; Wed, 10 Mar 2021 11:56:36 -0500 X-MC-Unique: HTL4YWLlOEuKZYNz0HIkig-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5361380432D; Wed, 10 Mar 2021 16:56:34 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0A7732CFDB; Wed, 10 Mar 2021 16:56:27 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 09/28] netfs: Add tracepoints From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:26 +0000 Message-ID: <161539538693.286939.10171713520419106334.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add three tracepoints to track the activity of the read helpers: (1) netfs/netfs_read This logs entry to the read helpers and also expansion of the range in a readahead request. (2) netfs/netfs_rreq This logs the progress of netfs_read_request objects which track read requests. A read request may be a compound of multiple subrequests. (3) netfs/netfs_sreq This logs the progress of netfs_read_subrequest objects, which track the contributions from various sources to a read request. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/read_helper.c | 28 ++++++ include/linux/netfs.h | 2 include/trace/events/netfs.h | 199 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 229 insertions(+) create mode 100644 include/trace/events/netfs.h diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index c5ed0dcc9571..e64d9dad0c36 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -16,6 +16,8 @@ #include #include #include "internal.h" +#define CREATE_TRACE_POINTS +#include MODULE_DESCRIPTION("Network fs support"); MODULE_AUTHOR("Red Hat, Inc."); @@ -39,6 +41,7 @@ static struct netfs_read_request *netfs_alloc_read_request( const struct netfs_read_request_ops *ops, void *netfs_priv, struct file *file) { + static atomic_t debug_ids; struct netfs_read_request *rreq; rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); @@ -47,6 +50,7 @@ static struct netfs_read_request *netfs_alloc_read_request( rreq->netfs_priv = netfs_priv; rreq->inode = file_inode(file); rreq->i_size = i_size_read(rreq->inode); + rreq->debug_id = atomic_inc_return(&debug_ids); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); @@ -82,6 +86,7 @@ static void netfs_free_read_request(struct work_struct *work) netfs_rreq_clear_subreqs(rreq, false); if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); } @@ -127,6 +132,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, { struct netfs_read_request *rreq = subreq->rreq; + trace_netfs_sreq(subreq, netfs_sreq_trace_free); kfree(subreq); netfs_put_read_request(rreq, was_async); } @@ -181,6 +187,7 @@ static void netfs_read_from_server(struct netfs_read_request *rreq, */ static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) { + trace_netfs_rreq(rreq, netfs_rreq_trace_done); netfs_rreq_clear_subreqs(rreq, was_async); netfs_put_read_request(rreq, was_async); } @@ -219,6 +226,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) iopos = 0; subreq_failed = (subreq->error < 0); + trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); + rcu_read_lock(); xas_for_each(&xas, page, last_page) { unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; @@ -279,6 +288,8 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); + netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -294,6 +305,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) WARN_ON(in_interrupt()); + trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); + /* We don't want terminating submissions trying to wake us up whilst * we're still going through the list. */ @@ -306,6 +319,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -330,6 +344,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) */ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) { + trace_netfs_rreq(rreq, netfs_rreq_trace_assess); + again: if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { @@ -417,6 +433,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); out: + trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ u = atomic_dec_return(&rreq->nr_rd_ops); if (u == 0) @@ -505,6 +523,7 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq, out: subreq->source = source; + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); return source; } @@ -544,6 +563,7 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, rreq->submitted += subreq->len; + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); switch (source) { case NETFS_FILL_WITH_ZEROES: netfs_fill_with_zeroes(rreq, subreq); @@ -586,6 +606,9 @@ static void netfs_rreq_expand(struct netfs_read_request *rreq, readahead_expand(ractl, rreq->start, rreq->len); rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_expanded); } } @@ -627,6 +650,9 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_readahead); + netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -690,6 +716,8 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index d5453f625610..156a19f47213 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -95,6 +95,8 @@ struct netfs_read_request { struct address_space *mapping; /* The mapping being accessed */ struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ + unsigned int debug_id; + unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h new file mode 100644 index 000000000000..12ad382764c5 --- /dev/null +++ b/include/trace/events/netfs.h @@ -0,0 +1,199 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support module tracepoints + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM netfs + +#if !defined(_TRACE_NETFS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NETFS_H + +#include + +/* + * Define enums for tracing information. + */ +#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY +#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY + +enum netfs_read_trace { + netfs_read_trace_expanded, + netfs_read_trace_readahead, + netfs_read_trace_readpage, +}; + +enum netfs_rreq_trace { + netfs_rreq_trace_assess, + netfs_rreq_trace_done, + netfs_rreq_trace_free, + netfs_rreq_trace_resubmit, + netfs_rreq_trace_unlock, + netfs_rreq_trace_unmark, + netfs_rreq_trace_write, +}; + +enum netfs_sreq_trace { + netfs_sreq_trace_download_instead, + netfs_sreq_trace_free, + netfs_sreq_trace_prepare, + netfs_sreq_trace_resubmit_short, + netfs_sreq_trace_submit, + netfs_sreq_trace_terminated, + netfs_sreq_trace_write, + netfs_sreq_trace_write_term, +}; + +#endif + +#define netfs_read_traces \ + EM(netfs_read_trace_expanded, "EXPANDED ") \ + EM(netfs_read_trace_readahead, "READAHEAD") \ + E_(netfs_read_trace_readpage, "READPAGE ") + +#define netfs_rreq_traces \ + EM(netfs_rreq_trace_assess, "ASSESS") \ + EM(netfs_rreq_trace_done, "DONE ") \ + EM(netfs_rreq_trace_free, "FREE ") \ + EM(netfs_rreq_trace_resubmit, "RESUBM") \ + EM(netfs_rreq_trace_unlock, "UNLOCK") \ + EM(netfs_rreq_trace_unmark, "UNMARK") \ + E_(netfs_rreq_trace_write, "WRITE ") + +#define netfs_sreq_sources \ + EM(NETFS_FILL_WITH_ZEROES, "ZERO") \ + EM(NETFS_DOWNLOAD_FROM_SERVER, "DOWN") \ + EM(NETFS_READ_FROM_CACHE, "READ") \ + E_(NETFS_INVALID_READ, "INVL") \ + +#define netfs_sreq_traces \ + EM(netfs_sreq_trace_download_instead, "RDOWN") \ + EM(netfs_sreq_trace_free, "FREE ") \ + EM(netfs_sreq_trace_prepare, "PREP ") \ + EM(netfs_sreq_trace_resubmit_short, "SHORT") \ + EM(netfs_sreq_trace_submit, "SUBMT") \ + EM(netfs_sreq_trace_terminated, "TERM ") \ + EM(netfs_sreq_trace_write, "WRITE") \ + E_(netfs_sreq_trace_write_term, "WTERM") + + +/* + * Export enum symbols via userspace. + */ +#undef EM +#undef E_ +#define EM(a, b) TRACE_DEFINE_ENUM(a); +#define E_(a, b) TRACE_DEFINE_ENUM(a); + +netfs_read_traces; +netfs_rreq_traces; +netfs_sreq_sources; +netfs_sreq_traces; + +/* + * Now redefine the EM() and E_() macros to map the enums to the strings that + * will be printed in the output. + */ +#undef EM +#undef E_ +#define EM(a, b) { a, b }, +#define E_(a, b) { a, b } + +TRACE_EVENT(netfs_read, + TP_PROTO(struct netfs_read_request *rreq, + loff_t start, size_t len, + enum netfs_read_trace what), + + TP_ARGS(rreq, start, len, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned int, cookie ) + __field(loff_t, start ) + __field(size_t, len ) + __field(enum netfs_read_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->cookie = rreq->cookie_debug_id; + __entry->start = start; + __entry->len = len; + __entry->what = what; + ), + + TP_printk("R=%08x %s c=%08x s=%llx %zx", + __entry->rreq, + __print_symbolic(__entry->what, netfs_read_traces), + __entry->cookie, + __entry->start, __entry->len) + ); + +TRACE_EVENT(netfs_rreq, + TP_PROTO(struct netfs_read_request *rreq, + enum netfs_rreq_trace what), + + TP_ARGS(rreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, flags ) + __field(enum netfs_rreq_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->flags = rreq->flags; + __entry->what = what; + ), + + TP_printk("R=%08x %s f=%02x", + __entry->rreq, + __print_symbolic(__entry->what, netfs_rreq_traces), + __entry->flags) + ); + +TRACE_EVENT(netfs_sreq, + TP_PROTO(struct netfs_read_subrequest *sreq, + enum netfs_sreq_trace what), + + TP_ARGS(sreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, index ) + __field(short, error ) + __field(unsigned short, flags ) + __field(enum netfs_read_source, source ) + __field(enum netfs_sreq_trace, what ) + __field(size_t, len ) + __field(size_t, transferred ) + __field(loff_t, start ) + ), + + TP_fast_assign( + __entry->rreq = sreq->rreq->debug_id; + __entry->index = sreq->debug_index; + __entry->error = sreq->error; + __entry->flags = sreq->flags; + __entry->source = sreq->source; + __entry->what = what; + __entry->len = sreq->len; + __entry->transferred = sreq->transferred; + __entry->start = sreq->start; + ), + + TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d", + __entry->rreq, __entry->index, + __print_symbolic(__entry->what, netfs_sreq_traces), + __print_symbolic(__entry->source, netfs_sreq_sources), + __entry->flags, + __entry->start, __entry->transferred, __entry->len, + __entry->error) + ); + +#endif /* _TRACE_NETFS_H */ + +/* This part must be outside protection */ +#include From patchwork Wed Mar 10 16:56:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396826 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ECDDBC43332 for ; Wed, 10 Mar 2021 16:57:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CE19B64FC5 for ; Wed, 10 Mar 2021 16:57:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233663AbhCJQ5C (ORCPT ); Wed, 10 Mar 2021 11:57:02 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:43771 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233622AbhCJQ4z (ORCPT ); Wed, 10 Mar 2021 11:56:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395414; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Rkh5lVhzzHqr4Q+Ob+vOhztGy8XP0TYRI88h8la+vqA=; b=ZdzDshBvXSKAMMVroRw3Flx3h7IexqnoQbbx8AB9w6YkU5TYM7m2VF0R3PxaAsSltssx4Z 5+NJWYZSSbq/kwqlWPYn40bPxJeS+/J3RSZYH28AWVr1r8SE/80Ve25aN0NlMUIDrjKAEu G4p0FNfrv5pGJTEkfDvvJAFyAJdjwjU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-466-KNSDZm0nP0adO2WmHCCFnw-1; Wed, 10 Mar 2021 11:56:52 -0500 X-MC-Unique: KNSDZm0nP0adO2WmHCCFnw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3099B8189E3; Wed, 10 Mar 2021 16:56:50 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 156B460C6E; Wed, 10 Mar 2021 16:56:40 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 10/28] netfs: Gather stats From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:39 +0000 Message-ID: <161539539959.286939.6794352576462965914.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Gather statistics from the netfs interface that can be exported through a seqfile. This is intended to be called by a later patch when viewing /proc/fs/fscache/stats. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/Kconfig | 15 +++++++++++++ fs/netfs/Makefile | 3 +-- fs/netfs/internal.h | 34 ++++++++++++++++++++++++++++++ fs/netfs/read_helper.c | 23 ++++++++++++++++++++ fs/netfs/stats.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 1 + 6 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 fs/netfs/stats.c diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig index 2ebf90e6ca95..578112713703 100644 --- a/fs/netfs/Kconfig +++ b/fs/netfs/Kconfig @@ -6,3 +6,18 @@ config NETFS_SUPPORT This option enables support for network filesystems, including helpers for high-level buffered I/O, abstracting out read segmentation, local caching and transparent huge page support. + +config NETFS_STATS + bool "Gather statistical information on local caching" + depends on NETFS_SUPPORT && PROC_FS + help + This option causes statistical information to be gathered on local + caching and exported through file: + + /proc/fs/fscache/stats + + The gathering of statistics adds a certain amount of overhead to + execution as there are a quite a few stats gathered, and on a + multi-CPU system these may be on cachelines that keep bouncing + between CPUs. On the other hand, the stats are very useful for + debugging purposes. Saying 'Y' here is recommended. diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 4b4eff2ba369..c15bfc966d96 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := \ - read_helper.o +netfs-y := read_helper.o stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index ee665c0e7dc8..98b6f4516da1 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -16,8 +16,42 @@ */ extern unsigned int netfs_debug; +/* + * stats.c + */ +#ifdef CONFIG_NETFS_STATS +extern atomic_t netfs_n_rh_readahead; +extern atomic_t netfs_n_rh_readpage; +extern atomic_t netfs_n_rh_rreq; +extern atomic_t netfs_n_rh_sreq; +extern atomic_t netfs_n_rh_download; +extern atomic_t netfs_n_rh_download_done; +extern atomic_t netfs_n_rh_download_failed; +extern atomic_t netfs_n_rh_download_instead; +extern atomic_t netfs_n_rh_read; +extern atomic_t netfs_n_rh_read_done; +extern atomic_t netfs_n_rh_read_failed; +extern atomic_t netfs_n_rh_zero; +extern atomic_t netfs_n_rh_short_read; +extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_done; +extern atomic_t netfs_n_rh_write_failed; + + +static inline void netfs_stat(atomic_t *stat) +{ + atomic_inc(stat); +} + +static inline void netfs_stat_d(atomic_t *stat) +{ + atomic_dec(stat); +} + +#else #define netfs_stat(x) do {} while(0) #define netfs_stat_d(x) do {} while(0) +#endif /*****************************************************************************/ /* diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index e64d9dad0c36..8b03fcc2f8f1 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -56,6 +56,7 @@ static struct netfs_read_request *netfs_alloc_read_request( refcount_set(&rreq->usage, 1); __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); ops->init_rreq(rreq, file); + netfs_stat(&netfs_n_rh_rreq); } return rreq; @@ -88,6 +89,7 @@ static void netfs_free_read_request(struct work_struct *work) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); + netfs_stat_d(&netfs_n_rh_rreq); } static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) @@ -117,6 +119,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest( refcount_set(&subreq->usage, 2); subreq->rreq = rreq; netfs_get_read_request(rreq); + netfs_stat(&netfs_n_rh_sreq); } return subreq; @@ -134,6 +137,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, trace_netfs_sreq(subreq, netfs_sreq_trace_free); kfree(subreq); + netfs_stat_d(&netfs_n_rh_sreq); netfs_put_read_request(rreq, was_async); } @@ -156,6 +160,7 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_zero); __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, 0, false); } @@ -179,6 +184,7 @@ static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, static void netfs_read_from_server(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_download); rreq->netfs_ops->issue_op(subreq); } @@ -288,6 +294,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + netfs_stat(&netfs_n_rh_short_read); trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); netfs_get_read_subrequest(subreq); @@ -319,6 +326,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + netfs_stat(&netfs_n_rh_download_instead); trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); @@ -414,6 +422,17 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, subreq->debug_index, subreq->start, subreq->flags, transferred_or_error); + switch (subreq->source) { + case NETFS_READ_FROM_CACHE: + netfs_stat(&netfs_n_rh_read_done); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_stat(&netfs_n_rh_download_done); + break; + default: + break; + } + if (IS_ERR_VALUE(transferred_or_error)) { subreq->error = transferred_or_error; goto failed; @@ -467,8 +486,10 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, failed: if (subreq->source == NETFS_READ_FROM_CACHE) { + netfs_stat(&netfs_n_rh_read_failed); set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); } else { + netfs_stat(&netfs_n_rh_download_failed); set_bit(NETFS_RREQ_FAILED, &rreq->flags); rreq->error = subreq->error; } @@ -650,6 +671,7 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + netfs_stat(&netfs_n_rh_readahead); trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); @@ -716,6 +738,7 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); netfs_get_read_request(rreq); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c new file mode 100644 index 000000000000..df6ff5718f25 --- /dev/null +++ b/fs/netfs/stats.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Netfs support statistics + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include "internal.h" + +atomic_t netfs_n_rh_readahead; +atomic_t netfs_n_rh_readpage; +atomic_t netfs_n_rh_rreq; +atomic_t netfs_n_rh_sreq; +atomic_t netfs_n_rh_download; +atomic_t netfs_n_rh_download_done; +atomic_t netfs_n_rh_download_failed; +atomic_t netfs_n_rh_download_instead; +atomic_t netfs_n_rh_read; +atomic_t netfs_n_rh_read_done; +atomic_t netfs_n_rh_read_failed; +atomic_t netfs_n_rh_zero; +atomic_t netfs_n_rh_short_read; +atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_done; +atomic_t netfs_n_rh_write_failed; + +void netfs_stats_show(struct seq_file *m) +{ + seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + atomic_read(&netfs_n_rh_readahead), + atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_rreq), + atomic_read(&netfs_n_rh_sreq)); + seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + atomic_read(&netfs_n_rh_zero), + atomic_read(&netfs_n_rh_short_read)); + seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", + atomic_read(&netfs_n_rh_download), + atomic_read(&netfs_n_rh_download_done), + atomic_read(&netfs_n_rh_download_failed), + atomic_read(&netfs_n_rh_download_instead)); + seq_printf(m, "RdHelp : RD=%u rs=%u rf=%u\n", + atomic_read(&netfs_n_rh_read), + atomic_read(&netfs_n_rh_read_done), + atomic_read(&netfs_n_rh_read_failed)); + seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n", + atomic_read(&netfs_n_rh_write), + atomic_read(&netfs_n_rh_write_done), + atomic_read(&netfs_n_rh_write_failed)); +} +EXPORT_SYMBOL(netfs_stats_show); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 156a19f47213..f2a0c2dc425b 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -138,5 +138,6 @@ extern int netfs_readpage(struct file *, void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); +extern void netfs_stats_show(struct seq_file *); #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396825 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD32CC433E6 for ; Wed, 10 Mar 2021 16:58:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8F83164FD2 for ; Wed, 10 Mar 2021 16:58:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233603AbhCJQ5c (ORCPT ); Wed, 10 Mar 2021 11:57:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:45018 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233550AbhCJQ5I (ORCPT ); Wed, 10 Mar 2021 11:57:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395428; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Anevj5TH4/GDh/44apd4qtX1bFuDOsvHHbarWQ52AHc=; b=Ytl9LUzQVJPkt/n3sWPv4aQ3MeQDW4iBEkdtJ8LYJu2MeJQcj4DMfrJRLWlJiUpX8ngC81 xlvSiglB7svpUyKNJsfxlHqFVQ9n2WH+G0O0oVKxHNLULUhZkBQoG4vCvzLXGGDhbkm6Cr IVr9Lxr0ilj3el5wl2JKJY+8pAo4fWQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-398-CX8fvXziMZWFg5MnErcmKg-1; Wed, 10 Mar 2021 11:57:05 -0500 X-MC-Unique: CX8fvXziMZWFg5MnErcmKg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7C26DE5762; Wed, 10 Mar 2021 16:57:03 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 295F41F076; Wed, 10 Mar 2021 16:56:56 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 11/28] netfs: Add write_begin helper From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:55 +0000 Message-ID: <161539541541.286939.1889738674057013729.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add a helper to do the pre-reading work for the netfs write_begin address space op. Changes - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588543960.3465195.2792938973035886168.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118140165.1232039.16418853874312234477.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161035539.2537118.15674887534950908530.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340398368.1303470.11242918276563276090.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/netfs/internal.h | 2 + fs/netfs/read_helper.c | 165 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/stats.c | 11 ++- include/linux/netfs.h | 8 ++ include/trace/events/netfs.h | 4 + 5 files changed, 186 insertions(+), 4 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 98b6f4516da1..b7f2c4459f33 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed; extern atomic_t netfs_n_rh_zero; extern atomic_t netfs_n_rh_short_read; extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_begin; extern atomic_t netfs_n_rh_write_done; extern atomic_t netfs_n_rh_write_failed; +extern atomic_t netfs_n_rh_write_zskip; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 8b03fcc2f8f1..ce3a2f5844d1 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -766,3 +766,168 @@ int netfs_readpage(struct file *file, return ret; } EXPORT_SYMBOL(netfs_readpage); + +static void netfs_clear_thp(struct page *page) +{ + unsigned int i; + + for (i = 0; i < thp_nr_pages(page); i++) + clear_highpage(page + i); +} + +/** + * netfs_write_begin - Helper to prepare for writing + * @file: The file to read from + * @mapping: The mapping to read from + * @pos: File position at which the write will begin + * @len: The length of the write in this page + * @flags: AOP_* flags + * @_page: Where to put the resultant page + * @_fsdata: Place for the netfs to store a cookie + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Pre-read data for a write-begin request by drawing data from the cache if + * possible, or the netfs if not. Space beyond the EOF is zero-filled. + * Multiple I/O requests from different sources will get munged together. If + * necessary, the readahead window can be expanded in either direction to a + * more convenient alighment for RPC efficiency or to make storage in the cache + * feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. + * + * The check_write_begin() operation can be provided to check for and flush + * conflicting writes once the page is grabbed and locked. It is passed a + * pointer to the fsdata cookie that gets returned to the VM to be passed to + * write_end. It is permitted to sleep. It should return 0 if the request + * should go ahead; unlock the page and return -EAGAIN to cause the page to be + * regot; or return an error. + * + * This is usable whether or not caching is enabled. + */ +int netfs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned int len, unsigned int flags, + struct page **_page, void **_fsdata, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page, *xpage; + struct inode *inode = file_inode(file); + unsigned int debug_index = 0; + pgoff_t index = pos >> PAGE_SHIFT; + int pos_in_page = pos & ~PAGE_MASK; + loff_t size; + int ret; + + struct readahead_control ractl = { + .file = file, + .mapping = mapping, + ._index = index, + ._nr_pages = 0, + }; + +retry: + page = grab_cache_page_write_begin(mapping, index, 0); + if (!page) + return -ENOMEM; + + if (ops->check_write_begin) { + /* Allow the netfs (eg. ceph) to flush conflicts. */ + ret = ops->check_write_begin(file, pos, len, page, _fsdata); + if (ret < 0) { + if (ret == -EAGAIN) + goto retry; + goto error; + } + } + + if (PageUptodate(page)) + goto have_page; + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. + */ + size = i_size_read(inode); + if (!ops->is_cache_enabled(inode) && + ((pos_in_page == 0 && len == thp_size(page)) || + (pos >= size) || + (pos_in_page == 0 && (pos + len) >= size))) { + netfs_clear_thp(page); + SetPageUptodate(page); + netfs_stat(&netfs_n_rh_write_zskip); + goto have_page_no_wait; + } + + ret = -ENOMEM; + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) + goto error; + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + rreq->no_unlock_page = page->index; + __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); + netfs_priv = NULL; + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = thp_nr_pages(page); + netfs_rreq_expand(rreq, &ractl); + netfs_get_read_request(rreq); + + /* We hold the page locks, so we can drop the references */ + while ((xpage = readahead_page(&ractl))) + if (xpage != page) + put_page(xpage); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq, false); + if (ret < 0) + goto error; + +have_page: + wait_on_page_fscache(page); +have_page_no_wait: + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + *_page = page; + _leave(" = 0"); + return 0; + +error: + unlock_page(page); + put_page(page); + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_write_begin); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index df6ff5718f25..9ae538c85378 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -24,19 +24,24 @@ atomic_t netfs_n_rh_read_failed; atomic_t netfs_n_rh_zero; atomic_t netfs_n_rh_short_read; atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_begin; atomic_t netfs_n_rh_write_done; atomic_t netfs_n_rh_write_failed; +atomic_t netfs_n_rh_write_zskip; void netfs_stats_show(struct seq_file *m) { - seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u WBZ=%u rr=%u sr=%u\n", atomic_read(&netfs_n_rh_readahead), atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_write_begin), + atomic_read(&netfs_n_rh_write_zskip), atomic_read(&netfs_n_rh_rreq), atomic_read(&netfs_n_rh_sreq)); - seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n", atomic_read(&netfs_n_rh_zero), - atomic_read(&netfs_n_rh_short_read)); + atomic_read(&netfs_n_rh_short_read), + atomic_read(&netfs_n_rh_write_zskip)); seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", atomic_read(&netfs_n_rh_download), atomic_read(&netfs_n_rh_download_done), diff --git a/include/linux/netfs.h b/include/linux/netfs.h index f2a0c2dc425b..9e1a1c48a45a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -119,11 +119,14 @@ struct netfs_read_request { * Operations the network filesystem can/must provide to the helpers. */ struct netfs_read_request_ops { + bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); bool (*is_still_valid)(struct netfs_read_request *rreq); + int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, + struct page *page, void **_fsdata); void (*done)(struct netfs_read_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; @@ -136,6 +139,11 @@ extern int netfs_readpage(struct file *, struct page *, const struct netfs_read_request_ops *, void *); +extern int netfs_write_begin(struct file *, struct address_space *, + loff_t, unsigned int, unsigned int, struct page **, + void **, + const struct netfs_read_request_ops *, + void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 12ad382764c5..a2bf6cd84bd4 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -22,6 +22,7 @@ enum netfs_read_trace { netfs_read_trace_expanded, netfs_read_trace_readahead, netfs_read_trace_readpage, + netfs_read_trace_write_begin, }; enum netfs_rreq_trace { @@ -50,7 +51,8 @@ enum netfs_sreq_trace { #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ - E_(netfs_read_trace_readpage, "READPAGE ") + EM(netfs_read_trace_readpage, "READPAGE ") \ + E_(netfs_read_trace_write_begin, "WRITEBEGN") #define netfs_rreq_traces \ EM(netfs_rreq_trace_assess, "ASSESS") \ From patchwork Wed Mar 10 16:57:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73EFEC432C3 for ; Wed, 10 Mar 2021 16:58:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5A17764FC8 for ; Wed, 10 Mar 2021 16:58:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233624AbhCJQ5g (ORCPT ); Wed, 10 Mar 2021 11:57:36 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:43581 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232435AbhCJQ51 (ORCPT ); Wed, 10 Mar 2021 11:57:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395447; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=27pFUuTN4WLlbSd2h0Sj0hHiIC29hLO1F3AEYyoAjRk=; b=BR9hNE5fE40VP1ayOO1JZ4AEamoCFXdzuunL/+Z8SAHV/ah3W8x1J02PJgdtkky8fizW9a +AOQcl3hSh6frTO9iR+nT9BGSkDcXKSIBfYfSL3Ql04/gCxoMxX0dyjsD0NEPbEVPjYNev wKkNqu8RcxK7v3tSHYELevInE2fhs64= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-349-_UQYMPkhOpux-uY6fjCmWQ-1; Wed, 10 Mar 2021 11:57:22 -0500 X-MC-Unique: _UQYMPkhOpux-uY6fjCmWQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0FCB6E5777; Wed, 10 Mar 2021 16:57:19 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id ACB956D024; Wed, 10 Mar 2021 16:57:09 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 12/28] netfs: Define an interface to talk to a cache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:08 +0000 Message-ID: <161539542874.286939.13337898213448136687.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add an interface to the netfs helper library for reading data from the cache instead of downloading it from the server and support for writing data just downloaded or cleared to the cache. The API passes an iov_iter to the cache read/write routines to indicate the data/buffer to be used. This is done using the ITER_XARRAY type to provide direct access to the netfs inode's pagecache. When the netfs's ->begin_cache_operation() method is called, this must fill in the cache_resources in the netfs_read_request struct, including the netfs_cache_ops used by the helper lib to talk to the cache. The helper lib does not directly access the cache. Changes - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). - Add missing inc of netfs_n_rh_read stat. - Move initial definition of fscache_begin_read_operation() elsewhere. - Need to call op->begin_cache_operation() from netfs_write_begin(). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/netfs/read_helper.c | 241 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 49 ++++++++++ 2 files changed, 289 insertions(+), 1 deletion(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index ce3a2f5844d1..0a7b76c11c98 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -88,6 +88,8 @@ static void netfs_free_read_request(struct work_struct *work) if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); + if (rreq->cache_resources.ops) + rreq->cache_resources.ops->end_operation(&rreq->cache_resources); kfree(rreq); netfs_stat_d(&netfs_n_rh_rreq); } @@ -154,6 +156,34 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) iov_iter_zero(iov_iter_count(&iter), &iter); } +static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_subrequest *subreq = priv; + + netfs_subreq_terminated(subreq, transferred_or_error, was_async); +} + +/* + * Issue a read against the cache. + * - Eats the caller's ref on subreq. + */ +static void netfs_read_from_cache(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq, + bool seek_data) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct iov_iter iter; + + netfs_stat(&netfs_n_rh_read); + iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + + cres->ops->read(cres, subreq->start, &iter, seek_data, + netfs_cache_read_terminated, subreq); +} + /* * Fill a subrequest region with zeroes. */ @@ -198,6 +228,144 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async netfs_put_read_request(rreq, was_async); } +/* + * Deal with the completion of writing the data to the cache. We have to clear + * the PG_fscache bits on the pages involved and release the caller's ref. + * + * May be called in softirq mode and we inherit a ref from the caller. + */ +static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, + bool was_async) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + pgoff_t unlocked = 0; + bool have_unlocked = false; + + rcu_read_lock(); + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); + + xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) { + /* We might have multiple writes from the same huge + * page, but we mustn't unlock a page more than once. + */ + if (have_unlocked && page->index <= unlocked) + continue; + unlocked = page->index; + unlock_page_fscache(page); + have_unlocked = true; + } + } + + rcu_read_unlock(); + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_subrequest *subreq = priv; + struct netfs_read_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + netfs_stat(&netfs_n_rh_write_failed); + } else { + subreq->error = 0; + netfs_stat(&netfs_n_rh_write_done); + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); + + /* If we decrement nr_wr_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq, was_async); + + netfs_put_subrequest(subreq, was_async); +} + +/* + * Perform any outstanding writes to the cache. We inherit a ref from the + * caller. + */ +static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct netfs_read_subrequest *subreq, *next, *p; + struct iov_iter iter; + loff_t pos; + + trace_netfs_rreq(rreq, netfs_rreq_trace_write); + + /* We don't want terminating writes trying to wake us up whilst we're + * still going through the list. + */ + atomic_inc(&rreq->nr_wr_ops); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + pos = round_down(subreq->start, PAGE_SIZE); + if (pos != subreq->start) { + subreq->len += subreq->start - pos; + subreq->start = pos; + } + subreq->len = round_up(subreq->len, PAGE_SIZE); + + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start > subreq->start + subreq->len) + break; + subreq->len += next->len; + subreq->len = round_up(subreq->len, PAGE_SIZE); + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false); + } + + iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, + subreq->start, subreq->len); + + atomic_inc(&rreq->nr_wr_ops); + netfs_stat(&netfs_n_rh_write); + netfs_get_read_subrequest(subreq); + trace_netfs_sreq(subreq, netfs_sreq_trace_write); + cres->ops->write(cres, subreq->start, &iter, + netfs_rreq_copy_terminated, subreq); + } + + /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq, false); +} + +static void netfs_rreq_write_to_cache_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + + netfs_rreq_do_write_to_cache(rreq); +} + +static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq, + bool was_async) +{ + if (was_async) { + rreq->work.func = netfs_rreq_write_to_cache_work; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_do_write_to_cache(rreq); + } +} + /* * Unlock the pages in a read operation. We need to set PG_fscache on any * pages we're going to write back before we unlock them. @@ -299,7 +467,10 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); - netfs_read_from_server(rreq, subreq); + if (subreq->source == NETFS_READ_FROM_CACHE) + netfs_read_from_cache(rreq, subreq, true); + else + netfs_read_from_server(rreq, subreq); } /* @@ -344,6 +515,25 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) return false; } +/* + * Check to see if the data read is still valid. + */ +static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + if (!rreq->netfs_ops->is_still_valid || + rreq->netfs_ops->is_still_valid(rreq)) + return; + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->source == NETFS_READ_FROM_CACHE) { + subreq->error = -ESTALE; + __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } + } +} + /* * Assess the state of a read request and decide what to do next. * @@ -355,6 +545,8 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) trace_netfs_rreq(rreq, netfs_rreq_trace_assess); again: + netfs_rreq_is_still_valid(rreq); + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { if (netfs_rreq_perform_resubmissions(rreq)) @@ -367,6 +559,9 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags)) + return netfs_rreq_write_to_cache(rreq, was_async); + netfs_rreq_completed(rreq, was_async); } @@ -501,7 +696,10 @@ static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequ loff_t i_size) { struct netfs_read_request *rreq = subreq->rreq; + struct netfs_cache_resources *cres = &rreq->cache_resources; + if (cres->ops) + return cres->ops->prepare_read(subreq, i_size); if (subreq->start >= rreq->i_size) return NETFS_FILL_WITH_ZEROES; return NETFS_DOWNLOAD_FROM_SERVER; @@ -592,6 +790,9 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, case NETFS_DOWNLOAD_FROM_SERVER: netfs_read_from_server(rreq, subreq); break; + case NETFS_READ_FROM_CACHE: + netfs_read_from_cache(rreq, subreq, false); + break; default: BUG(); } @@ -604,9 +805,23 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, return false; } +static void netfs_cache_expand_readahead(struct netfs_read_request *rreq, + loff_t *_start, size_t *_len, loff_t i_size) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + + if (cres->ops && cres->ops->expand_readahead) + cres->ops->expand_readahead(cres, _start, _len, i_size); +} + static void netfs_rreq_expand(struct netfs_read_request *rreq, struct readahead_control *ractl) { + /* Give the cache a chance to change the request parameters. The + * resultant request must contain the original region. + */ + netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); + /* Give the netfs a chance to change the request parameters. The * resultant request must contain the original region. */ @@ -658,6 +873,7 @@ void netfs_readahead(struct readahead_control *ractl, struct netfs_read_request *rreq; struct page *page; unsigned int debug_index = 0; + int ret; _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); @@ -675,6 +891,11 @@ void netfs_readahead(struct readahead_control *ractl, trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto cleanup_free; + } netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -692,6 +913,9 @@ void netfs_readahead(struct readahead_control *ractl, netfs_rreq_assess(rreq, false); return; +cleanup_free: + netfs_put_read_request(rreq, false); + return; cleanup: if (netfs_priv) ops->cleanup(ractl->mapping, netfs_priv); @@ -741,6 +965,14 @@ int netfs_readpage(struct file *file, netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { + unlock_page(page); + goto out; + } + } + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); @@ -762,6 +994,7 @@ int netfs_readpage(struct file *file, ret = rreq->error; if (ret == 0 && rreq->submitted < rreq->len) ret = -EIO; +out: netfs_put_read_request(rreq, false); return ret; } @@ -875,6 +1108,12 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, netfs_stat(&netfs_n_rh_write_begin); trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto error; + } + /* Expand the request to meet caching requirements and download * preferences. */ diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 9e1a1c48a45a..9d3fbed4e30a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -64,6 +64,18 @@ enum netfs_read_source { NETFS_INVALID_READ, } __mode(byte); +typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, + bool was_async); + +/* + * Resources required to do operations on a cache. + */ +struct netfs_cache_resources { + const struct netfs_cache_ops *ops; + void *cache_priv; + void *cache_priv2; +}; + /* * Descriptor for a single component subrequest. */ @@ -93,11 +105,13 @@ struct netfs_read_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ + struct netfs_cache_resources cache_resources; struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ + atomic_t nr_wr_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -121,6 +135,7 @@ struct netfs_read_request { struct netfs_read_request_ops { bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); @@ -131,6 +146,40 @@ struct netfs_read_request_ops { void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; +/* + * Table of operations for access to a cache. This is obtained by + * rreq->ops->begin_cache_operation(). + */ +struct netfs_cache_ops { + /* End an operation */ + void (*end_operation)(struct netfs_cache_resources *cres); + + /* Read data from the cache */ + int (*read)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + bool seek_data, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Write data to the cache */ + int (*write)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Expand readahead request */ + void (*expand_readahead)(struct netfs_cache_resources *cres, + loff_t *_start, size_t *_len, loff_t i_size); + + /* Prepare a read operation, shortening it to a cached/uncached + * boundary as appropriate. + */ + enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + loff_t i_size); +}; + struct readahead_control; extern void netfs_readahead(struct readahead_control *, const struct netfs_read_request_ops *, From patchwork Wed Mar 10 16:57:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396824 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19A86C4332B for ; Wed, 10 Mar 2021 16:58:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E7FEB64FFE for ; Wed, 10 Mar 2021 16:58:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233520AbhCJQ6G (ORCPT ); Wed, 10 Mar 2021 11:58:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:55852 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233707AbhCJQ5k (ORCPT ); Wed, 10 Mar 2021 11:57:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395459; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tJ1em54X7cNlxYOwt1XkWsi2bN3JlmKyhBUdFCYW68w=; b=HTYcJaDdkTww9iOY4s1j+pQjzqAA8LnkTmI7+Yj4S/LuCFvVMNs23KBVv31BeoAquKww0s tGNYo1tPVdfrlT4lKA/BJD6dZZ50hbEJgyTy+e3e0FZmla/+22qGgD27PAc1CAfqU7wQ0l Mcj2jp9Fwn2PZY7g66Ndfy6tNf8FQQQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-312-lhDkcfi-NXOnfsBkFqHPVA-1; Wed, 10 Mar 2021 11:57:36 -0500 X-MC-Unique: lhDkcfi-NXOnfsBkFqHPVA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E7161108BD07; Wed, 10 Mar 2021 16:57:33 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3C3C460C13; Wed, 10 Mar 2021 16:57:24 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 13/28] netfs: Hold a ref on a page when PG_private_2 is set From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , Linus Torvalds , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:24 +0000 Message-ID: <161539544426.286939.4484560053278261236.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Take a reference on a page when PG_private_2 is set and drop it once the bit is unlocked[1]. Reported-by: Linus Torvalds Signed-off-by: David Howells cc: Matthew Wilcox cc: Linus Torvalds cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/1331025.1612974993@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/161340400833.1303470.8070750156044982282.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/read_helper.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 0a7b76c11c98..ce11ca4c32e4 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -238,10 +239,13 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, bool was_async) { struct netfs_read_subrequest *subreq; + struct pagevec pvec; struct page *page; pgoff_t unlocked = 0; bool have_unlocked = false; + pagevec_init(&pvec); + rcu_read_lock(); list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { @@ -255,6 +259,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, continue; unlocked = page->index; unlock_page_fscache(page); + if (pagevec_add(&pvec, page) == 0) + pagevec_release(&pvec); have_unlocked = true; } } @@ -413,8 +419,10 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) pg_failed = true; break; } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + get_page(page); SetPageFsCache(page); + } pg_failed |= subreq_failed; if (pgend < iopos + subreq->len) break; From patchwork Wed Mar 10 16:57:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397626 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 361C7C4332E for ; Wed, 10 Mar 2021 16:58:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0670664FCB for ; Wed, 10 Mar 2021 16:58:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233685AbhCJQ6G (ORCPT ); Wed, 10 Mar 2021 11:58:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:57803 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233665AbhCJQ5y (ORCPT ); Wed, 10 Mar 2021 11:57:54 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395474; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=80+xjIQgeuRpjD84x8ZI1c7oI1Pjg+xxd6+Fse2z5/I=; b=Mqk6JXq+a8my+DeHUXmt6qcU58+DaIPJeWN07AtofKCTN3JUZqWpL8UzNVzlQhkuB1q1VQ /p7SGNmrKKq/fq3bXQumBQOMxXV56LVMm7WJubWVlg3R7zMxozRoznuuT16Y2TXnXorFXj tPSgzvgcuK3a29IKVAfbbw+L9rRSW1Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-511-S6PscL6aPSWuR-xkF1rnvA-1; Wed, 10 Mar 2021 11:57:48 -0500 X-MC-Unique: S6PscL6aPSWuR-xkF1rnvA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7365B83DD20; Wed, 10 Mar 2021 16:57:46 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2725362467; Wed, 10 Mar 2021 16:57:39 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 14/28] fscache, cachefiles: Add alternate API to use kiocb for read/write to cache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Christoph Hellwig , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:39 +0000 Message-ID: <161539545919.286939.14573472672781434757.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Add an alternate API by which the cache can be accessed through a kiocb, doing async DIO, rather than using the current API that tells the cache where all the pages are. The new API is intended to be used in conjunction with the netfs helper library. A filesystem must pick one or the other and not mix them. Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API before #including the header. This prevents them from continuing to use the old API at the same time as there are incompatibilities in how the PG_fscache page bit is used. Changes: - Use the vfs_iocb_iter_read/write() helpers[1] - Move initial definition of fscache_begin_read_operation() here. - Remove a commented-out line[2] - Combine ki->term_func calls in cachefiles_read_complete()[2]. - Remove explicit NULL initialiser[2]. - Remove extern on func decl[2]. - Put in param names on func decl[2]. - Remove redundant else[2]. - Fill out the kdoc comment for fscache_begin_read_operation(). - Rename fs/fscache/page2.c to io.c to match later patches. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Christoph Hellwig cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118142558.1232039.17993829899588971439.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161037850.2537118.8819808229350326503.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340402057.1303470.8038373593844486698.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216102614.GA27555@lst.de/ [1] Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [2] --- fs/cachefiles/Makefile | 1 fs/cachefiles/interface.c | 5 - fs/cachefiles/internal.h | 9 + fs/cachefiles/rdwr2.c | 403 +++++++++++++++++++++++++++++++++++++++++ fs/fscache/Kconfig | 1 fs/fscache/Makefile | 1 fs/fscache/internal.h | 4 fs/fscache/io.c | 116 ++++++++++++ fs/fscache/page.c | 2 fs/fscache/stats.c | 1 include/linux/fscache-cache.h | 4 include/linux/fscache.h | 39 ++++ 12 files changed, 583 insertions(+), 3 deletions(-) create mode 100644 fs/cachefiles/rdwr2.c create mode 100644 fs/fscache/io.c diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile index 891dedda5905..ea17b169ea5e 100644 --- a/fs/cachefiles/Makefile +++ b/fs/cachefiles/Makefile @@ -11,6 +11,7 @@ cachefiles-y := \ main.o \ namei.o \ rdwr.o \ + rdwr2.o \ security.o \ xattr.o diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 5efa6a3702c0..da3948fdb615 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -319,8 +319,8 @@ static void cachefiles_drop_object(struct fscache_object *_object) /* * dispose of a reference to an object */ -static void cachefiles_put_object(struct fscache_object *_object, - enum fscache_obj_ref_trace why) +void cachefiles_put_object(struct fscache_object *_object, + enum fscache_obj_ref_trace why) { struct cachefiles_object *object; struct fscache_cache *cache; @@ -568,4 +568,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = { .uncache_page = cachefiles_uncache_page, .dissociate_pages = cachefiles_dissociate_pages, .check_consistency = cachefiles_check_consistency, + .begin_read_operation = cachefiles_begin_read_operation, }; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index cf9bd6401c2d..4ed83aa5253b 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -150,6 +150,9 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache, */ extern const struct fscache_cache_ops cachefiles_cache_ops; +void cachefiles_put_object(struct fscache_object *_object, + enum fscache_obj_ref_trace why); + /* * key.c */ @@ -217,6 +220,12 @@ extern int cachefiles_allocate_pages(struct fscache_retrieval *, extern int cachefiles_write_page(struct fscache_storage *, struct page *); extern void cachefiles_uncache_page(struct fscache_object *, struct page *); +/* + * rdwr2.c + */ +extern int cachefiles_begin_read_operation(struct netfs_read_request *, + struct fscache_retrieval *); + /* * security.c */ diff --git a/fs/cachefiles/rdwr2.c b/fs/cachefiles/rdwr2.c new file mode 100644 index 000000000000..620959d1e95b --- /dev/null +++ b/fs/cachefiles/rdwr2.c @@ -0,0 +1,403 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* kiocb-using read/write + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include "internal.h" + +struct cachefiles_kiocb { + struct kiocb iocb; + refcount_t ki_refcnt; + loff_t start; + union { + size_t skipped; + size_t len; + }; + netfs_io_terminated_t term_func; + void *term_func_priv; + bool was_async; +}; + +static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki) +{ + if (refcount_dec_and_test(&ki->ki_refcnt)) { + fput(ki->iocb.ki_filp); + kfree(ki); + } +} + +/* + * Handle completion of a read from the cache. + */ +static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2) +{ + struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb); + + _enter("%ld,%ld", ret, ret2); + + if (ki->term_func) { + if (ret >= 0) + ret += ki->skipped; + ki->term_func(ki->term_func_priv, ret, ki->was_async); + } + + cachefiles_put_kiocb(ki); +} + +/* + * Initiate a read from the cache. + */ +static int cachefiles_read(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + bool seek_data, + netfs_io_terminated_t term_func, + void *term_func_priv) +{ + struct cachefiles_kiocb *ki; + struct file *file = cres->cache_priv2; + unsigned int old_nofs; + ssize_t ret = -ENOBUFS; + size_t len = iov_iter_count(iter), skipped = 0; + + _enter("%pD,%li,%llx,%zx/%llx", + file, file_inode(file)->i_ino, start_pos, len, + i_size_read(file->f_inode)); + + /* If the caller asked us to seek for data before doing the read, then + * we should do that now. If we find a gap, we fill it with zeros. + */ + if (seek_data) { + loff_t off = start_pos, off2; + + off2 = vfs_llseek(file, off, SEEK_DATA); + if (off2 < 0 && off2 >= (loff_t)-MAX_ERRNO && off2 != -ENXIO) { + skipped = 0; + ret = off2; + goto presubmission_error; + } + + if (off2 == -ENXIO || off2 >= start_pos + len) { + /* The region is beyond the EOF or there's no more data + * in the region, so clear the rest of the buffer and + * return success. + */ + iov_iter_zero(len, iter); + skipped = len; + ret = 0; + goto presubmission_error; + } + + skipped = off2 - off; + iov_iter_zero(skipped, iter); + } + + ret = -ENOBUFS; + ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL); + if (!ki) + goto presubmission_error; + + refcount_set(&ki->ki_refcnt, 2); + ki->iocb.ki_filp = file; + ki->iocb.ki_pos = start_pos + skipped; + ki->iocb.ki_flags = IOCB_DIRECT; + ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file)); + ki->iocb.ki_ioprio = get_current_ioprio(); + ki->skipped = skipped; + ki->term_func = term_func; + ki->term_func_priv = term_func_priv; + ki->was_async = true; + + if (ki->term_func) + ki->iocb.ki_complete = cachefiles_read_complete; + + get_file(ki->iocb.ki_filp); + + old_nofs = memalloc_nofs_save(); + ret = vfs_iocb_iter_read(file, &ki->iocb, iter); + memalloc_nofs_restore(old_nofs); + switch (ret) { + case -EIOCBQUEUED: + goto in_progress; + + case -ERESTARTSYS: + case -ERESTARTNOINTR: + case -ERESTARTNOHAND: + case -ERESTART_RESTARTBLOCK: + /* There's no easy way to restart the syscall since other AIO's + * may be already running. Just fail this IO with EINTR. + */ + ret = -EINTR; + fallthrough; + default: + ki->was_async = false; + cachefiles_read_complete(&ki->iocb, ret, 0); + if (ret > 0) + ret = 0; + break; + } + +in_progress: + cachefiles_put_kiocb(ki); + _leave(" = %zd", ret); + return ret; + +presubmission_error: + if (term_func) + term_func(term_func_priv, ret < 0 ? ret : skipped, false); + return ret; +} + +/* + * Handle completion of a write to the cache. + */ +static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2) +{ + struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb); + struct inode *inode = file_inode(ki->iocb.ki_filp); + + _enter("%ld,%ld", ret, ret2); + + /* Tell lockdep we inherited freeze protection from submission thread */ + __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); + __sb_end_write(inode->i_sb, SB_FREEZE_WRITE); + + if (ki->term_func) + ki->term_func(ki->term_func_priv, ret, ki->was_async); + + cachefiles_put_kiocb(ki); +} + +/* + * Initiate a write to the cache. + */ +static int cachefiles_write(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv) +{ + struct cachefiles_kiocb *ki; + struct inode *inode; + struct file *file = cres->cache_priv2; + unsigned int old_nofs; + ssize_t ret = -ENOBUFS; + size_t len = iov_iter_count(iter); + + _enter("%pD,%li,%llx,%zx/%llx", + file, file_inode(file)->i_ino, start_pos, len, + i_size_read(file->f_inode)); + + ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL); + if (!ki) + goto presubmission_error; + + refcount_set(&ki->ki_refcnt, 2); + ki->iocb.ki_filp = file; + ki->iocb.ki_pos = start_pos; + ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE; + ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file)); + ki->iocb.ki_ioprio = get_current_ioprio(); + ki->start = start_pos; + ki->len = len; + ki->term_func = term_func; + ki->term_func_priv = term_func_priv; + ki->was_async = true; + + if (ki->term_func) + ki->iocb.ki_complete = cachefiles_write_complete; + + /* Open-code file_start_write here to grab freeze protection, which + * will be released by another thread in aio_complete_rw(). Fool + * lockdep by telling it the lock got released so that it doesn't + * complain about the held lock when we return to userspace. + */ + inode = file_inode(file); + __sb_start_write(inode->i_sb, SB_FREEZE_WRITE); + __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE); + + get_file(ki->iocb.ki_filp); + + old_nofs = memalloc_nofs_save(); + ret = vfs_iocb_iter_write(file, &ki->iocb, iter); + memalloc_nofs_restore(old_nofs); + switch (ret) { + case -EIOCBQUEUED: + goto in_progress; + + case -ERESTARTSYS: + case -ERESTARTNOINTR: + case -ERESTARTNOHAND: + case -ERESTART_RESTARTBLOCK: + /* There's no easy way to restart the syscall since other AIO's + * may be already running. Just fail this IO with EINTR. + */ + ret = -EINTR; + fallthrough; + default: + ki->was_async = false; + cachefiles_write_complete(&ki->iocb, ret, 0); + if (ret > 0) + ret = 0; + break; + } + +in_progress: + cachefiles_put_kiocb(ki); + _leave(" = %zd", ret); + return ret; + +presubmission_error: + if (term_func) + term_func(term_func_priv, -ENOMEM, false); + return -ENOMEM; +} + +/* + * Prepare a read operation, shortening it to a cached/uncached + * boundary as appropriate. + */ +static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq, + loff_t i_size) +{ + struct fscache_retrieval *op = subreq->rreq->cache_resources.cache_priv; + struct cachefiles_object *object; + struct cachefiles_cache *cache; + const struct cred *saved_cred; + struct file *file = subreq->rreq->cache_resources.cache_priv2; + loff_t off, to; + + _enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size); + + object = container_of(op->op.object, + struct cachefiles_object, fscache); + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + if (!file) + goto cache_fail_nosec; + + if (subreq->start >= i_size) + return NETFS_FILL_WITH_ZEROES; + + cachefiles_begin_secure(cache, &saved_cred); + + off = vfs_llseek(file, subreq->start, SEEK_DATA); + if (off < 0 && off >= (loff_t)-MAX_ERRNO) { + if (off == (loff_t)-ENXIO) + goto download_and_store; + goto cache_fail; + } + + if (off >= subreq->start + subreq->len) + goto download_and_store; + + if (off > subreq->start) { + off = round_up(off, cache->bsize); + subreq->len = off - subreq->start; + goto download_and_store; + } + + to = vfs_llseek(file, subreq->start, SEEK_HOLE); + if (to < 0 && to >= (loff_t)-MAX_ERRNO) + goto cache_fail; + + if (to < subreq->start + subreq->len) { + if (subreq->start + subreq->len >= i_size) + to = round_up(to, cache->bsize); + else + to = round_down(to, cache->bsize); + subreq->len = to - subreq->start; + } + + cachefiles_end_secure(cache, saved_cred); + return NETFS_READ_FROM_CACHE; + +download_and_store: + if (cachefiles_has_space(cache, 0, (subreq->len + PAGE_SIZE - 1) / PAGE_SIZE) == 0) + __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); +cache_fail: + cachefiles_end_secure(cache, saved_cred); +cache_fail_nosec: + return NETFS_DOWNLOAD_FROM_SERVER; +} + +/* + * Clean up an operation. + */ +static void cachefiles_end_operation(struct netfs_cache_resources *cres) +{ + struct fscache_retrieval *op = cres->cache_priv; + struct file *file = cres->cache_priv2; + + _enter(""); + + if (file) + fput(file); + if (op) { + fscache_op_complete(&op->op, false); + fscache_put_retrieval(op); + } + + _leave(""); +} + +static const struct netfs_cache_ops cachefiles_netfs_cache_ops = { + .end_operation = cachefiles_end_operation, + .read = cachefiles_read, + .write = cachefiles_write, + .prepare_read = cachefiles_prepare_read, +}; + +/* + * Open the cache file when beginning a cache operation. + */ +int cachefiles_begin_read_operation(struct netfs_read_request *rreq, + struct fscache_retrieval *op) +{ + struct cachefiles_object *object; + struct cachefiles_cache *cache; + struct path path; + struct file *file; + + _enter(""); + + object = container_of(op->op.object, + struct cachefiles_object, fscache); + cache = container_of(object->fscache.cache, + struct cachefiles_cache, cache); + + path.mnt = cache->mnt; + path.dentry = object->backer; + file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT, + d_inode(object->backer), cache->cache_cred); + if (IS_ERR(file)) + return PTR_ERR(file); + if (!S_ISREG(file_inode(file)->i_mode)) + goto error_file; + if (unlikely(!file->f_op->read_iter) || + unlikely(!file->f_op->write_iter)) { + pr_notice("Cache does not support read_iter and write_iter\n"); + goto error_file; + } + + fscache_get_retrieval(op); + rreq->cache_resources.cache_priv = op; + rreq->cache_resources.cache_priv2 = file; + rreq->cache_resources.ops = &cachefiles_netfs_cache_ops; + rreq->cookie_debug_id = object->fscache.debug_id; + _leave(""); + return 0; + +error_file: + fput(file); + return -EIO; +} diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig index 5e796e6c38e5..427efa73b9bd 100644 --- a/fs/fscache/Kconfig +++ b/fs/fscache/Kconfig @@ -2,6 +2,7 @@ config FSCACHE tristate "General filesystem local caching manager" + select NETFS_SUPPORT help This option enables a generic filesystem caching manager that can be used by various network and other filesystems to cache data locally. diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile index 79e08e05ef84..3b2ffa93ac18 100644 --- a/fs/fscache/Makefile +++ b/fs/fscache/Makefile @@ -7,6 +7,7 @@ fscache-y := \ cache.o \ cookie.o \ fsdef.o \ + io.o \ main.o \ netfs.o \ object.o \ diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index 08e91efbce53..c483863b740a 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -142,6 +142,10 @@ extern int fscache_wait_for_operation_activation(struct fscache_object *, atomic_t *, atomic_t *); extern void fscache_invalidate_writes(struct fscache_cookie *); +struct fscache_retrieval *fscache_alloc_retrieval(struct fscache_cookie *cookie, + struct address_space *mapping, + fscache_rw_complete_t end_io_func, + void *context); /* * proc.c diff --git a/fs/fscache/io.c b/fs/fscache/io.c new file mode 100644 index 000000000000..8ecc1141802f --- /dev/null +++ b/fs/fscache/io.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Cache data I/O routines + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define FSCACHE_DEBUG_LEVEL PAGE +#include +#define FSCACHE_USE_NEW_IO_API +#include +#include +#include +#include "internal.h" + +/* + * Start a cache read operation. + * - we return: + * -ENOMEM - out of memory, some pages may be being read + * -ERESTARTSYS - interrupted, some pages may be being read + * -ENOBUFS - no backing object or space available in which to cache any + * pages not being read + * -ENODATA - no data available in the backing object for some or all of + * the pages + * 0 - dispatched a read on all pages + */ +int __fscache_begin_read_operation(struct netfs_read_request *rreq, + struct fscache_cookie *cookie) +{ + struct fscache_retrieval *op; + struct fscache_object *object; + bool wake_cookie = false; + int ret; + + _enter("rr=%08x", rreq->debug_id); + + fscache_stat(&fscache_n_retrievals); + + if (hlist_empty(&cookie->backing_objects)) + goto nobufs; + + if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) { + _leave(" = -ENOBUFS [invalidating]"); + return -ENOBUFS; + } + + ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); + + if (fscache_wait_for_deferred_lookup(cookie) < 0) + return -ERESTARTSYS; + + op = fscache_alloc_retrieval(cookie, NULL, NULL, NULL); + if (!op) + return -ENOMEM; + trace_fscache_page_op(cookie, NULL, &op->op, fscache_page_op_retr_multi); + + spin_lock(&cookie->lock); + + if (!fscache_cookie_enabled(cookie) || + hlist_empty(&cookie->backing_objects)) + goto nobufs_unlock; + object = hlist_entry(cookie->backing_objects.first, + struct fscache_object, cookie_link); + + __fscache_use_cookie(cookie); + atomic_inc(&object->n_reads); + __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); + + if (fscache_submit_op(object, &op->op) < 0) + goto nobufs_unlock_dec; + spin_unlock(&cookie->lock); + + fscache_stat(&fscache_n_retrieval_ops); + + /* we wait for the operation to become active, and then process it + * *here*, in this thread, and not in the thread pool */ + ret = fscache_wait_for_operation_activation( + object, &op->op, + __fscache_stat(&fscache_n_retrieval_op_waits), + __fscache_stat(&fscache_n_retrievals_object_dead)); + if (ret < 0) + goto error; + + /* ask the cache to honour the operation */ + ret = object->cache->ops->begin_read_operation(rreq, op); + +error: + if (ret == -ENOMEM) + fscache_stat(&fscache_n_retrievals_nomem); + else if (ret == -ERESTARTSYS) + fscache_stat(&fscache_n_retrievals_intr); + else if (ret == -ENODATA) + fscache_stat(&fscache_n_retrievals_nodata); + else if (ret < 0) + fscache_stat(&fscache_n_retrievals_nobufs); + else + fscache_stat(&fscache_n_retrievals_ok); + + fscache_put_retrieval(op); + _leave(" = %d", ret); + return ret; + +nobufs_unlock_dec: + atomic_dec(&object->n_reads); + wake_cookie = __fscache_unuse_cookie(cookie); +nobufs_unlock: + spin_unlock(&cookie->lock); + fscache_put_retrieval(op); + if (wake_cookie) + __fscache_wake_unused_cookie(cookie); +nobufs: + fscache_stat(&fscache_n_retrievals_nobufs); + _leave(" = -ENOBUFS"); + return -ENOBUFS; +} +EXPORT_SYMBOL(__fscache_begin_read_operation); diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 26af6fdf1538..991b0a871744 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -299,7 +299,7 @@ static void fscache_release_retrieval_op(struct fscache_operation *_op) /* * allocate a retrieval op */ -static struct fscache_retrieval *fscache_alloc_retrieval( +struct fscache_retrieval *fscache_alloc_retrieval( struct fscache_cookie *cookie, struct address_space *mapping, fscache_rw_complete_t end_io_func, diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c index a5aa93ece8c5..a7c3ed89a3e0 100644 --- a/fs/fscache/stats.c +++ b/fs/fscache/stats.c @@ -278,5 +278,6 @@ int fscache_stats_show(struct seq_file *m, void *v) atomic_read(&fscache_n_cache_stale_objects), atomic_read(&fscache_n_cache_retired_objects), atomic_read(&fscache_n_cache_culled_objects)); + netfs_stats_show(m); return 0; } diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 3f0b19dcfae7..3235ddbdcc09 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -304,6 +304,10 @@ struct fscache_cache_ops { /* dissociate a cache from all the pages it was backing */ void (*dissociate_pages)(struct fscache_cache *cache); + + /* Begin a read operation for the netfs lib */ + int (*begin_read_operation)(struct netfs_read_request *rreq, + struct fscache_retrieval *op); }; extern struct fscache_cookie fscache_fsdef_index; diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 1f8dc72369ee..abc1c4737fb8 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -37,6 +37,7 @@ struct pagevec; struct fscache_cache_tag; struct fscache_cookie; struct fscache_netfs; +struct netfs_read_request; typedef void (*fscache_rw_complete_t)(struct page *page, void *context, @@ -191,6 +192,10 @@ extern void __fscache_update_cookie(struct fscache_cookie *, const void *); extern int __fscache_attr_changed(struct fscache_cookie *); extern void __fscache_invalidate(struct fscache_cookie *); extern void __fscache_wait_on_invalidate(struct fscache_cookie *); + +#ifdef FSCACHE_USE_NEW_IO_API +extern int __fscache_begin_read_operation(struct netfs_read_request *, struct fscache_cookie *); +#else extern int __fscache_read_or_alloc_page(struct fscache_cookie *, struct page *, fscache_rw_complete_t, @@ -214,6 +219,8 @@ extern void __fscache_uncache_all_inode_pages(struct fscache_cookie *, struct inode *); extern void __fscache_readpages_cancel(struct fscache_cookie *cookie, struct list_head *pages); +#endif /* FSCACHE_USE_NEW_IO_API */ + extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool); extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t, bool (*)(void *), void *); @@ -498,6 +505,36 @@ int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size) return -ENOBUFS; } +#ifdef FSCACHE_USE_NEW_IO_API + +/** + * fscache_begin_read_operation - Begin a read operation for the netfs lib + * @rreq: The read request being undertaken + * @cookie: The cookie representing the cache object + * + * Begin a read operation on behalf of the netfs helper library. @rreq + * indicates the read request to which the operation state should be attached; + * @cookie indicates the cache object that will be accessed. + * + * This is intended to be called from the ->begin_cache_operation() netfs lib + * operation as implemented by the network filesystem. + * + * Returns: + * * 0 - Success + * * -ENOBUFS - No caching available + * * Other error code from the cache, such as -ENOMEM. + */ +static inline +int fscache_begin_read_operation(struct netfs_read_request *rreq, + struct fscache_cookie *cookie) +{ + if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie)) + return __fscache_begin_read_operation(rreq, cookie); + return -ENOBUFS; +} + +#else /* FSCACHE_USE_NEW_IO_API */ + /** * fscache_read_or_alloc_page - Read a page from the cache or allocate a block * in which to store it @@ -777,6 +814,8 @@ void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie, __fscache_uncache_all_inode_pages(cookie, inode); } +#endif /* FSCACHE_USE_NEW_IO_API */ + /** * fscache_disable_cookie - Disable a cookie * @cookie: The cookie representing the cache object From patchwork Wed Mar 10 16:57:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396823 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9F10C43331 for ; Wed, 10 Mar 2021 16:59:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C160164E59 for ; Wed, 10 Mar 2021 16:59:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233191AbhCJQ6i (ORCPT ); Wed, 10 Mar 2021 11:58:38 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:35486 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233341AbhCJQ6E (ORCPT ); Wed, 10 Mar 2021 11:58:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395483; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mFJ3IAsTRuTsvMfrtCEB3mNV0om6oV5MNb3aFFRCAxI=; b=E8hRSFFD5XXZlP6pcoIalurmhriaaGW92HSVzpy5xw8dgtuhYqrh4eofYsXA0pXKH1I42l LDjIEtxNqKhINjHI2bvwoR0ONCDSxq/1qnFlfek/aIdQ0T8xYO4gz9wyQOia5Ryj5ThY/A HXe+4QzN5o5kJ1shPj2/4dB35C08n2k= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-537-UozVF1Y4P8u9I1u4ve3Uxg-1; Wed, 10 Mar 2021 11:58:01 -0500 X-MC-Unique: UozVF1Y4P8u9I1u4ve3Uxg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A117A1005D4A; Wed, 10 Mar 2021 16:57:58 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7DE75E15C; Wed, 10 Mar 2021 16:57:52 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 15/28] afs: Disable use of the fscache I/O routines From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:51 +0000 Message-ID: <161539547167.286939.3536238932531122332.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Disable use of the fscache I/O routined by the AFS filesystem. It's about to transition to passing iov_iters down and fscache is about to have its I/O path to use iov_iter, so all that needs to change. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861209824.340223.1864211542341758994.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465768717.1376105.2229314852486665807.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588457929.3465195.1730097418904945578.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118143744.1232039.2727898205333669064.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161039077.2537118.7986870854927176905.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340403323.1303470.8159439948319423431.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/file.c | 199 ++++++++++---------------------------------------------- fs/afs/inode.c | 2 - fs/afs/write.c | 10 --- 3 files changed, 36 insertions(+), 175 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index 85f5adf21aa0..6d43713fde01 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -203,24 +203,6 @@ void afs_put_read(struct afs_read *req) } } -#ifdef CONFIG_AFS_FSCACHE -/* - * deal with notification that a page was read from the cache - */ -static void afs_file_readpage_read_complete(struct page *page, - void *data, - int error) -{ - _enter("%p,%p,%d", page, data, error); - - /* if the read completes with an error, we just unlock the page and let - * the VM reissue the readpage */ - if (!error) - SetPageUptodate(page); - unlock_page(page); -} -#endif - static void afs_fetch_data_success(struct afs_operation *op) { struct afs_vnode *vnode = op->file[0].vnode; @@ -288,89 +270,46 @@ int afs_page_filler(void *data, struct page *page) if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) goto error; - /* is it cached? */ -#ifdef CONFIG_AFS_FSCACHE - ret = fscache_read_or_alloc_page(vnode->cache, - page, - afs_file_readpage_read_complete, - NULL, - GFP_KERNEL); -#else - ret = -ENOBUFS; -#endif - switch (ret) { - /* read BIO submitted (page in cache) */ - case 0: - break; - - /* page not yet cached */ - case -ENODATA: - _debug("cache said ENODATA"); - goto go_on; - - /* page will not be cached */ - case -ENOBUFS: - _debug("cache said ENOBUFS"); - - fallthrough; - default: - go_on: - req = kzalloc(struct_size(req, array, 1), GFP_KERNEL); - if (!req) - goto enomem; - - /* We request a full page. If the page is a partial one at the - * end of the file, the server will return a short read and the - * unmarshalling code will clear the unfilled space. - */ - refcount_set(&req->usage, 1); - req->pos = (loff_t)page->index << PAGE_SHIFT; - req->len = PAGE_SIZE; - req->nr_pages = 1; - req->pages = req->array; - req->pages[0] = page; - get_page(page); - - /* read the contents of the file from the server into the - * page */ - ret = afs_fetch_data(vnode, key, req); - afs_put_read(req); - - if (ret < 0) { - if (ret == -ENOENT) { - _debug("got NOENT from server" - " - marking file deleted and stale"); - set_bit(AFS_VNODE_DELETED, &vnode->flags); - ret = -ESTALE; - } - -#ifdef CONFIG_AFS_FSCACHE - fscache_uncache_page(vnode->cache, page); -#endif - BUG_ON(PageFsCache(page)); - - if (ret == -EINTR || - ret == -ENOMEM || - ret == -ERESTARTSYS || - ret == -EAGAIN) - goto error; - goto io_error; - } + req = kzalloc(struct_size(req, array, 1), GFP_KERNEL); + if (!req) + goto enomem; - SetPageUptodate(page); + /* We request a full page. If the page is a partial one at the + * end of the file, the server will return a short read and the + * unmarshalling code will clear the unfilled space. + */ + refcount_set(&req->usage, 1); + req->pos = (loff_t)page->index << PAGE_SHIFT; + req->len = PAGE_SIZE; + req->nr_pages = 1; + req->pages = req->array; + req->pages[0] = page; + get_page(page); + + /* read the contents of the file from the server into the + * page */ + ret = afs_fetch_data(vnode, key, req); + afs_put_read(req); - /* send the page to the cache */ -#ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(page) && - fscache_write_page(vnode->cache, page, vnode->status.size, - GFP_KERNEL) != 0) { - fscache_uncache_page(vnode->cache, page); - BUG_ON(PageFsCache(page)); + if (ret < 0) { + if (ret == -ENOENT) { + _debug("got NOENT from server" + " - marking file deleted and stale"); + set_bit(AFS_VNODE_DELETED, &vnode->flags); + ret = -ESTALE; } -#endif - unlock_page(page); + + if (ret == -EINTR || + ret == -ENOMEM || + ret == -ERESTARTSYS || + ret == -EAGAIN) + goto error; + goto io_error; } + SetPageUptodate(page); + unlock_page(page); + _leave(" = 0"); return 0; @@ -416,23 +355,10 @@ static int afs_readpage(struct file *file, struct page *page) */ static void afs_readpages_page_done(struct afs_read *req) { -#ifdef CONFIG_AFS_FSCACHE - struct afs_vnode *vnode = req->vnode; -#endif struct page *page = req->pages[req->index]; req->pages[req->index] = NULL; SetPageUptodate(page); - - /* send the page to the cache */ -#ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(page) && - fscache_write_page(vnode->cache, page, vnode->status.size, - GFP_KERNEL) != 0) { - fscache_uncache_page(vnode->cache, page); - BUG_ON(PageFsCache(page)); - } -#endif unlock_page(page); put_page(page); } @@ -491,9 +417,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, index = page->index; if (add_to_page_cache_lru(page, mapping, index, readahead_gfp_mask(mapping))) { -#ifdef CONFIG_AFS_FSCACHE - fscache_uncache_page(vnode->cache, page); -#endif put_page(page); break; } @@ -526,9 +449,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, for (i = 0; i < req->nr_pages; i++) { page = req->pages[i]; if (page) { -#ifdef CONFIG_AFS_FSCACHE - fscache_uncache_page(vnode->cache, page); -#endif SetPageError(page); unlock_page(page); } @@ -560,37 +480,6 @@ static int afs_readpages(struct file *file, struct address_space *mapping, } /* attempt to read as many of the pages as possible */ -#ifdef CONFIG_AFS_FSCACHE - ret = fscache_read_or_alloc_pages(vnode->cache, - mapping, - pages, - &nr_pages, - afs_file_readpage_read_complete, - NULL, - mapping_gfp_mask(mapping)); -#else - ret = -ENOBUFS; -#endif - - switch (ret) { - /* all pages are being read from the cache */ - case 0: - BUG_ON(!list_empty(pages)); - BUG_ON(nr_pages != 0); - _leave(" = 0 [reading all]"); - return 0; - - /* there were pages that couldn't be read from the cache */ - case -ENODATA: - case -ENOBUFS: - break; - - /* other error */ - default: - _leave(" = %d", ret); - return ret; - } - while (!list_empty(pages)) { ret = afs_readpages_one(file, mapping, pages); if (ret < 0) @@ -670,17 +559,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset, BUG_ON(!PageLocked(page)); -#ifdef CONFIG_AFS_FSCACHE - /* we clean up only if the entire page is being invalidated */ - if (offset == 0 && length == PAGE_SIZE) { - if (PageFsCache(page)) { - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - fscache_wait_on_page_write(vnode->cache, page); - fscache_uncache_page(vnode->cache, page); - } - } -#endif - if (PagePrivate(page)) afs_invalidate_dirty(page, offset, length); @@ -702,13 +580,6 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags) /* deny if page is being written to the cache and the caller hasn't * elected to wait */ -#ifdef CONFIG_AFS_FSCACHE - if (!fscache_maybe_release_page(vnode->cache, page, gfp_flags)) { - _leave(" = F [cache busy]"); - return 0; - } -#endif - if (PagePrivate(page)) { priv = (unsigned long)detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("rel"), diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 1156b2df28d3..48519ee00eef 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -428,7 +428,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) } __packed key; struct afs_vnode_cache_aux aux; - if (vnode->status.type == AFS_FTYPE_DIR) { + if (vnode->status.type != AFS_FTYPE_FILE) { vnode->cache = NULL; return; } diff --git a/fs/afs/write.c b/fs/afs/write.c index c9195fc67fd8..92eaa88000d7 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -847,9 +847,6 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) /* Wait for the page to be written to the cache before we allow it to * be modified. We then assume the entire page will need writing back. */ -#ifdef CONFIG_AFS_FSCACHE - fscache_wait_on_page_write(vnode->cache, vmf->page); -#endif if (PageWriteback(vmf->page) && wait_on_page_bit_killable(vmf->page, PG_writeback) < 0) @@ -936,12 +933,5 @@ int afs_launder_page(struct page *page) priv = (unsigned long)detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page->index, priv); - -#ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(page)) { - fscache_wait_on_page_write(vnode->cache, page); - fscache_uncache_page(vnode->cache, page); - } -#endif return ret; } From patchwork Wed Mar 10 16:58:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396822 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2516DC43332 for ; Wed, 10 Mar 2021 16:59:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 048B964FD7 for ; Wed, 10 Mar 2021 16:59:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232569AbhCJQ6k (ORCPT ); Wed, 10 Mar 2021 11:58:40 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:32461 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233699AbhCJQ6R (ORCPT ); Wed, 10 Mar 2021 11:58:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395496; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LyCsQkGvQAru5Y/mzUxu73NjCs6Bha5hG3ntWhEheOE=; b=a7sDbvTSQ87MPqNKhf6s75/t6++dHtapOAHqlzn1PbPnbpUNWA2tZSy7XS/oxPI8Ti2xZe zd/pb2VOpeo4QzWIl9B8k8rRBjNyyGu+BIkGDQ/CRYZ3G5JKf76JF25aFOeq6y3O2Bym6A Am+uxB0BPZ+8id/LJtmxmo47Q3I0XyQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-254-cYOMH67wOCCcVd0pJ7QSBw-1; Wed, 10 Mar 2021 11:58:13 -0500 X-MC-Unique: cYOMH67wOCCcVd0pJ7QSBw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1098E83DD21; Wed, 10 Mar 2021 16:58:11 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id E5CCD694CD; Wed, 10 Mar 2021 16:58:04 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 16/28] afs: Pass page into dirty region helpers to provide THP size From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:58:03 +0000 Message-ID: <161539548385.286939.8864598314493255313.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Pass a pointer to the page being accessed into the dirty region helpers so that the size of the page can be determined in case it's a transparent huge page. This also required the page to be passed into the afs_page_dirty trace point - so there's no need to specifically pass in the index or private data as these can be retrieved directly from the page struct. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/file.c | 20 +++++++-------- fs/afs/internal.h | 16 ++++++------ fs/afs/write.c | 60 ++++++++++++++++++-------------------------- include/trace/events/afs.h | 23 ++++++++++------- 4 files changed, 55 insertions(+), 64 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index 6d43713fde01..21868bfc3a44 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -515,8 +515,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset, return; /* We may need to shorten the dirty region */ - f = afs_page_dirty_from(priv); - t = afs_page_dirty_to(priv); + f = afs_page_dirty_from(page, priv); + t = afs_page_dirty_to(page, priv); if (t <= offset || f >= end) return; /* Doesn't overlap */ @@ -534,17 +534,17 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset, if (f == t) goto undirty; - priv = afs_page_dirty(f, t); + priv = afs_page_dirty(page, f, t); set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page); return; undirty: - trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page); clear_page_dirty_for_io(page); full_invalidate: - priv = (unsigned long)detach_page_private(page); - trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, priv); + detach_page_private(page); + trace_afs_page_dirty(vnode, tracepoint_string("inval"), page); } /* @@ -572,7 +572,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset, static int afs_releasepage(struct page *page, gfp_t gfp_flags) { struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - unsigned long priv; _enter("{{%llx:%llu}[%lu],%lx},%x", vnode->fid.vid, vnode->fid.vnode, page->index, page->flags, @@ -581,9 +580,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags) /* deny if page is being written to the cache and the caller hasn't * elected to wait */ if (PagePrivate(page)) { - priv = (unsigned long)detach_page_private(page); - trace_afs_page_dirty(vnode, tracepoint_string("rel"), - page->index, priv); + detach_page_private(page); + trace_afs_page_dirty(vnode, tracepoint_string("rel"), page); } /* indicate that the page can be released */ diff --git a/fs/afs/internal.h b/fs/afs/internal.h index b626e38e9ab5..180eae8134da 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -875,31 +875,31 @@ struct afs_vnode_cache_aux { #define __AFS_PAGE_PRIV_MMAPPED 0x8000UL #endif -static inline unsigned int afs_page_dirty_resolution(void) +static inline unsigned int afs_page_dirty_resolution(struct page *page) { - int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1); + int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1); return (shift > 0) ? shift : 0; } -static inline size_t afs_page_dirty_from(unsigned long priv) +static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv) { unsigned long x = priv & __AFS_PAGE_PRIV_MASK; /* The lower bound is inclusive */ - return x << afs_page_dirty_resolution(); + return x << afs_page_dirty_resolution(page); } -static inline size_t afs_page_dirty_to(unsigned long priv) +static inline size_t afs_page_dirty_to(struct page *page, unsigned long priv) { unsigned long x = (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK; /* The upper bound is immediately beyond the region */ - return (x + 1) << afs_page_dirty_resolution(); + return (x + 1) << afs_page_dirty_resolution(page); } -static inline unsigned long afs_page_dirty(size_t from, size_t to) +static inline unsigned long afs_page_dirty(struct page *page, size_t from, size_t to) { - unsigned int res = afs_page_dirty_resolution(); + unsigned int res = afs_page_dirty_resolution(page); from >>= res; to = (to - 1) >> res; return (to << __AFS_PAGE_PRIV_SHIFT) | from; diff --git a/fs/afs/write.c b/fs/afs/write.c index 92eaa88000d7..9d0cef35ecba 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -112,15 +112,14 @@ int afs_write_begin(struct file *file, struct address_space *mapping, t = f = 0; if (PagePrivate(page)) { priv = page_private(page); - f = afs_page_dirty_from(priv); - t = afs_page_dirty_to(priv); + f = afs_page_dirty_from(page, priv); + t = afs_page_dirty_to(page, priv); ASSERTCMP(f, <=, t); } if (f != t) { if (PageWriteback(page)) { - trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), - page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page); goto flush_conflicting_write; } /* If the file is being filled locally, allow inter-write @@ -204,21 +203,19 @@ int afs_write_end(struct file *file, struct address_space *mapping, if (PagePrivate(page)) { priv = page_private(page); - f = afs_page_dirty_from(priv); - t = afs_page_dirty_to(priv); + f = afs_page_dirty_from(page, priv); + t = afs_page_dirty_to(page, priv); if (from < f) f = from; if (to > t) t = to; - priv = afs_page_dirty(f, t); + priv = afs_page_dirty(page, f, t); set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), - page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page); } else { - priv = afs_page_dirty(from, to); + priv = afs_page_dirty(page, from, to); attach_page_private(page, (void *)priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty"), - page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page); } set_page_dirty(page); @@ -321,7 +318,6 @@ static void afs_pages_written_back(struct afs_vnode *vnode, pgoff_t first, pgoff_t last) { struct pagevec pv; - unsigned long priv; unsigned count, loop; _enter("{%llx:%llu},{%lx-%lx}", @@ -340,9 +336,9 @@ static void afs_pages_written_back(struct afs_vnode *vnode, ASSERTCMP(pv.nr, ==, count); for (loop = 0; loop < count; loop++) { - priv = (unsigned long)detach_page_private(pv.pages[loop]); + detach_page_private(pv.pages[loop]); trace_afs_page_dirty(vnode, tracepoint_string("clear"), - pv.pages[loop]->index, priv); + pv.pages[loop]); end_page_writeback(pv.pages[loop]); } first += count; @@ -516,15 +512,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, */ start = primary_page->index; priv = page_private(primary_page); - offset = afs_page_dirty_from(priv); - to = afs_page_dirty_to(priv); - trace_afs_page_dirty(vnode, tracepoint_string("store"), - primary_page->index, priv); + offset = afs_page_dirty_from(primary_page, priv); + to = afs_page_dirty_to(primary_page, priv); + trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page); WARN_ON(offset == to); if (offset == to) - trace_afs_page_dirty(vnode, tracepoint_string("WARN"), - primary_page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page); if (start >= final_page || (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))) @@ -562,8 +556,8 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, } priv = page_private(page); - f = afs_page_dirty_from(priv); - t = afs_page_dirty_to(priv); + f = afs_page_dirty_from(page, priv); + t = afs_page_dirty_to(page, priv); if (f != 0 && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) { unlock_page(page); @@ -571,8 +565,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, } to = t; - trace_afs_page_dirty(vnode, tracepoint_string("store+"), - page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("store+"), page); if (!clear_page_dirty_for_io(page)) BUG(); @@ -861,14 +854,13 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) */ wait_on_page_writeback(vmf->page); - priv = afs_page_dirty(0, PAGE_SIZE); + priv = afs_page_dirty(vmf->page, 0, PAGE_SIZE); priv = afs_page_dirty_mmapped(priv); - trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), - vmf->page->index, priv); if (PagePrivate(vmf->page)) set_page_private(vmf->page, priv); else attach_page_private(vmf->page, (void *)priv); + trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), vmf->page); file_update_time(file); sb_end_pagefault(inode->i_sb); @@ -921,17 +913,15 @@ int afs_launder_page(struct page *page) f = 0; t = PAGE_SIZE; if (PagePrivate(page)) { - f = afs_page_dirty_from(priv); - t = afs_page_dirty_to(priv); + f = afs_page_dirty_from(page, priv); + t = afs_page_dirty_to(page, priv); } - trace_afs_page_dirty(vnode, tracepoint_string("launder"), - page->index, priv); + trace_afs_page_dirty(vnode, tracepoint_string("launder"), page); ret = afs_store_data(mapping, page->index, page->index, t, f, true); } - priv = (unsigned long)detach_page_private(page); - trace_afs_page_dirty(vnode, tracepoint_string("laundered"), - page->index, priv); + detach_page_private(page); + trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page); return ret; } diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h index 4a5cc8c64be3..9203cf6a8c53 100644 --- a/include/trace/events/afs.h +++ b/include/trace/events/afs.h @@ -969,30 +969,33 @@ TRACE_EVENT(afs_dir_check_failed, ); TRACE_EVENT(afs_page_dirty, - TP_PROTO(struct afs_vnode *vnode, const char *where, - pgoff_t page, unsigned long priv), + TP_PROTO(struct afs_vnode *vnode, const char *where, struct page *page), - TP_ARGS(vnode, where, page, priv), + TP_ARGS(vnode, where, page), TP_STRUCT__entry( __field(struct afs_vnode *, vnode ) __field(const char *, where ) __field(pgoff_t, page ) - __field(unsigned long, priv ) + __field(unsigned long, from ) + __field(unsigned long, to ) ), TP_fast_assign( __entry->vnode = vnode; __entry->where = where; - __entry->page = page; - __entry->priv = priv; + __entry->page = page->index; + __entry->from = afs_page_dirty_from(page, page->private); + __entry->to = afs_page_dirty_to(page, page->private); + __entry->to |= (afs_is_page_dirty_mmapped(page->private) ? + (1UL << (BITS_PER_LONG - 1)) : 0); ), - TP_printk("vn=%p %lx %s %zx-%zx%s", + TP_printk("vn=%p %lx %s %lx-%lx%s", __entry->vnode, __entry->page, __entry->where, - afs_page_dirty_from(__entry->priv), - afs_page_dirty_to(__entry->priv), - afs_is_page_dirty_mmapped(__entry->priv) ? " M" : "") + __entry->from, + __entry->to & ~(1UL << (BITS_PER_LONG - 1)), + __entry->to & (1UL << (BITS_PER_LONG - 1)) ? " M" : "") ); TRACE_EVENT(afs_call_state, From patchwork Wed Mar 10 16:58:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0953CC432C3 for ; Wed, 10 Mar 2021 16:59:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E5B2964FF9 for ; Wed, 10 Mar 2021 16:59:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233670AbhCJQ6m (ORCPT ); Wed, 10 Mar 2021 11:58:42 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:26371 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233403AbhCJQ61 (ORCPT ); Wed, 10 Mar 2021 11:58:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395506; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G+I2KUsaliMtYnZ9KwnlNFY8IkAMksOA42zhMc7qaDY=; b=Hi4K/ADCHnM0Nku6vgCqIsIyEPrFajQA2i5Qck6vSxan+2vmM2APM9BYcEv0XU9fMOAqYz er5XUxDOMGSG6BM0GKkNYLxr3DMkNES8pvnWIYAOx26LuBe2JD6+1QNTFRGM2XzF+/0jC9 vBiwfOJGszZhxB9sGFehvEk3eUHl7og= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-46-F2ZpZ4aaOu6N43_bnjHlsA-1; Wed, 10 Mar 2021 11:58:24 -0500 X-MC-Unique: F2ZpZ4aaOu6N43_bnjHlsA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F28271005D4E; Wed, 10 Mar 2021 16:58:22 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 27BEB5D6D7; Wed, 10 Mar 2021 16:58:16 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 17/28] afs: Print the operation debug_id when logging an unexpected data version From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:58:16 +0000 Message-ID: <161539549628.286939.15234870409714613954.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Print the afs_operation debug_id when logging an unexpected change in the data version. This allows the logged message to be matched against tracelines. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588528377.3465195.2206051235095182302.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118146111.1232039.11398082422487058312.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161042180.2537118.2471333561661033316.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340405772.1303470.3877167548944248214.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 48519ee00eef..af6556bb3d6a 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -215,11 +215,12 @@ static void afs_apply_status(struct afs_operation *op, if (vp->dv_before + vp->dv_delta != status->data_version) { if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) - pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s\n", + pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s (op=%x)\n", vnode->fid.vid, vnode->fid.vnode, (unsigned long long)vp->dv_before + vp->dv_delta, (unsigned long long)status->data_version, - op->type ? op->type->name : "???"); + op->type ? op->type->name : "???", + op->debug_id); vnode->invalid_before = status->data_version; if (vnode->status.type == AFS_FTYPE_DIR) { From patchwork Wed Mar 10 16:58:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397624 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA77BC433E9 for ; Wed, 10 Mar 2021 16:59:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AC57564FD9 for ; Wed, 10 Mar 2021 16:59:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233357AbhCJQ7I (ORCPT ); Wed, 10 Mar 2021 11:59:08 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:32987 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232435AbhCJQ6g (ORCPT ); Wed, 10 Mar 2021 11:58:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395515; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gPWltm5R5JyUJmPKgfMLd6XG8DedG8luCkEzn607pSE=; b=OLDk6iOVHOjnEqzLYsQxCx+kNS5FYuYv4uTgT35qKTj6Nm5mnweufWoTmF23dOHBUNYSab MRHbOMLQklbBFc+N+1bBEozIHcxMbjqjThzJ9qKeunFQaRT/XtTnRoSwRKhbEVKuChPbjn wxGKhx4guo08QlLPyFQQgjHcgYSYAbg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-60-RCwqnQW8NQOV-mp51v112g-1; Wed, 10 Mar 2021 11:58:33 -0500 X-MC-Unique: RCwqnQW8NQOV-mp51v112g-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 001701005D4F; Wed, 10 Mar 2021 16:58:31 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 014405B6BF; Wed, 10 Mar 2021 16:58:28 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 18/28] afs: Move key to afs_read struct From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:58:28 +0000 Message-ID: <161539550819.286939.1268332875889175195.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Stash the key used to authenticate read operations in the afs_read struct. This will be necessary to reissue the operation against the server if a read from the cache fails in upcoming cache changes. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861248336.340223.1851189950710196001.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465823899.1377938.11925978022348532049.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588529557.3465195.7303323479305254243.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118147693.1232039.13780672951838643842.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161043340.2537118.511899217704140722.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340406678.1303470.12676824086429446370.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/dir.c | 3 ++- fs/afs/file.c | 16 +++++++++------- fs/afs/internal.h | 3 ++- fs/afs/write.c | 12 ++++++------ 4 files changed, 19 insertions(+), 15 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 714fcca9af99..30c769efee26 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -242,6 +242,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) return ERR_PTR(-ENOMEM); refcount_set(&req->usage, 1); + req->key = key_get(key); req->nr_pages = nr_pages; req->actual_len = i_size; /* May change */ req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */ @@ -306,7 +307,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) { trace_afs_reload_dir(dvnode); - ret = afs_fetch_data(dvnode, key, req); + ret = afs_fetch_data(dvnode, req); if (ret < 0) goto error_unlock; diff --git a/fs/afs/file.c b/fs/afs/file.c index 21868bfc3a44..d23192b3b933 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -199,6 +199,7 @@ void afs_put_read(struct afs_read *req) if (req->pages != req->array) kfree(req->pages); } + key_put(req->key); kfree(req); } } @@ -229,7 +230,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = { /* * Fetch file data from the volume. */ -int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *req) +int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req) { struct afs_operation *op; @@ -238,9 +239,9 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *re vnode->fid.vid, vnode->fid.vnode, vnode->fid.unique, - key_serial(key)); + key_serial(req->key)); - op = afs_alloc_operation(key, vnode->volume); + op = afs_alloc_operation(req->key, vnode->volume); if (IS_ERR(op)) return PTR_ERR(op); @@ -279,6 +280,7 @@ int afs_page_filler(void *data, struct page *page) * unmarshalling code will clear the unfilled space. */ refcount_set(&req->usage, 1); + req->key = key_get(key); req->pos = (loff_t)page->index << PAGE_SHIFT; req->len = PAGE_SIZE; req->nr_pages = 1; @@ -288,7 +290,7 @@ int afs_page_filler(void *data, struct page *page) /* read the contents of the file from the server into the * page */ - ret = afs_fetch_data(vnode, key, req); + ret = afs_fetch_data(vnode, req); afs_put_read(req); if (ret < 0) { @@ -373,7 +375,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, struct afs_read *req; struct list_head *p; struct page *first, *page; - struct key *key = afs_file_key(file); pgoff_t index; int ret, n, i; @@ -397,6 +398,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, refcount_set(&req->usage, 1); req->vnode = vnode; + req->key = key_get(afs_file_key(file)); req->page_done = afs_readpages_page_done; req->pos = first->index; req->pos <<= PAGE_SHIFT; @@ -426,11 +428,11 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, } while (req->nr_pages < n); if (req->nr_pages == 0) { - kfree(req); + afs_put_read(req); return 0; } - ret = afs_fetch_data(vnode, key, req); + ret = afs_fetch_data(vnode, req); if (ret < 0) goto error; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 180eae8134da..921e7d3b2cfa 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -204,6 +204,7 @@ struct afs_read { loff_t actual_len; /* How much we're actually getting */ loff_t remain; /* Amount remaining */ loff_t file_size; /* File size returned by server */ + struct key *key; /* The key to use to reissue the read */ afs_dataversion_t data_version; /* Version number returned by server */ refcount_t usage; unsigned int index; /* Which page we're reading into */ @@ -1045,7 +1046,7 @@ extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *); extern void afs_put_wb_key(struct afs_wb_key *); extern int afs_open(struct inode *, struct file *); extern int afs_release(struct inode *, struct file *); -extern int afs_fetch_data(struct afs_vnode *, struct key *, struct afs_read *); +extern int afs_fetch_data(struct afs_vnode *, struct afs_read *); extern int afs_page_filler(void *, struct page *); extern void afs_put_read(struct afs_read *); diff --git a/fs/afs/write.c b/fs/afs/write.c index 9d0cef35ecba..7eba0d3201ba 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -25,9 +25,10 @@ int afs_set_page_dirty(struct page *page) /* * partly or wholly fill a page that's under preparation for writing */ -static int afs_fill_page(struct afs_vnode *vnode, struct key *key, +static int afs_fill_page(struct file *file, loff_t pos, unsigned int len, struct page *page) { + struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); struct afs_read *req; size_t p; void *data; @@ -49,6 +50,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key, return -ENOMEM; refcount_set(&req->usage, 1); + req->key = key_get(afs_file_key(file)); req->pos = pos; req->len = len; req->nr_pages = 1; @@ -56,7 +58,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key, req->pages[0] = page; get_page(page); - ret = afs_fetch_data(vnode, key, req); + ret = afs_fetch_data(vnode, req); afs_put_read(req); if (ret < 0) { if (ret == -ENOENT) { @@ -80,7 +82,6 @@ int afs_write_begin(struct file *file, struct address_space *mapping, { struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); struct page *page; - struct key *key = afs_file_key(file); unsigned long priv; unsigned f, from = pos & (PAGE_SIZE - 1); unsigned t, to = from + len; @@ -95,7 +96,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, return -ENOMEM; if (!PageUptodate(page) && len != PAGE_SIZE) { - ret = afs_fill_page(vnode, key, pos & PAGE_MASK, PAGE_SIZE, page); + ret = afs_fill_page(file, pos & PAGE_MASK, PAGE_SIZE, page); if (ret < 0) { unlock_page(page); put_page(page); @@ -163,7 +164,6 @@ int afs_write_end(struct file *file, struct address_space *mapping, struct page *page, void *fsdata) { struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct key *key = afs_file_key(file); unsigned long priv; unsigned int f, from = pos & (PAGE_SIZE - 1); unsigned int t, to = from + copied; @@ -193,7 +193,7 @@ int afs_write_end(struct file *file, struct address_space *mapping, * unmarshalling routine will take care of clearing any * bits that are beyond the EOF. */ - ret = afs_fill_page(vnode, key, pos + copied, + ret = afs_fill_page(file, pos + copied, len - copied, page); if (ret < 0) goto out; From patchwork Wed Mar 10 16:58:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F896C4321A for ; Wed, 10 Mar 2021 16:59:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3CC1B64FD5 for ; Wed, 10 Mar 2021 16:59:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233513AbhCJQ7J (ORCPT ); Wed, 10 Mar 2021 11:59:09 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:32442 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233728AbhCJQ6v (ORCPT ); Wed, 10 Mar 2021 11:58:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395530; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CQNUuBW+9nMhmEIAwv23wViHSyykZAhr8Meh+ZBeISc=; b=H0ldLEPZ18PkKZwTX/hik8Yw3a2lPqRlWiqHQz3FvxNx67z3TDMrLCMsld6f/GWXqsmeZr 5frTWWbGlIsCGmRhnOJBOeQj9hMQfd6w82Vt0NTMkKQ2YD6Ayb1xSG1LSUPzNNhvKtCZb6 fMxors2Uj++j+l/rt56M19qzL87g6U0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-195--bcf8gTHO2i8G-PBFB0ZNw-1; Wed, 10 Mar 2021 11:58:46 -0500 X-MC-Unique: -bcf8gTHO2i8G-PBFB0ZNw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 66D4183DD22; Wed, 10 Mar 2021 16:58:44 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 403C35D9DB; Wed, 10 Mar 2021 16:58:37 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 19/28] afs: Don't truncate iter during data fetch From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:58:37 +0000 Message-ID: <161539551721.286939.14655713136572200716.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Don't truncate the iterator to correspond to the actual data size when fetching the data from the server - rather, pass the length we want to read to rxrpc. This will allow the clear-after-read code in future to simply clear the remaining iterator capacity rather than having to reinitialise the iterator. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861249201.340223.13035445866976590375.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465825061.1377938.14403904452300909320.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588531418.3465195.10712005940763063144.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118148567.1232039.13380313332292947956.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161044610.2537118.17908520793806837792.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340407907.1303470.6501394859511712746.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/fsclient.c | 6 ++++-- fs/afs/internal.h | 6 ++++++ fs/afs/rxrpc.c | 13 +++++++++---- fs/afs/yfsclient.c | 6 ++++-- include/net/af_rxrpc.h | 2 +- net/rxrpc/recvmsg.c | 9 +++++---- 6 files changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 1d95ed9dd86e..4a57c6c6f12b 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -305,8 +305,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) unsigned int size; int ret; - _enter("{%u,%zu/%llu}", - call->unmarshall, iov_iter_count(call->iter), req->actual_len); + _enter("{%u,%zu,%zu/%llu}", + call->unmarshall, call->iov_len, iov_iter_count(call->iter), + req->actual_len); switch (call->unmarshall) { case 0: @@ -343,6 +344,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) size = PAGE_SIZE - req->offset; else size = req->remain; + call->iov_len = size; call->bvec[0].bv_len = size; call->bvec[0].bv_offset = req->offset; call->bvec[0].bv_page = req->pages[req->index]; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 921e7d3b2cfa..4725cfc4aaef 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -104,6 +104,7 @@ struct afs_call { struct afs_server *server; /* The fileserver record if fs op (pins ref) */ struct afs_vlserver *vlserver; /* The vlserver record if vl op */ void *request; /* request data (first part) */ + size_t iov_len; /* Size of *iter to be used */ struct iov_iter def_iter; /* Default buffer/data iterator */ struct iov_iter *iter; /* Iterator currently in use */ union { /* Convenience for ->def_iter */ @@ -1271,6 +1272,7 @@ static inline void afs_make_op_call(struct afs_operation *op, struct afs_call *c static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t size) { + call->iov_len = size; call->kvec[0].iov_base = buf; call->kvec[0].iov_len = size; iov_iter_kvec(&call->def_iter, READ, call->kvec, 1, size); @@ -1278,21 +1280,25 @@ static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t si static inline void afs_extract_to_tmp(struct afs_call *call) { + call->iov_len = sizeof(call->tmp); afs_extract_begin(call, &call->tmp, sizeof(call->tmp)); } static inline void afs_extract_to_tmp64(struct afs_call *call) { + call->iov_len = sizeof(call->tmp64); afs_extract_begin(call, &call->tmp64, sizeof(call->tmp64)); } static inline void afs_extract_discard(struct afs_call *call, size_t size) { + call->iov_len = size; iov_iter_discard(&call->def_iter, READ, size); } static inline void afs_extract_to_buf(struct afs_call *call, size_t size) { + call->iov_len = size; afs_extract_begin(call, call->buffer, size); } diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 8be709cb8542..0ec38b758f29 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -363,6 +363,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) struct rxrpc_call *rxcall; struct msghdr msg; struct kvec iov[1]; + size_t len; s64 tx_total_len; int ret; @@ -466,9 +467,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) rxrpc_kernel_abort_call(call->net->socket, rxcall, RX_USER_ABORT, ret, "KSD"); } else { + len = 0; iov_iter_kvec(&msg.msg_iter, READ, NULL, 0, 0); rxrpc_kernel_recv_data(call->net->socket, rxcall, - &msg.msg_iter, false, + &msg.msg_iter, &len, false, &call->abort_code, &call->service_id); ac->abort_code = call->abort_code; ac->responded = true; @@ -504,6 +506,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) static void afs_deliver_to_call(struct afs_call *call) { enum afs_call_state state; + size_t len; u32 abort_code, remote_abort = 0; int ret; @@ -516,10 +519,11 @@ static void afs_deliver_to_call(struct afs_call *call) state == AFS_CALL_SV_AWAIT_ACK ) { if (state == AFS_CALL_SV_AWAIT_ACK) { + len = 0; iov_iter_kvec(&call->def_iter, READ, NULL, 0, 0); ret = rxrpc_kernel_recv_data(call->net->socket, call->rxcall, &call->def_iter, - false, &remote_abort, + &len, false, &remote_abort, &call->service_id); trace_afs_receive_data(call, &call->def_iter, false, ret); @@ -929,10 +933,11 @@ int afs_extract_data(struct afs_call *call, bool want_more) u32 remote_abort = 0; int ret; - _enter("{%s,%zu},%d", call->type->name, iov_iter_count(iter), want_more); + _enter("{%s,%zu,%zu},%d", + call->type->name, call->iov_len, iov_iter_count(iter), want_more); ret = rxrpc_kernel_recv_data(net->socket, call->rxcall, iter, - want_more, &remote_abort, + &call->iov_len, want_more, &remote_abort, &call->service_id); if (ret == 0 || ret == -EAGAIN) return ret; diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index bd787e71a657..6c45d32da13c 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -363,8 +363,9 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) unsigned int size; int ret; - _enter("{%u,%zu/%llu}", - call->unmarshall, iov_iter_count(call->iter), req->actual_len); + _enter("{%u,%zu, %zu/%llu}", + call->unmarshall, call->iov_len, iov_iter_count(call->iter), + req->actual_len); switch (call->unmarshall) { case 0: @@ -396,6 +397,7 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) size = PAGE_SIZE - req->offset; else size = req->remain; + call->iov_len = size; call->bvec[0].bv_len = size; call->bvec[0].bv_offset = req->offset; call->bvec[0].bv_page = req->pages[req->index]; diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h index f6abcc0bbd6e..cee5f83c0f11 100644 --- a/include/net/af_rxrpc.h +++ b/include/net/af_rxrpc.h @@ -53,7 +53,7 @@ int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *, struct msghdr *, size_t, rxrpc_notify_end_tx_t); int rxrpc_kernel_recv_data(struct socket *, struct rxrpc_call *, - struct iov_iter *, bool, u32 *, u16 *); + struct iov_iter *, size_t *, bool, u32 *, u16 *); bool rxrpc_kernel_abort_call(struct socket *, struct rxrpc_call *, u32, int, const char *); void rxrpc_kernel_end_call(struct socket *, struct rxrpc_call *); diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index fef3573fdc8b..eca6dda26c77 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -669,6 +669,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, * @sock: The socket that the call exists on * @call: The call to send data through * @iter: The buffer to receive into + * @_len: The amount of data we want to receive (decreased on return) * @want_more: True if more data is expected to be read * @_abort: Where the abort code is stored if -ECONNABORTED is returned * @_service: Where to store the actual service ID (may be upgraded) @@ -684,7 +685,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, * *_abort should also be initialised to 0. */ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, - struct iov_iter *iter, + struct iov_iter *iter, size_t *_len, bool want_more, u32 *_abort, u16 *_service) { size_t offset = 0; @@ -692,7 +693,7 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, _enter("{%d,%s},%zu,%d", call->debug_id, rxrpc_call_states[call->state], - iov_iter_count(iter), want_more); + *_len, want_more); ASSERTCMP(call->state, !=, RXRPC_CALL_SERVER_SECURING); @@ -703,8 +704,8 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call, case RXRPC_CALL_SERVER_RECV_REQUEST: case RXRPC_CALL_SERVER_ACK_REQUEST: ret = rxrpc_recvmsg_data(sock, call, NULL, iter, - iov_iter_count(iter), 0, - &offset); + *_len, 0, &offset); + *_len -= offset; if (ret < 0) goto out; From patchwork Wed Mar 10 16:58:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396821 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 564DBC43333 for ; Wed, 10 Mar 2021 16:59:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2E4B964FFE for ; Wed, 10 Mar 2021 16:59:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233723AbhCJQ7L (ORCPT ); Wed, 10 Mar 2021 11:59:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:23967 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233767AbhCJQ7C (ORCPT ); Wed, 10 Mar 2021 11:59:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395541; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IgML5J+3iCW02sw2/JrZqpg8EYlST+NuVTNG9PnWres=; b=OV2cJVAKPnuCRvZu6sQIck0dcQujsckfX9n+pbAlrf8ka0xxn+kMipovbzBHj0KYp+Y2dN PEcseg7gpFUVTV+1hSvkz450uvNwPeEk6q4HOxFEIyjISPMdo/WnRs6lJt6OeqHhaPwBS2 VlhTFIe//GjZRbLtDuWGnkFwV4fjiZQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-78-bSljcV4KPuKoG8O7vz6E4A-1; Wed, 10 Mar 2021 11:58:58 -0500 X-MC-Unique: bSljcV4KPuKoG8O7vz6E4A-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9FFA31005D50; Wed, 10 Mar 2021 16:58:56 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id A22755B6A8; Wed, 10 Mar 2021 16:58:50 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 20/28] afs: Log remote unmarshalling errors From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:58:49 +0000 Message-ID: <161539552964.286939.16503232687974398308.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Log unmarshalling errors reported by the peer (ie. it can't parse what we sent it). Limit the maximum number of messages to 3. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/159465826250.1377938.16372395422217583913.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588532584.3465195.15618385466614028590.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118149739.1232039.208060911149801695.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161046033.2537118.7779717661044373273.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340409118.1303470.17812607349396199116.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/rxrpc.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 0ec38b758f29..ae68576f822f 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -500,6 +500,39 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) _leave(" = %d", ret); } +/* + * Log remote abort codes that indicate that we have a protocol disagreement + * with the server. + */ +static void afs_log_error(struct afs_call *call, s32 remote_abort) +{ + static int max = 0; + const char *msg; + int m; + + switch (remote_abort) { + case RX_EOF: msg = "unexpected EOF"; break; + case RXGEN_CC_MARSHAL: msg = "client marshalling"; break; + case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling"; break; + case RXGEN_SS_MARSHAL: msg = "server marshalling"; break; + case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling"; break; + case RXGEN_DECODE: msg = "opcode decode"; break; + case RXGEN_SS_XDRFREE: msg = "server XDR cleanup"; break; + case RXGEN_CC_XDRFREE: msg = "client XDR cleanup"; break; + case -32: msg = "insufficient data"; break; + default: + return; + } + + m = max; + if (m < 3) { + max = m + 1; + pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n", + msg, call->type->name, + &call->alist->addrs[call->addr_ix].transport); + } +} + /* * deliver messages to a call */ @@ -563,6 +596,7 @@ static void afs_deliver_to_call(struct afs_call *call) goto out; case -ECONNABORTED: ASSERTCMP(state, ==, AFS_CALL_COMPLETE); + afs_log_error(call, call->abort_code); goto done; case -ENOTSUPP: abort_code = RXGEN_OPCODE; From patchwork Wed Mar 10 16:59:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396820 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4A9AC433E6 for ; Wed, 10 Mar 2021 17:00:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7A1AA64FCA for ; Wed, 10 Mar 2021 17:00:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233325AbhCJQ7l (ORCPT ); Wed, 10 Mar 2021 11:59:41 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:30551 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233759AbhCJQ7P (ORCPT ); Wed, 10 Mar 2021 11:59:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395554; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jKGhlUEF79lTcDOpkH1+fZH/NlMvvSQiStkG5EHK4KI=; b=KQ9tomzHXpGVPzLlk+rPabtPBV91dzRHo72fggO1Hq7Xk1jDH6FKzthAehALokzZ5y1Dv5 COUqcXHaYRXI9YIlikRb3fL643yUnw9rZ+W+A8+PsmeWVAfI9yEAgIxkGl5sha786i3Txi MviT74as+zJsM4CCrLxoG1gGQ5PSA3g= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-62-kU9pJ-PsNkWyN2Vujzgx3w-1; Wed, 10 Mar 2021 11:59:13 -0500 X-MC-Unique: kU9pJ-PsNkWyN2Vujzgx3w-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 179F3108BD06; Wed, 10 Mar 2021 16:59:11 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id EAC7360CE9; Wed, 10 Mar 2021 16:59:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 21/28] afs: Set up the iov_iter before calling afs_extract_data() From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:59:01 +0000 Message-ID: <161539554187.286939.15305559004905459852.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC each time it is called to describe the remaining buffer to be filled. Instead: (1) Put an iterator in the afs_call struct. (2) Set the iterator for each marshalling stage to load data into the appropriate places. A number of convenience functions are provided to this end (eg. afs_extract_to_buf()). This iterator is then passed to afs_extract_data(). (3) Use the new ITER_MAPPING iterator when reading data to load directly into the inode's pages without needing to create a list of them. This will allow O_DIRECT calls to be supported in future patches. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/152898380012.11616.12094591785228251717.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/153685394431.14766.3178466345696987059.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/153999787395.866.11218209749223643998.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/154033911195.12041.3882700371848894587.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/158861250059.340223.1248231474865140653.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465827399.1377938.11181327349704960046.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588533776.3465195.3612752083351956948.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118151238.1232039.17015723405750601161.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161047240.2537118.14721975104810564022.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340410333.1303470.16260122230371140878.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/dir.c | 222 +++++++++++++++++++++++++++++++++++----------------- fs/afs/file.c | 190 ++++++++++++++++++++++++++------------------- fs/afs/fsclient.c | 54 +++---------- fs/afs/internal.h | 16 ++-- fs/afs/write.c | 27 ++++-- fs/afs/yfsclient.c | 54 +++---------- 6 files changed, 314 insertions(+), 249 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 30c769efee26..021489497801 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -103,6 +103,35 @@ struct afs_lookup_cookie { struct afs_fid fids[50]; }; +/* + * Drop the refs that we're holding on the pages we were reading into. We've + * got refs on the first nr_pages pages. + */ +static void afs_dir_read_cleanup(struct afs_read *req) +{ + struct address_space *mapping = req->vnode->vfs_inode.i_mapping; + struct page *page; + pgoff_t last = req->nr_pages - 1; + + XA_STATE(xas, &mapping->i_pages, 0); + + if (unlikely(!req->nr_pages)) + return; + + rcu_read_lock(); + xas_for_each(&xas, page, last) { + if (xas_retry(&xas, page)) + continue; + BUG_ON(xa_is_value(page)); + BUG_ON(PageCompound(page)); + ASSERTCMP(page->mapping, ==, mapping); + + put_page(page); + } + + rcu_read_unlock(); +} + /* * check that a directory page is valid */ @@ -128,7 +157,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, qty /= sizeof(union afs_xdr_dir_block); /* check them */ - dbuf = kmap(page); + dbuf = kmap_atomic(page); for (tmp = 0; tmp < qty; tmp++) { if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) { printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n", @@ -147,7 +176,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, ((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0; } - kunmap(page); + kunmap_atomic(dbuf); checked: afs_stat_v(dvnode, n_read_dir); @@ -158,35 +187,74 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, } /* - * Check the contents of a directory that we've just read. + * Dump the contents of a directory. */ -static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req) +static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req) { struct afs_xdr_dir_page *dbuf; - unsigned int i, j, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block); + struct address_space *mapping = dvnode->vfs_inode.i_mapping; + struct page *page; + unsigned int i, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block); + pgoff_t last = req->nr_pages - 1; - for (i = 0; i < req->nr_pages; i++) - if (!afs_dir_check_page(dvnode, req->pages[i], req->actual_len)) - goto bad; - return true; + XA_STATE(xas, &mapping->i_pages, 0); -bad: - pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx r=%llx\n", + pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx\n", dvnode->fid.vid, dvnode->fid.vnode, - req->file_size, req->len, req->actual_len, req->remain); - pr_warn("DIR %llx %x %x %x\n", - req->pos, req->index, req->nr_pages, req->offset); + req->file_size, req->len, req->actual_len); + pr_warn("DIR %llx %x %zx %zx\n", + req->pos, req->nr_pages, + req->iter->iov_offset, iov_iter_count(req->iter)); - for (i = 0; i < req->nr_pages; i++) { - dbuf = kmap(req->pages[i]); - for (j = 0; j < qty; j++) { - union afs_xdr_dir_block *block = &dbuf->blocks[j]; + xas_for_each(&xas, page, last) { + if (xas_retry(&xas, page)) + continue; + + BUG_ON(PageCompound(page)); + BUG_ON(page->mapping != mapping); + + dbuf = kmap_atomic(page); + for (i = 0; i < qty; i++) { + union afs_xdr_dir_block *block = &dbuf->blocks[i]; - pr_warn("[%02x] %32phN\n", i * qty + j, block); + pr_warn("[%02lx] %32phN\n", page->index * qty + i, block); } - kunmap(req->pages[i]); + kunmap_atomic(dbuf); } - return false; +} + +/* + * Check all the pages in a directory. All the pages are held pinned. + */ +static int afs_dir_check(struct afs_vnode *dvnode, struct afs_read *req) +{ + struct address_space *mapping = dvnode->vfs_inode.i_mapping; + struct page *page; + pgoff_t last = req->nr_pages - 1; + int ret = 0; + + XA_STATE(xas, &mapping->i_pages, 0); + + if (unlikely(!req->nr_pages)) + return 0; + + rcu_read_lock(); + xas_for_each(&xas, page, last) { + if (xas_retry(&xas, page)) + continue; + + BUG_ON(PageCompound(page)); + BUG_ON(page->mapping != mapping); + + if (!afs_dir_check_page(dvnode, page, req->file_size)) { + afs_dir_dump(dvnode, req); + ret = -EIO; + break; + } + } + + rcu_read_unlock(); + return ret; } /* @@ -215,58 +283,57 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) { struct afs_read *req; loff_t i_size; - int nr_pages, nr_inline, i, n; - int ret = -ENOMEM; + int nr_pages, i, n; + int ret; + + _enter(""); -retry: + req = kzalloc(sizeof(*req), GFP_KERNEL); + if (!req) + return ERR_PTR(-ENOMEM); + + refcount_set(&req->usage, 1); + req->vnode = dvnode; + req->key = key_get(key); + req->cleanup = afs_dir_read_cleanup; + +expand: i_size = i_size_read(&dvnode->vfs_inode); - if (i_size < 2048) - return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small)); + if (i_size < 2048) { + ret = afs_bad(dvnode, afs_file_error_dir_small); + goto error; + } if (i_size > 2048 * 1024) { trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big); - return ERR_PTR(-EFBIG); + ret = -EFBIG; + goto error; } _enter("%llu", i_size); - /* Get a request record to hold the page list. We want to hold it - * inline if we can, but we don't want to make an order 1 allocation. - */ nr_pages = (i_size + PAGE_SIZE - 1) / PAGE_SIZE; - nr_inline = nr_pages; - if (nr_inline > (PAGE_SIZE - sizeof(*req)) / sizeof(struct page *)) - nr_inline = 0; - req = kzalloc(struct_size(req, array, nr_inline), GFP_KERNEL); - if (!req) - return ERR_PTR(-ENOMEM); - - refcount_set(&req->usage, 1); - req->key = key_get(key); - req->nr_pages = nr_pages; req->actual_len = i_size; /* May change */ req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */ req->data_version = dvnode->status.data_version; /* May change */ - if (nr_inline > 0) { - req->pages = req->array; - } else { - req->pages = kcalloc(nr_pages, sizeof(struct page *), - GFP_KERNEL); - if (!req->pages) - goto error; - } + iov_iter_xarray(&req->def_iter, READ, &dvnode->vfs_inode.i_mapping->i_pages, + 0, i_size); + req->iter = &req->def_iter; - /* Get a list of all the pages that hold or will hold the directory - * content. We need to fill in any gaps that we might find where the - * memory reclaimer has been at work. If there are any gaps, we will + /* Fill in any gaps that we might find where the memory reclaimer has + * been at work and pin all the pages. If there are any gaps, we will * need to reread the entire directory contents. */ - i = 0; - do { + i = req->nr_pages; + while (i < nr_pages) { + struct page *pages[8], *page; + n = find_get_pages_contig(dvnode->vfs_inode.i_mapping, i, - req->nr_pages - i, - req->pages + i); - _debug("find %u at %u/%u", n, i, req->nr_pages); + min_t(unsigned int, nr_pages - i, + ARRAY_SIZE(pages)), + pages); + _debug("find %u at %u/%u", n, i, nr_pages); + if (n == 0) { gfp_t gfp = dvnode->vfs_inode.i_mapping->gfp_mask; @@ -274,22 +341,24 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) afs_stat_v(dvnode, n_inval); ret = -ENOMEM; - req->pages[i] = __page_cache_alloc(gfp); - if (!req->pages[i]) + page = __page_cache_alloc(gfp); + if (!page) goto error; - ret = add_to_page_cache_lru(req->pages[i], + ret = add_to_page_cache_lru(page, dvnode->vfs_inode.i_mapping, i, gfp); if (ret < 0) goto error; - attach_page_private(req->pages[i], (void *)1); - unlock_page(req->pages[i]); + attach_page_private(page, (void *)1); + unlock_page(page); + req->nr_pages++; i++; } else { + req->nr_pages += n; i += n; } - } while (i < req->nr_pages); + } /* If we're going to reload, we need to lock all the pages to prevent * races. @@ -313,12 +382,17 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) task_io_account_read(PAGE_SIZE * req->nr_pages); - if (req->len < req->file_size) - goto content_has_grown; + if (req->len < req->file_size) { + /* The content has grown, so we need to expand the + * buffer. + */ + up_write(&dvnode->validate_lock); + goto expand; + } /* Validate the data we just read. */ - ret = -EIO; - if (!afs_dir_check_pages(dvnode, req)) + ret = afs_dir_check(dvnode, req); + if (ret < 0) goto error_unlock; // TODO: Trim excess pages @@ -336,11 +410,6 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) afs_put_read(req); _leave(" = %d", ret); return ERR_PTR(ret); - -content_has_grown: - up_write(&dvnode->validate_lock); - afs_put_read(req); - goto retry; } /* @@ -450,6 +519,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx, struct afs_read *req; struct page *page; unsigned blkoff, limit; + void __rcu **slot; int ret; _enter("{%lu},%u,,", dir->i_ino, (unsigned)ctx->pos); @@ -474,9 +544,15 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx, blkoff = ctx->pos & ~(sizeof(union afs_xdr_dir_block) - 1); /* Fetch the appropriate page from the directory and re-add it - * to the LRU. + * to the LRU. We have all the pages pinned with an extra ref. */ - page = req->pages[blkoff / PAGE_SIZE]; + rcu_read_lock(); + page = NULL; + slot = radix_tree_lookup_slot(&dvnode->vfs_inode.i_mapping->i_pages, + blkoff / PAGE_SIZE); + if (slot) + page = radix_tree_deref_slot(slot); + rcu_read_unlock(); if (!page) { ret = afs_bad(dvnode, afs_file_error_dir_missing_page); break; diff --git a/fs/afs/file.c b/fs/afs/file.c index d23192b3b933..f1bab69e99d4 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -184,21 +184,72 @@ int afs_release(struct inode *inode, struct file *file) return ret; } +/* + * Handle completion of a read operation. + */ +static void afs_file_read_done(struct afs_read *req) +{ + struct afs_vnode *vnode = req->vnode; + struct page *page; + pgoff_t index = req->pos >> PAGE_SHIFT; + pgoff_t last = index + req->nr_pages - 1; + + XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index); + + if (iov_iter_count(req->iter) > 0) { + /* The read was short - clear the excess buffer. */ + _debug("afterclear %zx %zx %llx/%llx", + req->iter->iov_offset, + iov_iter_count(req->iter), + req->actual_len, req->len); + iov_iter_zero(iov_iter_count(req->iter), req->iter); + } + + rcu_read_lock(); + xas_for_each(&xas, page, last) { + page_endio(page, false, 0); + put_page(page); + } + rcu_read_unlock(); + + task_io_account_read(req->len); + req->cleanup = NULL; +} + +/* + * Dispose of our locks and refs on the pages if the read failed. + */ +static void afs_file_read_cleanup(struct afs_read *req) +{ + struct page *page; + pgoff_t index = req->pos >> PAGE_SHIFT; + pgoff_t last = index + req->nr_pages - 1; + + if (req->iter) { + XA_STATE(xas, &req->vnode->vfs_inode.i_mapping->i_pages, index); + + _enter("%lu,%u,%zu", index, req->nr_pages, iov_iter_count(req->iter)); + + rcu_read_lock(); + xas_for_each(&xas, page, last) { + BUG_ON(xa_is_value(page)); + BUG_ON(PageCompound(page)); + + page_endio(page, false, req->error); + put_page(page); + } + rcu_read_unlock(); + } +} + /* * Dispose of a ref to a read record. */ void afs_put_read(struct afs_read *req) { - int i; - if (refcount_dec_and_test(&req->usage)) { - if (req->pages) { - for (i = 0; i < req->nr_pages; i++) - if (req->pages[i]) - put_page(req->pages[i]); - if (req->pages != req->array) - kfree(req->pages); - } + if (req->cleanup) + req->cleanup(req); key_put(req->key); kfree(req); } @@ -216,6 +267,7 @@ static void afs_fetch_data_success(struct afs_operation *op) static void afs_fetch_data_put(struct afs_operation *op) { + op->fetch.req->error = op->error; afs_put_read(op->fetch.req); } @@ -255,12 +307,11 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req) /* * read page from file, directory or symlink, given a key to use */ -int afs_page_filler(void *data, struct page *page) +static int afs_page_filler(struct key *key, struct page *page) { struct inode *inode = page->mapping->host; struct afs_vnode *vnode = AFS_FS_I(inode); struct afs_read *req; - struct key *key = data; int ret; _enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index); @@ -271,53 +322,52 @@ int afs_page_filler(void *data, struct page *page) if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) goto error; - req = kzalloc(struct_size(req, array, 1), GFP_KERNEL); + req = kzalloc(sizeof(struct afs_read), GFP_KERNEL); if (!req) goto enomem; - /* We request a full page. If the page is a partial one at the - * end of the file, the server will return a short read and the - * unmarshalling code will clear the unfilled space. - */ refcount_set(&req->usage, 1); - req->key = key_get(key); - req->pos = (loff_t)page->index << PAGE_SHIFT; - req->len = PAGE_SIZE; - req->nr_pages = 1; - req->pages = req->array; - req->pages[0] = page; + req->vnode = vnode; + req->key = key_get(key); + req->pos = (loff_t)page->index << PAGE_SHIFT; + req->len = PAGE_SIZE; + req->nr_pages = 1; + req->done = afs_file_read_done; + req->cleanup = afs_file_read_cleanup; + get_page(page); + iov_iter_xarray(&req->def_iter, READ, &page->mapping->i_pages, + req->pos, req->len); + req->iter = &req->def_iter; - /* read the contents of the file from the server into the - * page */ ret = afs_fetch_data(vnode, req); - afs_put_read(req); - - if (ret < 0) { - if (ret == -ENOENT) { - _debug("got NOENT from server" - " - marking file deleted and stale"); - set_bit(AFS_VNODE_DELETED, &vnode->flags); - ret = -ESTALE; - } - - if (ret == -EINTR || - ret == -ENOMEM || - ret == -ERESTARTSYS || - ret == -EAGAIN) - goto error; - goto io_error; - } - - SetPageUptodate(page); - unlock_page(page); + if (ret < 0) + goto fetch_error; + afs_put_read(req); _leave(" = 0"); return 0; -io_error: - SetPageError(page); - goto error; +fetch_error: + switch (ret) { + case -EINTR: + case -ENOMEM: + case -ERESTARTSYS: + case -EAGAIN: + afs_put_read(req); + goto error; + case -ENOENT: + _debug("got NOENT from server - marking file deleted and stale"); + set_bit(AFS_VNODE_DELETED, &vnode->flags); + ret = -ESTALE; + /* Fall through */ + default: + page_endio(page, false, ret); + afs_put_read(req); + _leave(" = %d", ret); + return ret; + } + enomem: ret = -ENOMEM; error: @@ -352,19 +402,6 @@ static int afs_readpage(struct file *file, struct page *page) return ret; } -/* - * Make pages available as they're filled. - */ -static void afs_readpages_page_done(struct afs_read *req) -{ - struct page *page = req->pages[req->index]; - - req->pages[req->index] = NULL; - SetPageUptodate(page); - unlock_page(page); - put_page(page); -} - /* * Read a contiguous set of pages. */ @@ -376,7 +413,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, struct list_head *p; struct page *first, *page; pgoff_t index; - int ret, n, i; + int ret, n; /* Count the number of contiguous pages at the front of the list. Note * that the list goes prev-wards rather than next-wards. @@ -392,21 +429,20 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, n++; } - req = kzalloc(struct_size(req, array, n), GFP_NOFS); + req = kzalloc(sizeof(struct afs_read), GFP_NOFS); if (!req) return -ENOMEM; refcount_set(&req->usage, 1); req->vnode = vnode; req->key = key_get(afs_file_key(file)); - req->page_done = afs_readpages_page_done; + req->done = afs_file_read_done; + req->cleanup = afs_file_read_cleanup; req->pos = first->index; req->pos <<= PAGE_SHIFT; - req->pages = req->array; - /* Transfer the pages to the request. We add them in until one fails - * to add to the LRU and then we stop (as that'll make a hole in the - * contiguous run. + /* Add pages to the LRU until it fails. We keep the pages ref'd and + * locked until the read is complete. * * Note that it's possible for the file size to change whilst we're * doing this, but we rely on the server returning less than we asked @@ -423,8 +459,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, break; } - req->pages[req->nr_pages++] = page; - req->len += PAGE_SIZE; + req->nr_pages++; } while (req->nr_pages < n); if (req->nr_pages == 0) { @@ -432,30 +467,25 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping, return 0; } + req->len = req->nr_pages * PAGE_SIZE; + iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, + req->pos, req->len); + req->iter = &req->def_iter; + ret = afs_fetch_data(vnode, req); if (ret < 0) goto error; - task_io_account_read(PAGE_SIZE * req->nr_pages); afs_put_read(req); return 0; error: if (ret == -ENOENT) { - _debug("got NOENT from server" - " - marking file deleted and stale"); + _debug("got NOENT from server - marking file deleted and stale"); set_bit(AFS_VNODE_DELETED, &vnode->flags); ret = -ESTALE; } - for (i = 0; i < req->nr_pages; i++) { - page = req->pages[i]; - if (page) { - SetPageError(page); - unlock_page(page); - } - } - afs_put_read(req); return ret; } diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 4a57c6c6f12b..897b37301851 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -302,7 +302,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) struct afs_vnode_param *vp = &op->file[0]; struct afs_read *req = op->fetch.req; const __be32 *bp; - unsigned int size; int ret; _enter("{%u,%zu,%zu/%llu}", @@ -312,8 +311,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) switch (call->unmarshall) { case 0: req->actual_len = 0; - req->index = 0; - req->offset = req->pos & (PAGE_SIZE - 1); call->unmarshall++; if (call->operation_ID == FSFETCHDATA64) { afs_extract_to_tmp64(call); @@ -323,7 +320,10 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) } fallthrough; - /* extract the returned data length */ + /* Extract the returned data length into + * ->actual_len. This may indicate more or less data than was + * requested will be returned. + */ case 1: _debug("extract data length"); ret = afs_extract_data(call, true); @@ -332,45 +332,25 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) req->actual_len = be64_to_cpu(call->tmp64); _debug("DATA length: %llu", req->actual_len); - req->remain = min(req->len, req->actual_len); - if (req->remain == 0) + + if (req->actual_len == 0) goto no_more_data; + call->iter = req->iter; + call->iov_len = min(req->actual_len, req->len); call->unmarshall++; - - begin_page: - ASSERTCMP(req->index, <, req->nr_pages); - if (req->remain > PAGE_SIZE - req->offset) - size = PAGE_SIZE - req->offset; - else - size = req->remain; - call->iov_len = size; - call->bvec[0].bv_len = size; - call->bvec[0].bv_offset = req->offset; - call->bvec[0].bv_page = req->pages[req->index]; - iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size); - ASSERTCMP(size, <=, PAGE_SIZE); fallthrough; /* extract the returned data */ case 2: _debug("extract data %zu/%llu", - iov_iter_count(call->iter), req->remain); + iov_iter_count(call->iter), req->actual_len); ret = afs_extract_data(call, true); if (ret < 0) return ret; - req->remain -= call->bvec[0].bv_len; - req->offset += call->bvec[0].bv_len; - ASSERTCMP(req->offset, <=, PAGE_SIZE); - if (req->offset == PAGE_SIZE) { - req->offset = 0; - req->index++; - if (req->remain > 0) - goto begin_page; - } - ASSERTCMP(req->remain, ==, 0); + call->iter = &call->def_iter; if (req->actual_len <= req->len) goto no_more_data; @@ -412,16 +392,8 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) break; } - for (; req->index < req->nr_pages; req->index++) { - if (req->offset < PAGE_SIZE) - zero_user_segment(req->pages[req->index], - req->offset, PAGE_SIZE); - req->offset = 0; - } - - if (req->page_done) - for (req->index = 0; req->index < req->nr_pages; req->index++) - req->page_done(req); + if (req->done) + req->done(req); _leave(" = 0 [done]"); return 0; @@ -496,6 +468,8 @@ void afs_fs_fetch_data(struct afs_operation *op) if (!call) return afs_op_nomem(op); + req->call_debug_id = call->debug_id; + /* marshall the parameters */ bp = call->request; bp[0] = htonl(FSFETCHDATA); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 4725cfc4aaef..0ed1793949dd 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -31,6 +31,7 @@ struct pagevec; struct afs_call; +struct afs_vnode; /* * Partial file-locking emulation mode. (The problem being that AFS3 only @@ -203,18 +204,18 @@ struct afs_read { loff_t pos; /* Where to start reading */ loff_t len; /* How much we're asking for */ loff_t actual_len; /* How much we're actually getting */ - loff_t remain; /* Amount remaining */ loff_t file_size; /* File size returned by server */ struct key *key; /* The key to use to reissue the read */ + struct afs_vnode *vnode; /* The file being read into. */ afs_dataversion_t data_version; /* Version number returned by server */ refcount_t usage; - unsigned int index; /* Which page we're reading into */ + unsigned int call_debug_id; unsigned int nr_pages; - unsigned int offset; /* offset into current page */ - struct afs_vnode *vnode; - void (*page_done)(struct afs_read *); - struct page **pages; - struct page *array[]; + int error; + void (*done)(struct afs_read *); + void (*cleanup)(struct afs_read *); + struct iov_iter *iter; /* Iterator representing the buffer */ + struct iov_iter def_iter; /* Default iterator */ }; /* @@ -1048,7 +1049,6 @@ extern void afs_put_wb_key(struct afs_wb_key *); extern int afs_open(struct inode *, struct file *); extern int afs_release(struct inode *, struct file *); extern int afs_fetch_data(struct afs_vnode *, struct afs_read *); -extern int afs_page_filler(void *, struct page *); extern void afs_put_read(struct afs_read *); static inline struct afs_read *afs_get_read(struct afs_read *req) diff --git a/fs/afs/write.c b/fs/afs/write.c index 7eba0d3201ba..e78a9bc3b02d 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -22,6 +22,16 @@ int afs_set_page_dirty(struct page *page) return __set_page_dirty_nobuffers(page); } +/* + * Handle completion of a read operation to fill a page. + */ +static void afs_fill_hole(struct afs_read *req) +{ + if (iov_iter_count(req->iter) > 0) + /* The read was short - clear the excess buffer. */ + iov_iter_zero(iov_iter_count(req->iter), req->iter); +} + /* * partly or wholly fill a page that's under preparation for writing */ @@ -45,18 +55,19 @@ static int afs_fill_page(struct file *file, return 0; } - req = kzalloc(struct_size(req, array, 1), GFP_KERNEL); + req = kzalloc(sizeof(struct afs_read), GFP_KERNEL); if (!req) return -ENOMEM; refcount_set(&req->usage, 1); - req->key = key_get(afs_file_key(file)); - req->pos = pos; - req->len = len; - req->nr_pages = 1; - req->pages = req->array; - req->pages[0] = page; - get_page(page); + req->vnode = vnode; + req->done = afs_fill_hole; + req->key = key_get(afs_file_key(file)); + req->pos = pos; + req->len = len; + req->nr_pages = 1; + req->iter = &req->def_iter; + iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, pos, len); ret = afs_fetch_data(vnode, req); afs_put_read(req); diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 6c45d32da13c..abcec145db4b 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -360,7 +360,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) struct afs_vnode_param *vp = &op->file[0]; struct afs_read *req = op->fetch.req; const __be32 *bp; - unsigned int size; int ret; _enter("{%u,%zu, %zu/%llu}", @@ -370,13 +369,14 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) switch (call->unmarshall) { case 0: req->actual_len = 0; - req->index = 0; - req->offset = req->pos & (PAGE_SIZE - 1); afs_extract_to_tmp64(call); call->unmarshall++; fallthrough; - /* extract the returned data length */ + /* Extract the returned data length into ->actual_len. This + * may indicate more or less data than was requested will be + * returned. + */ case 1: _debug("extract data length"); ret = afs_extract_data(call, true); @@ -385,45 +385,25 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) req->actual_len = be64_to_cpu(call->tmp64); _debug("DATA length: %llu", req->actual_len); - req->remain = min(req->len, req->actual_len); - if (req->remain == 0) + + if (req->actual_len == 0) goto no_more_data; + call->iter = req->iter; + call->iov_len = min(req->actual_len, req->len); call->unmarshall++; - - begin_page: - ASSERTCMP(req->index, <, req->nr_pages); - if (req->remain > PAGE_SIZE - req->offset) - size = PAGE_SIZE - req->offset; - else - size = req->remain; - call->iov_len = size; - call->bvec[0].bv_len = size; - call->bvec[0].bv_offset = req->offset; - call->bvec[0].bv_page = req->pages[req->index]; - iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size); - ASSERTCMP(size, <=, PAGE_SIZE); fallthrough; /* extract the returned data */ case 2: _debug("extract data %zu/%llu", - iov_iter_count(call->iter), req->remain); + iov_iter_count(call->iter), req->actual_len); ret = afs_extract_data(call, true); if (ret < 0) return ret; - req->remain -= call->bvec[0].bv_len; - req->offset += call->bvec[0].bv_len; - ASSERTCMP(req->offset, <=, PAGE_SIZE); - if (req->offset == PAGE_SIZE) { - req->offset = 0; - req->index++; - if (req->remain > 0) - goto begin_page; - } - ASSERTCMP(req->remain, ==, 0); + call->iter = &call->def_iter; if (req->actual_len <= req->len) goto no_more_data; @@ -469,16 +449,8 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) break; } - for (; req->index < req->nr_pages; req->index++) { - if (req->offset < PAGE_SIZE) - zero_user_segment(req->pages[req->index], - req->offset, PAGE_SIZE); - req->offset = 0; - } - - if (req->page_done) - for (req->index = 0; req->index < req->nr_pages; req->index++) - req->page_done(req); + if (req->done) + req->done(req); _leave(" = 0 [done]"); return 0; @@ -518,6 +490,8 @@ void yfs_fs_fetch_data(struct afs_operation *op) if (!call) return afs_op_nomem(op); + req->call_debug_id = call->debug_id; + /* marshall the parameters */ bp = call->request; bp = xdr_encode_u32(bp, YFSFETCHDATA64); From patchwork Wed Mar 10 16:59:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397622 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE71FC43381 for ; Wed, 10 Mar 2021 17:00:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AEE5364FC8 for ; Wed, 10 Mar 2021 17:00:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233516AbhCJQ7m (ORCPT ); Wed, 10 Mar 2021 11:59:42 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:58978 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233777AbhCJQ7a (ORCPT ); Wed, 10 Mar 2021 11:59:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395569; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=23/uau4IKv0ZIDU3TV8ZbfKGSR3Dzt3o5ODM+F77Hxs=; b=J7FR8eO1gY5rtmbM/g7PI1SVRQLrwpKTCVmKCFekAwdFC+wJePsnE2HsVwaK1khEEWnNDp XRKbBbDaKldMbrtrQDr1X38qrOU2cc4oYShUCqPuEg6qG4Fv1fn1JgtmepnZjF01PzGBsp J6JSXcm28zmKpFy3ioHEu1qFAR/9E5A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-263-CgcrjcsTN3KP4V5ZM2sWXg-1; Wed, 10 Mar 2021 11:59:25 -0500 X-MC-Unique: CgcrjcsTN3KP4V5ZM2sWXg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A53A0802B62; Wed, 10 Mar 2021 16:59:23 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5BA745B6A8; Wed, 10 Mar 2021 16:59:16 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 22/28] afs: Use ITER_XARRAY for writing From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:59:16 +0000 Message-ID: <161539555629.286939.5241869986617154517.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use a single ITER_XARRAY iterator to describe the portion of a file to be transmitted to the server rather than generating a series of small ITER_BVEC iterators on the fly. This will make it easier to implement AIO in afs. In theory we could maybe use one giant ITER_BVEC, but that means potentially allocating a huge array of bio_vec structs (max 256 per page) when in fact the pagecache already has a structure listing all the relevant pages (radix_tree/xarray) that can be walked over. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/153685395197.14766.16289516750731233933.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/158861251312.340223.17924900795425422532.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465828607.1377938.6903132788463419368.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588535018.3465195.14509994354240338307.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118152415.1232039.6452879415814850025.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161048194.2537118.13763612220937637316.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340411602.1303470.4661108879482218408.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/fsclient.c | 50 +++++++++------------ fs/afs/internal.h | 15 +++--- fs/afs/rxrpc.c | 103 ++++++-------------------------------------- fs/afs/write.c | 100 ++++++++++++++++++++++++------------------- fs/afs/yfsclient.c | 25 +++-------- include/trace/events/afs.h | 51 ++++++++-------------- 6 files changed, 126 insertions(+), 218 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 897b37301851..31e6b3635541 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -1055,8 +1055,7 @@ static const struct afs_call_type afs_RXFSStoreData64 = { /* * store a set of pages to a very large file */ -static void afs_fs_store_data64(struct afs_operation *op, - loff_t pos, loff_t size, loff_t i_size) +static void afs_fs_store_data64(struct afs_operation *op) { struct afs_vnode_param *vp = &op->file[0]; struct afs_call *call; @@ -1071,7 +1070,7 @@ static void afs_fs_store_data64(struct afs_operation *op, if (!call) return afs_op_nomem(op); - call->send_pages = true; + call->write_iter = op->store.write_iter; /* marshall the parameters */ bp = call->request; @@ -1087,47 +1086,38 @@ static void afs_fs_store_data64(struct afs_operation *op, *bp++ = 0; /* unix mode */ *bp++ = 0; /* segment size */ - *bp++ = htonl(upper_32_bits(pos)); - *bp++ = htonl(lower_32_bits(pos)); - *bp++ = htonl(upper_32_bits(size)); - *bp++ = htonl(lower_32_bits(size)); - *bp++ = htonl(upper_32_bits(i_size)); - *bp++ = htonl(lower_32_bits(i_size)); + *bp++ = htonl(upper_32_bits(op->store.pos)); + *bp++ = htonl(lower_32_bits(op->store.pos)); + *bp++ = htonl(upper_32_bits(op->store.size)); + *bp++ = htonl(lower_32_bits(op->store.size)); + *bp++ = htonl(upper_32_bits(op->store.i_size)); + *bp++ = htonl(lower_32_bits(op->store.i_size)); trace_afs_make_fs_call(call, &vp->fid); afs_make_op_call(op, call, GFP_NOFS); } /* - * store a set of pages + * Write data to a file on the server. */ void afs_fs_store_data(struct afs_operation *op) { struct afs_vnode_param *vp = &op->file[0]; struct afs_call *call; - loff_t size, pos, i_size; __be32 *bp; _enter(",%x,{%llx:%llu},,", key_serial(op->key), vp->fid.vid, vp->fid.vnode); - size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset; - if (op->store.first != op->store.last) - size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT; - pos = (loff_t)op->store.first << PAGE_SHIFT; - pos += op->store.first_offset; - - i_size = i_size_read(&vp->vnode->vfs_inode); - if (pos + size > i_size) - i_size = size + pos; - _debug("size %llx, at %llx, i_size %llx", - (unsigned long long) size, (unsigned long long) pos, - (unsigned long long) i_size); + (unsigned long long)op->store.size, + (unsigned long long)op->store.pos, + (unsigned long long)op->store.i_size); - if (upper_32_bits(pos) || upper_32_bits(i_size) || upper_32_bits(size) || - upper_32_bits(pos + size)) - return afs_fs_store_data64(op, pos, size, i_size); + if (upper_32_bits(op->store.pos) || + upper_32_bits(op->store.size) || + upper_32_bits(op->store.i_size)) + return afs_fs_store_data64(op); call = afs_alloc_flat_call(op->net, &afs_RXFSStoreData, (4 + 6 + 3) * 4, @@ -1135,7 +1125,7 @@ void afs_fs_store_data(struct afs_operation *op) if (!call) return afs_op_nomem(op); - call->send_pages = true; + call->write_iter = op->store.write_iter; /* marshall the parameters */ bp = call->request; @@ -1151,9 +1141,9 @@ void afs_fs_store_data(struct afs_operation *op) *bp++ = 0; /* unix mode */ *bp++ = 0; /* segment size */ - *bp++ = htonl(lower_32_bits(pos)); - *bp++ = htonl(lower_32_bits(size)); - *bp++ = htonl(lower_32_bits(i_size)); + *bp++ = htonl(lower_32_bits(op->store.pos)); + *bp++ = htonl(lower_32_bits(op->store.size)); + *bp++ = htonl(lower_32_bits(op->store.i_size)); trace_afs_make_fs_call(call, &vp->fid); afs_make_op_call(op, call, GFP_NOFS); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 0ed1793949dd..4076c6ba43eb 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -107,6 +107,7 @@ struct afs_call { void *request; /* request data (first part) */ size_t iov_len; /* Size of *iter to be used */ struct iov_iter def_iter; /* Default buffer/data iterator */ + struct iov_iter *write_iter; /* Iterator defining write to be made */ struct iov_iter *iter; /* Iterator currently in use */ union { /* Convenience for ->def_iter */ struct kvec kvec[1]; @@ -133,7 +134,6 @@ struct afs_call { unsigned char unmarshall; /* unmarshalling phase */ unsigned char addr_ix; /* Address in ->alist */ bool drop_ref; /* T if need to drop ref for incoming call */ - bool send_pages; /* T if data from mapping should be sent */ bool need_attention; /* T if RxRPC poked us */ bool async; /* T if asynchronous */ bool upgrade; /* T to request service upgrade */ @@ -811,12 +811,13 @@ struct afs_operation { afs_lock_type_t type; } lock; struct { - struct address_space *mapping; /* Pages being written from */ - pgoff_t first; /* first page in mapping to deal with */ - pgoff_t last; /* last page in mapping to deal with */ - unsigned first_offset; /* offset into mapping[first] */ - unsigned last_to; /* amount of mapping[last] */ - bool laundering; /* Laundering page, PG_writeback not set */ + struct iov_iter *write_iter; + loff_t pos; + loff_t size; + loff_t i_size; + pgoff_t first; /* first page in mapping to deal with */ + pgoff_t last; /* last page in mapping to deal with */ + bool laundering; /* Laundering page, PG_writeback not set */ } store; struct { struct iattr *attr; diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index ae68576f822f..23a1a92d64bb 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -271,40 +271,6 @@ void afs_flat_call_destructor(struct afs_call *call) call->buffer = NULL; } -#define AFS_BVEC_MAX 8 - -/* - * Load the given bvec with the next few pages. - */ -static void afs_load_bvec(struct afs_call *call, struct msghdr *msg, - struct bio_vec *bv, pgoff_t first, pgoff_t last, - unsigned offset) -{ - struct afs_operation *op = call->op; - struct page *pages[AFS_BVEC_MAX]; - unsigned int nr, n, i, to, bytes = 0; - - nr = min_t(pgoff_t, last - first + 1, AFS_BVEC_MAX); - n = find_get_pages_contig(op->store.mapping, first, nr, pages); - ASSERTCMP(n, ==, nr); - - msg->msg_flags |= MSG_MORE; - for (i = 0; i < nr; i++) { - to = PAGE_SIZE; - if (first + i >= last) { - to = op->store.last_to; - msg->msg_flags &= ~MSG_MORE; - } - bv[i].bv_page = pages[i]; - bv[i].bv_len = to - offset; - bv[i].bv_offset = offset; - bytes += to - offset; - offset = 0; - } - - iov_iter_bvec(&msg->msg_iter, WRITE, bv, nr, bytes); -} - /* * Advance the AFS call state when the RxRPC call ends the transmit phase. */ @@ -317,42 +283,6 @@ static void afs_notify_end_request_tx(struct sock *sock, afs_set_call_state(call, AFS_CALL_CL_REQUESTING, AFS_CALL_CL_AWAIT_REPLY); } -/* - * attach the data from a bunch of pages on an inode to a call - */ -static int afs_send_pages(struct afs_call *call, struct msghdr *msg) -{ - struct afs_operation *op = call->op; - struct bio_vec bv[AFS_BVEC_MAX]; - unsigned int bytes, nr, loop, offset; - pgoff_t first = op->store.first, last = op->store.last; - int ret; - - offset = op->store.first_offset; - op->store.first_offset = 0; - - do { - afs_load_bvec(call, msg, bv, first, last, offset); - trace_afs_send_pages(call, msg, first, last, offset); - - offset = 0; - bytes = msg->msg_iter.count; - nr = msg->msg_iter.nr_segs; - - ret = rxrpc_kernel_send_data(op->net->socket, call->rxcall, msg, - bytes, afs_notify_end_request_tx); - for (loop = 0; loop < nr; loop++) - put_page(bv[loop].bv_page); - if (ret < 0) - break; - - first += nr; - } while (first <= last); - - trace_afs_sent_pages(call, op->store.first, last, first, ret); - return ret; -} - /* * Initiate a call and synchronously queue up the parameters for dispatch. Any * error is stored into the call struct, which the caller must check for. @@ -384,21 +314,8 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) * after the initial fixed part. */ tx_total_len = call->request_size; - if (call->send_pages) { - struct afs_operation *op = call->op; - - if (op->store.last == op->store.first) { - tx_total_len += op->store.last_to - op->store.first_offset; - } else { - /* It looks mathematically like you should be able to - * combine the following lines with the ones above, but - * unsigned arithmetic is fun when it wraps... - */ - tx_total_len += PAGE_SIZE - op->store.first_offset; - tx_total_len += op->store.last_to; - tx_total_len += (op->store.last - op->store.first - 1) * PAGE_SIZE; - } - } + if (call->write_iter) + tx_total_len += iov_iter_count(call->write_iter); /* If the call is going to be asynchronous, we need an extra ref for * the call to hold itself so the caller need not hang on to its ref. @@ -440,7 +357,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, call->request_size); msg.msg_control = NULL; msg.msg_controllen = 0; - msg.msg_flags = MSG_WAITALL | (call->send_pages ? MSG_MORE : 0); + msg.msg_flags = MSG_WAITALL | (call->write_iter ? MSG_MORE : 0); ret = rxrpc_kernel_send_data(call->net->socket, rxcall, &msg, call->request_size, @@ -448,8 +365,18 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp) if (ret < 0) goto error_do_abort; - if (call->send_pages) { - ret = afs_send_pages(call, &msg); + if (call->write_iter) { + msg.msg_iter = *call->write_iter; + msg.msg_flags &= ~MSG_MORE; + trace_afs_send_data(call, &msg); + + ret = rxrpc_kernel_send_data(call->net->socket, + call->rxcall, &msg, + iov_iter_count(&msg.msg_iter), + afs_notify_end_request_tx); + *call->write_iter = msg.msg_iter; + + trace_afs_sent_data(call, &msg, ret); if (ret < 0) goto error_do_abort; } diff --git a/fs/afs/write.c b/fs/afs/write.c index e78a9bc3b02d..dd4dc1c868b5 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -325,36 +325,27 @@ static void afs_redirty_pages(struct writeback_control *wbc, /* * completion of write to server */ -static void afs_pages_written_back(struct afs_vnode *vnode, - pgoff_t first, pgoff_t last) +static void afs_pages_written_back(struct afs_vnode *vnode, pgoff_t start, pgoff_t last) { - struct pagevec pv; - unsigned count, loop; + struct address_space *mapping = vnode->vfs_inode.i_mapping; + struct page *page; + + XA_STATE(xas, &mapping->i_pages, start); _enter("{%llx:%llu},{%lx-%lx}", - vnode->fid.vid, vnode->fid.vnode, first, last); + vnode->fid.vid, vnode->fid.vnode, start, last); - pagevec_init(&pv); + rcu_read_lock(); - do { - _debug("done %lx-%lx", first, last); + xas_for_each(&xas, page, last) { + ASSERT(PageWriteback(page)); - count = last - first + 1; - if (count > PAGEVEC_SIZE) - count = PAGEVEC_SIZE; - pv.nr = find_get_pages_contig(vnode->vfs_inode.i_mapping, - first, count, pv.pages); - ASSERTCMP(pv.nr, ==, count); + detach_page_private(page); + trace_afs_page_dirty(vnode, tracepoint_string("clear"), page); + page_endio(page, true, 0); + } - for (loop = 0; loop < count; loop++) { - detach_page_private(pv.pages[loop]); - trace_afs_page_dirty(vnode, tracepoint_string("clear"), - pv.pages[loop]); - end_page_writeback(pv.pages[loop]); - } - first += count; - __pagevec_release(&pv); - } while (first <= last); + rcu_read_unlock(); afs_prune_wb_keys(vnode); _leave(""); @@ -411,9 +402,7 @@ static void afs_store_data_success(struct afs_operation *op) if (!op->store.laundering) afs_pages_written_back(vnode, op->store.first, op->store.last); afs_stat_v(vnode, n_stores); - atomic_long_add((op->store.last * PAGE_SIZE + op->store.last_to) - - (op->store.first * PAGE_SIZE + op->store.first_offset), - &afs_v2net(vnode)->n_store_bytes); + atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes); } } @@ -426,21 +415,21 @@ static const struct afs_operation_ops afs_store_data_operation = { /* * write to a file */ -static int afs_store_data(struct address_space *mapping, - pgoff_t first, pgoff_t last, - unsigned offset, unsigned to, bool laundering) +static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, + loff_t pos, pgoff_t first, pgoff_t last, + bool laundering) { - struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct afs_operation *op; struct afs_wb_key *wbk = NULL; - int ret; + loff_t size = iov_iter_count(iter), i_size; + int ret = -ENOKEY; - _enter("%s{%llx:%llu.%u},%lx,%lx,%x,%x", + _enter("%s{%llx:%llu.%u},%llx,%llx", vnode->volume->name, vnode->fid.vid, vnode->fid.vnode, vnode->fid.unique, - first, last, offset, to); + size, pos); ret = afs_get_writeback_key(vnode, &wbk); if (ret) { @@ -454,13 +443,16 @@ static int afs_store_data(struct address_space *mapping, return -ENOMEM; } + i_size = i_size_read(&vnode->vfs_inode); + afs_op_set_vnode(op, 0, vnode); op->file[0].dv_delta = 1; - op->store.mapping = mapping; + op->store.write_iter = iter; + op->store.pos = pos; op->store.first = first; op->store.last = last; - op->store.first_offset = offset; - op->store.last_to = to; + op->store.size = size; + op->store.i_size = max(pos + size, i_size); op->store.laundering = laundering; op->mtime = vnode->vfs_inode.i_mtime; op->flags |= AFS_OPERATION_UNINTR; @@ -503,11 +495,12 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, pgoff_t final_page) { struct afs_vnode *vnode = AFS_FS_I(mapping->host); + struct iov_iter iter; struct page *pages[8], *page; unsigned long count, priv; unsigned n, offset, to, f, t; pgoff_t start, first, last; - loff_t i_size, end; + loff_t i_size, pos, end; int loop, ret; _enter(",%lx", primary_page->index); @@ -604,15 +597,28 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, first = primary_page->index; last = first + count - 1; + _debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to); - end = (loff_t)last * PAGE_SIZE + to; - i_size = i_size_read(&vnode->vfs_inode); + pos = first; + pos <<= PAGE_SHIFT; + pos += offset; + end = last; + end <<= PAGE_SHIFT; + end += to; - _debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to); + /* Trim the actual write down to the EOF */ + i_size = i_size_read(&vnode->vfs_inode); if (end > i_size) - to = i_size & ~PAGE_MASK; + end = i_size; + + if (pos < i_size) { + iov_iter_xarray(&iter, WRITE, &mapping->i_pages, pos, end - pos); + ret = afs_store_data(vnode, &iter, pos, first, last, false); + } else { + /* The dirty region was entirely beyond the EOF. */ + ret = 0; + } - ret = afs_store_data(mapping, first, last, offset, to, false); switch (ret) { case 0: ret = count; @@ -913,6 +919,8 @@ int afs_launder_page(struct page *page) { struct address_space *mapping = page->mapping; struct afs_vnode *vnode = AFS_FS_I(mapping->host); + struct iov_iter iter; + struct bio_vec bv[1]; unsigned long priv; unsigned int f, t; int ret = 0; @@ -928,8 +936,14 @@ int afs_launder_page(struct page *page) t = afs_page_dirty_to(page, priv); } + bv[0].bv_page = page; + bv[0].bv_offset = f; + bv[0].bv_len = t - f; + iov_iter_bvec(&iter, WRITE, bv, 1, bv[0].bv_len); + trace_afs_page_dirty(vnode, tracepoint_string("launder"), page); - ret = afs_store_data(mapping, page->index, page->index, t, f, true); + ret = afs_store_data(vnode, &iter, (loff_t)page->index << PAGE_SHIFT, + page->index, page->index, true); } detach_page_private(page); diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index abcec145db4b..363d6dd276c0 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -1078,25 +1078,15 @@ void yfs_fs_store_data(struct afs_operation *op) { struct afs_vnode_param *vp = &op->file[0]; struct afs_call *call; - loff_t size, pos, i_size; __be32 *bp; _enter(",%x,{%llx:%llu},,", key_serial(op->key), vp->fid.vid, vp->fid.vnode); - size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset; - if (op->store.first != op->store.last) - size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT; - pos = (loff_t)op->store.first << PAGE_SHIFT; - pos += op->store.first_offset; - - i_size = i_size_read(&vp->vnode->vfs_inode); - if (pos + size > i_size) - i_size = size + pos; - _debug("size %llx, at %llx, i_size %llx", - (unsigned long long)size, (unsigned long long)pos, - (unsigned long long)i_size); + (unsigned long long)op->store.size, + (unsigned long long)op->store.pos, + (unsigned long long)op->store.i_size); call = afs_alloc_flat_call(op->net, &yfs_RXYFSStoreData64, sizeof(__be32) + @@ -1109,8 +1099,7 @@ void yfs_fs_store_data(struct afs_operation *op) if (!call) return afs_op_nomem(op); - call->key = op->key; - call->send_pages = true; + call->write_iter = op->store.write_iter; /* marshall the parameters */ bp = call->request; @@ -1118,9 +1107,9 @@ void yfs_fs_store_data(struct afs_operation *op) bp = xdr_encode_u32(bp, 0); /* RPC flags */ bp = xdr_encode_YFSFid(bp, &vp->fid); bp = xdr_encode_YFSStoreStatus_mtime(bp, &op->mtime); - bp = xdr_encode_u64(bp, pos); - bp = xdr_encode_u64(bp, size); - bp = xdr_encode_u64(bp, i_size); + bp = xdr_encode_u64(bp, op->store.pos); + bp = xdr_encode_u64(bp, op->store.size); + bp = xdr_encode_u64(bp, op->store.i_size); yfs_check_req(call, bp); trace_afs_make_fs_call(call, &vp->fid); diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h index 9203cf6a8c53..3ccf591b2374 100644 --- a/include/trace/events/afs.h +++ b/include/trace/events/afs.h @@ -886,65 +886,52 @@ TRACE_EVENT(afs_call_done, __entry->rx_call) ); -TRACE_EVENT(afs_send_pages, - TP_PROTO(struct afs_call *call, struct msghdr *msg, - pgoff_t first, pgoff_t last, unsigned int offset), +TRACE_EVENT(afs_send_data, + TP_PROTO(struct afs_call *call, struct msghdr *msg), - TP_ARGS(call, msg, first, last, offset), + TP_ARGS(call, msg), TP_STRUCT__entry( __field(unsigned int, call ) - __field(pgoff_t, first ) - __field(pgoff_t, last ) - __field(unsigned int, nr ) - __field(unsigned int, bytes ) - __field(unsigned int, offset ) __field(unsigned int, flags ) + __field(loff_t, offset ) + __field(loff_t, count ) ), TP_fast_assign( __entry->call = call->debug_id; - __entry->first = first; - __entry->last = last; - __entry->nr = msg->msg_iter.nr_segs; - __entry->bytes = msg->msg_iter.count; - __entry->offset = offset; __entry->flags = msg->msg_flags; + __entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset; + __entry->count = iov_iter_count(&msg->msg_iter); ), - TP_printk(" c=%08x %lx-%lx-%lx b=%x o=%x f=%x", - __entry->call, - __entry->first, __entry->first + __entry->nr - 1, __entry->last, - __entry->bytes, __entry->offset, + TP_printk(" c=%08x o=%llx n=%llx f=%x", + __entry->call, __entry->offset, __entry->count, __entry->flags) ); -TRACE_EVENT(afs_sent_pages, - TP_PROTO(struct afs_call *call, pgoff_t first, pgoff_t last, - pgoff_t cursor, int ret), +TRACE_EVENT(afs_sent_data, + TP_PROTO(struct afs_call *call, struct msghdr *msg, int ret), - TP_ARGS(call, first, last, cursor, ret), + TP_ARGS(call, msg, ret), TP_STRUCT__entry( __field(unsigned int, call ) - __field(pgoff_t, first ) - __field(pgoff_t, last ) - __field(pgoff_t, cursor ) __field(int, ret ) + __field(loff_t, offset ) + __field(loff_t, count ) ), TP_fast_assign( __entry->call = call->debug_id; - __entry->first = first; - __entry->last = last; - __entry->cursor = cursor; __entry->ret = ret; + __entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset; + __entry->count = iov_iter_count(&msg->msg_iter); ), - TP_printk(" c=%08x %lx-%lx c=%lx r=%d", - __entry->call, - __entry->first, __entry->last, - __entry->cursor, __entry->ret) + TP_printk(" c=%08x o=%llx n=%llx r=%x", + __entry->call, __entry->offset, __entry->count, + __entry->ret) ); TRACE_EVENT(afs_dir_check_failed, From patchwork Wed Mar 10 16:59:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B306BC433E6 for ; Wed, 10 Mar 2021 17:00:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 886F664FCB for ; Wed, 10 Mar 2021 17:00:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233776AbhCJRAN (ORCPT ); Wed, 10 Mar 2021 12:00:13 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:21294 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233540AbhCJQ7n (ORCPT ); Wed, 10 Mar 2021 11:59:43 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395582; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wp36hD7JXRi7sa3JIW/Z4FZzgMvgLF1imcH0YqnzA5o=; b=SGwGMPp3DOb92pcU0Tv7h/+S1mdgP5YFNPMS+ZgiByHJQ+wtupNWrhWX71WT9NSZRQFY8Q 4pJjHXYY6gz8dESjBcm22JQrEbAoPkGI9ZAzUHUk3IEYVuMfMCfLxdTQ7QnG1UgX6qEOWj 5xQGM6s9u/pE7HLhWCk/w5n/DTaBp04= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-468-eVVLh8SjNK-TU7i4vsi3cA-1; Wed, 10 Mar 2021 11:59:40 -0500 X-MC-Unique: eVVLh8SjNK-TU7i4vsi3cA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E96ADE5762; Wed, 10 Mar 2021 16:59:38 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id CC75560C13; Wed, 10 Mar 2021 16:59:29 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 23/28] afs: Wait on PG_fscache before modifying/releasing a page From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:59:28 +0000 Message-ID: <161539556890.286939.5873470593519458598.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org PG_fscache is going to be used to indicate that a page is being written to the cache, and that the page should not be modified or released until it's finished. Make afs_invalidatepage() and afs_releasepage() wait for it. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861253957.340223.7465334678444521655.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465832417.1377938.3571599385208729791.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588536286.3465195.13231895135369807920.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118153708.1232039.3535103645871176749.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161049369.2537118.11591934943429117060.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340412903.1303470.6424701655031380012.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/file.c | 9 +++++++++ fs/afs/write.c | 10 ++++++++++ 2 files changed, 19 insertions(+) diff --git a/fs/afs/file.c b/fs/afs/file.c index f1bab69e99d4..acbc21a8c80e 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -594,6 +594,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset, if (PagePrivate(page)) afs_invalidate_dirty(page, offset, length); + wait_on_page_fscache(page); _leave(""); } @@ -611,6 +612,14 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags) /* deny if page is being written to the cache and the caller hasn't * elected to wait */ +#ifdef CONFIG_AFS_FSCACHE + if (PageFsCache(page)) { + if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) + return false; + wait_on_page_fscache(page); + } +#endif + if (PagePrivate(page)) { detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("rel"), page); diff --git a/fs/afs/write.c b/fs/afs/write.c index dd4dc1c868b5..e1791de90478 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -117,6 +117,10 @@ int afs_write_begin(struct file *file, struct address_space *mapping, SetPageUptodate(page); } +#ifdef CONFIG_AFS_FSCACHE + wait_on_page_fscache(page); +#endif + try_again: /* See if this page is already partially written in a way that we can * merge the new write with. @@ -857,6 +861,11 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) /* Wait for the page to be written to the cache before we allow it to * be modified. We then assume the entire page will need writing back. */ +#ifdef CONFIG_AFS_FSCACHE + if (PageFsCache(vmf->page) && + wait_on_page_bit_killable(vmf->page, PG_fscache) < 0) + return VM_FAULT_RETRY; +#endif if (PageWriteback(vmf->page) && wait_on_page_bit_killable(vmf->page, PG_writeback) < 0) @@ -948,5 +957,6 @@ int afs_launder_page(struct page *page) detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page); + wait_on_page_fscache(page); return ret; } From patchwork Wed Mar 10 16:59:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396818 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAF6EC43619 for ; Wed, 10 Mar 2021 17:00:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C590764FD9 for ; Wed, 10 Mar 2021 17:00:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233800AbhCJRAP (ORCPT ); Wed, 10 Mar 2021 12:00:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:54386 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233753AbhCJQ7w (ORCPT ); Wed, 10 Mar 2021 11:59:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395592; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0NFc/6bTxidZIbh3ksStRf2pMhjGVlLFuNwIaLmNRWM=; b=JxQuD/HMXhWs9wp19juah6HRQF6rknjqaszLgi7s0TvwqmRlwkDmjXcc+UdHRw6tvL+SRL YqqW9EO7Zx5jqwNMulE7a17eKOlZ9Wq6eYKvL0bcnfVQIeNOCliWraUd2lMMroaRnSrrpK B96C4nzak+tq95ShbIbvcMe4drAzO24= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-230-RCIFmV8FMF6G_eCrsVBQ7g-1; Wed, 10 Mar 2021 11:59:50 -0500 X-MC-Unique: RCIFmV8FMF6G_eCrsVBQ7g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5EFACE5760; Wed, 10 Mar 2021 16:59:48 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 211C560C13; Wed, 10 Mar 2021 16:59:44 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 24/28] afs: Extract writeback extension into its own function From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:59:44 +0000 Message-ID: <161539558417.286939.2879469588895925399.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Extract writeback extension into its own function to break up the writeback function a bit. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588538471.3465195.782513375683399583.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118154610.1232039.1765365632920504822.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161050546.2537118.2202554806419189453.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340414102.1303470.9078891484034668985.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/write.c | 109 ++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 67 insertions(+), 42 deletions(-) diff --git a/fs/afs/write.c b/fs/afs/write.c index e1791de90478..89c804bfe253 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -490,47 +490,25 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, } /* - * Synchronously write back the locked page and any subsequent non-locked dirty - * pages. + * Extend the region to be written back to include subsequent contiguously + * dirty pages if possible, but don't sleep while doing so. + * + * If this page holds new content, then we can include filler zeros in the + * writeback. */ -static int afs_write_back_from_locked_page(struct address_space *mapping, - struct writeback_control *wbc, - struct page *primary_page, - pgoff_t final_page) +static void afs_extend_writeback(struct address_space *mapping, + struct afs_vnode *vnode, + long *_count, + pgoff_t start, + pgoff_t final_page, + unsigned *_offset, + unsigned *_to, + bool new_content) { - struct afs_vnode *vnode = AFS_FS_I(mapping->host); - struct iov_iter iter; struct page *pages[8], *page; - unsigned long count, priv; - unsigned n, offset, to, f, t; - pgoff_t start, first, last; - loff_t i_size, pos, end; - int loop, ret; - - _enter(",%lx", primary_page->index); - - count = 1; - if (test_set_page_writeback(primary_page)) - BUG(); - - /* Find all consecutive lockable dirty pages that have contiguous - * written regions, stopping when we find a page that is not - * immediately lockable, is not dirty or is missing, or we reach the - * end of the range. - */ - start = primary_page->index; - priv = page_private(primary_page); - offset = afs_page_dirty_from(primary_page, priv); - to = afs_page_dirty_to(primary_page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page); - - WARN_ON(offset == to); - if (offset == to) - trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page); - - if (start >= final_page || - (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))) - goto no_more; + unsigned long count = *_count, priv; + unsigned offset = *_offset, to = *_to, n, f, t; + int loop; start++; do { @@ -551,8 +529,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, for (loop = 0; loop < n; loop++) { page = pages[loop]; - if (to != PAGE_SIZE && - !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) + if (to != PAGE_SIZE && !new_content) break; if (page->index > final_page) break; @@ -566,8 +543,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, priv = page_private(page); f = afs_page_dirty_from(page, priv); t = afs_page_dirty_to(page, priv); - if (f != 0 && - !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) { + if (f != 0 && !new_content) { unlock_page(page); break; } @@ -593,6 +569,55 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, } while (start <= final_page && count < 65536); no_more: + *_count = count; + *_offset = offset; + *_to = to; +} + +/* + * Synchronously write back the locked page and any subsequent non-locked dirty + * pages. + */ +static int afs_write_back_from_locked_page(struct address_space *mapping, + struct writeback_control *wbc, + struct page *primary_page, + pgoff_t final_page) +{ + struct afs_vnode *vnode = AFS_FS_I(mapping->host); + struct iov_iter iter; + unsigned long count, priv; + unsigned offset, to; + pgoff_t start, first, last; + loff_t i_size, pos, end; + bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + int ret; + + _enter(",%lx", primary_page->index); + + count = 1; + if (test_set_page_writeback(primary_page)) + BUG(); + + /* Find all consecutive lockable dirty pages that have contiguous + * written regions, stopping when we find a page that is not + * immediately lockable, is not dirty or is missing, or we reach the + * end of the range. + */ + start = primary_page->index; + priv = page_private(primary_page); + offset = afs_page_dirty_from(primary_page, priv); + to = afs_page_dirty_to(primary_page, priv); + trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page); + + WARN_ON(offset == to); + if (offset == to) + trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page); + + if (start < final_page && + (to == PAGE_SIZE || new_content)) + afs_extend_writeback(mapping, vnode, &count, start, final_page, + &offset, &to, new_content); + /* We now have a contiguous set of dirty pages, each with writeback * set; the first page is still locked at this point, but all the rest * have been unlocked. From patchwork Wed Mar 10 16:59:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E818C432C3 for ; Wed, 10 Mar 2021 17:00:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4184164FD9 for ; Wed, 10 Mar 2021 17:00:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233836AbhCJRAR (ORCPT ); Wed, 10 Mar 2021 12:00:17 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:26458 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233807AbhCJRAI (ORCPT ); Wed, 10 Mar 2021 12:00:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395607; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eOgDDdRj4DvPJUHh4D3BT62T78QHM4T73XHG9YHJ44w=; b=aEIbSQq0TCMCcQXO+vvYleX86nSr+luHXekU7qdVbO7UvyW+FkaweKEnLFRCXWwzBB1l1l mk6tZjyW11AsPocXsbwDSXC4GWjntfB8Wl+o1qa9vzELCMVzl4hGIdErDeU3VFqAYvFWAE 3so5RAggAkJkiDsJgAKVF/PmkMfghIg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-185-nuCjDEziOZWemMGmpLWTBg-1; Wed, 10 Mar 2021 12:00:03 -0500 X-MC-Unique: nuCjDEziOZWemMGmpLWTBg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7C9AA1005D45; Wed, 10 Mar 2021 17:00:01 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1513310013C1; Wed, 10 Mar 2021 16:59:54 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 25/28] afs: Prepare for use of THPs From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:59:53 +0000 Message-ID: <161539559365.286939.18344613540296085269.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org As a prelude to supporting transparent huge pages, use thp_size() and similar rather than PAGE_SIZE/SHIFT. Further, try and frame everything in terms of file positions and lengths rather than page indices and numbers of pages. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588540227.3465195.4752143929716269062.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118155821.1232039.540445038028845740.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161051439.2537118.15577827510426326534.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340415869.1303470.6040191748634322355.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/dir.c | 2 fs/afs/file.c | 8 - fs/afs/internal.h | 2 fs/afs/write.c | 436 +++++++++++++++++++++++++++++------------------------ 4 files changed, 245 insertions(+), 203 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 021489497801..0dc6e1409405 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -2084,6 +2084,6 @@ static void afs_dir_invalidatepage(struct page *page, unsigned int offset, afs_stat_v(dvnode, n_inval); /* we clean up only if the entire page is being invalidated */ - if (offset == 0 && length == PAGE_SIZE) + if (offset == 0 && length == thp_size(page)) detach_page_private(page); } diff --git a/fs/afs/file.c b/fs/afs/file.c index acbc21a8c80e..f6282ac0d222 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -330,8 +330,8 @@ static int afs_page_filler(struct key *key, struct page *page) req->vnode = vnode; req->key = key_get(key); req->pos = (loff_t)page->index << PAGE_SHIFT; - req->len = PAGE_SIZE; - req->nr_pages = 1; + req->len = thp_size(page); + req->nr_pages = thp_nr_pages(page); req->done = afs_file_read_done; req->cleanup = afs_file_read_cleanup; @@ -575,8 +575,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset, trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page); clear_page_dirty_for_io(page); full_invalidate: - detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("inval"), page); + detach_page_private(page); } /* @@ -621,8 +621,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags) #endif if (PagePrivate(page)) { - detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("rel"), page); + detach_page_private(page); } /* indicate that the page can be released */ diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 4076c6ba43eb..889f504d7308 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -815,8 +815,6 @@ struct afs_operation { loff_t pos; loff_t size; loff_t i_size; - pgoff_t first; /* first page in mapping to deal with */ - pgoff_t last; /* last page in mapping to deal with */ bool laundering; /* Laundering page, PG_writeback not set */ } store; struct { diff --git a/fs/afs/write.c b/fs/afs/write.c index 89c804bfe253..e672833c99bc 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -94,15 +94,15 @@ int afs_write_begin(struct file *file, struct address_space *mapping, struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); struct page *page; unsigned long priv; - unsigned f, from = pos & (PAGE_SIZE - 1); - unsigned t, to = from + len; - pgoff_t index = pos >> PAGE_SHIFT; + unsigned f, from; + unsigned t, to; + pgoff_t index; int ret; - _enter("{%llx:%llu},{%lx},%u,%u", - vnode->fid.vid, vnode->fid.vnode, index, from, to); + _enter("{%llx:%llu},%llx,%x", + vnode->fid.vid, vnode->fid.vnode, pos, len); - page = grab_cache_page_write_begin(mapping, index, flags); + page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags); if (!page) return -ENOMEM; @@ -121,19 +121,20 @@ int afs_write_begin(struct file *file, struct address_space *mapping, wait_on_page_fscache(page); #endif + index = page->index; + from = pos - index * PAGE_SIZE; + to = from + len; + try_again: /* See if this page is already partially written in a way that we can * merge the new write with. */ - t = f = 0; if (PagePrivate(page)) { priv = page_private(page); f = afs_page_dirty_from(page, priv); t = afs_page_dirty_to(page, priv); ASSERTCMP(f, <=, t); - } - if (f != t) { if (PageWriteback(page)) { trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page); goto flush_conflicting_write; @@ -180,7 +181,7 @@ int afs_write_end(struct file *file, struct address_space *mapping, { struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); unsigned long priv; - unsigned int f, from = pos & (PAGE_SIZE - 1); + unsigned int f, from = pos & (thp_size(page) - 1); unsigned int t, to = from + copied; loff_t i_size, maybe_i_size; int ret = 0; @@ -233,9 +234,8 @@ int afs_write_end(struct file *file, struct address_space *mapping, trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page); } - set_page_dirty(page); - if (PageDirty(page)) - _debug("dirtied"); + if (set_page_dirty(page)) + _debug("dirtied %lx", page->index); ret = copied; out: @@ -248,40 +248,43 @@ int afs_write_end(struct file *file, struct address_space *mapping, * kill all the pages in the given range */ static void afs_kill_pages(struct address_space *mapping, - pgoff_t first, pgoff_t last) + loff_t start, loff_t len) { struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct pagevec pv; - unsigned count, loop; + unsigned int loop, psize; - _enter("{%llx:%llu},%lx-%lx", - vnode->fid.vid, vnode->fid.vnode, first, last); + _enter("{%llx:%llu},%llx @%llx", + vnode->fid.vid, vnode->fid.vnode, len, start); pagevec_init(&pv); do { - _debug("kill %lx-%lx", first, last); + _debug("kill %llx @%llx", len, start); - count = last - first + 1; - if (count > PAGEVEC_SIZE) - count = PAGEVEC_SIZE; - pv.nr = find_get_pages_contig(mapping, first, count, pv.pages); - ASSERTCMP(pv.nr, ==, count); + pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE, + PAGEVEC_SIZE, pv.pages); + if (pv.nr == 0) + break; - for (loop = 0; loop < count; loop++) { + for (loop = 0; loop < pv.nr; loop++) { struct page *page = pv.pages[loop]; + + if (page->index * PAGE_SIZE >= start + len) + break; + + psize = thp_size(page); + start += psize; + len -= psize; ClearPageUptodate(page); - SetPageError(page); end_page_writeback(page); - if (page->index >= first) - first = page->index + 1; lock_page(page); generic_error_remove_page(mapping, page); unlock_page(page); } __pagevec_release(&pv); - } while (first <= last); + } while (len > 0); _leave(""); } @@ -291,37 +294,40 @@ static void afs_kill_pages(struct address_space *mapping, */ static void afs_redirty_pages(struct writeback_control *wbc, struct address_space *mapping, - pgoff_t first, pgoff_t last) + loff_t start, loff_t len) { struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct pagevec pv; - unsigned count, loop; + unsigned int loop, psize; - _enter("{%llx:%llu},%lx-%lx", - vnode->fid.vid, vnode->fid.vnode, first, last); + _enter("{%llx:%llu},%llx @%llx", + vnode->fid.vid, vnode->fid.vnode, len, start); pagevec_init(&pv); do { - _debug("redirty %lx-%lx", first, last); + _debug("redirty %llx @%llx", len, start); - count = last - first + 1; - if (count > PAGEVEC_SIZE) - count = PAGEVEC_SIZE; - pv.nr = find_get_pages_contig(mapping, first, count, pv.pages); - ASSERTCMP(pv.nr, ==, count); + pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE, + PAGEVEC_SIZE, pv.pages); + if (pv.nr == 0) + break; - for (loop = 0; loop < count; loop++) { + for (loop = 0; loop < pv.nr; loop++) { struct page *page = pv.pages[loop]; + if (page->index * PAGE_SIZE >= start + len) + break; + + psize = thp_size(page); + start += psize; + len -= psize; redirty_page_for_writepage(wbc, page); end_page_writeback(page); - if (page->index >= first) - first = page->index + 1; } __pagevec_release(&pv); - } while (first <= last); + } while (len > 0); _leave(""); } @@ -329,23 +335,28 @@ static void afs_redirty_pages(struct writeback_control *wbc, /* * completion of write to server */ -static void afs_pages_written_back(struct afs_vnode *vnode, pgoff_t start, pgoff_t last) +static void afs_pages_written_back(struct afs_vnode *vnode, loff_t start, unsigned int len) { struct address_space *mapping = vnode->vfs_inode.i_mapping; struct page *page; + pgoff_t end; - XA_STATE(xas, &mapping->i_pages, start); + XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); - _enter("{%llx:%llu},{%lx-%lx}", - vnode->fid.vid, vnode->fid.vnode, start, last); + _enter("{%llx:%llu},{%x @%llx}", + vnode->fid.vid, vnode->fid.vnode, len, start); rcu_read_lock(); - xas_for_each(&xas, page, last) { - ASSERT(PageWriteback(page)); + end = (start + len - 1) / PAGE_SIZE; + xas_for_each(&xas, page, end) { + if (!PageWriteback(page)) { + kdebug("bad %x @%llx page %lx %lx", len, start, page->index, end); + ASSERT(PageWriteback(page)); + } - detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("clear"), page); + detach_page_private(page); page_endio(page, true, 0); } @@ -404,7 +415,7 @@ static void afs_store_data_success(struct afs_operation *op) afs_vnode_commit_status(op, &op->file[0]); if (op->error == 0) { if (!op->store.laundering) - afs_pages_written_back(vnode, op->store.first, op->store.last); + afs_pages_written_back(vnode, op->store.pos, op->store.size); afs_stat_v(vnode, n_stores); atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes); } @@ -419,8 +430,7 @@ static const struct afs_operation_ops afs_store_data_operation = { /* * write to a file */ -static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, - loff_t pos, pgoff_t first, pgoff_t last, +static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos, bool laundering) { struct afs_operation *op; @@ -453,8 +463,6 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, op->file[0].dv_delta = 1; op->store.write_iter = iter; op->store.pos = pos; - op->store.first = first; - op->store.last = last; op->store.size = size; op->store.i_size = max(pos + size, i_size); op->store.laundering = laundering; @@ -499,40 +507,49 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, static void afs_extend_writeback(struct address_space *mapping, struct afs_vnode *vnode, long *_count, - pgoff_t start, - pgoff_t final_page, - unsigned *_offset, - unsigned *_to, - bool new_content) + loff_t start, + loff_t max_len, + bool new_content, + unsigned int *_len) { - struct page *pages[8], *page; - unsigned long count = *_count, priv; - unsigned offset = *_offset, to = *_to, n, f, t; - int loop; + struct pagevec pvec; + struct page *page; + unsigned long priv; + unsigned int psize, filler = 0; + unsigned int f, t; + loff_t len = *_len; + pgoff_t index = (start + len) / PAGE_SIZE; + bool stop = true; + unsigned int i; + + XA_STATE(xas, &mapping->i_pages, index); + pagevec_init(&pvec); - start++; do { - _debug("more %lx [%lx]", start, count); - n = final_page - start + 1; - if (n > ARRAY_SIZE(pages)) - n = ARRAY_SIZE(pages); - n = find_get_pages_contig(mapping, start, ARRAY_SIZE(pages), pages); - _debug("fgpc %u", n); - if (n == 0) - goto no_more; - if (pages[0]->index != start) { - do { - put_page(pages[--n]); - } while (n > 0); - goto no_more; - } + /* Firstly, we gather up a batch of contiguous dirty pages + * under the RCU read lock - but we can't clear the dirty flags + * there if any of those pages are mapped. + */ + rcu_read_lock(); - for (loop = 0; loop < n; loop++) { - page = pages[loop]; - if (to != PAGE_SIZE && !new_content) + xas_for_each(&xas, page, ULONG_MAX) { + stop = true; + if (xas_retry(&xas, page)) + continue; + if (xa_is_value(page)) + break; + if (page->index != index) break; - if (page->index > final_page) + + if (!page_cache_get_speculative(page)) { + xas_reset(&xas); + continue; + } + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) break; + if (!trylock_page(page)) break; if (!PageDirty(page) || PageWriteback(page)) { @@ -540,6 +557,7 @@ static void afs_extend_writeback(struct address_space *mapping, break; } + psize = thp_size(page); priv = page_private(page); f = afs_page_dirty_from(page, priv); t = afs_page_dirty_to(page, priv); @@ -547,110 +565,126 @@ static void afs_extend_writeback(struct address_space *mapping, unlock_page(page); break; } - to = t; + len += filler + t; + filler = psize - t; + if (len >= max_len || *_count <= 0) + stop = true; + else if (t == psize || new_content) + stop = false; + + index += thp_nr_pages(page); + if (!pagevec_add(&pvec, page)) + break; + if (stop) + break; + } + + if (!stop) + xas_pause(&xas); + rcu_read_unlock(); + + /* Now, if we obtained any pages, we can shift them to being + * writable and mark them for caching. + */ + if (!pagevec_count(&pvec)) + break; + + for (i = 0; i < pagevec_count(&pvec); i++) { + page = pvec.pages[i]; trace_afs_page_dirty(vnode, tracepoint_string("store+"), page); if (!clear_page_dirty_for_io(page)) BUG(); if (test_set_page_writeback(page)) BUG(); + + *_count -= thp_nr_pages(page); unlock_page(page); - put_page(page); - } - count += loop; - if (loop < n) { - for (; loop < n; loop++) - put_page(pages[loop]); - goto no_more; } - start += loop; - } while (start <= final_page && count < 65536); + pagevec_release(&pvec); + cond_resched(); + } while (!stop); -no_more: - *_count = count; - *_offset = offset; - *_to = to; + *_len = len; } /* * Synchronously write back the locked page and any subsequent non-locked dirty * pages. */ -static int afs_write_back_from_locked_page(struct address_space *mapping, - struct writeback_control *wbc, - struct page *primary_page, - pgoff_t final_page) +static ssize_t afs_write_back_from_locked_page(struct address_space *mapping, + struct writeback_control *wbc, + struct page *page, + loff_t start, loff_t end) { struct afs_vnode *vnode = AFS_FS_I(mapping->host); struct iov_iter iter; - unsigned long count, priv; - unsigned offset, to; - pgoff_t start, first, last; - loff_t i_size, pos, end; + unsigned long priv; + unsigned int offset, to, len, max_len; + loff_t i_size = i_size_read(&vnode->vfs_inode); bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + long count = wbc->nr_to_write; int ret; - _enter(",%lx", primary_page->index); + _enter(",%lx,%llx-%llx", page->index, start, end); - count = 1; - if (test_set_page_writeback(primary_page)) + if (test_set_page_writeback(page)) BUG(); + count -= thp_nr_pages(page); + /* Find all consecutive lockable dirty pages that have contiguous * written regions, stopping when we find a page that is not * immediately lockable, is not dirty or is missing, or we reach the * end of the range. */ - start = primary_page->index; - priv = page_private(primary_page); - offset = afs_page_dirty_from(primary_page, priv); - to = afs_page_dirty_to(primary_page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page); - - WARN_ON(offset == to); - if (offset == to) - trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page); - - if (start < final_page && - (to == PAGE_SIZE || new_content)) - afs_extend_writeback(mapping, vnode, &count, start, final_page, - &offset, &to, new_content); + priv = page_private(page); + offset = afs_page_dirty_from(page, priv); + to = afs_page_dirty_to(page, priv); + trace_afs_page_dirty(vnode, tracepoint_string("store"), page); + + len = to - offset; + start += offset; + if (start < i_size) { + /* Trim the write to the EOF; the extra data is ignored. Also + * put an upper limit on the size of a single storedata op. + */ + max_len = 65536 * 4096; + max_len = min_t(unsigned long long, max_len, end - start + 1); + max_len = min_t(unsigned long long, max_len, i_size - start); + + if (len < max_len && + (to == thp_size(page) || new_content)) + afs_extend_writeback(mapping, vnode, &count, + start, max_len, new_content, &len); + len = min_t(loff_t, len, max_len); + } /* We now have a contiguous set of dirty pages, each with writeback * set; the first page is still locked at this point, but all the rest * have been unlocked. */ - unlock_page(primary_page); - - first = primary_page->index; - last = first + count - 1; - _debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to); - - pos = first; - pos <<= PAGE_SHIFT; - pos += offset; - end = last; - end <<= PAGE_SHIFT; - end += to; + unlock_page(page); - /* Trim the actual write down to the EOF */ - i_size = i_size_read(&vnode->vfs_inode); - if (end > i_size) - end = i_size; + if (start < i_size) { + _debug("write back %x @%llx [%llx]", len, start, i_size); - if (pos < i_size) { - iov_iter_xarray(&iter, WRITE, &mapping->i_pages, pos, end - pos); - ret = afs_store_data(vnode, &iter, pos, first, last, false); + iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len); + ret = afs_store_data(vnode, &iter, start, false); } else { + _debug("write discard %x @%llx [%llx]", len, start, i_size); + /* The dirty region was entirely beyond the EOF. */ + afs_pages_written_back(vnode, start, len); ret = 0; } switch (ret) { case 0: - ret = count; + wbc->nr_to_write = count; + ret = len; break; default: @@ -662,13 +696,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, case -EKEYEXPIRED: case -EKEYREJECTED: case -EKEYREVOKED: - afs_redirty_pages(wbc, mapping, first, last); + afs_redirty_pages(wbc, mapping, start, len); mapping_set_error(mapping, ret); break; case -EDQUOT: case -ENOSPC: - afs_redirty_pages(wbc, mapping, first, last); + afs_redirty_pages(wbc, mapping, start, len); mapping_set_error(mapping, -ENOSPC); break; @@ -680,7 +714,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, case -ENOMEDIUM: case -ENXIO: trace_afs_file_error(vnode, ret, afs_file_error_writeback_fail); - afs_kill_pages(mapping, first, last); + afs_kill_pages(mapping, start, len); mapping_set_error(mapping, ret); break; } @@ -695,19 +729,19 @@ static int afs_write_back_from_locked_page(struct address_space *mapping, */ int afs_writepage(struct page *page, struct writeback_control *wbc) { - int ret; + ssize_t ret; + loff_t start; _enter("{%lx},", page->index); + start = page->index * PAGE_SIZE; ret = afs_write_back_from_locked_page(page->mapping, wbc, page, - wbc->range_end >> PAGE_SHIFT); + start, LLONG_MAX - start); if (ret < 0) { - _leave(" = %d", ret); - return 0; + _leave(" = %zd", ret); + return ret; } - wbc->nr_to_write -= ret; - _leave(" = 0"); return 0; } @@ -717,35 +751,46 @@ int afs_writepage(struct page *page, struct writeback_control *wbc) */ static int afs_writepages_region(struct address_space *mapping, struct writeback_control *wbc, - pgoff_t index, pgoff_t end, pgoff_t *_next) + loff_t start, loff_t end, loff_t *_next) { struct page *page; - int ret, n; + ssize_t ret; + int n; - _enter(",,%lx,%lx,", index, end); + _enter("%llx,%llx,", start, end); do { - n = find_get_pages_range_tag(mapping, &index, end, - PAGECACHE_TAG_DIRTY, 1, &page); + pgoff_t index = start / PAGE_SIZE; + + n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE, + PAGECACHE_TAG_DIRTY, 1, &page); if (!n) break; + start = (loff_t)page->index * PAGE_SIZE; /* May regress with THPs */ + _debug("wback %lx", page->index); - /* - * at this point we hold neither the i_pages lock nor the + /* At this point we hold neither the i_pages lock nor the * page lock: the page may be truncated or invalidated * (changing page->mapping to NULL), or even swizzled * back from swapper_space to tmpfs file mapping */ - ret = lock_page_killable(page); - if (ret < 0) { - put_page(page); - _leave(" = %d", ret); - return ret; + if (wbc->sync_mode != WB_SYNC_NONE) { + ret = lock_page_killable(page); + if (ret < 0) { + put_page(page); + return ret; + } + } else { + if (!trylock_page(page)) { + put_page(page); + return 0; + } } if (page->mapping != mapping || !PageDirty(page)) { + start += thp_size(page); unlock_page(page); put_page(page); continue; @@ -761,20 +806,20 @@ static int afs_writepages_region(struct address_space *mapping, if (!clear_page_dirty_for_io(page)) BUG(); - ret = afs_write_back_from_locked_page(mapping, wbc, page, end); + ret = afs_write_back_from_locked_page(mapping, wbc, page, start, end); put_page(page); if (ret < 0) { - _leave(" = %d", ret); + _leave(" = %zd", ret); return ret; } - wbc->nr_to_write -= ret; + start += ret * PAGE_SIZE; cond_resched(); - } while (index < end && wbc->nr_to_write > 0); + } while (wbc->nr_to_write > 0); - *_next = index; - _leave(" = 0 [%lx]", *_next); + *_next = start; + _leave(" = 0 [%llx]", *_next); return 0; } @@ -785,7 +830,7 @@ int afs_writepages(struct address_space *mapping, struct writeback_control *wbc) { struct afs_vnode *vnode = AFS_FS_I(mapping->host); - pgoff_t start, end, next; + loff_t start, next; int ret; _enter(""); @@ -800,22 +845,19 @@ int afs_writepages(struct address_space *mapping, return 0; if (wbc->range_cyclic) { - start = mapping->writeback_index; - end = -1; - ret = afs_writepages_region(mapping, wbc, start, end, &next); + start = mapping->writeback_index * PAGE_SIZE; + ret = afs_writepages_region(mapping, wbc, start, LLONG_MAX, &next); if (start > 0 && wbc->nr_to_write > 0 && ret == 0) ret = afs_writepages_region(mapping, wbc, 0, start, &next); - mapping->writeback_index = next; + mapping->writeback_index = next / PAGE_SIZE; } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { - end = (pgoff_t)(LLONG_MAX >> PAGE_SHIFT); - ret = afs_writepages_region(mapping, wbc, 0, end, &next); + ret = afs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next); if (wbc->nr_to_write > 0) mapping->writeback_index = next; } else { - start = wbc->range_start >> PAGE_SHIFT; - end = wbc->range_end >> PAGE_SHIFT; - ret = afs_writepages_region(mapping, wbc, start, end, &next); + ret = afs_writepages_region(mapping, wbc, + wbc->range_start, wbc->range_end, &next); } up_read(&vnode->validate_lock); @@ -873,13 +915,13 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync) */ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) { + struct page *page = thp_head(vmf->page); struct file *file = vmf->vma->vm_file; struct inode *inode = file_inode(file); struct afs_vnode *vnode = AFS_FS_I(inode); unsigned long priv; - _enter("{{%llx:%llu}},{%lx}", - vnode->fid.vid, vnode->fid.vnode, vmf->page->index); + _enter("{{%llx:%llu}},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index); sb_start_pagefault(inode->i_sb); @@ -887,31 +929,33 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf) * be modified. We then assume the entire page will need writing back. */ #ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(vmf->page) && - wait_on_page_bit_killable(vmf->page, PG_fscache) < 0) + if (PageFsCache(page) && + wait_on_page_bit_killable(page, PG_fscache) < 0) return VM_FAULT_RETRY; #endif - if (PageWriteback(vmf->page) && - wait_on_page_bit_killable(vmf->page, PG_writeback) < 0) + if (PageWriteback(page) && + wait_on_page_bit_killable(page, PG_writeback) < 0) return VM_FAULT_RETRY; - if (lock_page_killable(vmf->page) < 0) + if (lock_page_killable(page) < 0) return VM_FAULT_RETRY; /* We mustn't change page->private until writeback is complete as that * details the portion of the page we need to write back and we might * need to redirty the page if there's a problem. */ - wait_on_page_writeback(vmf->page); + wait_on_page_writeback(page); - priv = afs_page_dirty(vmf->page, 0, PAGE_SIZE); + priv = afs_page_dirty(page, 0, thp_size(page)); priv = afs_page_dirty_mmapped(priv); - if (PagePrivate(vmf->page)) - set_page_private(vmf->page, priv); - else - attach_page_private(vmf->page, (void *)priv); - trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), vmf->page); + if (PagePrivate(page)) { + set_page_private(page, priv); + trace_afs_page_dirty(vnode, tracepoint_string("mkwrite+"), page); + } else { + attach_page_private(page, (void *)priv); + trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), page); + } file_update_time(file); sb_end_pagefault(inode->i_sb); @@ -964,7 +1008,7 @@ int afs_launder_page(struct page *page) priv = page_private(page); if (clear_page_dirty_for_io(page)) { f = 0; - t = PAGE_SIZE; + t = thp_size(page); if (PagePrivate(page)) { f = afs_page_dirty_from(page, priv); t = afs_page_dirty_to(page, priv); @@ -976,12 +1020,12 @@ int afs_launder_page(struct page *page) iov_iter_bvec(&iter, WRITE, bv, 1, bv[0].bv_len); trace_afs_page_dirty(vnode, tracepoint_string("launder"), page); - ret = afs_store_data(vnode, &iter, (loff_t)page->index << PAGE_SHIFT, - page->index, page->index, true); + ret = afs_store_data(vnode, &iter, (loff_t)page->index * PAGE_SIZE, + true); } - detach_page_private(page); trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page); + detach_page_private(page); wait_on_page_fscache(page); return ret; } From patchwork Wed Mar 10 17:00:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397620 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0785CC43381 for ; Wed, 10 Mar 2021 17:01:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CEF2B64FC7 for ; Wed, 10 Mar 2021 17:01:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232608AbhCJRAp (ORCPT ); Wed, 10 Mar 2021 12:00:45 -0500 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:30931 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233843AbhCJRAT (ORCPT ); Wed, 10 Mar 2021 12:00:19 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395618; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TSFY1mfuIl4jA5nx/IZIR5MuKLty4TPmhh0I/izWwUU=; b=GwXcuzDGBuQ1ps/LGaC0hzGBXuFQJgE34KGnxxFJTqOqKSTETf78/trRPMZLb5mK0e981t RwiEzfnDAMkJeJLszyMTWZJvkapIFfFsl+7HVOlnX5g3Ygg4cM47JrYs1BKHG5uPuc0txJ 3Bbz01SsfAwqIL9l/pAOVQS5rJkF9FY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-489-hmYzCFwQMMGxYKAp7EJGpQ-1; Wed, 10 Mar 2021 12:00:16 -0500 X-MC-Unique: hmYzCFwQMMGxYKAp7EJGpQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0D53F108BD07; Wed, 10 Mar 2021 17:00:14 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 11E3E60C05; Wed, 10 Mar 2021 17:00:07 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 26/28] afs: Use the fs operation ops to handle FetchData completion From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 17:00:06 +0000 Message-ID: <161539560673.286939.391310781674212229.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use the 'success' and 'aborted' afs_operations_ops methods and add a 'failed' method to handle the completion of an AFS.FetchData, AFS.FetchData64 or YFS.FetchData64 RPC operation rather than directly calling the done func pointed to by the afs_read struct from the call delivery handler. This means the done function will be called back on error also, not just on successful completion. This allows motion towards asynchronous data reception on data fetch calls and allows any error to be handed off to the fscache read helper in the same place as a successful completion. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588541471.3465195.8807019223378490810.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118157260.1232039.6549085372718234792.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161052647.2537118.12922380836599003659.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340417106.1303470.3502017303898569631.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/file.c | 15 +++++++++++++++ fs/afs/fs_operation.c | 4 +++- fs/afs/fsclient.c | 3 --- fs/afs/internal.h | 1 + fs/afs/yfsclient.c | 3 --- 5 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index f6282ac0d222..231e9fd7882b 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -255,6 +255,19 @@ void afs_put_read(struct afs_read *req) } } +static void afs_fetch_data_notify(struct afs_operation *op) +{ + struct afs_read *req = op->fetch.req; + int error = op->error; + + if (error == -ECONNABORTED) + error = afs_abort_to_error(op->ac.abort_code); + req->error = error; + + if (req->done) + req->done(req); +} + static void afs_fetch_data_success(struct afs_operation *op) { struct afs_vnode *vnode = op->file[0].vnode; @@ -263,6 +276,7 @@ static void afs_fetch_data_success(struct afs_operation *op) afs_vnode_commit_status(op, &op->file[0]); afs_stat_v(vnode, n_fetches); atomic_long_add(op->fetch.req->actual_len, &op->net->n_fetch_bytes); + afs_fetch_data_notify(op); } static void afs_fetch_data_put(struct afs_operation *op) @@ -276,6 +290,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = { .issue_yfs_rpc = yfs_fs_fetch_data, .success = afs_fetch_data_success, .aborted = afs_check_for_remote_deletion, + .failed = afs_fetch_data_notify, .put = afs_fetch_data_put, }; diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c index 97cab12b0a6c..938e28a00101 100644 --- a/fs/afs/fs_operation.c +++ b/fs/afs/fs_operation.c @@ -195,8 +195,10 @@ void afs_wait_for_operation(struct afs_operation *op) case -ECONNABORTED: if (op->ops->aborted) op->ops->aborted(op); - break; + fallthrough; default: + if (op->ops->failed) + op->ops->failed(op); break; } diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 31e6b3635541..5e34f4dbd385 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -392,9 +392,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) break; } - if (req->done) - req->done(req); - _leave(" = 0 [done]"); return 0; } diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 889f504d7308..62c1b38fa98b 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -742,6 +742,7 @@ struct afs_operation_ops { void (*issue_yfs_rpc)(struct afs_operation *op); void (*success)(struct afs_operation *op); void (*aborted)(struct afs_operation *op); + void (*failed)(struct afs_operation *op); void (*edit_dir)(struct afs_operation *op); void (*put)(struct afs_operation *op); }; diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 363d6dd276c0..2b35cba8ad62 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -449,9 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call) break; } - if (req->done) - req->done(req); - _leave(" = 0 [done]"); return 0; } From patchwork Wed Mar 10 17:00:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 396817 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2257FC4332B for ; Wed, 10 Mar 2021 17:01:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDD4E64FDC for ; Wed, 10 Mar 2021 17:01:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233508AbhCJRAq (ORCPT ); Wed, 10 Mar 2021 12:00:46 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:52195 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233898AbhCJRAe (ORCPT ); Wed, 10 Mar 2021 12:00:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395634; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MUWy5v1OQUbj7p0Vb5UgsKMrIPeKLKkB6nhoC2zCYc8=; b=NS/HuRfYDgWKsF3LjzqdzLlb2tLLsgsE3+81z6/2WjZuGOkFdz5QtzPVGZuvJCqYGyga3G iL2oRnITYO1lDyXRS2apD+ABFCvH6buLlbHG6PUuBdOrD7PovKGsshoqRXtAsQ8yrPB0v9 A9avVC88Gah8khsyuBJInEyfhnWfcDo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-590-ppoqFDQVOqeBYJAO46FhVg-1; Wed, 10 Mar 2021 12:00:30 -0500 X-MC-Unique: ppoqFDQVOqeBYJAO46FhVg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3011D8015BA; Wed, 10 Mar 2021 17:00:27 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id D2AE5E16B; Wed, 10 Mar 2021 17:00:20 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 27/28] afs: Use new fscache read helper API From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 17:00:19 +0000 Message-ID: <161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Make AFS use the new fscache read helpers to implement the VM read operations: - afs_readpage() now hands off responsibility to fscache_readpage(). - afs_readpages() is gone and replaced with afs_readahead(). - afs_readahead() just hands off responsibility to fscache_readahead(). These make use of the cache if a cookie is supplied, otherwise just call the ->issue_op() method a sufficient number of times to complete the entire request. Changes: - Folded in error handling fixes to afs_req_issue_op(). - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3 --- fs/afs/Kconfig | 1 fs/afs/file.c | 327 +++++++++++++---------------------------------------- fs/afs/fsclient.c | 1 fs/afs/internal.h | 3 4 files changed, 83 insertions(+), 249 deletions(-) diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig index 1ad211d72b3b..fc8ba9142f2f 100644 --- a/fs/afs/Kconfig +++ b/fs/afs/Kconfig @@ -4,6 +4,7 @@ config AFS_FS depends on INET select AF_RXRPC select DNS_RESOLVER + select NETFS_SUPPORT help If you say Y here, you will get an experimental Andrew File System driver. It currently only supports unsecured read-only AFS access. diff --git a/fs/afs/file.c b/fs/afs/file.c index 231e9fd7882b..99bb4649a306 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "internal.h" static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); @@ -22,8 +23,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); static int afs_releasepage(struct page *page, gfp_t gfp_flags); -static int afs_readpages(struct file *filp, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages); +static void afs_readahead(struct readahead_control *ractl); const struct file_operations afs_file_operations = { .open = afs_open, @@ -48,7 +48,7 @@ const struct inode_operations afs_file_inode_operations = { const struct address_space_operations afs_fs_aops = { .readpage = afs_readpage, - .readpages = afs_readpages, + .readahead = afs_readahead, .set_page_dirty = afs_set_page_dirty, .launder_page = afs_launder_page, .releasepage = afs_releasepage, @@ -185,61 +185,17 @@ int afs_release(struct inode *inode, struct file *file) } /* - * Handle completion of a read operation. + * Allocate a new read record. */ -static void afs_file_read_done(struct afs_read *req) +struct afs_read *afs_alloc_read(gfp_t gfp) { - struct afs_vnode *vnode = req->vnode; - struct page *page; - pgoff_t index = req->pos >> PAGE_SHIFT; - pgoff_t last = index + req->nr_pages - 1; - - XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index); - - if (iov_iter_count(req->iter) > 0) { - /* The read was short - clear the excess buffer. */ - _debug("afterclear %zx %zx %llx/%llx", - req->iter->iov_offset, - iov_iter_count(req->iter), - req->actual_len, req->len); - iov_iter_zero(iov_iter_count(req->iter), req->iter); - } - - rcu_read_lock(); - xas_for_each(&xas, page, last) { - page_endio(page, false, 0); - put_page(page); - } - rcu_read_unlock(); - - task_io_account_read(req->len); - req->cleanup = NULL; -} - -/* - * Dispose of our locks and refs on the pages if the read failed. - */ -static void afs_file_read_cleanup(struct afs_read *req) -{ - struct page *page; - pgoff_t index = req->pos >> PAGE_SHIFT; - pgoff_t last = index + req->nr_pages - 1; - - if (req->iter) { - XA_STATE(xas, &req->vnode->vfs_inode.i_mapping->i_pages, index); - - _enter("%lu,%u,%zu", index, req->nr_pages, iov_iter_count(req->iter)); + struct afs_read *req; - rcu_read_lock(); - xas_for_each(&xas, page, last) { - BUG_ON(xa_is_value(page)); - BUG_ON(PageCompound(page)); + req = kzalloc(sizeof(struct afs_read), gfp); + if (req) + refcount_set(&req->usage, 1); - page_endio(page, false, req->error); - put_page(page); - } - rcu_read_unlock(); - } + return req; } /* @@ -258,14 +214,20 @@ void afs_put_read(struct afs_read *req) static void afs_fetch_data_notify(struct afs_operation *op) { struct afs_read *req = op->fetch.req; + struct netfs_read_subrequest *subreq = req->subreq; int error = op->error; if (error == -ECONNABORTED) error = afs_abort_to_error(op->ac.abort_code); req->error = error; - if (req->done) + if (subreq) { + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_subreq_terminated(subreq, error ?: req->actual_len, false); + req->subreq = NULL; + } else if (req->done) { req->done(req); + } } static void afs_fetch_data_success(struct afs_operation *op) @@ -309,8 +271,11 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req) key_serial(req->key)); op = afs_alloc_operation(req->key, vnode->volume); - if (IS_ERR(op)) + if (IS_ERR(op)) { + if (req->subreq) + netfs_subreq_terminated(req->subreq, PTR_ERR(op), false); return PTR_ERR(op); + } afs_op_set_vnode(op, 0, vnode); @@ -319,222 +284,86 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req) return afs_do_sync_operation(op); } -/* - * read page from file, directory or symlink, given a key to use - */ -static int afs_page_filler(struct key *key, struct page *page) +static void afs_req_issue_op(struct netfs_read_subrequest *subreq) { - struct inode *inode = page->mapping->host; - struct afs_vnode *vnode = AFS_FS_I(inode); - struct afs_read *req; - int ret; - - _enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index); + struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode); + struct afs_read *fsreq; - BUG_ON(!PageLocked(page)); - - ret = -ESTALE; - if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) - goto error; + fsreq = afs_alloc_read(GFP_NOFS); + if (!fsreq) + return netfs_subreq_terminated(subreq, -ENOMEM, false); - req = kzalloc(sizeof(struct afs_read), GFP_KERNEL); - if (!req) - goto enomem; - - refcount_set(&req->usage, 1); - req->vnode = vnode; - req->key = key_get(key); - req->pos = (loff_t)page->index << PAGE_SHIFT; - req->len = thp_size(page); - req->nr_pages = thp_nr_pages(page); - req->done = afs_file_read_done; - req->cleanup = afs_file_read_cleanup; - - get_page(page); - iov_iter_xarray(&req->def_iter, READ, &page->mapping->i_pages, - req->pos, req->len); - req->iter = &req->def_iter; - - ret = afs_fetch_data(vnode, req); - if (ret < 0) - goto fetch_error; + fsreq->subreq = subreq; + fsreq->pos = subreq->start + subreq->transferred; + fsreq->len = subreq->len - subreq->transferred; + fsreq->key = subreq->rreq->netfs_priv; + fsreq->vnode = vnode; + fsreq->iter = &fsreq->def_iter; - afs_put_read(req); - _leave(" = 0"); - return 0; + iov_iter_xarray(&fsreq->def_iter, READ, + &fsreq->vnode->vfs_inode.i_mapping->i_pages, + fsreq->pos, fsreq->len); -fetch_error: - switch (ret) { - case -EINTR: - case -ENOMEM: - case -ERESTARTSYS: - case -EAGAIN: - afs_put_read(req); - goto error; - case -ENOENT: - _debug("got NOENT from server - marking file deleted and stale"); - set_bit(AFS_VNODE_DELETED, &vnode->flags); - ret = -ESTALE; - /* Fall through */ - default: - page_endio(page, false, ret); - afs_put_read(req); - _leave(" = %d", ret); - return ret; - } - -enomem: - ret = -ENOMEM; -error: - unlock_page(page); - _leave(" = %d", ret); - return ret; + afs_fetch_data(fsreq->vnode, fsreq); } -/* - * read page from file, directory or symlink, given a file to nominate the key - * to be used - */ -static int afs_readpage(struct file *file, struct page *page) +static int afs_symlink_readpage(struct page *page) { - struct key *key; + struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); + struct afs_read *fsreq; int ret; - if (file) { - key = afs_file_key(file); - ASSERT(key != NULL); - ret = afs_page_filler(key, page); - } else { - struct inode *inode = page->mapping->host; - key = afs_request_key(AFS_FS_S(inode->i_sb)->cell); - if (IS_ERR(key)) { - ret = PTR_ERR(key); - } else { - ret = afs_page_filler(key, page); - key_put(key); - } - } - return ret; -} - -/* - * Read a contiguous set of pages. - */ -static int afs_readpages_one(struct file *file, struct address_space *mapping, - struct list_head *pages) -{ - struct afs_vnode *vnode = AFS_FS_I(mapping->host); - struct afs_read *req; - struct list_head *p; - struct page *first, *page; - pgoff_t index; - int ret, n; - - /* Count the number of contiguous pages at the front of the list. Note - * that the list goes prev-wards rather than next-wards. - */ - first = lru_to_page(pages); - index = first->index + 1; - n = 1; - for (p = first->lru.prev; p != pages; p = p->prev) { - page = list_entry(p, struct page, lru); - if (page->index != index) - break; - index++; - n++; - } - - req = kzalloc(sizeof(struct afs_read), GFP_NOFS); - if (!req) + fsreq = afs_alloc_read(GFP_NOFS); + if (!fsreq) return -ENOMEM; - refcount_set(&req->usage, 1); - req->vnode = vnode; - req->key = key_get(afs_file_key(file)); - req->done = afs_file_read_done; - req->cleanup = afs_file_read_cleanup; - req->pos = first->index; - req->pos <<= PAGE_SHIFT; - - /* Add pages to the LRU until it fails. We keep the pages ref'd and - * locked until the read is complete. - * - * Note that it's possible for the file size to change whilst we're - * doing this, but we rely on the server returning less than we asked - * for if the file shrank. We also rely on this to deal with a partial - * page at the end of the file. - */ - do { - page = lru_to_page(pages); - list_del(&page->lru); - index = page->index; - if (add_to_page_cache_lru(page, mapping, index, - readahead_gfp_mask(mapping))) { - put_page(page); - break; - } - - req->nr_pages++; - } while (req->nr_pages < n); - - if (req->nr_pages == 0) { - afs_put_read(req); - return 0; - } - - req->len = req->nr_pages * PAGE_SIZE; - iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, - req->pos, req->len); - req->iter = &req->def_iter; + fsreq->pos = page->index * PAGE_SIZE; + fsreq->len = PAGE_SIZE; + fsreq->vnode = vnode; + fsreq->iter = &fsreq->def_iter; + iov_iter_xarray(&fsreq->def_iter, READ, &page->mapping->i_pages, + fsreq->pos, fsreq->len); - ret = afs_fetch_data(vnode, req); - if (ret < 0) - goto error; + ret = afs_fetch_data(fsreq->vnode, fsreq); + page_endio(page, false, ret); + return ret; +} - afs_put_read(req); - return 0; +static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file) +{ + rreq->netfs_priv = key_get(afs_file_key(file)); +} -error: - if (ret == -ENOENT) { - _debug("got NOENT from server - marking file deleted and stale"); - set_bit(AFS_VNODE_DELETED, &vnode->flags); - ret = -ESTALE; - } +static int afs_begin_cache_operation(struct netfs_read_request *rreq) +{ + struct afs_vnode *vnode = AFS_FS_I(rreq->inode); - afs_put_read(req); - return ret; + return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode)); } -/* - * read a set of pages - */ -static int afs_readpages(struct file *file, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages) +static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) { - struct key *key = afs_file_key(file); - struct afs_vnode *vnode; - int ret = 0; - - _enter("{%d},{%lu},,%d", - key_serial(key), mapping->host->i_ino, nr_pages); + key_put(netfs_priv); +} - ASSERT(key != NULL); +static const struct netfs_read_request_ops afs_req_ops = { + .init_rreq = afs_init_rreq, + .begin_cache_operation = afs_begin_cache_operation, + .issue_op = afs_req_issue_op, + .cleanup = afs_priv_cleanup, +}; - vnode = AFS_FS_I(mapping->host); - if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) { - _leave(" = -ESTALE"); - return -ESTALE; - } +static int afs_readpage(struct file *file, struct page *page) +{ + if (!file) + return afs_symlink_readpage(page); - /* attempt to read as many of the pages as possible */ - while (!list_empty(pages)) { - ret = afs_readpages_one(file, mapping, pages); - if (ret < 0) - break; - } + return netfs_readpage(file, page, &afs_req_ops, NULL); +} - _leave(" = %d [netting]", ret); - return ret; +static void afs_readahead(struct readahead_control *ractl) +{ + netfs_readahead(ractl, &afs_req_ops, NULL); } /* diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 5e34f4dbd385..2f695a260442 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "internal.h" #include "afs_fs.h" #include "xdr_fs.h" diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 62c1b38fa98b..96b33d2e3116 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -14,6 +14,7 @@ #include #include #include +#define FSCACHE_USE_NEW_IO_API #include #include #include @@ -207,6 +208,7 @@ struct afs_read { loff_t file_size; /* File size returned by server */ struct key *key; /* The key to use to reissue the read */ struct afs_vnode *vnode; /* The file being read into. */ + struct netfs_read_subrequest *subreq; /* Fscache helper read request this belongs to */ afs_dataversion_t data_version; /* Version number returned by server */ refcount_t usage; unsigned int call_debug_id; @@ -1049,6 +1051,7 @@ extern void afs_put_wb_key(struct afs_wb_key *); extern int afs_open(struct inode *, struct file *); extern int afs_release(struct inode *, struct file *); extern int afs_fetch_data(struct afs_vnode *, struct afs_read *); +extern struct afs_read *afs_alloc_read(gfp_t); extern void afs_put_read(struct afs_read *); static inline struct afs_read *afs_get_read(struct afs_read *req) From patchwork Wed Mar 10 17:00:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 397619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D52F0C433E0 for ; Wed, 10 Mar 2021 17:01:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 914C364E59 for ; Wed, 10 Mar 2021 17:01:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233799AbhCJRBQ (ORCPT ); Wed, 10 Mar 2021 12:01:16 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:22905 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233783AbhCJRAx (ORCPT ); Wed, 10 Mar 2021 12:00:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395652; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/QrxkL5yU3bNKMVvjsOwNL+2HswH3DuEf3MKoAjRTHw=; b=F6eNcDndFWxpQy87y5L3ZqbLeaCgvFYDrHPQnJb3NT2WS5AuwHDE+YXibs0LnYnq1Htuq7 U6nudxWgtCdZMXr1C8XhOO5YAmfL9kOgP67/dBgb7GJb+I57cvlAtBqzSBrYyOQfcvdLX9 wIJYBVD7DRYjSaelc6mlBsqwyRNdbXw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-54-qyyWa-4pPuSOIVR5bFKa8A-1; Wed, 10 Mar 2021 12:00:48 -0500 X-MC-Unique: qyyWa-4pPuSOIVR5bFKa8A-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CEAD757; Wed, 10 Mar 2021 17:00:45 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id ADBA810016FD; Wed, 10 Mar 2021 17:00:33 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 28/28] afs: Use the fscache_write_begin() helper From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 17:00:32 +0000 Message-ID: <161539563244.286939.16537296241609909980.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Make AFS use the new fscache_write_begin() helper to do the pre-reading required before the write. If successful, the helper returns with the required page filled in and locked. It may read more than just one page, expanding the read to meet cache granularity requirements as necessary. Note: A more advanced version of this could be made that does generic_perform_write() for a whole cache granule. This would make it easier to avoid doing the download/read for the data to be overwritten. Signed-off-by: David Howells cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588546422.3465195.1546354372589291098.stgit@warthog.procyon.org.uk/ # rfc --- fs/afs/file.c | 19 +++++++++ fs/afs/internal.h | 1 fs/afs/write.c | 108 ++++++----------------------------------------------- 3 files changed, 31 insertions(+), 97 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index 99bb4649a306..cf2b664a68a5 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -334,6 +334,13 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file) rreq->netfs_priv = key_get(afs_file_key(file)); } +static bool afs_is_cache_enabled(struct inode *inode) +{ + struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode)); + + return fscache_cookie_enabled(cookie) && !hlist_empty(&cookie->backing_objects); +} + static int afs_begin_cache_operation(struct netfs_read_request *rreq) { struct afs_vnode *vnode = AFS_FS_I(rreq->inode); @@ -341,14 +348,24 @@ static int afs_begin_cache_operation(struct netfs_read_request *rreq) return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode)); } +static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len, + struct page *page, void **_fsdata) +{ + struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); + + return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0; +} + static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) { key_put(netfs_priv); } -static const struct netfs_read_request_ops afs_req_ops = { +const struct netfs_read_request_ops afs_req_ops = { .init_rreq = afs_init_rreq, + .is_cache_enabled = afs_is_cache_enabled, .begin_cache_operation = afs_begin_cache_operation, + .check_write_begin = afs_check_write_begin, .issue_op = afs_req_issue_op, .cleanup = afs_priv_cleanup, }; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 96b33d2e3116..9f4040724318 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1045,6 +1045,7 @@ extern void afs_dynroot_depopulate(struct super_block *); extern const struct address_space_operations afs_fs_aops; extern const struct inode_operations afs_file_inode_operations; extern const struct file_operations afs_file_operations; +extern const struct netfs_read_request_ops afs_req_ops; extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *); extern void afs_put_wb_key(struct afs_wb_key *); diff --git a/fs/afs/write.c b/fs/afs/write.c index e672833c99bc..b2e03de09c24 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include #include "internal.h" /* @@ -22,68 +24,6 @@ int afs_set_page_dirty(struct page *page) return __set_page_dirty_nobuffers(page); } -/* - * Handle completion of a read operation to fill a page. - */ -static void afs_fill_hole(struct afs_read *req) -{ - if (iov_iter_count(req->iter) > 0) - /* The read was short - clear the excess buffer. */ - iov_iter_zero(iov_iter_count(req->iter), req->iter); -} - -/* - * partly or wholly fill a page that's under preparation for writing - */ -static int afs_fill_page(struct file *file, - loff_t pos, unsigned int len, struct page *page) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct afs_read *req; - size_t p; - void *data; - int ret; - - _enter(",,%llu", (unsigned long long)pos); - - if (pos >= vnode->vfs_inode.i_size) { - p = pos & ~PAGE_MASK; - ASSERTCMP(p + len, <=, PAGE_SIZE); - data = kmap(page); - memset(data + p, 0, len); - kunmap(page); - return 0; - } - - req = kzalloc(sizeof(struct afs_read), GFP_KERNEL); - if (!req) - return -ENOMEM; - - refcount_set(&req->usage, 1); - req->vnode = vnode; - req->done = afs_fill_hole; - req->key = key_get(afs_file_key(file)); - req->pos = pos; - req->len = len; - req->nr_pages = 1; - req->iter = &req->def_iter; - iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, pos, len); - - ret = afs_fetch_data(vnode, req); - afs_put_read(req); - if (ret < 0) { - if (ret == -ENOENT) { - _debug("got NOENT from server" - " - marking file deleted and stale"); - set_bit(AFS_VNODE_DELETED, &vnode->flags); - ret = -ESTALE; - } - } - - _leave(" = %d", ret); - return ret; -} - /* * prepare to perform part of a write to a page */ @@ -102,24 +42,14 @@ int afs_write_begin(struct file *file, struct address_space *mapping, _enter("{%llx:%llu},%llx,%x", vnode->fid.vid, vnode->fid.vnode, pos, len); - page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags); - if (!page) - return -ENOMEM; - - if (!PageUptodate(page) && len != PAGE_SIZE) { - ret = afs_fill_page(file, pos & PAGE_MASK, PAGE_SIZE, page); - if (ret < 0) { - unlock_page(page); - put_page(page); - _leave(" = %d [prep]", ret); - return ret; - } - SetPageUptodate(page); - } - -#ifdef CONFIG_AFS_FSCACHE - wait_on_page_fscache(page); -#endif + /* Prefetch area to be written into the cache if we're caching this + * file. We need to do this before we get a lock on the page in case + * there's more than one writer competing for the same cache block. + */ + ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata, + &afs_req_ops, NULL); + if (ret < 0) + return ret; index = page->index; from = pos - index * PAGE_SIZE; @@ -184,7 +114,6 @@ int afs_write_end(struct file *file, struct address_space *mapping, unsigned int f, from = pos & (thp_size(page) - 1); unsigned int t, to = from + copied; loff_t i_size, maybe_i_size; - int ret = 0; _enter("{%llx:%llu},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index); @@ -203,19 +132,7 @@ int afs_write_end(struct file *file, struct address_space *mapping, write_sequnlock(&vnode->cb_lock); } - if (!PageUptodate(page)) { - if (copied < len) { - /* Try and load any missing data from the server. The - * unmarshalling routine will take care of clearing any - * bits that are beyond the EOF. - */ - ret = afs_fill_page(file, pos + copied, - len - copied, page); - if (ret < 0) - goto out; - } - SetPageUptodate(page); - } + ASSERT(PageUptodate(page)); if (PagePrivate(page)) { priv = page_private(page); @@ -236,12 +153,11 @@ int afs_write_end(struct file *file, struct address_space *mapping, if (set_page_dirty(page)) _debug("dirtied %lx", page->index); - ret = copied; out: unlock_page(page); put_page(page); - return ret; + return copied; } /*