Message ID | 163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk |
---|---|
Headers | show |
Series | fscache, cachefiles: Rewrite | expand |
On Thu, 2021-12-16 at 16:10 +0000, David Howells wrote: > Implement a very simple cookie state machine to handle lookup, > invalidation, withdrawal, relinquishment and, to be added later, commit on > LRU discard. > > Three cache methods are provided: ->lookup_cookie() to look up and, if > necessary, create a data storage object; ->withdraw_cookie() to free the > resources associated with that object and potentially delete it; and > ->prepare_to_write(), to do prepare for changes to the cached data to be > modified locally. > > Changes > ======= > ver #3: > - Fix a race between LRU discard and relinquishment whereby the former > would override the latter and thus the latter would never happen[1]. > > ver #2: > - Don't hold n_accesses elevated whilst cache is bound to a cookie, but > rather add a flag that prevents the state machine from being queued when > n_accesses reaches 0. > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: linux-cachefs@redhat.com > Link: https://lore.kernel.org/r/599331.1639410068@warthog.procyon.org.uk/ [1] > Link: https://lore.kernel.org/r/163819599657.215744.15799615296912341745.stgit@warthog.procyon.org.uk/ # v1 > Link: https://lore.kernel.org/r/163906903925.143852.1805855338154353867.stgit@warthog.procyon.org.uk/ # v2 > --- > > fs/fscache/cookie.c | 313 +++++++++++++++++++++++++++++++++++----- > include/linux/fscache-cache.h | 27 +++ > include/trace/events/fscache.h | 4 + > 3 files changed, 300 insertions(+), 44 deletions(-) > > diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c > index 04d2127bd354..336046de08ee 100644 > --- a/fs/fscache/cookie.c > +++ b/fs/fscache/cookie.c > @@ -15,7 +15,8 @@ > > struct kmem_cache *fscache_cookie_jar; > > -static void fscache_drop_cookie(struct fscache_cookie *cookie); > +static void fscache_cookie_worker(struct work_struct *work); > +static void fscache_unhash_cookie(struct fscache_cookie *cookie); > > #define fscache_cookie_hash_shift 15 > static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift]; > @@ -62,6 +63,19 @@ static void fscache_free_cookie(struct fscache_cookie *cookie) > kmem_cache_free(fscache_cookie_jar, cookie); > } > > +static void __fscache_queue_cookie(struct fscache_cookie *cookie) > +{ > + if (!queue_work(fscache_wq, &cookie->work)) > + fscache_put_cookie(cookie, fscache_cookie_put_over_queued); > +} > + > +static void fscache_queue_cookie(struct fscache_cookie *cookie, > + enum fscache_cookie_trace where) > +{ > + fscache_get_cookie(cookie, where); > + __fscache_queue_cookie(cookie); > +} > + > /* > * Initialise the access gate on a cookie by setting a flag to prevent the > * state machine from being queued when the access counter transitions to 0. > @@ -98,9 +112,8 @@ void fscache_end_cookie_access(struct fscache_cookie *cookie, > trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref), > n_accesses, why); > if (n_accesses == 0 && > - !test_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags)) { > - // PLACEHOLDER: Need to poke the state machine > - } > + !test_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags)) > + fscache_queue_cookie(cookie, fscache_cookie_get_end_access); > } > EXPORT_SYMBOL(fscache_end_cookie_access); > > @@ -171,35 +184,58 @@ static inline void wake_up_cookie_state(struct fscache_cookie *cookie) > wake_up_var(&cookie->state); > } > > +/* > + * Change the state a cookie is at and wake up anyone waiting for that. Impose > + * an ordering between the stuff stored in the cookie and the state member. > + * Paired with fscache_cookie_state(). > + */ > static void __fscache_set_cookie_state(struct fscache_cookie *cookie, > enum fscache_cookie_state state) > { > - cookie->state = state; > + smp_store_release(&cookie->state, state); > } > > -/* > - * Change the state a cookie is at and wake up anyone waiting for that - but > - * only if the cookie isn't already marked as being in a cleanup state. > - */ > -void fscache_set_cookie_state(struct fscache_cookie *cookie, > - enum fscache_cookie_state state) > +static void fscache_set_cookie_state(struct fscache_cookie *cookie, > + enum fscache_cookie_state state) > { > - bool changed = false; > - > spin_lock(&cookie->lock); > - switch (cookie->state) { > - case FSCACHE_COOKIE_STATE_RELINQUISHING: > - break; > - default: > - __fscache_set_cookie_state(cookie, state); > - changed = true; > - break; > - } > + __fscache_set_cookie_state(cookie, state); > spin_unlock(&cookie->lock); > - if (changed) > - wake_up_cookie_state(cookie); > + wake_up_cookie_state(cookie); > +} > + > +/** > + * fscache_cookie_lookup_negative - Note negative lookup > + * @cookie: The cookie that was being looked up > + * > + * Note that some part of the metadata path in the cache doesn't exist and so > + * we can release any waiting readers in the certain knowledge that there's > + * nothing for them to actually read. > + * > + * This function uses no locking and must only be called from the state machine. > + */ > +void fscache_cookie_lookup_negative(struct fscache_cookie *cookie) > +{ > + set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_CREATING); > +} > +EXPORT_SYMBOL(fscache_cookie_lookup_negative); > + > +/** > + * fscache_caching_failed - Report that a failure stopped caching on a cookie > + * @cookie: The cookie that was affected > + * > + * Tell fscache that caching on a cookie needs to be stopped due to some sort > + * of failure. > + * > + * This function uses no locking and must only be called from the state machine. > + */ > +void fscache_caching_failed(struct fscache_cookie *cookie) > +{ > + clear_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags); > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_FAILED); > } > -EXPORT_SYMBOL(fscache_set_cookie_state); > +EXPORT_SYMBOL(fscache_caching_failed); > > /* > * Set the index key in a cookie. The cookie struct has space for a 16-byte > @@ -291,10 +327,10 @@ static struct fscache_cookie *fscache_alloc_cookie( > > refcount_set(&cookie->ref, 1); > cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id); > - cookie->state = FSCACHE_COOKIE_STATE_QUIESCENT; > spin_lock_init(&cookie->lock); > INIT_LIST_HEAD(&cookie->commit_link); > - INIT_WORK(&cookie->work, NULL /* PLACEHOLDER */); > + INIT_WORK(&cookie->work, fscache_cookie_worker); > + __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); > > write_lock(&fscache_cookies_lock); > list_add_tail(&cookie->proc_link, &fscache_cookies); > @@ -417,6 +453,192 @@ struct fscache_cookie *__fscache_acquire_cookie( > } > EXPORT_SYMBOL(__fscache_acquire_cookie); > > +/* > + * Prepare a cache object to be written to. > + */ > +static void fscache_prepare_to_write(struct fscache_cookie *cookie) > +{ > + cookie->volume->cache->ops->prepare_to_write(cookie); > +} > + > +/* > + * Look up a cookie in the cache. > + */ > +static void fscache_perform_lookup(struct fscache_cookie *cookie) > +{ > + enum fscache_access_trace trace = fscache_access_lookup_cookie_end_failed; > + bool need_withdraw = false; > + > + _enter(""); > + > + if (!cookie->volume->cache_priv) { > + fscache_create_volume(cookie->volume, true); > + if (!cookie->volume->cache_priv) { > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); > + goto out; > + } > + } > + > + if (!cookie->volume->cache->ops->lookup_cookie(cookie)) { > + if (cookie->state != FSCACHE_COOKIE_STATE_FAILED) > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); > + need_withdraw = true; > + _leave(" [fail]"); > + goto out; > + } > + > + fscache_see_cookie(cookie, fscache_cookie_see_active); > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_ACTIVE); > + trace = fscache_access_lookup_cookie_end; > + > +out: > + fscache_end_cookie_access(cookie, trace); > + if (need_withdraw) > + fscache_withdraw_cookie(cookie); > + fscache_end_volume_access(cookie->volume, cookie, trace); > +} > + > +/* > + * Perform work upon the cookie, such as committing its cache state, > + * relinquishing it or withdrawing the backing cache. We're protected from the > + * cache going away under us as object withdrawal must come through this > + * non-reentrant work item. > + */ > +static void fscache_cookie_state_machine(struct fscache_cookie *cookie) > +{ > + enum fscache_cookie_state state; > + bool wake = false; > + > + _enter("c=%x", cookie->debug_id); > + > +again: > + spin_lock(&cookie->lock); > +again_locked: > + state = cookie->state; > + switch (state) { > + case FSCACHE_COOKIE_STATE_QUIESCENT: > + /* The QUIESCENT state is jumped to the LOOKING_UP state by > + * fscache_use_cookie(). > + */ > + > + if (atomic_read(&cookie->n_accesses) == 0 && > + test_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags)) { > + __fscache_set_cookie_state(cookie, > + FSCACHE_COOKIE_STATE_RELINQUISHING); > + wake = true; > + goto again_locked; > + } > + break; > + > + case FSCACHE_COOKIE_STATE_LOOKING_UP: > + spin_unlock(&cookie->lock); > + fscache_init_access_gate(cookie); > + fscache_perform_lookup(cookie); > + goto again; > + > + case FSCACHE_COOKIE_STATE_ACTIVE: > + if (test_and_clear_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags)) { > + spin_unlock(&cookie->lock); > + fscache_prepare_to_write(cookie); > + spin_lock(&cookie->lock); > + } > + fallthrough; > + > + case FSCACHE_COOKIE_STATE_FAILED: > + if (atomic_read(&cookie->n_accesses) != 0) > + break; > + if (test_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags)) { > + __fscache_set_cookie_state(cookie, > + FSCACHE_COOKIE_STATE_RELINQUISHING); > + wake = true; > + goto again_locked; > + } > + if (test_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags)) { > + __fscache_set_cookie_state(cookie, > + FSCACHE_COOKIE_STATE_WITHDRAWING); > + wake = true; > + goto again_locked; > + } > + break; > + > + case FSCACHE_COOKIE_STATE_RELINQUISHING: > + case FSCACHE_COOKIE_STATE_WITHDRAWING: > + if (cookie->cache_priv) { > + spin_unlock(&cookie->lock); > + cookie->volume->cache->ops->withdraw_cookie(cookie); > + spin_lock(&cookie->lock); > + } > + > + switch (state) { > + case FSCACHE_COOKIE_STATE_RELINQUISHING: > + fscache_see_cookie(cookie, fscache_cookie_see_relinquish); > + fscache_unhash_cookie(cookie); > + __fscache_set_cookie_state(cookie, > + FSCACHE_COOKIE_STATE_DROPPED); > + wake = true; > + goto out; > + case FSCACHE_COOKIE_STATE_WITHDRAWING: > + fscache_see_cookie(cookie, fscache_cookie_see_withdraw); > + break; > + default: > + BUG(); > + } > + Ugh, the nested switch here is a bit hard to follow. It makes it seem like the state could change due to the withdraw_cookie and you're checking it again, but it doesn't do that. This would be clearer if you just duplicated the withdraw_cookie stanza for both states and moved the stuff below here to a helper or to a new goto block. > + clear_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags); > + clear_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags); > + clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags); > + clear_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags); > + set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); > + __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT); > + wake = true; > + goto again_locked; > + > + case FSCACHE_COOKIE_STATE_DROPPED: > + break; > + > + default: > + WARN_ONCE(1, "Cookie %x in unexpected state %u\n", > + cookie->debug_id, state); > + break; > + } > + > +out: > + spin_unlock(&cookie->lock); > + if (wake) > + wake_up_cookie_state(cookie); > + _leave(""); > +} > + > +static void fscache_cookie_worker(struct work_struct *work) > +{ > + struct fscache_cookie *cookie = container_of(work, struct fscache_cookie, work); > + > + fscache_see_cookie(cookie, fscache_cookie_see_work); > + fscache_cookie_state_machine(cookie); > + fscache_put_cookie(cookie, fscache_cookie_put_work); > +} > + > +/* > + * Wait for the object to become inactive. The cookie's work item will be > + * scheduled when someone transitions n_accesses to 0 - but if someone's > + * already done that, schedule it anyway. > + */ > +static void __fscache_withdraw_cookie(struct fscache_cookie *cookie) > +{ > + int n_accesses; > + bool unpinned; > + > + unpinned = test_and_clear_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags); > + > + /* Need to read the access count after unpinning */ > + n_accesses = atomic_read(&cookie->n_accesses); > + if (unpinned) > + trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref), > + n_accesses, fscache_access_cache_unpin); > + if (n_accesses == 0) > + fscache_queue_cookie(cookie, fscache_cookie_get_end_access); > +} > + > /* > * Remove a cookie from the hash table. > */ > @@ -432,21 +654,27 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie) > hlist_bl_del(&cookie->hash_link); > clear_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags); > hlist_bl_unlock(h); > + fscache_stat(&fscache_n_relinquishes_dropped); > } > > -/* > - * Finalise a cookie after all its resources have been disposed of. > - */ > -static void fscache_drop_cookie(struct fscache_cookie *cookie) > +static void fscache_drop_withdraw_cookie(struct fscache_cookie *cookie) > { > - spin_lock(&cookie->lock); > - __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_DROPPED); > - spin_unlock(&cookie->lock); > - wake_up_cookie_state(cookie); > + __fscache_withdraw_cookie(cookie); > +} > > - fscache_unhash_cookie(cookie); > - fscache_stat(&fscache_n_relinquishes_dropped); > +/** > + * fscache_withdraw_cookie - Mark a cookie for withdrawal > + * @cookie: The cookie to be withdrawn. > + * > + * Allow the cache backend to withdraw the backing for a cookie for its own > + * reasons, even if that cookie is in active use. > + */ > +void fscache_withdraw_cookie(struct fscache_cookie *cookie) > +{ > + set_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags); > + fscache_drop_withdraw_cookie(cookie); > } > +EXPORT_SYMBOL(fscache_withdraw_cookie); > > /* > * Allow the netfs to release a cookie back to the cache. > @@ -473,12 +701,13 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire) > ASSERTCMP(atomic_read(&cookie->volume->n_cookies), >, 0); > atomic_dec(&cookie->volume->n_cookies); > > - set_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags); > - > - if (test_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags)) > - ; // PLACEHOLDER: Do something here if the cookie was cached > - else > - fscache_drop_cookie(cookie); > + if (test_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags)) { > + set_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags); > + fscache_drop_withdraw_cookie(cookie); > + } else { > + fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_DROPPED); > + fscache_unhash_cookie(cookie); > + } > fscache_put_cookie(cookie, fscache_cookie_put_relinquish); > } > EXPORT_SYMBOL(__fscache_relinquish_cookie); > diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h > index 936ef731bbc7..ae6a75976450 100644 > --- a/include/linux/fscache-cache.h > +++ b/include/linux/fscache-cache.h > @@ -57,6 +57,15 @@ struct fscache_cache_ops { > > /* Free the cache's data attached to a volume */ > void (*free_volume)(struct fscache_volume *volume); > + > + /* Look up a cookie in the cache */ > + bool (*lookup_cookie)(struct fscache_cookie *cookie); > + > + /* Withdraw an object without any cookie access counts held */ > + void (*withdraw_cookie)(struct fscache_cookie *cookie); > + > + /* Prepare to write to a live cache object */ > + void (*prepare_to_write)(struct fscache_cookie *cookie); > }; > > extern struct workqueue_struct *fscache_wq; > @@ -72,6 +81,7 @@ extern int fscache_add_cache(struct fscache_cache *cache, > void *cache_priv); > extern void fscache_withdraw_cache(struct fscache_cache *cache); > extern void fscache_withdraw_volume(struct fscache_volume *volume); > +extern void fscache_withdraw_cookie(struct fscache_cookie *cookie); > > extern void fscache_io_error(struct fscache_cache *cache); > > @@ -85,8 +95,21 @@ extern void fscache_put_cookie(struct fscache_cookie *cookie, > enum fscache_cookie_trace where); > extern void fscache_end_cookie_access(struct fscache_cookie *cookie, > enum fscache_access_trace why); > -extern void fscache_set_cookie_state(struct fscache_cookie *cookie, > - enum fscache_cookie_state state); > +extern void fscache_cookie_lookup_negative(struct fscache_cookie *cookie); > +extern void fscache_caching_failed(struct fscache_cookie *cookie); > + > +/** > + * fscache_cookie_state - Read the state of a cookie > + * @cookie: The cookie to query > + * > + * Get the state of a cookie, imposing an ordering between the cookie contents > + * and the state value. Paired with fscache_set_cookie_state(). > + */ > +static inline > +enum fscache_cookie_state fscache_cookie_state(struct fscache_cookie *cookie) > +{ > + return smp_load_acquire(&cookie->state); > +} > > /** > * fscache_get_key - Get a pointer to the cookie key > diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h > index 1d576bd8112e..030c97bb9c8b 100644 > --- a/include/trace/events/fscache.h > +++ b/include/trace/events/fscache.h > @@ -68,6 +68,8 @@ enum fscache_access_trace { > fscache_access_acquire_volume_end, > fscache_access_cache_pin, > fscache_access_cache_unpin, > + fscache_access_lookup_cookie_end, > + fscache_access_lookup_cookie_end_failed, > fscache_access_relinquish_volume, > fscache_access_relinquish_volume_end, > fscache_access_unlive, > @@ -124,6 +126,8 @@ enum fscache_access_trace { > EM(fscache_access_acquire_volume_end, "END acq_vol") \ > EM(fscache_access_cache_pin, "PIN cache ") \ > EM(fscache_access_cache_unpin, "UNPIN cache ") \ > + EM(fscache_access_lookup_cookie_end, "END lookup ") \ > + EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \ > EM(fscache_access_relinquish_volume, "BEGIN rlq_vol") \ > EM(fscache_access_relinquish_volume_end,"END rlq_vol") \ > E_(fscache_access_unlive, "END unlive ") > >
Jeff Layton <jlayton@kernel.org> wrote: > > + case FSCACHE_COOKIE_STATE_RELINQUISHING: > > + case FSCACHE_COOKIE_STATE_WITHDRAWING: > > + if (cookie->cache_priv) { > > + spin_unlock(&cookie->lock); > > + cookie->volume->cache->ops->withdraw_cookie(cookie); > > + spin_lock(&cookie->lock); > > + } > > + > > + switch (state) { > > + case FSCACHE_COOKIE_STATE_RELINQUISHING: > > + fscache_see_cookie(cookie, fscache_cookie_see_relinquish); > > + fscache_unhash_cookie(cookie); > > + __fscache_set_cookie_state(cookie, > > + FSCACHE_COOKIE_STATE_DROPPED); > > + wake = true; > > + goto out; > > + case FSCACHE_COOKIE_STATE_WITHDRAWING: > > + fscache_see_cookie(cookie, fscache_cookie_see_withdraw); > > + break; > > + default: > > + BUG(); > > + } > > + > > Ugh, the nested switch here is a bit hard to follow. It makes it seem > like the state could change due to the withdraw_cookie and you're > checking it again, but it doesn't do that. > > This would be clearer if you just duplicated the withdraw_cookie stanza > for both states and moved the stuff below here to a helper or to a new > goto block. There are actually three states, but one's added in a later patch. It probably does make sense to split the RELINQ state from the other two. David
On Thu, 2021-12-16 at 16:05 +0000, David Howells wrote: > Here's a set of patches implements a rewrite of the fscache driver and a > matching rewrite of the cachefiles driver, significantly simplifying the > code compared to what's upstream, removing the complex operation scheduling > and object state machine in favour of something much smaller and simpler. > > The patchset is structured such that the first few patches disable fscache > use by the network filesystems using it, remove the cachefiles driver > entirely and as much of the fscache driver as can be got away with without > causing build failures in the network filesystems. The patches after that > recreate fscache and then cachefiles, attempting to add the pieces in a > logical order. Finally, the filesystems are reenabled and then the very > last patch changes the documentation. > > > WHY REWRITE? > ============ > > Fscache's operation scheduling API was intended to handle sequencing of > cache operations, which were all required (where possible) to run > asynchronously in parallel with the operations being done by the network > filesystem, whilst allowing the cache to be brought online and offline and > to interrupt service for invalidation. > > With the advent of the tmpfile capacity in the VFS, however, an opportunity > arises to do invalidation much more simply, without having to wait for I/O > that's actually in progress: Cachefiles can simply create a tmpfile, cut > over the file pointer for the backing object attached to a cookie and > abandon the in-progress I/O, dismissing it upon completion. > > Future work here would involve using Omar Sandoval's vfs_link() with > AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard > link from a tmpfile as currently I have to unlink the old file first. > > These patches can also simplify the object state handling as I/O operations > to the cache don't all have to be brought to a stop in order to invalidate > a file. To that end, and with an eye on to writing a new backing cache > model in the future, I've taken the opportunity to simplify the indexing > structure. > > I've separated the index cookie concept from the file cookie concept by C > type now. The former is now called a "volume cookie" (struct > fscache_volume) and there is a container of file cookies. There are then > just the two levels. All the index cookie levels are collapsed into a > single volume cookie, and this has a single printable string as a key. For > instance, an AFS volume would have a key of something like > "afs,example.com,1000555", combining the filesystem name, cell name and > volume ID. This is freeform, but must not have '/' chars in it. > > I've also eliminated all pointers back from fscache into the network > filesystem. This required the duplication of a little bit of data in the > cookie (cookie key, coherency data and file size), but it's not actually > that much. This gets rid of problems with making sure we keep netfs data > structures around so that the cache can access them. > > These patches mean that most of the code that was in the drivers before is > simply gone and those drivers are now almost entirely new code. That being > the case, there doesn't seem any particular reason to try and maintain > bisectability across it. Further, there has to be a point in the middle > where things are cut over as there's a single point everything has to go > through (ie. /dev/cachefiles) and it can't be in use by two drivers at > once. > > > ISSUES YET OUTSTANDING > ====================== > > There are some issues still outstanding, unaddressed by this patchset, that > will need fixing in future patchsets, but that don't stop this series from > being usable: > > (1) The cachefiles driver needs to stop using the backing filesystem's > metadata to store information about what parts of the cache are > populated. This is not reliable with modern extent-based filesystems. > > Fixing this is deferred to a separate patchset as it involves > negotiation with the network filesystem and the VM as to how much data > to download to fulfil a read - which brings me on to (2)... > > (2) NFS and CIFS do not take account of how the cache would like I/O to be > structured to meet its granularity requirements. Previously, the > cache used page granularity, which was fine as the network filesystems > also dealt in page granularity, and the backing filesystem (ext4, xfs > or whatever) did whatever it did out of sight. However, we now have > folios to deal with and the cache will now have to store its own > metadata to track its contents. > > The change I'm looking at making for cachefiles is to store content > bitmaps in one or more xattrs and making a bit in the map correspond > to something like a 256KiB block. However, the size of an xattr and > the fact that they have to be read/updated in one go means that I'm > looking at covering 1GiB of data per 512-byte map and storing each map > in an xattr. Cachefiles has the potential to grow into a fully > fledged filesystem of its very own if I'm not careful. > > However, I'm also looking at changing things even more radically and > going to a different model of how the cache is arranged and managed - > one that's more akin to the way, say, openafs does things - which > brings me on to (3)... > > (3) The way cachefilesd does culling is very inefficient for large caches > and it would be better to move it into the kernel if I can as > cachefilesd has to keep asking the kernel if it can cull a file. > Changing the way the backend works would allow this to be addressed. > > > BITS THAT MAY BE CONTROVERSIAL > ============================== > > There are some bits I've added that may be controversial: > > (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if > a files is already being used by some other kernel service (e.g. a > duplicate cachefiles cache in the same directory) and reject it if it > is. This isn't entirely necessary, but it helps prevent accidental > data corruption. > > I don't want to use S_SWAPFILE as that has other effects, but quite > possibly swapon() should set S_KERNEL_FILE too. > > Note that it doesn't prevent userspace from interfering, though > perhaps it should. (I have made it prevent a marked directory from > being rmdir-able). > > (2) Cachefiles wants to keep the backing file for a cookie open whilst we > might need to write to it from network filesystem writeback. The > problem is that the network filesystem unuses its cookie when its file > is closed, and so we have nothing pinning the cachefiles file open and > it will get closed automatically after a short time to avoid > EMFILE/ENFILE problems. > > Reopening the cache file, however, is a problem if this is being done > due to writeback triggered by exit(). Some filesystems will oops if > we try to open a file in that context because they want to access > current->fs or suchlike. > > To get around this, I added the following: > > (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network > filesystem inode to indicate that we have a usage count on the > cookie caching that inode. > > (B) A flag in struct writeback_control, unpinned_fscache_wb, that is > set when __writeback_single_inode() clears the last dirty page > from i_pages - at which point it clears I_PINNING_FSCACHE_WB and > sets this flag. > > This has to be done here so that clearing I_PINNING_FSCACHE_WB can > be done atomically with the check of PAGECACHE_TAG_DIRTY that > clears I_DIRTY_PAGES. > > (C) A function, fscache_set_page_dirty(), which if it is not set, sets > I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the > cache resources. > > (D) A function, fscache_unpin_writeback(), to be called by > ->write_inode() to unuse the cookie. > > (E) A function, fscache_clear_inode_writeback(), to be called when the > inode is evicted, before clear_inode() is called. This cleans up > any lingering I_PINNING_FSCACHE_WB. > > The network filesystem can then use these tools to make sure that > fscache_write_to_cache() can write locally modified data to the cache > as well as to the server. > > For the future, I'm working on write helpers for netfs lib that should > allow this facility to be removed by keeping track of the dirty > regions separately - but that's incomplete at the moment and is also > going to be affected by folios, one way or another, since it deals > with pages. > > > These patches can be found also on: > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite > > David > > > Changes > ======= > ver #3: > - Fixed a race in the cookie state machine between LRU discard and > relinquishment[4]. > - Fixed up the hashing to make it portable[5]. > - Fixed up some netfs coherency data to make it portable. > - Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6]. > - Added a patch to store volume coherency data in an xattr. > - Added a check that the cookie is unhashed before being freed. > - Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/. > > ver #2: > - Fix an unused-var warning due to CONFIG_9P_FSCACHE=n. > - Use gfpflags_allow_blocking() rather than using flag directly. > - Fixed some error logging in a couple of cachefiles functions. > - Fixed an error check in the fscache volume allocation. > - Need to unmark an inode we've moved to the graveyard before unlocking. > - Upgraded to -rc4 to allow for upstream changes to cifs. > - Should only change to inval state if can get access to cache. > - Don't hold n_accesses elevated whilst cache is bound to a cookie, but > rather add a flag that prevents the state machine from being queued when > n_accesses reaches 0. > - Remove the unused cookie pointer field from the fscache_acquire > tracepoint. > - Added missing transition to LRU_DISCARDING state. > - Added two ceph patches from Jeff Layton[2]. > - Remove NFS_INO_FSCACHE as it's no longer used. > - In NFS, need to unuse a cookie on file-release, not inode-clear. > - Filled in the NFS cache I/O routines, borrowing from the previously posted > fallback I/O code[3]. > > > Link: https://lore.kernel.org/r/cover.1580251857.git.osandov@fb.com/ [1] > Link: https://lore.kernel.org/r/20211207134451.66296-1-jlayton@kernel.org/ [2] > Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3] > Link: https://lore.kernel.org/r/599331.1639410068@warthog.procyon.org.uk/ [4] > Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5] > Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%lkp@intel.com/ [6] > > References > ========== > > These patches have been published for review before, firstly as part of a > larger set: > > Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/ > > Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/ > > Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/ > > Then as a cut-down set: > > Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1 > Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2 > Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3 > Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4 > Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5 > > I split out a set to just restructure the I/O, which got merged back in to > this one: > > Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2 > Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3 > Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4 > > ... and a larger set to do the conversion, also merged back into this one: > > Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1 > Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2 > > Older versions of this one: > > Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1 > Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2 > > Proposals/information about the design have been published here: > > Link: https://lore.kernel.org/r/24942.1573667720@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/2758811.1610621106@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/1441311.1598547738@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/160655.1611012999@warthog.procyon.org.uk/ > > And requests for information: > > Link: https://lore.kernel.org/r/3326.1579019665@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/4467.1579020509@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ > > I've posted partial patches to try and help 9p and cifs along: > > Link: https://lore.kernel.org/r/1514086.1605697347@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/1794123.1605713481@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/241017.1612263863@warthog.procyon.org.uk/ > Link: https://lore.kernel.org/r/270998.1612265397@warthog.procyon.org.uk/ > > --- > Dave Wysochanski (1): > nfs: Convert to new fscache volume/cookie API > > David Howells (65): > fscache, cachefiles: Disable configuration > cachefiles: Delete the cachefiles driver pending rewrite > fscache: Remove the contents of the fscache driver, pending rewrite > netfs: Display the netfs inode number in the netfs_read tracepoint > netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space > fscache: Introduce new driver > fscache: Implement a hash function > fscache: Implement cache registration > fscache: Implement volume registration > fscache: Implement cookie registration > fscache: Implement cache-level access helpers > fscache: Implement volume-level access helpers > fscache: Implement cookie-level access helpers > fscache: Implement functions add/remove a cache > fscache: Provide and use cache methods to lookup/create/free a volume > fscache: Add a function for a cache backend to note an I/O error > fscache: Implement simple cookie state machine > fscache: Implement cookie user counting and resource pinning > fscache: Implement cookie invalidation > fscache: Provide a means to begin an operation > fscache: Count data storage objects in a cache > fscache: Provide read/write stat counters for the cache > fscache: Provide a function to let the netfs update its coherency data > netfs: Pass more information on how to deal with a hole in the cache > fscache: Implement raw I/O interface > fscache: Implement higher-level write I/O interface > vfs, fscache: Implement pinning of cache usage for writeback > fscache: Provide a function to note the release of a page > fscache: Provide a function to resize a cookie > cachefiles: Introduce rewritten driver > cachefiles: Define structs > cachefiles: Add some error injection support > cachefiles: Add a couple of tracepoints for logging errors > cachefiles: Add cache error reporting macro > cachefiles: Add security derivation > cachefiles: Register a miscdev and parse commands over it > cachefiles: Provide a function to check how much space there is > vfs, cachefiles: Mark a backing file in use with an inode flag > cachefiles: Implement a function to get/create a directory in the cache > cachefiles: Implement cache registration and withdrawal > cachefiles: Implement volume support > cachefiles: Add tracepoints for calls to the VFS > cachefiles: Implement object lifecycle funcs > cachefiles: Implement key to filename encoding > cachefiles: Implement metadata/coherency data storage in xattrs > cachefiles: Mark a backing file in use with an inode flag > cachefiles: Implement culling daemon commands > cachefiles: Implement backing file wrangling > cachefiles: Implement begin and end I/O operation > cachefiles: Implement cookie resize for truncate > cachefiles: Implement the I/O routines > fscache, cachefiles: Store the volume coherency data > cachefiles: Allow cachefiles to actually function > fscache, cachefiles: Display stats of no-space events > fscache, cachefiles: Display stat of culling events > afs: Handle len being extending over page end in write_begin/write_end > afs: Fix afs_write_end() to handle len > page size > afs: Convert afs to use the new fscache API > afs: Copy local writes to the cache when writing to the server > afs: Skip truncation on the server of data we haven't written yet > 9p: Use fscache indexing rewrite and reenable caching > 9p: Copy local writes to the cache when writing to the server > nfs: Implement cache I/O by accessing the cache directly > cifs: Support fscache indexing rewrite (untested) > fscache: Rewrite documentation > > Jeff Layton (2): > ceph: conversion to new fscache API > ceph: add fscache writeback support > > > .../filesystems/caching/backend-api.rst | 850 ++++------ > .../filesystems/caching/cachefiles.rst | 6 +- > Documentation/filesystems/caching/fscache.rst | 525 ++---- > Documentation/filesystems/caching/index.rst | 4 +- > .../filesystems/caching/netfs-api.rst | 1136 ++++--------- > Documentation/filesystems/caching/object.rst | 313 ---- > .../filesystems/caching/operations.rst | 210 --- > Documentation/filesystems/netfs_library.rst | 16 +- > fs/9p/Kconfig | 2 +- > fs/9p/cache.c | 195 +-- > fs/9p/cache.h | 25 +- > fs/9p/v9fs.c | 17 +- > fs/9p/v9fs.h | 13 +- > fs/9p/vfs_addr.c | 56 +- > fs/9p/vfs_dir.c | 13 + > fs/9p/vfs_file.c | 3 +- > fs/9p/vfs_inode.c | 26 +- > fs/9p/vfs_inode_dotl.c | 3 +- > fs/9p/vfs_super.c | 3 + > fs/afs/Kconfig | 2 +- > fs/afs/Makefile | 3 - > fs/afs/cache.c | 68 - > fs/afs/cell.c | 12 - > fs/afs/file.c | 37 +- > fs/afs/inode.c | 101 +- > fs/afs/internal.h | 37 +- > fs/afs/main.c | 14 - > fs/afs/super.c | 1 + > fs/afs/volume.c | 29 +- > fs/afs/write.c | 100 +- > fs/cachefiles/Kconfig | 7 + > fs/cachefiles/Makefile | 6 +- > fs/cachefiles/bind.c | 278 ---- > fs/cachefiles/cache.c | 378 +++++ > fs/cachefiles/daemon.c | 180 +-- > fs/cachefiles/error_inject.c | 46 + > fs/cachefiles/interface.c | 747 ++++----- > fs/cachefiles/internal.h | 270 ++-- > fs/cachefiles/io.c | 330 +++- > fs/cachefiles/key.c | 201 ++- > fs/cachefiles/main.c | 22 +- > fs/cachefiles/namei.c | 1223 ++++++-------- > fs/cachefiles/rdwr.c | 972 ----------- > fs/cachefiles/security.c | 2 +- > fs/cachefiles/volume.c | 139 ++ > fs/cachefiles/xattr.c | 425 +++-- > fs/ceph/Kconfig | 2 +- > fs/ceph/addr.c | 101 +- > fs/ceph/cache.c | 218 +-- > fs/ceph/cache.h | 97 +- > fs/ceph/caps.c | 3 +- > fs/ceph/file.c | 13 +- > fs/ceph/inode.c | 22 +- > fs/ceph/super.c | 10 +- > fs/ceph/super.h | 3 +- > fs/cifs/Kconfig | 2 +- > fs/cifs/Makefile | 2 +- > fs/cifs/cache.c | 105 -- > fs/cifs/cifsfs.c | 11 +- > fs/cifs/cifsglob.h | 5 +- > fs/cifs/connect.c | 12 - > fs/cifs/file.c | 64 +- > fs/cifs/fscache.c | 333 +--- > fs/cifs/fscache.h | 126 +- > fs/cifs/inode.c | 36 +- > fs/fs-writeback.c | 8 + > fs/fscache/Makefile | 6 +- > fs/fscache/cache.c | 618 +++---- > fs/fscache/cookie.c | 1433 +++++++++-------- > fs/fscache/fsdef.c | 98 -- > fs/fscache/internal.h | 317 +--- > fs/fscache/io.c | 376 ++++- > fs/fscache/main.c | 147 +- > fs/fscache/netfs.c | 74 - > fs/fscache/object.c | 1125 ------------- > fs/fscache/operation.c | 633 -------- > fs/fscache/page.c | 1242 -------------- > fs/fscache/proc.c | 47 +- > fs/fscache/stats.c | 293 +--- > fs/fscache/volume.c | 517 ++++++ > fs/namei.c | 3 +- > fs/netfs/read_helper.c | 10 +- > fs/nfs/Kconfig | 2 +- > fs/nfs/Makefile | 2 +- > fs/nfs/client.c | 4 - > fs/nfs/direct.c | 2 + > fs/nfs/file.c | 13 +- > fs/nfs/fscache-index.c | 140 -- > fs/nfs/fscache.c | 490 ++---- > fs/nfs/fscache.h | 179 +- > fs/nfs/inode.c | 11 +- > fs/nfs/nfstrace.h | 1 - > fs/nfs/read.c | 25 +- > fs/nfs/super.c | 28 +- > fs/nfs/write.c | 8 +- > include/linux/fs.h | 4 + > include/linux/fscache-cache.h | 614 ++----- > include/linux/fscache.h | 1022 +++++------- > include/linux/netfs.h | 15 +- > include/linux/nfs_fs.h | 1 - > include/linux/nfs_fs_sb.h | 9 +- > include/linux/writeback.h | 1 + > include/trace/events/cachefiles.h | 527 ++++-- > include/trace/events/fscache.h | 626 ++++--- > include/trace/events/netfs.h | 5 +- > 105 files changed, 7356 insertions(+), 13531 deletions(-) > delete mode 100644 Documentation/filesystems/caching/object.rst > delete mode 100644 Documentation/filesystems/caching/operations.rst > delete mode 100644 fs/afs/cache.c > delete mode 100644 fs/cachefiles/bind.c > create mode 100644 fs/cachefiles/cache.c > create mode 100644 fs/cachefiles/error_inject.c > delete mode 100644 fs/cachefiles/rdwr.c > create mode 100644 fs/cachefiles/volume.c > delete mode 100644 fs/cifs/cache.c > delete mode 100644 fs/fscache/fsdef.c > delete mode 100644 fs/fscache/netfs.c > delete mode 100644 fs/fscache/object.c > delete mode 100644 fs/fscache/operation.c > delete mode 100644 fs/fscache/page.c > create mode 100644 fs/fscache/volume.c > delete mode 100644 fs/nfs/fscache-index.c > > I went through patches #1-#27 and they all look good to me. I made few comments here and there but none of them are show-stoppers, IMO. You can add my Reviewed-by to 1-27 if you like. I'll look over the rest when I have some more time (may be a bit). Thanks!