Message ID | 20250401-work-freeze-v1-0-d000611d4ab0@kernel.org |
---|---|
Headers | show |
Series | power: wire-up filesystem freeze/thaw with suspend/resume | expand |
On Tue, Apr 01, 2025 at 02:32:48AM +0200, Christian Brauner wrote: > From: Luis Chamberlain <mcgrof@kernel.org> > > The kernel power management now supports allowing the VFS > to handle filesystem freezing freezes and thawing. Take advantage > of that and remove the kthread freezing. This is needed so that we > properly really stop IO in flight without races after userspace > has been frozen. Without this we rely on kthread freezing and > its semantics are loose and error prone. > > The filesystem therefore is in charge of properly dealing with > quiescing of the filesystem through its callbacks if it thinks > it knows better than how the VFS handles it. > ..... > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index 0fcb1828e598..ad8183db0780 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -636,7 +636,6 @@ xfsaild( > unsigned int noreclaim_flag; > > noreclaim_flag = memalloc_noreclaim_save(); > - set_freezable(); > > while (1) { > /* > @@ -695,8 +694,6 @@ xfsaild( > > __set_current_state(TASK_RUNNING); > > - try_to_freeze(); > - > tout = xfsaild_push(ailp); > } > So what about the TASK_FREEZABLE flag that is set in this code before sleeping? i.e. this code before we schedule(): if (tout && tout <= 20) set_current_state(TASK_KILLABLE|TASK_FREEZABLE); else set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); Shouldn't TASK_FREEZABLE go away, too? > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > index c5136ea9bb1d..1875b6551ab0 100644 > --- a/fs/xfs/xfs_zone_gc.c > +++ b/fs/xfs/xfs_zone_gc.c > @@ -993,7 +993,6 @@ xfs_zone_gc_handle_work( > } > > __set_current_state(TASK_RUNNING); > - try_to_freeze(); > > if (reset_list) > xfs_zone_gc_reset_zones(data, reset_list); > @@ -1041,7 +1040,6 @@ xfs_zoned_gcd( > unsigned int nofs_flag; > > nofs_flag = memalloc_nofs_save(); > - set_freezable(); > > for (;;) { > set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE); Same question here for this newly merged code, too... -Dave.
On Tue, Apr 01, 2025 at 12:11:04PM +1100, Dave Chinner wrote: > On Tue, Apr 01, 2025 at 02:32:48AM +0200, Christian Brauner wrote: > > From: Luis Chamberlain <mcgrof@kernel.org> > > > > The kernel power management now supports allowing the VFS > > to handle filesystem freezing freezes and thawing. Take advantage > > of that and remove the kthread freezing. This is needed so that we > > properly really stop IO in flight without races after userspace > > has been frozen. Without this we rely on kthread freezing and > > its semantics are loose and error prone. > > > > The filesystem therefore is in charge of properly dealing with > > quiescing of the filesystem through its callbacks if it thinks > > it knows better than how the VFS handles it. > > > ..... > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > index 0fcb1828e598..ad8183db0780 100644 > > --- a/fs/xfs/xfs_trans_ail.c > > +++ b/fs/xfs/xfs_trans_ail.c > > @@ -636,7 +636,6 @@ xfsaild( > > unsigned int noreclaim_flag; > > > > noreclaim_flag = memalloc_noreclaim_save(); > > - set_freezable(); > > > > while (1) { > > /* > > @@ -695,8 +694,6 @@ xfsaild( > > > > __set_current_state(TASK_RUNNING); > > > > - try_to_freeze(); > > - > > tout = xfsaild_push(ailp); > > } > > > > So what about the TASK_FREEZABLE flag that is set in this code > before sleeping? > > i.e. this code before we schedule(): > > if (tout && tout <= 20) > set_current_state(TASK_KILLABLE|TASK_FREEZABLE); > else > set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); > > Shouldn't TASK_FREEZABLE go away, too? Thanks for spotting! Yes, yesterday late at night I just took Luis patches as they are and had only gotten around to testing btrfs. The coccinelle scripts seemed to have missed those. I'll wait for comments and will do another pass and send out v2. > > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > > index c5136ea9bb1d..1875b6551ab0 100644 > > --- a/fs/xfs/xfs_zone_gc.c > > +++ b/fs/xfs/xfs_zone_gc.c > > @@ -993,7 +993,6 @@ xfs_zone_gc_handle_work( > > } > > > > __set_current_state(TASK_RUNNING); > > - try_to_freeze(); > > > > if (reset_list) > > xfs_zone_gc_reset_zones(data, reset_list); > > @@ -1041,7 +1040,6 @@ xfs_zoned_gcd( > > unsigned int nofs_flag; > > > > nofs_flag = memalloc_nofs_save(); > > - set_freezable(); > > > > for (;;) { > > set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE); > > Same question here for this newly merged code, too... I'm not sure if this is supposed to be a snipe or not but just in case this is a hidden question: This isn't merged. Per the cover letter this is in a work.* branch. Anything that is considered mergable is in vfs-6.16.* branches. But since we're pre -rc1 even those branches are not yet showing up in -next.
On Tue 01-04-25 02:32:46, Christian Brauner wrote: > From: Luis Chamberlain <mcgrof@kernel.org> > > The kernel power management now supports allowing the VFS > to handle filesystem freezing freezes and thawing. Take advantage > of that and remove the kthread freezing. This is needed so that we > properly really stop IO in flight without races after userspace > has been frozen. Without this we rely on kthread freezing and > its semantics are loose and error prone. > > The filesystem therefore is in charge of properly dealing with > quiescing of the filesystem through its callbacks if it thinks > it knows better than how the VFS handles it. > > The following Coccinelle rule was used as to remove the now superfluous > freezer calls: > > make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" COCCI=./fs-freeze-cleanup.cocci M=fs/ext4 > > virtual patch > > @ remove_set_freezable @ > expression time; > statement S, S2; > expression task, current; > @@ > > ( > - set_freezable(); > | > - if (try_to_freeze()) > - continue; > | > - try_to_freeze(); > | > - freezable_schedule(); > + schedule(); > | > - freezable_schedule_timeout(time); > + schedule_timeout(time); > | > - if (freezing(task)) { S } > | > - if (freezing(task)) { S } > - else > { S2 } > | > - freezing(current) > ) > > @ remove_wq_freezable @ > expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4; > identifier fs_wq_fn; > @@ > > ( > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_FREEZABLE, > + WQ_ARG2, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3, > + WQ_ARG2 | WQ_ARG3, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE, > + WQ_ARG2 | WQ_ARG3, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4, > + WQ_ARG2 | WQ_ARG3 | WQ_ARG4, > ...); > | > WQ_E = > - WQ_ARG1 | WQ_FREEZABLE > + WQ_ARG1 > | > WQ_E = > - WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3 > + WQ_ARG1 | WQ_ARG3 > | > fs_wq_fn( > - WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3 > + WQ_ARG2 | WQ_ARG3 > ) > | > fs_wq_fn( > - WQ_FREEZABLE | WQ_ARG2 > + WQ_ARG2 > ) > | > fs_wq_fn( > - WQ_FREEZABLE > + 0 > ) > ) > > @ add_auto_flag @ > expression E1; > identifier fs_type; > @@ > > struct file_system_type fs_type = { > .fs_flags = E1 > + | FS_AUTOFREEZE > , > }; > > Generated-by: Coccinelle SmPL > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > Link: https://lore.kernel.org/r/20250326112220.1988619-5-mcgrof@kernel.org > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > fs/ext4/mballoc.c | 2 +- > fs/ext4/super.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 0d523e9fb3d5..ae235ec5ff3a 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -6782,7 +6782,7 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb, > > static bool ext4_trim_interrupted(void) > { > - return fatal_signal_pending(current) || freezing(current); > + return fatal_signal_pending(current); > } This change should not happen. ext4_trim_interrupted() makes sure FITRIM ioctl doesn't cause hibernation failures and has nothing to do with kthread freezing... Otherwise the patch looks good. Honza
On Tue, Apr 01, 2025 at 11:16:18AM +0200, Jan Kara wrote: > On Tue 01-04-25 02:32:46, Christian Brauner wrote: > > From: Luis Chamberlain <mcgrof@kernel.org> > > > > The kernel power management now supports allowing the VFS > > to handle filesystem freezing freezes and thawing. Take advantage > > of that and remove the kthread freezing. This is needed so that we > > properly really stop IO in flight without races after userspace > > has been frozen. Without this we rely on kthread freezing and > > its semantics are loose and error prone. > > > > The filesystem therefore is in charge of properly dealing with > > quiescing of the filesystem through its callbacks if it thinks > > it knows better than how the VFS handles it. > > > > The following Coccinelle rule was used as to remove the now superfluous > > freezer calls: > > > > make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" COCCI=./fs-freeze-cleanup.cocci M=fs/ext4 > > > > virtual patch > > > > @ remove_set_freezable @ > > expression time; > > statement S, S2; > > expression task, current; > > @@ > > > > ( > > - set_freezable(); > > | > > - if (try_to_freeze()) > > - continue; > > | > > - try_to_freeze(); > > | > > - freezable_schedule(); > > + schedule(); > > | > > - freezable_schedule_timeout(time); > > + schedule_timeout(time); > > | > > - if (freezing(task)) { S } > > | > > - if (freezing(task)) { S } > > - else > > { S2 } > > | > > - freezing(current) > > ) > > > > @ remove_wq_freezable @ > > expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4; > > identifier fs_wq_fn; > > @@ > > > > ( > > WQ_E = alloc_workqueue(WQ_ARG1, > > - WQ_ARG2 | WQ_FREEZABLE, > > + WQ_ARG2, > > ...); > > | > > WQ_E = alloc_workqueue(WQ_ARG1, > > - WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3, > > + WQ_ARG2 | WQ_ARG3, > > ...); > > | > > WQ_E = alloc_workqueue(WQ_ARG1, > > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE, > > + WQ_ARG2 | WQ_ARG3, > > ...); > > | > > WQ_E = alloc_workqueue(WQ_ARG1, > > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4, > > + WQ_ARG2 | WQ_ARG3 | WQ_ARG4, > > ...); > > | > > WQ_E = > > - WQ_ARG1 | WQ_FREEZABLE > > + WQ_ARG1 > > | > > WQ_E = > > - WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3 > > + WQ_ARG1 | WQ_ARG3 > > | > > fs_wq_fn( > > - WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3 > > + WQ_ARG2 | WQ_ARG3 > > ) > > | > > fs_wq_fn( > > - WQ_FREEZABLE | WQ_ARG2 > > + WQ_ARG2 > > ) > > | > > fs_wq_fn( > > - WQ_FREEZABLE > > + 0 > > ) > > ) > > > > @ add_auto_flag @ > > expression E1; > > identifier fs_type; > > @@ > > > > struct file_system_type fs_type = { > > .fs_flags = E1 > > + | FS_AUTOFREEZE > > , > > }; > > > > Generated-by: Coccinelle SmPL > > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > > Link: https://lore.kernel.org/r/20250326112220.1988619-5-mcgrof@kernel.org > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > --- > > fs/ext4/mballoc.c | 2 +- > > fs/ext4/super.c | 3 --- > > 2 files changed, 1 insertion(+), 4 deletions(-) > > > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > > index 0d523e9fb3d5..ae235ec5ff3a 100644 > > --- a/fs/ext4/mballoc.c > > +++ b/fs/ext4/mballoc.c > > @@ -6782,7 +6782,7 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb, > > > > static bool ext4_trim_interrupted(void) > > { > > - return fatal_signal_pending(current) || freezing(current); > > + return fatal_signal_pending(current); > > } > > This change should not happen. ext4_trim_interrupted() makes sure FITRIM > ioctl doesn't cause hibernation failures and has nothing to do with kthread > freezing... > > Otherwise the patch looks good. Afaict, we don't have to do these changes now. Yes, once fsfreeze reliably works in the suspend/resume codepaths then we can switch all that off and remove the old freezer. But we should only do that once we have some experience with the new filesystem freezing during suspend/hibernate. So we should place this under a /sys/power/freeze_filesystems knob and wait a few kernel releases to see whether we see significant problems. How does that sound to you?
On Tue, Apr 01, 2025 at 09:17:12AM +0200, Christian Brauner wrote: > On Tue, Apr 01, 2025 at 12:11:04PM +1100, Dave Chinner wrote: > > On Tue, Apr 01, 2025 at 02:32:48AM +0200, Christian Brauner wrote: > > > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > > > index c5136ea9bb1d..1875b6551ab0 100644 > > > --- a/fs/xfs/xfs_zone_gc.c > > > +++ b/fs/xfs/xfs_zone_gc.c > > > @@ -993,7 +993,6 @@ xfs_zone_gc_handle_work( > > > } > > > > > > __set_current_state(TASK_RUNNING); > > > - try_to_freeze(); > > > > > > if (reset_list) > > > xfs_zone_gc_reset_zones(data, reset_list); > > > @@ -1041,7 +1040,6 @@ xfs_zoned_gcd( > > > unsigned int nofs_flag; > > > > > > nofs_flag = memalloc_nofs_save(); > > > - set_freezable(); > > > > > > for (;;) { > > > set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE); > > > > Same question here for this newly merged code, too... > > I'm not sure if this is supposed to be a snipe or not but just in case > this is a hidden question: No, I meant that this is changing shiny new just-merged XFS code (part of zone device support). It only just arrived this merge window and is largely just doing the same thing as the older aild code. It is probably safe to assume that this new code has never been tested against hibernate... -Dave.
On Tue, Apr 01, 2025 at 02:32:45AM +0200, Christian Brauner wrote: > The whole shebang can also be found at: > https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze > > I know nothing about power or hibernation. I've tested it as best as I > could. Works for me (TM). > > I need to catch some actual sleep now... > > --- > > Now all the pieces are in place to actually allow the power subsystem to > freeze/thaw filesystems during suspend/resume. Filesystems are only > frozen and thawed if the power subsystem does actually own the freeze. Urgh, I was relying on all kthreads to be freezable for live-patching: https://lkml.kernel.org/r/20250324134909.GA14718@noisy.programming.kicks-ass.net So I understand the problem with freezing filesystems, but can't we leave the TASK_FREEZABLE in the kthreads? The way I understand it, the power subsystem will first freeze the filesystems before it goes freeze threads anyway. So them remaining freezable should not affect anything, right?
The whole shebang can also be found at: https://web.git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=work.freeze I know nothing about power or hibernation. I've tested it as best as I could. Works for me (TM). I need to catch some actual sleep now... --- Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze. Othwerwise it risks thawing filesystems it didn't own. This could be done differently be e.g., keeping the filesystems that were actually frozen on a list and then unfreezing them from that list. This is disgustingly unclean though and reeks of an ugly hack. If the filesystem is already frozen by the time we've frozen all userspace processes we don't care to freeze it again. That's userspace's job once the process resumes. We only actually freeze filesystems if we absolutely have to and we ignore other failures to freeze. We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them. What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to. Signed-off-by: Christian Brauner <brauner@kernel.org> --- Christian Brauner (3): fs: add owner of freeze/thaw fs: allow pagefault based writers to be frozen power: freeze filesystems during suspend/resume Luis Chamberlain (3): ext4: replace kthread freezing with auto fs freezing btrfs: replace kthread freezing with auto fs freezing xfs: replace kthread freezing with auto fs freezing fs/btrfs/disk-io.c | 4 +-- fs/btrfs/scrub.c | 2 +- fs/ext4/mballoc.c | 2 +- fs/ext4/super.c | 3 -- fs/f2fs/gc.c | 6 ++-- fs/gfs2/super.c | 20 ++++++----- fs/gfs2/sys.c | 4 +-- fs/ioctl.c | 8 ++--- fs/super.c | 82 ++++++++++++++++++++++++++++++++++++--------- fs/xfs/scrub/fscounters.c | 4 +-- fs/xfs/xfs_discard.c | 2 +- fs/xfs/xfs_log.c | 3 +- fs/xfs/xfs_log_cil.c | 2 +- fs/xfs/xfs_mru_cache.c | 2 +- fs/xfs/xfs_notify_failure.c | 6 ++-- fs/xfs/xfs_pwork.c | 2 +- fs/xfs/xfs_super.c | 14 ++++---- fs/xfs/xfs_trans_ail.c | 3 -- fs/xfs/xfs_zone_gc.c | 2 -- include/linux/fs.h | 16 ++++++--- kernel/power/hibernate.c | 13 ++++++- kernel/power/suspend.c | 8 +++++ 22 files changed, 139 insertions(+), 69 deletions(-) --- base-commit: a68c99192db8060f383a2680333866c0be688ece change-id: 20250401-work-freeze-693b5b5a78e0