Message ID | 20250402-work-freeze-v2-2-6719a97b52ac@kernel.org |
---|---|
State | New |
Headers | show |
Series | power: wire-up filesystem freeze/thaw with suspend/resume | expand |
On Wed, Apr 02, 2025 at 04:07:32PM +0200, Christian Brauner wrote: > During freeze/thaw we need to be able to freeze all writers during > suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a > file and write to it will not be frozen after we've already frozen the > filesystem. > > This has some risk of not being able to freeze processes in case a > process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or > SB_FREEZE_INTERNAL under some other filesytem specific lock. If the > filesystem is frozen, a task can block on the frozen filesystem with > e.g., mmap_sem held. If some other task then blocks on grabbing that > mmap_sem, hibernation ill fail because it is unable to hibernate a task > holding mmap_sem. This could be fixed by making a range of filesystem > related locks use freezable sleeping. That's impractical and not > warranted just for suspend/hibernate. Assume that this is an infrequent > problem and we've given userspace a way to skip filesystem freezing > through a sysfs file. > > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > include/linux/fs.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b379a46b5576..1edcba3cd68e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level) > > static inline void __sb_start_write(struct super_block *sb, int level) > { > - percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, > - level == SB_FREEZE_WRITE); > + percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true); > } Jan, one more thought about freezability here. We know that there will can be at least one process during hibernation that ends up generating page faults and that's systemd-journald. When systemd-sleep requests writing a hibernation image via /sys/power/ files it will inevitably end up freezing systemd-journald and it may be generating a page fault with ->mmap_lock held. systemd-journald is now sleeping with SB_FREEZE_PAGEFAULT and TASK_FREEZABLE. We know this can cause hibernation to fail. That part is fine. What isn't is that we will very likely always trigger: #ifdef CONFIG_LOCKDEP /* * It's dangerous to freeze with locks held; there be dragons there. */ if (!(state & __TASK_FREEZABLE_UNSAFE)) WARN_ON_ONCE(debug_locks && p->lockdep_depth); #endif with lockdep enabled. So we really actually need percpu_rswem_read_freezable_unsafe(), i.e., TASK_FREEZABLE_UNSAFE.
On Wed, Apr 02, 2025 at 12:03:24PM -0400, James Bottomley wrote: > On Wed, 2025-04-02 at 17:32 +0200, Christian Brauner wrote: > [...] > > Jan, one more thought about freezability here. We know that there > > will can be at least one process during hibernation that ends up > > generating page faults and that's systemd-journald. When systemd- > > sleep requests writing a hibernation image via /sys/power/ files it > > will inevitably end up freezing systemd-journald and it may be > > generating a page fault with ->mmap_lock held. systemd-journald is > > now sleeping with SB_FREEZE_PAGEFAULT and TASK_FREEZABLE. We know > > this can cause hibernation to fail. That part is fine. What isn't is > > that we will very likely always trigger: > > > > #ifdef CONFIG_LOCKDEP > > /* > > * It's dangerous to freeze with locks held; there be dragons > > there. > > */ > > if (!(state & __TASK_FREEZABLE_UNSAFE)) > > WARN_ON_ONCE(debug_locks && p->lockdep_depth); > > #endif > > > > with lockdep enabled. > > > > So we really actually need percpu_rswem_read_freezable_unsafe(), > > i.e., TASK_FREEZABLE_UNSAFE. > > The sched people have pretty strong views about people not doing this, > expressed in the comment in sched.h and commit f5d39b020809 > ("freezer,sched: Rewrite core freezer logic") where most of the _unsafe > variants got removed with prejudice. > > If we do get into this situation the worst that can happen is that > another upper lock acquisition triggers a hibernate failure and we thaw > everything, thus we can never truly deadlock, which is the fear, so Yes, I know that it's harmless but we need to not generate misleading lockdep splats when lockdep is turned on and confuse users. > perhaps they might be OK with this. Note that Rafael's solution to > this was to disable lockdep around hibernate/suspend and resume, which > is another possibility. It can be done as a follow-up. I'm just saying it needs treatment.
diff --git a/include/linux/fs.h b/include/linux/fs.h index b379a46b5576..1edcba3cd68e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level) static inline void __sb_start_write(struct super_block *sb, int level) { - percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, - level == SB_FREEZE_WRITE); + percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true); } static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
During freeze/thaw we need to be able to freeze all writers during suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a file and write to it will not be frozen after we've already frozen the filesystem. This has some risk of not being able to freeze processes in case a process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or SB_FREEZE_INTERNAL under some other filesytem specific lock. If the filesystem is frozen, a task can block on the frozen filesystem with e.g., mmap_sem held. If some other task then blocks on grabbing that mmap_sem, hibernation ill fail because it is unable to hibernate a task holding mmap_sem. This could be fixed by making a range of filesystem related locks use freezable sleeping. That's impractical and not warranted just for suspend/hibernate. Assume that this is an infrequent problem and we've given userspace a way to skip filesystem freezing through a sysfs file. Signed-off-by: Christian Brauner <brauner@kernel.org> --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)