diff mbox series

[v1] driver core: Extend deferred probe timeout on driver registration

Message ID 20220429220933.1350374-1-saravanak@google.com
State Accepted
Commit 2b28a1a84a0eb3412bad1a2d5cce2bb4addec626
Headers show
Series [v1] driver core: Extend deferred probe timeout on driver registration | expand

Commit Message

Saravana Kannan April 29, 2022, 10:09 p.m. UTC
The deferred probe timer that's used for this currently starts at
late_initcall and runs for driver_deferred_probe_timeout seconds. The
assumption being that all available drivers would be loaded and
registered before the timer expires. This means, the
driver_deferred_probe_timeout has to be pretty large for it to cover the
worst case. But if we set the default value for it to cover the worst
case, it would significantly slow down the average case. For this
reason, the default value is set to 0.

Also, with CONFIG_MODULES=y and the current default values of
driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
drivers will cause their consumer devices to always defer their probes.
This is because device links created by fw_devlink defer the probe even
before the consumer driver's probe() is called.

Instead of a fixed timeout, if we extend an unexpired deferred probe
timer on every successful driver registration, with the expectation more
modules would be loaded in the near future, then the default value of
driver_deferred_probe_timeout only needs to be as long as the worst case
time difference between two consecutive module loads.

So let's implement that and set the default value to 10 seconds when
CONFIG_MODULES=y.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kevin Hilman <khilman@kernel.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Cc: linux-gpio@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  6 ++++--
 drivers/base/base.h                           |  1 +
 drivers/base/dd.c                             | 19 +++++++++++++++++++
 drivers/base/driver.c                         |  1 +
 4 files changed, 25 insertions(+), 2 deletions(-)

Comments

Mark Brown May 13, 2022, 12:48 p.m. UTC | #1
On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> The deferred probe timer that's used for this currently starts at
> late_initcall and runs for driver_deferred_probe_timeout seconds. The
> assumption being that all available drivers would be loaded and
> registered before the timer expires. This means, the
> driver_deferred_probe_timeout has to be pretty large for it to cover the
> worst case. But if we set the default value for it to cover the worst
> case, it would significantly slow down the average case. For this
> reason, the default value is set to 0.

This makes sense to me.

Reviewed-by: Mark Brown <broonie@kernel.org>
Rob Herring May 13, 2022, 1:58 p.m. UTC | #2
On Fri, Apr 29, 2022 at 5:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> The deferred probe timer that's used for this currently starts at
> late_initcall and runs for driver_deferred_probe_timeout seconds. The
> assumption being that all available drivers would be loaded and
> registered before the timer expires. This means, the
> driver_deferred_probe_timeout has to be pretty large for it to cover the
> worst case. But if we set the default value for it to cover the worst
> case, it would significantly slow down the average case. For this
> reason, the default value is set to 0.
>
> Also, with CONFIG_MODULES=y and the current default values of
> driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> drivers will cause their consumer devices to always defer their probes.
> This is because device links created by fw_devlink defer the probe even
> before the consumer driver's probe() is called.
>
> Instead of a fixed timeout, if we extend an unexpired deferred probe
> timer on every successful driver registration, with the expectation more
> modules would be loaded in the near future, then the default value of
> driver_deferred_probe_timeout only needs to be as long as the worst case
> time difference between two consecutive module loads.
>
> So let's implement that and set the default value to 10 seconds when
> CONFIG_MODULES=y.

We had to revert a non-zero timeout before (issue with NFS root IIRC).
Does fw_devlink=on somehow fix that?

Rob
Saravana Kannan May 13, 2022, 5:25 p.m. UTC | #3
On Fri, May 13, 2022 at 6:58 AM Rob Herring <robh@kernel.org> wrote:
>
> On Fri, Apr 29, 2022 at 5:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > The deferred probe timer that's used for this currently starts at
> > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > assumption being that all available drivers would be loaded and
> > registered before the timer expires. This means, the
> > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > worst case. But if we set the default value for it to cover the worst
> > case, it would significantly slow down the average case. For this
> > reason, the default value is set to 0.
> >
> > Also, with CONFIG_MODULES=y and the current default values of
> > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > drivers will cause their consumer devices to always defer their probes.
> > This is because device links created by fw_devlink defer the probe even
> > before the consumer driver's probe() is called.
> >
> > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > timer on every successful driver registration, with the expectation more
> > modules would be loaded in the near future, then the default value of
> > driver_deferred_probe_timeout only needs to be as long as the worst case
> > time difference between two consecutive module loads.
> >
> > So let's implement that and set the default value to 10 seconds when
> > CONFIG_MODULES=y.
>
> We had to revert a non-zero timeout before (issue with NFS root IIRC).
> Does fw_devlink=on somehow fix that?

If it's the one where ip autoconfig was timing out, then John Stultz
fixed it by fixing wait_for_device_probe().
https://lore.kernel.org/all/20200422203245.83244-4-john.stultz@linaro.org/

If you are referring to some other issue, then I'd need more details.

-Saravana
Rob Herring May 16, 2022, 1:49 p.m. UTC | #4
On Fri, May 13, 2022 at 12:26 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, May 13, 2022 at 6:58 AM Rob Herring <robh@kernel.org> wrote:
> >
> > On Fri, Apr 29, 2022 at 5:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > The deferred probe timer that's used for this currently starts at
> > > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > > assumption being that all available drivers would be loaded and
> > > registered before the timer expires. This means, the
> > > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > > worst case. But if we set the default value for it to cover the worst
> > > case, it would significantly slow down the average case. For this
> > > reason, the default value is set to 0.
> > >
> > > Also, with CONFIG_MODULES=y and the current default values of
> > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > > drivers will cause their consumer devices to always defer their probes.
> > > This is because device links created by fw_devlink defer the probe even
> > > before the consumer driver's probe() is called.
> > >
> > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > timer on every successful driver registration, with the expectation more
> > > modules would be loaded in the near future, then the default value of
> > > driver_deferred_probe_timeout only needs to be as long as the worst case
> > > time difference between two consecutive module loads.
> > >
> > > So let's implement that and set the default value to 10 seconds when
> > > CONFIG_MODULES=y.
> >
> > We had to revert a non-zero timeout before (issue with NFS root IIRC).
> > Does fw_devlink=on somehow fix that?
>
> If it's the one where ip autoconfig was timing out, then John Stultz
> fixed it by fixing wait_for_device_probe().
> https://lore.kernel.org/all/20200422203245.83244-4-john.stultz@linaro.org/

Yeah, that was it.

Acked-by: Rob Herring <robh@kernel.org>
Nathan Chancellor May 20, 2022, 11:30 p.m. UTC | #5
Hi Saravana,

On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> The deferred probe timer that's used for this currently starts at
> late_initcall and runs for driver_deferred_probe_timeout seconds. The
> assumption being that all available drivers would be loaded and
> registered before the timer expires. This means, the
> driver_deferred_probe_timeout has to be pretty large for it to cover the
> worst case. But if we set the default value for it to cover the worst
> case, it would significantly slow down the average case. For this
> reason, the default value is set to 0.
> 
> Also, with CONFIG_MODULES=y and the current default values of
> driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> drivers will cause their consumer devices to always defer their probes.
> This is because device links created by fw_devlink defer the probe even
> before the consumer driver's probe() is called.
> 
> Instead of a fixed timeout, if we extend an unexpired deferred probe
> timer on every successful driver registration, with the expectation more
> modules would be loaded in the near future, then the default value of
> driver_deferred_probe_timeout only needs to be as long as the worst case
> time difference between two consecutive module loads.
> 
> So let's implement that and set the default value to 10 seconds when
> CONFIG_MODULES=y.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Kevin Hilman <khilman@kernel.org>
> Cc: Thierry Reding <treding@nvidia.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> Cc: linux-gpio@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Signed-off-by: Saravana Kannan <saravanak@google.com>

I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
timeout on driver registration") in next-20220520 (bisect log below).

$ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig bzImage

$ timeout --foreground 15m stdbuf -oL -eL \
qemu-system-s390x \
-initrd ... \
-M s390-ccw-virtio \
-display none \
-kernel arch/s390/boot/bzImage \
-m 512m \
-nodefaults \
-serial mon:stdio
...
[    2.077303] In-situ OAM (IOAM) with IPv6
[    2.077639] NET: Registered PF_PACKET protocol family
[    2.078063] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    2.078795] Key type dns_resolver registered
[    2.079317] cio: Channel measurement facility initialized using format extended (mode autodetected)
[    2.081494] Discipline DIAG cannot be used without z/VM
[  260.626363] random: crng init done
qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)

We have a simple rootfs available if necessary:

https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst

If there is any other information I can provide, please let me know!

Cheers,
Nathan

# bad: [18ecd30af1a8402c162cca1bd58771c0e5be7815] Add linux-next specific files for 20220520
# good: [b015dcd62b86d298829990f8261d5d154b8d7af5] Merge tag 'for-5.18/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect start '18ecd30af1a8402c162cca1bd58771c0e5be7815' 'b015dcd62b86d298829990f8261d5d154b8d7af5'
# good: [f9b63740b666dd9887eb0282d21b5f65bb0cadd0] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good f9b63740b666dd9887eb0282d21b5f65bb0cadd0
# good: [1f5eb3e76303572f0318e8c50da51c516580aa03] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect good 1f5eb3e76303572f0318e8c50da51c516580aa03
# bad: [4c1d9cc0363691893ef94fa0d798faca013e27d3] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
git bisect bad 4c1d9cc0363691893ef94fa0d798faca013e27d3
# bad: [dcb68304485c0ba5f84f1a54687c751b68263d93] Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial.git
git bisect bad dcb68304485c0ba5f84f1a54687c751b68263d93
# good: [61271996dc46aecb40fd26f89a4ec0a6bd8f3a8f] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect good 61271996dc46aecb40fd26f89a4ec0a6bd8f3a8f
# good: [d4db45a71f56032b552e161968bb0e5fdd2767f8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds.git
git bisect good d4db45a71f56032b552e161968bb0e5fdd2767f8
# good: [d090c7a2ab84663185e4abda21d7d83880937c8a] USB / dwc3: Fix a checkpatch warning in core.c
git bisect good d090c7a2ab84663185e4abda21d7d83880937c8a
# bad: [b232b02bf3c205b13a26dcec08e53baddd8e59ed] driver core: fix deadlock in __device_attach
git bisect bad b232b02bf3c205b13a26dcec08e53baddd8e59ed
# good: [4c32174a24759d5ac6dc42b508fcec2afb8b9602] Documentation: dd: Use ReST lists for return values of driver_deferred_probe_check_state()
git bisect good 4c32174a24759d5ac6dc42b508fcec2afb8b9602
# good: [38ea74eb8fc1b82b39e13a6527095a0036539117] rpmsg: use local 'dev' variable
git bisect good 38ea74eb8fc1b82b39e13a6527095a0036539117
# good: [1f7ff11ca68f464b6a9a71b8fbe9e5219e7cac57] driver core: location: Add "back" as a possible output for panel
git bisect good 1f7ff11ca68f464b6a9a71b8fbe9e5219e7cac57
# good: [6ee60e9c9f2f83ad218159af6a175c57a395ae69] MAINTAINERS: add Russ Weight as a firmware loader maintainer
git bisect good 6ee60e9c9f2f83ad218159af6a175c57a395ae69
# bad: [15f214f9bdb7c1f560b4bf863c5a72ff53b442a4] topology: Remove unused cpu_cluster_mask()
git bisect bad 15f214f9bdb7c1f560b4bf863c5a72ff53b442a4
# bad: [2b28a1a84a0eb3412bad1a2d5cce2bb4addec626] driver core: Extend deferred probe timeout on driver registration
git bisect bad 2b28a1a84a0eb3412bad1a2d5cce2bb4addec626
# first bad commit: [2b28a1a84a0eb3412bad1a2d5cce2bb4addec626] driver core: Extend deferred probe timeout on driver registration
Saravana Kannan May 20, 2022, 11:49 p.m. UTC | #6
On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Saravana,
>
> On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > The deferred probe timer that's used for this currently starts at
> > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > assumption being that all available drivers would be loaded and
> > registered before the timer expires. This means, the
> > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > worst case. But if we set the default value for it to cover the worst
> > case, it would significantly slow down the average case. For this
> > reason, the default value is set to 0.
> >
> > Also, with CONFIG_MODULES=y and the current default values of
> > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > drivers will cause their consumer devices to always defer their probes.
> > This is because device links created by fw_devlink defer the probe even
> > before the consumer driver's probe() is called.
> >
> > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > timer on every successful driver registration, with the expectation more
> > modules would be loaded in the near future, then the default value of
> > driver_deferred_probe_timeout only needs to be as long as the worst case
> > time difference between two consecutive module loads.
> >
> > So let's implement that and set the default value to 10 seconds when
> > CONFIG_MODULES=y.
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Linus Walleij <linus.walleij@linaro.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > Cc: Kevin Hilman <khilman@kernel.org>
> > Cc: Thierry Reding <treding@nvidia.com>
> > Cc: Mark Brown <broonie@kernel.org>
> > Cc: Pavel Machek <pavel@ucw.cz>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> > Cc: linux-gpio@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> timeout on driver registration") in next-20220520 (bisect log below).
>
> $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig bzImage
>
> $ timeout --foreground 15m stdbuf -oL -eL \
> qemu-system-s390x \
> -initrd ... \
> -M s390-ccw-virtio \
> -display none \
> -kernel arch/s390/boot/bzImage \
> -m 512m \
> -nodefaults \
> -serial mon:stdio
> ...
> [    2.077303] In-situ OAM (IOAM) with IPv6
> [    2.077639] NET: Registered PF_PACKET protocol family
> [    2.078063] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
> [    2.078795] Key type dns_resolver registered
> [    2.079317] cio: Channel measurement facility initialized using format extended (mode autodetected)
> [    2.081494] Discipline DIAG cannot be used without z/VM
> [  260.626363] random: crng init done
> qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
>
> We have a simple rootfs available if necessary:
>
> https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
>
> If there is any other information I can provide, please let me know!

Hmm... strange. Can you please try the following command line options
and tell me which of these has the issue and which don't?
1) deferred_probe_timeout=0
2) deferred_probe_timeout=1
3) deferred_probe_timeout=300

That should help me narrow down where the error might be.

Thanks,
Saravana

>
> Cheers,
> Nathan
>
> # bad: [18ecd30af1a8402c162cca1bd58771c0e5be7815] Add linux-next specific files for 20220520
> # good: [b015dcd62b86d298829990f8261d5d154b8d7af5] Merge tag 'for-5.18/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> git bisect start '18ecd30af1a8402c162cca1bd58771c0e5be7815' 'b015dcd62b86d298829990f8261d5d154b8d7af5'
> # good: [f9b63740b666dd9887eb0282d21b5f65bb0cadd0] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> git bisect good f9b63740b666dd9887eb0282d21b5f65bb0cadd0
> # good: [1f5eb3e76303572f0318e8c50da51c516580aa03] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> git bisect good 1f5eb3e76303572f0318e8c50da51c516580aa03
> # bad: [4c1d9cc0363691893ef94fa0d798faca013e27d3] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
> git bisect bad 4c1d9cc0363691893ef94fa0d798faca013e27d3
> # bad: [dcb68304485c0ba5f84f1a54687c751b68263d93] Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial.git
> git bisect bad dcb68304485c0ba5f84f1a54687c751b68263d93
> # good: [61271996dc46aecb40fd26f89a4ec0a6bd8f3a8f] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> git bisect good 61271996dc46aecb40fd26f89a4ec0a6bd8f3a8f
> # good: [d4db45a71f56032b552e161968bb0e5fdd2767f8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds.git
> git bisect good d4db45a71f56032b552e161968bb0e5fdd2767f8
> # good: [d090c7a2ab84663185e4abda21d7d83880937c8a] USB / dwc3: Fix a checkpatch warning in core.c
> git bisect good d090c7a2ab84663185e4abda21d7d83880937c8a
> # bad: [b232b02bf3c205b13a26dcec08e53baddd8e59ed] driver core: fix deadlock in __device_attach
> git bisect bad b232b02bf3c205b13a26dcec08e53baddd8e59ed
> # good: [4c32174a24759d5ac6dc42b508fcec2afb8b9602] Documentation: dd: Use ReST lists for return values of driver_deferred_probe_check_state()
> git bisect good 4c32174a24759d5ac6dc42b508fcec2afb8b9602
> # good: [38ea74eb8fc1b82b39e13a6527095a0036539117] rpmsg: use local 'dev' variable
> git bisect good 38ea74eb8fc1b82b39e13a6527095a0036539117
> # good: [1f7ff11ca68f464b6a9a71b8fbe9e5219e7cac57] driver core: location: Add "back" as a possible output for panel
> git bisect good 1f7ff11ca68f464b6a9a71b8fbe9e5219e7cac57
> # good: [6ee60e9c9f2f83ad218159af6a175c57a395ae69] MAINTAINERS: add Russ Weight as a firmware loader maintainer
> git bisect good 6ee60e9c9f2f83ad218159af6a175c57a395ae69
> # bad: [15f214f9bdb7c1f560b4bf863c5a72ff53b442a4] topology: Remove unused cpu_cluster_mask()
> git bisect bad 15f214f9bdb7c1f560b4bf863c5a72ff53b442a4
> # bad: [2b28a1a84a0eb3412bad1a2d5cce2bb4addec626] driver core: Extend deferred probe timeout on driver registration
> git bisect bad 2b28a1a84a0eb3412bad1a2d5cce2bb4addec626
> # first bad commit: [2b28a1a84a0eb3412bad1a2d5cce2bb4addec626] driver core: Extend deferred probe timeout on driver registration
Nathan Chancellor May 21, 2022, 12:04 a.m. UTC | #7
On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote:
> On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Saravana,
> >
> > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > > The deferred probe timer that's used for this currently starts at
> > > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > > assumption being that all available drivers would be loaded and
> > > registered before the timer expires. This means, the
> > > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > > worst case. But if we set the default value for it to cover the worst
> > > case, it would significantly slow down the average case. For this
> > > reason, the default value is set to 0.
> > >
> > > Also, with CONFIG_MODULES=y and the current default values of
> > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > > drivers will cause their consumer devices to always defer their probes.
> > > This is because device links created by fw_devlink defer the probe even
> > > before the consumer driver's probe() is called.
> > >
> > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > timer on every successful driver registration, with the expectation more
> > > modules would be loaded in the near future, then the default value of
> > > driver_deferred_probe_timeout only needs to be as long as the worst case
> > > time difference between two consecutive module loads.
> > >
> > > So let's implement that and set the default value to 10 seconds when
> > > CONFIG_MODULES=y.
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > Cc: Rob Herring <robh@kernel.org>
> > > Cc: Linus Walleij <linus.walleij@linaro.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > Cc: Kevin Hilman <khilman@kernel.org>
> > > Cc: Thierry Reding <treding@nvidia.com>
> > > Cc: Mark Brown <broonie@kernel.org>
> > > Cc: Pavel Machek <pavel@ucw.cz>
> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> > > Cc: linux-gpio@vger.kernel.org
> > > Cc: linux-pm@vger.kernel.org
> > > Cc: iommu@lists.linux-foundation.org
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> > timeout on driver registration") in next-20220520 (bisect log below).
> >
> > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig bzImage
> >
> > $ timeout --foreground 15m stdbuf -oL -eL \
> > qemu-system-s390x \
> > -initrd ... \
> > -M s390-ccw-virtio \
> > -display none \
> > -kernel arch/s390/boot/bzImage \
> > -m 512m \
> > -nodefaults \
> > -serial mon:stdio
> > ...
> > [    2.077303] In-situ OAM (IOAM) with IPv6
> > [    2.077639] NET: Registered PF_PACKET protocol family
> > [    2.078063] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
> > [    2.078795] Key type dns_resolver registered
> > [    2.079317] cio: Channel measurement facility initialized using format extended (mode autodetected)
> > [    2.081494] Discipline DIAG cannot be used without z/VM
> > [  260.626363] random: crng init done
> > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
> >
> > We have a simple rootfs available if necessary:
> >
> > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> >
> > If there is any other information I can provide, please let me know!
> 
> Hmm... strange. Can you please try the following command line options
> and tell me which of these has the issue and which don't?

Sure thing!

> 1) deferred_probe_timeout=0

No issue.

> 2) deferred_probe_timeout=1
> 3) deferred_probe_timeout=300

Both of these appear to hang in the same way, I let each sit for five
minutes.

Cheers,
Nathan
Saravana Kannan May 21, 2022, 12:15 a.m. UTC | #8
On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote:
> > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > >
> > > Hi Saravana,
> > >
> > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > > > The deferred probe timer that's used for this currently starts at
> > > > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > > > assumption being that all available drivers would be loaded and
> > > > registered before the timer expires. This means, the
> > > > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > > > worst case. But if we set the default value for it to cover the worst
> > > > case, it would significantly slow down the average case. For this
> > > > reason, the default value is set to 0.
> > > >
> > > > Also, with CONFIG_MODULES=y and the current default values of
> > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > > > drivers will cause their consumer devices to always defer their probes.
> > > > This is because device links created by fw_devlink defer the probe even
> > > > before the consumer driver's probe() is called.
> > > >
> > > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > > timer on every successful driver registration, with the expectation more
> > > > modules would be loaded in the near future, then the default value of
> > > > driver_deferred_probe_timeout only needs to be as long as the worst case
> > > > time difference between two consecutive module loads.
> > > >
> > > > So let's implement that and set the default value to 10 seconds when
> > > > CONFIG_MODULES=y.
> > > >
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > Cc: Rob Herring <robh@kernel.org>
> > > > Cc: Linus Walleij <linus.walleij@linaro.org>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > Cc: Kevin Hilman <khilman@kernel.org>
> > > > Cc: Thierry Reding <treding@nvidia.com>
> > > > Cc: Mark Brown <broonie@kernel.org>
> > > > Cc: Pavel Machek <pavel@ucw.cz>
> > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > > Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> > > > Cc: linux-gpio@vger.kernel.org
> > > > Cc: linux-pm@vger.kernel.org
> > > > Cc: iommu@lists.linux-foundation.org
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> > > timeout on driver registration") in next-20220520 (bisect log below).
> > >
> > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig bzImage
> > >
> > > $ timeout --foreground 15m stdbuf -oL -eL \
> > > qemu-system-s390x \
> > > -initrd ... \
> > > -M s390-ccw-virtio \
> > > -display none \
> > > -kernel arch/s390/boot/bzImage \
> > > -m 512m \
> > > -nodefaults \
> > > -serial mon:stdio
> > > ...
> > > [    2.077303] In-situ OAM (IOAM) with IPv6
> > > [    2.077639] NET: Registered PF_PACKET protocol family
> > > [    2.078063] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
> > > [    2.078795] Key type dns_resolver registered
> > > [    2.079317] cio: Channel measurement facility initialized using format extended (mode autodetected)
> > > [    2.081494] Discipline DIAG cannot be used without z/VM
> > > [  260.626363] random: crng init done
> > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
> > >
> > > We have a simple rootfs available if necessary:
> > >
> > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > >
> > > If there is any other information I can provide, please let me know!
> >
> > Hmm... strange. Can you please try the following command line options
> > and tell me which of these has the issue and which don't?
>
> Sure thing!
>
> > 1) deferred_probe_timeout=0
>
> No issue.
>
> > 2) deferred_probe_timeout=1
> > 3) deferred_probe_timeout=300
>
> Both of these appear to hang in the same way, I let each sit for five
> minutes.

Strange that a sufficiently large timeout isn't helping. Is it trying
to boot off a network mount? I'll continue looking into this next
week.

-Saravana

>
> Cheers,
> Nathan
Nathan Chancellor May 23, 2022, 3:17 p.m. UTC | #9
On Fri, May 20, 2022 at 05:15:55PM -0700, Saravana Kannan wrote:
> On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote:
> > > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > > > > The deferred probe timer that's used for this currently starts at
> > > > > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > > > > assumption being that all available drivers would be loaded and
> > > > > registered before the timer expires. This means, the
> > > > > driver_deferred_probe_timeout has to be pretty large for it to cover the
> > > > > worst case. But if we set the default value for it to cover the worst
> > > > > case, it would significantly slow down the average case. For this
> > > > > reason, the default value is set to 0.
> > > > >
> > > > > Also, with CONFIG_MODULES=y and the current default values of
> > > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
> > > > > drivers will cause their consumer devices to always defer their probes.
> > > > > This is because device links created by fw_devlink defer the probe even
> > > > > before the consumer driver's probe() is called.
> > > > >
> > > > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > > > timer on every successful driver registration, with the expectation more
> > > > > modules would be loaded in the near future, then the default value of
> > > > > driver_deferred_probe_timeout only needs to be as long as the worst case
> > > > > time difference between two consecutive module loads.
> > > > >
> > > > > So let's implement that and set the default value to 10 seconds when
> > > > > CONFIG_MODULES=y.
> > > > >
> > > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > > Cc: Rob Herring <robh@kernel.org>
> > > > > Cc: Linus Walleij <linus.walleij@linaro.org>
> > > > > Cc: Will Deacon <will@kernel.org>
> > > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > Cc: Kevin Hilman <khilman@kernel.org>
> > > > > Cc: Thierry Reding <treding@nvidia.com>
> > > > > Cc: Mark Brown <broonie@kernel.org>
> > > > > Cc: Pavel Machek <pavel@ucw.cz>
> > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > > > Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> > > > > Cc: linux-gpio@vger.kernel.org
> > > > > Cc: linux-pm@vger.kernel.org
> > > > > Cc: iommu@lists.linux-foundation.org
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >
> > > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> > > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> > > > timeout on driver registration") in next-20220520 (bisect log below).
> > > >
> > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig bzImage
> > > >
> > > > $ timeout --foreground 15m stdbuf -oL -eL \
> > > > qemu-system-s390x \
> > > > -initrd ... \
> > > > -M s390-ccw-virtio \
> > > > -display none \
> > > > -kernel arch/s390/boot/bzImage \
> > > > -m 512m \
> > > > -nodefaults \
> > > > -serial mon:stdio
> > > > ...
> > > > [    2.077303] In-situ OAM (IOAM) with IPv6
> > > > [    2.077639] NET: Registered PF_PACKET protocol family
> > > > [    2.078063] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
> > > > [    2.078795] Key type dns_resolver registered
> > > > [    2.079317] cio: Channel measurement facility initialized using format extended (mode autodetected)
> > > > [    2.081494] Discipline DIAG cannot be used without z/VM
> > > > [  260.626363] random: crng init done
> > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
> > > >
> > > > We have a simple rootfs available if necessary:
> > > >
> > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > > >
> > > > If there is any other information I can provide, please let me know!
> > >
> > > Hmm... strange. Can you please try the following command line options
> > > and tell me which of these has the issue and which don't?
> >
> > Sure thing!
> >
> > > 1) deferred_probe_timeout=0
> >
> > No issue.
> >
> > > 2) deferred_probe_timeout=1
> > > 3) deferred_probe_timeout=300
> >
> > Both of these appear to hang in the same way, I let each sit for five
> > minutes.
> 
> Strange that a sufficiently large timeout isn't helping. Is it trying
> to boot off a network mount? I'll continue looking into this next
> week.

I don't think so, it seems like doing that requires some extra flags
that we do not have:

https://wiki.qemu.org/Features/S390xNetworkBoot

If you need any additional information or want something tested, please
let me know!

Cheers,
Nathan
Sebastian Andrzej Siewior May 24, 2022, 4:41 p.m. UTC | #10
On 2022-05-23 20:43:06 [-0700], Saravana Kannan wrote:
…
> Thanks for all the help. I think I know what's going on.

I, too got here because my boot recently was extended by 10 seconds and
bisected to that commit in question.

> If you revert the following commit, then you'll see that your device
> no longer hangs with my changes.
> 35a672363ab3 driver core: Ensure wait_for_device_probe() waits until
> the deferred_probe_timeout fires

Removing probe_timeout_waitqueue (as suggested) or setting the timeout
to 0 avoids the delay.

> -Saravana

Sebastian
Saravana Kannan May 25, 2022, 7:49 p.m. UTC | #11
On Wed, May 25, 2022 at 12:12 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-05-24 10:46:49 [-0700], Saravana Kannan wrote:
> > > Removing probe_timeout_waitqueue (as suggested) or setting the timeout
> > > to 0 avoids the delay.
> >
> > In your case, I think it might be working as intended? Curious, what
> > was the call stack in your case where it was blocked?
>
> Why is then there 10sec delay during boot? The backtrace is
> |------------[ cut here ]------------
> |WARNING: CPU: 4 PID: 1 at drivers/base/dd.c:742 wait_for_device_probe+0x30/0x110
> |Modules linked in:
> |CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc5+ #154
> |RIP: 0010:wait_for_device_probe+0x30/0x110
> |Call Trace:
> | <TASK>
> | prepare_namespace+0x2b/0x160
> | kernel_init_freeable+0x2b3/0x2dd
> | kernel_init+0x11/0x110
> | ret_from_fork+0x22/0x30
> | </TASK>
>
> Looking closer, it can't access init. This in particular box boots
> directly the kernel without an initramfs so the kernel later mounts
> /dev/sda1 and everything is good.  So that seems to be the reason…

Hmmm... that part shouldn't matter. As long as you are hitting the
same code path. My guess is one of them has CONFIG_MODULES enabled and
the other doesn't.

In either case, I think the patch needs to be reverted (I'll send out
one soon), but that'll also mean I need to revert part of my patch
(sets the timeout back to 0) or I need to fix this case:
https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/

I'll try to do the latter if I can get something reasonable soon.
Otherwise, I'll just do the revert + partial revert.

-Saravana


> My other machine with an initramfs does not show this problem.
>
> > -Saravana
>
> Sebastian
Niklas Cassel May 29, 2022, 8:34 a.m. UTC | #12
On Wed, May 25, 2022 at 12:49:00PM -0700, Saravana Kannan wrote:
> On Wed, May 25, 2022 at 12:12 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2022-05-24 10:46:49 [-0700], Saravana Kannan wrote:
> > > > Removing probe_timeout_waitqueue (as suggested) or setting the timeout
> > > > to 0 avoids the delay.
> > >
> > > In your case, I think it might be working as intended? Curious, what
> > > was the call stack in your case where it was blocked?
> >
> > Why is then there 10sec delay during boot? The backtrace is
> > |------------[ cut here ]------------
> > |WARNING: CPU: 4 PID: 1 at drivers/base/dd.c:742 wait_for_device_probe+0x30/0x110
> > |Modules linked in:
> > |CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc5+ #154
> > |RIP: 0010:wait_for_device_probe+0x30/0x110
> > |Call Trace:
> > | <TASK>
> > | prepare_namespace+0x2b/0x160
> > | kernel_init_freeable+0x2b3/0x2dd
> > | kernel_init+0x11/0x110
> > | ret_from_fork+0x22/0x30
> > | </TASK>
> >
> > Looking closer, it can't access init. This in particular box boots
> > directly the kernel without an initramfs so the kernel later mounts
> > /dev/sda1 and everything is good.  So that seems to be the reason…
>

Hello there,

My (QEMU) boot times were recently extended by 10 seconds.
Looking at the timestamps, it looks like nothing is being done for 10 whole
seconds.

A git bisect landed me at the patch in $subject:
2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")

Adding a WARN_ON(1) in wait_for_device_probe(), as requested by the patch
author from the others seeing a regression with this patch, gives two different
stacktraces during boot:

[    0.459633] printk: console [netcon0] enabled
[    0.459636] printk: console [netcon0] printing thread started
[    0.459637] netconsole: network logging started
[    0.459896] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    0.460230] kworker/u8:6 (105) used greatest stack depth: 14744 bytes left
[    0.461031] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    0.461077] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    0.461085] cfg80211: failed to load regulatory.db
[    0.461113] ALSA device list:
[    0.461116]   No soundcards found.
[    0.461614] ------------[ cut here ]------------
[    0.461615] WARNING: CPU: 2 PID: 1 at drivers/base/dd.c:741 wait_for_device_probe+0x1a/0x160
[    0.485809] Modules linked in:
[    0.486089] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.18.0-next-20220526-00004-g74f936013b08-dirty #20
[    0.486842] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[    0.487707] RIP: 0010:wait_for_device_probe+0x1a/0x160
[    0.488103] Code: 00 e8 fa e4 b5 ff 8b 44 24 04 48 83 c4 08 5b c3 0f 1f 44 00 00 53 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 <0f> 0b e8 1f ac 57 00 8b 15 f1 b3 24 01 85 d2 75 3d 48 c7 c7 60 2f
[    0.489539] RSP: 0000:ffff9c7900013ed8 EFLAGS: 00010246
[    0.489965] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000d02
[    0.490597] RDX: 0000000000000cc2 RSI: 0000000000000000 RDI: 000000000002e990
[    0.491181] RBP: 0000000000000214 R08: 000000000000000f R09: 0000000000000064
[    0.491788] R10: ffff9c7900013c6c R11: 0000000000000000 R12: ffff8964c0343640
[    0.492384] R13: ffffffff9e51791c R14: 0000000000000000 R15: 0000000000000000
[    0.492960] FS:  0000000000000000(0000) GS:ffff896637d00000(0000) knlGS:0000000000000000
[    0.493658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.494501] CR2: 0000000000000000 CR3: 00000001ed20c001 CR4: 0000000000370ee0
[    0.495621] Call Trace:
[    0.496059]  <TASK>
[    0.496266]  ? init_eaccess+0x3b/0x76
[    0.496657]  prepare_namespace+0x30/0x16a
[    0.497016]  kernel_init_freeable+0x207/0x212
[    0.497407]  ? rest_init+0xc0/0xc0
[    0.497714]  kernel_init+0x16/0x120
[    0.498250]  ret_from_fork+0x1f/0x30
[    0.498898]  </TASK>
[    0.499307] ---[ end trace 0000000000000000 ]---
[    0.748413] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    0.749053] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    0.749461] ata2.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[    0.749470] ata2.00: 732 sectors, multi 16: LBA48 NCQ (depth 32)
[    0.749479] ata2.00: applying bridge limits
[    0.750915] ata4: SATA link down (SStatus 0 SControl 300)
[    0.752110] ata5: SATA link down (SStatus 0 SControl 300)
[    0.753424] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    0.754877] ata3: SATA link down (SStatus 0 SControl 300)
[    0.755342] ata1.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[    0.755377] ata1.00: 268435456 sectors, multi 16: LBA48 NCQ (depth 32)
[    0.755387] ata1.00: applying bridge limits
[    0.755486] ata6.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[    0.755492] ata6.00: 8388608 sectors, multi 16: LBA48 NCQ (depth 32)
[    0.755500] ata6.00: applying bridge limits
[    0.757330] ata1.00: configured for UDMA/100
[    0.757441] ata6.00: configured for UDMA/100
[    0.757505] ata2.00: configured for UDMA/100
[    0.758015] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
[    0.760542] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    0.760669] sd 0:0:0:0: [sda] 268435456 512-byte logical blocks: (137 GB/128 GiB)
[    0.760778] sd 0:0:0:0: [sda] Write Protect is off
[    0.760787] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    0.760847] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.760981] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[    0.761319] scsi 1:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
[    0.762808] sd 1:0:0:0: Attached scsi generic sg1 type 0
[    0.763004] sd 1:0:0:0: [sdb] 732 512-byte logical blocks: (375 kB/366 KiB)
[    0.763649] sd 1:0:0:0: [sdb] Write Protect is off
[    0.763661] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    0.763965] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.764026] sd 1:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[    0.765923] scsi 5:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
[    0.767987] sd 5:0:0:0: Attached scsi generic sg2 type 0
[    0.768626] sd 5:0:0:0: [sdc] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
[    0.769909] sd 5:0:0:0: [sdc] Write Protect is off
[    0.769920] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    0.770515] sd 1:0:0:0: [sdb] Attached SCSI disk
[    0.770900] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.771782] sd 5:0:0:0: [sdc] Preferred minimum I/O size 512 bytes
[    0.773900]  sda: sda1 sda14 sda15
[    0.774499] sd 5:0:0:0: [sdc] Attached SCSI disk
[    0.774907] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.084011] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[   10.887350] md: Waiting for all devices to be available before autodetect
[   10.887395] md: If you don't use raid, use raid=noautodetect
[   10.887445] ------------[ cut here ]------------
[   10.887448] WARNING: CPU: 1 PID: 1 at drivers/base/dd.c:741 wait_for_device_probe+0x1a/0x160
[   10.893987] Modules linked in:
[   10.894314] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.18.0-next-20220526-00004-g74f936013b08-dirty #20
[   10.895389] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   10.896590] RIP: 0010:wait_for_device_probe+0x1a/0x160
[   10.897215] Code: 00 e8 fa e4 b5 ff 8b 44 24 04 48 83 c4 08 5b c3 0f 1f 44 00 00 53 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 <0f> 0b e8 1f ac 57 00 8b 15 f1 b3 24 01 85 d2 75 3d 48 c7 c7 60 2f
[   10.899269] RSP: 0000:ffff9c7900013ec8 EFLAGS: 00010246
[   10.899762] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
[   10.900424] RDX: 0000000000000000 RSI: ffffffff9d998089 RDI: 00000000ffffffff
[   10.901060] RBP: 0000000000000214 R08: 80000000ffffe1ca R09: ffff9c7900013ea8
[   10.901839] R10: 3fffffffffffffff R11: 0000000000000000 R12: ffff8964c0343640
[   10.902615] R13: ffffffff9e51791c R14: 0000000000000000 R15: 0000000000000000
[   10.903371] FS:  0000000000000000(0000) GS:ffff896637c80000(0000) knlGS:0000000000000000
[   10.904188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.904721] CR2: 0000000000000000 CR3: 00000001ed20c001 CR4: 0000000000370ee0
[   10.905340] Call Trace:
[   10.905558]  <TASK>
[   10.905739]  md_run_setup+0x3c/0x6a
[   10.906036]  prepare_namespace+0x35/0x16a
[   10.906391]  kernel_init_freeable+0x207/0x212
[   10.906748]  ? rest_init+0xc0/0xc0
[   10.907021]  kernel_init+0x16/0x120
[   10.907312]  ret_from_fork+0x1f/0x30
[   10.907617]  </TASK>
[   10.907796] ---[ end trace 0000000000000000 ]---
[   10.908159] md: Autodetecting RAID arrays.
[   10.908160] md: autorun ...
[   10.908160] md: ... autorun DONE.
[   10.911301] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[   10.911303] EXT4-fs (sda1): write access will be enabled during recovery
[   11.038001] EXT4-fs (sda1): orphan cleanup on readonly fs
[   11.038466] EXT4-fs (sda1): 2 orphan inodes deleted
[   11.038475] EXT4-fs (sda1): recovery complete
[   11.053033] EXT4-fs (sda1): mounted filesystem with ordered data mode. Quota mode: none.
[   11.053074] VFS: Mounted root (ext4 filesystem) readonly on device 8:1.
[   11.053175] devtmpfs: mounted
[   11.061921] Freeing unused kernel image (initmem) memory: 1680K
[   11.061930] Write protecting the kernel read-only data: 24576k
[   11.071108] Freeing unused kernel image (text/rodata gap) memory: 2032K
[   11.073596] Freeing unused kernel image (rodata/data gap) memory: 924K
[   11.073609] Run /sbin/init as init process


Kind regards,
Niklas
Saravana Kannan May 30, 2022, 4:59 a.m. UTC | #13
On Sun, May 29, 2022 at 1:34 AM 'Niklas Cassel' via kernel-team
<kernel-team@android.com> wrote:
>
> On Wed, May 25, 2022 at 12:49:00PM -0700, Saravana Kannan wrote:
> > On Wed, May 25, 2022 at 12:12 AM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2022-05-24 10:46:49 [-0700], Saravana Kannan wrote:
> > > > > Removing probe_timeout_waitqueue (as suggested) or setting the timeout
> > > > > to 0 avoids the delay.
> > > >
> > > > In your case, I think it might be working as intended? Curious, what
> > > > was the call stack in your case where it was blocked?
> > >
> > > Why is then there 10sec delay during boot? The backtrace is
> > > |------------[ cut here ]------------
> > > |WARNING: CPU: 4 PID: 1 at drivers/base/dd.c:742 wait_for_device_probe+0x30/0x110
> > > |Modules linked in:
> > > |CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc5+ #154
> > > |RIP: 0010:wait_for_device_probe+0x30/0x110
> > > |Call Trace:
> > > | <TASK>
> > > | prepare_namespace+0x2b/0x160
> > > | kernel_init_freeable+0x2b3/0x2dd
> > > | kernel_init+0x11/0x110
> > > | ret_from_fork+0x22/0x30
> > > | </TASK>
> > >
> > > Looking closer, it can't access init. This in particular box boots
> > > directly the kernel without an initramfs so the kernel later mounts
> > > /dev/sda1 and everything is good.  So that seems to be the reason…
> >
>
> Hello there,
>
> My (QEMU) boot times were recently extended by 10 seconds.
> Looking at the timestamps, it looks like nothing is being done for 10 whole
> seconds.
>
> A git bisect landed me at the patch in $subject:
> 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
>
> Adding a WARN_ON(1) in wait_for_device_probe(), as requested by the patch
> author from the others seeing a regression with this patch, gives two different
> stacktraces during boot:

Thanks for the report. My patch was exposing an existing issue. I
already sent a fix for that a few days ago. Waiting for it to get
picked up.

-Saravana

>
> [    0.459633] printk: console [netcon0] enabled
> [    0.459636] printk: console [netcon0] printing thread started
> [    0.459637] netconsole: network logging started
> [    0.459896] cfg80211: Loading compiled-in X.509 certificates for regulatory database
> [    0.460230] kworker/u8:6 (105) used greatest stack depth: 14744 bytes left
> [    0.461031] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
> [    0.461077] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
> [    0.461085] cfg80211: failed to load regulatory.db
> [    0.461113] ALSA device list:
> [    0.461116]   No soundcards found.
> [    0.461614] ------------[ cut here ]------------
> [    0.461615] WARNING: CPU: 2 PID: 1 at drivers/base/dd.c:741 wait_for_device_probe+0x1a/0x160
> [    0.485809] Modules linked in:
> [    0.486089] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.18.0-next-20220526-00004-g74f936013b08-dirty #20
> [    0.486842] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [    0.487707] RIP: 0010:wait_for_device_probe+0x1a/0x160
> [    0.488103] Code: 00 e8 fa e4 b5 ff 8b 44 24 04 48 83 c4 08 5b c3 0f 1f 44 00 00 53 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 <0f> 0b e8 1f ac 57 00 8b 15 f1 b3 24 01 85 d2 75 3d 48 c7 c7 60 2f
> [    0.489539] RSP: 0000:ffff9c7900013ed8 EFLAGS: 00010246
> [    0.489965] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000d02
> [    0.490597] RDX: 0000000000000cc2 RSI: 0000000000000000 RDI: 000000000002e990
> [    0.491181] RBP: 0000000000000214 R08: 000000000000000f R09: 0000000000000064
> [    0.491788] R10: ffff9c7900013c6c R11: 0000000000000000 R12: ffff8964c0343640
> [    0.492384] R13: ffffffff9e51791c R14: 0000000000000000 R15: 0000000000000000
> [    0.492960] FS:  0000000000000000(0000) GS:ffff896637d00000(0000) knlGS:0000000000000000
> [    0.493658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.494501] CR2: 0000000000000000 CR3: 00000001ed20c001 CR4: 0000000000370ee0
> [    0.495621] Call Trace:
> [    0.496059]  <TASK>
> [    0.496266]  ? init_eaccess+0x3b/0x76
> [    0.496657]  prepare_namespace+0x30/0x16a
> [    0.497016]  kernel_init_freeable+0x207/0x212
> [    0.497407]  ? rest_init+0xc0/0xc0
> [    0.497714]  kernel_init+0x16/0x120
> [    0.498250]  ret_from_fork+0x1f/0x30
> [    0.498898]  </TASK>
> [    0.499307] ---[ end trace 0000000000000000 ]---
> [    0.748413] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    0.749053] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    0.749461] ata2.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
> [    0.749470] ata2.00: 732 sectors, multi 16: LBA48 NCQ (depth 32)
> [    0.749479] ata2.00: applying bridge limits
> [    0.750915] ata4: SATA link down (SStatus 0 SControl 300)
> [    0.752110] ata5: SATA link down (SStatus 0 SControl 300)
> [    0.753424] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    0.754877] ata3: SATA link down (SStatus 0 SControl 300)
> [    0.755342] ata1.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
> [    0.755377] ata1.00: 268435456 sectors, multi 16: LBA48 NCQ (depth 32)
> [    0.755387] ata1.00: applying bridge limits
> [    0.755486] ata6.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
> [    0.755492] ata6.00: 8388608 sectors, multi 16: LBA48 NCQ (depth 32)
> [    0.755500] ata6.00: applying bridge limits
> [    0.757330] ata1.00: configured for UDMA/100
> [    0.757441] ata6.00: configured for UDMA/100
> [    0.757505] ata2.00: configured for UDMA/100
> [    0.758015] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
> [    0.760542] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [    0.760669] sd 0:0:0:0: [sda] 268435456 512-byte logical blocks: (137 GB/128 GiB)
> [    0.760778] sd 0:0:0:0: [sda] Write Protect is off
> [    0.760787] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [    0.760847] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [    0.760981] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
> [    0.761319] scsi 1:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
> [    0.762808] sd 1:0:0:0: Attached scsi generic sg1 type 0
> [    0.763004] sd 1:0:0:0: [sdb] 732 512-byte logical blocks: (375 kB/366 KiB)
> [    0.763649] sd 1:0:0:0: [sdb] Write Protect is off
> [    0.763661] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [    0.763965] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [    0.764026] sd 1:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
> [    0.765923] scsi 5:0:0:0: Direct-Access     ATA      QEMU HARDDISK    ion  PQ: 0 ANSI: 5
> [    0.767987] sd 5:0:0:0: Attached scsi generic sg2 type 0
> [    0.768626] sd 5:0:0:0: [sdc] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
> [    0.769909] sd 5:0:0:0: [sdc] Write Protect is off
> [    0.769920] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [    0.770515] sd 1:0:0:0: [sdb] Attached SCSI disk
> [    0.770900] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [    0.771782] sd 5:0:0:0: [sdc] Preferred minimum I/O size 512 bytes
> [    0.773900]  sda: sda1 sda14 sda15
> [    0.774499] sd 5:0:0:0: [sdc] Attached SCSI disk
> [    0.774907] sd 0:0:0:0: [sda] Attached SCSI disk
> [    1.084011] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
> [   10.887350] md: Waiting for all devices to be available before autodetect
> [   10.887395] md: If you don't use raid, use raid=noautodetect
> [   10.887445] ------------[ cut here ]------------
> [   10.887448] WARNING: CPU: 1 PID: 1 at drivers/base/dd.c:741 wait_for_device_probe+0x1a/0x160
> [   10.893987] Modules linked in:
> [   10.894314] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.18.0-next-20220526-00004-g74f936013b08-dirty #20
> [   10.895389] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [   10.896590] RIP: 0010:wait_for_device_probe+0x1a/0x160
> [   10.897215] Code: 00 e8 fa e4 b5 ff 8b 44 24 04 48 83 c4 08 5b c3 0f 1f 44 00 00 53 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 <0f> 0b e8 1f ac 57 00 8b 15 f1 b3 24 01 85 d2 75 3d 48 c7 c7 60 2f
> [   10.899269] RSP: 0000:ffff9c7900013ec8 EFLAGS: 00010246
> [   10.899762] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
> [   10.900424] RDX: 0000000000000000 RSI: ffffffff9d998089 RDI: 00000000ffffffff
> [   10.901060] RBP: 0000000000000214 R08: 80000000ffffe1ca R09: ffff9c7900013ea8
> [   10.901839] R10: 3fffffffffffffff R11: 0000000000000000 R12: ffff8964c0343640
> [   10.902615] R13: ffffffff9e51791c R14: 0000000000000000 R15: 0000000000000000
> [   10.903371] FS:  0000000000000000(0000) GS:ffff896637c80000(0000) knlGS:0000000000000000
> [   10.904188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   10.904721] CR2: 0000000000000000 CR3: 00000001ed20c001 CR4: 0000000000370ee0
> [   10.905340] Call Trace:
> [   10.905558]  <TASK>
> [   10.905739]  md_run_setup+0x3c/0x6a
> [   10.906036]  prepare_namespace+0x35/0x16a
> [   10.906391]  kernel_init_freeable+0x207/0x212
> [   10.906748]  ? rest_init+0xc0/0xc0
> [   10.907021]  kernel_init+0x16/0x120
> [   10.907312]  ret_from_fork+0x1f/0x30
> [   10.907617]  </TASK>
> [   10.907796] ---[ end trace 0000000000000000 ]---
> [   10.908159] md: Autodetecting RAID arrays.
> [   10.908160] md: autorun ...
> [   10.908160] md: ... autorun DONE.
> [   10.911301] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
> [   10.911303] EXT4-fs (sda1): write access will be enabled during recovery
> [   11.038001] EXT4-fs (sda1): orphan cleanup on readonly fs
> [   11.038466] EXT4-fs (sda1): 2 orphan inodes deleted
> [   11.038475] EXT4-fs (sda1): recovery complete
> [   11.053033] EXT4-fs (sda1): mounted filesystem with ordered data mode. Quota mode: none.
> [   11.053074] VFS: Mounted root (ext4 filesystem) readonly on device 8:1.
> [   11.053175] devtmpfs: mounted
> [   11.061921] Freeing unused kernel image (initmem) memory: 1680K
> [   11.061930] Write protecting the kernel read-only data: 24576k
> [   11.071108] Freeing unused kernel image (text/rodata gap) memory: 2032K
> [   11.073596] Freeing unused kernel image (rodata/data gap) memory: 924K
> [   11.073609] Run /sbin/init as init process
>
>
> Kind regards,
> Niklas
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7123524a86b8..fcc1dfc877a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -939,8 +939,10 @@ 
 			[KNL] Debugging option to set a timeout in seconds for
 			deferred probe to give up waiting on dependencies to
 			probe. Only specific dependencies (subsystems or
-			drivers) that have opted in will be ignored. A timeout of 0
-			will timeout at the end of initcalls. This option will also
+			drivers) that have opted in will be ignored. A timeout
+			of 0 will timeout at the end of initcalls. If the time
+			out hasn't expired, it'll be restarted by each
+			successful driver registration. This option will also
 			dump out devices still on the deferred probe list after
 			retrying.
 
diff --git a/drivers/base/base.h b/drivers/base/base.h
index 2882af26392a..ab71403d102f 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -159,6 +159,7 @@  extern char *make_class_name(const char *name, struct kobject *kobj);
 extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
+extern void deferred_probe_extend_timeout(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f47cab21430f..603379b5f9dd 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -255,7 +255,12 @@  static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
@@ -315,6 +320,20 @@  static void deferred_probe_timeout_work_func(struct work_struct *work)
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
+void deferred_probe_extend_timeout(void)
+{
+	/*
+	 * If the work hasn't been queued yet or if the work expired, don't
+	 * start a new one.
+	 */
+	if (cancel_delayed_work(&deferred_probe_timeout_work)) {
+		schedule_delayed_work(&deferred_probe_timeout_work,
+				driver_deferred_probe_timeout * HZ);
+		pr_debug("Extended deferred probe timeout by %d secs\n",
+					driver_deferred_probe_timeout);
+	}
+}
+
 /**
  * deferred_probe_initcall() - Enable probing of deferred devices
  *
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 8c0d33e182fd..77a77b2095cd 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -177,6 +177,7 @@  int driver_register(struct device_driver *drv)
 		return ret;
 	}
 	kobject_uevent(&drv->p->kobj, KOBJ_ADD);
+	deferred_probe_extend_timeout();
 
 	return ret;
 }