mbox series

[v2,0/5] Allow for qcom-pdc, pinctrl-msm and qcom-scm drivers to be loadable as modules

Message ID 20200625001039.56174-1-john.stultz@linaro.org
Headers show
Series Allow for qcom-pdc, pinctrl-msm and qcom-scm drivers to be loadable as modules | expand

Message

John Stultz June 25, 2020, 12:10 a.m. UTC
This patch series provides exports and config tweaks to allow
the qcom-pdc, pinctrl-msm and qcom-scm drivers to be able to be
configured as permement modules (particularlly useful for the
Android Generic Kernel Image efforts).

Feedback would be appreciated!

New in v2:
* Fix up MSM_PINCTRL Kconfig dependency logic so we match
  QCOM_SCM.
* Minor tweaks and corrections suggested by Bjorn and Maulik


thanks
-john

Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: Maulik Shah <mkshah@codeaurora.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-arm-msm@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-gpio@vger.kernel.org


John Stultz (5):
  irq: irqdomain: Export irq_domain_update_bus_token
  irq: irqchip: Export irq_chip_retrigger_hierarchy and
    irq_chip_set_vcpu_affinity_parent
  irqchip: Allow QCOM_PDC to be loadable as a permanent module
  pinctrl: qcom: Allow pinctrl-msm code to be loadable as a module
  firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a
    permenent module

 drivers/firmware/Kconfig           |  2 +-
 drivers/firmware/Makefile          |  3 ++-
 drivers/firmware/qcom_scm.c        |  4 ++++
 drivers/iommu/Kconfig              |  2 ++
 drivers/irqchip/Kconfig            |  2 +-
 drivers/irqchip/qcom-pdc.c         | 31 ++++++++++++++++++++++++++++++
 drivers/pinctrl/qcom/Kconfig       | 24 ++++++++++++++++++++++-
 drivers/pinctrl/qcom/pinctrl-msm.c |  2 ++
 kernel/irq/chip.c                  |  3 ++-
 kernel/irq/irqdomain.c             |  1 +
 10 files changed, 69 insertions(+), 5 deletions(-)

Comments

John Stultz July 10, 2020, 3:28 a.m. UTC | #1
On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:
> On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:

> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> > index b510f67dfa49..714893535dd2 100644

> > --- a/drivers/iommu/Kconfig

> > +++ b/drivers/iommu/Kconfig

> > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU

> >  config ARM_SMMU

> >       tristate "ARM Ltd. System MMU (SMMU) Support"

> >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU

> > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y

> >       select IOMMU_API

> >       select IOMMU_IO_PGTABLE_LPAE

> >       select ARM_DMA_USE_IOMMU if ARM

>

> This looks like a giant hack. Is there another way to handle this?


Sorry for the slow response here.

So, I agree the syntax looks strange (requiring a comment obviously
isn't a good sign), but it's a fairly common way to ensure drivers
don't get built in if they optionally depend on another driver that
can be built as a module.
  See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||
!USB_GADGET" in various Kconfig files.

I'm open to using a different method, and in a different thread you
suggested using something like symbol_get(). I need to look into it
more, but that approach looks even more messy and prone to runtime
failures. Blocking the unwanted case at build time seems a bit cleaner
to me, even if the syntax is odd.

thanks
-john
Stephen Boyd July 10, 2020, 6:02 a.m. UTC | #2
Quoting Marc Zyngier (2020-06-27 02:37:47)
> On Sat, 27 Jun 2020 02:34:25 +0100,

> John Stultz <john.stultz@linaro.org> wrote:

> > 

> > On Fri, Jun 26, 2020 at 12:42 AM Stephen Boyd <swboyd@chromium.org> wrote:

> > >

> > >

> > > Is there any reason to use IRQCHIP_DECLARE if this can work as a

> > > platform device driver?

> > >

> > 

> > Hey! Thanks so much for the review!

> > 

> > Mostly it was done this way to minimize the change in the non-module

> > case. But if you'd rather avoid the #ifdefery I'll respin it without.

> 

> That would certainly be my own preference. In general, IRQCHIP_DECLARE

> and platform drivers should be mutually exclusive in the same driver:

> if you can delay the probing and have it as a proper platform device,

> then this should be the one true way.

> 


Does it work? I haven't looked in detail but I worry that the child
irqdomain (i.e. pinctrl-msm) would need to delay probing until this
parent irqdomain is registered. Or has the hierarchical irqdomain code
been updated to handle the parent child relationship and wait for things
to probe or be loaded?
John Stultz July 10, 2020, 10:21 p.m. UTC | #3
On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:
> On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:

> > On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:

> > > On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:

> > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> > > > index b510f67dfa49..714893535dd2 100644

> > > > --- a/drivers/iommu/Kconfig

> > > > +++ b/drivers/iommu/Kconfig

> > > > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU

> > > >  config ARM_SMMU

> > > >       tristate "ARM Ltd. System MMU (SMMU) Support"

> > > >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU

> > > > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y

> > > >       select IOMMU_API

> > > >       select IOMMU_IO_PGTABLE_LPAE

> > > >       select ARM_DMA_USE_IOMMU if ARM

> > >

> > > This looks like a giant hack. Is there another way to handle this?

> >

> > Sorry for the slow response here.

> >

> > So, I agree the syntax looks strange (requiring a comment obviously

> > isn't a good sign), but it's a fairly common way to ensure drivers

> > don't get built in if they optionally depend on another driver that

> > can be built as a module.

> >   See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||

> > !USB_GADGET" in various Kconfig files.

> >

> > I'm open to using a different method, and in a different thread you

> > suggested using something like symbol_get(). I need to look into it

> > more, but that approach looks even more messy and prone to runtime

> > failures. Blocking the unwanted case at build time seems a bit cleaner

> > to me, even if the syntax is odd.

>

> Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,

> as that driver _really_ doesn't care about SoC details like this. In other

> words, add a new entry along the lines of:

>

>         config ARM_SMMU_QCOM_IMPL

>         default y

>         #if QCOM_SCM=m this can't be =y

>         depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)

>

> and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()

> which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile

> so that we don't bother to compile arm-smmu-qcom.o in that case.

>

> Would that work?


I think this proposal still has problems with the directionality of the call.

The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o
So if qcom_scm.o is part of a module, the calling code in
arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU
needs to be a module.

I know you said the arm-smmu driver doesn't care about SoC details,
but the trouble is that currently the arm-smmu driver does directly
call the qcom-scm code. So it is a real dependency. However, if
QCOM_SCM is not configured, it calls stubs and that's ok.  In that
way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.
It looks terrible because we're used to boolean logic, but it's
ternary.

Maybe can have the ARM_SMMU_QCOM_IMPL approach you suggest above, but
that just holds the issue out at arms length, because we're still
going to need to have:
  depends on ARM_SMMU_QCOM_IMPL || !ARM_SMMU_QCOM_IMPL
in the ARM_SMMU definition, which I suspect you're wanting to avoid.

Otherwise the only thing I can think of is a deeper reworking of the
arm-smmu-impl code so that the arm-smmu-qcom code probes itself and
registers its hooks with the arm-smmu core.
That way the arm-smmu driver would not directly call any SoC specific
code (and thus have no dependencies outward). But it's probably a fair
amount of churn vs the extra depends string.

thanks
-john
John Stultz July 10, 2020, 10:44 p.m. UTC | #4
On Thu, Jul 9, 2020 at 11:02 PM Stephen Boyd <swboyd@chromium.org> wrote:
> Quoting Marc Zyngier (2020-06-27 02:37:47)

> > On Sat, 27 Jun 2020 02:34:25 +0100,

> > John Stultz <john.stultz@linaro.org> wrote:

> > >

> > > On Fri, Jun 26, 2020 at 12:42 AM Stephen Boyd <swboyd@chromium.org> wrote:

> > > >

> > > >

> > > > Is there any reason to use IRQCHIP_DECLARE if this can work as a

> > > > platform device driver?

> > > >

> > >

> > > Hey! Thanks so much for the review!

> > >

> > > Mostly it was done this way to minimize the change in the non-module

> > > case. But if you'd rather avoid the #ifdefery I'll respin it without.

> >

> > That would certainly be my own preference. In general, IRQCHIP_DECLARE

> > and platform drivers should be mutually exclusive in the same driver:

> > if you can delay the probing and have it as a proper platform device,

> > then this should be the one true way.

> >

>

> Does it work? I haven't looked in detail but I worry that the child

> irqdomain (i.e. pinctrl-msm) would need to delay probing until this

> parent irqdomain is registered. Or has the hierarchical irqdomain code

> been updated to handle the parent child relationship and wait for things

> to probe or be loaded?


So I can't say I know the underlying hardware particularly well, but
I've been using this successfully on the Dragonboard 845c with both
static builds as well as module enabled builds.
And the same patch has been in the android-mainline and android-5.4
kernels for a while without objections from QCOM.

As to the probe ordering question, Saravana can maybe speak in more
detail if it's involved in this case but the fw_devlink code has
addressed many of these sorts of ordering issues.
However, I'm not sure if I'm lucking into the right probe order, as we
have been able to boot android-mainline w/ both fw_devlink=on and
fw_devlink=off (though in the =off case, we need
deferred_probe_timeout=30 to give us a bit more time for modules to
load after init starts).

thanks
-john
Stephen Boyd July 10, 2020, 11:27 p.m. UTC | #5
Quoting John Stultz (2020-07-10 15:44:18)
> On Thu, Jul 9, 2020 at 11:02 PM Stephen Boyd <swboyd@chromium.org> wrote:

> >

> > Does it work? I haven't looked in detail but I worry that the child

> > irqdomain (i.e. pinctrl-msm) would need to delay probing until this

> > parent irqdomain is registered. Or has the hierarchical irqdomain code

> > been updated to handle the parent child relationship and wait for things

> > to probe or be loaded?

> 

> So I can't say I know the underlying hardware particularly well, but

> I've been using this successfully on the Dragonboard 845c with both

> static builds as well as module enabled builds.

> And the same patch has been in the android-mainline and android-5.4

> kernels for a while without objections from QCOM.

> 

> As to the probe ordering question, Saravana can maybe speak in more

> detail if it's involved in this case but the fw_devlink code has

> addressed many of these sorts of ordering issues.

> However, I'm not sure if I'm lucking into the right probe order, as we

> have been able to boot android-mainline w/ both fw_devlink=on and

> fw_devlink=off (though in the =off case, we need

> deferred_probe_timeout=30 to give us a bit more time for modules to

> load after init starts).

> 


Ok I looked at the code (sorry for not checking earlier) and I see this in
msm_gpio_init()

        np = of_parse_phandle(pctrl->dev->of_node, "wakeup-parent", 0);
        if (np) {
                chip->irq.parent_domain = irq_find_matching_host(np,
                                                 DOMAIN_BUS_WAKEUP);
                of_node_put(np);
                if (!chip->irq.parent_domain)
                        return -EPROBE_DEFER;

so it looks like we'll probe defer the pinctrl driver until the pdc module
loads. Meaning it should work to have pinctrl builtin and pdc as a module.
Marc Zyngier July 12, 2020, 9:27 a.m. UTC | #6
On Sat, 11 Jul 2020 00:27:45 +0100,
Stephen Boyd <swboyd@chromium.org> wrote:
> 

> Quoting John Stultz (2020-07-10 15:44:18)

> > On Thu, Jul 9, 2020 at 11:02 PM Stephen Boyd <swboyd@chromium.org> wrote:

> > >

> > > Does it work? I haven't looked in detail but I worry that the child

> > > irqdomain (i.e. pinctrl-msm) would need to delay probing until this

> > > parent irqdomain is registered. Or has the hierarchical irqdomain code

> > > been updated to handle the parent child relationship and wait for things

> > > to probe or be loaded?

> > 

> > So I can't say I know the underlying hardware particularly well, but

> > I've been using this successfully on the Dragonboard 845c with both

> > static builds as well as module enabled builds.

> > And the same patch has been in the android-mainline and android-5.4

> > kernels for a while without objections from QCOM.

> > 

> > As to the probe ordering question, Saravana can maybe speak in more

> > detail if it's involved in this case but the fw_devlink code has

> > addressed many of these sorts of ordering issues.

> > However, I'm not sure if I'm lucking into the right probe order, as we

> > have been able to boot android-mainline w/ both fw_devlink=on and

> > fw_devlink=off (though in the =off case, we need

> > deferred_probe_timeout=30 to give us a bit more time for modules to

> > load after init starts).

> > 

> 

> Ok I looked at the code (sorry for not checking earlier) and I see this in

> msm_gpio_init()

> 

>         np = of_parse_phandle(pctrl->dev->of_node, "wakeup-parent", 0);

>         if (np) {

>                 chip->irq.parent_domain = irq_find_matching_host(np,

>                                                  DOMAIN_BUS_WAKEUP);

>                 of_node_put(np);

>                 if (!chip->irq.parent_domain)

>                         return -EPROBE_DEFER;

> 

> so it looks like we'll probe defer the pinctrl driver until the pdc module

> loads. Meaning it should work to have pinctrl builtin and pdc as a module.


What I hope is that eventually fw_devlink will become the norm (on by
default), and that probe deferral will become a thing of the past.

	M.

-- 
Without deviation from the norm, progress is not possible.
Will Deacon July 13, 2020, 8:41 p.m. UTC | #7
On Fri, Jul 10, 2020 at 03:21:53PM -0700, John Stultz wrote:
> On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:

> > On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:

> > > On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:

> > > > On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:

> > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> > > > > index b510f67dfa49..714893535dd2 100644

> > > > > --- a/drivers/iommu/Kconfig

> > > > > +++ b/drivers/iommu/Kconfig

> > > > > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU

> > > > >  config ARM_SMMU

> > > > >       tristate "ARM Ltd. System MMU (SMMU) Support"

> > > > >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU

> > > > > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y

> > > > >       select IOMMU_API

> > > > >       select IOMMU_IO_PGTABLE_LPAE

> > > > >       select ARM_DMA_USE_IOMMU if ARM

> > > >

> > > > This looks like a giant hack. Is there another way to handle this?

> > >

> > > Sorry for the slow response here.

> > >

> > > So, I agree the syntax looks strange (requiring a comment obviously

> > > isn't a good sign), but it's a fairly common way to ensure drivers

> > > don't get built in if they optionally depend on another driver that

> > > can be built as a module.

> > >   See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||

> > > !USB_GADGET" in various Kconfig files.

> > >

> > > I'm open to using a different method, and in a different thread you

> > > suggested using something like symbol_get(). I need to look into it

> > > more, but that approach looks even more messy and prone to runtime

> > > failures. Blocking the unwanted case at build time seems a bit cleaner

> > > to me, even if the syntax is odd.

> >

> > Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,

> > as that driver _really_ doesn't care about SoC details like this. In other

> > words, add a new entry along the lines of:

> >

> >         config ARM_SMMU_QCOM_IMPL

> >         default y

> >         #if QCOM_SCM=m this can't be =y

> >         depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)

> >

> > and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()

> > which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile

> > so that we don't bother to compile arm-smmu-qcom.o in that case.

> >

> > Would that work?

> 

> I think this proposal still has problems with the directionality of the call.

> 

> The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o

> So if qcom_scm.o is part of a module, the calling code in

> arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU

> needs to be a module.

> 

> I know you said the arm-smmu driver doesn't care about SoC details,

> but the trouble is that currently the arm-smmu driver does directly

> call the qcom-scm code. So it is a real dependency. However, if

> QCOM_SCM is not configured, it calls stubs and that's ok.  In that

> way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.

> It looks terrible because we're used to boolean logic, but it's

> ternary.


Yes, it looks ugly, but the part I really have issues with is that building
QCOM_SCM=m and ARM_SMMU=y is perfectly fine if you don't run on an SoC
with the qcom implementation. I don't see why we need to enforce things
here beyond making sure that all selectable permutations _build_ and
fail gracefully at runtime on the qcom SoC if SCM isn't available.

Will
Will Deacon July 14, 2020, 7:56 a.m. UTC | #8
On Mon, Jul 13, 2020 at 01:48:29PM -0700, John Stultz wrote:
> On Mon, Jul 13, 2020 at 1:41 PM Will Deacon <will@kernel.org> wrote:

> > On Fri, Jul 10, 2020 at 03:21:53PM -0700, John Stultz wrote:

> > > On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:

> > > > On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:

> > > > > On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:

> > > > > > On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:

> > > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> > > > > > > index b510f67dfa49..714893535dd2 100644

> > > > > > > --- a/drivers/iommu/Kconfig

> > > > > > > +++ b/drivers/iommu/Kconfig

> > > > > > > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU

> > > > > > >  config ARM_SMMU

> > > > > > >       tristate "ARM Ltd. System MMU (SMMU) Support"

> > > > > > >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU

> > > > > > > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y

> > > > > > >       select IOMMU_API

> > > > > > >       select IOMMU_IO_PGTABLE_LPAE

> > > > > > >       select ARM_DMA_USE_IOMMU if ARM

> > > > > >

> > > > > > This looks like a giant hack. Is there another way to handle this?

> > > > >

> > > > > Sorry for the slow response here.

> > > > >

> > > > > So, I agree the syntax looks strange (requiring a comment obviously

> > > > > isn't a good sign), but it's a fairly common way to ensure drivers

> > > > > don't get built in if they optionally depend on another driver that

> > > > > can be built as a module.

> > > > >   See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||

> > > > > !USB_GADGET" in various Kconfig files.

> > > > >

> > > > > I'm open to using a different method, and in a different thread you

> > > > > suggested using something like symbol_get(). I need to look into it

> > > > > more, but that approach looks even more messy and prone to runtime

> > > > > failures. Blocking the unwanted case at build time seems a bit cleaner

> > > > > to me, even if the syntax is odd.

> > > >

> > > > Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,

> > > > as that driver _really_ doesn't care about SoC details like this. In other

> > > > words, add a new entry along the lines of:

> > > >

> > > >         config ARM_SMMU_QCOM_IMPL

> > > >         default y

> > > >         #if QCOM_SCM=m this can't be =y

> > > >         depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)

> > > >

> > > > and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()

> > > > which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile

> > > > so that we don't bother to compile arm-smmu-qcom.o in that case.

> > > >

> > > > Would that work?

> > >

> > > I think this proposal still has problems with the directionality of the call.

> > >

> > > The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o

> > > So if qcom_scm.o is part of a module, the calling code in

> > > arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU

> > > needs to be a module.

> > >

> > > I know you said the arm-smmu driver doesn't care about SoC details,

> > > but the trouble is that currently the arm-smmu driver does directly

> > > call the qcom-scm code. So it is a real dependency. However, if

> > > QCOM_SCM is not configured, it calls stubs and that's ok.  In that

> > > way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.

> > > It looks terrible because we're used to boolean logic, but it's

> > > ternary.

> >

> > Yes, it looks ugly, but the part I really have issues with is that building

> > QCOM_SCM=m and ARM_SMMU=y is perfectly fine if you don't run on an SoC

> > with the qcom implementation. I don't see why we need to enforce things

> > here beyond making sure that all selectable permutations _build_ and

> > fail gracefully at runtime on the qcom SoC if SCM isn't available.

> 

> Ok. Thanks, that context/rationale makes sense to me now!  I'll dig in

> and see if I can figure out the runtime get_symbol handling you

> suggested for the scm callout.


Cheers, but in the interest of not being a massive pain in the bum, please
yell if it ends up being tonnes of work and I'll close my eyes while
applying your original patch.

Will
John Stultz Oct. 28, 2020, 5:53 a.m. UTC | #9
On Mon, Jul 13, 2020 at 1:41 PM Will Deacon <will@kernel.org> wrote:
> On Fri, Jul 10, 2020 at 03:21:53PM -0700, John Stultz wrote:
> > On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:
> > > On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:
> > > > On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:
> > > > > On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:
> > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > > > index b510f67dfa49..714893535dd2 100644
> > > > > > --- a/drivers/iommu/Kconfig
> > > > > > +++ b/drivers/iommu/Kconfig
> > > > > > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU
> > > > > >  config ARM_SMMU
> > > > > >       tristate "ARM Ltd. System MMU (SMMU) Support"
> > > > > >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU
> > > > > > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
> > > > > >       select IOMMU_API
> > > > > >       select IOMMU_IO_PGTABLE_LPAE
> > > > > >       select ARM_DMA_USE_IOMMU if ARM
> > > > >
> > > > > This looks like a giant hack. Is there another way to handle this?
> > > >
> > > > Sorry for the slow response here.
> > > >
> > > > So, I agree the syntax looks strange (requiring a comment obviously
> > > > isn't a good sign), but it's a fairly common way to ensure drivers
> > > > don't get built in if they optionally depend on another driver that
> > > > can be built as a module.
> > > >   See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||
> > > > !USB_GADGET" in various Kconfig files.
> > > >
> > > > I'm open to using a different method, and in a different thread you
> > > > suggested using something like symbol_get(). I need to look into it
> > > > more, but that approach looks even more messy and prone to runtime
> > > > failures. Blocking the unwanted case at build time seems a bit cleaner
> > > > to me, even if the syntax is odd.
> > >
> > > Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,
> > > as that driver _really_ doesn't care about SoC details like this. In other
> > > words, add a new entry along the lines of:
> > >
> > >         config ARM_SMMU_QCOM_IMPL
> > >         default y
> > >         #if QCOM_SCM=m this can't be =y
> > >         depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)
> > >
> > > and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()
> > > which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile
> > > so that we don't bother to compile arm-smmu-qcom.o in that case.
> > >
> > > Would that work?
> >
> > I think this proposal still has problems with the directionality of the call.
> >
> > The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o
> > So if qcom_scm.o is part of a module, the calling code in
> > arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU
> > needs to be a module.
> >
> > I know you said the arm-smmu driver doesn't care about SoC details,
> > but the trouble is that currently the arm-smmu driver does directly
> > call the qcom-scm code. So it is a real dependency. However, if
> > QCOM_SCM is not configured, it calls stubs and that's ok.  In that
> > way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.
> > It looks terrible because we're used to boolean logic, but it's
> > ternary.
>
> Yes, it looks ugly, but the part I really have issues with is that building
> QCOM_SCM=m and ARM_SMMU=y is perfectly fine if you don't run on an SoC
> with the qcom implementation. I don't see why we need to enforce things
> here beyond making sure that all selectable permutations _build_ and
> fail gracefully at runtime on the qcom SoC if SCM isn't available.

Hey Will,
  Sorry to dredge up this old thread. I've been off busy with other
things and didn't get around to trying to rework this until now.

Unfortunately I'm still having some trouble coming up with a better
solution. Initially I figured I'd rework the qcom_scm driver to, so
that we have the various qcom_scm_* as inline functions, which call
out to function pointers that the qcom_scm driver would register when
the module loaded (Oof, and unfortunately there are a *ton* qcom_scm_*
functions so its a bunch of churn).

The trouble I realized with that approach is that if the ARM_SMMU code
is built in, then it may try to use the qcom_scm code before the
module loads and sets those function pointers. So while it would build
ok, the issue would be when the arm_smmu_device_reset() is done by
done on arm_smmu_device_probe(), it wouldn't actually call the right
code.  There isn't a really good way to deal with the module loading
at some random time after arm_smmu_device_probe() completes.

This is the benefit of the module symbol dependency tracking: If the
arm_smmu.ko calls symbols in qcom_scm.ko then qcom_scm.ko has to load
first.
But if arm_smmu is built in, I haven't found a clear mechanism to
force qcom_scm to load before we probe, if it's configured as a
module.

I also looked into the idea of reworking the arm-smmu-impl code to be
modular instead, and while it does provide a similar method of using
function pointers to minimize the amount of symbols the arm-smmu code
needs to know about, the initialization call path is
arm_smmu_device_probe -> arm_smmu_impl_init -> qcom_smmu_impl_init. So
it doesn't really allow for dynamic registration of implementation
modules at runtime.

So I'm sort of stewing on maybe trying to rework the directionality,
so the arm-smmu-qcom.o code probes and calls arm_smmu_impl_init and
that is what initializes the arm_smmu_device_probe logic?

Alternatively, I'm considering trying to switch the module dependency
annotation so that the CONFIG_QCOM_SCM modularity depends on ARM_SMMU
being a module. But that is sort of putting the restriction on the
callee instead of the caller (sort of flipping the meaning of the
depends), which feels prone to later trouble (and with multiple users
of CONFIG_QCOM_SCM needing similar treatment, it would make it
difficult to discover the right combination of configs needed to allow
it to be a module).

Anyway, I wanted to reach out to see if you had any further ideas
here. Sorry for letting such a large time gap pass!

thanks
-john
Will Deacon Oct. 28, 2020, 1:51 p.m. UTC | #10
On Tue, Oct 27, 2020 at 10:53:47PM -0700, John Stultz wrote:
> On Mon, Jul 13, 2020 at 1:41 PM Will Deacon <will@kernel.org> wrote:
> > On Fri, Jul 10, 2020 at 03:21:53PM -0700, John Stultz wrote:
> > > On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:
> > > > On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:
> > > > > On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:
> > > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > > > > index b510f67dfa49..714893535dd2 100644
> > > > > > > --- a/drivers/iommu/Kconfig
> > > > > > > +++ b/drivers/iommu/Kconfig
> > > > > > > @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU
> > > > > > >  config ARM_SMMU
> > > > > > >       tristate "ARM Ltd. System MMU (SMMU) Support"
> > > > > > >       depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU
> > > > > > > +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
> > > > > > >       select IOMMU_API
> > > > > > >       select IOMMU_IO_PGTABLE_LPAE
> > > > > > >       select ARM_DMA_USE_IOMMU if ARM
> > > > > >
> > > > > > This looks like a giant hack. Is there another way to handle this?
> > > > >
> > > > > Sorry for the slow response here.
> > > > >
> > > > > So, I agree the syntax looks strange (requiring a comment obviously
> > > > > isn't a good sign), but it's a fairly common way to ensure drivers
> > > > > don't get built in if they optionally depend on another driver that
> > > > > can be built as a module.
> > > > >   See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||
> > > > > !USB_GADGET" in various Kconfig files.
> > > > >
> > > > > I'm open to using a different method, and in a different thread you
> > > > > suggested using something like symbol_get(). I need to look into it
> > > > > more, but that approach looks even more messy and prone to runtime
> > > > > failures. Blocking the unwanted case at build time seems a bit cleaner
> > > > > to me, even if the syntax is odd.
> > > >
> > > > Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,
> > > > as that driver _really_ doesn't care about SoC details like this. In other
> > > > words, add a new entry along the lines of:
> > > >
> > > >         config ARM_SMMU_QCOM_IMPL
> > > >         default y
> > > >         #if QCOM_SCM=m this can't be =y
> > > >         depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)
> > > >
> > > > and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()
> > > > which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile
> > > > so that we don't bother to compile arm-smmu-qcom.o in that case.
> > > >
> > > > Would that work?
> > >
> > > I think this proposal still has problems with the directionality of the call.
> > >
> > > The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o
> > > So if qcom_scm.o is part of a module, the calling code in
> > > arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU
> > > needs to be a module.
> > >
> > > I know you said the arm-smmu driver doesn't care about SoC details,
> > > but the trouble is that currently the arm-smmu driver does directly
> > > call the qcom-scm code. So it is a real dependency. However, if
> > > QCOM_SCM is not configured, it calls stubs and that's ok.  In that
> > > way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.
> > > It looks terrible because we're used to boolean logic, but it's
> > > ternary.
> >
> > Yes, it looks ugly, but the part I really have issues with is that building
> > QCOM_SCM=m and ARM_SMMU=y is perfectly fine if you don't run on an SoC
> > with the qcom implementation. I don't see why we need to enforce things
> > here beyond making sure that all selectable permutations _build_ and
> > fail gracefully at runtime on the qcom SoC if SCM isn't available.
> 
> Hey Will,
>   Sorry to dredge up this old thread. I've been off busy with other
> things and didn't get around to trying to rework this until now.
> 
> Unfortunately I'm still having some trouble coming up with a better
> solution. Initially I figured I'd rework the qcom_scm driver to, so
> that we have the various qcom_scm_* as inline functions, which call
> out to function pointers that the qcom_scm driver would register when
> the module loaded (Oof, and unfortunately there are a *ton* qcom_scm_*
> functions so its a bunch of churn).
> 
> The trouble I realized with that approach is that if the ARM_SMMU code
> is built in, then it may try to use the qcom_scm code before the
> module loads and sets those function pointers. So while it would build
> ok, the issue would be when the arm_smmu_device_reset() is done by
> done on arm_smmu_device_probe(), it wouldn't actually call the right
> code.  There isn't a really good way to deal with the module loading
> at some random time after arm_smmu_device_probe() completes.
> 
> This is the benefit of the module symbol dependency tracking: If the
> arm_smmu.ko calls symbols in qcom_scm.ko then qcom_scm.ko has to load
> first.
> But if arm_smmu is built in, I haven't found a clear mechanism to
> force qcom_scm to load before we probe, if it's configured as a
> module.
> 
> I also looked into the idea of reworking the arm-smmu-impl code to be
> modular instead, and while it does provide a similar method of using
> function pointers to minimize the amount of symbols the arm-smmu code
> needs to know about, the initialization call path is
> arm_smmu_device_probe -> arm_smmu_impl_init -> qcom_smmu_impl_init. So
> it doesn't really allow for dynamic registration of implementation
> modules at runtime.
> 
> So I'm sort of stewing on maybe trying to rework the directionality,
> so the arm-smmu-qcom.o code probes and calls arm_smmu_impl_init and
> that is what initializes the arm_smmu_device_probe logic?
> 
> Alternatively, I'm considering trying to switch the module dependency
> annotation so that the CONFIG_QCOM_SCM modularity depends on ARM_SMMU
> being a module. But that is sort of putting the restriction on the
> callee instead of the caller (sort of flipping the meaning of the
> depends), which feels prone to later trouble (and with multiple users
> of CONFIG_QCOM_SCM needing similar treatment, it would make it
> difficult to discover the right combination of configs needed to allow
> it to be a module).
> 
> Anyway, I wanted to reach out to see if you had any further ideas
> here. Sorry for letting such a large time gap pass!

Well we can always go with your original hack, if it helps?

https://lore.kernel.org/linux-iommu/20200714075603.GE4277@willie-the-truck/

Will
Robin Murphy Oct. 28, 2020, 2:51 p.m. UTC | #11
On 2020-10-28 13:51, Will Deacon wrote:
> On Tue, Oct 27, 2020 at 10:53:47PM -0700, John Stultz wrote:
>> On Mon, Jul 13, 2020 at 1:41 PM Will Deacon <will@kernel.org> wrote:
>>> On Fri, Jul 10, 2020 at 03:21:53PM -0700, John Stultz wrote:
>>>> On Fri, Jul 10, 2020 at 12:54 AM Will Deacon <will@kernel.org> wrote:
>>>>> On Thu, Jul 09, 2020 at 08:28:45PM -0700, John Stultz wrote:
>>>>>> On Thu, Jul 2, 2020 at 7:18 AM Will Deacon <will@kernel.org> wrote:
>>>>>>> On Thu, Jun 25, 2020 at 12:10:39AM +0000, John Stultz wrote:
>>>>>>>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>>>>>>>> index b510f67dfa49..714893535dd2 100644
>>>>>>>> --- a/drivers/iommu/Kconfig
>>>>>>>> +++ b/drivers/iommu/Kconfig
>>>>>>>> @@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU
>>>>>>>>   config ARM_SMMU
>>>>>>>>        tristate "ARM Ltd. System MMU (SMMU) Support"
>>>>>>>>        depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU
>>>>>>>> +     depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
>>>>>>>>        select IOMMU_API
>>>>>>>>        select IOMMU_IO_PGTABLE_LPAE
>>>>>>>>        select ARM_DMA_USE_IOMMU if ARM
>>>>>>>
>>>>>>> This looks like a giant hack. Is there another way to handle this?
>>>>>>
>>>>>> Sorry for the slow response here.
>>>>>>
>>>>>> So, I agree the syntax looks strange (requiring a comment obviously
>>>>>> isn't a good sign), but it's a fairly common way to ensure drivers
>>>>>> don't get built in if they optionally depend on another driver that
>>>>>> can be built as a module.
>>>>>>    See "RFKILL || !RFKILL", "EXTCON || !EXTCON", or "USB_GADGET ||
>>>>>> !USB_GADGET" in various Kconfig files.
>>>>>>
>>>>>> I'm open to using a different method, and in a different thread you
>>>>>> suggested using something like symbol_get(). I need to look into it
>>>>>> more, but that approach looks even more messy and prone to runtime
>>>>>> failures. Blocking the unwanted case at build time seems a bit cleaner
>>>>>> to me, even if the syntax is odd.
>>>>>
>>>>> Maybe just split it out then, so that the ARM_SMMU entry doesn't have this,
>>>>> as that driver _really_ doesn't care about SoC details like this. In other
>>>>> words, add a new entry along the lines of:
>>>>>
>>>>>          config ARM_SMMU_QCOM_IMPL
>>>>>          default y
>>>>>          #if QCOM_SCM=m this can't be =y
>>>>>          depends on ARM_SMMU & (QCOM_SCM || !QCOM_SCM)
>>>>>
>>>>> and then have arm-smmu.h provide a static inline qcom_smmu_impl_init()
>>>>> which returns -ENODEV if CONFIG_ARM_SMMU_QCOM_IMPL=n and hack the Makefile
>>>>> so that we don't bother to compile arm-smmu-qcom.o in that case.
>>>>>
>>>>> Would that work?
>>>>
>>>> I think this proposal still has problems with the directionality of the call.
>>>>
>>>> The arm-smmu-impl.o calls to arm-smmu-qcom.o which calls qcom_scm.o
>>>> So if qcom_scm.o is part of a module, the calling code in
>>>> arm-smmu-qcom.o also needs to be a module, which means CONFIG_ARM_SMMU
>>>> needs to be a module.
>>>>
>>>> I know you said the arm-smmu driver doesn't care about SoC details,
>>>> but the trouble is that currently the arm-smmu driver does directly
>>>> call the qcom-scm code. So it is a real dependency. However, if
>>>> QCOM_SCM is not configured, it calls stubs and that's ok.  In that
>>>> way, the "depends on QCOM_SCM || !QCOM_SCM" line actually makes sense.
>>>> It looks terrible because we're used to boolean logic, but it's
>>>> ternary.
>>>
>>> Yes, it looks ugly, but the part I really have issues with is that building
>>> QCOM_SCM=m and ARM_SMMU=y is perfectly fine if you don't run on an SoC
>>> with the qcom implementation. I don't see why we need to enforce things
>>> here beyond making sure that all selectable permutations _build_ and
>>> fail gracefully at runtime on the qcom SoC if SCM isn't available.
>>
>> Hey Will,
>>    Sorry to dredge up this old thread. I've been off busy with other
>> things and didn't get around to trying to rework this until now.
>>
>> Unfortunately I'm still having some trouble coming up with a better
>> solution. Initially I figured I'd rework the qcom_scm driver to, so
>> that we have the various qcom_scm_* as inline functions, which call
>> out to function pointers that the qcom_scm driver would register when
>> the module loaded (Oof, and unfortunately there are a *ton* qcom_scm_*
>> functions so its a bunch of churn).
>>
>> The trouble I realized with that approach is that if the ARM_SMMU code
>> is built in, then it may try to use the qcom_scm code before the
>> module loads and sets those function pointers. So while it would build
>> ok, the issue would be when the arm_smmu_device_reset() is done by
>> done on arm_smmu_device_probe(), it wouldn't actually call the right
>> code.  There isn't a really good way to deal with the module loading
>> at some random time after arm_smmu_device_probe() completes.
>>
>> This is the benefit of the module symbol dependency tracking: If the
>> arm_smmu.ko calls symbols in qcom_scm.ko then qcom_scm.ko has to load
>> first.
>> But if arm_smmu is built in, I haven't found a clear mechanism to
>> force qcom_scm to load before we probe, if it's configured as a
>> module.
>>
>> I also looked into the idea of reworking the arm-smmu-impl code to be
>> modular instead, and while it does provide a similar method of using
>> function pointers to minimize the amount of symbols the arm-smmu code
>> needs to know about, the initialization call path is
>> arm_smmu_device_probe -> arm_smmu_impl_init -> qcom_smmu_impl_init. So
>> it doesn't really allow for dynamic registration of implementation
>> modules at runtime.
>>
>> So I'm sort of stewing on maybe trying to rework the directionality,
>> so the arm-smmu-qcom.o code probes and calls arm_smmu_impl_init and
>> that is what initializes the arm_smmu_device_probe logic?
>>
>> Alternatively, I'm considering trying to switch the module dependency
>> annotation so that the CONFIG_QCOM_SCM modularity depends on ARM_SMMU
>> being a module. But that is sort of putting the restriction on the
>> callee instead of the caller (sort of flipping the meaning of the
>> depends), which feels prone to later trouble (and with multiple users
>> of CONFIG_QCOM_SCM needing similar treatment, it would make it
>> difficult to discover the right combination of configs needed to allow
>> it to be a module).
>>
>> Anyway, I wanted to reach out to see if you had any further ideas
>> here. Sorry for letting such a large time gap pass!
> 
> Well we can always go with your original hack, if it helps?
> 
> https://lore.kernel.org/linux-iommu/20200714075603.GE4277@willie-the-truck/

Hmm, perhaps I'm missing something here, but even if the config options 
*do* line up, what prevents arm-smmu probing before qcom-scm and 
dereferencing NULL in qcom_scm_qsmmu500_wait_safe_toggle() before __scm 
is initialised?

Robin.
John Stultz Oct. 30, 2020, 1:02 a.m. UTC | #12
On Wed, Oct 28, 2020 at 7:51 AM Robin Murphy <robin.murphy@arm.com> wrote:
> Hmm, perhaps I'm missing something here, but even if the config options
> *do* line up, what prevents arm-smmu probing before qcom-scm and
> dereferencing NULL in qcom_scm_qsmmu500_wait_safe_toggle() before __scm
> is initialised?

Oh man, this spun me on a "wait, but how does it all work!" trip. :)

So in the non-module case, the qcom_scm driver is a subsys_initcall
and the arm-smmu is a module_platform_driver, so the ordering works
out.

In the module case, the arm-smmu code isn't loaded until the qcom_scm
driver finishes probing due to the symbol dependency handling.

To double check this, I added a big msleep at the top of the
qcom_scm_probe to try to open the race window you described, but the
arm_smmu_device_probe() doesn't run until after qcom_scm_probe
completes.

So at least as a built in / built in, or a module/module case its ok.
And in the case where arm-smmu is a module and qcom_scm is built in
that's ok too.

Its just the case my patch is trying to prevent is where arm-smmu is
built in, but qcom_scm is a module that it can't work (due to build
errors in missing symbols,  or if we tried to use function pointers to
plug in the qcom_scm - the lack of initialization ordering).

Hopefully that addresses your concern? Let me know if I'm still
missing something.

thanks
-john
Robin Murphy Oct. 30, 2020, 2:12 p.m. UTC | #13
On 2020-10-30 01:02, John Stultz wrote:
> On Wed, Oct 28, 2020 at 7:51 AM Robin Murphy <robin.murphy@arm.com> wrote:

>> Hmm, perhaps I'm missing something here, but even if the config options

>> *do* line up, what prevents arm-smmu probing before qcom-scm and

>> dereferencing NULL in qcom_scm_qsmmu500_wait_safe_toggle() before __scm

>> is initialised?

> 

> Oh man, this spun me on a "wait, but how does it all work!" trip. :)

> 

> So in the non-module case, the qcom_scm driver is a subsys_initcall

> and the arm-smmu is a module_platform_driver, so the ordering works

> out.

> 

> In the module case, the arm-smmu code isn't loaded until the qcom_scm

> driver finishes probing due to the symbol dependency handling.


My point is that module load != driver probe. AFAICS you could disable 
drivers_autoprobe, load both modules, bind the SMMU to its driver first, 
and boom!

> To double check this, I added a big msleep at the top of the

> qcom_scm_probe to try to open the race window you described, but the

> arm_smmu_device_probe() doesn't run until after qcom_scm_probe

> completes.


I don't think asynchronous probing is enabled by default, so indeed I 
would expect that to still happen to work ;)

> So at least as a built in / built in, or a module/module case its ok.

> And in the case where arm-smmu is a module and qcom_scm is built in

> that's ok too.


In the built-in case, I'm sure it happens to work out similarly because 
the order of nodes in the DTB tends to be the order in which devices are 
autoprobed. Again, async probe might be enough to break things; manually 
binding drivers definitely should; moving the firmware node to the end 
of the DTB probably would as well.

> Its just the case my patch is trying to prevent is where arm-smmu is

> built in, but qcom_scm is a module that it can't work (due to build

> errors in missing symbols,  or if we tried to use function pointers to

> plug in the qcom_scm - the lack of initialization ordering).

> 

> Hopefully that addresses your concern? Let me know if I'm still

> missing something.


What I was dancing around is that the SCM API (and/or its users) appears 
to need a general way to tell whether SCM is ready or not, because the 
initialisation ordering problem is there anyway. Neither Kconfig nor the 
module loader can solve that.

One possible self-contained workaround would be to see if an SCM DT node 
exists, see if a corresponding device exists, and see if that device has 
a driver bound. It's a little ugly, and strictly it still doesn't tell 
you that the _right_ driver is bound, but at least it's a lot more 
robust than implicitly relying on DT order, default probing behaviours, 
and hope.

Robin.
John Stultz Oct. 31, 2020, 12:12 a.m. UTC | #14
On Fri, Oct 30, 2020 at 7:12 AM Robin Murphy <robin.murphy@arm.com> wrote:
> On 2020-10-30 01:02, John Stultz wrote:

> > On Wed, Oct 28, 2020 at 7:51 AM Robin Murphy <robin.murphy@arm.com> wrote:

> >> Hmm, perhaps I'm missing something here, but even if the config options

> >> *do* line up, what prevents arm-smmu probing before qcom-scm and

> >> dereferencing NULL in qcom_scm_qsmmu500_wait_safe_toggle() before __scm

> >> is initialised?

> >

> > Oh man, this spun me on a "wait, but how does it all work!" trip. :)

> >

> > So in the non-module case, the qcom_scm driver is a subsys_initcall

> > and the arm-smmu is a module_platform_driver, so the ordering works

> > out.

> >

> > In the module case, the arm-smmu code isn't loaded until the qcom_scm

> > driver finishes probing due to the symbol dependency handling.

>

> My point is that module load != driver probe. AFAICS you could disable

> drivers_autoprobe, load both modules, bind the SMMU to its driver first,

> and boom!

>

> > To double check this, I added a big msleep at the top of the

> > qcom_scm_probe to try to open the race window you described, but the

> > arm_smmu_device_probe() doesn't run until after qcom_scm_probe

> > completes.

>

> I don't think asynchronous probing is enabled by default, so indeed I

> would expect that to still happen to work ;)

>

> > So at least as a built in / built in, or a module/module case its ok.

> > And in the case where arm-smmu is a module and qcom_scm is built in

> > that's ok too.

>

> In the built-in case, I'm sure it happens to work out similarly because

> the order of nodes in the DTB tends to be the order in which devices are

> autoprobed. Again, async probe might be enough to break things; manually

> binding drivers definitely should; moving the firmware node to the end

> of the DTB probably would as well.


So, with modules, I turned on async probing for the two drivers, as
well as moved the firmware node to the end of the dtb, and couldn't
manage to trip it up even with a 6 second delay in the qcom_scm probe
function.

It was only when doing the same with everything built in that I did
manage to trigger the issue. So yes, you're right!  And it is an issue
more easily tripped with everything built in statically (and not
really connected to this patch series).

> > Its just the case my patch is trying to prevent is where arm-smmu is

> > built in, but qcom_scm is a module that it can't work (due to build

> > errors in missing symbols,  or if we tried to use function pointers to

> > plug in the qcom_scm - the lack of initialization ordering).

> >

> > Hopefully that addresses your concern? Let me know if I'm still

> > missing something.

>

> What I was dancing around is that the SCM API (and/or its users) appears

> to need a general way to tell whether SCM is ready or not, because the

> initialisation ordering problem is there anyway. Neither Kconfig nor the

> module loader can solve that.


Hrm... There is qcom_scm_is_available().  I tried adding a check for
that in qcom_smmu_impl_init() and return EPROBE_DEFER if it fails, but
I then hit a separate crash (tripping in iommu_group_remove_device on
a null dev->iommu_group value). But after adding a check for that, it
seems to be working...

I'll try to spin up a separate patch here for that in a second.

thanks
-john