
[v2,1/4] remoteproc: Introduce auto-boot flag

Message ID 1470952373-4823-1-git-send-email-bjorn.andersson@linaro.org
State Accepted
Commit ddf711872c9d2b05b0fb25db3e6e0c2a50be39e3

Commit Message

Bjorn Andersson Aug. 11, 2016, 9:52 p.m. UTC
Introduce an "auto-boot" flag on rprocs to make it possible to flag
remote processors without vdevs to automatically boot once the firmware
is found.

Preserve previous behavior of the wkup_m3 processor being explicitly
booted by a consumer.

Cc: Lee Jones <lee.jones@linaro.org>
Cc: Loic Pallardy <loic.pallardy@st.com>
Cc: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

---

Changes since v1:
- s/always_on/auto_boot/
- Fixed double increment of "power" in recover path
- Marked wkup_m3 to not auto_boot

 drivers/remoteproc/remoteproc_core.c   | 28 +++++++++++++++++++++++++++-
 drivers/remoteproc/remoteproc_virtio.c | 13 -------------
 drivers/remoteproc/wkup_m3_rproc.c     |  2 ++
 include/linux/remoteproc.h             |  1 +
 4 files changed, 30 insertions(+), 14 deletions(-)

-- 
2.5.0
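
For readers unfamiliar with the wkup_m3 arrangement referenced above, here is a
minimal sketch (not part of this patch) of how a consumer driver explicitly
boots a remote processor that leaves auto_boot disabled; the device-tree
property name and error handling are assumptions for illustration only.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/of.h>
#include <linux/remoteproc.h>

/* Hedged sketch: explicit, consumer-driven boot of an auto_boot = false rproc */
static struct rproc *consumer_boot_rproc(struct device *dev)
{
        struct rproc *rproc;
        phandle ph;
        int ret;

        /* "ti,rproc" is an illustrative property name, not from the patch */
        if (of_property_read_u32(dev->of_node, "ti,rproc", &ph))
                return ERR_PTR(-ENODEV);

        rproc = rproc_get_by_phandle(ph);
        if (!rproc)
                return ERR_PTR(-EPROBE_DEFER);

        ret = rproc_boot(rproc);        /* explicit boot by the consumer */
        if (ret) {
                rproc_put(rproc);
                return ERR_PTR(ret);
        }

        return rproc;   /* caller later does rproc_shutdown() + rproc_put() */
}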

Comments

Suman Anna Aug. 31, 2016, 6:27 p.m. UTC | #1
Hi Bjorn,

On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
> Introduce an "auto-boot" flag on rprocs to make it possible to flag
> remote processors without vdevs to automatically boot once the firmware
> is found.
> 
> Preserve previous behavior of the wkup_m3 processor being explicitly
> booted by a consumer.
> 
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Loic Pallardy <loic.pallardy@st.com>
> Cc: Suman Anna <s-anna@ti.com>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> 
> Changes since v1:
> - s/always_on/auto_boot/
> - Fixed double increment of "power" in recover path
> - Marked wkup_m3 to not auto_boot
> 

I am seeing various issues with this series as I test it more thoroughly
with various TI remoteproc drivers and IPC stacks based on virtio
devices. I use very simple firmware images that publish the
rpmsg-client-sample devices, so that I can use the kernel
rpmsg_client_sample module to test the communication.

Here's a summary of the main issues:

1. rproc_boot holds a module reference count on the remoteproc platform
driver so that it cannot be removed while a remote processor is booted.
The previous architecture allowed virtio_rpmsg_bus and the platform
remoteproc driver to be installed independently of each other, with the
boot actually being triggered when virtio_rpmsg_bus gets probed in
find_vqs. The boot now happens as soon as the platform remoteproc driver
is installed, independent of virtio_rpmsg_bus, and results in the driver
holding a reference count on itself. This makes it impossible to remove
the remoteproc platform module cleanly (we shouldn't be imposing a
forced remove), which means we can't stop the remoteprocs properly.

2. The reversal of boot order between virtio_rpmsg_bus and the remoteproc
core also means that the virtio devices, and therefore the memory for the
vrings, are only allocated when virtio_rpmsg_bus is probed in find_vqs(),
yet the remoteproc can now be booted without the virtio_rpmsg_bus module
installed. We publish the allocated dma addresses of the vrings in the
resource table, but now the remote processor is up even before these
values are filled in. I actually had to move rproc_alloc_vring up
alongside rproc_parse_vring to get the communication up.

3. Previously, the remoteproc platform driver could not be removed while
the corresponding virtio devices were probed/configured, and all of the
communication flow w.r.t. rpmsg channel publishing followed from the
remoteproc boot. These channels are child devices of the parent virtio
devices, and since the remoteproc boot/shutdown followed the virtio
device probe/removal lifecycle, the rpmsg channels' life cycle followed
that of the parent virtio device. My communication paths now fail if I
remove virtio_rpmsg_bus and insmod it again, as the vrings and vring
buffers are configured again while the remoteproc is still running.
Also, since the remoteproc is not rebooted, the previously published
rpmsg channels are stale and won't get recreated.

In summary, the current patches work nicely in an error recovery
scenario but are not working properly with the various combinations of
module insertion/removal.

regards
Suman
Bjorn Andersson Sept. 8, 2016, 10:27 p.m. UTC | #2
On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote:

> Hi Bjorn,
> 
> On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
> > Introduce an "auto-boot" flag on rprocs to make it possible to flag
> > remote processors without vdevs to automatically boot once the firmware
> > is found.
> > 
> > Preserve previous behavior of the wkup_m3 processor being explicitly
> > booted by a consumer.
> > 
> > Cc: Lee Jones <lee.jones@linaro.org>
> > Cc: Loic Pallardy <loic.pallardy@st.com>
> > Cc: Suman Anna <s-anna@ti.com>
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> > 
> > Changes since v1:
> > - s/always_on/auto_boot/
> > - Fixed double increment of "power" in recover path
> > - Marked wkup_m3 to not auto_boot
> > 
> 
> I am seeing various issues with this series as I am testing this series
> more thoroughly with various TI remoteproc drivers and IPC stacks based
> on virtio devices. I use very simple firmware images that publishes the
> rpmsg-client-sample devices, so that I can use the kernel
> rpmsg_client_sample to test the communication.
> 

Thanks for bringing these up!

> Here's a summary of the main issues:
> 
> 1. The rproc_boot holds a module reference count to the remoteproc
> platform driver so that it cannot be removed when a remote processor is
> booted. The previous architecture allowed virtio_rpmsg_bus or the
> platform remoteproc driver to be installed independent of each other
> with the boot actually getting triggered when the virtio_rpmsg_bus gets
> probed in find_vqs. The boot now happens when just the platform
> remoteproc driver is installed independent of virtio_rpmsg_bus and
> results in holding self-reference counts. This makes it impossible to
> remove the remoteproc platform module cleanly (we shouldn't be imposing
> force remove), which means we can't stop the remoteprocs properly.
> 

I've always found the dependency on rpmsg awkward; I would prefer this
behavior to depend on some sort of system configuration rather than on
the firmware content.

That said, I did not intend to break the ability to shut down and
unload the module.


Calling rmmod on your rproc module is a rather explicit operation; do
you know why there's a need to hold the module reference? Wouldn't it be
cleaner to just shut down the remoteproc as the module is being removed,
which would include tearing down all children (rpmsg and others)?

I expect to be able to compile rpmsg into my kernel and still be able to
unload my remoteproc module.
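
A rough sketch of the teardown being suggested here, assuming a typical
remoteproc platform driver (names are illustrative; with this series,
rproc_del() itself shuts down an auto_boot rproc and unregisters its virtio
children):

#include <linux/platform_device.h>
#include <linux/remoteproc.h>

/* Hedged sketch: unloading the rproc module tears everything down */
static int my_rproc_remove(struct platform_device *pdev)
{
        struct rproc *rproc = platform_get_drvdata(pdev);

        /*
         * rproc_del() stops the core (for auto_boot rprocs, per this
         * series) and removes the registered vdevs, which in turn
         * removes their rpmsg children - no module reference needed.
         */
        rproc_del(rproc);
        rproc_put(rproc);

        return 0;
}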

> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core
> also meant that the virtio devices and therefore the memory for vrings
> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The
> remoteproc can be booted without the virtio_rpmsg_bus module installed.
> We do use the allocated dma addresses of the vrings in the published
> resource table, but now the remote processor is up even before these
> values are filled in. I had to actually move up the rproc_alloc_vring
> alongside rproc_parse_vring to have the communication up.
> 

This also means that for a piece of firmware that exposes more than one
virtio device we would have the same race.

As you say it makes more sense to allocate the vrings as we set up the
other resources.
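
A hedged sketch of the direction being agreed on here (not the code that was
actually merged): allocate each vring while the vdev resource entry is parsed,
so the dma addresses land in the resource table before the remote core boots.
rproc_parse_vring() and rproc_alloc_vring() are the existing helpers in
remoteproc_core.c; the surrounding validation and declarations are elided.

/* Inside rproc_handle_vdev(), sketched: */
for (i = 0; i < rsc->num_of_vrings; i++) {
        ret = rproc_parse_vring(rvdev, rsc, i);
        if (ret)
                return ret;

        /* previously done from find_vqs(); moved up per this discussion */
        ret = rproc_alloc_vring(rvdev, i);
        if (ret)
                return ret;
}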

> 3. The remoteproc platform driver cannot be removed previously when the
> corresponding virtio devices are probed/configured properly and all the
> communication flow w.r.t rpmsg channel publishing followed from the
> remoteproc boot. These channels are child devices of the parent virtio
> devices, and since the remoteproc boot/shutdown followed the virtio
> device probe/removal lifecycle, the rpmsg channels life-cycle followed
> that of the parent virtio device. My communication paths are now failing
> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and
> vring buffers are configured again while the remoteproc is still
> running. Also, since the remoteproc is not rebooted, the previously
> published rpmsg channels are stale and they won't get recreated.
> 

So in essence this only worked because rmmod'ing the virtio_rpmsg_bus
would shut down the remoteproc(?) What if the firmware exposed a
VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them?


The vrings are in use by the remote side, so allocating those with the
other remoteproc resources makes more sense, as above.

But virtio_rpmsg_bus needs to detach all buffers from the rings before
going away, so we don't get any writes to those after releasing the dma
allocation.
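
A minimal sketch of the detach requirement Bjorn describes, using the standard
virtio helper (how the reclaimed buffers are freed is left out, since
virtio_rpmsg_bus allocates them as a single coherent region):

#include <linux/virtio.h>

/* Hedged sketch: reclaim every buffer still sitting in a ring */
static void drain_unused_bufs(struct virtqueue *vq)
{
        void *buf;

        while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
                ; /* buffers all live in one dma_alloc_coherent() area */
}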


We will be sending out RPMSG_NS_DESTROY for some of the channels, but as
far as I can tell from the rpmsg implementation there is no support in
the protocol for your use case of dropping one end of the rpmsg channels
without restarting the firmware and(/or?) vdevs.

> In summary, the current patches worked nicely in a error recovery
> scenario but are not working properly with the various combinations of
> module insertion/removal process.
> 

Thanks for your feedback Suman. I think we should definitely fix #1 and
#2, but I'm uncertain to what extent #3 can be fixed.

Regards,
Bjorn
Suman Anna Sept. 16, 2016, 11:58 p.m. UTC | #3
Hi Bjorn,

On 09/08/2016 05:27 PM, Bjorn Andersson wrote:
> On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote:
> 
>> Hi Bjorn,
>>
>> On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
>>> Introduce an "auto-boot" flag on rprocs to make it possible to flag
>>> remote processors without vdevs to automatically boot once the firmware
>>> is found.
>>>
>>> Preserve previous behavior of the wkup_m3 processor being explicitly
>>> booted by a consumer.
>>>
>>> Cc: Lee Jones <lee.jones@linaro.org>
>>> Cc: Loic Pallardy <loic.pallardy@st.com>
>>> Cc: Suman Anna <s-anna@ti.com>
>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>>> ---
>>>
>>> Changes since v1:
>>> - s/always_on/auto_boot/
>>> - Fixed double increment of "power" in recover path
>>> - Marked wkup_m3 to not auto_boot
>>>
>>
>> I am seeing various issues with this series as I am testing this series
>> more thoroughly with various TI remoteproc drivers and IPC stacks based
>> on virtio devices. I use very simple firmware images that publishes the
>> rpmsg-client-sample devices, so that I can use the kernel
>> rpmsg_client_sample to test the communication.
>>
> 
> Thanks for bringing these up!
> 
>> Here's a summary of the main issues:
>>
>> 1. The rproc_boot holds a module reference count to the remoteproc
>> platform driver so that it cannot be removed when a remote processor is
>> booted. The previous architecture allowed virtio_rpmsg_bus or the
>> platform remoteproc driver to be installed independent of each other
>> with the boot actually getting triggered when the virtio_rpmsg_bus gets
>> probed in find_vqs. The boot now happens when just the platform
>> remoteproc driver is installed independent of virtio_rpmsg_bus and
>> results in holding self-reference counts. This makes it impossible to
>> remove the remoteproc platform module cleanly (we shouldn't be imposing
>> force remove), which means we can't stop the remoteprocs properly.
>>
> 
> I've always found the dependency on rpmsg awkward and don't like this
> behavior to depend on the firmware content, but rather on some sort of
> system configuration.
> 
> That said, I did not intend to break the ability of shutting down and
> unloading the module.

The remoteproc infrastructure always allowed the boot to be done by an
external module (with vdevs, it looks intrinsic because of the
invocation in remoteproc_virtio.c, but the boot was really triggered
during virtio_rpmsg probe) and provided protection against the
remoteproc driver being removed while it had a consumer. I tried
auto-boot when I was upstreaming the wkup_m3_rproc driver and it was
rejected for exactly this reason.

> Calling rmmod on your rproc module is a rather explicit operation, do
> you know why there's a need to hold the module reference? Wouldn't it be
> cleaner to just shutdown the remoteproc as the module is being removed -
> which would include tearing down all children (rpmsg and others)

Right, the introduction of the auto-boot ended up in a self-holding
reference count which should be fixed to allow you to remove the module.
The external module boot is still a valid usage (when auto_boot is
false) though.

> 
> I expect to be able to compile rpmsg into my kernel and still be able to
> unload my remoteproc module.
> 
>> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core
>> also meant that the virtio devices and therefore the memory for vrings
>> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The
>> remoteproc can be booted without the virtio_rpmsg_bus module installed.
>> We do use the allocated dma addresses of the vrings in the published
>> resource table, but now the remote processor is up even before these
>> values are filled in. I had to actually move up the rproc_alloc_vring
>> alongside rproc_parse_vring to have the communication up.
>>
> 
> This also means that for a piece of firmware that exposes more than one
> virtio device we would have the same race.

Yes, technically, if you had more virtio devices. The remoteproc
infrastructure so far hadn't supported more than one vdev. We did
support that in our downstream kernel, but that was for a non-SMP usage
on a dual-core remoteproc subsystem and the virtio devices were still
virtio_rpmsg devices, so it scaled for us when we removed the
virtio_rpmsg_bus module, as long as the reference counts were managed in
rproc_boot() and rproc_shutdown().

> 
> As you say it makes more sense to allocate the vrings as we set up the
> other resources.
> 
>> 3. The remoteproc platform driver cannot be removed previously when the
>> corresponding virtio devices are probed/configured properly and all the
>> communication flow w.r.t rpmsg channel publishing followed from the
>> remoteproc boot. These channels are child devices of the parent virtio
>> devices, and since the remoteproc boot/shutdown followed the virtio
>> device probe/removal lifecycle, the rpmsg channels life-cycle followed
>> that of the parent virtio device. My communication paths are now failing
>> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and
>> vring buffers are configured again while the remoteproc is still
>> running. Also, since the remoteproc is not rebooted, the previously
>> published rpmsg channels are stale and they won't get recreated.
>>
> 
> So in essence this only worked because rmmod'ing the virtio_rpmsg_bus
> would shut down the remoteproc(?) What if the firmware exposed a
> VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them?

Yes, it's a problem if firmware exposed different virtio devices, but
like I said above, it was not supported so far. It is definitely
something we should consider going forward. Also, I would think the
VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE.
The rpmsg usage does provide a bus infrastructure allowing the remote
side to publish different rpmsg devices.

> 
> The vrings are in use by the remote side, so allocating those with the
> other remoteproc resources makes more sense, as above.
> 
> But virtio_rpmsg_bus needs to detach all buffers from the rings before
> going away, so we don't get any writes to those after releasing the dma
> allocation.

They do get detached alright on the Linux side, but obviously you are
now creating an additional synchronization requirement with the remote
side: it must know it cannot use these once you have freed up the
memory.

> 
> We will be sending out RPMSG_NS_DESTROY for some of the channels, but as
> far as I can tell from the rpmsg implementation there is no support in
> the protocol for your use case of dropping one end of the rpmsg channels
> without restarting the firmware and(/or?) vdevs.

Right, this also causes new synchronization problems when you reprobe
and the remoteproc side has to republish the channels. The existing
remoteproc infrastructure was designed around this fact: rpmsg channels
can come and go (if needed) from the remoteproc side, the flow is no
different between a regular boot and error recovery, and since the
channels are tied to their parent virtio_rpmsg device, removing the
communication path ensured that.

> 
>> In summary, the current patches worked nicely in a error recovery
>> scenario but are not working properly with the various combinations of
>> module insertion/removal process.
>>
> 
> Thanks for your feedback Suman. I think we should definitely fix #1 and
> #2, but I'm uncertain to what extent #3 can be fixed.

The only solution I can think of is to reverse the module dependency as
well, to follow the reversal of the boot dependency, so that
virtio_rpmsg_bus cannot be removed while a remoteproc supporting virtio
devices is booted and running once virtio_rpmsg has probed. I don't see
any reason to introduce new synchronization issues and force the
firmwares to change (having to republish channels) because of this
reversal of boot triggering.

I am concerned about the impact, as 4.9 is going to be an LTS release,
with most of the Si vendors using the LTS for their new software
releases.

regards
Suman
Bjorn Andersson Sept. 19, 2016, 11:24 p.m. UTC | #4
On Fri 16 Sep 16:58 PDT 2016, Suman Anna wrote:

> Hi Bjorn,
> 
> On 09/08/2016 05:27 PM, Bjorn Andersson wrote:
> > On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote:
> > 
> >> Hi Bjorn,
> >>
> >> On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
> >>> Introduce an "auto-boot" flag on rprocs to make it possible to flag
> >>> remote processors without vdevs to automatically boot once the firmware
> >>> is found.
> >>>
> >>> Preserve previous behavior of the wkup_m3 processor being explicitly
> >>> booted by a consumer.
> >>>
> >>> Cc: Lee Jones <lee.jones@linaro.org>
> >>> Cc: Loic Pallardy <loic.pallardy@st.com>
> >>> Cc: Suman Anna <s-anna@ti.com>
> >>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> >>> ---
> >>>
> >>> Changes since v1:
> >>> - s/always_on/auto_boot/
> >>> - Fixed double increment of "power" in recover path
> >>> - Marked wkup_m3 to not auto_boot
> >>>
> >>
> >> I am seeing various issues with this series as I am testing this series
> >> more thoroughly with various TI remoteproc drivers and IPC stacks based
> >> on virtio devices. I use very simple firmware images that publishes the
> >> rpmsg-client-sample devices, so that I can use the kernel
> >> rpmsg_client_sample to test the communication.
> >>
> > 
> > Thanks for bringing these up!
> > 
> >> Here's a summary of the main issues:
> >>
> >> 1. The rproc_boot holds a module reference count to the remoteproc
> >> platform driver so that it cannot be removed when a remote processor is
> >> booted. The previous architecture allowed virtio_rpmsg_bus or the
> >> platform remoteproc driver to be installed independent of each other
> >> with the boot actually getting triggered when the virtio_rpmsg_bus gets
> >> probed in find_vqs. The boot now happens when just the platform
> >> remoteproc driver is installed independent of virtio_rpmsg_bus and
> >> results in holding self-reference counts. This makes it impossible to
> >> remove the remoteproc platform module cleanly (we shouldn't be imposing
> >> force remove), which means we can't stop the remoteprocs properly.
> >>
> > 
> > I've always found the dependency on rpmsg awkward and don't like this
> > behavior to depend on the firmware content, but rather on some sort of
> > system configuration.
> > 
> > That said, I did not intend to break the ability of shutting down and
> > unloading the module.
> 
> The remoteproc infrastructure always allowed the boot to be done by an
> external module

The only caller of rproc_boot() in the mainline kernel is the
wkup_m3_ipc driver, which spawns a thread that waits for the
remoteproc-internal completion indicating the firmware is available and
then calls rproc_boot().

I'm afraid I would much prefer the wkup_m3_rproc to just set auto-boot
and not work around the design like this.

That said, we still provide the support for this.

> (with vdevs, it looks intrinsic because of the
> invocation in remoteproc_virtio.c, but the boot was really triggered
> during virtio_rpmsg probe)

Except for the fact that the first vdev does this as a direct result of
the call from rproc_fw_config_virtio().

As stated above, the only other caller of rproc_boot() is the wkup_m3
driver and although it's not executing in the context of
rproc_fw_config_virtio() it's directly tied into its execution.

> and provided protection against removing the remoteproc driver removal
> when you had a consumer.  I tried the auto-boot when I was upstreaming
> the wkup_m3_rproc driver and it was rejected for exactly this reason.

One of the problems I have with the design chosen in the wkup_m3 case is
this: if you had to deal with crash recovery, how would you sync the
wkup_m3_ipc driver operations with the async recovery?

Flipping this the other way around and accepting that the "function
device" follows the RPROC_RUNNING state of the rproc implicitly would
allow this.

> 
> > Calling rmmod on your rproc module is a rather explicit operation, do
> > you know why there's a need to hold the module reference? Wouldn't it be
> > cleaner to just shutdown the remoteproc as the module is being removed -
> > which would include tearing down all children (rpmsg and others)
> 
> Right, the introduction of the auto-boot ended up in a self-holding
> reference count which should be fixed to allow you to remove the module.
> The external module boot is still a valid usage (when auto_boot is
> false) though.

The way this is dealt with in other subsystems is that when a client
acquires a handle to something exposed by the implementation, it takes a
reference on the implementation.

With remoteproc I can acquire a reference to a remoteproc, then unload
the implementation; when the client then calls rproc_boot() the core
will try to dereference dev->parent->driver->owner to try_module_get()
it. If we're lucky that will fail as the driver is gone, but I would
assume we would panic here instead, as driver is no longer a valid
pointer.
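
The core code being referred to, paraphrased from rproc_boot() in
remoteproc_core.c of this era; the stale driver pointer is the failure mode
described above:

/* prevent the underlying implementation from being removed */
if (!try_module_get(dev->parent->driver->owner)) {
        dev_err(dev, "%s: can't get owner\n", __func__);
        return -EINVAL;
}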

> 
> > I expect to be able to compile rpmsg into my kernel and still be able to
> > unload my remoteproc module.
> > 
> >> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core
> >> also meant that the virtio devices and therefore the memory for vrings
> >> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The
> >> remoteproc can be booted without the virtio_rpmsg_bus module installed.
> >> We do use the allocated dma addresses of the vrings in the published
> >> resource table, but now the remote processor is up even before these
> >> values are filled in. I had to actually move up the rproc_alloc_vring
> >> alongside rproc_parse_vring to have the communication up.
> >>
> 
> > This also means that for a piece of firmware that exposes more than one
> > virtio device we would have the same race.
> 
> Yes, if you had more virtio devices technically. The remoteproc
> infrastructure so far hadn't supported more than one vdev.

I don't see anything preventing you from putting more than one vdev in
your resource table. There will however be a race on getting the vrings
allocated before the firmware needs them.

> We did
> support that in our downstream kernel but that was for a non-SMP usage
> on a dual-core remoteproc subsystem and the virtio devices were still
> virtio_rpmsg devices, so it scaled for us when we removed the
> virtio_rpmsg_bus module as long as the reference counts were managed in
> rproc_boot and rproc_shutdown()
> 

I think the shutdown case would work in either way, but the behavior of
rmmod rpmsg && insmod rpmsg still depends on other things.

> > As you say it makes more sense to allocate the vrings as we set up the
> > other resources.
> > 
> >> 3. The remoteproc platform driver cannot be removed previously when the
> >> corresponding virtio devices are probed/configured properly and all the
> >> communication flow w.r.t rpmsg channel publishing followed from the
> >> remoteproc boot. These channels are child devices of the parent virtio
> >> devices, and since the remoteproc boot/shutdown followed the virtio
> >> device probe/removal lifecycle, the rpmsg channels life-cycle followed
> >> that of the parent virtio device. My communication paths are now failing
> >> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and
> >> vring buffers are configured again while the remoteproc is still
> >> running. Also, since the remoteproc is not rebooted, the previously
> >> published rpmsg channels are stale and they won't get recreated.
> >>
> > 
> > So in essence this only worked because rmmod'ing the virtio_rpmsg_bus
> > would shut down the remoteproc(?) What if the firmware exposed a
> > VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them?
> 
> Yes, it's a problem if firmware exposed different virtio devices, but
> like I said above, it was not supported so far.

It's not prevented; the only reason I can find for it not working is
what I'm arguing about - that the resource handling will be racy.

> It is definitely
> something we should consider going forward.

We had a question related to supporting multiple VIRTIO_ID_NET devices
per remoteproc on LKML last week.

> Also, I would think the
> VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE.
> The rpmsg usage does provide a bus infrastructure allowing the remote
> side to publish different rpmsg devices.
> 

Maybe I'm missing something here, but if I put any other VIRTIO_ID in
the "id" member of fw_rsc_vdev the virtio core will probe that driver
and it will call the find_vqs callback and there's no limit on the
number of fw_rsc_vdev entries in the resource table.

So the use case of having a remoteproc firmware with 8 VIRTIO_ID_NET
entries should work just fine - except for only the first one being
initialized as the firmware boots (unless we finish this restructure).
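
An illustrative firmware-side resource table with two vdev entries (an
assumption made up to show the point, not taken from the thread); the struct
layouts are the ones from include/linux/remoteproc.h:

#include <linux/remoteproc.h>   /* resource_table, fw_rsc_hdr, fw_rsc_vdev */

struct my_resource_table {
        struct resource_table hdr;              /* .ver = 1, .num = 2 */
        u32 offset[2];                          /* offsets of the two entries */

        struct fw_rsc_hdr rpmsg_hdr;            /* .type = RSC_VDEV */
        struct fw_rsc_vdev rpmsg_vdev;          /* .id = VIRTIO_ID_RPMSG */
        struct fw_rsc_vdev_vring rpmsg_vring[2];

        struct fw_rsc_hdr net_hdr;              /* .type = RSC_VDEV */
        struct fw_rsc_vdev net_vdev;            /* .id = VIRTIO_ID_NET */
        struct fw_rsc_vdev_vring net_vring[2];
} __packed;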

> > The vrings are in use by the remote side, so allocating those with the
> > other remoteproc resources makes more sense, as above.
> > 
> > But virtio_rpmsg_bus needs to detach all buffers from the rings before
> > going away, so we don't get any writes to those after releasing the dma
> > allocation.
> 
> They do get detached alright from Linux-side but obviously you are now
> creating an additional synchronization to the remoteproc side that they
> cannot use these as you have freed up the memory.
> 

Yeah, that's not acceptable, the backing memory must at least follow the
running state of the rproc, so that it is available once we boot the
core.

> > We will be sending out RPMSG_NS_DESTROY for some of the channels, but as
> > far as I can tell from the rpmsg implementation there is no support in
> > the protocol for your use case of dropping one end of the rpmsg channels
> > without restarting the firmware and(/or?) vdevs.
> 
> Right, this also causes new synchronization problems when you reprobe
> and the remoteproc side has to republish the channels.

From the code it seems like the other virtio devices would handle this.

> The existing
> remoteproc infrastructure was designed around this fact - rpmsg channels
> can come and go (if needed) from the remoteproc-side, and the flow is no
> different from regular boot vs error recovery, and since they are tied
> to their parent virtio_rpmsg device, removing the communication path
> ensured that.
> 

There are two levels here: the first is the virtio level, which in the
case of rpmsg seems to be required to follow the remoteproc
RPROC_RUNNING state; the other is that rpmsg channels have nothing to do
with remoteproc, and can, as you say, come and go dynamically as either
side likes.

But the only problem I can see is if the firmware does not re-announce
channels after we reset the virtio device.

> 
> >> In summary, the current patches worked nicely in a error recovery
> >> scenario but are not working properly with the various combinations of
> >> module insertion/removal process.
> >>
> > 
> > Thanks for your feedback Suman. I think we should definitely fix #1 and
> > #2, but I'm uncertain to what extent #3 can be fixed.
> 
> The only solution I can think of is to reverse the module dependency as
> well to follow the reversal of the boot dependency, so that
> virtio_rpmsg_bus cannot be removed if you have a remoteproc supporting
> virtio devices is booted and running once the virtio_rpmsg has probed.

Can you please help me understand why it's important to protect either
module from being unloaded?

> I
> don't see any reasons to introduce new synchronization issues and force
> the firmwares to change (having to republish channels) because of this
> boot triggering reversal.
> 

The problem is not a reversal of boot triggering - there's no such
thing; the issue shows up because I decoupled the rproc life cycle from
that of one of its children.

> I am concerned on the impact as 4.9 is going to an LTS, with most of the
> Si vendors using the LTS for their new software releases.
> 

I'm sorry, but could you help me understand how rpmsg is used downstream
for this to be of concern - how is module unloading used beyond
development?

Per my own arguments I will provide some patches to correct the vring
buffer allocation and I'll look into the module locking.


PS. I really would like to see >0 users of this framework in mainline,
so we can reason about things on the basis of what's actually there!

Regards,
Bjorn
Suman Anna Sept. 20, 2016, 9:29 p.m. UTC | #5
Hi Bjorn,

>>>> On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
>>>>> Introduce an "auto-boot" flag on rprocs to make it possible to flag
>>>>> remote processors without vdevs to automatically boot once the firmware
>>>>> is found.
>>>>>
>>>>> Preserve previous behavior of the wkup_m3 processor being explicitly
>>>>> booted by a consumer.
>>>>>
>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>> Cc: Loic Pallardy <loic.pallardy@st.com>
>>>>> Cc: Suman Anna <s-anna@ti.com>
>>>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>>>>> ---
>>>>>
>>>>> Changes since v1:
>>>>> - s/always_on/auto_boot/
>>>>> - Fixed double increment of "power" in recover path
>>>>> - Marked wkup_m3 to not auto_boot
>>>>>
>>>>
>>>> I am seeing various issues with this series as I am testing this series
>>>> more thoroughly with various TI remoteproc drivers and IPC stacks based
>>>> on virtio devices. I use very simple firmware images that publishes the
>>>> rpmsg-client-sample devices, so that I can use the kernel
>>>> rpmsg_client_sample to test the communication.
>>>>
>>>
>>> Thanks for bringing these up!
>>>
>>>> Here's a summary of the main issues:
>>>>
>>>> 1. The rproc_boot holds a module reference count to the remoteproc
>>>> platform driver so that it cannot be removed when a remote processor is
>>>> booted. The previous architecture allowed virtio_rpmsg_bus or the
>>>> platform remoteproc driver to be installed independent of each other
>>>> with the boot actually getting triggered when the virtio_rpmsg_bus gets
>>>> probed in find_vqs. The boot now happens when just the platform
>>>> remoteproc driver is installed independent of virtio_rpmsg_bus and
>>>> results in holding self-reference counts. This makes it impossible to
>>>> remove the remoteproc platform module cleanly (we shouldn't be imposing
>>>> force remove), which means we can't stop the remoteprocs properly.
>>>>
>>>
>>> I've always found the dependency on rpmsg awkward and don't like this
>>> behavior to depend on the firmware content, but rather on some sort of
>>> system configuration.
>>>
>>> That said, I did not intend to break the ability of shutting down and
>>> unloading the module.
>>
>> The remoteproc infrastructure always allowed the boot to be done by an
>> external module
> 
> The only caller of rproc_boot() in the mainline kernel is the
> wkup_m3_ipc, which spawns a thread and waits for the remoteproc-internal
> completion that indicates when the firmware is available and then call
> rproc_boot().
>
> I'm afraid I would much prefer the wkup_m3_rproc to just set auto-boot
> and not work around the design like this.
> 
> That said, we still provide the support for this.

It is quite a valid use case, and we will definitely have more use cases
like this showing up. This is exactly where I would need the
rproc_set_firmware() API, as the client user dictates the firmware to
boot depending on the soft IP/functionality it wants from the remote
processor.
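
A purely hypothetical sketch of the kind of API being asked for here (no such
function exists at the time of this thread); the signature and names are
assumptions:

/* Hypothetical API, not in the kernel at this point */
int rproc_set_firmware(struct rproc *rproc, const char *fw_name);

static int start_feature(struct rproc *rproc, const char *fw_name)
{
        int ret;

        ret = rproc_set_firmware(rproc, fw_name);  /* pick the soft-IP image */
        if (ret)
                return ret;

        return rproc_boot(rproc);                  /* consumer-driven boot */
}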

>> (with vdevs, it looks intrinsic because of the
>> invocation in remoteproc_virtio.c, but the boot was really triggered
>> during virtio_rpmsg probe)
> 
> Except for the fact that the first vdev does this as a direct result of
> the call from rproc_fw_config_virtio().

Yeah, and I did have a downstream patch [1] to actually count the vdevs
and perform the boot on the last vdev instead of the first. That has
always been a limitation of the current upstream code, so let's not mix
that with the current change in behavior. Anyway, with your current
changes, patch [1] becomes moot.

[1]
http://git.omapzoom.org/?p=kernel/omap.git;a=commitdiff;h=bdce403858a5c5d2abccd81a91bbe0528b97bef6

> As stated above, the only other caller of rproc_boot() is the wkup_m3
> driver and although it's not executing in the context of
> rproc_fw_config_virtio() it's directly tied into its execution.
> 
>> and provided protection against removing the remoteproc driver removal
>> when you had a consumer.  I tried the auto-boot when I was upstreaming
>> the wkup_m3_rproc driver and it was rejected for exactly this reason.
> 
> One of the problems I have with the design chosen in the wkup_m3 case is
> if you had to deal with crash recovery, how would you sync the
> wkup_m3_ipc driver operations to the async recovery?

The wkup_m3 is akin to the System Controller Processor stuff w.r.t. PM,
so first of all we cannot afford to have this firmware crash in general.
In any case, the communication is dealt with by the wkup_m3_ipc driver
itself as part of the PM suspend/resume cycle, so detecting any error
notification will be its job and it will be responsible for managing the
remoteproc life cycle appropriately.

> Flipping this the other way around and accepting that the "function
> device" follows the RPROC_RUNNING state of the rproc implicitly would
> allow this.
> 
>>> Calling rmmod on your rproc module is a rather explicit operation, do
>>> you know why there's a need to hold the module reference? Wouldn't it be
>>> cleaner to just shutdown the remoteproc as the module is being removed -
>>> which would include tearing down all children (rpmsg and others)
>>
>> Right, the introduction of the auto-boot ended up in a self-holding
>> reference count which should be fixed to allow you to remove the module.
>> The external module boot is still a valid usage (when auto_boot is
>> false) though.
> 
> The way this is dealt with in other subsystems is that when a client
> acquire a handle to something exposed by the implementation it will
> reference the implementation.
> 
> With remoteproc I can acquire a reference to a remoteproc, then unload
> the implementation and when the client calls rproc_boot() the core will
> try to dereference dev->parent->driver->owner to try_module_get() it; if
> we're lucky that will fail as the driver is gone, but I would assume we
> would panic here instead, as driver is no longer a valid pointer.

I am talking about the rproc_shutdown() case - removing the underlying
implementation after the client has successfully booted the remoteproc
device and is using it.

>>> I expect to be able to compile rpmsg into my kernel and still be able to
>>> unload my remoteproc module.
>>>
>>>> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core
>>>> also meant that the virtio devices and therefore the memory for vrings
>>>> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The
>>>> remoteproc can be booted without the virtio_rpmsg_bus module installed.
>>>> We do use the allocated dma addresses of the vrings in the published
>>>> resource table, but now the remote processor is up even before these
>>>> values are filled in. I had to actually move up the rproc_alloc_vring
>>>> alongside rproc_parse_vring to have the communication up.
>>>>
>>>
>>> This also means that for a piece of firmware that exposes more than one
>>> virtio device we would have the same race.
>>
>> Yes, if you had more virtio devices technically. The remoteproc
>> infrastructure so far hadn't supported more than one vdev.
> 
> I don't see anything preventing you from putting more than one vdev in
> your resource table. There will however be a race on getting the vrings
> allocated before the firmware needs them.

Right, moving the vring allocation from find_vqs into resource parsing
time would fix that issue. My above downstream patch resolved this issue
with the existing architecture, but as I said we were using virtio
devices of the same type.

>> We did
>> support that in our downstream kernel but that was for a non-SMP usage
>> on a dual-core remoteproc subsystem and the virtio devices were still
>> virtio_rpmsg devices, so it scaled for us when we removed the
>> virtio_rpmsg_bus module as long as the reference counts were managed in
>> rproc_boot and rproc_shutdown()
>>
> 
> I think the shutdown case would work in either way, but the behavior of
> rmmod rpmsg && insmod rpmsg still depends on other things.

Right, the issues are with rmmod and insmod of virtio_rpmsg_bus.
Previously, this ensured the shutdown of remoteproc, and upon insmod,
the remoteproc is rebooted and rpmsg channels are republished (same flow
in error recovery too).

>>> As you say it makes more sense to allocate the vrings as we set up the
>>> other resources.
>>>
>>>> 3. The remoteproc platform driver cannot be removed previously when the
>>>> corresponding virtio devices are probed/configured properly and all the
>>>> communication flow w.r.t rpmsg channel publishing followed from the
>>>> remoteproc boot. These channels are child devices of the parent virtio
>>>> devices, and since the remoteproc boot/shutdown followed the virtio
>>>> device probe/removal lifecycle, the rpmsg channels life-cycle followed
>>>> that of the parent virtio device. My communication paths are now failing
>>>> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and
>>>> vring buffers are configured again while the remoteproc is still
>>>> running. Also, since the remoteproc is not rebooted, the previously
>>>> published rpmsg channels are stale and they won't get recreated.
>>>>
>>>
>>> So in essence this only worked because rmmod'ing the virtio_rpmsg_bus
>>> would shut down the remoteproc(?) What if the firmware exposed a
>>> VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them?
>>
>> Yes, it's a problem if firmware exposed different virtio devices, but
>> like I said above, it was not supported so far.
> 
> It's not prevented, the only reason I can find for it not working is
> what I'm arguing about - that the resource handling will be racy.

Right, this is definitely a scenario to consider, but it is separate
(though similar) from the current problem I have mentioned.

>> It is definitely
>> something we should consider going forward.
> 
> We had a question related to supporting multiple VIRTIO_ID_NET devices
> per remoteproc on LKML last week.
> 
>> Also, I would think the
>> VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE.
>> The rpmsg usage does provide a bus infrastructure allowing the remote
>> side to publish different rpmsg devices.
>>
> 
> Maybe I'm missing something here, but if I put any other VIRTIO_ID in
> the "id" member of fw_rsc_vdev the virtio core will probe that driver
> and it will call the find_vqs callback and there's no limit on the
> number of fw_rsc_vdev entries in the resource table.
> 
> So the use case of having a remoteproc firmware with 8 VIRTIO_ID_NET
> entries should work just fine - except for only the first one being
> initialized as the firmware boots (unless we finish this restructure).

Yeah, this is racy whether with the previous architecture or with the
current changes. Moving the vring allocation as I suggested should fix
this particular multiple-vdev issue with the current changes.

>>> The vrings are in use by the remote side, so allocating those with the
>>> other remoteproc resources makes more sense, as above.
>>>
>>> But virtio_rpmsg_bus needs to detach all buffers from the rings before
>>> going away, so we don't get any writes to those after releasing the dma
>>> allocation.
>>
>> They do get detached alright from Linux-side but obviously you are now
>> creating an additional synchronization to the remoteproc side that they
>> cannot use these as you have freed up the memory.
>>
> 
> Yeah, that's not acceptable, the backing memory must at least follow the
> running state of the rproc, so that it is available once we boot the
> core.
> 
>>> We will be sending out RPMSG_NS_DESTROY for some of the channels, but as
>>> far as I can tell from the rpmsg implementation there is no support in
>>> the protocol for your use case of dropping one end of the rpmsg channels
>>> without restarting the firmware and(/or?) vdevs.
>>
>> Right, this also causes new synchronization problems when you reprobe
>> and the remoteproc side has to republish the channels.
> 
> From the code it seems like the other virtio devices would handle this.

Do they implement a bus functionality like virtio_rpmsg?

>> The existing
>> remoteproc infrastructure was designed around this fact - rpmsg channels
>> can come and go (if needed) from the remoteproc-side, and the flow is no
>> different from regular boot vs error recovery, and since they are tied
>> to their parent virtio_rpmsg device, removing the communication path
>> ensured that.
> 
> There are two levels here, the first is the virtio level which in the
> case of rpmsg seems to be required to follow the remoteproc
> RPROC_RUNNING state; the other being rpmsg channels have nothing to do
> with remoteproc, but can as you say come and go dynamically as either
> side like.
> 
> But the only problem I can see is if the firmware does not re-announce
> channels after we reset the virtio device.

Correct, this is my exact problem: the announcement is done once at
boot time from our firmwares as they publish the services each of them
supports. We only have a single physical transport (virtio_rpmsg)
between the host and the remote processor, and clients cannot avail of
these services once you do an rmmod/insmod cycle.
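
For reference, the name-service announcement involved here, with the layout
from linux/rpmsg.h of this era; re-sending it after a vdev reset is exactly
what these firmwares currently do not do:

/* Name-service announcement, sent by the remote side to dst address 53 */
struct rpmsg_ns_msg {
        char name[32];  /* service name, e.g. "rpmsg-client-sample" */
        u32 addr;       /* source endpoint address on the remote side */
        u32 flags;      /* RPMSG_NS_CREATE or RPMSG_NS_DESTROY */
} __packed;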

>>>> In summary, the current patches worked nicely in a error recovery
>>>> scenario but are not working properly with the various combinations of
>>>> module insertion/removal process.
>>>>
>>>
>>> Thanks for your feedback Suman. I think we should definitely fix #1 and
>>> #2, but I'm uncertain to what extent #3 can be fixed.
>>
>> The only solution I can think of is to reverse the module dependency as
>> well to follow the reversal of the boot dependency, so that
>> virtio_rpmsg_bus cannot be removed if you have a remoteproc supporting
>> virtio devices is booted and running once the virtio_rpmsg has probed.
> 
> Can you please help me understand why it's important to protect either
> module from being unloaded?

virtio_rpmsg_bus is quite simple: the firmwares publish their channels
one time during boot, and the removal of virtio_rpmsg_bus also removes
these channels, which do not get republished once you insmod
virtio_rpmsg_bus again.

I take it the other one you are talking about is the remoteproc platform
driver; for that one you need to allow the removal in the auto-boot
case, and restrict it in the case of a client having booted it
successfully.

>> I
>> don't see any reasons to introduce new synchronization issues and force
>> the firmwares to change (having to republish channels) because of this
>> boot triggering reversal.
>>
> 
> The problem is not the boot triggering reversal - as there's no such
> thing, the issue shows because I decoupled the rproc life cycle from one
> of its child's.

I don't quite agree; the boot flow did change with the current changes,
and it now affects the logic in the firmwares as well.

>> I am concerned on the impact as 4.9 is going to an LTS, with most of the
>> Si vendors using the LTS for their new software releases.
>>
> 
> I'm sorry, but could you help me understand how rpmsg is used downstream
> for this to be of concern, how is module unloading used beyond
> development?

This is part of robustness testing of the various components, especially
when they are built as modules. This definitely is a regression, as
communication won't come up now.

> Per my own arguments I will provide some patches to correct the vring
> buffer allocation and I'll look into the module locking.
> 
> PS. I really would like to see >0 users of this framework in mainline,
> so we can reason about things on the basis of what's actually there!

Yeah, we already have two remoteproc drivers and I have two more in the
pipeline. And it looks like a whole bunch of them from other SoC vendors
are on their way too. Obviously, each has its own firmware designs/IPC
architectures and restrictions.

regards
Suman

Patch

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index fe0539ed9cb5..5ca98039e494 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -933,6 +933,10 @@  static void rproc_fw_config_virtio(const struct firmware *fw, void *context)
 	/* look for virtio devices and register them */
 	ret = rproc_handle_resources(rproc, tablesz, rproc_vdev_handler);
 
+	/* if rproc is marked always-on, request it to boot */
+	if (rproc->auto_boot)
+		rproc_boot_nowait(rproc);
+
 out:
 	release_firmware(fw);
 	/* allow rproc_del() contexts, if any, to proceed */
@@ -978,11 +982,16 @@  static int rproc_add_virtio_devices(struct rproc *rproc)
 int rproc_trigger_recovery(struct rproc *rproc)
 {
 	struct rproc_vdev *rvdev, *rvtmp;
+	int ret;
 
 	dev_err(&rproc->dev, "recovering %s\n", rproc->name);
 
 	init_completion(&rproc->crash_comp);
 
+	/* shut down the remote */
+	/* TODO: make sure this works with rproc->power > 1 */
+	rproc_shutdown(rproc);
+
 	/* clean up remote vdev entries */
 	list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
 		rproc_remove_virtio_dev(rvdev);
@@ -993,7 +1002,18 @@  int rproc_trigger_recovery(struct rproc *rproc)
 	/* Free the copy of the resource table */
 	kfree(rproc->cached_table);
 
-	return rproc_add_virtio_devices(rproc);
+	ret = rproc_add_virtio_devices(rproc);
+	if (ret)
+		return ret;
+
+	/*
+	 * boot the remote processor up again, if the async firmware loader
+	 * didn't do so already, waiting for the async fw load to finish
+	 */
+	if (!rproc->auto_boot)
+		rproc_boot(rproc);
+
+	return 0;
 }
 
 /**
@@ -1374,6 +1394,7 @@  struct rproc *rproc_alloc(struct device *dev, const char *name,
 	rproc->name = name;
 	rproc->ops = ops;
 	rproc->priv = &rproc[1];
+	rproc->auto_boot = true;
 
 	device_initialize(&rproc->dev);
 	rproc->dev.parent = dev;
@@ -1452,6 +1473,11 @@  int rproc_del(struct rproc *rproc)
 	/* if rproc is just being registered, wait */
 	wait_for_completion(&rproc->firmware_loading_complete);
 
+	/* if rproc is marked always-on, rproc_add() booted it */
+	/* TODO: make sure this works with rproc->power > 1 */
+	if (rproc->auto_boot)
+		rproc_shutdown(rproc);
+
 	/* clean up remote vdev entries */
 	list_for_each_entry_safe(rvdev, tmp, &rproc->rvdevs, node)
 		rproc_remove_virtio_dev(rvdev);
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index cc91556313e1..574c90ce07f7 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -136,11 +136,6 @@  static void __rproc_virtio_del_vqs(struct virtio_device *vdev)
 
 static void rproc_virtio_del_vqs(struct virtio_device *vdev)
 {
-	struct rproc *rproc = vdev_to_rproc(vdev);
-
-	/* power down the remote processor before deleting vqs */
-	rproc_shutdown(rproc);
-
 	__rproc_virtio_del_vqs(vdev);
 }
 
@@ -149,7 +144,6 @@  static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		       vq_callback_t *callbacks[],
 		       const char * const names[])
 {
-	struct rproc *rproc = vdev_to_rproc(vdev);
 	int i, ret;
 
 	for (i = 0; i < nvqs; ++i) {
@@ -160,13 +154,6 @@  static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		}
 	}
 
-	/* now that the vqs are all set, boot the remote processor */
-	ret = rproc_boot_nowait(rproc);
-	if (ret) {
-		dev_err(&rproc->dev, "rproc_boot() failed %d\n", ret);
-		goto error;
-	}
-
 	return 0;
 
 error:
diff --git a/drivers/remoteproc/wkup_m3_rproc.c b/drivers/remoteproc/wkup_m3_rproc.c
index 02d271d101b4..3811cb522af3 100644
--- a/drivers/remoteproc/wkup_m3_rproc.c
+++ b/drivers/remoteproc/wkup_m3_rproc.c
@@ -167,6 +167,8 @@  static int wkup_m3_rproc_probe(struct platform_device *pdev)
 		goto err;
 	}
 
+	rproc->auto_boot = false;
+
 	wkupm3 = rproc->priv;
 	wkupm3->rproc = rproc;
 	wkupm3->pdev = pdev;
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index 1c457a8dd5a6..76d936cabbf8 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -443,6 +443,7 @@  struct rproc {
 	struct resource_table *cached_table;
 	u32 table_csum;
 	bool has_iommu;
+	bool auto_boot;
 };
 
 /* we currently support only two vrings per rvdev */