Message ID: 1470952373-4823-1-git-send-email-bjorn.andersson@linaro.org
State: Accepted
Commit: ddf711872c9d2b05b0fb25db3e6e0c2a50be39e3
Hi Bjorn,

On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
> Introduce an "auto-boot" flag on rprocs to make it possible to flag
> remote processors without vdevs to automatically boot once the firmware
> is found.
>
> Preserve previous behavior of the wkup_m3 processor being explicitly
> booted by a consumer.
>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Loic Pallardy <loic.pallardy@st.com>
> Cc: Suman Anna <s-anna@ti.com>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>
> Changes since v1:
> - s/always_on/auto_boot/
> - Fixed double increment of "power" in recover path
> - Marked wkup_m3 to not auto_boot
>

I am seeing various issues with this series as I test it more thoroughly with various TI remoteproc drivers and IPC stacks based on virtio devices. I use very simple firmware images that publish the rpmsg-client-sample devices, so that I can use the kernel rpmsg_client_sample module to test the communication.

Here's a summary of the main issues:

1. rproc_boot() holds a module reference count on the remoteproc platform driver so that it cannot be removed while a remote processor is booted. The previous architecture allowed virtio_rpmsg_bus and the platform remoteproc driver to be installed independently of each other, with the boot actually being triggered when virtio_rpmsg_bus got probed, in find_vqs(). The boot now happens as soon as the platform remoteproc driver is installed, independent of virtio_rpmsg_bus, and results in the driver holding a reference count on itself. This makes it impossible to remove the remoteproc platform module cleanly (we shouldn't be imposing a forced remove), which means we can't stop the remoteprocs properly.

2. The reversal of boot responsibility between virtio_rpmsg_bus and the remoteproc core also means that the virtio devices, and therefore the memory for the vrings, are allocated only when virtio_rpmsg_bus is probed, in find_vqs(). The remoteproc can now be booted without the virtio_rpmsg_bus module installed. We do publish the allocated dma addresses of the vrings in the resource table, but the remote processor is now up even before these values are filled in. I actually had to move rproc_alloc_vring up alongside rproc_parse_vring to get the communication working.

3. Previously, the remoteproc platform driver could not be removed while the corresponding virtio devices were probed/configured, and the whole communication flow w.r.t. rpmsg channel publishing followed from the remoteproc boot. These channels are child devices of the parent virtio devices, and since the remoteproc boot/shutdown followed the virtio device probe/removal lifecycle, the rpmsg channel life-cycle followed that of the parent virtio device. My communication paths now fail if I remove virtio_rpmsg_bus and insmod it again, as the vrings and vring buffers are configured again while the remoteproc is still running. Also, since the remoteproc is not rebooted, the previously published rpmsg channels are stale and they won't get recreated.

In summary, the current patches worked nicely in an error recovery scenario but are not working properly with the various combinations of the module insertion/removal process.

regards
Suman
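For context, the reference count in question is taken in the core boot path. An abridged sketch of the relevant stanza of __rproc_boot() in remoteproc_core.c of roughly this vintage (details differ between kernel versions):

    static int __rproc_boot(struct rproc *rproc, bool wait)
    {
        struct device *dev = &rproc->dev;
        int ret;

        ret = mutex_lock_interruptible(&rproc->lock);
        if (ret)
            return ret;

        /* prevent the underlying implementation from being removed; with
         * auto-boot the caller is the platform driver's own firmware
         * callback, so this pins that driver's module against rmmod */
        if (!try_module_get(dev->parent->driver->owner)) {
            dev_err(dev, "%s: can't get owner\n", __func__);
            ret = -EINVAL;
            goto unlock_mutex;
        }

        /* skip the boot process if rproc is already powered up */
        if (atomic_inc_return(&rproc->power) > 1) {
            ret = 0;
            goto unlock_mutex;
        }

        /* ... request the firmware (waiting if asked to) and boot it ... */

    unlock_mutex:
        mutex_unlock(&rproc->lock);
        return ret;
    }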
On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote: > Hi Bjorn, > > On 08/11/2016 04:52 PM, Bjorn Andersson wrote: > > Introduce an "auto-boot" flag on rprocs to make it possible to flag > > remote processors without vdevs to automatically boot once the firmware > > is found. > > > > Preserve previous behavior of the wkup_m3 processor being explicitly > > booted by a consumer. > > > > Cc: Lee Jones <lee.jones@linaro.org> > > Cc: Loic Pallardy <loic.pallardy@st.com> > > Cc: Suman Anna <s-anna@ti.com> > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > --- > > > > Changes since v1: > > - s/always_on/auto_boot/ > > - Fixed double increment of "power" in recover path > > - Marked wkup_m3 to not auto_boot > > > > I am seeing various issues with this series as I am testing this series > more thoroughly with various TI remoteproc drivers and IPC stacks based > on virtio devices. I use very simple firmware images that publishes the > rpmsg-client-sample devices, so that I can use the kernel > rpmsg_client_sample to test the communication. > Thanks for bringing these up! > Here's a summary of the main issues: > > 1. The rproc_boot holds a module reference count to the remoteproc > platform driver so that it cannot be removed when a remote processor is > booted. The previous architecture allowed virtio_rpmsg_bus or the > platform remoteproc driver to be installed independent of each other > with the boot actually getting triggered when the virtio_rpmsg_bus gets > probed in find_vqs. The boot now happens when just the platform > remoteproc driver is installed independent of virtio_rpmsg_bus and > results in holding self-reference counts. This makes it impossible to > remove the remoteproc platform module cleanly (we shouldn't be imposing > force remove), which means we can't stop the remoteprocs properly. > I've always found the dependency on rpmsg awkward and don't like this behavior to depend on the firmware content, but rather on some sort of system configuration. That said, I did not intend to break the ability of shutting down and unloading the module. Calling rmmod on your rproc module is a rather explicit operation, do you know why there's a need to hold the module reference? Wouldn't it be cleaner to just shutdown the remoteproc as the module is being removed - which would include tearing down all children (rpmsg and others) I expect to be able to compile rpmsg into my kernel and still be able to unload my remoteproc module. > 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core > also meant that the virtio devices and therefore the memory for vrings > are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The > remoteproc can be booted without the virtio_rpmsg_bus module installed. > We do use the allocated dma addresses of the vrings in the published > resource table, but now the remote processor is up even before these > values are filled in. I had to actually move up the rproc_alloc_vring > alongside rproc_parse_vring to have the communication up. > This also means that for a piece of firmware that exposes more than one virtio device we would have the same race. As you say it makes more sense to allocate the vrings as we set up the other resources. > 3. The remoteproc platform driver cannot be removed previously when the > corresponding virtio devices are probed/configured properly and all the > communication flow w.r.t rpmsg channel publishing followed from the > remoteproc boot. 
These channels are child devices of the parent virtio > devices, and since the remoteproc boot/shutdown followed the virtio > device probe/removal lifecycle, the rpmsg channels life-cycle followed > that of the parent virtio device. My communication paths are now failing > if I remove the virtio_rpmsg_bus and insmod it again as the vrings and > vring buffers are configured again while the remoteproc is still > running. Also, since the remoteproc is not rebooted, the previously > published rpmsg channels are stale and they won't get recreated. > So in essence this only worked because rmmod'ing the virtio_rpmsg_bus would shut down the remoteproc(?) What if the firmware exposed a VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them? The vrings are in use by the remote side, so allocating those with the other remoteproc resources makes more sense, as above. But virtio_rpmsg_bus needs to detach all buffers from the rings before going away, so we don't get any writes to those after releasing the dma allocation. We will be sending out RPMSG_NS_DESTROY for some of the channels, but as far as I can tell from the rpmsg implementation there is no support in the protocol for your use case of dropping one end of the rpmsg channels without restarting the firmware and(/or?) vdevs. > In summary, the current patches worked nicely in a error recovery > scenario but are not working properly with the various combinations of > module insertion/removal process. > Thanks for your feedback Suman. I think we should definitely fix #1 and #2, but I'm uncertain to what extent #3 can be fixed. Regards, Bjorn
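A minimal sketch of the allocation change being discussed - moving rproc_alloc_vring() up next to rproc_parse_vring() in rproc_handle_vdev(), so the vring device addresses land in the published resource table before the remote processor starts. Function signatures assume the remoteproc core of this era; rvdev setup and error unwinding of already-allocated vrings are omitted for brevity:

    static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc,
                                 int offset, int avail)
    {
        struct rproc_vdev *rvdev;
        int i, ret;

        /* ... sanity checks and rvdev allocation as before ... */

        /* parse and allocate the vrings while handling the resource table,
         * instead of deferring the allocation to find_vqs() */
        for (i = 0; i < rsc->num_of_vrings; i++) {
            ret = rproc_parse_vring(rvdev, rsc, i);
            if (ret)
                goto free_rvdev;

            ret = rproc_alloc_vring(rvdev, i);
            if (ret)
                goto free_rvdev;
        }

        /* ... register the virtio device as before ... */
        return 0;

    free_rvdev:
        kfree(rvdev);
        return ret;
    }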
Hi Bjorn, On 09/08/2016 05:27 PM, Bjorn Andersson wrote: > On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote: > >> Hi Bjorn, >> >> On 08/11/2016 04:52 PM, Bjorn Andersson wrote: >>> Introduce an "auto-boot" flag on rprocs to make it possible to flag >>> remote processors without vdevs to automatically boot once the firmware >>> is found. >>> >>> Preserve previous behavior of the wkup_m3 processor being explicitly >>> booted by a consumer. >>> >>> Cc: Lee Jones <lee.jones@linaro.org> >>> Cc: Loic Pallardy <loic.pallardy@st.com> >>> Cc: Suman Anna <s-anna@ti.com> >>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> >>> --- >>> >>> Changes since v1: >>> - s/always_on/auto_boot/ >>> - Fixed double increment of "power" in recover path >>> - Marked wkup_m3 to not auto_boot >>> >> >> I am seeing various issues with this series as I am testing this series >> more thoroughly with various TI remoteproc drivers and IPC stacks based >> on virtio devices. I use very simple firmware images that publishes the >> rpmsg-client-sample devices, so that I can use the kernel >> rpmsg_client_sample to test the communication. >> > > Thanks for bringing these up! > >> Here's a summary of the main issues: >> >> 1. The rproc_boot holds a module reference count to the remoteproc >> platform driver so that it cannot be removed when a remote processor is >> booted. The previous architecture allowed virtio_rpmsg_bus or the >> platform remoteproc driver to be installed independent of each other >> with the boot actually getting triggered when the virtio_rpmsg_bus gets >> probed in find_vqs. The boot now happens when just the platform >> remoteproc driver is installed independent of virtio_rpmsg_bus and >> results in holding self-reference counts. This makes it impossible to >> remove the remoteproc platform module cleanly (we shouldn't be imposing >> force remove), which means we can't stop the remoteprocs properly. >> > > I've always found the dependency on rpmsg awkward and don't like this > behavior to depend on the firmware content, but rather on some sort of > system configuration. > > That said, I did not intend to break the ability of shutting down and > unloading the module. The remoteproc infrastructure always allowed the boot to be done by an external module (with vdevs, it looks intrinsic because of the invocation in remoteproc_virtio.c, but the boot was really triggered during virtio_rpmsg probe) and provided protection against removing the remoteproc driver removal when you had a consumer. I tried the auto-boot when I was upstreaming the wkup_m3_rproc driver and it was rejected for exactly this reason. > Calling rmmod on your rproc module is a rather explicit operation, do > you know why there's a need to hold the module reference? Wouldn't it be > cleaner to just shutdown the remoteproc as the module is being removed - > which would include tearing down all children (rpmsg and others) Right, the introduction of the auto-boot ended up in a self-holding reference count which should be fixed to allow you to remove the module. The external module boot is still a valid usage (when auto_boot is false) though. > > I expect to be able to compile rpmsg into my kernel and still be able to > unload my remoteproc module. > >> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core >> also meant that the virtio devices and therefore the memory for vrings >> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). 
The >> remoteproc can be booted without the virtio_rpmsg_bus module installed. >> We do use the allocated dma addresses of the vrings in the published >> resource table, but now the remote processor is up even before these >> values are filled in. I had to actually move up the rproc_alloc_vring >> alongside rproc_parse_vring to have the communication up. >> > > This also means that for a piece of firmware that exposes more than one > virtio device we would have the same race. Yes, if you had more virtio devices technically. The remoteproc infrastructure so far hadn't supported more than one vdev. We did support that in our downstream kernel but that was for a non-SMP usage on a dual-core remoteproc subsystem and the virtio devices were still virtio_rpmsg devices, so it scaled for us when we removed the virtio_rpmsg_bus module as long as the reference counts were managed in rproc_boot and rproc_shutdown() > > As you say it makes more sense to allocate the vrings as we set up the > other resources. > >> 3. The remoteproc platform driver cannot be removed previously when the >> corresponding virtio devices are probed/configured properly and all the >> communication flow w.r.t rpmsg channel publishing followed from the >> remoteproc boot. These channels are child devices of the parent virtio >> devices, and since the remoteproc boot/shutdown followed the virtio >> device probe/removal lifecycle, the rpmsg channels life-cycle followed >> that of the parent virtio device. My communication paths are now failing >> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and >> vring buffers are configured again while the remoteproc is still >> running. Also, since the remoteproc is not rebooted, the previously >> published rpmsg channels are stale and they won't get recreated. >> > > So in essence this only worked because rmmod'ing the virtio_rpmsg_bus > would shut down the remoteproc(?) What if the firmware exposed a > VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them? Yes, it's a problem if firmware exposed different virtio devices, but like I said above, it was not supported so far. It is definitely something we should consider going forward. Also, I would think the VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE. The rpmsg usage does provide a bus infrastructure allowing the remote side to publish different rpmsg devices. > > > The vrings are in use by the remote side, so allocating those with the > other remoteproc resources makes more sense, as above. > > But virtio_rpmsg_bus needs to detach all buffers from the rings before > going away, so we don't get any writes to those after releasing the dma > allocation. They do get detached alright from Linux-side but obviously you are now creating an additional synchronization to the remoteproc side that they cannot use these as you have freed up the memory. > > We will be sending out RPMSG_NS_DESTROY for some of the channels, but as > far as I can tell from the rpmsg implementation there is no support in > the protocol for your use case of dropping one end of the rpmsg channels > without restarting the firmware and(/or?) vdevs. Right, this also causes new synchronization problems when you reprobe and the remoteproc side has to republish the channels. 
The existing remoteproc infrastructure was designed around this fact - rpmsg channels can come and go (if needed) from the remoteproc side, the flow is no different between a regular boot and error recovery, and since the channels are tied to their parent virtio_rpmsg device, removing the communication path ensured that.

>
>> In summary, the current patches worked nicely in a error recovery
>> scenario but are not working properly with the various combinations of
>> module insertion/removal process.
>>
>
> Thanks for your feedback Suman. I think we should definitely fix #1 and
> #2, but I'm uncertain to what extent #3 can be fixed.

The only solution I can think of is to reverse the module dependency as well, to follow the reversal of the boot dependency, so that virtio_rpmsg_bus cannot be removed while a remoteproc supporting virtio devices is booted and running, once virtio_rpmsg has probed. I don't see any reason to introduce new synchronization issues and force the firmwares to change (having to republish channels) because of this boot-triggering reversal. I am concerned about the impact, as 4.9 is going to be an LTS, with most of the Si vendors using the LTS for their new software releases.

regards
Suman
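For reference, the channel announcements in question are plain name-service messages sent by the remote side to the host's name-service endpoint; abridged from include/linux/rpmsg.h of this era:

    enum rpmsg_ns_flags {
        RPMSG_NS_CREATE  = 0,
        RPMSG_NS_DESTROY = 1,
    };

    /* sent to the name-service address to create or tear down a channel */
    struct rpmsg_ns_msg {
        char name[RPMSG_NAME_SIZE];
        u32 addr;
        u32 flags;
    } __packed;

Channels exist on the Linux side only as a result of an RPMSG_NS_CREATE from the firmware, so after an rmmod/insmod cycle of virtio_rpmsg_bus nothing reappears unless the firmware announces its services again - which, per the discussion above, the firmwares in question do only once at boot.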
On Fri 16 Sep 16:58 PDT 2016, Suman Anna wrote: > Hi Bjorn, > > On 09/08/2016 05:27 PM, Bjorn Andersson wrote: > > On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote: > > > >> Hi Bjorn, > >> > >> On 08/11/2016 04:52 PM, Bjorn Andersson wrote: > >>> Introduce an "auto-boot" flag on rprocs to make it possible to flag > >>> remote processors without vdevs to automatically boot once the firmware > >>> is found. > >>> > >>> Preserve previous behavior of the wkup_m3 processor being explicitly > >>> booted by a consumer. > >>> > >>> Cc: Lee Jones <lee.jones@linaro.org> > >>> Cc: Loic Pallardy <loic.pallardy@st.com> > >>> Cc: Suman Anna <s-anna@ti.com> > >>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > >>> --- > >>> > >>> Changes since v1: > >>> - s/always_on/auto_boot/ > >>> - Fixed double increment of "power" in recover path > >>> - Marked wkup_m3 to not auto_boot > >>> > >> > >> I am seeing various issues with this series as I am testing this series > >> more thoroughly with various TI remoteproc drivers and IPC stacks based > >> on virtio devices. I use very simple firmware images that publishes the > >> rpmsg-client-sample devices, so that I can use the kernel > >> rpmsg_client_sample to test the communication. > >> > > > > Thanks for bringing these up! > > > >> Here's a summary of the main issues: > >> > >> 1. The rproc_boot holds a module reference count to the remoteproc > >> platform driver so that it cannot be removed when a remote processor is > >> booted. The previous architecture allowed virtio_rpmsg_bus or the > >> platform remoteproc driver to be installed independent of each other > >> with the boot actually getting triggered when the virtio_rpmsg_bus gets > >> probed in find_vqs. The boot now happens when just the platform > >> remoteproc driver is installed independent of virtio_rpmsg_bus and > >> results in holding self-reference counts. This makes it impossible to > >> remove the remoteproc platform module cleanly (we shouldn't be imposing > >> force remove), which means we can't stop the remoteprocs properly. > >> > > > > I've always found the dependency on rpmsg awkward and don't like this > > behavior to depend on the firmware content, but rather on some sort of > > system configuration. > > > > That said, I did not intend to break the ability of shutting down and > > unloading the module. > > The remoteproc infrastructure always allowed the boot to be done by an > external module The only caller of rproc_boot() in the mainline kernel is the wkup_m3_ipc, which spawns a thread and waits for the remoteproc-internal completion that indicates when the firmware is available and then call rproc_boot(). I'm afraid I would much prefer the wkup_m3_rproc to just set auto-boot and not work around the design like this. That said, we still provide the support for this. > (with vdevs, it looks intrinsic because of the > invocation in remoteproc_virtio.c, but the boot was really triggered > during virtio_rpmsg probe) Except for the fact that the first vdev does this as a direct result of the call from rproc_fw_config_virtio(). As stated above, the only other caller of rproc_boot() is the wkup_m3 driver and although it's not executing in the context of rproc_fw_config_virtio() it's directly tied into its execution. > and provided protection against removing the remoteproc driver removal > when you had a consumer. I tried the auto-boot when I was upstreaming > the wkup_m3_rproc driver and it was rejected for exactly this reason. 
One of the problems I have with the design chosen in the wkup_m3 case is if you had to deal with crash recovery, how would you sync the wkup_m3_ipc driver operations to the async recovery? Flipping this the other way around and accepting that the "function device" follows the RPROC_RUNNING state of the rproc implicitly would allow this. > > > Calling rmmod on your rproc module is a rather explicit operation, do > > you know why there's a need to hold the module reference? Wouldn't it be > > cleaner to just shutdown the remoteproc as the module is being removed - > > which would include tearing down all children (rpmsg and others) > > Right, the introduction of the auto-boot ended up in a self-holding > reference count which should be fixed to allow you to remove the module. > The external module boot is still a valid usage (when auto_boot is > false) though. The way this is dealt with in other subsystems is that when a client acquire a handle to something exposed by the implementation it will reference the implementation. With remoteproc I can acquire a reference to a remoteproc, then unload the implementation and when the client calls rproc_boot() the core will try to dereference dev->parent->driver->owner to try_module_get() it; if we're lucky that will fail as the driver is gone, but I would assume we would panic here instead, as driver is no longer a valid pointer. > > > > > I expect to be able to compile rpmsg into my kernel and still be able to > > unload my remoteproc module. > > > >> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core > >> also meant that the virtio devices and therefore the memory for vrings > >> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The > >> remoteproc can be booted without the virtio_rpmsg_bus module installed. > >> We do use the allocated dma addresses of the vrings in the published > >> resource table, but now the remote processor is up even before these > >> values are filled in. I had to actually move up the rproc_alloc_vring > >> alongside rproc_parse_vring to have the communication up. > >> > > > > This also means that for a piece of firmware that exposes more than one > > virtio device we would have the same race. > > Yes, if you had more virtio devices technically. The remoteproc > infrastructure so far hadn't supported more than one vdev. I don't see anything preventing you from putting more than one vdev in your resource table. There will however be a race on getting the vrings allocated before the firmware needs them. > We did > support that in our downstream kernel but that was for a non-SMP usage > on a dual-core remoteproc subsystem and the virtio devices were still > virtio_rpmsg devices, so it scaled for us when we removed the > virtio_rpmsg_bus module as long as the reference counts were managed in > rproc_boot and rproc_shutdown() > I think the shutdown case would work in either way, but the behavior of rmmod rpmsg && insmod rpmsg still depends on other things. > > > > As you say it makes more sense to allocate the vrings as we set up the > > other resources. > > > >> 3. The remoteproc platform driver cannot be removed previously when the > >> corresponding virtio devices are probed/configured properly and all the > >> communication flow w.r.t rpmsg channel publishing followed from the > >> remoteproc boot. 
These channels are child devices of the parent virtio > >> devices, and since the remoteproc boot/shutdown followed the virtio > >> device probe/removal lifecycle, the rpmsg channels life-cycle followed > >> that of the parent virtio device. My communication paths are now failing > >> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and > >> vring buffers are configured again while the remoteproc is still > >> running. Also, since the remoteproc is not rebooted, the previously > >> published rpmsg channels are stale and they won't get recreated. > >> > > > > So in essence this only worked because rmmod'ing the virtio_rpmsg_bus > > would shut down the remoteproc(?) What if the firmware exposed a > > VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them? > > Yes, it's a problem if firmware exposed different virtio devices, but > like I said above, it was not supported so far. It's not prevented, the only reason I can find for it not working is what I'm arguing about - that the resource handling will be racy. > It is definitely > something we should consider going forward. We had a question related to supporting multiple VIRTIO_ID_NET devices per remoteproc on LKML last week. > Also, I would think the > VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE. > The rpmsg usage does provide a bus infrastructure allowing the remote > side to publish different rpmsg devices. > Maybe I'm missing something here, but if I put any other VIRTIO_ID in the "id" member of fw_rsc_vdev the virtio core will probe that driver and it will call the find_vqs callback and there's no limit on the number of fw_rsc_vdev entries in the resource table. So the use case of having a remoteproc firmware with 8 VIRTIO_ID_NET entries should work just fine - except for only the first one being initialized as the firmware boots (unless we finish this restructure). > > > > > > The vrings are in use by the remote side, so allocating those with the > > other remoteproc resources makes more sense, as above. > > > > But virtio_rpmsg_bus needs to detach all buffers from the rings before > > going away, so we don't get any writes to those after releasing the dma > > allocation. > > They do get detached alright from Linux-side but obviously you are now > creating an additional synchronization to the remoteproc side that they > cannot use these as you have freed up the memory. > Yeah, that's not acceptable, the backing memory must at least follow the running state of the rproc, so that it is available once we boot the core. > > > > We will be sending out RPMSG_NS_DESTROY for some of the channels, but as > > far as I can tell from the rpmsg implementation there is no support in > > the protocol for your use case of dropping one end of the rpmsg channels > > without restarting the firmware and(/or?) vdevs. > > Right, this also causes new synchronization problems when you reprobe > and the remoteproc side has to republish the channels. From the code it seems like the other virtio devices would handle this. > The existing > remoteproc infrastructure was designed around this fact - rpmsg channels > can come and go (if needed) from the remoteproc-side, and the flow is no > different from regular boot vs error recovery, and since they are tied > to their parent virtio_rpmsg device, removing the communication path > ensured that. 
There are two levels here, the first is the virtio level which in the case of rpmsg seems to be required to follow the remoteproc RPROC_RUNNING state; the other being rpmsg channels have nothing to do with remoteproc, but can as you say come and go dynamically as either side like. But the only problem I can see is if the firmware does not re-announce channels after we reset the virtio device. > > > > >> In summary, the current patches worked nicely in a error recovery > >> scenario but are not working properly with the various combinations of > >> module insertion/removal process. > >> > > > > Thanks for your feedback Suman. I think we should definitely fix #1 and > > #2, but I'm uncertain to what extent #3 can be fixed. > > The only solution I can think of is to reverse the module dependency as > well to follow the reversal of the boot dependency, so that > virtio_rpmsg_bus cannot be removed if you have a remoteproc supporting > virtio devices is booted and running once the virtio_rpmsg has probed. Can you please help me understand why it's important to protect either module from being unloaded? > I > don't see any reasons to introduce new synchronization issues and force > the firmwares to change (having to republish channels) because of this > boot triggering reversal. > The problem is not the boot triggering reversal - as there's no such thing, the issue shows because I decoupled the rproc life cycle from one of its child's. > I am concerned on the impact as 4.9 is going to an LTS, with most of the > Si vendors using the LTS for their new software releases. > I'm sorry, but could you help me understand how rpmsg is used downstream for this to be of concern, how is module unloading used beyond development? Per my own arguments I will provide some patches to correct the vring buffer allocation and I'll look into the module locking. PS. I really would like to see >0 users of this framework in mainline, so we can reason about things on the basis of what's actually there! Regards, Bjorn
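On the module-locking point, one hypothetical direction (a sketch only, not something proposed in this series) is a variation of the try_module_get() stanza shown earlier: take the reference only on behalf of external consumers, since an auto-boot initiated from the platform driver's own firmware callback would otherwise pin its own module. The via_auto_boot flag is a placeholder that would have to be plumbed in from the caller:

    /* hypothetical: only external rproc_boot() users pin the
     * implementation module; the core's own auto-boot does not */
    if (!via_auto_boot && !try_module_get(dev->parent->driver->owner)) {
        dev_err(dev, "%s: can't get owner\n", __func__);
        ret = -EINVAL;
        goto unlock_mutex;
    }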
Hi Bjorn, >>>> >>>> On 08/11/2016 04:52 PM, Bjorn Andersson wrote: >>>>> Introduce an "auto-boot" flag on rprocs to make it possible to flag >>>>> remote processors without vdevs to automatically boot once the firmware >>>>> is found. >>>>> >>>>> Preserve previous behavior of the wkup_m3 processor being explicitly >>>>> booted by a consumer. >>>>> >>>>> Cc: Lee Jones <lee.jones@linaro.org> >>>>> Cc: Loic Pallardy <loic.pallardy@st.com> >>>>> Cc: Suman Anna <s-anna@ti.com> >>>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> >>>>> --- >>>>> >>>>> Changes since v1: >>>>> - s/always_on/auto_boot/ >>>>> - Fixed double increment of "power" in recover path >>>>> - Marked wkup_m3 to not auto_boot >>>>> >>>> >>>> I am seeing various issues with this series as I am testing this series >>>> more thoroughly with various TI remoteproc drivers and IPC stacks based >>>> on virtio devices. I use very simple firmware images that publishes the >>>> rpmsg-client-sample devices, so that I can use the kernel >>>> rpmsg_client_sample to test the communication. >>>> >>> >>> Thanks for bringing these up! >>> >>>> Here's a summary of the main issues: >>>> >>>> 1. The rproc_boot holds a module reference count to the remoteproc >>>> platform driver so that it cannot be removed when a remote processor is >>>> booted. The previous architecture allowed virtio_rpmsg_bus or the >>>> platform remoteproc driver to be installed independent of each other >>>> with the boot actually getting triggered when the virtio_rpmsg_bus gets >>>> probed in find_vqs. The boot now happens when just the platform >>>> remoteproc driver is installed independent of virtio_rpmsg_bus and >>>> results in holding self-reference counts. This makes it impossible to >>>> remove the remoteproc platform module cleanly (we shouldn't be imposing >>>> force remove), which means we can't stop the remoteprocs properly. >>>> >>> >>> I've always found the dependency on rpmsg awkward and don't like this >>> behavior to depend on the firmware content, but rather on some sort of >>> system configuration. >>> >>> That said, I did not intend to break the ability of shutting down and >>> unloading the module. >> >> The remoteproc infrastructure always allowed the boot to be done by an >> external module > > The only caller of rproc_boot() in the mainline kernel is the > wkup_m3_ipc, which spawns a thread and waits for the remoteproc-internal > completion that indicates when the firmware is available and then call > rproc_boot(). > > I'm afraid I would much prefer the wkup_m3_rproc to just set auto-boot > and not work around the design like this. > > > That said, we still provide the support for this. It is quite a valid usecase, and we will have more usecases like this definitely showing up. This is exactly where I would need the rproc_set_firmware() API as the client user dictates the firmware to boot depending on the soft ip/functionality it wants from the remote processor. > >> (with vdevs, it looks intrinsic because of the >> invocation in remoteproc_virtio.c, but the boot was really triggered >> during virtio_rpmsg probe) > > Except for the fact that the first vdev does this as a direct result of > the call from rproc_fw_config_virtio(). Yeah, and I did have a downstream patch [1] to actually count the vdevs and perform the boot on the last vdev instead of first. That is always a limitation of the current upstream code, so let's not mix that with the current change in behavior. Anyway, with your current changes, the patch [1] becomes moot. 
[1] http://git.omapzoom.org/?p=kernel/omap.git;a=commitdiff;h=bdce403858a5c5d2abccd81a91bbe0528b97bef6 > > As stated above, the only other caller of rproc_boot() is the wkup_m3 > driver and although it's not executing in the context of > rproc_fw_config_virtio() it's directly tied into its execution. > >> and provided protection against removing the remoteproc driver removal >> when you had a consumer. I tried the auto-boot when I was upstreaming >> the wkup_m3_rproc driver and it was rejected for exactly this reason. > > One of the problems I have with the design chosen in the wkup_m3 case is > if you had to deal with crash recovery, how would you sync the > wkup_m3_ipc driver operations to the async recovery? The wkup_m3 is akin to the System Controller Processor stuff w.r.t PM, so first thing is we cannot afford to have this firmware crash in general. In anycase, the communication stuff is dealt by the wkup_m3_ipc driver itself as part of the PM suspend/resume cycle, so any error notification detection will be it's job and it will be responsible for managing the remoteproc life cycle appropriately. > Flipping this the other way around and accepting that the "function > device" follows the RPROC_RUNNING state of the rproc implicitly would > allow this. > >> >>> Calling rmmod on your rproc module is a rather explicit operation, do >>> you know why there's a need to hold the module reference? Wouldn't it be >>> cleaner to just shutdown the remoteproc as the module is being removed - >>> which would include tearing down all children (rpmsg and others) >> >> Right, the introduction of the auto-boot ended up in a self-holding >> reference count which should be fixed to allow you to remove the module. >> The external module boot is still a valid usage (when auto_boot is >> false) though. > > The way this is dealt with in other subsystems is that when a client > acquire a handle to something exposed by the implementation it will > reference the implementation. > > With remoteproc I can acquire a reference to a remoteproc, then unload > the implementation and when the client calls rproc_boot() the core will > try to dereference dev->parent->driver->owner to try_module_get() it; if > we're lucky that will fail as the driver is gone, but I would assume we > would panic here instead, as driver is no longer a valid pointer. I am talking about the rproc_shutdown() case - removing the underlying implementation after the client has successfully booted the remoteproc device and using it. > >> >>> >>> I expect to be able to compile rpmsg into my kernel and still be able to >>> unload my remoteproc module. >>> >>>> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core >>>> also meant that the virtio devices and therefore the memory for vrings >>>> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The >>>> remoteproc can be booted without the virtio_rpmsg_bus module installed. >>>> We do use the allocated dma addresses of the vrings in the published >>>> resource table, but now the remote processor is up even before these >>>> values are filled in. I had to actually move up the rproc_alloc_vring >>>> alongside rproc_parse_vring to have the communication up. >>>> >>> >>> This also means that for a piece of firmware that exposes more than one >>> virtio device we would have the same race. >> >> Yes, if you had more virtio devices technically. The remoteproc >> infrastructure so far hadn't supported more than one vdev. 
> > I don't see anything preventing you from putting more than one vdev in > your resource table. There will however be a race on getting the vrings > allocated before the firmware needs them. Right, moving the vring allocation from find_vqs into resource parsing time would fix that issue. My above downstream patch resolved this issue with existing architecture, but as I said we were using virtio devices of the same type. > >> We did >> support that in our downstream kernel but that was for a non-SMP usage >> on a dual-core remoteproc subsystem and the virtio devices were still >> virtio_rpmsg devices, so it scaled for us when we removed the >> virtio_rpmsg_bus module as long as the reference counts were managed in >> rproc_boot and rproc_shutdown() >> > > I think the shutdown case would work in either way, but the behavior of > rmmod rpmsg && insmod rpmsg still depends on other things. Right, the issues are with rmmod and insmod of virtio_rpmsg_bus. Previously, this ensured the shutdown of remoteproc, and upon insmod, the remoteproc is rebooted and rpmsg channels are republished (same flow in error recovery too). > >>> >>> As you say it makes more sense to allocate the vrings as we set up the >>> other resources. >>> >>>> 3. The remoteproc platform driver cannot be removed previously when the >>>> corresponding virtio devices are probed/configured properly and all the >>>> communication flow w.r.t rpmsg channel publishing followed from the >>>> remoteproc boot. These channels are child devices of the parent virtio >>>> devices, and since the remoteproc boot/shutdown followed the virtio >>>> device probe/removal lifecycle, the rpmsg channels life-cycle followed >>>> that of the parent virtio device. My communication paths are now failing >>>> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and >>>> vring buffers are configured again while the remoteproc is still >>>> running. Also, since the remoteproc is not rebooted, the previously >>>> published rpmsg channels are stale and they won't get recreated. >>>> >>> >>> So in essence this only worked because rmmod'ing the virtio_rpmsg_bus >>> would shut down the remoteproc(?) What if the firmware exposed a >>> VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them? >> >> Yes, it's a problem if firmware exposed different virtio devices, but >> like I said above, it was not supported so far. > > It's not prevented, the only reason I can find for it not working is > what I'm arguing about - that the resource handling will be racy. Right, this is definitely a scenario to conside, but this is separate (but similar) from the current problem I have mentioned. > >> It is definitely >> something we should consider going forward. > > We had a question related to supporting multiple VIRTIO_ID_NET devices > per remoteproc on LKML last week. > >> Also, I would think the >> VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE. >> The rpmsg usage does provide a bus infrastructure allowing the remote >> side to publish different rpmsg devices. >> > > Maybe I'm missing something here, but if I put any other VIRTIO_ID in > the "id" member of fw_rsc_vdev the virtio core will probe that driver > and it will call the find_vqs callback and there's no limit on the > number of fw_rsc_vdev entries in the resource table. 
> > So the use case of having a remoteproc firmware with 8 VIRTIO_ID_NET > entries should work just fine - except for only the first one being > initialized as the firmware boots (unless we finish this restructure). Yeah, this is racy whether with previous architecture or with the current changes. Moving the vring allocation as I suggested should fix this particular multiple vdev issue with the current changes. > >>> >>> >>> The vrings are in use by the remote side, so allocating those with the >>> other remoteproc resources makes more sense, as above. >>> >>> But virtio_rpmsg_bus needs to detach all buffers from the rings before >>> going away, so we don't get any writes to those after releasing the dma >>> allocation. >> >> They do get detached alright from Linux-side but obviously you are now >> creating an additional synchronization to the remoteproc side that they >> cannot use these as you have freed up the memory. >> > > Yeah, that's not acceptable, the backing memory must at least follow the > running state of the rproc, so that it is available once we boot the > core. > >>> >>> We will be sending out RPMSG_NS_DESTROY for some of the channels, but as >>> far as I can tell from the rpmsg implementation there is no support in >>> the protocol for your use case of dropping one end of the rpmsg channels >>> without restarting the firmware and(/or?) vdevs. >> >> Right, this also causes new synchronization problems when you reprobe >> and the remoteproc side has to republish the channels. > > From the code it seems like the other virtio devices would handle this. Do they implement a bus functionality like virtio_rpmsg? > >> The existing >> remoteproc infrastructure was designed around this fact - rpmsg channels >> can come and go (if needed) from the remoteproc-side, and the flow is no >> different from regular boot vs error recovery, and since they are tied >> to their parent virtio_rpmsg device, removing the communication path >> ensured that. > > There are two levels here, the first is the virtio level which in the > case of rpmsg seems to be required to follow the remoteproc > RPROC_RUNNING state; the other being rpmsg channels have nothing to do > with remoteproc, but can as you say come and go dynamically as either > side like. > > But the only problem I can see is if the firmware does not re-announce > channels after we reset the virtio device. Correct, this is my exact problem, the announcement is done once at boot-time from our firmwares as they are publishing the services supported by each. We only have a single physical transport (virtio_rpmsg) between the host and the remote processor. And clients cannot avail of these services once you do a rmmod/insmod cycle. > >> >>> >>>> In summary, the current patches worked nicely in a error recovery >>>> scenario but are not working properly with the various combinations of >>>> module insertion/removal process. >>>> >>> >>> Thanks for your feedback Suman. I think we should definitely fix #1 and >>> #2, but I'm uncertain to what extent #3 can be fixed. >> >> The only solution I can think of is to reverse the module dependency as >> well to follow the reversal of the boot dependency, so that >> virtio_rpmsg_bus cannot be removed if you have a remoteproc supporting >> virtio devices is booted and running once the virtio_rpmsg has probed. > > Can you please help me understand why it's important to protect either > module from being unloaded? 
virtio_rpmsg_bus is quite simple: the firmwares publish their channels once during boot, and the removal of virtio_rpmsg_bus also removes these channels, which then do not get republished when you insmod virtio_rpmsg_bus again. I take it the other one you are talking about is the remoteproc platform driver; that one needs to allow removal in the auto-boot case, and restrict it when a client has booted the processor successfully.

>
>> I
>> don't see any reasons to introduce new synchronization issues and force
>> the firmwares to change (having to republish channels) because of this
>> boot triggering reversal.
>>
>
> The problem is not the boot triggering reversal - as there's no such
> thing, the issue shows because I decoupled the rproc life cycle from one
> of its child's.

I don't quite agree; the boot flow did change with the current changes, and it is now affecting the logic in the firmwares as well.

>
>> I am concerned on the impact as 4.9 is going to an LTS, with most of the
>> Si vendors using the LTS for their new software releases.
>>
>
> I'm sorry, but could you help me understand how rpmsg is used downstream
> for this to be of concern, how is module unloading used beyond
> development?

This is part of robustness testing, especially when the various pieces are built as modules. This definitely is a regression, as communication won't come up any more.

>
> Per my own arguments I will provide some patches to correct the vring
> buffer allocation and I'll look into the module locking.
>
> PS. I really would like to see >0 users of this framework in mainline,
> so we can reason about things on the basis of what's actually there!

Yeah, we already have two remoteproc drivers and I have two more in the pipeline. And it looks like a whole bunch of them from other SoC vendors are on their way too. Obviously, each has their own firmware designs/IPC architectures and restrictions.

regards
Suman
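For illustration, the consumer-driven boot that auto_boot = false preserves is the pattern used by the mainline wkup_m3_ipc driver: spawn a thread, wait for the core's asynchronous firmware load, then boot explicitly. An abridged sketch (names per that driver; details may differ slightly):

    static void wkup_m3_rproc_boot_thread(struct wkup_m3_ipc *m3_ipc)
    {
        struct device *dev = m3_ipc->dev;
        int ret;

        /* wait for the remoteproc core to finish the async firmware load */
        wait_for_completion(&m3_ipc->rproc->firmware_loading_complete);

        /* the consumer, not the core, decides when to boot the processor */
        ret = rproc_boot(m3_ipc->rproc);
        if (ret)
            dev_err(dev, "rproc_boot failed\n");

        do_exit(0);
    }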
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index fe0539ed9cb5..5ca98039e494 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -933,6 +933,10 @@ static void rproc_fw_config_virtio(const struct firmware *fw, void *context)
 	/* look for virtio devices and register them */
 	ret = rproc_handle_resources(rproc, tablesz, rproc_vdev_handler);
 
+	/* if rproc is marked always-on, request it to boot */
+	if (rproc->auto_boot)
+		rproc_boot_nowait(rproc);
+
 out:
 	release_firmware(fw);
 	/* allow rproc_del() contexts, if any, to proceed */
@@ -978,11 +982,16 @@ static int rproc_add_virtio_devices(struct rproc *rproc)
 int rproc_trigger_recovery(struct rproc *rproc)
 {
 	struct rproc_vdev *rvdev, *rvtmp;
+	int ret;
 
 	dev_err(&rproc->dev, "recovering %s\n", rproc->name);
 
 	init_completion(&rproc->crash_comp);
 
+	/* shut down the remote */
+	/* TODO: make sure this works with rproc->power > 1 */
+	rproc_shutdown(rproc);
+
 	/* clean up remote vdev entries */
 	list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
 		rproc_remove_virtio_dev(rvdev);
@@ -993,7 +1002,18 @@ int rproc_trigger_recovery(struct rproc *rproc)
 	/* Free the copy of the resource table */
 	kfree(rproc->cached_table);
 
-	return rproc_add_virtio_devices(rproc);
+	ret = rproc_add_virtio_devices(rproc);
+	if (ret)
+		return ret;
+
+	/*
+	 * boot the remote processor up again, if the async firmware loader
+	 * didn't do so already, waiting for the async fw load to finish
+	 */
+	if (!rproc->auto_boot)
+		rproc_boot(rproc);
+
+	return 0;
 }
 
 /**
@@ -1374,6 +1394,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
 	rproc->name = name;
 	rproc->ops = ops;
 	rproc->priv = &rproc[1];
+	rproc->auto_boot = true;
 
 	device_initialize(&rproc->dev);
 	rproc->dev.parent = dev;
@@ -1452,6 +1473,11 @@ int rproc_del(struct rproc *rproc)
 	/* if rproc is just being registered, wait */
 	wait_for_completion(&rproc->firmware_loading_complete);
 
+	/* if rproc is marked always-on, rproc_add() booted it */
+	/* TODO: make sure this works with rproc->power > 1 */
+	if (rproc->auto_boot)
+		rproc_shutdown(rproc);
+
 	/* clean up remote vdev entries */
 	list_for_each_entry_safe(rvdev, tmp, &rproc->rvdevs, node)
 		rproc_remove_virtio_dev(rvdev);
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index cc91556313e1..574c90ce07f7 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -136,11 +136,6 @@ static void __rproc_virtio_del_vqs(struct virtio_device *vdev)
 
 static void rproc_virtio_del_vqs(struct virtio_device *vdev)
 {
-	struct rproc *rproc = vdev_to_rproc(vdev);
-
-	/* power down the remote processor before deleting vqs */
-	rproc_shutdown(rproc);
-
 	__rproc_virtio_del_vqs(vdev);
 }
 
@@ -149,7 +144,6 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		       vq_callback_t *callbacks[],
 		       const char * const names[])
 {
-	struct rproc *rproc = vdev_to_rproc(vdev);
 	int i, ret;
 
 	for (i = 0; i < nvqs; ++i) {
@@ -160,13 +154,6 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		}
 	}
 
-	/* now that the vqs are all set, boot the remote processor */
-	ret = rproc_boot_nowait(rproc);
-	if (ret) {
-		dev_err(&rproc->dev, "rproc_boot() failed %d\n", ret);
-		goto error;
-	}
-
 	return 0;
 
 error:
diff --git a/drivers/remoteproc/wkup_m3_rproc.c b/drivers/remoteproc/wkup_m3_rproc.c
index 02d271d101b4..3811cb522af3 100644
--- a/drivers/remoteproc/wkup_m3_rproc.c
+++ b/drivers/remoteproc/wkup_m3_rproc.c
@@ -167,6 +167,8 @@ static int wkup_m3_rproc_probe(struct platform_device *pdev)
 		goto err;
 	}
 
+	rproc->auto_boot = false;
+
 	wkupm3 = rproc->priv;
 	wkupm3->rproc = rproc;
 	wkupm3->pdev = pdev;
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index 1c457a8dd5a6..76d936cabbf8 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -443,6 +443,7 @@ struct rproc {
 	struct resource_table *cached_table;
 	u32 table_csum;
 	bool has_iommu;
+	bool auto_boot;
 };
 
 /* we currently support only two vrings per rvdev */
Introduce an "auto-boot" flag on rprocs to make it possible to flag
remote processors without vdevs to automatically boot once the firmware
is found.

Preserve previous behavior of the wkup_m3 processor being explicitly
booted by a consumer.

Cc: Lee Jones <lee.jones@linaro.org>
Cc: Loic Pallardy <loic.pallardy@st.com>
Cc: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
Changes since v1:
- s/always_on/auto_boot/
- Fixed double increment of "power" in recover path
- Marked wkup_m3 to not auto_boot

 drivers/remoteproc/remoteproc_core.c   | 28 +++++++++++++++++++++++++++-
 drivers/remoteproc/remoteproc_virtio.c | 13 -------------
 drivers/remoteproc/wkup_m3_rproc.c     |  2 ++
 include/linux/remoteproc.h             |  1 +
 4 files changed, 30 insertions(+), 14 deletions(-)

-- 
2.5.0
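As a usage note: with this change, rproc_alloc() defaults auto_boot to true, so a platform driver that wants to keep the old consumer-driven boot opts out before registering, as the wkup_m3 hunk above does. A hypothetical driver probe (my_rproc_ops, fw_name and priv are placeholders, not from this series) would look like:

    rproc = rproc_alloc(dev, "my-rproc", &my_rproc_ops, fw_name,
                        sizeof(*priv));
    if (!rproc)
        return -ENOMEM;

    /* keep the explicit, consumer-driven boot instead of auto-boot */
    rproc->auto_boot = false;

    ret = rproc_add(rproc);
    if (ret)
        goto free_rproc;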