spi: Fix spi device unregister flow

Message ID 20210426235638.1285530-1-saravanak@google.com
State Accepted
Commit c7299fea67696db5bd09d924d1f1080d894f92ef

Commit Message

Saravana Kannan April 26, 2021, 11:56 p.m. UTC
When an SPI device is unregistered, the spi->controller->cleanup() is
called in the device's release callback. That's wrong for a couple of
reasons:

1. spi_dev_put() can be called before spi_add_device() is called. And
   it's spi_add_device() that calls spi_setup(). This will cause cleanup()
   to get called without the spi device ever being set up.

2. There's no guarantee that the controller's driver would be present by
   the time the spi device's release function gets called.

3. It also causes "sleeping in atomic context" stack dump[1] when device
   link deletion code does a put_device() on the spi device.

Fix these issues by simply moving the cleanup from the device release
callback to the actual spi_unregister_device() function.

[1] - https://lore.kernel.org/lkml/CAHp75Vc=FCGcUyS0v6fnxme2YJ+qD+Y-hQDQLa2JhWNON9VmsQ@mail.gmail.com/
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/spi/spi.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Andy Shevchenko April 27, 2021, 6:52 a.m. UTC | #1
+Cc Lukas

On Tue, Apr 27, 2021 at 2:56 AM Saravana Kannan <saravanak@google.com> wrote:
>
> When an SPI device is unregistered, the spi->controller->cleanup() is
> called in the device's release callback. That's wrong for a couple of
> reasons:
>
> 1. spi_dev_put() can be called before spi_add_device() is called. And
>    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
>    to get called without the spi device ever being set up.
>
> 2. There's no guarantee that the controller's driver would be present by
>    the time the spi device's release function gets called.
>
> 3. It also causes "sleeping in atomic context" stack dump[1] when device
>    link deletion code does a put_device() on the spi device.
>
> Fix these issues by simply moving the cleanup from the device release
> callback to the actual spi_unregister_device() function.
>
> [1] - https://lore.kernel.org/lkml/CAHp75Vc=FCGcUyS0v6fnxme2YJ+qD+Y-hQDQLa2JhWNON9VmsQ@mail.gmail.com/
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/spi/spi.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index b08efe88ccd6..7d0d89172a1d 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -47,10 +47,6 @@ static void spidev_release(struct device *dev)
>  {
>         struct spi_device       *spi = to_spi_device(dev);
>
> -       /* spi controllers may cleanup for released devices */
> -       if (spi->controller->cleanup)
> -               spi->controller->cleanup(spi);
> -
>         spi_controller_put(spi->controller);
>         kfree(spi->driver_override);
>         kfree(spi);
> @@ -558,6 +554,12 @@ static int spi_dev_check(struct device *dev, void *data)
>         return 0;
>  }
>
> +static void spi_cleanup(struct spi_device *spi)
> +{
> +       if (spi->controller->cleanup)
> +               spi->controller->cleanup(spi);
> +}
> +
>  /**
>   * spi_add_device - Add spi_device allocated with spi_alloc_device
>   * @spi: spi_device to register
> @@ -622,11 +624,13 @@ int spi_add_device(struct spi_device *spi)
>
>         /* Device may be bound to an active driver when this returns */
>         status = device_add(&spi->dev);
> -       if (status < 0)
> +       if (status < 0) {
>                 dev_err(dev, "can't add %s, status %d\n",
>                                 dev_name(&spi->dev), status);
> -       else
> +               spi_cleanup(spi);
> +       } else {
>                 dev_dbg(dev, "registered child %s\n", dev_name(&spi->dev));
> +       }
>
>  done:
>         mutex_unlock(&spi_add_lock);
> @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
>         if (!spi)
>                 return;
>
> +       spi_cleanup(spi);
> +
>         if (spi->dev.of_node) {
>                 of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
>                 of_node_put(spi->dev.of_node);
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>



-- 
With Best Regards,
Andy Shevchenko
Mark Brown April 27, 2021, 10:48 a.m. UTC | #2
On Tue, Apr 27, 2021 at 09:52:48AM +0300, Andy Shevchenko wrote:
> +Cc Lukas


The cleanup callback has been in release() since the framework was
merged AFAICT.
Andy Shevchenko April 27, 2021, 11:42 a.m. UTC | #3
On Tue, Apr 27, 2021 at 1:49 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Apr 27, 2021 at 09:52:48AM +0300, Andy Shevchenko wrote:
> > +Cc Lukas
>
> The cleanup callback has been in release() since the framework was
> merged AFAICT.


Yep.

Personally it feels to me wrong to require device_release() being
atomic. It might be that I missed something in documentation or
somewhere else that suggests the opposite.
But let's wait for other comments if any.

-- 
With Best Regards,
Andy Shevchenko
Greg Kroah-Hartman April 27, 2021, 11:49 a.m. UTC | #4
On Tue, Apr 27, 2021 at 02:42:19PM +0300, Andy Shevchenko wrote:
> On Tue, Apr 27, 2021 at 1:49 PM Mark Brown <broonie@kernel.org> wrote:
> >
> > On Tue, Apr 27, 2021 at 09:52:48AM +0300, Andy Shevchenko wrote:
> > > +Cc Lukas
> >
> > The cleanup callback has been in release() since the framework was
> > merged AFAICT.
>
> Yep.
>
> Personally it feels to me wrong to require device_release() being
> atomic. It might be that I missed something in documentation or
> somewhere else that suggests the opposite.
> But let's wait for other comments if any.


There is no requirement from the driver core to have the release
callback be "atomic", you should be able to sleep just fine in there.

If not, something is wrong and has changed...

thanks,

greg k-h
Saravana Kannan April 27, 2021, 3:02 p.m. UTC | #5
On Tue, Apr 27, 2021 at 4:49 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Apr 27, 2021 at 02:42:19PM +0300, Andy Shevchenko wrote:
> > On Tue, Apr 27, 2021 at 1:49 PM Mark Brown <broonie@kernel.org> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 09:52:48AM +0300, Andy Shevchenko wrote:
> > > > +Cc Lukas
> > >
> > > The cleanup callback has been in release() since the framework was
> > > merged AFAICT.
> >
> > Yep.
> >
> > Personally it feels to me wrong to require device_release() being
> > atomic. It might be that I missed something in documentation or
> > somewhere else that suggests the opposite.
> > But let's wait for other comments if any.
>
> There is no requirement from the driver core to have the release
> callback be "atomic", you should be able to sleep just fine in there.
>
> If not, something is wrong and has changed...


This patch is not just about the atomic thing though. I can drop that
from the commit text and I think this still fixes a real issue.
Calling code from another driver (not even the device's own driver)
during a device's release is not guaranteed to work at all (what if
the module gets unloaded?). And this patch also fixes some mismatched
setup/cleanup calls. Using device release for the cleanup() isn't
necessary and we can avoid this bug. This patch tries to fix that too.

As for the atomic thing, that seems to be a generic device link SRCU
implementation issue. It does a put_device() in an atomic context. I'm
not too familiar with the SRCU implementation or why it was needed.
Rafael would have a better idea on that. I can drop that part from the
commit text and move the atomic discussion back to Andy's "atomic
context" thread[1].

-Saravana
Mark Brown April 28, 2021, 4:53 p.m. UTC | #6
On Mon, 26 Apr 2021 16:56:38 -0700, Saravana Kannan wrote:
> When an SPI device is unregistered, the spi->controller->cleanup() is
> called in the device's release callback. That's wrong for a couple of
> reasons:
>
> 1. spi_dev_put() can be called before spi_add_device() is called. And
>    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
>    to get called without the spi device ever being set up.
>
> [...]


Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi: Fix spi device unregister flow
      commit: c7299fea67696db5bd09d924d1f1080d894f92ef

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Lukas Wunner May 3, 2021, 10:07 a.m. UTC | #7
On Mon, Apr 26, 2021 at 04:56:38PM -0700, Saravana Kannan wrote:
> When an SPI device is unregistered, the spi->controller->cleanup() is
> called in the device's release callback. That's wrong for a couple of
> reasons:
>
> 1. spi_dev_put() can be called before spi_add_device() is called. And
>    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
>    to get called without the spi device ever being set up.


Well, yes, but it's not a big problem in practice so far:

I've checked all drivers and there are only four which are affected
by this: spi-mpc512x-psc.c spi-pic32.c spi-s3c64xx.c spi-st-ssc4.c

They all fiddle with the chipselect GPIO in their ->cleanup hook
and the GPIO may not have been requested yet because that happens
during ->setup.

All the other drivers merely invoke kzalloc() on ->setup and kfree()
on ->cleanup.  The order doesn't matter in this case because
kfree(NULL) is a no-op.


> 2. There's no guarantee that the controller's driver would be present by
>    the time the spi device's release function gets called.


How so?  spi_devices are instantiated on ->probe of the controller
via spi_register_controller() and destroyed on ->remove via
spi_unregister_controller().  I don't see how the controller driver
could ever be unavailable, so this point seems moot.


> Fix these issues by simply moving the cleanup from the device release
> callback to the actual spi_unregister_device() function.


Unfortunately the fix is wrong, it introduces a new problem:

> @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
>  	if (!spi)
>  		return;
>  
> +	spi_cleanup(spi);
> +
>  	if (spi->dev.of_node) {
>  		of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
>  		of_node_put(spi->dev.of_node);


Now you're running ->cleanup before the SPI slave's driver is unbound.
That's bad, the driver may need to access the physical device on unbind,
e.g. to quiesce interrupts.  That may not work now because the
slave's controller_state is gone.

NAK, this needs to be reverted.

Thanks,

Lukas
Andy Shevchenko May 3, 2021, 10:16 a.m. UTC | #8
On Mon, May 3, 2021 at 1:07 PM Lukas Wunner <lukas@wunner.de> wrote:
> On Mon, Apr 26, 2021 at 04:56:38PM -0700, Saravana Kannan wrote:
> > When an SPI device is unregistered, the spi->controller->cleanup() is
> > called in the device's release callback. That's wrong for a couple of
> > reasons:
> >
> > 1. spi_dev_put() can be called before spi_add_device() is called. And
> >    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
> >    to get called without the spi device ever being set up.
>
> Well, yes, but it's not a big problem in practice so far:
>
> I've checked all drivers and there are only four which are affected
> by this: spi-mpc512x-psc.c spi-pic32.c spi-s3c64xx.c spi-st-ssc4.c
>
> They all fiddle with the chipselect GPIO in their ->cleanup hook
> and the GPIO may not have been requested yet because that happens
> during ->setup.
>
> All the other drivers merely invoke kzalloc() on ->setup and kfree()
> on ->cleanup.  The order doesn't matter in this case because
> kfree(NULL) is a no-op.


Thanks, Lukas, for jumping in.

> > 2. There's no guarantee that the controller's driver would be present by
> >    the time the spi device's release function gets called.
>
> How so?  spi_devices are instantiated on ->probe of the controller
> via spi_register_controller() and destroyed on ->remove via
> spi_unregister_controller().  I don't see how the controller driver
> could ever be unavailable, so this point seems moot.
>
>
> > Fix these issues by simply moving the cleanup from the device release
> > callback to the actual spi_unregister_device() function.
>
> Unfortunately the fix is wrong, it introduces a new problem:
>
> > @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
> >       if (!spi)
> >               return;
> >
> > +     spi_cleanup(spi);
> > +
> >       if (spi->dev.of_node) {
> >               of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
> >               of_node_put(spi->dev.of_node);
>
> Now you're running ->cleanup before the SPI slave's driver is unbound.
> That's bad, the driver may need to access the physical device on unbind,
> e.g. to quiesce interrupts.  That may not work now because the
> slave's controller_state is gone.
>
> NAK, this needs to be reverted.


I guess somebody should send the actual revert. Are you going to do so?


-- 
With Best Regards,
Andy Shevchenko
Saravana Kannan May 3, 2021, 5:21 p.m. UTC | #9
On Mon, May 3, 2021 at 3:07 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, Apr 26, 2021 at 04:56:38PM -0700, Saravana Kannan wrote:
> > When an SPI device is unregistered, the spi->controller->cleanup() is
> > called in the device's release callback. That's wrong for a couple of
> > reasons:
> >
> > 1. spi_dev_put() can be called before spi_add_device() is called. And
> >    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
> >    to get called without the spi device ever being set up.
>
> Well, yes, but it's not a big problem in practice so far:
>
> I've checked all drivers and there are only four which are affected
> by this: spi-mpc512x-psc.c spi-pic32.c spi-s3c64xx.c spi-st-ssc4.c
>
> They all fiddle with the chipselect GPIO in their ->cleanup hook
> and the GPIO may not have been requested yet because that happens
> during ->setup.
>
> All the other drivers merely invoke kzalloc() on ->setup and kfree()
> on ->cleanup.  The order doesn't matter in this case because
> kfree(NULL) is a no-op.


That's making a lot of assumptions about drivers not doing certain
things in the future or making assumptions about the hardware (chip
select or whatever other configuration that might happen). Totally
unnecessary and error prone.

>
>
> > 2. There's no guarantee that the controller's driver would be present by
> >    the time the spi device's release function gets called.
>
> How so?  spi_devices are instantiated on ->probe of the controller
> via spi_register_controller() and destroyed on ->remove via
> spi_unregister_controller().  I don't see how the controller driver
> could ever be unavailable, so this point seems moot.


Just because put_device() is called on a struct device doesn't mean
it's getting destroyed immediately. The refcount needs to reach zero
for ->cleanup() to be called eventually. And there's no guarantee that
by the time the ref count hits zero that your controller driver is
still around. So, it's not a moot point.
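
The refcount argument above can be illustrated with a small userspace model. This is only a sketch: `model_dev`, `model_get()` and `model_put()` are hypothetical stand-ins, not kernel APIs, and the `released` flag stands in for the release callback running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted struct device. */
struct model_dev {
	int refcount;   /* number of outstanding references */
	bool released;  /* set when the (modelled) release callback runs */
};

/* Take an extra reference, as any other holder of the device might. */
static void model_get(struct model_dev *d)
{
	d->refcount++;
}

/* Drop a reference; release only runs when the count reaches zero. */
static void model_put(struct model_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = true;
}
```

In this model, a put by the framework while another holder still has a reference leaves `released` false; release only happens on the final put, whenever that is.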

> > Fix these issues by simply moving the cleanup from the device release
> > callback to the actual spi_unregister_device() function.
>
> Unfortunately the fix is wrong, it introduces a new problem:
>
> > @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
> >       if (!spi)
> >               return;
> >
> > +     spi_cleanup(spi);
> > +
> >       if (spi->dev.of_node) {
> >               of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
> >               of_node_put(spi->dev.of_node);
>
> Now you're running ->cleanup before the SPI slave's driver is unbound.


By "slave" device, you mean struct spi_device, right?

Sorry if I'm mistaken about my understanding of the SPI framework.
Please explain how that's happening here. The main place
spi_unregister_device() is getting called from is
spi_unregister_controller(). If the controller's child/slave
spi_devices aren't unbound by then, you've got bigger problems even
without my patch?

> That's bad, the driver may need to access the physical device on unbind,
> e.g. to quiesce interrupts.  That may not work now because the
> slave's controller_state is gone.
>
> NAK, this needs to be reverted.


Please help me understand how this is broken. It's not clear to me.

-Saravana
Lukas Wunner May 3, 2021, 5:56 p.m. UTC | #10
On Mon, May 03, 2021 at 10:21:59AM -0700, Saravana Kannan wrote:
> On Mon, May 3, 2021 at 3:07 AM Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Apr 26, 2021 at 04:56:38PM -0700, Saravana Kannan wrote:
> > > When an SPI device is unregistered, the spi->controller->cleanup() is
> > > called in the device's release callback. That's wrong for a couple of
> > > reasons:
> > >
> > > 1. spi_dev_put() can be called before spi_add_device() is called. And
> > >    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
> > >    to get called without the spi device ever being set up.
> >
> > Well, yes, but it's not a big problem in practice so far:
> >
> > I've checked all drivers and there are only four which are affected
> > by this: spi-mpc512x-psc.c spi-pic32.c spi-s3c64xx.c spi-st-ssc4.c
> >
> > They all fiddle with the chipselect GPIO in their ->cleanup hook
> > and the GPIO may not have been requested yet because that happens
> > during ->setup.
> >
> > All the other drivers merely invoke kzalloc() on ->setup and kfree()
> > on ->cleanup.  The order doesn't matter in this case because
> > kfree(NULL) is a no-op.
>
> That's making a lot of assumptions about drivers not doing certain
> things in the future or making assumptions about the hardware (chip
> select or whatever other configuration that might happen). Totally
> unnecessary and error prone.


I agree, I'm just not happy with the solution presented.

This could be solved by setting a flag in struct spi_device
once ->setup has returned successfully.
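
A minimal userspace sketch of that flag idea, under stated assumptions: `model_spi_device`, its `setup_done` field, and both helpers are hypothetical names for illustration, not fields or functions of the real struct spi_device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in; not the kernel's struct spi_device. */
struct model_spi_device {
	bool setup_done;    /* hypothetical flag: ->setup returned successfully */
	int cleanup_calls;  /* counts how often the controller cleanup really ran */
};

/* Stand-in for the controller's ->setup hook; fail != 0 simulates an error. */
static int model_setup(struct model_spi_device *spi, int fail)
{
	if (fail)
		return -1;
	spi->setup_done = true;
	return 0;
}

/* Cleanup guarded by the flag: a no-op unless setup succeeded earlier. */
static void model_cleanup(struct model_spi_device *spi)
{
	if (!spi->setup_done)
		return;
	spi->cleanup_calls++;
	spi->setup_done = false;
}
```

With such a guard, calling cleanup on a device that was allocated but never set up does nothing, and a second cleanup after a successful one is also a no-op.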


> > > 2. There's no guarantee that the controller's driver would be present by
> > >    the time the spi device's release function gets called.
> >
> > How so?  spi_devices are instantiated on ->probe of the controller
> > via spi_register_controller() and destroyed on ->remove via
> > spi_unregister_controller().  I don't see how the controller driver
> > could ever be unavailable, so this point seems moot.
>
> Just because put_device() is called on a struct device doesn't mean
> it's getting destroyed immediately. The refcount needs to reach zero
> for ->cleanup() to be called eventually. And there's no guarantee that
> by the time the ref count hits zero that your controller driver is
> still around. So, it's not a moot point.


In theory, yes, but concretely, how is that going to happen?

We remove all the things that might be holding a ref on the spi_device
(such as sysfs entries, child devices), so when device_unregister()
is called from spi_unregister_device(), the expectation is really that
that's the last reference being dropped.

In theory it would be possible for some other driver to hold a ref,
but I don't see why it would be doing that.

Perhaps spidev.c makes it possible to keep an spi_device around even
though the controller has been removed, simply by keeping the device
file open from user space.  I'm not sure if that's the case but it's
probably something worth checking.


> > > Fix these issues by simply moving the cleanup from the device release
> > > callback to the actual spi_unregister_device() function.
> >
> > Unfortunately the fix is wrong, it introduces a new problem:
> >
> > > @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
> > >       if (!spi)
> > >               return;
> > >
> > > +     spi_cleanup(spi);
> > > +
> > >       if (spi->dev.of_node) {
> > >               of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
> > >               of_node_put(spi->dev.of_node);
> >
> > Now you're running ->cleanup before the SPI slave's driver is unbound.
>
> By "slave" device, you mean struct spi_device, right?


Yes.


> Sorry if I'm mistaken about my understanding of the SPI framework.
> Please explain how that's happening here. The main place
> spi_unregister_device() is getting called from is
> spi_unregister_controller(). If the controller's child/slave
> spi_devices aren't unbound by then, you've got bigger problems even
> without my patch?


Without your patch:

spi_unregister_device()
  device_unregister()
    device_del()
      bus_remove_device()
        device_release_driver() # access to physical SPI device in ->remove()
    put_device()
      kobject_put()
        kref_put()
          kobject_release()
            kobject_cleanup()
              device_release()
                spidev_release()
                  spi->controller->cleanup() # controller_state freed

With your patch:

spi_unregister_device()
  spi_cleanup()
    spi->controller->cleanup() # controller_state freed
  device_unregister()
    device_del()
      bus_remove_device()
        device_release_driver() # access to physical SPI device in ->remove()

As a case in point, an SPI Ethernet driver I'm familiar with,
drivers/net/ethernet/micrel/ks8851_common.c, performs various
register accesses on driver unbind in ks8851_net_stop().
So on driver unbind, the SPI device still needs to be accessible.

However the controller_state may be necessary to access the device,
so freeing that before unbind is a no-go.

Let me know if this explanation wasn't sufficient.
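
The two call orders above can be modelled in userspace to show the difference. This is an illustrative sketch only: `model_dev` and the helper names are hypothetical, and "controller state" is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state relevant to the ordering problem. */
struct model_dev {
	bool controller_state_valid;  /* true between ->setup and ->cleanup */
	bool remove_saw_valid_state;  /* what the driver's ->remove observed */
};

/* Stand-in for the controller's ->cleanup: controller_state goes away. */
static void model_controller_cleanup(struct model_dev *d)
{
	d->controller_state_valid = false;
}

/* Stand-in for the SPI driver's ->remove, which may still talk to hardware. */
static void model_driver_remove(struct model_dev *d)
{
	d->remove_saw_valid_state = d->controller_state_valid;
}

/* Order without the patch: unbind first, cleanup later from release. */
static void unregister_cleanup_in_release(struct model_dev *d)
{
	model_driver_remove(d);
	model_controller_cleanup(d);
}

/* Order with the patch: cleanup runs before the driver is unbound. */
static void unregister_cleanup_first(struct model_dev *d)
{
	model_controller_cleanup(d);
	model_driver_remove(d);
}
```

In the first order the modelled ->remove still sees valid controller state; in the second it does not, which is the regression described here.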

Thanks,

Lukas
Saravana Kannan May 3, 2021, 6:15 p.m. UTC | #11
On Mon, May 3, 2021 at 10:56 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, May 03, 2021 at 10:21:59AM -0700, Saravana Kannan wrote:
> > On Mon, May 3, 2021 at 3:07 AM Lukas Wunner <lukas@wunner.de> wrote:
> > > On Mon, Apr 26, 2021 at 04:56:38PM -0700, Saravana Kannan wrote:
> > > > When an SPI device is unregistered, the spi->controller->cleanup() is
> > > > called in the device's release callback. That's wrong for a couple of
> > > > reasons:
> > > >
> > > > 1. spi_dev_put() can be called before spi_add_device() is called. And
> > > >    it's spi_add_device() that calls spi_setup(). This will cause cleanup()
> > > >    to get called without the spi device ever being set up.
> > >
> > > Well, yes, but it's not a big problem in practice so far:
> > >
> > > I've checked all drivers and there are only four which are affected
> > > by this: spi-mpc512x-psc.c spi-pic32.c spi-s3c64xx.c spi-st-ssc4.c
> > >
> > > They all fiddle with the chipselect GPIO in their ->cleanup hook
> > > and the GPIO may not have been requested yet because that happens
> > > during ->setup.
> > >
> > > All the other drivers merely invoke kzalloc() on ->setup and kfree()
> > > on ->cleanup.  The order doesn't matter in this case because
> > > kfree(NULL) is a no-op.
> >
> > That's making a lot of assumptions about drivers not doing certain
> > things in the future or making assumptions about the hardware (chip
> > select or whatever other configuration that might happen). Totally
> > unnecessary and error prone.
>
> I agree, I'm just not happy with the solution presented.
>
> This could be solved by setting a flag in struct spi_device
> once ->setup has returned successfully.
>
>
> > > > 2. There's no guarantee that the controller's driver would be present by
> > > >    the time the spi device's release function gets called.
> > >
> > > How so?  spi_devices are instantiated on ->probe of the controller
> > > via spi_register_controller() and destroyed on ->remove via
> > > spi_unregister_controller().  I don't see how the controller driver
> > > could ever be unavailable, so this point seems moot.
> >
> > Just because put_device() is called on a struct device doesn't mean
> > it's getting destroyed immediately. The refcount needs to reach zero
> > for ->cleanup() to be called eventually. And there's no guarantee that
> > by the time the ref count hits zero that your controller driver is
> > still around. So, it's not a moot point.
>
> In theory, yes, but concretely, how is that going to happen?
>
> We remove all the things that might be holding a ref on the spi_device
> (such as sysfs entries, child devices), so when device_unregister()
> is called from spi_unregister_device(), the expectation is really that
> that's the last reference being dropped.
>
> In theory it would be possible for some other driver to hold a ref,
> but I don't see why it would be doing that.
>
> Perhaps spidev.c makes it possible to keep an spi_device around even
> though the controller has been removed, simply by keeping the device
> file open from user space.  I'm not sure if that's the case but it's
> probably something worth checking.


We can't rule out all the cases and assume refcount would hit zero
when the framework does put_device() on the spi_device. So I don't
think there's even a point in trying to find if this can happen. But
since you asked, creating device links to this device is just one
example of how this could happen.

>
> > > > Fix these issues by simply moving the cleanup from the device release
> > > > callback to the actual spi_unregister_device() function.
> > >
> > > Unfortunately the fix is wrong, it introduces a new problem:
> > >
> > > > @@ -713,6 +717,8 @@ void spi_unregister_device(struct spi_device *spi)
> > > >       if (!spi)
> > > >               return;
> > > >
> > > > +     spi_cleanup(spi);
> > > > +
> > > >       if (spi->dev.of_node) {
> > > >               of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
> > > >               of_node_put(spi->dev.of_node);
> > >
> > > Now you're running ->cleanup before the SPI slave's driver is unbound.
> >
> > By "slave" device, you mean struct spi_device, right?
>
> Yes.
>
>
> > Sorry if I'm mistaken about my understanding of the SPI framework.
> > Please explain how that's happening here. The main place
> > spi_unregister_device() is getting called from is
> > spi_unregister_controller(). If the controller's child/slave
> > spi_devices aren't unbound by then, you've got bigger problems even
> > without my patch?
>
> Without your patch:
>
> spi_unregister_device()
>   device_unregister()
>     device_del()
>       bus_remove_device()
>         device_release_driver() # access to physical SPI device in ->remove()
>     put_device()
>       kobject_put()
>         kref_put()
>           kobject_release()
>             kobject_cleanup()
>               device_release()
>                 spidev_release()
>                   spi->controller->cleanup() # controller_state freed
>
> With your patch:
>
> spi_unregister_device()
>   spi_cleanup()
>     spi->controller->cleanup() # controller_state freed
>   device_unregister()
>     device_del()
>       bus_remove_device()
>         device_release_driver() # access to physical SPI device in ->remove()
>
> As a case in point, an SPI Ethernet driver I'm familiar with,
> drivers/net/ethernet/micrel/ks8851_common.c, performs various
> register accesses on driver unbind in ks8851_net_stop().
> So on driver unbind, the SPI device still needs to be accessible.
>
> However the controller_state may be necessary to access the device,
> so freeing that before unbind is a no-go.
>
> Let me know if this explanation wasn't sufficient.


Ah, makes sense. My bad. I saw the of_node_put() in
spi_unregister_device() and glossed over the rest of the code because
I assumed the of_node_put() wouldn't have been done before the device
was released.

So, it looks like the fix is simple. We just need to move
spi_cleanup() to the bottom of spi_unregister_device(). I'll send a
patch for that rather than reverting this and bringing back the other
bugs.

-Saravana
Lukas Wunner May 4, 2021, 9:17 a.m. UTC | #12
On Mon, May 03, 2021 at 11:15:50AM -0700, Saravana Kannan wrote:
> On Mon, May 3, 2021 at 10:56 AM Lukas Wunner <lukas@wunner.de> wrote:
> > Without your patch:
> >
> > spi_unregister_device()
> >   device_unregister()
> >     device_del()
> >       bus_remove_device()
> >         device_release_driver() # access to physical SPI device in ->remove()
> >     put_device()
> >       kobject_put()
> >         kref_put()
> >           kobject_release()
> >             kobject_cleanup()
> >               device_release()
> >                 spidev_release()
> >                   spi->controller->cleanup() # controller_state freed
> >
> > With your patch:
> >
> > spi_unregister_device()
> >   spi_cleanup()
> >     spi->controller->cleanup() # controller_state freed
> >   device_unregister()
> >     device_del()
> >       bus_remove_device()
> >         device_release_driver() # access to physical SPI device in ->remove()
[...]
> So, it looks like the fix is simple. We just need to move
> spi_cleanup() to the bottom of spi_unregister_device(). I'll send a
> patch for that rather than reverting this and bringing back the other
> bugs.


That would result in a use-after-free if the call to device_unregister()
indeed releases the last ref to the spi_device (which I'd expect is
usually the case).

However, something like this might work (in spi_unregister_device()):

	device_del(&spi->dev);
	spi_cleanup(spi);
	put_device(&spi->dev);
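
As a rough userspace illustration of why that three-step order avoids both problems, here is a sketch with hypothetical names throughout (`model_dev`, `model_log`, and the `model_*` helpers are not kernel code). The log lives outside the device so it can be inspected after the device is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Observations recorded outside the device, so they survive its release. */
struct model_log {
	bool remove_saw_state;     /* driver unbind saw valid controller state */
	bool cleanup_before_free;  /* cleanup ran while the device was alive */
	bool freed;                /* the modelled release callback has run */
};

/* Hypothetical stand-in for a refcounted spi_device. */
struct model_dev {
	int refcount;
	bool controller_state;
	struct model_log *log;
};

static struct model_dev *model_alloc(struct model_log *log)
{
	struct model_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->controller_state = true;
	d->log = log;
	return d;
}

/* Models device_del(): the driver is unbound, the reference is kept. */
static void model_device_del(struct model_dev *d)
{
	d->log->remove_saw_state = d->controller_state;
}

/* Models spi_cleanup(): touches the device, so it must still be alive. */
static void model_cleanup(struct model_dev *d)
{
	d->log->cleanup_before_free = !d->log->freed;
	d->controller_state = false;
}

/* Models put_device(): the final put releases (frees) the device. */
static void model_put_device(struct model_dev *d)
{
	if (--d->refcount == 0) {
		d->log->freed = true;
		free(d);
	}
}

/* The suggested order: unbind, then cleanup, then drop the reference. */
static void model_unregister(struct model_dev *d)
{
	model_device_del(d);
	model_cleanup(d);
	model_put_device(d);
}
```

In this model the unbind step still sees controller state, and cleanup runs before the final put can free the device, so nothing touches freed memory.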

Thanks,

Lukas
Patch

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index b08efe88ccd6..7d0d89172a1d 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -47,10 +47,6 @@  static void spidev_release(struct device *dev)
 {
 	struct spi_device	*spi = to_spi_device(dev);
 
-	/* spi controllers may cleanup for released devices */
-	if (spi->controller->cleanup)
-		spi->controller->cleanup(spi);
-
 	spi_controller_put(spi->controller);
 	kfree(spi->driver_override);
 	kfree(spi);
@@ -558,6 +554,12 @@  static int spi_dev_check(struct device *dev, void *data)
 	return 0;
 }
 
+static void spi_cleanup(struct spi_device *spi)
+{
+	if (spi->controller->cleanup)
+		spi->controller->cleanup(spi);
+}
+
 /**
  * spi_add_device - Add spi_device allocated with spi_alloc_device
  * @spi: spi_device to register
@@ -622,11 +624,13 @@  int spi_add_device(struct spi_device *spi)
 
 	/* Device may be bound to an active driver when this returns */
 	status = device_add(&spi->dev);
-	if (status < 0)
+	if (status < 0) {
 		dev_err(dev, "can't add %s, status %d\n",
 				dev_name(&spi->dev), status);
-	else
+		spi_cleanup(spi);
+	} else {
 		dev_dbg(dev, "registered child %s\n", dev_name(&spi->dev));
+	}
 
 done:
 	mutex_unlock(&spi_add_lock);
@@ -713,6 +717,8 @@  void spi_unregister_device(struct spi_device *spi)
 	if (!spi)
 		return;
 
+	spi_cleanup(spi);
+
 	if (spi->dev.of_node) {
 		of_node_clear_flag(spi->dev.of_node, OF_POPULATED);
 		of_node_put(spi->dev.of_node);