mbox series

[v3,0/2] mcb: Fix crash mcb-core module is removed

Message ID 20230906114901.63174-1-JoseJavier.Rodriguez@duagon.com
Headers show
Series mcb: Fix crash mcb-core module is removed | expand

Message

Rodríguez Barbarin, José Javier Sept. 6, 2023, 11:49 a.m. UTC
When allocating a new mcb_bus the bus_type is added to the mcb_bus itself,
causing an issue when calling mcb_bus_add_devices(). This function is not
only called for each mcb_device under the mcb_bus but for the bus itself.

The crash happens when the mcb_core module is removed, getting
the following error:

[  286.691693] ------------[ cut here ]------------
[  286.691695] ida_free called for id=1 which is not allocated.
[  286.691714] WARNING: CPU: 0 PID: 1719 at lib/idr.c:523 ida_free+0xe0/0x140
[  286.691715] Modules linked in: snd_hda_codec_hdmi amd64_edac_mod snd_hda_intel edac_mce_amd snd_intel_dspcfg kvm_amd snd_hda_codec amdgpu nls_iso8859_1 ccp snd_hda_core snd_hwdep amd_iommu_v2 kvm snd_pcm gpu_sched crct10dif_pclmul crc32_pclmul snd_seq_midi snd_seq_midi_event ghash_clmulni_intel ttm snd_rawmidi aesni_intel snd_seq binfmt_misc crypto_simd cryptd glue_helper drm_kms_helper snd_seq_device snd_timer drm snd k10temp fb_sys_fops syscopyarea sysfillrect sysimgblt snd_rn_pci_acp3x mcb_pci(-) snd_pci_acp3x soundcore altera_cvp fpga_mgr mcb spi_nor mtd 8250_dw mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mmc_block nvme ahci i2c_piix4 libahci i2c_amd_mp2_pci igb nvme_core i2c_algo_bit dca video sdhci_acpi sdhci [last unloaded: 8250_men_mcb]
[  286.691752] CPU: 0 PID: 1719 Comm: modprobe Not tainted 5.4.702+ #11
[  286.691753] Hardware name: MEN F027/n/a, BIOS 1.03 04/20/2021
[  286.691756] RIP: 0010:ida_free+0xe0/0x140
[  286.691759] Code: a8 31 f6 e8 12 f7 00 00 eb 4b 4c 0f a3 28 72 21 48 8b 7d a8 4c 89 f6 e8 8e ad 02 00 89 de 48 c7 c7 e8 02 83 b5 e8 b0 7a 5d ff <0f> 0b e9 67 ff ff ff 4c 0f b3 28 48 8d 7d a8 31 f6 e8 da e0 00 00
[  286.691761] RSP: 0018:ffff9a56c38f7bd8 EFLAGS: 00010282
[  286.691763] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000006
[  286.691764] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d881fa1c8c0
[  286.691765] RBP: ffff9a56c38f7c30 R08: 0000000000000487 R09: 0000000000000004
[  286.691766] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[  286.691767] R13: 0000000000000001 R14: 0000000000000202 R15: 0000000000000001
[  286.691769] FS:  00007fb78e303540(0000) GS:ffff8d881fa00000(0000) knlGS:0000000000000000
[  286.691770] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  286.691771] CR2: 00007ffe92b2ce98 CR3: 000000079fd9c000 CR4: 00000000003406f0
[  286.691772] Call Trace:
[  286.691781]  mcb_free_bus+0x2b/0x40 [mcb]
[  286.691785]  device_release+0x2c/0x80
[  286.691787]  kobject_put+0xb9/0x1d0
[  286.691790]  put_device+0x13/0x20

As mcb_bus_add_devices() is called for the mcb_bus itself, the function
tries to cast the incorrectly passed struct mcb_bus to mcb_device. Both
structs have the same layout:

struct mcb_bus {
        struct device dev;
        struct device *carrier;
        int bus_nr;
...
};

struct mcb_device {
        struct device dev;
        struct mcb_bus *bus;
        bool is_added;
...
};

This incorrect casting is causing a wrong behaviour in
mcb_bus_add_devices() where the member bus_nr is casted to is_added,
meaning that when bus_nr is "0", the function continues and sets bus_nr
to "1" (is_added = true)

If we have 2 buses (one for each F215 board), the function ida_alloc()
will give the value "0" and "1" to each bus respectively, but as both
buses are included themselves in the devices' lists, after the call to
mcb_bus_add_devices(), the buses will have the value "1" and "1". For
this reason, when the mcb-core module is removed, the error raises as
the ida resource with value "1" is being released twice, leaking
the ida resource with value "0".

changes for V3:
* remove the need for a cast that breaks bus_nr member of mcb_bus struct
* style fix for function pointer mcb_free_bus

This patch is based on linux-next (next-20230906)

Jorge Sanjuan Garcia (2):
  mcb: remove is_added flag from mcb_device struct
  mcb: use short version for function pointer for mcb_free_bus

 drivers/mcb/mcb-core.c  | 12 ++++--------
 drivers/mcb/mcb-parse.c |  2 --
 include/linux/mcb.h     |  1 -
 3 files changed, 4 insertions(+), 11 deletions(-)

Comments

Rodríguez Barbarin, José Javier Sept. 22, 2023, 12:28 p.m. UTC | #1
On Wed, 2023-09-20 at 15:18 +0200, gregkh@linuxfoundation.org wrote:
> On Wed, Sep 06, 2023 at 11:49:28AM +0000, Rodríguez Barbarin, José
> Javier wrote:
> > From: Jorge Sanjuan Garcia <jorge.sanjuangarcia@duagon.com>
> > 
> > Just a style change so that the device release callbacks are
> > defined
> > in the same way for devices in mcb_bus and mcb_device.
> > 
> > Signed-off-by: Jorge Sanjuan Garcia
> > <jorge.sanjuangarcia@duagon.com>
> > Co-developed-by: Jose Javier Rodriguez Barbarin
> > <JoseJavier.Rodriguez@duagon.com>
> > Signed-off-by: Jose Javier Rodriguez Barbarin
> > <JoseJavier.Rodriguez@duagon.com>
> > ---
> >  drivers/mcb/mcb-core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mcb/mcb-core.c b/drivers/mcb/mcb-core.c
> > index 0cac5bead84f..5c6157b0db75 100644
> > --- a/drivers/mcb/mcb-core.c
> > +++ b/drivers/mcb/mcb-core.c
> > @@ -288,7 +288,7 @@ struct mcb_bus *mcb_alloc_bus(struct device
> > *carrier)
> >         bus->dev.parent = carrier;
> >         bus->dev.bus = &mcb_bus_type;
> >         bus->dev.type = &mcb_carrier_device_type;
> > -       bus->dev.release = &mcb_free_bus;
> > +       bus->dev.release = mcb_free_bus;
> 
> But you aren't fixing the root cause here of an incorrect pointer
> being
> passed to this function, right?
> 
> Yes, removing the single variable is nicer, so the crash doesn't
> happen,
> but you are still passing the wrong pointer around, so why not fix
> that?
> 

> thanks,
> 
> greg k-h

The pointer to struct device in function __mcb_bus_add_devices() always
was the correct one. The problem came when calling to function
to_mcb_device() which was hapenning even for the case of struct device
pointer being a member of struct mcb_bus.

Removing the need for this conversion makes the function generic so
that it will work for both mcb_device and mcb_bus structs. This already
fixes the crash as no member overlapping will occur (is_added in
mcb_device struct and bus_nr in mcb_bus struct).

We belive the pointer is the correct one and this patch series was
actually fixing the root cause of the crash. What do you mean by
"passing the wrong pointer around"? are we missing something?

thanks,
Rodríguez Barbarin, José Javier Oct. 6, 2023, 12:32 p.m. UTC | #2
On Thu, 2023-10-05 at 09:46 +0200, gregkh@linuxfoundation.org wrote:
> On Fri, Sep 22, 2023 at 12:28:14PM +0000, Rodríguez Barbarin, José
> Javier wrote:
> > On Wed, 2023-09-20 at 15:18 +0200,
> > gregkh@linuxfoundation.org wrote:
> > > On Wed, Sep 06, 2023 at 11:49:28AM +0000, Rodríguez Barbarin,
> > > José
> > > Javier wrote:
> > > > From: Jorge Sanjuan Garcia <jorge.sanjuangarcia@duagon.com>
> > > > 
> > > > Just a style change so that the device release callbacks are
> > > > defined
> > > > in the same way for devices in mcb_bus and mcb_device.
> > > > 
> > > > Signed-off-by: Jorge Sanjuan Garcia
> > > > <jorge.sanjuangarcia@duagon.com>
> > > > Co-developed-by: Jose Javier Rodriguez Barbarin
> > > > <JoseJavier.Rodriguez@duagon.com>
> > > > Signed-off-by: Jose Javier Rodriguez Barbarin
> > > > <JoseJavier.Rodriguez@duagon.com>
> > > > ---
> > > >  drivers/mcb/mcb-core.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/mcb/mcb-core.c b/drivers/mcb/mcb-core.c
> > > > index 0cac5bead84f..5c6157b0db75 100644
> > > > --- a/drivers/mcb/mcb-core.c
> > > > +++ b/drivers/mcb/mcb-core.c
> > > > @@ -288,7 +288,7 @@ struct mcb_bus *mcb_alloc_bus(struct device
> > > > *carrier)
> > > >         bus->dev.parent = carrier;
> > > >         bus->dev.bus = &mcb_bus_type;
> > > >         bus->dev.type = &mcb_carrier_device_type;
> > > > -       bus->dev.release = &mcb_free_bus;
> > > > +       bus->dev.release = mcb_free_bus;
> > > 
> > > But you aren't fixing the root cause here of an incorrect pointer
> > > being
> > > passed to this function, right?
> > > 
> > > Yes, removing the single variable is nicer, so the crash doesn't
> > > happen,
> > > but you are still passing the wrong pointer around, so why not
> > > fix
> > > that?
> > > 
> > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > The pointer to struct device in function __mcb_bus_add_devices()
> > always
> > was the correct one. The problem came when calling to function
> > to_mcb_device() which was hapenning even for the case of struct
> > device
> > pointer being a member of struct mcb_bus.
> > 
> > Removing the need for this conversion makes the function generic so
> > that it will work for both mcb_device and mcb_bus structs. This
> > already
> > fixes the crash as no member overlapping will occur (is_added in
> > mcb_device struct and bus_nr in mcb_bus struct).
> > 
> > We belive the pointer is the correct one and this patch series was
> > actually fixing the root cause of the crash. What do you mean by
> > "passing the wrong pointer around"? are we missing something?
> 
> Ok, I understand now, yes, this looks correct.
> 
> But, the function mcb_bus_add_devices() seems odd to me.  You are
> passing in a parameter that you are never using, so why have it at
> all?
> You are implying that you only have one bus, yet you are ignoring the
> bus sent to you?
> 
> This still seems wrong.
> 
> I'll queue up this series as it obviously fixes a bug, but more needs
> to
> be done here.
> 
> thanks,
> 
> greg k-h

Thank you Greg, I will think about your suggestions and as soon as I
have a new patch that fixes it, I will send it to you.

Regards