[v2,0/2] Let userspace know when snd-hda-intel needs i915

Message ID cover.1651314499.git.mchehab@kernel.org

Message

Mauro Carvalho Chehab April 30, 2022, 10:30 a.m. UTC
Currently, kernel/module annotates module dependencies when
request_symbol is used, but it doesn't cover more complex inter-driver
dependencies that are subsystem and/or driver-specific.

In the case of HDMI sound, depending on the CPU/GPU, the snd_hda_driver
can sometimes talk directly with the hardware, but sometimes it uses
the i915 driver. When the snd_hda_driver uses i915, it should be
unbound/removed first, as otherwise trying to unbind/rmmod the i915
driver causes driver issues, as reported by CI tools with different
GPU models:

	https://intel-gfx-ci.01.org/tree/drm-tip/IGT_6415/fi-tgl-1115g4/igt@core_hotunplug@unbind-rebind.html
	https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11495/bat-adlm-1/igt@i915_module_load@reload.html

In the past, just a few CPUs were doing such bindings, but this issue now
applies to all "modern" Intel CPUs that have onboard graphics, as well as
to the newer discrete GPUs.

With the discrete GPU case, the HDA controller is physically separate and
requires i915 to power on the hardware for all hardware access. In this
case, the issue is hit basically 100% of the time.

With on-board graphics, the i915 driver is needed only when the display
codec is accessed. If i915 is unbound during runtime suspend, while
snd-hda-intel is still bound, nothing bad happens, but unbinding i915
in other situations may also cause issues.

So, add support at kernel/module to allow snd-hda drivers to properly
annotate when a dependency on a DRM driver exists, and add a call to
the new function in the snd-hda driver when it successfully binds to
the DRM driver.

This would allow userspace tools to check and properly remove the
audio driver before trying to remove or unbind the GPU driver.
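
As an illustration of the userspace side, here is a minimal sketch in C of
how a tool might look for holders before touching the GPU driver. It
assumes the dependency shows up as an entry under the module's sysfs
"holders" directory (e.g. /sys/module/i915/holders), which is what the
sysfs_create_link() call in patch 1 suggests. The count_holders() helper
is a hypothetical name, not part of any existing tool.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries in a module's "holders" directory, printing each
 * holder name. Returns the number of holders found, or -1 if the
 * directory cannot be opened (e.g. the module is not loaded).
 */
int count_holders(const char *holders_dir)
{
	DIR *dir = opendir(holders_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip the "." and ".." entries. */
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		printf("held by: %s\n", ent->d_name);
		count++;
	}

	closedir(dir);
	return count;
}
```

A tool could call count_holders("/sys/module/i915/holders") and unbind or
remove the listed audio driver first whenever the result is greater than
zero.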

It should be noted that this series conveys the hidden module
dependencies. Other changes are needed in order to allow
removing or unbinding the i915 driver while keeping the snd-hda-intel
driver loaded/bound. In that regard, there was some discussion on
how to improve this at alsa-devel a while back:

https://mailman.alsa-project.org/pipermail/alsa-devel/2021-September/190099.html

So, future improvements in both i915 and the audio drivers could be made.
E.g. with discrete GPUs, it's the only codec of the card, so it seems feasible
to detach the ALSA card if i915 is bound (using infra made for VGA
switcheroo), but, until these improvements are done and land
upstream, audio drivers need to be unbound if the i915 driver is unbound.

Yet, even if such fixes get merged, this series is still needed, as it makes
such dependencies more explicit and easier to debug.

PS.: This series was generated against next-20220428.

---

v2: the dependencies are now handled directly at try_module_get().


Mauro Carvalho Chehab (2):
  module: update dependencies at try_module_get()
  ALSA: hda - identify when audio is provided by a video driver

 include/linux/module.h     |  4 +++-
 kernel/module/main.c       | 35 +++++++++++++++++++++++++++++++++--
 sound/hda/hdac_component.c |  2 +-
 3 files changed, 37 insertions(+), 4 deletions(-)

Comments

Greg Kroah-Hartman April 30, 2022, 12:04 p.m. UTC | #1
On Sat, Apr 30, 2022 at 11:30:58AM +0100, Mauro Carvalho Chehab wrote:
> Sometimes, device drivers are bound into each other via try_module_get(),
> making such references invisible when looking at /proc/modules or lsmod.
> 
> Add a function to allow setting up module references for such
> cases, and call it when try_module_get() is used.
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH v2 0/2] at: https://lore.kernel.org/all/cover.1651314499.git.mchehab@kernel.org/
> 
>  include/linux/module.h |  4 +++-
>  kernel/module/main.c   | 35 +++++++++++++++++++++++++++++++++--
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 46d4d5f2516e..836851baaad4 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -620,7 +620,9 @@ extern void __module_get(struct module *module);
>  
>  /* This is the Right Way to get a module: if it fails, it's being removed,
>   * so pretend it's not there. */
> -extern bool try_module_get(struct module *module);
> +extern bool __try_module_get(struct module *module, struct module *this);
> +
> +#define try_module_get(mod) __try_module_get(mod, THIS_MODULE)
>  
>  extern void module_put(struct module *module);
>  
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 05a42d8fcd7a..9f4416381e65 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -631,6 +631,35 @@ static int ref_module(struct module *a, struct module *b)
>  	return 0;
>  }
>  
> +static int ref_module_dependency(struct module *mod,
> +				       struct module *this)

This can be on one line, right?

> +{
> +	int ret;
> +
> +	if (!this || !this->name) {
> +		return -EINVAL;
> +	}

Did you run checkpatch on this?  Please do :)

> +
> +	if (mod == this)
> +		return 0;

How can this happen?

When people mistakenly call try_module_get(THIS_MODULE)?  We should
throw up a big warning when that happens anyway as that's always wrong.

But that's a different issue from this change, sorry for the noise.
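
For readers following along, the macro trick in the header hunk and the
self-reference case discussed here can be mocked up in userspace. This is
only a sketch: struct module below is a stand-in with made-up fields,
try_module_get_from() is a hypothetical name for the two-argument helper,
and none of the real locking or refcounting is modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct module (fields are made up). */
struct module {
	const char *name;
	const struct module *last_user;	/* last caller recorded as a user */
};

/* Mock two-argument helper: record "this" as a user of "mod". */
static bool try_module_get_from(struct module *mod, struct module *this)
{
	if (!mod)
		return true;		/* nothing to pin */
	if (this && mod != this)
		mod->last_user = this;	/* a self-reference is not recorded */
	return true;
}

static struct module audio = { .name = "snd_hda_intel" };
static struct module gpu = { .name = "i915" };

/* The calling "module" is the audio driver; the macro passes it implicitly. */
#define THIS_MODULE (&audio)
#define try_module_get(mod) try_module_get_from(mod, THIS_MODULE)
```

With this in place, try_module_get(&gpu) records the audio driver as a
user of the GPU driver without any change at the call site, while
try_module_get(THIS_MODULE) hits the mod == this case and records nothing.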

> +
> +	mutex_lock(&module_mutex);
> +
> +	ret = ref_module(this, mod);
> +
> +#ifdef CONFIG_MODULE_UNLOAD
> +	if (ret)
> +		goto ret;
> +
> +	ret = sysfs_create_link(mod->holders_dir,
> +				&this->mkobj.kobj, this->name);

Meta comment, why do we only create links if we can unload things?

thanks,

greg k-h
David Laight May 1, 2022, 1:23 p.m. UTC | #2
From: Mauro Carvalho Chehab
> Sent: 30 April 2022 14:38
> 
> Em Sat, 30 Apr 2022 14:04:59 +0200
> Greg KH <gregkh@linuxfoundation.org> escreveu:
> 
> > On Sat, Apr 30, 2022 at 11:30:58AM +0100, Mauro Carvalho Chehab wrote:
> 
> > Did you run checkpatch on this?  Please do :)
> >
> > > +
> > > +	if (mod == this)
> > > +		return 0;
> >
> > How can this happen?
> > When people mistakenly call try_module_get(THIS_MODULE)?
> 
> Yes. There are lots of place where this is happening:
> 
> 	$ git grep try_module_get\(THIS_MODULE|wc -l
> 	82
> 
> > We should
> > throw up a big warning when that happens anyway as that's always wrong.
> >
> > But that's a different issue from this change, sorry for the noise.
> 
> It sounds very weird to use try_module_get(THIS_MODULE).
> 
> We could add a WARN_ON() there - or something similar - but I would do it
> on a separate patch.

You could add a compile-time check.
But a run-time one seems unnecessary.
Clearly try_module_get(THIS_MODULE) usually succeeds.

I think I can invent a case where it can fail:
The module count must be zero, and a module unload in progress.
The thread doing the unload is blocked somewhere.
Another thread makes a callback into the module for some request
that (for instance) would need to create a kernel thread.
It tries to get a reference for the thread.
So try_module_get(THIS_MODULE) is the right call - and will fail here.
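
The failure case above can be sketched with a small userspace mock. It
assumes, loosely following the kernel's behaviour, that a reference is
only granted while the module is still live. The mock_* names and the
plain int counter are stand-ins for illustration, not kernel API.

```c
#include <stdbool.h>

/* Simplified module lifecycle: only "live" modules hand out references. */
enum mock_state { MOCK_LIVE, MOCK_GOING };

struct mock_module {
	enum mock_state state;
	int refcnt;
};

/* Mock of the "try" semantics: refuse once an unload has started. */
static bool mock_try_get(struct mock_module *mod)
{
	if (mod->state != MOCK_LIVE)
		return false;
	mod->refcnt++;
	return true;
}

/* Mock of the start of an unload: only possible with no references held. */
static bool mock_begin_unload(struct mock_module *mod)
{
	if (mod->refcnt != 0)
		return false;
	mod->state = MOCK_GOING;
	return true;
}
```

Once mock_begin_unload() has succeeded, a late callback that calls
mock_try_get() on its own module gets false and can back out instead of
starting new work.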

	David
