ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

Message ID 20221121050744.2278-1-david.rau.zg@renesas.com
State Accepted
Commit 969357ec94e670571d6593f2a93aba25e4577d4f

Commit Message

David Rau Nov. 21, 2022, 5:07 a.m. UTC
Headsets using the OMTP pin definition can be mis-detected as line out
instead of OMTP, causing obvious issues with audio quality.
This patch applies increased resistances within
the device at a suitable point.

To solve this issue better, the new mechanism sets up
ground switches with conditional delay control,
which allows a more stable detection process
to operate as intended. This conditional delay control
does not impact the hardware process
but uses extra system resources.

This commit improves control of ground switches in the AAD logic.

Signed-off-by: David Rau <david.rau.zg@renesas.com>
---
 sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++-----
 sound/soc/codecs/da7219-aad.h |  1 +
 2 files changed, 37 insertions(+), 6 deletions(-)

Comments

Mark Brown Dec. 1, 2022, 1:24 p.m. UTC | #1
On Mon, 21 Nov 2022 05:07:44 +0000, David Rau wrote:
> Headsets using the OMTP pin definition can be mis-detected as line out
> instead of OMTP, causing obvious issues with audio quality.
> This patch applies increased resistances within
> the device at a suitable point.
> 
> To solve this issue better, the new mechanism sets up
> ground switches with conditional delay control,
> which allows a more stable detection process
> to operate as intended. This conditional delay control
> does not impact the hardware process
> but uses extra system resources.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
      commit: 969357ec94e670571d6593f2a93aba25e4577d4f

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
David Rau Jan. 19, 2023, 11:02 a.m. UTC | #2
Would you please provide the related error messages from the hung task crashes in da7219_aad_irq_thread()?
BTW, "gnd_switch_delay = 256" is an unusual use case with a longer jack detection latency.

-----Original Message-----
From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
Sent: Wednesday, January 18, 2023 03:57
To: David Rau <we730128@gmail.com>
Cc: perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org; David Rau <david.rau.zg@renesas.com>
Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
> Headsets using the OMTP pin definition can be mis-detected as line out
> instead of OMTP, causing obvious issues with audio quality.
> This patch applies increased resistances within the device at a
> suitable point.
> 
> To solve this issue better, the new mechanism sets up ground switches
> with conditional delay control, which allows a more stable
> detection process to operate as intended. This conditional delay
> control does not impact the hardware process but uses extra system
> resources.
> 
> This commit improves control of ground switches in the AAD logic.
> 
> Signed-off-by: David Rau <david.rau.zg@renesas.com>
> ---
>  sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++-----
>  sound/soc/codecs/da7219-aad.h |  1 +
>  2 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c
> index bba73c44c219..08200ec259f9 100644
> --- a/sound/soc/codecs/da7219-aad.c
> +++ b/sound/soc/codecs/da7219-aad.c
> @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data)
>  	struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component);
>  	struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
>  	u8 events[DA7219_AAD_IRQ_REG_MAX];
> -	u8 statusa;
> +	u8 statusa, srm_st;
>  	int i, report = 0, mask = 0;
>  
> +	srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
> +	msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4);

Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().

Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler?

Thanks,
Guenter

> +	/* Enable ground switch */
> +	snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
> +
>  	/* Read current IRQ events */
>  	regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A,
>  			 events, DA7219_AAD_IRQ_REG_MAX);
> @@ -454,8 +459,8 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data)
>  			snd_soc_dapm_disable_pin(dapm, "Mic Bias");
>  			snd_soc_dapm_sync(dapm);
>  
> -			/* Enable ground switch */
> -			snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
> +			/* Disable ground switch */
> +			snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
>  		}
>  	}
>  
> @@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component)
>  	}
>  }
>  
> +static void da7219_aad_handle_gnd_switch_time(struct snd_soc_component *component)
> +{
> +	struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
> +	struct da7219_aad_priv *da7219_aad = da7219->aad;
> +	u8 jack_det;
> +
> +	jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
> +		& DA7219_JACK_DETECT_RATE_MASK;
> +	switch (jack_det) {
> +	case 0x00:
> +		da7219_aad->gnd_switch_delay = 32;
> +		break;
> +	case 0x10:
> +		da7219_aad->gnd_switch_delay = 64;
> +		break;
> +	case 0x20:
> +		da7219_aad->gnd_switch_delay = 128;
> +		break;
> +	case 0x30:
> +		da7219_aad->gnd_switch_delay = 256;
> +		break;
> +	default:
> +		da7219_aad->gnd_switch_delay = 32;
> +		break;
> +	}
> +}
>  
>  /*
>   * Suspend/Resume
> @@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component)
>  	snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1,
>  			    DA7219_BUTTON_CONFIG_MASK, 0);
>  
> -	/* Enable ground switch */
> -	snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
> -
>  	INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work);
>  	INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
>  
> @@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component)
>  	regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A,
>  			  &mask, DA7219_AAD_IRQ_REG_MAX);
>  
> +	da7219_aad_handle_gnd_switch_time(component);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(da7219_aad_init);
> diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h
> index f48a12012ef3..21fdf53095cc 100644
> --- a/sound/soc/codecs/da7219-aad.h
> +++ b/sound/soc/codecs/da7219-aad.h
> @@ -187,6 +187,7 @@ enum da7219_aad_event_regs {
>  struct da7219_aad_priv {
>  	struct snd_soc_component *component;
>  	int irq;
> +	int gnd_switch_delay;
>  
>  	u8 micbias_pulse_lvl;
>  	u32 micbias_pulse_time;
> --
> 2.17.1
>
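
For reference, the 508 ms figure raised in the review follows directly from the msleep() argument in the patch. The sketch below reproduces that arithmetic in plain userspace C. Both function names are hypothetical and not part of the driver; the rate-to-delay mapping is copied from da7219_aad_handle_gnd_switch_time() as posted, and the doubling applies when the masked DA7219_PLL_SRM_STS_MCLK status reads back as zero.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: maps the masked jack detect rate field to the
 * ground switch delay in ms, using the same values as the posted patch.
 */
int gnd_switch_delay_from_rate(unsigned char jack_det)
{
	switch (jack_det) {
	case 0x10:
		return 64;
	case 0x20:
		return 128;
	case 0x30:
		return 256;
	case 0x00:
	default:
		return 32;
	}
}

/*
 * Hypothetical helper: the value handed to msleep() at the top of
 * da7219_aad_irq_thread() in the patch. The delay is doubled when the
 * MCLK status bit is clear (srm_st == 0x0 in the patch).
 */
int aad_irq_sleep_ms(int gnd_switch_delay, bool mclk_bit_set)
{
	return gnd_switch_delay * (mclk_bit_set ? 1 : 2) - 4;
}
```

Under these assumptions the sleep is 28 or 60 ms at the default rate setting (0x00) and 252 or 508 ms at the slowest setting (0x30), depending on the MCLK status bit.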
Guenter Roeck Jan. 19, 2023, 4:12 p.m. UTC | #3
On Thu, Jan 19, 2023 at 11:02:25AM +0000, David Rau wrote:
> Would you please provide the related error messages from the hung task crashes in da7219_aad_irq_thread()?
> BTW, "gnd_switch_delay = 256" is an unusual use case with a longer jack detection latency.
> 

Here is a typical traceback.

<3>[ 246.919057] INFO: task irq/105-da7219-:2854 blocked for more than 122 seconds.
<3>[ 246.919065] Not tainted 5.10.159-20927-g317f62e2494d #1
<3>[ 246.919068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ $PHONE_NUMBER] task:irq/105-da7219- state:D stack: 0 pid: 2854 ppid: 2 flags:0x00004080
<6>[ 246.919075] Call Trace:
<6>[ 246.919084] __schedule+0x3b0/0xdaf
<6>[ 246.919090] schedule+0x44/0xa8
<6>[ 246.919093] schedule_timeout+0xb6/0x290
<6>[ 246.919098] ? run_local_timers+0x4e/0x4e
<6>[ 246.919102] msleep+0x2c/0x38
<6>[ 246.919108] da7219_aad_irq_thread+0x66/0x2b0 [snd_soc_da7219 cd5a76eef6e777074216b9d61f7918f7561bf7ec]
<6>[ 246.919113] ? irq_forced_thread_fn+0x5f/0x5f
<6>[ 246.919116] irq_thread_fn+0x22/0x4d
<6>[ 246.919120] irq_thread+0x120/0x19d
<6>[ 246.919123] ? irq_thread_fn+0x4d/0x4d
<6>[ 246.919128] kthread+0x142/0x153
<6>[ 246.919132] ? irq_forced_secondary_handler+0x21/0x21
<6>[ 246.919135] ? kthread_blkcg+0x31/0x31
<6>[ 246.919139] ret_from_fork+0x1f/0x30

The underlying question is whether it is really appropriate to have an
msleep() of any kind in an interrupt handler. If this is about
debouncing a signal, it should be handled with a delayed timer.

Guenter

David Rau Jan. 31, 2023, 3:58 a.m. UTC | #4
Thanks for the kind feedback.
Would you please let me know in what kind of environment you have seen this error?
Ex: da7219_aad->gnd_switch_delay = ?


Guenter Roeck Jan. 31, 2023, 6:16 a.m. UTC | #5
On 1/30/23 19:58, David Rau wrote:
> Thanks for the kind feedback.
> Would you please let me know in what kind of environment you have seen this error?
> Ex: da7219_aad->gnd_switch_delay = ?
> 

We are seeing the problem on various Chromebooks.

Never mind, though. I really don't have time to keep arguing about this.
I would have assumed that it is obvious that a long msleep() in an
interrupt handler is not appropriate, but obviously I was wrong.
I'll see if I can implement a downstream fix.

Guenter

Mark Brown Jan. 31, 2023, 12:08 p.m. UTC | #6
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
> On 1/30/23 19:58, David Rau wrote:

> > Thanks for the kind feedback.
> > Would you please let me know in what kind of environment you have seen this error?
> > Ex: da7219_aad->gnd_switch_delay = ?

> We are seeing the problem on various Chromebooks.

> Never mind, though. I really don't have time to keep arguing about this.
> I would have assumed that it is obvious that a long msleep() in an
> interrupt handler is not appropriate, but obviously I was wrong.

This is a threaded interrupt handler so it's a bit less clear that it's
meaningfully different to just disabling the interrupt for debounce or
whatever.  Not to say it's ideal.

> I'll see if I can implement a downstream fix.

If you implement something I don't see a reason not to post it upstream.
Guenter Roeck Feb. 2, 2023, 3:51 p.m. UTC | #7
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
> On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
> > On 1/30/23 19:58, David Rau wrote:
> 
> > > Thanks for the kind feedback.
> > > Would you please let me know in what kind of environment you have seen this error?
> > > Ex: da7219_aad->gnd_switch_delay = ?
> 
> > We are seeing the problem on various Chromebooks.
> 
> > Never mind, though. I really don't have time to keep arguing about this.
> > I would have assumed that it is obvious that a long msleep() in an
> > interrupt handler is not appropriate, but obviously I was wrong.
> 
> This is a threaded interrupt handler so it's a bit less clear that it's
> meaningfully different to just disabling the interrupt for debounce or
> whatever.  Not to say it's ideal.
> 
> > I'll see if I can implement a downstream fix.
> 
> If you implement something I don't see a reason not to post it upstream.

I had a look into the code, and concluded that it is too complex for anyone
who doesn't know it to find a proper fix. For example, for an outsider it
is not conceivable (or explained) why the ground switch is enabled only
to be disabled immediately afterwards if a jack was removed.

This is now the top crash reason on affected Chromebooks (so far I
identified Asus C424, HP SeaStar, and HP StingRay) with this patch
applied. I am inclined to revert it from all ChromeOS kernel branches.
At least for us the cure for the problem is much worse than the problem
itself.

Guenter
Mark Brown Feb. 2, 2023, 5:04 p.m. UTC | #8
On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
> On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
> > On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:

> > > I'll see if I can implement a downstream fix.

> > If you implement something I don't see a reason not to post it upstream.

> I had a look into the code, and concluded that it is too complex for anyone
> who doesn't know it to find a proper fix. For example, for an outsider it

It's definitely unclear, there's a datasheet at [1] which does appear to
explicitly call for a 512ms delay though (see figure 20 on page 50).  It
does look like it should only be applied in the case where an inserted
jack is detected (ie, when identifying an accessory or button press) and
not when removal is detected though.

> is not conceivable (or explained) why the ground switch is enabled only
> to be disabled immediately afterwards if a jack was removed.

It smells like there's a power benefit to leaving it disabled when
unplugged (which seems plausible), and possibly like the detection is
more stable with the ground switch enabled.  The ground switch is not
documented AFAICT (it's in register 0xfb which isn't named and doesn't
appear in the datasheet from a quick search).  The code is
leaving the switch enabled so long as an accessory is plugged.

> This is now the top crash reason on affected Chromebooks (so far I
> identified Asus C424, HP SeaStar, and HP StingRay) with this patch
> applied. I am inclined to revert it from all ChromeOS kernel branches.
> At least for us the cure for the problem is much worse than the problem
> itself.

Are you saying this is actually crashing, or just that you're getting
warnings about threads being blocked for too long (that was what was
posted earlier in the thread)?  The only things I can see that look like
they have the potential to actually lock up are the cancel_work_sync()
calls but they were unchanged and the backtrace you showed was showing
the thread in the msleep().  My guess would be that you've got systems
where there are very frequent jack detection events (potentially with
broken accessories, or possibly due to the ground switch putting things
into the wrong priority) and that the interrupt is firing again as soon
as the thread unmasks the primary interrupt which means it never
actually stops running.

It's possible that reordering things so that the delay is only applied
if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some
motion of the interrupt acking as well.  That's probably a good idea in
general, it's what the datasheet seems to call for and would lead to
prompter removal detection.  However if the issue is systems with broken
accessories constantly firing spurious button events they'd still be
seeing the delay.
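
A minimal userspace sketch of that reordering, as a model only and not driver code: the delay is chosen from the status bit, so removal gets no delay while insertion and button activity keep it. The macro MODEL_JACK_INSERTION_STS, its bit position, and the helper aad_delay_ms() are hypothetical names for illustration; the real mask is DA7219_JACK_INSERTION_STS_MASK in the driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DA7219_JACK_INSERTION_STS_MASK; bit position assumed. */
#define MODEL_JACK_INSERTION_STS (1u << 0)

/* Upper bound taken from the 512ms figure discussed above. */
#define MODEL_MAX_DELAY_MS 512u

/*
 * Return the software delay (in ms) to apply for one interrupt.
 * status_a:         snapshot of the accessory detect status register
 * gnd_switch_delay: configured delay in ms
 */
static unsigned int aad_delay_ms(uint8_t status_a, unsigned int gnd_switch_delay)
{
	assert(gnd_switch_delay <= MODEL_MAX_DELAY_MS);

	/* No jack present: this is a removal, so report it without delay. */
	if (!(status_a & MODEL_JACK_INSERTION_STS))
		return 0;

	/* Jack present: insertion or button activity, keep the delay. */
	return gnd_switch_delay;
}
```

As noted above, this would not help if the interrupts are spurious button events, since those arrive with the insertion bit set and still take the delayed path.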

My other guess would be that moving the delay that's been added to a
delayed work would avoid the warnings, though you might want to manually
keep the physical interrupt disabled while that's running which is fun.
Possibly also tuning down the delay given that as you say 500ms is
rather a long potential delay even in the context of jack debounces,
though if it is bad accessories then there's probably a bit of luck
involved in the original code not triggering issues and any debounce is
likely to cause fun, and like I say the datasheet does seem to say that
this is the appropriate delay.

You'd end up with something along the lines of

	disable_irq();
	schedule_delayed_work(delay, current_irq_code);

in the IRQ handler then call enable_irq() on the way out of the new
delayed_work.  That would keep the same flow but not look like the task
is running which should avoid setting off the hung task alarm.
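
The flow can be modelled in plain C to show the intended state changes; this is a sketch under assumptions, not kernel code. struct irq_model and both functions are hypothetical. In a real driver the masking step would probably need disable_irq_nosync() rather than disable_irq(), since as far as I know disable_irq() waits for running handlers to finish, and the real schedule_delayed_work() takes the work item first and the delay second.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interrupt line and the deferred work item. */
struct irq_model {
	bool irq_enabled;       /* false while the line is masked */
	bool work_pending;      /* true between scheduling and running the work */
	unsigned int accepted;  /* interrupts that reached the handler */
	unsigned int masked;    /* interrupts held off while the line was masked */
};

/* IRQ thread: mask the line, queue the deferred work, return without sleeping. */
static void model_irq_thread(struct irq_model *m)
{
	assert(m);
	if (!m->irq_enabled) {
		m->masked++;            /* a masked line does not run the handler */
		return;
	}
	m->irq_enabled = false;         /* stands in for disabling the IRQ */
	m->work_pending = true;         /* stands in for scheduling delayed work */
	m->accepted++;
}

/* Deferred work: runs once the delay has elapsed, then unmasks the line. */
static void model_delayed_work(struct irq_model *m)
{
	assert(m);
	if (!m->work_pending)
		return;
	/* The body of the current IRQ handler would run here. */
	m->work_pending = false;
	m->irq_enabled = true;          /* stands in for enable_irq() */
}
```

The point of the model is that the handler never sleeps: the waiting happens while no thread is running, and a burst of interrupts during the delay is held off by the mask rather than queued behind an msleep().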

[1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341
Guenter Roeck Feb. 2, 2023, 6:39 p.m. UTC | #9
On 2/2/23 09:04, Mark Brown wrote:
> On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
>> On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
>>> On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
> 
>>>> I'll see if I can implement a downstream fix.
> 
>>> If you implement something I don't see a reason not to post it upstream.
> 
>> I had a look into the code, and concluded that it is too complex for anyone
>> who doesn't know it to find a proper fix. For example, for an outsider it
> 
> It's definitely unclear, there's a datasheet at [1] which does appear to
> explicitly call for a 512ms delay though (see figure 20 on page 50).  It
> does look like it should only be applied in the case where an inserted
> jack is detected (ie, when identifying an accessory or button press) and
> not when removal is detected though.
> 

The datasheet doesn't really suggest that a delay shall be applied using
msleep (ie in the code). The chip presumably debounces internally (see
jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate
to configure the detection rate. The table seems to suggest (to me) that
there is an e_jack_insertion event, which would then be followed 64-512 ms
later with an e_jack_detect_complete event.

Whatever is done in software is on top of that, or at least that is my
understanding, and not explained by anything in the datasheet.

Given that the chip itself supports debouncing internally, it is not clear
to me what the delay is actually supposed to accomplish. Soft debounce
on top of chip debounce? I don't see that explained anywhere, though of
course I might be missing it.

>> is not conceivable (or explained) why the ground switch is enabled only
>> to be disabled immediately afterwards if a jack was removed.
> 
> It smells like there's a power benefit to leaving it disabled when
> unplugged (which seems plausible), and possibly like the detection is
> more stable with the ground switch enabled.  The ground switch is not
> documented AFAICT (it's in register 0xfb which isn't named and doesn't
> appear in the datasheet from a quick search).  The code is
> leaving the switch enabled so long as an accessory is plugged.
> 

I understand. What I don't understand is that it is always enabled
in the interrupt handler, no matter if a jack was inserted or not,
only to be disabled immediately if the jack was removed or after
insertion detection work is complete.

Overall it is not clear what the impact of enabling ground switch
actually is. What is really odd is that the original code only enabled
ground switch once during initialization and disabled it either
after a disconnect or after insertion detection was complete,
but never re-enabled it. Now it is briefly enabled in the interrupt
handler, but only after sleeping.

>> This is now the top crash reason on affected Chromebooks (so far I
>> identified Asus C424, HP SeaStar, and HP StingRay) with this patch
>> applied. I am inclined to revert it from all ChromeOS kernel branches.
>> At least for us the cure for the problem is much worse than the problem
>> itself.
> 
> Are you saying this is actually crashing, or just that you're getting
> warnings about threads being blocked for too long (that was what was
> posted earlier in the thread)?  The only things I can see that look like

ChromeOS is configured to crash after stalled threads are detected (ie
after 120 seconds), so this is actually causing crashes.

> they have the potential to actually lock up are the cancel_work_sync()
> calls but they were unchanged and the backtrace you showed was showing
> the thread in the msleep().  My guess would be that you've got systems
> where there are very frequent jack detection events (potentially with
> broken accessories, or possibly due to the ground switch putting things
> into the wrong priority) and that the interrupt is firing again as soon
> as the thread unmasks the primary interrupt which means it never
> actually stops running.
> 

That is what I strongly suspect is happening. I don't know why exactly
the interrupt is firing continuously, but the hang is always in msleep().
One possibility might be that the event is actually a disconnect event,
and that enabling and immediately disabling the ground switch causes
another interrupt, which is then handled immediately, causing the hang.

> It's possible that reordering things so that the delay is only applied
> if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some
> motion of the interrupt acking as well.  That's probably a good idea in
> general, it's what the datasheet seems to call for and would lead to
> prompter removal detection.  However if the issue is systems with broken
> accessories constantly firing spurious button events they'd still be
> seeing the delay.
> 
> My other guess would be that moving the delay that's been added to a
> delayed work would avoid the warnings, though you might want to manually
> keep the physical interrupt disabled while that's running which is fun.
> Possibly also tuning down the delay given that as you say 500ms is
> rather a long potential delay even in the context of jack debounces,
> though if it is bad accessories then there's probably a bit of luck
> involved in the original code not triggering issues and any debounce is
> likely to cause fun, and like I say the datasheet does seem to say that
> this is the appropriate delay.
> 
> You'd end up with something along the lines of
> 
> 	disable_irq();
> 	schedule_delayed_work(delay, current_irq_code);
> 

I am not sure if that would fix anything. The current code sleeps, then
enables the ground switch and does the rest of the detection. I'd somewhat
understand the code if it would enable the ground switch after an "insertion
detected" interrupt, then wait for some amount of time and handle the rest
of the detection after waiting (even though that should really be handled by
the "detection complete" interrupt). But that isn't what it does.
If we were to implement the above, I suspect the result would be that the
interrupt still happens all the time, and the only difference would be that
it would be "silenced" while the delayed work is waiting to be scheduled.
That doesn't really fix the problem, it only works around it. But, sure,
it would be much better than the current situation.

My "wild shot" fix would be to enable the ground switch after an insertion
event and to drop the software sleep entirely.

However, it is really impossible to know what the delay is for in the
first place. Looking into the code further, the sleep time actually matches
the configured jack detection rate. I have no idea why it would make sense
to wait for a detection cycle after an event, then enable the ground switch
and actually handle the event (which by then probably reports that jack
detection is complete after an insertion). I really don't understand
the logic behind that.

Guenter

> in the IRQ handler then call enable_irq() on the way out of the new
> delayed_work.  That would keep the same flow but not look like the task
> is running which should avoid setting off the hung task alarm.
> 
> [1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341
Mark Brown Feb. 2, 2023, 7:36 p.m. UTC | #10
On Thu, Feb 02, 2023 at 10:39:51AM -0800, Guenter Roeck wrote:
> On 2/2/23 09:04, Mark Brown wrote:

> > It's definitely unclear, there's a datasheet at [1] which does appear to
> > explicitly call for a 512ms delay though (see figure 20 on page 50).  It
> > does look like it should only be applied in the case where an inserted
> > jack is detected (ie, when identifying an accessory or button press) and
> > not when removal is detected though.

> The datasheet doesn't really suggest that a delay shall be applied using
> msleep (ie in the code). The chip presumably debounces internally (see

Obviously it doesn't call for an explicit implementation in the host.

> jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate
> to configure the detection rate. The table seems to suggest (to me) that
> there is an e_jack_insertion event, which would then be followed 64-512 ms
> later with an e_jack_detect_complete event.

Right, I think what I was looking at was that in combination with the fact
that there's a *much* longer window before the host clears the interrupt
shown on the first JACK_IN.  It could be spurious and possibly just due
to the host type check thing in the diagram but it smells real bad, like
the hardware state machine has robustness issues or something.  The
diagram currently doesn't quite correspond to the code since we have the
delay applied unconditionally, and there's that undocumented
register for the ground switch being managed.

> Whatever is done in software is on top of that, or at least that is my
> understanding, and not explained by anything in the datasheet.

> Given that the chip itself supports debouncing internally, it is not clear
> to me what the delay is actually supposed to accomplish. Soft debounce
> on top of chip debounce ? I don't see that explained anywhere, though of
> course I might be missing it.

That's what it looks like it's trying to accomplish but as you say it's
not exactly explicit.  I *suspect* it's trying to debounce in more cases
than is needed.

> > > is not conceivable (or explained) why the ground switch is enabled only
> > > to be disabled immediately afterwards if a jack was removed.

> > It smells like there's a power benefit to leaving it disabled when
> > unplugged (which seems plausible), and possibly like the detection is
> > more stable with the ground switch enabled.  The ground switch is not
> > documented AFAICT (it's in register 0xfb which isn't named and doesn't
> > appear in the datasheet from a quick search).  The code is
> > leaving the switch enabled so long as an accessory is plugged.

> I understand. What I don't understand is that it is always enabled
> in the interrupt handler, no matter if a jack was inserted or not,
> only to be disabled immediately if the jack was disabled or after
> insertion detection work is complete.

My guess was that it was making the detection more stable, it's
surprising that it'd help with simple presence detection though.

> > Are you saying this is actually crashing, or just that you're getting
> > warnings about threads being blocked for too long (that was what was
> > posted earlier in the thread)?  The only things I can see that look like

> ChromeOS is configured to crash after stalled threads are detected (ie
> after 120 seconds), so this is actually causing crashes.

Ah, that's much more serious than I'd understood from the log you
posted.

> > they have the potential to actually lock up are the cancel_work_sync()
> > calls but they were unchanged and the backtrace you showed was showing
> > the thread in the msleep().  My guess would be that you've got systems
> > where there are very frequent jack detection events (potentially with
> > broken accessories, or possibly due to the ground switch putting things
> > into the wrong priority) and that the interrupt is firing again as soon
> > as the thread unmasks the primary interrupt which means it never
> > actually stops running.

> That is what I strongly suspect is happening. I don't know why exactly
> the interrupt is firing continuously, but the hang is always in msleep().
> One possibility might be that the event is actually a disconnect event,
> and that enabling and immediately disabling the ground switch causes
> another interrupt, which is then handled immediately, causing the hang.

Could be.  I'd be willing to guess that it's not just one event but
rather a stream of events of some kind.  Possibly if it's due to the
ground switch it's spuriously detecting a constant stream of button
presses for the affected systems, which don't produce any UI visible
result which would cause users to pull the accessory for whatever
reason?  Whatever's going on I bet it's broken accessories triggering it.

> > My other guess would be that moving the delay that's been added to a
> > delayed work would avoid the warnings, though you might want to manually
> > keep the physical interrupt disabled while that's running which is fun.

> I am not sure if that would fix anything. The current code sleeps, then
> enables the ground switch and does the rest of the detection. I'd somewhat
> understand the code if it would enable the ground switch after an "insertion
> detected" interrupt, then wait for some amount of time and handle the rest
> of the detection after waiting (even though that should really be handled by
> the "detection complete" interrupt). But that isn't what it does.
> If we were to implement the above, I suspect the result would be that the
> interrupt still happens all the time, and the only difference would be that
> it would be "silenced" while the delayed work is waiting to be scheduled.
> That doesn't really fix the problem, it only works around it. But, sure,
> it would be much better than the current situation.

Yes, exactly - I was just looking at a refactoring in the code which
would mitigate the immediate problem while keeping the current partially
documented algorithm in place.

> My "wild shot" fix would be to enable the ground switch after an insertion
> event and to drop the software sleep entirely.

That's entirely plausible to me, either together or possibly just one of
those is actually needed.  Do you want to send a patch?

> However, it is really impossible to know what the delay is for in the
> first place. Looking into the code further, the sleep time actually matches
> the configured jack detection rate. I have no idea why it would make sense
> to wait for a detection cycle after an event, then enable the ground switch
> and actually handle the event (which by then probably reports that jack
> detection is complete after an insertion). I really don't understand
> the logic behind that.

This all smells like there's either a race condition in a state machine
somewhere or the button detection needs a bit of help (though if it's
the latter then it'd be conditional on a microphone having been
detected).

Hopefully David will get back to us with some explanation and ideally
fix.
Guenter Roeck Feb. 4, 2023, 3:42 p.m. UTC | #11
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
> 
> > > they have the potential to actually lock up are the cancel_work_sync()
> > > calls but they were unchanged and the backtrace you showed was showing
> > > the thread in the msleep().  My guess would be that you've got systems
> > > where there are very frequent jack detection events (potentially with
> > > broken accessories, or possibly due to the ground switch putting things
> > > into the wrong priority) and that the interrupt is firing again as soon
> > > as the thread unmasks the primary interrupt which means it never
> > > actually stops running.
> 
> > That is what I strongly suspect is happening. I don't know why exactly
> > the interrupt is firing continuously, but the hang is always in msleep().
> > One possibility might be that the event is actually a disconnect event,
> > and that enabling and immediately disabling the ground switch causes
> > another interrupt, which is then handled immediately, causing the hang.
> 
> Could be.  I'd be willing to guess that it's not just one event but
> rather a stream of events of some kind.  Possibly if it's due to the
> ground switch it's spuriously detecting a constant stream of button
> presses for the affected systems, which don't produce any UI visible
> result which would cause users to pull the accessory for whatever
> reason?  Whatever's going on I bet it's broken accessories triggering it.
> 

That seems to be unlikely. The average number of crashes per affected
system is 1.92, which points to something the users are doing and less
to a broken accessory. We do observe crashes due to broken accessories,
but in those cases the number of crashes per system tends to be much
higher.

Anyway, below is a patch with a possible fix. Of course, I still don't
know what the patch originally tried to fix, so it might not do much if
anything good. For example, it keeps button detection in the interrupt
handler to avoid dropping button events, so if spurious button detection
as you suspected is indeed (part of) the problem we might still see
a large number of interrupts.

Guenter

---
David Rau Feb. 6, 2023, 1:05 a.m. UTC | #12
-----Original Message-----
From: Mark Brown <broonie@kernel.org> 
Sent: Friday, February 3, 2023 03:37
To: Guenter Roeck <linux@roeck-us.net>
Cc: David Rau <david.rau.zg@renesas.com>; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

On Thu, Feb 02, 2023 at 10:39:51AM -0800, Guenter Roeck wrote:
> On 2/2/23 09:04, Mark Brown wrote:

> > > It's definitely unclear, there's a datasheet at [1] which does 
> > > appear to explicitly call for a 512ms delay though (see figure 20 on 
> > > page 50).  It does look like it should only be applied in the case 
> > > where an inserted jack is detected (ie, when identifying an 
> > > accessory or button press) and not when removal is detected though.

> > The datasheet doesn't really suggest that a delay shall be applied 
> > using msleep (ie in the code). The chip presumably debounces 
> > internally (see

> Obviously it doesn't call for an explicit implementation in the host.

> > jackdet_debounce and jackdet_rem_deb), and there is also 
> > jack_detect_rate to configure the detection rate. The table seems to 
> > suggest (to me) that there is an e_jack_insertion event, which would 
> > then be followed 64-512 ms later with an e_jack_detect_complete event.

> Right, I think what I was looking at was that in combination of the fact that there's a *much* longer window before the host clears the interrupt shown on the first JACK_IN.  
> It could be spurious and possibly just due to the host type check thing in the diagram but it smells real bad, like the hardware state machine has robustness issues or something.  
> The diagram currently doesn't quite correspond to the code since we have the delay applied unconditionally, and there's that undocumented register for the ground switch being managed.

> > Whatever is done in software is on top of that, or at least that is my 
> > understanding, and not explained by anything in the datasheet.

> > Given that the chip itself supports debouncing internally, it is not 
> > clear to me what the delay is actually supposed to accomplish. Soft 
> > debounce on top of chip debounce ? I don't see that explained 
> > anywhere, though of course I might be missing it.

> That's what it looks like it's trying to accomplish but as you say it's not exactly explicit.  I *suspect* it's trying to debounce in more cases than is needed.

> > > > is not conceivable (or explained) why the ground switch is enabled 
> > > > only to be disabled immediately afterwards if a jack was removed.

> > > It smells like there's a power benefit to leaving it disabled when 
> > > unplugged (which seems plausible), and possibly like the detection 
> > > is more stable with the ground switch enabled.  The ground switch is 
> > > not documented AFAICT (it's in register 0xfb which isn't named and 
> > > doesn't appear in the datasheet from a quick search).  The 
> > > code is leaving the switch enabled so long as an accessory is plugged.

> > I understand. What I don't understand is that it is always enabled in 
> > the interrupt handler, no matter if a jack was inserted or not, only 
> > to be disabled immediately if the jack was disabled or after insertion 
> > detection work is complete.

> My guess was that it was making the detection more stable, it's surprising that it'd help with simple presence detection though.

I added this software debouncing to make jack detection on the DA7219 more stable.

> > > Are you saying this is actually crashing, or just that you're 
> > > getting warnings about threads being blocked for too long (that was 
> > > what was posted earlier in the thread)?  The only things I can see 
> > > that look like

> > ChromeOS is configured to crash after stalled threads are detected (ie 
> > after 120 seconds), so this is actually causing crashes.

> Ah, that's much more serious than I'd understood from the log you posted.

Sorry to hear about that.
I am now refactoring the mechanism to remove the previous delay from the IRQ thread and avoid this race condition.


> > > they have the potential to actually lock up are the 
> > > cancel_work_sync() calls but they were unchanged and the backtrace 
> > > you showed was showing the thread in the msleep().  My guess would 
> > > be that you've got systems where there are very frequent jack 
> > > detection events (potentially with broken accessories, or possibly 
> > > due to the ground switch putting things into the wrong priority) and 
> > > that the interrupt is firing again as soon as the thread unmasks the 
> > > primary interrupt which means it never actually stops running.

> > That is what I strongly suspect is happening. I don't know why exactly 
> > the interrupt is firing continuously, but the hang is always in msleep().
> > One possibility might be that the event is actually a disconnect 
> > event, and that enabling and immediately disabling the ground switch 
> > causes another interrupt, which is then handled immediately, causing the hang.

> Could be.  I'd be willing to guess that it's not just one event but rather a stream of events of some kind.  
> Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, 
> which don't produce any UI visible result which would cause users to pull the accessory for whatever reason?  Whatever's going on I bet it's broken accessories triggering it.

> > > My other guess would be that moving the delay that's been added to a 
> > > delayed work would avoid the warnings, though you might want to 
> > > manually keep the physical interrupt disabled while that's running which is fun.

> > I am not sure if that would fix anything. The current code sleeps, 
> > then enables the ground switch and does the rest of the detection. I'd 
> > somewhat understand the code if it would enable the ground switch 
> > after an "insertion detected" interrupt, then wait for some amount of 
> > time and handle the rest of the detection after waiting (even though 
> > that should really be handled by the "detection complete" interrupt). But that isn't what it does.
> > If we were to implement the above, I suspect the result would be that 
> > the interrupt still happens all the time, and the only difference 
> > would be that it would be "silenced" while the delayed work is waiting to be scheduled.
> > That doesn't really fix the problem, it only works around it. But, 
> > sure, it would be much better than the current situation.

> Yes, exactly - I was just looking at a refactoring in the code which would mitigate the immediate problem while keeping the current partially documented algorithm in place.

> > My "wild shot" fix would be to enable the ground switch after an 
> > insertion event and to drop the software sleep entirely.

> That's entirely plausible to me, either together or possibly just one of those is actually needed.  Do you want to send a patch?

I will send a patch once verification and waveform measurements are complete.

> > However, it is really impossible to know what the delay is for in the 
> > first place. Looking into the code further, the sleep time actually 
> > matches the configured jack detection rate. I have no idea why it 
> > would make sense to wait for a detection cycle after an event, then 
> > enable the ground switch and actually handle the event (which by then 
> > probably reports that jack detection is complete after an insertion). 
> > I really don't understand the logic behind that.

> This all smells like there's either a race condition in a state machine somewhere or the button detection needs a bit of help (though if it's the latter then it'd be conditional on a microphone having been detected).

> Hopefully David will get back to us with some explanation and ideally fix.
David Rau Feb. 6, 2023, 5:38 a.m. UTC | #13
-----Original Message-----
From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
Sent: Saturday, February 4, 2023 23:42
To: Mark Brown <broonie@kernel.org>
Cc: David Rau <david.rau.zg@renesas.com>; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
> 
> > > they have the potential to actually lock up are the 
> > > cancel_work_sync() calls but they were unchanged and the backtrace 
> > > you showed was showing the thread in the msleep().  My guess would 
> > > be that you've got systems where there are very frequent jack 
> > > detection events (potentially with broken accessories, or 
> > > possibly due to the ground switch putting things into the wrong 
> > > priority) and that the interrupt is firing again as soon as the 
> > > thread unmasks the primary interrupt which means it never actually stops running.
> 
> > That is what I strongly suspect is happening. I don't know why 
> > exactly the interrupt is firing continuously, but the hang is always in msleep().
> > One possibility might be that the event is actually a disconnect 
> > event, and that enabling and immediately disabling the ground switch 
> > causes another interrupt, which is then handled immediately, causing the hang.
> 
> Could be.  I'd be willing to guess that it's not just one event but 
> rather a stream of events of some kind.  Possibly if it's due to the 
> ground switch it's spuriously detecting a constant stream of button 
> presses for the affected systems, which don't produce any UI visible 
> result which would cause users to pull the accessory for whatever 
> reason?  Whatever's going on I bet it's broken accessories triggering it.
> 

> That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory. 
> We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much higher.

> Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.

I added the software debouncing before the insertion task to ensure better compatibility with OMTP jacks.

> For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.

> Guenter

Thanks a lot for your efforts in implementing and verifying the temporary fix.
Could you please let me know the average number of crashes per affected system if you roll back to the previous fixes?
Ref:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=2d969e8f35b1849a43156029a7a6e2943b89d0c0
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=06f5882122e3faa183d76c4ec2c92f4c38e2c7bb

David
---
Mark Brown Feb. 6, 2023, 1:37 p.m. UTC | #14
On Sat, Feb 04, 2023 at 07:42:22AM -0800, Guenter Roeck wrote:

> Solve the problem by enabling the ground switch immediately and only
> after an insertion has been detected. Delay pole orientation detection
> until after the chip reports that detection is complete plus an
> additional time depending on the chip configuration. Do this by
> implementing ground switch detection in a delayed worker.

This looks sensible to me.
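
For readers following the thread, the approach quoted above can be modelled in plain userspace C. This is only a sketch under stated assumptions: the struct, the function names and the simulated clock are hypothetical and are not taken from the da7219 driver, and releasing the switch on removal is assumed from the original patch rather than from the quoted description. It illustrates the ordering only: the insertion handler enables the ground switch and returns without sleeping, and pole orientation detection runs later from deferred work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from the driver. */
struct jack_model {
	long now_ms;         /* simulated clock */
	bool gnd_switch_on;  /* ground switch state */
	bool work_pending;   /* is deferred detection queued? */
	long work_due_ms;    /* time at which the queued work should run */
	int detections_run;  /* how many times pole detection has run */
};

/* Insertion interrupt: enable the switch at once and defer the rest.
 * The handler never sleeps; re-arming replaces any earlier deadline. */
static void on_insert_irq(struct jack_model *m, long delay_ms)
{
	m->gnd_switch_on = true;
	m->work_pending = true;
	m->work_due_ms = m->now_ms + delay_ms;
}

/* Removal interrupt: cancel pending work and release the switch
 * (assumed behaviour, see the lead-in). */
static void on_remove_irq(struct jack_model *m)
{
	m->work_pending = false;
	m->gnd_switch_on = false;
}

/* Advance the simulated clock and run the deferred detection if it is due. */
static void advance(struct jack_model *m, long ms)
{
	assert(ms >= 0);
	m->now_ms += ms;
	if (m->work_pending && m->now_ms >= m->work_due_ms) {
		m->work_pending = false;
		m->detections_run++;
	}
}
```

In this model a burst of interrupts only moves the deadline, so the handler itself never blocks, which is the property the msleep() in the threaded handler lacks.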
Guenter Roeck Feb. 6, 2023, 2:04 p.m. UTC | #15
On 2/5/23 21:38, David Rau wrote:
> 
> 
> -----Original Message-----
> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
> Sent: Saturday, February 4, 2023 23:42
> To: Mark Brown <broonie@kernel.org>
> Cc: David Rau <david.rau.zg@renesas.com>; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
> 
> On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
>>
>>>> they have the potential to actually lock up are the
>>>> cancel_work_sync() calls but they were unchanged and the backtrace
>>>> you showed was showing the thread in the msleep().  My guess would
>>>> be that you've got systems where there are very frequent jack
>>>> detection events (potentially with broken accessories, or
>>>> possibly due to the ground switch putting things into the wrong
>>>> priority) and that the interrupt is firing again as soon as the
>>>> thread unmasks the primary interrupt which means it never actually stops running.
>>
>>> That is what I strongly suspect is happening. I don't know why
>>> exactly the interrupt is firing continuously, but the hang is always in msleep().
>>> One possibility might be that the event is actually a disconnect
>>> event, and that enabling and immediately disabling the ground switch
>>> causes another interrupt, which is then handled immediately, causing the hang.
>>
>> Could be.  I'd be willing to guess that it's not just one event but
>> rather a stream of events of some kind.  Possibly if it's due to the
>> ground switch it's spuriously detecting a constant stream of button
>> presses for the affected systems, which don't produce any UI visible
>> result which would cause users to pull the accessory for whatever
>> reason?  Whatever's going on I bet it's broken accessories triggering it.
>>
> 
>> That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory.
>> We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much higher.
> 
>> Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
> I added software debouncing before the insertion task to ensure better compatibility with OMTP jacks.
>> For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
> 
>> Guenter
> 
> Thanks a lot for your efforts in implementing and verifying the temporary fix.
> Would you please let me know the average number of crashes per affected system if you roll back to the previous fix?
> Ref:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=2d969e8f35b1849a43156029a7a6e2943b89d0c0
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=06f5882122e3faa183d76c4ec2c92f4c38e2c7bb
> 

You mean just keep the above two patches and revert 969357ec94e6 ?
Sure, I can do that, but feedback from the field would take some
2-3 months. Is that what you recommend to do for now ?

Thanks,
Guenter
Guenter Roeck Feb. 7, 2023, 2:48 a.m. UTC | #16
On 2/6/23 18:42, David Rau wrote:
> 
> 
> -----Original Message-----
> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
> Sent: Monday, February 6, 2023 22:05
> To: David Rau <david.rau.zg@renesas.com>; Mark Brown <broonie@kernel.org>
> Cc: perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
> 
> On 2/5/23 21:38, David Rau wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
>>> Sent: Saturday, February 4, 2023 23:42
>>> To: Mark Brown <broonie@kernel.org>
>>> Cc: David Rau <david.rau.zg@renesas.com>; perex@perex.cz;
>>> lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com;
>>> alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
>>> Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on
>>> OMTP headsets when playing music
>>>
>>> On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
>>>>
>>>>>> they have the potential to actually lock up are the
>>>>>> cancel_work_sync() calls but they were unchanged and the backtrace
>>>>>> you showed was showing the thread in the msleep().  My guess would
>>>>>> be that you've got systems where there are very frequent jack
>>>>>> detection events (potentially with broken accessories, or possibly
>>>>>> due to the ground switch putting things into the wrong
>>>>>> priority) and that the interrupt is firing again as soon as the
>>>>>> thread unmasks the primary interrupt which means it never actually stops running.
>>>>
>>>>> That is what I strongly suspect is happening. I don't know why
>>>>> exactly the interrupt is firing continuously, but the hang is always in msleep().
>>>>> One possibility might be that the event is actually a disconnect
>>>>> event, and that enabling and immediately disabling the ground switch
>>>>> causes another interrupt, which is then handled immediately, causing the hang.
>>>>
>>>> Could be.  I'd be willing to guess that it's not just one event but
>>>> rather a stream of events of some kind.  Possibly if it's due to the
>>>> ground switch it's spuriously detecting a constant stream of button
>>>> presses for the affected systems, which don't produce any UI visible
>>>> result which would cause users to pull the accessory for whatever
>>>> reason?  Whatever's going on I bet it's broken accessories triggering it.
>>>>
>>>
>>>> That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory.
>>>> We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much higher.
>>>
>>>> Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
>>> I added software debouncing before the insertion task to ensure better compatibility with OMTP jacks.
>>>> For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
>>>
>>>> Guenter
>>>>
>>> Thanks a lot for your efforts in implementing and verifying the temporary fix.
>>> Would you please let me know the average number of crashes per affected system if you roll back to the previous fix?
>>> Ref:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=2d969e8f35b1849a43156029a7a6e2943b89d0c0
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/sound/soc/codecs?id=06f5882122e3faa183d76c4ec2c92f4c38e2c7bb
>>>
> 
>> You mean just keep the above two patches and revert 969357ec94e6 ?
>> Sure, I can do that, but feedback from the field would take some
>> 2-3 months. Is that what you recommend to do for now ?
> 
>> Thanks,
>> Guenter
> 
> Thanks for the feedback.
> What I mean is to just roll back the "sleep" patch I made in your repository.
> 
> After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions?
> Will it still take some 2-3 months?
> 

Yes, due to our rollout schedules. Those are crashes observed in the field,
after all.

Guenter
Guenter Roeck Feb. 8, 2023, 6:04 p.m. UTC | #17
On Tue, Feb 07, 2023 at 02:42:14AM +0000, David Rau wrote:
> 
> > You mean just keep the above two patches and revert 969357ec94e6 ?
> > Sure, I can do that, but feedback from the field would take some
> > 2-3 months. Is that what you recommend to do for now ?
> 
> > Thanks,
> > Guenter
> 
> Thanks for the feedback.
> What I mean is to just roll back the "sleep" patch I made in your repository.
> 
> After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions?
> Will it still take some 2-3 months?
> 
The msleep() patch has been reverted in R111 and dev
releases of ChromeOS. I did not get permission to land
the revert in R110, meaning we'll continue to see the
crashes there. R111 is expected to go to Beta shortly,
so we should get _some_ feedback in the next few weeks.

Still, it would be great to get a more permanent fix
for the underlying problem. Also, the msleep() patch
is still upstream, so a solution is still needed there.

I can try to get and play with one of the affected
Chromebooks, but I don't think that would help much
since we still don't know what the undocumented ground
switch is supposed to do.

Thanks,
Guenter
David Rau Feb. 9, 2023, 3:05 a.m. UTC | #18
-----Original Message-----
From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
Sent: Thursday, February 9, 2023 02:04
To: David Rau <david.rau.zg@renesas.com>
Cc: Mark Brown <broonie@kernel.org>; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

On Tue, Feb 07, 2023 at 02:42:14AM +0000, David Rau wrote:
>> 
>> > You mean just keep the above two patches and revert 969357ec94e6 ?
>> > Sure, I can do that, but feedback from the field would take some
>> > 2-3 months. Is that what you recommend to do for now ?
>> 
>> > Thanks,
>> > Guenter
>> 
>> Thanks for the feedback.
>> What I mean is to just roll back the "sleep" patch I made in your repository.
>> 
>> After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions?
>> Will it still take some 2-3 months?
>> 
>The msleep() patch has been reverted in R111 and dev releases of ChromeOS. I did not get permission to land the revert in R110, meaning we'll continue to see the crashes there. 
>R111 is expected to go to Beta shortly, so we should get _some_ feedback in the next few weeks.
>Still, it would be great to get a more permanent fix for the underlying problem. Also, the msleep() patch is still upstream, so a solution is still needed there.
>I can try to get and play with one of the affected Chromebooks, but I don't think that would help much since we still don't know what the undocumented ground switch is supposed to do.
Enabling the GND switch earlier is needed to ensure stable and smooth jack detection.

>Thanks,
>Guenter

Thanks for the kind explanation and feedback.
I am verifying another method that does the msleep() in a separately scheduled work item to avoid this crash.

Would you please let me know how to reproduce this crash?
Then I can make sure the new solution is more robust and solves this problem as well.

Thanks.
David

Patch

diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c
index bba73c44c219..08200ec259f9 100644
--- a/sound/soc/codecs/da7219-aad.c
+++ b/sound/soc/codecs/da7219-aad.c
@@ -352,9 +352,14 @@  static irqreturn_t da7219_aad_irq_thread(int irq, void *data)
 	struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component);
 	struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
 	u8 events[DA7219_AAD_IRQ_REG_MAX];
-	u8 statusa;
+	u8 statusa, srm_st;
 	int i, report = 0, mask = 0;
 
+	srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
+	msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4);
+	/* Enable ground switch */
+	snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
+
 	/* Read current IRQ events */
 	regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A,
 			 events, DA7219_AAD_IRQ_REG_MAX);
@@ -454,8 +459,8 @@  static irqreturn_t da7219_aad_irq_thread(int irq, void *data)
 			snd_soc_dapm_disable_pin(dapm, "Mic Bias");
 			snd_soc_dapm_sync(dapm);
 
-			/* Enable ground switch */
-			snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
+			/* Disable ground switch */
+			snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
 		}
 	}
 
@@ -831,6 +836,32 @@  static void da7219_aad_handle_pdata(struct snd_soc_component *component)
 	}
 }
 
+static void da7219_aad_handle_gnd_switch_time(struct snd_soc_component *component)
+{
+	struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
+	struct da7219_aad_priv *da7219_aad = da7219->aad;
+	u8 jack_det;
+
+	jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
+		& DA7219_JACK_DETECT_RATE_MASK;
+	switch (jack_det) {
+	case 0x00:
+		da7219_aad->gnd_switch_delay = 32;
+		break;
+	case 0x10:
+		da7219_aad->gnd_switch_delay = 64;
+		break;
+	case 0x20:
+		da7219_aad->gnd_switch_delay = 128;
+		break;
+	case 0x30:
+		da7219_aad->gnd_switch_delay = 256;
+		break;
+	default:
+		da7219_aad->gnd_switch_delay = 32;
+		break;
+	}
+}
 
 /*
  * Suspend/Resume
@@ -908,9 +939,6 @@  int da7219_aad_init(struct snd_soc_component *component)
 	snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1,
 			    DA7219_BUTTON_CONFIG_MASK, 0);
 
-	/* Enable ground switch */
-	snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
-
 	INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work);
 	INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
 
@@ -928,6 +956,8 @@  int da7219_aad_init(struct snd_soc_component *component)
 	regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A,
 			  &mask, DA7219_AAD_IRQ_REG_MAX);
 
+	da7219_aad_handle_gnd_switch_time(component);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(da7219_aad_init);
diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h
index f48a12012ef3..21fdf53095cc 100644
--- a/sound/soc/codecs/da7219-aad.h
+++ b/sound/soc/codecs/da7219-aad.h
@@ -187,6 +187,7 @@  enum da7219_aad_event_regs {
 struct da7219_aad_priv {
 	struct snd_soc_component *component;
 	int irq;
+	int gnd_switch_delay;
 
 	u8 micbias_pulse_lvl;
 	u32 micbias_pulse_time;
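
To make the sleep time in da7219_aad_irq_thread() concrete, the arithmetic from the patch can be reproduced in a small userspace sketch. The function names below are hypothetical, and the 0x30 mask is an assumption inferred from the case labels in da7219_aad_handle_gnd_switch_time() rather than copied from the driver headers. Only the table and the formula `delay * ((srm_st == 0x0) ? 2 : 1) - 4` come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the case labels in the patch: a 2-bit rate field in bits 5:4. */
#define JACK_DETECT_RATE_MASK 0x30u

/* Base delay in ms for an ACCDET_CONFIG_2 value, mirroring the patch's switch. */
static int gnd_switch_delay_ms(uint8_t accdet_config_2)
{
	switch (accdet_config_2 & JACK_DETECT_RATE_MASK) {
	case 0x10:
		return 64;
	case 0x20:
		return 128;
	case 0x30:
		return 256;
	case 0x00:
	default:
		return 32;
	}
}

/* Value passed to msleep(): the base delay is doubled when the masked
 * PLL SRM status bit reads back as zero, then 4 ms are subtracted. */
static int irq_thread_sleep_ms(uint8_t accdet_config_2, bool srm_mclk_bit_set)
{
	int base = gnd_switch_delay_ms(accdet_config_2);

	/* The subtraction below relies on the base delay exceeding 4 ms. */
	assert(base >= 32);
	return base * (srm_mclk_bit_set ? 1 : 2) - 4;
}
```

Under these assumptions the threaded handler sleeps between 28 ms (rate field 0x00, status bit set) and 508 ms (rate field 0x30, status bit clear) on every interrupt, before any event register is read.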