diff mbox series

leds: Fix LED_OFF brightness race

Message ID 26a2690e77671cfe687c5614613fbb6f079f7365.1739959820.git.repk@triplefau.lt
State Superseded
Headers show
Series leds: Fix LED_OFF brightness race | expand

Commit Message

Remi Pommarel Feb. 19, 2025, 10:41 a.m. UTC
While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
successfully forces led_set_brightness() to be called with LED_OFF at
least once when switching from blinking to LED on state so that
hw-blinking can be disabled, another race remains. Indeed in
led_set_brightness(LED_OFF) followed by led_set_brightness(any)
scenario the following CPU scheduling can happen:

    CPU0                                     CPU1
    ----                                     ----
 set_brightness_delayed() {
   test_and_clear_bit(BRIGHTNESS_OFF)
                                         led_set_brightness(LED_OFF) {
                                           set_bit(BRIGHTNESS_OFF)
					   queue_work()
                                         }
                                         led_set_brightness(any) {
                                           set_bit(BRIGHTNESS)
					   queue_work() //already queued
                                         }
   test_and_clear_bit(BRIGHTNESS)
     /* LED set with brightness any */
 }

 /* From previous CPU1 queue_work() */
 set_brightness_delayed() {
   test_and_clear_bit(BRIGHTNESS_OFF)
     /* LED turned off */
   test_and_clear_bit(BRIGHTNESS)
     /* Clear from previous run, LED remains off */

In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
sequence will be effectively executed in reverse order and LED will
remain off.

With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
workqueue for LEDs events instead of system_wq") the race is easier to
trigger as sysfs brightness configuration does not wait for
set_brightness_delayed() work to finish (flush_work() removal).

Use delayed_set_value to optionnally re-configure brightness after a
LED_OFF. That way a LED state could be configured more that once but
final state will always be as expected.

Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
---
While the race does seem to be very thin, it is very easy to trigger it
on my setup with the following command:

 $ echo "pattern" > /sys/class/leds/<device>/trigger
 $ echo "3 200 40 200 3 200 3 200" > /sys/class/leds/<device>/pattern
 $ sleep 1
 $ echo 0 > /sys/class/leds/<device>/brightness
 $ echo 30 > /sys/class/leds/<device>/brightness

The issue happens 8 out of 10 times, not sure why this is the case,
maybe two consecutive set_bit() on one CPU are likely to appear as
one just after a previous test_and_clear_bit() on another due to
the full memory barrier implied by this atomic operation ?

 drivers/leds/led-core.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Comments

Hans de Goede Feb. 19, 2025, 11:52 a.m. UTC | #1
Hi,

On 19-Feb-25 11:41 AM, Remi Pommarel wrote:
> While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
> successfully forces led_set_brightness() to be called with LED_OFF at
> least once when switching from blinking to LED on state so that
> hw-blinking can be disabled, another race remains. Indeed in
> led_set_brightness(LED_OFF) followed by led_set_brightness(any)
> scenario the following CPU scheduling can happen:
> 
>     CPU0                                     CPU1
>     ----                                     ----
>  set_brightness_delayed() {
>    test_and_clear_bit(BRIGHTNESS_OFF)
>                                          led_set_brightness(LED_OFF) {
>                                            set_bit(BRIGHTNESS_OFF)
> 					   queue_work()
>                                          }
>                                          led_set_brightness(any) {
>                                            set_bit(BRIGHTNESS)
> 					   queue_work() //already queued
>                                          }
>    test_and_clear_bit(BRIGHTNESS)
>      /* LED set with brightness any */
>  }
> 
>  /* From previous CPU1 queue_work() */
>  set_brightness_delayed() {
>    test_and_clear_bit(BRIGHTNESS_OFF)
>      /* LED turned off */
>    test_and_clear_bit(BRIGHTNESS)
>      /* Clear from previous run, LED remains off */
> 
> In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
> sequence will be effectively executed in reverse order and LED will
> remain off.
> 
> With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
> workqueue for LEDs events instead of system_wq") the race is easier to
> trigger as sysfs brightness configuration does not wait for
> set_brightness_delayed() work to finish (flush_work() removal).
> 
> Use delayed_set_value to optionnally re-configure brightness after a
> LED_OFF. That way a LED state could be configured more that once but
> final state will always be as expected.
> 
> Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
> Signed-off-by: Remi Pommarel <repk@triplefau.lt>

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans



> ---
> While the race does seem to be very thin, it is very easy to trigger it
> on my setup with the following command:
> 
>  $ echo "pattern" > /sys/class/leds/<device>/trigger
>  $ echo "3 200 40 200 3 200 3 200" > /sys/class/leds/<device>/pattern
>  $ sleep 1
>  $ echo 0 > /sys/class/leds/<device>/brightness
>  $ echo 30 > /sys/class/leds/<device>/brightness
> 
> The issue happens 8 out of 10 times, not sure why this is the case,
> maybe two consecutive set_bit() on one CPU are likely to appear as
> one just after a previous test_and_clear_bit() on another due to
> the full memory barrier implied by this atomic operation ?
> 
>  drivers/leds/led-core.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
> index f6c46d2e5276..528f53bf13a9 100644
> --- a/drivers/leds/led-core.c
> +++ b/drivers/leds/led-core.c
> @@ -159,8 +159,19 @@ static void set_brightness_delayed(struct work_struct *ws)
>  	 * before this work item runs once. To make sure this works properly
>  	 * handle LED_SET_BRIGHTNESS_OFF first.
>  	 */
> -	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags))
> +	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) {
>  		set_brightness_delayed_set_brightness(led_cdev, LED_OFF);
> +		/*
> +		 * The consecutives led_set_brightness(LED_OFF),
> +		 * led_set_brightness(LED_FULL) could have been executed out of
> +		 * order (LED_FULL first), if the work_flags has been set
> +		 * between LED_SET_BRIGHTNESS_OFF and LED_SET_BRIGHTNESS of this
> +		 * work. To avoid ending with the LED turned off, turn the LED
> +		 * on again.
> +		 */
> +		if (led_cdev->delayed_set_value != LED_OFF)
> +			set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
> +	}
>  
>  	if (test_and_clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags))
>  		set_brightness_delayed_set_brightness(led_cdev, led_cdev->delayed_set_value);
> @@ -331,10 +342,11 @@ void led_set_brightness_nopm(struct led_classdev *led_cdev, unsigned int value)
>  	 * change is done immediately afterwards (before the work runs),
>  	 * it uses a separate work_flag.
>  	 */
> -	if (value) {
> -		led_cdev->delayed_set_value = value;
> +	led_cdev->delayed_set_value = value;
> +
> +	if (value)
>  		set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
> -	} else {
> +	else {
>  		clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
>  		clear_bit(LED_SET_BLINK, &led_cdev->work_flags);
>  		set_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags);
Remi Pommarel Feb. 20, 2025, 8:46 a.m. UTC | #2
On Wed, Feb 19, 2025 at 12:52:36PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 19-Feb-25 11:41 AM, Remi Pommarel wrote:
> > While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
> > successfully forces led_set_brightness() to be called with LED_OFF at
> > least once when switching from blinking to LED on state so that
> > hw-blinking can be disabled, another race remains. Indeed in
> > led_set_brightness(LED_OFF) followed by led_set_brightness(any)
> > scenario the following CPU scheduling can happen:
> > 
> >     CPU0                                     CPU1
> >     ----                                     ----
> >  set_brightness_delayed() {
> >    test_and_clear_bit(BRIGHTNESS_OFF)
> >                                          led_set_brightness(LED_OFF) {
> >                                            set_bit(BRIGHTNESS_OFF)
> > 					   queue_work()
> >                                          }
> >                                          led_set_brightness(any) {
> >                                            set_bit(BRIGHTNESS)
> > 					   queue_work() //already queued
> >                                          }
> >    test_and_clear_bit(BRIGHTNESS)
> >      /* LED set with brightness any */
> >  }
> > 
> >  /* From previous CPU1 queue_work() */
> >  set_brightness_delayed() {
> >    test_and_clear_bit(BRIGHTNESS_OFF)
> >      /* LED turned off */
> >    test_and_clear_bit(BRIGHTNESS)
> >      /* Clear from previous run, LED remains off */
> > 
> > In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
> > sequence will be effectively executed in reverse order and LED will
> > remain off.
> > 
> > With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
> > workqueue for LEDs events instead of system_wq") the race is easier to
> > trigger as sysfs brightness configuration does not wait for
> > set_brightness_delayed() work to finish (flush_work() removal).
> > 
> > Use delayed_set_value to optionnally re-configure brightness after a
> > LED_OFF. That way a LED state could be configured more that once but
> > final state will always be as expected.
> > 
> > Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
> > Signed-off-by: Remi Pommarel <repk@triplefau.lt>
> 
> Thanks, patch looks good to me:
> 

Actually two additionnal remarks here. The first one is that now more
than before, delayed_set_value store should be seen before work_flags
modification on other CPUs. That means that a smp_mb__before_atomic()
is needed before the two set_bit().

The second one is that delayed_set_value can be bigger than a single
byte, so theoretically store tearing can happen and
set_brightness_delayed_set_brightness() could be called with an invalid
value. WRITE_ONCE/READ_ONCE could prevent that but because the
smp_mb__before_atomic() ensures that the "last" delayed_set_value is
valid I don't mind having very seldom intermediate invalid values.

So I think a v2 with smp_mb__before_atomic() is needed here, what do you
think ?

Regards,
Hans de Goede Feb. 20, 2025, 10:15 a.m. UTC | #3
Hi,

On 20-Feb-25 9:46 AM, Remi Pommarel wrote:
> On Wed, Feb 19, 2025 at 12:52:36PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 19-Feb-25 11:41 AM, Remi Pommarel wrote:
>>> While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
>>> successfully forces led_set_brightness() to be called with LED_OFF at
>>> least once when switching from blinking to LED on state so that
>>> hw-blinking can be disabled, another race remains. Indeed in
>>> led_set_brightness(LED_OFF) followed by led_set_brightness(any)
>>> scenario the following CPU scheduling can happen:
>>>
>>>     CPU0                                     CPU1
>>>     ----                                     ----
>>>  set_brightness_delayed() {
>>>    test_and_clear_bit(BRIGHTNESS_OFF)
>>>                                          led_set_brightness(LED_OFF) {
>>>                                            set_bit(BRIGHTNESS_OFF)
>>> 					   queue_work()
>>>                                          }
>>>                                          led_set_brightness(any) {
>>>                                            set_bit(BRIGHTNESS)
>>> 					   queue_work() //already queued
>>>                                          }
>>>    test_and_clear_bit(BRIGHTNESS)
>>>      /* LED set with brightness any */
>>>  }
>>>
>>>  /* From previous CPU1 queue_work() */
>>>  set_brightness_delayed() {
>>>    test_and_clear_bit(BRIGHTNESS_OFF)
>>>      /* LED turned off */
>>>    test_and_clear_bit(BRIGHTNESS)
>>>      /* Clear from previous run, LED remains off */
>>>
>>> In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
>>> sequence will be effectively executed in reverse order and LED will
>>> remain off.
>>>
>>> With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
>>> workqueue for LEDs events instead of system_wq") the race is easier to
>>> trigger as sysfs brightness configuration does not wait for
>>> set_brightness_delayed() work to finish (flush_work() removal).
>>>
>>> Use delayed_set_value to optionnally re-configure brightness after a
>>> LED_OFF. That way a LED state could be configured more that once but
>>> final state will always be as expected.
>>>
>>> Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
>>> Signed-off-by: Remi Pommarel <repk@triplefau.lt>
>>
>> Thanks, patch looks good to me:
>>
> 
> Actually two additionnal remarks here. The first one is that now more
> than before, delayed_set_value store should be seen before work_flags
> modification on other CPUs. That means that a smp_mb__before_atomic()
> is needed before the two set_bit().
> 
> The second one is that delayed_set_value can be bigger than a single
> byte, so theoretically store tearing can happen and
> set_brightness_delayed_set_brightness() could be called with an invalid
> value. WRITE_ONCE/READ_ONCE could prevent that but because the
> smp_mb__before_atomic() ensures that the "last" delayed_set_value is
> valid I don't mind having very seldom intermediate invalid values.
> 
> So I think a v2 with smp_mb__before_atomic() is needed here, what do you
> think ?

Doing a v2 with smp_mb__before_atomic() sounds good to me.

Regards,

Hans
diff mbox series

Patch

diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
index f6c46d2e5276..528f53bf13a9 100644
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -159,8 +159,19 @@  static void set_brightness_delayed(struct work_struct *ws)
 	 * before this work item runs once. To make sure this works properly
 	 * handle LED_SET_BRIGHTNESS_OFF first.
 	 */
-	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags))
+	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) {
 		set_brightness_delayed_set_brightness(led_cdev, LED_OFF);
+		/*
+		 * The consecutives led_set_brightness(LED_OFF),
+		 * led_set_brightness(LED_FULL) could have been executed out of
+		 * order (LED_FULL first), if the work_flags has been set
+		 * between LED_SET_BRIGHTNESS_OFF and LED_SET_BRIGHTNESS of this
+		 * work. To avoid ending with the LED turned off, turn the LED
+		 * on again.
+		 */
+		if (led_cdev->delayed_set_value != LED_OFF)
+			set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
+	}
 
 	if (test_and_clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags))
 		set_brightness_delayed_set_brightness(led_cdev, led_cdev->delayed_set_value);
@@ -331,10 +342,11 @@  void led_set_brightness_nopm(struct led_classdev *led_cdev, unsigned int value)
 	 * change is done immediately afterwards (before the work runs),
 	 * it uses a separate work_flag.
 	 */
-	if (value) {
-		led_cdev->delayed_set_value = value;
+	led_cdev->delayed_set_value = value;
+
+	if (value)
 		set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
-	} else {
+	else {
 		clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
 		clear_bit(LED_SET_BLINK, &led_cdev->work_flags);
 		set_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags);