diff mbox series

leds: Fix LED_OFF brightness race

Message ID 26a2690e77671cfe687c5614613fbb6f079f7365.1739959820.git.repk@triplefau.lt
State Superseded
Headers show
Series leds: Fix LED_OFF brightness race | expand

Commit Message

Remi Pommarel Feb. 19, 2025, 10:41 a.m. UTC
While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
successfully forces led_set_brightness() to be called with LED_OFF at
least once when switching from blinking to LED on state so that
hw-blinking can be disabled, another race remains. Indeed in
led_set_brightness(LED_OFF) followed by led_set_brightness(any)
scenario the following CPU scheduling can happen:

    CPU0                                     CPU1
    ----                                     ----
 set_brightness_delayed() {
   test_and_clear_bit(BRIGHTNESS_OFF)
                                         led_set_brightness(LED_OFF) {
                                           set_bit(BRIGHTNESS_OFF)
					   queue_work()
                                         }
                                         led_set_brightness(any) {
                                           set_bit(BRIGHTNESS)
					   queue_work() //already queued
                                         }
   test_and_clear_bit(BRIGHTNESS)
     /* LED set with brightness any */
 }

 /* From previous CPU1 queue_work() */
 set_brightness_delayed() {
   test_and_clear_bit(BRIGHTNESS_OFF)
     /* LED turned off */
   test_and_clear_bit(BRIGHTNESS)
     /* Clear from previous run, LED remains off */

In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
sequence will be effectively executed in reverse order and LED will
remain off.

With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
workqueue for LEDs events instead of system_wq") the race is easier to
trigger as sysfs brightness configuration does not wait for
set_brightness_delayed() work to finish (flush_work() removal).

Use delayed_set_value to optionnally re-configure brightness after a
LED_OFF. That way a LED state could be configured more that once but
final state will always be as expected.

Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
---
While the race does seem to be very thin, it is very easy to trigger it
on my setup with the following command:

 $ echo "pattern" > /sys/class/leds/<device>/trigger
 $ echo "3 200 40 200 3 200 3 200" > /sys/class/leds/<device>/pattern
 $ sleep 1
 $ echo 0 > /sys/class/leds/<device>/brightness
 $ echo 30 > /sys/class/leds/<device>/brightness

The issue happens 8 out of 10 times, not sure why this is the case,
maybe two consecutive set_bit() on one CPU are likely to appear as
one just after a previous test_and_clear_bit() on another due to
the full memory barrier implied by this atomic operation ?

 drivers/leds/led-core.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Comments

Hans de Goede Feb. 20, 2025, 10:15 a.m. UTC | #1
Hi,

On 20-Feb-25 9:46 AM, Remi Pommarel wrote:
> On Wed, Feb 19, 2025 at 12:52:36PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 19-Feb-25 11:41 AM, Remi Pommarel wrote:
>>> While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
>>> successfully forces led_set_brightness() to be called with LED_OFF at
>>> least once when switching from blinking to LED on state so that
>>> hw-blinking can be disabled, another race remains. Indeed in
>>> led_set_brightness(LED_OFF) followed by led_set_brightness(any)
>>> scenario the following CPU scheduling can happen:
>>>
>>>     CPU0                                     CPU1
>>>     ----                                     ----
>>>  set_brightness_delayed() {
>>>    test_and_clear_bit(BRIGHTNESS_OFF)
>>>                                          led_set_brightness(LED_OFF) {
>>>                                            set_bit(BRIGHTNESS_OFF)
>>> 					   queue_work()
>>>                                          }
>>>                                          led_set_brightness(any) {
>>>                                            set_bit(BRIGHTNESS)
>>> 					   queue_work() //already queued
>>>                                          }
>>>    test_and_clear_bit(BRIGHTNESS)
>>>      /* LED set with brightness any */
>>>  }
>>>
>>>  /* From previous CPU1 queue_work() */
>>>  set_brightness_delayed() {
>>>    test_and_clear_bit(BRIGHTNESS_OFF)
>>>      /* LED turned off */
>>>    test_and_clear_bit(BRIGHTNESS)
>>>      /* Clear from previous run, LED remains off */
>>>
>>> In that case the led_set_brightness(LED_OFF)/led_set_brightness(any)
>>> sequence will be effectively executed in reverse order and LED will
>>> remain off.
>>>
>>> With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered
>>> workqueue for LEDs events instead of system_wq") the race is easier to
>>> trigger as sysfs brightness configuration does not wait for
>>> set_brightness_delayed() work to finish (flush_work() removal).
>>>
>>> Use delayed_set_value to optionnally re-configure brightness after a
>>> LED_OFF. That way a LED state could be configured more that once but
>>> final state will always be as expected.
>>>
>>> Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race")
>>> Signed-off-by: Remi Pommarel <repk@triplefau.lt>
>>
>> Thanks, patch looks good to me:
>>
> 
> Actually two additionnal remarks here. The first one is that now more
> than before, delayed_set_value store should be seen before work_flags
> modification on other CPUs. That means that a smp_mb__before_atomic()
> is needed before the two set_bit().
> 
> The second one is that delayed_set_value can be bigger than a single
> byte, so theoretically store tearing can happen and
> set_brightness_delayed_set_brightness() could be called with an invalid
> value. WRITE_ONCE/READ_ONCE could prevent that but because the
> smp_mb__before_atomic() ensures that the "last" delayed_set_value is
> valid I don't mind having very seldom intermediate invalid values.
> 
> So I think a v2 with smp_mb__before_atomic() is needed here, what do you
> think ?

Doing a v2 with smp_mb__before_atomic() sounds good to me.

Regards,

Hans
diff mbox series

Patch

diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
index f6c46d2e5276..528f53bf13a9 100644
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -159,8 +159,19 @@  static void set_brightness_delayed(struct work_struct *ws)
 	 * before this work item runs once. To make sure this works properly
 	 * handle LED_SET_BRIGHTNESS_OFF first.
 	 */
-	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags))
+	if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) {
 		set_brightness_delayed_set_brightness(led_cdev, LED_OFF);
+		/*
+		 * The consecutives led_set_brightness(LED_OFF),
+		 * led_set_brightness(LED_FULL) could have been executed out of
+		 * order (LED_FULL first), if the work_flags has been set
+		 * between LED_SET_BRIGHTNESS_OFF and LED_SET_BRIGHTNESS of this
+		 * work. To avoid ending with the LED turned off, turn the LED
+		 * on again.
+		 */
+		if (led_cdev->delayed_set_value != LED_OFF)
+			set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
+	}
 
 	if (test_and_clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags))
 		set_brightness_delayed_set_brightness(led_cdev, led_cdev->delayed_set_value);
@@ -331,10 +342,11 @@  void led_set_brightness_nopm(struct led_classdev *led_cdev, unsigned int value)
 	 * change is done immediately afterwards (before the work runs),
 	 * it uses a separate work_flag.
 	 */
-	if (value) {
-		led_cdev->delayed_set_value = value;
+	led_cdev->delayed_set_value = value;
+
+	if (value)
 		set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
-	} else {
+	else {
 		clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags);
 		clear_bit(LED_SET_BLINK, &led_cdev->work_flags);
 		set_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags);