Message ID | 26a2690e77671cfe687c5614613fbb6f079f7365.1739959820.git.repk@triplefau.lt |
---|---|
State | Superseded |
Headers | show |
Series | leds: Fix LED_OFF brightness race | expand |
Hi, On 19-Feb-25 11:41 AM, Remi Pommarel wrote: > While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race") > successfully forces led_set_brightness() to be called with LED_OFF at > least once when switching from blinking to LED on state so that > hw-blinking can be disabled, another race remains. Indeed in > led_set_brightness(LED_OFF) followed by led_set_brightness(any) > scenario the following CPU scheduling can happen: > > CPU0 CPU1 > ---- ---- > set_brightness_delayed() { > test_and_clear_bit(BRIGHTNESS_OFF) > led_set_brightness(LED_OFF) { > set_bit(BRIGHTNESS_OFF) > queue_work() > } > led_set_brightness(any) { > set_bit(BRIGHTNESS) > queue_work() //already queued > } > test_and_clear_bit(BRIGHTNESS) > /* LED set with brightness any */ > } > > /* From previous CPU1 queue_work() */ > set_brightness_delayed() { > test_and_clear_bit(BRIGHTNESS_OFF) > /* LED turned off */ > test_and_clear_bit(BRIGHTNESS) > /* Clear from previous run, LED remains off */ > > In that case the led_set_brightness(LED_OFF)/led_set_brightness(any) > sequence will be effectively executed in reverse order and LED will > remain off. > > With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered > workqueue for LEDs events instead of system_wq") the race is easier to > trigger as sysfs brightness configuration does not wait for > set_brightness_delayed() work to finish (flush_work() removal). > > Use delayed_set_value to optionnally re-configure brightness after a > LED_OFF. That way a LED state could be configured more that once but > final state will always be as expected. > > Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race") > Signed-off-by: Remi Pommarel <repk@triplefau.lt> Thanks, patch looks good to me: Reviewed-by: Hans de Goede <hdegoede@redhat.com> Regards, Hans > --- > While the race does seem to be very thin, it is very easy to trigger it > on my setup with the following command: > > $ echo "pattern" > /sys/class/leds/<device>/trigger > $ echo "3 200 40 200 3 200 3 200" > /sys/class/leds/<device>/pattern > $ sleep 1 > $ echo 0 > /sys/class/leds/<device>/brightness > $ echo 30 > /sys/class/leds/<device>/brightness > > The issue happens 8 out of 10 times, not sure why this is the case, > maybe two consecutive set_bit() on one CPU are likely to appear as > one just after a previous test_and_clear_bit() on another due to > the full memory barrier implied by this atomic operation ? > > drivers/leds/led-core.c | 20 ++++++++++++++++---- > 1 file changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c > index f6c46d2e5276..528f53bf13a9 100644 > --- a/drivers/leds/led-core.c > +++ b/drivers/leds/led-core.c > @@ -159,8 +159,19 @@ static void set_brightness_delayed(struct work_struct *ws) > * before this work item runs once. To make sure this works properly > * handle LED_SET_BRIGHTNESS_OFF first. > */ > - if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) > + if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) { > set_brightness_delayed_set_brightness(led_cdev, LED_OFF); > + /* > + * The consecutives led_set_brightness(LED_OFF), > + * led_set_brightness(LED_FULL) could have been executed out of > + * order (LED_FULL first), if the work_flags has been set > + * between LED_SET_BRIGHTNESS_OFF and LED_SET_BRIGHTNESS of this > + * work. To avoid ending with the LED turned off, turn the LED > + * on again. > + */ > + if (led_cdev->delayed_set_value != LED_OFF) > + set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); > + } > > if (test_and_clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags)) > set_brightness_delayed_set_brightness(led_cdev, led_cdev->delayed_set_value); > @@ -331,10 +342,11 @@ void led_set_brightness_nopm(struct led_classdev *led_cdev, unsigned int value) > * change is done immediately afterwards (before the work runs), > * it uses a separate work_flag. > */ > - if (value) { > - led_cdev->delayed_set_value = value; > + led_cdev->delayed_set_value = value; > + > + if (value) > set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); > - } else { > + else { > clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); > clear_bit(LED_SET_BLINK, &led_cdev->work_flags); > set_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags);
On Wed, Feb 19, 2025 at 12:52:36PM +0100, Hans de Goede wrote: > Hi, > > On 19-Feb-25 11:41 AM, Remi Pommarel wrote: > > While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race") > > successfully forces led_set_brightness() to be called with LED_OFF at > > least once when switching from blinking to LED on state so that > > hw-blinking can be disabled, another race remains. Indeed in > > led_set_brightness(LED_OFF) followed by led_set_brightness(any) > > scenario the following CPU scheduling can happen: > > > > CPU0 CPU1 > > ---- ---- > > set_brightness_delayed() { > > test_and_clear_bit(BRIGHTNESS_OFF) > > led_set_brightness(LED_OFF) { > > set_bit(BRIGHTNESS_OFF) > > queue_work() > > } > > led_set_brightness(any) { > > set_bit(BRIGHTNESS) > > queue_work() //already queued > > } > > test_and_clear_bit(BRIGHTNESS) > > /* LED set with brightness any */ > > } > > > > /* From previous CPU1 queue_work() */ > > set_brightness_delayed() { > > test_and_clear_bit(BRIGHTNESS_OFF) > > /* LED turned off */ > > test_and_clear_bit(BRIGHTNESS) > > /* Clear from previous run, LED remains off */ > > > > In that case the led_set_brightness(LED_OFF)/led_set_brightness(any) > > sequence will be effectively executed in reverse order and LED will > > remain off. > > > > With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered > > workqueue for LEDs events instead of system_wq") the race is easier to > > trigger as sysfs brightness configuration does not wait for > > set_brightness_delayed() work to finish (flush_work() removal). > > > > Use delayed_set_value to optionnally re-configure brightness after a > > LED_OFF. That way a LED state could be configured more that once but > > final state will always be as expected. > > > > Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race") > > Signed-off-by: Remi Pommarel <repk@triplefau.lt> > > Thanks, patch looks good to me: > Actually two additionnal remarks here. The first one is that now more than before, delayed_set_value store should be seen before work_flags modification on other CPUs. That means that a smp_mb__before_atomic() is needed before the two set_bit(). The second one is that delayed_set_value can be bigger than a single byte, so theoretically store tearing can happen and set_brightness_delayed_set_brightness() could be called with an invalid value. WRITE_ONCE/READ_ONCE could prevent that but because the smp_mb__before_atomic() ensures that the "last" delayed_set_value is valid I don't mind having very seldom intermediate invalid values. So I think a v2 with smp_mb__before_atomic() is needed here, what do you think ? Regards,
Hi, On 20-Feb-25 9:46 AM, Remi Pommarel wrote: > On Wed, Feb 19, 2025 at 12:52:36PM +0100, Hans de Goede wrote: >> Hi, >> >> On 19-Feb-25 11:41 AM, Remi Pommarel wrote: >>> While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race") >>> successfully forces led_set_brightness() to be called with LED_OFF at >>> least once when switching from blinking to LED on state so that >>> hw-blinking can be disabled, another race remains. Indeed in >>> led_set_brightness(LED_OFF) followed by led_set_brightness(any) >>> scenario the following CPU scheduling can happen: >>> >>> CPU0 CPU1 >>> ---- ---- >>> set_brightness_delayed() { >>> test_and_clear_bit(BRIGHTNESS_OFF) >>> led_set_brightness(LED_OFF) { >>> set_bit(BRIGHTNESS_OFF) >>> queue_work() >>> } >>> led_set_brightness(any) { >>> set_bit(BRIGHTNESS) >>> queue_work() //already queued >>> } >>> test_and_clear_bit(BRIGHTNESS) >>> /* LED set with brightness any */ >>> } >>> >>> /* From previous CPU1 queue_work() */ >>> set_brightness_delayed() { >>> test_and_clear_bit(BRIGHTNESS_OFF) >>> /* LED turned off */ >>> test_and_clear_bit(BRIGHTNESS) >>> /* Clear from previous run, LED remains off */ >>> >>> In that case the led_set_brightness(LED_OFF)/led_set_brightness(any) >>> sequence will be effectively executed in reverse order and LED will >>> remain off. >>> >>> With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered >>> workqueue for LEDs events instead of system_wq") the race is easier to >>> trigger as sysfs brightness configuration does not wait for >>> set_brightness_delayed() work to finish (flush_work() removal). >>> >>> Use delayed_set_value to optionnally re-configure brightness after a >>> LED_OFF. That way a LED state could be configured more that once but >>> final state will always be as expected. >>> >>> Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race") >>> Signed-off-by: Remi Pommarel <repk@triplefau.lt> >> >> Thanks, patch looks good to me: >> > > Actually two additionnal remarks here. The first one is that now more > than before, delayed_set_value store should be seen before work_flags > modification on other CPUs. That means that a smp_mb__before_atomic() > is needed before the two set_bit(). > > The second one is that delayed_set_value can be bigger than a single > byte, so theoretically store tearing can happen and > set_brightness_delayed_set_brightness() could be called with an invalid > value. WRITE_ONCE/READ_ONCE could prevent that but because the > smp_mb__before_atomic() ensures that the "last" delayed_set_value is > valid I don't mind having very seldom intermediate invalid values. > > So I think a v2 with smp_mb__before_atomic() is needed here, what do you > think ? Doing a v2 with smp_mb__before_atomic() sounds good to me. Regards, Hans
diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c index f6c46d2e5276..528f53bf13a9 100644 --- a/drivers/leds/led-core.c +++ b/drivers/leds/led-core.c @@ -159,8 +159,19 @@ static void set_brightness_delayed(struct work_struct *ws) * before this work item runs once. To make sure this works properly * handle LED_SET_BRIGHTNESS_OFF first. */ - if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) + if (test_and_clear_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags)) { set_brightness_delayed_set_brightness(led_cdev, LED_OFF); + /* + * The consecutives led_set_brightness(LED_OFF), + * led_set_brightness(LED_FULL) could have been executed out of + * order (LED_FULL first), if the work_flags has been set + * between LED_SET_BRIGHTNESS_OFF and LED_SET_BRIGHTNESS of this + * work. To avoid ending with the LED turned off, turn the LED + * on again. + */ + if (led_cdev->delayed_set_value != LED_OFF) + set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); + } if (test_and_clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags)) set_brightness_delayed_set_brightness(led_cdev, led_cdev->delayed_set_value); @@ -331,10 +342,11 @@ void led_set_brightness_nopm(struct led_classdev *led_cdev, unsigned int value) * change is done immediately afterwards (before the work runs), * it uses a separate work_flag. */ - if (value) { - led_cdev->delayed_set_value = value; + led_cdev->delayed_set_value = value; + + if (value) set_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); - } else { + else { clear_bit(LED_SET_BRIGHTNESS, &led_cdev->work_flags); clear_bit(LED_SET_BLINK, &led_cdev->work_flags); set_bit(LED_SET_BRIGHTNESS_OFF, &led_cdev->work_flags);
While commit fa15d8c69238 ("leds: Fix set_brightness_delayed() race") successfully forces led_set_brightness() to be called with LED_OFF at least once when switching from blinking to LED on state so that hw-blinking can be disabled, another race remains. Indeed in led_set_brightness(LED_OFF) followed by led_set_brightness(any) scenario the following CPU scheduling can happen: CPU0 CPU1 ---- ---- set_brightness_delayed() { test_and_clear_bit(BRIGHTNESS_OFF) led_set_brightness(LED_OFF) { set_bit(BRIGHTNESS_OFF) queue_work() } led_set_brightness(any) { set_bit(BRIGHTNESS) queue_work() //already queued } test_and_clear_bit(BRIGHTNESS) /* LED set with brightness any */ } /* From previous CPU1 queue_work() */ set_brightness_delayed() { test_and_clear_bit(BRIGHTNESS_OFF) /* LED turned off */ test_and_clear_bit(BRIGHTNESS) /* Clear from previous run, LED remains off */ In that case the led_set_brightness(LED_OFF)/led_set_brightness(any) sequence will be effectively executed in reverse order and LED will remain off. With the introduction of commit 32360bf6a5d4 ("leds: Introduce ordered workqueue for LEDs events instead of system_wq") the race is easier to trigger as sysfs brightness configuration does not wait for set_brightness_delayed() work to finish (flush_work() removal). Use delayed_set_value to optionnally re-configure brightness after a LED_OFF. That way a LED state could be configured more that once but final state will always be as expected. Fixes: fa15d8c69238 ("leds: Fix set_brightness_delayed() race") Signed-off-by: Remi Pommarel <repk@triplefau.lt> --- While the race does seem to be very thin, it is very easy to trigger it on my setup with the following command: $ echo "pattern" > /sys/class/leds/<device>/trigger $ echo "3 200 40 200 3 200 3 200" > /sys/class/leds/<device>/pattern $ sleep 1 $ echo 0 > /sys/class/leds/<device>/brightness $ echo 30 > /sys/class/leds/<device>/brightness The issue happens 8 out of 10 times, not sure why this is the case, maybe two consecutive set_bit() on one CPU are likely to appear as one just after a previous test_and_clear_bit() on another due to the full memory barrier implied by this atomic operation ? drivers/leds/led-core.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)