diff mbox series

Performance: devfreq: avoid devfreq delay work re-init before

Message ID 20231110093457.458-1-huangzaiyang@oppo.com
State New
Headers show
Series Performance: devfreq: avoid devfreq delay work re-init before | expand

Commit Message

黄再漾(Joyyoung Huang) Nov. 10, 2023, 9:34 a.m. UTC
From: huangzaiyang <joyyoung.wang@gmail.com>

There is a timer_list race condition when executing the following test shell script:
'''
while true
do
        echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
        echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
done
'''

[13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
[13511.214393][    C3] Mem abort info:
[13511.214398][    C3]   ESR = 0x96000044
[13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
[13511.214409][    C3]   SET = 0, FnV = 0
[13511.214414][    C3]   EA = 0, S1PTW = 0
[13511.214417][    C3] Data abort info:
[13511.214422][    C3]   ISV = 0, ISS = 0x00000044
[13511.214427][    C3]   CM = 0, WnR = 1
[13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
[13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
[13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
[13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
[13511.215472][    C3] pc : expire_timers+0x9c/0x428
[13511.215478][    C3] lr : __run_timers+0x1f0/0x328
[13511.215483][    C3] sp : ffffffc00801bdd0
[13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
[13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
[13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
[13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
[13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
[13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
[13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
[13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
[13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
[13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
[13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
[13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
[13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
[13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
[13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
[13511.215581][    C3] Call trace:
[13511.215586][    C3]  expire_timers+0x9c/0x428
[13511.215591][    C3]  __run_timers+0x1f0/0x328
[13511.215596][    C3]  run_timer_softirq+0x28/0x58
[13511.215602][    C3]  efi_header_end+0x168/0x5ec
[13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
[13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
[13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
[13511.215630][    C3]  el1_irq+0xe4/0x1c0
[13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
[13511.215644][    C3]  do_idle+0x308/0x574
[13511.215649][    C3]  cpu_startup_entry+0x84/0x90
[13511.215656][    C3]  secondary_start_kernel+0x204/0x274
[13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
[13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
[13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt

This is because when switching the governor through the sys node,
the devfreq_monitor_start function will re-initialize the delayed work task,
which will cause the delay work pending flag to become invalid, and the timer pending judgment contained in the delayed work will also become invalid,
and then the pending interception will be executed when the queue is executed.

So we remove the delay work'initialization work to the devfreq_add_device and timer_store functions,
and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.

Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
---
 drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

--
2.17.1

Comments

Mukesh Ojha Nov. 14, 2023, 10:33 a.m. UTC | #1
Just saw this mail, i have posted a patch to synchronize 
devfreq_monitor_start/stop.

https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

let me know if that works too.

-Mukesh

On 11/10/2023 3:04 PM, huangzaiyang wrote:
> From: huangzaiyang <joyyoung.wang@gmail.com>
> 
> There is a timer_list race condition when executing the following test shell script:
> '''
> while true
> do
>          echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>          echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
> done
> '''
> 
> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
> [13511.214393][    C3] Mem abort info:
> [13511.214398][    C3]   ESR = 0x96000044
> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13511.214409][    C3]   SET = 0, FnV = 0
> [13511.214414][    C3]   EA = 0, S1PTW = 0
> [13511.214417][    C3] Data abort info:
> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
> [13511.214427][    C3]   CM = 0, WnR = 1
> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
> [13511.215483][    C3] sp : ffffffc00801bdd0
> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
> [13511.215581][    C3] Call trace:
> [13511.215586][    C3]  expire_timers+0x9c/0x428
> [13511.215591][    C3]  __run_timers+0x1f0/0x328
> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
> [13511.215644][    C3]  do_idle+0x308/0x574
> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> 
> This is because when switching the governor through the sys node,
> the devfreq_monitor_start function will re-initialize the delayed work task,
> which will cause the delay work pending flag to become invalid, and the timer pending judgment contained in the delayed work will also become invalid,
> and then the pending interception will be executed when the queue is executed.
> 
> So we remove the delay work'initialization work to the devfreq_add_device and timer_store functions,
> and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
> 
> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
> ---
>   drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>   1 file changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index b3a68d5833bd..8ae6f853a21e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>          if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>                  return;
> 
> -       switch (devfreq->profile->timer) {
> -       case DEVFREQ_TIMER_DEFERRABLE:
> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       case DEVFREQ_TIMER_DELAYED:
> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       default:
> -               return;
> -       }
> -
> -       if (devfreq->profile->polling_ms)
> +       if (devfreq->profile->polling_ms && !delayed_work_pending(&devfreq->work))
>                  queue_delayed_work(devfreq_wq, &devfreq->work,
>                          msecs_to_jiffies(devfreq->profile->polling_ms));
>   }
> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>                  goto err_dev;
>          }
> 
> +       switch (devfreq->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
> +                       __func__);
> +       }
>          if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>                  mutex_unlock(&devfreq->lock);
>                  err = set_freq_table(devfreq);
> @@ -1860,6 +1860,18 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>          df->profile->timer = timer;
>          mutex_unlock(&df->lock);
> 
> +       switch (df->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                       __func__);
> +       }
> +
>          ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>          if (ret) {
>                  dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
> --
> 2.17.1
> 
> ________________________________
> OPPO
> 
> ±¾µç×ÓÓʼþ¼°Æ丽¼þº¬ÓÐOPPO¹«Ë¾µÄ±£ÃÜÐÅÏ¢£¬½öÏÞÓÚÓʼþÖ¸Ã÷µÄÊÕ¼þÈË£¨°üº¬¸öÈ˼°Èº×飩ʹÓ᣽ûÖ¹ÈκÎÈËÔÚδ¾­ÊÚȨµÄÇé¿öÏÂÒÔÈκÎÐÎʽʹÓá£Èç¹ûÄú´íÊÕÁ˱¾Óʼþ£¬ÇÐÎð´«²¥¡¢·Ö·¢¡¢¸´ÖÆ¡¢Ó¡Ë¢»òʹÓñ¾ÓʼþÖ®Èκβ¿·Ö»òÆäËùÔØÖ®ÈκÎÄÚÈÝ£¬²¢ÇëÁ¢¼´ÒÔµç×ÓÓʼþ֪ͨ·¢¼þÈ˲¢É¾³ý±¾Óʼþ¼°Æ丽¼þ¡£
> ÍøÂçͨѶ¹ÌÓÐȱÏÝ¿ÉÄܵ¼ÖÂÓʼþ±»½ØÁô¡¢Ð޸ġ¢¶ªÊ§¡¢ÆÆ»µ»ò°üº¬¼ÆËã»ú²¡¶¾µÈ²»°²È«Çé¿ö£¬OPPO¶Ô´ËÀà´íÎó»òÒÅ©¶øÒýÖÂÖ®ÈκÎËðʧ¸Å²»³Ðµ£ÔðÈβ¢±£ÁôÓë±¾ÓʼþÏà¹ØÖ®Ò»ÇÐȨÀû¡£
> ³ý·ÇÃ÷ȷ˵Ã÷£¬±¾Óʼþ¼°Æ丽¼þÎÞÒâ×÷ΪÔÚÈκιú¼Ò»òµØÇøÖ®ÒªÔ¼¡¢ÕÐÀ¿»ò³Ðŵ£¬ÒàÎÞÒâ×÷ΪÈκν»Ò×»òºÏ֮ͬÕýʽȷÈÏ¡£ ·¢¼þÈË¡¢ÆäËùÊô»ú¹¹»òËùÊô»ú¹¹Ö®¹ØÁª»ú¹¹»òÈκÎÉÏÊö»ú¹¹Ö®¹É¶«¡¢¶­Ê¡¢¸ß¼¶¹ÜÀíÈËÔ±¡¢Ô±¹¤»òÆäËûÈκÎÈË£¨ÒÔϳơ°·¢¼þÈË¡±»ò¡°OPPO¡±£©²»Òò±¾ÓʼþÖ®ÎóËͶø·ÅÆúÆäËùÏíÖ®ÈκÎȨÀû£¬Ò಻¶ÔÒò¹ÊÒâ»ò¹ýʧʹÓøõÈÐÅÏ¢¶øÒý·¢»ò¿ÉÄÜÒý·¢µÄËðʧ³Ðµ£ÈκÎÔðÈΡ£
> ÎÄ»¯²îÒìÅû¶£ºÒòÈ«ÇòÎÄ»¯²îÒìÓ°Ï죬µ¥´¿ÒÔYES\OK»òÆäËû¼òµ¥´Ê»ãµÄ»Ø¸´²¢²»¹¹³É·¢¼þÈ˶ÔÈκν»Ò×»òºÏ֮ͬÕýʽȷÈÏ»ò½ÓÊÜ£¬ÇëÓë·¢¼þÈËÔÙ´ÎÈ·ÈÏÒÔ»ñµÃÃ÷È·ÊéÃæÒâ¼û¡£·¢¼þÈ˲»¶ÔÈκÎÊÜÎÄ»¯²îÒìÓ°Ïì¶øµ¼Ö¹ÊÒâ»ò´íÎóʹÓøõÈÐÅÏ¢ËùÔì³ÉµÄÈκÎÖ±½Ó»ò¼ä½ÓË𺦳е£ÔðÈΡ£
> This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you are not the intended recipient, please do not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
> Electronic communications may contain computer viruses or other defects inherently, may not be accurately and/or timely transmitted to other systems, or may be intercepted, modified ,delayed, deleted or interfered. OPPO shall not be liable for any damages that arise or may arise from such matter and reserves all rights in connection with the email.
> Unless expressly stated, this e-mail and its attachments are provided without any warranty, acceptance or promise of any kind in any country or region, nor constitute a formal confirmation or acceptance of any transaction or contract. The sender, together with its affiliates or any shareholder, director, officer, employee or any other person of any such institution (hereinafter referred to as "sender" or "OPPO") does not waive any rights and shall not be liable for any damages that arise or may arise from the intentional or negligent use of such information.
> Cultural Differences Disclosure: Due to global cultural differences, any reply with only YES\OK or other simple words does not constitute any confirmation or acceptance of any transaction or contract, please confirm with the sender again to ensure clear opinion in written form. The sender shall not be responsible for any direct or indirect damages resulting from the intentional or misuse of such information.
> ________________________________
> OPPO
> 
> 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人(包含个人及群组)使用。禁止任何人在未经授权的情况下以任何形式使用。如果您错收了本邮件,切勿传播、分发、复制、印刷或使用本邮件之任何部分或其所载之任何内容,并请立即以电子邮件通知发件人并删除本邮件及其附件。
> 网络通讯固有缺陷可能导致邮件被截留、修改、丢失、破坏或包含计算机病毒等不安全情况,OPPO对此类错误或遗漏而引致之任何损失概不承担责任并保留与本邮件相关之一切权利。
> 除非明确说明,本邮件及其附件无意作为在任何国家或地区之要约、招揽或承诺,亦无意作为任何交易或合同之正式确认。 发件人、其所属机构或所属机构之关联机构或任何上述机构之股东、董事、高级管理人员、员工或其他任何人(以下称“发件人”或“OPPO”)不因本邮件之误送而放弃其所享之任何权利,亦不对因故意或过失使用该等信息而引发或可能引发的损失承担任何责任。
> 文化差异披露:因全球文化差异影响,单纯以YES\OK或其他简单词汇的回复并不构成发件人对任何交易或合同之正式确认或接受,请与发件人再次确认以获得明确书面意见。发件人不对任何受文化差异影响而导致故意或错误使用该等信息所造成的任何直接或间接损害承担责任。
> This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you are not the intended recipient, please do not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
> Electronic communications may contain computer viruses or other defects inherently, may not be accurately and/or timely transmitted to other systems, or may be intercepted, modified ,delayed, deleted or interfered. OPPO shall not be liable for any damages that arise or may arise from such matter and reserves all rights in connection with the email.
> Unless expressly stated, this e-mail and its attachments are provided without any warranty, acceptance or promise of any kind in any country or region, nor constitute a formal confirmation or acceptance of any transaction or contract. The sender, together with its affiliates or any shareholder, director, officer, employee or any other person of any such institution (hereinafter referred to as "sender" or "OPPO") does not waive any rights and shall not be liable for any damages that arise or may arise from the intentional or negligent use of such information.
> Cultural Differences Disclosure: Due to global cultural differences, any reply with only YES\OK or other simple words does not constitute any confirmation or acceptance of any transaction or contract, please confirm with the sender again to ensure clear opinion in written form. The sender shall not be responsible for any direct or indirect damages resulting from the intentional or misuse of such information.
Mukesh Ojha Nov. 14, 2023, 12:12 p.m. UTC | #2
On 11/10/2023 3:04 PM, huangzaiyang wrote:
> From: huangzaiyang <joyyoung.wang@gmail.com>
> 
> There is a timer_list race condition when executing the following test shell script:
> '''
> while true
> do
>          echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>          echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
> done
> '''
> 
> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
> [13511.214393][    C3] Mem abort info:
> [13511.214398][    C3]   ESR = 0x96000044
> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13511.214409][    C3]   SET = 0, FnV = 0
> [13511.214414][    C3]   EA = 0, S1PTW = 0
> [13511.214417][    C3] Data abort info:
> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
> [13511.214427][    C3]   CM = 0, WnR = 1
> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
> [13511.215483][    C3] sp : ffffffc00801bdd0
> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
> [13511.215581][    C3] Call trace:
> [13511.215586][    C3]  expire_timers+0x9c/0x428
> [13511.215591][    C3]  __run_timers+0x1f0/0x328
> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
> [13511.215644][    C3]  do_idle+0x308/0x574
> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> 
> This is because when switching the governor through the sys node,
> the devfreq_monitor_start function will re-initialize the delayed work task,
> which will cause the delay work pending flag to become invalid, and the timer pending judgment contained in the delayed work will also become invalid,
> and then the pending interception will be executed when the queue is executed.
> 
> So we remove the delay work'initialization work to the devfreq_add_device and timer_store functions,
> and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
> 
> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
> ---
>   drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>   1 file changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index b3a68d5833bd..8ae6f853a21e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>          if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>                  return;
> 
> -       switch (devfreq->profile->timer) {
> -       case DEVFREQ_TIMER_DEFERRABLE:
> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       case DEVFREQ_TIMER_DELAYED:
> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       default:
> -               return;
> -       }
> -
> -       if (devfreq->profile->polling_ms)
> +       if (devfreq->profile->polling_ms && !delayed_work_pending(&devfreq->work))
>                  queue_delayed_work(devfreq_wq, &devfreq->work,
>                          msecs_to_jiffies(devfreq->profile->polling_ms));
>   }
> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>                  goto err_dev;
>          }
> 
> +       switch (devfreq->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
> +                       __func__);
> +       }

[..]

>          if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>                  mutex_unlock(&devfreq->lock);
>                  err = set_freq_table(devfreq);
> @@ -1860,6 +1860,18 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>          df->profile->timer = timer;
>          mutex_unlock(&df->lock);
> 
> +       switch (df->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                       __func__);
> +       }
> +
Here, this can cause issue right, as it is modifying the delayed work
data even before stopping the current running instances..

Should the above thing be done after DEVFREQ_GOV_STOP ?
But again it will boil down to the same thing as it is currently
now .
>          ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>          if (ret) {
>                  dev_warn(dev, "%s: Governor %s not stopped(%d)\n",

I was thinking if i put mutex around cancel_delayed_work_sync() as well 
in devfreq_monitor_stop() on top of below patch, should cover the 
corruption of workdata.

https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

-Mukesh

> --
> 2.17.1

Remove the entire below thing..>
> ________________________________
> OPPO
> 
> ±¾µç×ÓÓʼþ¼°Æ丽¼þº¬ÓÐOPPO¹«Ë¾µÄ±£ÃÜÐÅÏ¢£¬½öÏÞÓÚÓʼþÖ¸Ã÷µÄÊÕ¼þÈË£¨°üº¬¸öÈ˼°Èº×飩ʹÓ᣽ûÖ¹ÈκÎÈËÔÚδ¾­ÊÚȨµÄÇé¿öÏÂÒÔÈκÎÐÎʽʹÓá£Èç¹ûÄú´íÊÕÁ˱¾Óʼþ£¬ÇÐÎð´«²¥¡¢·Ö·¢¡¢¸´ÖÆ¡¢Ó¡Ë¢»òʹÓñ¾ÓʼþÖ®Èκβ¿·Ö»òÆäËùÔØÖ®ÈκÎÄÚÈÝ£¬²¢ÇëÁ¢¼´ÒÔµç×ÓÓʼþ֪ͨ·¢¼þÈ˲¢É¾³ý±¾Óʼþ¼°Æ丽¼þ¡£
> ÍøÂçͨѶ¹ÌÓÐȱÏÝ¿ÉÄܵ¼ÖÂÓʼþ±»½ØÁô¡¢Ð޸ġ¢¶ªÊ§¡¢ÆÆ»µ»ò°üº¬¼ÆËã»ú²¡¶¾µÈ²»°²È«Çé¿ö£¬OPPO¶Ô´ËÀà´íÎó»òÒÅ©¶øÒýÖÂÖ®ÈκÎËðʧ¸Å²»³Ðµ£ÔðÈβ¢±£ÁôÓë±¾ÓʼþÏà¹ØÖ®Ò»ÇÐȨÀû¡£
> ³ý·ÇÃ÷ȷ˵Ã÷£¬±¾Óʼþ¼°Æ丽¼þÎÞÒâ×÷ΪÔÚÈκιú¼Ò»òµØÇøÖ®ÒªÔ¼¡¢ÕÐÀ¿»ò³Ðŵ£¬ÒàÎÞÒâ×÷ΪÈκν»Ò×»òºÏ֮ͬÕýʽȷÈÏ¡£ ·¢¼þÈË¡¢ÆäËùÊô»ú¹¹»òËùÊô»ú¹¹Ö®¹ØÁª»ú¹¹»òÈκÎÉÏÊö»ú¹¹Ö®¹É¶«¡¢¶­Ê¡¢¸ß¼¶¹ÜÀíÈËÔ±¡¢Ô±¹¤»òÆäËûÈκÎÈË£¨ÒÔϳơ°·¢¼þÈË¡±»ò¡°OPPO¡±£©²»Òò±¾ÓʼþÖ®ÎóËͶø·ÅÆúÆäËùÏíÖ®ÈκÎȨÀû£¬Ò಻¶ÔÒò¹ÊÒâ»ò¹ýʧʹÓøõÈÐÅÏ¢¶øÒý·¢»ò¿ÉÄÜÒý·¢µÄËðʧ³Ðµ£ÈκÎÔðÈΡ£
> ÎÄ»¯²îÒìÅû¶£ºÒòÈ«ÇòÎÄ»¯²îÒìÓ°Ï죬µ¥´¿ÒÔYES\OK»òÆäËû¼òµ¥´Ê»ãµÄ»Ø¸´²¢²»¹¹³É·¢¼þÈ˶ÔÈκν»Ò×»òºÏ֮ͬÕýʽȷÈÏ»ò½ÓÊÜ£¬ÇëÓë·¢¼þÈËÔÙ´ÎÈ·ÈÏÒÔ»ñµÃÃ÷È·ÊéÃæÒâ¼û¡£·¢¼þÈ˲»¶ÔÈκÎÊÜÎÄ»¯²îÒìÓ°Ïì¶øµ¼Ö¹ÊÒâ»ò´íÎóʹÓøõÈÐÅÏ¢ËùÔì³ÉµÄÈκÎÖ±½Ó»ò¼ä½ÓË𺦳е£ÔðÈΡ£
> This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you are not the intended recipient, please do not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
> Electronic communications may contain computer viruses or other defects inherently, may not be accurately and/or timely transmitted to other systems, or may be intercepted, modified ,delayed, deleted or interfered. OPPO shall not be liable for any damages that arise or may arise from such matter and reserves all rights in connection with the email.
> Unless expressly stated, this e-mail and its attachments are provided without any warranty, acceptance or promise of any kind in any country or region, nor constitute a formal confirmation or acceptance of any transaction or contract. The sender, together with its affiliates or any shareholder, director, officer, employee or any other person of any such institution (hereinafter referred to as "sender" or "OPPO") does not waive any rights and shall not be liable for any damages that arise or may arise from the intentional or negligent use of such information.
> Cultural Differences Disclosure: Due to global cultural differences, any reply with only YES\OK or other simple words does not constitute any confirmation or acceptance of any transaction or contract, please confirm with the sender again to ensure clear opinion in written form. The sender shall not be responsible for any direct or indirect damages resulting from the intentional or misuse of such information.
> ________________________________
> OPPO
> 
> 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人(包含个人及群组)使用。禁止任何人在未经授权的情况下以任何形式使用。如果您错收了本邮件,切勿传播、分发、复制、印刷或使用本邮件之任何部分或其所载之任何内容,并请立即以电子邮件通知发件人并删除本邮件及其附件。
> 网络通讯固有缺陷可能导致邮件被截留、修改、丢失、破坏或包含计算机病毒等不安全情况,OPPO对此类错误或遗漏而引致之任何损失概不承担责任并保留与本邮件相关之一切权利。
> 除非明确说明,本邮件及其附件无意作为在任何国家或地区之要约、招揽或承诺,亦无意作为任何交易或合同之正式确认。 发件人、其所属机构或所属机构之关联机构或任何上述机构之股东、董事、高级管理人员、员工或其他任何人(以下称“发件人”或“OPPO”)不因本邮件之误送而放弃其所享之任何权利,亦不对因故意或过失使用该等信息而引发或可能引发的损失承担任何责任。
> 文化差异披露:因全球文化差异影响,单纯以YES\OK或其他简单词汇的回复并不构成发件人对任何交易或合同之正式确认或接受,请与发件人再次确认以获得明确书面意见。发件人不对任何受文化差异影响而导致故意或错误使用该等信息所造成的任何直接或间接损害承担责任。
> This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you are not the intended recipient, please do not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
> Electronic communications may contain computer viruses or other defects inherently, may not be accurately and/or timely transmitted to other systems, or may be intercepted, modified ,delayed, deleted or interfered. OPPO shall not be liable for any damages that arise or may arise from such matter and reserves all rights in connection with the email.
> Unless expressly stated, this e-mail and its attachments are provided without any warranty, acceptance or promise of any kind in any country or region, nor constitute a formal confirmation or acceptance of any transaction or contract. The sender, together with its affiliates or any shareholder, director, officer, employee or any other person of any such institution (hereinafter referred to as "sender" or "OPPO") does not waive any rights and shall not be liable for any damages that arise or may arise from the intentional or negligent use of such information.
> Cultural Differences Disclosure: Due to global cultural differences, any reply with only YES\OK or other simple words does not constitute any confirmation or acceptance of any transaction or contract, please confirm with the sender again to ensure clear opinion in written form. The sender shall not be responsible for any direct or indirect damages resulting from the intentional or misuse of such information.
黄再漾(Joyyoung Huang) Nov. 14, 2023, 1:36 p.m. UTC | #3
On 11/10/2023 3:04 PM, huangzaiyang wrote:
> From: huangzaiyang <joyyoung.wang@gmail.com>
> 
> There is a timer_list race condition when executing the following test shell script:
> '''
> while true
> do
>          echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>          echo "performance" > 
> /sys/class/devfreq/1d84000.ufshc/governor
> done
> '''
> 
> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
> [13511.214393][    C3] Mem abort info:
> [13511.214398][    C3]   ESR = 0x96000044
> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13511.214409][    C3]   SET = 0, FnV = 0
> [13511.214414][    C3]   EA = 0, S1PTW = 0
> [13511.214417][    C3] Data abort info:
> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
> [13511.214427][    C3]   CM = 0, WnR = 1
> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
> [13511.215483][    C3] sp : ffffffc00801bdd0
> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
> [13511.215581][    C3] Call trace:
> [13511.215586][    C3]  expire_timers+0x9c/0x428
> [13511.215591][    C3]  __run_timers+0x1f0/0x328
> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
> [13511.215644][    C3]  do_idle+0x308/0x574
> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> 
> This is because when switching the governor through the sys node, the 
> devfreq_monitor_start function will re-initialize the delayed work 
> task, which will cause the delay work pending flag to become invalid, 
> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
> 
> So we remove the delay work'initialization work to the 
> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
> 
> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
> ---
>   drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>   1 file changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c 
> index b3a68d5833bd..8ae6f853a21e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>          if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>                  return;
> 
> -       switch (devfreq->profile->timer) {
> -       case DEVFREQ_TIMER_DEFERRABLE:
> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       case DEVFREQ_TIMER_DELAYED:
> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       default:
> -               return;
> -       }
> -
> -       if (devfreq->profile->polling_ms)
> +       if (devfreq->profile->polling_ms && 
> + !delayed_work_pending(&devfreq->work))
>                  queue_delayed_work(devfreq_wq, &devfreq->work,
>                          msecs_to_jiffies(devfreq->profile->polling_ms));
>   }
> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>                  goto err_dev;
>          }
> 
> +       switch (devfreq->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
> +                       __func__);
> +       }

[..]

>          if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>                  mutex_unlock(&devfreq->lock);
>                  err = set_freq_table(devfreq); @@ -1860,6 +1860,18 @@ 
> static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>          df->profile->timer = timer;
>          mutex_unlock(&df->lock);
> 
> +       switch (df->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                       __func__);
> +       }
> +
Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..

Should the above thing be done after DEVFREQ_GOV_STOP ?
But again it will boil down to the same thing as it is currently now .
>          ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>          if (ret) {
>                  dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
-->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow? 
@@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
 		goto out;
 	}
 
-	mutex_lock(&df->lock);
-	df->profile->timer = timer;
-	mutex_unlock(&df->lock);
-
 	ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
 	if (ret) {
 		dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
@@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
 		goto out;
 	}
 
+	mutex_lock(&df->lock);
+	df->profile->timer = timer;
+	switch (df->profile->timer) {
+	case DEVFREQ_TIMER_DEFERRABLE:
+		INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
+		break;
+	case DEVFREQ_TIMER_DELAYED:
+		INIT_DELAYED_WORK(&df->work, devfreq_monitor);
+		break;
+	default:
+		dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
+			__func__);
+       mutex_unlock(&df->lock);
+       goto out;
+	}
+	mutex_unlock(&df->lock);
+
 	ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
 	if (ret)
 		dev_warn(dev, "%s: Governor %s not started(%d)\n",
黄再漾(Joyyoung Huang) Nov. 15, 2023, 11:55 a.m. UTC | #4
Hi Mukesh
I try the below patch on my side,issue still happened ;but I am working on kernel 5.10.
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

I will try to debug the patch on my side , I will update you when there is new progress.


-----邮件原件-----
发件人: 黄再漾(Joyyoung Huang) 
发送时间: 2023年11月14日 21:37
收件人: Mukesh Ojha <quic_mojha@quicinc.com>; myungjoo.ham@samsung.com; kyungmin.park@samsung.com; cw00.choi@samsung.com
抄送: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; huangzaiyang <joyyoung.wang@gmail.com>
主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init before


On 11/10/2023 3:04 PM, huangzaiyang wrote:
> From: huangzaiyang <joyyoung.wang@gmail.com>
> 
> There is a timer_list race condition when executing the following test shell script:
> '''
> while true
> do
>          echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>          echo "performance" >
> /sys/class/devfreq/1d84000.ufshc/governor
> done
> '''
> 
> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
> [13511.214393][    C3] Mem abort info:
> [13511.214398][    C3]   ESR = 0x96000044
> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13511.214409][    C3]   SET = 0, FnV = 0
> [13511.214414][    C3]   EA = 0, S1PTW = 0
> [13511.214417][    C3] Data abort info:
> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
> [13511.214427][    C3]   CM = 0, WnR = 1
> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
> [13511.215483][    C3] sp : ffffffc00801bdd0
> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
> [13511.215581][    C3] Call trace:
> [13511.215586][    C3]  expire_timers+0x9c/0x428
> [13511.215591][    C3]  __run_timers+0x1f0/0x328
> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
> [13511.215644][    C3]  do_idle+0x308/0x574
> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> 
> This is because when switching the governor through the sys node, the 
> devfreq_monitor_start function will re-initialize the delayed work 
> task, which will cause the delay work pending flag to become invalid, 
> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
> 
> So we remove the delay work'initialization work to the 
> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
> 
> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
> ---
>   drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>   1 file changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c 
> index b3a68d5833bd..8ae6f853a21e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>          if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>                  return;
> 
> -       switch (devfreq->profile->timer) {
> -       case DEVFREQ_TIMER_DEFERRABLE:
> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       case DEVFREQ_TIMER_DELAYED:
> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> -               break;
> -       default:
> -               return;
> -       }
> -
> -       if (devfreq->profile->polling_ms)
> +       if (devfreq->profile->polling_ms &&
> + !delayed_work_pending(&devfreq->work))
>                  queue_delayed_work(devfreq_wq, &devfreq->work,
>                          msecs_to_jiffies(devfreq->profile->polling_ms));
>   }
> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>                  goto err_dev;
>          }
> 
> +       switch (devfreq->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
> +                       __func__);
> +       }

[..]

>          if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>                  mutex_unlock(&devfreq->lock);
>                  err = set_freq_table(devfreq); @@ -1860,6 +1860,18 @@ 
> static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>          df->profile->timer = timer;
>          mutex_unlock(&df->lock);
> 
> +       switch (df->profile->timer) {
> +       case DEVFREQ_TIMER_DEFERRABLE:
> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +               break;
> +       case DEVFREQ_TIMER_DELAYED:
> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +               break;
> +       default:
> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                       __func__);
> +       }
> +
Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..

Should the above thing be done after DEVFREQ_GOV_STOP ?
But again it will boil down to the same thing as it is currently now .
>          ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>          if (ret) {
>                  dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
-->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow? 
@@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
 		goto out;
 	}
 
-	mutex_lock(&df->lock);
-	df->profile->timer = timer;
-	mutex_unlock(&df->lock);
-
 	ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
 	if (ret) {
 		dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
 		goto out;
 	}
 
+	mutex_lock(&df->lock);
+	df->profile->timer = timer;
+	switch (df->profile->timer) {
+	case DEVFREQ_TIMER_DEFERRABLE:
+		INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
+		break;
+	case DEVFREQ_TIMER_DELAYED:
+		INIT_DELAYED_WORK(&df->work, devfreq_monitor);
+		break;
+	default:
+		dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
+			__func__);
+       mutex_unlock(&df->lock);
+       goto out;
+	}
+	mutex_unlock(&df->lock);
+
 	ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
 	if (ret)
 		dev_warn(dev, "%s: Governor %s not started(%d)\n",
Mukesh Ojha Nov. 15, 2023, 1:24 p.m. UTC | #5
On 11/15/2023 5:25 PM, 黄再漾(Joyyoung Huang) wrote:
> Hi Mukesh
> I try the below patch on my side,issue still happened ;but I am working on kernel 5.10.
> https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/
> 
> I will try to debug the patch on my side , I will update you when there is new progress.

There still seems to be problem with the patch [1], work is queued to 
some cpu but not executing where there can be parallel call can come for 
devfreq_monitor_start() and that is not being checked and same can come
during cancel_delayed_work_sync() call. We can protect the same with 
below patch applied on top of the patch given on [1].

[1]
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 09b93104521b..a25c74fc31d7 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -488,6 +488,9 @@ void devfreq_monitor_start(struct devfreq *devfreq)
                 return;

         mutex_lock(&devfreq->lock);
+       if (delayed_work_pending(&devfreq->work))
+               goto out;
+
         switch (devfreq->profile->timer) {
         case DEVFREQ_TIMER_DEFERRABLE:
                 INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
@@ -503,8 +506,8 @@ void devfreq_monitor_start(struct devfreq *devfreq)
                 queue_delayed_work(devfreq_wq, &devfreq->work,
                         msecs_to_jiffies(devfreq->profile->polling_ms));

-       devfreq->stop_polling = false;
  out:
+       devfreq->stop_polling = false;
         mutex_unlock(&devfreq->lock);
  }
  EXPORT_SYMBOL(devfreq_monitor_start);
@@ -529,8 +532,8 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
         }

         devfreq->stop_polling = true;
-       mutex_unlock(&devfreq->lock);
         cancel_delayed_work_sync(&devfreq->work);
+       mutex_unlock(&devfreq->lock);

-Mukesh
> 
> 
> -----邮件原件-----
> 发件人: 黄再漾(Joyyoung Huang)
> 发送时间: 2023年11月14日 21:37
> 收件人: Mukesh Ojha <quic_mojha@quicinc.com>; myungjoo.ham@samsung.com; kyungmin.park@samsung.com; cw00.choi@samsung.com
> 抄送: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; huangzaiyang <joyyoung.wang@gmail.com>
> 主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init before
> 
> 
> On 11/10/2023 3:04 PM, huangzaiyang wrote:
>> From: huangzaiyang <joyyoung.wang@gmail.com>
>>
>> There is a timer_list race condition when executing the following test shell script:
>> '''
>> while true
>> do
>>           echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>>           echo "performance" >
>> /sys/class/devfreq/1d84000.ufshc/governor
>> done
>> '''
>>
>> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
>> [13511.214393][    C3] Mem abort info:
>> [13511.214398][    C3]   ESR = 0x96000044
>> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [13511.214409][    C3]   SET = 0, FnV = 0
>> [13511.214414][    C3]   EA = 0, S1PTW = 0
>> [13511.214417][    C3] Data abort info:
>> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
>> [13511.214427][    C3]   CM = 0, WnR = 1
>> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
>> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
>> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
>> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
>> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
>> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
>> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
>> [13511.215483][    C3] sp : ffffffc00801bdd0
>> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
>> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
>> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
>> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
>> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
>> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
>> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
>> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
>> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
>> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
>> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
>> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
>> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
>> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
>> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
>> [13511.215581][    C3] Call trace:
>> [13511.215586][    C3]  expire_timers+0x9c/0x428
>> [13511.215591][    C3]  __run_timers+0x1f0/0x328
>> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
>> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
>> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
>> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
>> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
>> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
>> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
>> [13511.215644][    C3]  do_idle+0x308/0x574
>> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
>> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
>> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
>> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
>> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
>>
>> This is because when switching the governor through the sys node, the
>> devfreq_monitor_start function will re-initialize the delayed work
>> task, which will cause the delay work pending flag to become invalid,
>> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
>>
>> So we remove the delay work'initialization work to the
>> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
>>
>> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
>> ---
>>    drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>>    1 file changed, 24 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index b3a68d5833bd..8ae6f853a21e 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>           if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>>                   return;
>>
>> -       switch (devfreq->profile->timer) {
>> -       case DEVFREQ_TIMER_DEFERRABLE:
>> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       case DEVFREQ_TIMER_DELAYED:
>> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       default:
>> -               return;
>> -       }
>> -
>> -       if (devfreq->profile->polling_ms)
>> +       if (devfreq->profile->polling_ms &&
>> + !delayed_work_pending(&devfreq->work))
>>                   queue_delayed_work(devfreq_wq, &devfreq->work,
>>                           msecs_to_jiffies(devfreq->profile->polling_ms));
>>    }
>> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>                   goto err_dev;
>>           }
>>
>> +       switch (devfreq->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
>> +                       __func__);
>> +       }
> 
> [..]
> 
>>           if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>>                   mutex_unlock(&devfreq->lock);
>>                   err = set_freq_table(devfreq); @@ -1860,6 +1860,18 @@
>> static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>           df->profile->timer = timer;
>>           mutex_unlock(&df->lock);
>>
>> +       switch (df->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
>> +                       __func__);
>> +       }
>> +
> Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..
> 
> Should the above thing be done after DEVFREQ_GOV_STOP ?
> But again it will boil down to the same thing as it is currently now .
>>           ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>           if (ret) {
>>                   dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
> -->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow?
> @@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>   		goto out;
>   	}
>   
> -	mutex_lock(&df->lock);
> -	df->profile->timer = timer;
> -	mutex_unlock(&df->lock);
> -
>   	ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>   	if (ret) {
>   		dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>   		goto out;
>   	}
>   
> +	mutex_lock(&df->lock);
> +	df->profile->timer = timer;
> +	switch (df->profile->timer) {
> +	case DEVFREQ_TIMER_DEFERRABLE:
> +		INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +		break;
> +	case DEVFREQ_TIMER_DELAYED:
> +		INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +		break;
> +	default:
> +		dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +			__func__);
> +       mutex_unlock(&df->lock);
> +       goto out;
> +	}
> +	mutex_unlock(&df->lock);
> +
>   	ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
>   	if (ret)
>   		dev_warn(dev, "%s: Governor %s not started(%d)\n",
黄再漾(Joyyoung Huang) Nov. 16, 2023, 6:36 a.m. UTC | #6
>There still seems to be problem with the patch [1], work is queued to some cpu but not executing where there can be parallel call can come for
>devfreq_monitor_start() and that is not being checked and same can come during cancel_delayed_work_sync() call. We can protect the same with below patch applied on top of the patch given on [1].
>
>[1]
>https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

>diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 09b93104521b..a25c74fc31d7 100644
>--- a/drivers/devfreq/devfreq.c
>+++ b/drivers/devfreq/devfreq.c
>@@ -488,6 +488,9 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>                 return;
>
>         mutex_lock(&devfreq->lock);
>+       if (delayed_work_pending(&devfreq->work))
>+               goto out;
>+
>         switch (devfreq->profile->timer) {
>         case DEVFREQ_TIMER_DEFERRABLE:
>                 INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); @@ -503,8 +506,8 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>                 queue_delayed_work(devfreq_wq, &devfreq->work,
>                         msecs_to_jiffies(devfreq->profile->polling_ms));
>
>-       devfreq->stop_polling = false;
>  out:
>+       devfreq->stop_polling = false;
>         mutex_unlock(&devfreq->lock);
>  }
>  EXPORT_SYMBOL(devfreq_monitor_start);
>@@ -529,8 +532,8 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
>         }
>
>         devfreq->stop_polling = true;
>-       mutex_unlock(&devfreq->lock);
>         cancel_delayed_work_sync(&devfreq->work);
>+       mutex_unlock(&devfreq->lock);

Thanks , It works well on my side!

-huangzaiyang

>
>
> -----邮件原件-----
> 发件人: 黄再漾(Joyyoung Huang)
> 发送时间: 2023年11月14日 21:37
> 收件人: Mukesh Ojha <quic_mojha@quicinc.com>; myungjoo.ham@samsung.com;
> kyungmin.park@samsung.com; cw00.choi@samsung.com
> 抄送: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> huangzaiyang <joyyoung.wang@gmail.com>
> 主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init
> before
>
>
> On 11/10/2023 3:04 PM, huangzaiyang wrote:
>> From: huangzaiyang <joyyoung.wang@gmail.com>
>>
>> There is a timer_list race condition when executing the following test shell script:
>> '''
>> while true
>> do
>>           echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>>           echo "performance" >
>> /sys/class/devfreq/1d84000.ufshc/governor
>> done
>> '''
>>
>> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
>> [13511.214393][    C3] Mem abort info:
>> [13511.214398][    C3]   ESR = 0x96000044
>> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [13511.214409][    C3]   SET = 0, FnV = 0
>> [13511.214414][    C3]   EA = 0, S1PTW = 0
>> [13511.214417][    C3] Data abort info:
>> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
>> [13511.214427][    C3]   CM = 0, WnR = 1
>> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
>> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
>> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
>> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
>> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
>> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
>> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
>> [13511.215483][    C3] sp : ffffffc00801bdd0
>> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
>> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
>> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
>> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
>> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
>> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
>> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
>> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
>> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
>> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
>> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
>> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
>> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
>> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
>> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
>> [13511.215581][    C3] Call trace:
>> [13511.215586][    C3]  expire_timers+0x9c/0x428
>> [13511.215591][    C3]  __run_timers+0x1f0/0x328
>> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
>> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
>> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
>> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
>> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
>> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
>> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
>> [13511.215644][    C3]  do_idle+0x308/0x574
>> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
>> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
>> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
>> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
>> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
>>
>> This is because when switching the governor through the sys node, the
>> devfreq_monitor_start function will re-initialize the delayed work
>> task, which will cause the delay work pending flag to become invalid,
>> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
>>
>> So we remove the delay work'initialization work to the
>> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
>>
>> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
>> ---
>>    drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>>    1 file changed, 24 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index b3a68d5833bd..8ae6f853a21e 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>           if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>>                   return;
>>
>> -       switch (devfreq->profile->timer) {
>> -       case DEVFREQ_TIMER_DEFERRABLE:
>> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       case DEVFREQ_TIMER_DELAYED:
>> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       default:
>> -               return;
>> -       }
>> -
>> -       if (devfreq->profile->polling_ms)
>> +       if (devfreq->profile->polling_ms &&
>> + !delayed_work_pending(&devfreq->work))
>>                   queue_delayed_work(devfreq_wq, &devfreq->work,
>>                           msecs_to_jiffies(devfreq->profile->polling_ms));
>>    }
>> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>                   goto err_dev;
>>           }
>>
>> +       switch (devfreq->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
>> +                       __func__);
>> +       }
>
> [..]
>
>>           if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>>                   mutex_unlock(&devfreq->lock);
>>                   err = set_freq_table(devfreq); @@ -1860,6 +1860,18
>> @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>           df->profile->timer = timer;
>>           mutex_unlock(&df->lock);
>>
>> +       switch (df->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
>> +                       __func__);
>> +       }
>> +
> Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..
>
> Should the above thing be done after DEVFREQ_GOV_STOP ?
> But again it will boil down to the same thing as it is currently now .
>>           ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>           if (ret) {
>>                   dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
> -->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow?
> @@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>               goto out;
>       }
>
> -     mutex_lock(&df->lock);
> -     df->profile->timer = timer;
> -     mutex_unlock(&df->lock);
> -
>       ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>       if (ret) {
>               dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>               goto out;
>       }
>
> +     mutex_lock(&df->lock);
> +     df->profile->timer = timer;
> +     switch (df->profile->timer) {
> +     case DEVFREQ_TIMER_DEFERRABLE:
> +             INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +             break;
> +     case DEVFREQ_TIMER_DELAYED:
> +             INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +             break;
> +     default:
> +             dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                     __func__);
> +       mutex_unlock(&df->lock);
> +       goto out;
> +     }
> +     mutex_unlock(&df->lock);
> +
>       ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
>       if (ret)
>               dev_warn(dev, "%s: Governor %s not started(%d)\n",
黄再漾(Joyyoung Huang) Nov. 24, 2023, 12:07 p.m. UTC | #7
>There still seems to be problem with the patch [1], work is queued to
>some cpu but not executing where there can be parallel call can come
>for
>devfreq_monitor_start() and that is not being checked and same can come during cancel_delayed_work_sync() call. We can protect the same with below patch applied on top of the patch given on [1].
>
>[1]
>https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojh
>a@quicinc.com/

>diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>index 09b93104521b..a25c74fc31d7 100644
>--- a/drivers/devfreq/devfreq.c
>+++ b/drivers/devfreq/devfreq.c
>@@ -488,6 +488,9 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>                 return;
>
>         mutex_lock(&devfreq->lock);
>+       if (delayed_work_pending(&devfreq->work))
>+               goto out;
>+
>         switch (devfreq->profile->timer) {
>         case DEVFREQ_TIMER_DEFERRABLE:
>                 INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); @@ -503,8 +506,8 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>                 queue_delayed_work(devfreq_wq, &devfreq->work,
>
> msecs_to_jiffies(devfreq->profile->polling_ms));
>
>-       devfreq->stop_polling = false;
>  out:
>+       devfreq->stop_polling = false;
>         mutex_unlock(&devfreq->lock);
>  }
>  EXPORT_SYMBOL(devfreq_monitor_start);
>@@ -529,8 +532,8 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
>         }
>
>         devfreq->stop_polling = true;
>-       mutex_unlock(&devfreq->lock);
>         cancel_delayed_work_sync(&devfreq->work);
>+       mutex_unlock(&devfreq->lock);

>Thanks , It works well on my side!

Hi Mukesh,patch 2 still get some deadlock issue during pressure testing, between devfreq_monitor and devfreq_monitor_stop functions like:
    [<ffffffd99bc7ab40>] __switch_to+0x244
    [<ffffffd99d602490>] __schedule+0x5bc
    [<ffffffd99d602b9c>] schedule+0x80
    [<ffffffd99d60b53c>] schedule_timeout[jt]+0xe0
    [<ffffffd99d6040cc>] wait_for_common+0x148
    [<ffffffd99bd69008>] __flush_work+0x3b0
    [<ffffffd99bd6c670>] __cancel_work_timer+0x11c
    [<ffffffd99cddeb8c>] devfreq_simple_ondemand_handler+0x84
    [<ffffffd99cddd660>] governor_store.60718+0x104
    [<ffffffd99c8b8950>] dev_attr_store+0x38
    [<ffffffd99c2a0254>] sysfs_kf_write+0x64
    [<ffffffd99c29d98c>] kernfs_fop_write_iter+0x168
    [<ffffffd99c15f1e8>] vfs_write+0x300
    [<ffffffd99c15ee74>] ksys_write+0x7c
    [<ffffffd99c15ede8>] __arm64_sys_write+0x20
    [<ffffffd99bc94c24>] el0_svc_common+0xd4
    [<ffffffd99d37d644>] el0_svc+0x28
    [<ffffffd99d37d5b8>] el0_sync_handler+0x8c
    [<ffffffd99bc120b4>] el0_sync+0x1b4
This is because devfreq_monitor_stop keep devfreq->lock,and then cancel the work, will wait the work flush finish completion;
The work completion need devfreq_monitor finish also, but devfreq_monitor need to get devfreq->lock too.
So, cancel_delayed_work_sync(&devfreq->work);need remove to outside devfreq->lock protect!

-huangzaiyang

>
>
> -----邮件原件-----
> 发件人: 黄再漾(Joyyoung Huang)
> 发送时间: 2023年11月14日 21:37
> 收件人: Mukesh Ojha <quic_mojha@quicinc.com>; myungjoo.ham@samsung.com;
> kyungmin.park@samsung.com; cw00.choi@samsung.com
> 抄送: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> huangzaiyang <joyyoung.wang@gmail.com>
> 主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init
> before
>
>
> On 11/10/2023 3:04 PM, huangzaiyang wrote:
>> From: huangzaiyang <joyyoung.wang@gmail.com>
>>
>> There is a timer_list race condition when executing the following test shell script:
>> '''
>> while true
>> do
>>           echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>>           echo "performance" >
>> /sys/class/devfreq/1d84000.ufshc/governor
>> done
>> '''
>>
>> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
>> [13511.214393][    C3] Mem abort info:
>> [13511.214398][    C3]   ESR = 0x96000044
>> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [13511.214409][    C3]   SET = 0, FnV = 0
>> [13511.214414][    C3]   EA = 0, S1PTW = 0
>> [13511.214417][    C3] Data abort info:
>> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
>> [13511.214427][    C3]   CM = 0, WnR = 1
>> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
>> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
>> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
>> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
>> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
>> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
>> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
>> [13511.215483][    C3] sp : ffffffc00801bdd0
>> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
>> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
>> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
>> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
>> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
>> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
>> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
>> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
>> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
>> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
>> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
>> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
>> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
>> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
>> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
>> [13511.215581][    C3] Call trace:
>> [13511.215586][    C3]  expire_timers+0x9c/0x428
>> [13511.215591][    C3]  __run_timers+0x1f0/0x328
>> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
>> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
>> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
>> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
>> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
>> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
>> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
>> [13511.215644][    C3]  do_idle+0x308/0x574
>> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
>> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
>> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
>> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
>> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
>>
>> This is because when switching the governor through the sys node, the
>> devfreq_monitor_start function will re-initialize the delayed work
>> task, which will cause the delay work pending flag to become invalid,
>> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
>>
>> So we remove the delay work'initialization work to the
>> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
>>
>> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
>> ---
>>    drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>>    1 file changed, 24 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index b3a68d5833bd..8ae6f853a21e 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>           if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>>                   return;
>>
>> -       switch (devfreq->profile->timer) {
>> -       case DEVFREQ_TIMER_DEFERRABLE:
>> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       case DEVFREQ_TIMER_DELAYED:
>> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> -               break;
>> -       default:
>> -               return;
>> -       }
>> -
>> -       if (devfreq->profile->polling_ms)
>> +       if (devfreq->profile->polling_ms &&
>> + !delayed_work_pending(&devfreq->work))
>>                   queue_delayed_work(devfreq_wq, &devfreq->work,
>>                           msecs_to_jiffies(devfreq->profile->polling_ms));
>>    }
>> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>                   goto err_dev;
>>           }
>>
>> +       switch (devfreq->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
>> +                       __func__);
>> +       }
>
> [..]
>
>>           if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>>                   mutex_unlock(&devfreq->lock);
>>                   err = set_freq_table(devfreq); @@ -1860,6 +1860,18
>> @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>           df->profile->timer = timer;
>>           mutex_unlock(&df->lock);
>>
>> +       switch (df->profile->timer) {
>> +       case DEVFREQ_TIMER_DEFERRABLE:
>> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       case DEVFREQ_TIMER_DELAYED:
>> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
>> +               break;
>> +       default:
>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
>> +                       __func__);
>> +       }
>> +
> Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..
>
> Should the above thing be done after DEVFREQ_GOV_STOP ?
> But again it will boil down to the same thing as it is currently now .
>>           ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>           if (ret) {
>>                   dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
> -->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow?
> @@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>               goto out;
>       }
>
> -     mutex_lock(&df->lock);
> -     df->profile->timer = timer;
> -     mutex_unlock(&df->lock);
> -
>       ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>       if (ret) {
>               dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>               goto out;
>       }
>
> +     mutex_lock(&df->lock);
> +     df->profile->timer = timer;
> +     switch (df->profile->timer) {
> +     case DEVFREQ_TIMER_DEFERRABLE:
> +             INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
> +             break;
> +     case DEVFREQ_TIMER_DELAYED:
> +             INIT_DELAYED_WORK(&df->work, devfreq_monitor);
> +             break;
> +     default:
> +             dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
> +                     __func__);
> +       mutex_unlock(&df->lock);
> +       goto out;
> +     }
> +     mutex_unlock(&df->lock);
> +
>       ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
>       if (ret)
>               dev_warn(dev, "%s: Governor %s not started(%d)\n",
Mukesh Ojha Nov. 24, 2023, 8:42 p.m. UTC | #8
On 11/24/2023 5:37 PM, 黄再漾(Joyyoung Huang) wrote:
>> There still seems to be problem with the patch [1], work is queued to
>> some cpu but not executing where there can be parallel call can come
>> for
>> devfreq_monitor_start() and that is not being checked and same can come during cancel_delayed_work_sync() call. We can protect the same with below patch applied on top of the patch given on [1].
>>
>> [1]
>> https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojh
>> a@quicinc.com/
> 
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index 09b93104521b..a25c74fc31d7 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -488,6 +488,9 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>                  return;
>>
>>          mutex_lock(&devfreq->lock);
>> +       if (delayed_work_pending(&devfreq->work))
>> +               goto out;
>> +
>>          switch (devfreq->profile->timer) {
>>          case DEVFREQ_TIMER_DEFERRABLE:
>>                  INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); @@ -503,8 +506,8 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>                  queue_delayed_work(devfreq_wq, &devfreq->work,
>>
>> msecs_to_jiffies(devfreq->profile->polling_ms));
>>
>> -       devfreq->stop_polling = false;
>>   out:
>> +       devfreq->stop_polling = false;
>>          mutex_unlock(&devfreq->lock);
>>   }
>>   EXPORT_SYMBOL(devfreq_monitor_start);
>> @@ -529,8 +532,8 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
>>          }
>>
>>          devfreq->stop_polling = true;
>> -       mutex_unlock(&devfreq->lock);
>>          cancel_delayed_work_sync(&devfreq->work);
>> +       mutex_unlock(&devfreq->lock);
> 
>> Thanks , It works well on my side!
> 
> Hi Mukesh,patch 2 still get some deadlock issue during pressure testing, between devfreq_monitor and devfreq_monitor_stop functions like:
>      [<ffffffd99bc7ab40>] __switch_to+0x244
>      [<ffffffd99d602490>] __schedule+0x5bc
>      [<ffffffd99d602b9c>] schedule+0x80
>      [<ffffffd99d60b53c>] schedule_timeout[jt]+0xe0
>      [<ffffffd99d6040cc>] wait_for_common+0x148
>      [<ffffffd99bd69008>] __flush_work+0x3b0
>      [<ffffffd99bd6c670>] __cancel_work_timer+0x11c
>      [<ffffffd99cddeb8c>] devfreq_simple_ondemand_handler+0x84
>      [<ffffffd99cddd660>] governor_store.60718+0x104
>      [<ffffffd99c8b8950>] dev_attr_store+0x38
>      [<ffffffd99c2a0254>] sysfs_kf_write+0x64
>      [<ffffffd99c29d98c>] kernfs_fop_write_iter+0x168
>      [<ffffffd99c15f1e8>] vfs_write+0x300
>      [<ffffffd99c15ee74>] ksys_write+0x7c
>      [<ffffffd99c15ede8>] __arm64_sys_write+0x20
>      [<ffffffd99bc94c24>] el0_svc_common+0xd4
>      [<ffffffd99d37d644>] el0_svc+0x28
>      [<ffffffd99d37d5b8>] el0_sync_handler+0x8c
>      [<ffffffd99bc120b4>] el0_sync+0x1b4
> This is because devfreq_monitor_stop keep devfreq->lock,and then cancel the work, will wait the work flush finish completion;
> The work completion need devfreq_monitor finish also, but devfreq_monitor need to get devfreq->lock too.
> So, cancel_delayed_work_sync(&devfreq->work);need remove to outside devfreq->lock protect!

uh !! terrible mistake! that is very obvious deadlock.,
Let me fix and post it., thanks for testing..


-Mukesh
> 
> -huangzaiyang
> 
>>
>>
>> -----邮件原件-----
>> 发件人: 黄再漾(Joyyoung Huang)
>> 发送时间: 2023年11月14日 21:37
>> 收件人: Mukesh Ojha <quic_mojha@quicinc.com>; myungjoo.ham@samsung.com;
>> kyungmin.park@samsung.com; cw00.choi@samsung.com
>> 抄送: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
>> huangzaiyang <joyyoung.wang@gmail.com>
>> 主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init
>> before
>>
>>
>> On 11/10/2023 3:04 PM, huangzaiyang wrote:
>>> From: huangzaiyang <joyyoung.wang@gmail.com>
>>>
>>> There is a timer_list race condition when executing the following test shell script:
>>> '''
>>> while true
>>> do
>>>            echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
>>>            echo "performance" >
>>> /sys/class/devfreq/1d84000.ufshc/governor
>>> done
>>> '''
>>>
>>> [13511.214366][    C3] Unable to handle kernel paging request at virtual address dead00000000012a
>>> [13511.214393][    C3] Mem abort info:
>>> [13511.214398][    C3]   ESR = 0x96000044
>>> [13511.214404][    C3]   EC = 0x25: DABT (current EL), IL = 32 bits
>>> [13511.214409][    C3]   SET = 0, FnV = 0
>>> [13511.214414][    C3]   EA = 0, S1PTW = 0
>>> [13511.214417][    C3] Data abort info:
>>> [13511.214422][    C3]   ISV = 0, ISS = 0x00000044
>>> [13511.214427][    C3]   CM = 0, WnR = 1
>>> [13511.214432][    C3] [dead00000000012a] address between user and kernel address ranges
>>> [13511.214439][    C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
>>> [13511.215449][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S      W  O      5.10.168-android12-9-o-g63cc297a7b34 #1
>>> [13511.215454][    C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
>>> [13511.215460][    C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
>>> [13511.215472][    C3] pc : expire_timers+0x9c/0x428
>>> [13511.215478][    C3] lr : __run_timers+0x1f0/0x328
>>> [13511.215483][    C3] sp : ffffffc00801bdd0
>>> [13511.215487][    C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
>>> [13511.215493][    C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
>>> [13511.215499][    C3] x25: 0000000000000001 x24: ffffff8001734a00
>>> [13511.215506][    C3] x23: 00000000000000e0 x22: dead000000000122
>>> [13511.215512][    C3] x21: 000000010032658e x20: ffffff89f7a9ae80
>>> [13511.215518][    C3] x19: ffffffc00801be50 x18: ffffffc00801d038
>>> [13511.215525][    C3] x17: 0000000000000240 x16: 0000000000000201
>>> [13511.215532][    C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
>>> [13511.215538][    C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
>>> [13511.215544][    C3] x11: 0000000000000021 x10: 000000014032658e
>>> [13511.215550][    C3] x9 : ffffffc00801be50 x8 : dead000000000122
>>> [13511.215556][    C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
>>> [13511.215563][    C3] x5 : 0000000000000000 x4 : 0000000000000101
>>> [13511.215569][    C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
>>> [13511.215575][    C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
>>> [13511.215581][    C3] Call trace:
>>> [13511.215586][    C3]  expire_timers+0x9c/0x428
>>> [13511.215591][    C3]  __run_timers+0x1f0/0x328
>>> [13511.215596][    C3]  run_timer_softirq+0x28/0x58
>>> [13511.215602][    C3]  efi_header_end+0x168/0x5ec
>>> [13511.215610][    C3]  __irq_exit_rcu+0x108/0x124
>>> [13511.215617][    C3]  __handle_domain_irq+0x118/0x1e4
>>> [13511.215625][    C3]  gic_handle_irq.31230+0x6c/0x250
>>> [13511.215630][    C3]  el1_irq+0xe4/0x1c0
>>> [13511.215638][    C3]  cpuidle_enter_state+0x3a4/0xa04
>>> [13511.215644][    C3]  do_idle+0x308/0x574
>>> [13511.215649][    C3]  cpu_startup_entry+0x84/0x90
>>> [13511.215656][    C3]  secondary_start_kernel+0x204/0x274
>>> [13511.215664][    C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
>>> [13511.215670][    C3] ---[ end trace 5100bad72a35d566 ]---
>>> [13511.215676][    C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt
>>>
>>> This is because when switching the governor through the sys node, the
>>> devfreq_monitor_start function will re-initialize the delayed work
>>> task, which will cause the delay work pending flag to become invalid,
>>> and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.
>>>
>>> So we remove the delay work'initialization work to the
>>> devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.
>>>
>>> Signed-off-by: ZaiYang Huang <huangzaiyang@oppo.com>
>>> ---
>>>     drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
>>>     1 file changed, 24 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index b3a68d5833bd..8ae6f853a21e 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>>            if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>>>                    return;
>>>
>>> -       switch (devfreq->profile->timer) {
>>> -       case DEVFREQ_TIMER_DEFERRABLE:
>>> -               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>>> -               break;
>>> -       case DEVFREQ_TIMER_DELAYED:
>>> -               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>>> -               break;
>>> -       default:
>>> -               return;
>>> -       }
>>> -
>>> -       if (devfreq->profile->polling_ms)
>>> +       if (devfreq->profile->polling_ms &&
>>> + !delayed_work_pending(&devfreq->work))
>>>                    queue_delayed_work(devfreq_wq, &devfreq->work,
>>>                            msecs_to_jiffies(devfreq->profile->polling_ms));
>>>     }
>>> @@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>>                    goto err_dev;
>>>            }
>>>
>>> +       switch (devfreq->profile->timer) {
>>> +       case DEVFREQ_TIMER_DEFERRABLE:
>>> +               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
>>> +               break;
>>> +       case DEVFREQ_TIMER_DELAYED:
>>> +               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>>> +               break;
>>> +       default:
>>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
>>> +                       __func__);
>>> +       }
>>
>> [..]
>>
>>>            if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
>>>                    mutex_unlock(&devfreq->lock);
>>>                    err = set_freq_table(devfreq); @@ -1860,6 +1860,18
>>> @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>>            df->profile->timer = timer;
>>>            mutex_unlock(&df->lock);
>>>
>>> +       switch (df->profile->timer) {
>>> +       case DEVFREQ_TIMER_DEFERRABLE:
>>> +               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
>>> +               break;
>>> +       case DEVFREQ_TIMER_DELAYED:
>>> +               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
>>> +               break;
>>> +       default:
>>> +               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
>>> +                       __func__);
>>> +       }
>>> +
>> Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..
>>
>> Should the above thing be done after DEVFREQ_GOV_STOP ?
>> But again it will boil down to the same thing as it is currently now .
>>>            ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>>            if (ret) {
>>>                    dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
>> -->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow?
>> @@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>                goto out;
>>        }
>>
>> -     mutex_lock(&df->lock);
>> -     df->profile->timer = timer;
>> -     mutex_unlock(&df->lock);
>> -
>>        ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>        if (ret) {
>>                dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
>>                goto out;
>>        }
>>
>> +     mutex_lock(&df->lock);
>> +     df->profile->timer = timer;
>> +     switch (df->profile->timer) {
>> +     case DEVFREQ_TIMER_DEFERRABLE:
>> +             INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
>> +             break;
>> +     case DEVFREQ_TIMER_DELAYED:
>> +             INIT_DELAYED_WORK(&df->work, devfreq_monitor);
>> +             break;
>> +     default:
>> +             dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
>> +                     __func__);
>> +       mutex_unlock(&df->lock);
>> +       goto out;
>> +     }
>> +     mutex_unlock(&df->lock);
>> +
>>        ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
>>        if (ret)
>>                dev_warn(dev, "%s: Governor %s not started(%d)\n",
> ________________________________
> OPPO
> 
> 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人(包含个人及群组)使用。禁止任何人在未经授权的情况下以任何形式使用。如果您错收了本邮件,切勿传播、分发、复制、印刷或使用本邮件之任何部分或其所载之任何内容,并请立即以电子邮件通知发件人并删除本邮件及其附件。
> 网络通讯固有缺陷可能导致邮件被截留、修改、丢失、破坏或包含计算机病毒等不安全情况,OPPO对此类错误或遗漏而引致之任何损失概不承担责任并保留与本邮件相关之一切权利。
> 除非明确说明,本邮件及其附件无意作为在任何国家或地区之要约、招揽或承诺,亦无意作为任何交易或合同之正式确认。 发件人、其所属机构或所属机构之关联机构或任何上述机构之股东、董事、高级管理人员、员工或其他任何人(以下称“发件人”或“OPPO”)不因本邮件之误送而放弃其所享之任何权利,亦不对因故意或过失使用该等信息而引发或可能引发的损失承担任何责任。
> 文化差异披露:因全球文化差异影响,单纯以YES\OK或其他简单词汇的回复并不构成发件人对任何交易或合同之正式确认或接受,请与发件人再次确认以获得明确书面意见。发件人不对任何受文化差异影响而导致故意或错误使用该等信息所造成的任何直接或间接损害承担责任。
> This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you are not the intended recipient, please do not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
> Electronic communications may contain computer viruses or other defects inherently, may not be accurately and/or timely transmitted to other systems, or may be intercepted, modified ,delayed, deleted or interfered. OPPO shall not be liable for any damages that arise or may arise from such matter and reserves all rights in connection with the email.
> Unless expressly stated, this e-mail and its attachments are provided without any warranty, acceptance or promise of any kind in any country or region, nor constitute a formal confirmation or acceptance of any transaction or contract. The sender, together with its affiliates or any shareholder, director, officer, employee or any other person of any such institution (hereinafter referred to as "sender" or "OPPO") does not waive any rights and shall not be liable for any damages that arise or may arise from the intentional or negligent use of such information.
> Cultural Differences Disclosure: Due to global cultural differences, any reply with only YES\OK or other simple words does not constitute any confirmation or acceptance of any transaction or contract, please confirm with the sender again to ensure clear opinion in written form. The sender shall not be responsible for any direct or indirect damages resulting from the intentional or misuse of such information.
diff mbox series

Patch

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index b3a68d5833bd..8ae6f853a21e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -483,18 +483,7 @@  void devfreq_monitor_start(struct devfreq *devfreq)
        if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
                return;

-       switch (devfreq->profile->timer) {
-       case DEVFREQ_TIMER_DEFERRABLE:
-               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
-               break;
-       case DEVFREQ_TIMER_DELAYED:
-               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
-               break;
-       default:
-               return;
-       }
-
-       if (devfreq->profile->polling_ms)
+       if (devfreq->profile->polling_ms && !delayed_work_pending(&devfreq->work))
                queue_delayed_work(devfreq_wq, &devfreq->work,
                        msecs_to_jiffies(devfreq->profile->polling_ms));
 }
@@ -830,6 +819,17 @@  struct devfreq *devfreq_add_device(struct device *dev,
                goto err_dev;
        }

+       switch (devfreq->profile->timer) {
+       case DEVFREQ_TIMER_DEFERRABLE:
+               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+               break;
+       case DEVFREQ_TIMER_DELAYED:
+               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
+               break;
+       default:
+               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
+                       __func__);
+       }
        if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
                mutex_unlock(&devfreq->lock);
                err = set_freq_table(devfreq);
@@ -1860,6 +1860,18 @@  static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
        df->profile->timer = timer;
        mutex_unlock(&df->lock);

+       switch (df->profile->timer) {
+       case DEVFREQ_TIMER_DEFERRABLE:
+               INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
+               break;
+       case DEVFREQ_TIMER_DELAYED:
+               INIT_DELAYED_WORK(&df->work, devfreq_monitor);
+               break;
+       default:
+               dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
+                       __func__);
+       }
+
        ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
        if (ret) {
                dev_warn(dev, "%s: Governor %s not stopped(%d)\n",