
[v2,3/3] gpu: ion: oom killer

Message ID 1346746935-1149-4-git-send-email-zhangfei.gao@marvell.com
State Rejected

Commit Message

Zhangfei Gao Sept. 4, 2012, 8:22 a.m. UTC
ion_shrink() is called when the carveout heap allocation fails with ION_CARVEOUT_ALLOCATE_FAIL.

How to test:
mount -t debugfs none /mnt
cat /mnt/ion/carveout_heap
echo adj > /mnt/ion/carveout_heap
This sends SIGKILL to the task with a higher adj that is using an ion buffer.
It also kills all other tasks that reference those buffers, so no task is left using a stale buffer.

Example:

cat /mnt/ion/carveout_heap
client              pid             size    oom_score_adj
ion_test              191             4096                0
ion_test              192             4096                0
ion_test              193             4096                0

echo -1 > /mnt/ion/carveout_heap
[ 1333.689318] SIGKILL pid: 191
[ 1333.692192] SIGKILL pid: 192
[ 1333.695436] SIGKILL pid: 193
[ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0
[1]+  Killed                     ./ion_test 2

cat /mnt/ion/carveout_heap
client              pid             size    oom_score_adj

Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion.c |  118 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 114 insertions(+), 4 deletions(-)

Comments

Nishanth Peethambaran Sept. 4, 2012, 9:32 a.m. UTC | #1
On Tue, Sep 4, 2012 at 1:52 PM, Zhangfei Gao <zhangfei.gao@marvell.com> wrote:
> ion_shrink is called when ION_CARVEOUT_ALLOCATE_FAIL
>
> How to test:
> mount -t debugfs none /mnt
> cat /mnt/ion/carveout_heap
> echo adj > /mnt/ion/carveout_heap
> send_sig SIGKILL to the task with higher adj and using ion buffer
> Also kill all others tasks refered the buffers, in case using empty buffer
>
> Example:
>
> cat /mnt/ion/carveout_heap
> client              pid             size    oom_score_adj
> ion_test              191             4096                0
> ion_test              192             4096                0
> ion_test              193             4096                0
>
> echo -1 > /mnt/ion/carveout_heap
> [ 1333.689318] SIGKILL pid: 191
> [ 1333.692192] SIGKILL pid: 192
> [ 1333.695436] SIGKILL pid: 193
> [ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0
> [1]+  Killed                     ./ion_test 2
>
> cat /mnt/ion/carveout_heap
> client              pid             size    oom_score_adj
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
> ---
>  drivers/gpu/ion/ion.c |  118 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 114 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
> index 207d00f..5ead74c 100644
> --- a/drivers/gpu/ion/ion.c
> +++ b/drivers/gpu/ion/ion.c
> @@ -32,6 +32,8 @@
>  #include <linux/uaccess.h>
>  #include <linux/debugfs.h>
>  #include <linux/dma-buf.h>
> +#include <linux/oom.h>
> +#include <linux/delay.h>
>
>  #include "ion_priv.h"
>
> @@ -367,6 +369,8 @@ static void ion_handle_add(struct ion_client *client, struct ion_handle *handle)
>         rb_insert_color(&handle->node, &client->handles);
>  }
>
> +static int ion_shrink(struct ion_heap *heap, int kill_adj);
> +
>  struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>                              size_t align, unsigned int heap_mask,
>                              unsigned int flags)
> @@ -375,6 +379,7 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>         struct ion_handle *handle;
>         struct ion_device *dev = client->dev;
>         struct ion_buffer *buffer = NULL;
> +       struct ion_heap *heap = NULL;
>
>         pr_debug("%s: len %d align %d heap_mask %u flags %x\n", __func__, len,
>                  align, heap_mask, flags);
> @@ -389,9 +394,10 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>
>         len = PAGE_ALIGN(len);
>
> +retry:
>         mutex_lock(&dev->lock);
>         for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
> -               struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
> +               heap = rb_entry(n, struct ion_heap, node);
>                 /* if the client doesn't support this heap type */
>                 if (!((1 << heap->type) & client->heap_mask))
>                         continue;
> @@ -404,6 +410,11 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>         }
>         mutex_unlock(&dev->lock);
>
> +       if (buffer == ERR_PTR(-ENOMEM)) {
> +               if (!ion_shrink(heap, 0))
> +                       goto retry;
> +       }
> +

The heap attempted for shrink in this patch would be the last
heap registered by the platform.
It should instead be the last heap the user asked to try, as per heap_mask;
otherwise it could also go into an infinite loop.
The last cma/carveout heap attempted by ion_buffer_create() would be
a better candidate for ion_shrink().
Also, exiting the retry after a few iterations and returning an allocation
failure is a safer choice.
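Something along these lines is what I mean. This is only an untested sketch
against this ion.c: it assumes the elided part of the loop checks heap->id
against heap_mask and calls ion_buffer_create() as in the current file, and
the bound of three retries is arbitrary.

	struct ion_heap *failed_heap = NULL;
	int retries = 3;

retry:
	mutex_lock(&dev->lock);
	for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
		struct ion_heap *heap = rb_entry(n, struct ion_heap, node);

		/* if the client doesn't support this heap type */
		if (!((1 << heap->type) & client->heap_mask))
			continue;
		/* if the caller didn't ask for this heap id */
		if (!((1 << heap->id) & heap_mask))
			continue;
		buffer = ion_buffer_create(heap, dev, len, align, flags);
		if (!IS_ERR_OR_NULL(buffer))
			break;
		/* remember the heap that was really attempted,
		 * not the last one walked in the rbtree */
		failed_heap = heap;
	}
	mutex_unlock(&dev->lock);

	if (buffer == ERR_PTR(-ENOMEM) && failed_heap && retries-- > 0) {
		if (!ion_shrink(failed_heap, 0))
			goto retry;
	}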

>         if (buffer == NULL)
>                 return ERR_PTR(-ENODEV);
>
> @@ -1178,7 +1189,8 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
>         size_t total_size = 0;
>         size_t total_orphaned_size = 0;
>
> -       seq_printf(s, "%16.s %16.s %16.s\n", "client", "pid", "size");
> +       seq_printf(s, "%16.s %16.s %16.s %16.s\n",
> +               "client", "pid", "size", "oom_score_adj");
>         seq_printf(s, "----------------------------------------------------\n");
>
>         for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
> @@ -1191,8 +1203,9 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
>                         char task_comm[TASK_COMM_LEN];
>
>                         get_task_comm(task_comm, client->task);
> -                       seq_printf(s, "%16.s %16u %16u\n", task_comm,
> -                                  client->pid, size);
> +                       seq_printf(s, "%16.s %16u %16u %16d\n", task_comm,
> +                                  client->pid, size,
> +                                  client->task->signal->oom_score_adj);
>                 } else {
>                         seq_printf(s, "%16.s %16u %16u\n", client->name,
>                                    client->pid, size);
> @@ -1227,13 +1240,110 @@ static int ion_debug_heap_open(struct inode *inode, struct file *file)
>         return single_open(file, ion_debug_heap_show, inode->i_private);
>  }
>
> +static ssize_t
> +ion_debug_heap_write(struct file *file, const char __user *ubuf,
> +                                       size_t count, loff_t *ppos)
> +{
> +       struct ion_heap *heap =
> +               ((struct seq_file *)file->private_data)->private;
> +       char buf[16];
> +       long kill_adj, ret;
> +
> +       memset(buf, 0, sizeof(buf));
> +
> +       if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
> +               return -EFAULT;
> +
> +       ret = kstrtol(buf, 10, &kill_adj);
> +       if (ret)
> +               return ret;
> +
> +       ion_shrink(heap, kill_adj);
> +
> +       return count;
> +}
>  static const struct file_operations debug_heap_fops = {
>         .open = ion_debug_heap_open,
> +       .write = ion_debug_heap_write,
>         .read = seq_read,
>         .llseek = seq_lseek,
>         .release = single_release,
>  };
>
> +/*
> + * ion_shrink
> + * kill all tasks referd the buffer by selected task
> + */
> +static int ion_shrink(struct ion_heap *heap, int kill_adj)
> +{
> +       struct rb_node *n;
> +       struct ion_client *client = NULL;
> +       struct ion_device *dev = heap->dev;
> +       struct task_struct *selected = NULL;
> +       int selected_size = 0;
> +       int selected_oom_score_adj = 0;
> +
> +       for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
> +               size_t size;
> +               struct task_struct *p;
> +
> +               client = rb_entry(n, struct ion_client, node);
> +               if (!client->task)
> +                       continue;
> +
> +               p = client->task;
> +
> +               if ((p->signal->oom_score_adj <= kill_adj) ||
> +                       (p->signal->oom_score_adj < selected_oom_score_adj))
> +                       continue;
> +
> +               size = ion_debug_heap_total(client, heap->type);
> +               if (!size)
> +                       continue;
> +               if (size < selected_size)
> +                       continue;
> +
> +               selected = p;
> +               selected_size = size;
> +               selected_oom_score_adj = p->signal->oom_score_adj;
> +       }
> +
> +       if (selected) {
> +               /* kill all proeces refer buffer shared with this client */
> +               mutex_lock(&client->lock);
> +               for (n = rb_first(&client->handles); n; n = rb_next(n)) {
> +                       struct rb_node *r;
> +                       struct ion_client *c;
> +                       struct ion_handle *handle = rb_entry(n,
> +                                       struct ion_handle,
> +                                       node);
> +
> +                       for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
> +                               struct ion_handle *h;
> +
> +                               c = rb_entry(r, struct ion_client, node);
> +                               h = ion_handle_lookup(c, handle->buffer);
> +                               if (!IS_ERR_OR_NULL(h)) {
> +                                       send_sig(SIGKILL, c->task, 0);
> +                                       pr_info("SIGKILL pid: %u\n",
> +                                                       c->task->pid);
> +                               }
> +
> +                       }
> +               }
> +               mutex_unlock(&client->lock);
> +
> +               send_sig(SIGKILL, selected, 0);
> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
> +               pr_info("SIGKILL pid: %u size: %u adj: %u\n",
> +                               selected->pid, selected_size,
> +                               selected_oom_score_adj);
> +               msleep(20);

The msleep() might not be the right way to do it. Signalling from
client_destroy/buffer_free may be better.

> +               return 0;
> +       }
> +       return -EAGAIN;
> +}
> +
>  void ion_device_add_heap(struct ion_device *dev, struct ion_heap *heap)
>  {
>         struct rb_node **p = &dev->heaps.rb_node;
> --
> 1.7.1
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
Zhangfei Gao Sept. 4, 2012, 9:54 a.m. UTC | #2
>> +retry:
>>         mutex_lock(&dev->lock);
>>         for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
>> -               struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
>> +               heap = rb_entry(n, struct ion_heap, node);
>>                 /* if the client doesn't support this heap type */
>>                 if (!((1 << heap->type) & client->heap_mask))
>>                         continue;
>> @@ -404,6 +410,11 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>>         }
>>         mutex_unlock(&dev->lock);
>>
>> +       if (buffer == ERR_PTR(-ENOMEM)) {
>> +               if (!ion_shrink(heap, 0))
>> +                       goto retry;
>> +       }
>> +
>
> The heap which is attempted for shrink in this patch would be the last
> heap registered by platform.

The heap is looked up according to its id and type.

> It should be the last heap user mentioned to try as per heap_mask.
> Else, it could go into infinite loop also.
> The last cma/carveout heap attempted for ion_buffer_create() would be
> a better candidate for ion_shrink().
> Alos, exiting retry after a few iterations and returning allocation
> failure is a safe choice.
>

1. ion_shrink() runs only when buffer == ERR_PTR(-ENOMEM), which happens
when the heap allocation fails.
2. The retry happens only if ion_shrink() succeeds; if no process can be killed,
ion_alloc() returns failure without retrying.
For example, if all processes have adj = 0, ion_shrink() will not kill anything.

>> +/*
>> + * ion_shrink
>> + * kill all tasks referd the buffer by selected task
>> + */
>> +static int ion_shrink(struct ion_heap *heap, int kill_adj)
>> +{
>> +       struct rb_node *n;
>> +       struct ion_client *client = NULL;
>> +       struct ion_device *dev = heap->dev;
>> +       struct task_struct *selected = NULL;
>> +       int selected_size = 0;
>> +       int selected_oom_score_adj = 0;
>> +
>> +       for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
>> +               size_t size;
>> +               struct task_struct *p;
>> +
>> +               client = rb_entry(n, struct ion_client, node);
>> +               if (!client->task)
>> +                       continue;
>> +
>> +               p = client->task;
>> +
>> +               if ((p->signal->oom_score_adj <= kill_adj) ||
>> +                       (p->signal->oom_score_adj < selected_oom_score_adj))
>> +                       continue;
>> +
>> +               size = ion_debug_heap_total(client, heap->type);
>> +               if (!size)
>> +                       continue;
>> +               if (size < selected_size)
>> +                       continue;
>> +
>> +               selected = p;
>> +               selected_size = size;
>> +               selected_oom_score_adj = p->signal->oom_score_adj;
>> +       }
>> +
>> +       if (selected) {
>> +               /* kill all proeces refer buffer shared with this client */
>> +               mutex_lock(&client->lock);
>> +               for (n = rb_first(&client->handles); n; n = rb_next(n)) {
>> +                       struct rb_node *r;
>> +                       struct ion_client *c;
>> +                       struct ion_handle *handle = rb_entry(n,
>> +                                       struct ion_handle,
>> +                                       node);
>> +
>> +                       for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
>> +                               struct ion_handle *h;
>> +
>> +                               c = rb_entry(r, struct ion_client, node);
>> +                               h = ion_handle_lookup(c, handle->buffer);
>> +                               if (!IS_ERR_OR_NULL(h)) {
>> +                                       send_sig(SIGKILL, c->task, 0);
>> +                                       pr_info("SIGKILL pid: %u\n",
>> +                                                       c->task->pid);
>> +                               }
>> +
>> +                       }
>> +               }
>> +               mutex_unlock(&client->lock);
>> +
>> +               send_sig(SIGKILL, selected, 0);
>> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
>> +               pr_info("SIGKILL pid: %u size: %u adj: %u\n",
>> +                               selected->pid, selected_size,
>> +                               selected_oom_score_adj);
>> +               msleep(20);
>
> The msleep() might not be the right way to do it. Signalling from
> client_destroy/buffer_free may be better.
>
The msleep() is there to let the SIGKILL and the rescheduling take effect as soon as possible.
The release function of the killed process will then be called immediately.
Nishanth Peethambaran Sept. 4, 2012, 10:47 a.m. UTC | #3
On Tue, Sep 4, 2012 at 3:24 PM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
>>> +retry:
>>>         mutex_lock(&dev->lock);
>>>         for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
>>> -               struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
>>> +               heap = rb_entry(n, struct ion_heap, node);
>>>                 /* if the client doesn't support this heap type */
>>>                 if (!((1 << heap->type) & client->heap_mask))
>>>                         continue;
>>> @@ -404,6 +410,11 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>>>         }
>>>         mutex_unlock(&dev->lock);
>>>
>>> +       if (buffer == ERR_PTR(-ENOMEM)) {
>>> +               if (!ion_shrink(heap, 0))
>>> +                       goto retry;
>>> +       }
>>> +
>>
>> The heap which is attempted for shrink in this patch would be the last
>> heap registered by platform.
>
> The heap is searched out according to id and type.
>
>> It should be the last heap user mentioned to try as per heap_mask.
>> Else, it could go into infinite loop also.
>> The last cma/carveout heap attempted for ion_buffer_create() would be
>> a better candidate for ion_shrink().
>> Alos, exiting retry after a few iterations and returning allocation
>> failure is a safe choice.
>>
>
> 1. ion_shrink happens only buffer == ERR_PTR(-ENOMEM),  which happens
> when heap alloc fail.

Take the case where heap ids 0-3 are registered with the ion device and the user
asks for heap 2. Heap 2 will be tried by ion_buffer_create(); assume it
fails. The rbtree loop will also iterate over heap 3, but heap 3 will not
be attempted by ion_buffer_create() because of the heap_mask check.
ion_shrink() will then try shrinking heap 3 instead of heap 2.

> 2. retry happens if ion_shrink succeed, if no process could be kill,
> ion_alloc will return fail without retry.
> For example, if all process have adj=0, then ion_shrink will not be called.
>

Got it.

>>> +/*
>>> + * ion_shrink
>>> + * kill all tasks referd the buffer by selected task
>>> + */
>>> +static int ion_shrink(struct ion_heap *heap, int kill_adj)
>>> +{
>>> +       struct rb_node *n;
>>> +       struct ion_client *client = NULL;
>>> +       struct ion_device *dev = heap->dev;
>>> +       struct task_struct *selected = NULL;
>>> +       int selected_size = 0;
>>> +       int selected_oom_score_adj = 0;
>>> +
>>> +       for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
>>> +               size_t size;
>>> +               struct task_struct *p;
>>> +
>>> +               client = rb_entry(n, struct ion_client, node);
>>> +               if (!client->task)
>>> +                       continue;
>>> +
>>> +               p = client->task;
>>> +
>>> +               if ((p->signal->oom_score_adj <= kill_adj) ||
>>> +                       (p->signal->oom_score_adj < selected_oom_score_adj))
>>> +                       continue;
>>> +
>>> +               size = ion_debug_heap_total(client, heap->type);
>>> +               if (!size)
>>> +                       continue;
>>> +               if (size < selected_size)
>>> +                       continue;
>>> +
>>> +               selected = p;
>>> +               selected_size = size;
>>> +               selected_oom_score_adj = p->signal->oom_score_adj;
>>> +       }
>>> +
>>> +       if (selected) {
>>> +               /* kill all proeces refer buffer shared with this client */
>>> +               mutex_lock(&client->lock);
>>> +               for (n = rb_first(&client->handles); n; n = rb_next(n)) {
>>> +                       struct rb_node *r;
>>> +                       struct ion_client *c;
>>> +                       struct ion_handle *handle = rb_entry(n,
>>> +                                       struct ion_handle,
>>> +                                       node);
>>> +
>>> +                       for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
>>> +                               struct ion_handle *h;
>>> +
>>> +                               c = rb_entry(r, struct ion_client, node);
>>> +                               h = ion_handle_lookup(c, handle->buffer);
>>> +                               if (!IS_ERR_OR_NULL(h)) {
>>> +                                       send_sig(SIGKILL, c->task, 0);
>>> +                                       pr_info("SIGKILL pid: %u\n",
>>> +                                                       c->task->pid);
>>> +                               }
>>> +
>>> +                       }
>>> +               }
>>> +               mutex_unlock(&client->lock);
>>> +
>>> +               send_sig(SIGKILL, selected, 0);
>>> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
>>> +               pr_info("SIGKILL pid: %u size: %u adj: %u\n",
>>> +                               selected->pid, selected_size,
>>> +                               selected_oom_score_adj);
>>> +               msleep(20);
>>
>> The msleep() might not be the right way to do it. Signalling from
>> client_destroy/buffer_free may be better.
>>
> msleep is let SIGKILL and schedule take effective as soon as possible.
> The release function of the killed process will be called immediately.

Correct me if I am wrong. It can't be assumed that the release function of the
killed process will be called immediately, or within 20 ms, if the system
load is high. I am using msleep() as of now because of its simplicity and
ease of implementation.
A safer option is to do a wait_event() instead of msleep(), with the wake_up
happening when buffers get released. It is a bit harder to do: before
send_sig() in ion_shrink(), the process going to wait should add
itself to a waitqueue, and when the client gets released (ideally when
a buffer gets released), the wake_up should be sent to that waitqueue.
This is what I was thinking.
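As a rough, untested sketch of that idea (ion_shrink_wq and ion_freed_count
are hypothetical additions, not part of this patch, and need linux/wait.h
and linux/atomic.h):

	static DECLARE_WAIT_QUEUE_HEAD(ion_shrink_wq);
	static atomic_t ion_freed_count = ATOMIC_INIT(0);

	/* hook in ion_buffer_destroy() / ion_client_destroy(): */
	atomic_inc(&ion_freed_count);
	wake_up_all(&ion_shrink_wq);

	/* in ion_shrink(), replacing msleep(20): */
	int freed = atomic_read(&ion_freed_count);

	send_sig(SIGKILL, selected, 0);
	set_tsk_thread_flag(selected, TIF_MEMDIE);
	/* sleep until at least one buffer has really been freed, or time out */
	wait_event_timeout(ion_shrink_wq,
			   atomic_read(&ion_freed_count) != freed,
			   msecs_to_jiffies(100));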
Zhangfei Gao Sept. 5, 2012, 4:34 a.m. UTC | #4
On Tue, Sep 4, 2012 at 6:47 PM, Nishanth Peethambaran
<nishanth.peethu@gmail.com> wrote:
> On Tue, Sep 4, 2012 at 3:24 PM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
>>>> +retry:
>>>>         mutex_lock(&dev->lock);
>>>>         for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
>>>> -               struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
>>>> +               heap = rb_entry(n, struct ion_heap, node);
>>>>                 /* if the client doesn't support this heap type */
>>>>                 if (!((1 << heap->type) & client->heap_mask))
>>>>                         continue;
>>>> @@ -404,6 +410,11 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>>>>         }
>>>>         mutex_unlock(&dev->lock);
>>>>
>>>> +       if (buffer == ERR_PTR(-ENOMEM)) {
>>>> +               if (!ion_shrink(heap, 0))
>>>> +                       goto retry;
>>>> +       }
>>>> +
>>>
>>> The heap which is attempted for shrink in this patch would be the last
>>> heap registered by platform.
>>
>> The heap is searched out according to id and type.
>>
>>> It should be the last heap user mentioned to try as per heap_mask.
>>> Else, it could go into infinite loop also.
>>> The last cma/carveout heap attempted for ion_buffer_create() would be
>>> a better candidate for ion_shrink().
>>> Alos, exiting retry after a few iterations and returning allocation
>>> failure is a safe choice.
>>>
>>
>> 1. ion_shrink happens only buffer == ERR_PTR(-ENOMEM),  which happens
>> when heap alloc fail.
>
> Take the case of heap ids 0-3 were registered with ion device and user
> asks for heap2. Heap 2 will be tried for ion_buffer_create() and
> assume it failed. The rbtree loop will iterate heap3 also but will not
> be attempted for ion_buffer_create() because of the heap_mask check.
> ion_shrink() will try shrinking heap3 instead of heap2.
>

Thanks Nishanth

There really is an issue if several ids share the same type, which was not
considered before, since ion_debug_heap_total() accounts by type instead of id.
It can easily be solved by changing ion_debug_heap_total(type) to
ion_debug_heap_total(id).
Then ion_debug_heap_total() returns 0 for the other ids.
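Roughly like this (untested sketch; the field names follow ion_priv.h in this
tree, and ion_shrink() would then pass heap->id instead of heap->type):

	static size_t ion_debug_heap_total(struct ion_client *client,
					   unsigned int id)
	{
		size_t size = 0;
		struct rb_node *n;

		mutex_lock(&client->lock);
		for (n = rb_first(&client->handles); n; n = rb_next(n)) {
			struct ion_handle *handle = rb_entry(n,
							     struct ion_handle,
							     node);

			/* count only buffers that live in this exact heap id */
			if (handle->buffer->heap->id == id)
				size += handle->buffer->size;
		}
		mutex_unlock(&client->lock);

		return size;
	}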

I see you have already submitted a patch to replace heap->type with heap->id.
This patch could be based on that one, or your patch on this one :)

Besides, another mistake is that heap_found should be used.

>> 2. retry happens if ion_shrink succeed, if no process could be kill,
>> ion_alloc will return fail without retry.
>> For example, if all process have adj=0, then ion_shrink will not be called.
>>
>
> Got it.
>
>>>> +/*
>>>> + * ion_shrink
>>>> + * kill all tasks referd the buffer by selected task
>>>> + */
>>>> +static int ion_shrink(struct ion_heap *heap, int kill_adj)
>>>> +{
>>>> +       struct rb_node *n;
>>>> +       struct ion_client *client = NULL;
>>>> +       struct ion_device *dev = heap->dev;
>>>> +       struct task_struct *selected = NULL;
>>>> +       int selected_size = 0;
>>>> +       int selected_oom_score_adj = 0;
>>>> +
>>>> +       for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
>>>> +               size_t size;
>>>> +               struct task_struct *p;
>>>> +
>>>> +               client = rb_entry(n, struct ion_client, node);
>>>> +               if (!client->task)
>>>> +                       continue;
>>>> +
>>>> +               p = client->task;
>>>> +
>>>> +               if ((p->signal->oom_score_adj <= kill_adj) ||
>>>> +                       (p->signal->oom_score_adj < selected_oom_score_adj))
>>>> +                       continue;
>>>> +
>>>> +               size = ion_debug_heap_total(client, heap->type);
>>>> +               if (!size)
>>>> +                       continue;
>>>> +               if (size < selected_size)
>>>> +                       continue;
>>>> +
>>>> +               selected = p;
>>>> +               selected_size = size;
>>>> +               selected_oom_score_adj = p->signal->oom_score_adj;
>>>> +       }
>>>> +
>>>> +       if (selected) {
>>>> +               /* kill all proeces refer buffer shared with this client */
>>>> +               mutex_lock(&client->lock);
>>>> +               for (n = rb_first(&client->handles); n; n = rb_next(n)) {
>>>> +                       struct rb_node *r;
>>>> +                       struct ion_client *c;
>>>> +                       struct ion_handle *handle = rb_entry(n,
>>>> +                                       struct ion_handle,
>>>> +                                       node);
>>>> +
>>>> +                       for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
>>>> +                               struct ion_handle *h;
>>>> +
>>>> +                               c = rb_entry(r, struct ion_client, node);
>>>> +                               h = ion_handle_lookup(c, handle->buffer);
>>>> +                               if (!IS_ERR_OR_NULL(h)) {
>>>> +                                       send_sig(SIGKILL, c->task, 0);
>>>> +                                       pr_info("SIGKILL pid: %u\n",
>>>> +                                                       c->task->pid);
>>>> +                               }
>>>> +
>>>> +                       }
>>>> +               }
>>>> +               mutex_unlock(&client->lock);
>>>> +
>>>> +               send_sig(SIGKILL, selected, 0);
>>>> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
>>>> +               pr_info("SIGKILL pid: %u size: %u adj: %u\n",
>>>> +                               selected->pid, selected_size,
>>>> +                               selected_oom_score_adj);
>>>> +               msleep(20);
>>>
>>> The msleep() might not be the right way to do it. Signalling from
>>> client_destroy/buffer_free may be better.
>>>
>> msleep is let SIGKILL and schedule take effective as soon as possible.
>> The release function of the killed process will be called immediately.
>
> Correct me if wrong. It can't be assumed that release function of the
> killed process will be called immediately or within 20ms if system
> load is high. I am using msleep() as of now because of simplicity and
> ease of implementation.
> Safer option is to do a wait_event instead of msleep() with wake_up
> happening when buffers get released. It is a bit harder to do. Before
> send_sig() in ion_shrink(), the process going to wait should add
> itself to a waitque list. When the client gets released (ideally when
> buffer gets released), the wake_up should be sent to the waitqueues.
> This is what I was thinking.

I don't think this is a good suggestion.
msleep() is used to release the CPU and let other tasks run.
If the system load is high, there is no difference between msleep() and wait_event().
The killed process and the allocating process are different; why should they
wait for each other?
Also, there is no wait in lowmemorykiller.c.
And the chance of ion_shrink() being called is supposed to be limited.

Thanks
Nishanth Peethambaran Sept. 5, 2012, 5:40 a.m. UTC | #5
On Wed, Sep 5, 2012 at 10:04 AM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
> On Tue, Sep 4, 2012 at 6:47 PM, Nishanth Peethambaran
> <nishanth.peethu@gmail.com> wrote:
>> On Tue, Sep 4, 2012 at 3:24 PM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
>>>>> +retry:
>>>>>         mutex_lock(&dev->lock);
>>>>>         for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
>>>>> -               struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
>>>>> +               heap = rb_entry(n, struct ion_heap, node);
>>>>>                 /* if the client doesn't support this heap type */
>>>>>                 if (!((1 << heap->type) & client->heap_mask))
>>>>>                         continue;
>>>>> @@ -404,6 +410,11 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
>>>>>         }
>>>>>         mutex_unlock(&dev->lock);
>>>>>
>>>>> +       if (buffer == ERR_PTR(-ENOMEM)) {
>>>>> +               if (!ion_shrink(heap, 0))
>>>>> +                       goto retry;
>>>>> +       }
>>>>> +
>>>>
>>>> The heap which is attempted for shrink in this patch would be the last
>>>> heap registered by platform.
>>>
>>> The heap is searched out according to id and type.
>>>
>>>> It should be the last heap user mentioned to try as per heap_mask.
>>>> Else, it could go into infinite loop also.
>>>> The last cma/carveout heap attempted for ion_buffer_create() would be
>>>> a better candidate for ion_shrink().
>>>> Alos, exiting retry after a few iterations and returning allocation
>>>> failure is a safe choice.
>>>>
>>>
>>> 1. ion_shrink happens only buffer == ERR_PTR(-ENOMEM),  which happens
>>> when heap alloc fail.
>>
>> Take the case of heap ids 0-3 were registered with ion device and user
>> asks for heap2. Heap 2 will be tried for ion_buffer_create() and
>> assume it failed. The rbtree loop will iterate heap3 also but will not
>> be attempted for ion_buffer_create() because of the heap_mask check.
>> ion_shrink() will try shrinking heap3 instead of heap2.
>>
>
> Thanks Nishanth
>
> There really have issue if several id with same type, which not
> considered before.
> Since ion_debug_heap_total takes care of type instead of id.
> It can be easily solved by change ion_debug_heap_total(type) to  ->
> ion_debug_heap_total(id).
> Then ion_debug_heap_total return 0 for other id.
>
> I see you already submit one patch to replace heap->type with heap->id.
> This patch could based on that one, or your patch base on this one :)
>
> Besides, another mistake is heap_found should be used.
>
>>> 2. retry happens if ion_shrink succeed, if no process could be kill,
>>> ion_alloc will return fail without retry.
>>> For example, if all process have adj=0, then ion_shrink will not be called.
>>>
>>
>> Got it.
>>
>>>>> +/*
>>>>> + * ion_shrink
>>>>> + * kill all tasks referd the buffer by selected task
>>>>> + */
>>>>> +static int ion_shrink(struct ion_heap *heap, int kill_adj)
>>>>> +{
>>>>> +       struct rb_node *n;
>>>>> +       struct ion_client *client = NULL;
>>>>> +       struct ion_device *dev = heap->dev;
>>>>> +       struct task_struct *selected = NULL;
>>>>> +       int selected_size = 0;
>>>>> +       int selected_oom_score_adj = 0;
>>>>> +
>>>>> +       for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
>>>>> +               size_t size;
>>>>> +               struct task_struct *p;
>>>>> +
>>>>> +               client = rb_entry(n, struct ion_client, node);
>>>>> +               if (!client->task)
>>>>> +                       continue;
>>>>> +
>>>>> +               p = client->task;
>>>>> +
>>>>> +               if ((p->signal->oom_score_adj <= kill_adj) ||
>>>>> +                       (p->signal->oom_score_adj < selected_oom_score_adj))
>>>>> +                       continue;
>>>>> +
>>>>> +               size = ion_debug_heap_total(client, heap->type);
>>>>> +               if (!size)
>>>>> +                       continue;
>>>>> +               if (size < selected_size)
>>>>> +                       continue;
>>>>> +
>>>>> +               selected = p;
>>>>> +               selected_size = size;
>>>>> +               selected_oom_score_adj = p->signal->oom_score_adj;
>>>>> +       }
>>>>> +
>>>>> +       if (selected) {
>>>>> +               /* kill all proeces refer buffer shared with this client */
>>>>> +               mutex_lock(&client->lock);
>>>>> +               for (n = rb_first(&client->handles); n; n = rb_next(n)) {
>>>>> +                       struct rb_node *r;
>>>>> +                       struct ion_client *c;
>>>>> +                       struct ion_handle *handle = rb_entry(n,
>>>>> +                                       struct ion_handle,
>>>>> +                                       node);
>>>>> +
>>>>> +                       for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
>>>>> +                               struct ion_handle *h;
>>>>> +
>>>>> +                               c = rb_entry(r, struct ion_client, node);
>>>>> +                               h = ion_handle_lookup(c, handle->buffer);
>>>>> +                               if (!IS_ERR_OR_NULL(h)) {
>>>>> +                                       send_sig(SIGKILL, c->task, 0);
>>>>> +                                       pr_info("SIGKILL pid: %u\n",
>>>>> +                                                       c->task->pid);
>>>>> +                               }
>>>>> +
>>>>> +                       }
>>>>> +               }
>>>>> +               mutex_unlock(&client->lock);
>>>>> +
>>>>> +               send_sig(SIGKILL, selected, 0);
>>>>> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
>>>>> +               pr_info("SIGKILL pid: %u size: %u adj: %u\n",
>>>>> +                               selected->pid, selected_size,
>>>>> +                               selected_oom_score_adj);
>>>>> +               msleep(20);
>>>>
>>>> The msleep() might not be the right way to do it. Signalling from
>>>> client_destroy/buffer_free may be better.
>>>>
>>> msleep is let SIGKILL and schedule take effective as soon as possible.
>>> The release function of the killed process will be called immediately.
>>
>> Correct me if wrong. It can't be assumed that release function of the
>> killed process will be called immediately or within 20ms if system
>> load is high. I am using msleep() as of now because of simplicity and
>> ease of implementation.
>> Safer option is to do a wait_event instead of msleep() with wake_up
>> happening when buffers get released. It is a bit harder to do. Before
>> send_sig() in ion_shrink(), the process going to wait should add
>> itself to a waitque list. When the client gets released (ideally when
>> buffer gets released), the wake_up should be sent to the waitqueues.
>> This is what I was thinking.
>
> I don't think this is good suggestion.
> msleep is used to release cpu and let other task to run.
> If system load is high, no difference between msleep and wait_event.
> The killed process and alloc process are different, why they wait for
> each other.
> Also no wait in lowmemorykiller.c
> And the chance for calling ion_shrink suppose limited.
>
> Thanks

Using wait_event() is about optimizing the sleep time instead of relying on a magic 20 ms.
If I get time, I shall implement it and post a patch.

Patch

diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index 207d00f..5ead74c 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -32,6 +32,8 @@ 
 #include <linux/uaccess.h>
 #include <linux/debugfs.h>
 #include <linux/dma-buf.h>
+#include <linux/oom.h>
+#include <linux/delay.h>
 
 #include "ion_priv.h"
 
@@ -367,6 +369,8 @@  static void ion_handle_add(struct ion_client *client, struct ion_handle *handle)
 	rb_insert_color(&handle->node, &client->handles);
 }
 
+static int ion_shrink(struct ion_heap *heap, int kill_adj);
+
 struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 			     size_t align, unsigned int heap_mask,
 			     unsigned int flags)
@@ -375,6 +379,7 @@  struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 	struct ion_handle *handle;
 	struct ion_device *dev = client->dev;
 	struct ion_buffer *buffer = NULL;
+	struct ion_heap *heap = NULL;
 
 	pr_debug("%s: len %d align %d heap_mask %u flags %x\n", __func__, len,
 		 align, heap_mask, flags);
@@ -389,9 +394,10 @@  struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 
 	len = PAGE_ALIGN(len);
 
+retry:
 	mutex_lock(&dev->lock);
 	for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
-		struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
+		heap = rb_entry(n, struct ion_heap, node);
 		/* if the client doesn't support this heap type */
 		if (!((1 << heap->type) & client->heap_mask))
 			continue;
@@ -404,6 +410,11 @@  struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 	}
 	mutex_unlock(&dev->lock);
 
+	if (buffer == ERR_PTR(-ENOMEM)) {
+		if (!ion_shrink(heap, 0))
+			goto retry;
+	}
+
 	if (buffer == NULL)
 		return ERR_PTR(-ENODEV);
 
@@ -1178,7 +1189,8 @@  static int ion_debug_heap_show(struct seq_file *s, void *unused)
 	size_t total_size = 0;
 	size_t total_orphaned_size = 0;
 
-	seq_printf(s, "%16.s %16.s %16.s\n", "client", "pid", "size");
+	seq_printf(s, "%16.s %16.s %16.s %16.s\n",
+		"client", "pid", "size", "oom_score_adj");
 	seq_printf(s, "----------------------------------------------------\n");
 
 	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
@@ -1191,8 +1203,9 @@  static int ion_debug_heap_show(struct seq_file *s, void *unused)
 			char task_comm[TASK_COMM_LEN];
 
 			get_task_comm(task_comm, client->task);
-			seq_printf(s, "%16.s %16u %16u\n", task_comm,
-				   client->pid, size);
+			seq_printf(s, "%16.s %16u %16u %16d\n", task_comm,
+				   client->pid, size,
+				   client->task->signal->oom_score_adj);
 		} else {
 			seq_printf(s, "%16.s %16u %16u\n", client->name,
 				   client->pid, size);
@@ -1227,13 +1240,110 @@  static int ion_debug_heap_open(struct inode *inode, struct file *file)
 	return single_open(file, ion_debug_heap_show, inode->i_private);
 }
 
+static ssize_t
+ion_debug_heap_write(struct file *file, const char __user *ubuf,
+					size_t count, loff_t *ppos)
+{
+	struct ion_heap *heap =
+		((struct seq_file *)file->private_data)->private;
+	char buf[16];
+	long kill_adj, ret;
+
+	memset(buf, 0, sizeof(buf));
+
+	if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+		return -EFAULT;
+
+	ret = kstrtol(buf, 10, &kill_adj);
+	if (ret)
+		return ret;
+
+	ion_shrink(heap, kill_adj);
+
+	return count;
+}
 static const struct file_operations debug_heap_fops = {
 	.open = ion_debug_heap_open,
+	.write = ion_debug_heap_write,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 
+/*
+ * ion_shrink
+ * kill all tasks that refer to buffers used by the selected task
+ */
+static int ion_shrink(struct ion_heap *heap, int kill_adj)
+{
+	struct rb_node *n;
+	struct ion_client *client = NULL;
+	struct ion_device *dev = heap->dev;
+	struct task_struct *selected = NULL;
+	int selected_size = 0;
+	int selected_oom_score_adj = 0;
+
+	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
+		size_t size;
+		struct task_struct *p;
+
+		client = rb_entry(n, struct ion_client, node);
+		if (!client->task)
+			continue;
+
+		p = client->task;
+
+		if ((p->signal->oom_score_adj <= kill_adj) ||
+			(p->signal->oom_score_adj < selected_oom_score_adj))
+			continue;
+
+		size = ion_debug_heap_total(client, heap->type);
+		if (!size)
+			continue;
+		if (size < selected_size)
+			continue;
+
+		selected = p;
+		selected_size = size;
+		selected_oom_score_adj = p->signal->oom_score_adj;
+	}
+
+	if (selected) {
+		/* kill all processes that refer to buffers shared with this client */
+		mutex_lock(&client->lock);
+		for (n = rb_first(&client->handles); n; n = rb_next(n)) {
+			struct rb_node *r;
+			struct ion_client *c;
+			struct ion_handle *handle = rb_entry(n,
+					struct ion_handle,
+					node);
+
+			for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
+				struct ion_handle *h;
+
+				c = rb_entry(r, struct ion_client, node);
+				h = ion_handle_lookup(c, handle->buffer);
+				if (!IS_ERR_OR_NULL(h)) {
+					send_sig(SIGKILL, c->task, 0);
+					pr_info("SIGKILL pid: %u\n",
+							c->task->pid);
+				}
+
+			}
+		}
+		mutex_unlock(&client->lock);
+
+		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
+		pr_info("SIGKILL pid: %u size: %u adj: %u\n",
+				selected->pid, selected_size,
+				selected_oom_score_adj);
+		msleep(20);
+		return 0;
+	}
+	return -EAGAIN;
+}
+
 void ion_device_add_heap(struct ion_device *dev, struct ion_heap *heap)
 {
 	struct rb_node **p = &dev->heaps.rb_node;