
[3/3] gpu: ion: oom killer

Message ID 1346212340-25236-4-git-send-email-zhangfei.gao@marvell.com
State Rejected

Commit Message

Zhangfei Gao Aug. 29, 2012, 3:52 a.m. UTC
ion_shrink is called when ion_buffer_create() returns ION_CARVEOUT_ALLOCATE_FAIL.

How to test:
mount -t debugfs none /mnt
cat /mnt/ion/carveout_heap
echo adj > /mnt/ion/carveout_heap
send_sig() then delivers SIGKILL to the task with the highest adj that is
using an ION buffer. All other tasks referring to the same buffers are killed
as well, so that nothing is left using a freed buffer.

Example:

cat /mnt/ion/carveout_heap
client              pid             size    oom_score_adj
ion_test              191             4096                0
ion_test              192             4096                0
ion_test              193             4096                0

echo -1 > /mnt/ion/carveout_heap
[ 1333.689318] SIGKILL pid: 191
[ 1333.692192] SIGKILL pid: 192
[ 1333.695436] SIGKILL pid: 193
[ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0
[1]+  Killed                     ./ion_test 2

cat /mnt/ion/carveout_heap
client              pid             size    oom_score_adj

Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion.c |  112 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 109 insertions(+), 3 deletions(-)

Comments

Nishanth Peethambaran Aug. 30, 2012, 9:34 a.m. UTC | #1
ion_buffer_create is called with dev->lock held, so dev->lock would have to be
released and the caller scheduled out before the retry, so that the process
getting killed can actually free up its memory. Otherwise it will get blocked
while freeing the first buffer, when ion_buffer_destroy() tries to acquire
dev->lock.
The carveout_heap debugfs file might then show the client as no longer using
the heap even though the memory has not been freed.
If the buffer was mapped to userspace, ion_share_dma_buf() would have taken a
reference on the ion_buffer. So all the ion_buffer_put() calls will succeed as
long as the dma_buf fd is not released first, and thus all the ion_handle_put()
calls will succeed and remove the handles from the client before
ion_client_destroy() tries to acquire dev->lock.

A lowmem killer with multiple thresholds and oom_adj values, similar to the
Android lowmem killer, would also be an option, but it is a matter of policy
whether each carveout/CMA heap needs its own lowmem killer or not. For some
cases, falling back to other heaps would be OK; for others, falling back to
heaps like kmalloc and vmalloc, or even to other carveout/CMA areas, might not
be an option.

Is there any particular reason to use send_sig() and set_tsk_thread_flag()
instead of force_sig()? Just curious to know.

 - Nishanth Peethambaran


On Wed, Aug 29, 2012 at 10:50 AM, Haojian Zhuang
<haojian.zhuang@gmail.com> wrote:
> On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao <zhangfei.gao@marvell.com> wrote:
>> [commit message snipped]
>
> Although it's good for keeping ION always able to allocate memory, I doubt
> this killer will be merged into the Android kernel mainline.
>
> Nobody cares about an OOM killer for ION, since the CMA heap may be used in
> the future. Although fragmentation also exists in the CMA heap, the usual
> solution is to declare a larger CMA area.
>
Zhangfei Gao Aug. 31, 2012, 8:57 a.m. UTC | #2
On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran
<nishanth.peethu@gmail.com> wrote:

Thanks, Nishanth, for the careful review.

> ion_buffer_create is called with dev->lock held, so dev->lock would have to
> be released and the caller scheduled out before the retry, so that the
> process getting killed can actually free up its memory. Otherwise it will
> get blocked while freeing the first buffer, when ion_buffer_destroy() tries
> to acquire dev->lock.

Your concern is correct.
My mistake was assuming there was no problem just because the debugfs test
showed no problem.
I will resend an updated patch that will:
1. move ion_shrink out of the mutex.
2. check the error flag ERR_PTR(-ENOMEM).
3. add msleep() to allow scheduling out.
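
Roughly, the reworked path would look like this - a simplified sketch of the
idea, not the actual v2 patch (the real ion_alloc() loops over the heaps
under dev->lock):

/* sketch: drop dev->lock across the shrink so the killed task can
 * re-acquire it in ion_buffer_destroy() and actually free memory */
mutex_lock(&dev->lock);
buffer = ion_buffer_create(heap, dev, len, align, flags);
mutex_unlock(&dev->lock);

while (buffer == ERR_PTR(ION_CARVEOUT_ALLOCATE_FAIL)) {
        if (ion_shrink(heap, 0))
                break;          /* nothing left to kill */
        msleep(20);             /* let the victim exit and free its buffers */
        mutex_lock(&dev->lock);
        buffer = ion_buffer_create(heap, dev, len, align, flags);
        mutex_unlock(&dev->lock);
}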

> The carveout_heap debugfs file might then show the client as no longer using
> the heap even though the memory has not been freed.
> If the buffer was mapped to userspace, ion_share_dma_buf() would have taken
> a reference on the ion_buffer. So all the ion_buffer_put() calls will
> succeed as long as the dma_buf fd is not released first, and thus all the
> ion_handle_put() calls will succeed and remove the handles from the client
> before ion_client_destroy() tries to acquire dev->lock.

Sorry, I do not fully understand. ion_shrink() sends the signal according to
used size; does this address your concern?

>
> A lowmem killer with multiple thresholds and oom_adj values, similar to the
> Android lowmem killer, would also be an option, but it is a matter of policy
> whether each carveout/CMA heap needs its own lowmem killer or not. For some
> cases, falling back to other heaps would be OK; for others, falling back to
> heaps like kmalloc and vmalloc, or even to other carveout/CMA areas, might
> not be an option.

This simple killer is modeled on lowmemorykiller.
It is useful for us, since we have not enabled CMA yet.

>
> Is there any particular reason to use send_sig() and set_tsk_thread_flag()
> instead of force_sig()? Just curious to know.

This code follows lowmemorykiller.c.
set_tsk_thread_flag is required no matter whether send_sig() or force_sig()
is used. However, during stress testing force_sig() could cause a NULL
pointer error in the system, while send_sig() is much more stable.

Besides, the stress test on the non-Android system calls ion_shrink(heap, -1)
manually, while normally ion_shrink(heap, 0) is used, which does not kill
tasks with adj <= 0.

Nishanth Peethambaran Aug. 31, 2012, 12:43 p.m. UTC | #3
On Fri, Aug 31, 2012 at 2:27 PM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
> On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran
> <nishanth.peethu@gmail.com> wrote:
>
> Thanks, Nishanth, for the careful review.
>
>> ion_buffer_create is called with dev->lock held, so dev->lock would have to
>> be released and the caller scheduled out before the retry, so that the
>> process getting killed can actually free up its memory. Otherwise it will
>> get blocked while freeing the first buffer, when ion_buffer_destroy() tries
>> to acquire dev->lock.
>
> Your concern is correct.
> My mistake was assuming there was no problem just because the debugfs test
> showed no problem.
> I will resend an updated patch that will:
> 1. move ion_shrink out of the mutex.
> 2. check the error flag ERR_PTR(-ENOMEM).
> 3. add msleep() to allow scheduling out.
>
>> The carveout_heap debugfs file might then show the client as no longer
>> using the heap even though the memory has not been freed.
>> If the buffer was mapped to userspace, ion_share_dma_buf() would have taken
>> a reference on the ion_buffer. So all the ion_buffer_put() calls will
>> succeed as long as the dma_buf fd is not released first, and thus all the
>> ion_handle_put() calls will succeed and remove the handles from the client
>> before ion_client_destroy() tries to acquire dev->lock.
>
> Sorry, I do not fully understand. ion_shrink() sends the signal according to
> used size; does this address your concern?
>

It was an explanation of how the first issue can be hidden by debugfs.
The client would be removed even though the buffers may not actually get
freed, and so it won't be shown in debugfs as a client of the heap.

>>
>> A lowmem killer with multiple thresholds and oom_adj values, similar to the
>> Android lowmem killer, would also be an option, but it is a matter of
>> policy whether each carveout/CMA heap needs its own lowmem killer or not.
>> For some cases, falling back to other heaps would be OK; for others,
>> falling back to heaps like kmalloc and vmalloc, or even to other
>> carveout/CMA areas, might not be an option.
>
> This simple killer is modeled on lowmemorykiller.
> It is useful for us, since we have not enabled CMA yet.
>

The lowmemkiller should kick in once the free space goes below a threshold or
the heap gets fragmented, but in the attached patch it kicks in only when
allocation fails.
Refer to the comparison of free space against lowmem_minfree[n] done in
lowmem_shrink().
Even for CMA, a lowmemkiller would be needed, unless a huge area is reserved
for CMA, which would restrict the area available for non-movable allocations.
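
From memory, the check in lowmem_shrink() is roughly the following (variable
names differ between kernel versions, and the file-page accounting is
simplified here):

/* paraphrased from drivers/staging/android/lowmemorykiller.c */
other_free = global_page_state(NR_FREE_PAGES);
other_file = global_page_state(NR_FILE_PAGES);
for (i = 0; i < array_size; i++) {
        if (other_free < lowmem_minfree[i] &&
            other_file < lowmem_minfree[i]) {
                min_adj = lowmem_adj[i];        /* kill down to this adj */
                break;
        }
}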

>>
>> Is there any particular reason to use send_sig() and set_tsk_thread_flag()
>> instead of force_sig()? Just curious to know.
>
> This code follows lowmemorykiller.c.
> set_tsk_thread_flag is required no matter whether send_sig() or force_sig()
> is used. However, during stress testing force_sig() could cause a NULL
> pointer error in the system, while send_sig() is much more stable.
>
> Besides, the stress test on the non-Android system calls
> ion_shrink(heap, -1) manually, while normally ion_shrink(heap, 0) is used,
> which does not kill tasks with adj <= 0.
>

OK. I am in the process of migrating to 3.4 and so was referring to the 3.0
lowmemkiller, which uses force_sig().

Zhangfei Gao Sept. 3, 2012, 5:32 a.m. UTC | #4
On Fri, Aug 31, 2012 at 8:43 PM, Nishanth Peethambaran
<nishanth.peethu@gmail.com> wrote:
> The lowmemkiller should kick in once the free space goes below a threshold
> or the heap gets fragmented, but in the attached patch it kicks in only when
> allocation fails.
> Refer to the comparison of free space against lowmem_minfree[n] done in
> lowmem_shrink().
> Even for CMA, a lowmemkiller would be needed, unless a huge area is reserved
> for CMA, which would restrict the area available for non-movable
> allocations.
>
Could you help correct my understanding?

Though lowmemorykiller provides lowmem_minfree and lowmem_adj, lowmem_shrink
is registered as a shrinker function and is only called when allocation fails.

3.4:
mm/page_alloc.c
__alloc_pages_nodemask ->
if (unlikely(!page)) __alloc_pages_slowpath ->
wake_all_kswapd ->
vmscan.c
kswapd -> do_shrinker_shrink ->
lowmemorykiller.c
lowmem_shrink

A simple experiment in my environment, with the defaults:
static int lowmem_minfree[6] = {
        3 * 512,        /* 6MB */
        2 * 1024,       /* 8MB */
        4 * 1024,       /* 16MB */
        16 * 1024,      /* 64MB */
};
# cd /tmp
# dd if=/dev/zero of=del bs=1M count=350
# free
              total         used         free       shared      buffers
  Mem:       384576       377588         6988            0            0
 Swap:            0            0            0
Total:       384576       377588         6988

lowmem_shrink is not triggered even at free=6M; it only happens around
free=3M, presumably when allocation fails.

The conclusion is that lowmem_shrink is only triggered when allocation fails;
lowmem_minfree and lowmem_adj take effect only at that point, and there is no
daemon monitoring whether system memory has touched the threshold.

I am not sure whether this understanding is correct, but in our ION OOM
killer we likewise trigger the shrink function when allocation fails, which
is simple.
Or do we have to start a kernel task to keep monitoring the ION size? That
looks complicated.

>
> OK. I am in the process of migrating to 3.4 and so was referring to the 3.0
> lowmemkiller, which uses force_sig().

In drivers/staging/android/lowmemorykiller.c, 3.0 uses force_sig(), while 3.4
uses:
send_sig(SIGKILL, selected, 0);
set_tsk_thread_flag(selected, TIF_MEMDIE);

Nishanth Peethambaran Sept. 3, 2012, 6:38 a.m. UTC | #5
On Mon, Sep 3, 2012 at 11:02 AM, zhangfei gao <zhangfei.gao@gmail.com> wrote:
> On Fri, Aug 31, 2012 at 8:43 PM, Nishanth Peethambaran
> <nishanth.peethu@gmail.com> wrote:
>> It was an explanation of how the first issue can be hidden by debugfs.
>> The client would be removed even though the buffers may not actually get
>> freed, and so it won't be shown in debugfs as a client of the heap.
>>

Rebecca's new debugfs update prints the orphaned buffers and the client that
allocated/imported each of them last.

> lowmem_shrink is not triggered even at free=6M; it only happens around
> free=3M, presumably when allocation fails.
>
> The conclusion is that lowmem_shrink is only triggered when allocation
> fails; lowmem_minfree and lowmem_adj take effect only at that point, and
> there is no daemon monitoring whether system memory has touched the
> threshold.
>
> I am not sure whether this understanding is correct, but in our ION OOM
> killer we likewise trigger the shrink function when allocation fails, which
> is simple.
> Or do we have to start a kernel task to keep monitoring the ION size? That
> looks complicated.
>

I may be wrong here; I have also only recently started looking into Linux
and mm. My understanding is that kswapd is a daemon and is kicked in
periodically as well, not only when alloc_page fails. So, to be more
proactive, the min-free threshold can be kept higher, which will ensure
background apps get killed once free memory falls below the threshold, and
so the allocation won't fail.
What I experimented with was implementing the OOM killer for an individual
heap. Memory in use is tracked by the heap's alloc/free functions in a
heap-internal variable.
On every allocation this is compared against a threshold to decide whether
background apps should be killed. This works fine and avoids allocation
failures, so the timeout-and-retry logic is hit only in rare cases.
The issue is that I haven't found a clean way to define the policy in a
generic fashion. For example, if multiple heaps are used, a per-heap
lowmemkiller should kick in when you are not relying on fallback heaps.
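
A sketch of that idea (the tracking fields, the threshold, and the call into
ion_shrink() are assumptions for illustration, not the actual experiment's
code):

/* the carveout heap tracks its own usage; 'minfree' is the point below
 * which background apps start getting killed proactively */
struct ion_carveout_heap {
        struct ion_heap heap;
        struct gen_pool *pool;
        ion_phys_addr_t base;
        size_t total_size;      /* assumed new field */
        size_t used;            /* assumed new field */
        size_t minfree;         /* assumed new field */
};

/* in ion_carveout_allocate(), before gen_pool_alloc(): */
struct ion_carveout_heap *ch =
        container_of(heap, struct ion_carveout_heap, heap);

if (ch->total_size - ch->used - size < ch->minfree)
        ion_shrink(heap, 6);    /* kill only apps with adj > 6 */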

Haojian Zhuang Sept. 3, 2012, 7:16 a.m. UTC | #6
On Mon, Sep 3, 2012 at 2:38 PM, Nishanth Peethambaran
<nishanth.peethu@gmail.com> wrote:
> I may be wrong here; I have also only recently started looking into Linux
> and mm. My understanding is that kswapd is a daemon and is kicked in
> periodically as well, not only when alloc_page fails. So, to be more
> proactive, the min-free threshold can be kept higher, which will ensure
> background apps get killed once free memory falls below the threshold, and
> so the allocation won't fail.

I have a different opinion on this. If we set a threshold, it means the heap
size is monitored. For example, say we create a 100MB heap and trigger the
OOM killer when the available heap size drops below 16MB. But most failure
cases involve allocating a large block.

If we try to allocate 20MB while the available heap size is 50MB, we have
enough total memory yet still can't get the allocation because of memory
fragmentation.

And how do we tell that it is fragmentation? If there are a lot of 2MB blocks
and we allocate 512KB, there is no fragmentation problem. If there are a lot
of 2MB blocks and we try to allocate 8MB, we fail because of fragmentation.
We can't use a simple formula to judge which scenario counts as
fragmentation. And the killer kills background tasks; we should avoid
triggering it frequently.
Nishanth Peethambaran Sept. 4, 2012, 5:01 a.m. UTC | #7
On Mon, Sep 3, 2012 at 12:46 PM, Haojian Zhuang
<haojian.zhuang@gmail.com> wrote:
> I have a different opinion on this. If we set a threshold, it means the
> heap size is monitored. For example, say we create a 100MB heap and trigger
> the OOM killer when the available heap size drops below 16MB. But most
> failure cases involve allocating a large block.
>
> If we try to allocate 20MB while the available heap size is 50MB, we have
> enough total memory yet still can't get the allocation because of memory
> fragmentation.
>
> And how do we tell that it is fragmentation? If there are a lot of 2MB
> blocks and we allocate 512KB, there is no fragmentation problem. If there
> are a lot of 2MB blocks and we try to allocate 8MB, we fail because of
> fragmentation. We can't use a simple formula to judge which scenario counts
> as fragmentation. And the killer kills background tasks; we should avoid
> triggering it frequently.

You are right. I did not mean we should avoid having an OOM killer that is
reactive and more aggressive - it could kill tasks down to oom_adj 1/0,
depending on whether we want to force-kill the foreground app or want the
user-space app to take action on allocation failure.
I am just exploring the option of a lowmem killer that is more proactive and
need not be aggressive (maybe killing only apps with oom_adj > 6). But I see
a few issues with that.
1. Some heaps do not need an explicit killer; the kmalloc/vmalloc types
could work with the default Android lowmem killer.
    - Having a lowmemkiller/shrink specific to each heap will solve it.
2. Keeping track of heap usage and the total size available.
    - Adding variables to store the reserved size and to track the current
usage of the heap, updated on alloc/free, will solve it.
3. Fragmentation, as you pointed out.
    - Solution 1 is to not let the heap fragment. We use a user-space
allocator, which avoids a number of system calls and always requests
same-sized slabs from ION.
    - Solution 2 is to check the biggest available contiguous chunk and have
a separate threshold for it (the Android lowmem killer also checks multiple
items), but the major issue with this is that the heap's entire buffer list
has to be walked each time, which is not good.
    - Solution 3 is for the lowmem killer not to take care of the
fragmentation check and to let the OOM killer do it.
4. Defining a general policy is where I am struggling.
    - One possibility is to treat ION heaps similarly to Linux mm zones
(dma, normal, highmem, cma, ...): each zone has a threshold, and once a zone
goes below its threshold, shrinking starts.
      e.g. vmalloc is similar to highmem and has id=0, CMA/carveout is
similar to normal and has id=1, and a device-specific CMA/carveout is
similar to dma and has id=2.


 - Nishanth Peethambaran
Haojian Zhuang Sept. 4, 2012, 6:33 a.m. UTC | #8
On Tue, Sep 4, 2012 at 1:01 PM, Nishanth Peethambaran
<nishanth.peethu@gmail.com> wrote:
> You are right. I did not mean we should avoid having an OOM killer that is
> reactive and more aggressive - it could kill tasks down to oom_adj 1/0,
> depending on whether we want to force-kill the foreground app or want the
> user-space app to take action on allocation failure.
> I am just exploring the option of a lowmem killer that is more proactive
> and need not be aggressive (maybe killing only apps with oom_adj > 6). But
> I see a few issues with that.
> 1. Some heaps do not need an explicit killer; the kmalloc/vmalloc types
> could work with the default Android lowmem killer.
>     - Having a lowmemkiller/shrink specific to each heap will solve it.

I think we can add a flags field to struct ion_platform_data, and also
define an ION_HEAP_OOM_KILLER flag. If the user sets ION_HEAP_OOM_KILLER in
the platform data, the OOM killer is enabled for this heap; otherwise the
OOM killer is skipped.
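
For illustration, something like this (ION_HEAP_OOM_KILLER and the flags
field are the proposed additions; the other fields follow the existing
ion_platform_heap, and the check assumes the flag is copied into struct
ion_heap when the heap is created):

#define ION_HEAP_OOM_KILLER     (1 << 0)        /* proposed flag */

struct ion_platform_heap {
        enum ion_heap_type type;
        unsigned int id;
        const char *name;
        ion_phys_addr_t base;
        size_t size;
        unsigned long flags;    /* proposed new field */
};

/* in ion_alloc(), only shrink heaps that opted in: */
if (buffer == ERR_PTR(ION_CARVEOUT_ALLOCATE_FAIL) &&
    (heap->flags & ION_HEAP_OOM_KILLER)) {
        if (!ion_shrink(heap, 0))
                goto retry;
}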

> 2. Keeping track of heap usage and the total size available.
>     - Adding variables to store the reserved size and to track the current
> usage of the heap, updated on alloc/free, will solve it.

It's OK to show this kind of information in debugfs; it could help the user
know the current memory usage. We have a similar patch that could be
submitted later.

> 3. Fragmentation, as you pointed out.
>     - Solution 1 is to not let the heap fragment. We use a user-space
> allocator, which avoids a number of system calls and always requests
> same-sized slabs from ION.

It's hard to say which size is the standard size. Maybe the camera driver
needs a 10MB block, and maybe the LCD driver needs a 16MB block. If we define
the size too large, it wastes memory; if we define it too small, it can't
benefit us.

>     - Solution 2 is to check the biggest available contiguous chunk and
> have a separate threshold for it (the Android lowmem killer also checks
> multiple items), but the major issue with this is that the heap's entire
> buffer list has to be walked each time, which is not good.

It could be implemented, but it's a little complex. In this case the killer
would be triggered after malloc. To implement it, we would need to define a
threshold to monitor chunk usage. If we only use #3, it is much simpler.

>     - Solution 3 is for the lowmem killer not to take care of the
> fragmentation check and to let the OOM killer do it.

It's simple and effective. We like this one. What's your opinion?

> 4. Defining a general policy is where I am struggling.
>     - One possibility is to treat ION heaps similarly to Linux mm zones
> (dma, normal, highmem, cma, ...): each zone has a threshold, and once a
> zone goes below its threshold, shrinking starts.
>       e.g. vmalloc is similar to highmem and has id=0, CMA/carveout is
> similar to normal and has id=1, and a device-specific CMA/carveout is
> similar to dma and has id=2.

What's the definition of id? In ion_platform_data, the comment says "unique
identifier for heap. When allocating (lower numbers will be allocated from
first)".

How about defining two CMA or carveout heaps at the same time?
Nishanth Peethambaran Sept. 4, 2012, 7:48 a.m. UTC | #9
On Tue, Sep 4, 2012 at 12:03 PM, Haojian Zhuang
<haojian.zhuang@gmail.com> wrote:
> On Tue, Sep 4, 2012 at 1:01 PM, Nishanth Peethambaran
> <nishanth.peethu@gmail.com> wrote:
>> 1. Some heaps do not need an explicit killer; the kmalloc/vmalloc types
>> could work with the default Android lowmem killer.
>>     - Having a lowmemkiller/shrink specific to each heap will solve it.
>
> I think we can add a flags field to struct ion_platform_data, and also
> define an ION_HEAP_OOM_KILLER flag. If the user sets ION_HEAP_OOM_KILLER in
> the platform data, the OOM killer is enabled for this heap; otherwise the
> OOM killer is skipped.
>
>> 2. Keeping track of heap usage and the total size available.
>>     - Adding variables to store the reserved size and to track the current
>> usage of the heap, updated on alloc/free, will solve it.
>
> It's OK to show this kind of information in debugfs; it could help the user
> know the current memory usage. We have a similar patch that could be
> submitted later.
>

I meant it not for debugfs but for the lowmemkiller. What I had in mind was
for the heap to track the total size and the used size in its internal heap
struct (e.g. struct ion_carveout_heap).

>> 3. Fragmentation, as you pointed out.
>>     - Solution 1 is to not let the heap fragment. We use a user-space
>> allocator, which avoids a number of system calls and always requests
>> same-sized slabs from ION.
>
> It's hard to say which size is the standard size. Maybe the camera driver
> needs a 10MB block, and maybe the LCD driver needs a 16MB block. If we
> define the size too large, it wastes memory; if we define it too small, it
> can't benefit us.
>
>>     - Solution 2 is to check the biggest available contiguous chunk and
>> have a separate threshold for it (the Android lowmem killer also checks
>> multiple items), but the major issue with this is that the heap's entire
>> buffer list has to be walked each time, which is not good.
>
> It could be implemented, but it's a little complex. In this case the killer
> would be triggered after malloc. To implement it, we would need to define a
> threshold to monitor chunk usage. If we only use #3, it is much simpler.
>
>>     - Solution 3 is for the lowmem killer not to take care of the
>> fragmentation check and to let the OOM killer do it.
>
> It's simple and effective. We like this one. What's your opinion?
>

Combination of 1 & 3 is what I am trying out.

>> 4. Defining a general policy is where I am struggling.
>>     - One possibility is to treat ION heaps similarly to Linux mm zones
>> (dma, normal, highmem, cma, ...): each zone has a threshold, and once a
>> zone goes below its threshold, shrinking starts.
>>       e.g. vmalloc is similar to highmem and has id=0, CMA/carveout is
>> similar to normal and has id=1, and a device-specific CMA/carveout is
>> similar to dma and has id=2.
>>
>
> What's the definition of id? In ion_platform_data, the comment says "unique
> identifier for heap. When allocating (lower numbers will be allocated from
> first)".
>
> How about defining two CMA or carveout heaps at the same time?

If multiple CMA/carveout heaps are defined, they need to have different ids.
That is where I am struggling to define a policy: how the heaps are used
depends on the user code - whether it falls back or not - so a general policy
cannot be defined. I was just thinking out loud to see whether there are more
ideas or a cleaner way of doing the lowmemkiller.
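
For concreteness, two carveout heaps would be registered with distinct ids
along these lines (the names and addresses here are made up):

static struct ion_platform_heap example_heaps[] = {
        {
                .type = ION_HEAP_TYPE_CARVEOUT,
                .id   = 1,
                .name = "carveout_small",
                .base = 0x80000000,     /* made-up address */
                .size = SZ_16M,
        },
        {
                .type = ION_HEAP_TYPE_CARVEOUT,
                .id   = 2,
                .name = "carveout_large",
                .base = 0x81000000,     /* made-up address */
                .size = SZ_64M,
        },
};

/* a client selects one or both heaps through the heap mask;
 * the lower id is tried first: */
handle = ion_alloc(client, len, align, (1 << 1) | (1 << 2), 0);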

Patch

diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index 34c12df..3edaab0 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -32,6 +32,7 @@ 
 #include <linux/uaccess.h>
 #include <linux/debugfs.h>
 #include <linux/dma-buf.h>
+#include <linux/oom.h>
 
 #include "ion_priv.h"
 
@@ -336,6 +337,8 @@  static void ion_handle_add(struct ion_client *client, struct ion_handle *handle)
 	rb_insert_color(&handle->node, &client->handles);
 }
 
+static int ion_shrink(struct ion_heap *heap, int kill_adj);
+
 struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 			     size_t align, unsigned int heap_mask,
 			     unsigned int flags)
@@ -367,7 +370,12 @@  struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 		/* if the caller didn't specify this heap type */
 		if (!((1 << heap->id) & heap_mask))
 			continue;
+retry:
 		buffer = ion_buffer_create(heap, dev, len, align, flags);
+		if (buffer == ERR_PTR(ION_CARVEOUT_ALLOCATE_FAIL)) {
+			if (!ion_shrink(heap, 0))
+				goto retry;
+		}
 		if (!IS_ERR_OR_NULL(buffer))
 			break;
 	}
@@ -1144,7 +1152,8 @@  static int ion_debug_heap_show(struct seq_file *s, void *unused)
 	struct ion_device *dev = heap->dev;
 	struct rb_node *n;
 
-	seq_printf(s, "%16.s %16.s %16.s\n", "client", "pid", "size");
+	seq_printf(s, "%16.s %16.s %16.s %16.s\n",
+		"client", "pid", "size", "oom_score_adj");
 
 	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
 		struct ion_client *client = rb_entry(n, struct ion_client,
@@ -1156,8 +1165,9 @@  static int ion_debug_heap_show(struct seq_file *s, void *unused)
 			char task_comm[TASK_COMM_LEN];
 
 			get_task_comm(task_comm, client->task);
-			seq_printf(s, "%16.s %16u %16u\n", task_comm,
-				   client->pid, size);
+			seq_printf(s, "%16.s %16u %16u %16u\n", task_comm,
+				   client->pid, size,
+				   client->task->signal->oom_score_adj);
 		} else {
 			seq_printf(s, "%16.s %16u %16u\n", client->name,
 				   client->pid, size);
@@ -1171,13 +1181,109 @@  static int ion_debug_heap_open(struct inode *inode, struct file *file)
 	return single_open(file, ion_debug_heap_show, inode->i_private);
 }
 
+static ssize_t
+ion_debug_heap_write(struct file *file, const char __user *ubuf,
+					size_t count, loff_t *ppos)
+{
+	struct ion_heap *heap =
+		((struct seq_file *)file->private_data)->private;
+	char buf[16];
+	long kill_adj, ret;
+
+	memset(buf, 0, sizeof(buf));
+
+	if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+		return -EFAULT;
+
+	ret = kstrtol(buf, 10, &kill_adj);
+	if (ret)
+		return ret;
+
+	ion_shrink(heap, kill_adj);
+
+	return count;
+}
 static const struct file_operations debug_heap_fops = {
 	.open = ion_debug_heap_open,
+	.write = ion_debug_heap_write,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 
+/*
+ * ion_shrink
+ * kill the selected task and all tasks that refer to its buffers
+ */
+static int ion_shrink(struct ion_heap *heap, int kill_adj)
+{
+	struct rb_node *n;
+	struct ion_client *client = NULL;
+	struct ion_device *dev = heap->dev;
+	struct task_struct *selected = NULL;
+	int selected_size = 0;
+	int selected_oom_score_adj = 0;
+
+	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
+		size_t size;
+		struct task_struct *p;
+
+		client = rb_entry(n, struct ion_client, node);
+		if (!client->task)
+			continue;
+
+		p = client->task;
+
+		if ((p->signal->oom_score_adj <= kill_adj) ||
+			(p->signal->oom_score_adj < selected_oom_score_adj))
+			continue;
+
+		size = ion_debug_heap_total(client, heap->type);
+		if (!size)
+			continue;
+		if (size < selected_size)
+			continue;
+
+		selected = p;
+		selected_size = size;
+		selected_oom_score_adj = p->signal->oom_score_adj;
+	}
+
+	if (selected) {
+		/* kill all processes that refer to buffers shared with this client */
+		mutex_lock(&client->lock);
+		for (n = rb_first(&client->handles); n; n = rb_next(n)) {
+			struct rb_node *r;
+			struct ion_client *c;
+			struct ion_handle *handle = rb_entry(n,
+					struct ion_handle,
+					node);
+
+			for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
+				struct ion_handle *h;
+
+				c = rb_entry(r, struct ion_client, node);
+				h = ion_handle_lookup(c, handle->buffer);
+				if (!IS_ERR_OR_NULL(h)) {
+					send_sig(SIGKILL, c->task, 0);
+					pr_info("SIGKILL pid: %u\n",
+							c->task->pid);
+				}
+
+			}
+		}
+		mutex_unlock(&client->lock);
+
+		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
+		pr_info("SIGKILL pid: %u size: %u adj: %u\n",
+				selected->pid, selected_size,
+				selected_oom_score_adj);
+		return 0;
+	}
+	return -EAGAIN;
+}
+
 void ion_device_add_heap(struct ion_device *dev, struct ion_heap *heap)
 {
 	struct rb_node **p = &dev->heaps.rb_node;