Message ID | 20170614211556.2062728-4-arnd@arndb.de |
---|---|
State | New |
Headers | show |
Series | bring back stack frame warning with KASAN | expand |
Hello, Arnd Bergmann, on mer. 14 juin 2017 23:15:38 +0200, wrote: > As reported by kernelci, some functions in the VT code use significant > amounts of kernel stack when local variables get inlined into the caller > multiple times: > > drivers/tty/vt/keyboard.c: In function 'kbd_keycode': > drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > > Annotating those functions as noinline_if_stackbloat prevents the inlining > and reduces the overall stack usage in this driver. > --- a/drivers/tty/vt/keyboard.c > +++ b/drivers/tty/vt/keyboard.c > @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) > /* > * Helper Functions. > */ > -static void put_queue(struct vc_data *vc, int ch) > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) > { > tty_insert_flip_char(&vc->port, ch, 0); > tty_schedule_flip(&vc->port); > } I'm surprised that this be able generate so much stack use: the tty_insert_flip_char inline only uses a pointer and an int. And I'm surprised that multiple inlines can accumulate stack usage. I however agree that it's a bad idea to inline it in functions where it's called so many times (and we're talking about the keyboard anyway). > -static void puts_queue(struct vc_data *vc, char *cp) > +static noinline_if_stackbloat void puts_queue(struct vc_data *vc, char *cp) I don't see why, it's only called once in the callers. k_fn, however, is called several times in k_pad, so that could be why, but then it's rather be the inlining of k_fn which is a bad idea. > -static void fn_send_intr(struct vc_data *vc) > +static noinline_if_stackbloat void fn_send_intr(struct vc_data *vc) This one is only referenced, not called, I don't see how that could pose problem. Samuel
On Wed, Jun 14, 2017 at 11:28 PM, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote: > Hello, > > Arnd Bergmann, on mer. 14 juin 2017 23:15:38 +0200, wrote: >> As reported by kernelci, some functions in the VT code use significant >> amounts of kernel stack when local variables get inlined into the caller >> multiple times: >> >> drivers/tty/vt/keyboard.c: In function 'kbd_keycode': >> drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] >> >> Annotating those functions as noinline_if_stackbloat prevents the inlining >> and reduces the overall stack usage in this driver. > > >> --- a/drivers/tty/vt/keyboard.c >> +++ b/drivers/tty/vt/keyboard.c >> @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) >> /* >> * Helper Functions. >> */ >> -static void put_queue(struct vc_data *vc, int ch) >> +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) >> { >> tty_insert_flip_char(&vc->port, ch, 0); >> tty_schedule_flip(&vc->port); >> } > > I'm surprised that this be able generate so much stack use: the > tty_insert_flip_char inline only uses a pointer and an int. > > And I'm surprised that multiple inlines can accumulate stack usage. The reason is that CONFIG_KASAN forces each local variable to have a separate location on the stack whenever it gets passed into an external function (tty_insert_flip_string_flags in this case), so the sanitizer is able to report exactly which instance caused the problem. > I however agree that it's a bad idea to inline it in functions where > it's called so many times (and we're talking about the keyboard anyway). > >> -static void puts_queue(struct vc_data *vc, char *cp) >> +static noinline_if_stackbloat void puts_queue(struct vc_data *vc, char *cp) > > I don't see why, it's only called once in the callers. k_fn, however, is > called several times in k_pad, so that could be why, but then it's > rather be the inlining of k_fn which is a bad idea. It's called by applkey, which in turn is called by k_pad(), and this all gets inlined by default. >> -static void fn_send_intr(struct vc_data *vc) >> +static noinline_if_stackbloat void fn_send_intr(struct vc_data *vc) > > This one is only referenced, not called, I don't see how that could pose > problem. I was surprised by this as well, but it seems that gcc these days is smart enough to turn the indirect function calls for k_handler[type] and/or f_handler[value] into inlines again when it has already determined the index to be constant. It's been a while since I looked at the patch, and I'd have to disassemble it again to figure out the details, but I'm pretty sure I needed this to get the stack usage down to normal levels. Arnd
Arnd Bergmann, on mer. 14 juin 2017 23:56:39 +0200, wrote: > > I however agree that it's a bad idea to inline it in functions where > > it's called so many times (and we're talking about the keyboard anyway). > > > >> -static void puts_queue(struct vc_data *vc, char *cp) > >> +static noinline_if_stackbloat void puts_queue(struct vc_data *vc, char *cp) > > > > I don't see why, it's only called once in the callers. k_fn, however, is > > called several times in k_pad, so that could be why, but then it's > > rather be the inlining of k_fn which is a bad idea. > > It's called by applkey, which in turn is called by k_pad(), k_pad calls applkey twice only. Is that really to be considered bloat? > >> -static void fn_send_intr(struct vc_data *vc) > >> +static noinline_if_stackbloat void fn_send_intr(struct vc_data *vc) > > > > This one is only referenced, not called, I don't see how that could pose > > problem. > > I was surprised by this as well, but it seems that gcc these days is > smart enough to turn the indirect function calls for k_handler[type] > and/or f_handler[value] into inlines again when it has already > determined the index to be constant. Cool :) But I don't see how it can see find it out constant. The only fn_handler[] caller is k_spec, using value as index. The only caller of f_handler[] is kbd_keycode, using type as index, and keysym&0xff as value. That is definitely not constant :) And it's only one caller, I don't see how that can bloat. Samuel
On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: > As reported by kernelci, some functions in the VT code use significant > amounts of kernel stack when local variables get inlined into the caller > multiple times: > > drivers/tty/vt/keyboard.c: In function 'kbd_keycode': > drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > > Annotating those functions as noinline_if_stackbloat prevents the inlining > and reduces the overall stack usage in this driver. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > drivers/tty/vt/keyboard.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c > index f4166263bb3a..c0d111444a0e 100644 > --- a/drivers/tty/vt/keyboard.c > +++ b/drivers/tty/vt/keyboard.c > @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) > /* > * Helper Functions. > */ > -static void put_queue(struct vc_data *vc, int ch) > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) > { > tty_insert_flip_char(&vc->port, ch, 0); > tty_schedule_flip(&vc->port); > } Ugh, really? We have to start telling gcc not to be stupid here? That's not going to be easy, and will just entail us doing this all over the place, right? The code isn't asking to be inlined, so why is gcc allowing it to be done that way? Doesn't that imply gcc is the problem here? thanks, greg k-h
On Thu, Jun 15, 2017 at 06:52:21AM +0200, Greg Kroah-Hartman wrote: > On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: > > As reported by kernelci, some functions in the VT code use significant > > amounts of kernel stack when local variables get inlined into the caller > > multiple times: > > > > drivers/tty/vt/keyboard.c: In function 'kbd_keycode': > > drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > > > > Annotating those functions as noinline_if_stackbloat prevents the inlining > > and reduces the overall stack usage in this driver. > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > --- > > drivers/tty/vt/keyboard.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c > > index f4166263bb3a..c0d111444a0e 100644 > > --- a/drivers/tty/vt/keyboard.c > > +++ b/drivers/tty/vt/keyboard.c > > @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) > > /* > > * Helper Functions. > > */ > > -static void put_queue(struct vc_data *vc, int ch) > > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) > > { > > tty_insert_flip_char(&vc->port, ch, 0); > > tty_schedule_flip(&vc->port); > > } > > Ugh, really? We have to start telling gcc not to be stupid here? > That's not going to be easy, and will just entail us doing this all over > the place, right? > > The code isn't asking to be inlined, so why is gcc allowing it to be > done that way? Doesn't that imply gcc is the problem here? Wait, you are now, in this patch, _asking_ for it to be inlined. How is that solving anything? greg k-h
On Thu, Jun 15, 2017 at 6:53 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Thu, Jun 15, 2017 at 06:52:21AM +0200, Greg Kroah-Hartman wrote: >> On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: >> > As reported by kernelci, some functions in the VT code use significant >> > amounts of kernel stack when local variables get inlined into the caller >> > multiple times: >> > >> > drivers/tty/vt/keyboard.c: In function 'kbd_keycode': >> > drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] >> > >> > Annotating those functions as noinline_if_stackbloat prevents the inlining >> > and reduces the overall stack usage in this driver. >> > >> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> >> > --- >> > drivers/tty/vt/keyboard.c | 6 +++--- >> > 1 file changed, 3 insertions(+), 3 deletions(-) >> > >> > diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c >> > index f4166263bb3a..c0d111444a0e 100644 >> > --- a/drivers/tty/vt/keyboard.c >> > +++ b/drivers/tty/vt/keyboard.c >> > @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) >> > /* >> > * Helper Functions. >> > */ >> > -static void put_queue(struct vc_data *vc, int ch) >> > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) >> > { >> > tty_insert_flip_char(&vc->port, ch, 0); >> > tty_schedule_flip(&vc->port); >> > } >> >> Ugh, really? We have to start telling gcc not to be stupid here? >> That's not going to be easy, and will just entail us doing this all over >> the place, right? >> >> The code isn't asking to be inlined, so why is gcc allowing it to be >> done that way? Doesn't that imply gcc is the problem here? > > Wait, you are now, in this patch, _asking_ for it to be inlined. How is > that solving anything? The three functions that gain the attribute are all those that gcc decided to inline for itself. Usually gcc makes reasonable inlining decisions, so I left the existing behavior my marking them as 'inline' without CONFIG_KASAN and 'noinline' when KASAN is enabled. Would you rather see this patch instead? This is just as good at eliminating the crazy stack usage in vt/keyboard.o, but it will also impact all other users of that function. Arnddiff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index c28dd523f96e..25348c5ffcb7 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -13,8 +13,8 @@ extern int tty_prepare_flip_string(struct tty_port *port, extern void tty_flip_buffer_push(struct tty_port *port); void tty_schedule_flip(struct tty_port *port); -static inline int tty_insert_flip_char(struct tty_port *port, - unsigned char ch, char flag) +static noinline_if_stackbloat int +tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag) { struct tty_buffer *tb = port->buf.tail; int change;
On Fri, Jun 16, 2017 at 02:01:57PM +0200, Arnd Bergmann wrote: > On Thu, Jun 15, 2017 at 6:53 AM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > On Thu, Jun 15, 2017 at 06:52:21AM +0200, Greg Kroah-Hartman wrote: > >> On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: > >> > As reported by kernelci, some functions in the VT code use significant > >> > amounts of kernel stack when local variables get inlined into the caller > >> > multiple times: > >> > > >> > drivers/tty/vt/keyboard.c: In function 'kbd_keycode': > >> > drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] > >> > > >> > Annotating those functions as noinline_if_stackbloat prevents the inlining > >> > and reduces the overall stack usage in this driver. > >> > > >> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > >> > --- > >> > drivers/tty/vt/keyboard.c | 6 +++--- > >> > 1 file changed, 3 insertions(+), 3 deletions(-) > >> > > >> > diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c > >> > index f4166263bb3a..c0d111444a0e 100644 > >> > --- a/drivers/tty/vt/keyboard.c > >> > +++ b/drivers/tty/vt/keyboard.c > >> > @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) > >> > /* > >> > * Helper Functions. > >> > */ > >> > -static void put_queue(struct vc_data *vc, int ch) > >> > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) > >> > { > >> > tty_insert_flip_char(&vc->port, ch, 0); > >> > tty_schedule_flip(&vc->port); > >> > } > >> > >> Ugh, really? We have to start telling gcc not to be stupid here? > >> That's not going to be easy, and will just entail us doing this all over > >> the place, right? > >> > >> The code isn't asking to be inlined, so why is gcc allowing it to be > >> done that way? Doesn't that imply gcc is the problem here? > > > > Wait, you are now, in this patch, _asking_ for it to be inlined. How is > > that solving anything? > > The three functions that gain the attribute are all those that gcc decided > to inline for itself. Usually gcc makes reasonable inlining decisions, so > I left the existing behavior my marking them as 'inline' without > CONFIG_KASAN and 'noinline' when KASAN is enabled. But why should we have to care about this? If gcc wanted to inline them, and it did so in a way that blows up the stack, that would be a gcc bug, right? Why do I have to tell gcc "don't inline", when really, I never told it to inline it in the first place? > Would you rather see this patch instead? > > diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h > index c28dd523f96e..25348c5ffcb7 100644 > --- a/include/linux/tty_flip.h > +++ b/include/linux/tty_flip.h > @@ -13,8 +13,8 @@ extern int tty_prepare_flip_string(struct tty_port *port, > extern void tty_flip_buffer_push(struct tty_port *port); > void tty_schedule_flip(struct tty_port *port); > > -static inline int tty_insert_flip_char(struct tty_port *port, > - unsigned char ch, char flag) > +static noinline_if_stackbloat int > +tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag) > { > struct tty_buffer *tb = port->buf.tail; > int change; > > This is just as good at eliminating the crazy stack usage in vt/keyboard.o, > but it will also impact all other users of that function. How is this function blowing up the stack? We have 2 variables being added, that's it. Are we really that low on stack that 2 words is too much? And no, we shouldn't need to do this. It sounds like ksan is the problem here... thanks, greg k-h
On Fri, Jun 16, 2017 at 3:02 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Fri, Jun 16, 2017 at 02:01:57PM +0200, Arnd Bergmann wrote: >> On Thu, Jun 15, 2017 at 6:53 AM, Greg Kroah-Hartman >> <gregkh@linuxfoundation.org> wrote: >> > On Thu, Jun 15, 2017 at 06:52:21AM +0200, Greg Kroah-Hartman wrote: >> >> On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: >> >> > -static void put_queue(struct vc_data *vc, int ch) >> >> > +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) >> >> > { >> >> > tty_insert_flip_char(&vc->port, ch, 0); >> >> > tty_schedule_flip(&vc->port); >> >> > } >> >> >> >> Ugh, really? We have to start telling gcc not to be stupid here? >> >> That's not going to be easy, and will just entail us doing this all over >> >> the place, right? >> >> >> >> The code isn't asking to be inlined, so why is gcc allowing it to be >> >> done that way? Doesn't that imply gcc is the problem here? >> > >> > Wait, you are now, in this patch, _asking_ for it to be inlined. How is >> > that solving anything? >> >> The three functions that gain the attribute are all those that gcc decided >> to inline for itself. Usually gcc makes reasonable inlining decisions, so >> I left the existing behavior my marking them as 'inline' without >> CONFIG_KASAN and 'noinline' when KASAN is enabled. > > But why should we have to care about this? If gcc wanted to inline > them, and it did so in a way that blows up the stack, that would be a > gcc bug, right? Why do I have to tell gcc "don't inline", when really, > I never told it to inline it in the first place? I don't think gcc takes stack size into account when making the inlining decisions. Without the address sanitizer, inlining won't normally have any negative effects on the overall stack size, and may even help save a few bytes for the caller-saved registers. You could argue that the gcc inlining algorithm is buggy in combination with kasan, but what does that help you? In most instances of this problem, we actually force the inlining (see the other patches in this series), so making gcc smarter would not help much either. >> Would you rather see this patch instead? >> >> diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h >> index c28dd523f96e..25348c5ffcb7 100644 >> --- a/include/linux/tty_flip.h >> +++ b/include/linux/tty_flip.h >> @@ -13,8 +13,8 @@ extern int tty_prepare_flip_string(struct tty_port *port, >> extern void tty_flip_buffer_push(struct tty_port *port); >> void tty_schedule_flip(struct tty_port *port); >> >> -static inline int tty_insert_flip_char(struct tty_port *port, >> - unsigned char ch, char flag) >> +static noinline_if_stackbloat int >> +tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag) >> { >> struct tty_buffer *tb = port->buf.tail; >> int change; >> >> This is just as good at eliminating the crazy stack usage in vt/keyboard.o, >> but it will also impact all other users of that function. > > How is this function blowing up the stack? We have 2 variables being > added, that's it. Are we really that low on stack that 2 words is too > much? The 'tb' and 'change' variables don't hurt here, they just get optimized away. The problem are the 'ch' and 'flag' variables that are passed into tty_insert_flip_char by value, and from there into tty_insert_flip_string_flags by reference. In this case, kasan tries to detect whether tty_insert_flip_string_flags() does any out-of-bounds access on the pointers and adds 64 bytes redzone around each of the two variables. gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into kbd_keycode(), so the stack size grows from 168 bytes to 168+(16*2*64) = 2216 bytes. There are 10 calls to put_queue() in to_utf8(), 12 in emulate_raw() and another 4 in kbd_keycode() itself. On ARM64, it happens to decide differently and presumably doesn't inline tty_insert_flip_char() and kbd_keycode() into kbd_keycode(), so the maximum stack size isn't as bad, but the problem still exists. > And no, we shouldn't need to do this. It sounds like ksan is the > problem here... Of course kasan is the problem, but it really just does whatever we asked it to do, and cannot do any better as long as we inline many copies of tty_insert_flip_char() into kbd_keycode(). Arnd
Arnd Bergmann, on ven. 16 juin 2017 17:41:47 +0200, wrote: > The problem are the 'ch' and 'flag' variables that are passed into > tty_insert_flip_char by value, and from there into > tty_insert_flip_string_flags by reference. In this case, kasan tries > to detect whether tty_insert_flip_string_flags() does any out-of-bounds > access on the pointers and adds 64 bytes redzone around each of > the two variables. Ouch. > gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into > kbd_keycode(), so the stack size grows from 168 bytes to > 168+(16*2*64) = 2216 bytes. There are 10 calls to put_queue() > in to_utf8(), 12 in emulate_raw() and another 4 in kbd_keycode() > itself. That's why I agreed for put_queue :) I'm however afraid we'd have to mark a lot of static functions that way, depending on the aggressivity of gcc... I'd indeed really argue that gcc should consider stack usage when inlining. static int f(int foo) { char c[256]; g(c, foo); } is really not something that I'd want to see the compiler to inline. > > And no, we shouldn't need to do this. It sounds like ksan is the > > problem here... > > Of course kasan is the problem, but it really just does whatever we > asked it to do, and cannot do any better as long as we inline many > copies of tty_insert_flip_char() into kbd_keycode(). We didn't ask to inline put_queue into kbd_keycode. Samuel
On 06/16/2017 06:41 PM, Arnd Bergmann wrote: > On Fri, Jun 16, 2017 at 3:02 PM, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: >> On Fri, Jun 16, 2017 at 02:01:57PM +0200, Arnd Bergmann wrote: >>> On Thu, Jun 15, 2017 at 6:53 AM, Greg Kroah-Hartman >>> <gregkh@linuxfoundation.org> wrote: >>>> On Thu, Jun 15, 2017 at 06:52:21AM +0200, Greg Kroah-Hartman wrote: >>>>> On Wed, Jun 14, 2017 at 11:15:38PM +0200, Arnd Bergmann wrote: >>>>>> -static void put_queue(struct vc_data *vc, int ch) >>>>>> +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) >>>>>> { >>>>>> tty_insert_flip_char(&vc->port, ch, 0); >>>>>> tty_schedule_flip(&vc->port); >>>>>> } >>>>> >>>>> Ugh, really? We have to start telling gcc not to be stupid here? >>>>> That's not going to be easy, and will just entail us doing this all over >>>>> the place, right? >>>>> >>>>> The code isn't asking to be inlined, so why is gcc allowing it to be >>>>> done that way? Doesn't that imply gcc is the problem here? >>>> >>>> Wait, you are now, in this patch, _asking_ for it to be inlined. How is >>>> that solving anything? >>> >>> The three functions that gain the attribute are all those that gcc decided >>> to inline for itself. Usually gcc makes reasonable inlining decisions, so >>> I left the existing behavior my marking them as 'inline' without >>> CONFIG_KASAN and 'noinline' when KASAN is enabled. >> >> But why should we have to care about this? If gcc wanted to inline >> them, and it did so in a way that blows up the stack, that would be a >> gcc bug, right? Why do I have to tell gcc "don't inline", when really, >> I never told it to inline it in the first place? > > I don't think gcc takes stack size into account when making the inlining > decisions. Without the address sanitizer, inlining won't normally have > any negative effects on the overall stack size, and may even help save > a few bytes for the caller-saved registers. > Well, in fact it should take stack into account. Gcc even has following params: large-stack-frame The limit specifying large stack frames. While inlining the algorithm is trying to not grow past this limit too much. The default value is 256 bytes. large-stack-frame-growth Specifies maximal growth of large stack frames caused by inlining in percents. The default value is 1000 which limits large stack frame growth to 11 times the original size. However, I've tried both and even with minimal values gcc continued to make bad decisions. Note, if you're going to try it, make sure that you have CONFIG_OPTIMIZE_INLINING=y because otherwise 'inline' becomes 'always_inline' > You could argue that the gcc inlining algorithm is buggy in combination > with kasan, but what does that help you? In most instances of this > problem, we actually force the inlining (see the other patches in this > series), so making gcc smarter would not help much either. > >>> Would you rather see this patch instead? >>> >>> diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h >>> index c28dd523f96e..25348c5ffcb7 100644 >>> --- a/include/linux/tty_flip.h >>> +++ b/include/linux/tty_flip.h >>> @@ -13,8 +13,8 @@ extern int tty_prepare_flip_string(struct tty_port *port, >>> extern void tty_flip_buffer_push(struct tty_port *port); >>> void tty_schedule_flip(struct tty_port *port); >>> >>> -static inline int tty_insert_flip_char(struct tty_port *port, >>> - unsigned char ch, char flag) >>> +static noinline_if_stackbloat int >>> +tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag) >>> { >>> struct tty_buffer *tb = port->buf.tail; >>> int change; >>> >>> This is just as good at eliminating the crazy stack usage in vt/keyboard.o, >>> but it will also impact all other users of that function. >> >> How is this function blowing up the stack? We have 2 variables being >> added, that's it. Are we really that low on stack that 2 words is too >> much? > > The 'tb' and 'change' variables don't hurt here, they just get optimized > away. The problem are the 'ch' and 'flag' variables that are passed into > tty_insert_flip_char by value, and from there into > tty_insert_flip_string_flags by reference. In this case, kasan tries > to detect whether tty_insert_flip_string_flags() does any out-of-bounds > access on the pointers and adds 64 bytes redzone around each of > the two variables. > > gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into > kbd_keycode(), so the stack size grows from 168 bytes to > 168+(16*2*64) = 2216 bytes. There are 10 calls to put_queue() > in to_utf8(), 12 in emulate_raw() and another 4 in kbd_keycode() > itself. On ARM64, it happens to decide differently and presumably > doesn't inline tty_insert_flip_char() and kbd_keycode() into > kbd_keycode(), so the maximum stack size isn't as bad, but > the problem still exists. > >> And no, we shouldn't need to do this. It sounds like ksan is the >> problem here... > > Of course kasan is the problem, but it really just does whatever we > asked it to do, and cannot do any better as long as we inline many > copies of tty_insert_flip_char() into kbd_keycode(). > > Arnd >
On Fri, Jun 16, 2017 at 8:58 AM, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote: > Arnd Bergmann, on ven. 16 juin 2017 17:41:47 +0200, wrote: >> The problem are the 'ch' and 'flag' variables that are passed into >> tty_insert_flip_char by value, and from there into >> tty_insert_flip_string_flags by reference. In this case, kasan tries >> to detect whether tty_insert_flip_string_flags() does any out-of-bounds >> access on the pointers and adds 64 bytes redzone around each of >> the two variables. > > Ouch. > >> gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into I wonder if we should stop marking tty_insert_flip_char() as inline. >> kbd_keycode(), so the stack size grows from 168 bytes to >> 168+(16*2*64) = 2216 bytes. There are 10 calls to put_queue() >> in to_utf8(), 12 in emulate_raw() and another 4 in kbd_keycode() >> itself. > > That's why I agreed for put_queue :) > > I'm however afraid we'd have to mark a lot of static functions that way, > depending on the aggressivity of gcc... I'd indeed really argue that gcc > should consider stack usage when inlining. > > static int f(int foo) { > char c[256]; > g(c, foo); > } > > is really not something that I'd want to see the compiler to inline. Why would not we want it be inlined? What we do not want us several calls having _separate_ instances of 'c' generated on the stack, all inlined calls should share 'c'. And of course if we have f1, f2, and f3 with c1, c2, and c3, GCC should not blow up the stack inlining and allocating stack for all 3 of them beforehand. But this all seems to me issue that should be solved in toolchain, not trying to play whack-a-mole with kernel sources. Thanks. -- Dmitry
On Fri, Jun 16, 2017 at 7:29 PM, Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > On Fri, Jun 16, 2017 at 8:58 AM, Samuel Thibault > <samuel.thibault@ens-lyon.org> wrote: >> Arnd Bergmann, on ven. 16 juin 2017 17:41:47 +0200, wrote: >>> The problem are the 'ch' and 'flag' variables that are passed into >>> tty_insert_flip_char by value, and from there into >>> tty_insert_flip_string_flags by reference. In this case, kasan tries >>> to detect whether tty_insert_flip_string_flags() does any out-of-bounds >>> access on the pointers and adds 64 bytes redzone around each of >>> the two variables. >> >> Ouch. >> >>> gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into > > I wonder if we should stop marking tty_insert_flip_char() as inline. That would be an easy solution, yes. tty_insert_flip_char() was apparently meant to be optimized for the fast path to completely avoid calling into another function, but that fast path got a bit more complex with commit acc0f67f307f ("tty: Halve flip buffer GFP_ATOMIC memory consumption"). If we move it out of line, the fast path optimization goes away and we could just have a simple implementation like int tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag) { struct tty_buffer *tb = port->buf.tail; int flags = (flag == TTY_NORMAL) ? TTYB_NORMAL : 0; if (!__tty_buffer_request_room(port, 1, flags)) return 0; if (~tb->flags & TTYB_NORMAL) *flag_buf_ptr(tb, tb->used) = flag; *char_buf_ptr(tb, tb->used++) = ch; return 1; } One rather simple change I found would actually avoid the warning and would seem to actually give us better runtime behavior even without KASAN: This reduces the stack frame size for kbd_event() to 1256 bytes, which is well within the limit, and it lets us keep the flag-less buffers across a 'tb->used >= tb->size' condition. Calling into tty_insert_flip_string_flags() today will allocate a flag buffer if there isn't already one, even when it is not needed. >> I'm however afraid we'd have to mark a lot of static functions that way, >> depending on the aggressivity of gcc... I'd indeed really argue that gcc >> should consider stack usage when inlining. >> >> static int f(int foo) { >> char c[256]; >> g(c, foo); >> } >> >> is really not something that I'd want to see the compiler to inline. > > Why would not we want it be inlined? What we do not want us several > calls having _separate_ instances of 'c' generated on the stack, all > inlined calls should share 'c'. And of course if we have f1, f2, and > f3 with c1, c2, and c3, GCC should not blow up the stack inlining and > allocating stack for all 3 of them beforehand. > > But this all seems to me issue that should be solved in toolchain, not > trying to play whack-a-mole with kernel sources. The problem for the Samuel's example is that a) the "--param asan-stack=1" option in KASAN does blow up the stack, which is why the annotation is now called 'noinline_if_stackbloat'. b) The toolchain cannot solve the problem, as most instances of the problem (unlike kbd_put_queue) force the inlining unless you build with the x86-specific CONFIG_OPTIMIZE_INLINING. Arnddiff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index c28dd523f96e..15d03a14ad0f 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -26,7 +26,7 @@ static inline int tty_insert_flip_char(struct tty_port *port, *char_buf_ptr(tb, tb->used++) = ch; return 1; } - return tty_insert_flip_string_flags(port, &ch, &flag, 1); + return tty_insert_flip_string_fixed_flag(port, &ch, flag, 1); } static inline int tty_insert_flip_string(struct tty_port *port,
On Fri, Jun 16, 2017 at 1:56 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Fri, Jun 16, 2017 at 7:29 PM, Dmitry Torokhov > <dmitry.torokhov@gmail.com> wrote: >> On Fri, Jun 16, 2017 at 8:58 AM, Samuel Thibault >> <samuel.thibault@ens-lyon.org> wrote: >>> I'm however afraid we'd have to mark a lot of static functions that way, >>> depending on the aggressivity of gcc... I'd indeed really argue that gcc >>> should consider stack usage when inlining. >>> >>> static int f(int foo) { >>> char c[256]; >>> g(c, foo); >>> } >>> >>> is really not something that I'd want to see the compiler to inline. >> >> Why would not we want it be inlined? What we do not want us several >> calls having _separate_ instances of 'c' generated on the stack, all >> inlined calls should share 'c'. And of course if we have f1, f2, and >> f3 with c1, c2, and c3, GCC should not blow up the stack inlining and >> allocating stack for all 3 of them beforehand. >> >> But this all seems to me issue that should be solved in toolchain, not >> trying to play whack-a-mole with kernel sources. > > The problem for the Samuel's example is that > > a) the "--param asan-stack=1" option in KASAN does blow up the > stack, which is why the annotation is now called 'noinline_if_stackbloat'. > > b) The toolchain cannot solve the problem, as most instances of the > problem (unlike kbd_put_queue) force the inlining unless you build > with the x86-specific CONFIG_OPTIMIZE_INLINING. If inlining done right there should be no change in stack size, because if calls are not inlined then stack storage is "shared" between calls, and it should similarly be shared when calls are inlined. And that is toolchain issue. -- Dmitry
diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c index f4166263bb3a..c0d111444a0e 100644 --- a/drivers/tty/vt/keyboard.c +++ b/drivers/tty/vt/keyboard.c @@ -301,13 +301,13 @@ int kbd_rate(struct kbd_repeat *rpt) /* * Helper Functions. */ -static void put_queue(struct vc_data *vc, int ch) +static noinline_if_stackbloat void put_queue(struct vc_data *vc, int ch) { tty_insert_flip_char(&vc->port, ch, 0); tty_schedule_flip(&vc->port); } -static void puts_queue(struct vc_data *vc, char *cp) +static noinline_if_stackbloat void puts_queue(struct vc_data *vc, char *cp) { while (*cp) { tty_insert_flip_char(&vc->port, *cp, 0); @@ -555,7 +555,7 @@ static void fn_inc_console(struct vc_data *vc) set_console(i); } -static void fn_send_intr(struct vc_data *vc) +static noinline_if_stackbloat void fn_send_intr(struct vc_data *vc) { tty_insert_flip_char(&vc->port, 0, TTY_BREAK); tty_schedule_flip(&vc->port);
As reported by kernelci, some functions in the VT code use significant amounts of kernel stack when local variables get inlined into the caller multiple times: drivers/tty/vt/keyboard.c: In function 'kbd_keycode': drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Annotating those functions as noinline_if_stackbloat prevents the inlining and reduces the overall stack usage in this driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- drivers/tty/vt/keyboard.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.9.0