Message ID | 20180116162905.1404585-1-arnd@arndb.de |
---|---|
State | New |
Headers | show |
Series | ARM: make memzero optimization smarter | expand |
On Tue, 16 Jan 2018, Arnd Bergmann wrote: > While testing with a gcc-8.0.0 snapshot, I ran into a harmless > build warning: > > In file included from include/linux/string.h:18:0, > ... > from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10: > drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process': > arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=] > memset((__p),(v),(__n)); \ > ^~~~~~~~~~~~~~~~~~~~~~~ > drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset' > memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1); > ^~~~~~ > > I think the warning is unintentional here, and gcc should not actually > warn, so I reported this in the gcc bugzilla as pr82103. From the > discussion there, it seems unlikely that this gets addressed in > the compiler. > > The problem here is that testing the 'frame_size' variable for non-zero > in the first memset() macro invocation leads to a code path in which > gcc thinks it may be zero, and that code path would lead to an overly > large length for the following memset that is now "(u32)-1". We know > this won't happen as the skb len is already guaranteed to be nonzero > when we get here (it has just been allocated with a nonzero size). > > However, we can avoid this class of bogus warnings for the memset() macro > by only doing the micro-optimization for zero-length arguments when the > length is a compile-time constant. This should also reduce code size by > a few bytes, and avoid an extra branch for the cases that a variable-length > argument is always nonzero, which is probably the common case anyway. > > I have made sure that the __memzero implementation can safely handle > a zero length argument. Why not simply drop the test on (__n) != 0 then? I fail to see what the advantage is in that case. > Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103 > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Originally sent in September 2017, but then forgot about it > as nobody replied. > > I slightly updated the change text now to reflect that the gcc > developers treat it as an invalid bug. > --- > arch/arm/include/asm/string.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h > index f54a3136aac6..a8b90e33cc87 100644 > --- a/arch/arm/include/asm/string.h > +++ b/arch/arm/include/asm/string.h > @@ -44,7 +44,7 @@ extern void __memzero(void *ptr, __kernel_size_t n); > #define memset(p,v,n) \ > ({ \ > void *__p = (p); size_t __n = n; \ > - if ((__n) != 0) { \ > + if (!__builtin_constant_p(__n) || (__n) != 0) { \ > if (__builtin_constant_p((v)) && (v) == 0) \ > __memzero((__p),(__n)); \ > else \ > -- > 2.9.0 > >
On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > >> While testing with a gcc-8.0.0 snapshot, I ran into a harmless >> build warning: >> >> In file included from include/linux/string.h:18:0, >> ... >> from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10: >> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process': >> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=] >> memset((__p),(v),(__n)); \ >> ^~~~~~~~~~~~~~~~~~~~~~~ >> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset' >> memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1); >> ^~~~~~ >> >> I think the warning is unintentional here, and gcc should not actually >> warn, so I reported this in the gcc bugzilla as pr82103. From the >> discussion there, it seems unlikely that this gets addressed in >> the compiler. >> >> The problem here is that testing the 'frame_size' variable for non-zero >> in the first memset() macro invocation leads to a code path in which >> gcc thinks it may be zero, and that code path would lead to an overly >> large length for the following memset that is now "(u32)-1". We know >> this won't happen as the skb len is already guaranteed to be nonzero >> when we get here (it has just been allocated with a nonzero size). >> >> However, we can avoid this class of bogus warnings for the memset() macro >> by only doing the micro-optimization for zero-length arguments when the >> length is a compile-time constant. This should also reduce code size by >> a few bytes, and avoid an extra branch for the cases that a variable-length >> argument is always nonzero, which is probably the common case anyway. >> >> I have made sure that the __memzero implementation can safely handle >> a zero length argument. > > Why not simply drop the test on (__n) != 0 then? I fail to see what the > advantage is in that case. Good point. We might actually get even better results by dropping the __memzero path entirely, since gcc has can optimize trivial memset() operations and inline them. If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr' instructions compared to the memset.S implementation, but calling memset() rather than __memzero() from C code ends up saving a function call at least some of the time. Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero() and 636 calls to memset() in vmlinux. If I remove the macro entirely, I get 1775 calls to memset() instead, so 780 memzero instances got inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the __memzero implementation that we could possibly also drop. FWIW, the zero-length check saves five references to __memzero() and one reference to memset(), or 16 bytes in kernel size, I have not checked what those are. Arnd
On Tue, 16 Jan 2018, Arnd Bergmann wrote: > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > > > >> However, we can avoid this class of bogus warnings for the memset() macro > >> by only doing the micro-optimization for zero-length arguments when the > >> length is a compile-time constant. This should also reduce code size by > >> a few bytes, and avoid an extra branch for the cases that a variable-length > >> argument is always nonzero, which is probably the common case anyway. > >> > >> I have made sure that the __memzero implementation can safely handle > >> a zero length argument. > > > > Why not simply drop the test on (__n) != 0 then? I fail to see what the > > advantage is in that case. > > Good point. We might actually get even better results by dropping the > __memzero path entirely, since gcc has can optimize trivial memset() > operations and inline them. > > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr' > instructions compared to the memset.S implementation, but calling > memset() rather than __memzero() from C code ends up saving a > function call at least some of the time. > > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero() > and 636 calls to memset() in vmlinux. If I remove the macro entirely, > I get 1775 calls to memset() instead, so 780 memzero instances got > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the > __memzero implementation that we could possibly also drop. I get 3668 fewer bytes just by removing the test against 0 in the macro. And an additional 5092 fewer bytes by removing the call-to-__memzero optimization. That's using gcc v6.3.1. > FWIW, the zero-length check saves five references to __memzero() > and one reference to memset(), or 16 bytes in kernel size, I have not > checked what those are. They apparently are: security/keys/key.c:1117:2: memset(&ktype->lock_class, 0, sizeof(ktype->lock_class)); crypto/drbg.c:615:3: memset(drbg->V, 1, drbg_statelen(drbg)); crypto/drbg.c:1120:3: memset(drbg->V, 0, drbg_statelen(drbg)); crypto/drbg.c:1121:3: memset(drbg->C, 0, drbg_statelen(drbg)); drivers/crypto/bcm/cipher.c:1963:2: memset(ctx->bcm_spu_req_hdr, 0, alloc_len); drivers/media/platform/vivid/vivid-vbi-cap.c:106:2: memset(vbuf, 0x10, vb2_plane_size(&buf->vb.vb2_buf, 0)); drivers/media/platform/vivid/vivid-vbi-cap.c:127:2: memset(vbuf, 0, vb2_plane_size(&buf->vb.vb2_buf, 0)); drivers/gpu/drm/nouveau/nvkm/subdev/bios/conn.c:50:2: memset(info, 0x00, sizeof(*info)); Nicolas
On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote: > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > > > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > > > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > > > > > >> However, we can avoid this class of bogus warnings for the memset() macro > > >> by only doing the micro-optimization for zero-length arguments when the > > >> length is a compile-time constant. This should also reduce code size by > > >> a few bytes, and avoid an extra branch for the cases that a variable-length > > >> argument is always nonzero, which is probably the common case anyway. > > >> > > >> I have made sure that the __memzero implementation can safely handle > > >> a zero length argument. > > > > > > Why not simply drop the test on (__n) != 0 then? I fail to see what the > > > advantage is in that case. > > > > Good point. We might actually get even better results by dropping the > > __memzero path entirely, since gcc has can optimize trivial memset() > > operations and inline them. > > > > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr' > > instructions compared to the memset.S implementation, but calling > > memset() rather than __memzero() from C code ends up saving a > > function call at least some of the time. > > > > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero() > > and 636 calls to memset() in vmlinux. If I remove the macro entirely, > > I get 1775 calls to memset() instead, so 780 memzero instances got > > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the > > __memzero implementation that we could possibly also drop. > > I get 3668 fewer bytes just by removing the test against 0 in the macro. > > And an additional 5092 fewer bytes by removing the call-to-__memzero > optimization. However, __memzero is not safe against being called with a zero length so it's not something we can simply remove. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
On Wed, 17 Jan 2018, Russell King - ARM Linux wrote: > On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote: > > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > > > > > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > > > > On Tue, 16 Jan 2018, Arnd Bergmann wrote: > > > > > > > >> However, we can avoid this class of bogus warnings for the memset() macro > > > >> by only doing the micro-optimization for zero-length arguments when the > > > >> length is a compile-time constant. This should also reduce code size by > > > >> a few bytes, and avoid an extra branch for the cases that a variable-length > > > >> argument is always nonzero, which is probably the common case anyway. > > > >> > > > >> I have made sure that the __memzero implementation can safely handle > > > >> a zero length argument. > > > > > > > > Why not simply drop the test on (__n) != 0 then? I fail to see what the > > > > advantage is in that case. > > > > > > Good point. We might actually get even better results by dropping the > > > __memzero path entirely, since gcc has can optimize trivial memset() > > > operations and inline them. > > > > > > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr' > > > instructions compared to the memset.S implementation, but calling > > > memset() rather than __memzero() from C code ends up saving a > > > function call at least some of the time. > > > > > > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero() > > > and 636 calls to memset() in vmlinux. If I remove the macro entirely, > > > I get 1775 calls to memset() instead, so 780 memzero instances got > > > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the > > > __memzero implementation that we could possibly also drop. > > > > I get 3668 fewer bytes just by removing the test against 0 in the macro. > > > > And an additional 5092 fewer bytes by removing the call-to-__memzero > > optimization. > > However, __memzero is not safe against being called with a zero length > so it's not something we can simply remove. The idea is about the possibility of removing __memzero altogether. It is not clear that the tiny performance gain from a dedicated memzero implementation is worth the current overhead around it. Nicolas
On Wed, 17 Jan 2018, Nicolas Pitre wrote: > On Wed, 17 Jan 2018, Russell King - ARM Linux wrote: > > > However, __memzero is not safe against being called with a zero length > > so it's not something we can simply remove. > > The idea is about the possibility of removing __memzero altogether. > It is not clear that the tiny performance gain from a dedicated memzero > implementation is worth the current overhead around it. This being said, I fail to see how __memzero is not safe against a zero length. Are you sure it isn't? Nicolas
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h index f54a3136aac6..a8b90e33cc87 100644 --- a/arch/arm/include/asm/string.h +++ b/arch/arm/include/asm/string.h @@ -44,7 +44,7 @@ extern void __memzero(void *ptr, __kernel_size_t n); #define memset(p,v,n) \ ({ \ void *__p = (p); size_t __n = n; \ - if ((__n) != 0) { \ + if (!__builtin_constant_p(__n) || (__n) != 0) { \ if (__builtin_constant_p((v)) && (v) == 0) \ __memzero((__p),(__n)); \ else \
While testing with a gcc-8.0.0 snapshot, I ran into a harmless build warning: In file included from include/linux/string.h:18:0, ... from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10: drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process': arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=] memset((__p),(v),(__n)); \ ^~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset' memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1); ^~~~~~ I think the warning is unintentional here, and gcc should not actually warn, so I reported this in the gcc bugzilla as pr82103. From the discussion there, it seems unlikely that this gets addressed in the compiler. The problem here is that testing the 'frame_size' variable for non-zero in the first memset() macro invocation leads to a code path in which gcc thinks it may be zero, and that code path would lead to an overly large length for the following memset that is now "(u32)-1". We know this won't happen as the skb len is already guaranteed to be nonzero when we get here (it has just been allocated with a nonzero size). However, we can avoid this class of bogus warnings for the memset() macro by only doing the micro-optimization for zero-length arguments when the length is a compile-time constant. This should also reduce code size by a few bytes, and avoid an extra branch for the cases that a variable-length argument is always nonzero, which is probably the common case anyway. I have made sure that the __memzero implementation can safely handle a zero length argument. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103 Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- Originally sent in September 2017, but then forgot about it as nobody replied. I slightly updated the change text now to reflect that the gcc developers treat it as an invalid bug. --- arch/arm/include/asm/string.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.9.0