Message ID | 20170106202058.GH2966@redhat.com |
---|---|
State | New |
Headers | show |
Thanks for the correction and updating the comments. -Aditya -----Original Message----- From: Jonathan Wakely [mailto:jwakely@redhat.com] Sent: Friday, January 06, 2017 2:21 PM To: Aditya Kumar Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com Subject: Re: [PATCH] improve string find algorithm On 06/01/17 08:42 -0600, Aditya Kumar wrote: >Yes, we do. >Sorry for the mistake, it happened because I first wrote this for >libcxx (https://reviews.llvm.org/D27068) and while porting that line >got missed. > >Thanks, >-Aditya > > >diff --git a/libstdc++-v3/include/bits/basic_string.tcc >b/libstdc++-v3/include/bits/basic_string.tcc >index df1e8dd..7942ee6 100644 >--- a/libstdc++-v3/include/bits/basic_string.tcc >+++ b/libstdc++-v3/include/bits/basic_string.tcc >@@ -1194,14 +1194,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > if (__n == 0) > return __pos <= __size ? __pos : npos; > >- if (__n <= __size) >- { >- for (; __pos <= __size - __n; ++__pos) >- if (traits_type::eq(__data[__pos], __s[0]) >- && traits_type::compare(__data + __pos + 1, >- __s + 1, __n - 1) == 0) >- return __pos; >- } >+ if (__n > __size) >+ return npos; >+ >+ const _CharT __elem0 = __s[0]; >+ const _CharT* __first1 = __data; >+ const _CharT* __first2 = __s; >+ const _CharT* __last1 = __data + __size; >+ ptrdiff_t __len2 = __n - __pos; What's this variable for? >+ while (true) { >+ ptrdiff_t __len1 = __last1 - __first1; >+ if (__len1 < __len2) >+ return npos; >+ >+ // Find the first match. >+ __first1 = traits_type::find(__first1, __len1 - __len2 + 1, __elem0); >+ if (__first1 == 0) >+ return npos; >+ // Compare the full string when first match is found. >+ if (traits_type::compare(__first1, __first2, __len2) == 0) >+ return __first1 - __data; >+ >+ ++__first1; >+ } >+ > return npos; > } This is still wrong, consider std::string("abcd").find("ab", 2) This should return npos, but you return 0. The postcondition for the function is that the return value is not less than the pos argument. The attached patch should be a correct version of the improved algorithm.
On 06/01/17 14:34 -0600, Aditya Kumar wrote:
>Thanks for the correction and updating the comments.
Could you try the corrected patch on your benchmarks?
The performance with the patch is worse for the first string::find
benchmark in our testsuite:
s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy";
f = "aabbaabbc";
s.find(f);
> Could you try the corrected patch on your benchmarks? For the test-case you gave there is a regression. Benchmark Time CPU Iterations ------------------------------------------------------------------- Without the patch: BM_StringRegression 81 ns 81 ns 8503740 With the patch: BM_StringRegression 109 ns 109 ns 6346500 The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark can be found in https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark Following are the results: ---------------------------------------------------------------------------------------------------------------------------------- Without the patch: Run on (8 X 3403.85 MHz CPU s) 2017-01-06 15:47:30 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. Benchmark Time CPU Iterations ------------------------------------------------------------------- BM_StringFindNoMatch/10 6 ns 6 ns 114499418 BM_StringFindNoMatch/64 34 ns 34 ns 20578576 BM_StringFindNoMatch/512 222 ns 222 ns 3136787 BM_StringFindNoMatch/4096 1728 ns 1729 ns 401913 BM_StringFindNoMatch/32768 13679 ns 13684 ns 50680 BM_StringFindNoMatch/131072 54570 ns 54591 ns 12506 BM_StringFindAllMatch/1 4 ns 4 ns 180640260 BM_StringFindAllMatch/8 6 ns 6 ns 119682220 BM_StringFindAllMatch/64 7 ns 7 ns 97679753 BM_StringFindAllMatch/512 19 ns 19 ns 36035174 BM_StringFindAllMatch/4096 92 ns 92 ns 7516363 BM_StringFindAllMatch/32768 849 ns 849 ns 809284 BM_StringFindAllMatch/131072 3610 ns 3611 ns 193894 BM_StringFindMatch1/1 27273 ns 27283 ns 25579 BM_StringFindMatch1/8 27289 ns 27300 ns 25516 BM_StringFindMatch1/64 27297 ns 27307 ns 25561 BM_StringFindMatch1/512 27303 ns 27314 ns 25579 BM_StringFindMatch1/4096 27488 ns 27499 ns 25366 BM_StringFindMatch1/32768 28157 ns 28168 ns 24750 BM_StringFindMatch2/1 27273 ns 27284 ns 25562 BM_StringFindMatch2/8 27296 ns 27306 ns 25555 BM_StringFindMatch2/64 27312 ns 27323 ns 25549 BM_StringFindMatch2/512 27327 ns 27338 ns 25558 BM_StringFindMatch2/4096 27513 ns 27524 ns 25349 BM_StringFindMatch2/32768 28161 ns 28172 ns 24788 BM_StringRegression 81 ns 81 ns 8503740 With the patch Run on (8 X 1071.8 MHz CPU s) 2017-01-06 16:06:29 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. Benchmark Time CPU Iterations ------------------------------------------------------------------- BM_StringFindNoMatch/10 6 ns 6 ns 121302159 BM_StringFindNoMatch/64 7 ns 7 ns 102003502 BM_StringFindNoMatch/512 15 ns 15 ns 44820639 BM_StringFindNoMatch/4096 77 ns 77 ns 9016958 BM_StringFindNoMatch/32768 555 ns 555 ns 1227219 BM_StringFindNoMatch/131072 2688 ns 2689 ns 259488 BM_StringFindAllMatch/1 8 ns 8 ns 85893410 BM_StringFindAllMatch/8 9 ns 9 ns 80811804 BM_StringFindAllMatch/64 9 ns 9 ns 74237599 BM_StringFindAllMatch/512 23 ns 23 ns 31163379 BM_StringFindAllMatch/4096 94 ns 94 ns 7317385 BM_StringFindAllMatch/32768 847 ns 848 ns 803901 BM_StringFindAllMatch/131072 3551 ns 3552 ns 196844 BM_StringFindMatch1/1 1337 ns 1337 ns 518042 BM_StringFindMatch1/8 1338 ns 1338 ns 519431 BM_StringFindMatch1/64 1340 ns 1341 ns 513974 BM_StringFindMatch1/512 1355 ns 1356 ns 511857 BM_StringFindMatch1/4096 1489 ns 1489 ns 465629 BM_StringFindMatch1/32768 2203 ns 2204 ns 316044 BM_StringFindMatch2/1 1337 ns 1338 ns 519057 BM_StringFindMatch2/8 1337 ns 1337 ns 517745 BM_StringFindMatch2/64 1347 ns 1347 ns 514813 BM_StringFindMatch2/512 1362 ns 1363 ns 507408 BM_StringFindMatch2/4096 1491 ns 1492 ns 465979 BM_StringFindMatch2/32768 2204 ns 2205 ns 315887 BM_StringRegression 109 ns 109 ns 6346500 -Aditya From: Jonathan Wakely <jwakely@redhat.com> Sent: Friday, January 6, 2017 2:37 PM To: Aditya Kumar Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com Subject: Re: [PATCH] improve string find algorithm On 06/01/17 14:34 -0600, Aditya Kumar wrote: >Thanks for the correction and updating the comments. Could you try the corrected patch on your benchmarks? The performance with the patch is worse for the first string::find benchmark in our testsuite: s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy"; f = "aabbaabbc"; s.find(f);
On 06/01/17 22:19 +0000, Aditya K wrote: >> Could you try the corrected patch on your benchmarks? > >For the test-case you gave there is a regression. > >Benchmark Time CPU Iterations >------------------------------------------------------------------- >Without the patch: BM_StringRegression 81 ns 81 ns 8503740 >With the patch: BM_StringRegression 109 ns 109 ns 6346500 > > >The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark can be found in https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp >However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark I think the large improvements are worth the smaller regression, so I'm committing the patch I sent last week. >Following are the results: >---------------------------------------------------------------------------------------------------------------------------------- >Without the patch: > > >Run on (8 X 3403.85 MHz CPU s) >2017-01-06 15:47:30 >***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. >Benchmark Time CPU Iterations >------------------------------------------------------------------- >BM_StringFindNoMatch/10 6 ns 6 ns 114499418 >BM_StringFindNoMatch/64 34 ns 34 ns 20578576 >BM_StringFindNoMatch/512 222 ns 222 ns 3136787 >BM_StringFindNoMatch/4096 1728 ns 1729 ns 401913 >BM_StringFindNoMatch/32768 13679 ns 13684 ns 50680 >BM_StringFindNoMatch/131072 54570 ns 54591 ns 12506 >BM_StringFindAllMatch/1 4 ns 4 ns 180640260 >BM_StringFindAllMatch/8 6 ns 6 ns 119682220 >BM_StringFindAllMatch/64 7 ns 7 ns 97679753 >BM_StringFindAllMatch/512 19 ns 19 ns 36035174 >BM_StringFindAllMatch/4096 92 ns 92 ns 7516363 >BM_StringFindAllMatch/32768 849 ns 849 ns 809284 >BM_StringFindAllMatch/131072 3610 ns 3611 ns 193894 >BM_StringFindMatch1/1 27273 ns 27283 ns 25579 >BM_StringFindMatch1/8 27289 ns 27300 ns 25516 >BM_StringFindMatch1/64 27297 ns 27307 ns 25561 >BM_StringFindMatch1/512 27303 ns 27314 ns 25579 >BM_StringFindMatch1/4096 27488 ns 27499 ns 25366 >BM_StringFindMatch1/32768 28157 ns 28168 ns 24750 >BM_StringFindMatch2/1 27273 ns 27284 ns 25562 >BM_StringFindMatch2/8 27296 ns 27306 ns 25555 >BM_StringFindMatch2/64 27312 ns 27323 ns 25549 >BM_StringFindMatch2/512 27327 ns 27338 ns 25558 >BM_StringFindMatch2/4096 27513 ns 27524 ns 25349 >BM_StringFindMatch2/32768 28161 ns 28172 ns 24788 >BM_StringRegression 81 ns 81 ns 8503740 > > > >With the patch > > >Run on (8 X 1071.8 MHz CPU s) >2017-01-06 16:06:29 >***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. >Benchmark Time CPU Iterations >------------------------------------------------------------------- >BM_StringFindNoMatch/10 6 ns 6 ns 121302159 >BM_StringFindNoMatch/64 7 ns 7 ns 102003502 >BM_StringFindNoMatch/512 15 ns 15 ns 44820639 >BM_StringFindNoMatch/4096 77 ns 77 ns 9016958 >BM_StringFindNoMatch/32768 555 ns 555 ns 1227219 >BM_StringFindNoMatch/131072 2688 ns 2689 ns 259488 >BM_StringFindAllMatch/1 8 ns 8 ns 85893410 >BM_StringFindAllMatch/8 9 ns 9 ns 80811804 >BM_StringFindAllMatch/64 9 ns 9 ns 74237599 >BM_StringFindAllMatch/512 23 ns 23 ns 31163379 >BM_StringFindAllMatch/4096 94 ns 94 ns 7317385 >BM_StringFindAllMatch/32768 847 ns 848 ns 803901 >BM_StringFindAllMatch/131072 3551 ns 3552 ns 196844 >BM_StringFindMatch1/1 1337 ns 1337 ns 518042 >BM_StringFindMatch1/8 1338 ns 1338 ns 519431 >BM_StringFindMatch1/64 1340 ns 1341 ns 513974 >BM_StringFindMatch1/512 1355 ns 1356 ns 511857 >BM_StringFindMatch1/4096 1489 ns 1489 ns 465629 >BM_StringFindMatch1/32768 2203 ns 2204 ns 316044 >BM_StringFindMatch2/1 1337 ns 1338 ns 519057 >BM_StringFindMatch2/8 1337 ns 1337 ns 517745 >BM_StringFindMatch2/64 1347 ns 1347 ns 514813 >BM_StringFindMatch2/512 1362 ns 1363 ns 507408 >BM_StringFindMatch2/4096 1491 ns 1492 ns 465979 >BM_StringFindMatch2/32768 2204 ns 2205 ns 315887 >BM_StringRegression 109 ns 109 ns 6346500 > > > >-Aditya > > >From: Jonathan Wakely <jwakely@redhat.com> >Sent: Friday, January 6, 2017 2:37 PM >To: Aditya Kumar >Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com >Subject: Re: [PATCH] improve string find algorithm > >On 06/01/17 14:34 -0600, Aditya Kumar wrote: >>Thanks for the correction and updating the comments. > >Could you try the corrected patch on your benchmarks? > >The performance with the patch is worse for the first string::find >benchmark in our testsuite: > > s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy"; > f = "aabbaabbc"; > s.find(f); > >
Thanks, -Aditya From: Jonathan Wakely <jwakely@redhat.com> Sent: Monday, January 9, 2017 6:33 AM To: Aditya K Cc: Aditya Kumar; Sebastian Pop; libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] improve string find algorithm On 06/01/17 22:19 +0000, Aditya K wrote: >> Could you try the corrected patch on your benchmarks? > >For the test-case you gave there is a regression. > >Benchmark Time CPU Iterations >------------------------------------------------------------------- >Without the patch: BM_StringRegression 81 ns 81 ns 8503740 >With the patch: BM_StringRegression 109 ns 109 ns 6346500 > > >The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark can be found in https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp llvm-mirror/libcxx github.com Mirror of official libcxx git repository located at http://llvm.org/git/libcxx. Updated every five minutes. >However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark hiraditya/std-benchmark github.com std-benchmark - A benchmark for standard libraries I think the large improvements are worth the smaller regression, so I'm committing the patch I sent last week. >Following are the results: >---------------------------------------------------------------------------------------------------------------------------------- >Without the patch: > > >Run on (8 X 3403.85 MHz CPU s) >2017-01-06 15:47:30 >***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. >Benchmark Time CPU Iterations >------------------------------------------------------------------- >BM_StringFindNoMatch/10 6 ns 6 ns 114499418 >BM_StringFindNoMatch/64 34 ns 34 ns 20578576 >BM_StringFindNoMatch/512 222 ns 222 ns 3136787 >BM_StringFindNoMatch/4096 1728 ns 1729 ns 401913 >BM_StringFindNoMatch/32768 13679 ns 13684 ns 50680 >BM_StringFindNoMatch/131072 54570 ns 54591 ns 12506 >BM_StringFindAllMatch/1 4 ns 4 ns 180640260 >BM_StringFindAllMatch/8 6 ns 6 ns 119682220 >BM_StringFindAllMatch/64 7 ns 7 ns 97679753 >BM_StringFindAllMatch/512 19 ns 19 ns 36035174 >BM_StringFindAllMatch/4096 92 ns 92 ns 7516363 >BM_StringFindAllMatch/32768 849 ns 849 ns 809284 >BM_StringFindAllMatch/131072 3610 ns 3611 ns 193894 >BM_StringFindMatch1/1 27273 ns 27283 ns 25579 >BM_StringFindMatch1/8 27289 ns 27300 ns 25516 >BM_StringFindMatch1/64 27297 ns 27307 ns 25561 >BM_StringFindMatch1/512 27303 ns 27314 ns 25579 >BM_StringFindMatch1/4096 27488 ns 27499 ns 25366 >BM_StringFindMatch1/32768 28157 ns 28168 ns 24750 >BM_StringFindMatch2/1 27273 ns 27284 ns 25562 >BM_StringFindMatch2/8 27296 ns 27306 ns 25555 >BM_StringFindMatch2/64 27312 ns 27323 ns 25549 >BM_StringFindMatch2/512 27327 ns 27338 ns 25558 >BM_StringFindMatch2/4096 27513 ns 27524 ns 25349 >BM_StringFindMatch2/32768 28161 ns 28172 ns 24788 >BM_StringRegression 81 ns 81 ns 8503740 > > > >With the patch > > >Run on (8 X 1071.8 MHz CPU s) >2017-01-06 16:06:29 >***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. >Benchmark Time CPU Iterations >------------------------------------------------------------------- >BM_StringFindNoMatch/10 6 ns 6 ns 121302159 >BM_StringFindNoMatch/64 7 ns 7 ns 102003502 >BM_StringFindNoMatch/512 15 ns 15 ns 44820639 >BM_StringFindNoMatch/4096 77 ns 77 ns 9016958 >BM_StringFindNoMatch/32768 555 ns 555 ns 1227219 >BM_StringFindNoMatch/131072 2688 ns 2689 ns 259488 >BM_StringFindAllMatch/1 8 ns 8 ns 85893410 >BM_StringFindAllMatch/8 9 ns 9 ns 80811804 >BM_StringFindAllMatch/64 9 ns 9 ns 74237599 >BM_StringFindAllMatch/512 23 ns 23 ns 31163379 >BM_StringFindAllMatch/4096 94 ns 94 ns 7317385 >BM_StringFindAllMatch/32768 847 ns 848 ns 803901 >BM_StringFindAllMatch/131072 3551 ns 3552 ns 196844 >BM_StringFindMatch1/1 1337 ns 1337 ns 518042 >BM_StringFindMatch1/8 1338 ns 1338 ns 519431 >BM_StringFindMatch1/64 1340 ns 1341 ns 513974 >BM_StringFindMatch1/512 1355 ns 1356 ns 511857 >BM_StringFindMatch1/4096 1489 ns 1489 ns 465629 >BM_StringFindMatch1/32768 2203 ns 2204 ns 316044 >BM_StringFindMatch2/1 1337 ns 1338 ns 519057 >BM_StringFindMatch2/8 1337 ns 1337 ns 517745 >BM_StringFindMatch2/64 1347 ns 1347 ns 514813 >BM_StringFindMatch2/512 1362 ns 1363 ns 507408 >BM_StringFindMatch2/4096 1491 ns 1492 ns 465979 >BM_StringFindMatch2/32768 2204 ns 2205 ns 315887 >BM_StringRegression 109 ns 109 ns 6346500 > > > >-Aditya > > >From: Jonathan Wakely <jwakely@redhat.com> >Sent: Friday, January 6, 2017 2:37 PM >To: Aditya Kumar >Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com >Subject: Re: [PATCH] improve string find algorithm > >On 06/01/17 14:34 -0600, Aditya Kumar wrote: >>Thanks for the correction and updating the comments. > >Could you try the corrected patch on your benchmarks? > >The performance with the patch is worse for the first string::find >benchmark in our testsuite: > > s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy"; > f = "aabbaabbc"; > s.find(f); > >
commit 2050cea863ebf5b958199478717ed717a3fb3abe Author: Jonathan Wakely <jwakely@redhat.com> Date: Fri Jan 6 19:50:02 2017 +0000 PR66414 optimize std::string::find 2017-01-06 Jonathan Wakely <jwakely@redhat.com> Aditya Kumar <hiraditya@msn.com> PR libstdc++/66414 * include/bits/basic_string.tcc (basic_string::find(const CharT*, size_type, size_type)): Optimize. diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc index 8b2895b..41b7fa1 100644 --- a/libstdc++-v3/include/bits/basic_string.tcc +++ b/libstdc++-v3/include/bits/basic_string.tcc @@ -1190,18 +1190,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { __glibcxx_requires_string_len(__s, __n); const size_type __size = this->size(); - const _CharT* __data = _M_data(); if (__n == 0) return __pos <= __size ? __pos : npos; + if (__pos >= __size) + return npos; - if (__n <= __size) + const _CharT __elem0 = __s[0]; + const _CharT* const __data = data(); + const _CharT* __first = __data + __pos; + const _CharT* const __last = __data + __size; + size_type __len = __size - __pos; + + while (__len >= __n) { - for (; __pos <= __size - __n; ++__pos) - if (traits_type::eq(__data[__pos], __s[0]) - && traits_type::compare(__data + __pos + 1, - __s + 1, __n - 1) == 0) - return __pos; + // Find the first occurrence of __elem0: + __first = traits_type::find(__first, __len - __n + 1, __elem0); + if (!__first) + return npos; + // Compare the full strings from the first occurrence of __elem0. + // We already know that __first[0] == __s[0] but compare them again + // anyway because __s is probably aligned, which helps memcmp. + if (traits_type::compare(__first, __s, __n) == 0) + return __first - __data; + __len = __last - ++__first; } return npos; }