improve string find algorithm

Message ID 20170106202058.GH2966@redhat.com
State New
Headers show

Commit Message

Jonathan Wakely Jan. 6, 2017, 8:20 p.m.
On 06/01/17 08:42 -0600, Aditya Kumar wrote:
>Yes, we do.

>Sorry for the mistake, it happened because I first wrote this for

>libcxx (https://reviews.llvm.org/D27068) and while porting that line

>got missed.

>

>Thanks,

>-Aditya

>

>

>diff --git a/libstdc++-v3/include/bits/basic_string.tcc

>b/libstdc++-v3/include/bits/basic_string.tcc

>index df1e8dd..7942ee6 100644

>--- a/libstdc++-v3/include/bits/basic_string.tcc

>+++ b/libstdc++-v3/include/bits/basic_string.tcc

>@@ -1194,14 +1194,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

>       if (__n == 0)

>         return __pos <= __size ? __pos : npos;

>

>-      if (__n <= __size)

>-       {

>-         for (; __pos <= __size - __n; ++__pos)

>-           if (traits_type::eq(__data[__pos], __s[0])

>-               && traits_type::compare(__data + __pos + 1,

>-                                       __s + 1, __n - 1) == 0)

>-             return __pos;

>-       }

>+      if (__n > __size)

>+       return npos;

>+

>+      const _CharT __elem0 = __s[0];

>+      const _CharT* __first1 = __data;

>+      const _CharT* __first2 = __s;

>+      const _CharT* __last1 = __data + __size;

>+      ptrdiff_t __len2 = __n -  __pos;


What's this variable for?

>+      while (true) {

>+        ptrdiff_t __len1 = __last1 - __first1;

>+       if (__len1 < __len2)

>+         return npos;

>+

>+       // Find the first match.

>+       __first1 = traits_type::find(__first1, __len1 - __len2 + 1, __elem0);

>+       if (__first1 == 0)

>+         return npos;

>+       // Compare the full string when first match is found.

>+        if (traits_type::compare(__first1, __first2, __len2) == 0)

>+          return __first1 - __data;

>+

>+        ++__first1;

>+      }

>+

>       return npos;

>     }


This is still wrong, consider std::string("abcd").find("ab", 2)

This should return npos, but you return 0. The postcondition for the
function is that the return value is not less than the pos argument.

The attached patch should be a correct version of the improved
algorithm.

Comments

Aditya Kumar Jan. 6, 2017, 8:34 p.m. | #1
Thanks for the correction and updating the comments.
-Aditya

-----Original Message-----
From: Jonathan Wakely [mailto:jwakely@redhat.com] 

Sent: Friday, January 06, 2017 2:21 PM
To: Aditya Kumar
Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com
Subject: Re: [PATCH] improve string find algorithm

On 06/01/17 08:42 -0600, Aditya Kumar wrote:
>Yes, we do.

>Sorry for the mistake, it happened because I first wrote this for

>libcxx (https://reviews.llvm.org/D27068) and while porting that line

>got missed.

>

>Thanks,

>-Aditya

>

>

>diff --git a/libstdc++-v3/include/bits/basic_string.tcc

>b/libstdc++-v3/include/bits/basic_string.tcc

>index df1e8dd..7942ee6 100644

>--- a/libstdc++-v3/include/bits/basic_string.tcc

>+++ b/libstdc++-v3/include/bits/basic_string.tcc

>@@ -1194,14 +1194,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

>       if (__n == 0)

>         return __pos <= __size ? __pos : npos;

>

>-      if (__n <= __size)

>-       {

>-         for (; __pos <= __size - __n; ++__pos)

>-           if (traits_type::eq(__data[__pos], __s[0])

>-               && traits_type::compare(__data + __pos + 1,

>-                                       __s + 1, __n - 1) == 0)

>-             return __pos;

>-       }

>+      if (__n > __size)

>+       return npos;

>+

>+      const _CharT __elem0 = __s[0];

>+      const _CharT* __first1 = __data;

>+      const _CharT* __first2 = __s;

>+      const _CharT* __last1 = __data + __size;

>+      ptrdiff_t __len2 = __n -  __pos;


What's this variable for?

>+      while (true) {

>+        ptrdiff_t __len1 = __last1 - __first1;

>+       if (__len1 < __len2)

>+         return npos;

>+

>+       // Find the first match.

>+       __first1 = traits_type::find(__first1, __len1 - __len2 + 1,

__elem0);
>+       if (__first1 == 0)

>+         return npos;

>+       // Compare the full string when first match is found.

>+        if (traits_type::compare(__first1, __first2, __len2) == 0)

>+          return __first1 - __data;

>+

>+        ++__first1;

>+      }

>+

>       return npos;

>     }


This is still wrong, consider std::string("abcd").find("ab", 2)

This should return npos, but you return 0. The postcondition for the
function is that the return value is not less than the pos argument.

The attached patch should be a correct version of the improved
algorithm.
Jonathan Wakely Jan. 6, 2017, 8:37 p.m. | #2
On 06/01/17 14:34 -0600, Aditya Kumar wrote:
>Thanks for the correction and updating the comments.


Could you try the corrected patch on your benchmarks?

The performance with the patch is worse for the first string::find
benchmark in our testsuite:

  s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy";
  f = "aabbaabbc";
  s.find(f);
Aditya K Jan. 6, 2017, 10:19 p.m. | #3
> Could you try the corrected patch on your benchmarks?


For the test-case you gave there is a regression.

Benchmark                                                                    Time           CPU      Iterations
-------------------------------------------------------------------
Without the patch: BM_StringRegression                  81 ns         81 ns    8503740
With the patch: BM_StringRegression                      109 ns      109 ns    6346500


The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark  can be found in https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp
However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark


Following are the results:
----------------------------------------------------------------------------------------------------------------------------------
Without the patch:


Run on (8 X 3403.85 MHz CPU s)
2017-01-06 15:47:30
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                             Time           CPU Iterations
-------------------------------------------------------------------
BM_StringFindNoMatch/10               6 ns          6 ns  114499418
BM_StringFindNoMatch/64              34 ns         34 ns   20578576
BM_StringFindNoMatch/512            222 ns        222 ns    3136787
BM_StringFindNoMatch/4096          1728 ns       1729 ns     401913
BM_StringFindNoMatch/32768        13679 ns      13684 ns      50680
BM_StringFindNoMatch/131072       54570 ns      54591 ns      12506
BM_StringFindAllMatch/1               4 ns          4 ns  180640260
BM_StringFindAllMatch/8               6 ns          6 ns  119682220
BM_StringFindAllMatch/64              7 ns          7 ns   97679753
BM_StringFindAllMatch/512            19 ns         19 ns   36035174
BM_StringFindAllMatch/4096           92 ns         92 ns    7516363
BM_StringFindAllMatch/32768         849 ns        849 ns     809284
BM_StringFindAllMatch/131072       3610 ns       3611 ns     193894
BM_StringFindMatch1/1             27273 ns      27283 ns      25579
BM_StringFindMatch1/8             27289 ns      27300 ns      25516
BM_StringFindMatch1/64            27297 ns      27307 ns      25561
BM_StringFindMatch1/512           27303 ns      27314 ns      25579
BM_StringFindMatch1/4096          27488 ns      27499 ns      25366
BM_StringFindMatch1/32768         28157 ns      28168 ns      24750
BM_StringFindMatch2/1             27273 ns      27284 ns      25562
BM_StringFindMatch2/8             27296 ns      27306 ns      25555
BM_StringFindMatch2/64            27312 ns      27323 ns      25549
BM_StringFindMatch2/512           27327 ns      27338 ns      25558
BM_StringFindMatch2/4096          27513 ns      27524 ns      25349
BM_StringFindMatch2/32768         28161 ns      28172 ns      24788
BM_StringRegression                  81 ns         81 ns    8503740



With the patch


Run on (8 X 1071.8 MHz CPU s)
2017-01-06 16:06:29
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                             Time           CPU Iterations
-------------------------------------------------------------------
BM_StringFindNoMatch/10               6 ns          6 ns  121302159
BM_StringFindNoMatch/64               7 ns          7 ns  102003502
BM_StringFindNoMatch/512             15 ns         15 ns   44820639
BM_StringFindNoMatch/4096            77 ns         77 ns    9016958
BM_StringFindNoMatch/32768          555 ns        555 ns    1227219
BM_StringFindNoMatch/131072        2688 ns       2689 ns     259488
BM_StringFindAllMatch/1               8 ns          8 ns   85893410
BM_StringFindAllMatch/8               9 ns          9 ns   80811804
BM_StringFindAllMatch/64              9 ns          9 ns   74237599
BM_StringFindAllMatch/512            23 ns         23 ns   31163379
BM_StringFindAllMatch/4096           94 ns         94 ns    7317385
BM_StringFindAllMatch/32768         847 ns        848 ns     803901
BM_StringFindAllMatch/131072       3551 ns       3552 ns     196844
BM_StringFindMatch1/1              1337 ns       1337 ns     518042
BM_StringFindMatch1/8              1338 ns       1338 ns     519431
BM_StringFindMatch1/64             1340 ns       1341 ns     513974
BM_StringFindMatch1/512            1355 ns       1356 ns     511857
BM_StringFindMatch1/4096           1489 ns       1489 ns     465629
BM_StringFindMatch1/32768          2203 ns       2204 ns     316044
BM_StringFindMatch2/1              1337 ns       1338 ns     519057
BM_StringFindMatch2/8              1337 ns       1337 ns     517745
BM_StringFindMatch2/64             1347 ns       1347 ns     514813
BM_StringFindMatch2/512            1362 ns       1363 ns     507408
BM_StringFindMatch2/4096           1491 ns       1492 ns     465979
BM_StringFindMatch2/32768          2204 ns       2205 ns     315887
BM_StringRegression                 109 ns        109 ns    6346500



-Aditya


From: Jonathan Wakely <jwakely@redhat.com>

Sent: Friday, January 6, 2017 2:37 PM
To: Aditya Kumar
Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com
Subject: Re: [PATCH] improve string find algorithm
    
On 06/01/17 14:34 -0600, Aditya Kumar wrote:
>Thanks for the correction and updating the comments.


Could you try the corrected patch on your benchmarks?

The performance with the patch is worse for the first string::find
benchmark in our testsuite:

  s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy";
  f = "aabbaabbc";
  s.find(f);
Jonathan Wakely Jan. 9, 2017, 12:33 p.m. | #4
On 06/01/17 22:19 +0000, Aditya K wrote:
>> Could you try the corrected patch on your benchmarks?

>

>For the test-case you gave there is a regression.

>

>Benchmark                                                                    Time           CPU      Iterations

>-------------------------------------------------------------------

>Without the patch: BM_StringRegression                  81 ns         81 ns    8503740

>With the patch: BM_StringRegression                      109 ns      109 ns    6346500

>

>

>The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark  can be found in https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp

>However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark


I think the large improvements are worth the smaller regression, so
I'm committing the patch I sent last week.



>Following are the results:

>----------------------------------------------------------------------------------------------------------------------------------

>Without the patch:

>

>

>Run on (8 X 3403.85 MHz CPU s)

>2017-01-06 15:47:30

>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

>Benchmark                             Time           CPU Iterations

>-------------------------------------------------------------------

>BM_StringFindNoMatch/10               6 ns          6 ns  114499418

>BM_StringFindNoMatch/64              34 ns         34 ns   20578576

>BM_StringFindNoMatch/512            222 ns        222 ns    3136787

>BM_StringFindNoMatch/4096          1728 ns       1729 ns     401913

>BM_StringFindNoMatch/32768        13679 ns      13684 ns      50680

>BM_StringFindNoMatch/131072       54570 ns      54591 ns      12506

>BM_StringFindAllMatch/1               4 ns          4 ns  180640260

>BM_StringFindAllMatch/8               6 ns          6 ns  119682220

>BM_StringFindAllMatch/64              7 ns          7 ns   97679753

>BM_StringFindAllMatch/512            19 ns         19 ns   36035174

>BM_StringFindAllMatch/4096           92 ns         92 ns    7516363

>BM_StringFindAllMatch/32768         849 ns        849 ns     809284

>BM_StringFindAllMatch/131072       3610 ns       3611 ns     193894

>BM_StringFindMatch1/1             27273 ns      27283 ns      25579

>BM_StringFindMatch1/8             27289 ns      27300 ns      25516

>BM_StringFindMatch1/64            27297 ns      27307 ns      25561

>BM_StringFindMatch1/512           27303 ns      27314 ns      25579

>BM_StringFindMatch1/4096          27488 ns      27499 ns      25366

>BM_StringFindMatch1/32768         28157 ns      28168 ns      24750

>BM_StringFindMatch2/1             27273 ns      27284 ns      25562

>BM_StringFindMatch2/8             27296 ns      27306 ns      25555

>BM_StringFindMatch2/64            27312 ns      27323 ns      25549

>BM_StringFindMatch2/512           27327 ns      27338 ns      25558

>BM_StringFindMatch2/4096          27513 ns      27524 ns      25349

>BM_StringFindMatch2/32768         28161 ns      28172 ns      24788

>BM_StringRegression                  81 ns         81 ns    8503740

>

>

>

>With the patch

>

>

>Run on (8 X 1071.8 MHz CPU s)

>2017-01-06 16:06:29

>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

>Benchmark                             Time           CPU Iterations

>-------------------------------------------------------------------

>BM_StringFindNoMatch/10               6 ns          6 ns  121302159

>BM_StringFindNoMatch/64               7 ns          7 ns  102003502

>BM_StringFindNoMatch/512             15 ns         15 ns   44820639

>BM_StringFindNoMatch/4096            77 ns         77 ns    9016958

>BM_StringFindNoMatch/32768          555 ns        555 ns    1227219

>BM_StringFindNoMatch/131072        2688 ns       2689 ns     259488

>BM_StringFindAllMatch/1               8 ns          8 ns   85893410

>BM_StringFindAllMatch/8               9 ns          9 ns   80811804

>BM_StringFindAllMatch/64              9 ns          9 ns   74237599

>BM_StringFindAllMatch/512            23 ns         23 ns   31163379

>BM_StringFindAllMatch/4096           94 ns         94 ns    7317385

>BM_StringFindAllMatch/32768         847 ns        848 ns     803901

>BM_StringFindAllMatch/131072       3551 ns       3552 ns     196844

>BM_StringFindMatch1/1              1337 ns       1337 ns     518042

>BM_StringFindMatch1/8              1338 ns       1338 ns     519431

>BM_StringFindMatch1/64             1340 ns       1341 ns     513974

>BM_StringFindMatch1/512            1355 ns       1356 ns     511857

>BM_StringFindMatch1/4096           1489 ns       1489 ns     465629

>BM_StringFindMatch1/32768          2203 ns       2204 ns     316044

>BM_StringFindMatch2/1              1337 ns       1338 ns     519057

>BM_StringFindMatch2/8              1337 ns       1337 ns     517745

>BM_StringFindMatch2/64             1347 ns       1347 ns     514813

>BM_StringFindMatch2/512            1362 ns       1363 ns     507408

>BM_StringFindMatch2/4096           1491 ns       1492 ns     465979

>BM_StringFindMatch2/32768          2204 ns       2205 ns     315887

>BM_StringRegression                 109 ns        109 ns    6346500

>

>

>

>-Aditya

>

>

>From: Jonathan Wakely <jwakely@redhat.com>

>Sent: Friday, January 6, 2017 2:37 PM

>To: Aditya Kumar

>Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com

>Subject: Re: [PATCH] improve string find algorithm


>On 06/01/17 14:34 -0600, Aditya Kumar wrote:

>>Thanks for the correction and updating the comments.

>

>Could you try the corrected patch on your benchmarks?

>

>The performance with the patch is worse for the first string::find

>benchmark in our testsuite:

>

>  s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy";

>  f = "aabbaabbc";

>  s.find(f);

>

>
Aditya K Jan. 9, 2017, 4:52 p.m. | #5
Thanks,
-Aditya




From: Jonathan Wakely <jwakely@redhat.com>

Sent: Monday, January 9, 2017 6:33 AM
To: Aditya K
Cc: Aditya Kumar; Sebastian Pop; libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] improve string find algorithm
    
On 06/01/17 22:19 +0000, Aditya K wrote:
>> Could you try the corrected patch on your benchmarks?

>

>For the test-case you gave there is a regression.

>

>Benchmark                                                                    Time           CPU      Iterations

>-------------------------------------------------------------------

>Without the patch: BM_StringRegression                  81 ns         81 ns    8503740

>With the patch: BM_StringRegression                      109 ns      109 ns    6346500

>

>

>The real advantage is when there are fewer matches as seen in BM_StringFindNoMatch. The code for the benchmark  can be found in  https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp




llvm-mirror/libcxx
github.com
Mirror of official libcxx git repository located at http://llvm.org/git/libcxx. Updated every five minutes.

>However, I have written an independent std-benchmark that can be used just by exporting the CC, CXX, LD_LIBRARY_FLAGS: https://github.com/hiraditya/std-benchmark




hiraditya/std-benchmark
github.com
std-benchmark - A benchmark for standard libraries


I think the large improvements are worth the smaller regression, so
I'm committing the patch I sent last week.



>Following are the results:

>----------------------------------------------------------------------------------------------------------------------------------

>Without the patch:

>

>

>Run on (8 X 3403.85 MHz CPU s)

>2017-01-06 15:47:30

>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

>Benchmark                             Time           CPU Iterations

>-------------------------------------------------------------------

>BM_StringFindNoMatch/10               6 ns          6 ns  114499418

>BM_StringFindNoMatch/64              34 ns         34 ns   20578576

>BM_StringFindNoMatch/512            222 ns        222 ns    3136787

>BM_StringFindNoMatch/4096          1728 ns       1729 ns     401913

>BM_StringFindNoMatch/32768        13679 ns      13684 ns      50680

>BM_StringFindNoMatch/131072       54570 ns      54591 ns      12506

>BM_StringFindAllMatch/1               4 ns          4 ns  180640260

>BM_StringFindAllMatch/8               6 ns          6 ns  119682220

>BM_StringFindAllMatch/64              7 ns          7 ns   97679753

>BM_StringFindAllMatch/512            19 ns         19 ns   36035174

>BM_StringFindAllMatch/4096           92 ns         92 ns    7516363

>BM_StringFindAllMatch/32768         849 ns        849 ns     809284

>BM_StringFindAllMatch/131072       3610 ns       3611 ns     193894

>BM_StringFindMatch1/1             27273 ns      27283 ns      25579

>BM_StringFindMatch1/8             27289 ns      27300 ns      25516

>BM_StringFindMatch1/64            27297 ns      27307 ns      25561

>BM_StringFindMatch1/512           27303 ns      27314 ns      25579

>BM_StringFindMatch1/4096          27488 ns      27499 ns      25366

>BM_StringFindMatch1/32768         28157 ns      28168 ns      24750

>BM_StringFindMatch2/1             27273 ns      27284 ns      25562

>BM_StringFindMatch2/8             27296 ns      27306 ns      25555

>BM_StringFindMatch2/64            27312 ns      27323 ns      25549

>BM_StringFindMatch2/512           27327 ns      27338 ns      25558

>BM_StringFindMatch2/4096          27513 ns      27524 ns      25349

>BM_StringFindMatch2/32768         28161 ns      28172 ns      24788

>BM_StringRegression                  81 ns         81 ns    8503740

>

>

>

>With the patch

>

>

>Run on (8 X 1071.8 MHz CPU s)

>2017-01-06 16:06:29

>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

>Benchmark                             Time           CPU Iterations

>-------------------------------------------------------------------

>BM_StringFindNoMatch/10               6 ns          6 ns  121302159

>BM_StringFindNoMatch/64               7 ns          7 ns  102003502

>BM_StringFindNoMatch/512             15 ns         15 ns   44820639

>BM_StringFindNoMatch/4096            77 ns         77 ns    9016958

>BM_StringFindNoMatch/32768          555 ns        555 ns    1227219

>BM_StringFindNoMatch/131072        2688 ns       2689 ns     259488

>BM_StringFindAllMatch/1               8 ns          8 ns   85893410

>BM_StringFindAllMatch/8               9 ns          9 ns   80811804

>BM_StringFindAllMatch/64              9 ns          9 ns   74237599

>BM_StringFindAllMatch/512            23 ns         23 ns   31163379

>BM_StringFindAllMatch/4096           94 ns         94 ns    7317385

>BM_StringFindAllMatch/32768         847 ns        848 ns     803901

>BM_StringFindAllMatch/131072       3551 ns       3552 ns     196844

>BM_StringFindMatch1/1              1337 ns       1337 ns     518042

>BM_StringFindMatch1/8              1338 ns       1338 ns     519431

>BM_StringFindMatch1/64             1340 ns       1341 ns     513974

>BM_StringFindMatch1/512            1355 ns       1356 ns     511857

>BM_StringFindMatch1/4096           1489 ns       1489 ns     465629

>BM_StringFindMatch1/32768          2203 ns       2204 ns     316044

>BM_StringFindMatch2/1              1337 ns       1338 ns     519057

>BM_StringFindMatch2/8              1337 ns       1337 ns     517745

>BM_StringFindMatch2/64             1347 ns       1347 ns     514813

>BM_StringFindMatch2/512            1362 ns       1363 ns     507408

>BM_StringFindMatch2/4096           1491 ns       1492 ns     465979

>BM_StringFindMatch2/32768          2204 ns       2205 ns     315887

>BM_StringRegression                 109 ns        109 ns    6346500

>

>

>

>-Aditya

>

>

>From: Jonathan Wakely <jwakely@redhat.com>

>Sent: Friday, January 6, 2017 2:37 PM

>To: Aditya Kumar

>Cc: libstdc++@gcc.gnu.org; gcc-patches@gcc.gnu.org; hiraditya@msn.com

>Subject: Re: [PATCH] improve string find algorithm


>On 06/01/17 14:34 -0600, Aditya Kumar wrote:

>>Thanks for the correction and updating the comments.

>

>Could you try the corrected patch on your benchmarks?

>

>The performance with the patch is worse for the first string::find

>benchmark in our testsuite:

>

>  s = "aabbaabbaaxd adbffdadgaxaabbbddhatyaaaabbbaabbaabbcsy";

>  f = "aabbaabbc";

>  s.find(f);

>

>

Patch hide | download patch | download mbox

commit 2050cea863ebf5b958199478717ed717a3fb3abe
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Jan 6 19:50:02 2017 +0000

    PR66414 optimize std::string::find
    
    2017-01-06  Jonathan Wakely  <jwakely@redhat.com>
    	    Aditya Kumar  <hiraditya@msn.com>
    
    	PR libstdc++/66414
    	* include/bits/basic_string.tcc
    	(basic_string::find(const CharT*, size_type, size_type)): Optimize.

diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index 8b2895b..41b7fa1 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -1190,18 +1190,30 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       __glibcxx_requires_string_len(__s, __n);
       const size_type __size = this->size();
-      const _CharT* __data = _M_data();
 
       if (__n == 0)
 	return __pos <= __size ? __pos : npos;
+      if (__pos >= __size)
+	return npos;
 
-      if (__n <= __size)
+      const _CharT __elem0 = __s[0];
+      const _CharT* const __data = data();
+      const _CharT* __first = __data + __pos;
+      const _CharT* const __last = __data + __size;
+      size_type __len = __size - __pos;
+
+      while (__len >= __n)
 	{
-	  for (; __pos <= __size - __n; ++__pos)
-	    if (traits_type::eq(__data[__pos], __s[0])
-		&& traits_type::compare(__data + __pos + 1,
-					__s + 1, __n - 1) == 0)
-	      return __pos;
+	  // Find the first occurrence of __elem0:
+	  __first = traits_type::find(__first, __len - __n + 1, __elem0);
+	  if (!__first)
+	    return npos;
+	  // Compare the full strings from the first occurrence of __elem0.
+	  // We already know that __first[0] == __s[0] but compare them again
+	  // anyway because __s is probably aligned, which helps memcmp.
+	  if (traits_type::compare(__first, __s, __n) == 0)
+	    return __first - __data;
+	  __len = __last - ++__first;
 	}
       return npos;
     }