[v2,1/3] syscalls/tgkill01: add new test

Message ID 1552457573-1354-2-git-send-email-sumit.garg@linaro.org
State New
Headers show
Series
  • syscalls: add tgkill test-cases
Related show

Commit Message

Sumit Garg March 13, 2019, 6:12 a.m.
From: Greg Hackmann <ghackmann@google.com>

tgkill() delivers a signal to a specific thread.  Test this by
installing a SIGUSR1 handler which records the current pthread ID.
Start a number of threads in parallel, then one-by-one call tgkill(...,
tid, SIGUSR1) and check that the expected pthread ID was recorded.

Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Li Wang <liwang@redhat.com>
---
 runtest/syscalls                            |   2 +
 testcases/kernel/syscalls/tgkill/.gitignore |   1 +
 testcases/kernel/syscalls/tgkill/Makefile   |  10 ++
 testcases/kernel/syscalls/tgkill/tgkill.h   |  22 +++++
 testcases/kernel/syscalls/tgkill/tgkill01.c | 141 ++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+)
 create mode 100644 testcases/kernel/syscalls/tgkill/.gitignore
 create mode 100644 testcases/kernel/syscalls/tgkill/Makefile
 create mode 100644 testcases/kernel/syscalls/tgkill/tgkill.h
 create mode 100644 testcases/kernel/syscalls/tgkill/tgkill01.c

Comments

Cyril Hrubis March 14, 2019, 12:22 p.m. | #1
Hi!
The test itself looks good, my only concern is actually that checkpoints
cannot be used for the keeping the thread asleep during the test.
However I can easily add one function to the futex library:

TST_CHECKPOINT_SLEEP(id)

That would cause the thread to wait on the checkpoint until:

* we were woken up
* we timeouted

Which would basically loop on tst_checkpoint_wait() and retry in case of
EINTR.

Maybe it would be a good idea to retry on EINTR in the
TST_CHECKPOINT_WAIT(), then we could easily use that one here as well.
I'm not sure though if there are tests that depends on checkpoints being
interrupted by signals though, I would have to check.

For the second part we already have a function to wake all threads
waiting on checkpoint, but we have to specify exact number of threads to
wait for, which is there in order to avoid race coditions (i.e. thread
was not sleeping yet at the time we tried to wake it). So we would have
to count the number of threads we want to wake before the call to the
TST_CHECKPOINT_WAKE2().
Sumit Garg March 14, 2019, 1:25 p.m. | #2
On Thu, 14 Mar 2019 at 17:53, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> The test itself looks good, my only concern is actually that checkpoints
> cannot be used for the keeping the thread asleep during the test.
> However I can easily add one function to the futex library:
>
> TST_CHECKPOINT_SLEEP(id)
>
> That would cause the thread to wait on the checkpoint until:
>
> * we were woken up
> * we timeouted
>
> Which would basically loop on tst_checkpoint_wait() and retry in case of
> EINTR.
>

I am not sure how we would manage actual "msec_timeout" in case we get
EINTR and need to retry again as we may need to take care of elapsed
time till we receive asynchronous signal.

-Sumit

> Maybe it would be a good idea to retry on EINTR in the
> TST_CHECKPOINT_WAIT(), then we could easily use that one here as well.
> I'm not sure though if there are tests that depends on checkpoints being
> interrupted by signals though, I would have to check.
>
> For the second part we already have a function to wake all threads
> waiting on checkpoint, but we have to specify exact number of threads to
> wait for, which is there in order to avoid race coditions (i.e. thread
> was not sleeping yet at the time we tried to wake it). So we would have
> to count the number of threads we want to wake before the call to the
> TST_CHECKPOINT_WAKE2().
>
> --
> Cyril Hrubis
> chrubis@suse.cz
Cyril Hrubis March 14, 2019, 1:58 p.m. | #3
Hi!
> I am not sure how we would manage actual "msec_timeout" in case we get
> EINTR and need to retry again as we may need to take care of elapsed
> time till we receive asynchronous signal.

I would have just restarted the timeout after we got signal, the worst case
that can happen is that in an unlikely case we will send a signals fast enough
so that the checkpoint will never timeout. But even then the test library will
timeout and would kill the process anyways.

Another option is to switch checkpoints so that they use absolute timeout and
pass clock_gettime() + msec_timeout as timeout.

I would go for something as simple as:

diff --git a/lib/tst_checkpoint.c b/lib/tst_checkpoint.c
index 5455d0378..5e5b11496 100644
--- a/lib/tst_checkpoint.c
+++ b/lib/tst_checkpoint.c
@@ -85,6 +85,7 @@ void tst_checkpoint_init(const char *file, const int lineno,
 int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
 {
        struct timespec timeout;
+       int ret;
 
        if (id >= tst_max_futexes) {
                errno = EOVERFLOW;
@@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
        timeout.tv_sec = msec_timeout/1000;
        timeout.tv_nsec = (msec_timeout%1000) * 1000000;
 
-       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
-                      tst_futexes[id], &timeout);
+       do {
+               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
+                             tst_futexes[id], &timeout);
+       } while (ret == -1 && errno == EINTR);
+
+       return ret;
 }
Li Wang March 15, 2019, 7:45 a.m. | #4
On Thu, Mar 14, 2019 at 9:59 PM Cyril Hrubis <chrubis@suse.cz> wrote:

> Hi!

> > I am not sure how we would manage actual "msec_timeout" in case we get

> > EINTR and need to retry again as we may need to take care of elapsed

> > time till we receive asynchronous signal.

>

> I would have just restarted the timeout after we got signal, the worst case

> that can happen is that in an unlikely case we will send a signals fast

> enough

>


Maybe we can print something useful there at least for friendly debugging
if that unlikely case happens.

> so that the checkpoint will never timeout. But even then the test library

> will

> timeout and would kill the process anyways.

>

> Another option is to switch checkpoints so that they use absolute timeout

> and

> pass clock_gettime() + msec_timeout as timeout.

>

> I would go for something as simple as:

>

> diff --git a/lib/tst_checkpoint.c b/lib/tst_checkpoint.c

> index 5455d0378..5e5b11496 100644

> --- a/lib/tst_checkpoint.c

> +++ b/lib/tst_checkpoint.c

> @@ -85,6 +85,7 @@ void tst_checkpoint_init(const char *file, const int

> lineno,

>  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)

>  {

>         struct timespec timeout;

> +       int ret;

>

>         if (id >= tst_max_futexes) {

>                 errno = EOVERFLOW;

> @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int

> msec_timeout)

>         timeout.tv_sec = msec_timeout/1000;

>         timeout.tv_nsec = (msec_timeout%1000) * 1000000;

>

> -       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,

> -                      tst_futexes[id], &timeout);

> +       do {

> +               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,

> +                             tst_futexes[id], &timeout);


    if (ret == -1 && errno == EINTR)
        tst_res(TWARN | TERRNO, "FUTEX_WAIT operation was interrupted by a
signal, retry again");

+       } while (ret == -1 && errno == EINTR);
> +

> +       return ret;

>  }

>

> --

> Cyril Hrubis

> chrubis@suse.cz

>



-- 
Regards,
Li Wang
<div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"></div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 14, 2019 at 9:59 PM Cyril Hrubis &lt;<a href="mailto:chrubis@suse.cz">chrubis@suse.cz</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi!<br>
&gt; I am not sure how we would manage actual &quot;msec_timeout&quot; in case we get<br>
&gt; EINTR and need to retry again as we may need to take care of elapsed<br>
&gt; time till we receive asynchronous signal.<br>
<br>
I would have just restarted the timeout after we got signal, the worst case<br>
that can happen is that in an unlikely case we will send a signals fast enough<br></blockquote><div><br></div><div class="gmail_default" style="font-size:small">Maybe we can print something useful there at least for friendly debugging if that unlikely case happens.</div><div class="gmail_default" style="font-size:small"></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
so that the checkpoint will never timeout. But even then the test library will<br>
timeout and would kill the process anyways.<br>
<br>
Another option is to switch checkpoints so that they use absolute timeout and<br>
pass clock_gettime() + msec_timeout as timeout.<br>
<br>
I would go for something as simple as:<br>
<br>
diff --git a/lib/tst_checkpoint.c b/lib/tst_checkpoint.c<br>
index 5455d0378..5e5b11496 100644<br>
--- a/lib/tst_checkpoint.c<br>
+++ b/lib/tst_checkpoint.c<br>
@@ -85,6 +85,7 @@ void tst_checkpoint_init(const char *file, const int lineno,<br>
 int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)<br>
 {<br>
        struct timespec timeout;<br>
+       int ret;<br>
<br>
        if (id &gt;= tst_max_futexes) {<br>
                errno = EOVERFLOW;<br>
@@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)<br>
        timeout.tv_sec = msec_timeout/1000;<br>
        timeout.tv_nsec = (msec_timeout%1000) * 1000000;<br>
<br>
-       return syscall(SYS_futex, &amp;tst_futexes[id], FUTEX_WAIT,<br>
-                      tst_futexes[id], &amp;timeout);<br>
+       do {<br>
+               ret = <span class="gmail_default" style="font-size:small"></span>syscall(SYS_futex, &amp;tst_futexes[id], FUTEX_WAIT,<br>
+                             tst_futexes[id], &amp;timeout);</blockquote><div><div class="gmail_default" style="font-size:small"></div></div><div class="gmail_default" style="font-size:small">    if (ret == -1 &amp;&amp; errno == EINTR)</div><div><div class="gmail_default">        tst_res(TWARN | TERRNO, &quot;FUTEX_WAIT operation was interrupted by a signal, retry again&quot;);</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+       } while (ret == -1 &amp;&amp; errno == EINTR);<br>
+<br>
+       return ret;<br>
 }<br>
<br>
-- <br>
Cyril Hrubis<br>
<a href="mailto:chrubis@suse.cz" target="_blank">chrubis@suse.cz</a><br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div></div></div></div></div></div></div></div></div>
Sumit Garg March 15, 2019, 9:15 a.m. | #5
On Thu, 14 Mar 2019 at 19:29, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > I am not sure how we would manage actual "msec_timeout" in case we get
> > EINTR and need to retry again as we may need to take care of elapsed
> > time till we receive asynchronous signal.
>
> I would have just restarted the timeout after we got signal, the worst case
> that can happen is that in an unlikely case we will send a signals fast enough
> so that the checkpoint will never timeout. But even then the test library will
> timeout and would kill the process anyways.
>
> Another option is to switch checkpoints so that they use absolute timeout and
> pass clock_gettime() + msec_timeout as timeout.
>
> I would go for something as simple as:
>
> diff --git a/lib/tst_checkpoint.c b/lib/tst_checkpoint.c
> index 5455d0378..5e5b11496 100644
> --- a/lib/tst_checkpoint.c
> +++ b/lib/tst_checkpoint.c
> @@ -85,6 +85,7 @@ void tst_checkpoint_init(const char *file, const int lineno,
>  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
>  {
>         struct timespec timeout;
> +       int ret;
>
>         if (id >= tst_max_futexes) {
>                 errno = EOVERFLOW;
> @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
>         timeout.tv_sec = msec_timeout/1000;
>         timeout.tv_nsec = (msec_timeout%1000) * 1000000;
>
> -       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
> -                      tst_futexes[id], &timeout);
> +       do {
> +               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
> +                             tst_futexes[id], &timeout);
> +       } while (ret == -1 && errno == EINTR);
> +
> +       return ret;
>  }
>

Wouldn't this loop be more appropriate in
"tst_safe_checkpoint_wait()"? As at later stage we may have tests that
depends on checkpoints being interrupted by signals and could directly
use "tst_checkpoint_wait()".

-Sumit

> --
> Cyril Hrubis
> chrubis@suse.cz
Sumit Garg March 15, 2019, 9:22 a.m. | #6
On Fri, 15 Mar 2019 at 13:15, Li Wang <liwang@redhat.com> wrote:
>
>
>
> On Thu, Mar 14, 2019 at 9:59 PM Cyril Hrubis <chrubis@suse.cz> wrote:
>>
>> Hi!
>> > I am not sure how we would manage actual "msec_timeout" in case we get
>> > EINTR and need to retry again as we may need to take care of elapsed
>> > time till we receive asynchronous signal.
>>
>> I would have just restarted the timeout after we got signal, the worst case
>> that can happen is that in an unlikely case we will send a signals fast enough
>
>
> Maybe we can print something useful there at least for friendly debugging if that unlikely case happens.
>>
>> so that the checkpoint will never timeout. But even then the test library will
>> timeout and would kill the process anyways.
>>
>> Another option is to switch checkpoints so that they use absolute timeout and
>> pass clock_gettime() + msec_timeout as timeout.
>>
>> I would go for something as simple as:
>>
>> diff --git a/lib/tst_checkpoint.c b/lib/tst_checkpoint.c
>> index 5455d0378..5e5b11496 100644
>> --- a/lib/tst_checkpoint.c
>> +++ b/lib/tst_checkpoint.c
>> @@ -85,6 +85,7 @@ void tst_checkpoint_init(const char *file, const int lineno,
>>  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
>>  {
>>         struct timespec timeout;
>> +       int ret;
>>
>>         if (id >= tst_max_futexes) {
>>                 errno = EOVERFLOW;
>> @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
>>         timeout.tv_sec = msec_timeout/1000;
>>         timeout.tv_nsec = (msec_timeout%1000) * 1000000;
>>
>> -       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
>> -                      tst_futexes[id], &timeout);
>> +       do {
>> +               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
>> +                             tst_futexes[id], &timeout);
>
>     if (ret == -1 && errno == EINTR)
>         tst_res(TWARN | TERRNO, "FUTEX_WAIT operation was interrupted by a signal, retry again");
>

I am not sure if this warning message is desired for test-cases which
needs to wait on checkpoints irrespective of signals like this
tgkill01 test-case.

-Sumit

>> +       } while (ret == -1 && errno == EINTR);
>> +
>> +       return ret;
>>  }
>>
>> --
>> Cyril Hrubis
>> chrubis@suse.cz
>
>
>
> --
> Regards,
> Li Wang
Cyril Hrubis March 15, 2019, 10:08 a.m. | #7
Hi!
> >>  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
> >>  {
> >>         struct timespec timeout;
> >> +       int ret;
> >>
> >>         if (id >= tst_max_futexes) {
> >>                 errno = EOVERFLOW;
> >> @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)
> >>         timeout.tv_sec = msec_timeout/1000;
> >>         timeout.tv_nsec = (msec_timeout%1000) * 1000000;
> >>
> >> -       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
> >> -                      tst_futexes[id], &timeout);
> >> +       do {
> >> +               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,
> >> +                             tst_futexes[id], &timeout);
> >
> >     if (ret == -1 && errno == EINTR)
> >         tst_res(TWARN | TERRNO, "FUTEX_WAIT operation was interrupted by a signal, retry again");
> >
> 
> I am not sure if this warning message is desired for test-cases which
> needs to wait on checkpoints irrespective of signals like this
> tgkill01 test-case.

Agreed, it's not an error condition, it's just a coincidence that most
of the tests does not get signals when they sleep on futex, otherwise
thing would crash and burn. So I would argue that retrying on EINTR is
actually a bug fix rather than anything else.
Cyril Hrubis March 15, 2019, 10:18 a.m. | #8
Hi!
> Wouldn't this loop be more appropriate in
> "tst_safe_checkpoint_wait()"? As at later stage we may have tests that
> depends on checkpoints being interrupted by signals and could directly
> use "tst_checkpoint_wait()".

The tst_checkpoint_wait() has a single use in the source tree and that
is testcases/lib/tst_checkpoint.c which is binary wrapper around
checkpoints so that we can use them in shell scripts as well, which is
pretty cool btw. And I think that we should retry on EINTR there as
well.

Also there does not seem to be test relying on checkpoints being
interrupted by signals and I would like to avoid this pattern ideally as
asynchronous events such as signals interrupting functions is something
rather counter intuitive.
Li Wang March 15, 2019, 10:23 a.m. | #9
On Fri, Mar 15, 2019 at 6:09 PM Cyril Hrubis <chrubis@suse.cz> wrote:

> Hi!

> > >>  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)

> > >>  {

> > >>         struct timespec timeout;

> > >> +       int ret;

> > >>

> > >>         if (id >= tst_max_futexes) {

> > >>                 errno = EOVERFLOW;

> > >> @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned

> int msec_timeout)

> > >>         timeout.tv_sec = msec_timeout/1000;

> > >>         timeout.tv_nsec = (msec_timeout%1000) * 1000000;

> > >>

> > >> -       return syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,

> > >> -                      tst_futexes[id], &timeout);

> > >> +       do {

> > >> +               ret = syscall(SYS_futex, &tst_futexes[id], FUTEX_WAIT,

> > >> +                             tst_futexes[id], &timeout);

> > >

> > >     if (ret == -1 && errno == EINTR)

> > >         tst_res(TWARN | TERRNO, "FUTEX_WAIT operation was interrupted

> by a signal, retry again");

> > >

> >

> > I am not sure if this warning message is desired for test-cases which

> > needs to wait on checkpoints irrespective of signals like this

> > tgkill01 test-case.

>

> Agreed, it's not an error condition, it's just a coincidence that most

> of the tests does not get signals when they sleep on futex, otherwise

> thing would crash and burn. So I would argue that retrying on EINTR is

> actually a bug fix rather than anything else.

>


Okay, here I'm not insist to print the warning. But it's only use for hint
on that worst situation which you were mentioned. If the checkpoint got
signal leads to never timeout and test eventually killed by test lib. That
would hard to know what happened at that moment. My concern is the
situation is hard to reproduce later so just want to print more useful
messeges:).

-- 
Regards,
Li Wang
<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 15, 2019 at 6:09 PM Cyril Hrubis &lt;<a href="mailto:chrubis@suse.cz">chrubis@suse.cz</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi!<br>
&gt; &gt;&gt;  int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)<br>
&gt; &gt;&gt;  {<br>
&gt; &gt;&gt;         struct timespec timeout;<br>
&gt; &gt;&gt; +       int ret;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;         if (id &gt;= tst_max_futexes) {<br>
&gt; &gt;&gt;                 errno = EOVERFLOW;<br>
&gt; &gt;&gt; @@ -94,8 +95,12 @@ int tst_checkpoint_wait(unsigned int id, unsigned int msec_timeout)<br>
&gt; &gt;&gt;         timeout.tv_sec = msec_timeout/1000;<br>
&gt; &gt;&gt;         timeout.tv_nsec = (msec_timeout%1000) * 1000000;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; -       return syscall(SYS_futex, &amp;tst_futexes[id], FUTEX_WAIT,<br>
&gt; &gt;&gt; -                      tst_futexes[id], &amp;timeout);<br>
&gt; &gt;&gt; +       do {<br>
&gt; &gt;&gt; +               ret = syscall(SYS_futex, &amp;tst_futexes[id], FUTEX_WAIT,<br>
&gt; &gt;&gt; +                             tst_futexes[id], &amp;timeout);<br>
&gt; &gt;<br>
&gt; &gt;     if (ret == -1 &amp;&amp; errno == EINTR)<br>
&gt; &gt;         tst_res(TWARN | TERRNO, &quot;FUTEX_WAIT operation was interrupted by a signal, retry again&quot;);<br>
&gt; &gt;<br>
&gt; <br>
&gt; I am not sure if this warning message is desired for test-cases which<br>
&gt; needs to wait on checkpoints irrespective of signals like this<br>
&gt; tgkill01 test-case.<br>
<br>
Agreed, it&#39;s not an error condition, it&#39;s just a coincidence that most<br>
of the tests does not get signals when they sleep on futex, otherwise<br>
thing would crash and burn. So I would argue that retrying on EINTR is<br>
actually a bug fix rather than anything else.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">Okay, here I&#39;m not insist to print the warning. But it&#39;s only use for hint on that worst situation which you were mentioned. If the checkpoint got signal leads to never timeout and test eventually killed by test lib. That would hard to know what happened at that moment. My concern is the situation is hard to reproduce later so just want to print more useful messeges:).</div></div></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div>
Cyril Hrubis March 15, 2019, 11:33 a.m. | #10
Hi!
> > > I am not sure if this warning message is desired for test-cases which
> > > needs to wait on checkpoints irrespective of signals like this
> > > tgkill01 test-case.
> >
> > Agreed, it's not an error condition, it's just a coincidence that most
> > of the tests does not get signals when they sleep on futex, otherwise
> > thing would crash and burn. So I would argue that retrying on EINTR is
> > actually a bug fix rather than anything else.
> >
> 
> Okay, here I'm not insist to print the warning. But it's only use for hint
> on that worst situation which you were mentioned. If the checkpoint got
> signal leads to never timeout and test eventually killed by test lib. That
> would hard to know what happened at that moment. My concern is the
> situation is hard to reproduce later so just want to print more useful
> messeges:).

As for now that's only a hypotetical corner case, someone would have to
send signals to such process sleeping on the checkpoint in a loop for
that to happen. The problem is that printing any messages when
checkpoint was interrupted by signal would lead to even greater
confusion for tests that actually have to send signals to such
processes.
Sumit Garg March 15, 2019, 2:01 p.m. | #11
On Fri, 15 Mar 2019 at 15:48, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > Wouldn't this loop be more appropriate in
> > "tst_safe_checkpoint_wait()"? As at later stage we may have tests that
> > depends on checkpoints being interrupted by signals and could directly
> > use "tst_checkpoint_wait()".
>
> The tst_checkpoint_wait() has a single use in the source tree and that
> is testcases/lib/tst_checkpoint.c which is binary wrapper around
> checkpoints so that we can use them in shell scripts as well, which is
> pretty cool btw. And I think that we should retry on EINTR there as
> well.

Ah, I see.

>
> Also there does not seem to be test relying on checkpoints being
> interrupted by signals and I would like to avoid this pattern ideally as
> asynchronous events such as signals interrupting functions is something
> rather counter intuitive.
>

Ok got it. Should I pick up this change as separate patch in this
patch-set or would you like to commit it and then I would update
patch-set to use these checkpoints?

-Sumit

> --
> Cyril Hrubis
> chrubis@suse.cz
Li Wang March 18, 2019, 6:39 a.m. | #12
On Fri, Mar 15, 2019 at 7:34 PM Cyril Hrubis <chrubis@suse.cz> wrote:

> Hi!

> > > > I am not sure if this warning message is desired for test-cases which

> > > > needs to wait on checkpoints irrespective of signals like this

> > > > tgkill01 test-case.

> > >

> > > Agreed, it's not an error condition, it's just a coincidence that most

> > > of the tests does not get signals when they sleep on futex, otherwise

> > > thing would crash and burn. So I would argue that retrying on EINTR is

> > > actually a bug fix rather than anything else.

> > >

> >

> > Okay, here I'm not insist to print the warning. But it's only use for

> hint

> > on that worst situation which you were mentioned. If the checkpoint got

> > signal leads to never timeout and test eventually killed by test lib.

> That

> > would hard to know what happened at that moment. My concern is the

> > situation is hard to reproduce later so just want to print more useful

> > messeges:).

>

> As for now that's only a hypotetical corner case, someone would have to

> send signals to such process sleeping on the checkpoint in a loop for

> that to happen. The problem is that printing any messages when

> checkpoint was interrupted by signal would lead to even greater

> confusion for tests that actually have to send signals to such

> processes.

>


That's true. I'm ok to withdraw my suggestion.

-- 
Regards,
Li Wang
<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 15, 2019 at 7:34 PM Cyril Hrubis &lt;<a href="mailto:chrubis@suse.cz">chrubis@suse.cz</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi!<br>
&gt; &gt; &gt; I am not sure if this warning message is desired for test-cases which<br>
&gt; &gt; &gt; needs to wait on checkpoints irrespective of signals like this<br>
&gt; &gt; &gt; tgkill01 test-case.<br>
&gt; &gt;<br>
&gt; &gt; Agreed, it&#39;s not an error condition, it&#39;s just a coincidence that most<br>
&gt; &gt; of the tests does not get signals when they sleep on futex, otherwise<br>
&gt; &gt; thing would crash and burn. So I would argue that retrying on EINTR is<br>
&gt; &gt; actually a bug fix rather than anything else.<br>
&gt; &gt;<br>
&gt; <br>
&gt; Okay, here I&#39;m not insist to print the warning. But it&#39;s only use for hint<br>
&gt; on that worst situation which you were mentioned. If the checkpoint got<br>
&gt; signal leads to never timeout and test eventually killed by test lib. That<br>
&gt; would hard to know what happened at that moment. My concern is the<br>
&gt; situation is hard to reproduce later so just want to print more useful<br>
&gt; messeges:).<br>
<br>
As for now that&#39;s only a hypotetical corner case, someone would have to<br>
send signals to such process sleeping on the checkpoint in a loop for<br>
that to happen. The problem is that printing any messages when<br>
checkpoint was interrupted by signal would lead to even greater<br>
confusion for tests that actually have to send signals to such<br>
processes.<br></blockquote><div><br></div><div class="gmail_default" style="font-size:small">That&#39;s true. I&#39;m ok to withdraw my suggestion.</div></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div></div></div>
Cyril Hrubis March 18, 2019, 12:59 p.m. | #13
Hi!
> Ok got it. Should I pick up this change as separate patch in this
> patch-set or would you like to commit it and then I would update
> patch-set to use these checkpoints?

I will send my patch to the ML soon, you can then rebase your work on
the top of that then I will commit the patches in right order.

Patch

diff --git a/runtest/syscalls b/runtest/syscalls
index d752dba..8cd99e0 100644
--- a/runtest/syscalls
+++ b/runtest/syscalls
@@ -1399,6 +1399,8 @@  syslog10 syslog10
 syslog11 syslog11
 syslog12 syslog12
 
+tgkill01 tgkill01
+
 time01 time01
 time02 time02
 
diff --git a/testcases/kernel/syscalls/tgkill/.gitignore b/testcases/kernel/syscalls/tgkill/.gitignore
new file mode 100644
index 0000000..f4566fd
--- /dev/null
+++ b/testcases/kernel/syscalls/tgkill/.gitignore
@@ -0,0 +1 @@ 
+tgkill01
diff --git a/testcases/kernel/syscalls/tgkill/Makefile b/testcases/kernel/syscalls/tgkill/Makefile
new file mode 100644
index 0000000..a51080c
--- /dev/null
+++ b/testcases/kernel/syscalls/tgkill/Makefile
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2018 Google, Inc.
+
+top_srcdir		?= ../../../..
+
+include $(top_srcdir)/include/mk/testcases.mk
+
+include $(top_srcdir)/include/mk/generic_leaf_target.mk
+
+LDLIBS			+= -pthread
diff --git a/testcases/kernel/syscalls/tgkill/tgkill.h b/testcases/kernel/syscalls/tgkill/tgkill.h
new file mode 100644
index 0000000..a7d96f4
--- /dev/null
+++ b/testcases/kernel/syscalls/tgkill/tgkill.h
@@ -0,0 +1,22 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018 Google, Inc.
+ */
+
+#ifndef TGKILL_H
+#define TGKILL_H
+
+#include "config.h"
+#include "lapi/syscalls.h"
+
+static inline int sys_tgkill(int tgid, int tid, int sig)
+{
+	return tst_syscall(__NR_tgkill, tgid, tid, sig);
+}
+
+static inline pid_t sys_gettid(void)
+{
+	return tst_syscall(__NR_gettid);
+}
+
+#endif /* TGKILL_H */
diff --git a/testcases/kernel/syscalls/tgkill/tgkill01.c b/testcases/kernel/syscalls/tgkill/tgkill01.c
new file mode 100644
index 0000000..acc9977
--- /dev/null
+++ b/testcases/kernel/syscalls/tgkill/tgkill01.c
@@ -0,0 +1,141 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018 Google, Inc.
+ *
+ * tgkill() delivers a signal to a specific thread.  Test this by installing
+ * a SIGUSR1 handler which records the current pthread ID.  Start a number
+ * of threads in parallel, then one-by-one call tgkill(..., tid, SIGUSR1)
+ * and check that the expected pthread ID was recorded.
+ */
+
+#include <pthread.h>
+#include <stdlib.h>
+
+#include "tst_safe_pthread.h"
+#include "tst_test.h"
+#include "tgkill.h"
+
+struct thread_state {
+	pthread_t thread;
+	pid_t tid;
+};
+
+static char *str_threads;
+static int n_threads = 10;
+static struct thread_state *threads;
+
+static pthread_t sigusr1_thread;
+
+static int test_running;
+static pthread_cond_t test_running_cond = PTHREAD_COND_INITIALIZER;
+static pthread_mutex_t test_running_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void sigusr1_handler(int signum __attribute__((unused)))
+{
+	sigusr1_thread = pthread_self();
+}
+
+static void *thread_func(void *arg)
+{
+	struct thread_state *thread = arg;
+
+	/**
+	 * There is no standard way to map pthread -> tid, so we will have the
+	 * child stash its own tid then notify the parent that the stashed tid
+	 * is available.
+	 */
+	thread->tid = sys_gettid();
+
+	TST_CHECKPOINT_WAKE(0);
+
+	pthread_mutex_lock(&test_running_mutex);
+	while (test_running)
+		pthread_cond_wait(&test_running_cond, &test_running_mutex);
+	pthread_mutex_unlock(&test_running_mutex);
+
+	return arg;
+}
+
+static void start_thread(struct thread_state *thread)
+{
+	SAFE_PTHREAD_CREATE(&thread->thread, NULL, thread_func, thread);
+
+	TST_CHECKPOINT_WAIT(0);
+}
+
+static void stop_threads(void)
+{
+	int i;
+
+	test_running = 0;
+	pthread_cond_broadcast(&test_running_cond);
+
+	for (i = 0; i < n_threads; i++) {
+		if (threads[i].tid == -1)
+			continue;
+
+		SAFE_PTHREAD_JOIN(threads[i].thread, NULL);
+		threads[i].tid = -1;
+	}
+
+	if (threads)
+		free(threads);
+}
+
+static void run(void)
+{
+	int i;
+
+	for (i = 0; i < n_threads; i++)
+		threads[i].tid = -1;
+
+	test_running = 1;
+	for (i = 0; i < n_threads; i++)
+		start_thread(&threads[i]);
+
+	for (i = 0; i < n_threads; i++) {
+		sigusr1_thread = pthread_self();
+
+		TEST(sys_tgkill(getpid(), threads[i].tid, SIGUSR1));
+		if (TST_RET) {
+			tst_res(TFAIL | TTERRNO, "tgkill() failed");
+			return;
+		}
+
+		while (pthread_equal(sigusr1_thread, pthread_self()))
+			usleep(1000);
+
+		if (!pthread_equal(sigusr1_thread, threads[i].thread)) {
+			tst_res(TFAIL, "SIGUSR1 delivered to wrong thread");
+			return;
+		}
+	}
+
+	tst_res(TPASS, "SIGUSR1 delivered to correct threads");
+}
+
+static void setup(void)
+{
+	if (tst_parse_int(str_threads, &n_threads, 1, INT_MAX))
+		tst_brk(TBROK, "Invalid number of threads '%s'", str_threads);
+
+	threads = SAFE_MALLOC(sizeof(*threads) * n_threads);
+
+	struct sigaction sigusr1 = {
+		.sa_handler = sigusr1_handler,
+	};
+	SAFE_SIGACTION(SIGUSR1, &sigusr1, NULL);
+}
+
+static struct tst_option options[] = {
+	{"t:", &str_threads, "-t       Number of threads (default 10)"},
+	{NULL, NULL, NULL},
+};
+
+static struct tst_test test = {
+	.options = options,
+	.needs_checkpoints = 1,
+	.setup = setup,
+	.test_all = run,
+	.cleanup = stop_threads,
+};