
[03/12] tests/plugin: add test plugin for inline operations

Message ID 20240111142326.1743444-4-pierrick.bouvier@linaro.org
State New
Series TCG Plugin inline operation enhancement

Commit Message

Pierrick Bouvier Jan. 11, 2024, 2:23 p.m. UTC
For now, it simply performs instruction, bb and mem counts, and ensures
that the inline and callback versions produce the same result. Later, we'll
extend it as new inline operations are added.

Using the existing plugins to test that everything works is a bit cumbersome,
as different events are handled by different plugins. Hence this new one.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
 tests/plugin/meson.build |   2 +-
 2 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 tests/plugin/inline.c

Comments

Philippe Mathieu-Daudé Jan. 11, 2024, 3:57 p.m. UTC | #1
Hi Pierrick,

On 11/1/24 15:23, Pierrick Bouvier wrote:
> For now, it simply performs instruction, bb and mem count, and ensure
> that inline vs callback versions have the same result. Later, we'll
> extend it when new inline operations are added.
> 
> Use existing plugins to test everything works is a bit cumbersome, as
> different events are treated in different plugins. Thus, this new one.
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>   tests/plugin/meson.build |   2 +-
>   2 files changed, 184 insertions(+), 1 deletion(-)
>   create mode 100644 tests/plugin/inline.c

> +#define MAX_CPUS 8

Where does this value come from?

Should the plugin API provide a helper to ask TCG how many
vCPUs are created?
Pierrick Bouvier Jan. 11, 2024, 5:20 p.m. UTC | #2
On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
> Hi Pierrick,
> 
> On 11/1/24 15:23, Pierrick Bouvier wrote:
>> For now, it simply performs instruction, bb and mem count, and ensure
>> that inline vs callback versions have the same result. Later, we'll
>> extend it when new inline operations are added.
>>
>> Use existing plugins to test everything works is a bit cumbersome, as
>> different events are treated in different plugins. Thus, this new one.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>    tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>    tests/plugin/meson.build |   2 +-
>>    2 files changed, 184 insertions(+), 1 deletion(-)
>>    create mode 100644 tests/plugin/inline.c
> 
>> +#define MAX_CPUS 8
> 
> Where does this value come from?
> 

The plugin tests/plugin/insn.c had this constant, so I picked it up from
there.

> Should the pluggin API provide a helper to ask TCG how many
> vCPUs are created?

In user mode, we can't know in advance how many simultaneous threads (and
thus vCPUs) will be created. I'm not sure whether additional CPUs can be
added in system mode.

One problem, though, is that when you register an inline op against a
dynamic array, you can't change it afterwards (e.g. by resizing it when a
new vCPU is detected). So you need statically sized storage somewhere.

Your question is a good one, and maybe we should define a MAX constant that
plugins should rely on, instead of an arbitrary value.
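
To make the constraint concrete, here's a minimal sketch using the per-vcpu
inline registration added by this series (same shape as the test plugin below):

#include <stdint.h>
#include <qemu-plugin.h>

#define MAX_CPUS 8

/* The address passed when registering the inline op is baked into the
 * generated code, so the backing storage must keep the same address for
 * the whole run; a dynamically grown array would move on reallocation. */
static uint64_t count_tb_per_vcpu[MAX_CPUS];

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64,
        count_tb_per_vcpu, sizeof(uint64_t), 1);
}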
Alex Bennée Jan. 12, 2024, 5:20 p.m. UTC | #3
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>> Hi Pierrick,
>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>> For now, it simply performs instruction, bb and mem count, and ensure
>>> that inline vs callback versions have the same result. Later, we'll
>>> extend it when new inline operations are added.
>>>
>>> Use existing plugins to test everything works is a bit cumbersome, as
>>> different events are treated in different plugins. Thus, this new one.
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>>    tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>>    tests/plugin/meson.build |   2 +-
>>>    2 files changed, 184 insertions(+), 1 deletion(-)
>>>    create mode 100644 tests/plugin/inline.c
>> 
>>> +#define MAX_CPUS 8
>> Where does this value come from?
>> 
>
> The plugin tests/plugin/insn.c had this constant, so I picked it up
> from here.
>
>> Should the pluggin API provide a helper to ask TCG how many
>> vCPUs are created?
>
> In user mode, we can't know how many simultaneous threads (and thus
> vcpu) will be triggered by advance. I'm not sure if additional cpus
> can be added in system mode.
>
> One problem though, is that when you register an inline op with a
> dynamic array, when you resize it (when detecting a new vcpu), you
> can't change it afterwards. So, you need a storage statically sized
> somewhere.
>
> Your question is good, and maybe we should define a MAX constant that
> plugins should rely on, instead of a random amount.

For user-mode it can be unbounded. The existing plugins handle this by
indexing with vcpu_index % max_vcpu. Perhaps we just ensure that for the
scoreboard as well? Of course that does introduce a trap for those using
user-mode...
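
A minimal sketch of that pattern, along the lines of what the existing
plugins do:

#include <stdint.h>

#define MAX_CPUS 8

static uint64_t insn_count[MAX_CPUS];

static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
{
    /* Wrap the index so any number of user-mode threads stays in bounds;
     * two threads that alias to the same slot will then race on it. */
    insn_count[cpu_index % MAX_CPUS]++;
}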
Pierrick Bouvier Jan. 13, 2024, 5:16 a.m. UTC | #4
On 1/12/24 21:20, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>> Hi Pierrick,
>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>> that inline vs callback versions have the same result. Later, we'll
>>>> extend it when new inline operations are added.
>>>>
>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>> different events are treated in different plugins. Thus, this new one.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>>     tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>>>     tests/plugin/meson.build |   2 +-
>>>>     2 files changed, 184 insertions(+), 1 deletion(-)
>>>>     create mode 100644 tests/plugin/inline.c
>>>
>>>> +#define MAX_CPUS 8
>>> Where does this value come from?
>>>
>>
>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>> from here.
>>
>>> Should the pluggin API provide a helper to ask TCG how many
>>> vCPUs are created?
>>
>> In user mode, we can't know how many simultaneous threads (and thus
>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>> can be added in system mode.
>>
>> One problem though, is that when you register an inline op with a
>> dynamic array, when you resize it (when detecting a new vcpu), you
>> can't change it afterwards. So, you need a storage statically sized
>> somewhere.
>>
>> Your question is good, and maybe we should define a MAX constant that
>> plugins should rely on, instead of a random amount.
> 
> For user-mode it can be infinite. The existing plugins do this by
> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
> scoreboard as well? Of course that does introduce a trap for those using
> user-mode...
> 

The problem with vcpu_index % max_vcpu is that it reintroduces a race
condition, though it's probably less frequent than with a single variable.
IMHO, it solves the memory error, but it does not solve the initial problem
itself.

The simplest solution would be to have a size "big enough" for most 
cases, and abort when it's reached.

Another solution, much more complicated but correct, would be to move
memory management of the plugin scoreboard into the plugin runtime, and add
a level of indirection to access it. Every time a new vcpu is added, we can
grow the storage dynamically. This way the array can grow, and ultimately
the plugin can poke at its content/size. I'm not sure this complexity is
what we want, though.
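
To sketch the indirection idea (the names here are made up, nothing like
this exists yet): the runtime owns the storage, and the translated code
would load the current base pointer at execution time instead of a fixed
address, so the array can be reallocated whenever a new vcpu shows up:

#include <glib.h>
#include <stdint.h>
#include <string.h>

struct scoreboard {
    void *data;   /* reallocated by the plugin runtime when a vcpu appears */
    size_t size;  /* current number of entries */
};

/* Must run while every other vCPU is stopped (exclusive section). */
static void scoreboard_grow(struct scoreboard *sb, size_t new_size)
{
    sb->data = g_realloc_n(sb->data, new_size, sizeof(uint64_t));
    memset((uint64_t *)sb->data + sb->size, 0,
           (new_size - sb->size) * sizeof(uint64_t));
    sb->size = new_size;
}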
Alex Bennée Jan. 13, 2024, 5:16 p.m. UTC | #5
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/12/24 21:20, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>> 
>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>> Hi Pierrick,
>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>> extend it when new inline operations are added.
>>>>>
>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>
<snip>
>>>>> +#define MAX_CPUS 8
>>>> Where does this value come from?
>>>>
>>>
>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>> from here.
>>>
>>>> Should the pluggin API provide a helper to ask TCG how many
>>>> vCPUs are created?
>>>
>>> In user mode, we can't know how many simultaneous threads (and thus
>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>> can be added in system mode.
>>>
>>> One problem though, is that when you register an inline op with a
>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>> can't change it afterwards. So, you need a storage statically sized
>>> somewhere.
>>>
>>> Your question is good, and maybe we should define a MAX constant that
>>> plugins should rely on, instead of a random amount.
>> For user-mode it can be infinite. The existing plugins do this by
>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>> scoreboard as well? Of course that does introduce a trap for those using
>> user-mode...
>> 
>
> The problem with vcpu-index % max_vcpu is that it reintroduces race
> condition, though it's probably less frequent than on a single
> variable. IMHO, yes it solves memory error, but does not solve the
> initial problem itself.
>
> The simplest solution would be to have a size "big enough" for most
> cases, and abort when it's reached.

Well that is simple enough for system emulation as max_vcpus is a bounded
number.

> Another solution, much more complicated, but correct, would be to move
> memory management of plugin scoreboard to plugin runtime, and add a
> level of indirection to access it.

That certainly gives us the most control and safety. We can then ensure
we're never writing past the bounds of the buffer. The plugin would have
to use an accessor function to get the pointer to read at the point it
cares about, and of course the inline checks should be pretty simple.

> Every time a new vcpu is added, we
> can grow dynamically. This way, the array can grow, and ultimately,
> plugin can poke its content/size. I'm not sure this complexity is what
> we want though.

It doesn't seem too bad. We have a start/end_exclusive in *-user do_fork
where we could update pointers. If we are smart about growing the size
of the arrays we could avoid too much re-translation.

Do we want a limit of one scoreboard per thread? Can we store structures
in there?
Pierrick Bouvier Jan. 15, 2024, 7:06 a.m. UTC | #6
On 1/13/24 21:16, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 1/12/24 21:20, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>> Hi Pierrick,
>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>> extend it when new inline operations are added.
>>>>>>
>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>
> <snip>
>>>>>> +#define MAX_CPUS 8
>>>>> Where does this value come from?
>>>>>
>>>>
>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>> from here.
>>>>
>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>> vCPUs are created?
>>>>
>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>> can be added in system mode.
>>>>
>>>> One problem though, is that when you register an inline op with a
>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>> can't change it afterwards. So, you need a storage statically sized
>>>> somewhere.
>>>>
>>>> Your question is good, and maybe we should define a MAX constant that
>>>> plugins should rely on, instead of a random amount.
>>> For user-mode it can be infinite. The existing plugins do this by
>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>> scoreboard as well? Of course that does introduce a trap for those using
>>> user-mode...
>>>
>>
>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>> condition, though it's probably less frequent than on a single
>> variable. IMHO, yes it solves memory error, but does not solve the
>> initial problem itself.
>>
>> The simplest solution would be to have a size "big enough" for most
>> cases, and abort when it's reached.
> 
> Well that is simple enough for system emulation as max_vcpus is a bounded
> number.
> 
>> Another solution, much more complicated, but correct, would be to move
>> memory management of plugin scoreboard to plugin runtime, and add a
>> level of indirection to access it.
> 
> That certainly gives us the most control and safety. We can then ensure
> we'll never to writing past the bounds of the buffer. The plugin would
> have to use an access function to get the pointer to read at the time it
> cared and of course inline checks should be pretty simple.
> 
>> Every time a new vcpu is added, we
>> can grow dynamically. This way, the array can grow, and ultimately,
>> plugin can poke its content/size. I'm not sure this complexity is what
>> we want though.
> 
> It doesn't seem too bad. We have a start/end_exclusive is *-user do_fork
> where we could update pointers. If we are smart about growing the size
> of the arrays we could avoid too much re-translation.
>

I was concerned about a potential race when the scoreboard updates this
pointer while other vCPUs are executing TBs (using it). But this concern is
not valid, since start_exclusive ensures all other vCPUs are stopped.

The vcpu_init_hook function in plugins/core.c seems like a good location to
add this logic. We would check whether an update is needed, then
start_exclusive(), update the scoreboard, and exit the exclusive section.

Do you think it's worth trying to inline the scoreboard pointer (and flush
all TBs when it's updated), instead of simply adding an indirection to it?
With this, we could avoid the need to re-translate anything.

> Do we want a limit of one scoreboard per thread? Can we store structures
> in there?
> 

From the current plugins' use cases, it seems that several scoreboards
are needed.
Allowing structure storage seems a bit more tricky to me because, since
memory may be reallocated, users won't be allowed to point directly into
it. I would be in favor of avoiding this (comments are welcome).
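
For the record, the sequencing I have in mind would look roughly like this
(a sketch only; plugin_scoreboard_size/resize are hypothetical helpers):

/* Sketch, not the actual plugins/core.c code; plugin_scoreboard_size()
 * and plugin_scoreboard_resize() are hypothetical helpers. */
void qemu_plugin_vcpu_init_hook(CPUState *cpu)
{
    if (cpu->cpu_index >= plugin_scoreboard_size()) {
        start_exclusive();                        /* stop all other vCPUs */
        plugin_scoreboard_resize(cpu->cpu_index + 1);
        end_exclusive();
    }
    /* ... existing init hook logic ... */
}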
Alex Bennée Jan. 15, 2024, 9:04 a.m. UTC | #7
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/13/24 21:16, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>> 
>>> On 1/12/24 21:20, Alex Bennée wrote:
>>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>>
>>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>>> Hi Pierrick,
>>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>>> extend it when new inline operations are added.
>>>>>>>
>>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>>
>> <snip>
>>>>>>> +#define MAX_CPUS 8
>>>>>> Where does this value come from?
>>>>>>
>>>>>
>>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>>> from here.
>>>>>
>>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>>> vCPUs are created?
>>>>>
>>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>>> can be added in system mode.
>>>>>
>>>>> One problem though, is that when you register an inline op with a
>>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>>> can't change it afterwards. So, you need a storage statically sized
>>>>> somewhere.
>>>>>
>>>>> Your question is good, and maybe we should define a MAX constant that
>>>>> plugins should rely on, instead of a random amount.
>>>> For user-mode it can be infinite. The existing plugins do this by
>>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>>> scoreboard as well? Of course that does introduce a trap for those using
>>>> user-mode...
>>>>
>>>
>>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>>> condition, though it's probably less frequent than on a single
>>> variable. IMHO, yes it solves memory error, but does not solve the
>>> initial problem itself.
>>>
>>> The simplest solution would be to have a size "big enough" for most
>>> cases, and abort when it's reached.
>> Well that is simple enough for system emulation as max_vcpus is a
>> bounded
>> number.
>> 
>>> Another solution, much more complicated, but correct, would be to move
>>> memory management of plugin scoreboard to plugin runtime, and add a
>>> level of indirection to access it.
>> That certainly gives us the most control and safety. We can then
>> ensure
>> we'll never to writing past the bounds of the buffer. The plugin would
>> have to use an access function to get the pointer to read at the time it
>> cared and of course inline checks should be pretty simple.
>> 
>>> Every time a new vcpu is added, we
>>> can grow dynamically. This way, the array can grow, and ultimately,
>>> plugin can poke its content/size. I'm not sure this complexity is what
>>> we want though.
>> It doesn't seem too bad. We have a start/end_exclusive is *-user
>> do_fork
>> where we could update pointers. If we are smart about growing the size
>> of the arrays we could avoid too much re-translation.
>>
>
> I was concerned about a potential race when the scoreboard updates
> this pointer, and other cpus are executing tb (using it). But this
> concern is not valid, since start_exclusive ensures all other cpus are
> stopped.
>
> vcpu_init_hook function in plugins/core.c seems a good location to add
> this logic. We would check if an update is needed, then
> start_exclusive(), update the scoreboard and exit exclusive section.

It might already be in the exclusive section. We should try and re-use
an existing exclusive region rather than adding a new one. It's
expensive so best avoided if we can.

> Do you think it's worth to try to inline scoreboard pointer (and flush
> all tb when updated), instead of simply adding an indirection to it?
> With this, we could avoid any need to re-translate anything.

Depends on the cost/complexity of the indirection, I guess.
Re-translation isn't super expensive if we, say, double the size of the
scoreboard each time we need to.

>> Do we want a limit of one scoreboard per thread? Can we store structures
>> in there?
>> 
>
> From the current plugins use case, it seems that several scoreboards
> are needed.
> Allowing structure storage seems a bit more tricky to me, because
> since memory may be reallocated, users won't be allowed to point
> directly to it. I would be in favor to avoid this (comments are
> welcome).

The init function can take a sizeof(entry) field and the inline op can
have an offset field (which for the simple 0 case can be folded away by
TCG).
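
A minimal sketch of how that could look with the per-vcpu variant from this
series, treating the element-size argument as the stride and pointing the
base at the field inside the first entry (hypothetical usage, not something
this patch does):

#include <stdint.h>
#include <qemu-plugin.h>

struct cpu_stats {
    uint64_t insn_count;
    uint64_t mem_count;
};

static struct cpu_stats stats[8]; /* one entry per vCPU, bounded as before */

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    for (int i = 0; i < qemu_plugin_tb_n_insns(tb); ++i) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        /* stride = sizeof(entry); every vCPU bumps the insn_count member
         * of its own struct cpu_stats */
        qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
            insn, QEMU_PLUGIN_INLINE_ADD_U64,
            &stats[0].insn_count, sizeof(struct cpu_stats), 1);
    }
}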
Pierrick Bouvier Jan. 16, 2024, 7:46 a.m. UTC | #8
On 1/15/24 13:04, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 1/13/24 21:16, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> On 1/12/24 21:20, Alex Bennée wrote:
>>>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>>>
>>>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>>>> Hi Pierrick,
>>>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>>>> extend it when new inline operations are added.
>>>>>>>>
>>>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>>>
>>> <snip>
>>>>>>>> +#define MAX_CPUS 8
>>>>>>> Where does this value come from?
>>>>>>>
>>>>>>
>>>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>>>> from here.
>>>>>>
>>>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>>>> vCPUs are created?
>>>>>>
>>>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>>>> can be added in system mode.
>>>>>>
>>>>>> One problem though, is that when you register an inline op with a
>>>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>>>> can't change it afterwards. So, you need a storage statically sized
>>>>>> somewhere.
>>>>>>
>>>>>> Your question is good, and maybe we should define a MAX constant that
>>>>>> plugins should rely on, instead of a random amount.
>>>>> For user-mode it can be infinite. The existing plugins do this by
>>>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>>>> scoreboard as well? Of course that does introduce a trap for those using
>>>>> user-mode...
>>>>>
>>>>
>>>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>>>> condition, though it's probably less frequent than on a single
>>>> variable. IMHO, yes it solves memory error, but does not solve the
>>>> initial problem itself.
>>>>
>>>> The simplest solution would be to have a size "big enough" for most
>>>> cases, and abort when it's reached.
>>> Well that is simple enough for system emulation as max_vcpus is a
>>> bounded
>>> number.
>>>
>>>> Another solution, much more complicated, but correct, would be to move
>>>> memory management of plugin scoreboard to plugin runtime, and add a
>>>> level of indirection to access it.
>>> That certainly gives us the most control and safety. We can then
>>> ensure
>>> we'll never to writing past the bounds of the buffer. The plugin would
>>> have to use an access function to get the pointer to read at the time it
>>> cared and of course inline checks should be pretty simple.
>>>
>>>> Every time a new vcpu is added, we
>>>> can grow dynamically. This way, the array can grow, and ultimately,
>>>> plugin can poke its content/size. I'm not sure this complexity is what
>>>> we want though.
>>> It doesn't seem too bad. We have a start/end_exclusive is *-user
>>> do_fork
>>> where we could update pointers. If we are smart about growing the size
>>> of the arrays we could avoid too much re-translation.
>>>
>>
>> I was concerned about a potential race when the scoreboard updates
>> this pointer, and other cpus are executing tb (using it). But this
>> concern is not valid, since start_exclusive ensures all other cpus are
>> stopped.
>>
>> vcpu_init_hook function in plugins/core.c seems a good location to add
>> this logic. We would check if an update is needed, then
>> start_exclusive(), update the scoreboard and exit exclusive section.
> 
> It might already be in the exclusive section. We should try and re-use
> an existing exclusive region rather than adding a new one. It's
> expensive so best avoided if we can.
> 
>> Do you think it's worth to try to inline scoreboard pointer (and flush
>> all tb when updated), instead of simply adding an indirection to it?
>> With this, we could avoid any need to re-translate anything.
> 
> Depends on the cost/complexity of the indirection I guess.
> Re-translation isn't super expensive if we say double the size of the
> score board each time we need to.
>
>>> Do we want a limit of one scoreboard per thread? Can we store structures
>>> in there?
>>>
>>
>>  From the current plugins use case, it seems that several scoreboards
>> are needed.
>> Allowing structure storage seems a bit more tricky to me, because
>> since memory may be reallocated, users won't be allowed to point
>> directly to it. I would be in favor to avoid this (comments are
>> welcome).
> 
> The init function can take a sizeof(entry) field and the inline op can
> have an offset field (which for the simple 0 case can be folded away by
> TCG).
> 

Thanks for all your comments and guidance on this.

I implemented a new version, working with a scoreboard that is resized
automatically, which allows storing structs as well. The result is pretty
satisfying, as there is almost no need to manually keep track of how many
CPUs have been used, while offering thread safety by default.

I'll re-spin the series once I've cleaned up the commits and ported the
existing plugins to it.
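
Ahead of the re-spin, a purely hypothetical sketch of what the plugin-side
usage could look like (the scoreboard helper names are made up):

/* Hypothetical API sketch: the runtime owns and grows the per-vCPU
 * storage; the plugin only accesses it through the handle. */
static struct qemu_plugin_scoreboard *insn_count;

static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
{
    /* the runtime guarantees an entry exists for every vCPU seen so far */
    uint64_t *cell = qemu_plugin_scoreboard_find(insn_count, cpu_index);
    (*cell)++;
}

QEMU_PLUGIN_EXPORT
int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
                        int argc, char **argv)
{
    /* one uint64_t per vCPU, resized by the runtime as vCPUs appear */
    insn_count = qemu_plugin_scoreboard_new(sizeof(uint64_t));
    return 0;
}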

Patch

diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
new file mode 100644
index 00000000000..6114ebca545
--- /dev/null
+++ b/tests/plugin/inline.c
@@ -0,0 +1,183 @@ 
+/*
+ * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
+ *
+ * Demonstrates and tests usage of inline ops.
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#include <qemu-plugin.h>
+
+#define MAX_CPUS 8
+
+static uint64_t count_tb;
+static uint64_t count_tb_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_racy;
+static uint64_t count_insn;
+static uint64_t count_insn_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_racy;
+static uint64_t count_mem;
+static uint64_t count_mem_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_racy;
+static GMutex tb_lock;
+static GMutex insn_lock;
+static GMutex mem_lock;
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+static uint64_t collect_per_vcpu(uint64_t *values)
+{
+    uint64_t count = 0;
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        count += values[i];
+    }
+    return count;
+}
+
+static void stats_insn(void)
+{
+    const uint64_t expected = count_insn;
+    const uint64_t per_vcpu = collect_per_vcpu(count_insn_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_insn_inline_per_vcpu);
+    printf("insn: %" PRIu64 "\n", expected);
+    printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_insn_inline_racy <= expected);
+}
+
+static void stats_tb(void)
+{
+    const uint64_t expected = count_tb;
+    const uint64_t per_vcpu = collect_per_vcpu(count_tb_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_tb_inline_per_vcpu);
+    printf("tb: %" PRIu64 "\n", expected);
+    printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_tb_inline_racy <= expected);
+}
+
+static void stats_mem(void)
+{
+    const uint64_t expected = count_mem;
+    const uint64_t per_vcpu = collect_per_vcpu(count_mem_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_mem_inline_per_vcpu);
+    printf("mem: %" PRIu64 "\n", expected);
+    printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_mem_inline_racy <= expected);
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *udata)
+{
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        const uint64_t tb = count_tb_per_vcpu[i];
+        const uint64_t tb_inline = count_tb_inline_per_vcpu[i];
+        const uint64_t insn = count_insn_per_vcpu[i];
+        const uint64_t insn_inline = count_insn_inline_per_vcpu[i];
+        const uint64_t mem = count_mem_per_vcpu[i];
+        const uint64_t mem_inline = count_mem_inline_per_vcpu[i];
+        printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
+               "insn (%" PRIu64 ", %" PRIu64 ") | "
+               "mem (%" PRIu64 ", %" PRIu64 ")"
+               "\n",
+               i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
+        g_assert(tb == tb_inline);
+        g_assert(insn == insn_inline);
+        g_assert(mem == mem_inline);
+    }
+
+    stats_tb();
+    stats_insn();
+    stats_mem();
+}
+
+static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
+{
+    count_tb_per_vcpu[cpu_index]++;
+    g_mutex_lock(&tb_lock);
+    count_tb++;
+    g_mutex_unlock(&tb_lock);
+}
+
+static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
+{
+    count_insn_per_vcpu[cpu_index]++;
+    g_mutex_lock(&insn_lock);
+    count_insn++;
+    g_mutex_unlock(&insn_lock);
+}
+
+static void vcpu_mem_access(unsigned int cpu_index,
+                            qemu_plugin_meminfo_t info,
+                            uint64_t vaddr,
+                            void *userdata)
+{
+    count_mem_per_vcpu[cpu_index]++;
+    g_mutex_lock(&mem_lock);
+    count_mem++;
+    g_mutex_unlock(&mem_lock);
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
+                                         QEMU_PLUGIN_CB_NO_REGS, 0);
+    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_tb_inline_racy, 1);
+    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+        tb, QEMU_PLUGIN_INLINE_ADD_U64,
+        count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
+
+    for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
+        qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
+                                               QEMU_PLUGIN_CB_NO_REGS, 0);
+        qemu_plugin_register_vcpu_insn_exec_inline(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            &count_insn_inline_racy, 1);
+        qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
+        qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
+                                         QEMU_PLUGIN_CB_NO_REGS,
+                                         QEMU_PLUGIN_MEM_RW, 0);
+        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
+                                             QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_mem_inline_racy, 1);
+        qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+            insn, QEMU_PLUGIN_MEM_RW,
+            QEMU_PLUGIN_INLINE_ADD_U64,
+            count_mem_inline_per_vcpu, sizeof(uint64_t), 1);
+    }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+                        int argc, char **argv)
+{
+    g_assert(info->system.smp_vcpus <= MAX_CPUS);
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+
+    return 0;
+}
diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
index e18183aaeda..9eece5bab51 100644
--- a/tests/plugin/meson.build
+++ b/tests/plugin/meson.build
@@ -1,6 +1,6 @@ 
 t = []
 if get_option('plugins')
-  foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
+  foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
     if host_os == 'windows'
       t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
                         include_directories: '../../include/qemu',