[RFC,14/20] coresight: etm-perf: implementing 'event_init()' API

Message ID 1442593594-10665-15-git-send-email-mathieu.poirier@linaro.org
State New
Headers show

Commit Message

Mathieu Poirier Sept. 18, 2015, 4:26 p.m.
Enhancing skeleton PMU with event initialisation.

The function makes sure tracers aren't already enabled
before going through with the powe up and configuration
sequences.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 176 +++++++++++++++++++++++
 1 file changed, 176 insertions(+)

Comments

Alexander Shishkin Sept. 22, 2015, 2:29 p.m. | #1
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> +static void etm_event_destroy(struct perf_event *event)
> +{
> +	/* switching off the source will also tear down the path */
> +	etm_event_power_sources(event->cpu, false);
> +}
> +
> +static int etm_event_init(struct perf_event *event)
> +{
> +	int ret;
> +
> +	if (event->attr.type != etm_pmu.type)
> +		return -ENOENT;
> +
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	/* only one session at a time */
> +	if (etm_event_source_enabled(event->cpu))
> +		return -EBUSY;

Why is this the case? If you were to configure the event in pmu::add()
and deconfigure it in pmu::del(), like you already do with the buffer
part, you could handle as many sessions as you want.

> +
> +	/*
> +	 * Make sure CPUs don't disappear between the
> +	 * power up sequence and configuration.
> +	 */
> +	get_online_cpus();
> +	ret = etm_event_power_sources(event->cpu, true);
> +	if (ret)
> +		goto out;
> +
> +	ret = etm_event_config_sources(event->cpu);

This can be done in pmu::add(), if you can call directly into
etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
calling in between.

> +
> +	event->destroy = etm_event_destroy;
> +out:
> +	put_online_cpus();
> +	return ret;
> +}
> +
>  static int __init etm_perf_init(void)
>  {
>  	etm_pmu.capabilities	= PERF_PMU_CAP_EXCLUSIVE;
> @@ -59,6 +234,7 @@ static int __init etm_perf_init(void)
>  	etm_pmu.attr_groups	= etm_pmu_attr_groups;
>  	etm_pmu.task_ctx_nr	= perf_sw_context;
>  	etm_pmu.read		= etm_event_read;
> +	etm_pmu.event_init	= etm_event_init;
>  
>  	return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>  }

One general comment -- it would be slightly easier at least for me if
all the pmu related bits were in one patch.

Thanks,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mathieu Poirier Sept. 28, 2015, 9:22 p.m. | #2
On 22 September 2015 at 08:29, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> +static void etm_event_destroy(struct perf_event *event)
>> +{
>> +     /* switching off the source will also tear down the path */
>> +     etm_event_power_sources(event->cpu, false);
>> +}
>> +
>> +static int etm_event_init(struct perf_event *event)
>> +{
>> +     int ret;
>> +
>> +     if (event->attr.type != etm_pmu.type)
>> +             return -ENOENT;
>> +
>> +     if (event->cpu >= nr_cpu_ids)
>> +             return -EINVAL;
>> +
>> +     /* only one session at a time */
>> +     if (etm_event_source_enabled(event->cpu))
>> +             return -EBUSY;
>
> Why is this the case? If you were to configure the event in pmu::add()
> and deconfigure it in pmu::del(), like you already do with the buffer
> part, you could handle as many sessions as you want.

Apologies for the late reply, I was travelling.

We certainly don't want to have more than once trace session going on
at any given time, especially if the sessions have different
configuration parameters.  Moreover doing the tracer configuration as
part of pmu::add() is highly redundant.

>
>> +
>> +     /*
>> +      * Make sure CPUs don't disappear between the
>> +      * power up sequence and configuration.
>> +      */
>> +     get_online_cpus();
>> +     ret = etm_event_power_sources(event->cpu, true);
>> +     if (ret)
>> +             goto out;
>> +
>> +     ret = etm_event_config_sources(event->cpu);
>
> This can be done in pmu::add(), if you can call directly into
> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
> calling in between.

As per my comment above, reconfiguring the tracers every time it is
about to run is redundant and extensive (etm_configure_cpu() isn't
exactly short),  incurring a cost that is likely to be higher than
calling get_online_cpus().

>
>> +
>> +     event->destroy = etm_event_destroy;
>> +out:
>> +     put_online_cpus();
>> +     return ret;
>> +}
>> +
>>  static int __init etm_perf_init(void)
>>  {
>>       etm_pmu.capabilities    = PERF_PMU_CAP_EXCLUSIVE;
>> @@ -59,6 +234,7 @@ static int __init etm_perf_init(void)
>>       etm_pmu.attr_groups     = etm_pmu_attr_groups;
>>       etm_pmu.task_ctx_nr     = perf_sw_context;
>>       etm_pmu.read            = etm_event_read;
>> +     etm_pmu.event_init      = etm_event_init;
>>
>>       return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>>  }
>
> One general comment -- it would be slightly easier at least for me if
> all the pmu related bits were in one patch.

Ah!  I went to great length to split them up to make the patches
smaller  - I will refactor...

Thanks for the review,
Mathieu

>
> Thanks,
> --
> Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Alexander Shishkin Sept. 30, 2015, 9:43 a.m. | #3
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> On 22 September 2015 at 08:29, Alexander Shishkin
> <alexander.shishkin@linux.intel.com> wrote:
>> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>>
>>> +static void etm_event_destroy(struct perf_event *event)
>>> +{
>>> +     /* switching off the source will also tear down the path */
>>> +     etm_event_power_sources(event->cpu, false);
>>> +}
>>> +
>>> +static int etm_event_init(struct perf_event *event)
>>> +{
>>> +     int ret;
>>> +
>>> +     if (event->attr.type != etm_pmu.type)
>>> +             return -ENOENT;
>>> +
>>> +     if (event->cpu >= nr_cpu_ids)
>>> +             return -EINVAL;
>>> +
>>> +     /* only one session at a time */
>>> +     if (etm_event_source_enabled(event->cpu))
>>> +             return -EBUSY;
>>
>> Why is this the case? If you were to configure the event in pmu::add()
>> and deconfigure it in pmu::del(), like you already do with the buffer
>> part, you could handle as many sessions as you want.
>
> Apologies for the late reply, I was travelling.
>
> We certainly don't want to have more than once trace session going on
> at any given time, especially if the sessions have different
> configuration parameters.  Moreover doing the tracer configuration as
> part of pmu::add() is highly redundant.

But why?

The whole point of using perf for this is that it does all the tricky
context switching for us, all the cross-cpu calling to enable/disable
the events etc so that we can run multiple sessions in parallel without
having to worry (much) about scheduling. (Aside, of course, from other
useful things like sideband events, but that's another topic).

>> This can be done in pmu::add(), if you can call directly into
>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>> calling in between.
>
> As per my comment above, reconfiguring the tracers every time it is
> about to run is redundant and extensive (etm_configure_cpu() isn't
> exactly short),  incurring a cost that is likely to be higher than
> calling get_online_cpus().

I was actually referring to synchronous smp_function_call*()s that
obviously won't work here. But the good news is that they are also
redundant.

But I don't see anything expensive in configuring etm and etb in
pmu::add(), as far as I can tell, it's just a bunch of register
writes. If you want to optimize those, you could compare the new context
against the previous one and only update registers that need to be
updated. The spinlock you also could get rid of, because there won't be
any local racing (again, afaict neither ETM nor ETB generate
interrupts).

That said, one expensive thing is reading out the ETB buffer on every
sched out, and that is the real problem, because it slows down the fast
path by a loop of arbitrary length reading out hw registers. Iirc, ETBs
could be up to 64K?

But a TMC-enabled coresight should do much better in this regard.

Thanks,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mathieu Poirier Oct. 2, 2015, 4:52 p.m. | #4
On 30 September 2015 at 03:43, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> On 22 September 2015 at 08:29, Alexander Shishkin
>> <alexander.shishkin@linux.intel.com> wrote:
>>> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>>>
>>>> +static void etm_event_destroy(struct perf_event *event)
>>>> +{
>>>> +     /* switching off the source will also tear down the path */
>>>> +     etm_event_power_sources(event->cpu, false);
>>>> +}
>>>> +
>>>> +static int etm_event_init(struct perf_event *event)
>>>> +{
>>>> +     int ret;
>>>> +
>>>> +     if (event->attr.type != etm_pmu.type)
>>>> +             return -ENOENT;
>>>> +
>>>> +     if (event->cpu >= nr_cpu_ids)
>>>> +             return -EINVAL;
>>>> +
>>>> +     /* only one session at a time */
>>>> +     if (etm_event_source_enabled(event->cpu))
>>>> +             return -EBUSY;
>>>
>>> Why is this the case? If you were to configure the event in pmu::add()
>>> and deconfigure it in pmu::del(), like you already do with the buffer
>>> part, you could handle as many sessions as you want.
>>
>> Apologies for the late reply, I was travelling.
>>
>> We certainly don't want to have more than once trace session going on
>> at any given time, especially if the sessions have different
>> configuration parameters.  Moreover doing the tracer configuration as
>> part of pmu::add() is highly redundant.
>
> But why?
>
> The whole point of using perf for this is that it does all the tricky
> context switching for us, all the cross-cpu calling to enable/disable
> the events etc so that we can run multiple sessions in parallel without
> having to worry (much) about scheduling. (Aside, of course, from other
> useful things like sideband events, but that's another topic).

Sessions can run in parallel for as long as they don't use the same
CPUs.  There is no doubt as to the amount of benefit incurred by using
Perf but a tracer can't be commissioned by another session once it is
already part of one.

I'm suspecting we don't understand each other here... Maybe an IRC
chat is in order.

>
>>> This can be done in pmu::add(), if you can call directly into
>>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>>> calling in between.
>>
>> As per my comment above, reconfiguring the tracers every time it is
>> about to run is redundant and extensive (etm_configure_cpu() isn't
>> exactly short),  incurring a cost that is likely to be higher than
>> calling get_online_cpus().
>
> I was actually referring to synchronous smp_function_call*()s that
> obviously won't work here. But the good news is that they are also
> redundant.
>
> But I don't see anything expensive in configuring etm and etb in
> pmu::add(), as far as I can tell, it's just a bunch of register
> writes.

Right, but why re-doing a configuration that doesn't change throughout
a trace session?  To me doing the configuration in pmu::add() is
simply redundant and not needed.

> If you want to optimize those, you could compare the new context
> against the previous one and only update registers that need to be
> updated. The spinlock you also could get rid of, because there won't be
> any local racing (again, afaict neither ETM nor ETB generate
> interrupts).
>
> That said, one expensive thing is reading out the ETB buffer on every
> sched out, and that is the real problem, because it slows down the fast
> path by a loop of arbitrary length reading out hw registers. Iirc, ETBs
> could be up to 64K

I agree completely.  This work is intended to set the way for TMCs and
other, faster, sinks.

>
> But a TMC-enabled coresight should do much better in this regard.
>
> Thanks,
> --
> Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch hide | download patch | download mbox

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 759b8d69b4e6..a21171a3e929 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -30,6 +30,9 @@ 
 
 static struct pmu etm_pmu;
 
+static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
+static DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
+
 /* ETMCR is 'config' */
 PMU_FORMAT_ATTR(cycacc,		"config:12");
 PMU_FORMAT_ATTR(timestamp,	"config:28");
@@ -52,6 +55,178 @@  static const struct attribute_group *etm_pmu_attr_groups[] = {
 
 static void etm_event_read(struct perf_event *event) {}
 
+static int etm_event_power_single_source(int source, bool power)
+{
+	int ret = 0;
+	LIST_HEAD(path);
+	LIST_HEAD(sinks);
+	struct coresight_device *csdev;
+
+	csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return -EINVAL;
+
+	if (csdev->type != CORESIGHT_DEV_TYPE_SOURCE)
+		return -EINVAL;
+
+	if (power) {
+		ret = source_ops(csdev)->poweron(csdev);
+		if (ret)
+			goto out;
+
+		ret = coresight_build_paths(csdev, &path, &sinks, true);
+		if (ret) {
+			dev_dbg(&csdev->dev, "creating path(s) failed\n");
+			source_ops(csdev)->poweroff(csdev);
+		}
+
+		/* Everything is good, record first enabled sink buffer */
+		per_cpu(csdev_sink, source) =
+			list_first_entry(&sinks,
+					 struct coresight_device, sinks);
+	} else {
+		source_ops(csdev)->poweroff(csdev);
+		ret = coresight_build_paths(csdev, &path, NULL, false);
+		if (ret)
+			dev_dbg(&csdev->dev, "releasing path(s) failed\n");
+	}
+
+out:
+	return ret;
+}
+
+static int etm_event_power_sources(int source, bool power)
+{
+	int cpu, ret;
+
+	if (source < -1 || source >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* source == -1 is for all CPUs. */
+	if (source != -1) {
+		/* power up/down one source */
+		ret = etm_event_power_single_source(source, power);
+		goto out;
+	}
+
+	/* same process as above, but for all CPUs */
+	for_each_online_cpu(cpu) {
+		ret = etm_event_power_single_source(cpu, power);
+		if (ret)
+			break;
+	}
+
+out:
+	return ret;
+}
+
+static int etm_event_config_single_source(int source)
+{
+	struct coresight_device *csdev;
+
+	csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return -EINVAL;
+
+	if (csdev->type != CORESIGHT_DEV_TYPE_SOURCE)
+		return -EINVAL;
+
+	return source_ops(csdev)->configure(csdev);
+}
+
+static int etm_event_config_sources(int source)
+{
+	int cpu, ret;
+
+	if (source < -1 || source >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* source == -1 is for all CPUs. */
+	if (source != -1) {
+		/* configure one source */
+		ret = etm_event_config_single_source(source);
+		goto out;
+	}
+
+	/* same process as above, but for all CPUs */
+	for_each_online_cpu(cpu) {
+		ret = etm_event_config_single_source(cpu);
+		if (ret)
+			goto reset;
+	}
+
+out:
+	return ret;
+reset:
+	for_each_online_cpu(cpu)
+		etm_event_power_sources(cpu, false);
+	goto out;
+}
+
+static bool etm_event_source_single_enabled(int source)
+{
+	struct coresight_device *csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return true;
+
+	return source_ops(csdev)->is_enabled(csdev);
+}
+
+static bool etm_event_source_enabled(int source)
+{
+	int cpu;
+
+	if (source != -1)
+		return etm_event_source_single_enabled(source);
+
+	for_each_online_cpu(cpu) {
+		if (etm_event_source_single_enabled(cpu))
+			return true;
+	}
+
+	return false;
+}
+
+static void etm_event_destroy(struct perf_event *event)
+{
+	/* switching off the source will also tear down the path */
+	etm_event_power_sources(event->cpu, false);
+}
+
+static int etm_event_init(struct perf_event *event)
+{
+	int ret;
+
+	if (event->attr.type != etm_pmu.type)
+		return -ENOENT;
+
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* only one session at a time */
+	if (etm_event_source_enabled(event->cpu))
+		return -EBUSY;
+
+	/*
+	 * Make sure CPUs don't disappear between the
+	 * power up sequence and configuration.
+	 */
+	get_online_cpus();
+	ret = etm_event_power_sources(event->cpu, true);
+	if (ret)
+		goto out;
+
+	ret = etm_event_config_sources(event->cpu);
+
+	event->destroy = etm_event_destroy;
+out:
+	put_online_cpus();
+	return ret;
+}
+
 static int __init etm_perf_init(void)
 {
 	etm_pmu.capabilities	= PERF_PMU_CAP_EXCLUSIVE;
@@ -59,6 +234,7 @@  static int __init etm_perf_init(void)
 	etm_pmu.attr_groups	= etm_pmu_attr_groups;
 	etm_pmu.task_ctx_nr	= perf_sw_context;
 	etm_pmu.read		= etm_event_read;
+	etm_pmu.event_init	= etm_event_init;
 
 	return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
 }