Message ID | 1442593594-10665-15-git-send-email-mathieu.poirier@linaro.org
---|---
State | New
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> +static void etm_event_destroy(struct perf_event *event)
> +{
> +	/* switching off the source will also tear down the path */
> +	etm_event_power_sources(event->cpu, false);
> +}
> +
> +static int etm_event_init(struct perf_event *event)
> +{
> +	int ret;
> +
> +	if (event->attr.type != etm_pmu.type)
> +		return -ENOENT;
> +
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	/* only one session at a time */
> +	if (etm_event_source_enabled(event->cpu))
> +		return -EBUSY;

Why is this the case? If you were to configure the event in pmu::add()
and deconfigure it in pmu::del(), like you already do with the buffer
part, you could handle as many sessions as you want.

> +
> +	/*
> +	 * Make sure CPUs don't disappear between the
> +	 * power up sequence and configuration.
> +	 */
> +	get_online_cpus();
> +	ret = etm_event_power_sources(event->cpu, true);
> +	if (ret)
> +		goto out;
> +
> +	ret = etm_event_config_sources(event->cpu);

This can be done in pmu::add(), if you can call directly into
etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
calling in between.

> +
> +	event->destroy = etm_event_destroy;
> +out:
> +	put_online_cpus();
> +	return ret;
> +}
> +
>  static int __init etm_perf_init(void)
>  {
>  	etm_pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE;
> @@ -59,6 +234,7 @@ static int __init etm_perf_init(void)
>  	etm_pmu.attr_groups = etm_pmu_attr_groups;
>  	etm_pmu.task_ctx_nr = perf_sw_context;
>  	etm_pmu.read = etm_event_read;
> +	etm_pmu.event_init = etm_event_init;
>
>  	return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>  }

One general comment -- it would be slightly easier at least for me if
all the pmu related bits were in one patch.
Thanks,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
On 22 September 2015 at 08:29, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> +static void etm_event_destroy(struct perf_event *event)
>> +{
>> +	/* switching off the source will also tear down the path */
>> +	etm_event_power_sources(event->cpu, false);
>> +}
>> +
>> +static int etm_event_init(struct perf_event *event)
>> +{
>> +	int ret;
>> +
>> +	if (event->attr.type != etm_pmu.type)
>> +		return -ENOENT;
>> +
>> +	if (event->cpu >= nr_cpu_ids)
>> +		return -EINVAL;
>> +
>> +	/* only one session at a time */
>> +	if (etm_event_source_enabled(event->cpu))
>> +		return -EBUSY;
>
> Why is this the case? If you were to configure the event in pmu::add()
> and deconfigure it in pmu::del(), like you already do with the buffer
> part, you could handle as many sessions as you want.

Apologies for the late reply, I was travelling.

We certainly don't want to have more than one trace session going on
at any given time, especially if the sessions have different
configuration parameters. Moreover, doing the tracer configuration as
part of pmu::add() is highly redundant.

>
>> +
>> +	/*
>> +	 * Make sure CPUs don't disappear between the
>> +	 * power up sequence and configuration.
>> +	 */
>> +	get_online_cpus();
>> +	ret = etm_event_power_sources(event->cpu, true);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = etm_event_config_sources(event->cpu);
>
> This can be done in pmu::add(), if you can call directly into
> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
> calling in between.

As per my comment above, reconfiguring the tracers every time they are
about to run is redundant and expensive (etm_configure_cpu() isn't
exactly short), incurring a cost that is likely to be higher than
calling get_online_cpus().
>
>> +
>> +	event->destroy = etm_event_destroy;
>> +out:
>> +	put_online_cpus();
>> +	return ret;
>> +}
>> +
>>  static int __init etm_perf_init(void)
>>  {
>>  	etm_pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE;
>> @@ -59,6 +234,7 @@ static int __init etm_perf_init(void)
>>  	etm_pmu.attr_groups = etm_pmu_attr_groups;
>>  	etm_pmu.task_ctx_nr = perf_sw_context;
>>  	etm_pmu.read = etm_event_read;
>> +	etm_pmu.event_init = etm_event_init;
>>
>>  	return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>>  }
>
> One general comment -- it would be slightly easier at least for me if
> all the pmu related bits were in one patch.

Ah! I went to great lengths to split them up to make the patches
smaller - I will refactor...

Thanks for the review,
Mathieu

>
> Thanks,
> --
> Alex
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> On 22 September 2015 at 08:29, Alexander Shishkin
> <alexander.shishkin@linux.intel.com> wrote:
>> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>>
>>> +static void etm_event_destroy(struct perf_event *event)
>>> +{
>>> +	/* switching off the source will also tear down the path */
>>> +	etm_event_power_sources(event->cpu, false);
>>> +}
>>> +
>>> +static int etm_event_init(struct perf_event *event)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (event->attr.type != etm_pmu.type)
>>> +		return -ENOENT;
>>> +
>>> +	if (event->cpu >= nr_cpu_ids)
>>> +		return -EINVAL;
>>> +
>>> +	/* only one session at a time */
>>> +	if (etm_event_source_enabled(event->cpu))
>>> +		return -EBUSY;
>>
>> Why is this the case? If you were to configure the event in pmu::add()
>> and deconfigure it in pmu::del(), like you already do with the buffer
>> part, you could handle as many sessions as you want.
>
> Apologies for the late reply, I was travelling.
>
> We certainly don't want to have more than one trace session going on
> at any given time, especially if the sessions have different
> configuration parameters. Moreover, doing the tracer configuration as
> part of pmu::add() is highly redundant.

But why?

The whole point of using perf for this is that it does all the tricky
context switching for us, all the cross-cpu calling to enable/disable
the events etc so that we can run multiple sessions in parallel without
having to worry (much) about scheduling. (Aside, of course, from other
useful things like sideband events, but that's another topic.)

>> This can be done in pmu::add(), if you can call directly into
>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>> calling in between.
>
> As per my comment above, reconfiguring the tracers every time they are
> about to run is redundant and expensive (etm_configure_cpu() isn't
> exactly short), incurring a cost that is likely to be higher than
> calling get_online_cpus().
I was actually referring to synchronous smp_function_call*()s that
obviously won't work here. But the good news is that they are also
redundant.

But I don't see anything expensive in configuring etm and etb in
pmu::add(); as far as I can tell, it's just a bunch of register
writes.

If you want to optimize those, you could compare the new context
against the previous one and only update registers that need to be
updated. You could also get rid of the spinlock, because there won't be
any local racing (again, afaict neither ETM nor ETB generate
interrupts).

That said, one expensive thing is reading out the ETB buffer on every
sched out, and that is the real problem, because it slows down the fast
path by a loop of arbitrary length reading out hw registers. Iirc, ETBs
could be up to 64K? But a TMC-enabled coresight should do much better
in this regard.

Thanks,
--
Alex
On 30 September 2015 at 03:43, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> On 22 September 2015 at 08:29, Alexander Shishkin
>> <alexander.shishkin@linux.intel.com> wrote:
>>> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>>>
>>>> +static void etm_event_destroy(struct perf_event *event)
>>>> +{
>>>> +	/* switching off the source will also tear down the path */
>>>> +	etm_event_power_sources(event->cpu, false);
>>>> +}
>>>> +
>>>> +static int etm_event_init(struct perf_event *event)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	if (event->attr.type != etm_pmu.type)
>>>> +		return -ENOENT;
>>>> +
>>>> +	if (event->cpu >= nr_cpu_ids)
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* only one session at a time */
>>>> +	if (etm_event_source_enabled(event->cpu))
>>>> +		return -EBUSY;
>>>
>>> Why is this the case? If you were to configure the event in pmu::add()
>>> and deconfigure it in pmu::del(), like you already do with the buffer
>>> part, you could handle as many sessions as you want.
>>
>> Apologies for the late reply, I was travelling.
>>
>> We certainly don't want to have more than one trace session going on
>> at any given time, especially if the sessions have different
>> configuration parameters. Moreover, doing the tracer configuration as
>> part of pmu::add() is highly redundant.
>
> But why?
>
> The whole point of using perf for this is that it does all the tricky
> context switching for us, all the cross-cpu calling to enable/disable
> the events etc so that we can run multiple sessions in parallel without
> having to worry (much) about scheduling. (Aside, of course, from other
> useful things like sideband events, but that's another topic.)

Sessions can run in parallel for as long as they don't use the same
CPUs. There is no doubt as to the amount of benefit incurred by using
Perf, but a tracer can't be commissioned by another session once it is
already part of one.
I'm suspecting we don't understand each other here... Maybe an IRC chat
is in order.

>
>>> This can be done in pmu::add(), if you can call directly into
>>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>>> calling in between.
>>
>> As per my comment above, reconfiguring the tracers every time they are
>> about to run is redundant and expensive (etm_configure_cpu() isn't
>> exactly short), incurring a cost that is likely to be higher than
>> calling get_online_cpus().
>
> I was actually referring to synchronous smp_function_call*()s that
> obviously won't work here. But the good news is that they are also
> redundant.
>
> But I don't see anything expensive in configuring etm and etb in
> pmu::add(); as far as I can tell, it's just a bunch of register
> writes.

Right, but why redo a configuration that doesn't change throughout a
trace session? To me doing the configuration in pmu::add() is simply
redundant and not needed.

> If you want to optimize those, you could compare the new context
> against the previous one and only update registers that need to be
> updated. You could also get rid of the spinlock, because there won't be
> any local racing (again, afaict neither ETM nor ETB generate
> interrupts).
>
> That said, one expensive thing is reading out the ETB buffer on every
> sched out, and that is the real problem, because it slows down the fast
> path by a loop of arbitrary length reading out hw registers. Iirc, ETBs
> could be up to 64K?

I agree completely. This work is intended to pave the way for TMCs and
other, faster, sinks.

>
> But a TMC-enabled coresight should do much better in this regard.
>
> Thanks,
> --
> Alex
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 759b8d69b4e6..a21171a3e929 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -30,6 +30,9 @@
 
 static struct pmu etm_pmu;
 
+static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
+static DEFINE_PER_CPU(struct coresight_device *, csdev_sink);
+
 /* ETMCR is 'config' */
 PMU_FORMAT_ATTR(cycacc, "config:12");
 PMU_FORMAT_ATTR(timestamp, "config:28");
@@ -52,6 +55,178 @@ static const struct attribute_group *etm_pmu_attr_groups[] = {
 
 static void etm_event_read(struct perf_event *event) {}
 
+static int etm_event_power_single_source(int source, bool power)
+{
+	int ret = 0;
+	LIST_HEAD(path);
+	LIST_HEAD(sinks);
+	struct coresight_device *csdev;
+
+	csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return -EINVAL;
+
+	if (csdev->type != CORESIGHT_DEV_TYPE_SOURCE)
+		return -EINVAL;
+
+	if (power) {
+		ret = source_ops(csdev)->poweron(csdev);
+		if (ret)
+			goto out;
+
+		ret = coresight_build_paths(csdev, &path, &sinks, true);
+		if (ret) {
+			dev_dbg(&csdev->dev, "creating path(s) failed\n");
+			source_ops(csdev)->poweroff(csdev);
+		}
+
+		/* Everything is good, record first enabled sink buffer */
+		per_cpu(csdev_sink, source) =
+			list_first_entry(&sinks,
+					 struct coresight_device, sinks);
+	} else {
+		source_ops(csdev)->poweroff(csdev);
+		ret = coresight_build_paths(csdev, &path, NULL, false);
+		if (ret)
+			dev_dbg(&csdev->dev, "releasing path(s) failed\n");
+	}
+
+out:
+	return ret;
+}
+
+static int etm_event_power_sources(int source, bool power)
+{
+	int cpu, ret;
+
+	if (source < -1 || source >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* source == -1 is for all CPUs. */
+	if (source != -1) {
+		/* power up/down one source */
+		ret = etm_event_power_single_source(source, power);
+		goto out;
+	}
+
+	/* same process as above, but for all CPUs */
+	for_each_online_cpu(cpu) {
+		ret = etm_event_power_single_source(cpu, power);
+		if (ret)
+			break;
+	}
+
+out:
+	return ret;
+}
+
+static int etm_event_config_single_source(int source)
+{
+	struct coresight_device *csdev;
+
+	csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return -EINVAL;
+
+	if (csdev->type != CORESIGHT_DEV_TYPE_SOURCE)
+		return -EINVAL;
+
+	return source_ops(csdev)->configure(csdev);
+}
+
+static int etm_event_config_sources(int source)
+{
+	int cpu, ret;
+
+	if (source < -1 || source >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* source == -1 is for all CPUs. */
+	if (source != -1) {
+		/* configure one source */
+		ret = etm_event_config_single_source(source);
+		goto out;
+	}
+
+	/* same process as above, but for all CPUs */
+	for_each_online_cpu(cpu) {
+		ret = etm_event_config_single_source(cpu);
+		if (ret)
+			goto reset;
+	}
+
+out:
+	return ret;
+reset:
+	for_each_online_cpu(cpu)
+		etm_event_power_sources(cpu, false);
+	goto out;
+}
+
+static bool etm_event_source_single_enabled(int source)
+{
+	struct coresight_device *csdev = per_cpu(csdev_src, source);
+
+	if (!csdev)
+		return true;
+
+	return source_ops(csdev)->is_enabled(csdev);
+}
+
+static bool etm_event_source_enabled(int source)
+{
+	int cpu;
+
+	if (source != -1)
+		return etm_event_source_single_enabled(source);
+
+	for_each_online_cpu(cpu) {
+		if (etm_event_source_single_enabled(cpu))
+			return true;
+	}
+
+	return false;
+}
+
+static void etm_event_destroy(struct perf_event *event)
+{
+	/* switching off the source will also tear down the path */
+	etm_event_power_sources(event->cpu, false);
+}
+
+static int etm_event_init(struct perf_event *event)
+{
+	int ret;
+
+	if (event->attr.type != etm_pmu.type)
+		return -ENOENT;
+
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	/* only one session at a time */
+	if (etm_event_source_enabled(event->cpu))
+		return -EBUSY;
+
+	/*
+	 * Make sure CPUs don't disappear between the
+	 * power up sequence and configuration.
+	 */
+	get_online_cpus();
+	ret = etm_event_power_sources(event->cpu, true);
+	if (ret)
+		goto out;
+
+	ret = etm_event_config_sources(event->cpu);
+
+	event->destroy = etm_event_destroy;
+out:
+	put_online_cpus();
+	return ret;
+}
+
 static int __init etm_perf_init(void)
 {
 	etm_pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE;
@@ -59,6 +234,7 @@ static int __init etm_perf_init(void)
 	etm_pmu.attr_groups = etm_pmu_attr_groups;
 	etm_pmu.task_ctx_nr = perf_sw_context;
 	etm_pmu.read = etm_event_read;
+	etm_pmu.event_init = etm_event_init;
 
 	return perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
 }
Enhancing skeleton PMU with event initialisation. The function makes
sure tracers aren't already enabled before going through with the
power up and configuration sequences.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 176 +++++++++++++++++++++++
 1 file changed, 176 insertions(+)