[v2,1/6] Add ancillary bus support

Message ID	20201005182446.977325-2-david.m.ertman@intel.com
State	New
Headers	show Return-Path: <SRS0=eLx3=DM=alsa-project.org=alsa-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AE530212CC IronPort-SDR: RkiUSrj6SPaPMtnsHziAYGcW3EGtxnMtarlkk2PWyj256EKJlzKUmDeSWx12ytoFm1e0w2Z+uK 95NkUprR7rVA== IronPort-SDR: BT7SBxtn9bMCBkYRQCTmiDLe/ew3HuK27c2iRcGQVHMVYyglbvnIkn59LlNSHZ6RxKt4vsDhtN OG4tBZ3WS/iA== From: Dave Ertman <david.m.ertman@intel.com> To: alsa-devel@alsa-project.org Subject: [PATCH v2 1/6] Add ancillary bus support Date: Mon, 5 Oct 2020 11:24:41 -0700 Message-Id: <20201005182446.977325-2-david.m.ertman@intel.com> In-Reply-To: <20201005182446.977325-1-david.m.ertman@intel.com> References: <20201005182446.977325-1-david.m.ertman@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Cc: parav@mellanox.com, tiwai@suse.de, netdev@vger.kernel.org, ranjani.sridharan@linux.intel.com, pierre-louis.bossart@linux.intel.com, fred.oh@linux.intel.com, linux-rdma@vger.kernel.org, dledford@redhat.com, broonie@kernel.org, jgg@nvidia.com, gregkh@linuxfoundation.org, kuba@kernel.org, dan.j.williams@intel.com, shiraz.saleem@intel.com, davem@davemloft.net, kiran.patil@intel.com Precedence: list Errors-To: alsa-devel-bounces@alsa-project.org Sender: "Alsa-devel" <alsa-devel-bounces@alsa-project.org>
Series	Ancillary bus implementation and SOF multi-client support \| expand [v2,0/6] Ancillary bus implementation and SOF multi-client support [v2,1/6] Add ancillary bus support [v2,2/6] ASoC: SOF: Introduce descriptors for SOF client [v2,3/6] ASoC: SOF: Create client driver for IPC test [v2,4/6] ASoC: SOF: ops: Add ops for client registration [v2,5/6] ASoC: SOF: Intel: Define ops for client registration [v2,6/6] ASoC: SOF: debug: Remove IPC flood test support in SOF core

Ertman, David M Oct. 5, 2020, 6:24 p.m. UTC

Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
It enables drivers to create an ancillary_device and bind an
ancillary_driver to it.

The bus supports probe/remove shutdown and suspend/resume callbacks.
Each ancillary_device has a unique string based id; driver binds to
an ancillary_device based on this id through the bus.

Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
---
 Documentation/driver-api/ancillary_bus.rst | 229 +++++++++++++++++++++
 Documentation/driver-api/index.rst         |   1 +
 drivers/bus/Kconfig                        |   3 +
 drivers/bus/Makefile                       |   3 +
 drivers/bus/ancillary.c                    | 225 ++++++++++++++++++++
 include/linux/ancillary_bus.h              |  69 +++++++
 include/linux/mod_devicetable.h            |   8 +
 scripts/mod/devicetable-offsets.c          |   3 +
 scripts/mod/file2alias.c                   |   8 +
 9 files changed, 549 insertions(+)
 create mode 100644 Documentation/driver-api/ancillary_bus.rst
 create mode 100644 drivers/bus/ancillary.c
 create mode 100644 include/linux/ancillary_bus.h

Leon Romanovsky Oct. 6, 2020, 7:18 a.m. UTC | #1

On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> It enables drivers to create an ancillary_device and bind an
> ancillary_driver to it.

I was under impression that this name is going to be changed.

>
> The bus supports probe/remove shutdown and suspend/resume callbacks.
> Each ancillary_device has a unique string based id; driver binds to
> an ancillary_device based on this id through the bus.
>
> Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---
>  Documentation/driver-api/ancillary_bus.rst | 229 +++++++++++++++++++++
>  Documentation/driver-api/index.rst         |   1 +
>  drivers/bus/Kconfig                        |   3 +
>  drivers/bus/Makefile                       |   3 +
>  drivers/bus/ancillary.c                    | 225 ++++++++++++++++++++
>  include/linux/ancillary_bus.h              |  69 +++++++
>  include/linux/mod_devicetable.h            |   8 +
>  scripts/mod/devicetable-offsets.c          |   3 +
>  scripts/mod/file2alias.c                   |   8 +
>  9 files changed, 549 insertions(+)
>  create mode 100644 Documentation/driver-api/ancillary_bus.rst
>  create mode 100644 drivers/bus/ancillary.c
>  create mode 100644 include/linux/ancillary_bus.h
>
> diff --git a/Documentation/driver-api/ancillary_bus.rst b/Documentation/driver-api/ancillary_bus.rst
> new file mode 100644
> index 000000000000..66f986e8672f
> --- /dev/null
> +++ b/Documentation/driver-api/ancillary_bus.rst
> @@ -0,0 +1,229 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=============
> +Ancillary Bus
> +=============
> +
> +In some subsystems, the functionality of the core device (PCI/ACPI/other) is
> +too complex for a single device to be managed as a monolithic block or a part of
> +the functionality needs to be exposed to a different subsystem.  Splitting the
> +functionality into smaller orthogonal devices would make it easier to manage
> +data, power management and domain-specific interaction with the hardware. A key
> +requirement for such a split is that there is no dependency on a physical bus,
> +device, register accesses or regmap support. These individual devices split from
> +the core cannot live on the platform bus as they are not physical devices that
> +are controlled by DT/ACPI. The same argument applies for not using MFD in this
> +scenario as MFD relies on individual function devices being physical devices.
> +
> +An example for this kind of requirement is the audio subsystem where a single
> +IP is handling multiple entities such as HDMI, Soundwire, local devices such as
> +mics/speakers etc. The split for the core's functionality can be arbitrary or
> +be defined by the DSP firmware topology and include hooks for test/debug. This
> +allows for the audio core device to be minimal and focused on hardware-specific
> +control and communication.
> +
> +The ancillary bus is intended to be minimal, generic and avoid domain-specific
> +assumptions. Each ancillary_device represents a part of its parent
> +functionality. The generic behavior can be extended and specialized as needed
> +by encapsulating an ancillary_device within other domain-specific structures and
> +the use of .ops callbacks. Devices on the ancillary bus do not share any
> +structures and the use of a communication channel with the parent is
> +domain-specific.
> +
> +When Should the Ancillary Bus Be Used
> +=====================================
> +
> +The ancillary bus is to be used when a driver and one or more kernel modules,
> +who share a common header file with the driver, need a mechanism to connect and
> +provide access to a shared object allocated by the ancillary_device's
> +registering driver.  The registering driver for the ancillary_device(s) and the
> +kernel module(s) registering ancillary_drivers can be from the same subsystem,
> +or from multiple subsystems.
> +
> +The emphasis here is on a common generic interface that keeps subsystem
> +customization out of the bus infrastructure.
> +
> +One example could be a multi-port PCI network device that is rdma-capable and
> +needs to export this functionality and attach to an rdma driver in another
> +subsystem.  The PCI driver will allocate and register an ancillary_device for
> +each physical function on the NIC.  The rdma driver will register an
> +ancillary_driver that will be matched with and probed for each of these
> +ancillary_devices.  This will give the rdma driver access to the shared data/ops
> +in the PCI drivers shared object to establish a connection with the PCI driver.
> +
> +Another use case is for the PCI device to be split out into multiple sub
> +functions.  For each sub function an ancillary_device will be created.  A PCI
> +sub function driver will bind to such devices that will create its own one or
> +more class devices.  A PCI sub function ancillary device will likely be
> +contained in a struct with additional attributes such as user defined sub
> +function number and optional attributes such as resources and a link to the
> +parent device.  These attributes could be used by systemd/udev; and hence should
> +be initialized before a driver binds to an ancillary_device.
> +
> +Ancillary Device
> +================
> +
> +An ancillary_device is created and registered to represent a part of its parent
> +device's functionality. It is given a name that, combined with the registering
> +drivers KBUILD_MODNAME, creates a match_name that is used for driver binding,
> +and an id that combined with the match_name provide a unique name to register
> +with the bus subsystem.
> +
> +Registering an ancillary_device is a two-step process.  First you must call
> +ancillary_device_initialize(), which will check several aspects of the
> +ancillary_device struct and perform a device_initialize().  After this step
> +completes, any error state must have a call to put_device() in its resolution
> +path.  The second step in registering an ancillary_device is to perform a call
> +to ancillary_device_add(), which will set the name of the device and add the
> +device to the bus.
> +
> +To unregister an ancillary_device, just a call to ancillary_device_unregister()
> +is used.  This will perform both a device_del() and a put_device().
> +
> +.. code-block:: c
> +
> +	struct ancillary_device {
> +		struct device dev;
> +                const char *name;
> +		u32 id;
> +	};
> +
> +If two ancillary_devices both with a match_name "mod.foo" are registered onto
> +the bus, they must have unique id values (e.g. "x" and "y") so that the
> +registered devices names will be "mod.foo.x" and "mod.foo.y".  If match_name +
> +id are not unique, then the device_add will fail and generate an error message.
> +
> +The ancillary_device.dev.type.release or ancillary_device.dev.release must be
> +populated with a non-NULL pointer to successfully register the ancillary_device.
> +
> +The ancillary_device.dev.parent must also be populated.
> +
> +Ancillary Device Memory Model and Lifespan
> +------------------------------------------
> +
> +When a kernel driver registers an ancillary_device on the ancillary bus, we will
> +use the nomenclature to refer to this kernel driver as a registering driver.  It
> +is the entity that will allocate memory for the ancillary_device and register it
> +on the ancillary bus.  It is important to note that, as opposed to the platform
> +bus, the registering driver is wholly responsible for the management for the
> +memory used for the driver object.
> +
> +A parent object, defined in the shared header file, will contain the
> +ancillary_device.  It will also contain a pointer to the shared object(s), which
> +will also be defined in the shared header.  Both the parent object and the
> +shared object(s) will be allocated by the registering driver.  This layout
> +allows the ancillary_driver's registering module to perform a container_of()
> +call to go from the pointer to the ancillary_device, that is passed during the
> +call to the ancillary_driver's probe function, up to the parent object, and then
> +have access to the shared object(s).
> +
> +The memory for the ancillary_device will be freed only in its release()
> +callback flow as defined by its registering driver.
> +
> +The memory for the shared object(s) must have a lifespan equal to, or greater
> +than, the lifespan of the memory for the ancillary_device.  The ancillary_driver
> +should only consider that this shared object is valid as long as the
> +ancillary_device is still registered on the ancillary bus.  It is up to the
> +registering driver to manage (e.g. free or keep available) the memory for the
> +shared object beyond the life of the ancillary_device.
> +
> +Registering driver must unregister all ancillary devices before its registering
> +parent device's remove() is completed.
> +
> +Ancillary Drivers
> +=================
> +
> +Ancillary drivers follow the standard driver model convention, where
> +discovery/enumeration is handled by the core, and drivers
> +provide probe() and remove() methods. They support power management
> +and shutdown notifications using the standard conventions.
> +
> +.. code-block:: c
> +
> +	struct ancillary_driver {
> +		int (*probe)(struct ancillary_device *,
> +                             const struct ancillary_device_id *id);
> +		int (*remove)(struct ancillary_device *);
> +		void (*shutdown)(struct ancillary_device *);
> +		int (*suspend)(struct ancillary_device *, pm_message_t);
> +		int (*resume)(struct ancillary_device *);
> +		struct device_driver driver;
> +		const struct ancillary_device_id *id_table;
> +	};
> +
> +Ancillary drivers register themselves with the bus by calling
> +ancillary_driver_register(). The id_table contains the match_names of ancillary
> +devices that a driver can bind with.
> +
> +Example Usage
> +=============
> +
> +Ancillary devices are created and registered by a subsystem-level core device
> +that needs to break up its functionality into smaller fragments. One way to
> +extend the scope of an ancillary_device would be to encapsulate it within a
> +domain-specific structure defined by the parent device. This structure contains
> +the ancillary_device and any associated shared data/callbacks needed to
> +establish the connection with the parent.
> +
> +An example would be:
> +
> +.. code-block:: c
> +
> +        struct foo {
> +		struct ancillary_device ancildev;
> +		void (*connect)(struct ancillary_device *ancildev);
> +		void (*disconnect)(struct ancillary_device *ancildev);
> +		void *data;
> +        };
> +
> +The parent device would then register the ancillary_device by calling
> +ancillary_device_initialize(), and then ancillary_device_add(), with the pointer
> +to the ancildev member of the above structure. The parent would provide a name
> +for the ancillary_device that, combined with the parent's KBUILD_MODNAME, will
> +create a match_name that will be used for matching and binding with a driver.
> +
> +Whenever an ancillary_driver is registered, based on the match_name, the
> +ancillary_driver's probe() is invoked for the matching devices.  The
> +ancillary_driver can also be encapsulated inside custom drivers that make the
> +core device's functionality extensible by adding additional domain-specific ops
> +as follows:
> +
> +.. code-block:: c
> +
> +	struct my_ops {
> +		void (*send)(struct ancillary_device *ancildev);
> +		void (*receive)(struct ancillary_device *ancildev);
> +	};
> +
> +
> +	struct my_driver {
> +		struct ancillary_driver ancillary_drv;
> +		const struct my_ops ops;
> +	};
> +
> +An example of this type of usage would be:
> +
> +.. code-block:: c
> +
> +	const struct ancillary_device_id my_ancillary_id_table[] = {
> +		{ .name = "foo_mod.foo_dev" },
> +		{ },
> +	};
> +
> +	const struct my_ops my_custom_ops = {
> +		.send = my_tx,
> +		.receive = my_rx,
> +	};
> +
> +	const struct my_driver my_drv = {
> +		.ancillary_drv = {
> +			.driver = {
> +				.name = "myancillarydrv",

Why do we need to give control over driver name to the driver authors?
It can be problematic if author puts name that already exists.

> +			},
> +			.id_table = my_ancillary_id_table,
> +			.probe = my_probe,
> +			.remove = my_remove,
> +			.shutdown = my_shutdown,
> +		},
> +		.ops = my_custom_ops,
> +	};
> diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
> index 5ef2cfe3a16b..9584ac2ed1f5 100644
> --- a/Documentation/driver-api/index.rst
> +++ b/Documentation/driver-api/index.rst
> @@ -74,6 +74,7 @@ available subsections can be seen below.
>     thermal/index
>     fpga/index
>     acpi/index
> +   ancillary_bus
>     backlight/lp855x-driver.rst
>     connector
>     console
> diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
> index 0c262c2aeaf2..ba82a045b847 100644
> --- a/drivers/bus/Kconfig
> +++ b/drivers/bus/Kconfig
> @@ -5,6 +5,9 @@
>
>  menu "Bus devices"
>
> +config ANCILLARY_BUS
> +       tristate
> +
>  config ARM_CCI
>  	bool
>
> diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
> index 397e35392bff..7c217eb1dbb7 100644
> --- a/drivers/bus/Makefile
> +++ b/drivers/bus/Makefile
> @@ -3,6 +3,9 @@
>  # Makefile for the bus drivers.
>  #
>
> +# Ancillary bus driver
> +obj-$(CONFIG_ANCILLARY_BUS)	+= ancillary.o
> +
>  # Interconnect bus drivers for ARM platforms
>  obj-$(CONFIG_ARM_CCI)		+= arm-cci.o
>  obj-$(CONFIG_ARM_INTEGRATOR_LM)	+= arm-integrator-lm.o
> diff --git a/drivers/bus/ancillary.c b/drivers/bus/ancillary.c
> new file mode 100644
> index 000000000000..93888ca36fb1
> --- /dev/null
> +++ b/drivers/bus/ancillary.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Software based bus for Ancillary devices
> + *
> + * Copyright (c) 2019-2020 Intel Corporation
> + *
> + * Please see Documentation/driver-api/ancillary_bus.rst for more information.
> + */
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/string.h>
> +#include <linux/ancillary_bus.h>
> +
> +static const struct ancillary_device_id *ancillary_match_id(const struct ancillary_device_id *id,
> +							    const struct ancillary_device *ancildev)
> +{
> +	while (id->name[0]) {
> +		const char *p = strrchr(dev_name(&ancildev->dev), '.');
> +		int match_size;
> +
> +		if (!p) {
> +			id++;
> +			continue;
> +		}
> +		match_size = p - dev_name(&ancildev->dev);
> +
> +		/* use dev_name(&ancildev->dev) prefix before last '.' char to match to */
> +		if (!strncmp(dev_name(&ancildev->dev), id->name, match_size))
> +			return id;
> +		id++;
> +	}
> +	return NULL;
> +}
> +
> +static int ancillary_match(struct device *dev, struct device_driver *drv)
> +{
> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> +	struct ancillary_driver *ancildrv = to_ancillary_drv(drv);
> +
> +	return !!ancillary_match_id(ancildrv->id_table, ancildev);
> +}
> +
> +static int ancillary_uevent(struct device *dev, struct kobj_uevent_env *env)
> +{
> +	const char *name, *p;
> +
> +	name = dev_name(dev);
> +	p = strrchr(name, '.');
> +
> +	return add_uevent_var(env, "MODALIAS=%s%.*s", ANCILLARY_MODULE_PREFIX, (int)(p - name),
> +			      name);
> +}
> +
> +static const struct dev_pm_ops ancillary_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(pm_generic_runtime_suspend, pm_generic_runtime_resume, NULL)
> +	SET_SYSTEM_SLEEP_PM_OPS(pm_generic_suspend, pm_generic_resume)
> +};
> +
> +struct bus_type ancillary_bus_type = {
> +	.name = "ancillary",
> +	.match = ancillary_match,
> +	.uevent = ancillary_uevent,
> +	.pm = &ancillary_dev_pm_ops,
> +};
> +
> +/**
> + * ancillary_device_initialize - check ancillary_device and initialize
> + * @ancildev: ancillary device struct
> + *
> + * This is the first step in the two-step process to register an ancillary_device.
> + *
> + * When this function returns an error code, then the device_initialize will *not* have
> + * been performed, and the caller will be responsible to free any memory allocated for the
> + * ancillary_device in the error path directly.
> + *
> + * It returns 0 on success.  On success, the device_initialize has been performed.
> + * After this point any error unwinding will need to include a call to put_device().
> + * In this post-initialize error scenario, a call to the device's .release callback will be
> + * triggered by put_device(), and all memory clean-up is expected to be handled there.
> + */
> +int ancillary_device_initialize(struct ancillary_device *ancildev)
> +{
> +	struct device *dev = &ancildev->dev;
> +
> +	dev->bus = &ancillary_bus_type;
> +
> +	if (!dev->parent) {
> +		pr_err("ancillary_device has a NULL dev->parent\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!ancildev->name) {
> +		pr_err("acillary_device has a NULL name\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!(dev->type && dev->type->release) && !dev->release) {
> +		pr_err("ancillary_device does not have a release callback defined\n");
> +		return -EINVAL;
> +	}
> +
> +	device_initialize(&ancildev->dev);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ancillary_device_initialize);
> +
> +/**
> + * __ancillary_device_add - add an ancillary bus device
> + * @ancildev: ancillary bus device to add to the bus
> + * @modname: name of the parent device's driver module
> + *
> + * This is the second step in the two-step process to register an ancillary_device.
> + *
> + * This function must be called after a successful call to ancillary_device_initialize(), which
> + * will perform the device_initialize.  This means that if this returns an error code, then a
> + * put_device must be performed so that the .release callback will be triggered to free the
> + * memory associated with the ancillary_device.
> + */
> +int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname)
> +{
> +	struct device *dev = &ancildev->dev;
> +	int ret;
> +
> +	if (!modname) {
> +		pr_err("ancillary device modname is NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name, ancildev->id);
> +	if (ret) {
> +		pr_err("ancillary device dev_set_name failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = device_add(dev);
> +	if (ret)
> +		dev_err(dev, "adding ancillary device failed!: %d\n", ret);
> +
> +	return ret;
> +}

Sorry, but this is very strange API that requires users to put
internal call to "dev" that is buried inside "struct ancillary_device".

For example in your next patch, you write this "put_device(&cdev->ancildev.dev);"

I'm pretty sure that the amount of bugs in error unwind will be
astonishing, so if you are doing wrappers over core code, better do not
pass complexity to the users.

> +EXPORT_SYMBOL_GPL(__ancillary_device_add);
> +
> +static int ancillary_probe_driver(struct device *dev)
> +{
> +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> +	int ret;
> +
> +	ret = dev_pm_domain_attach(dev, true);
> +	if (ret) {
> +		dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv->id_table, ancildev));

I don't think that you need to call ->probe() if ancillary_match_id()
returned NULL and probably that check should be done before
dev_pm_domain_attach().

> +	if (ret)
> +		dev_pm_domain_detach(dev, true);
> +
> +	return ret;
> +}
> +
> +static int ancillary_remove_driver(struct device *dev)
> +{
> +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> +	int ret;
> +
> +	ret = ancildrv->remove(ancildev);
> +	dev_pm_domain_detach(dev, true);
> +
> +	return ret;

You returned an error to user and detached from PM, what will user do
with this information? Should user ignore it? retry?

> +}
> +
> +static void ancillary_shutdown_driver(struct device *dev)
> +{
> +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> +
> +	ancildrv->shutdown(ancildev);
> +}
> +
> +/**
> + * __ancillary_driver_register - register a driver for ancillary bus devices
> + * @ancildrv: ancillary_driver structure
> + * @owner: owning module/driver
> + */
> +int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct module *owner)
> +{
> +	if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove) ||
> +	    WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv->id_table))
> +		return -EINVAL;
> +
> +	ancildrv->driver.owner = owner;
> +	ancildrv->driver.bus = &ancillary_bus_type;
> +	ancildrv->driver.probe = ancillary_probe_driver;
> +	ancildrv->driver.remove = ancillary_remove_driver;
> +	ancildrv->driver.shutdown = ancillary_shutdown_driver;
> +
> +	return driver_register(&ancildrv->driver);
> +}
> +EXPORT_SYMBOL_GPL(__ancillary_driver_register);
> +
> +static int __init ancillary_bus_init(void)
> +{
> +	return bus_register(&ancillary_bus_type);
> +}
> +
> +static void __exit ancillary_bus_exit(void)
> +{
> +	bus_unregister(&ancillary_bus_type);
> +}
> +
> +module_init(ancillary_bus_init);
> +module_exit(ancillary_bus_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("Ancillary Bus");
> +MODULE_AUTHOR("David Ertman <david.m.ertman@intel.com>");
> +MODULE_AUTHOR("Kiran Patil <kiran.patil@intel.com>");
> diff --git a/include/linux/ancillary_bus.h b/include/linux/ancillary_bus.h
> new file mode 100644
> index 000000000000..72169c8a5dfe
> --- /dev/null
> +++ b/include/linux/ancillary_bus.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2019-2020 Intel Corporation
> + *
> + * Please see Documentation/driver-api/ancillary_bus.rst for more information.
> + */
> +
> +#ifndef _ANCILLARY_BUS_H_
> +#define _ANCILLARY_BUS_H_
> +
> +#include <linux/device.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/slab.h>
> +
> +struct ancillary_device {
> +	struct device dev;
> +	const char *name;
> +	u32 id;
> +};
> +
> +struct ancillary_driver {
> +	int (*probe)(struct ancillary_device *ancildev, const struct ancillary_device_id *id);
> +	int (*remove)(struct ancillary_device *ancildev);
> +	void (*shutdown)(struct ancillary_device *ancildev);
> +	int (*suspend)(struct ancillary_device *ancildev, pm_message_t state);
> +	int (*resume)(struct ancillary_device *ancildev);
> +	struct device_driver driver;
> +	const struct ancillary_device_id *id_table;
> +};
> +
> +static inline struct ancillary_device *to_ancillary_dev(struct device *dev)
> +{
> +	return container_of(dev, struct ancillary_device, dev);
> +}
> +
> +static inline struct ancillary_driver *to_ancillary_drv(struct device_driver *drv)
> +{
> +	return container_of(drv, struct ancillary_driver, driver);
> +}
> +
> +int ancillary_device_initialize(struct ancillary_device *ancildev);
> +int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname);
> +#define ancillary_device_add(ancildev) __ancillary_device_add(ancildev, KBUILD_MODNAME)
> +
> +static inline void ancillary_device_unregister(struct ancillary_device *ancildev)
> +{
> +	device_unregister(&ancildev->dev);
> +}
> +
> +int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct module *owner);
> +#define ancillary_driver_register(ancildrv) __ancillary_driver_register(ancildrv, THIS_MODULE)
> +
> +static inline void ancillary_driver_unregister(struct ancillary_driver *ancildrv)
> +{
> +	driver_unregister(&ancildrv->driver);
> +}
> +
> +/**
> + * module_ancillary_driver() - Helper macro for registering an ancillary driver
> + * @__ancillary_driver: ancillary driver struct
> + *
> + * Helper macro for ancillary drivers which do not do anything special in
> + * module init/exit. This eliminates a lot of boilerplate. Each module may only
> + * use this macro once, and calling it replaces module_init() and module_exit()
> + */
> +#define module_ancillary_driver(__ancillary_driver) \
> +	module_driver(__ancillary_driver, ancillary_driver_register, ancillary_driver_unregister)
> +
> +#endif /* _ANCILLARY_BUS_H_ */
> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
> index 5b08a473cdba..7d596dc30833 100644
> --- a/include/linux/mod_devicetable.h
> +++ b/include/linux/mod_devicetable.h
> @@ -838,4 +838,12 @@ struct mhi_device_id {
>  	kernel_ulong_t driver_data;
>  };
>
> +#define ANCILLARY_NAME_SIZE 32
> +#define ANCILLARY_MODULE_PREFIX "ancillary:"
> +
> +struct ancillary_device_id {
> +	char name[ANCILLARY_NAME_SIZE];

I hope that this be enough.

> +	kernel_ulong_t driver_data;
> +};
> +
>  #endif /* LINUX_MOD_DEVICETABLE_H */
> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
> index 27007c18e754..79e37c4c25b3 100644
> --- a/scripts/mod/devicetable-offsets.c
> +++ b/scripts/mod/devicetable-offsets.c
> @@ -243,5 +243,8 @@ int main(void)
>  	DEVID(mhi_device_id);
>  	DEVID_FIELD(mhi_device_id, chan);
>
> +	DEVID(ancillary_device_id);
> +	DEVID_FIELD(ancillary_device_id, name);
> +
>  	return 0;
>  }
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index 2417dd1dee33..99c4fcd82bf3 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -1364,6 +1364,13 @@ static int do_mhi_entry(const char *filename, void *symval, char *alias)
>  {
>  	DEF_FIELD_ADDR(symval, mhi_device_id, chan);
>  	sprintf(alias, MHI_DEVICE_MODALIAS_FMT, *chan);
> +	return 1;
> +}
> +
> +static int do_ancillary_entry(const char *filename, void *symval, char *alias)
> +{
> +	DEF_FIELD_ADDR(symval, ancillary_device_id, name);
> +	sprintf(alias, ANCILLARY_MODULE_PREFIX "%s", *name);
>
>  	return 1;
>  }
> @@ -1442,6 +1449,7 @@ static const struct devtable devtable[] = {
>  	{"tee", SIZE_tee_client_device_id, do_tee_entry},
>  	{"wmi", SIZE_wmi_device_id, do_wmi_entry},
>  	{"mhi", SIZE_mhi_device_id, do_mhi_entry},
> +	{"ancillary", SIZE_ancillary_device_id, do_ancillary_entry},
>  };
>
>  /* Create MODULE_ALIAS() statements.
> --
> 2.26.2
>

Pierre-Louis Bossart Oct. 6, 2020, 3:18 p.m. UTC | #2

Thanks for the review Leon.

>> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
>> It enables drivers to create an ancillary_device and bind an
>> ancillary_driver to it.
> 
> I was under impression that this name is going to be changed.

It's part of the opens stated in the cover letter.

[...]

>> +	const struct my_driver my_drv = {
>> +		.ancillary_drv = {
>> +			.driver = {
>> +				.name = "myancillarydrv",
> 
> Why do we need to give control over driver name to the driver authors?
> It can be problematic if author puts name that already exists.

Good point. When I used the ancillary_devices for my own SoundWire test, 
the driver name didn't seem specifically meaningful but needed to be set 
to something, what mattered was the id_table. Just thinking aloud, maybe 
we can add prefixing with KMOD_BUILD, as we've done already to avoid 
collisions between device names?

[...]

>> +int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname)
>> +{
>> +	struct device *dev = &ancildev->dev;
>> +	int ret;
>> +
>> +	if (!modname) {
>> +		pr_err("ancillary device modname is NULL\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name, ancildev->id);
>> +	if (ret) {
>> +		pr_err("ancillary device dev_set_name failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = device_add(dev);
>> +	if (ret)
>> +		dev_err(dev, "adding ancillary device failed!: %d\n", ret);
>> +
>> +	return ret;
>> +}
> 
> Sorry, but this is very strange API that requires users to put
> internal call to "dev" that is buried inside "struct ancillary_device".
> 
> For example in your next patch, you write this "put_device(&cdev->ancildev.dev);"
> 
> I'm pretty sure that the amount of bugs in error unwind will be
> astonishing, so if you are doing wrappers over core code, better do not
> pass complexity to the users.

In initial reviews, there was pushback on adding wrappers that don't do 
anything except for a pointer indirection.

Others had concerns that the API wasn't balanced and blurring layers.

Both points have merits IMHO. Do we want wrappers for everything and 
completely hide the low-level device?

> 
>> +EXPORT_SYMBOL_GPL(__ancillary_device_add);
>> +
>> +static int ancillary_probe_driver(struct device *dev)
>> +{
>> +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
>> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
>> +	int ret;
>> +
>> +	ret = dev_pm_domain_attach(dev, true);
>> +	if (ret) {
>> +		dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv->id_table, ancildev));
> 
> I don't think that you need to call ->probe() if ancillary_match_id()
> returned NULL and probably that check should be done before
> dev_pm_domain_attach().

we'll look into this.

> 
>> +	if (ret)
>> +		dev_pm_domain_detach(dev, true);
>> +
>> +	return ret;
>> +}
>> +
>> +static int ancillary_remove_driver(struct device *dev)
>> +{
>> +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
>> +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
>> +	int ret;
>> +
>> +	ret = ancildrv->remove(ancildev);
>> +	dev_pm_domain_detach(dev, true);
>> +
>> +	return ret;
> 
> You returned an error to user and detached from PM, what will user do
> with this information? Should user ignore it? retry?

That comment was also provided in earlier reviews. In practice the error 
is typically ignored so there was a suggestion to move the return type 
to void, that could be done if this was desired by the majority.

[...]

>> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
>> index 5b08a473cdba..7d596dc30833 100644
>> --- a/include/linux/mod_devicetable.h
>> +++ b/include/linux/mod_devicetable.h
>> @@ -838,4 +838,12 @@ struct mhi_device_id {
>>   	kernel_ulong_t driver_data;
>>   };
>>
>> +#define ANCILLARY_NAME_SIZE 32
>> +#define ANCILLARY_MODULE_PREFIX "ancillary:"
>> +
>> +struct ancillary_device_id {
>> +	char name[ANCILLARY_NAME_SIZE];
> 
> I hope that this be enough.

Are you suggesting a different value to allow for a longer string?

Leon Romanovsky Oct. 6, 2020, 5:02 p.m. UTC | #3

On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> Thanks for the review Leon.
>
> > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > It enables drivers to create an ancillary_device and bind an
> > > ancillary_driver to it.
> >
> > I was under impression that this name is going to be changed.
>
> It's part of the opens stated in the cover letter.

ok, so what are the variants?
system bus (sysbus), sbsystem bus (subbus), crossbus ?

>
> [...]
>
> > > +	const struct my_driver my_drv = {
> > > +		.ancillary_drv = {
> > > +			.driver = {
> > > +				.name = "myancillarydrv",
> >
> > Why do we need to give control over driver name to the driver authors?
> > It can be problematic if author puts name that already exists.
>
> Good point. When I used the ancillary_devices for my own SoundWire test, the
> driver name didn't seem specifically meaningful but needed to be set to
> something, what mattered was the id_table. Just thinking aloud, maybe we can
> add prefixing with KMOD_BUILD, as we've done already to avoid collisions
> between device names?

IMHO, it shouldn't be controlled by the drivers at all and need to have
kernel module name hardwired. Users will use it later for various
bind/unbind/autoprobe tricks and it will give predictability for them.

>
> [...]
>
> > > +int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname)
> > > +{
> > > +	struct device *dev = &ancildev->dev;
> > > +	int ret;
> > > +
> > > +	if (!modname) {
> > > +		pr_err("ancillary device modname is NULL\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name, ancildev->id);
> > > +	if (ret) {
> > > +		pr_err("ancillary device dev_set_name failed: %d\n", ret);
> > > +		return ret;
> > > +	}
> > > +
> > > +	ret = device_add(dev);
> > > +	if (ret)
> > > +		dev_err(dev, "adding ancillary device failed!: %d\n", ret);
> > > +
> > > +	return ret;
> > > +}
> >
> > Sorry, but this is very strange API that requires users to put
> > internal call to "dev" that is buried inside "struct ancillary_device".
> >
> > For example in your next patch, you write this "put_device(&cdev->ancildev.dev);"
> >
> > I'm pretty sure that the amount of bugs in error unwind will be
> > astonishing, so if you are doing wrappers over core code, better do not
> > pass complexity to the users.
>
> In initial reviews, there was pushback on adding wrappers that don't do
> anything except for a pointer indirection.
>
> Others had concerns that the API wasn't balanced and blurring layers.

Are you talking about internal review or public?
If it is public, can I get a link to it?

>
> Both points have merits IMHO. Do we want wrappers for everything and
> completely hide the low-level device?

This API is partially obscures low level driver-core code and needs to
provide clear and proper abstractions without need to remember about
put_device. There is already _add() interface why don't you do
put_device() in it?

>
> >
> > > +EXPORT_SYMBOL_GPL(__ancillary_device_add);
> > > +
> > > +static int ancillary_probe_driver(struct device *dev)
> > > +{
> > > +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> > > +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> > > +	int ret;
> > > +
> > > +	ret = dev_pm_domain_attach(dev, true);
> > > +	if (ret) {
> > > +		dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
> > > +		return ret;
> > > +	}
> > > +
> > > +	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv->id_table, ancildev));
> >
> > I don't think that you need to call ->probe() if ancillary_match_id()
> > returned NULL and probably that check should be done before
> > dev_pm_domain_attach().
>
> we'll look into this.
>
> >
> > > +	if (ret)
> > > +		dev_pm_domain_detach(dev, true);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static int ancillary_remove_driver(struct device *dev)
> > > +{
> > > +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> > > +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> > > +	int ret;
> > > +
> > > +	ret = ancildrv->remove(ancildev);
> > > +	dev_pm_domain_detach(dev, true);
> > > +
> > > +	return ret;
> >
> > You returned an error to user and detached from PM, what will user do
> > with this information? Should user ignore it? retry?
>
> That comment was also provided in earlier reviews. In practice the error is
> typically ignored so there was a suggestion to move the return type to void,
> that could be done if this was desired by the majority.

+1 from me.

>
> [...]
>
> > > diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
> > > index 5b08a473cdba..7d596dc30833 100644
> > > --- a/include/linux/mod_devicetable.h
> > > +++ b/include/linux/mod_devicetable.h
> > > @@ -838,4 +838,12 @@ struct mhi_device_id {
> > >   	kernel_ulong_t driver_data;
> > >   };
> > >
> > > +#define ANCILLARY_NAME_SIZE 32
> > > +#define ANCILLARY_MODULE_PREFIX "ancillary:"
> > > +
> > > +struct ancillary_device_id {
> > > +	char name[ANCILLARY_NAME_SIZE];
> >
> > I hope that this be enough.
>
> Are you suggesting a different value to allow for a longer string?

I have no idea, but worried that there were no checks at all if name is
more than 32. Maybe compiler warn about it?

Thanks

Parav Pandit Oct. 6, 2020, 5:09 p.m. UTC | #4

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, October 6, 2020 10:33 PM
> 
> On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > Thanks for the review Leon.
> >
> > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > It enables drivers to create an ancillary_device and bind an
> > > > ancillary_driver to it.
> > >
> > > I was under impression that this name is going to be changed.
> >
> > It's part of the opens stated in the cover letter.
> 
> ok, so what are the variants?
> system bus (sysbus), sbsystem bus (subbus), crossbus ?
Since the intended use of this bus is to 
(a) create sub devices that represent 'functional separation' and 
(b) second use case for subfunctions from a pci device,

I proposed below names in v1 of this patchset.

(a) subdev_bus
(b) subfunction_bus

Leon Romanovsky Oct. 6, 2020, 5:23 p.m. UTC | #5

On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> It enables drivers to create an ancillary_device and bind an
> ancillary_driver to it.
>
> The bus supports probe/remove shutdown and suspend/resume callbacks.
> Each ancillary_device has a unique string based id; driver binds to
> an ancillary_device based on this id through the bus.
>
> Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---

<...>

> +/**
> + * __ancillary_driver_register - register a driver for ancillary bus devices
> + * @ancildrv: ancillary_driver structure
> + * @owner: owning module/driver
> + */
> +int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct module *owner)
> +{
> +	if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove) ||
> +	    WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv->id_table))
> +		return -EINVAL;

In our driver ->shutdown is empty, it will be best if ancillary bus will
do "if (->remove) ..->remove()" pattern.

> +
> +	ancildrv->driver.owner = owner;
> +	ancildrv->driver.bus = &ancillary_bus_type;
> +	ancildrv->driver.probe = ancillary_probe_driver;
> +	ancildrv->driver.remove = ancillary_remove_driver;
> +	ancildrv->driver.shutdown = ancillary_shutdown_driver;
> +

I think that this part is wrong, probe/remove/shutdown functions should
come from ancillary_bus_type. You are overwriting private device_driver
callbacks that makes impossible to make container_of of ancillary_driver
to chain operations.

> +	return driver_register(&ancildrv->driver);
> +}
> +EXPORT_SYMBOL_GPL(__ancillary_driver_register);

Thanks

Leon Romanovsky Oct. 6, 2020, 5:26 p.m. UTC | #6

On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, October 6, 2020 10:33 PM
> >
> > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > Thanks for the review Leon.
> > >
> > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > It enables drivers to create an ancillary_device and bind an
> > > > > ancillary_driver to it.
> > > >
> > > > I was under impression that this name is going to be changed.
> > >
> > > It's part of the opens stated in the cover letter.
> >
> > ok, so what are the variants?
> > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> Since the intended use of this bus is to
> (a) create sub devices that represent 'functional separation' and
> (b) second use case for subfunctions from a pci device,
>
> I proposed below names in v1 of this patchset.
>
> (a) subdev_bus

It sounds good, just can we avoid "_" in the name and call it subdev?

> (b) subfunction_bus

Saleem, Shiraz Oct. 6, 2020, 5:41 p.m. UTC | #7

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Tuesday, October 6, 2020 10:33 PM
> > >
> > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > Thanks for the review Leon.
> > > >
> > > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > > It enables drivers to create an ancillary_device and bind an
> > > > > > ancillary_driver to it.
> > > > >
> > > > > I was under impression that this name is going to be changed.
> > > >
> > > > It's part of the opens stated in the cover letter.
> > >
> > > ok, so what are the variants?
> > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > Since the intended use of this bus is to
> > (a) create sub devices that represent 'functional separation' and
> > (b) second use case for subfunctions from a pci device,
> >
> > I proposed below names in v1 of this patchset.
> >
> > (a) subdev_bus
> 
> It sounds good, just can we avoid "_" in the name and call it subdev?
> 

What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.
An ancillary software bus for ancillary devices carved off a parent device registered on a primary bus.

Saleem, Shiraz Oct. 6, 2020, 5:45 p.m. UTC | #8

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > It enables drivers to create an ancillary_device and bind an
> > ancillary_driver to it.
> >
> > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > Each ancillary_device has a unique string based id; driver binds to an
> > ancillary_device based on this id through the bus.
> >
> > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > Reviewed-by: Pierre-Louis Bossart
> > <pierre-louis.bossart@linux.intel.com>
> > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > ---
> 
> <...>
> 
> > +/**
> > + * __ancillary_driver_register - register a driver for ancillary bus
> > +devices
> > + * @ancildrv: ancillary_driver structure
> > + * @owner: owning module/driver
> > + */
> > +int __ancillary_driver_register(struct ancillary_driver *ancildrv,
> > +struct module *owner) {
> > +	if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove) ||
> > +	    WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv->id_table))
> > +		return -EINVAL;
> 
> In our driver ->shutdown is empty, it will be best if ancillary bus will do "if (-
> >remove) ..->remove()" pattern.
> 
I prefer that too if its possible. We will look into it.

Saleem, Shiraz Oct. 6, 2020, 5:50 p.m. UTC | #9

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > Thanks for the review Leon.
> >
> > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > It enables drivers to create an ancillary_device and bind an
> > > > ancillary_driver to it.
> > >
> > > I was under impression that this name is going to be changed.
> >
> > It's part of the opens stated in the cover letter.
> 
> ok, so what are the variants?
> system bus (sysbus), sbsystem bus (subbus), crossbus ?
> 
> >
> > [...]
> >
> > > > +	const struct my_driver my_drv = {
> > > > +		.ancillary_drv = {
> > > > +			.driver = {
> > > > +				.name = "myancillarydrv",
> > >
> > > Why do we need to give control over driver name to the driver authors?
> > > It can be problematic if author puts name that already exists.
> >
> > Good point. When I used the ancillary_devices for my own SoundWire
> > test, the driver name didn't seem specifically meaningful but needed
> > to be set to something, what mattered was the id_table. Just thinking
> > aloud, maybe we can add prefixing with KMOD_BUILD, as we've done
> > already to avoid collisions between device names?
> 
> IMHO, it shouldn't be controlled by the drivers at all and need to have kernel
> module name hardwired. Users will use it later for various bind/unbind/autoprobe
> tricks and it will give predictability for them.
> 

+1. This name is not used in the match. Having the bus hardwire the modname sounds like a good idea.

Shiraz

Ranjani Sridharan Oct. 6, 2020, 6:35 p.m. UTC | #10

On Tue, 2020-10-06 at 20:26 +0300, Leon Romanovsky wrote:
> On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Tuesday, October 6, 2020 10:33 PM
> > > 
> > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart
> > > wrote:
> > > > Thanks for the review Leon.
> > > > 
> > > > > > Add support for the Ancillary Bus, ancillary_device and
> > > > > > ancillary_driver.
> > > > > > It enables drivers to create an ancillary_device and bind
> > > > > > an
> > > > > > ancillary_driver to it.
> > > > > 
> > > > > I was under impression that this name is going to be changed.
> > > > 
> > > > It's part of the opens stated in the cover letter.
> > > 
> > > ok, so what are the variants?
> > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > Since the intended use of this bus is to
> > (a) create sub devices that represent 'functional separation' and
> > (b) second use case for subfunctions from a pci device,
> > 
> > I proposed below names in v1 of this patchset.
> > 
> > (a) subdev_bus
> 
> It sounds good, just can we avoid "_" in the name and call it subdev?
> 
> > (b) subfunction_bus

While we're still discussing names, may I also suggest simply "software
bus" instead?

Thanks,Ranjani

Leon Romanovsky Oct. 6, 2020, 7:20 p.m. UTC | #11

On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> > >
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Tuesday, October 6, 2020 10:33 PM
> > > >
> > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > > Thanks for the review Leon.
> > > > >
> > > > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > > > It enables drivers to create an ancillary_device and bind an
> > > > > > > ancillary_driver to it.
> > > > > >
> > > > > > I was under impression that this name is going to be changed.
> > > > >
> > > > > It's part of the opens stated in the cover letter.
> > > >
> > > > ok, so what are the variants?
> > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > > Since the intended use of this bus is to
> > > (a) create sub devices that represent 'functional separation' and
> > > (b) second use case for subfunctions from a pci device,
> > >
> > > I proposed below names in v1 of this patchset.
> > >
> > > (a) subdev_bus
> >
> > It sounds good, just can we avoid "_" in the name and call it subdev?
> >
>
> What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.
> An ancillary software bus for ancillary devices carved off a parent device registered on a primary bus.

Greg summarized it very well, every internal conversation about this patch
with my colleagues (non-english speakers) starts with the question:
"What does ancillary mean?"
https://lore.kernel.org/alsa-devel/20201001071403.GC31191@kroah.com/

"For non-native english speakers this is going to be rough,
given that I as a native english speaker had to go look up
the word in a dictionary to fully understand what you are
trying to do with that name."

Thanks

>
>
>

Dan Williams Oct. 7, 2020, 2:49 a.m. UTC | #12

On Tue, Oct 6, 2020 at 12:21 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:
> > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > >
> > > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Leon Romanovsky <leon@kernel.org>
> > > > > Sent: Tuesday, October 6, 2020 10:33 PM
> > > > >
> > > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > > > Thanks for the review Leon.
> > > > > >
> > > > > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > > > > It enables drivers to create an ancillary_device and bind an
> > > > > > > > ancillary_driver to it.
> > > > > > >
> > > > > > > I was under impression that this name is going to be changed.
> > > > > >
> > > > > > It's part of the opens stated in the cover letter.
> > > > >
> > > > > ok, so what are the variants?
> > > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > > > Since the intended use of this bus is to
> > > > (a) create sub devices that represent 'functional separation' and
> > > > (b) second use case for subfunctions from a pci device,
> > > >
> > > > I proposed below names in v1 of this patchset.
> > > >
> > > > (a) subdev_bus
> > >
> > > It sounds good, just can we avoid "_" in the name and call it subdev?
> > >
> >
> > What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.
> > An ancillary software bus for ancillary devices carved off a parent device registered on a primary bus.
>
> Greg summarized it very well, every internal conversation about this patch
> with my colleagues (non-english speakers) starts with the question:
> "What does ancillary mean?"
> https://lore.kernel.org/alsa-devel/20201001071403.GC31191@kroah.com/
>
> "For non-native english speakers this is going to be rough,
> given that I as a native english speaker had to go look up
> the word in a dictionary to fully understand what you are
> trying to do with that name."

I suggested "auxiliary" in another splintered thread on this question.
In terms of what the kernel is already using:

$ git grep auxiliary | wc -l
507
$ git grep ancillary | wc -l
153

Empirically, "auxiliary" is more common and closely matches the
intended function of these devices relative to their parent device.

Saleem, Shiraz Oct. 7, 2020, 1:09 p.m. UTC | #13

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> On Tue, Oct 6, 2020 at 12:21 PM Leon Romanovsky <leon@kernel.org> wrote:

> >

> > On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:

> > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> > > >

> > > > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:

> > > > >

> > > > > > From: Leon Romanovsky <leon@kernel.org>

> > > > > > Sent: Tuesday, October 6, 2020 10:33 PM

> > > > > >

> > > > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:

> > > > > > > Thanks for the review Leon.

> > > > > > >

> > > > > > > > > Add support for the Ancillary Bus, ancillary_device and

> ancillary_driver.

> > > > > > > > > It enables drivers to create an ancillary_device and

> > > > > > > > > bind an ancillary_driver to it.

> > > > > > > >

> > > > > > > > I was under impression that this name is going to be changed.

> > > > > > >

> > > > > > > It's part of the opens stated in the cover letter.

> > > > > >

> > > > > > ok, so what are the variants?

> > > > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?

> > > > > Since the intended use of this bus is to

> > > > > (a) create sub devices that represent 'functional separation'

> > > > > and

> > > > > (b) second use case for subfunctions from a pci device,

> > > > >

> > > > > I proposed below names in v1 of this patchset.

> > > > 

> > > > > (a) subdev_bus

> > > >

> > > > It sounds good, just can we avoid "_" in the name and call it subdev?

> > > >

> > >

> > > What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.

> > > An ancillary software bus for ancillary devices carved off a parent device

> registered on a primary bus.

> >

> > Greg summarized it very well, every internal conversation about this

> > patch with my colleagues (non-english speakers) starts with the question:

> > "What does ancillary mean?"

> > https://lore.kernel.org/alsa-devel/20201001071403.GC31191@kroah.com/

> >

> > "For non-native english speakers this is going to be rough, given that

> > I as a native english speaker had to go look up the word in a

> > dictionary to fully understand what you are trying to do with that

> > name."

> 

> I suggested "auxiliary" in another splintered thread on this question.

> In terms of what the kernel is already using:

> 

> $ git grep auxiliary | wc -l

> 507

> $ git grep ancillary | wc -l

> 153

> 

> Empirically, "auxiliary" is more common and closely matches the intended function

> of these devices relative to their parent device.


auxiliary bus is a befitting name as well.

Leon Romanovsky Oct. 7, 2020, 1:36 p.m. UTC | #14

On Wed, Oct 07, 2020 at 01:09:55PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Tue, Oct 6, 2020 at 12:21 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:
> > > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > >
> > > > > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > From: Leon Romanovsky <leon@kernel.org>
> > > > > > > Sent: Tuesday, October 6, 2020 10:33 PM
> > > > > > >
> > > > > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > > > > > Thanks for the review Leon.
> > > > > > > >
> > > > > > > > > > Add support for the Ancillary Bus, ancillary_device and
> > ancillary_driver.
> > > > > > > > > > It enables drivers to create an ancillary_device and
> > > > > > > > > > bind an ancillary_driver to it.
> > > > > > > > >
> > > > > > > > > I was under impression that this name is going to be changed.
> > > > > > > >
> > > > > > > > It's part of the opens stated in the cover letter.
> > > > > > >
> > > > > > > ok, so what are the variants?
> > > > > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > > > > > Since the intended use of this bus is to
> > > > > > (a) create sub devices that represent 'functional separation'
> > > > > > and
> > > > > > (b) second use case for subfunctions from a pci device,
> > > > > >
> > > > > > I proposed below names in v1 of this patchset.
> > > > >
> > > > > > (a) subdev_bus
> > > > >
> > > > > It sounds good, just can we avoid "_" in the name and call it subdev?
> > > > >
> > > >
> > > > What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.
> > > > An ancillary software bus for ancillary devices carved off a parent device
> > registered on a primary bus.
> > >
> > > Greg summarized it very well, every internal conversation about this
> > > patch with my colleagues (non-english speakers) starts with the question:
> > > "What does ancillary mean?"
> > > https://lore.kernel.org/alsa-devel/20201001071403.GC31191@kroah.com/
> > >
> > > "For non-native english speakers this is going to be rough, given that
> > > I as a native english speaker had to go look up the word in a
> > > dictionary to fully understand what you are trying to do with that
> > > name."
> >
> > I suggested "auxiliary" in another splintered thread on this question.
> > In terms of what the kernel is already using:
> >
> > $ git grep auxiliary | wc -l
> > 507
> > $ git grep ancillary | wc -l
> > 153
> >
> > Empirically, "auxiliary" is more common and closely matches the intended function
> > of these devices relative to their parent device.
>
> auxiliary bus is a befitting name as well.

Let's share all options and decide later.
I don't want to find us bikeshedding about it.

Thanks

Ertman, David M Oct. 7, 2020, 6:06 p.m. UTC | #15

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, October 6, 2020 10:03 AM
> To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-
> project.org; parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > Thanks for the review Leon.
> >
> > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > It enables drivers to create an ancillary_device and bind an
> > > > ancillary_driver to it.
> > >
> > > I was under impression that this name is going to be changed.
> >
> > It's part of the opens stated in the cover letter.
> 
> ok, so what are the variants?
> system bus (sysbus), sbsystem bus (subbus), crossbus ?
> 
> >
> > [...]
> >
> > > > +	const struct my_driver my_drv = {
> > > > +		.ancillary_drv = {
> > > > +			.driver = {
> > > > +				.name = "myancillarydrv",
> > >
> > > Why do we need to give control over driver name to the driver authors?
> > > It can be problematic if author puts name that already exists.
> >
> > Good point. When I used the ancillary_devices for my own SoundWire test,
> the
> > driver name didn't seem specifically meaningful but needed to be set to
> > something, what mattered was the id_table. Just thinking aloud, maybe we
> can
> > add prefixing with KMOD_BUILD, as we've done already to avoid collisions
> > between device names?
> 
> IMHO, it shouldn't be controlled by the drivers at all and need to have
> kernel module name hardwired. Users will use it later for various
> bind/unbind/autoprobe tricks and it will give predictability for them.
> 
> >
> > [...]
> >
> > > > +int __ancillary_device_add(struct ancillary_device *ancildev, const
> char *modname)
> > > > +{
> > > > +	struct device *dev = &ancildev->dev;
> > > > +	int ret;
> > > > +
> > > > +	if (!modname) {
> > > > +		pr_err("ancillary device modname is NULL\n");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name,
> ancildev->id);
> > > > +	if (ret) {
> > > > +		pr_err("ancillary device dev_set_name failed: %d\n", ret);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	ret = device_add(dev);
> > > > +	if (ret)
> > > > +		dev_err(dev, "adding ancillary device failed!: %d\n", ret);
> > > > +
> > > > +	return ret;
> > > > +}
> > >
> > > Sorry, but this is very strange API that requires users to put
> > > internal call to "dev" that is buried inside "struct ancillary_device".
> > >
> > > For example in your next patch, you write this "put_device(&cdev-
> >ancildev.dev);"
> > >
> > > I'm pretty sure that the amount of bugs in error unwind will be
> > > astonishing, so if you are doing wrappers over core code, better do not
> > > pass complexity to the users.
> >
> > In initial reviews, there was pushback on adding wrappers that don't do
> > anything except for a pointer indirection.
> >
> > Others had concerns that the API wasn't balanced and blurring layers.
> 
> Are you talking about internal review or public?
> If it is public, can I get a link to it?
> 
> >
> > Both points have merits IMHO. Do we want wrappers for everything and
> > completely hide the low-level device?
> 
> This API is partially obscures low level driver-core code and needs to
> provide clear and proper abstractions without need to remember about
> put_device. There is already _add() interface why don't you do
> put_device() in it?
> 

The pushback Pierre is referring to was during our mid-tier internal review.  It was
primarily a concern of Parav as I recall, so he can speak to his reasoning.

What we originally had was a single API call (ancillary_device_register) that started
with a call to device_initialize(), and every error path out of the function performed
a put_device().

Is this the model you have in mind?

-DaveE

> >
> > >
> > > > +EXPORT_SYMBOL_GPL(__ancillary_device_add);
> > > > +
> > > > +static int ancillary_probe_driver(struct device *dev)
> > > > +{
> > > > +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> > > > +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> > > > +	int ret;
> > > > +
> > > > +	ret = dev_pm_domain_attach(dev, true);
> > > > +	if (ret) {
> > > > +		dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv-
> >id_table, ancildev));
> > >
> > > I don't think that you need to call ->probe() if ancillary_match_id()
> > > returned NULL and probably that check should be done before
> > > dev_pm_domain_attach().
> >
> > we'll look into this.
> >
> > >
> > > > +	if (ret)
> > > > +		dev_pm_domain_detach(dev, true);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static int ancillary_remove_driver(struct device *dev)
> > > > +{
> > > > +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> > > > +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> > > > +	int ret;
> > > > +
> > > > +	ret = ancildrv->remove(ancildev);
> > > > +	dev_pm_domain_detach(dev, true);
> > > > +
> > > > +	return ret;
> > >
> > > You returned an error to user and detached from PM, what will user do
> > > with this information? Should user ignore it? retry?
> >
> > That comment was also provided in earlier reviews. In practice the error is
> > typically ignored so there was a suggestion to move the return type to void,
> > that could be done if this was desired by the majority.
> 
> +1 from me.
> 
> >
> > [...]
> >
> > > > diff --git a/include/linux/mod_devicetable.h
> b/include/linux/mod_devicetable.h
> > > > index 5b08a473cdba..7d596dc30833 100644
> > > > --- a/include/linux/mod_devicetable.h
> > > > +++ b/include/linux/mod_devicetable.h
> > > > @@ -838,4 +838,12 @@ struct mhi_device_id {
> > > >   	kernel_ulong_t driver_data;
> > > >   };
> > > >
> > > > +#define ANCILLARY_NAME_SIZE 32
> > > > +#define ANCILLARY_MODULE_PREFIX "ancillary:"
> > > > +
> > > > +struct ancillary_device_id {
> > > > +	char name[ANCILLARY_NAME_SIZE];
> > >
> > > I hope that this be enough.
> >
> > Are you suggesting a different value to allow for a longer string?
> 
> I have no idea, but worried that there were no checks at all if name is
> more than 32. Maybe compiler warn about it?
> 
> Thanks

Dan Williams Oct. 7, 2020, 6:55 p.m. UTC | #16

On Wed, Oct 7, 2020 at 6:37 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Oct 07, 2020 at 01:09:55PM +0000, Saleem, Shiraz wrote:
> > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > >
> > > On Tue, Oct 6, 2020 at 12:21 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:
> > > > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > > >
> > > > > > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > > > From: Leon Romanovsky <leon@kernel.org>
> > > > > > > > Sent: Tuesday, October 6, 2020 10:33 PM
> > > > > > > >
> > > > > > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > > > > > > Thanks for the review Leon.
> > > > > > > > >
> > > > > > > > > > > Add support for the Ancillary Bus, ancillary_device and
> > > ancillary_driver.
> > > > > > > > > > > It enables drivers to create an ancillary_device and
> > > > > > > > > > > bind an ancillary_driver to it.
> > > > > > > > > >
> > > > > > > > > > I was under impression that this name is going to be changed.
> > > > > > > > >
> > > > > > > > > It's part of the opens stated in the cover letter.
> > > > > > > >
> > > > > > > > ok, so what are the variants?
> > > > > > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > > > > > > Since the intended use of this bus is to
> > > > > > > (a) create sub devices that represent 'functional separation'
> > > > > > > and
> > > > > > > (b) second use case for subfunctions from a pci device,
> > > > > > >
> > > > > > > I proposed below names in v1 of this patchset.
> > > > > >
> > > > > > > (a) subdev_bus
> > > > > >
> > > > > > It sounds good, just can we avoid "_" in the name and call it subdev?
> > > > > >
> > > > >
> > > > > What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting name.
> > > > > An ancillary software bus for ancillary devices carved off a parent device
> > > registered on a primary bus.
> > > >
> > > > Greg summarized it very well, every internal conversation about this
> > > > patch with my colleagues (non-english speakers) starts with the question:
> > > > "What does ancillary mean?"
> > > > https://lore.kernel.org/alsa-devel/20201001071403.GC31191@kroah.com/
> > > >
> > > > "For non-native english speakers this is going to be rough, given that
> > > > I as a native english speaker had to go look up the word in a
> > > > dictionary to fully understand what you are trying to do with that
> > > > name."
> > >
> > > I suggested "auxiliary" in another splintered thread on this question.
> > > In terms of what the kernel is already using:
> > >
> > > $ git grep auxiliary | wc -l
> > > 507
> > > $ git grep ancillary | wc -l
> > > 153
> > >
> > > Empirically, "auxiliary" is more common and closely matches the intended function
> > > of these devices relative to their parent device.
> >
> > auxiliary bus is a befitting name as well.
>
> Let's share all options and decide later.
> I don't want to find us bikeshedding about it.

Too late we are deep into bikeshedding at this point... it continued
over here [1] for a bit, but let's try to bring the discussion back to
this thread.

[1]: http://lore.kernel.org/r/10048d4d-038c-c2b7-2ed7-fd4ca87d104a@linux.intel.com

Leon Romanovsky Oct. 7, 2020, 7:26 p.m. UTC | #17

On Wed, Oct 07, 2020 at 06:06:30PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, October 6, 2020 10:03 AM
> > To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-
> > project.org; parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> > ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> > rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> > jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > <kiran.patil@intel.com>
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > Thanks for the review Leon.
> > >
> > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > It enables drivers to create an ancillary_device and bind an
> > > > > ancillary_driver to it.
> > > >
> > > > I was under impression that this name is going to be changed.
> > >
> > > It's part of the opens stated in the cover letter.
> >
> > ok, so what are the variants?
> > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> >
> > >
> > > [...]
> > >
> > > > > +	const struct my_driver my_drv = {
> > > > > +		.ancillary_drv = {
> > > > > +			.driver = {
> > > > > +				.name = "myancillarydrv",
> > > >
> > > > Why do we need to give control over driver name to the driver authors?
> > > > It can be problematic if author puts name that already exists.
> > >
> > > Good point. When I used the ancillary_devices for my own SoundWire test,
> > the
> > > driver name didn't seem specifically meaningful but needed to be set to
> > > something, what mattered was the id_table. Just thinking aloud, maybe we
> > can
> > > add prefixing with KMOD_BUILD, as we've done already to avoid collisions
> > > between device names?
> >
> > IMHO, it shouldn't be controlled by the drivers at all and need to have
> > kernel module name hardwired. Users will use it later for various
> > bind/unbind/autoprobe tricks and it will give predictability for them.
> >
> > >
> > > [...]
> > >
> > > > > +int __ancillary_device_add(struct ancillary_device *ancildev, const
> > char *modname)
> > > > > +{
> > > > > +	struct device *dev = &ancildev->dev;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (!modname) {
> > > > > +		pr_err("ancillary device modname is NULL\n");
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name,
> > ancildev->id);
> > > > > +	if (ret) {
> > > > > +		pr_err("ancillary device dev_set_name failed: %d\n", ret);
> > > > > +		return ret;
> > > > > +	}
> > > > > +
> > > > > +	ret = device_add(dev);
> > > > > +	if (ret)
> > > > > +		dev_err(dev, "adding ancillary device failed!: %d\n", ret);
> > > > > +
> > > > > +	return ret;
> > > > > +}
> > > >
> > > > Sorry, but this is very strange API that requires users to put
> > > > internal call to "dev" that is buried inside "struct ancillary_device".
> > > >
> > > > For example in your next patch, you write this "put_device(&cdev-
> > >ancildev.dev);"
> > > >
> > > > I'm pretty sure that the amount of bugs in error unwind will be
> > > > astonishing, so if you are doing wrappers over core code, better do not
> > > > pass complexity to the users.
> > >
> > > In initial reviews, there was pushback on adding wrappers that don't do
> > > anything except for a pointer indirection.
> > >
> > > Others had concerns that the API wasn't balanced and blurring layers.
> >
> > Are you talking about internal review or public?
> > If it is public, can I get a link to it?
> >
> > >
> > > Both points have merits IMHO. Do we want wrappers for everything and
> > > completely hide the low-level device?
> >
> > This API is partially obscures low level driver-core code and needs to
> > provide clear and proper abstractions without need to remember about
> > put_device. There is already _add() interface why don't you do
> > put_device() in it?
> >
>
> The pushback Pierre is referring to was during our mid-tier internal review.  It was
> primarily a concern of Parav as I recall, so he can speak to his reasoning.
>
> What we originally had was a single API call (ancillary_device_register) that started
> with a call to device_initialize(), and every error path out of the function performed
> a put_device().
>
> Is this the model you have in mind?

I don't like this flow:
ancillary_device_initialize()
if (ancillary_ancillary_device_add()) {
  put_device(....)
  ancillary_device_unregister()
  return err;
}

And prefer this flow:
ancillary_device_initialize()
if (ancillary_device_add()) {
  ancillary_device_unregister()
  return err;
}

In this way, the ancillary users won't need to do non-intuitive put_device();

Thanks

Ertman, David M Oct. 7, 2020, 7:53 p.m. UTC | #18

> -----Original Message-----
> From: Alsa-devel <alsa-devel-bounces@alsa-project.org> On Behalf Of Leon
> Romanovsky
> Sent: Wednesday, October 7, 2020 12:26 PM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com; Pierre-Louis
> Bossart <pierre-louis.bossart@linux.intel.com>; fred.oh@linux.intel.com;
> linux-rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Wed, Oct 07, 2020 at 06:06:30PM +0000, Ertman, David M wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Tuesday, October 6, 2020 10:03 AM
> > > To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-
> > > project.org; parav@mellanox.com; tiwai@suse.de;
> netdev@vger.kernel.org;
> > > ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> > > rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> > > jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org;
> Williams,
> > > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > <kiran.patil@intel.com>
> > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > >
> > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > Thanks for the review Leon.
> > > >
> > > > > > Add support for the Ancillary Bus, ancillary_device and
> ancillary_driver.
> > > > > > It enables drivers to create an ancillary_device and bind an
> > > > > > ancillary_driver to it.
> > > > >
> > > > > I was under impression that this name is going to be changed.
> > > >
> > > > It's part of the opens stated in the cover letter.
> > >
> > > ok, so what are the variants?
> > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > >
> > > >
> > > > [...]
> > > >
> > > > > > +	const struct my_driver my_drv = {
> > > > > > +		.ancillary_drv = {
> > > > > > +			.driver = {
> > > > > > +				.name = "myancillarydrv",
> > > > >
> > > > > Why do we need to give control over driver name to the driver
> authors?
> > > > > It can be problematic if author puts name that already exists.
> > > >
> > > > Good point. When I used the ancillary_devices for my own SoundWire
> test,
> > > the
> > > > driver name didn't seem specifically meaningful but needed to be set to
> > > > something, what mattered was the id_table. Just thinking aloud, maybe
> we
> > > can
> > > > add prefixing with KMOD_BUILD, as we've done already to avoid
> collisions
> > > > between device names?
> > >
> > > IMHO, it shouldn't be controlled by the drivers at all and need to have
> > > kernel module name hardwired. Users will use it later for various
> > > bind/unbind/autoprobe tricks and it will give predictability for them.
> > >
> > > >
> > > > [...]
> > > >
> > > > > > +int __ancillary_device_add(struct ancillary_device *ancildev, const
> > > char *modname)
> > > > > > +{
> > > > > > +	struct device *dev = &ancildev->dev;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	if (!modname) {
> > > > > > +		pr_err("ancillary device modname is NULL\n");
> > > > > > +		return -EINVAL;
> > > > > > +	}
> > > > > > +
> > > > > > +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev-
> >name,
> > > ancildev->id);
> > > > > > +	if (ret) {
> > > > > > +		pr_err("ancillary device dev_set_name failed: %d\n",
> ret);
> > > > > > +		return ret;
> > > > > > +	}
> > > > > > +
> > > > > > +	ret = device_add(dev);
> > > > > > +	if (ret)
> > > > > > +		dev_err(dev, "adding ancillary device failed!: %d\n",
> ret);
> > > > > > +
> > > > > > +	return ret;
> > > > > > +}
> > > > >
> > > > > Sorry, but this is very strange API that requires users to put
> > > > > internal call to "dev" that is buried inside "struct ancillary_device".
> > > > >
> > > > > For example in your next patch, you write this "put_device(&cdev-
> > > >ancildev.dev);"
> > > > >
> > > > > I'm pretty sure that the amount of bugs in error unwind will be
> > > > > astonishing, so if you are doing wrappers over core code, better do
> not
> > > > > pass complexity to the users.
> > > >
> > > > In initial reviews, there was pushback on adding wrappers that don't do
> > > > anything except for a pointer indirection.
> > > >
> > > > Others had concerns that the API wasn't balanced and blurring layers.
> > >
> > > Are you talking about internal review or public?
> > > If it is public, can I get a link to it?
> > >
> > > >
> > > > Both points have merits IMHO. Do we want wrappers for everything
> and
> > > > completely hide the low-level device?
> > >
> > > This API is partially obscures low level driver-core code and needs to
> > > provide clear and proper abstractions without need to remember about
> > > put_device. There is already _add() interface why don't you do
> > > put_device() in it?
> > >
> >
> > The pushback Pierre is referring to was during our mid-tier internal review.
> It was
> > primarily a concern of Parav as I recall, so he can speak to his reasoning.
> >
> > What we originally had was a single API call (ancillary_device_register) that
> started
> > with a call to device_initialize(), and every error path out of the function
> performed
> > a put_device().
> >
> > Is this the model you have in mind?
> 
> I don't like this flow:
> ancillary_device_initialize()
> if (ancillary_ancillary_device_add()) {
>   put_device(....)
>   ancillary_device_unregister()
>   return err;
> }
> 
> And prefer this flow:
> ancillary_device_initialize()
> if (ancillary_device_add()) {
>   ancillary_device_unregister()
>   return err;
> }
> 
> In this way, the ancillary users won't need to do non-intuitive put_device();

Isn't there a problem calling device_unregister() if device_add() fails?
device_unregister() does a device_del() and if the device_add() failed there is
nothing to delete?

-DaveE

> 
> Thanks

Ertman, David M Oct. 7, 2020, 7:57 p.m. UTC | #19

> -----Original Message-----
> From: Ertman, David M
> Sent: Wednesday, October 7, 2020 12:54 PM
> To: 'Leon Romanovsky' <leon@kernel.org>
> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com; Pierre-Louis
> Bossart <pierre-louis.bossart@linux.intel.com>; fred.oh@linux.intel.com;
> linux-rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> 
> > -----Original Message-----
> > From: Alsa-devel <alsa-devel-bounces@alsa-project.org> On Behalf Of
> Leon
> > Romanovsky
> > Sent: Wednesday, October 7, 2020 12:26 PM
> > To: Ertman, David M <david.m.ertman@intel.com>
> > Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> > netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com; Pierre-Louis
> > Bossart <pierre-louis.bossart@linux.intel.com>; fred.oh@linux.intel.com;
> > linux-rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> > jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > <kiran.patil@intel.com>
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Wed, Oct 07, 2020 at 06:06:30PM +0000, Ertman, David M wrote:
> > > > -----Original Message-----
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Tuesday, October 6, 2020 10:03 AM
> > > > To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > > Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-
> > > > project.org; parav@mellanox.com; tiwai@suse.de;
> > netdev@vger.kernel.org;
> > > > ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> > > > rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> > > > jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org;
> > Williams,
> > > > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > > <kiran.patil@intel.com>
> > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > >
> > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > > > > Thanks for the review Leon.
> > > > >
> > > > > > > Add support for the Ancillary Bus, ancillary_device and
> > ancillary_driver.
> > > > > > > It enables drivers to create an ancillary_device and bind an
> > > > > > > ancillary_driver to it.
> > > > > >
> > > > > > I was under impression that this name is going to be changed.
> > > > >
> > > > > It's part of the opens stated in the cover letter.
> > > >
> > > > ok, so what are the variants?
> > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?
> > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > > +	const struct my_driver my_drv = {
> > > > > > > +		.ancillary_drv = {
> > > > > > > +			.driver = {
> > > > > > > +				.name = "myancillarydrv",
> > > > > >
> > > > > > Why do we need to give control over driver name to the driver
> > authors?
> > > > > > It can be problematic if author puts name that already exists.
> > > > >
> > > > > Good point. When I used the ancillary_devices for my own
> SoundWire
> > test,
> > > > the
> > > > > driver name didn't seem specifically meaningful but needed to be set
> to
> > > > > something, what mattered was the id_table. Just thinking aloud,
> maybe
> > we
> > > > can
> > > > > add prefixing with KMOD_BUILD, as we've done already to avoid
> > collisions
> > > > > between device names?
> > > >
> > > > IMHO, it shouldn't be controlled by the drivers at all and need to have
> > > > kernel module name hardwired. Users will use it later for various
> > > > bind/unbind/autoprobe tricks and it will give predictability for them.
> > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > > +int __ancillary_device_add(struct ancillary_device *ancildev,
> const
> > > > char *modname)
> > > > > > > +{
> > > > > > > +	struct device *dev = &ancildev->dev;
> > > > > > > +	int ret;
> > > > > > > +
> > > > > > > +	if (!modname) {
> > > > > > > +		pr_err("ancillary device modname is NULL\n");
> > > > > > > +		return -EINVAL;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev-
> > >name,
> > > > ancildev->id);
> > > > > > > +	if (ret) {
> > > > > > > +		pr_err("ancillary device dev_set_name failed: %d\n",
> > ret);
> > > > > > > +		return ret;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	ret = device_add(dev);
> > > > > > > +	if (ret)
> > > > > > > +		dev_err(dev, "adding ancillary device failed!: %d\n",
> > ret);
> > > > > > > +
> > > > > > > +	return ret;
> > > > > > > +}
> > > > > >
> > > > > > Sorry, but this is very strange API that requires users to put
> > > > > > internal call to "dev" that is buried inside "struct ancillary_device".
> > > > > >
> > > > > > For example in your next patch, you write this "put_device(&cdev-
> > > > >ancildev.dev);"
> > > > > >
> > > > > > I'm pretty sure that the amount of bugs in error unwind will be
> > > > > > astonishing, so if you are doing wrappers over core code, better do
> > not
> > > > > > pass complexity to the users.
> > > > >
> > > > > In initial reviews, there was pushback on adding wrappers that don't
> do
> > > > > anything except for a pointer indirection.
> > > > >
> > > > > Others had concerns that the API wasn't balanced and blurring layers.
> > > >
> > > > Are you talking about internal review or public?
> > > > If it is public, can I get a link to it?
> > > >
> > > > >
> > > > > Both points have merits IMHO. Do we want wrappers for everything
> > and
> > > > > completely hide the low-level device?
> > > >
> > > > This API is partially obscures low level driver-core code and needs to
> > > > provide clear and proper abstractions without need to remember about
> > > > put_device. There is already _add() interface why don't you do
> > > > put_device() in it?
> > > >
> > >
> > > The pushback Pierre is referring to was during our mid-tier internal
> review.
> > It was
> > > primarily a concern of Parav as I recall, so he can speak to his reasoning.
> > >
> > > What we originally had was a single API call (ancillary_device_register)
> that
> > started
> > > with a call to device_initialize(), and every error path out of the function
> > performed
> > > a put_device().
> > >
> > > Is this the model you have in mind?
> >
> > I don't like this flow:
> > ancillary_device_initialize()
> > if (ancillary_ancillary_device_add()) {
> >   put_device(....)
> >   ancillary_device_unregister()
> >   return err;
> > }
> >
> > And prefer this flow:
> > ancillary_device_initialize()
> > if (ancillary_device_add()) {
> >   ancillary_device_unregister()
> >   return err;
> > }
> >
> > In this way, the ancillary users won't need to do non-intuitive put_device();
> 
> Isn't there a problem calling device_unregister() if device_add() fails?
> device_unregister() does a device_del() and if the device_add() failed there
> is
> nothing to delete?

Sorry, hit send there unintentionally.

So, would it be best to split the unregister API into two calls as well.

ancillary_device_del()
ancillary_device_put()
?
-DaveE
> 
> -DaveE
> 
> >
> > Thanks

Ertman, David M Oct. 7, 2020, 8:01 p.m. UTC | #20

> -----Original Message-----

> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Wednesday, October 7, 2020 11:56 AM

> To: Leon Romanovsky <leon@kernel.org>

> Cc: Saleem, Shiraz <shiraz.saleem@intel.com>; Parav Pandit

> <parav@nvidia.com>; Pierre-Louis Bossart <pierre-

> louis.bossart@linux.intel.com>; Ertman, David M

> <david.m.ertman@intel.com>; alsa-devel@alsa-project.org;

> parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;

> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-

> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org; Jason

> Gunthorpe <jgg@nvidia.com>; gregkh@linuxfoundation.org;

> kuba@kernel.org; davem@davemloft.net; Patil, Kiran

> <kiran.patil@intel.com>

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> On Wed, Oct 7, 2020 at 6:37 AM Leon Romanovsky <leon@kernel.org>

> wrote:

> >

> > On Wed, Oct 07, 2020 at 01:09:55PM +0000, Saleem, Shiraz wrote:

> > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> > > >

> > > > On Tue, Oct 6, 2020 at 12:21 PM Leon Romanovsky <leon@kernel.org>

> wrote:

> > > > >

> > > > > On Tue, Oct 06, 2020 at 05:41:00PM +0000, Saleem, Shiraz wrote:

> > > > > > > Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> > > > > > >

> > > > > > > On Tue, Oct 06, 2020 at 05:09:09PM +0000, Parav Pandit wrote:

> > > > > > > >

> > > > > > > > > From: Leon Romanovsky <leon@kernel.org>

> > > > > > > > > Sent: Tuesday, October 6, 2020 10:33 PM

> > > > > > > > >

> > > > > > > > > On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart

> wrote:

> > > > > > > > > > Thanks for the review Leon.

> > > > > > > > > >

> > > > > > > > > > > > Add support for the Ancillary Bus, ancillary_device and

> > > > ancillary_driver.

> > > > > > > > > > > > It enables drivers to create an ancillary_device and

> > > > > > > > > > > > bind an ancillary_driver to it.

> > > > > > > > > > >

> > > > > > > > > > > I was under impression that this name is going to be

> changed.

> > > > > > > > > >

> > > > > > > > > > It's part of the opens stated in the cover letter.

> > > > > > > > >

> > > > > > > > > ok, so what are the variants?

> > > > > > > > > system bus (sysbus), sbsystem bus (subbus), crossbus ?

> > > > > > > > Since the intended use of this bus is to

> > > > > > > > (a) create sub devices that represent 'functional separation'

> > > > > > > > and

> > > > > > > > (b) second use case for subfunctions from a pci device,

> > > > > > > >

> > > > > > > > I proposed below names in v1 of this patchset.

> > > > > > >

> > > > > > > > (a) subdev_bus

> > > > > > >

> > > > > > > It sounds good, just can we avoid "_" in the name and call it

> subdev?

> > > > > > >

> > > > > >

> > > > > > What is wrong with naming the bus 'ancillary bus'? I feel it's a fitting

> name.

> > > > > > An ancillary software bus for ancillary devices carved off a parent

> device

> > > > registered on a primary bus.

> > > > >

> > > > > Greg summarized it very well, every internal conversation about this

> > > > > patch with my colleagues (non-english speakers) starts with the

> question:

> > > > > "What does ancillary mean?"

> > > > > https://lore.kernel.org/alsa-

> devel/20201001071403.GC31191@kroah.com/

> > > > >

> > > > > "For non-native english speakers this is going to be rough, given that

> > > > > I as a native english speaker had to go look up the word in a

> > > > > dictionary to fully understand what you are trying to do with that

> > > > > name."

> > > >

> > > > I suggested "auxiliary" in another splintered thread on this question.

> > > > In terms of what the kernel is already using:

> > > >

> > > > $ git grep auxiliary | wc -l

> > > > 507

> > > > $ git grep ancillary | wc -l

> > > > 153

> > > >

> > > > Empirically, "auxiliary" is more common and closely matches the

> intended function

> > > > of these devices relative to their parent device.

> > >

> > > auxiliary bus is a befitting name as well.

> >

> > Let's share all options and decide later.

> > I don't want to find us bikeshedding about it.

> 

> Too late we are deep into bikeshedding at this point... it continued

> over here [1] for a bit, but let's try to bring the discussion back to

> this thread.

> 

> [1]: http://lore.kernel.org/r/10048d4d-038c-c2b7-2ed7-

> fd4ca87d104a@linux.intel.com


Out of all of the suggestions put forward so far that do not
have real objections to them ...

I would put my vote behind aux - short, simple, meaningful

-DaveE

Parav Pandit Oct. 7, 2020, 8:17 p.m. UTC | #21

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 8, 2020 12:56 AM
> 
> > > This API is partially obscures low level driver-core code and needs
> > > to provide clear and proper abstractions without need to remember
> > > about put_device. There is already _add() interface why don't you do
> > > put_device() in it?
> > >
> >
> > The pushback Pierre is referring to was during our mid-tier internal
> > review.  It was primarily a concern of Parav as I recall, so he can speak to his
> reasoning.
> >
> > What we originally had was a single API call
> > (ancillary_device_register) that started with a call to
> > device_initialize(), and every error path out of the function performed a
> put_device().
> >
> > Is this the model you have in mind?
> 
> I don't like this flow:
> ancillary_device_initialize()
> if (ancillary_ancillary_device_add()) {
>   put_device(....)
>   ancillary_device_unregister()
Calling device_unregister() is incorrect, because add() wasn't successful.
Only put_device() or a wrapper ancillary_device_put() is necessary.

>   return err;
> }
> 
> And prefer this flow:
> ancillary_device_initialize()
> if (ancillary_device_add()) {
>   ancillary_device_unregister()
This is incorrect and a clear deviation from the current core APIs that adds the confusion.

>   return err;
> }
> 
> In this way, the ancillary users won't need to do non-intuitive put_device();

Below is most simple, intuitive and matching with core APIs for name and design pattern wise.
init()
{
	err = ancillary_device_initialize();
	if (err)
		return ret;

	err = ancillary_device_add();
	if (ret)
		goto err_unwind;

	err = some_foo();
	if (err)
		goto err_foo;
	return 0;

err_foo:
	ancillary_device_del(adev);
err_unwind:
	ancillary_device_put(adev->dev);
	return err;
}

cleanup()
{
	ancillary_device_de(adev);
	ancillary_device_put(adev);
	/* It is common to have a one wrapper for this as ancillary_device_unregister().
	 * This will match with core device_unregister() that has precise documentation.
	 * but given fact that init() code need proper error unwinding, like above,
	 * it make sense to have two APIs, and no need to export another symbol for unregister().
	 * This pattern is very easy to audit and code.
	 */
}

Ertman, David M Oct. 7, 2020, 8:18 p.m. UTC | #22

> -----Original Message-----

> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> Sent: Tuesday, October 6, 2020 8:18 AM

> To: Leon Romanovsky <leon@kernel.org>; Ertman, David M

> <david.m.ertman@intel.com>

> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> dledford@redhat.com; broonie@kernel.org; jgg@nvidia.com;

> gregkh@linuxfoundation.org; kuba@kernel.org; Williams, Dan J

> <dan.j.williams@intel.com>; Saleem, Shiraz <shiraz.saleem@intel.com>;

> davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> Thanks for the review Leon.

> 

> >> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.

> >> It enables drivers to create an ancillary_device and bind an

> >> ancillary_driver to it.

> >

> > I was under impression that this name is going to be changed.

> 

> It's part of the opens stated in the cover letter.

> 

> [...]

> 

> >> +	const struct my_driver my_drv = {

> >> +		.ancillary_drv = {

> >> +			.driver = {

> >> +				.name = "myancillarydrv",

> >

> > Why do we need to give control over driver name to the driver authors?

> > It can be problematic if author puts name that already exists.

> 

> Good point. When I used the ancillary_devices for my own SoundWire test,

> the driver name didn't seem specifically meaningful but needed to be set

> to something, what mattered was the id_table. Just thinking aloud, maybe

> we can add prefixing with KMOD_BUILD, as we've done already to avoid

> collisions between device names?

> 

> [...]


Since we have eliminated all IDA type things out of the bus infrastructure,
I like the idea of prefixing the driver name with KBUILD_MODNAME through
a macro front.  Since a parent driver can register more than one ancillary driver,
this allow the parent to have an internally meaningful name while still ensuring
its uniqueness.

-DaveE

Ertman, David M Oct. 7, 2020, 8:30 p.m. UTC | #23

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, October 6, 2020 10:03 AM
> To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-
> project.org; parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org;
> jgg@nvidia.com; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Tue, Oct 06, 2020 at 10:18:07AM -0500, Pierre-Louis Bossart wrote:
> > Thanks for the review Leon.

[...]

> > > > +EXPORT_SYMBOL_GPL(__ancillary_device_add);
> > > > +
> > > > +static int ancillary_probe_driver(struct device *dev)
> > > > +{
> > > > +	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
> > > > +	struct ancillary_device *ancildev = to_ancillary_dev(dev);
> > > > +	int ret;
> > > > +
> > > > +	ret = dev_pm_domain_attach(dev, true);
> > > > +	if (ret) {
> > > > +		dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv-
> >id_table, ancildev));
> > >
> > > I don't think that you need to call ->probe() if ancillary_match_id()
> > > returned NULL and probably that check should be done before
> > > dev_pm_domain_attach().
> >
> > we'll look into this.
> >

AKAIK, this callback is only accessed from the bus subsystem after a successful
return from ancillary_match().

-DaveE

Ertman, David M Oct. 7, 2020, 8:46 p.m. UTC | #24

> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Wednesday, October 7, 2020 1:17 PM
> To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> <david.m.ertman@intel.com>
> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
> devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> 
> 
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 8, 2020 12:56 AM
> >
> > > > This API is partially obscures low level driver-core code and needs
> > > > to provide clear and proper abstractions without need to remember
> > > > about put_device. There is already _add() interface why don't you do
> > > > put_device() in it?
> > > >
> > >
> > > The pushback Pierre is referring to was during our mid-tier internal
> > > review.  It was primarily a concern of Parav as I recall, so he can speak to
> his
> > reasoning.
> > >
> > > What we originally had was a single API call
> > > (ancillary_device_register) that started with a call to
> > > device_initialize(), and every error path out of the function performed a
> > put_device().
> > >
> > > Is this the model you have in mind?
> >
> > I don't like this flow:
> > ancillary_device_initialize()
> > if (ancillary_ancillary_device_add()) {
> >   put_device(....)
> >   ancillary_device_unregister()
> Calling device_unregister() is incorrect, because add() wasn't successful.
> Only put_device() or a wrapper ancillary_device_put() is necessary.
> 
> >   return err;
> > }
> >
> > And prefer this flow:
> > ancillary_device_initialize()
> > if (ancillary_device_add()) {
> >   ancillary_device_unregister()
> This is incorrect and a clear deviation from the current core APIs that adds the
> confusion.
> 
> >   return err;
> > }
> >
> > In this way, the ancillary users won't need to do non-intuitive put_device();
> 
> Below is most simple, intuitive and matching with core APIs for name and
> design pattern wise.
> init()
> {
> 	err = ancillary_device_initialize();
> 	if (err)
> 		return ret;
> 
> 	err = ancillary_device_add();
> 	if (ret)
> 		goto err_unwind;
> 
> 	err = some_foo();
> 	if (err)
> 		goto err_foo;
> 	return 0;
> 
> err_foo:
> 	ancillary_device_del(adev);
> err_unwind:
> 	ancillary_device_put(adev->dev);
> 	return err;
> }
> 
> cleanup()
> {
> 	ancillary_device_de(adev);
> 	ancillary_device_put(adev);
> 	/* It is common to have a one wrapper for this as
> ancillary_device_unregister().
> 	 * This will match with core device_unregister() that has precise
> documentation.
> 	 * but given fact that init() code need proper error unwinding, like
> above,
> 	 * it make sense to have two APIs, and no need to export another
> symbol for unregister().
> 	 * This pattern is very easy to audit and code.
> 	 */
> }

I like this flow +1

But ... since the init() function is performing both device_init and
device_add - it should probably be called ancillary_device_register, 
and we are back to a single exported API for both register and
unregister.

At that point, do we need wrappers on the primitives init, add, del,
and put?

-DaveE

Pierre-Louis Bossart Oct. 7, 2020, 8:59 p.m. UTC | #25

>> Below is most simple, intuitive and matching with core APIs for name and
>> design pattern wise.
>> init()
>> {
>> 	err = ancillary_device_initialize();
>> 	if (err)
>> 		return ret;
>>
>> 	err = ancillary_device_add();
>> 	if (ret)
>> 		goto err_unwind;
>>
>> 	err = some_foo();
>> 	if (err)
>> 		goto err_foo;
>> 	return 0;
>>
>> err_foo:
>> 	ancillary_device_del(adev);
>> err_unwind:
>> 	ancillary_device_put(adev->dev);
>> 	return err;
>> }
>>
>> cleanup()
>> {
>> 	ancillary_device_de(adev);
>> 	ancillary_device_put(adev);
>> 	/* It is common to have a one wrapper for this as
>> ancillary_device_unregister().
>> 	 * This will match with core device_unregister() that has precise
>> documentation.
>> 	 * but given fact that init() code need proper error unwinding, like
>> above,
>> 	 * it make sense to have two APIs, and no need to export another
>> symbol for unregister().
>> 	 * This pattern is very easy to audit and code.
>> 	 */
>> }
> 
> I like this flow +1
> 
> But ... since the init() function is performing both device_init and
> device_add - it should probably be called ancillary_device_register,
> and we are back to a single exported API for both register and
> unregister.

Kind reminder that we introduced the two functions to allow the caller 
to know if it needed to free memory when initialize() fails, and it 
didn't need to free memory when add() failed since put_device() takes 
care of it. If you have a single init() function it's impossible to know 
which behavior to select on error.

I also have a case with SoundWire where it's nice to first initialize, 
then set some data and then add.

> 
> At that point, do we need wrappers on the primitives init, add, del,
> and put?
> 
> -DaveE
>

Ertman, David M Oct. 7, 2020, 9:22 p.m. UTC | #26

> -----Original Message-----

> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> Sent: Wednesday, October 7, 2020 1:59 PM

> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit

> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>

> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,

> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> <kiran.patil@intel.com>

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> 

> 

> >> Below is most simple, intuitive and matching with core APIs for name and

> >> design pattern wise.

> >> init()

> >> {

> >> 	err = ancillary_device_initialize();

> >> 	if (err)

> >> 		return ret;

> >>

> >> 	err = ancillary_device_add();

> >> 	if (ret)

> >> 		goto err_unwind;

> >>

> >> 	err = some_foo();

> >> 	if (err)

> >> 		goto err_foo;

> >> 	return 0;

> >>

> >> err_foo:

> >> 	ancillary_device_del(adev);

> >> err_unwind:

> >> 	ancillary_device_put(adev->dev);

> >> 	return err;

> >> }

> >>

> >> cleanup()

> >> {

> >> 	ancillary_device_de(adev);

> >> 	ancillary_device_put(adev);

> >> 	/* It is common to have a one wrapper for this as

> >> ancillary_device_unregister().

> >> 	 * This will match with core device_unregister() that has precise

> >> documentation.

> >> 	 * but given fact that init() code need proper error unwinding, like

> >> above,

> >> 	 * it make sense to have two APIs, and no need to export another

> >> symbol for unregister().

> >> 	 * This pattern is very easy to audit and code.

> >> 	 */

> >> }

> >

> > I like this flow +1

> >

> > But ... since the init() function is performing both device_init and

> > device_add - it should probably be called ancillary_device_register,

> > and we are back to a single exported API for both register and

> > unregister.

> 

> Kind reminder that we introduced the two functions to allow the caller

> to know if it needed to free memory when initialize() fails, and it

> didn't need to free memory when add() failed since put_device() takes

> care of it. If you have a single init() function it's impossible to know

> which behavior to select on error.

> 

> I also have a case with SoundWire where it's nice to first initialize,

> then set some data and then add.

> 


The flow as outlined by Parav above does an initialize as the first step,
so every error path out of the function has to do a put_device(), so you
would never need to manually free the memory in the setup function.
It would be freed in the release call.

-DaveE

> >

> > At that point, do we need wrappers on the primitives init, add, del,

> > and put?

> >

> > -DaveE

> >

Pierre-Louis Bossart Oct. 7, 2020, 9:49 p.m. UTC | #27

On 10/7/20 4:22 PM, Ertman, David M wrote:
>> -----Original Message-----
>> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>> Sent: Wednesday, October 7, 2020 1:59 PM
>> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
>> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
>> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
>> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
>> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
>> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
>> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
>> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
>> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
>> <kiran.patil@intel.com>
>> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
>>
>>
>>
>>>> Below is most simple, intuitive and matching with core APIs for name and
>>>> design pattern wise.
>>>> init()
>>>> {
>>>> 	err = ancillary_device_initialize();
>>>> 	if (err)
>>>> 		return ret;
>>>>
>>>> 	err = ancillary_device_add();
>>>> 	if (ret)
>>>> 		goto err_unwind;
>>>>
>>>> 	err = some_foo();
>>>> 	if (err)
>>>> 		goto err_foo;
>>>> 	return 0;
>>>>
>>>> err_foo:
>>>> 	ancillary_device_del(adev);
>>>> err_unwind:
>>>> 	ancillary_device_put(adev->dev);
>>>> 	return err;
>>>> }
>>>>
>>>> cleanup()
>>>> {
>>>> 	ancillary_device_de(adev);
>>>> 	ancillary_device_put(adev);
>>>> 	/* It is common to have a one wrapper for this as
>>>> ancillary_device_unregister().
>>>> 	 * This will match with core device_unregister() that has precise
>>>> documentation.
>>>> 	 * but given fact that init() code need proper error unwinding, like
>>>> above,
>>>> 	 * it make sense to have two APIs, and no need to export another
>>>> symbol for unregister().
>>>> 	 * This pattern is very easy to audit and code.
>>>> 	 */
>>>> }
>>>
>>> I like this flow +1
>>>
>>> But ... since the init() function is performing both device_init and
>>> device_add - it should probably be called ancillary_device_register,
>>> and we are back to a single exported API for both register and
>>> unregister.
>>
>> Kind reminder that we introduced the two functions to allow the caller
>> to know if it needed to free memory when initialize() fails, and it
>> didn't need to free memory when add() failed since put_device() takes
>> care of it. If you have a single init() function it's impossible to know
>> which behavior to select on error.
>>
>> I also have a case with SoundWire where it's nice to first initialize,
>> then set some data and then add.
>>
> 
> The flow as outlined by Parav above does an initialize as the first step,
> so every error path out of the function has to do a put_device(), so you
> would never need to manually free the memory in the setup function.
> It would be freed in the release call.

err = ancillary_device_initialize();
if (err)
	return ret;

where is the put_device() here? if the release function does any sort of 
kfree, then you'd need to do it manually in this case.

Parav Pandit Oct. 8, 2020, 4:56 a.m. UTC | #28

> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> Sent: Thursday, October 8, 2020 3:20 AM

> 

> 

> On 10/7/20 4:22 PM, Ertman, David M wrote:

> >> -----Original Message-----

> >> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> >> Sent: Wednesday, October 7, 2020 1:59 PM

> >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit

> >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>

> >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> >> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> >> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;

> >> Williams, Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> >> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> >> <kiran.patil@intel.com>

> >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> >>

> >>

> >>

> >>>> Below is most simple, intuitive and matching with core APIs for

> >>>> name and design pattern wise.

> >>>> init()

> >>>> {

> >>>> 	err = ancillary_device_initialize();

> >>>> 	if (err)

> >>>> 		return ret;

> >>>>

> >>>> 	err = ancillary_device_add();

> >>>> 	if (ret)

> >>>> 		goto err_unwind;

> >>>>

> >>>> 	err = some_foo();

> >>>> 	if (err)

> >>>> 		goto err_foo;

> >>>> 	return 0;

> >>>>

> >>>> err_foo:

> >>>> 	ancillary_device_del(adev);

> >>>> err_unwind:

> >>>> 	ancillary_device_put(adev->dev);

> >>>> 	return err;

> >>>> }

> >>>>

> >>>> cleanup()

> >>>> {

> >>>> 	ancillary_device_de(adev);

> >>>> 	ancillary_device_put(adev);

> >>>> 	/* It is common to have a one wrapper for this as

> >>>> ancillary_device_unregister().

> >>>> 	 * This will match with core device_unregister() that has precise

> >>>> documentation.

> >>>> 	 * but given fact that init() code need proper error unwinding,

> >>>> like above,

> >>>> 	 * it make sense to have two APIs, and no need to export another

> >>>> symbol for unregister().

> >>>> 	 * This pattern is very easy to audit and code.

> >>>> 	 */

> >>>> }

> >>>

> >>> I like this flow +1

> >>>

> >>> But ... since the init() function is performing both device_init and

> >>> device_add - it should probably be called ancillary_device_register,

> >>> and we are back to a single exported API for both register and

> >>> unregister.

> >>

> >> Kind reminder that we introduced the two functions to allow the

> >> caller to know if it needed to free memory when initialize() fails,

> >> and it didn't need to free memory when add() failed since

> >> put_device() takes care of it. If you have a single init() function

> >> it's impossible to know which behavior to select on error.

> >>

> >> I also have a case with SoundWire where it's nice to first

> >> initialize, then set some data and then add.

> >>

> >

> > The flow as outlined by Parav above does an initialize as the first

> > step, so every error path out of the function has to do a

> > put_device(), so you would never need to manually free the memory in

> the setup function.

> > It would be freed in the release call.

> 

> err = ancillary_device_initialize();

> if (err)

> 	return ret;

> 

> where is the put_device() here? if the release function does any sort of

> kfree, then you'd need to do it manually in this case.

Since device_initialize() failed, put_device() cannot be done here.
So yes, pseudo code should have shown,
if (err) {
	kfree(adev);
	return err;
}

If we just want to follow register(), unregister() pattern,

Than,

ancillar_device_register() should be,

/**
 * ancillar_device_register() - register an ancillary device
 * NOTE: __never directly free @adev after calling this function, even if it returned
 * an error. Always use ancillary_device_put() to give up the reference initialized by this function.
 * This note matches with the core and caller knows exactly what to be done.
 */
ancillary_device_register()
{
	device_initialize(&adev->dev);
	if (!dev->parent || !adev->name)
		return -EINVAL;
	if (!dev->release && !(dev->type && dev->type->release)) {
		/* core is already capable and throws the warning when release callback is not set.
		 * It is done at drivers/base/core.c:1798.
		 * For NULL release it says, "does not have a release() function, it is broken and must be fixed"
		 */
		return -EINVAL;
	}
	err = dev_set_name(adev...);
	if (err) {
		/* kobject_release() -> kobject_cleanup() are capable to detect if name is set/ not set
		  * and free the const if it was set.
		  */
		return err;
	}
	err = device_add(&adev->dev);
	If (err)
		return err;
}

Caller code:
init()
{
	adev = kzalloc(sizeof(*foo_adev)..);
	if (!adev)
		return -ENOMEM;
	err = ancillary_device_register(&adev);
	if (err)
		goto err;

err:
	ancillary_device_put(&adev);
	return err;
}

cleanup()
{
	ancillary_device_unregister(&adev);
}

Above pattern is fine too matching the core.

If I understand Leon correctly, he prefers simple register(), unregister() pattern.
If, so it should be explicit register(), unregister() API.

However I read that Pierre mentioned that SoundWire prefers initialize(), some_data_init(), add() pattern.
If SoundWire cannot do register() pattern,
So, whichever first user bundled with the patchset, those APIs should be exported, because we don’t add an API without a user.

Pierre, 
Can you please check if SoundWire can follow register() pattern?

Assuming Leon patches and my patches for subfunction arrive after Soundwire series + ancillary bus,
we can add the register() and unregister() version in our patchset later.

Greg already said that "it's not carved on stone, we can do incremental additions as the need arise".
So I think we should proceed with the wrappers which follow the core convention of 
either 
(a) initialize)(), add() or 
(b) register(), unregister().

Leon Romanovsky Oct. 8, 2020, 5:21 a.m. UTC | #29

On Wed, Oct 07, 2020 at 08:46:45PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Parav Pandit <parav@nvidia.com>
> > Sent: Wednesday, October 7, 2020 1:17 PM
> > To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> > <david.m.ertman@intel.com>
> > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
> > devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> > netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
> > fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > <kiran.patil@intel.com>
> > Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Thursday, October 8, 2020 12:56 AM
> > >
> > > > > This API is partially obscures low level driver-core code and needs
> > > > > to provide clear and proper abstractions without need to remember
> > > > > about put_device. There is already _add() interface why don't you do
> > > > > put_device() in it?
> > > > >
> > > >
> > > > The pushback Pierre is referring to was during our mid-tier internal
> > > > review.  It was primarily a concern of Parav as I recall, so he can speak to
> > his
> > > reasoning.
> > > >
> > > > What we originally had was a single API call
> > > > (ancillary_device_register) that started with a call to
> > > > device_initialize(), and every error path out of the function performed a
> > > put_device().
> > > >
> > > > Is this the model you have in mind?
> > >
> > > I don't like this flow:
> > > ancillary_device_initialize()
> > > if (ancillary_ancillary_device_add()) {
> > >   put_device(....)
> > >   ancillary_device_unregister()
> > Calling device_unregister() is incorrect, because add() wasn't successful.
> > Only put_device() or a wrapper ancillary_device_put() is necessary.
> >
> > >   return err;
> > > }
> > >
> > > And prefer this flow:
> > > ancillary_device_initialize()
> > > if (ancillary_device_add()) {
> > >   ancillary_device_unregister()
> > This is incorrect and a clear deviation from the current core APIs that adds the
> > confusion.
> >
> > >   return err;
> > > }
> > >
> > > In this way, the ancillary users won't need to do non-intuitive put_device();
> >
> > Below is most simple, intuitive and matching with core APIs for name and
> > design pattern wise.
> > init()
> > {
> > 	err = ancillary_device_initialize();
> > 	if (err)
> > 		return ret;
> >
> > 	err = ancillary_device_add();
> > 	if (ret)
> > 		goto err_unwind;
> >
> > 	err = some_foo();
> > 	if (err)
> > 		goto err_foo;
> > 	return 0;
> >
> > err_foo:
> > 	ancillary_device_del(adev);
> > err_unwind:
> > 	ancillary_device_put(adev->dev);
> > 	return err;
> > }
> >
> > cleanup()
> > {
> > 	ancillary_device_de(adev);
> > 	ancillary_device_put(adev);
> > 	/* It is common to have a one wrapper for this as
> > ancillary_device_unregister().
> > 	 * This will match with core device_unregister() that has precise
> > documentation.
> > 	 * but given fact that init() code need proper error unwinding, like
> > above,
> > 	 * it make sense to have two APIs, and no need to export another
> > symbol for unregister().
> > 	 * This pattern is very easy to audit and code.
> > 	 */
> > }
>
> I like this flow +1
>
> But ... since the init() function is performing both device_init and
> device_add - it should probably be called ancillary_device_register,
> and we are back to a single exported API for both register and
> unregister.
>
> At that point, do we need wrappers on the primitives init, add, del,
> and put?

Let me summarize.
1. You are not providing driver/core API but simplification and obfuscation
of basic primitives and structures. This is new layer. There is no room for
a claim that we must to follow internal API.
2. API should be symmetric. If you call to _register()/_add(), you will need
to call to _unregister()/_del(). Please don't add obscure _put().
3. You can't "ask" from users to call internal calls (put_device) over internal
fields in ancillary_device.
4. This API should be clear to drivers authors, "device_add()" call (and
semantic) is not used by the drivers (git grep " device_add(" drivers/).

Thanks

>
> -DaveE

Leon Romanovsky Oct. 8, 2020, 5:26 a.m. UTC | #30

On Thu, Oct 08, 2020 at 04:56:01AM +0000, Parav Pandit wrote:
>
>
> > From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Sent: Thursday, October 8, 2020 3:20 AM
> >
> >
> > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > >> -----Original Message-----
> > >> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
> > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> > >> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
> > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;
> > >> Williams, Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > >> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > >> <kiran.patil@intel.com>
> > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > >>
> > >>
> > >>
> > >>>> Below is most simple, intuitive and matching with core APIs for
> > >>>> name and design pattern wise.
> > >>>> init()
> > >>>> {
> > >>>> 	err = ancillary_device_initialize();
> > >>>> 	if (err)
> > >>>> 		return ret;
> > >>>>
> > >>>> 	err = ancillary_device_add();
> > >>>> 	if (ret)
> > >>>> 		goto err_unwind;
> > >>>>
> > >>>> 	err = some_foo();
> > >>>> 	if (err)
> > >>>> 		goto err_foo;
> > >>>> 	return 0;
> > >>>>
> > >>>> err_foo:
> > >>>> 	ancillary_device_del(adev);
> > >>>> err_unwind:
> > >>>> 	ancillary_device_put(adev->dev);
> > >>>> 	return err;
> > >>>> }
> > >>>>
> > >>>> cleanup()
> > >>>> {
> > >>>> 	ancillary_device_de(adev);
> > >>>> 	ancillary_device_put(adev);
> > >>>> 	/* It is common to have a one wrapper for this as
> > >>>> ancillary_device_unregister().
> > >>>> 	 * This will match with core device_unregister() that has precise
> > >>>> documentation.
> > >>>> 	 * but given fact that init() code need proper error unwinding,
> > >>>> like above,
> > >>>> 	 * it make sense to have two APIs, and no need to export another
> > >>>> symbol for unregister().
> > >>>> 	 * This pattern is very easy to audit and code.
> > >>>> 	 */
> > >>>> }
> > >>>
> > >>> I like this flow +1
> > >>>
> > >>> But ... since the init() function is performing both device_init and
> > >>> device_add - it should probably be called ancillary_device_register,
> > >>> and we are back to a single exported API for both register and
> > >>> unregister.
> > >>
> > >> Kind reminder that we introduced the two functions to allow the
> > >> caller to know if it needed to free memory when initialize() fails,
> > >> and it didn't need to free memory when add() failed since
> > >> put_device() takes care of it. If you have a single init() function
> > >> it's impossible to know which behavior to select on error.
> > >>
> > >> I also have a case with SoundWire where it's nice to first
> > >> initialize, then set some data and then add.
> > >>
> > >
> > > The flow as outlined by Parav above does an initialize as the first
> > > step, so every error path out of the function has to do a
> > > put_device(), so you would never need to manually free the memory in
> > the setup function.
> > > It would be freed in the release call.
> >
> > err = ancillary_device_initialize();
> > if (err)
> > 	return ret;
> >
> > where is the put_device() here? if the release function does any sort of
> > kfree, then you'd need to do it manually in this case.
> Since device_initialize() failed, put_device() cannot be done here.
> So yes, pseudo code should have shown,
> if (err) {
> 	kfree(adev);
> 	return err;
> }
>
> If we just want to follow register(), unregister() pattern,
>
> Than,
>
> ancillar_device_register() should be,
>
> /**
>  * ancillar_device_register() - register an ancillary device
>  * NOTE: __never directly free @adev after calling this function, even if it returned
>  * an error. Always use ancillary_device_put() to give up the reference initialized by this function.
>  * This note matches with the core and caller knows exactly what to be done.
>  */
> ancillary_device_register()
> {
> 	device_initialize(&adev->dev);
> 	if (!dev->parent || !adev->name)
> 		return -EINVAL;
> 	if (!dev->release && !(dev->type && dev->type->release)) {
> 		/* core is already capable and throws the warning when release callback is not set.
> 		 * It is done at drivers/base/core.c:1798.
> 		 * For NULL release it says, "does not have a release() function, it is broken and must be fixed"
> 		 */
> 		return -EINVAL;
> 	}
> 	err = dev_set_name(adev...);
> 	if (err) {
> 		/* kobject_release() -> kobject_cleanup() are capable to detect if name is set/ not set
> 		  * and free the const if it was set.
> 		  */
> 		return err;
> 	}
> 	err = device_add(&adev->dev);
> 	If (err)
> 		return err;
> }
>
> Caller code:
> init()
> {
> 	adev = kzalloc(sizeof(*foo_adev)..);
> 	if (!adev)
> 		return -ENOMEM;
> 	err = ancillary_device_register(&adev);
> 	if (err)
> 		goto err;
>
> err:
> 	ancillary_device_put(&adev);
> 	return err;
> }
>
> cleanup()
> {
> 	ancillary_device_unregister(&adev);
> }
>
> Above pattern is fine too matching the core.
>
> If I understand Leon correctly, he prefers simple register(), unregister() pattern.
> If, so it should be explicit register(), unregister() API.

This is my summary
https://lore.kernel.org/linux-rdma/20201008052137.GA13580@unreal
The API should be symmetric.

Thanks

Dan Williams Oct. 8, 2020, 6:32 a.m. UTC | #31

On Wed, Oct 7, 2020 at 10:21 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Oct 07, 2020 at 08:46:45PM +0000, Ertman, David M wrote:
> > > -----Original Message-----
> > > From: Parav Pandit <parav@nvidia.com>
> > > Sent: Wednesday, October 7, 2020 1:17 PM
> > > To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> > > <david.m.ertman@intel.com>
> > > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
> > > devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> > > netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
> > > fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> > > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > <kiran.patil@intel.com>
> > > Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> > >
> > >
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Thursday, October 8, 2020 12:56 AM
> > > >
> > > > > > This API is partially obscures low level driver-core code and needs
> > > > > > to provide clear and proper abstractions without need to remember
> > > > > > about put_device. There is already _add() interface why don't you do
> > > > > > put_device() in it?
> > > > > >
> > > > >
> > > > > The pushback Pierre is referring to was during our mid-tier internal
> > > > > review.  It was primarily a concern of Parav as I recall, so he can speak to
> > > his
> > > > reasoning.
> > > > >
> > > > > What we originally had was a single API call
> > > > > (ancillary_device_register) that started with a call to
> > > > > device_initialize(), and every error path out of the function performed a
> > > > put_device().
> > > > >
> > > > > Is this the model you have in mind?
> > > >
> > > > I don't like this flow:
> > > > ancillary_device_initialize()
> > > > if (ancillary_ancillary_device_add()) {
> > > >   put_device(....)
> > > >   ancillary_device_unregister()
> > > Calling device_unregister() is incorrect, because add() wasn't successful.
> > > Only put_device() or a wrapper ancillary_device_put() is necessary.
> > >
> > > >   return err;
> > > > }
> > > >
> > > > And prefer this flow:
> > > > ancillary_device_initialize()
> > > > if (ancillary_device_add()) {
> > > >   ancillary_device_unregister()
> > > This is incorrect and a clear deviation from the current core APIs that adds the
> > > confusion.
> > >
> > > >   return err;
> > > > }
> > > >
> > > > In this way, the ancillary users won't need to do non-intuitive put_device();
> > >
> > > Below is most simple, intuitive and matching with core APIs for name and
> > > design pattern wise.
> > > init()
> > > {
> > >     err = ancillary_device_initialize();
> > >     if (err)
> > >             return ret;
> > >
> > >     err = ancillary_device_add();
> > >     if (ret)
> > >             goto err_unwind;
> > >
> > >     err = some_foo();
> > >     if (err)
> > >             goto err_foo;
> > >     return 0;
> > >
> > > err_foo:
> > >     ancillary_device_del(adev);
> > > err_unwind:
> > >     ancillary_device_put(adev->dev);
> > >     return err;
> > > }
> > >
> > > cleanup()
> > > {
> > >     ancillary_device_de(adev);
> > >     ancillary_device_put(adev);
> > >     /* It is common to have a one wrapper for this as
> > > ancillary_device_unregister().
> > >      * This will match with core device_unregister() that has precise
> > > documentation.
> > >      * but given fact that init() code need proper error unwinding, like
> > > above,
> > >      * it make sense to have two APIs, and no need to export another
> > > symbol for unregister().
> > >      * This pattern is very easy to audit and code.
> > >      */
> > > }
> >
> > I like this flow +1
> >
> > But ... since the init() function is performing both device_init and
> > device_add - it should probably be called ancillary_device_register,
> > and we are back to a single exported API for both register and
> > unregister.
> >
> > At that point, do we need wrappers on the primitives init, add, del,
> > and put?
>
> Let me summarize.
> 1. You are not providing driver/core API but simplification and obfuscation
> of basic primitives and structures. This is new layer. There is no room for
> a claim that we must to follow internal API.

Yes, this a driver core api, Greg even questioned why it was in
drivers/bus instead of drivers/base which I think makes sense.

> 2. API should be symmetric. If you call to _register()/_add(), you will need
> to call to _unregister()/_del(). Please don't add obscure _put().

It's not obscure it's a long standing semantic for how to properly
handle device_add() failures. Especially in this case where there is
no way to have something like a common auxiliary_device_alloc() that
will work for everyone the only other option is require all device
destruction to go through the provided release method (put_device())
after a device_add() failure.

> 3. You can't "ask" from users to call internal calls (put_device) over internal
> fields in ancillary_device.

Sure it can. platform_device_add() requires a put_device() on failure,
but also note how platform_device_add() *requires*
platform_device_alloc() be used to create the device. That
inflexibility is something this auxiliary bus is trying to avoid.

> 4. This API should be clear to drivers authors, "device_add()" call (and
> semantic) is not used by the drivers (git grep " device_add(" drivers/).

This shows 141 instances for me, so I'm not sure what you're getting at?

Look, this api is meant to be a replacement for places where platform
devices were being abused. The device_initialize() + customize device
+ device_add() organization has the flexibility needed to let users
customize naming and other parts of device creation in a way that a
device_register() flow, or platform_device_{register,add} in
particular, did not.

If the concern is that you'd like to have an auxiliary_device_put()
for symmetry that would need to come with the same warning as
commented on platform_device_put(), i.e. that's it's really only
vanity symmetry to be used in error paths. The semantics of
device_add() and device_put() on failure are long established, don't
invent new behavior for auxiliary_device_add() and
auxiliary_device_put() / put_device().

Leon Romanovsky Oct. 8, 2020, 7 a.m. UTC | #32

On Wed, Oct 07, 2020 at 11:32:11PM -0700, Dan Williams wrote:
> On Wed, Oct 7, 2020 at 10:21 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Oct 07, 2020 at 08:46:45PM +0000, Ertman, David M wrote:
> > > > -----Original Message-----
> > > > From: Parav Pandit <parav@nvidia.com>
> > > > Sent: Wednesday, October 7, 2020 1:17 PM
> > > > To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> > > > <david.m.ertman@intel.com>
> > > > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-
> > > > devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;
> > > > netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;
> > > > fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > > dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > > <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,
> > > > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > > <kiran.patil@intel.com>
> > > > Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> > > >
> > > >
> > > > > From: Leon Romanovsky <leon@kernel.org>
> > > > > Sent: Thursday, October 8, 2020 12:56 AM
> > > > >
> > > > > > > This API is partially obscures low level driver-core code and needs
> > > > > > > to provide clear and proper abstractions without need to remember
> > > > > > > about put_device. There is already _add() interface why don't you do
> > > > > > > put_device() in it?
> > > > > > >
> > > > > >
> > > > > > The pushback Pierre is referring to was during our mid-tier internal
> > > > > > review.  It was primarily a concern of Parav as I recall, so he can speak to
> > > > his
> > > > > reasoning.
> > > > > >
> > > > > > What we originally had was a single API call
> > > > > > (ancillary_device_register) that started with a call to
> > > > > > device_initialize(), and every error path out of the function performed a
> > > > > put_device().
> > > > > >
> > > > > > Is this the model you have in mind?
> > > > >
> > > > > I don't like this flow:
> > > > > ancillary_device_initialize()
> > > > > if (ancillary_ancillary_device_add()) {
> > > > >   put_device(....)
> > > > >   ancillary_device_unregister()
> > > > Calling device_unregister() is incorrect, because add() wasn't successful.
> > > > Only put_device() or a wrapper ancillary_device_put() is necessary.
> > > >
> > > > >   return err;
> > > > > }
> > > > >
> > > > > And prefer this flow:
> > > > > ancillary_device_initialize()
> > > > > if (ancillary_device_add()) {
> > > > >   ancillary_device_unregister()
> > > > This is incorrect and a clear deviation from the current core APIs that adds the
> > > > confusion.
> > > >
> > > > >   return err;
> > > > > }
> > > > >
> > > > > In this way, the ancillary users won't need to do non-intuitive put_device();
> > > >
> > > > Below is most simple, intuitive and matching with core APIs for name and
> > > > design pattern wise.
> > > > init()
> > > > {
> > > >     err = ancillary_device_initialize();
> > > >     if (err)
> > > >             return ret;
> > > >
> > > >     err = ancillary_device_add();
> > > >     if (ret)
> > > >             goto err_unwind;
> > > >
> > > >     err = some_foo();
> > > >     if (err)
> > > >             goto err_foo;
> > > >     return 0;
> > > >
> > > > err_foo:
> > > >     ancillary_device_del(adev);
> > > > err_unwind:
> > > >     ancillary_device_put(adev->dev);
> > > >     return err;
> > > > }
> > > >
> > > > cleanup()
> > > > {
> > > >     ancillary_device_de(adev);
> > > >     ancillary_device_put(adev);
> > > >     /* It is common to have a one wrapper for this as
> > > > ancillary_device_unregister().
> > > >      * This will match with core device_unregister() that has precise
> > > > documentation.
> > > >      * but given fact that init() code need proper error unwinding, like
> > > > above,
> > > >      * it make sense to have two APIs, and no need to export another
> > > > symbol for unregister().
> > > >      * This pattern is very easy to audit and code.
> > > >      */
> > > > }
> > >
> > > I like this flow +1
> > >
> > > But ... since the init() function is performing both device_init and
> > > device_add - it should probably be called ancillary_device_register,
> > > and we are back to a single exported API for both register and
> > > unregister.
> > >
> > > At that point, do we need wrappers on the primitives init, add, del,
> > > and put?
> >
> > Let me summarize.
> > 1. You are not providing driver/core API but simplification and obfuscation
> > of basic primitives and structures. This is new layer. There is no room for
> > a claim that we must to follow internal API.
>
> Yes, this a driver core api, Greg even questioned why it was in
> drivers/bus instead of drivers/base which I think makes sense.

We can argue till death, but at the end, this is a bus.

>
> > 2. API should be symmetric. If you call to _register()/_add(), you will need
> > to call to _unregister()/_del(). Please don't add obscure _put().
>
> It's not obscure it's a long standing semantic for how to properly
> handle device_add() failures. Especially in this case where there is
> no way to have something like a common auxiliary_device_alloc() that
> will work for everyone the only other option is require all device
> destruction to go through the provided release method (put_device())
> after a device_add() failure.

And this is my main concern, this is not device_add() failure but
ancillary_device_add() which hides driver_* logic.

We won't expect to see inside ancillary drivers direct calls to
device_*(), why will it be different here with put_device?

>
> > 3. You can't "ask" from users to call internal calls (put_device) over internal
> > fields in ancillary_device.
>
> Sure it can. platform_device_add() requires a put_device() on failure,
> but also note how platform_device_add() *requires*
> platform_device_alloc() be used to create the device. That
> inflexibility is something this auxiliary bus is trying to avoid.

I'm writing below, the rationale behind this bus is RDMA, netdev and
other cross-subsystem devices.

>
> > 4. This API should be clear to drivers authors, "device_add()" call (and
> > semantic) is not used by the drivers (git grep " device_add(" drivers/).
>
> This shows 141 instances for me, so I'm not sure what you're getting at?

Did you look at them? I did, most if not all of the calls are in
bus/core/generic logic, drivers are not calling to it or at least
not supposed to.

>
> Look, this api is meant to be a replacement for places where platform
> devices were being abused. The device_initialize() + customize device
> + device_add() organization has the flexibility needed to let users
> customize naming and other parts of device creation in a way that a
> device_register() flow, or platform_device_{register,add} in
> particular, did not.

It is hard me to say if the goal it to replace platform devices or not,
but this ancillary_device bus adventure started after request to stop
reinvent PCI logic for every new RDMA (RoCE) drivers. This is there
full power of this virtbus solution comes into full power by deleting
tons of complex code.

>
> If the concern is that you'd like to have an auxiliary_device_put()
> for symmetry that would need to come with the same warning as
> commented on platform_device_put(), i.e. that's it's really only
> vanity symmetry to be used in error paths. The semantics of
> device_add() and device_put() on failure are long established, don't
> invent new behavior for auxiliary_device_add() and
> auxiliary_device_put() / put_device().

All stated above is my opinion, it can be different from yours.

Thanks

Parav Pandit Oct. 8, 2020, 7:14 a.m. UTC | #33

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 8, 2020 10:56 AM
> 
> On Thu, Oct 08, 2020 at 04:56:01AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > Sent: Thursday, October 8, 2020 3:20 AM
> > >
> > >
> > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > >> -----Original Message-----
> > > >> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
> > > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com;
> > > >> tiwai@suse.de; netdev@vger.kernel.org;
> > > >> ranjani.sridharan@linux.intel.com;
> > > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;
> > > >> Williams, Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > >> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > >> <kiran.patil@intel.com>
> > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > >>
> > > >>
> > > >>
> > > >>>> Below is most simple, intuitive and matching with core APIs for
> > > >>>> name and design pattern wise.
> > > >>>> init()
> > > >>>> {
> > > >>>> 	err = ancillary_device_initialize();
> > > >>>> 	if (err)
> > > >>>> 		return ret;
> > > >>>>
> > > >>>> 	err = ancillary_device_add();
> > > >>>> 	if (ret)
> > > >>>> 		goto err_unwind;
> > > >>>>
> > > >>>> 	err = some_foo();
> > > >>>> 	if (err)
> > > >>>> 		goto err_foo;
> > > >>>> 	return 0;
> > > >>>>
> > > >>>> err_foo:
> > > >>>> 	ancillary_device_del(adev);
> > > >>>> err_unwind:
> > > >>>> 	ancillary_device_put(adev->dev);
> > > >>>> 	return err;
> > > >>>> }
> > > >>>>
> > > >>>> cleanup()
> > > >>>> {
> > > >>>> 	ancillary_device_de(adev);
> > > >>>> 	ancillary_device_put(adev);
> > > >>>> 	/* It is common to have a one wrapper for this as
> > > >>>> ancillary_device_unregister().
> > > >>>> 	 * This will match with core device_unregister() that has
> > > >>>> precise documentation.
> > > >>>> 	 * but given fact that init() code need proper error
> > > >>>> unwinding, like above,
> > > >>>> 	 * it make sense to have two APIs, and no need to export
> > > >>>> another symbol for unregister().
> > > >>>> 	 * This pattern is very easy to audit and code.
> > > >>>> 	 */
> > > >>>> }
> > > >>>
> > > >>> I like this flow +1
> > > >>>
> > > >>> But ... since the init() function is performing both device_init
> > > >>> and device_add - it should probably be called
> > > >>> ancillary_device_register, and we are back to a single exported
> > > >>> API for both register and unregister.
> > > >>
> > > >> Kind reminder that we introduced the two functions to allow the
> > > >> caller to know if it needed to free memory when initialize()
> > > >> fails, and it didn't need to free memory when add() failed since
> > > >> put_device() takes care of it. If you have a single init()
> > > >> function it's impossible to know which behavior to select on error.
> > > >>
> > > >> I also have a case with SoundWire where it's nice to first
> > > >> initialize, then set some data and then add.
> > > >>
> > > >
> > > > The flow as outlined by Parav above does an initialize as the
> > > > first step, so every error path out of the function has to do a
> > > > put_device(), so you would never need to manually free the memory
> > > > in
> > > the setup function.
> > > > It would be freed in the release call.
> > >
> > > err = ancillary_device_initialize(); if (err)
> > > 	return ret;
> > >
> > > where is the put_device() here? if the release function does any
> > > sort of kfree, then you'd need to do it manually in this case.
> > Since device_initialize() failed, put_device() cannot be done here.
> > So yes, pseudo code should have shown, if (err) {
> > 	kfree(adev);
> > 	return err;
> > }
> >
> > If we just want to follow register(), unregister() pattern,
> >
> > Than,
> >
> > ancillar_device_register() should be,
> >
> > /**
> >  * ancillar_device_register() - register an ancillary device
> >  * NOTE: __never directly free @adev after calling this function, even
> > if it returned
> >  * an error. Always use ancillary_device_put() to give up the reference
> initialized by this function.
> >  * This note matches with the core and caller knows exactly what to be
> done.
> >  */
> > ancillary_device_register()
> > {
> > 	device_initialize(&adev->dev);
> > 	if (!dev->parent || !adev->name)
> > 		return -EINVAL;
> > 	if (!dev->release && !(dev->type && dev->type->release)) {
> > 		/* core is already capable and throws the warning when
> release callback is not set.
> > 		 * It is done at drivers/base/core.c:1798.
> > 		 * For NULL release it says, "does not have a release()
> function, it is broken and must be fixed"
> > 		 */
> > 		return -EINVAL;
> > 	}
> > 	err = dev_set_name(adev...);
> > 	if (err) {
> > 		/* kobject_release() -> kobject_cleanup() are capable to
> detect if name is set/ not set
> > 		  * and free the const if it was set.
> > 		  */
> > 		return err;
> > 	}
> > 	err = device_add(&adev->dev);
> > 	If (err)
> > 		return err;
> > }
> >
> > Caller code:
> > init()
> > {
> > 	adev = kzalloc(sizeof(*foo_adev)..);
> > 	if (!adev)
> > 		return -ENOMEM;
> > 	err = ancillary_device_register(&adev);
> > 	if (err)
> > 		goto err;
> >
> > err:
> > 	ancillary_device_put(&adev);
> > 	return err;
> > }
> >
> > cleanup()
> > {
> > 	ancillary_device_unregister(&adev);
> > }
> >
> > Above pattern is fine too matching the core.
> >
> > If I understand Leon correctly, he prefers simple register(), unregister()
> pattern.
> > If, so it should be explicit register(), unregister() API.
> 
> This is my summary
> https://lore.kernel.org/linux-rdma/20201008052137.GA13580@unreal
> The API should be symmetric.
> 

I disagree to your below point.
> 1. You are not providing driver/core API but simplification and obfuscation
> of basic primitives and structures. This is new layer. There is no room for
> a claim that we must to follow internal API.
If ancillary bus has
ancillary_device_add(), it cannot do device_initialize() and device_add() in both.

I provided two examples and what really matters is a given patchset uses (need to use) which pattern,
initialize() + add(), or register() + unregister().

As we all know that API is not added for future. It is the future patch extends it.
So lets wait for Pierre to reply if soundwire can follow register(), unregister() sequence.
This way same APIs can service both use-cases.

Regarding,
> 3. You can't "ask" from users to call internal calls (put_device) over internal
> fields in ancillary_device.
In that case if should be ancillary_device_put() ancillary_device_release().

Or we should follow the patten of ib_alloc_device [1],
ancillary_device_alloc()
    -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa subsystem.
    ->device_initialize()
ancillary_device_add()

ancillar_device_de() <- balances with add
ancillary_device_dealloc() <-- balances with device_alloc(), which does the put_device() + free the memory allocated in alloc().

This approach of [1] also eliminates exposing adev.dev.release = <drivers_release_method_to_free_adev> in drivers.
And container_of() benefit also continues..

[1] https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs.h#L2791

Dan Williams Oct. 8, 2020, 7:38 a.m. UTC | #34

On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org> wrote:
[..]
> All stated above is my opinion, it can be different from yours.

Yes, but we need to converge to move this forward. Jason was involved
in the current organization for registration, Greg was angling for
this to be core functionality. I have use cases outside of RDMA and
netdev. Parav was ok with the current organization. The SOF folks
already have a proposed incorporation of it. The argument I am hearing
is that "this registration api seems hard for driver writers" when we
have several driver writers who have already taken a look and can make
it work. If you want to follow on with a simpler wrappers for your use
case, great, but I do not yet see anyone concurring with your opinion
that the current organization is irretrievably broken or too obscure
to use.

Leon Romanovsky Oct. 8, 2020, 7:45 a.m. UTC | #35

On Thu, Oct 08, 2020 at 07:14:17AM +0000, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 8, 2020 10:56 AM
> >
> > On Thu, Oct 08, 2020 at 04:56:01AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > >
> > > >
> > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > >> -----Original Message-----
> > > > >> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > > > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
> > > > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com;
> > > > >> tiwai@suse.de; netdev@vger.kernel.org;
> > > > >> ranjani.sridharan@linux.intel.com;
> > > > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;
> > > > >> Williams, Dan J <dan.j.williams@intel.com>; Saleem, Shiraz
> > > > >> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> > > > >> <kiran.patil@intel.com>
> > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > >>
> > > > >>
> > > > >>
> > > > >>>> Below is most simple, intuitive and matching with core APIs for
> > > > >>>> name and design pattern wise.
> > > > >>>> init()
> > > > >>>> {
> > > > >>>> 	err = ancillary_device_initialize();
> > > > >>>> 	if (err)
> > > > >>>> 		return ret;
> > > > >>>>
> > > > >>>> 	err = ancillary_device_add();
> > > > >>>> 	if (ret)
> > > > >>>> 		goto err_unwind;
> > > > >>>>
> > > > >>>> 	err = some_foo();
> > > > >>>> 	if (err)
> > > > >>>> 		goto err_foo;
> > > > >>>> 	return 0;
> > > > >>>>
> > > > >>>> err_foo:
> > > > >>>> 	ancillary_device_del(adev);
> > > > >>>> err_unwind:
> > > > >>>> 	ancillary_device_put(adev->dev);
> > > > >>>> 	return err;
> > > > >>>> }
> > > > >>>>
> > > > >>>> cleanup()
> > > > >>>> {
> > > > >>>> 	ancillary_device_de(adev);
> > > > >>>> 	ancillary_device_put(adev);
> > > > >>>> 	/* It is common to have a one wrapper for this as
> > > > >>>> ancillary_device_unregister().
> > > > >>>> 	 * This will match with core device_unregister() that has
> > > > >>>> precise documentation.
> > > > >>>> 	 * but given fact that init() code need proper error
> > > > >>>> unwinding, like above,
> > > > >>>> 	 * it make sense to have two APIs, and no need to export
> > > > >>>> another symbol for unregister().
> > > > >>>> 	 * This pattern is very easy to audit and code.
> > > > >>>> 	 */
> > > > >>>> }
> > > > >>>
> > > > >>> I like this flow +1
> > > > >>>
> > > > >>> But ... since the init() function is performing both device_init
> > > > >>> and device_add - it should probably be called
> > > > >>> ancillary_device_register, and we are back to a single exported
> > > > >>> API for both register and unregister.
> > > > >>
> > > > >> Kind reminder that we introduced the two functions to allow the
> > > > >> caller to know if it needed to free memory when initialize()
> > > > >> fails, and it didn't need to free memory when add() failed since
> > > > >> put_device() takes care of it. If you have a single init()
> > > > >> function it's impossible to know which behavior to select on error.
> > > > >>
> > > > >> I also have a case with SoundWire where it's nice to first
> > > > >> initialize, then set some data and then add.
> > > > >>
> > > > >
> > > > > The flow as outlined by Parav above does an initialize as the
> > > > > first step, so every error path out of the function has to do a
> > > > > put_device(), so you would never need to manually free the memory
> > > > > in
> > > > the setup function.
> > > > > It would be freed in the release call.
> > > >
> > > > err = ancillary_device_initialize(); if (err)
> > > > 	return ret;
> > > >
> > > > where is the put_device() here? if the release function does any
> > > > sort of kfree, then you'd need to do it manually in this case.
> > > Since device_initialize() failed, put_device() cannot be done here.
> > > So yes, pseudo code should have shown, if (err) {
> > > 	kfree(adev);
> > > 	return err;
> > > }
> > >
> > > If we just want to follow register(), unregister() pattern,
> > >
> > > Than,
> > >
> > > ancillar_device_register() should be,
> > >
> > > /**
> > >  * ancillar_device_register() - register an ancillary device
> > >  * NOTE: __never directly free @adev after calling this function, even
> > > if it returned
> > >  * an error. Always use ancillary_device_put() to give up the reference
> > initialized by this function.
> > >  * This note matches with the core and caller knows exactly what to be
> > done.
> > >  */
> > > ancillary_device_register()
> > > {
> > > 	device_initialize(&adev->dev);
> > > 	if (!dev->parent || !adev->name)
> > > 		return -EINVAL;
> > > 	if (!dev->release && !(dev->type && dev->type->release)) {
> > > 		/* core is already capable and throws the warning when
> > release callback is not set.
> > > 		 * It is done at drivers/base/core.c:1798.
> > > 		 * For NULL release it says, "does not have a release()
> > function, it is broken and must be fixed"
> > > 		 */
> > > 		return -EINVAL;
> > > 	}
> > > 	err = dev_set_name(adev...);
> > > 	if (err) {
> > > 		/* kobject_release() -> kobject_cleanup() are capable to
> > detect if name is set/ not set
> > > 		  * and free the const if it was set.
> > > 		  */
> > > 		return err;
> > > 	}
> > > 	err = device_add(&adev->dev);
> > > 	If (err)
> > > 		return err;
> > > }
> > >
> > > Caller code:
> > > init()
> > > {
> > > 	adev = kzalloc(sizeof(*foo_adev)..);
> > > 	if (!adev)
> > > 		return -ENOMEM;
> > > 	err = ancillary_device_register(&adev);
> > > 	if (err)
> > > 		goto err;
> > >
> > > err:
> > > 	ancillary_device_put(&adev);
> > > 	return err;
> > > }
> > >
> > > cleanup()
> > > {
> > > 	ancillary_device_unregister(&adev);
> > > }
> > >
> > > Above pattern is fine too matching the core.
> > >
> > > If I understand Leon correctly, he prefers simple register(), unregister()
> > pattern.
> > > If, so it should be explicit register(), unregister() API.
> >
> > This is my summary
> > https://lore.kernel.org/linux-rdma/20201008052137.GA13580@unreal
> > The API should be symmetric.
> >
>
> I disagree to your below point.
> > 1. You are not providing driver/core API but simplification and obfuscation
> > of basic primitives and structures. This is new layer. There is no room for
> > a claim that we must to follow internal API.
> If ancillary bus has
> ancillary_device_add(), it cannot do device_initialize() and device_add() in both.
>
> I provided two examples and what really matters is a given patchset uses (need to use) which pattern,
> initialize() + add(), or register() + unregister().
>
> As we all know that API is not added for future. It is the future patch extends it.
> So lets wait for Pierre to reply if soundwire can follow register(), unregister() sequence.
> This way same APIs can service both use-cases.
>
> Regarding,
> > 3. You can't "ask" from users to call internal calls (put_device) over internal
> > fields in ancillary_device.
> In that case if should be ancillary_device_put() ancillary_device_release().
>
> Or we should follow the patten of ib_alloc_device [1],
> ancillary_device_alloc()
>     -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa subsystem.
>     ->device_initialize()
> ancillary_device_add()
>
> ancillar_device_de() <- balances with add
> ancillary_device_dealloc() <-- balances with device_alloc(), which does the put_device() + free the memory allocated in alloc().
>
> This approach of [1] also eliminates exposing adev.dev.release = <drivers_release_method_to_free_adev> in drivers.
> And container_of() benefit also continues..
>
> [1] https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs.h#L2791
>

My code looks like this, probably yours looks the same.

  247                 priv->adev[i] = kzalloc(sizeof(*priv->adev[i]), GFP_KERNEL);
  248                 if (!priv->adev[i])
  249                         goto init_err;
  250
  251                 adev = &priv->adev[i]->adev;
  252                 adev->id = idx;
  253                 adev->name = mlx5_adev_devices[i].suffix;
  254                 adev->dev.parent = dev->device;
  255                 adev->dev.release = adev_release;
  256                 priv->adev[i]->mdev = dev;
  257
  258                 ret = ancillary_device_initialize(adev);
  259                 if (ret)
  260                         goto init_err;
  261
  262                 ret = ancillary_device_add(adev);
  263                 if (ret) {
  264                         put_device(&adev->dev);
  265                         goto add_err;
  266                 }

Thanks

Greg Kroah-Hartman Oct. 8, 2020, 7:50 a.m. UTC | #36

On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> [..]
> > All stated above is my opinion, it can be different from yours.
> 
> Yes, but we need to converge to move this forward. Jason was involved
> in the current organization for registration, Greg was angling for
> this to be core functionality. I have use cases outside of RDMA and
> netdev. Parav was ok with the current organization. The SOF folks
> already have a proposed incorporation of it. The argument I am hearing
> is that "this registration api seems hard for driver writers" when we
> have several driver writers who have already taken a look and can make
> it work. If you want to follow on with a simpler wrappers for your use
> case, great, but I do not yet see anyone concurring with your opinion
> that the current organization is irretrievably broken or too obscure
> to use.

That's kind of because I tuned out of this thread a long time ago :)

I do agree with Leon that I think the current patch is not the correct
way to do this the easiest, but don't have a competing proposal to show
what I mean.

Yet.

Let's see what happens after 5.10-rc1 is out, it's too late now for any
of this for this next merge window so we can not worry about it for a
few weeks.

thanks,

greg k-h

Leon Romanovsky Oct. 8, 2020, 8 a.m. UTC | #37

On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> [..]
> > All stated above is my opinion, it can be different from yours.
>
> Yes, but we need to converge to move this forward. Jason was involved
> in the current organization for registration, Greg was angling for
> this to be core functionality. I have use cases outside of RDMA and
> netdev. Parav was ok with the current organization. The SOF folks
> already have a proposed incorporation of it. The argument I am hearing
> is that "this registration api seems hard for driver writers" when we
> have several driver writers who have already taken a look and can make
> it work. If you want to follow on with a simpler wrappers for your use
> case, great, but I do not yet see anyone concurring with your opinion
> that the current organization is irretrievably broken or too obscure
> to use.

Can it be that I'm first one to use this bus for very large driver (>120K LOC)
that has 5 different ->probe() flows?

For example, this https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/
hints to me that this bus wasn't used with anything complex as it was initially intended.

And regarding registration, I said many times that init()/add() scheme is ok, the inability
to call to uninit() after add() failure is not ok from my point of view.

Thanks

Dan Williams Oct. 8, 2020, 8:09 a.m. UTC | #38

On Thu, Oct 8, 2020 at 1:00 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > [..]
> > > All stated above is my opinion, it can be different from yours.
> >
> > Yes, but we need to converge to move this forward. Jason was involved
> > in the current organization for registration, Greg was angling for
> > this to be core functionality. I have use cases outside of RDMA and
> > netdev. Parav was ok with the current organization. The SOF folks
> > already have a proposed incorporation of it. The argument I am hearing
> > is that "this registration api seems hard for driver writers" when we
> > have several driver writers who have already taken a look and can make
> > it work. If you want to follow on with a simpler wrappers for your use
> > case, great, but I do not yet see anyone concurring with your opinion
> > that the current organization is irretrievably broken or too obscure
> > to use.
>
> Can it be that I'm first one to use this bus for very large driver (>120K LOC)
> that has 5 different ->probe() flows?
>
> For example, this https://lore.kernel.org/linux-rdma/20201006172317.GN1874917@unreal/
> hints to me that this bus wasn't used with anything complex as it was initially intended.

I missed that. Yes, I agree that's broken.

>
> And regarding registration, I said many times that init()/add() scheme is ok, the inability
> to call to uninit() after add() failure is not ok from my point of view.

Ok, I got to the wrong conclusion about your position.

Parav Pandit Oct. 8, 2020, 9:45 a.m. UTC | #39

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 8, 2020 1:15 PM
> 
> On Thu, Oct 08, 2020 at 07:14:17AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Thursday, October 8, 2020 10:56 AM
> > >
> > > On Thu, Oct 08, 2020 at 04:56:01AM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Pierre-Louis Bossart
> > > > > <pierre-louis.bossart@linux.intel.com>
> > > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > > >
> > > > >
> > > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > > >> -----Original Message-----
> > > > > >> From: Pierre-Louis Bossart
> > > > > >> <pierre-louis.bossart@linux.intel.com>
> > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > > > > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
> > > > > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com;
> > > > > >> tiwai@suse.de; netdev@vger.kernel.org;
> > > > > >> ranjani.sridharan@linux.intel.com;
> > > > > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > > > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > > > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org;
> > > > > >> kuba@kernel.org; Williams, Dan J <dan.j.williams@intel.com>;
> > > > > >> Saleem, Shiraz <shiraz.saleem@intel.com>;
> > > > > >> davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>
> > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>>> Below is most simple, intuitive and matching with core APIs
> > > > > >>>> for name and design pattern wise.
> > > > > >>>> init()
> > > > > >>>> {
> > > > > >>>> 	err = ancillary_device_initialize();
> > > > > >>>> 	if (err)
> > > > > >>>> 		return ret;
> > > > > >>>>
> > > > > >>>> 	err = ancillary_device_add();
> > > > > >>>> 	if (ret)
> > > > > >>>> 		goto err_unwind;
> > > > > >>>>
> > > > > >>>> 	err = some_foo();
> > > > > >>>> 	if (err)
> > > > > >>>> 		goto err_foo;
> > > > > >>>> 	return 0;
> > > > > >>>>
> > > > > >>>> err_foo:
> > > > > >>>> 	ancillary_device_del(adev);
> > > > > >>>> err_unwind:
> > > > > >>>> 	ancillary_device_put(adev->dev);
> > > > > >>>> 	return err;
> > > > > >>>> }
> > > > > >>>>
> > > > > >>>> cleanup()
> > > > > >>>> {
> > > > > >>>> 	ancillary_device_de(adev);
> > > > > >>>> 	ancillary_device_put(adev);
> > > > > >>>> 	/* It is common to have a one wrapper for this as
> > > > > >>>> ancillary_device_unregister().
> > > > > >>>> 	 * This will match with core device_unregister() that has
> > > > > >>>> precise documentation.
> > > > > >>>> 	 * but given fact that init() code need proper error
> > > > > >>>> unwinding, like above,
> > > > > >>>> 	 * it make sense to have two APIs, and no need to export
> > > > > >>>> another symbol for unregister().
> > > > > >>>> 	 * This pattern is very easy to audit and code.
> > > > > >>>> 	 */
> > > > > >>>> }
> > > > > >>>
> > > > > >>> I like this flow +1
> > > > > >>>
> > > > > >>> But ... since the init() function is performing both
> > > > > >>> device_init and device_add - it should probably be called
> > > > > >>> ancillary_device_register, and we are back to a single
> > > > > >>> exported API for both register and unregister.
> > > > > >>
> > > > > >> Kind reminder that we introduced the two functions to allow
> > > > > >> the caller to know if it needed to free memory when
> > > > > >> initialize() fails, and it didn't need to free memory when
> > > > > >> add() failed since
> > > > > >> put_device() takes care of it. If you have a single init()
> > > > > >> function it's impossible to know which behavior to select on error.
> > > > > >>
> > > > > >> I also have a case with SoundWire where it's nice to first
> > > > > >> initialize, then set some data and then add.
> > > > > >>
> > > > > >
> > > > > > The flow as outlined by Parav above does an initialize as the
> > > > > > first step, so every error path out of the function has to do
> > > > > > a put_device(), so you would never need to manually free the
> > > > > > memory in
> > > > > the setup function.
> > > > > > It would be freed in the release call.
> > > > >
> > > > > err = ancillary_device_initialize(); if (err)
> > > > > 	return ret;
> > > > >
> > > > > where is the put_device() here? if the release function does any
> > > > > sort of kfree, then you'd need to do it manually in this case.
> > > > Since device_initialize() failed, put_device() cannot be done here.
> > > > So yes, pseudo code should have shown, if (err) {
> > > > 	kfree(adev);
> > > > 	return err;
> > > > }
> > > >
> > > > If we just want to follow register(), unregister() pattern,
> > > >
> > > > Than,
> > > >
> > > > ancillar_device_register() should be,
> > > >
> > > > /**
> > > >  * ancillar_device_register() - register an ancillary device
> > > >  * NOTE: __never directly free @adev after calling this function,
> > > > even if it returned
> > > >  * an error. Always use ancillary_device_put() to give up the
> > > > reference
> > > initialized by this function.
> > > >  * This note matches with the core and caller knows exactly what
> > > > to be
> > > done.
> > > >  */
> > > > ancillary_device_register()
> > > > {
> > > > 	device_initialize(&adev->dev);
> > > > 	if (!dev->parent || !adev->name)
> > > > 		return -EINVAL;
> > > > 	if (!dev->release && !(dev->type && dev->type->release)) {
> > > > 		/* core is already capable and throws the warning when
> > > release callback is not set.
> > > > 		 * It is done at drivers/base/core.c:1798.
> > > > 		 * For NULL release it says, "does not have a release()
> > > function, it is broken and must be fixed"
> > > > 		 */
> > > > 		return -EINVAL;
> > > > 	}
> > > > 	err = dev_set_name(adev...);
> > > > 	if (err) {
> > > > 		/* kobject_release() -> kobject_cleanup() are capable to
> > > detect if name is set/ not set
> > > > 		  * and free the const if it was set.
> > > > 		  */
> > > > 		return err;
> > > > 	}
> > > > 	err = device_add(&adev->dev);
> > > > 	If (err)
> > > > 		return err;
> > > > }
> > > >
> > > > Caller code:
> > > > init()
> > > > {
> > > > 	adev = kzalloc(sizeof(*foo_adev)..);
> > > > 	if (!adev)
> > > > 		return -ENOMEM;
> > > > 	err = ancillary_device_register(&adev);
> > > > 	if (err)
> > > > 		goto err;
> > > >
> > > > err:
> > > > 	ancillary_device_put(&adev);
> > > > 	return err;
> > > > }
> > > >
> > > > cleanup()
> > > > {
> > > > 	ancillary_device_unregister(&adev);
> > > > }
> > > >
> > > > Above pattern is fine too matching the core.
> > > >
> > > > If I understand Leon correctly, he prefers simple register(),
> > > > unregister()
> > > pattern.
> > > > If, so it should be explicit register(), unregister() API.
> > >
> > > This is my summary
> > > https://lore.kernel.org/linux-rdma/20201008052137.GA13580@unreal
> > > The API should be symmetric.
> > >
> >
> > I disagree to your below point.
> > > 1. You are not providing driver/core API but simplification and
> > > obfuscation of basic primitives and structures. This is new layer.
> > > There is no room for a claim that we must to follow internal API.
> > If ancillary bus has
> > ancillary_device_add(), it cannot do device_initialize() and device_add() in
> both.
> >
> > I provided two examples and what really matters is a given patchset
> > uses (need to use) which pattern,
> > initialize() + add(), or register() + unregister().
> >
> > As we all know that API is not added for future. It is the future patch
> extends it.
> > So lets wait for Pierre to reply if soundwire can follow register(),
> unregister() sequence.
> > This way same APIs can service both use-cases.
> >
> > Regarding,
> > > 3. You can't "ask" from users to call internal calls (put_device)
> > > over internal fields in ancillary_device.
> > In that case if should be ancillary_device_put() ancillary_device_release().
> >
> > Or we should follow the patten of ib_alloc_device [1],
> > ancillary_device_alloc()
> >     -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa
> subsystem.
> >     ->device_initialize()
> > ancillary_device_add()
> >
> > ancillar_device_de() <- balances with add
> > ancillary_device_dealloc() <-- balances with device_alloc(), which does the
> put_device() + free the memory allocated in alloc().
> >
> > This approach of [1] also eliminates exposing adev.dev.release =
> <drivers_release_method_to_free_adev> in drivers.
> > And container_of() benefit also continues..
> >
> > [1]
> > https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs
> > .h#L2791
> >
> 
> My code looks like this, probably yours looks the same.
> 
>   247                 priv->adev[i] = kzalloc(sizeof(*priv->adev[i]), GFP_KERNEL);
>   248                 if (!priv->adev[i])
>   249                         goto init_err;
>   250
>   251                 adev = &priv->adev[i]->adev;
>   252                 adev->id = idx;
>   253                 adev->name = mlx5_adev_devices[i].suffix;
>   254                 adev->dev.parent = dev->device;
>   255                 adev->dev.release = adev_release;
>   256                 priv->adev[i]->mdev = dev;
>   257
>   258                 ret = ancillary_device_initialize(adev);
>   259                 if (ret)
>   260                         goto init_err;
>   261
>   262                 ret = ancillary_device_add(adev);
>   263                 if (ret) {
>   264                         put_device(&adev->dev);
>   265                         goto add_err;
>   266                 }

Yes, subfunction code is also very similar.
You expressed concerned that you didn't like put_device() at [1].
But in above code is touching adev->dev.{parent, release} is ok?
>   254                 adev->dev.parent = dev->device;
>   255                 adev->dev.release = adev_release;

If not,

We can make it elegant by doing,

the patten of ib_alloc_device [1],
ancillary_device_alloc()
    -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa subsystem.
    ->device_initialize()
ancillary_device_add()

ancillar_device_de() <- balances with add
ancillary_device_dealloc() <-- balances with device_alloc(), which does the put_device() + free the memory allocated in alloc().

This approach of [2] also eliminates exposing adev.dev.release = <drivers_release_method_to_free_adev> in drivers.
And container_of() benefit also continues..

[1] https://lore.kernel.org/linux-rdma/20201007192610.GD3964015@unreal/
[2] https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs.h#L2791

Leon Romanovsky Oct. 8, 2020, 10:17 a.m. UTC | #40

On Thu, Oct 08, 2020 at 09:45:29AM +0000, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 8, 2020 1:15 PM
> >
> > On Thu, Oct 08, 2020 at 07:14:17AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Thursday, October 8, 2020 10:56 AM
> > > >
> > > > On Thu, Oct 08, 2020 at 04:56:01AM +0000, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Pierre-Louis Bossart
> > > > > > <pierre-louis.bossart@linux.intel.com>
> > > > > > Sent: Thursday, October 8, 2020 3:20 AM
> > > > > >
> > > > > >
> > > > > > On 10/7/20 4:22 PM, Ertman, David M wrote:
> > > > > > >> -----Original Message-----
> > > > > > >> From: Pierre-Louis Bossart
> > > > > > >> <pierre-louis.bossart@linux.intel.com>
> > > > > > >> Sent: Wednesday, October 7, 2020 1:59 PM
> > > > > > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > > > > > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>
> > > > > > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com;
> > > > > > >> tiwai@suse.de; netdev@vger.kernel.org;
> > > > > > >> ranjani.sridharan@linux.intel.com;
> > > > > > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;
> > > > > > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe
> > > > > > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org;
> > > > > > >> kuba@kernel.org; Williams, Dan J <dan.j.williams@intel.com>;
> > > > > > >> Saleem, Shiraz <shiraz.saleem@intel.com>;
> > > > > > >> davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>
> > > > > > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>>> Below is most simple, intuitive and matching with core APIs
> > > > > > >>>> for name and design pattern wise.
> > > > > > >>>> init()
> > > > > > >>>> {
> > > > > > >>>> 	err = ancillary_device_initialize();
> > > > > > >>>> 	if (err)
> > > > > > >>>> 		return ret;
> > > > > > >>>>
> > > > > > >>>> 	err = ancillary_device_add();
> > > > > > >>>> 	if (ret)
> > > > > > >>>> 		goto err_unwind;
> > > > > > >>>>
> > > > > > >>>> 	err = some_foo();
> > > > > > >>>> 	if (err)
> > > > > > >>>> 		goto err_foo;
> > > > > > >>>> 	return 0;
> > > > > > >>>>
> > > > > > >>>> err_foo:
> > > > > > >>>> 	ancillary_device_del(adev);
> > > > > > >>>> err_unwind:
> > > > > > >>>> 	ancillary_device_put(adev->dev);
> > > > > > >>>> 	return err;
> > > > > > >>>> }
> > > > > > >>>>
> > > > > > >>>> cleanup()
> > > > > > >>>> {
> > > > > > >>>> 	ancillary_device_de(adev);
> > > > > > >>>> 	ancillary_device_put(adev);
> > > > > > >>>> 	/* It is common to have a one wrapper for this as
> > > > > > >>>> ancillary_device_unregister().
> > > > > > >>>> 	 * This will match with core device_unregister() that has
> > > > > > >>>> precise documentation.
> > > > > > >>>> 	 * but given fact that init() code need proper error
> > > > > > >>>> unwinding, like above,
> > > > > > >>>> 	 * it make sense to have two APIs, and no need to export
> > > > > > >>>> another symbol for unregister().
> > > > > > >>>> 	 * This pattern is very easy to audit and code.
> > > > > > >>>> 	 */
> > > > > > >>>> }
> > > > > > >>>
> > > > > > >>> I like this flow +1
> > > > > > >>>
> > > > > > >>> But ... since the init() function is performing both
> > > > > > >>> device_init and device_add - it should probably be called
> > > > > > >>> ancillary_device_register, and we are back to a single
> > > > > > >>> exported API for both register and unregister.
> > > > > > >>
> > > > > > >> Kind reminder that we introduced the two functions to allow
> > > > > > >> the caller to know if it needed to free memory when
> > > > > > >> initialize() fails, and it didn't need to free memory when
> > > > > > >> add() failed since
> > > > > > >> put_device() takes care of it. If you have a single init()
> > > > > > >> function it's impossible to know which behavior to select on error.
> > > > > > >>
> > > > > > >> I also have a case with SoundWire where it's nice to first
> > > > > > >> initialize, then set some data and then add.
> > > > > > >>
> > > > > > >
> > > > > > > The flow as outlined by Parav above does an initialize as the
> > > > > > > first step, so every error path out of the function has to do
> > > > > > > a put_device(), so you would never need to manually free the
> > > > > > > memory in
> > > > > > the setup function.
> > > > > > > It would be freed in the release call.
> > > > > >
> > > > > > err = ancillary_device_initialize(); if (err)
> > > > > > 	return ret;
> > > > > >
> > > > > > where is the put_device() here? if the release function does any
> > > > > > sort of kfree, then you'd need to do it manually in this case.
> > > > > Since device_initialize() failed, put_device() cannot be done here.
> > > > > So yes, pseudo code should have shown, if (err) {
> > > > > 	kfree(adev);
> > > > > 	return err;
> > > > > }
> > > > >
> > > > > If we just want to follow register(), unregister() pattern,
> > > > >
> > > > > Than,
> > > > >
> > > > > ancillar_device_register() should be,
> > > > >
> > > > > /**
> > > > >  * ancillar_device_register() - register an ancillary device
> > > > >  * NOTE: __never directly free @adev after calling this function,
> > > > > even if it returned
> > > > >  * an error. Always use ancillary_device_put() to give up the
> > > > > reference
> > > > initialized by this function.
> > > > >  * This note matches with the core and caller knows exactly what
> > > > > to be
> > > > done.
> > > > >  */
> > > > > ancillary_device_register()
> > > > > {
> > > > > 	device_initialize(&adev->dev);
> > > > > 	if (!dev->parent || !adev->name)
> > > > > 		return -EINVAL;
> > > > > 	if (!dev->release && !(dev->type && dev->type->release)) {
> > > > > 		/* core is already capable and throws the warning when
> > > > release callback is not set.
> > > > > 		 * It is done at drivers/base/core.c:1798.
> > > > > 		 * For NULL release it says, "does not have a release()
> > > > function, it is broken and must be fixed"
> > > > > 		 */
> > > > > 		return -EINVAL;
> > > > > 	}
> > > > > 	err = dev_set_name(adev...);
> > > > > 	if (err) {
> > > > > 		/* kobject_release() -> kobject_cleanup() are capable to
> > > > detect if name is set/ not set
> > > > > 		  * and free the const if it was set.
> > > > > 		  */
> > > > > 		return err;
> > > > > 	}
> > > > > 	err = device_add(&adev->dev);
> > > > > 	If (err)
> > > > > 		return err;
> > > > > }
> > > > >
> > > > > Caller code:
> > > > > init()
> > > > > {
> > > > > 	adev = kzalloc(sizeof(*foo_adev)..);
> > > > > 	if (!adev)
> > > > > 		return -ENOMEM;
> > > > > 	err = ancillary_device_register(&adev);
> > > > > 	if (err)
> > > > > 		goto err;
> > > > >
> > > > > err:
> > > > > 	ancillary_device_put(&adev);
> > > > > 	return err;
> > > > > }
> > > > >
> > > > > cleanup()
> > > > > {
> > > > > 	ancillary_device_unregister(&adev);
> > > > > }
> > > > >
> > > > > Above pattern is fine too matching the core.
> > > > >
> > > > > If I understand Leon correctly, he prefers simple register(),
> > > > > unregister()
> > > > pattern.
> > > > > If, so it should be explicit register(), unregister() API.
> > > >
> > > > This is my summary
> > > > https://lore.kernel.org/linux-rdma/20201008052137.GA13580@unreal
> > > > The API should be symmetric.
> > > >
> > >
> > > I disagree to your below point.
> > > > 1. You are not providing driver/core API but simplification and
> > > > obfuscation of basic primitives and structures. This is new layer.
> > > > There is no room for a claim that we must to follow internal API.
> > > If ancillary bus has
> > > ancillary_device_add(), it cannot do device_initialize() and device_add() in
> > both.
> > >
> > > I provided two examples and what really matters is a given patchset
> > > uses (need to use) which pattern,
> > > initialize() + add(), or register() + unregister().
> > >
> > > As we all know that API is not added for future. It is the future patch
> > extends it.
> > > So lets wait for Pierre to reply if soundwire can follow register(),
> > unregister() sequence.
> > > This way same APIs can service both use-cases.
> > >
> > > Regarding,
> > > > 3. You can't "ask" from users to call internal calls (put_device)
> > > > over internal fields in ancillary_device.
> > > In that case if should be ancillary_device_put() ancillary_device_release().
> > >
> > > Or we should follow the patten of ib_alloc_device [1],
> > > ancillary_device_alloc()
> > >     -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa
> > subsystem.
> > >     ->device_initialize()
> > > ancillary_device_add()
> > >
> > > ancillar_device_de() <- balances with add
> > > ancillary_device_dealloc() <-- balances with device_alloc(), which does the
> > put_device() + free the memory allocated in alloc().
> > >
> > > This approach of [1] also eliminates exposing adev.dev.release =
> > <drivers_release_method_to_free_adev> in drivers.
> > > And container_of() benefit also continues..
> > >
> > > [1]
> > > https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs
> > > .h#L2791
> > >
> >
> > My code looks like this, probably yours looks the same.
> >
> >   247                 priv->adev[i] = kzalloc(sizeof(*priv->adev[i]), GFP_KERNEL);
> >   248                 if (!priv->adev[i])
> >   249                         goto init_err;
> >   250
> >   251                 adev = &priv->adev[i]->adev;
> >   252                 adev->id = idx;
> >   253                 adev->name = mlx5_adev_devices[i].suffix;
> >   254                 adev->dev.parent = dev->device;
> >   255                 adev->dev.release = adev_release;
> >   256                 priv->adev[i]->mdev = dev;
> >   257
> >   258                 ret = ancillary_device_initialize(adev);
> >   259                 if (ret)
> >   260                         goto init_err;
> >   261
> >   262                 ret = ancillary_device_add(adev);
> >   263                 if (ret) {
> >   264                         put_device(&adev->dev);
> >   265                         goto add_err;
> >   266                 }
>
> Yes, subfunction code is also very similar.
> You expressed concerned that you didn't like put_device() at [1].
> But in above code is touching adev->dev.{parent, release} is ok?

Yes, "adev->dev.{parent, release}" is not ok, but at least it doesn't
complicate error unwinding. This is why I didn't say anything about it.

> >   254                 adev->dev.parent = dev->device;
> >   255                 adev->dev.release = adev_release;
>
> If not,
>
> We can make it elegant by doing,

I like your idea, IMHO it is more clear and less error prone.

Thanks

>
> the patten of ib_alloc_device [1],
> ancillary_device_alloc()
>     -> kzalloc(adev + dev) with compile time assert check like rdma and vdpa subsystem.
>     ->device_initialize()
> ancillary_device_add()
>
> ancillar_device_de() <- balances with add
> ancillary_device_dealloc() <-- balances with device_alloc(), which does the put_device() + free the memory allocated in alloc().
>
> This approach of [2] also eliminates exposing adev.dev.release = <drivers_release_method_to_free_adev> in drivers.
> And container_of() benefit also continues..
>
> [1] https://lore.kernel.org/linux-rdma/20201007192610.GD3964015@unreal/
> [2] https://elixir.bootlin.com/linux/v5.9-rc8/source/include/rdma/ib_verbs.h#L2791

Parav Pandit Oct. 8, 2020, 11:10 a.m. UTC | #41

> From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> Sent: Thursday, October 8, 2020 1:21 PM
> 
> On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org>
> wrote:
> > [..]
> > > All stated above is my opinion, it can be different from yours.
> >
> > Yes, but we need to converge to move this forward. Jason was involved
> > in the current organization for registration, Greg was angling for
> > this to be core functionality. I have use cases outside of RDMA and
> > netdev. Parav was ok with the current organization. The SOF folks
> > already have a proposed incorporation of it. The argument I am hearing
> > is that "this registration api seems hard for driver writers" when we
> > have several driver writers who have already taken a look and can make
> > it work. If you want to follow on with a simpler wrappers for your use
> > case, great, but I do not yet see anyone concurring with your opinion
> > that the current organization is irretrievably broken or too obscure
> > to use.
> 
> That's kind of because I tuned out of this thread a long time ago :)
> 
> I do agree with Leon that I think the current patch is not the correct way to
> do this the easiest, but don't have a competing proposal to show what I
> mean.
> 
> Yet.
Please consider the approach of ib_alloc_device(), ib_dealloc_device() and ib_register_register()/unregister().
(a) It avoids driver calling put_device() on error unwinding path.
(b) still achieves container_of().

> 
> Let's see what happens after 5.10-rc1 is out, it's too late now for any of this
> for this next merge window so we can not worry about it for a few weeks.
> 
Ok. INHO giving direction to Dave and others to either refine current APIs or follow ib_alloc_device() approach will be a helpful input.

ancillary bus can do better APIs than the newly (march 2020 !) introduced vdpa bus [1] and its drivers which follows put_device() pattern in [2] and [3] in error unwinding path.

[1] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/vdpa.c
[2] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/ifcvf/ifcvf_main.c#L475
[3] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/mlx5/net/mlx5_vnet.c#L1967

> thanks,
> 
> greg k-h

Pierre-Louis Bossart Oct. 8, 2020, 1:29 p.m. UTC | #42

>>>>> But ... since the init() function is performing both device_init and
>>>>> device_add - it should probably be called ancillary_device_register,
>>>>> and we are back to a single exported API for both register and
>>>>> unregister.
>>>>
>>>> Kind reminder that we introduced the two functions to allow the
>>>> caller to know if it needed to free memory when initialize() fails,
>>>> and it didn't need to free memory when add() failed since
>>>> put_device() takes care of it. If you have a single init() function
>>>> it's impossible to know which behavior to select on error.
>>>>
>>>> I also have a case with SoundWire where it's nice to first
>>>> initialize, then set some data and then add.
>>>>
>>>
>>> The flow as outlined by Parav above does an initialize as the first
>>> step, so every error path out of the function has to do a
>>> put_device(), so you would never need to manually free the memory in
>> the setup function.
>>> It would be freed in the release call.
>>
>> err = ancillary_device_initialize();
>> if (err)
>> 	return ret;
>>
>> where is the put_device() here? if the release function does any sort of
>> kfree, then you'd need to do it manually in this case.
> Since device_initialize() failed, put_device() cannot be done here.
> So yes, pseudo code should have shown,
> if (err) {
> 	kfree(adev);
> 	return err;
> }

This doesn't work if the adev is part of a larger structure allocated by 
the parent, which is pretty much the intent to extent the basic bus and 
pass additional information which can be accessed with container_of().

Only the parent can do the kfree() explicitly in that case. If the 
parent relies on devm_kzalloc, this also can make the .release callback 
with no memory free required at all.

See e.g. the code I cooked for the transition of SoundWire away from 
platform devices at

https://github.com/thesofproject/linux/pull/2484/commits/d0540ae3744f3a748d49c5fe61469d82ed816981#diff-ac8eb3d3951c024f52b1d463b5317f70R305

The allocation is done on an 'ldev' which contains 'adev'.

I really don't seen how an ancillary_device_register() could model the 
different ways to allocate memory, for maximum flexibility across 
different domains it seems more relevant to keep the initialize() and 
add() APIs separate. I will accept the argument that this puts more 
responsibility on the parent, but it also provides more flexibility to 
the parent.

If we go with the suggested solution above, that already prevents 
SoundWire from using this bus. Not so good.

Ertman, David M Oct. 8, 2020, 4:39 p.m. UTC | #43

> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Thursday, October 8, 2020 4:10 AM
> To: gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>
> Cc: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> <david.m.ertman@intel.com>; Pierre-Louis Bossart <pierre-
> louis.bossart@linux.intel.com>; alsa-devel@alsa-project.org;
> parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org; Jason
> Gunthorpe <jgg@nvidia.com>; kuba@kernel.org; Saleem, Shiraz
> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran
> <kiran.patil@intel.com>
> Subject: RE: [PATCH v2 1/6] Add ancillary bus support
> 
> 
> > From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> > Sent: Thursday, October 8, 2020 1:21 PM
> >
> > On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org>
> > wrote:
> > > [..]
> > > > All stated above is my opinion, it can be different from yours.
> > >
> > > Yes, but we need to converge to move this forward. Jason was involved
> > > in the current organization for registration, Greg was angling for
> > > this to be core functionality. I have use cases outside of RDMA and
> > > netdev. Parav was ok with the current organization. The SOF folks
> > > already have a proposed incorporation of it. The argument I am hearing
> > > is that "this registration api seems hard for driver writers" when we
> > > have several driver writers who have already taken a look and can make
> > > it work. If you want to follow on with a simpler wrappers for your use
> > > case, great, but I do not yet see anyone concurring with your opinion
> > > that the current organization is irretrievably broken or too obscure
> > > to use.
> >
> > That's kind of because I tuned out of this thread a long time ago :)
> >
> > I do agree with Leon that I think the current patch is not the correct way to
> > do this the easiest, but don't have a competing proposal to show what I
> > mean.
> >
> > Yet.
> Please consider the approach of ib_alloc_device(), ib_dealloc_device() and
> ib_register_register()/unregister().
> (a) It avoids driver calling put_device() on error unwinding path.
> (b) still achieves container_of().
> 
> >
> > Let's see what happens after 5.10-rc1 is out, it's too late now for any of this
> > for this next merge window so we can not worry about it for a few weeks.
> >
> Ok. INHO giving direction to Dave and others to either refine current APIs or
> follow ib_alloc_device() approach will be a helpful input.
> 
> ancillary bus can do better APIs than the newly (march 2020 !) introduced
> vdpa bus [1] and its drivers which follows put_device() pattern in [2] and [3]
> in error unwinding path.
> 
> [1] https://elixir.bootlin.com/linux/v5.9-rc8/source/drivers/vdpa/vdpa.c
> [2] https://elixir.bootlin.com/linux/v5.9-
> rc8/source/drivers/vdpa/ifcvf/ifcvf_main.c#L475
> [3] https://elixir.bootlin.com/linux/v5.9-
> rc8/source/drivers/vdpa/mlx5/net/mlx5_vnet.c#L1967
> 
> > thanks,
> >
> > greg k-h

IMHO we need to stay with the two step registration process that we currently
have (initialize then add) so that the driver writer knows if they need to explicitly 
free the memory allocated for auxillary_device.  Sound folks have indicated that 
this really helps their flow also.  Greg asked to have these two functions fully
commented with kernel-doc headers, which has been done.

Without enforcing an "auxillary_object" that contains just an auxillary_device and a
void pointer, we cannot do the allocation of memory in the bus infrastructure without
breaking the container_of functionality.

-DaveE

Ertman, David M Oct. 8, 2020, 4:42 p.m. UTC | #44

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 8, 2020 1:00 AM
> To: Williams, Dan J <dan.j.williams@intel.com>
> Cc: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> <parav@nvidia.com>; Pierre-Louis Bossart <pierre-
> louis.bossart@linux.intel.com>; alsa-devel@alsa-project.org;
> parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org; Jason
> Gunthorpe <jgg@nvidia.com>; gregkh@linuxfoundation.org;
> kuba@kernel.org; Saleem, Shiraz <shiraz.saleem@intel.com>;
> davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org>
> wrote:
> > [..]
> > > All stated above is my opinion, it can be different from yours.
> >
> > Yes, but we need to converge to move this forward. Jason was involved
> > in the current organization for registration, Greg was angling for
> > this to be core functionality. I have use cases outside of RDMA and
> > netdev. Parav was ok with the current organization. The SOF folks
> > already have a proposed incorporation of it. The argument I am hearing
> > is that "this registration api seems hard for driver writers" when we
> > have several driver writers who have already taken a look and can make
> > it work. If you want to follow on with a simpler wrappers for your use
> > case, great, but I do not yet see anyone concurring with your opinion
> > that the current organization is irretrievably broken or too obscure
> > to use.
> 
> Can it be that I'm first one to use this bus for very large driver (>120K LOC)
> that has 5 different ->probe() flows?
> 
> For example, this https://lore.kernel.org/linux-
> rdma/20201006172317.GN1874917@unreal/
> hints to me that this bus wasn't used with anything complex as it was initially
> intended.
> 
> And regarding registration, I said many times that init()/add() scheme is ok,
> the inability
> to call to uninit() after add() failure is not ok from my point of view.

So, to address your concern of not being able to call an uninit after a add failure
I can break the unregister flow into two steps also.  An uninit and a delete to mirror
the registration process's init and add.

Would this make the registration and un-registration flow acceptable?

-DaveE



> 
> Thanks

Ertman, David M Oct. 8, 2020, 4:54 p.m. UTC | #45

> -----Original Message-----

> From: Parav Pandit <parav@nvidia.com>

> Sent: Wednesday, October 7, 2020 9:56 PM

> To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; Ertman,

> David M <david.m.ertman@intel.com>; Leon Romanovsky

> <leon@kernel.org>

> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,

> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> <kiran.patil@intel.com>

> Subject: RE: [PATCH v2 1/6] Add ancillary bus support

> 

> 

> 

> > From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> > Sent: Thursday, October 8, 2020 3:20 AM

> >

> >

> > On 10/7/20 4:22 PM, Ertman, David M wrote:

> > >> -----Original Message-----

> > >> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> > >> Sent: Wednesday, October 7, 2020 1:59 PM

> > >> To: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit

> > >> <parav@nvidia.com>; Leon Romanovsky <leon@kernel.org>

> > >> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> > >> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> > >> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> > >> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> > >> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;

> > >> Williams, Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> > >> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> > >> <kiran.patil@intel.com>

> > >> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> > >>

> > >>

> > >>

> > >>>> Below is most simple, intuitive and matching with core APIs for

> > >>>> name and design pattern wise.

> > >>>> init()

> > >>>> {

> > >>>> 	err = ancillary_device_initialize();

> > >>>> 	if (err)

> > >>>> 		return ret;

> > >>>>

> > >>>> 	err = ancillary_device_add();

> > >>>> 	if (ret)

> > >>>> 		goto err_unwind;

> > >>>>

> > >>>> 	err = some_foo();

> > >>>> 	if (err)

> > >>>> 		goto err_foo;

> > >>>> 	return 0;

> > >>>>

> > >>>> err_foo:

> > >>>> 	ancillary_device_del(adev);

> > >>>> err_unwind:

> > >>>> 	ancillary_device_put(adev->dev);

> > >>>> 	return err;

> > >>>> }

> > >>>>

> > >>>> cleanup()

> > >>>> {

> > >>>> 	ancillary_device_de(adev);

> > >>>> 	ancillary_device_put(adev);

> > >>>> 	/* It is common to have a one wrapper for this as

> > >>>> ancillary_device_unregister().

> > >>>> 	 * This will match with core device_unregister() that has precise

> > >>>> documentation.

> > >>>> 	 * but given fact that init() code need proper error unwinding,

> > >>>> like above,

> > >>>> 	 * it make sense to have two APIs, and no need to export another

> > >>>> symbol for unregister().

> > >>>> 	 * This pattern is very easy to audit and code.

> > >>>> 	 */

> > >>>> }

> > >>>

> > >>> I like this flow +1

> > >>>

> > >>> But ... since the init() function is performing both device_init and

> > >>> device_add - it should probably be called ancillary_device_register,

> > >>> and we are back to a single exported API for both register and

> > >>> unregister.

> > >>

> > >> Kind reminder that we introduced the two functions to allow the

> > >> caller to know if it needed to free memory when initialize() fails,

> > >> and it didn't need to free memory when add() failed since

> > >> put_device() takes care of it. If you have a single init() function

> > >> it's impossible to know which behavior to select on error.

> > >>

> > >> I also have a case with SoundWire where it's nice to first

> > >> initialize, then set some data and then add.

> > >>

> > >

> > > The flow as outlined by Parav above does an initialize as the first

> > > step, so every error path out of the function has to do a

> > > put_device(), so you would never need to manually free the memory in

> > the setup function.

> > > It would be freed in the release call.

> >

> > err = ancillary_device_initialize();

> > if (err)

> > 	return ret;

> >

> > where is the put_device() here? if the release function does any sort of

> > kfree, then you'd need to do it manually in this case.

> Since device_initialize() failed, put_device() cannot be done here.

> So yes, pseudo code should have shown,

> if (err) {

> 	kfree(adev);

> 	return err;

> }

> 

> If we just want to follow register(), unregister() pattern,

> 

> Than,

> 

> ancillar_device_register() should be,

> 

> /**

>  * ancillar_device_register() - register an ancillary device

>  * NOTE: __never directly free @adev after calling this function, even if it

> returned

>  * an error. Always use ancillary_device_put() to give up the reference

> initialized by this function.

>  * This note matches with the core and caller knows exactly what to be done.

>  */

> ancillary_device_register()

> {

> 	device_initialize(&adev->dev);

> 	if (!dev->parent || !adev->name)

> 		return -EINVAL;

> 	if (!dev->release && !(dev->type && dev->type->release)) {

> 		/* core is already capable and throws the warning when

> release callback is not set.

> 		 * It is done at drivers/base/core.c:1798.

> 		 * For NULL release it says, "does not have a release()

> function, it is broken and must be fixed"

> 		 */

> 		return -EINVAL;

> 	}

That code is in device_release().  Because of this check we will never hit that code.

We either need to leave the error message here, or if we are going to rely on the core
to find this condition at the end of the process, then we need to completely remove
this check from the registration flow.

-DaveE

Leon Romanovsky Oct. 8, 2020, 5:20 p.m. UTC | #46

On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> It enables drivers to create an ancillary_device and bind an
> ancillary_driver to it.
>
> The bus supports probe/remove shutdown and suspend/resume callbacks.
> Each ancillary_device has a unique string based id; driver binds to
> an ancillary_device based on this id through the bus.
>
> Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---

<...>

> +
> +static const struct ancillary_device_id *ancillary_match_id(const struct ancillary_device_id *id,
> +							    const struct ancillary_device *ancildev)
> +{
> +	while (id->name[0]) {
> +		const char *p = strrchr(dev_name(&ancildev->dev), '.');
> +		int match_size;
> +
> +		if (!p) {
> +			id++;
> +			continue;
> +		}
> +		match_size = p - dev_name(&ancildev->dev);
> +
> +		/* use dev_name(&ancildev->dev) prefix before last '.' char to match to */
> +		if (!strncmp(dev_name(&ancildev->dev), id->name, match_size))

This check is wrong, it causes to wrong matching if strlen(id->name) > match_size
In my case, the trigger was:
[    5.175848] ancillary:ancillary_match_id: dev mlx5_core.ib.0, id mlx5_core.ib_rep

Leon Romanovsky Oct. 8, 2020, 5:21 p.m. UTC | #47

On Thu, Oct 08, 2020 at 04:42:48PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 8, 2020 1:00 AM
> > To: Williams, Dan J <dan.j.williams@intel.com>
> > Cc: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit
> > <parav@nvidia.com>; Pierre-Louis Bossart <pierre-
> > louis.bossart@linux.intel.com>; alsa-devel@alsa-project.org;
> > parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;
> > ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-
> > rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org; Jason
> > Gunthorpe <jgg@nvidia.com>; gregkh@linuxfoundation.org;
> > kuba@kernel.org; Saleem, Shiraz <shiraz.saleem@intel.com>;
> > davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Thu, Oct 08, 2020 at 12:38:00AM -0700, Dan Williams wrote:
> > > On Thu, Oct 8, 2020 at 12:01 AM Leon Romanovsky <leon@kernel.org>
> > wrote:
> > > [..]
> > > > All stated above is my opinion, it can be different from yours.
> > >
> > > Yes, but we need to converge to move this forward. Jason was involved
> > > in the current organization for registration, Greg was angling for
> > > this to be core functionality. I have use cases outside of RDMA and
> > > netdev. Parav was ok with the current organization. The SOF folks
> > > already have a proposed incorporation of it. The argument I am hearing
> > > is that "this registration api seems hard for driver writers" when we
> > > have several driver writers who have already taken a look and can make
> > > it work. If you want to follow on with a simpler wrappers for your use
> > > case, great, but I do not yet see anyone concurring with your opinion
> > > that the current organization is irretrievably broken or too obscure
> > > to use.
> >
> > Can it be that I'm first one to use this bus for very large driver (>120K LOC)
> > that has 5 different ->probe() flows?
> >
> > For example, this https://lore.kernel.org/linux-
> > rdma/20201006172317.GN1874917@unreal/
> > hints to me that this bus wasn't used with anything complex as it was initially
> > intended.
> >
> > And regarding registration, I said many times that init()/add() scheme is ok,
> > the inability
> > to call to uninit() after add() failure is not ok from my point of view.
>
> So, to address your concern of not being able to call an uninit after a add failure
> I can break the unregister flow into two steps also.  An uninit and a delete to mirror
> the registration process's init and add.
>
> Would this make the registration and un-registration flow acceptable?

Yes, sure.

>
> -DaveE
>
>
>
> >
> > Thanks

Ertman, David M Oct. 8, 2020, 5:28 p.m. UTC | #48

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 8, 2020 10:20 AM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: alsa-devel@alsa-project.org; tiwai@suse.de; broonie@kernel.org; linux-
> rdma@vger.kernel.org; jgg@nvidia.com; dledford@redhat.com;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> gregkh@linuxfoundation.org; ranjani.sridharan@linux.intel.com; pierre-
> louis.bossart@linux.intel.com; fred.oh@linux.intel.com;
> parav@mellanox.com; Saleem, Shiraz <shiraz.saleem@intel.com>; Williams,
> Dan J <dan.j.williams@intel.com>; Patil, Kiran <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > It enables drivers to create an ancillary_device and bind an
> > ancillary_driver to it.
> >
> > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > Each ancillary_device has a unique string based id; driver binds to
> > an ancillary_device based on this id through the bus.
> >
> > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > ---
> 
> <...>
> 
> > +
> > +static const struct ancillary_device_id *ancillary_match_id(const struct
> ancillary_device_id *id,
> > +							    const struct
> ancillary_device *ancildev)
> > +{
> > +	while (id->name[0]) {
> > +		const char *p = strrchr(dev_name(&ancildev->dev), '.');
> > +		int match_size;
> > +
> > +		if (!p) {
> > +			id++;
> > +			continue;
> > +		}
> > +		match_size = p - dev_name(&ancildev->dev);
> > +
> > +		/* use dev_name(&ancildev->dev) prefix before last '.' char
> to match to */
> > +		if (!strncmp(dev_name(&ancildev->dev), id->name,
> match_size))
> 
> This check is wrong, it causes to wrong matching if strlen(id->name) >
> match_size
> In my case, the trigger was:
> [    5.175848] ancillary:ancillary_match_id: dev mlx5_core.ib.0, id
> mlx5_core.ib_rep

Nice catch , I will look into this.

-DaveE

> 
> From cf8f10af72f9e0d57c7ec077d59238cc12b0650f Mon Sep 17 00:00:00 2001
> From: Leon Romanovsky <leonro@nvidia.com>
> Date: Thu, 8 Oct 2020 19:40:03 +0300
> Subject: [PATCH] fixup! Fixes to ancillary bus
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/bus/ancillary.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/ancillary.c b/drivers/bus/ancillary.c
> index 54858f744ef5..615ce40ef8e4 100644
> --- a/drivers/bus/ancillary.c
> +++ b/drivers/bus/ancillary.c
> @@ -31,8 +31,10 @@ static const struct ancillary_device_id
> *ancillary_match_id(const struct ancilla
>  		match_size = p - dev_name(&ancildev->dev);
> 
>  		/* use dev_name(&ancildev->dev) prefix before last '.' char
> to match to */
> -		if (!strncmp(dev_name(&ancildev->dev), id->name,
> match_size))
> +		if (match_size == strlen(id->name) &&
> !strncmp(dev_name(&ancildev->dev), id->name, match_size)) {
>  			return id;
> +		}
> +
>  		id++;
>  	}
>  	return NULL;
> --
> 2.26.2
> 
> 
> 
> > +			return id;
> > +		id++;
> > +	}
> > +	return NULL;
> > +}

Parav Pandit Oct. 8, 2020, 5:35 p.m. UTC | #49

> From: Ertman, David M <david.m.ertman@intel.com>

> Sent: Thursday, October 8, 2020 10:24 PM


> > From: Parav Pandit <parav@nvidia.com>

> > Sent: Wednesday, October 7, 2020 9:56 PM



> > /**

> >  * ancillar_device_register() - register an ancillary device

> >  * NOTE: __never directly free @adev after calling this function, even

> > if it returned

> >  * an error. Always use ancillary_device_put() to give up the

> > reference initialized by this function.

> >  * This note matches with the core and caller knows exactly what to be

> done.

> >  */

> > ancillary_device_register()

> > {

> > 	device_initialize(&adev->dev);

> > 	if (!dev->parent || !adev->name)

> > 		return -EINVAL;

> > 	if (!dev->release && !(dev->type && dev->type->release)) {

> > 		/* core is already capable and throws the warning when

> release

> > callback is not set.

> > 		 * It is done at drivers/base/core.c:1798.

> > 		 * For NULL release it says, "does not have a release()

> function, it

> > is broken and must be fixed"

> > 		 */

> > 		return -EINVAL;

> > 	}

> That code is in device_release().  Because of this check we will never hit that

> code.

> 

> We either need to leave the error message here, or if we are going to rely on

> the core to find this condition at the end of the process, then we need to

> completely remove this check from the registration flow.

> 

Yes. Since the core is checking it, ancillary bus doesn't need to check here and release callback check can be removed.

> -DaveE

Ertman, David M Oct. 8, 2020, 6:13 p.m. UTC | #50

> -----Original Message-----

> From: Parav Pandit <parav@nvidia.com>

> Sent: Thursday, October 8, 2020 10:35 AM

> To: Ertman, David M <david.m.ertman@intel.com>; Pierre-Louis Bossart

> <pierre-louis.bossart@linux.intel.com>; Leon Romanovsky

> <leon@kernel.org>

> Cc: alsa-devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org; Williams,

> Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> <kiran.patil@intel.com>

> Subject: RE: [PATCH v2 1/6] Add ancillary bus support

> 

> 

> 

> > From: Ertman, David M <david.m.ertman@intel.com>

> > Sent: Thursday, October 8, 2020 10:24 PM

> 

> > > From: Parav Pandit <parav@nvidia.com>

> > > Sent: Wednesday, October 7, 2020 9:56 PM

> 

> 

> > > /**

> > >  * ancillar_device_register() - register an ancillary device

> > >  * NOTE: __never directly free @adev after calling this function, even

> > > if it returned

> > >  * an error. Always use ancillary_device_put() to give up the

> > > reference initialized by this function.

> > >  * This note matches with the core and caller knows exactly what to be

> > done.

> > >  */

> > > ancillary_device_register()

> > > {

> > > 	device_initialize(&adev->dev);

> > > 	if (!dev->parent || !adev->name)

> > > 		return -EINVAL;

> > > 	if (!dev->release && !(dev->type && dev->type->release)) {

> > > 		/* core is already capable and throws the warning when

> > release

> > > callback is not set.

> > > 		 * It is done at drivers/base/core.c:1798.

> > > 		 * For NULL release it says, "does not have a release()

> > function, it

> > > is broken and must be fixed"

> > > 		 */

> > > 		return -EINVAL;

> > > 	}

> > That code is in device_release().  Because of this check we will never hit

> that

> > code.

> >

> > We either need to leave the error message here, or if we are going to rely

> on

> > the core to find this condition at the end of the process, then we need to

> > completely remove this check from the registration flow.

> >

> Yes. Since the core is checking it, ancillary bus doesn't need to check here and

> release callback check can be removed.


Will do

> 

> > -DaveE

Ertman, David M Oct. 8, 2020, 6:25 p.m. UTC | #51

> -----Original Message-----

> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Wednesday, October 7, 2020 11:32 PM

> To: Leon Romanovsky <leon@kernel.org>

> Cc: Ertman, David M <david.m.ertman@intel.com>; Parav Pandit

> <parav@nvidia.com>; Pierre-Louis Bossart <pierre-

> louis.bossart@linux.intel.com>; alsa-devel@alsa-project.org;

> parav@mellanox.com; tiwai@suse.de; netdev@vger.kernel.org;

> ranjani.sridharan@linux.intel.com; fred.oh@linux.intel.com; linux-

> rdma@vger.kernel.org; dledford@redhat.com; broonie@kernel.org; Jason

> Gunthorpe <jgg@nvidia.com>; gregkh@linuxfoundation.org;

> kuba@kernel.org; Saleem, Shiraz <shiraz.saleem@intel.com>;

> davem@davemloft.net; Patil, Kiran <kiran.patil@intel.com>

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> On Wed, Oct 7, 2020 at 10:21 PM Leon Romanovsky <leon@kernel.org>

> wrote:

> >

> > On Wed, Oct 07, 2020 at 08:46:45PM +0000, Ertman, David M wrote:

> > > > -----Original Message-----

> > > > From: Parav Pandit <parav@nvidia.com>

> > > > Sent: Wednesday, October 7, 2020 1:17 PM

> > > > To: Leon Romanovsky <leon@kernel.org>; Ertman, David M

> > > > <david.m.ertman@intel.com>

> > > > Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>; alsa-

> > > > devel@alsa-project.org; parav@mellanox.com; tiwai@suse.de;

> > > > netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> > > > fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> > > > dledford@redhat.com; broonie@kernel.org; Jason Gunthorpe

> > > > <jgg@nvidia.com>; gregkh@linuxfoundation.org; kuba@kernel.org;

> Williams,

> > > > Dan J <dan.j.williams@intel.com>; Saleem, Shiraz

> > > > <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> > > > <kiran.patil@intel.com>

> > > > Subject: RE: [PATCH v2 1/6] Add ancillary bus support

> > > >

> > > >

> > > > > From: Leon Romanovsky <leon@kernel.org>

> > > > > Sent: Thursday, October 8, 2020 12:56 AM

> > > > >

> > > > > > > This API is partially obscures low level driver-core code and needs

> > > > > > > to provide clear and proper abstractions without need to

> remember

> > > > > > > about put_device. There is already _add() interface why don't you

> do

> > > > > > > put_device() in it?

> > > > > > >

> > > > > >

> > > > > > The pushback Pierre is referring to was during our mid-tier internal

> > > > > > review.  It was primarily a concern of Parav as I recall, so he can

> speak to

> > > > his

> > > > > reasoning.

> > > > > >

> > > > > > What we originally had was a single API call

> > > > > > (ancillary_device_register) that started with a call to

> > > > > > device_initialize(), and every error path out of the function

> performed a

> > > > > put_device().

> > > > > >

> > > > > > Is this the model you have in mind?

> > > > >

> > > > > I don't like this flow:

> > > > > ancillary_device_initialize()

> > > > > if (ancillary_ancillary_device_add()) {

> > > > >   put_device(....)

> > > > >   ancillary_device_unregister()

> > > > Calling device_unregister() is incorrect, because add() wasn't

> successful.

> > > > Only put_device() or a wrapper ancillary_device_put() is necessary.

> > > >

> > > > >   return err;

> > > > > }

> > > > >

> > > > > And prefer this flow:

> > > > > ancillary_device_initialize()

> > > > > if (ancillary_device_add()) {

> > > > >   ancillary_device_unregister()

> > > > This is incorrect and a clear deviation from the current core APIs that

> adds the

> > > > confusion.

> > > >

> > > > >   return err;

> > > > > }

> > > > >

> > > > > In this way, the ancillary users won't need to do non-intuitive

> put_device();

> > > >

> > > > Below is most simple, intuitive and matching with core APIs for name

> and

> > > > design pattern wise.

> > > > init()

> > > > {

> > > >     err = ancillary_device_initialize();

> > > >     if (err)

> > > >             return ret;

> > > >

> > > >     err = ancillary_device_add();

> > > >     if (ret)

> > > >             goto err_unwind;

> > > >

> > > >     err = some_foo();

> > > >     if (err)

> > > >             goto err_foo;

> > > >     return 0;

> > > >

> > > > err_foo:

> > > >     ancillary_device_del(adev);

> > > > err_unwind:

> > > >     ancillary_device_put(adev->dev);

> > > >     return err;

> > > > }

> > > >

> > > > cleanup()

> > > > {

> > > >     ancillary_device_de(adev);

> > > >     ancillary_device_put(adev);

> > > >     /* It is common to have a one wrapper for this as

> > > > ancillary_device_unregister().

> > > >      * This will match with core device_unregister() that has precise

> > > > documentation.

> > > >      * but given fact that init() code need proper error unwinding, like

> > > > above,

> > > >      * it make sense to have two APIs, and no need to export another

> > > > symbol for unregister().

> > > >      * This pattern is very easy to audit and code.

> > > >      */

> > > > }

> > >

> > > I like this flow +1

> > >

> > > But ... since the init() function is performing both device_init and

> > > device_add - it should probably be called ancillary_device_register,

> > > and we are back to a single exported API for both register and

> > > unregister.

> > >

> > > At that point, do we need wrappers on the primitives init, add, del,

> > > and put?

> >

> > Let me summarize.

> > 1. You are not providing driver/core API but simplification and obfuscation

> > of basic primitives and structures. This is new layer. There is no room for

> > a claim that we must to follow internal API.

> 

> Yes, this a driver core api, Greg even questioned why it was in

> drivers/bus instead of drivers/base which I think makes sense.


Will move to drivers/base with next patch set.

-DaveE

Ertman, David M Oct. 8, 2020, 10:04 p.m. UTC | #52

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, October 6, 2020 10:23 AM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: alsa-devel@alsa-project.org; tiwai@suse.de; broonie@kernel.org; linux-
> rdma@vger.kernel.org; jgg@nvidia.com; dledford@redhat.com;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> gregkh@linuxfoundation.org; ranjani.sridharan@linux.intel.com; pierre-
> louis.bossart@linux.intel.com; fred.oh@linux.intel.com;
> parav@mellanox.com; Saleem, Shiraz <shiraz.saleem@intel.com>; Williams,
> Dan J <dan.j.williams@intel.com>; Patil, Kiran <kiran.patil@intel.com>
> Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> 
> On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > It enables drivers to create an ancillary_device and bind an
> > ancillary_driver to it.
> >
> > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > Each ancillary_device has a unique string based id; driver binds to
> > an ancillary_device based on this id through the bus.
> >
> > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > ---
> 
> <...>
> 
> > +/**
> > + * __ancillary_driver_register - register a driver for ancillary bus devices
> > + * @ancildrv: ancillary_driver structure
> > + * @owner: owning module/driver
> > + */
> > +int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct
> module *owner)
> > +{
> > +	if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove)
> ||
> > +	    WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv-
> >id_table))
> > +		return -EINVAL;
> 
> In our driver ->shutdown is empty, it will be best if ancillary bus will
> do "if (->remove) ..->remove()" pattern.
>

Yes, looking it over, only the probe needs to mandatory.  I will change the others to the
conditional model, and adjust the WARN_ONs.

 
> > +
> > +	ancildrv->driver.owner = owner;
> > +	ancildrv->driver.bus = &ancillary_bus_type;
> > +	ancildrv->driver.probe = ancillary_probe_driver;
> > +	ancildrv->driver.remove = ancillary_remove_driver;
> > +	ancildrv->driver.shutdown = ancillary_shutdown_driver;
> > +
>
> I think that this part is wrong, probe/remove/shutdown functions should
> come from ancillary_bus_type.

Dan Williams Oct. 8, 2020, 10:41 p.m. UTC | #53

On Thu, Oct 8, 2020 at 3:04 PM Ertman, David M <david.m.ertman@intel.com> wrote:
>
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, October 6, 2020 10:23 AM
> > To: Ertman, David M <david.m.ertman@intel.com>
> > Cc: alsa-devel@alsa-project.org; tiwai@suse.de; broonie@kernel.org; linux-
> > rdma@vger.kernel.org; jgg@nvidia.com; dledford@redhat.com;
> > netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> > gregkh@linuxfoundation.org; ranjani.sridharan@linux.intel.com; pierre-
> > louis.bossart@linux.intel.com; fred.oh@linux.intel.com;
> > parav@mellanox.com; Saleem, Shiraz <shiraz.saleem@intel.com>; Williams,
> > Dan J <dan.j.williams@intel.com>; Patil, Kiran <kiran.patil@intel.com>
> > Subject: Re: [PATCH v2 1/6] Add ancillary bus support
> >
> > On Mon, Oct 05, 2020 at 11:24:41AM -0700, Dave Ertman wrote:
> > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > It enables drivers to create an ancillary_device and bind an
> > > ancillary_driver to it.
> > >
> > > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > > Each ancillary_device has a unique string based id; driver binds to
> > > an ancillary_device based on this id through the bus.
> > >
> > > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > ---
> >
> > <...>
> >
> > > +/**
> > > + * __ancillary_driver_register - register a driver for ancillary bus devices
> > > + * @ancildrv: ancillary_driver structure
> > > + * @owner: owning module/driver
> > > + */
> > > +int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct
> > module *owner)
> > > +{
> > > +   if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove)
> > ||
> > > +       WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv-
> > >id_table))
> > > +           return -EINVAL;
> >
> > In our driver ->shutdown is empty, it will be best if ancillary bus will
> > do "if (->remove) ..->remove()" pattern.
> >
>
> Yes, looking it over, only the probe needs to mandatory.  I will change the others to the
> conditional model, and adjust the WARN_ONs.
>
>
> > > +
> > > +   ancildrv->driver.owner = owner;
> > > +   ancildrv->driver.bus = &ancillary_bus_type;
> > > +   ancildrv->driver.probe = ancillary_probe_driver;
> > > +   ancildrv->driver.remove = ancillary_remove_driver;
> > > +   ancildrv->driver.shutdown = ancillary_shutdown_driver;
> > > +
> >
> > I think that this part is wrong, probe/remove/shutdown functions should
> > come from ancillary_bus_type.
>
> From checking other usage cases, this is the model that is used for probe, remove,
> and shutdown in drivers.  Here is the example from Greybus.
>
> int greybus_register_driver(struct greybus_driver *driver, struct module *owner,
>                             const char *mod_name)
> {
>         int retval;
>
>         if (greybus_disabled())
>                 return -ENODEV;
>
>         driver->driver.bus = &greybus_bus_type;
>         driver->driver.name = driver->name;
>         driver->driver.probe = greybus_probe;
>         driver->driver.remove = greybus_remove;
>         driver->driver.owner = owner;
>         driver->driver.mod_name = mod_name;
>
>
> > You are overwriting private device_driver
> > callbacks that makes impossible to make container_of of ancillary_driver
> > to chain operations.
> >
>
> I am sorry, you lost me here.  you cannot perform container_of on the callbacks
> because they are pointers, but if you are referring to going from device_driver
> to the auxiliary_driver, that is what happens in auxiliary_probe_driver in the
> very beginning.
>
> static int auxiliary_probe_driver(struct device *dev)
> 145 {
> 146         struct auxiliary_driver *auxdrv = to_auxiliary_drv(dev->driver);
> 147         struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
>
> Did I miss your meaning?

I think you're misunderstanding the cases when the
bus_type.{probe,remove} is used vs the driver.{probe,remove}
callbacks. The bus_type callbacks are to implement a pattern where the
'probe' and 'remove' method are typed to the bus device type. For
example 'struct pci_dev *' instead of raw 'struct device *'. See this
conversion of dax bus as an example of going from raw 'struct device
*' typed probe/remove to dax-device typed probe/remove:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=75797273189d

Leon Romanovsky Oct. 9, 2020, 11:40 a.m. UTC | #54

On Thu, Oct 08, 2020 at 08:29:00AM -0500, Pierre-Louis Bossart wrote:
>
> > > > > > But ... since the init() function is performing both device_init and
> > > > > > device_add - it should probably be called ancillary_device_register,
> > > > > > and we are back to a single exported API for both register and
> > > > > > unregister.
> > > > >
> > > > > Kind reminder that we introduced the two functions to allow the
> > > > > caller to know if it needed to free memory when initialize() fails,
> > > > > and it didn't need to free memory when add() failed since
> > > > > put_device() takes care of it. If you have a single init() function
> > > > > it's impossible to know which behavior to select on error.
> > > > >
> > > > > I also have a case with SoundWire where it's nice to first
> > > > > initialize, then set some data and then add.
> > > > >
> > > >
> > > > The flow as outlined by Parav above does an initialize as the first
> > > > step, so every error path out of the function has to do a
> > > > put_device(), so you would never need to manually free the memory in
> > > the setup function.
> > > > It would be freed in the release call.
> > >
> > > err = ancillary_device_initialize();
> > > if (err)
> > > 	return ret;
> > >
> > > where is the put_device() here? if the release function does any sort of
> > > kfree, then you'd need to do it manually in this case.
> > Since device_initialize() failed, put_device() cannot be done here.
> > So yes, pseudo code should have shown,
> > if (err) {
> > 	kfree(adev);
> > 	return err;
> > }
>
> This doesn't work if the adev is part of a larger structure allocated by the
> parent, which is pretty much the intent to extent the basic bus and pass
> additional information which can be accessed with container_of().

Please take a look how ib_alloc_device() is implemented. It does all
that you wrote above in very similar manner to netdev_alloc.

In a nutshell, ib_alloc_device receives needed size from the user and
requires from the users to extend their structures below "general" one.

Thanks

Pierre-Louis Bossart Oct. 9, 2020, 2:26 p.m. UTC | #55

>>>> +
>>>> +   ancildrv->driver.owner = owner;
>>>> +   ancildrv->driver.bus = &ancillary_bus_type;
>>>> +   ancildrv->driver.probe = ancillary_probe_driver;
>>>> +   ancildrv->driver.remove = ancillary_remove_driver;
>>>> +   ancildrv->driver.shutdown = ancillary_shutdown_driver;
>>>> +
>>>
>>> I think that this part is wrong, probe/remove/shutdown functions should
>>> come from ancillary_bus_type.
>>
>>  From checking other usage cases, this is the model that is used for probe, remove,
>> and shutdown in drivers.  Here is the example from Greybus.
>>
>> int greybus_register_driver(struct greybus_driver *driver, struct module *owner,
>>                              const char *mod_name)
>> {
>>          int retval;
>>
>>          if (greybus_disabled())
>>                  return -ENODEV;
>>
>>          driver->driver.bus = &greybus_bus_type;
>>          driver->driver.name = driver->name;
>>          driver->driver.probe = greybus_probe;
>>          driver->driver.remove = greybus_remove;
>>          driver->driver.owner = owner;
>>          driver->driver.mod_name = mod_name;
>>
>>
>>> You are overwriting private device_driver
>>> callbacks that makes impossible to make container_of of ancillary_driver
>>> to chain operations.
>>>
>>
>> I am sorry, you lost me here.  you cannot perform container_of on the callbacks
>> because they are pointers, but if you are referring to going from device_driver
>> to the auxiliary_driver, that is what happens in auxiliary_probe_driver in the
>> very beginning.
>>
>> static int auxiliary_probe_driver(struct device *dev)
>> 145 {
>> 146         struct auxiliary_driver *auxdrv = to_auxiliary_drv(dev->driver);
>> 147         struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
>>
>> Did I miss your meaning?
> 
> I think you're misunderstanding the cases when the
> bus_type.{probe,remove} is used vs the driver.{probe,remove}
> callbacks. The bus_type callbacks are to implement a pattern where the
> 'probe' and 'remove' method are typed to the bus device type. For
> example 'struct pci_dev *' instead of raw 'struct device *'. See this
> conversion of dax bus as an example of going from raw 'struct device
> *' typed probe/remove to dax-device typed probe/remove:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=75797273189d

Thanks Dan for the reference, very useful. This doesn't look like a a 
big change to implement, just wondering about the benefits and 
drawbacks, if any? I am a bit confused here.

First, was the initial pattern wrong as Leon asserted it? Such code 
exists in multiple examples in the kernel and there's nothing preventing 
the use of container_of that I can think of. Put differently, if this 
code was wrong then there are other existing buses that need to be updated.

Second, what additional functionality does this move from driver to 
bus_type provide? The commit reference just states 'In preparation for 
introducing seed devices the dax-bus core needs to be able to intercept 
->probe() and ->remove() operations", but that doesn't really help me 
figure out what 'intercept' means. Would you mind elaborating?

And last, the existing probe function does calls dev_pm_domain_attach():

static int ancillary_probe_driver(struct device *dev)
{
	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
	struct ancillary_device *ancildev = to_ancillary_dev(dev);
	int ret;

	ret = dev_pm_domain_attach(dev, true);

So the need to access the raw device still exists. Is this still legit 
if the probe() is moved to the bus_type structure?

I have no objection to this change if it preserves the same 
functionality and possibly extends it, just wanted to better understand 
the reasons for the change and in which cases the bus probe() makes more 
sense than a driver probe().

Thanks for enlightening the rest of us!

Dan Williams Oct. 9, 2020, 7:22 p.m. UTC | #56

On Fri, Oct 9, 2020 at 7:27 AM Pierre-Louis Bossart
<pierre-louis.bossart@linux.intel.com> wrote:
>
>
>
> >>>> +
> >>>> +   ancildrv->driver.owner = owner;
> >>>> +   ancildrv->driver.bus = &ancillary_bus_type;
> >>>> +   ancildrv->driver.probe = ancillary_probe_driver;
> >>>> +   ancildrv->driver.remove = ancillary_remove_driver;
> >>>> +   ancildrv->driver.shutdown = ancillary_shutdown_driver;
> >>>> +
> >>>
> >>> I think that this part is wrong, probe/remove/shutdown functions should
> >>> come from ancillary_bus_type.
> >>
> >>  From checking other usage cases, this is the model that is used for probe, remove,
> >> and shutdown in drivers.  Here is the example from Greybus.
> >>
> >> int greybus_register_driver(struct greybus_driver *driver, struct module *owner,
> >>                              const char *mod_name)
> >> {
> >>          int retval;
> >>
> >>          if (greybus_disabled())
> >>                  return -ENODEV;
> >>
> >>          driver->driver.bus = &greybus_bus_type;
> >>          driver->driver.name = driver->name;
> >>          driver->driver.probe = greybus_probe;
> >>          driver->driver.remove = greybus_remove;
> >>          driver->driver.owner = owner;
> >>          driver->driver.mod_name = mod_name;
> >>
> >>
> >>> You are overwriting private device_driver
> >>> callbacks that makes impossible to make container_of of ancillary_driver
> >>> to chain operations.
> >>>
> >>
> >> I am sorry, you lost me here.  you cannot perform container_of on the callbacks
> >> because they are pointers, but if you are referring to going from device_driver
> >> to the auxiliary_driver, that is what happens in auxiliary_probe_driver in the
> >> very beginning.
> >>
> >> static int auxiliary_probe_driver(struct device *dev)
> >> 145 {
> >> 146         struct auxiliary_driver *auxdrv = to_auxiliary_drv(dev->driver);
> >> 147         struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
> >>
> >> Did I miss your meaning?
> >
> > I think you're misunderstanding the cases when the
> > bus_type.{probe,remove} is used vs the driver.{probe,remove}
> > callbacks. The bus_type callbacks are to implement a pattern where the
> > 'probe' and 'remove' method are typed to the bus device type. For
> > example 'struct pci_dev *' instead of raw 'struct device *'. See this
> > conversion of dax bus as an example of going from raw 'struct device
> > *' typed probe/remove to dax-device typed probe/remove:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=75797273189d
>
> Thanks Dan for the reference, very useful. This doesn't look like a a
> big change to implement, just wondering about the benefits and
> drawbacks, if any? I am a bit confused here.
>
> First, was the initial pattern wrong as Leon asserted it? Such code
> exists in multiple examples in the kernel and there's nothing preventing
> the use of container_of that I can think of. Put differently, if this
> code was wrong then there are other existing buses that need to be updated.
>
> Second, what additional functionality does this move from driver to
> bus_type provide? The commit reference just states 'In preparation for
> introducing seed devices the dax-bus core needs to be able to intercept
> ->probe() and ->remove() operations", but that doesn't really help me
> figure out what 'intercept' means. Would you mind elaborating?
>
> And last, the existing probe function does calls dev_pm_domain_attach():
>
> static int ancillary_probe_driver(struct device *dev)
> {
>         struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
>         struct ancillary_device *ancildev = to_ancillary_dev(dev);
>         int ret;
>
>         ret = dev_pm_domain_attach(dev, true);
>
> So the need to access the raw device still exists. Is this still legit
> if the probe() is moved to the bus_type structure?

Sure, of course.

>
> I have no objection to this change if it preserves the same
> functionality and possibly extends it, just wanted to better understand
> the reasons for the change and in which cases the bus probe() makes more
> sense than a driver probe().
>
> Thanks for enlightening the rest of us!

tl;dr: The ops set by the device driver should never be overwritten by
the bus, the bus can only wrap them in its own ops.

The reason to use the bus_type is because the bus type is the only
agent that knows both how to convert a raw 'struct device *' to the
bus's native type, and how to convert a raw 'struct device_driver *'
to the bus's native driver type. The driver core does:

        if (dev->bus->probe) {
                ret = dev->bus->probe(dev);
        } else if (drv->probe) {
                ret = drv->probe(dev);
        }

...so that the bus has the first priority for probing a device /
wrapping the native driver ops. The bus ->probe, in addition to
optionally performing some bus specific pre-work, lets the bus upcast
the device to bus-native type.

The bus also knows the types of drivers that will be registered to it,
so the bus can upcast the dev->driver to the native type.

So with bus_type based driver ops driver authors can do:

struct auxiliary_device_driver auxdrv {
    .probe = fn(struct auxiliary_device *, <any aux bus custom probe arguments>)
};

auxiliary_driver_register(&auxdrv); <-- the core code can hide bus details

Without bus_type the driver author would need to do:

struct auxiliary_device_driver auxdrv {
    .drv = {
        .probe = fn(struct device *), <-- no opportunity for bus
specific probe args
        .bus = &auxilary_bus_type, <-- unnecessary export to device drivers
    },
};

driver_register(&auxdrv.drv)

Pierre-Louis Bossart Oct. 9, 2020, 7:39 p.m. UTC | #57

>>>>>> +
>>>>>> +   ancildrv->driver.owner = owner;
>>>>>> +   ancildrv->driver.bus = &ancillary_bus_type;
>>>>>> +   ancildrv->driver.probe = ancillary_probe_driver;
>>>>>> +   ancildrv->driver.remove = ancillary_remove_driver;
>>>>>> +   ancildrv->driver.shutdown = ancillary_shutdown_driver;
>>>>>> +
>>>>>
>>>>> I think that this part is wrong, probe/remove/shutdown functions should
>>>>> come from ancillary_bus_type.
>>>>
>>>>   From checking other usage cases, this is the model that is used for probe, remove,
>>>> and shutdown in drivers.  Here is the example from Greybus.
>>>>
>>>> int greybus_register_driver(struct greybus_driver *driver, struct module *owner,
>>>>                               const char *mod_name)
>>>> {
>>>>           int retval;
>>>>
>>>>           if (greybus_disabled())
>>>>                   return -ENODEV;
>>>>
>>>>           driver->driver.bus = &greybus_bus_type;
>>>>           driver->driver.name = driver->name;
>>>>           driver->driver.probe = greybus_probe;
>>>>           driver->driver.remove = greybus_remove;
>>>>           driver->driver.owner = owner;
>>>>           driver->driver.mod_name = mod_name;
>>>>
>>>>
>>>>> You are overwriting private device_driver
>>>>> callbacks that makes impossible to make container_of of ancillary_driver
>>>>> to chain operations.
>>>>>
>>>>
>>>> I am sorry, you lost me here.  you cannot perform container_of on the callbacks
>>>> because they are pointers, but if you are referring to going from device_driver
>>>> to the auxiliary_driver, that is what happens in auxiliary_probe_driver in the
>>>> very beginning.
>>>>
>>>> static int auxiliary_probe_driver(struct device *dev)
>>>> 145 {
>>>> 146         struct auxiliary_driver *auxdrv = to_auxiliary_drv(dev->driver);
>>>> 147         struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
>>>>
>>>> Did I miss your meaning?
>>>
>>> I think you're misunderstanding the cases when the
>>> bus_type.{probe,remove} is used vs the driver.{probe,remove}
>>> callbacks. The bus_type callbacks are to implement a pattern where the
>>> 'probe' and 'remove' method are typed to the bus device type. For
>>> example 'struct pci_dev *' instead of raw 'struct device *'. See this
>>> conversion of dax bus as an example of going from raw 'struct device
>>> *' typed probe/remove to dax-device typed probe/remove:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=75797273189d
>>
>> Thanks Dan for the reference, very useful. This doesn't look like a a
>> big change to implement, just wondering about the benefits and
>> drawbacks, if any? I am a bit confused here.
>>
>> First, was the initial pattern wrong as Leon asserted it? Such code
>> exists in multiple examples in the kernel and there's nothing preventing
>> the use of container_of that I can think of. Put differently, if this
>> code was wrong then there are other existing buses that need to be updated.
>>
>> Second, what additional functionality does this move from driver to
>> bus_type provide? The commit reference just states 'In preparation for
>> introducing seed devices the dax-bus core needs to be able to intercept
>> ->probe() and ->remove() operations", but that doesn't really help me
>> figure out what 'intercept' means. Would you mind elaborating?
>>
>> And last, the existing probe function does calls dev_pm_domain_attach():
>>
>> static int ancillary_probe_driver(struct device *dev)
>> {
>>          struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
>>          struct ancillary_device *ancildev = to_ancillary_dev(dev);
>>          int ret;
>>
>>          ret = dev_pm_domain_attach(dev, true);
>>
>> So the need to access the raw device still exists. Is this still legit
>> if the probe() is moved to the bus_type structure?
> 
> Sure, of course.
> 
>>
>> I have no objection to this change if it preserves the same
>> functionality and possibly extends it, just wanted to better understand
>> the reasons for the change and in which cases the bus probe() makes more
>> sense than a driver probe().
>>
>> Thanks for enlightening the rest of us!
> 
> tl;dr: The ops set by the device driver should never be overwritten by
> the bus, the bus can only wrap them in its own ops.
> 
> The reason to use the bus_type is because the bus type is the only
> agent that knows both how to convert a raw 'struct device *' to the
> bus's native type, and how to convert a raw 'struct device_driver *'
> to the bus's native driver type. The driver core does:
> 
>          if (dev->bus->probe) {
>                  ret = dev->bus->probe(dev);
>          } else if (drv->probe) {
>                  ret = drv->probe(dev);
>          }
> 
> ...so that the bus has the first priority for probing a device /
> wrapping the native driver ops. The bus ->probe, in addition to
> optionally performing some bus specific pre-work, lets the bus upcast
> the device to bus-native type.
> 
> The bus also knows the types of drivers that will be registered to it,
> so the bus can upcast the dev->driver to the native type.
> 
> So with bus_type based driver ops driver authors can do:
> 
> struct auxiliary_device_driver auxdrv {
>      .probe = fn(struct auxiliary_device *, <any aux bus custom probe arguments>)
> };
> 
> auxiliary_driver_register(&auxdrv); <-- the core code can hide bus details
> 
> Without bus_type the driver author would need to do:
> 
> struct auxiliary_device_driver auxdrv {
>      .drv = {
>          .probe = fn(struct device *), <-- no opportunity for bus
> specific probe args
>          .bus = &auxilary_bus_type, <-- unnecessary export to device drivers
>      },
> };
> 
> driver_register(&auxdrv.drv)

Thanks Dan, I appreciate the explanation.

I guess the misunderstanding on my side was that in practice the drivers 
only declare a probe at the auxiliary level:

struct auxiliary_device_driver auxdrv {
     .drv = {
         .name = "my driver"
         <<< .probe not set here.
     }
     .probe =  fn(struct auxiliary_device *, int id),	
}

It looks indeed cleaner with your suggestion. DaveE and I were talking 
about this moments ago and made the change, will be testing later today.

Again thanks for the write-up and have a nice week-end.

Ertman, David M Oct. 12, 2020, 6:34 p.m. UTC | #58

> -----Original Message-----

> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> Sent: Friday, October 9, 2020 12:39 PM

> To: Williams, Dan J <dan.j.williams@intel.com>

> Cc: Ertman, David M <david.m.ertman@intel.com>; alsa-devel@alsa-

> project.org; parav@mellanox.com; Leon Romanovsky <leon@kernel.org>;

> tiwai@suse.de; netdev@vger.kernel.org; ranjani.sridharan@linux.intel.com;

> fred.oh@linux.intel.com; linux-rdma@vger.kernel.org;

> dledford@redhat.com; broonie@kernel.org; jgg@nvidia.com;

> gregkh@linuxfoundation.org; kuba@kernel.org; Saleem, Shiraz

> <shiraz.saleem@intel.com>; davem@davemloft.net; Patil, Kiran

> <kiran.patil@intel.com>

> Subject: Re: [PATCH v2 1/6] Add ancillary bus support

> 

> 

> >>>>>> +

> >>>>>> +   ancildrv->driver.owner = owner;

> >>>>>> +   ancildrv->driver.bus = &ancillary_bus_type;

> >>>>>> +   ancildrv->driver.probe = ancillary_probe_driver;

> >>>>>> +   ancildrv->driver.remove = ancillary_remove_driver;

> >>>>>> +   ancildrv->driver.shutdown = ancillary_shutdown_driver;

> >>>>>> +

> >>>>>

> >>>>> I think that this part is wrong, probe/remove/shutdown functions

> should

> >>>>> come from ancillary_bus_type.

> >>>>

> >>>>   From checking other usage cases, this is the model that is used for

> probe, remove,

> >>>> and shutdown in drivers.  Here is the example from Greybus.

> >>>>

> >>>> int greybus_register_driver(struct greybus_driver *driver, struct

> module *owner,

> >>>>                               const char *mod_name)

> >>>> {

> >>>>           int retval;

> >>>>

> >>>>           if (greybus_disabled())

> >>>>                   return -ENODEV;

> >>>>

> >>>>           driver->driver.bus = &greybus_bus_type;

> >>>>           driver->driver.name = driver->name;

> >>>>           driver->driver.probe = greybus_probe;

> >>>>           driver->driver.remove = greybus_remove;

> >>>>           driver->driver.owner = owner;

> >>>>           driver->driver.mod_name = mod_name;

> >>>>

> >>>>

> >>>>> You are overwriting private device_driver

> >>>>> callbacks that makes impossible to make container_of of

> ancillary_driver

> >>>>> to chain operations.

> >>>>>

> >>>>

> >>>> I am sorry, you lost me here.  you cannot perform container_of on the

> callbacks

> >>>> because they are pointers, but if you are referring to going from

> device_driver

> >>>> to the auxiliary_driver, that is what happens in auxiliary_probe_driver

> in the

> >>>> very beginning.

> >>>>

> >>>> static int auxiliary_probe_driver(struct device *dev)

> >>>> 145 {

> >>>> 146         struct auxiliary_driver *auxdrv = to_auxiliary_drv(dev->driver);

> >>>> 147         struct auxiliary_device *auxdev = to_auxiliary_dev(dev);

> >>>>

> >>>> Did I miss your meaning?

> >>>

> >>> I think you're misunderstanding the cases when the

> >>> bus_type.{probe,remove} is used vs the driver.{probe,remove}

> >>> callbacks. The bus_type callbacks are to implement a pattern where the

> >>> 'probe' and 'remove' method are typed to the bus device type. For

> >>> example 'struct pci_dev *' instead of raw 'struct device *'. See this

> >>> conversion of dax bus as an example of going from raw 'struct device

> >>> *' typed probe/remove to dax-device typed probe/remove:

> >>>

> >>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-

> next.git/commit/?id=75797273189d

> >>

> >> Thanks Dan for the reference, very useful. This doesn't look like a a

> >> big change to implement, just wondering about the benefits and

> >> drawbacks, if any? I am a bit confused here.

> >>

> >> First, was the initial pattern wrong as Leon asserted it? Such code

> >> exists in multiple examples in the kernel and there's nothing preventing

> >> the use of container_of that I can think of. Put differently, if this

> >> code was wrong then there are other existing buses that need to be

> updated.

> >>

> >> Second, what additional functionality does this move from driver to

> >> bus_type provide? The commit reference just states 'In preparation for

> >> introducing seed devices the dax-bus core needs to be able to intercept

> >> ->probe() and ->remove() operations", but that doesn't really help me

> >> figure out what 'intercept' means. Would you mind elaborating?

> >>

> >> And last, the existing probe function does calls dev_pm_domain_attach():

> >>

> >> static int ancillary_probe_driver(struct device *dev)

> >> {

> >>          struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);

> >>          struct ancillary_device *ancildev = to_ancillary_dev(dev);

> >>          int ret;

> >>

> >>          ret = dev_pm_domain_attach(dev, true);

> >>

> >> So the need to access the raw device still exists. Is this still legit

> >> if the probe() is moved to the bus_type structure?

> >

> > Sure, of course.

> >

> >>

> >> I have no objection to this change if it preserves the same

> >> functionality and possibly extends it, just wanted to better understand

> >> the reasons for the change and in which cases the bus probe() makes

> more

> >> sense than a driver probe().

> >>

> >> Thanks for enlightening the rest of us!

> >

> > tl;dr: The ops set by the device driver should never be overwritten by

> > the bus, the bus can only wrap them in its own ops.

> >

> > The reason to use the bus_type is because the bus type is the only

> > agent that knows both how to convert a raw 'struct device *' to the

> > bus's native type, and how to convert a raw 'struct device_driver *'

> > to the bus's native driver type. The driver core does:

> >

> >          if (dev->bus->probe) {

> >                  ret = dev->bus->probe(dev);

> >          } else if (drv->probe) {

> >                  ret = drv->probe(dev);

> >          }

> >

> > ...so that the bus has the first priority for probing a device /

> > wrapping the native driver ops. The bus ->probe, in addition to

> > optionally performing some bus specific pre-work, lets the bus upcast

> > the device to bus-native type.

> >

> > The bus also knows the types of drivers that will be registered to it,

> > so the bus can upcast the dev->driver to the native type.

> >

> > So with bus_type based driver ops driver authors can do:

> >

> > struct auxiliary_device_driver auxdrv {

> >      .probe = fn(struct auxiliary_device *, <any aux bus custom probe

> arguments>)

> > };

> >

> > auxiliary_driver_register(&auxdrv); <-- the core code can hide bus details

> >

> > Without bus_type the driver author would need to do:

> >

> > struct auxiliary_device_driver auxdrv {

> >      .drv = {

> >          .probe = fn(struct device *), <-- no opportunity for bus

> > specific probe args

> >          .bus = &auxilary_bus_type, <-- unnecessary export to device drivers

> >      },

> > };

> >

> > driver_register(&auxdrv.drv)

> 

> Thanks Dan, I appreciate the explanation.

> 

> I guess the misunderstanding on my side was that in practice the drivers

> only declare a probe at the auxiliary level:

> 

> struct auxiliary_device_driver auxdrv {

>      .drv = {

>          .name = "my driver"

>          <<< .probe not set here.

>      }

>      .probe =  fn(struct auxiliary_device *, int id),

> }

> 

> It looks indeed cleaner with your suggestion. DaveE and I were talking

> about this moments ago and made the change, will be testing later today.

> 

> Again thanks for the write-up and have a nice week-end.

> 


Like Pierre said, I have already changed the probe, remove, and shutdown callbacks
into the bus_type.

But it should be noted that you are not supposed to have these callbacks in both the
auxdrv->drv->* and in the bus->*.

in drivers/base/driver.c line 158 it checks for this:

if ((drv->bus->probe && drv->probe) ||
             (drv->bus->remove && drv->remove) ||
             (drv->bus->shutdown && drv->shutdown))
                 pr_warn("Driver '%s' needs updating - please use "
                         "bus_type methods\n", drv->name);

So, changing to the bus_type for these is the right thing to do, but driver writers need to
make sure that auxdrv->drv->[probe|remove|shutdown] are NULL.

-DaveE

[v2,1/6] Add ancillary bus support

Commit Message

Comments

Patch