Message ID | 1661809417-11370-1-git-send-email-lizhi.hou@amd.com
---|---
Series | Generate device tree node for pci devices
On 9/13/22 00:00, Frank Rowand wrote:
> On 8/29/22 16:43, Lizhi Hou wrote:
>> This patch series introduces OF overlay support for PCI devices which primarily addresses two use cases. First, it provides a data driven method to describe hardware peripherals that are present in a PCI endpoint and hence can be accessed by the PCI host. An example device is Xilinx/AMD Alveo PCIe accelerators. Second, it allows reuse of an OF compatible driver -- often used in SoC platforms -- in a PCI host based system. An example device is the Microchip LAN9662 Ethernet Controller.
>>
>> This patch series consolidates previous efforts to define such an infrastructure:
>> https://lore.kernel.org/lkml/20220305052304.726050-1-lizhi.hou@xilinx.com/
>> https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/
>>
>> Normally, the PCI core discovers PCI devices and their BARs using the PCI enumeration process. However, the process does not provide a way to discover the hardware peripherals that are present in a PCI device and which can be accessed through the PCI BARs. Also, the enumeration process does not provide a way to associate MSI-X vectors of a PCI device with the hardware peripherals that are present in the device. PCI device drivers often use header files to describe the hardware peripherals and their resources, as there is no standard data driven way to do so. This patch series proposes to use a flattened device tree blob to describe the peripherals in a data driven way. Based on previous discussion, using device tree overlays is the best way to unflatten the blob and populate platform devices. To use device tree overlays, there are three obvious problems that need to be resolved.
>>
>> First, we need to create a base tree for non-DT systems such as x86_64. A patch series has been submitted for this:
>> https://lore.kernel.org/lkml/20220624034327.2542112-1-frowand.list@gmail.com/
>> https://lore.kernel.org/lkml/20220216050056.311496-1-lizhi.hou@xilinx.com/
>>
>> Second, a device tree node corresponding to the PCI endpoint is required for overlaying the flattened device tree blob for that PCI endpoint. Because PCI is a self-discoverable bus, a device tree node is usually not created for PCI devices. This series adds support to generate a device tree node for a PCI device which advertises itself using the PCI quirks infrastructure.
>>
>> Third, we need to generate device tree nodes for PCI bridges, since a child PCI endpoint may choose to have a device tree node created.
>>
>> This patch series is made up of two patches.
>>
>> The first patch adds an OF interface to allocate an OF node. It is copied from:
>> https://lore.kernel.org/lkml/20220620104123.341054-5-clement.leger@bootlin.com/
>>
>> The second patch introduces a kernel option, CONFIG_PCI_OF. When the option is turned on, the kernel will generate device tree nodes for all PCI bridges unconditionally. The patch also shows how to use the PCI quirks infrastructure, DECLARE_PCI_FIXUP_FINAL, to generate a device tree node for a device. Specifically, the patch generates a device tree node for the Xilinx Alveo U50 PCIe accelerator device. The generated device tree nodes do not have any properties. Future patches will add the necessary properties.
>>
>> Clément Léger (1):
>>   of: dynamic: add of_node_alloc()
>>
>> Lizhi Hou (1):
>>   pci: create device tree node for selected devices
>>
>>  drivers/of/dynamic.c        |  50 +++++++++++++----
>>  drivers/pci/Kconfig         |  11 ++++
>>  drivers/pci/bus.c           |   2 +
>>  drivers/pci/msi/irqdomain.c |   6 +-
>>  drivers/pci/of.c            | 106 ++++++++++++++++++++++++++++++++++++
>>  drivers/pci/pci-driver.c    |   3 +-
>>  drivers/pci/pci.h           |  16 ++++++
>>  drivers/pci/quirks.c        |  11 ++++
>>  drivers/pci/remove.c        |   1 +
>>  include/linux/of.h          |   7 +++
>>  10 files changed, 200 insertions(+), 13 deletions(-)
>>
> The patch description leaves out the most important piece of information.
>
> The device located at the PCI endpoint is implemented via FPGA
> - which is programmed after Linux boots (or somewhere late in the boot process)
> - (A) and thus can not be described by static data available pre-boot, because it is dynamic (and the FPGA program will often change while the Linux kernel is already booted)
> - (B) can be described by static data available pre-boot, because the FPGA program will always be the same for this device on this system
>
> I am not positive what part of what I wrote above is correct and would appreciate some confirmation of what is correct or incorrect.

There are two series of devices that rely on this patch:

1) Xilinx Alveo Accelerator cards (FPGA based device)

2) lan9662 PCIe card

Please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/

For the Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot.

Thanks,

Lizhi

> -Frank
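[Editor's note: for readers unfamiliar with the PCI quirks mechanism referenced in the cover letter, below is a minimal sketch of how a DECLARE_PCI_FIXUP_FINAL hook could mark a specific endpoint for device tree node creation. The 0x5020 device ID and the body of the hook are illustrative assumptions, not the actual interface added by the patch; only the macro and PCI_VENDOR_ID_XILINX are standard kernel interfaces.]

/* Hedged sketch: a PCI quirk that runs for one specific endpoint. The real
 * patch would use its own mechanism to request of_node creation here. */
#include <linux/pci.h>

static void quirk_alveo_u50_of_node(struct pci_dev *pdev)
{
	/* Illustrative only: record that this endpoint wants an of_node so a
	 * later device tree overlay can target it. */
	dev_info(&pdev->dev, "requesting device tree node creation\n");
}
/* 0x5020 as the Alveo U50 device ID is an assumption for illustration. */
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_XILINX, 0x5020, quirk_alveo_u50_of_node);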
On 9/13/22 12:10, Lizhi Hou wrote:
> On 9/13/22 00:00, Frank Rowand wrote:
>> On 8/29/22 16:43, Lizhi Hou wrote:
>>> [...]
>>
>> The patch description leaves out the most important piece of information.
>>
>> The device located at the PCI endpoint is implemented via FPGA
>> - which is programmed after Linux boots (or somewhere late in the boot process)
>> - (A) and thus can not be described by static data available pre-boot because it is dynamic (and the FPGA program will often change while the Linux kernel is already booted)
>> - (B) can be described by static data available pre-boot because the FPGA program will always be the same for this device on this system
>>
>> I am not positive what part of what I wrote above is correct and would appreciate some confirmation of what is correct or incorrect.
>
> There are 2 series devices rely on this patch:
>
> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>
> 2) lan9662 PCIe card
>
> please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/

Thanks. Please include this information in future versions of the patch series.

For device 2 I have strongly recommended using pre-boot apply of the overlay to the base device tree. I realize that this suggestion is only a partial solution if one wants to use hotplug to change system configuration (as opposed to using hotplug only to replace an existing device (eg a broken device) with another instance of the same device). I also realize that this increases the system administration overhead. On the other hand, an overlay based solution is likely to be fragile and possibly flaky.

> For Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot.

I looked at the Xilinx Alveo web page, and there are a variety of types of Alveo cards available. So the answer to my next question may vary by type of card.

Is it expected that the fpga program on a given card will change frequently (eg multiple times per day), where the changed program results in a new device that would require a different hardware description in the device tree?

Or is the fpga program expected to change on an infrequent basis (eg monthly, quarterly, annually), in the same way as device firmware and operating systems are updated on a regular basis for bug fixes and new functionality?

-Frank
On 9/13/22 10:41, Frank Rowand wrote:
> On 9/13/22 12:10, Lizhi Hou wrote:
>> On 9/13/22 00:00, Frank Rowand wrote:
>>> On 8/29/22 16:43, Lizhi Hou wrote:
>>>> [...]
>>>
>>> [...]
>> There are 2 series devices rely on this patch:
>>
>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>
>> 2) lan9662 PCIe card
>>
>> please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/
> Thanks. Please include this information in future versions of the patch series.
>
> For device 2 I have strongly recommended using pre-boot apply of the overlay to the base device tree. I realize that this suggestion is only a partial solution if one wants to use hotplug to change system configuration (as opposed to using hotplug only to replace an existing device (eg a broken device) with another instance of the same device). I also realize that this increased the system administration overhead. On the other hand an overlay based solution is likely to be fragile and possibly flaky.

Can you clarify the pre-boot apply approach? How will it work for PCI devices?

>> For Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot.
> I looked at the Xilinx Alveo web page, and there are a variety of types of Alveo cards available. So the answer to my next question may vary by type of card.
>
> Is it expected that the fpga program on a given card will change frequently (eg multiple times per day), where the changed program results in a new device that would require a different hardware description in the device tree?

Different images may be loaded to an FPGA partition several times a day. The PCI topology (Device IDs, BARs, MSI-X, etc.) does not change. New IPs may appear (and old IPs may disappear) on the BARs when a new image is loaded. We would like to use a flattened device tree to describe the IPs on the BARs.

Thanks,

Lizhi

> Or is the fpga program expected to change on an infrequent basis (eg monthly, quarterly, annually), in the same way as device firmware and operating systems are updated on a regular basis for bug fixes and new functionality?
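[Editor's note: to make "describe the IPs on the BARs with a flattened device tree" concrete, here is a rough sketch of how a PCI driver could apply such a blob at runtime, assuming the endpoint already has an of_node and using the three-argument of_overlay_fdt_apply() as it existed at the time of this thread. The firmware file name and overall flow are illustrative assumptions, not the series' actual code.]

/* Hedged sketch: apply a DT overlay shipped as a firmware blob to describe
 * peripherals behind a PCI BAR. */
#include <linux/firmware.h>
#include <linux/of.h>
#include <linux/pci.h>

static int alveo_apply_overlay(struct pci_dev *pdev)
{
	const struct firmware *fw;
	int ovcs_id = 0;
	int ret;

	/* Overlay blob generated for the currently loaded FPGA image
	 * (file name is an assumption for illustration). */
	ret = request_firmware(&fw, "xilinx/alveo-u50-partition.dtbo", &pdev->dev);
	if (ret)
		return ret;

	/* Unflatten the blob and create platform devices for the IPs. */
	ret = of_overlay_fdt_apply(fw->data, fw->size, &ovcs_id);

	release_firmware(fw);
	return ret;
}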
On Wed, Sep 14, 2022 at 8:35 AM Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> wrote:
>
> On Mon, Aug 29, 2022 at 02:43:35PM -0700, Lizhi Hou wrote:
> > [...]
>
> Hi Lizhi,
>
> We all *love* "have you thought about xxx" questions but I would really like to get your thoughts on this. An approach to this problem that I have seen in various devices is to emulate a virtual pcie switch, and expose the "sub devices" behind that. That way you can carve up the BAR space, each device has its own config space and mapping of MSI-X vector to device becomes clear. This approach also integrates well with other kernel infrastructure (IOMMU, hotplug).
>
> This is certainly possible on reprogrammable devices but requires some more FPGA resources - though I don't believe the added utilization would be significant. What do you think of this kind of solution?

It would integrate easily unless the sub-devices you are targeting have drivers already which are not PCI drivers. Sure, we could add PCI support to them, but that could be a lot of churn. There are also usecases where we don't get to change the h/w.

Rob
On 8/29/22 16:43, Lizhi Hou wrote:
> This patch series introduces OF overlay support for PCI devices which primarily addresses two use cases. [...]
>
> Normally, the PCI core discovers PCI devices and their BARs using the PCI enumeration process. [...] This patch series proposes to use flattened device tree blob to describe the peripherals in a data driven way.

> Based on previous discussion, using device tree overlay is the best way to unflatten the blob and populate platform devices.

I still do not agree with this statement. The device tree overlay implementation is very incomplete and should not be used until it becomes more complete. No need to debate this right now, but I don't want to let this go unchallenged.

If there is no base system device tree on an ACPI based system, then I am not convinced that a mixed ACPI / device tree implementation is good architecture. I might be more supportive of using a device tree description of a PCI device in a detached device tree (not linked to the system device tree, but instead freestanding). Unfortunately the device tree functions assume a single system devicetree, with no concept of a freestanding tree (eg, if a NULL device tree node is provided to a function or macro, it often defaults to the root of the system device tree). I need to go look at whether the flag OF_DETACHED handles this, or if it could be leveraged to do so.

> To use device tree overlay, there are three obvious problems that need to be resolved.
>
> [...]
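[Editor's note: as background for the OF_DETACHED point above, here is a minimal sketch (not from the patch series) of unflattening a blob as a freestanding tree that is not linked under the system device tree. The wrapper function and its surrounding flow are illustrative assumptions; of_fdt_unflatten_tree() and the OF_DETACHED flag are existing kernel interfaces.]

/* Hedged sketch: unflatten a blob without attaching it to the system tree. */
#include <linux/of.h>
#include <linux/of_fdt.h>

static struct device_node *unflatten_detached(const unsigned long *blob)
{
	struct device_node *root = NULL;

	/* dad == NULL: the new nodes are not linked under the system tree. */
	of_fdt_unflatten_tree(blob, NULL, &root);
	if (!root)
		return NULL;

	/* The unflatten code marks such a root as detached. */
	WARN_ON(!of_node_check_flag(root, OF_DETACHED));
	return root;
}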
On 9/13/22 16:02, Lizhi Hou wrote:
> On 9/13/22 10:41, Frank Rowand wrote:
>> On 9/13/22 12:10, Lizhi Hou wrote:
>>> On 9/13/22 00:00, Frank Rowand wrote:
>>>> On 8/29/22 16:43, Lizhi Hou wrote:
>>>>> [...]
>>>>
>>>> [...]
>>> There are 2 series devices rely on this patch:
>>>
>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>
>>> 2) lan9662 PCIe card
>>>
>>> please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/
>> Thanks. Please include this information in future versions of the patch series.
>>
>> For device 2 I have strongly recommended using pre-boot apply of the overlay to the base device tree. I realize that this suggestion is only a partial solution if one wants to use hotplug to change system configuration (as opposed to using hotplug only to replace an existing device (eg a broken device) with another instance of the same device). I also realize that this increased the system administration overhead. On the other hand an overlay based solution is likely to be fragile and possibly flaky.
> Can you clarify the pre-boot apply approach? How will it work for PCI devices?
>
>>> For Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot.
>> I looked at the Xilinx Alveo web page, and there are a variety of types of Alveo cards available. So the answer to my next question may vary by type of card.
>>
>> Is it expected that the fpga program on a given card will change frequently (eg multiple times per day), where the changed program results in a new device that would require a different hardware description in the device tree?
>
> Different images may be loaded to a FPGA partition several times a day. The PCI topology (Device IDs, BARs, MSIx, etc) does not change. New IPs may appear (and old IPs may disappear) on the BARs when a new image is loaded. We would like to use flattened device tree to describe the IPs on the BARs.

That was kind of a non-answer. I know that images _may_ change at some frequency. I was trying to get a sense of whether the images were _likely_ to be changing on a frequent basis for these types of boards, or whether frequent image changes are likely to be a rare edge use case.

If there is a good design for the 99.999% use case that does not support the 0.001% use case then it may be better to not create an inferior design that also supports the 0.001% use case.

I hope that gives a better idea of the reason why I was asking the question and how the answer could impact design and implementation decisions.

As a point of reference, some other fpga users have indicated a desire to change images many times per second. The current driver and overlay architecture did not seem to me to be a good match to that use case (depending on the definition of "many").

-Frank
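[Editor's note: to connect the image-change frequency question to the overlay mechanics, a per-reprogram cycle would presumably look like the sketch below: remove the overlay describing the outgoing image, reprogram the FPGA partition, then apply the overlay for the incoming image. The function name and the source of new_fdt are illustrative assumptions; of_overlay_remove() and of_overlay_fdt_apply() are existing interfaces.]

/* Hedged sketch of a reconfigure cycle around an FPGA partition reload. */
#include <linux/of.h>

static int reload_partition_overlay(const void *new_fdt, u32 new_fdt_size,
				    int *ovcs_id)
{
	int ret;

	/* Tear down the platform devices created for the old image. */
	ret = of_overlay_remove(ovcs_id);
	if (ret)
		return ret;

	/* ... the FPGA partition is reprogrammed here by other code ... */

	/* Describe the IPs exposed by the new image. */
	return of_overlay_fdt_apply(new_fdt, new_fdt_size, ovcs_id);
}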
Frank,

On 9/16/22 7:23 PM, Frank Rowand wrote:
> On 9/13/22 16:02, Lizhi Hou wrote:
>> [...]
>>
>> Different images may be loaded to a FPGA partition several times a day. The PCI topology (Device IDs, BARs, MSIx, etc) does not change. New IPs may appear (and old IPs may disappear) on the BARs when a new image is loaded. We would like to use flattened device tree to describe the IPs on the BARs.
> That was kind of a non-answer. I know that images _may_ change at some frequency. I was trying to get a sense of whether the images were _likely_ to be changing on a frequent basis for these types of boards, or whether frequent image changes are likely to be a rare edge use case.
>
> If there is a good design for the 99.999% use case that does not support the 0.001% use case then it may be better to not create an inferior design that also supports the 0.001% use case.
>
> I hope that gives a better idea of the reason why I was asking the question and how the answer could impact design and implementation decisions.
>
> As a point of reference, some other fpga users have indicated a desire to change images many times per second. The current driver and overlay architecture did not seem to me to be a good match to that use case (depending on the definition of "many").

I would rather we cover the 99.999% now.

My understanding is that the subdevices are flexible but fairly static, and the frequency Lizhi mentions would cover development uses. In production I would expect the image to change about once a year, with the same order of magnitude as firmware.

Can you point me to a reference of a use case with high frequency image changes that also depends on the pci io device changing?

Tom

> -Frank
On 9/26/22 15:44, Rob Herring wrote:
> On Fri, Sep 16, 2022 at 6:15 PM Frank Rowand <frowand.list@gmail.com> wrote:
>> On 8/29/22 16:43, Lizhi Hou wrote:
>>> [...]
>>
>>> Based on previous discussion, using device tree overlay is the best way to unflatten the blob and populate platform devices.
>>
>> I still do not agree with this statement. The device tree overlay implementation is very incomplete and should not be used until it becomes more complete. No need to debate this right now, but I don't want to let this go unchallenged.
>
> Then we should remove overlay support. The only way it becomes more complete is having actual users.
>
> But really, whether this is the right solution to the problem is independent of the state of kernel overlay support.
>
>> If there is no base system device tree on an ACPI based system, then I am not convinced that a mixed ACPI / device tree implementation is good architecture.
>
> Most/all of this series is needed for a DT system in which the PCI devices are not populated in the DT.
>
>> I might be more supportive of using a device tree description of a PCI device in a detached device tree (not linked to the system device tree, but instead freestanding). Unfortunately the device tree functions assume a single system devicetree, with no concept of a freestanding tree (eg, if a NULL device tree node is provided to a function or macro, it often defaults to the root of the system device tree). I need to go look at whether the flag OF_DETACHED handles this, or if it could be leveraged to do so.
>
> Instead of worrying about a theoretical problem, we should see if there is an actual problem for a user.
>
> I'm not so worried about DT functions themselves, but places which have 'if ACPI ... else (DT) ...' paths.

Bringing this thread back into focus. Any thoughts on how to move forward?

-Sonal

> Rob
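[Editor's note: as an aside on the 'if ACPI ... else (DT)' concern, drivers that stay on the fwnode/device-property layer avoid such branches entirely, regardless of whether the node came from ACPI, the base DT, or an applied overlay. The property name below is purely an illustrative assumption.]

/* Hedged sketch: one call path for both ACPI and DT described devices. */
#include <linux/device.h>
#include <linux/property.h>

static int read_channel_count(struct device *dev, u32 *channels)
{
	/* No 'if (is_of_node(...)) ... else if (is_acpi_node(...)) ...' needed. */
	return device_property_read_u32(dev, "xlnx,num-channels", channels);
}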
On 10/6/22 08:10, Rob Herring wrote:
> On Fri, Sep 30, 2022 at 2:29 PM Sonal Santan <sonal.santan@amd.com> wrote:
>> On 9/26/22 15:44, Rob Herring wrote:
>>> [...]
>>
>> Bringing this thread back into focus. Any thoughts on how to move forward?
>
> Reviewers raise concerns/issues and the submitters work to address them or explain why they aren't an issue. The submitter has to push things forward. That's how the process works.

We are working on updating the patch set to address the feedback. The design is still based on the device tree overlay infrastructure.

> As I noted, much of this is needed on a DT system with PCI device not described in DT. So you could split out any ACPI system support to avoid that concern for example. Enabling others to exercise these patches may help too. Perhaps use QEMU to create some imaginary device.

To verify this patch set, in addition to an x86_64/ACPI based system, we also have an AARCH64/DT QEMU setup where we have attached a physical Alveo device. We haven't run into any ACPI or DTO issues so far.

Perhaps we can introduce this feature in a phased manner where we first enable DT based platforms and then enable ACPI based platforms?

-Sonal

> Rob
On Thu, Oct 13, 2022 at 12:28 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 10/13/22 03:02, Clément Léger wrote:
> > Le Thu, 13 Oct 2022 01:05:26 -0500,
> > Frank Rowand <frowand.list@gmail.com> wrote:
> >
> >>> This would also require two different descriptions of the same card (for ACPI and device-tree) and would require the final user to create a specific overlay for its device based on the PCI slots the card is plugged in.
> >>
> >> One of the many missing pieces of overlay support. There have been several discussions of how to describe a "socket" in a device tree that a device could be plugged into, where a single device tree subtree .dtb could be relocated to one or more different socket locations. Thus in this case a single overlay could be relocated to various PCI slots.
> >>
> >> I don't expect to be getting involved in any future efforts around sockets (see my following comment for why).
> >>
> >>> The solution we proposed (Lizhi and I) allows to overcome these problems and is way easier to use. Fixing the potential bugs that might exist in the overlay layer seems a way better idea than just pushing
> >>
> >> It is not potential bugs. The current run time overlay implementation is proof of concept quality and completeness. It is not production ready.
> >>
> >> I got an opportunity for early retirement a couple of weeks ago. My first inclination was to continue the same level of device tree maintainership, but I am quickly realizing that there are other activities that I would like to devote my time and energy to. I will continue to support Rob with minor patch reviews and testing, and potentially finishing up some improvements to unittest. On the other hand, bringing run time overlay support to product quality would be a major investment of my time that I am not willing to continue.
> >
> > Hi Frank,
> >
> > This explains your position on the overlay support and I can certainly understand it! Regarding the fact that it would enter
>
> No, my position on the technical aspects of overlay support is totally unchanged.
>
> The only thing that has changed is that my time will not be available to assist in future overlay related work. The burden for this will fall more on Rob than it has in the past.

s/Rob/someone that steps up to maintain the overlay code/

> > "production", the devices we are talking about are not really widespread yet? This would be a good opportunity to gather feedback early and improve the support gradually. We could probably even be able to support improvements in the overlay code if needed I guess.
>
> That is avoiding my point about the current implementation being proof of concept.

I think it would be better to talk in terms of under what conditions the overlay support is adequate (for production) rather than a blanket statement that it is not production ready. A large part of it is really outside the code itself and related to going from static to dynamic DT. There are certainly issues, but dynamic DTs have been used in production for a very long time. However, that usage has been constrained.

Rob
On 10/14/22 12:33, Rob Herring wrote:
> On Thu, Oct 13, 2022 at 12:28 PM Frank Rowand <frowand.list@gmail.com> wrote:
>> On 10/13/22 03:02, Clément Léger wrote:
>>> [...]
>>
>> That is avoiding my point about the current implementation being proof of concept.
>
> I think it would be better to talk in terms of under what conditions the overlay support is adequate (for production) rather than a blanket statement that it is not-production ready.

I sort of agree. Use of run time overlays has been narrowly supported for use by a limited set of very cautious developers in a very constrained usage.

> A large part of it is really outside the code itself and related to going from static to dynamic DT. There are certainly issues, but dynamic DTs have been used in production for a very long time. However, that usage has been constrained.

Yes, to the dynamic DT comments.

When the run time overlay code was added, the overlay code used the existing dynamic DT code as a foundation but did not address the architectural issues that are exposed by using the dynamic DT code in a less constrained manner.

> Rob
On 9/25/22 22:03, Sonal Santan wrote: > On 9/19/22 20:12, Frank Rowand wrote: >> On 9/17/22 13:36, Tom Rix wrote: >>> Frank, >>> >>> On 9/16/22 7:23 PM, Frank Rowand wrote: >>>> On 9/13/22 16:02, Lizhi Hou wrote: >>>>> On 9/13/22 10:41, Frank Rowand wrote: >>>>>> On 9/13/22 12:10, Lizhi Hou wrote: >>>>>>> On 9/13/22 00:00, Frank Rowand wrote: >>>>>>>> On 8/29/22 16:43, Lizhi Hou wrote: >>>>>>>>> This patch series introduces OF overlay support for PCI devices which >>>>>>>>> primarily addresses two use cases. First, it provides a data driven method >>>>>>>>> to describe hardware peripherals that are present in a PCI endpoint and >>>>>>>>> hence can be accessed by the PCI host. An example device is Xilinx/AMD >>>>>>>>> Alveo PCIe accelerators. Second, it allows reuse of a OF compatible >>>>>>>>> driver -- often used in SoC platforms -- in a PCI host based system. An >>>>>>>>> example device is Microchip LAN9662 Ethernet Controller. >>>>>>>>> >>>>>>>>> This patch series consolidates previous efforts to define such an >>>>>>>>> infrastructure: >>>>>>>>> https://lore.kernel.org/lkml/20220305052304.726050-1-lizhi.hou@xilinx.com/ >>>>>>>>> https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ >>>>>>>>> >>>>>>>>> Normally, the PCI core discovers PCI devices and their BARs using the >>>>>>>>> PCI enumeration process. However, the process does not provide a way to >>>>>>>>> discover the hardware peripherals that are present in a PCI device, and >>>>>>>>> which can be accessed through the PCI BARs. Also, the enumeration process >>>>>>>>> does not provide a way to associate MSI-X vectors of a PCI device with the >>>>>>>>> hardware peripherals that are present in the device. PCI device drivers >>>>>>>>> often use header files to describe the hardware peripherals and their >>>>>>>>> resources as there is no standard data driven way to do so. This patch >>>>>>>>> series proposes to use flattened device tree blob to describe the >>>>>>>>> peripherals in a data driven way. Based on previous discussion, using >>>>>>>>> device tree overlay is the best way to unflatten the blob and populate >>>>>>>>> platform devices. To use device tree overlay, there are three obvious >>>>>>>>> problems that need to be resolved. >>>>>>>>> >>>>>>>>> First, we need to create a base tree for non-DT system such as x86_64. A >>>>>>>>> patch series has been submitted for this: >>>>>>>>> https://lore.kernel.org/lkml/20220624034327.2542112-1-frowand.list@gmail.com/ >>>>>>>>> https://lore.kernel.org/lkml/20220216050056.311496-1-lizhi.hou@xilinx.com/ >>>>>>>>> >>>>>>>>> Second, a device tree node corresponding to the PCI endpoint is required >>>>>>>>> for overlaying the flattened device tree blob for that PCI endpoint. >>>>>>>>> Because PCI is a self-discoverable bus, a device tree node is usually not >>>>>>>>> created for PCI devices. This series adds support to generate a device >>>>>>>>> tree node for a PCI device which advertises itself using PCI quirks >>>>>>>>> infrastructure. >>>>>>>>> >>>>>>>>> Third, we need to generate device tree nodes for PCI bridges since a child >>>>>>>>> PCI endpoint may choose to have a device tree node created. >>>>>>>>> >>>>>>>>> This patch series is made up of two patches. >>>>>>>>> >>>>>>>>> The first patch is adding OF interface to allocate an OF node. It is copied >>>>>>>>> from: >>>>>>>>> https://lore.kernel.org/lkml/20220620104123.341054-5-clement.leger@bootlin.com/ >>>>>>>>> >>>>>>>>> The second patch introduces a kernel option, CONFIG_PCI_OF. 
When the option >>>>>>>>> is turned on, the kernel will generate device tree nodes for all PCI >>>>>>>>> bridges unconditionally. The patch also shows how to use the PCI quirks >>>>>>>>> infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device tree node for >>>>>>>>> a device. Specifically, the patch generates a device tree node for Xilinx >>>>>>>>> Alveo U50 PCIe accelerator device. The generated device tree nodes do not >>>>>>>>> have any property. Future patches will add the necessary properties. >>>>>>>>> >>>>>>>>> Clément Léger (1): >>>>>>>>> of: dynamic: add of_node_alloc() >>>>>>>>> >>>>>>>>> Lizhi Hou (1): >>>>>>>>> pci: create device tree node for selected devices >>>>>>>>> >>>>>>>>> drivers/of/dynamic.c | 50 +++++++++++++---- >>>>>>>>> drivers/pci/Kconfig | 11 ++++ >>>>>>>>> drivers/pci/bus.c | 2 + >>>>>>>>> drivers/pci/msi/irqdomain.c | 6 +- >>>>>>>>> drivers/pci/of.c | 106 ++++++++++++++++++++++++++++++++++++ >>>>>>>>> drivers/pci/pci-driver.c | 3 +- >>>>>>>>> drivers/pci/pci.h | 16 ++++++ >>>>>>>>> drivers/pci/quirks.c | 11 ++++ >>>>>>>>> drivers/pci/remove.c | 1 + >>>>>>>>> include/linux/of.h | 7 +++ >>>>>>>>> 10 files changed, 200 insertions(+), 13 deletions(-) >>>>>>>>> >>>>>>>> The patch description leaves out the most important piece of information. >>>>>>>> >>>>>>>> The device located at the PCI endpoint is implemented via FPGA >>>>>>>> - which is programmed after Linux boots (or somewhere late in the boot process) >>>>>>>> - (A) and thus can not be described by static data available pre-boot because >>>>>>>> it is dynamic (and the FPGA program will often change while the Linux >>>>>>>> kernel is already booted >>>>>>>> - (B) can be described by static data available pre-boot because the FPGA >>>>>>>> program will always be the same for this device on this system >>>>>>>> >>>>>>>> I am not positive what part of what I wrote above is correct and would appreciate >>>>>>>> some confirmation of what is correct or incorrect. >>>>>>> There are 2 series devices rely on this patch: >>>>>>> >>>>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device) >>>>>>> >>>>>>> 2) lan9662 PCIe card >>>>>>> >>>>>>> please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ >>>>>> Thanks. Please include this information in future versions of the patch series. >>>>>> >>>>>> For device 2 I have strongly recommended using pre-boot apply of the overlay to the base >>>>>> device tree. I realize that this suggestion is only a partial solution if one wants to >>>>>> use hotplug to change system configuration (as opposed to using hotplug only to replace >>>>>> an existing device (eg a broken device) with another instance of the same device). I >>>>>> also realize that this increased the system administration overhead. On the other hand >>>>>> an overlay based solution is likely to be fragile and possibly flaky. >>>>> Can you clarify the pre-boot apply approach? How will it work for PCI devices? >>>>>>> For Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot. >>>>>> I looked at the Xilinx Alveo web page, and there are a variety of types of Alveo cards >>>>>> available. So the answer to my next question may vary by type of card. >>>>>> >>>>>> Is it expected that the fpga program on a given card will change frequently (eg multiple >>>>>> times per day), where the changed program results in a new device that would require a >>>>>> different hardware description in the device tree? 
>>>>> Different images may be loaded to a FPGA partition several times a >>>>> day. The PCI topology (Device IDs, BARs, MSIx, etc) does not change. >>>>> New IPs may appear (and old IPs may disappear) on the BARs when a new >>>>> image is loaded. We would like to use flattened device tree to >>>>> describe the IPs on the BARs. >>>> That was kind of a non-answer. I know that images _may_ change at >>>> some frequency. I was trying to get a sense of whether the images >>>> were _likely_ to be changing on a frequent basis for these types >>>> of boards, or whether frequent image changes are likely to be a >>>> rare edge use case. >>>> >>>> If there is a good design for the 99.999% use case that does not >>>> support the 0.001% use case then it may be better to not create >>>> an inferior design that also supports the 0.001% use case. >>>> >>>> I hope that gives a better idea of the reason why I was asking the >>>> question and how the answer could impact design and implementation >>>> decisions. >>>> >>>> As a point of reference, some other fpga users have indicated a >>>> desire to change images many times per second. The current driver >>>> and overlay architecture did not seem to me to be a good match to >>>> that use case (depending on the definition of "many"). >>> >>> I would rather we cover 99.999% now. >>> >>> My understanding is that the subdevices are flexible but fairly >>> static and the frequency Lizhi mentions would cover development >>> uses. >>> >>> In production I would expect the image to change about once a year >>> with the same order of magnitude as firmware. >> >> Thanks for this info, it helps a lot. >> >>> >>> Can you point me to a reference of a user case with high frequency >>> images changing that also depends on pci io device changing? >> >> I actually don't have references to any previous PCI devices that are >> based on FPGAs, let alone with a high frequency of images changing. >> >> The Alveo devices are the first such devices that have come to my >> attention. Note that this is a technology space that I do not >> follow, so my lack of awareness does not mean much. >> >> I do not remember the specific discussion that was asserting or >> desiring a high frequency of image changes for an FPGA. The >> current overlay architecture and overall device tree architecture >> would not handle this well and/or robustly because (off the top of >> my head, hopefully I'm getting this correct) the live system device >> tree does not directly contain all of the associated data - some of >> it is contained in the unflattened device tree (FDT) that remains in >> memory after unflattening, both in the case of the base system device >> tree and overlay device trees. Some of the device tree data APIs return >> pointers to this data in the FDT. And the API does not provide reference >> counting for the data (just reference counting for nodes - and these >> reference counts are know to be frequently incorrect). >> > Thanks for pointing out the limitations of the current overlay > architecture. Can a careful orchestration of overlay creation and > tear down by each driver address the limitation? No, that is not practical (for example, see the never ending patches to address broken refcounting -- of_node_get() and of_node_put() usage). Plus the overlay data in the system device tree is visible to every driver and subsystem, not just the limited set that appear to be related to the nodes in the overlay. 
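[ The refcounting referred to here is the of_node_get()/of_node_put() pairing that callers of the OF APIs are expected to keep balanced. A minimal sketch of the expected pattern, with an illustrative node path: ]

  #include <linux/errno.h>
  #include <linux/of.h>

  static int demo_count_children(void)
  {
          struct device_node *parent, *child;
          int n = 0;

          /* of_find_node_by_path() returns the node with its refcount raised. */
          parent = of_find_node_by_path("/soc");
          if (!parent)
                  return -ENODEV;

          /*
           * for_each_child_of_node() balances the per-child references itself,
           * but breaking out of the loop early would require of_node_put(child).
           */
          for_each_child_of_node(parent, child)
                  n++;

          /* Drop the reference taken by of_find_node_by_path(). */
          of_node_put(parent);
          return n;
  }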
> I did see another > user, drivers/pci/hotplug/pnv_php.c, which seems to be using the > overlay infrastructure in this manner. What tree is that in? And what sections of that file? > > What is your suggestion to move forward? https://elinux.org/Frank%27s_Evolving_Overlay_Thoughts has a large but incomplete list of areas to be worked on. -Frank > > -Sonal > >> In general I have very little visibility into the FPGA space so I go >> out of my way to notify them before making changes to the overlay >> implementation, API, etc; listen carefully to their input; and give >> them lots of opportunity to test any resulting changes. >> >> -Frank >> >>> >>> Tom >>> >>>> -Frank >>>> >>>>> Thanks, >>>>> >>>>> Lizhi >>>>> >>>>>> Or is the fpga program expected to change on an infrequent basis (eg monthly, quarterly, >>>>>> annually), in the same way as device firmware and operating systems are updated on a regular >>>>>> basis for bug fixes and new functionality? >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Lzhi >>>>>>> >>>>>>>> -Frank >>> >> >
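[ The quirk hook mentioned in the cover letter boils down to registering a fixup that runs after enumeration for a specific vendor/device pair. A sketch of the shape of such a quirk; the device ID and the node-creation helper are placeholders, not the series' final code: ]

  #include <linux/pci.h>

  /*
   * Placeholder for the helper that would build a device tree node for the
   * device and attach it as pdev->dev.of_node.
   */
  static void demo_create_of_node(struct pci_dev *pdev)
  {
          /* allocate node, add properties, attach to pdev->dev.of_node ... */
  }

  /*
   * Run the hook at FIXUP_FINAL time for the selected device
   * (0x5020 is an illustrative device ID, not necessarily the Alveo U50's).
   */
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_XILINX, 0x5020, demo_create_of_node);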
On 10/17/22 00:18, Clément Léger wrote: > Le Fri, 14 Oct 2022 13:52:50 -0500, > Frank Rowand <frowand.list@gmail.com> a écrit : > >> On 10/14/22 12:33, Rob Herring wrote: >>> On Thu, Oct 13, 2022 at 12:28 PM Frank Rowand <frowand.list@gmail.com> wrote: >>>> >>>> On 10/13/22 03:02, Clément Léger wrote: >>>>> Le Thu, 13 Oct 2022 01:05:26 -0500, >>>>> Frank Rowand <frowand.list@gmail.com> a écrit : >>>>> >>>>>>> This would also require two different descriptions of the same card >>>>>>> (for ACPI and device-tree) and would require the final user to create a >>>>>>> specific overlay for its device based on the PCI slots the card is >>>>>>> plugged in. >>>>>> >>>>>> One of the many missing pieces of overlay support. There have been several >>>>>> discussion of how to describe a "socket" in a device tree that a device >>>>>> could be plugged into, where a single device tree subtree .dtb could be >>>>>> relocated to one or more different socket locations. Thus in this >>>>>> case a single overlay could be relocated to various PCI slots. >>>>>> >>>>>> I don't expect be getting involved in any future efforts around sockets >>>>>> (see my following comment for why). >>>>>> >>>>>>> >>>>>>> The solution we proposed (Lizhi and I) allows to overcome these >>>>>>> problems and is way easier to use. Fixing the potential bugs that might >>>>>>> exists in the overlay layer seems a way better idea that just pushing >>>>>> >>>>>> It is not potential bugs. The current run time overlay implementation is >>>>>> proof of concept quality and completeness. It is not production ready. >>>>>> >>>>>> I got an opportunity for early retirement a couple of weeks ago. My first >>>>>> inclination was to continue the same level of device tree maintainership, >>>>>> but I am quickly realizing that there are other activities that I would >>>>>> like to devote my time and energy to. I will continue to support Rob with >>>>>> minor patch reviews and testing, and potentially finishing up some >>>>>> improvements to unittest. On the other hand, bringing run time overlay >>>>>> support to product quality would be a major investment of my time that I >>>>>> am not willing to continue. >>>>> >>>>> Hi Frank, >>>>> >>>>> This explains your position on the overlay support and I can >>>>> certainly understand it ! Regarding the fact that it would enter >>>> >>>> No, my position on the technical aspects of overlay support is totally >>>> unchanged. >>>> >>>> The only thing that has changed is that my time will not be available to >>>> assist in future overlay related work. The burden for this will fall >>>> more on Rob than it has in the past. >>> >>> s/Rob/someone that steps up to maintain the overlay code/ >>> >>>>> "production", the devices we are talking about are not really >>>>> widespread yet? This would be a good opportunity to gather feedback >>>>> early and improve the support gradually. We could probably even be able >>>>> to support improvements in the overlay code if needed I guess. >>>> >>>> That is avoiding my point about the current implementation being >>>> proof of concept. >>> >> >> >>> I think it would be better to talk in terms of under what conditions >>> the overlay support is adequate (for production) rather than a blanket >>> statement that it is not-production ready. >> >> I sort of agree. Use of run time overlays has been narrowly supported >> for use by a limited set of very cautious developers in a very constrained >> usage. 
> > As a first working point, could we potentially restrict drivers to only > insert an overlay but not remove it ? It would be quite limited, but > as you pointed out, the multiple load/unload (or FPGA reconfiguration) > will only happen during development. Under "normal" condition, we could > expect the FPGA to be configured once during the system runtime. The > same goes for our PCI card which uses an existing SoC, we can probably > assume that it is going to be plugged once for all during the system > runtime. > > This would limit the problems that might happen due to dynamic > insertion/removal of the overlay. > We would need "limited" overlay removal support to handle driver unload or device hotplug. Limited removal support will also be needed for the Alveo use case in order to handle FPGA reconfiguration in a production environment. -Sonal
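[ A sketch of what "limited" removal could look like from a driver's perspective: the changeset id returned by of_overlay_fdt_apply() is stored at probe time and the overlay is only torn down on driver unbind or hot-unplug. Names and structure are illustrative: ]

  #include <linux/of.h>
  #include <linux/pci.h>

  struct demo_priv {
          int ovcs_id;    /* changeset id from of_overlay_fdt_apply(), 0 if none */
  };

  static void demo_pci_remove(struct pci_dev *pdev)
  {
          struct demo_priv *priv = pci_get_drvdata(pdev);

          /* The only point where the overlay is removed: unbind or hot-unplug. */
          if (priv && priv->ovcs_id)
                  of_overlay_remove(&priv->ovcs_id);
  }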