
[v7,00/12] dmaengine/soc: Add Texas Instruments UDMA support

Message ID 20191209094332.4047-1-peter.ujfalusi@ti.com

Message

Peter Ujfalusi Dec. 9, 2019, 9:43 a.m. UTC
Hi,

Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
upstream over a month ago:
https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/

I'm still waiting on its fate (Tero has given his r-b).
The ti_sci patch did not make it into 5.5-rc1, but I have included it in the series to
let the maintainers decide whether it can go via DMAengine for 5.6 or in later
releases (5.6 probably for the ti_sci patch and 5.7 for the UDMA driver patch).

Changes since v6:
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)

- UDMAP DMAengine driver:
 - Squashed the split patches
 - Squashed the early TX completion handling update
   (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
 - Hard reset fix for RX channels to avoid channel lockdown
 - Correct completed descriptor's residue value

Changes since v5:
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=201051&state=*)
- Based on 5.4

- cppi5 header
 - clear the bits before setting new value with '|='

- UDMAP DT bindings:
 - valid compatibles as single enum list

- UDMAP DMAengine driver:
 - Fix udma_is_chan_running()
 - Use flags for acc32, burst support instead of a bool in udma_match_data
   struct
 - TDTYPE handling (teardown completion handling for j721e) is moved to a separate
   patch as the tisci core patch has not moved for over a month.
   Both the ti_sci patch and the follow-up patch to udma are included in the series.
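To illustrate the TDTYPE split, here is a minimal C sketch of how a caller could request the j721e "wait for peer" teardown through the new ti_sci field. The two define values are copied from the ti_sci patch in this series; the reduced struct, its field name valid_params, and the tx_cfg_set_tdtype() helper are hypothetical stand-ins, not the actual driver code. It assumes the firmware ignores optional fields whose valid bit is clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Value taken from the ti_sci patch in this series. */
#define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID  BIT(15)

/*
 * Hypothetical, reduced stand-in for struct ti_sci_msg_rm_udmap_tx_ch_cfg:
 * only the valid-bit word and the new tx_tdtype field are kept.
 */
struct tx_ch_cfg_sketch {
	uint32_t valid_params;
	uint8_t tx_tdtype;	/* 0: return immediately, 1: wait for peer */
};

/*
 * Hypothetical helper: only SoCs with TDTYPE support (j721e) set the valid
 * bit; elsewhere the field is left untouched and flagged as not valid.
 */
static void tx_cfg_set_tdtype(struct tx_ch_cfg_sketch *cfg,
			      int soc_has_tdtype, int wait_for_peer)
{
	assert(cfg != NULL);
	if (!soc_has_tdtype)
		return;
	cfg->tx_tdtype = wait_for_peer ? 1 : 0;
	cfg->valid_params |= TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID;
}
```

Keeping the valid bit clear on am654 lets the same configuration path run on both SoCs without sending a parameter the older firmware does not know about.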

Changes since v4
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=196619&state=*)
- Based on 5.4-rc7

- ringacc DT bindings:
 - clarify the meaning of ti,sci-dev-id

- ringacc driver:
 - Remove 'default y' from Kconfig
 - Fix struct comments
 - Move try_module_get() earlier in k3_ringacc_request_ring()

- PSI-L thread database:
 - Add kernel style struct/enum documentation
 - Add missing thread description for sa2ul second interface
 - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL

- UDMAP DT bindings:
 - move to dual license
 - change compatible from const to enum
 - items dropped for ti,sci-rm-ranges-*
 - description text moved out of the literal block where it makes sense
 - example fixed to compile cleanly
  - added parent to provide correct address-cells
  - navss is moved to simple-mfd from simple-bus

- UDMAP DMAengine driver:
 - move fd_ring/r_ring under rflow
 - get rid of unused iomem for rflows
 - Remove 'default y' from Kconfig
 - Use defines for rflow src/dst tag selection
 - Merge the udma_ring_callback() and udma_tr_event_callback() to their
   corresponding interrupt handler
 - Create new defines for tx/rx channel's tisci valid parameter flags
 - Remove re-initialization to 0 of tisci request struct members
 - Make sure that vchan tasklets are also stopped when removing the module
 - Additional checkpatch --strict fixes when it made sense
  - make W=1 was clean

- UDMAP glue layer:
 - Remove 'default y' from Kconfig
 - commit message update for features needing the glue layer

Changes since v3
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=180679&state=*):
- Based on 5.4-rc5
- Fixed typos pointed out by Tero
- Added Reviewed-by tags from Tero

- ring accelerator driver
 - TODO_GS is removed from the header
 - pm_runtime removed as NAVSS and its components are always on
 - Check validity of Message mode setup (element size > 8 bytes must use proxy)

- cppi5 header
 - add commit message

- UDMAP DT bindings
 - Drop the use of the psil-config node on the remote PSI-L side and use a single
   cell, the remote thread ID:

     dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>;
     dma-names = "tx", "rx";

 - The PSI-L thread configuration description is moved to kernel as a new module:
   k3-psil/k3-psil-am654/k3-psil-j721e
 - ti,psil-base has been removed from DT and moved into the kernel
 - removed the no longer needed dt-bindings/dma/k3-udma.h
 - Convert the document to schema (yaml)

- NEW PSI-L endpoint configuration database
 - a simple database holding the remote end's configuration needed for UDMAP
   configuration. All previous parameters from DT have been moved here and merged
   with the Linux-only TR mode channel flag.
 - Client drivers can update the remote endpoint configuration as it can be
   different based on system configuration and the endpoint itself is under the
   control of the peripheral driver.
 - database for am654 and j721e

- UDMAP DMAengine driver
 - pm_runtime removed as NAVSS and its components are always on
 - rchan_oes_offset added to MSI domain allocation
 - Use the new PSI-L endpoint database for UDMAP configuration
 - Support for waiting for PDMA teardown completion on j721e instead of
   returning right away. Depends on:
   https://lkml.org/lkml/2019/10/25/189
   Not included in this series, but it is in the branch I have prepared.
 - psil-base is moved from DT to be part of udma_match_data
 - tr_thread maps are removed; the PSI-L endpoint configuration is used instead

- UDMAP glue layer
 - pm_runtime removed as NAVSS and its components are always on
 - Use the new PSI-L endpoint database for UDMAP configuration

Changes since v2
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=152609&state=*)
- Based on 5.4-rc1
- Support for flow-only data transfer for the glue layer

- cppi5 header
 - comments converted to kernel-doc style
 - Remove the excessive WARN_ONs and rely on the user for sanity
 - new macro for checking TearDown Completion Message

- ring accelerator driver
 - fixed up the commit message (SoB, TI-SCI)
 - fixed ring reset
 - CONFIG_TI_K3_RINGACC_DEBUG is removed along with the dbg_write/read functions;
   dev_dbg() is used instead
 - k3_ringacc_ring_dump() is made static
 - step numbering removed from k3_ringacc_ring_reset_dma()
 - Add clarification comment for shared ring usage in k3_ringacc_ring_cfg()
 - Magic shift values in k3_ringacc_ring_cfg_proxy() are replaced by defines
 - K3_RINGACC_RING_MODE_QM is removed as it is not supported

- UDMAP DT bindings
 - Fix property prefixing: s/pdma,/ti,pdma-
 - Add ti,notdpkt property to suppress teardown completion message on tchan
 - example updated accordingly

- UDMAP DMAengine driver
 - Change __raw_readl/writel to readl/writel
 - Split up the udma_tisci_channel_config() into m2m, tx and rx tisci
   configuration functions for clarity
 - DT bindings change: s/pdma,/ti,pdma-
 - Cleanup of udma_tx_status():
  - residue calculation fix for m2m
  - no need to read packet counter as it is not used
  - peer byte counter only available in PDMAs
  - Proper locking to avoid race with interrupt handler (polled m2m fix)
 - Support for ti,notdpkt
 - RFLOW management rework to support data movement without channel:
  - the channel is not controlled by Linux but by another core, and we only have
    rflows and rings to do the DMA transfers.
    This mode is only supported by the Glue layer for now.

- UDMAP glue layer
 - Debug print improvements
 - Support for rflow/ring only data movement

Changes since v1
(https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
- Added support for j721e
- Based on 5.3-rc2
- dropped ti_sci API patch for RM management as it is already upstream
- dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
- Added Rob's Reviewed-by to ringacc DT binding document patch
- DT bindings changes:
 - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
   TR channels.
 - Support for j721e
- Fix bug in of_node_put() handling in xlate function

Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
- Based on linux-next (20190506), which now has the ti_sci interrupt support
- The series can be applied and the UDMA via DMAengine API will be functional
- Included in the series: ti_sci Resource management API, cppi5 header and
  driver for the ring accelerator.
- The DMAengine core patches have been updated as per the review comments for
  the earlier submission.
- The DMAengine driver patch is artificially split up into 6 smaller patches

The k3-udma driver implements the Data Movement Architecture described in
AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
j721e TRM (http://www.ti.com/lit/pdf/spruil1)

This DMA architecture is a big departure from 'traditional' architecture where
we had either EDMA or sDMA as system DMA.

Packet DMAs were used as dedicated DMAs to service only networking (Keystone2)
or USB (am335x), while other peripherals were serviced by EDMA.

In AM65x/j721e the UDMA (Unified DMA) is used for all data movement within the
SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc).

The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
TR mode (similar to EDMA descriptor).
The data movement is done within a PSI-L fabric; peripherals (including the
UDMA-P) are not addressed by their I/O registers as with traditional DMAs but
by their PSI-L thread IDs.

In AM65x/j721e we have two main types of peripherals:
Legacy: McASP, McSPI, UART, etc.
 to provide connectivity they are serviced by PDMA (Peripheral DMA)
 PDMA threads are locked to service a given peripheral, for example PSI-L threads
 0x4400/0xc400 service McASP0 rx/tx.
 The PDMA configuration can be done via the UDMA Real Time Peer registers.
Native: Networking, security accelerator
 these peripherals have native support for PSI-L.
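The McASP0 pair above differs only in bit 15 (0xc400 ^ 0x4400 == 0x8000), which suggests that bit marks a destination thread. The C sketch below assumes exactly that and checks the source/destination pairing direction; the flag name, struct psil_pair and the helpers are hypothetical, and the UDMA-side thread value used in the check is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: bit 15 of a PSI-L thread ID marks a destination thread.
 * Inferred from the McASP0 example (rx 0x4400, tx 0xc400).
 */
#define PSIL_DST_THREAD_FLAG 0x8000u

static bool psil_is_dst_thread(uint16_t thread_id)
{
	return (thread_id & PSIL_DST_THREAD_FLAG) != 0;
}

/* Hypothetical pairing of two PSI-L threads for one channel. */
struct psil_pair {
	uint16_t src_thread;
	uint16_t dst_thread;
};

/* Data flows from a source thread into a destination thread. */
static bool psil_pair_is_valid(const struct psil_pair *p)
{
	return !psil_is_dst_thread(p->src_thread) &&
	       psil_is_dst_thread(p->dst_thread);
}
```

For McASP0 TX, the UDMA tchan's own source thread would be paired with remote destination thread 0xc400; for RX, remote source thread 0x4400 feeds an rchan destination thread.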

To be able to use the DMA, the following generic steps need to be taken:
- configure a DMA channel (tchan for TX, rchan for RX)
 - channel mode: Packet or TR mode
 - for memcpy a tchan and rchan pair is used.
 - for packet mode RX we also need to configure a receive flow to configure the
   packet reception
- the source and destination threads must be paired
- at minimum one pair of rings needs to be configured:
 - tx: transfer ring and transfer completion ring
 - rx: free descriptor ring and receive ring
- two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
 - If the channel is in packet mode or configured for memcpy then we only need
   one interrupt from the ring; events from UDMAP are not used.
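The steps above amount to a small decision table. The C sketch below encodes it as a pure function returning which resources a channel needs; the enum, struct and udma_needs_for() names are hypothetical, and the code only models the rules listed above without touching hardware or the real k3-udma driver.

```c
#include <assert.h>
#include <stdbool.h>

enum xfer_dir { XFER_MEM_TO_DEV, XFER_DEV_TO_MEM, XFER_MEM_TO_MEM };
enum chan_mode { CHAN_PACKET_MODE, CHAN_TR_MODE };

/* Resources a channel needs, following the generic steps. */
struct udma_needs {
	bool tchan;
	bool rchan;
	bool rflow;		/* receive flow: packet mode RX only */
	int min_rings;		/* at minimum one pair of rings */
	int irqs;		/* ring IRQ, plus UDMA-P channel IRQ if needed */
};

static struct udma_needs udma_needs_for(enum xfer_dir dir, enum chan_mode mode)
{
	struct udma_needs n = { .min_rings = 2 };

	switch (dir) {
	case XFER_MEM_TO_DEV:	/* t_ring + tc_ring */
		n.tchan = true;
		break;
	case XFER_DEV_TO_MEM:	/* fd_ring + r_ring */
		n.rchan = true;
		n.rflow = (mode == CHAN_PACKET_MODE);
		break;
	case XFER_MEM_TO_MEM:	/* memcpy uses a tchan/rchan pair */
		n.tchan = true;
		n.rchan = true;
		break;
	default:
		assert(0 && "unknown direction");
	}

	/* Packet mode and memcpy only need the ring interrupt. */
	n.irqs = (mode == CHAN_PACKET_MODE || dir == XFER_MEM_TO_MEM) ? 1 : 2;
	return n;
}
```

For example, a packet mode RX channel needs an rchan, a receive flow and a single ring interrupt, while a TR mode TX channel also needs the UDMA-P channel interrupt.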

When the channel setup is completed we only interact with the rings:
- TX: push a descriptor to the t_ring and wait for the UDMA-P to push it to the
  tc_ring
- RX: push a descriptor to the fd_ring and wait for the UDMA-P to push it back to
  the r_ring.
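To make the ring hand-off concrete, here is a self-contained C model in which each ring is a small FIFO of descriptor IDs and a stand-in function plays the UDMA-P role by moving a descriptor from one ring to the other. This is not the k3-ringacc API; all names are hypothetical and no data is actually transferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical ring model: fixed-size FIFO of descriptor IDs. */
struct ring {
	unsigned int buf[RING_SIZE];
	size_t head, tail, count;
};

static bool ring_push(struct ring *r, unsigned int desc)
{
	if (r->count == RING_SIZE)
		return false;
	r->buf[r->tail] = desc;
	r->tail = (r->tail + 1) % RING_SIZE;
	r->count++;
	return true;
}

static bool ring_pop(struct ring *r, unsigned int *desc)
{
	if (r->count == 0)
		return false;
	*desc = r->buf[r->head];
	r->head = (r->head + 1) % RING_SIZE;
	r->count--;
	return true;
}

/*
 * Stand-in for the UDMA-P: take one descriptor from 'from' and hand it
 * back on 'to' (t_ring -> tc_ring for TX, fd_ring -> r_ring for RX).
 */
static bool fake_udmap_service(struct ring *from, struct ring *to)
{
	unsigned int desc;

	if (to->count == RING_SIZE || !ring_pop(from, &desc))
		return false;
	/* Real hardware would move the data described by 'desc' here. */
	return ring_push(to, desc);
}
```

On TX the host pushes to t_ring and later pops the same descriptor from tc_ring; on RX it pre-fills fd_ring with free descriptors and pops filled ones from r_ring.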

Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA), which was not the
case with previous DMAs, we need to report the amount of data held in these FIFOs
to clients (delay calculation for ALSA, UART FIFO flush support).
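A rough C sketch of the bookkeeping this implies, assuming a per-transfer snapshot of a channel byte counter (bytes fetched from memory) and a peer byte counter (bytes delivered by the PDMA; the v2 changelog notes the peer counter is only available in PDMAs). The struct and function names are hypothetical, and counter wrap-around is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter snapshot for one MEM_TO_DEV transfer. */
struct xfer_counters {
	uint32_t total_len;	/* bytes in the submitted descriptor */
	uint32_t chan_bcnt;	/* bytes the UDMA-P has fetched from memory */
	uint32_t peer_bcnt;	/* bytes the PDMA has handed to the peripheral */
};

/* Bytes not yet fetched from memory: the classic residue. */
static uint32_t xfer_residue(const struct xfer_counters *c)
{
	assert(c->chan_bcnt <= c->total_len);
	return c->total_len - c->chan_bcnt;
}

/* Bytes already fetched but still held in UDMA-P/PSI-L/PDMA FIFOs. */
static uint32_t xfer_in_flight(const struct xfer_counters *c)
{
	return c->chan_bcnt > c->peer_bcnt ? c->chan_bcnt - c->peer_bcnt : 0;
}
```

With a 4096 byte descriptor, 1024 bytes fetched and 896 delivered, the residue is 3072 and 128 bytes are in flight. A playback client could add the in-flight amount to its delay estimate, since those bytes have left the buffer but have not reached the codec yet.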

Metadata support:
A DMAengine user driver based on, and tested with, v1 of the UDMA series was
posted upstream: https://lkml.org/lkml/2019/6/28/20
SA2UL is using the metadata DMAengine API.

Note on the last patch:
In Keystone2 the networking had a dedicated DMA (packet DMA), which is not the case
anymore, and the DMAengine API is currently missing support for the features we
would need to support networking, things like
- support for receive descriptor 'classification'
 - we need to support several receive queues for a channel.
 - the queues are used for packet priority handling for example, but they can be
   used to have pools of descriptors for different sizes.
- out of order completion of descriptors on a channel
 - when we have several queues to handle different priority packets the
   descriptors will be completed 'out-of-order'
- NAPI type of operation (polling instead of interrupt driven transfer)
 - without this we cannot sustain gigabit speeds, so we need to support NAPI
 - not limited to networking, but also for other high-performance operations

It is my intention to work on these to be able to remove the 'glue' layer and
switch to the DMAengine API - or have an API alongside DMAengine to have a generic
way to support networking - but given how controversial and non-trivial these
changes are, we need something to support networking in the meantime.

The series (+DT patches to enable DMA on AM65x and j721e) on top of 5.5-rc1 is
available:
https://github.com/omap-audio/linux-audio.git peter/udma/series_v7-5.5-rc1

Regards,
Peter
---
Grygorii Strashko (3):
  bindings: soc: ti: add documentation for k3 ringacc
  soc: ti: k3: add navss ringacc driver
  dmaengine: ti: k3-udma: Add glue layer for non DMAengine users

Peter Ujfalusi (9):
  dmaengine: doc: Add sections for per descriptor metadata support
  dmaengine: Add metadata_ops for dma_async_tx_descriptor
  dmaengine: Add support for reporting DMA cached data amount
  dmaengine: ti: Add cppi5 header for K3 NAVSS/UDMA
  dmaengine: ti: k3 PSI-L remote endpoint configuration
  dt-bindings: dma: ti: Add document for K3 UDMA
  dmaengine: ti: New driver for K3 UDMA
  firmware: ti_sci: rm: Add support for tx_tdtype parameter for tx
    channel
  dmaengine: ti: k3-udma: Wait for peer teardown completion if supported

 .../devicetree/bindings/dma/ti/k3-udma.yaml   |  185 +
 .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
 Documentation/driver-api/dmaengine/client.rst |   75 +
 .../driver-api/dmaengine/provider.rst         |   46 +
 drivers/dma/dmaengine.c                       |   73 +
 drivers/dma/dmaengine.h                       |    8 +
 drivers/dma/ti/Kconfig                        |   24 +
 drivers/dma/ti/Makefile                       |    3 +
 drivers/dma/ti/k3-psil-am654.c                |  175 +
 drivers/dma/ti/k3-psil-j721e.c                |  222 ++
 drivers/dma/ti/k3-psil-priv.h                 |   39 +
 drivers/dma/ti/k3-psil.c                      |   97 +
 drivers/dma/ti/k3-udma-glue.c                 | 1198 ++++++
 drivers/dma/ti/k3-udma-private.c              |  133 +
 drivers/dma/ti/k3-udma.c                      | 3452 +++++++++++++++++
 drivers/dma/ti/k3-udma.h                      |  151 +
 drivers/firmware/ti_sci.c                     |    1 +
 drivers/firmware/ti_sci.h                     |    7 +
 drivers/soc/ti/Kconfig                        |   11 +
 drivers/soc/ti/Makefile                       |    1 +
 drivers/soc/ti/k3-ringacc.c                   | 1180 ++++++
 include/linux/dma/k3-psil.h                   |   71 +
 include/linux/dma/k3-udma-glue.h              |  134 +
 include/linux/dma/ti-cppi5.h                  | 1061 +++++
 include/linux/dmaengine.h                     |  110 +
 include/linux/soc/ti/k3-ringacc.h             |  244 ++
 include/linux/soc/ti/ti_sci_protocol.h        |    2 +
 27 files changed, 8762 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml
 create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
 create mode 100644 drivers/dma/ti/k3-psil-am654.c
 create mode 100644 drivers/dma/ti/k3-psil-j721e.c
 create mode 100644 drivers/dma/ti/k3-psil-priv.h
 create mode 100644 drivers/dma/ti/k3-psil.c
 create mode 100644 drivers/dma/ti/k3-udma-glue.c
 create mode 100644 drivers/dma/ti/k3-udma-private.c
 create mode 100644 drivers/dma/ti/k3-udma.c
 create mode 100644 drivers/dma/ti/k3-udma.h
 create mode 100644 drivers/soc/ti/k3-ringacc.c
 create mode 100644 include/linux/dma/k3-psil.h
 create mode 100644 include/linux/dma/k3-udma-glue.h
 create mode 100644 include/linux/dma/ti-cppi5.h
 create mode 100644 include/linux/soc/ti/k3-ringacc.h

-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Comments

Tero Kristo Dec. 11, 2019, 10:24 a.m. UTC | #1
On 09/12/2019 11:43, Peter Ujfalusi wrote:
> The system controller's resource manager have support for configuring the
> TDTYPE of TCHAN_CFG register on j721e.
> With this parameter the teardown completion can be controlled:
> TDTYPE == 0: Return without waiting for peer to complete the teardown
> TDTYPE == 1: Wait for peer to complete the teardown
>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

Hi Peter,

You somehow dropped my Reviewed-by tag from this patch; it appears
identical to the v6 one. So,

Reviewed-by: Tero Kristo <t-kristo@ti.com>


> ---
>   drivers/firmware/ti_sci.c              | 1 +
>   drivers/firmware/ti_sci.h              | 7 +++++++
>   include/linux/soc/ti/ti_sci_protocol.h | 2 ++
>   3 files changed, 10 insertions(+)
>
> diff --git a/drivers/firmware/ti_sci.c b/drivers/firmware/ti_sci.c
> index 4126be9e3216..f13e4a96f3b7 100644
> --- a/drivers/firmware/ti_sci.c
> +++ b/drivers/firmware/ti_sci.c
> @@ -2412,6 +2412,7 @@ static int ti_sci_cmd_rm_udmap_tx_ch_cfg(const struct ti_sci_handle *handle,
>   	req->fdepth = params->fdepth;
>   	req->tx_sched_priority = params->tx_sched_priority;
>   	req->tx_burst_size = params->tx_burst_size;
> +	req->tx_tdtype = params->tx_tdtype;
>   
>   	ret = ti_sci_do_xfer(info, xfer);
>   	if (ret) {
> diff --git a/drivers/firmware/ti_sci.h b/drivers/firmware/ti_sci.h
> index f0d068c03944..255327171dae 100644
> --- a/drivers/firmware/ti_sci.h
> +++ b/drivers/firmware/ti_sci.h
> @@ -910,6 +910,7 @@ struct rm_ti_sci_msg_udmap_rx_flow_opt_cfg {
>    *   12 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_credit_count
>    *   13 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::fdepth
>    *   14 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_burst_size
> + *   15 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_tdtype
>    *
>    * @nav_id: SoC device ID of Navigator Subsystem where tx channel is located
>    *
> @@ -973,6 +974,11 @@ struct rm_ti_sci_msg_udmap_rx_flow_opt_cfg {
>    *
>    * @tx_burst_size: UDMAP transmit channel burst size configuration to be
>    * programmed into the tx_burst_size field of the TCHAN_TCFG register.
> + *
> + * @tx_tdtype: UDMAP transmit channel teardown type configuration to be
> + * programmed into the tdtype field of the TCHAN_TCFG register:
> + * 0 - Return immediately
> + * 1 - Wait for completion message from remote peer
>    */
>   struct ti_sci_msg_rm_udmap_tx_ch_cfg_req {
>   	struct ti_sci_msg_hdr hdr;
> @@ -994,6 +1000,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg_req {
>   	u16 fdepth;
>   	u8 tx_sched_priority;
>   	u8 tx_burst_size;
> +	u8 tx_tdtype;
>   } __packed;
>   
>   /**
> diff --git a/include/linux/soc/ti/ti_sci_protocol.h b/include/linux/soc/ti/ti_sci_protocol.h
> index 9531ec823298..f3aed0b91564 100644
> --- a/include/linux/soc/ti/ti_sci_protocol.h
> +++ b/include/linux/soc/ti/ti_sci_protocol.h
> @@ -342,6 +342,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg {
>   #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID        BIT(11)
>   #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_CREDIT_COUNT_VALID      BIT(12)
>   #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FDEPTH_VALID            BIT(13)
> +#define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID            BIT(15)
>   	u16 nav_id;
>   	u16 index;
>   	u8 tx_pause_on_err;
> @@ -359,6 +360,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg {
>   	u16 fdepth;
>   	u8 tx_sched_priority;
>   	u8 tx_burst_size;
> +	u8 tx_tdtype;
>   };
>   
>   /**
>

J, KEERTHY Dec. 11, 2019, 10:43 a.m. UTC | #2
On 09/12/19 3:13 pm, Peter Ujfalusi wrote:
> Hi,
>
> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
> upstream over a month ago:
> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>
> I'm still waiting on it's fate (Tero has given his r-b).
> The ti_sci patch did not made it to 5.5-rc1, but I included it in the series and
> let the maintainers decide if it can go via DMAengine for 5.6 or to later
> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).

Tested this series for sa2ul crypto for AES & 3DES, which need the
metadata implementation from this series for sa2ul-specific functionality.

FWIW: Tested-by: Keerthy <j-keerthy@ti.com>

>  drivers/dma/ti/Makefile                       |    3 +

>  drivers/dma/ti/k3-psil-am654.c                |  175 +

>  drivers/dma/ti/k3-psil-j721e.c                |  222 ++

>  drivers/dma/ti/k3-psil-priv.h                 |   39 +

>  drivers/dma/ti/k3-psil.c                      |   97 +

>  drivers/dma/ti/k3-udma-glue.c                 | 1198 ++++++

>  drivers/dma/ti/k3-udma-private.c              |  133 +

>  drivers/dma/ti/k3-udma.c                      | 3452 +++++++++++++++++

>  drivers/dma/ti/k3-udma.h                      |  151 +

>  drivers/firmware/ti_sci.c                     |    1 +

>  drivers/firmware/ti_sci.h                     |    7 +

>  drivers/soc/ti/Kconfig                        |   11 +

>  drivers/soc/ti/Makefile                       |    1 +

>  drivers/soc/ti/k3-ringacc.c                   | 1180 ++++++

>  include/linux/dma/k3-psil.h                   |   71 +

>  include/linux/dma/k3-udma-glue.h              |  134 +

>  include/linux/dma/ti-cppi5.h                  | 1061 +++++

>  include/linux/dmaengine.h                     |  110 +

>  include/linux/soc/ti/k3-ringacc.h             |  244 ++

>  include/linux/soc/ti/ti_sci_protocol.h        |    2 +

>  27 files changed, 8762 insertions(+)

>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml

>  create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt

>  create mode 100644 drivers/dma/ti/k3-psil-am654.c

>  create mode 100644 drivers/dma/ti/k3-psil-j721e.c

>  create mode 100644 drivers/dma/ti/k3-psil-priv.h

>  create mode 100644 drivers/dma/ti/k3-psil.c

>  create mode 100644 drivers/dma/ti/k3-udma-glue.c

>  create mode 100644 drivers/dma/ti/k3-udma-private.c

>  create mode 100644 drivers/dma/ti/k3-udma.c

>  create mode 100644 drivers/dma/ti/k3-udma.h

>  create mode 100644 drivers/soc/ti/k3-ringacc.c

>  create mode 100644 include/linux/dma/k3-psil.h

>  create mode 100644 include/linux/dma/k3-udma-glue.h

>  create mode 100644 include/linux/dma/ti-cppi5.h

>  create mode 100644 include/linux/soc/ti/k3-ringacc.h

>
Peter Ujfalusi Dec. 12, 2019, 8:46 a.m. UTC | #3
On 09/12/2019 11.43, Peter Ujfalusi wrote:
> Hi,
>
> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
> upstream over a month ago:
> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>
> I'm still waiting on its fate (Tero has given his r-b).
> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
> let the maintainers decide if it can go via DMAengine for 5.6 or to later
> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>
> Changes since v6:
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>
> - UDMAP DMAengine driver:
>  - Squashed the split patches
>  - Squashed the early TX completion handling update
>    (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>  - Hard reset fix for RX channels to avoid channel lockdown
>  - Correct completed descriptor's residue value

I got a build failure with allmodconfig:

ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko] undefined!
ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko] undefined!
ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!

This is because neither devm_ti_sci_get_of_resource nor of_msi_get_domain
is exported with EXPORT_SYMBOL_GPL(), so they cannot be used from modules.

There were patches in the past to export of_msi_get_domain:
https://lore.kernel.org/patchwork/patch/668123/
https://lore.kernel.org/patchwork/patch/716046/

I cannot find a reason why these were not merged.
Matthias's patch looks to be the earlier one; is it OK if I resend it
within v8?

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Tero Kristo Dec. 12, 2019, 10:55 a.m. UTC | #4
On 12/12/2019 10:46, Peter Ujfalusi wrote:
> On 09/12/2019 11.43, Peter Ujfalusi wrote:
>> Hi,
>>
>> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
>> upstream over a month ago:
>> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>>
>> I'm still waiting on its fate (Tero has given his r-b).
>> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
>> let the maintainers decide if it can go via DMAengine for 5.6 or to later
>> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>>
>> Changes since v6:
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>>
>> - UDMAP DMAengine driver:
>>   - Squashed the split patches
>>   - Squashed the early TX completion handling update
>>     (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>>   - Hard reset fix for RX channels to avoid channel lockdown
>>   - Correct completed descriptor's residue value
>
> I got a build failure with allmodconfig:
>
> ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko] undefined!
> ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
> ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko] undefined!
> ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!
>
> This is because neither devm_ti_sci_get_of_resource nor of_msi_get_domain
> is exported with EXPORT_SYMBOL_GPL(), so they cannot be used from modules.
>
> There were patches in the past to export of_msi_get_domain:
> https://lore.kernel.org/patchwork/patch/668123/
> https://lore.kernel.org/patchwork/patch/716046/
>
> I cannot find a reason why these were not merged.
> Matthias's patch looks to be the earlier one; is it OK if I resend it
> within v8?

You can just send those two patches separately; I can apply them before
this series. No need to resend this series.

-Tero

> 

>> Changes since v5:

>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=201051&state=*)

>> - Based on 5.4

>>

>> - cppi5 header

>>   - clear the bits before setting new value with '|='

>>

>> - UDMAP DT bindings:

>>   - valid compatibles as single enum list

>>

>> - UDMAP DMAengine driver:

>>   - Fix udma_is_chan_running()

>>   - Use flags for acc32, burst support instead of a bool in udma_match_data

>>     struct

>>   - TDTYPE handling (teardown completion handling for j721e) is moved to separate

>>     patch as the tisci core patch has not moved for over a month.

>>     Both ti_sci and the iterative patch to udma is included in the series.

>>

>> Changes since v4

>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=196619&state=*)

>> - Based on 5.4-rc7

>>

>> - ringacc DT bindings:

>>   - clarify the meaning of ti,sci-dev-id

>>

>> - ringacc driver:

>>   - Remove 'default y' from Kconfig

>>   - Fix struct comments

>>   - Move try_module_get() earlier in k3_ringacc_request_ring()

>>

>> - PSI-L thread database:

>>   - Add kernel style struct/enum documentation

>>   - Add missing thread description for sa2ul second interface

>>   - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL

>>

>> - UDMAP DT bindings:

>>   - move to dual license

>>   - change compatible from const to enum

>>   - items dropped for ti,sci-rm-ranges-*

>>   - description text moved from literal block when it is sensible

>>   - example fixed to compile cleanly

>>    - added parent to provide correct address-cells

>>    - navss is moved to simple-mfd from simple-bus

>>

>> - UDMAP DMAengine driver:

>>   - move fd_ring/r_ring under rflow

>>   - get rid of unused iomem for rflows

>>   - Remove 'default y' from Kconfig

>>   - Use defines for rflow src/dst tag selection

>>   - Merge the udma_ring_callback() and udma_tr_event_callback() to their

>>     corresponding interrupt handler

>>   - Create new defines for tx/rx channel's tisci valid parameter flags

>>   - Remove re-initialization to 0 of tisci request struct members

>>   - Make sure that vchan tasklets are also stopped when removing the module

>>   - Additional checkpatch --strict fixes when it made sense

>>    - make W=1 was clean

>>

>> - UDMAP glue layer:

>>   - Remove 'default y' from Kconfig

>>   - commit message update for features needing the glue layer

>>

>> Changes since v3

>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=180679&state=*):

>> - Based on 5.4-rc5

>> - Fixed typos pointed out by Tero

>> - Added reviewed-by tags from Tero

>>

>> - ring accelerator driver

>>   - TODO_GS is removed from the header

>>   - pm_runtime removed as NAVSS and it's components are always on

>>   - Check validity of Message mode setup (element size > 8 bytes must use proxy)

>>

>> - cppi5 header

>>   - add commit message

>>

>> - UDMAP DT bindings

>>   - Drop the psil-config node use on the remote PSI-L side and use only one cell

>>     which is the remote threadID:

>>

>>       dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>;

>>       dma-names = "tx", "rx";

>>

>>   - The PSI-L thread configuration description is moved to kernel as a new module:

>>     k3-psil/k3-psil-am654/k3-psil-j721e

>>   - ti,psil-base has been removed and moved to kernel

>>   - removed the no longer needed dt-bindings/dma/k3-udma.h

>>   - Convert the document to schema (yaml)

>>

>> - NEW PSI-L endpoint configuration database

>>   - a simple database holding the remote end's configuration needed for UDMAP

>>     configuration. All previous parameters from DT has been moved here and merged

>>     with the linux only tr mode channel flag.

>>   - Client drivers can update the remote endpoint configuration as it can be

>>     different based on system configuration and the endpoint itself is under the

>>     control of the peripheral driver.

>>   - database for am654 and j721e

>>

>> - UDMAP DMAengine driver

>>   - pm_runtime removed as NAVSS and it's components are always on

>>   - rchan_oes_offset added to MSI dommain allocation

>>   - Use the new PSI-L endpoint database for UDMAP configuration

>>   - Support for waiting for PDMA teardown completion on j721e instead of

>>     returning right away. depends on:

>>     https://lkml.org/lkml/2019/10/25/189

>>     Not included in this series, but it is in the branch I have prepared.

>>   - psil-base is moved from DT to be part of udma_match_data

>>   - tr_thread maps is removed and using the PSI-L endpoint configuration for it

>>

>> - UDMAP glue layer

>>   - pm_runtime removed as NAVSS and it's components are always on

>>   - Use the new PSI-L endpoint database for UDMAP configuration

>>

>> Changes since v2

>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=152609&state=*)

>> - Based on 5.4-rc1

>> - Support for Flow only data transfer for the glue layer

>>

>> - cppi5 header

>>   - comments converted to kernel-doc style

>>   - Remove the excessive WARN_ONs and rely on the user for sanity

>>   - new macro for checking TearDown Completion Message

>>

>> - ring accelerator driver

>>   - fixed up th commit message (SoB, TI-SCI)

>>   - fixed ring reset

>>   - CONFIG_TI_K3_RINGACC_DEBUG is removed along with the dbg_write/read functions

>>     and use dev_dbg()

>>   - k3_ringacc_ring_dump() is moved to static

>>   - step numbering removed from k3_ringacc_ring_reset_dma()

>>   - Add clarification comment for shared ring usage in k3_ringacc_ring_cfg()

>>   - Magic shift values in k3_ringacc_ring_cfg_proxy() got defined

>>   - K3_RINGACC_RING_MODE_QM is removed as it is not supported

>>

>> - UDMAP DT bindings

>>   - Fix property prefixing: s/pdma,/ti,pdma-

>>   - Add ti,notdpkt property to suppress teardown completion message on tchan

>>   - example updated accordingly

>>

>> - UDMAP DMAengine driver

>>   - Change __raw_readl/writel to readl/writel

>>   - Split up the udma_tisci_channel_config() into m2m, tx and rx tisci

>>     configuration functions for clarity

>>   - DT bindings change: s/pdma,/ti,pdma-

>>   - Cleanup of udma_tx_status():

>>    - residue calculation fix for m2m

>>    - no need to read packet counter as it is not used

>>    - peer byte counter only available in PDMAs

>>    - Proper locking to avoid race with interrupt handler (polled m2m fix)

>>   - Support for ti,notdpkt

>>   - RFLOW management rework to support data movement without channel:

>>    - the channel is not controlled by Linux but other core and we only have

>>      rflows and rings to do the DMA transfers.

>>      This mode is only supported by the Glue layer for now.

>>

>> - UDMAP glue layer

>>   - Debug print improvements

>>   - Support for rflow/ring only data movement

>>

>> Changes since v1

>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)

>> - Added support for j721e

>> - Based on 5.3-rc2

>> - dropped ti_sci API patch for RM management as it is already upstream

>> - dropped dmadev_get_slave_channel() patch, using __dma_request_channel()

>> - Added Rob's Reviewed-by to ringacc DT binding document patch

>> - DT bindings changes:

>>   - linux,udma-mode is gone, I have a simple lookup table in the driver to flag

>>     TR channels.

>>   - Support for j721e

>> - Fix bug in of_node_put() handling in xlate function

>>

>> Changes since RFC (https://patchwork.kernel.org/cover/10612465/):

>> - Based on linux-next (20190506) which now have the ti_sci interrupt support

>> - The series can be applied and the UDMA via DMAengine API will be functional

>> - Included in the series: ti_sci Resource management API, cppi5 header and

>>    driver for the ring accelerator.

>> - The DMAengine core patches have been updated as per the review comments for

>>    earlier submittion.

>> - The DMAengine driver patch is artificially split up to 6 smaller patches

>>

>> The k3-udma driver implements the Data Movement Architecture described in
>> AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
>> j721e TRM (http://www.ti.com/lit/pdf/spruil1)
>>
>> This DMA architecture is a big departure from the 'traditional' architecture,
>> where we had either EDMA or sDMA as the system DMA.
>>
>> Packet DMAs were used as dedicated DMAs to service only networking (Keystone2)
>> or USB (am335x) while other peripherals were serviced by EDMA.
>>
>> In AM65x/j721e the UDMA (Unified DMA) is used for all data movement within the
>> SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc).
>>
>> The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
>> and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
>> TR mode (similar to EDMA descriptor).
>> The data movement is done within a PSI-L fabric; peripherals (including the
>> UDMA-P) are not addressed by their I/O registers, as with traditional DMAs, but
>> by their PSI-L thread ID.
>>
>> In AM65x/j721e we have two main types of peripherals:
>> Legacy: McASP, McSPI, UART, etc.
>>   To provide connectivity they are serviced by PDMA (Peripheral DMA).
>>   PDMA threads are locked to service a given peripheral; for example, PSI-L
>>   threads 0x4400/0xc400 service McASP0 rx/tx.
>>   The PDMA configuration can be done via the UDMA Real Time Peer registers.
>> Native: Networking, security accelerator
>>   These peripherals have native support for PSI-L.
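
The McASP0 example suggests that a peripheral's rx and tx PSI-L thread IDs differ only in bit 15 (0x4400 vs 0xc400). Below is a minimal sketch. It assumes bit 15 marks a destination thread, meaning data flows from the UDMA towards the peripheral. The flag name and both helpers are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 15 of a PSI-L thread ID marks a destination thread. */
#define PSIL_DST_THREAD_FLAG 0x8000u

/* True if the thread receives data from the UDMA (MEM_TO_DEV target). */
bool psil_is_dst_thread(uint32_t thread_id)
{
	return (thread_id & PSIL_DST_THREAD_FLAG) != 0;
}

/* Paired destination (tx) thread for a peripheral's source (rx) thread. */
uint32_t psil_dst_of(uint32_t src_thread_id)
{
	return src_thread_id | PSIL_DST_THREAD_FLAG;
}
```

Under this assumption, psil_is_dst_thread(0xc400) is true and psil_dst_of(0x4400) yields 0xc400. The "Changes since v3" notes show the DT binding passing these IDs directly as the single cell: `dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>; dma-names = "tx", "rx";`.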

>>
>> To be able to use the DMA, the following generic steps need to be taken:
>> - configure a DMA channel (tchan for TX, rchan for RX)
>>   - channel mode: Packet or TR mode
>>   - for memcpy a tchan and rchan pair is used.
>>   - for packet mode RX we also need to configure a receive flow to configure the
>>     packet reception
>> - the source and destination threads must be paired
>> - at minimum one pair of rings needs to be configured:
>>   - tx: transfer ring and transfer completion ring
>>   - rx: free descriptor ring and receive ring
>> - two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
>>   - If the channel is in packet mode or configured for memcpy then we only need
>>     one interrupt from the ring; events from UDMAP are not used.
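
The setup rules above can be condensed into a small decision function. This only models the prose. The enum and struct names are hypothetical, and the function says nothing about how the driver actually allocates these resources.

```c
#include <stdbool.h>

/* Hypothetical names for the two channel modes described above. */
enum chan_mode { CHAN_MODE_PACKET, CHAN_MODE_TR };

/* Hypothetical transfer directions; XFER_MEM_TO_MEM is the memcpy case. */
enum xfer_dir { XFER_MEM_TO_DEV, XFER_DEV_TO_MEM, XFER_MEM_TO_MEM };

/* What the text says a channel needs before any descriptor is pushed. */
struct chan_needs {
	bool tchan;     /* transmit channel */
	bool rchan;     /* receive channel */
	bool rflow;     /* receive flow, only for packet mode RX */
	int min_rings;  /* t_ring + tc_ring, or fd_ring + r_ring */
	int irqs;       /* ring interrupt, plus UDMA-P channel interrupt if used */
};

struct chan_needs udma_chan_needs(enum xfer_dir dir, enum chan_mode mode)
{
	struct chan_needs n;

	/* memcpy uses a tchan/rchan pair, so it needs both */
	n.tchan = (dir != XFER_DEV_TO_MEM);
	n.rchan = (dir != XFER_MEM_TO_DEV);
	n.rflow = (dir == XFER_DEV_TO_MEM && mode == CHAN_MODE_PACKET);
	n.min_rings = 2;  /* at minimum one pair of rings */
	/* Packet mode or memcpy: only the ring interrupt, UDMAP events unused */
	n.irqs = (mode == CHAN_MODE_PACKET || dir == XFER_MEM_TO_MEM) ? 1 : 2;
	return n;
}
```

For example, a TR mode MEM_TO_DEV channel (such as McASP tx) comes out needing two interrupts, while a memcpy channel needs only the ring interrupt.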

>>
>> When the channel setup is completed we only interact with the rings:
>> - TX: push a descriptor to the t_ring and wait for the UDMA-P to push it to the
>>    tc_ring
>> - RX: push a descriptor to the fd_ring and wait for the UDMA-P to push it back to
>>    the r_ring.
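
The ring round trip described above can be modelled with two FIFOs and a stand-in for the hardware. The sketch assumes each ring behaves as a simple bounded FIFO of descriptor pointers. `struct ring`, `ring_push()`, `ring_pop()` and `fake_udmap_service()` are hypothetical and do not reflect the ring accelerator API.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* illustrative depth only */

/* A bounded FIFO standing in for one ring-accelerator ring. */
struct ring {
	void *slot[RING_SIZE];
	size_t head;   /* next slot to pop */
	size_t tail;   /* next slot to push */
	size_t count;  /* descriptors currently queued */
};

bool ring_push(struct ring *r, void *desc)
{
	if (r->count == RING_SIZE)
		return false;  /* ring full */
	r->slot[r->tail] = desc;
	r->tail = (r->tail + 1) % RING_SIZE;
	r->count++;
	return true;
}

void *ring_pop(struct ring *r)
{
	void *desc;

	if (r->count == 0)
		return NULL;  /* ring empty */
	desc = r->slot[r->head];
	r->head = (r->head + 1) % RING_SIZE;
	r->count--;
	return desc;
}

/*
 * Stand-in for the UDMA-P: move one descriptor from the submission ring
 * (t_ring for TX, fd_ring for RX) to the completion ring (tc_ring for TX,
 * r_ring for RX). Nothing is moved if the completion ring has no room.
 */
bool fake_udmap_service(struct ring *submit, struct ring *complete)
{
	if (submit->count == 0 || complete->count == RING_SIZE)
		return false;
	return ring_push(complete, ring_pop(submit));
}
```

In TX terms, the client pushes to t_ring, the "hardware" services it, and the completed descriptor is popped from tc_ring. The RX flow is the same shape with fd_ring and r_ring.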

>>
>> Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA), which was not the
>> case with previous DMAs, we need to report the amount of data held in these FIFOs
>> to clients (delay calculation for ALSA, UART FIFO flush support).
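
For audio playback, bytes still held in the UDMA-P/PSI-L/PDMA FIFOs have already left the buffer but have not yet reached the peripheral, so they add to the delay. Below is a hedged sketch that converts such a FIFO byte count into frames and microseconds. The helper names are hypothetical, and how an audio driver feeds this into its delay reporting is not shown.

```c
#include <stdint.h>

/* Frames represented by the bytes sitting in the DMA fabric FIFOs. */
uint32_t fifo_delay_frames(uint32_t fifo_bytes, unsigned int channels,
			   unsigned int bytes_per_sample)
{
	uint32_t frame_bytes = channels * bytes_per_sample;

	if (frame_bytes == 0)
		return 0;  /* avoid dividing by zero on a bogus format */
	return fifo_bytes / frame_bytes;
}

/* Convert a frame count to microseconds at the given sample rate. */
uint64_t frames_to_us(uint32_t frames, unsigned int rate_hz)
{
	if (rate_hz == 0)
		return 0;
	return (uint64_t)frames * 1000000u / rate_hz;
}
```

For example, 512 bytes of stereo 32-bit samples is 64 frames, which at 48 kHz is about 1.3 ms of extra delay.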

>>
>> Metadata support:
>> A DMAengine user driver was posted upstream, based on and tested with v1 of the
>> UDMA series: https://lkml.org/lkml/2019/6/28/20
>> SA2UL is using the metadata DMAengine API.
>>
>> Note on the last patch:
>> In Keystone2 the networking had a dedicated DMA (packet DMA), which is not the
>> case anymore, and the DMAengine API is currently missing support for the features
>> we would need to support networking, things like
>> - support for receive descriptor 'classification'
>>   - we need to support several receive queues for a channel.
>>   - the queues are used for packet priority handling for example, but they can be
>>     used to have pools of descriptors for different sizes.
>> - out of order completion of descriptors on a channel
>>   - when we have several queues to handle different priority packets the
>>     descriptors will be completed 'out-of-order'
>> - NAPI type of operation (polling instead of interrupt driven transfer)
>>   - without this we cannot sustain gigabit speeds and we need to support NAPI
>>   - not limited to networking, but also other high performance operations
>>
>> It is my intention to work on these to be able to remove the 'glue' layer and
>> switch to the DMAengine API - or have an API alongside DMAengine to have a generic
>> way to support networking, but given how controversial and not trivial these
>> changes are, we need something to support networking.
>>
>> The series (+DT patches to enable DMA on AM65x and j721e) on top of 5.5-rc1 is
>> available:
>> https://github.com/omap-audio/linux-audio.git peter/udma/series_v7-5.5-rc1
>>
>> Regards,
>> Peter
>> ---
>> Grygorii Strashko (3):
>>    bindings: soc: ti: add documentation for k3 ringacc
>>    soc: ti: k3: add navss ringacc driver
>>    dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
>>
>> Peter Ujfalusi (9):
>>    dmaengine: doc: Add sections for per descriptor metadata support
>>    dmaengine: Add metadata_ops for dma_async_tx_descriptor
>>    dmaengine: Add support for reporting DMA cached data amount
>>    dmaengine: ti: Add cppi5 header for K3 NAVSS/UDMA
>>    dmaengine: ti: k3 PSI-L remote endpoint configuration
>>    dt-bindings: dma: ti: Add document for K3 UDMA
>>    dmaengine: ti: New driver for K3 UDMA
>>    firmware: ti_sci: rm: Add support for tx_tdtype parameter for tx
>>      channel
>>    dmaengine: ti: k3-udma: Wait for peer teardown completion if supported
>>
>>   .../devicetree/bindings/dma/ti/k3-udma.yaml   |  185 +
>>   .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>>   Documentation/driver-api/dmaengine/client.rst |   75 +
>>   .../driver-api/dmaengine/provider.rst         |   46 +
>>   drivers/dma/dmaengine.c                       |   73 +
>>   drivers/dma/dmaengine.h                       |    8 +
>>   drivers/dma/ti/Kconfig                        |   24 +
>>   drivers/dma/ti/Makefile                       |    3 +
>>   drivers/dma/ti/k3-psil-am654.c                |  175 +
>>   drivers/dma/ti/k3-psil-j721e.c                |  222 ++
>>   drivers/dma/ti/k3-psil-priv.h                 |   39 +
>>   drivers/dma/ti/k3-psil.c                      |   97 +
>>   drivers/dma/ti/k3-udma-glue.c                 | 1198 ++++++
>>   drivers/dma/ti/k3-udma-private.c              |  133 +
>>   drivers/dma/ti/k3-udma.c                      | 3452 +++++++++++++++++
>>   drivers/dma/ti/k3-udma.h                      |  151 +
>>   drivers/firmware/ti_sci.c                     |    1 +
>>   drivers/firmware/ti_sci.h                     |    7 +
>>   drivers/soc/ti/Kconfig                        |   11 +
>>   drivers/soc/ti/Makefile                       |    1 +
>>   drivers/soc/ti/k3-ringacc.c                   | 1180 ++++++
>>   include/linux/dma/k3-psil.h                   |   71 +
>>   include/linux/dma/k3-udma-glue.h              |  134 +
>>   include/linux/dma/ti-cppi5.h                  | 1061 +++++
>>   include/linux/dmaengine.h                     |  110 +
>>   include/linux/soc/ti/k3-ringacc.h             |  244 ++
>>   include/linux/soc/ti/ti_sci_protocol.h        |    2 +
>>   27 files changed, 8762 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml
>>   create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
>>   create mode 100644 drivers/dma/ti/k3-psil-am654.c
>>   create mode 100644 drivers/dma/ti/k3-psil-j721e.c
>>   create mode 100644 drivers/dma/ti/k3-psil-priv.h
>>   create mode 100644 drivers/dma/ti/k3-psil.c
>>   create mode 100644 drivers/dma/ti/k3-udma-glue.c
>>   create mode 100644 drivers/dma/ti/k3-udma-private.c
>>   create mode 100644 drivers/dma/ti/k3-udma.c
>>   create mode 100644 drivers/dma/ti/k3-udma.h
>>   create mode 100644 drivers/soc/ti/k3-ringacc.c
>>   create mode 100644 include/linux/dma/k3-psil.h
>>   create mode 100644 include/linux/dma/k3-udma-glue.h
>>   create mode 100644 include/linux/dma/ti-cppi5.h
>>   create mode 100644 include/linux/soc/ti/k3-ringacc.h
>>
>
> - Péter
>
>

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Tero Kristo Dec. 12, 2019, 10:57 a.m. UTC | #5
On 12/12/2019 12:55, Tero Kristo wrote:
> On 12/12/2019 10:46, Peter Ujfalusi wrote:
>>
>>
>> On 09/12/2019 11.43, Peter Ujfalusi wrote:
>>> Hi,
>>>
>>> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
>>> upstream over a month ago:
>>> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>>>
>>> I'm still waiting on its fate (Tero has given his r-b).
>>> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
>>> let the maintainers decide if it can go via DMAengine for 5.6 or to later
>>> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>>>
>>> Changes since v6:
>>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>>>
>>> - UDMAP DMAengine driver:
>>>   - Squashed the split patches
>>>   - Squashed the early TX completion handling update
>>>     (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>>>   - Hard reset fix for RX channels to avoid channel lockdown
>>>   - Correct completed descriptor's residue value
>>
>> I got a build failure with allmodconfig:
>>
>> ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko] undefined!
>> ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
>> ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko] undefined!
>> ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!
>>
>> These are because both devm_ti_sci_get_of_resource and of_msi_get_domain
>> are missing EXPORT_SYMBOL_GPL(), so they cannot be used from modules.
>>
>> There were patches in the past to add it for of_msi_get_domain:
>> https://lore.kernel.org/patchwork/patch/668123/
>> https://lore.kernel.org/patchwork/patch/716046/
>>
>> I cannot find a reason why these were not merged.
>> Matthias's patch looks to be the earlier one; is it OK if I resend it
>> within v8?
>
> You can just send those two patches separately, I can apply them first
> before this series. No need to resend this series.

Oops, sorry about the noise; I got confused between the internal mailing
list and this one (trying to get it merged internally at the same time).
Ignore my comment.

-Tero

Grygorii Strashko Dec. 12, 2019, 6:01 p.m. UTC | #6
On 09/12/2019 11:43, Peter Ujfalusi wrote:
> Hi,
>
> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
> upstream over a month ago:
> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>
> I'm still waiting on its fate (Tero has given his r-b).
> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
> let the maintainers decide if it can go via DMAengine for 5.6 or to later
> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>
> Changes since v6:
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>
> - UDMAP DMAengine driver:
>   - Squashed the split patches
>   - Squashed the early TX completion handling update
>     (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>   - Hard reset fix for RX channels to avoid channel lockdown
>   - Correct completed descriptor's residue value
>

Thank you Peter,
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>


-- 
Best regards,
grygorii
Vinod Koul Dec. 20, 2019, 8:28 a.m. UTC | #7
Hi Peter,

On 09-12-19, 11:43, Peter Ujfalusi wrote:

> +  Optional: per descriptor metadata
> +  ---------------------------------
> +  DMAengine provides two ways for metadata support.
> +
> +  DESC_METADATA_CLIENT
> +
> +    The metadata buffer is allocated/provided by the client driver and it is
> +    attached to the descriptor.
> +
> +  .. code-block:: c
> +
> +     int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,
> +				   void *data, size_t len);
> +
> +  DESC_METADATA_ENGINE
> +
> +    The metadata buffer is allocated/managed by the DMA driver. The client

and when would it be freed?

> +    driver can ask for the pointer, maximum size and the currently used size of
> +    the metadata and can directly update or read it.
> +
> +  .. code-block:: c
> +
> +     void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc,
> +		size_t *payload_len, size_t *max_len);
> +
> +     int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc,
> +		size_t payload_len);
> +
> +  Client drivers can query if a given mode is supported with:
> +
> +  .. code-block:: c
> +
> +     bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
> +		enum dma_desc_metadata_mode mode);
> +
> +  Depending on the mode used, client drivers must follow a different flow.
> +
> +  DESC_METADATA_CLIENT
> +
> +    - DMA_MEM_TO_DEV / DMA_MEM_TO_MEM:
> +      1. prepare the descriptor (dmaengine_prep_*)
> +         construct the metadata in the client's buffer
> +      2. use dmaengine_desc_attach_metadata() to attach the buffer to the
> +         descriptor
> +      3. submit the transfer

This is simpler: once the txn is finished, the metadata would be freed up, right?

> +    - DMA_DEV_TO_MEM:
> +      1. prepare the descriptor (dmaengine_prep_*)
> +      2. use dmaengine_desc_attach_metadata() to attach the buffer to the
> +         descriptor
> +      3. submit the transfer
> +      4. when the transfer is completed, the metadata should be available in the
> +         attached buffer

and when and how would the driver free that up :)

> +
> +  DESC_METADATA_ENGINE
> +
> +    - DMA_MEM_TO_DEV / DMA_MEM_TO_MEM:
> +      1. prepare the descriptor (dmaengine_prep_*)
> +      2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the
> +         engine's metadata area
> +      3. update the metadata at the pointer
> +      4. use dmaengine_desc_set_metadata_len() to tell the DMA engine the
> +         amount of data the client has placed into the metadata buffer
> +      5. submit the transfer
> +    - DMA_DEV_TO_MEM:
> +      1. prepare the descriptor (dmaengine_prep_*)
> +      2. submit the transfer
> +      3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get the
> +         pointer to the engine's metadata area
> +      4. read out the metadata from the pointer
> +
> +  .. note::
> +
> +     Mixed use of DESC_METADATA_CLIENT / DESC_METADATA_ENGINE is not allowed;
> +     client drivers must use only one of the modes per descriptor.

We should check that if not done already!
-- 
~Vinod
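
The DESC_METADATA_ENGINE flow for DMA_MEM_TO_DEV quoted above (get the pointer, write the metadata, set the length) can be modelled in userspace. This sketch assumes the engine's metadata area is a fixed-size buffer embedded in the descriptor. `struct md_desc`, `md_get_ptr()`, `md_set_len()` and `md_fill_tx()` are hypothetical stand-ins for the dmaengine calls, not kernel code. In this model the area lives and dies with the descriptor, which is one possible answer to the lifetime question raised above. It is not a statement about what the UDMA driver does.

```c
#include <stddef.h>
#include <string.h>

#define MD_MAX 16  /* illustrative size of the engine-owned metadata area */

/* Stand-in for a descriptor whose metadata area is owned by the DMA driver. */
struct md_desc {
	unsigned char md[MD_MAX];
	size_t md_len;  /* bytes of valid metadata */
};

/* Models dmaengine_desc_get_metadata_ptr(): expose the engine's area. */
void *md_get_ptr(struct md_desc *d, size_t *payload_len, size_t *max_len)
{
	*payload_len = d->md_len;
	*max_len = sizeof(d->md);
	return d->md;
}

/* Models dmaengine_desc_set_metadata_len(): record how much was written. */
int md_set_len(struct md_desc *d, size_t payload_len)
{
	if (payload_len > sizeof(d->md))
		return -1;  /* the kernel call would return a -errno instead */
	d->md_len = payload_len;
	return 0;
}

/* Steps 2-4 of the MEM_TO_DEV flow; prep and submit are outside the model. */
int md_fill_tx(struct md_desc *d, const void *md, size_t len)
{
	size_t used, max;
	void *p = md_get_ptr(d, &used, &max);

	if (len > max)
		return -1;
	memcpy(p, md, len);
	return md_set_len(d, len);
}
```

For DMA_DEV_TO_MEM the same model runs in reverse: after completion, the client calls md_get_ptr() and reads payload_len bytes.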
Vinod Koul Dec. 20, 2019, 8:32 a.m. UTC | #8
Hi Peter,

On 09-12-19, 11:43, Peter Ujfalusi wrote:

> +int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,
> +				   void *data, size_t len)
> +{
> +	int ret;
> +
> +	if (!desc)
> +		return -EINVAL;
> +
> +	ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_CLIENT);
> +	if (ret)
> +		return ret;
> +
> +	if (!desc->metadata_ops || !desc->metadata_ops->attach)
> +		return -ENOTSUPP;
> +
> +	return desc->metadata_ops->attach(desc, data, len);

this looks good to me, only thing is we should check if people are
mixing the modes :)

-- 
~Vinod
Vinod Koul Dec. 20, 2019, 8:37 a.m. UTC | #9
On 09-12-19, 11:43, Peter Ujfalusi wrote:
> DMA hardware can have a big cache or FIFO, and the amount of data sitting in
> the DMA fabric can be of interest to the clients.
>
> For example, in audio we want to know the delay in the data flow, and in case
> the DMA has a significantly large FIFO/cache, it can affect the latency/delay.
>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> Reviewed-by: Tero Kristo <t-kristo@ti.com>
> ---
>  drivers/dma/dmaengine.h   | 8 ++++++++
>  include/linux/dmaengine.h | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/dma/dmaengine.h b/drivers/dma/dmaengine.h
> index 501c0b063f85..b0b97475707a 100644
> --- a/drivers/dma/dmaengine.h
> +++ b/drivers/dma/dmaengine.h
> @@ -77,6 +77,7 @@ static inline enum dma_status dma_cookie_status(struct dma_chan *chan,
>  		state->last = complete;
>  		state->used = used;
>  		state->residue = 0;
> +		state->in_flight_bytes = 0;
>  	}
>  	return dma_async_is_complete(cookie, complete, used);
>  }
> @@ -87,6 +88,13 @@ static inline void dma_set_residue(struct dma_tx_state *state, u32 residue)
>  		state->residue = residue;
>  }
>  
> +static inline void dma_set_in_flight_bytes(struct dma_tx_state *state,
> +					   u32 in_flight_bytes)
> +{
> +	if (state)
> +		state->in_flight_bytes = in_flight_bytes;
> +}

This would be used by dmaengine drivers right, so lets move it to drivers/dma/dmaengine.h

lets not expose this to users :)

-- 
~Vinod
Peter Ujfalusi Dec. 20, 2019, 8:48 a.m. UTC | #10
Hi Vinod,

On 20/12/2019 10.32, Vinod Koul wrote:
> Hi Peter,

> 

> On 09-12-19, 11:43, Peter Ujfalusi wrote:

> 

>> +int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc,

>> +				   void *data, size_t len)

>> +{

>> +	int ret;

>> +

>> +	if (!desc)

>> +		return -EINVAL;

>> +

>> +	ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_CLIENT);

>> +	if (ret)

>> +		return ret;

>> +

>> +	if (!desc->metadata_ops || !desc->metadata_ops->attach)

>> +		return -ENOTSUPP;

>> +

>> +	return desc->metadata_ops->attach(desc, data, len);

> 

> this looks good to me, only thing is we should check if people are

> mixing the modes :)


desc_check_and_set_metadata_mode() does the checking to make sure that
the modes are not mixed.
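To make the rule concrete, here is a minimal userspace model of the check described above: the first metadata call on a descriptor latches its mode, and a later call with the other mode is rejected. The `model_*` names are hypothetical stand-ins, not the dmaengine API, and the model leaves out any "is this mode supported by the channel" check the real helper may also perform.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the two metadata modes, plus "not chosen yet". */
enum model_metadata_mode {
	MODEL_METADATA_NONE = 0,	/* no metadata call made yet */
	MODEL_METADATA_CLIENT,		/* client-provided buffer (attach) */
	MODEL_METADATA_ENGINE,		/* engine-provided buffer (get_ptr) */
};

struct model_desc {
	enum model_metadata_mode mode;
};

/* The first call latches the mode; a later call with a different mode fails. */
static int model_check_and_set_mode(struct model_desc *desc,
				    enum model_metadata_mode mode)
{
	if (desc->mode == MODEL_METADATA_NONE) {
		desc->mode = mode;
		return 0;
	}

	return desc->mode == mode ? 0 : -EINVAL;
}
```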

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Peter Ujfalusi Dec. 20, 2019, 10:13 a.m. UTC | #11
On 20/12/2019 11.57, Vinod Koul wrote:
> On 20-12-19, 10:49, Peter Ujfalusi wrote:

> 

>>>> +static inline void dma_set_in_flight_bytes(struct dma_tx_state *state,

>>>> +					   u32 in_flight_bytes)

>>>> +{

>>>> +	if (state)

>>>> +		state->in_flight_bytes = in_flight_bytes;

>>>> +}

>>>

>>> This would be used by dmaengine drivers right, so lets move it to drivers/dma/dmaengine.h

>>>

>>> lets not expose this to users :)

>>

>> I have put it where the dma_set_residue() was.

>> I can add a patch first to move dma_set_residue() then add

> 

> not sure I follow, but dma_set_residue() is already in drivers/dma/dmaengine.h


and this patch adds the dma_set_in_flight_bytes() to
drivers/dma/dmaengine.h

In include/linux/dmaengine.h, only the dma_tx_state struct is updated.
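The split described here can be modelled in userspace: the state struct is what clients see, while the NULL-tolerant setters stay driver-private. The `model_*` names are hypothetical, and the frames-of-delay helper is only an assumed example of how an audio client might use `in_flight_bytes`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Client-visible part: models the residue/in_flight_bytes fields. */
struct model_tx_state {
	uint32_t residue;
	uint32_t in_flight_bytes;
};

/* Driver-private setters: a NULL state is silently ignored. */
static inline void model_set_residue(struct model_tx_state *state,
				     uint32_t residue)
{
	if (state)
		state->residue = residue;
}

static inline void model_set_in_flight_bytes(struct model_tx_state *state,
					     uint32_t bytes)
{
	if (state)
		state->in_flight_bytes = bytes;
}

/* Hypothetical client use: data still inside the DMA fabric, in frames. */
static uint32_t model_fifo_delay_frames(const struct model_tx_state *state,
					uint32_t frame_bytes)
{
	return frame_bytes ? state->in_flight_bytes / frame_bytes : 0;
}
```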

- Péter

Peter Ujfalusi Dec. 20, 2019, 10:42 a.m. UTC | #12
Hi Vinod,

On 20/12/2019 11.54, Vinod Koul wrote:
> On 09-12-19, 11:43, Peter Ujfalusi wrote:

> 

>> +#define CPPI5_INFO2_DESC_RETPUSHPOLICY		BIT(16)

>> +#define CPPI5_INFO2_DESC_RETP_MASK		GENMASK(18, 16)

>> +

>> +#define CPPI5_INFO2_DESC_RETQ_SHIFT		(0)

>> +#define CPPI5_INFO2_DESC_RETQ_MASK		GENMASK(15, 0)

>> +

>> +#define CPPI5_INFO3_DESC_SRCTAG_SHIFT		(16U)

>> +#define CPPI5_INFO3_DESC_SRCTAG_MASK		GENMASK(31, 16)

>> +#define CPPI5_INFO3_DESC_DSTTAG_SHIFT		(0)

>> +#define CPPI5_INFO3_DESC_DSTTAG_MASK		GENMASK(15, 0)

>> +

>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_SHIFT	(0)

>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK	GENMASK(27, 0)

>> +

>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_SHIFT	(0)

>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK	GENMASK(27, 0)

> 

> I think you can remove the SHIFT defines and use ffs() to get the bit

> position for shift


Right. I'll convert to use ffs()

> 

>> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size,

>> +					u32 sw_data_size)

>> +{

>> +	u32 desc_size;

>> +

>> +	if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE)

>> +		return 0;

>> +

>> +	desc_size = sizeof(struct cppi5_host_desc_t) + psdata_size +

>> +		    sw_data_size;

> 

> I think there was an API for this kind of mem allocation of struct and

> buffer attached...


The returned size is not only used when allocating memory or setting up
the dma_pool, but for UDMAP's fetch size parameter.

>> +static inline void cppi5_hdesc_reset_hbdesc(struct cppi5_host_desc_t *desc)

>> +{

>> +	desc->hdr = (struct cppi5_desc_hdr_t) { 0 };

>> +	desc->next_desc = 0;

> 

> would this not be superfluous? Or if you want a memset call?


The intention is to reset the header and the next descriptor link but
leave the backing buffer information intact. This allows the reuse of a
descriptor+buffer and we only need to set the header bits + next
descriptor pointer if any.
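A sketch of the partial reset described above, using a simplified, hypothetical descriptor layout (the real `struct cppi5_host_desc_t` has more fields): only the header and the link are cleared, so the buffer pointer and length survive for reuse, which a `memset()` of the whole descriptor would not allow.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, illustrative stand-ins for the CPPI5 host descriptor. */
struct model_desc_hdr {
	uint32_t pkt_info0;
	uint32_t pkt_info1;
	uint32_t pkt_info2;
};

struct model_host_desc {
	struct model_desc_hdr hdr;
	uint64_t next_desc;
	uint64_t buf_ptr;	/* backing buffer address: kept across reuse */
	uint32_t buf_info1;	/* backing buffer length: kept across reuse */
};

/* Clear only the header and the link; leave the buffer description alone. */
static inline void model_reset_hbdesc(struct model_host_desc *desc)
{
	desc->hdr = (struct model_desc_hdr) { 0 };
	desc->next_desc = 0;
}
```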

>> +static inline u32 *cppi5_hdesc_get_psdata32(struct cppi5_host_desc_t *desc)

>> +{

>> +	return (u32 *)cppi5_hdesc_get_psdata(desc);

> 

> you don't need casts away from void *


Hrm, or just remove this, clients can use the cppi5_hdesc_get_psdata()
directly.
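For reference, C converts a `void *` to any object pointer type implicitly, so a typed wrapper like `cppi5_hdesc_get_psdata32()` buys nothing at the call site. The descriptor layout and the `model_*` names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct model_desc {
	uint32_t hdr[4];
	uint32_t psdata[4];	/* hypothetical protocol-specific data area */
};

/* Returns void *, like cppi5_hdesc_get_psdata() in the patch. */
static void *model_get_psdata(struct model_desc *desc)
{
	return desc->psdata;
}

static uint32_t model_first_psdata_word(struct model_desc *desc)
{
	/* No cast needed: void * converts implicitly to uint32_t *. */
	uint32_t *words = model_get_psdata(desc);

	return words[0];
}
```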


- Péter

Vinod Koul Dec. 23, 2019, 6:53 a.m. UTC | #13
On 09-12-19, 11:43, Peter Ujfalusi wrote:
> New binding document for

> Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P).

> 

> UDMA-P is introduced as part of the K3 architecture and can be found in

> AM654 and j721e.

> 

> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

> Reviewed-by: Rob Herring <robh@kernel.org>

> ---

>  .../devicetree/bindings/dma/ti/k3-udma.yaml   | 185 ++++++++++++++++++

>  1 file changed, 185 insertions(+)

>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml

> 

> diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml b/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml

> new file mode 100644

> index 000000000000..77aef4a4abce

> --- /dev/null

> +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml

> @@ -0,0 +1,185 @@

> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

> +%YAML 1.2

> +---

> +$id: http://devicetree.org/schemas/dma/ti/k3-udma.yaml#

> +$schema: http://devicetree.org/meta-schemas/core.yaml#

> +

> +title: Texas Instruments K3 NAVSS Unified DMA Device Tree Bindings

> +

> +maintainers:

> +  - Peter Ujfalusi <peter.ujfalusi@ti.com>

> +

> +description: |

> +  The UDMA-P is intended to perform similar (but significantly upgraded)

> +  functions as the packet-oriented DMA used on previous SoC devices. The UDMA-P

> +  module supports the transmission and reception of various packet types.

> +  The UDMA-P is architected to facilitate the segmentation and reassembly of


How about:

The UDMA-P architecture facilitates the segmentation...

> +  SoC DMA data structure compliant packets to/from smaller data blocks that are

> +  natively compatible with the specific requirements of each connected

> +  peripheral.

> +  Multiple Tx and Rx channels are provided within the DMA which allow multiple

> +  segmentation or reassembly operations to be ongoing. The DMA controller

> +  maintains state information for each of the channels which allows packet

> +  segmentation and reassembly operations to be time division multiplexed between

> +  channels in order to share the underlying DMA hardware. An external DMA

> +  scheduler is used to control the ordering and rate at which this multiplexing

> +  occurs for Transmit operations. The ordering and rate of Receive operations

> +  is indirectly controlled by the order in which blocks are pushed into the DMA

> +  on the Rx PSI-L interface.

> +

> +  The UDMA-P also supports acting as both a UTC and UDMA-C for its internal

> +  channels. Channels in the UDMA-P can be configured to be either Packet-Based

> +  or Third-Party channels on a channel by channel basis.

> +

> +  All transfers within NAVSS are done between PSI-L source and destination

> +  threads.

> +  The peripherals serviced by UDMA can be PSI-L native (sa2ul, cpsw, etc) or

> +  legacy, non PSI-L native peripherals. In the latter case a special, small PDMA

> +  is tasked to act as a bridge between the PSI-L fabric and the legacy

> +  peripheral.

> +

> +  PDMAs can be configured via UDMAP peer registers to match with the

> +  configuration of the legacy peripheral.

> +

> +allOf:

> +  - $ref: "../dma-controller.yaml#"

> +

> +properties:

> +  "#dma-cells":

> +    const: 1

> +    description: |

> +      The cell is the PSI-L thread ID of the remote (to UDMAP) end.

> +      Valid ranges for thread ID depend on the data movement direction:

> +      for source thread IDs (rx): 0 - 0x7fff

> +      for destination thread IDs (tx): 0x8000 - 0xffff

> +

> +      PLease refer to the device documentation for the PSI-L thread map and also


s/PLease/Please
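Since, per the binding text, source (rx) thread IDs occupy 0x0000-0x7fff and destination (tx) thread IDs 0x8000-0xffff, bit 15 alone tells the two apart. A minimal sketch; the macro and function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 separates the destination range (0x8000-0xffff) from the source range. */
#define MODEL_PSIL_DST_THREAD_FLAG	0x8000u

static bool model_thread_is_destination(uint32_t thread_id)
{
	return (thread_id & MODEL_PSIL_DST_THREAD_FLAG) != 0;
}

/* Thread IDs are 16-bit values in the ranges quoted above. */
static bool model_thread_id_valid(uint32_t thread_id)
{
	return thread_id <= 0xffffu;
}
```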

-- 
~Vinod
Peter Ujfalusi Dec. 23, 2019, 7:11 a.m. UTC | #14
Hi Vinod,

On 20/12/2019 12.42, Peter Ujfalusi wrote:
> Hi Vinod,

> 

> On 20/12/2019 11.54, Vinod Koul wrote:

>> On 09-12-19, 11:43, Peter Ujfalusi wrote:

>>

>>> +#define CPPI5_INFO2_DESC_RETPUSHPOLICY		BIT(16)

>>> +#define CPPI5_INFO2_DESC_RETP_MASK		GENMASK(18, 16)

>>> +

>>> +#define CPPI5_INFO2_DESC_RETQ_SHIFT		(0)

>>> +#define CPPI5_INFO2_DESC_RETQ_MASK		GENMASK(15, 0)

>>> +

>>> +#define CPPI5_INFO3_DESC_SRCTAG_SHIFT		(16U)

>>> +#define CPPI5_INFO3_DESC_SRCTAG_MASK		GENMASK(31, 16)

>>> +#define CPPI5_INFO3_DESC_DSTTAG_SHIFT		(0)

>>> +#define CPPI5_INFO3_DESC_DSTTAG_MASK		GENMASK(15, 0)

>>> +

>>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_SHIFT	(0)

>>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK	GENMASK(27, 0)

>>> +

>>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_SHIFT	(0)

>>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK	GENMASK(27, 0)

>>

>> I think you can remove the SHIFT defines and use ffs() to get the bit

>> position for shift

> 

> Right. I'll convert to use ffs()


I would rather keep the defines.
While ffs() is simple, it is going to have an effect at gigabit speeds or
beyond.
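The two styles under discussion can be compared with a userspace sketch. `MODEL_GENMASK()` is a stand-in for the kernel's `GENMASK()`, and `__builtin_ctz()` is a GCC/Clang builtin used here in place of `ffs() - 1`. With a constant mask, compilers typically fold it to a constant, so whether the derived shift costs anything at runtime is worth checking in the generated code. The kernel's `FIELD_GET()`/`FIELD_PREP()` in `<linux/bitfield.h>` also work from the mask alone.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK() (32-bit only). */
#define MODEL_GENMASK(h, l) \
	((uint32_t)((~0u << (l)) & (~0u >> (31 - (h)))))

#define CPPI5_INFO3_DESC_SRCTAG_SHIFT	(16U)
#define CPPI5_INFO3_DESC_SRCTAG_MASK	MODEL_GENMASK(31, 16)

/* Option 1: explicit SHIFT define, as the patch keeps. */
static inline uint32_t srctag_get_shift(uint32_t info3)
{
	return (info3 & CPPI5_INFO3_DESC_SRCTAG_MASK) >>
	       CPPI5_INFO3_DESC_SRCTAG_SHIFT;
}

/* Option 2: derive the shift from the mask (trailing zeros == ffs() - 1). */
static inline uint32_t srctag_get_ctz(uint32_t info3)
{
	return (info3 & CPPI5_INFO3_DESC_SRCTAG_MASK) >>
	       __builtin_ctz(CPPI5_INFO3_DESC_SRCTAG_MASK);
}
```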

>>> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size,

>>> +					u32 sw_data_size)

>>> +{

>>> +	u32 desc_size;

>>> +

>>> +	if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE)

>>> +		return 0;

>>> +

>>> +	desc_size = sizeof(struct cppi5_host_desc_t) + psdata_size +

>>> +		    sw_data_size;

>>

>> I think there was an API for this kind of mem allocation of struct and

>> buffer attached...

> 

> The returned size is not only used when allocating memory or setting up

> the dma_pool, but for UDMAP's fetch size parameter.

> 

>>> +static inline void cppi5_hdesc_reset_hbdesc(struct cppi5_host_desc_t *desc)

>>> +{

>>> +	desc->hdr = (struct cppi5_desc_hdr_t) { 0 };

>>> +	desc->next_desc = 0;

>>

>> would this not be superfluous? Or if you want a memset call?

> 

> The intention is to reset the header and the next descriptor link but

> leave the backing buffer information intact. This allows the reuse of a

> descriptor+buffer and we only need to set the header bits + next

> descriptor pointer if any.

> 

>>> +static inline u32 *cppi5_hdesc_get_psdata32(struct cppi5_host_desc_t *desc)

>>> +{

>>> +	return (u32 *)cppi5_hdesc_get_psdata(desc);

>>

>> you don't need casts away from void *

> 

> Hrm, or just remove this, clients can use the cppi5_hdesc_get_psdata()

> directly.

> 

> 

> - Péter

> 


> 


- Péter

Vinod Koul Dec. 23, 2019, 7:34 a.m. UTC | #15
On 09-12-19, 11:43, Peter Ujfalusi wrote:

> +#include <linux/kernel.h>

> +#include <linux/dmaengine.h>

> +#include <linux/dma-mapping.h>

> +#include <linux/dmapool.h>

> +#include <linux/err.h>

> +#include <linux/init.h>

> +#include <linux/interrupt.h>

> +#include <linux/list.h>

> +#include <linux/module.h>

> +#include <linux/platform_device.h>

> +#include <linux/slab.h>

> +#include <linux/spinlock.h>

> +#include <linux/of.h>

> +#include <linux/of_dma.h>

> +#include <linux/of_device.h>

> +#include <linux/of_irq.h>


too many headers, do we need all of them?

> +static char *udma_get_dir_text(enum dma_transfer_direction dir)

> +{

> +	switch (dir) {

> +	case DMA_DEV_TO_MEM:

> +		return "DEV_TO_MEM";

> +	case DMA_MEM_TO_DEV:

> +		return "MEM_TO_DEV";

> +	case DMA_MEM_TO_MEM:

> +		return "MEM_TO_MEM";

> +	case DMA_DEV_TO_DEV:

> +		return "DEV_TO_DEV";

> +	default:

> +		break;

> +	}

> +

> +	return "invalid";

> +}


this seems generic and other people may need it, can we move it to the core?

> +

> +static void udma_reset_uchan(struct udma_chan *uc)

> +{

> +	uc->state = UDMA_CHAN_IS_IDLE;

> +	uc->remote_thread_id = -1;

> +	uc->dir = DMA_MEM_TO_MEM;

> +	uc->pkt_mode = false;

> +	uc->ep_type = PSIL_EP_NATIVE;

> +	uc->enable_acc32 = 0;

> +	uc->enable_burst = 0;

> +	uc->channel_tpl = 0;

> +	uc->psd_size = 0;

> +	uc->metadata_size = 0;

> +	uc->hdesc_size = 0;

> +	uc->notdpkt = 0;


rather than setting these to zero, why not memset and then set the nonzero
members only?

> +static void udma_reset_counters(struct udma_chan *uc)

> +{

> +	u32 val;

> +

> +	if (uc->tchan) {

> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);

> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val);


so you read back from UDMA_TCHAN_RT_BCNT_REG and write same value to
it??

> +

> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG);

> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG, val);

> +

> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PCNT_REG);

> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PCNT_REG, val);

> +

> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG);

> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG, val);

> +	}

> +

> +	if (uc->rchan) {

> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_BCNT_REG);

> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_BCNT_REG, val);

> +

> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG);

> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG, val);

> +

> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PCNT_REG);

> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PCNT_REG, val);

> +

> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG);

> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG, val);


True for all of these, what am I missing :)

> +static int udma_start(struct udma_chan *uc)

> +{

> +	struct virt_dma_desc *vd = vchan_next_desc(&uc->vc);

> +

> +	if (!vd) {

> +		uc->desc = NULL;

> +		return -ENOENT;

> +	}

> +

> +	list_del(&vd->node);

> +

> +	uc->desc = to_udma_desc(&vd->tx);

> +

> +	/* Channel is already running and does not need reconfiguration */

> +	if (udma_is_chan_running(uc) && !udma_chan_needs_reconfiguration(uc)) {

> +		udma_start_desc(uc);

> +		goto out;


How about the case where settings are different than the current one?

> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc,

> +					    size_t tr_size, int tr_count,

> +					    enum dma_transfer_direction dir)

> +{

> +	struct udma_hwdesc *hwdesc;

> +	struct cppi5_desc_hdr_t *tr_desc;

> +	struct udma_desc *d;

> +	u32 reload_count = 0;

> +	u32 ring_id;

> +

> +	switch (tr_size) {

> +	case 16:

> +	case 32:

> +	case 64:

> +	case 128:

> +		break;

> +	default:

> +		dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size);

> +		return NULL;

> +	}

> +

> +	/* We have only one descriptor containing multiple TRs */

> +	d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC);


this is invoked from prep_ so it should use GFP_NOWAIT, we don't use
GFP_ATOMIC :)

> +static struct udma_desc *

> +udma_prep_slave_sg_tr(struct udma_chan *uc, struct scatterlist *sgl,

> +		      unsigned int sglen, enum dma_transfer_direction dir,

> +		      unsigned long tx_flags, void *context)

> +{

> +	enum dma_slave_buswidth dev_width;

> +	struct scatterlist *sgent;

> +	struct udma_desc *d;

> +	size_t tr_size;

> +	struct cppi5_tr_type1_t *tr_req = NULL;

> +	unsigned int i;

> +	u32 burst;

> +

> +	if (dir == DMA_DEV_TO_MEM) {

> +		dev_width = uc->cfg.src_addr_width;

> +		burst = uc->cfg.src_maxburst;

> +	} else if (dir == DMA_MEM_TO_DEV) {

> +		dev_width = uc->cfg.dst_addr_width;

> +		burst = uc->cfg.dst_maxburst;

> +	} else {

> +		dev_err(uc->ud->dev, "%s: bad direction?\n", __func__);

> +		return NULL;

> +	}

> +

> +	if (!burst)

> +		burst = 1;

> +

> +	/* Now allocate and setup the descriptor. */

> +	tr_size = sizeof(struct cppi5_tr_type1_t);

> +	d = udma_alloc_tr_desc(uc, tr_size, sglen, dir);

> +	if (!d)

> +		return NULL;

> +

> +	d->sglen = sglen;

> +

> +	tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base;


cast away from void *?

> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,

> +				   enum dma_slave_buswidth dev_width,

> +				   u16 elcnt)

> +{

> +	if (uc->ep_type != PSIL_EP_PDMA_XY)

> +		return 0;

> +

> +	/* Bus width translates to the element size (ES) */

> +	switch (dev_width) {

> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:

> +		d->static_tr.elsize = 0;

> +		break;

> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:

> +		d->static_tr.elsize = 1;

> +		break;

> +	case DMA_SLAVE_BUSWIDTH_3_BYTES:

> +		d->static_tr.elsize = 2;

> +		break;

> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:

> +		d->static_tr.elsize = 3;

> +		break;

> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:

> +		d->static_tr.elsize = 4;


seems like ffs(dev_width) to me?

> +static struct udma_desc *

> +udma_prep_slave_sg_pkt(struct udma_chan *uc, struct scatterlist *sgl,

> +		       unsigned int sglen, enum dma_transfer_direction dir,

> +		       unsigned long tx_flags, void *context)

> +{

> +	struct scatterlist *sgent;

> +	struct cppi5_host_desc_t *h_desc = NULL;

> +	struct udma_desc *d;

> +	u32 ring_id;

> +	unsigned int i;

> +

> +	d = kzalloc(sizeof(*d) + sglen * sizeof(d->hwdesc[0]), GFP_ATOMIC);


GFP_NOWAIT here and few other places

> +static struct udma_desc *

> +udma_prep_dma_cyclic_pkt(struct udma_chan *uc, dma_addr_t buf_addr,

> +			 size_t buf_len, size_t period_len,

> +			 enum dma_transfer_direction dir, unsigned long flags)

> +{

> +	struct udma_desc *d;

> +	u32 ring_id;

> +	int i;

> +	int periods = buf_len / period_len;

> +

> +	if (periods > (K3_UDMA_DEFAULT_RING_SIZE - 1))

> +		return NULL;

> +

> +	if (period_len > 0x3FFFFF)


Magic?

> +static enum dma_status udma_tx_status(struct dma_chan *chan,

> +				      dma_cookie_t cookie,

> +				      struct dma_tx_state *txstate)

> +{

> +	struct udma_chan *uc = to_udma_chan(chan);

> +	enum dma_status ret;

> +	unsigned long flags;

> +

> +	spin_lock_irqsave(&uc->vc.lock, flags);

> +

> +	ret = dma_cookie_status(chan, cookie, txstate);

> +

> +	if (!udma_is_chan_running(uc))

> +		ret = DMA_COMPLETE;


Even for a paused or not-yet-started channel? Not sure what will be returned in those cases
-- 
~Vinod
Peter Ujfalusi Dec. 23, 2019, 8:59 a.m. UTC | #16
On 23/12/2019 9.34, Vinod Koul wrote:
> On 09-12-19, 11:43, Peter Ujfalusi wrote:

> 

>> +#include <linux/kernel.h>

>> +#include <linux/dmaengine.h>

>> +#include <linux/dma-mapping.h>

>> +#include <linux/dmapool.h>

>> +#include <linux/err.h>

>> +#include <linux/init.h>

>> +#include <linux/interrupt.h>

>> +#include <linux/list.h>

>> +#include <linux/module.h>

>> +#include <linux/platform_device.h>

>> +#include <linux/slab.h>

>> +#include <linux/spinlock.h>

>> +#include <linux/of.h>

>> +#include <linux/of_dma.h>

>> +#include <linux/of_device.h>

>> +#include <linux/of_irq.h>

> 

> too many headers, do we need all of them?


I'll try to cut them back.

>> +static char *udma_get_dir_text(enum dma_transfer_direction dir)

>> +{

>> +	switch (dir) {

>> +	case DMA_DEV_TO_MEM:

>> +		return "DEV_TO_MEM";

>> +	case DMA_MEM_TO_DEV:

>> +		return "MEM_TO_DEV";

>> +	case DMA_MEM_TO_MEM:

>> +		return "MEM_TO_MEM";

>> +	case DMA_DEV_TO_DEV:

>> +		return "DEV_TO_DEV";

>> +	default:

>> +		break;

>> +	}

>> +

>> +	return "invalid";

>> +}

> 

> this seems generic and other people may need it, can we move it to the core?


What about adding dmaengine_get_direction_text() to include/linux/dmaengine.h?
This way client drivers can use it too if they need it.
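A sketch of what a shared helper could look like, written against a local enum so it compiles in userspace. The name `model_get_direction_text()` and the enum are hypothetical; returning `const char *` instead of the patch's `char *` reflects that the strings are literals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the dma_transfer_direction values in the patch. */
enum model_dma_dir {
	MODEL_MEM_TO_MEM,
	MODEL_MEM_TO_DEV,
	MODEL_DEV_TO_MEM,
	MODEL_DEV_TO_DEV,
};

/* Points at string literals, hence const char * rather than char *. */
static inline const char *model_get_direction_text(enum model_dma_dir dir)
{
	switch (dir) {
	case MODEL_DEV_TO_MEM:
		return "DEV_TO_MEM";
	case MODEL_MEM_TO_DEV:
		return "MEM_TO_DEV";
	case MODEL_MEM_TO_MEM:
		return "MEM_TO_MEM";
	case MODEL_DEV_TO_DEV:
		return "DEV_TO_DEV";
	default:
		return "invalid";
	}
}
```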

>> +static void udma_reset_uchan(struct udma_chan *uc)

>> +{

>> +	uc->state = UDMA_CHAN_IS_IDLE;

>> +	uc->remote_thread_id = -1;

>> +	uc->dir = DMA_MEM_TO_MEM;

>> +	uc->pkt_mode = false;

>> +	uc->ep_type = PSIL_EP_NATIVE;

>> +	uc->enable_acc32 = 0;

>> +	uc->enable_burst = 0;

>> +	uc->channel_tpl = 0;

>> +	uc->psd_size = 0;

>> +	uc->metadata_size = 0;

>> +	uc->hdesc_size = 0;

>> +	uc->notdpkt = 0;

> 

> rather than setting these to zero, why not memset and then set the nonzero

> members only?


I have lots of other things in udma_chan which cannot be memset: the vchan
struct, the tasklet, the name (for the irq), etc.

To use memset, I think I could move the parameters under a new struct
(udma_chan_params), keeping only the state in udma_chan.
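A minimal sketch of that restructuring, with a hypothetical, trimmed-down field set: everything that may be wiped lives in one sub-struct, so the reset becomes a `memset()` followed by the few non-zero defaults, while members such as the name stay untouched. The numeric values for the direction and endpoint type are placeholders, not the kernel's enum values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Per-use parameters that are safe to clear with memset (illustrative subset). */
struct model_chan_params {
	int remote_thread_id;
	int dir;
	bool pkt_mode;
	int ep_type;
	unsigned int psd_size;
	unsigned int metadata_size;
	unsigned int hdesc_size;
};

struct model_chan {
	char name[16];			/* must survive a reset (e.g. IRQ name) */
	struct model_chan_params params;
};

#define MODEL_DIR_MEM_TO_MEM	2	/* placeholder value */
#define MODEL_EP_NATIVE		0	/* placeholder value */

static void model_reset_params(struct model_chan *c)
{
	memset(&c->params, 0, sizeof(c->params));
	/* Only the non-zero defaults need explicit assignment. */
	c->params.remote_thread_id = -1;
	c->params.dir = MODEL_DIR_MEM_TO_MEM;
	c->params.ep_type = MODEL_EP_NATIVE;
}
```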


>> +static void udma_reset_counters(struct udma_chan *uc)

>> +{

>> +	u32 val;

>> +

>> +	if (uc->tchan) {

>> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);

>> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val);

> 

> so you read back from UDMA_TCHAN_RT_BCNT_REG and write same value to

> it??


Yes, that's correct. This is how we can reset it. The counter is
decremented with the value you have written to the register.

>> +

>> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG);

>> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG, val);

>> +

>> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PCNT_REG);

>> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PCNT_REG, val);

>> +

>> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG);

>> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG, val);

>> +	}

>> +

>> +	if (uc->rchan) {

>> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_BCNT_REG);

>> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_BCNT_REG, val);

>> +

>> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG);

>> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG, val);

>> +

>> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PCNT_REG);

>> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PCNT_REG, val);

>> +

>> +		val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG);

>> +		udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG, val);

> 

> True for all of these, what am I missing :)


Decrement on write.

> 

>> +static int udma_start(struct udma_chan *uc)

>> +{

>> +	struct virt_dma_desc *vd = vchan_next_desc(&uc->vc);

>> +

>> +	if (!vd) {

>> +		uc->desc = NULL;

>> +		return -ENOENT;

>> +	}

>> +

>> +	list_del(&vd->node);

>> +

>> +	uc->desc = to_udma_desc(&vd->tx);

>> +

>> +	/* Channel is already running and does not need reconfiguration */

>> +	if (udma_is_chan_running(uc) && !udma_chan_needs_reconfiguration(uc)) {

>> +		udma_start_desc(uc);

>> +		goto out;

> 

> How about the case where settings are different than the current one?


udma_chan_needs_reconfiguration() is checking that. I only need to
reconfigure UDMAP/PDMA if the settings have changed.
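The fast path in `udma_start()` can be modelled as "running and nothing changed". The fields compared below are a hypothetical subset; the real `udma_chan_needs_reconfiguration()` may look at different settings. Fields are compared one by one rather than with `memcmp()`, which would also compare padding bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the settings programmed into UDMAP/PDMA at start. */
struct model_static_tr {
	uint8_t elsize;
	uint16_t elcnt;
	uint16_t bstcnt;
};

struct model_chan {
	bool running;
	struct model_static_tr active;	/* what the hardware currently uses */
	struct model_static_tr wanted;	/* what the next descriptor needs */
};

static bool model_needs_reconfiguration(const struct model_chan *c)
{
	return c->active.elsize != c->wanted.elsize ||
	       c->active.elcnt != c->wanted.elcnt ||
	       c->active.bstcnt != c->wanted.bstcnt;
}

/* True if the descriptor can simply be queued on the running channel. */
static bool model_can_just_queue(const struct model_chan *c)
{
	return c->running && !model_needs_reconfiguration(c);
}
```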

>> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc,

>> +					    size_t tr_size, int tr_count,

>> +					    enum dma_transfer_direction dir)

>> +{

>> +	struct udma_hwdesc *hwdesc;

>> +	struct cppi5_desc_hdr_t *tr_desc;

>> +	struct udma_desc *d;

>> +	u32 reload_count = 0;

>> +	u32 ring_id;

>> +

>> +	switch (tr_size) {

>> +	case 16:

>> +	case 32:

>> +	case 64:

>> +	case 128:

>> +		break;

>> +	default:

>> +		dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size);

>> +		return NULL;

>> +	}

>> +

>> +	/* We have only one descriptor containing multiple TRs */

>> +	d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC);

> 

> this is invoked from prep_ so it should use GFP_NOWAIT, we don't use

> GFP_ATOMIC :)


Ok. btw: the EDMA and sDMA drivers are using GFP_ATOMIC :o

> 

>> +static struct udma_desc *

>> +udma_prep_slave_sg_tr(struct udma_chan *uc, struct scatterlist *sgl,

>> +		      unsigned int sglen, enum dma_transfer_direction dir,

>> +		      unsigned long tx_flags, void *context)

>> +{

>> +	enum dma_slave_buswidth dev_width;

>> +	struct scatterlist *sgent;

>> +	struct udma_desc *d;

>> +	size_t tr_size;

>> +	struct cppi5_tr_type1_t *tr_req = NULL;

>> +	unsigned int i;

>> +	u32 burst;

>> +

>> +	if (dir == DMA_DEV_TO_MEM) {

>> +		dev_width = uc->cfg.src_addr_width;

>> +		burst = uc->cfg.src_maxburst;

>> +	} else if (dir == DMA_MEM_TO_DEV) {

>> +		dev_width = uc->cfg.dst_addr_width;

>> +		burst = uc->cfg.dst_maxburst;

>> +	} else {

>> +		dev_err(uc->ud->dev, "%s: bad direction?\n", __func__);

>> +		return NULL;

>> +	}

>> +

>> +	if (!burst)

>> +		burst = 1;

>> +

>> +	/* Now allocate and setup the descriptor. */

>> +	tr_size = sizeof(struct cppi5_tr_type1_t);

>> +	d = udma_alloc_tr_desc(uc, tr_size, sglen, dir);

>> +	if (!d)

>> +		return NULL;

>> +

>> +	d->sglen = sglen;

>> +

>> +	tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base;

> 

> cast away from void *?


True, it is not needed.

>> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,

>> +				   enum dma_slave_buswidth dev_width,

>> +				   u16 elcnt)

>> +{

>> +	if (uc->ep_type != PSIL_EP_PDMA_XY)

>> +		return 0;

>> +

>> +	/* Bus width translates to the element size (ES) */

>> +	switch (dev_width) {

>> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:

>> +		d->static_tr.elsize = 0;

>> +		break;

>> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:

>> +		d->static_tr.elsize = 1;

>> +		break;

>> +	case DMA_SLAVE_BUSWIDTH_3_BYTES:

>> +		d->static_tr.elsize = 2;

>> +		break;

>> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:

>> +		d->static_tr.elsize = 3;

>> +		break;

>> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:

>> +		d->static_tr.elsize = 4;

> 

> seems like ffs(dev_width) to me?


Not really:
ffs(DMA_SLAVE_BUSWIDTH_1_BYTE) = 1
ffs(DMA_SLAVE_BUSWIDTH_2_BYTES) = 2
ffs(DMA_SLAVE_BUSWIDTH_3_BYTES) = 1
ffs(DMA_SLAVE_BUSWIDTH_4_BYTES) = 3
ffs(DMA_SLAVE_BUSWIDTH_8_BYTES) = 4
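Checking this table against the ES encoding in the patch shows that no single ffs() expression covers every width: `ffs() - 1` matches the 1- and 2-byte cases, `ffs()` itself matches 4 and 8 bytes, and 3 bytes matches neither. A small userspace sketch (hypothetical names; widths are byte counts, as the `DMA_SLAVE_BUSWIDTH_*` values are) makes that checkable, which supports keeping the explicit switch.

```c
#include <assert.h>
#include <strings.h>	/* ffs() (POSIX) */

/* ES encoding from the patch: 1->0, 2->1, 3->2, 4->3, 8->4. */
static int model_width_to_elsize(unsigned int width_bytes)
{
	switch (width_bytes) {
	case 1: return 0;
	case 2: return 1;
	case 3: return 2;
	case 4: return 3;
	case 8: return 4;
	default: return -1;	/* unsupported width */
	}
}

/* The ffs()-based shortcut proposed in review, for comparison. */
static int model_width_to_elsize_ffs(unsigned int width_bytes)
{
	return ffs((int)width_bytes) - 1;
}
```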

> 

>> +static struct udma_desc *

>> +udma_prep_slave_sg_pkt(struct udma_chan *uc, struct scatterlist *sgl,

>> +		       unsigned int sglen, enum dma_transfer_direction dir,

>> +		       unsigned long tx_flags, void *context)

>> +{

>> +	struct scatterlist *sgent;

>> +	struct cppi5_host_desc_t *h_desc = NULL;

>> +	struct udma_desc *d;

>> +	u32 ring_id;

>> +	unsigned int i;

>> +

>> +	d = kzalloc(sizeof(*d) + sglen * sizeof(d->hwdesc[0]), GFP_ATOMIC);

> 

> GFP_NOWAIT here and few other places


Yes, I have fixed them up by this time.

> 

>> +static struct udma_desc *

>> +udma_prep_dma_cyclic_pkt(struct udma_chan *uc, dma_addr_t buf_addr,

>> +			 size_t buf_len, size_t period_len,

>> +			 enum dma_transfer_direction dir, unsigned long flags)

>> +{

>> +	struct udma_desc *d;

>> +	u32 ring_id;

>> +	int i;

>> +	int periods = buf_len / period_len;

>> +

>> +	if (periods > (K3_UDMA_DEFAULT_RING_SIZE - 1))

>> +		return NULL;

>> +

>> +	if (period_len > 0x3FFFFF)

> 

> Magic?


I'll add a define to cppi5. It is the packet length limit.
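A sketch of the validation with the limit given a name. 0x3FFFFF is 2^22 - 1, so in kernel code the define could equivalently be written as `GENMASK(21, 0)`. The macro names and the ring size of 16 are hypothetical placeholders (the real `K3_UDMA_DEFAULT_RING_SIZE` value is not shown in this thread), and the zero-length guard is an addition that avoids dividing by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 22-bit packet length limit quoted in the patch; name is hypothetical. */
#define MODEL_CPPI5_PKT_LEN_MAX		0x3FFFFFu
#define MODEL_RING_SIZE			16	/* placeholder ring size */

static bool model_cyclic_params_ok(size_t buf_len, size_t period_len)
{
	size_t periods;

	if (!period_len || period_len > MODEL_CPPI5_PKT_LEN_MAX)
		return false;

	periods = buf_len / period_len;
	/* Mirrors the patch's "periods > RING_SIZE - 1" rejection. */
	return periods <= MODEL_RING_SIZE - 1;
}
```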

> 

>> +static enum dma_status udma_tx_status(struct dma_chan *chan,

>> +				      dma_cookie_t cookie,

>> +				      struct dma_tx_state *txstate)

>> +{

>> +	struct udma_chan *uc = to_udma_chan(chan);

>> +	enum dma_status ret;

>> +	unsigned long flags;

>> +

>> +	spin_lock_irqsave(&uc->vc.lock, flags);

>> +

>> +	ret = dma_cookie_status(chan, cookie, txstate);

>> +

>> +	if (!udma_is_chan_running(uc))

>> +		ret = DMA_COMPLETE;

> 

> Even for a paused or not-yet-started channel? Not sure what will be returned in those cases


Hrm, if the channel is not started yet, then I think it should still be
DMA_IN_PROGRESS, right?
The udma_is_chan_running() check can be dropped from here.
I did miss the DMA_PAUSED state.

-	if (!udma_is_chan_running(uc))
-		ret = DMA_COMPLETE;
+	if (ret == DMA_IN_PROGRESS && udma_is_chan_paused(uc))
+		ret = DMA_PAUSED;
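The proposed rule, reduced to a pure function for illustration: the cookie status is reported as-is, except that an in-progress transfer on a paused channel becomes `DMA_PAUSED`, and a channel that has not started keeps reporting in-progress. The `model_*` enum is a hypothetical mirror of `enum dma_status`, not the kernel type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the dma_status values involved. */
enum model_status {
	MODEL_DMA_COMPLETE,
	MODEL_DMA_IN_PROGRESS,
	MODEL_DMA_PAUSED,
	MODEL_DMA_ERROR,
};

/* Trust the cookie status; only override it for a paused channel. */
static enum model_status model_tx_status(enum model_status cookie_status,
					 bool chan_paused)
{
	if (cookie_status == MODEL_DMA_IN_PROGRESS && chan_paused)
		return MODEL_DMA_PAUSED;

	return cookie_status;
}
```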


- Péter

Vinod Koul Dec. 23, 2019, 11:26 a.m. UTC | #17
On 23-12-19, 10:59, Peter Ujfalusi wrote:

> >> +static void udma_reset_counters(struct udma_chan *uc)

> >> +{

> >> +	u32 val;

> >> +

> >> +	if (uc->tchan) {

> >> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);

> >> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val);

> > 

> > so you read back from UDMA_TCHAN_RT_BCNT_REG and write same value to

> > it??

> 

> Yes, that's correct. This is how we can reset it. The counter is

> decremented with the value you have written to the register.


aha, with so many read+write backs I would have added a helper. Not a
big deal though, it can be updated later
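A sketch of such a helper against a mock register file that follows the decrement-on-write behaviour described earlier in the thread: writing back the value just read brings each counter to zero. The register indices and `model_*` names are hypothetical; in kernel code the loop bound would normally be `ARRAY_SIZE()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock register file; index stands in for a register offset. */
static uint32_t model_regs[8];

static uint32_t model_rt_read(size_t reg)
{
	return model_regs[reg];
}

/* Decrement-on-write: the written value is subtracted, not stored. */
static void model_rt_write(size_t reg, uint32_t val)
{
	model_regs[reg] -= val;
}

/* The suggested helper: one read + write-back per counter. */
static void model_reset_counter(size_t reg)
{
	model_rt_write(reg, model_rt_read(reg));
}

/* Hypothetical indices standing in for BCNT, SBCNT, PCNT and PEER_BCNT. */
static const size_t model_counter_regs[] = { 1, 2, 3, 4 };

static void model_reset_counters(void)
{
	size_t n = sizeof(model_counter_regs) / sizeof(model_counter_regs[0]);

	for (size_t i = 0; i < n; i++)
		model_reset_counter(model_counter_regs[i]);
}
```

A side effect of these semantics, at least in the model: anything counted between the read and the write-back is not lost, because only the value that was read is subtracted.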

> >> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc,

> >> +					    size_t tr_size, int tr_count,

> >> +					    enum dma_transfer_direction dir)

> >> +{

> >> +	struct udma_hwdesc *hwdesc;

> >> +	struct cppi5_desc_hdr_t *tr_desc;

> >> +	struct udma_desc *d;

> >> +	u32 reload_count = 0;

> >> +	u32 ring_id;

> >> +

> >> +	switch (tr_size) {

> >> +	case 16:

> >> +	case 32:

> >> +	case 64:

> >> +	case 128:

> >> +		break;

> >> +	default:

> >> +		dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size);

> >> +		return NULL;

> >> +	}

> >> +

> >> +	/* We have only one descriptor containing multiple TRs */

> >> +	d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC);

> > 

> > this is invoked from prep_ so it should use GFP_NOWAIT, we don't use

> > GFP_ATOMIC :)

> 

> Ok. btw: the EDMA and sDMA drivers are using GFP_ATOMIC :o


heh, we made sure to document this bit :)

> >> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,

> >> +				   enum dma_slave_buswidth dev_width,

> >> +				   u16 elcnt)

> >> +{

> >> +	if (uc->ep_type != PSIL_EP_PDMA_XY)

> >> +		return 0;

> >> +

> >> +	/* Bus width translates to the element size (ES) */

> >> +	switch (dev_width) {

> >> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:

> >> +		d->static_tr.elsize = 0;

> >> +		break;

> >> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:

> >> +		d->static_tr.elsize = 1;

> >> +		break;

> >> +	case DMA_SLAVE_BUSWIDTH_3_BYTES:

> >> +		d->static_tr.elsize = 2;

> >> +		break;

> >> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:

> >> +		d->static_tr.elsize = 3;

> >> +		break;

> >> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:

> >> +		d->static_tr.elsize = 4;

> > 

> > seems like ffs(dev_width) to me?

> 

> Not really:

> ffs(DMA_SLAVE_BUSWIDTH_1_BYTE) = 1

> ffs(DMA_SLAVE_BUSWIDTH_2_BYTES) = 2

> ffs(DMA_SLAVE_BUSWIDTH_3_BYTES) = 1


I missed this!

> ffs(DMA_SLAVE_BUSWIDTH_4_BYTES) = 3

> ffs(DMA_SLAVE_BUSWIDTH_8_BYTES) = 4


Otherwise it is ffs() - 1

-- 
~Vinod