[PATCHv4] serial: imx: Add DMA buffer configuration via sysfs

Message ID 20210305115058.92284-1-sebastian.reichel@collabora.com
State New

Commit Message

Sebastian Reichel March 5, 2021, 11:50 a.m. UTC
From: Fabien Lahoudere <fabien.lahoudere@collabora.com>

In order to optimize serial communication (performance/throughput VS
latency), we may need to tweak DMA period number and size. This adds
sysfs attributes to configure those values before initialising DMA.
The defaults will stay the same as before (16 buffers with a size of
1024 bytes). Afterwards the values can be read and written via the
following sysfs files:

/sys/class/tty/ttymxc*/dma_buffer_size
/sys/class/tty/ttymxc*/dma_buffer_count
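
A minimal userspace sketch of how these attributes could be set and read
back. The helper names write_uint_attr() and read_uint_attr() are
hypothetical, and ttymxc0 in the usage note is only a placeholder for
whichever port is being tuned.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write an unsigned value to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_uint_attr(const char *path, unsigned int value)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "%u\n", value) < 0)
		err = -EIO;
	/* An error from the attribute's store handler shows up at flush time. */
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}

/* Read an unsigned value back; returns 0 or a negative errno value. */
static int read_uint_attr(const char *path, unsigned int *value)
{
	FILE *f = fopen(path, "r");
	int err = 0;

	if (!f)
		return -errno;
	if (fscanf(f, "%u", value) != 1)
		err = -EINVAL;
	fclose(f);
	return err;
}
```

For example, write_uint_attr("/sys/class/tty/ttymxc0/dma_buffer_size", 24)
would have to run before RX DMA is initialised; with this patch a write
fails with EBUSY once the RX DMA channel is in use.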

This is mainly needed for GEHC CS ONE (arch/arm/boot/dts/imx53-ppd.dts),
which has multiple microcontrollers connected via UART controlling. One
of the UARTs is connected to an on-board microcontroller at 19200 baud,
which constantly pushes critical data (so aging character detect
interrupt will never trigger). This data must be processed at 50-200 Hz,
so UART should return data in less than 5-20ms. With 1024 byte DMA
buffer (and a constant data stream) the read operation instead needs
1024 byte / 19200 baud = 53.333ms, which is way too long (note: Worst
Case would be remote processor sending data with short pauses <= 7
characters, which would further increase this number). The current
downstream kernel instead configures 24 bytes resulting in 1.25ms,
but that is obviously not sensible for normal UART use cases and cannot
be used as new default.

The same device also has another microcontroller with a 4M baud UART
connection exchanging lots of data. For this the same mechanism can
be used to increase the buffer size (downstream uses 4K instead of
the default 1K) with potentially slightly reduced buffer count. At
this baud rate latency is not an issue (4096 byte / 4M baud = 0.977 ms).
Before increasing the default buffer count from 4 to 16 in 76c38d30fee7,
this was required to avoid data loss. With the changed default it's
a performance optimization.
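
The latency figures above divide the buffer size in bytes by the baud
rate directly. As a rough cross-check, the sketch below computes the time
needed to fill one DMA period with the number of bits per character as an
explicit parameter. fill_time_us() is a hypothetical helper, and the 10
bits per character used in the note below (8N1 framing: start bit, 8 data
bits, stop bit) is an assumption that the commit message does not state.

```c
#include <stdint.h>

/*
 * Time in microseconds to receive `bytes` characters at `baud` bits per
 * second when each character occupies `bits_per_char` bits on the wire.
 * The result is truncated to whole microseconds.
 */
static uint64_t fill_time_us(uint64_t bytes, uint64_t baud,
			     unsigned int bits_per_char)
{
	return bytes * bits_per_char * 1000000ULL / baud;
}
```

Under the 8N1 assumption, fill_time_us(1024, 19200, 10) gives roughly
533 ms and fill_time_us(24, 19200, 10) gives 12.5 ms, ten times the
values quoted above, while 4096 bytes at 4000000 baud take about 10.24 ms.
Passing 1 as bits_per_char reproduces the convention used in the text
(53.333 ms and 1.25 ms).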

Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.com>
[reword documentation and commit message, rebase to current code]
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
---
I excluded Fabien from the Cc list, his mail address is no longer valid.

Changes since PATCHv3 [*]:
 * rewrote commit message to provide a lot more details why this is needed
 * rebased to torvalds/master (5.12-rc1-dontuse), also applies on top of linux-next
 * use sysfs_emit() instead of sprintf

[*] https://lore.kernel.org/lkml/1539249903-6316-1-git-send-email-fabien.lahoudere@collabora.com/
---
 .../ABI/stable/sysfs-driver-imx-uart          | 12 +++
 drivers/tty/serial/imx.c                      | 98 +++++++++++++++++--
 2 files changed, 103 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-driver-imx-uart

Comments

Sebastian Reichel April 5, 2021, 9:44 p.m. UTC | #1
Hi,

On Fri, Mar 05, 2021 at 01:42:52PM +0100, Sebastian Reichel wrote:
> On Fri, Mar 05, 2021 at 01:06:12PM +0100, Greg Kroah-Hartman wrote:
> > On Fri, Mar 05, 2021 at 12:50:58PM +0100, Sebastian Reichel wrote:
> > > From: Fabien Lahoudere <fabien.lahoudere@collabora.com>
> > >
> > > In order to optimize serial communication (performance/throughput VS
> > > latency), we may need to tweak DMA period number and size. This adds
> > > sysfs attributes to configure those values before initialising DMA.
> > > The defaults will stay the same as before (16 buffers with a size of
> > > 1024 bytes). Afterwards the values can be read and written via the
> > > following sysfs files:
> > >
> > > /sys/class/tty/ttymxc*/dma_buffer_size
> > > /sys/class/tty/ttymxc*/dma_buffer_count
> >
> > Ick no.  Custom sysfs attributes for things like serial ports are crazy.
> >
> > > This is mainly needed for GEHC CS ONE (arch/arm/boot/dts/imx53-ppd.dts),
> > > which has multiple microcontrollers connected via UART controlling. One
> > > of the UARTs is connected to an on-board microcontroller at 19200 baud,
> > > which constantly pushes critical data (so aging character detect
> > > interrupt will never trigger). This data must be processed at 50-200 Hz,
> > > so UART should return data in less than 5-20ms. With 1024 byte DMA
> > > buffer (and a constant data stream) the read operation instead needs
> > > 1024 byte / 19200 baud = 53.333ms, which is way too long (note: Worst
> > > Case would be remote processor sending data with short pauses <= 7
> > > characters, which would further increase this number). The current
> > > downstream kernel instead configures 24 bytes resulting in 1.25ms,
> > > but that is obviously not sensible for normal UART use cases and cannot
> > > be used as new default.
> >
> > Why can't this be a device tree attribute? Why does this have to be a
> > sysfs thing that no one will know how to tune and set over time.  This
> > hardware should not force a user to manually tune it to get it to work
> > properly, this isn't the 1990's anymore :(
> >
> > Please never force a user to choose stuff like this, they never will
> > know what to do.
>
> This used to be a DT attribute in PATCHv1. It has been moved over to
> sysfs since PATCHv2, since it does not describe the hardware, but
> configuration. Unfortunately lore.kernel.org does not have the full
> thread, but this is the discussion:
>
> https://lore.kernel.org/linux-serial/20170629182618.jpahpmuq364ldcv2@pengutronix.de/
>
> From downstream POV this can be done either by adding a DT property
> to the UART node, or by adding a udev rule.
>
> From my POV there is not a huge difference. In both cases we will
> be bound by an ABI afterwards, in both cases people will usually
> stick to the default value and in both cases people that do deviate
> from the default probably ran into problems and started to look
> for a solution.

ping? It's not very nice to get rejected in cycles :(

-- Sebastian

Greg Kroah-Hartman April 6, 2021, 7:13 a.m. UTC | #2
On Mon, Apr 05, 2021 at 11:44:46PM +0200, Sebastian Reichel wrote:
> Hi,
>
> On Fri, Mar 05, 2021 at 01:42:52PM +0100, Sebastian Reichel wrote:
> > On Fri, Mar 05, 2021 at 01:06:12PM +0100, Greg Kroah-Hartman wrote:
> > > On Fri, Mar 05, 2021 at 12:50:58PM +0100, Sebastian Reichel wrote:
> > > > From: Fabien Lahoudere <fabien.lahoudere@collabora.com>
> > > >
> > > > In order to optimize serial communication (performance/throughput VS
> > > > latency), we may need to tweak DMA period number and size. This adds
> > > > sysfs attributes to configure those values before initialising DMA.
> > > > The defaults will stay the same as before (16 buffers with a size of
> > > > 1024 bytes). Afterwards the values can be read and written via the
> > > > following sysfs files:
> > > >
> > > > /sys/class/tty/ttymxc*/dma_buffer_size
> > > > /sys/class/tty/ttymxc*/dma_buffer_count
> > >
> > > Ick no.  Custom sysfs attributes for things like serial ports are crazy.
> > >
> > > > This is mainly needed for GEHC CS ONE (arch/arm/boot/dts/imx53-ppd.dts),
> > > > which has multiple microcontrollers connected via UART controlling. One
> > > > of the UARTs is connected to an on-board microcontroller at 19200 baud,
> > > > which constantly pushes critical data (so aging character detect
> > > > interrupt will never trigger). This data must be processed at 50-200 Hz,
> > > > so UART should return data in less than 5-20ms. With 1024 byte DMA
> > > > buffer (and a constant data stream) the read operation instead needs
> > > > 1024 byte / 19200 baud = 53.333ms, which is way too long (note: Worst
> > > > Case would be remote processor sending data with short pauses <= 7
> > > > characters, which would further increase this number). The current
> > > > downstream kernel instead configures 24 bytes resulting in 1.25ms,
> > > > but that is obviously not sensible for normal UART use cases and cannot
> > > > be used as new default.
> > >
> > > Why can't this be a device tree attribute? Why does this have to be a
> > > sysfs thing that no one will know how to tune and set over time.  This
> > > hardware should not force a user to manually tune it to get it to work
> > > properly, this isn't the 1990's anymore :(
> > >
> > > Please never force a user to choose stuff like this, they never will
> > > know what to do.
> >
> > This used to be a DT attribute in PATCHv1. It has been moved over to
> > sysfs since PATCHv2, since it does not describe the hardware, but
> > configuration. Unfortunately lore.kernel.org does not have the full
> > thread, but this is the discussion:
> >
> > https://lore.kernel.org/linux-serial/20170629182618.jpahpmuq364ldcv2@pengutronix.de/
> >
> > From downstream POV this can be done either by adding a DT property
> > to the UART node, or by adding a udev rule.
> >
> > From my POV there is not a huge difference. In both cases we will
> > be bound by an ABI afterwards, in both cases people will usually
> > stick to the default value and in both cases people that do deviate
> > from the default probably ran into problems and started to look
> > for a solution.
>
> ping? It's not very nice to get rejected in cycles :(

I recommend working with the DT people here, as custom sysfs attributes
for things like this that are really just describing the hardware is
crazy.

thanks,

greg k-h

Ian Ray Dec. 22, 2023, 6:02 a.m. UTC | #3
On Fri, Dec 08, 2023 at 10:02:05AM +0100, Uwe Kleine-König wrote:
> Hello Greg,
> 
> [Cc += dt maintainers]
> 
> On Tue, Apr 06, 2021 at 09:13:04AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Apr 05, 2021 at 11:44:46PM +0200, Sebastian Reichel wrote:
> > > On Fri, Mar 05, 2021 at 01:42:52PM +0100, Sebastian Reichel wrote:
> > > > On Fri, Mar 05, 2021 at 01:06:12PM +0100, Greg Kroah-Hartman wrote:
> > > > > On Fri, Mar 05, 2021 at 12:50:58PM +0100, Sebastian Reichel wrote:
> > > > > > From: Fabien Lahoudere <fabien.lahoudere@collabora.com>
> > > > > > 
> > > > > > In order to optimize serial communication (performance/throughput VS
> > > > > > latency), we may need to tweak DMA period number and size. This adds
> > > > > > sysfs attributes to configure those values before initialising DMA.
> > > > > > The defaults will stay the same as before (16 buffers with a size of
> > > > > > 1024 bytes). Afterwards the values can be read and written via the
> > > > > > following sysfs files:
> > > > > > 
> > > > > > /sys/class/tty/ttymxc*/dma_buffer_size
> > > > > > /sys/class/tty/ttymxc*/dma_buffer_count
> > > > > 
> > > > > Ick no.  Custom sysfs attributes for things like serial ports are crazy.
> > > > > 
> > > > > > This is mainly needed for GEHC CS ONE (arch/arm/boot/dts/imx53-ppd.dts),
> > > > > > which has multiple microcontrollers connected via UART controlling. One
> > > > > > of the UARTs is connected to an on-board microcontroller at 19200 baud,
> > > > > > which constantly pushes critical data (so aging character detect
> > > > > > interrupt will never trigger). This data must be processed at 50-200 Hz,
> > > > > > so UART should return data in less than 5-20ms. With 1024 byte DMA
> > > > > > buffer (and a constant data stream) the read operation instead needs
> > > > > > 1024 byte / 19200 baud = 53.333ms, which is way too long (note: Worst
> > > > > > Case would be remote processor sending data with short pauses <= 7
> > > > > > characters, which would further increase this number). The current
> > > > > > downstream kernel instead configures 24 bytes resulting in 1.25ms,
> > > > > > but that is obviously not sensible for normal UART use cases and cannot
> > > > > > be used as new default.
> > > > > 
> > > > > Why can't this be a device tree attribute? Why does this have to be a
> > > > > sysfs thing that no one will know how to tune and set over time.  This
> > > > > hardware should not force a user to manually tune it to get it to work
> > > > > properly, this isn't the 1990's anymore :(
> > > > > 
> > > > > Please never force a user to choose stuff like this, they never will
> > > > > know what to do.
> > > > 
> > > > This used to be a DT attribute in PATCHv1. It has been moved over to
> > > > sysfs since PATCHv2, since it does not describe the hardware, but
> > > > configuration. Unfortunately lore.kernel.org does not have the full
> > > > thread, but this is the discussion:
> > > > 
> > > > https://lore.kernel.org/linux-serial/20170629182618.jpahpmuq364ldcv2@pengutronix.de/
> > > > 
> > > > From downstream POV this can be done either by adding a DT property
> > > > to the UART node, or by adding a udev rule.
> > > > 
> > > > From my POV there is not a huge difference. In both cases we will
> > > > be bound by an ABI afterwards, in both cases people will usually
> > > > stick to the default value and in both cases people that do deviate
> > > > from the default probably ran into problems and started to look
> > > > for a solution.
> > > 
> > > ping? It's not very nice to get rejected in cycles :(
> > 
> > I recommend working with the DT people here, as custom sysfs attributes
> > for things like this that are really just describing the hardware is
> > crazy.

Continuing here (see also [1]).

[1] https://lore.kernel.org/lkml/ZXr55QV4tnCz8GtI@4600ffe2ac4f/

> 
> I was one who expressed concerns in the earlier rounds that dt isn't the
> right place for this. dt is about hardware description, but choosing
> a good value for the dma buffer size is driver tuning and depends on the
> individual requirements. (latency, throughput, memory consumption,
> robustness under system load). I can even imagine use cases where the
> settings should be changed dynamically, which cannot (easily) be done
> using dt.
> 
> While I see your point that a driver-specific sysfs property is
> unusual/strange/whatever, every downside you mentioned also applies to a
> dt property (or a custom ioctl).
> 
> Among the solutions I can imagine, my preference order is:
> 
>  - automatic tuning

This might be too magical.  The algorithm would likely lose data before
it "warms up" and that would be unacceptable to our application.

>  - sysfs property
>  - further discussion
>  - dt property
>  - custom ioctl

- sysctl

Given the description [2] "configure kernel parameters at runtime" this
approach would appear to meet our needs.  I have not looked into this in
any detail yet -- but would like to get upstream feedback.

[2] https://man7.org/linux/man-pages/man8/sysctl.8.html

> 
> I wonder if there is a sensible way to implement automatic tuning. In
> the use case mentioned in the commit log, Sebastian's need is low
> latency for a constantly sending microcontroller on the other side. Is
> it sensible to make the used dma buffers smaller if we have a certain
> throughput? Or is that too magic and doomed to fail covering most use
> cases? If that doesn't work, I support Sebastian's approach to do that
> in a sysfs property.
> 
> Sebastian, have you evaluated just not using dma for these UARTs?

We spent a significant amount of time analysing and profiling our
application.  DMA is absolutely required in order to avoid dropping
data.

This is especially true for the 4M baud UART.  This section quoted from
the original patch V4 [3].

[3] https://lore.kernel.org/lkml/20210305115058.92284-1-sebastian.reichel@collabora.com/

> The same device also has another microcontroller with a 4M baud UART
> connection exchanging lots of data. For this the same mechanism can
> be used to increase the buffer size (downstream uses 4K instead of
> the default 1K) with potentially slightly reduced buffer count. At
> this baud rate latency is not an issue (4096 byte / 4M baud = 0.977 ms).
> Before increasing the default buffer count from 4 to 16 in 76c38d30fee7,
> this was required to avoid data loss. With the changed default it's
> a performance optimization.

Note: Quite a lot has changed since we submitted this patch -- GE
HealthCare has spun-off from GE, and Sebastian is no longer working on
the project.

Many thanks,
Ian

> 
> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Patch

diff --git a/Documentation/ABI/stable/sysfs-driver-imx-uart b/Documentation/ABI/stable/sysfs-driver-imx-uart
new file mode 100644
index 000000000000..27a50fcd9c5f
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-driver-imx-uart
@@ -0,0 +1,12 @@ 
+What:		/sys/class/tty/ttymxc*/dma_buffer_count
+Date:		March 2021
+Contact:	Sebastian Reichel <sebastian.reichel@collabora.com>
+Description:	The i.MX serial DMA buffer is split into multiple chunks, so that
+		chunks can be processed while others are being filled with data.
+		This represents the number of chunks.
+
+What:		/sys/class/tty/ttymxc*/dma_buffer_size
+Date:		March 2021
+Contact:	Sebastian Reichel <sebastian.reichel@collabora.com>
+Description:	This represents the size of each DMA buffer chunk. Total DMA
+		buffer size is this number multiplied by dma_buffer_count.
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 8257597d034d..1c5eb7be0c07 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -225,6 +225,8 @@  struct imx_port {
 	struct scatterlist	rx_sgl, tx_sgl[2];
 	void			*rx_buf;
 	struct circ_buf		rx_ring;
+	unsigned int		rx_buf_size;
+	unsigned int		rx_period_length;
 	unsigned int		rx_periods;
 	dma_cookie_t		rx_cookie;
 	unsigned int		tx_bytes;
@@ -1193,10 +1195,6 @@  static void imx_uart_dma_rx_callback(void *data)
 	}
 }
 
-/* RX DMA buffer periods */
-#define RX_DMA_PERIODS	16
-#define RX_BUF_SIZE	(RX_DMA_PERIODS * PAGE_SIZE / 4)
-
 static int imx_uart_start_rx_dma(struct imx_port *sport)
 {
 	struct scatterlist *sgl = &sport->rx_sgl;
@@ -1207,9 +1205,8 @@  static int imx_uart_start_rx_dma(struct imx_port *sport)
 
 	sport->rx_ring.head = 0;
 	sport->rx_ring.tail = 0;
-	sport->rx_periods = RX_DMA_PERIODS;
 
-	sg_init_one(sgl, sport->rx_buf, RX_BUF_SIZE);
+	sg_init_one(sgl, sport->rx_buf, sport->rx_buf_size);
 	ret = dma_map_sg(dev, sgl, 1, DMA_FROM_DEVICE);
 	if (ret == 0) {
 		dev_err(dev, "DMA mapping error for RX.\n");
@@ -1326,7 +1323,8 @@  static int imx_uart_dma_init(struct imx_port *sport)
 		goto err;
 	}
 
-	sport->rx_buf = kzalloc(RX_BUF_SIZE, GFP_KERNEL);
+	sport->rx_buf_size = sport->rx_period_length * sport->rx_periods;
+	sport->rx_buf = kzalloc(sport->rx_buf_size, GFP_KERNEL);
 	if (!sport->rx_buf) {
 		ret = -ENOMEM;
 		goto err;
@@ -1786,6 +1784,85 @@  static const char *imx_uart_type(struct uart_port *port)
 	return sport->port.type == PORT_IMX ? "IMX" : NULL;
 }
 
+static ssize_t dma_buffer_size_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	unsigned int plen;
+	int ret;
+	struct device *port_device = dev->parent;
+	struct imx_port *sport = dev_get_drvdata(port_device);
+
+	if (sport->dma_chan_rx) {
+		dev_warn(dev, "RX DMA in use\n");
+		return -EBUSY;
+	}
+
+	ret = kstrtou32(buf, 0, &plen);
+	if (ret == 0) {
+		sport->rx_period_length = plen;
+		ret = count;
+	}
+
+	return ret;
+}
+
+static ssize_t dma_buffer_size_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct device *port_device = dev->parent;
+	struct imx_port *sport = dev_get_drvdata(port_device);
+
+	return sysfs_emit(buf, "%u\n", sport->rx_period_length);
+}
+
+static DEVICE_ATTR_RW(dma_buffer_size);
+
+static ssize_t dma_buffer_count_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	unsigned int periods;
+	int ret;
+	struct device *port_device = dev->parent;
+	struct imx_port *sport = dev_get_drvdata(port_device);
+
+	if (sport->dma_chan_rx) {
+		dev_warn(dev, "RX DMA in use\n");
+		return -EBUSY;
+	}
+
+	ret = kstrtou32(buf, 0, &periods);
+	if (ret == 0) {
+		sport->rx_periods = periods;
+		ret = count;
+	}
+
+	return ret;
+}
+
+static ssize_t dma_buffer_count_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct device *port_device = dev->parent;
+	struct imx_port *sport = dev_get_drvdata(port_device);
+
+	return sysfs_emit(buf, "%u\n", sport->rx_periods);
+}
+
+static DEVICE_ATTR_RW(dma_buffer_count);
+
+static struct attribute *imx_uart_attrs[] = {
+	&dev_attr_dma_buffer_size.attr,
+	&dev_attr_dma_buffer_count.attr,
+	NULL
+};
+static struct attribute_group imx_uart_attr_group = {
+	.attrs = imx_uart_attrs,
+};
+
 /*
  * Configure/autoconfigure the port.
  */
@@ -2189,6 +2266,10 @@  static enum hrtimer_restart imx_trigger_stop_tx(struct hrtimer *t)
 	return HRTIMER_NORESTART;
 }
 
+/* Default RX DMA buffer configuration */
+#define RX_DMA_PERIODS		16
+#define RX_DMA_PERIOD_LEN	(PAGE_SIZE / 4)
+
 static int imx_uart_probe(struct platform_device *pdev)
 {
 	struct device_node *np = pdev->dev.of_node;
@@ -2257,6 +2338,9 @@  static int imx_uart_probe(struct platform_device *pdev)
 	sport->port.rs485_config = imx_uart_rs485_config;
 	sport->port.flags = UPF_BOOT_AUTOCONF;
 	timer_setup(&sport->timer, imx_uart_timeout, 0);
+	sport->rx_period_length = RX_DMA_PERIOD_LEN;
+	sport->rx_periods = RX_DMA_PERIODS;
+	sport->port.attr_group = &imx_uart_attr_group;
 
 	sport->gpios = mctrl_gpio_init(&sport->port, 0);
 	if (IS_ERR(sport->gpios))
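
The store handlers in this patch accept any value that kstrtou32() can
parse, and imx_uart_dma_init() multiplies the two values without a range
check. A minimal sketch of a check that could run before the allocation,
written as plain C so that it can be exercised in userspace.
rx_dma_config_ok() and RX_BUF_SIZE_MAX are hypothetical names, and the
1 MiB cap is an arbitrary assumption rather than a limit taken from the
driver or the hardware.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical upper bound for the whole RX buffer; not from the patch. */
#define RX_BUF_SIZE_MAX	(1U << 20)

/*
 * Return true and store the total buffer size in *total if the period
 * length and period count describe a usable RX DMA buffer.
 */
static bool rx_dma_config_ok(unsigned int period_len, unsigned int periods,
			     unsigned int *total)
{
	/* A zero length or count would yield an empty buffer. */
	if (period_len == 0 || periods == 0)
		return false;
	/* Reject products that would wrap around in unsigned int. */
	if (period_len > UINT_MAX / periods)
		return false;
	if (period_len * periods > RX_BUF_SIZE_MAX)
		return false;
	*total = period_len * periods;
	return true;
}
```

With the defaults from the patch (1024-byte periods, 16 periods) the
check passes and reports a total of 16384 bytes.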