diff mbox series

[v2,16/21] ipmi: kcs_bmc: Add a "raw" character device interface

Message ID 20210319062752.145730-16-andrew@aj.id.au
State New
Headers show
Series [v2,01/21] dt-bindings: aspeed-lpc: Remove LPC partitioning | expand

Commit Message

Andrew Jeffery March 19, 2021, 6:27 a.m. UTC
The existing IPMI chardev encodes IPMI behaviours as the name suggests.
However, KCS devices are useful beyond IPMI (or keyboards), as they
provide a means to generate IRQs and exchange arbitrary data between a
BMC and its host system.

Implement a "raw" KCS character device that exposes the IDR, ODR and STR
registers to userspace via read() and write() implemented on a character
device:

+--------+--------+---------+
| Offset | read() | write() |
+--------+--------+---------+
|   0    |   IDR  |   ODR   |
+--------+--------+---------+
|   1    |   STR  |   STR   |
+--------+--------+---------+

This interface allows userspace to implement arbitrary (though somewhat
inefficient) protocols for exchanging information between a BMC and host
firmware. Conceptually the KCS interface can be used as an out-of-band
machanism for interrupt-signaled control messages while bulk data
transfers occur over more appropriate interfaces between the BMC and the
host (which may lack their own interrupt mechanism, e.g. LPC FW cycles).

poll() is provided, which will wait for IBF or OBE conditions for data
reads and writes respectively. Reads of STR on its own never blocks,
though accessing both offsets in the one system call may block if the
data registers are not ready.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 Documentation/ABI/testing/dev-raw-kcs |  25 ++
 drivers/char/ipmi/Kconfig             |  17 +
 drivers/char/ipmi/Makefile            |   1 +
 drivers/char/ipmi/kcs_bmc_cdev_raw.c  | 443 ++++++++++++++++++++++++++
 4 files changed, 486 insertions(+)
 create mode 100644 Documentation/ABI/testing/dev-raw-kcs
 create mode 100644 drivers/char/ipmi/kcs_bmc_cdev_raw.c

Comments

Zev Weiss April 9, 2021, 5:17 a.m. UTC | #1
On Fri, Mar 19, 2021 at 01:27:47AM CDT, Andrew Jeffery wrote:
>The existing IPMI chardev encodes IPMI behaviours as the name suggests.
>However, KCS devices are useful beyond IPMI (or keyboards), as they
>provide a means to generate IRQs and exchange arbitrary data between a
>BMC and its host system.
>
>Implement a "raw" KCS character device that exposes the IDR, ODR and STR
>registers to userspace via read() and write() implemented on a character
>device:
>
>+--------+--------+---------+
>| Offset | read() | write() |
>+--------+--------+---------+
>|   0    |   IDR  |   ODR   |
>+--------+--------+---------+
>|   1    |   STR  |   STR   |
>+--------+--------+---------+
>
>This interface allows userspace to implement arbitrary (though somewhat
>inefficient) protocols for exchanging information between a BMC and host
>firmware. Conceptually the KCS interface can be used as an out-of-band
>machanism for interrupt-signaled control messages while bulk data

Typo ("mechanism")

>transfers occur over more appropriate interfaces between the BMC and the
>host (which may lack their own interrupt mechanism, e.g. LPC FW cycles).
>
>poll() is provided, which will wait for IBF or OBE conditions for data
>reads and writes respectively. Reads of STR on its own never blocks,
>though accessing both offsets in the one system call may block if the
>data registers are not ready.
>
>Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
>---
> Documentation/ABI/testing/dev-raw-kcs |  25 ++
> drivers/char/ipmi/Kconfig             |  17 +
> drivers/char/ipmi/Makefile            |   1 +
> drivers/char/ipmi/kcs_bmc_cdev_raw.c  | 443 ++++++++++++++++++++++++++
> 4 files changed, 486 insertions(+)
> create mode 100644 Documentation/ABI/testing/dev-raw-kcs
> create mode 100644 drivers/char/ipmi/kcs_bmc_cdev_raw.c
>
>diff --git a/Documentation/ABI/testing/dev-raw-kcs b/Documentation/ABI/testing/dev-raw-kcs
>new file mode 100644
>index 000000000000..06e7e2071562
>--- /dev/null
>+++ b/Documentation/ABI/testing/dev-raw-kcs
>@@ -0,0 +1,25 @@
>+What:		/dev/raw-kcs*
>+Date:		2021-02-15
>+KernelVersion:	5.13
>+Contact:	openbmc@lists.ozlabs.org
>+Contact:	openipmi-developer@lists.sourceforge.net
>+Contact:	Andrew Jeffery <andrew@aj.id.au>
>+Description:	``/dev/raw-kcs*`` exposes to userspace the data and
>+		status registers of Keyboard-Controller-Style (KCS) IPMI
>+		interfaces via read() and write() syscalls. Direct
>+		exposure of the data and status registers enables
>+		inefficient but arbitrary protocols to be implemented
>+		over the device. A typical approach is to use KCS
>+		devices for out-of-band signalling for bulk data
>+		transfers over other interfaces between a Baseboard
>+		Management Controller and its host.
>+
>+		+--------+--------+---------+
>+		| Offset | read() | write() |
>+		+--------+--------+---------+
>+		|   0    |   IDR  |   ODR   |
>+		+--------+--------+---------+
>+		|   1    |   STR  |   STR   |
>+		+--------+--------+---------+
>+
>+Users:		libmctp: https://github.com/openbmc/libmctp
>diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
>index bc5f81899b62..273ac1a1f870 100644
>--- a/drivers/char/ipmi/Kconfig
>+++ b/drivers/char/ipmi/Kconfig
>@@ -137,6 +137,23 @@ config IPMI_KCS_BMC_CDEV_IPMI
> 	  This support is also available as a module. The module will be
> 	  called kcs_bmc_cdev_ipmi.
>
>+config IPMI_KCS_BMC_CDEV_RAW
>+	depends on IPMI_KCS_BMC
>+	tristate "Raw character device interface for BMC KCS devices"
>+	help
>+	  Provides a BMC-side character device directly exposing the
>+	  data and status registers of a KCS device to userspace. While
>+	  KCS devices are commonly used to implement IPMI message
>+	  passing, they provide a general interface for exchange of
>+	  interrupts, data and status information between the BMC and
>+	  its host.
>+
>+	  Say YES if you wish to use the KCS devices to implement
>+	  protocols that are not IPMI.
>+
>+	  This support is also available as a module. The module will be
>+	  called kcs_bmc_cdev_raw.
>+
> config ASPEED_BT_IPMI_BMC
> 	depends on ARCH_ASPEED || COMPILE_TEST
> 	depends on REGMAP && REGMAP_MMIO && MFD_SYSCON
>diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
>index fcfa676afddb..c8cc248ddd90 100644
>--- a/drivers/char/ipmi/Makefile
>+++ b/drivers/char/ipmi/Makefile
>@@ -24,6 +24,7 @@ obj-$(CONFIG_IPMI_WATCHDOG) += ipmi_watchdog.o
> obj-$(CONFIG_IPMI_POWEROFF) += ipmi_poweroff.o
> obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
> obj-$(CONFIG_IPMI_KCS_BMC_CDEV_IPMI) += kcs_bmc_cdev_ipmi.o
>+obj-$(CONFIG_IPMI_KCS_BMC_CDEV_RAW) += kcs_bmc_cdev_raw.o
> obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
> obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
> obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
>diff --git a/drivers/char/ipmi/kcs_bmc_cdev_raw.c b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
>new file mode 100644
>index 000000000000..bdd258648c8e
>--- /dev/null
>+++ b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
>@@ -0,0 +1,443 @@
>+// SPDX-License-Identifier: GPL-2.0-or-later
>+/* Copyright (c) 2021 IBM Corp. */
>+
>+#include <linux/delay.h>
>+#include <linux/device.h>
>+#include <linux/errno.h>
>+#include <linux/fs.h>
>+#include <linux/list.h>
>+#include <linux/miscdevice.h>
>+#include <linux/module.h>
>+#include <linux/poll.h>
>+
>+#include "kcs_bmc_client.h"
>+
>+#define DEVICE_NAME "raw-kcs"
>+
>+struct kcs_bmc_raw {
>+	struct list_head entry;
>+
>+	struct kcs_bmc_client client;
>+
>+	wait_queue_head_t queue;
>+	u8 events;
>+	bool writable;
>+	bool readable;
>+	u8 idr;
>+
>+	struct miscdevice miscdev;
>+};
>+
>+static inline struct kcs_bmc_raw *client_to_kcs_bmc_raw(struct kcs_bmc_client *client)
>+{
>+	return container_of(client, struct kcs_bmc_raw, client);
>+}
>+
>+/* Call under priv->queue.lock */
>+static void kcs_bmc_raw_update_event_mask(struct kcs_bmc_raw *priv, u8 mask, u8 state)
>+{
>+	kcs_bmc_update_event_mask(priv->client.dev, mask, state);
>+	priv->events &= ~mask;
>+	priv->events |= state & mask;
>+}
>+
>+static int kcs_bmc_raw_event(struct kcs_bmc_client *client)
>+{
>+	struct kcs_bmc_raw *priv;
>+	struct device *dev;
>+	u8 status, handled;
>+
>+	priv = client_to_kcs_bmc_raw(client);
>+	dev = priv->miscdev.this_device;
>+
>+	spin_lock(&priv->queue.lock);
>+
>+	status = kcs_bmc_read_status(client->dev);
>+	handled = 0;
>+
>+	if ((priv->events & KCS_BMC_EVENT_TYPE_IBF) && (status & KCS_BMC_STR_IBF)) {
>+		if (priv->readable)
>+			dev_err(dev, "Storm brewing!");

That seems a *touch* cryptic...

>+
>+		dev_dbg(dev, "Disabling IDR events for back-pressure\n");
>+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF, 0);
>+		priv->idr = kcs_bmc_read_data(client->dev);
>+		priv->readable = true;
>+
>+		dev_dbg(dev, "IDR read, waking waiters\n");
>+		wake_up_locked(&priv->queue);
>+
>+		handled |= KCS_BMC_EVENT_TYPE_IBF;
>+	}
>+
>+	if ((priv->events & KCS_BMC_EVENT_TYPE_OBE) && !(status & KCS_BMC_STR_OBF)) {
>+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
>+		priv->writable = true;
>+
>+		dev_dbg(dev, "ODR writable, waking waiters\n");
>+		wake_up_locked(&priv->queue);
>+
>+		handled |= KCS_BMC_EVENT_TYPE_OBE;
>+	}
>+
>+	spin_unlock(&priv->queue.lock);
>+
>+	return handled ? KCS_BMC_EVENT_HANDLED : KCS_BMC_EVENT_NONE;

Hm, if we're just treating it as a boolean here, is there any need to
muck around with setting specific bits of 'handled' in the if-blocks
above?

>+}
>+
>+static const struct kcs_bmc_client_ops kcs_bmc_raw_client_ops = {
>+	.event = kcs_bmc_raw_event,
>+};
>+
>+static inline struct kcs_bmc_raw *file_to_kcs_bmc_raw(struct file *filp)
>+{
>+	return container_of(filp->private_data, struct kcs_bmc_raw, miscdev);
>+}
>+
>+static int kcs_bmc_raw_open(struct inode *inode, struct file *filp)
>+{
>+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
>+
>+	return kcs_bmc_enable_device(priv->client.dev, &priv->client);
>+}
>+
>+static bool kcs_bmc_raw_prepare_obe(struct kcs_bmc_raw *priv)
>+{
>+	bool writable;
>+
>+	/* Enable the OBE event so we can catch the host clearing OBF */
>+	kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, KCS_BMC_EVENT_TYPE_OBE);
>+
>+	/* Now that we'll catch an OBE event, check if it's already occurred */
>+	writable = !(kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_OBF);
>+
>+	/* If OBF is clear we've missed the OBE event, so disable it */
>+	if (writable)
>+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
>+
>+	return writable;
>+}
>+
>+static __poll_t kcs_bmc_raw_poll(struct file *filp, poll_table *wait)
>+{
>+	struct kcs_bmc_raw *priv;
>+	__poll_t events = 0;
>+
>+	priv = file_to_kcs_bmc_raw(filp);
>+
>+	poll_wait(filp, &priv->queue, wait);
>+
>+	spin_lock_irq(&priv->queue.lock);
>+	if (kcs_bmc_raw_prepare_obe(priv))
>+		events |= (EPOLLOUT | EPOLLWRNORM);
>+
>+	if (priv->readable || (kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_IBF))
>+		events |= (EPOLLIN | EPOLLRDNORM);
>+	spin_unlock_irq(&priv->queue.lock);
>+
>+	return events;
>+}
>+
>+static ssize_t kcs_bmc_raw_read(struct file *filp, char __user *buf,
>+			     size_t count, loff_t *ppos)
>+{
>+	struct kcs_bmc_device *kcs_bmc;
>+	struct kcs_bmc_raw *priv;
>+	bool read_idr, read_str;
>+	struct device *dev;
>+	u8 idr, str;
>+	ssize_t rc;
>+
>+	priv = file_to_kcs_bmc_raw(filp);
>+	kcs_bmc = priv->client.dev;
>+	dev = priv->miscdev.this_device;
>+
>+	if (!count)
>+		return 0;
>+
>+	if (count > 2 || *ppos > 1)
>+		return -EINVAL;
>+
>+	if (*ppos + count > 2)
>+		return -EINVAL;
>+
>+	read_idr = (*ppos == 0);
>+	read_str = (*ppos == 1) || (count == 2);
>+
>+	spin_lock_irq(&priv->queue.lock);
>+	if (read_idr) {
>+		dev_dbg(dev, "Waiting for IBF\n");
>+		str = kcs_bmc_read_status(kcs_bmc);
>+		if ((filp->f_flags & O_NONBLOCK) && (str & KCS_BMC_STR_IBF)) {
>+			rc = -EWOULDBLOCK;
>+			goto out;
>+		}
>+
>+		rc = wait_event_interruptible_locked(priv->queue,
>+						     priv->readable || (str & KCS_BMC_STR_IBF));
>+		if (rc < 0)
>+			goto out;
>+
>+		if (signal_pending(current)) {
>+			dev_dbg(dev, "Interrupted waiting for IBF\n");
>+			rc = -EINTR;
>+			goto out;
>+		}
>+
>+		/*
>+		 * Re-enable events prior to possible read of IDR (which clears
>+		 * IBF) to ensure we receive interrupts for subsequent writes
>+		 * to IDR. Writes to IDR by the host should not occur while IBF
>+		 * is set.
>+		 */
>+		dev_dbg(dev, "Woken by IBF, enabling IRQ\n");
>+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF,
>+					      KCS_BMC_EVENT_TYPE_IBF);
>+
>+		/* Read data out of IDR into internal storage if necessary */
>+		if (!priv->readable) {
>+			WARN(!(str & KCS_BMC_STR_IBF), "Unknown reason for wakeup!");
>+
>+			priv->idr = kcs_bmc_read_data(kcs_bmc);
>+		}
>+
>+		/* Copy data from internal storage to userspace */
>+		idr = priv->idr;
>+
>+		/* We're done consuming the internally stored value */
>+		priv->readable = false;
>+	}
>+
>+	if (read_str) {
>+		str = kcs_bmc_read_status(kcs_bmc);
>+		if (*ppos == 0 || priv->readable)
>+			/*
>+			 * If we got this far with `*ppos == 0` then we've read
>+			 * data out of IDR, so set IBF when reporting back to
>+			 * userspace so userspace knows the IDR value is valid.
>+			 */
>+			str |= KCS_BMC_STR_IBF;
>+
>+		dev_dbg(dev, "Read status 0x%x\n", str);
>+
>+	}
>+
>+	rc = count;
>+out:
>+	spin_unlock_irq(&priv->queue.lock);
>+
>+	if (rc < 0)
>+		return rc;
>+
>+	/* Now copy the data in to the userspace buffer */
>+
>+	if (read_idr)
>+		if (copy_to_user(buf++, &idr, sizeof(idr)))
>+			return -EFAULT;
>+
>+	if (read_str)
>+		if (copy_to_user(buf, &str, sizeof(str)))
>+			return -EFAULT;
>+
>+	return count;
>+}
>+
>+static ssize_t kcs_bmc_raw_write(struct file *filp, const char __user *buf,
>+			      size_t count, loff_t *ppos)
>+{
>+	struct kcs_bmc_device *kcs_bmc;
>+	bool write_odr, write_str;
>+	struct kcs_bmc_raw *priv;
>+	struct device *dev;
>+	uint8_t data[2];
>+	ssize_t result;
>+	u8 str;
>+
>+	priv = file_to_kcs_bmc_raw(filp);
>+	kcs_bmc = priv->client.dev;
>+	dev = priv->miscdev.this_device;
>+
>+	if (!count)
>+		return count;
>+
>+	if (count > 2)
>+		return -EINVAL;
>+
>+	if (*ppos >= 2)
>+		return -EINVAL;
>+
>+	if (*ppos + count > 2)
>+		return -EINVAL;
>+
>+	if (copy_from_user(data, buf, count))
>+		return -EFAULT;
>+
>+	write_odr = (*ppos == 0);
>+	write_str = (*ppos == 1) || (count == 2);
>+
>+	spin_lock_irq(&priv->queue.lock);
>+
>+	/* Always write status before data, we generate the SerIRQ by writing ODR */
>+	if (write_str) {
>+		/* The index of STR in the userspace buffer depends on whether ODR is written */
>+		str = data[*ppos == 0];
>+		if (!(str & KCS_BMC_STR_OBF))
>+			dev_warn(dev, "Clearing OBF with status write: 0x%x\n", str);
>+		dev_dbg(dev, "Writing status 0x%x\n", str);
>+		kcs_bmc_write_status(kcs_bmc, str);
>+	}
>+
>+	if (write_odr) {
>+		/* If we're writing ODR it's always the first byte in the buffer */
>+		u8 odr = data[0];
>+
>+		str = kcs_bmc_read_status(kcs_bmc);
>+		if (str & KCS_BMC_STR_OBF) {
>+			if (filp->f_flags & O_NONBLOCK) {
>+				result = -EWOULDBLOCK;
>+				goto out;
>+			}
>+
>+			priv->writable = kcs_bmc_raw_prepare_obe(priv);
>+
>+			/* Now either OBF is already clear, or we'll get an OBE event to wake us */
>+			dev_dbg(dev, "Waiting for OBF to clear\n");
>+			wait_event_interruptible_locked(priv->queue, priv->writable);
>+
>+			if (signal_pending(current)) {
>+				kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
>+				result = -EINTR;
>+				goto out;
>+			}
>+
>+			WARN_ON(kcs_bmc_read_status(kcs_bmc) & KCS_BMC_STR_OBF);
>+		}
>+
>+		dev_dbg(dev, "Writing 0x%x to ODR\n", odr);
>+		kcs_bmc_write_data(kcs_bmc, odr);
>+	}
>+
>+	result = count;
>+out:
>+	spin_unlock_irq(&priv->queue.lock);
>+
>+	return result;
>+}
>+
>+static int kcs_bmc_raw_release(struct inode *inode, struct file *filp)
>+{
>+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
>+
>+	kcs_bmc_disable_device(priv->client.dev, &priv->client);
>+
>+	return 0;
>+}
>+
>+static const struct file_operations kcs_bmc_raw_fops = {
>+	.owner          = THIS_MODULE,
>+	.open		= kcs_bmc_raw_open,
>+	.llseek		= no_seek_end_llseek,
>+	.read           = kcs_bmc_raw_read,
>+	.write          = kcs_bmc_raw_write,
>+	.poll		= kcs_bmc_raw_poll,
>+	.release	= kcs_bmc_raw_release,
>+};
>+
>+static DEFINE_SPINLOCK(kcs_bmc_raw_instances_lock);
>+static LIST_HEAD(kcs_bmc_raw_instances);
>+
>+static int kcs_bmc_raw_attach_cdev(struct kcs_bmc_device *kcs_bmc)
>+{
>+	struct kcs_bmc_raw *priv;
>+	int rc;
>+
>+	priv = devm_kzalloc(kcs_bmc->dev, sizeof(*priv), GFP_KERNEL);
>+	if (!priv)
>+		return -ENOMEM;
>+
>+	priv->client.dev = kcs_bmc;
>+	priv->client.ops = &kcs_bmc_raw_client_ops;
>+
>+	init_waitqueue_head(&priv->queue);
>+	priv->writable = false;
>+	priv->readable = false;
>+
>+	priv->miscdev.minor = MISC_DYNAMIC_MINOR;
>+	priv->miscdev.name = devm_kasprintf(kcs_bmc->dev, GFP_KERNEL, "%s%u", DEVICE_NAME,
>+					   kcs_bmc->channel);
>+	if (!priv->miscdev.name)
>+		return -EINVAL;
>+
>+	priv->miscdev.fops = &kcs_bmc_raw_fops;
>+
>+	/* Initialise our expected events. Listen for IBF but ignore OBE until necessary */
>+	kcs_bmc_raw_update_event_mask(priv, (KCS_BMC_EVENT_TYPE_IBF | KCS_BMC_EVENT_TYPE_OBE),
>+				      KCS_BMC_EVENT_TYPE_IBF);
>+
>+	rc = misc_register(&priv->miscdev);
>+	if (rc) {
>+		dev_err(kcs_bmc->dev, "Unable to register device\n");
>+		return rc;
>+	}
>+
>+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
>+	list_add(&priv->entry, &kcs_bmc_raw_instances);
>+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
>+
>+	dev_info(kcs_bmc->dev, "Initialised raw client for channel %d", kcs_bmc->channel);
>+
>+	return 0;
>+}
>+
>+static int kcs_bmc_raw_detach_cdev(struct kcs_bmc_device *kcs_bmc)
>+{
>+	struct kcs_bmc_raw *priv = NULL, *pos;
>+
>+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
>+	list_for_each_entry(pos, &kcs_bmc_raw_instances, entry) {
>+		if (pos->client.dev == kcs_bmc) {
>+			priv = pos;
>+			list_del(&pos->entry);
>+			break;
>+		}
>+	}
>+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
>+
>+	if (!priv)
>+		return 0;

Similarly to patch #12, might we want to indicate some sort of failure
here, or is this a normal/expected case?

>+
>+	misc_deregister(&priv->miscdev);
>+	kcs_bmc_disable_device(kcs_bmc, &priv->client);
>+	devm_kfree(priv->client.dev->dev, priv);
>+
>+	return 0;
>+}
>+
>+static const struct kcs_bmc_cdev_ops kcs_bmc_raw_cdev_ops = {
>+	.add_device = kcs_bmc_raw_attach_cdev,
>+	.remove_device = kcs_bmc_raw_detach_cdev,
>+};
>+
>+static struct kcs_bmc_cdev kcs_bmc_raw_cdev = {
>+	.ops = &kcs_bmc_raw_cdev_ops,
>+};
>+
>+static int kcs_bmc_raw_init(void)
>+{
>+	return kcs_bmc_register_cdev(&kcs_bmc_raw_cdev);
>+}
>+module_init(kcs_bmc_raw_init);
>+
>+static void kcs_bmc_raw_exit(void)
>+{
>+	int rc;
>+
>+	rc = kcs_bmc_unregister_cdev(&kcs_bmc_raw_cdev);
>+	if (rc)
>+		pr_warn("Failed to remove KCS BMC client: %d", rc);
>+}
>+module_exit(kcs_bmc_raw_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Andrew Jeffery <andrew@aj.id.au>");
>+MODULE_DESCRIPTION("Character device for raw access to a KCS device");
>-- 
>2.27.0
>
Andrew Jeffery April 9, 2021, 6:46 a.m. UTC | #2
On Fri, 9 Apr 2021, at 14:47, Zev Weiss wrote:
> On Fri, Mar 19, 2021 at 01:27:47AM CDT, Andrew Jeffery wrote:
> >The existing IPMI chardev encodes IPMI behaviours as the name suggests.
> >However, KCS devices are useful beyond IPMI (or keyboards), as they
> >provide a means to generate IRQs and exchange arbitrary data between a
> >BMC and its host system.
> >
> >Implement a "raw" KCS character device that exposes the IDR, ODR and STR
> >registers to userspace via read() and write() implemented on a character
> >device:
> >
> >+--------+--------+---------+
> >| Offset | read() | write() |
> >+--------+--------+---------+
> >|   0    |   IDR  |   ODR   |
> >+--------+--------+---------+
> >|   1    |   STR  |   STR   |
> >+--------+--------+---------+
> >
> >This interface allows userspace to implement arbitrary (though somewhat
> >inefficient) protocols for exchanging information between a BMC and host
> >firmware. Conceptually the KCS interface can be used as an out-of-band
> >machanism for interrupt-signaled control messages while bulk data
> 
> Typo ("mechanism")

Ack.

> 
> >transfers occur over more appropriate interfaces between the BMC and the
> >host (which may lack their own interrupt mechanism, e.g. LPC FW cycles).
> >
> >poll() is provided, which will wait for IBF or OBE conditions for data
> >reads and writes respectively. Reads of STR on its own never blocks,
> >though accessing both offsets in the one system call may block if the
> >data registers are not ready.
> >
> >Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> >---
> > Documentation/ABI/testing/dev-raw-kcs |  25 ++
> > drivers/char/ipmi/Kconfig             |  17 +
> > drivers/char/ipmi/Makefile            |   1 +
> > drivers/char/ipmi/kcs_bmc_cdev_raw.c  | 443 ++++++++++++++++++++++++++
> > 4 files changed, 486 insertions(+)
> > create mode 100644 Documentation/ABI/testing/dev-raw-kcs
> > create mode 100644 drivers/char/ipmi/kcs_bmc_cdev_raw.c
> >
> >diff --git a/Documentation/ABI/testing/dev-raw-kcs b/Documentation/ABI/testing/dev-raw-kcs
> >new file mode 100644
> >index 000000000000..06e7e2071562
> >--- /dev/null
> >+++ b/Documentation/ABI/testing/dev-raw-kcs
> >@@ -0,0 +1,25 @@
> >+What:		/dev/raw-kcs*
> >+Date:		2021-02-15
> >+KernelVersion:	5.13
> >+Contact:	openbmc@lists.ozlabs.org
> >+Contact:	openipmi-developer@lists.sourceforge.net
> >+Contact:	Andrew Jeffery <andrew@aj.id.au>
> >+Description:	``/dev/raw-kcs*`` exposes to userspace the data and
> >+		status registers of Keyboard-Controller-Style (KCS) IPMI
> >+		interfaces via read() and write() syscalls. Direct
> >+		exposure of the data and status registers enables
> >+		inefficient but arbitrary protocols to be implemented
> >+		over the device. A typical approach is to use KCS
> >+		devices for out-of-band signalling for bulk data
> >+		transfers over other interfaces between a Baseboard
> >+		Management Controller and its host.
> >+
> >+		+--------+--------+---------+
> >+		| Offset | read() | write() |
> >+		+--------+--------+---------+
> >+		|   0    |   IDR  |   ODR   |
> >+		+--------+--------+---------+
> >+		|   1    |   STR  |   STR   |
> >+		+--------+--------+---------+
> >+
> >+Users:		libmctp: https://github.com/openbmc/libmctp
> >diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
> >index bc5f81899b62..273ac1a1f870 100644
> >--- a/drivers/char/ipmi/Kconfig
> >+++ b/drivers/char/ipmi/Kconfig
> >@@ -137,6 +137,23 @@ config IPMI_KCS_BMC_CDEV_IPMI
> > 	  This support is also available as a module. The module will be
> > 	  called kcs_bmc_cdev_ipmi.
> >
> >+config IPMI_KCS_BMC_CDEV_RAW
> >+	depends on IPMI_KCS_BMC
> >+	tristate "Raw character device interface for BMC KCS devices"
> >+	help
> >+	  Provides a BMC-side character device directly exposing the
> >+	  data and status registers of a KCS device to userspace. While
> >+	  KCS devices are commonly used to implement IPMI message
> >+	  passing, they provide a general interface for exchange of
> >+	  interrupts, data and status information between the BMC and
> >+	  its host.
> >+
> >+	  Say YES if you wish to use the KCS devices to implement
> >+	  protocols that are not IPMI.
> >+
> >+	  This support is also available as a module. The module will be
> >+	  called kcs_bmc_cdev_raw.
> >+
> > config ASPEED_BT_IPMI_BMC
> > 	depends on ARCH_ASPEED || COMPILE_TEST
> > 	depends on REGMAP && REGMAP_MMIO && MFD_SYSCON
> >diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
> >index fcfa676afddb..c8cc248ddd90 100644
> >--- a/drivers/char/ipmi/Makefile
> >+++ b/drivers/char/ipmi/Makefile
> >@@ -24,6 +24,7 @@ obj-$(CONFIG_IPMI_WATCHDOG) += ipmi_watchdog.o
> > obj-$(CONFIG_IPMI_POWEROFF) += ipmi_poweroff.o
> > obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
> > obj-$(CONFIG_IPMI_KCS_BMC_CDEV_IPMI) += kcs_bmc_cdev_ipmi.o
> >+obj-$(CONFIG_IPMI_KCS_BMC_CDEV_RAW) += kcs_bmc_cdev_raw.o
> > obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
> > obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
> > obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
> >diff --git a/drivers/char/ipmi/kcs_bmc_cdev_raw.c b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
> >new file mode 100644
> >index 000000000000..bdd258648c8e
> >--- /dev/null
> >+++ b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
> >@@ -0,0 +1,443 @@
> >+// SPDX-License-Identifier: GPL-2.0-or-later
> >+/* Copyright (c) 2021 IBM Corp. */
> >+
> >+#include <linux/delay.h>
> >+#include <linux/device.h>
> >+#include <linux/errno.h>
> >+#include <linux/fs.h>
> >+#include <linux/list.h>
> >+#include <linux/miscdevice.h>
> >+#include <linux/module.h>
> >+#include <linux/poll.h>
> >+
> >+#include "kcs_bmc_client.h"
> >+
> >+#define DEVICE_NAME "raw-kcs"
> >+
> >+struct kcs_bmc_raw {
> >+	struct list_head entry;
> >+
> >+	struct kcs_bmc_client client;
> >+
> >+	wait_queue_head_t queue;
> >+	u8 events;
> >+	bool writable;
> >+	bool readable;
> >+	u8 idr;
> >+
> >+	struct miscdevice miscdev;
> >+};
> >+
> >+static inline struct kcs_bmc_raw *client_to_kcs_bmc_raw(struct kcs_bmc_client *client)
> >+{
> >+	return container_of(client, struct kcs_bmc_raw, client);
> >+}
> >+
> >+/* Call under priv->queue.lock */
> >+static void kcs_bmc_raw_update_event_mask(struct kcs_bmc_raw *priv, u8 mask, u8 state)
> >+{
> >+	kcs_bmc_update_event_mask(priv->client.dev, mask, state);
> >+	priv->events &= ~mask;
> >+	priv->events |= state & mask;
> >+}
> >+
> >+static int kcs_bmc_raw_event(struct kcs_bmc_client *client)
> >+{
> >+	struct kcs_bmc_raw *priv;
> >+	struct device *dev;
> >+	u8 status, handled;
> >+
> >+	priv = client_to_kcs_bmc_raw(client);
> >+	dev = priv->miscdev.this_device;
> >+
> >+	spin_lock(&priv->queue.lock);
> >+
> >+	status = kcs_bmc_read_status(client->dev);
> >+	handled = 0;
> >+
> >+	if ((priv->events & KCS_BMC_EVENT_TYPE_IBF) && (status & KCS_BMC_STR_IBF)) {
> >+		if (priv->readable)
> >+			dev_err(dev, "Storm brewing!");
> 
> That seems a *touch* cryptic...

Uh, yeah. That wasn't meant to be there in that form.

> 
> >+
> >+		dev_dbg(dev, "Disabling IDR events for back-pressure\n");
> >+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF, 0);
> >+		priv->idr = kcs_bmc_read_data(client->dev);
> >+		priv->readable = true;
> >+
> >+		dev_dbg(dev, "IDR read, waking waiters\n");
> >+		wake_up_locked(&priv->queue);
> >+
> >+		handled |= KCS_BMC_EVENT_TYPE_IBF;
> >+	}
> >+
> >+	if ((priv->events & KCS_BMC_EVENT_TYPE_OBE) && !(status & KCS_BMC_STR_OBF)) {
> >+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
> >+		priv->writable = true;
> >+
> >+		dev_dbg(dev, "ODR writable, waking waiters\n");
> >+		wake_up_locked(&priv->queue);
> >+
> >+		handled |= KCS_BMC_EVENT_TYPE_OBE;
> >+	}
> >+
> >+	spin_unlock(&priv->queue.lock);
> >+
> >+	return handled ? KCS_BMC_EVENT_HANDLED : KCS_BMC_EVENT_NONE;
> 
> Hm, if we're just treating it as a boolean here, is there any need to
> muck around with setting specific bits of 'handled' in the if-blocks
> above?

I don't think it matters? If we want to debug we can print the handled bitmask.

> 
> >+}
> >+
> >+static const struct kcs_bmc_client_ops kcs_bmc_raw_client_ops = {
> >+	.event = kcs_bmc_raw_event,
> >+};
> >+
> >+static inline struct kcs_bmc_raw *file_to_kcs_bmc_raw(struct file *filp)
> >+{
> >+	return container_of(filp->private_data, struct kcs_bmc_raw, miscdev);
> >+}
> >+
> >+static int kcs_bmc_raw_open(struct inode *inode, struct file *filp)
> >+{
> >+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
> >+
> >+	return kcs_bmc_enable_device(priv->client.dev, &priv->client);
> >+}
> >+
> >+static bool kcs_bmc_raw_prepare_obe(struct kcs_bmc_raw *priv)
> >+{
> >+	bool writable;
> >+
> >+	/* Enable the OBE event so we can catch the host clearing OBF */
> >+	kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, KCS_BMC_EVENT_TYPE_OBE);
> >+
> >+	/* Now that we'll catch an OBE event, check if it's already occurred */
> >+	writable = !(kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_OBF);
> >+
> >+	/* If OBF is clear we've missed the OBE event, so disable it */
> >+	if (writable)
> >+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
> >+
> >+	return writable;
> >+}
> >+
> >+static __poll_t kcs_bmc_raw_poll(struct file *filp, poll_table *wait)
> >+{
> >+	struct kcs_bmc_raw *priv;
> >+	__poll_t events = 0;
> >+
> >+	priv = file_to_kcs_bmc_raw(filp);
> >+
> >+	poll_wait(filp, &priv->queue, wait);
> >+
> >+	spin_lock_irq(&priv->queue.lock);
> >+	if (kcs_bmc_raw_prepare_obe(priv))
> >+		events |= (EPOLLOUT | EPOLLWRNORM);
> >+
> >+	if (priv->readable || (kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_IBF))
> >+		events |= (EPOLLIN | EPOLLRDNORM);
> >+	spin_unlock_irq(&priv->queue.lock);
> >+
> >+	return events;
> >+}
> >+
> >+static ssize_t kcs_bmc_raw_read(struct file *filp, char __user *buf,
> >+			     size_t count, loff_t *ppos)
> >+{
> >+	struct kcs_bmc_device *kcs_bmc;
> >+	struct kcs_bmc_raw *priv;
> >+	bool read_idr, read_str;
> >+	struct device *dev;
> >+	u8 idr, str;
> >+	ssize_t rc;
> >+
> >+	priv = file_to_kcs_bmc_raw(filp);
> >+	kcs_bmc = priv->client.dev;
> >+	dev = priv->miscdev.this_device;
> >+
> >+	if (!count)
> >+		return 0;
> >+
> >+	if (count > 2 || *ppos > 1)
> >+		return -EINVAL;
> >+
> >+	if (*ppos + count > 2)
> >+		return -EINVAL;
> >+
> >+	read_idr = (*ppos == 0);
> >+	read_str = (*ppos == 1) || (count == 2);
> >+
> >+	spin_lock_irq(&priv->queue.lock);
> >+	if (read_idr) {
> >+		dev_dbg(dev, "Waiting for IBF\n");
> >+		str = kcs_bmc_read_status(kcs_bmc);
> >+		if ((filp->f_flags & O_NONBLOCK) && (str & KCS_BMC_STR_IBF)) {
> >+			rc = -EWOULDBLOCK;
> >+			goto out;
> >+		}
> >+
> >+		rc = wait_event_interruptible_locked(priv->queue,
> >+						     priv->readable || (str & KCS_BMC_STR_IBF));
> >+		if (rc < 0)
> >+			goto out;
> >+
> >+		if (signal_pending(current)) {
> >+			dev_dbg(dev, "Interrupted waiting for IBF\n");
> >+			rc = -EINTR;
> >+			goto out;
> >+		}
> >+
> >+		/*
> >+		 * Re-enable events prior to possible read of IDR (which clears
> >+		 * IBF) to ensure we receive interrupts for subsequent writes
> >+		 * to IDR. Writes to IDR by the host should not occur while IBF
> >+		 * is set.
> >+		 */
> >+		dev_dbg(dev, "Woken by IBF, enabling IRQ\n");
> >+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF,
> >+					      KCS_BMC_EVENT_TYPE_IBF);
> >+
> >+		/* Read data out of IDR into internal storage if necessary */
> >+		if (!priv->readable) {
> >+			WARN(!(str & KCS_BMC_STR_IBF), "Unknown reason for wakeup!");
> >+
> >+			priv->idr = kcs_bmc_read_data(kcs_bmc);
> >+		}
> >+
> >+		/* Copy data from internal storage to userspace */
> >+		idr = priv->idr;
> >+
> >+		/* We're done consuming the internally stored value */
> >+		priv->readable = false;
> >+	}
> >+
> >+	if (read_str) {
> >+		str = kcs_bmc_read_status(kcs_bmc);
> >+		if (*ppos == 0 || priv->readable)
> >+			/*
> >+			 * If we got this far with `*ppos == 0` then we've read
> >+			 * data out of IDR, so set IBF when reporting back to
> >+			 * userspace so userspace knows the IDR value is valid.
> >+			 */
> >+			str |= KCS_BMC_STR_IBF;
> >+
> >+		dev_dbg(dev, "Read status 0x%x\n", str);
> >+
> >+	}
> >+
> >+	rc = count;
> >+out:
> >+	spin_unlock_irq(&priv->queue.lock);
> >+
> >+	if (rc < 0)
> >+		return rc;
> >+
> >+	/* Now copy the data in to the userspace buffer */
> >+
> >+	if (read_idr)
> >+		if (copy_to_user(buf++, &idr, sizeof(idr)))
> >+			return -EFAULT;
> >+
> >+	if (read_str)
> >+		if (copy_to_user(buf, &str, sizeof(str)))
> >+			return -EFAULT;
> >+
> >+	return count;
> >+}
> >+
> >+static ssize_t kcs_bmc_raw_write(struct file *filp, const char __user *buf,
> >+			      size_t count, loff_t *ppos)
> >+{
> >+	struct kcs_bmc_device *kcs_bmc;
> >+	bool write_odr, write_str;
> >+	struct kcs_bmc_raw *priv;
> >+	struct device *dev;
> >+	uint8_t data[2];
> >+	ssize_t result;
> >+	u8 str;
> >+
> >+	priv = file_to_kcs_bmc_raw(filp);
> >+	kcs_bmc = priv->client.dev;
> >+	dev = priv->miscdev.this_device;
> >+
> >+	if (!count)
> >+		return count;
> >+
> >+	if (count > 2)
> >+		return -EINVAL;
> >+
> >+	if (*ppos >= 2)
> >+		return -EINVAL;
> >+
> >+	if (*ppos + count > 2)
> >+		return -EINVAL;
> >+
> >+	if (copy_from_user(data, buf, count))
> >+		return -EFAULT;
> >+
> >+	write_odr = (*ppos == 0);
> >+	write_str = (*ppos == 1) || (count == 2);
> >+
> >+	spin_lock_irq(&priv->queue.lock);
> >+
> >+	/* Always write status before data, we generate the SerIRQ by writing ODR */
> >+	if (write_str) {
> >+		/* The index of STR in the userspace buffer depends on whether ODR is written */
> >+		str = data[*ppos == 0];
> >+		if (!(str & KCS_BMC_STR_OBF))
> >+			dev_warn(dev, "Clearing OBF with status write: 0x%x\n", str);
> >+		dev_dbg(dev, "Writing status 0x%x\n", str);
> >+		kcs_bmc_write_status(kcs_bmc, str);
> >+	}
> >+
> >+	if (write_odr) {
> >+		/* If we're writing ODR it's always the first byte in the buffer */
> >+		u8 odr = data[0];
> >+
> >+		str = kcs_bmc_read_status(kcs_bmc);
> >+		if (str & KCS_BMC_STR_OBF) {
> >+			if (filp->f_flags & O_NONBLOCK) {
> >+				result = -EWOULDBLOCK;
> >+				goto out;
> >+			}
> >+
> >+			priv->writable = kcs_bmc_raw_prepare_obe(priv);
> >+
> >+			/* Now either OBF is already clear, or we'll get an OBE event to wake us */
> >+			dev_dbg(dev, "Waiting for OBF to clear\n");
> >+			wait_event_interruptible_locked(priv->queue, priv->writable);
> >+
> >+			if (signal_pending(current)) {
> >+				kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
> >+				result = -EINTR;
> >+				goto out;
> >+			}
> >+
> >+			WARN_ON(kcs_bmc_read_status(kcs_bmc) & KCS_BMC_STR_OBF);
> >+		}
> >+
> >+		dev_dbg(dev, "Writing 0x%x to ODR\n", odr);
> >+		kcs_bmc_write_data(kcs_bmc, odr);
> >+	}
> >+
> >+	result = count;
> >+out:
> >+	spin_unlock_irq(&priv->queue.lock);
> >+
> >+	return result;
> >+}
> >+
> >+static int kcs_bmc_raw_release(struct inode *inode, struct file *filp)
> >+{
> >+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
> >+
> >+	kcs_bmc_disable_device(priv->client.dev, &priv->client);
> >+
> >+	return 0;
> >+}
> >+
> >+static const struct file_operations kcs_bmc_raw_fops = {
> >+	.owner          = THIS_MODULE,
> >+	.open		= kcs_bmc_raw_open,
> >+	.llseek		= no_seek_end_llseek,
> >+	.read           = kcs_bmc_raw_read,
> >+	.write          = kcs_bmc_raw_write,
> >+	.poll		= kcs_bmc_raw_poll,
> >+	.release	= kcs_bmc_raw_release,
> >+};
> >+
> >+static DEFINE_SPINLOCK(kcs_bmc_raw_instances_lock);
> >+static LIST_HEAD(kcs_bmc_raw_instances);
> >+
> >+static int kcs_bmc_raw_attach_cdev(struct kcs_bmc_device *kcs_bmc)
> >+{
> >+	struct kcs_bmc_raw *priv;
> >+	int rc;
> >+
> >+	priv = devm_kzalloc(kcs_bmc->dev, sizeof(*priv), GFP_KERNEL);
> >+	if (!priv)
> >+		return -ENOMEM;
> >+
> >+	priv->client.dev = kcs_bmc;
> >+	priv->client.ops = &kcs_bmc_raw_client_ops;
> >+
> >+	init_waitqueue_head(&priv->queue);
> >+	priv->writable = false;
> >+	priv->readable = false;
> >+
> >+	priv->miscdev.minor = MISC_DYNAMIC_MINOR;
> >+	priv->miscdev.name = devm_kasprintf(kcs_bmc->dev, GFP_KERNEL, "%s%u", DEVICE_NAME,
> >+					   kcs_bmc->channel);
> >+	if (!priv->miscdev.name)
> >+		return -EINVAL;
> >+
> >+	priv->miscdev.fops = &kcs_bmc_raw_fops;
> >+
> >+	/* Initialise our expected events. Listen for IBF but ignore OBE until necessary */
> >+	kcs_bmc_raw_update_event_mask(priv, (KCS_BMC_EVENT_TYPE_IBF | KCS_BMC_EVENT_TYPE_OBE),
> >+				      KCS_BMC_EVENT_TYPE_IBF);
> >+
> >+	rc = misc_register(&priv->miscdev);
> >+	if (rc) {
> >+		dev_err(kcs_bmc->dev, "Unable to register device\n");
> >+		return rc;
> >+	}
> >+
> >+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
> >+	list_add(&priv->entry, &kcs_bmc_raw_instances);
> >+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
> >+
> >+	dev_info(kcs_bmc->dev, "Initialised raw client for channel %d", kcs_bmc->channel);
> >+
> >+	return 0;
> >+}
> >+
> >+static int kcs_bmc_raw_detach_cdev(struct kcs_bmc_device *kcs_bmc)
> >+{
> >+	struct kcs_bmc_raw *priv = NULL, *pos;
> >+
> >+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
> >+	list_for_each_entry(pos, &kcs_bmc_raw_instances, entry) {
> >+		if (pos->client.dev == kcs_bmc) {
> >+			priv = pos;
> >+			list_del(&pos->entry);
> >+			break;
> >+		}
> >+	}
> >+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
> >+
> >+	if (!priv)
> >+		return 0;
> 
> Similarly to patch #12, might we want to indicate some sort of failure
> here, or is this a normal/expected case?

I replied on 12/21, I'll have another think about it.

Cheers,

Andrew
Arnd Bergmann April 9, 2021, 7:55 a.m. UTC | #3
On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
>

> The existing IPMI chardev encodes IPMI behaviours as the name suggests.

> However, KCS devices are useful beyond IPMI (or keyboards), as they

> provide a means to generate IRQs and exchange arbitrary data between a

> BMC and its host system.


I only noticed the series after Joel asked about the DT changes on the arm
side. One question though:

How does this related to the drivers/input/serio/ framework that also talks
to the keyboard controller for things that are not keyboards? Are these
separate communication channels on adjacent I/O ports, or does there
need to be some arbitration?

       Arnd
Andrew Jeffery April 12, 2021, 1:33 a.m. UTC | #4
On Fri, 9 Apr 2021, at 17:25, Arnd Bergmann wrote:
> On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:

> >

> > The existing IPMI chardev encodes IPMI behaviours as the name suggests.

> > However, KCS devices are useful beyond IPMI (or keyboards), as they

> > provide a means to generate IRQs and exchange arbitrary data between a

> > BMC and its host system.

> 

> I only noticed the series after Joel asked about the DT changes on the arm

> side. One question though:

> 

> How does this related to the drivers/input/serio/ framework that also talks

> to the keyboard controller for things that are not keyboards?


I've taken a brief look and I feel they're somewhat closely related.

It's plausible that we could wrangle the code so the Aspeed and Nuvoton 
KCS drivers move under drivers/input/serio. If you squint, the i8042 
serio device driver has similarities with what the Aspeed and Nuvoton 
device drivers are providing to the KCS IPMI stack.

Both the KCS IPMI and raw chardev I've implemented in this patch need 
both read and write access to the status register (STR). serio could 
potentially expose its value through serio_interrupt() using the 
SERIO_OOB_DATA flag, but I haven't put any thought into it beyond this 
sentence. We'd need some extra support for writing STR via the serio 
API. I'm not sure that fits into the abstraction (unless we make 
serio_write() take a flags argument?).

In that vein, the serio_raw interface is close to the functionality 
that the raw chardev provides in this patch, though again serio_raw 
lacks userspace access to STR. Flags are ignored in the ->interrupt() 
callback so all values received via ->interrupt() are exposed as data. 
The result is there's no way to take care of SERIO_OOB_DATA in the 
read() path. Given that, I think we'd have to expose an ioctl() to 
access the STR value after taking care of SERIO_OOB_DATA in 
->interrupt().

I'm not sure where that lands us.

Dmitry, any thoughts here?

> Are these

> separate communication channels on adjacent I/O ports, or does there

> need to be some arbitration?


As it stands there's no arbitration.

Cheers,

Andrew
Arnd Bergmann April 12, 2021, 8:48 a.m. UTC | #5
On Mon, Apr 12, 2021 at 3:33 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> On Fri, 9 Apr 2021, at 17:25, Arnd Bergmann wrote:

> > On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:

> > >

> > > The existing IPMI chardev encodes IPMI behaviours as the name suggests.

> > > However, KCS devices are useful beyond IPMI (or keyboards), as they

> > > provide a means to generate IRQs and exchange arbitrary data between a

> > > BMC and its host system.

> >

> > I only noticed the series after Joel asked about the DT changes on the arm

> > side. One question though:

> >

> > How does this related to the drivers/input/serio/ framework that also talks

> > to the keyboard controller for things that are not keyboards?

>

> I've taken a brief look and I feel they're somewhat closely related.

>

> It's plausible that we could wrangle the code so the Aspeed and Nuvoton

> KCS drivers move under drivers/input/serio. If you squint, the i8042

> serio device driver has similarities with what the Aspeed and Nuvoton

> device drivers are providing to the KCS IPMI stack.


After looking some more into it, I finally understood that the two are
rather complementary. While the  drivers/char/ipmi/kcs_bmc.c
is the other (bmc) end of drivers/char/ipmi/ipmi_kcs_sm.c, it seems
that the proposed kcs_bmc_cdev_raw.c interface would be
what corresponds to the other side of
drivers/input/serio/i8042.c+userio.c. Then again, these are also on
separate ports (0x60 for the keyboard controller, 0xca2 for the BMC
KCS), so they would never actually talk to one another.

> Both the KCS IPMI and raw chardev I've implemented in this patch need

> both read and write access to the status register (STR). serio could

> potentially expose its value through serio_interrupt() using the

> SERIO_OOB_DATA flag, but I haven't put any thought into it beyond this

> sentence. We'd need some extra support for writing STR via the serio

> API. I'm not sure that fits into the abstraction (unless we make

> serio_write() take a flags argument?).

>

> In that vein, the serio_raw interface is close to the functionality

> that the raw chardev provides in this patch, though again serio_raw

> lacks userspace access to STR. Flags are ignored in the ->interrupt()

> callback so all values received via ->interrupt() are exposed as data.

> The result is there's no way to take care of SERIO_OOB_DATA in the

> read() path. Given that, I think we'd have to expose an ioctl() to

> access the STR value after taking care of SERIO_OOB_DATA in

> ->interrupt().

>

> I'm not sure where that lands us.


Based on what I looked up, I think you can just forget about my original
question. We have two separate interfaces that use an Intel 8042-style
protocol, but they don't really interact.

      Arnd
Andrew Jeffery April 12, 2021, 11:45 p.m. UTC | #6
On Mon, 12 Apr 2021, at 18:18, Arnd Bergmann wrote:
> On Mon, Apr 12, 2021 at 3:33 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > On Fri, 9 Apr 2021, at 17:25, Arnd Bergmann wrote:
> > > On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > >
> > > > The existing IPMI chardev encodes IPMI behaviours as the name suggests.
> > > > However, KCS devices are useful beyond IPMI (or keyboards), as they
> > > > provide a means to generate IRQs and exchange arbitrary data between a
> > > > BMC and its host system.
> > >
> > > I only noticed the series after Joel asked about the DT changes on the arm
> > > side. One question though:
> > >
> > > How does this related to the drivers/input/serio/ framework that also talks
> > > to the keyboard controller for things that are not keyboards?
> >
> > I've taken a brief look and I feel they're somewhat closely related.
> >
> > It's plausible that we could wrangle the code so the Aspeed and Nuvoton
> > KCS drivers move under drivers/input/serio. If you squint, the i8042
> > serio device driver has similarities with what the Aspeed and Nuvoton
> > device drivers are providing to the KCS IPMI stack.
> 
> After looking some more into it, I finally understood that the two are
> rather complementary. While the  drivers/char/ipmi/kcs_bmc.c
> is the other (bmc) end of drivers/char/ipmi/ipmi_kcs_sm.c, it seems
> that the proposed kcs_bmc_cdev_raw.c interface would be
> what corresponds to the other side of
> drivers/input/serio/i8042.c+userio.c.

Right. I guess the question is should we be splitting kernel subsystems 
along host/bmc lines? Doesn't feel intuitive, it's all Linux, but maybe 
we can consolidate in the future if it makes sense?

> Then again, these are also on
> separate ports (0x60 for the keyboard controller, 0xca2 for the BMC
> KCS), so they would never actually talk to one another.

Well, sort of I guess. On Power systems we don't use the keyboard 
controller for IPMI or keyboards, so we're just kinda exploiting the 
hardware for our own purposes.

> 
> > Both the KCS IPMI and raw chardev I've implemented in this patch need
> > both read and write access to the status register (STR). serio could
> > potentially expose its value through serio_interrupt() using the
> > SERIO_OOB_DATA flag, but I haven't put any thought into it beyond this
> > sentence. We'd need some extra support for writing STR via the serio
> > API. I'm not sure that fits into the abstraction (unless we make
> > serio_write() take a flags argument?).
> >
> > In that vein, the serio_raw interface is close to the functionality
> > that the raw chardev provides in this patch, though again serio_raw
> > lacks userspace access to STR. Flags are ignored in the ->interrupt()
> > callback so all values received via ->interrupt() are exposed as data.
> > The result is there's no way to take care of SERIO_OOB_DATA in the
> > read() path. Given that, I think we'd have to expose an ioctl() to
> > access the STR value after taking care of SERIO_OOB_DATA in
> > ->interrupt().
> >
> > I'm not sure where that lands us.
> 
> Based on what I looked up, I think you can just forget about my original
> question. We have two separate interfaces that use an Intel 8042-style
> protocol, but they don't really interact.

Right, this is still true given Power doesn't care for keyboards or 
IPMI via the keyboard controllers; the two still don't interact.

Andrew
Arnd Bergmann April 13, 2021, 8:22 a.m. UTC | #7
On Tue, Apr 13, 2021 at 1:45 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> On Mon, 12 Apr 2021, at 18:18, Arnd Bergmann wrote:
> > On Mon, Apr 12, 2021 at 3:33 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > On Fri, 9 Apr 2021, at 17:25, Arnd Bergmann wrote:
> > > > On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > > >
> > > > > The existing IPMI chardev encodes IPMI behaviours as the name suggests.
> > > > > However, KCS devices are useful beyond IPMI (or keyboards), as they
> > > > > provide a means to generate IRQs and exchange arbitrary data between a
> > > > > BMC and its host system.
> > > >
> > > > I only noticed the series after Joel asked about the DT changes on the arm
> > > > side. One question though:
> > > >
> > > > How does this related to the drivers/input/serio/ framework that also talks
> > > > to the keyboard controller for things that are not keyboards?
> > >
> > > I've taken a brief look and I feel they're somewhat closely related.
> > >
> > > It's plausible that we could wrangle the code so the Aspeed and Nuvoton
> > > KCS drivers move under drivers/input/serio. If you squint, the i8042
> > > serio device driver has similarities with what the Aspeed and Nuvoton
> > > device drivers are providing to the KCS IPMI stack.
> >
> > After looking some more into it, I finally understood that the two are
> > rather complementary. While the  drivers/char/ipmi/kcs_bmc.c
> > is the other (bmc) end of drivers/char/ipmi/ipmi_kcs_sm.c, it seems
> > that the proposed kcs_bmc_cdev_raw.c interface would be
> > what corresponds to the other side of
> > drivers/input/serio/i8042.c+userio.c.
>
> Right. I guess the question is should we be splitting kernel subsystems
> along host/bmc lines? Doesn't feel intuitive, it's all Linux, but maybe
> we can consolidate in the future if it makes sense?

We actually have a number of subsystems with somewhat overlapping
functionality. I brought up serio, because it has an abstraction for multiple
things that communicate over the keyboard controller and I thought
the problem you were trying to solve was also related to the keyboard
controller.
It is also one of multiple abstractions that allow you to connect a device
to a uart (along with serdev and tty_ldisc, probably at least one more that
you can nest above or below these).

Consolidating the kcs_bmc.c interface into something that already
exists would obviously be best, but it's not clear which of these that
should be, that depends on the fundamental properties of the hardware
interface.

> > Then again, these are also on
> > separate ports (0x60 for the keyboard controller, 0xca2 for the BMC
> > KCS), so they would never actually talk to one another.
>
> Well, sort of I guess. On Power systems we don't use the keyboard
> controller for IPMI or keyboards, so we're just kinda exploiting the
> hardware for our own purposes.

Can you describe in an abstract form what the hardware interface
can do here and what you want from it? I wonder if it could be
part of a higher-level interface such as drivers/mailbox/ instead.

         Arnd
Andrew Jeffery April 14, 2021, 12:30 a.m. UTC | #8
On Tue, 13 Apr 2021, at 17:52, Arnd Bergmann wrote:
> On Tue, Apr 13, 2021 at 1:45 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > On Mon, 12 Apr 2021, at 18:18, Arnd Bergmann wrote:
> > > On Mon, Apr 12, 2021 at 3:33 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > > On Fri, 9 Apr 2021, at 17:25, Arnd Bergmann wrote:
> > > > > On Fri, Mar 19, 2021 at 7:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > > > >
> > > > > > The existing IPMI chardev encodes IPMI behaviours as the name suggests.
> > > > > > However, KCS devices are useful beyond IPMI (or keyboards), as they
> > > > > > provide a means to generate IRQs and exchange arbitrary data between a
> > > > > > BMC and its host system.
> > > > >
> > > > > I only noticed the series after Joel asked about the DT changes on the arm
> > > > > side. One question though:
> > > > >
> > > > > How does this related to the drivers/input/serio/ framework that also talks
> > > > > to the keyboard controller for things that are not keyboards?
> > > >
> > > > I've taken a brief look and I feel they're somewhat closely related.
> > > >
> > > > It's plausible that we could wrangle the code so the Aspeed and Nuvoton
> > > > KCS drivers move under drivers/input/serio. If you squint, the i8042
> > > > serio device driver has similarities with what the Aspeed and Nuvoton
> > > > device drivers are providing to the KCS IPMI stack.
> > >
> > > After looking some more into it, I finally understood that the two are
> > > rather complementary. While the  drivers/char/ipmi/kcs_bmc.c
> > > is the other (bmc) end of drivers/char/ipmi/ipmi_kcs_sm.c, it seems
> > > that the proposed kcs_bmc_cdev_raw.c interface would be
> > > what corresponds to the other side of
> > > drivers/input/serio/i8042.c+userio.c.
> >
> > Right. I guess the question is should we be splitting kernel subsystems
> > along host/bmc lines? Doesn't feel intuitive, it's all Linux, but maybe
> > we can consolidate in the future if it makes sense?
> 
> We actually have a number of subsystems with somewhat overlapping
> functionality. I brought up serio, because it has an abstraction for multiple
> things that communicate over the keyboard controller and I thought
> the problem you were trying to solve was also related to the keyboard
> controller.
> It is also one of multiple abstractions that allow you to connect a device
> to a uart (along with serdev and tty_ldisc, probably at least one more that
> you can nest above or below these).
> 
> Consolidating the kcs_bmc.c interface into something that already
> exists would obviously be best, but it's not clear which of these that
> should be, that depends on the fundamental properties of the hardware
> interface.
> 
> > > Then again, these are also on
> > > separate ports (0x60 for the keyboard controller, 0xca2 for the BMC
> > > KCS), so they would never actually talk to one another.
> >
> > Well, sort of I guess. On Power systems we don't use the keyboard
> > controller for IPMI or keyboards, so we're just kinda exploiting the
> > hardware for our own purposes.
> 
> Can you describe in an abstract form what the hardware interface
> can do here and what you want from it? I wonder if it could be
> part of a higher-level interface such as drivers/mailbox/ instead.

It gives us interrupts each way between the host and BMC when we send 
some (small amount of) data/metadata. Mailbox is possibly a fit for 
this? We're (ab)using the keyboard controllers to implement a vendor 
MCTP binding over LPC[1] and also a simple protocol for the (Power) 
host to trigger BMC debug data capture in the event of issues with 
other (more complex) in-band communication stacks. The MCTP binding is 
what requires access to STR.

It's feasible that we could implement the debug capture protocol with 
the serio_raw interface now that I think about it (as it only makes use 
of data and not status). What's unclear to me right now is what impact 
that has on the Aspeed/Nuvoton KCS drivers we have in the IPMI 
subsystem. If we can do something sensible to service both serio and 
IPMI with the one driver implementation then I can put together a PoC 
for the debug data stuff using serio_raw.

Regarding the MCTP binding, Jeremy Kerr is working in an in-kernel, 
socket-based implementation of MCTP. Eventually this will allow us to 
bury the KCS details in the MCTP subsystem, which removes some of the 
motivation for the raw interface here.

Andrew

[1] https://github.com/openbmc/libmctp/blob/master/docs/bindings/vendor-ibm-astlpc.md
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/dev-raw-kcs b/Documentation/ABI/testing/dev-raw-kcs
new file mode 100644
index 000000000000..06e7e2071562
--- /dev/null
+++ b/Documentation/ABI/testing/dev-raw-kcs
@@ -0,0 +1,25 @@ 
+What:		/dev/raw-kcs*
+Date:		2021-02-15
+KernelVersion:	5.13
+Contact:	openbmc@lists.ozlabs.org
+Contact:	openipmi-developer@lists.sourceforge.net
+Contact:	Andrew Jeffery <andrew@aj.id.au>
+Description:	``/dev/raw-kcs*`` exposes to userspace the data and
+		status registers of Keyboard-Controller-Style (KCS) IPMI
+		interfaces via read() and write() syscalls. Direct
+		exposure of the data and status registers enables
+		inefficient but arbitrary protocols to be implemented
+		over the device. A typical approach is to use KCS
+		devices for out-of-band signalling for bulk data
+		transfers over other interfaces between a Baseboard
+		Management Controller and its host.
+
+		+--------+--------+---------+
+		| Offset | read() | write() |
+		+--------+--------+---------+
+		|   0    |   IDR  |   ODR   |
+		+--------+--------+---------+
+		|   1    |   STR  |   STR   |
+		+--------+--------+---------+
+
+Users:		libmctp: https://github.com/openbmc/libmctp
diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
index bc5f81899b62..273ac1a1f870 100644
--- a/drivers/char/ipmi/Kconfig
+++ b/drivers/char/ipmi/Kconfig
@@ -137,6 +137,23 @@  config IPMI_KCS_BMC_CDEV_IPMI
 	  This support is also available as a module. The module will be
 	  called kcs_bmc_cdev_ipmi.
 
+config IPMI_KCS_BMC_CDEV_RAW
+	depends on IPMI_KCS_BMC
+	tristate "Raw character device interface for BMC KCS devices"
+	help
+	  Provides a BMC-side character device directly exposing the
+	  data and status registers of a KCS device to userspace. While
+	  KCS devices are commonly used to implement IPMI message
+	  passing, they provide a general interface for exchange of
+	  interrupts, data and status information between the BMC and
+	  its host.
+
+	  Say YES if you wish to use the KCS devices to implement
+	  protocols that are not IPMI.
+
+	  This support is also available as a module. The module will be
+	  called kcs_bmc_cdev_raw.
+
 config ASPEED_BT_IPMI_BMC
 	depends on ARCH_ASPEED || COMPILE_TEST
 	depends on REGMAP && REGMAP_MMIO && MFD_SYSCON
diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
index fcfa676afddb..c8cc248ddd90 100644
--- a/drivers/char/ipmi/Makefile
+++ b/drivers/char/ipmi/Makefile
@@ -24,6 +24,7 @@  obj-$(CONFIG_IPMI_WATCHDOG) += ipmi_watchdog.o
 obj-$(CONFIG_IPMI_POWEROFF) += ipmi_poweroff.o
 obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
 obj-$(CONFIG_IPMI_KCS_BMC_CDEV_IPMI) += kcs_bmc_cdev_ipmi.o
+obj-$(CONFIG_IPMI_KCS_BMC_CDEV_RAW) += kcs_bmc_cdev_raw.o
 obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
 obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
 obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
diff --git a/drivers/char/ipmi/kcs_bmc_cdev_raw.c b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
new file mode 100644
index 000000000000..bdd258648c8e
--- /dev/null
+++ b/drivers/char/ipmi/kcs_bmc_cdev_raw.c
@@ -0,0 +1,443 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright (c) 2021 IBM Corp. */
+
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+
+#include "kcs_bmc_client.h"
+
+#define DEVICE_NAME "raw-kcs"
+
+struct kcs_bmc_raw {
+	struct list_head entry;
+
+	struct kcs_bmc_client client;
+
+	wait_queue_head_t queue;
+	u8 events;
+	bool writable;
+	bool readable;
+	u8 idr;
+
+	struct miscdevice miscdev;
+};
+
+static inline struct kcs_bmc_raw *client_to_kcs_bmc_raw(struct kcs_bmc_client *client)
+{
+	return container_of(client, struct kcs_bmc_raw, client);
+}
+
+/* Call under priv->queue.lock */
+static void kcs_bmc_raw_update_event_mask(struct kcs_bmc_raw *priv, u8 mask, u8 state)
+{
+	kcs_bmc_update_event_mask(priv->client.dev, mask, state);
+	priv->events &= ~mask;
+	priv->events |= state & mask;
+}
+
+static int kcs_bmc_raw_event(struct kcs_bmc_client *client)
+{
+	struct kcs_bmc_raw *priv;
+	struct device *dev;
+	u8 status, handled;
+
+	priv = client_to_kcs_bmc_raw(client);
+	dev = priv->miscdev.this_device;
+
+	spin_lock(&priv->queue.lock);
+
+	status = kcs_bmc_read_status(client->dev);
+	handled = 0;
+
+	if ((priv->events & KCS_BMC_EVENT_TYPE_IBF) && (status & KCS_BMC_STR_IBF)) {
+		if (priv->readable)
+			dev_err(dev, "Storm brewing!");
+
+		dev_dbg(dev, "Disabling IDR events for back-pressure\n");
+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF, 0);
+		priv->idr = kcs_bmc_read_data(client->dev);
+		priv->readable = true;
+
+		dev_dbg(dev, "IDR read, waking waiters\n");
+		wake_up_locked(&priv->queue);
+
+		handled |= KCS_BMC_EVENT_TYPE_IBF;
+	}
+
+	if ((priv->events & KCS_BMC_EVENT_TYPE_OBE) && !(status & KCS_BMC_STR_OBF)) {
+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
+		priv->writable = true;
+
+		dev_dbg(dev, "ODR writable, waking waiters\n");
+		wake_up_locked(&priv->queue);
+
+		handled |= KCS_BMC_EVENT_TYPE_OBE;
+	}
+
+	spin_unlock(&priv->queue.lock);
+
+	return handled ? KCS_BMC_EVENT_HANDLED : KCS_BMC_EVENT_NONE;
+}
+
+static const struct kcs_bmc_client_ops kcs_bmc_raw_client_ops = {
+	.event = kcs_bmc_raw_event,
+};
+
+static inline struct kcs_bmc_raw *file_to_kcs_bmc_raw(struct file *filp)
+{
+	return container_of(filp->private_data, struct kcs_bmc_raw, miscdev);
+}
+
+static int kcs_bmc_raw_open(struct inode *inode, struct file *filp)
+{
+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
+
+	return kcs_bmc_enable_device(priv->client.dev, &priv->client);
+}
+
+static bool kcs_bmc_raw_prepare_obe(struct kcs_bmc_raw *priv)
+{
+	bool writable;
+
+	/* Enable the OBE event so we can catch the host clearing OBF */
+	kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, KCS_BMC_EVENT_TYPE_OBE);
+
+	/* Now that we'll catch an OBE event, check if it's already occurred */
+	writable = !(kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_OBF);
+
+	/* If OBF is clear we've missed the OBE event, so disable it */
+	if (writable)
+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
+
+	return writable;
+}
+
+static __poll_t kcs_bmc_raw_poll(struct file *filp, poll_table *wait)
+{
+	struct kcs_bmc_raw *priv;
+	__poll_t events = 0;
+
+	priv = file_to_kcs_bmc_raw(filp);
+
+	poll_wait(filp, &priv->queue, wait);
+
+	spin_lock_irq(&priv->queue.lock);
+	if (kcs_bmc_raw_prepare_obe(priv))
+		events |= (EPOLLOUT | EPOLLWRNORM);
+
+	if (priv->readable || (kcs_bmc_read_status(priv->client.dev) & KCS_BMC_STR_IBF))
+		events |= (EPOLLIN | EPOLLRDNORM);
+	spin_unlock_irq(&priv->queue.lock);
+
+	return events;
+}
+
+static ssize_t kcs_bmc_raw_read(struct file *filp, char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct kcs_bmc_device *kcs_bmc;
+	struct kcs_bmc_raw *priv;
+	bool read_idr, read_str;
+	struct device *dev;
+	u8 idr, str;
+	ssize_t rc;
+
+	priv = file_to_kcs_bmc_raw(filp);
+	kcs_bmc = priv->client.dev;
+	dev = priv->miscdev.this_device;
+
+	if (!count)
+		return 0;
+
+	if (count > 2 || *ppos > 1)
+		return -EINVAL;
+
+	if (*ppos + count > 2)
+		return -EINVAL;
+
+	read_idr = (*ppos == 0);
+	read_str = (*ppos == 1) || (count == 2);
+
+	spin_lock_irq(&priv->queue.lock);
+	if (read_idr) {
+		dev_dbg(dev, "Waiting for IBF\n");
+		str = kcs_bmc_read_status(kcs_bmc);
+		if ((filp->f_flags & O_NONBLOCK) && (str & KCS_BMC_STR_IBF)) {
+			rc = -EWOULDBLOCK;
+			goto out;
+		}
+
+		rc = wait_event_interruptible_locked(priv->queue,
+						     priv->readable || (str & KCS_BMC_STR_IBF));
+		if (rc < 0)
+			goto out;
+
+		if (signal_pending(current)) {
+			dev_dbg(dev, "Interrupted waiting for IBF\n");
+			rc = -EINTR;
+			goto out;
+		}
+
+		/*
+		 * Re-enable events prior to possible read of IDR (which clears
+		 * IBF) to ensure we receive interrupts for subsequent writes
+		 * to IDR. Writes to IDR by the host should not occur while IBF
+		 * is set.
+		 */
+		dev_dbg(dev, "Woken by IBF, enabling IRQ\n");
+		kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_IBF,
+					      KCS_BMC_EVENT_TYPE_IBF);
+
+		/* Read data out of IDR into internal storage if necessary */
+		if (!priv->readable) {
+			WARN(!(str & KCS_BMC_STR_IBF), "Unknown reason for wakeup!");
+
+			priv->idr = kcs_bmc_read_data(kcs_bmc);
+		}
+
+		/* Copy data from internal storage to userspace */
+		idr = priv->idr;
+
+		/* We're done consuming the internally stored value */
+		priv->readable = false;
+	}
+
+	if (read_str) {
+		str = kcs_bmc_read_status(kcs_bmc);
+		if (*ppos == 0 || priv->readable)
+			/*
+			 * If we got this far with `*ppos == 0` then we've read
+			 * data out of IDR, so set IBF when reporting back to
+			 * userspace so userspace knows the IDR value is valid.
+			 */
+			str |= KCS_BMC_STR_IBF;
+
+		dev_dbg(dev, "Read status 0x%x\n", str);
+
+	}
+
+	rc = count;
+out:
+	spin_unlock_irq(&priv->queue.lock);
+
+	if (rc < 0)
+		return rc;
+
+	/* Now copy the data in to the userspace buffer */
+
+	if (read_idr)
+		if (copy_to_user(buf++, &idr, sizeof(idr)))
+			return -EFAULT;
+
+	if (read_str)
+		if (copy_to_user(buf, &str, sizeof(str)))
+			return -EFAULT;
+
+	return count;
+}
+
+static ssize_t kcs_bmc_raw_write(struct file *filp, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	struct kcs_bmc_device *kcs_bmc;
+	bool write_odr, write_str;
+	struct kcs_bmc_raw *priv;
+	struct device *dev;
+	uint8_t data[2];
+	ssize_t result;
+	u8 str;
+
+	priv = file_to_kcs_bmc_raw(filp);
+	kcs_bmc = priv->client.dev;
+	dev = priv->miscdev.this_device;
+
+	if (!count)
+		return count;
+
+	if (count > 2)
+		return -EINVAL;
+
+	if (*ppos >= 2)
+		return -EINVAL;
+
+	if (*ppos + count > 2)
+		return -EINVAL;
+
+	if (copy_from_user(data, buf, count))
+		return -EFAULT;
+
+	write_odr = (*ppos == 0);
+	write_str = (*ppos == 1) || (count == 2);
+
+	spin_lock_irq(&priv->queue.lock);
+
+	/* Always write status before data, we generate the SerIRQ by writing ODR */
+	if (write_str) {
+		/* The index of STR in the userspace buffer depends on whether ODR is written */
+		str = data[*ppos == 0];
+		if (!(str & KCS_BMC_STR_OBF))
+			dev_warn(dev, "Clearing OBF with status write: 0x%x\n", str);
+		dev_dbg(dev, "Writing status 0x%x\n", str);
+		kcs_bmc_write_status(kcs_bmc, str);
+	}
+
+	if (write_odr) {
+		/* If we're writing ODR it's always the first byte in the buffer */
+		u8 odr = data[0];
+
+		str = kcs_bmc_read_status(kcs_bmc);
+		if (str & KCS_BMC_STR_OBF) {
+			if (filp->f_flags & O_NONBLOCK) {
+				result = -EWOULDBLOCK;
+				goto out;
+			}
+
+			priv->writable = kcs_bmc_raw_prepare_obe(priv);
+
+			/* Now either OBF is already clear, or we'll get an OBE event to wake us */
+			dev_dbg(dev, "Waiting for OBF to clear\n");
+			wait_event_interruptible_locked(priv->queue, priv->writable);
+
+			if (signal_pending(current)) {
+				kcs_bmc_raw_update_event_mask(priv, KCS_BMC_EVENT_TYPE_OBE, 0);
+				result = -EINTR;
+				goto out;
+			}
+
+			WARN_ON(kcs_bmc_read_status(kcs_bmc) & KCS_BMC_STR_OBF);
+		}
+
+		dev_dbg(dev, "Writing 0x%x to ODR\n", odr);
+		kcs_bmc_write_data(kcs_bmc, odr);
+	}
+
+	result = count;
+out:
+	spin_unlock_irq(&priv->queue.lock);
+
+	return result;
+}
+
+static int kcs_bmc_raw_release(struct inode *inode, struct file *filp)
+{
+	struct kcs_bmc_raw *priv = file_to_kcs_bmc_raw(filp);
+
+	kcs_bmc_disable_device(priv->client.dev, &priv->client);
+
+	return 0;
+}
+
+static const struct file_operations kcs_bmc_raw_fops = {
+	.owner          = THIS_MODULE,
+	.open		= kcs_bmc_raw_open,
+	.llseek		= no_seek_end_llseek,
+	.read           = kcs_bmc_raw_read,
+	.write          = kcs_bmc_raw_write,
+	.poll		= kcs_bmc_raw_poll,
+	.release	= kcs_bmc_raw_release,
+};
+
+static DEFINE_SPINLOCK(kcs_bmc_raw_instances_lock);
+static LIST_HEAD(kcs_bmc_raw_instances);
+
+static int kcs_bmc_raw_attach_cdev(struct kcs_bmc_device *kcs_bmc)
+{
+	struct kcs_bmc_raw *priv;
+	int rc;
+
+	priv = devm_kzalloc(kcs_bmc->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->client.dev = kcs_bmc;
+	priv->client.ops = &kcs_bmc_raw_client_ops;
+
+	init_waitqueue_head(&priv->queue);
+	priv->writable = false;
+	priv->readable = false;
+
+	priv->miscdev.minor = MISC_DYNAMIC_MINOR;
+	priv->miscdev.name = devm_kasprintf(kcs_bmc->dev, GFP_KERNEL, "%s%u", DEVICE_NAME,
+					   kcs_bmc->channel);
+	if (!priv->miscdev.name)
+		return -EINVAL;
+
+	priv->miscdev.fops = &kcs_bmc_raw_fops;
+
+	/* Initialise our expected events. Listen for IBF but ignore OBE until necessary */
+	kcs_bmc_raw_update_event_mask(priv, (KCS_BMC_EVENT_TYPE_IBF | KCS_BMC_EVENT_TYPE_OBE),
+				      KCS_BMC_EVENT_TYPE_IBF);
+
+	rc = misc_register(&priv->miscdev);
+	if (rc) {
+		dev_err(kcs_bmc->dev, "Unable to register device\n");
+		return rc;
+	}
+
+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
+	list_add(&priv->entry, &kcs_bmc_raw_instances);
+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
+
+	dev_info(kcs_bmc->dev, "Initialised raw client for channel %d", kcs_bmc->channel);
+
+	return 0;
+}
+
+static int kcs_bmc_raw_detach_cdev(struct kcs_bmc_device *kcs_bmc)
+{
+	struct kcs_bmc_raw *priv = NULL, *pos;
+
+	spin_lock_irq(&kcs_bmc_raw_instances_lock);
+	list_for_each_entry(pos, &kcs_bmc_raw_instances, entry) {
+		if (pos->client.dev == kcs_bmc) {
+			priv = pos;
+			list_del(&pos->entry);
+			break;
+		}
+	}
+	spin_unlock_irq(&kcs_bmc_raw_instances_lock);
+
+	if (!priv)
+		return 0;
+
+	misc_deregister(&priv->miscdev);
+	kcs_bmc_disable_device(kcs_bmc, &priv->client);
+	devm_kfree(priv->client.dev->dev, priv);
+
+	return 0;
+}
+
+static const struct kcs_bmc_cdev_ops kcs_bmc_raw_cdev_ops = {
+	.add_device = kcs_bmc_raw_attach_cdev,
+	.remove_device = kcs_bmc_raw_detach_cdev,
+};
+
+static struct kcs_bmc_cdev kcs_bmc_raw_cdev = {
+	.ops = &kcs_bmc_raw_cdev_ops,
+};
+
+static int kcs_bmc_raw_init(void)
+{
+	return kcs_bmc_register_cdev(&kcs_bmc_raw_cdev);
+}
+module_init(kcs_bmc_raw_init);
+
+static void kcs_bmc_raw_exit(void)
+{
+	int rc;
+
+	rc = kcs_bmc_unregister_cdev(&kcs_bmc_raw_cdev);
+	if (rc)
+		pr_warn("Failed to remove KCS BMC client: %d", rc);
+}
+module_exit(kcs_bmc_raw_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Andrew Jeffery <andrew@aj.id.au>");
+MODULE_DESCRIPTION("Character device for raw access to a KCS device");