diff mbox series

[v2,5/8] cxl/mem: Add a "RAW" send command

Message ID 20210210000259.635748-6-ben.widawsky@intel.com
State New
Headers show
Series CXL 2.0 Support | expand

Commit Message

Ben Widawsky Feb. 10, 2021, 12:02 a.m. UTC
The CXL memory device send interface will have a number of supported
commands. The raw command is not such a command. Raw commands allow
userspace to send a specified opcode to the underlying hardware and
bypass all driver checks on the command. This is useful for a couple of
usecases, mainly:
1. Undocumented vendor specific hardware commands
2. Prototyping new hardware commands not yet supported by the driver

While this all sounds very powerful it comes with a couple of caveats:
1. Bug reports using raw commands will not get the same level of
   attention as bug reports using supported commands (via taint).
2. Supported commands will be rejected by the RAW command.

With this comes new debugfs knob to allow full access to your toes with
your weapon of choice.

Cc: Ariel Sibley <Ariel.Sibley@microchip.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/Kconfig          |  18 +++++
 drivers/cxl/mem.c            | 125 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/cxl_mem.h |  12 +++-
 3 files changed, 152 insertions(+), 3 deletions(-)

Comments

Ariel.Sibley@microchip.com Feb. 10, 2021, 3:26 p.m. UTC | #1
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index c4ba3aa0a05d..08eaa8e52083 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -33,6 +33,24 @@ config CXL_MEM
> 
>           If unsure say 'm'.
> 
> +config CXL_MEM_RAW_COMMANDS
> +       bool "RAW Command Interface for Memory Devices"
> +       depends on CXL_MEM
> +       help
> +         Enable CXL RAW command interface.
> +
> +         The CXL driver ioctl interface may assign a kernel ioctl command
> +         number for each specification defined opcode. At any given point in
> +         time the number of opcodes that the specification defines and a device
> +         may implement may exceed the kernel's set of associated ioctl function
> +         numbers. The mismatch is either by omission, specification is too new,
> +         or by design. When prototyping new hardware, or developing /
> debugging
> +         the driver it is useful to be able to submit any possible command to
> +         the hardware, even commands that may crash the kernel due to their
> +         potential impact to memory currently in use by the kernel.
> +
> +         If developing CXL hardware or the driver say Y, otherwise say N.

Blocking RAW commands by default will prevent vendors from developing user space tools that utilize vendor specific commands. Vendors of CXL.mem devices should take ownership of ensuring any vendor defined commands that could cause user data to be exposed or corrupted are disabled at the device level for shipping configurations.
Ben Widawsky Feb. 10, 2021, 4:49 p.m. UTC | #2
On 21-02-10 15:26:27, Ariel.Sibley@microchip.com wrote:
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index c4ba3aa0a05d..08eaa8e52083 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -33,6 +33,24 @@ config CXL_MEM
> > 
> >           If unsure say 'm'.
> > 
> > +config CXL_MEM_RAW_COMMANDS
> > +       bool "RAW Command Interface for Memory Devices"
> > +       depends on CXL_MEM
> > +       help
> > +         Enable CXL RAW command interface.
> > +
> > +         The CXL driver ioctl interface may assign a kernel ioctl command
> > +         number for each specification defined opcode. At any given point in
> > +         time the number of opcodes that the specification defines and a device
> > +         may implement may exceed the kernel's set of associated ioctl function
> > +         numbers. The mismatch is either by omission, specification is too new,
> > +         or by design. When prototyping new hardware, or developing /
> > debugging
> > +         the driver it is useful to be able to submit any possible command to
> > +         the hardware, even commands that may crash the kernel due to their
> > +         potential impact to memory currently in use by the kernel.
> > +
> > +         If developing CXL hardware or the driver say Y, otherwise say N.
> 
> Blocking RAW commands by default will prevent vendors from developing user
> space tools that utilize vendor specific commands. Vendors of CXL.mem devices
> should take ownership of ensuring any vendor defined commands that could cause
> user data to be exposed or corrupted are disabled at the device level for
> shipping configurations.

Thanks for brining this up Ariel. If there is a recommendation on how to codify
this, I would certainly like to know because the explanation will be long.

---

The background:

The enabling/disabling of the Kconfig option is driven by the distribution
and/or system integrator. Even if we made the default 'y', nothing stops them
from changing that. if you are using this driver in production and insist on
using RAW commands, you are free to carry around a small patch to get rid of the
WARN (it is a one-liner).

To recap why this is in place - the driver owns the sanctity of the device and
therefore a [large] part of the whole system. What we can do as driver writers
is figure out the set of commands that are "safe" and allow those. Aside from
being able to validate them, we're able to mediate them with other parallel
operations that might conflict. We gain the ability to squint extra hard at bug
reports. We provide a reason to try to use a well defined part of the spec.
Realizing that only allowing that small set of commands in a rapidly growing
ecosystem is not a welcoming API; we decided on RAW.

Vendor commands can be one of two types:
1. Some functionality probably most vendors want.
2. Functionality that is really single vendor specific.

Hopefully we can agree that the path for case #1 is to work with the consortium
to standardize a command that does what is needed and that can eventually become
part of UAPI. The situation is unfortunate, but temporary. If you won't be able
to upgrade your kernel, patch out the WARN as above.

The second situation is interesting and does need some more thought and
discussion.

---

I see 3 realistic options for truly vendor specific commands.
1. Tough noogies. Vendors aren't special and they shouldn't do that.
2. modparam to disable the WARN for specific devices (let the sysadmin decide)
3. Try to make them part of UAPI.

The right answer to me is #1, but I also realize I live in the real world.

#2 provides too much flexibility. Vendors will just do what they please and
distros and/or integrators will be seen as hostile if they don't accommodate.

I like #3, but I have a feeling not everyone will agree. My proposal for vendor
specific commands is, if it's clear it's truly a unique command, allow adding it
as part of UAPI (moving it out of RAW). I expect like 5 of these, ever. If we
start getting multiple per vendor, we've failed. The infrastructure is already
in place to allow doing this pretty easily. I think we'd have to draw up some
guidelines (like adding test cases for the command) to allow these to come in.
Anything with command effects is going to need extra scrutiny.

In my opinion, as maintainers of the driver, we do owe the community an answer
as to our direction for this. Dan, what is your thought?
Ben Widawsky Feb. 10, 2021, 6:11 p.m. UTC | #3
On 21-02-10 18:03:35, Ariel.Sibley@microchip.com wrote:
> > > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > > index c4ba3aa0a05d..08eaa8e52083 100644
> > > > --- a/drivers/cxl/Kconfig
> > > > +++ b/drivers/cxl/Kconfig
> > > > @@ -33,6 +33,24 @@ config CXL_MEM
> > > >
> > > >           If unsure say 'm'.
> > > >
> > > > +config CXL_MEM_RAW_COMMANDS
> > > > +       bool "RAW Command Interface for Memory Devices"
> > > > +       depends on CXL_MEM
> > > > +       help
> > > > +         Enable CXL RAW command interface.
> > > > +
> > > > +         The CXL driver ioctl interface may assign a kernel ioctl command
> > > > +         number for each specification defined opcode. At any given point in
> > > > +         time the number of opcodes that the specification defines and a device
> > > > +         may implement may exceed the kernel's set of associated ioctl function
> > > > +         numbers. The mismatch is either by omission, specification is too new,
> > > > +         or by design. When prototyping new hardware, or developing /
> > > > debugging
> > > > +         the driver it is useful to be able to submit any possible command to
> > > > +         the hardware, even commands that may crash the kernel due to their
> > > > +         potential impact to memory currently in use by the kernel.
> > > > +
> > > > +         If developing CXL hardware or the driver say Y, otherwise say N.
> > >
> > > Blocking RAW commands by default will prevent vendors from developing user
> > > space tools that utilize vendor specific commands. Vendors of CXL.mem devices
> > > should take ownership of ensuring any vendor defined commands that could cause
> > > user data to be exposed or corrupted are disabled at the device level for
> > > shipping configurations.
> > 
> > Thanks for brining this up Ariel. If there is a recommendation on how to codify
> > this, I would certainly like to know because the explanation will be long.
> > 
> > ---
> > 
> > The background:
> > 
> > The enabling/disabling of the Kconfig option is driven by the distribution
> > and/or system integrator. Even if we made the default 'y', nothing stops them
> > from changing that. if you are using this driver in production and insist on
> > using RAW commands, you are free to carry around a small patch to get rid of the
> > WARN (it is a one-liner).
> > 
> > To recap why this is in place - the driver owns the sanctity of the device and
> > therefore a [large] part of the whole system. What we can do as driver writers
> > is figure out the set of commands that are "safe" and allow those. Aside from
> > being able to validate them, we're able to mediate them with other parallel
> > operations that might conflict. We gain the ability to squint extra hard at bug
> > reports. We provide a reason to try to use a well defined part of the spec.
> > Realizing that only allowing that small set of commands in a rapidly growing
> > ecosystem is not a welcoming API; we decided on RAW.
> > 
> > Vendor commands can be one of two types:
> > 1. Some functionality probably most vendors want.
> > 2. Functionality that is really single vendor specific.
> > 
> > Hopefully we can agree that the path for case #1 is to work with the consortium
> > to standardize a command that does what is needed and that can eventually become
> > part of UAPI. The situation is unfortunate, but temporary. If you won't be able
> > to upgrade your kernel, patch out the WARN as above.
> > 
> > The second situation is interesting and does need some more thought and
> > discussion.
> > 
> > ---
> > 
> > I see 3 realistic options for truly vendor specific commands.
> > 1. Tough noogies. Vendors aren't special and they shouldn't do that.
> > 2. modparam to disable the WARN for specific devices (let the sysadmin decide)
> > 3. Try to make them part of UAPI.
> > 
> > The right answer to me is #1, but I also realize I live in the real world.
> > 
> > #2 provides too much flexibility. Vendors will just do what they please and
> > distros and/or integrators will be seen as hostile if they don't accommodate.
> > 
> > I like #3, but I have a feeling not everyone will agree. My proposal for vendor
> > specific commands is, if it's clear it's truly a unique command, allow adding it
> > as part of UAPI (moving it out of RAW). I expect like 5 of these, ever. If we
> > start getting multiple per vendor, we've failed. The infrastructure is already
> > in place to allow doing this pretty easily. I think we'd have to draw up some
> > guidelines (like adding test cases for the command) to allow these to come in.
> > Anything with command effects is going to need extra scrutiny.
> 
> This would necessitate adding specific opcode values in the range C000h-FFFFh
> to UAPI, and those would then be allowed for all CXL.mem devices, correct?  If
> so, I do not think this is the right approach, as opcodes in this range are by
> definition vendor defined.  A given opcode value will have totally different
> effects depending on the vendor.

Perhaps I didn't explain well enough. The UAPI would define the command ID to
opcode mapping, for example 0xC000. There would be a validation step in the
driver where it determines if it's actually the correct hardware to execute on.
So it would be entirely possible to have multiple vendor commands with the same
opcode.

So UAPI might be this:
        ___C(GET_HEALTH_INFO, "Get Health Info"),                         \
        ___C(GET_LOG, "Get Log"),                                         \
        ___C(VENDOR_FOO_XXX, "FOO"),                                      \
        ___C(VENDOR_BAR_XXX, "BAR"),                                      \

User space just picks the command they want, FOO/BAR. If they use VENDOR_BAR_XXX
on VENDOR_FOO's hardware, they will get an error return value.

> I think you may be on to something with the command effects.  But rather than
> "extra scrutiny" for opcodes that have command effects, would it make sense to
> allow vendor defined opcodes that have Bit[5:0] in the Command Effect field of
> the CEL Entry Structure (Table 173) set to 0?  In conjunction, those bits
> represent any change to the configuration or data within the device.  For
> commands that have no such effects, is there harm to allowing them?  Of
> course, this approach relies on the vendor to not misrepresent the command
> effects.
> 

That last sentence is what worries me :-)


> > 
> > In my opinion, as maintainers of the driver, we do owe the community an answer
> > as to our direction for this. Dan, what is your thought?
Ben Widawsky Feb. 10, 2021, 7:12 p.m. UTC | #4
On 21-02-10 18:46:04, Ariel.Sibley@microchip.com wrote:
> > > > > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > > > > index c4ba3aa0a05d..08eaa8e52083 100644
> > > > > > --- a/drivers/cxl/Kconfig
> > > > > > +++ b/drivers/cxl/Kconfig
> > > > > > @@ -33,6 +33,24 @@ config CXL_MEM
> > > > > >
> > > > > >           If unsure say 'm'.
> > > > > >
> > > > > > +config CXL_MEM_RAW_COMMANDS
> > > > > > +       bool "RAW Command Interface for Memory Devices"
> > > > > > +       depends on CXL_MEM
> > > > > > +       help
> > > > > > +         Enable CXL RAW command interface.
> > > > > > +
> > > > > > +         The CXL driver ioctl interface may assign a kernel ioctl command
> > > > > > +         number for each specification defined opcode. At any given point in
> > > > > > +         time the number of opcodes that the specification defines and a device
> > > > > > +         may implement may exceed the kernel's set of associated ioctl function
> > > > > > +         numbers. The mismatch is either by omission, specification is too new,
> > > > > > +         or by design. When prototyping new hardware, or developing /
> > > > > > debugging
> > > > > > +         the driver it is useful to be able to submit any possible command to
> > > > > > +         the hardware, even commands that may crash the kernel due to their
> > > > > > +         potential impact to memory currently in use by the kernel.
> > > > > > +
> > > > > > +         If developing CXL hardware or the driver say Y, otherwise say N.
> > > > >
> > > > > Blocking RAW commands by default will prevent vendors from developing user
> > > > > space tools that utilize vendor specific commands. Vendors of CXL.mem devices
> > > > > should take ownership of ensuring any vendor defined commands that could cause
> > > > > user data to be exposed or corrupted are disabled at the device level for
> > > > > shipping configurations.
> > > >
> > > > Thanks for brining this up Ariel. If there is a recommendation on how to codify
> > > > this, I would certainly like to know because the explanation will be long.
> > > >
> > > > ---
> > > >
> > > > The background:
> > > >
> > > > The enabling/disabling of the Kconfig option is driven by the distribution
> > > > and/or system integrator. Even if we made the default 'y', nothing stops them
> > > > from changing that. if you are using this driver in production and insist on
> > > > using RAW commands, you are free to carry around a small patch to get rid of the
> > > > WARN (it is a one-liner).
> > > >
> > > > To recap why this is in place - the driver owns the sanctity of the device and
> > > > therefore a [large] part of the whole system. What we can do as driver writers
> > > > is figure out the set of commands that are "safe" and allow those. Aside from
> > > > being able to validate them, we're able to mediate them with other parallel
> > > > operations that might conflict. We gain the ability to squint extra hard at bug
> > > > reports. We provide a reason to try to use a well defined part of the spec.
> > > > Realizing that only allowing that small set of commands in a rapidly growing
> > > > ecosystem is not a welcoming API; we decided on RAW.
> > > >
> > > > Vendor commands can be one of two types:
> > > > 1. Some functionality probably most vendors want.
> > > > 2. Functionality that is really single vendor specific.
> > > >
> > > > Hopefully we can agree that the path for case #1 is to work with the consortium
> > > > to standardize a command that does what is needed and that can eventually become
> > > > part of UAPI. The situation is unfortunate, but temporary. If you won't be able
> > > > to upgrade your kernel, patch out the WARN as above.
> > > >
> > > > The second situation is interesting and does need some more thought and
> > > > discussion.
> > > >
> > > > ---
> > > >
> > > > I see 3 realistic options for truly vendor specific commands.
> > > > 1. Tough noogies. Vendors aren't special and they shouldn't do that.
> > > > 2. modparam to disable the WARN for specific devices (let the sysadmin decide)
> > > > 3. Try to make them part of UAPI.
> > > >
> > > > The right answer to me is #1, but I also realize I live in the real world.
> > > >
> > > > #2 provides too much flexibility. Vendors will just do what they please and
> > > > distros and/or integrators will be seen as hostile if they don't accommodate.
> > > >
> > > > I like #3, but I have a feeling not everyone will agree. My proposal for vendor
> > > > specific commands is, if it's clear it's truly a unique command, allow adding it
> > > > as part of UAPI (moving it out of RAW). I expect like 5 of these, ever. If we
> > > > start getting multiple per vendor, we've failed. The infrastructure is already
> > > > in place to allow doing this pretty easily. I think we'd have to draw up some
> > > > guidelines (like adding test cases for the command) to allow these to come in.
> > > > Anything with command effects is going to need extra scrutiny.
> > >
> > > This would necessitate adding specific opcode values in the range C000h-FFFFh
> > > to UAPI, and those would then be allowed for all CXL.mem devices, correct?  If
> > > so, I do not think this is the right approach, as opcodes in this range are by
> > > definition vendor defined.  A given opcode value will have totally different
> > > effects depending on the vendor.
> > 
> > Perhaps I didn't explain well enough. The UAPI would define the command ID to
> > opcode mapping, for example 0xC000. There would be a validation step in the
> > driver where it determines if it's actually the correct hardware to execute on.
> > So it would be entirely possible to have multiple vendor commands with the same
> > opcode.
> > 
> > So UAPI might be this:
> >         ___C(GET_HEALTH_INFO, "Get Health Info"),                         \
> >         ___C(GET_LOG, "Get Log"),                                         \
> >         ___C(VENDOR_FOO_XXX, "FOO"),                                      \
> >         ___C(VENDOR_BAR_XXX, "BAR"),                                      \
> > 
> > User space just picks the command they want, FOO/BAR. If they use VENDOR_BAR_XXX
> > on VENDOR_FOO's hardware, they will get an error return value.
> 
> Would the driver be doing this enforcement of vendor ID / opcode
> compatibility, or would the error return value mentioned here be from the
> device?  My concern is where the same opcode has two meanings for different
> vendors.  For example, for Vendor A opcode 0xC000 might report some form of
> status information, but for Vendor B it might have data side effects.  There
> may not have been any UAPI intention to expose 0xC000 for Vendor B devices,
> but the existence of 0xC000 in UAPI for Vendor A results in the data
> corrupting version of 0xC000 for Vendor B being allowed.  It would seem to me
> that even if the commands are in UAPI, the driver would still need to rely on
> the contents of the CEL to determine if the command should be allowed.

I think I might not be properly understanding your concern. There are two types
of errors in UAPI that represent 3 error conditions:

1. errno from the ioctl - parameter invalid kind of stuff, this would include using
   the vendor A UAPI on vendor B's device (assuming matching opcodes).
2. errno from the ioctl - transport error of some sort in the mailbox command -
   timeout on doorbell kind of thing.
3. cxl_send_command.retval - Device's error code.

Did that address your concern?

>  
> > > I think you may be on to something with the command effects.  But rather than
> > > "extra scrutiny" for opcodes that have command effects, would it make sense to
> > > allow vendor defined opcodes that have Bit[5:0] in the Command Effect field of
> > > the CEL Entry Structure (Table 173) set to 0?  In conjunction, those bits
> > > represent any change to the configuration or data within the device.  For
> > > commands that have no such effects, is there harm to allowing them?  Of
> > > course, this approach relies on the vendor to not misrepresent the command
> > > effects.
> > >
> > 
> > That last sentence is what worries me :-)
> 
> One must also rely on the vendor to not simply corrupt data at random. :) IMO
> the contents of the CEL should be believed by the driver, rather than the
> driver treating the device as a hostile actor.
> 

I respect your opinion, but my opinion is that driver writers absolutely cannot
rely on that. It would further the conversation a great deal to get concrete
examples of commands that couldn't be part of the core spec and had no effects.
I assume all vendors are going to avoid doing that, which is a real shame.

So far I haven't seen the consortium shoot something down from a vendor because
it is too vendor specific...

> > 
> > 
> > > >
> > > > In my opinion, as maintainers of the driver, we do owe the community an answer
> > > > as to our direction for this. Dan, what is your thought?
Jonathan Cameron Feb. 11, 2021, 11:19 a.m. UTC | #5
On Tue, 9 Feb 2021 16:02:56 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

> The CXL memory device send interface will have a number of supported

> commands. The raw command is not such a command. Raw commands allow

> userspace to send a specified opcode to the underlying hardware and

> bypass all driver checks on the command. This is useful for a couple of

> usecases, mainly:

> 1. Undocumented vendor specific hardware commands


This one I get.  There are things we'd love to standardize but often they
need proving in a generation of hardware before the data is available to
justify taking it to a standards body.  Stuff like performance stats.
This stuff will all sit in the vendor defined range.  Maybe there is an
argument for in driver hooks to allow proper support even for these
(Ben mentioned this in the other branch of the thread).

> 2. Prototyping new hardware commands not yet supported by the driver


For 2, could just have a convenient place to enable this by one line patch.
Some subsystems (SPI comes to mind) do this for their equivalent of raw
commands.  The code is all there to enable it but you need to hook it
up if you want to use it.  Avoids chance of a distro shipping it.

> 

> While this all sounds very powerful it comes with a couple of caveats:

> 1. Bug reports using raw commands will not get the same level of

>    attention as bug reports using supported commands (via taint).

> 2. Supported commands will be rejected by the RAW command.


Perhaps I'm missing reading this point 2 (not sure the code actually does it!)

As stated what worries me as it means when we add support for a new
bit of the spec we just broke the userspace ABI.

> 

> With this comes new debugfs knob to allow full access to your toes with

> your weapon of choice.


A few trivial things inline,

Jonathan

> 

> Cc: Ariel Sibley <Ariel.Sibley@microchip.com>

> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> ---

>  drivers/cxl/Kconfig          |  18 +++++

>  drivers/cxl/mem.c            | 125 ++++++++++++++++++++++++++++++++++-

>  include/uapi/linux/cxl_mem.h |  12 +++-

>  3 files changed, 152 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig

> index c4ba3aa0a05d..08eaa8e52083 100644

> --- a/drivers/cxl/Kconfig

> +++ b/drivers/cxl/Kconfig

> @@ -33,6 +33,24 @@ config CXL_MEM

>  

>  	  If unsure say 'm'.

>  

> +config CXL_MEM_RAW_COMMANDS

> +	bool "RAW Command Interface for Memory Devices"

> +	depends on CXL_MEM

> +	help

> +	  Enable CXL RAW command interface.

> +

> +	  The CXL driver ioctl interface may assign a kernel ioctl command

> +	  number for each specification defined opcode. At any given point in

> +	  time the number of opcodes that the specification defines and a device

> +	  may implement may exceed the kernel's set of associated ioctl function

> +	  numbers. The mismatch is either by omission, specification is too new,

> +	  or by design. When prototyping new hardware, or developing / debugging

> +	  the driver it is useful to be able to submit any possible command to

> +	  the hardware, even commands that may crash the kernel due to their

> +	  potential impact to memory currently in use by the kernel.

> +

> +	  If developing CXL hardware or the driver say Y, otherwise say N.

> +

>  config CXL_MEM_INSECURE_DEBUG

>  	bool "CXL.mem debugging"

>  	depends on CXL_MEM

> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c

> index ce65630bb75e..6d766a994dce 100644

> --- a/drivers/cxl/mem.c

> +++ b/drivers/cxl/mem.c

> @@ -1,6 +1,8 @@

>  // SPDX-License-Identifier: GPL-2.0-only

>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */

>  #include <uapi/linux/cxl_mem.h>

> +#include <linux/security.h>

> +#include <linux/debugfs.h>

>  #include <linux/module.h>

>  #include <linux/mutex.h>

>  #include <linux/cdev.h>

> @@ -41,7 +43,14 @@

>  

>  enum opcode {

>  	CXL_MBOX_OP_INVALID		= 0x0000,

> +	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,

> +	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,

>  	CXL_MBOX_OP_IDENTIFY		= 0x4000,

> +	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,

> +	CXL_MBOX_OP_SET_LSA		= 0x4103,

> +	CXL_MBOX_OP_SET_SHUTDOWN_STATE	= 0x4204,

> +	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,

> +	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,

>  	CXL_MBOX_OP_MAX			= 0x10000

>  };

>  

> @@ -91,6 +100,8 @@ struct cxl_memdev {

>  

>  static int cxl_mem_major;

>  static DEFINE_IDA(cxl_memdev_ida);

> +static struct dentry *cxl_debugfs;

> +static bool raw_allow_all;

>  

>  /**

>   * struct cxl_mem_command - Driver representation of a memory device command

> @@ -132,6 +143,49 @@ struct cxl_mem_command {

>   */

>  static struct cxl_mem_command mem_commands[] = {

>  	CXL_CMD(IDENTIFY, NONE, 0, 0x43),

> +#ifdef CONFIG_CXL_MEM_RAW_COMMANDS

> +	CXL_CMD(RAW, NONE, ~0, ~0),

> +#endif

> +};

> +

> +/*

> + * Commands that RAW doesn't permit. The rationale for each:

> + *

> + * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /

> + * coordination of transaction timeout values at the root bridge level.

> + *

> + * CXL_MBOX_OP_SET_PARTITION_INFO: The device memory map may change live

> + * and needs to be coordinated with HDM updates.

> + *

> + * CXL_MBOX_OP_SET_LSA: The label storage area may be cached by the

> + * driver and any writes from userspace invalidates those contents.

> + *

> + * CXL_MBOX_OP_SET_SHUTDOWN_STATE: Set shutdown state assumes no writes

> + * to the device after it is marked clean, userspace can not make that

> + * assertion.

> + *

> + * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that

> + * is kept up to date with patrol notifications and error management.

> + */

> +static u16 disabled_raw_commands[] = {

> +	CXL_MBOX_OP_ACTIVATE_FW,

> +	CXL_MBOX_OP_SET_PARTITION_INFO,

> +	CXL_MBOX_OP_SET_LSA,

> +	CXL_MBOX_OP_SET_SHUTDOWN_STATE,

> +	CXL_MBOX_OP_SCAN_MEDIA,

> +	CXL_MBOX_OP_GET_SCAN_MEDIA,

> +};

> +

> +/*

> + * Command sets that RAW doesn't permit. All opcodes in this set are

> + * disabled because they pass plain text security payloads over the

> + * user/kernel boundary. This functionality is intended to be wrapped

> + * behind the keys ABI which allows for encrypted payloads in the UAPI

> + */

> +static u8 security_command_sets[] = {

> +	0x44, /* Sanitize */

> +	0x45, /* Persistent Memory Data-at-rest Security */

> +	0x46, /* Security Passthrough */

>  };

>  

>  #define cxl_for_each_cmd(cmd)                                                  \

> @@ -162,6 +216,16 @@ static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)

>  	return 0;

>  }

>  

> +static bool is_security_command(u16 opcode)

> +{

> +	int i;

> +

> +	for (i = 0; i < ARRAY_SIZE(security_command_sets); i++)

> +		if (security_command_sets[i] == (opcode >> 8))

> +			return true;

> +	return false;

> +}

> +

>  static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

>  				 struct mbox_cmd *mbox_cmd)

>  {

> @@ -170,7 +234,8 @@ static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

>  	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",

>  		mbox_cmd->opcode, mbox_cmd->size_in);

>  

> -	if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

> +	if (!is_security_command(mbox_cmd->opcode) ||

> +	    IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

>  		print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,

>  				     mbox_cmd->payload_in, mbox_cmd->size_in,

>  				     true);

> @@ -434,6 +499,9 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

>  		cxl_command_names[cmd->info.id].name, mbox_cmd.opcode,

>  		cmd->info.size_in);

>  

> +	dev_WARN_ONCE(dev, cmd->info.id == CXL_MEM_COMMAND_ID_RAW,

> +		      "raw command path used\n");

> +

>  	rc = cxl_mem_mbox_send_cmd(cxlm, &mbox_cmd);

>  	cxl_mem_mbox_put(cxlm);

>  	if (rc)

> @@ -464,6 +532,29 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

>  	return rc;

>  }

>  

> +static bool cxl_mem_raw_command_allowed(u16 opcode)

> +{

> +	int i;

> +

> +	if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))

> +		return false;

> +

> +	if (security_locked_down(LOCKDOWN_NONE))

> +		return false;

> +

> +	if (raw_allow_all)

> +		return true;

> +

> +	if (is_security_command(opcode))

Given we are mixing generic calls like security_locked_down()
and local cxl specific ones like this one, prefix the
local versions.

cxl_is_security_command()

I'd also have a slight preference to do it for cxl_disabled_raw_commands
and cxl_raw_allow_all though they are less important as more obviously
local by not being function calls.

> +		return false;

> +

> +	for (i = 0; i < ARRAY_SIZE(disabled_raw_commands); i++)

> +		if (disabled_raw_commands[i] == opcode)

> +			return false;

> +

> +	return true;

> +}

> +

>  /**

>   * cxl_validate_cmd_from_user() - Check fields for CXL_MEM_SEND_COMMAND.

>   * @cxlm: &struct cxl_mem device whose mailbox will be used.

> @@ -500,6 +591,29 @@ static int cxl_validate_cmd_from_user(struct cxl_mem *cxlm,

>  	if (send_cmd->in.size > cxlm->payload_size)

>  		return -EINVAL;

>  

> +	/* Checks are bypassed for raw commands but along comes the taint! */

> +	if (send_cmd->id == CXL_MEM_COMMAND_ID_RAW) {

> +		const struct cxl_mem_command temp = {

> +			.info = {

> +				.id = CXL_MEM_COMMAND_ID_RAW,

> +				.flags = CXL_MEM_COMMAND_FLAG_NONE,

> +				.size_in = send_cmd->in.size,

> +				.size_out = send_cmd->out.size,

> +			},

> +			.opcode = send_cmd->raw.opcode

> +		};

> +

> +		if (send_cmd->raw.rsvd)

> +			return -EINVAL;

> +

> +		if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))

> +			return -EPERM;

> +

> +		memcpy(out_cmd, &temp, sizeof(temp));

> +

> +		return 0;

> +	}

> +

>  	if (send_cmd->flags & ~CXL_MEM_COMMAND_FLAG_MASK)

>  		return -EINVAL;

>  

> @@ -1123,8 +1237,9 @@ static struct pci_driver cxl_mem_driver = {

>  

>  static __init int cxl_mem_init(void)

>  {

> -	int rc;

> +	struct dentry *mbox_debugfs;

>  	dev_t devt;

> +	int rc;


Shuffle this back to the place it was introduced to reduce patch noise.

>  

>  	rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl");

>  	if (rc)

> @@ -1139,11 +1254,17 @@ static __init int cxl_mem_init(void)

>  		return rc;

>  	}

>  

> +	cxl_debugfs = debugfs_create_dir("cxl", NULL);

> +	mbox_debugfs = debugfs_create_dir("mbox", cxl_debugfs);

> +	debugfs_create_bool("raw_allow_all", 0600, mbox_debugfs,

> +			    &raw_allow_all);

> +

>  	return 0;

>  }

>  

>  static __exit void cxl_mem_exit(void)

>  {

> +	debugfs_remove_recursive(cxl_debugfs);

>  	pci_unregister_driver(&cxl_mem_driver);

>  	unregister_chrdev_region(MKDEV(cxl_mem_major, 0), CXL_MEM_MAX_DEVS);

>  }

> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h

> index f1f7e9f32ea5..72d1eb601a5d 100644

> --- a/include/uapi/linux/cxl_mem.h

> +++ b/include/uapi/linux/cxl_mem.h

> @@ -22,6 +22,7 @@

>  #define CXL_CMDS                                                          \

>  	___C(INVALID, "Invalid Command"),                                 \

>  	___C(IDENTIFY, "Identify Command"),                               \

> +	___C(RAW, "Raw device command"),                                  \

>  	___C(MAX, "Last command")

>  

>  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a

> @@ -112,6 +113,9 @@ struct cxl_mem_query_commands {

>   * @id: The command to send to the memory device. This must be one of the

>   *	commands returned by the query command.

>   * @flags: Flags for the command (input).

> + * @raw: Special fields for raw commands

> + * @raw.opcode: Opcode passed to hardware when using the RAW command.

> + * @raw.rsvd: Must be zero.

>   * @rsvd: Must be zero.

>   * @retval: Return value from the memory device (output).

>   * @in.size: Size of the payload to provide to the device (input).

> @@ -133,7 +137,13 @@ struct cxl_mem_query_commands {

>  struct cxl_send_command {

>  	__u32 id;

>  	__u32 flags;

> -	__u32 rsvd;

> +	union {

> +		struct {

> +			__u16 opcode;

> +			__u16 rsvd;

> +		} raw;

> +		__u32 rsvd;

> +	};

>  	__u32 retval;

>  

>  	struct {
Ben Widawsky Feb. 11, 2021, 4:01 p.m. UTC | #6
On 21-02-11 11:19:24, Jonathan Cameron wrote:
> On Tue, 9 Feb 2021 16:02:56 -0800

> Ben Widawsky <ben.widawsky@intel.com> wrote:

> 

> > The CXL memory device send interface will have a number of supported

> > commands. The raw command is not such a command. Raw commands allow

> > userspace to send a specified opcode to the underlying hardware and

> > bypass all driver checks on the command. This is useful for a couple of

> > usecases, mainly:

> > 1. Undocumented vendor specific hardware commands

> 

> This one I get.  There are things we'd love to standardize but often they

> need proving in a generation of hardware before the data is available to

> justify taking it to a standards body.  Stuff like performance stats.

> This stuff will all sit in the vendor defined range.  Maybe there is an

> argument for in driver hooks to allow proper support even for these

> (Ben mentioned this in the other branch of the thread).

> 

> > 2. Prototyping new hardware commands not yet supported by the driver

> 

> For 2, could just have a convenient place to enable this by one line patch.

> Some subsystems (SPI comes to mind) do this for their equivalent of raw

> commands.  The code is all there to enable it but you need to hook it

> up if you want to use it.  Avoids chance of a distro shipping it.

> 


I'm fine to drop #2 as a justification point, or maybe reword the commit message
to say, "you could also just do... but since we have it for #1 already..."

> > 

> > While this all sounds very powerful it comes with a couple of caveats:

> > 1. Bug reports using raw commands will not get the same level of

> >    attention as bug reports using supported commands (via taint).

> > 2. Supported commands will be rejected by the RAW command.

> 

> Perhaps I'm missing reading this point 2 (not sure the code actually does it!)

> 

> As stated what worries me as it means when we add support for a new

> bit of the spec we just broke the userspace ABI.

> 


It does not break ABI. The agreement is userspace must always use the QUERY
command to find out what commands are supported. If it tries to use a RAW
command that is a supported command, it will be rejected. In the case you
mention, that's an application bug. If there is a way to document that better
than what's already in the UAPI kdocs, I'm open to suggestions.

Unlike perhaps other UAPI, this one only promises to give you a way to determine
what commands you can use, not the list of what commands you can use.

> > 

> > With this comes new debugfs knob to allow full access to your toes with

> > your weapon of choice.

> 

> A few trivial things inline,

> 

> Jonathan

> 

> > 

> > Cc: Ariel Sibley <Ariel.Sibley@microchip.com>

> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> > ---

> >  drivers/cxl/Kconfig          |  18 +++++

> >  drivers/cxl/mem.c            | 125 ++++++++++++++++++++++++++++++++++-

> >  include/uapi/linux/cxl_mem.h |  12 +++-

> >  3 files changed, 152 insertions(+), 3 deletions(-)

> > 

> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig

> > index c4ba3aa0a05d..08eaa8e52083 100644

> > --- a/drivers/cxl/Kconfig

> > +++ b/drivers/cxl/Kconfig

> > @@ -33,6 +33,24 @@ config CXL_MEM

> >  

> >  	  If unsure say 'm'.

> >  

> > +config CXL_MEM_RAW_COMMANDS

> > +	bool "RAW Command Interface for Memory Devices"

> > +	depends on CXL_MEM

> > +	help

> > +	  Enable CXL RAW command interface.

> > +

> > +	  The CXL driver ioctl interface may assign a kernel ioctl command

> > +	  number for each specification defined opcode. At any given point in

> > +	  time the number of opcodes that the specification defines and a device

> > +	  may implement may exceed the kernel's set of associated ioctl function

> > +	  numbers. The mismatch is either by omission, specification is too new,

> > +	  or by design. When prototyping new hardware, or developing / debugging

> > +	  the driver it is useful to be able to submit any possible command to

> > +	  the hardware, even commands that may crash the kernel due to their

> > +	  potential impact to memory currently in use by the kernel.

> > +

> > +	  If developing CXL hardware or the driver say Y, otherwise say N.

> > +

> >  config CXL_MEM_INSECURE_DEBUG

> >  	bool "CXL.mem debugging"

> >  	depends on CXL_MEM

> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c

> > index ce65630bb75e..6d766a994dce 100644

> > --- a/drivers/cxl/mem.c

> > +++ b/drivers/cxl/mem.c

> > @@ -1,6 +1,8 @@

> >  // SPDX-License-Identifier: GPL-2.0-only

> >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */

> >  #include <uapi/linux/cxl_mem.h>

> > +#include <linux/security.h>

> > +#include <linux/debugfs.h>

> >  #include <linux/module.h>

> >  #include <linux/mutex.h>

> >  #include <linux/cdev.h>

> > @@ -41,7 +43,14 @@

> >  

> >  enum opcode {

> >  	CXL_MBOX_OP_INVALID		= 0x0000,

> > +	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,

> > +	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,

> >  	CXL_MBOX_OP_IDENTIFY		= 0x4000,

> > +	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,

> > +	CXL_MBOX_OP_SET_LSA		= 0x4103,

> > +	CXL_MBOX_OP_SET_SHUTDOWN_STATE	= 0x4204,

> > +	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,

> > +	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,

> >  	CXL_MBOX_OP_MAX			= 0x10000

> >  };

> >  

> > @@ -91,6 +100,8 @@ struct cxl_memdev {

> >  

> >  static int cxl_mem_major;

> >  static DEFINE_IDA(cxl_memdev_ida);

> > +static struct dentry *cxl_debugfs;

> > +static bool raw_allow_all;

> >  

> >  /**

> >   * struct cxl_mem_command - Driver representation of a memory device command

> > @@ -132,6 +143,49 @@ struct cxl_mem_command {

> >   */

> >  static struct cxl_mem_command mem_commands[] = {

> >  	CXL_CMD(IDENTIFY, NONE, 0, 0x43),

> > +#ifdef CONFIG_CXL_MEM_RAW_COMMANDS

> > +	CXL_CMD(RAW, NONE, ~0, ~0),

> > +#endif

> > +};

> > +

> > +/*

> > + * Commands that RAW doesn't permit. The rationale for each:

> > + *

> > + * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /

> > + * coordination of transaction timeout values at the root bridge level.

> > + *

> > + * CXL_MBOX_OP_SET_PARTITION_INFO: The device memory map may change live

> > + * and needs to be coordinated with HDM updates.

> > + *

> > + * CXL_MBOX_OP_SET_LSA: The label storage area may be cached by the

> > + * driver and any writes from userspace invalidates those contents.

> > + *

> > + * CXL_MBOX_OP_SET_SHUTDOWN_STATE: Set shutdown state assumes no writes

> > + * to the device after it is marked clean, userspace can not make that

> > + * assertion.

> > + *

> > + * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that

> > + * is kept up to date with patrol notifications and error management.

> > + */

> > +static u16 disabled_raw_commands[] = {

> > +	CXL_MBOX_OP_ACTIVATE_FW,

> > +	CXL_MBOX_OP_SET_PARTITION_INFO,

> > +	CXL_MBOX_OP_SET_LSA,

> > +	CXL_MBOX_OP_SET_SHUTDOWN_STATE,

> > +	CXL_MBOX_OP_SCAN_MEDIA,

> > +	CXL_MBOX_OP_GET_SCAN_MEDIA,

> > +};

> > +

> > +/*

> > + * Command sets that RAW doesn't permit. All opcodes in this set are

> > + * disabled because they pass plain text security payloads over the

> > + * user/kernel boundary. This functionality is intended to be wrapped

> > + * behind the keys ABI which allows for encrypted payloads in the UAPI

> > + */

> > +static u8 security_command_sets[] = {

> > +	0x44, /* Sanitize */

> > +	0x45, /* Persistent Memory Data-at-rest Security */

> > +	0x46, /* Security Passthrough */

> >  };

> >  

> >  #define cxl_for_each_cmd(cmd)                                                  \

> > @@ -162,6 +216,16 @@ static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)

> >  	return 0;

> >  }

> >  

> > +static bool is_security_command(u16 opcode)

> > +{

> > +	int i;

> > +

> > +	for (i = 0; i < ARRAY_SIZE(security_command_sets); i++)

> > +		if (security_command_sets[i] == (opcode >> 8))

> > +			return true;

> > +	return false;

> > +}

> > +

> >  static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

> >  				 struct mbox_cmd *mbox_cmd)

> >  {

> > @@ -170,7 +234,8 @@ static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

> >  	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",

> >  		mbox_cmd->opcode, mbox_cmd->size_in);

> >  

> > -	if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

> > +	if (!is_security_command(mbox_cmd->opcode) ||

> > +	    IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

> >  		print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,

> >  				     mbox_cmd->payload_in, mbox_cmd->size_in,

> >  				     true);

> > @@ -434,6 +499,9 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

> >  		cxl_command_names[cmd->info.id].name, mbox_cmd.opcode,

> >  		cmd->info.size_in);

> >  

> > +	dev_WARN_ONCE(dev, cmd->info.id == CXL_MEM_COMMAND_ID_RAW,

> > +		      "raw command path used\n");

> > +

> >  	rc = cxl_mem_mbox_send_cmd(cxlm, &mbox_cmd);

> >  	cxl_mem_mbox_put(cxlm);

> >  	if (rc)

> > @@ -464,6 +532,29 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

> >  	return rc;

> >  }

> >  

> > +static bool cxl_mem_raw_command_allowed(u16 opcode)

> > +{

> > +	int i;

> > +

> > +	if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))

> > +		return false;

> > +

> > +	if (security_locked_down(LOCKDOWN_NONE))

> > +		return false;

> > +

> > +	if (raw_allow_all)

> > +		return true;

> > +

> > +	if (is_security_command(opcode))

> Given we are mixing generic calls like security_locked_down()

> and local cxl specific ones like this one, prefix the

> local versions.

> 

> cxl_is_security_command()

> 

> I'd also have a slight preference to do it for cxl_disabled_raw_commands

> and cxl_raw_allow_all though they are less important as more obviously

> local by not being function calls.

> 

> > +		return false;

> > +

> > +	for (i = 0; i < ARRAY_SIZE(disabled_raw_commands); i++)

> > +		if (disabled_raw_commands[i] == opcode)

> > +			return false;

> > +

> > +	return true;

> > +}

> > +

> >  /**

> >   * cxl_validate_cmd_from_user() - Check fields for CXL_MEM_SEND_COMMAND.

> >   * @cxlm: &struct cxl_mem device whose mailbox will be used.

> > @@ -500,6 +591,29 @@ static int cxl_validate_cmd_from_user(struct cxl_mem *cxlm,

> >  	if (send_cmd->in.size > cxlm->payload_size)

> >  		return -EINVAL;

> >  

> > +	/* Checks are bypassed for raw commands but along comes the taint! */

> > +	if (send_cmd->id == CXL_MEM_COMMAND_ID_RAW) {

> > +		const struct cxl_mem_command temp = {

> > +			.info = {

> > +				.id = CXL_MEM_COMMAND_ID_RAW,

> > +				.flags = CXL_MEM_COMMAND_FLAG_NONE,

> > +				.size_in = send_cmd->in.size,

> > +				.size_out = send_cmd->out.size,

> > +			},

> > +			.opcode = send_cmd->raw.opcode

> > +		};

> > +

> > +		if (send_cmd->raw.rsvd)

> > +			return -EINVAL;

> > +

> > +		if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))

> > +			return -EPERM;

> > +

> > +		memcpy(out_cmd, &temp, sizeof(temp));

> > +

> > +		return 0;

> > +	}

> > +

> >  	if (send_cmd->flags & ~CXL_MEM_COMMAND_FLAG_MASK)

> >  		return -EINVAL;

> >  

> > @@ -1123,8 +1237,9 @@ static struct pci_driver cxl_mem_driver = {

> >  

> >  static __init int cxl_mem_init(void)

> >  {

> > -	int rc;

> > +	struct dentry *mbox_debugfs;

> >  	dev_t devt;

> > +	int rc;

> 

> Shuffle this back to the place it was introduced to reduce patch noise.

> 

> >  

> >  	rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl");

> >  	if (rc)

> > @@ -1139,11 +1254,17 @@ static __init int cxl_mem_init(void)

> >  		return rc;

> >  	}

> >  

> > +	cxl_debugfs = debugfs_create_dir("cxl", NULL);

> > +	mbox_debugfs = debugfs_create_dir("mbox", cxl_debugfs);

> > +	debugfs_create_bool("raw_allow_all", 0600, mbox_debugfs,

> > +			    &raw_allow_all);

> > +

> >  	return 0;

> >  }

> >  

> >  static __exit void cxl_mem_exit(void)

> >  {

> > +	debugfs_remove_recursive(cxl_debugfs);

> >  	pci_unregister_driver(&cxl_mem_driver);

> >  	unregister_chrdev_region(MKDEV(cxl_mem_major, 0), CXL_MEM_MAX_DEVS);

> >  }

> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h

> > index f1f7e9f32ea5..72d1eb601a5d 100644

> > --- a/include/uapi/linux/cxl_mem.h

> > +++ b/include/uapi/linux/cxl_mem.h

> > @@ -22,6 +22,7 @@

> >  #define CXL_CMDS                                                          \

> >  	___C(INVALID, "Invalid Command"),                                 \

> >  	___C(IDENTIFY, "Identify Command"),                               \

> > +	___C(RAW, "Raw device command"),                                  \

> >  	___C(MAX, "Last command")

> >  

> >  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a

> > @@ -112,6 +113,9 @@ struct cxl_mem_query_commands {

> >   * @id: The command to send to the memory device. This must be one of the

> >   *	commands returned by the query command.

> >   * @flags: Flags for the command (input).

> > + * @raw: Special fields for raw commands

> > + * @raw.opcode: Opcode passed to hardware when using the RAW command.

> > + * @raw.rsvd: Must be zero.

> >   * @rsvd: Must be zero.

> >   * @retval: Return value from the memory device (output).

> >   * @in.size: Size of the payload to provide to the device (input).

> > @@ -133,7 +137,13 @@ struct cxl_mem_query_commands {

> >  struct cxl_send_command {

> >  	__u32 id;

> >  	__u32 flags;

> > -	__u32 rsvd;

> > +	union {

> > +		struct {

> > +			__u16 opcode;

> > +			__u16 rsvd;

> > +		} raw;

> > +		__u32 rsvd;

> > +	};

> >  	__u32 retval;

> >  

> >  	struct {

>
Dan Williams Feb. 11, 2021, 4:43 p.m. UTC | #7
On Wed, Feb 10, 2021 at 7:27 AM <Ariel.Sibley@microchip.com> wrote:
>

> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig

> > index c4ba3aa0a05d..08eaa8e52083 100644

> > --- a/drivers/cxl/Kconfig

> > +++ b/drivers/cxl/Kconfig

> > @@ -33,6 +33,24 @@ config CXL_MEM

> >

> >           If unsure say 'm'.

> >

> > +config CXL_MEM_RAW_COMMANDS

> > +       bool "RAW Command Interface for Memory Devices"

> > +       depends on CXL_MEM

> > +       help

> > +         Enable CXL RAW command interface.

> > +

> > +         The CXL driver ioctl interface may assign a kernel ioctl command

> > +         number for each specification defined opcode. At any given point in

> > +         time the number of opcodes that the specification defines and a device

> > +         may implement may exceed the kernel's set of associated ioctl function

> > +         numbers. The mismatch is either by omission, specification is too new,

> > +         or by design. When prototyping new hardware, or developing /

> > debugging

> > +         the driver it is useful to be able to submit any possible command to

> > +         the hardware, even commands that may crash the kernel due to their

> > +         potential impact to memory currently in use by the kernel.

> > +

> > +         If developing CXL hardware or the driver say Y, otherwise say N.

>

> Blocking RAW commands by default will prevent vendors from developing user space tools that utilize vendor specific commands. Vendors of CXL.mem devices should take ownership of ensuring any vendor defined commands that could cause user data to be exposed or corrupted are disabled at the device level for shipping configurations.


What follows is my personal opinion as a Linux kernel developer, not
necessarily the opinion of my employer...

Aside from the convention that new functionality is always default N
it is the Linux distributor that decides the configuration. In an
environment where the kernel is developing features like
CONFIG_SECURITY_LOCKDOWN_LSM that limit the ability of the kernel to
subvert platform features like secure boot, it is incumbent upon
drivers to evaluate what they must do to protect platform integrity.
See the ongoing tightening of /dev/mem like interfaces for an example
of the shrinking ability of root to have unfettered access to all
platform/hardware capabilities.

CXL is unique in that it impacts "System RAM" resources and that it
interleaves multiple devices. Compare this to NVME where the blast
radius of misbehavior is contained to an endpoint and is behind an
IOMMU. The larger impact to me increases the responsibility of CXL
enabling to review system impacts and vendor specific functionality is
typically unreviewable.

There are 2 proposals I can see to improve the unreviewable problem.
First, of course, get commands into the standard proper. One strawman
proposal is to take the "Code First" process that seems to be working
well for the ACPI and UEFI working groups and apply it to CXL command
definitions. That vastly shortens the time between proposal and Linux
enabling. The second proposal is to define a mechanism for de-facto
standards to develop. That need I believe was the motivation for
"designated vendor-specific" in the first instance? I.e. to share
implementations across vendors pre-standardization.

So, allocate a public id for the command space, publish a public
specification, and then send kernel patches. This was the process for
accepting command sets outside of ACPI into the LIBNVDIMM subsystem.
See drivers/acpi/nfit/nfit.h for the reference to the public command
sets.
Jonathan Cameron Feb. 12, 2021, 1:40 p.m. UTC | #8
On Thu, 11 Feb 2021 08:01:48 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

> On 21-02-11 11:19:24, Jonathan Cameron wrote:

> > On Tue, 9 Feb 2021 16:02:56 -0800

> > Ben Widawsky <ben.widawsky@intel.com> wrote:

> >   

> > > The CXL memory device send interface will have a number of supported

> > > commands. The raw command is not such a command. Raw commands allow

> > > userspace to send a specified opcode to the underlying hardware and

> > > bypass all driver checks on the command. This is useful for a couple of

> > > usecases, mainly:

> > > 1. Undocumented vendor specific hardware commands  

> > 

> > This one I get.  There are things we'd love to standardize but often they

> > need proving in a generation of hardware before the data is available to

> > justify taking it to a standards body.  Stuff like performance stats.

> > This stuff will all sit in the vendor defined range.  Maybe there is an

> > argument for in driver hooks to allow proper support even for these

> > (Ben mentioned this in the other branch of the thread).

> >   

> > > 2. Prototyping new hardware commands not yet supported by the driver  

> > 

> > For 2, could just have a convenient place to enable this by one line patch.

> > Some subsystems (SPI comes to mind) do this for their equivalent of raw

> > commands.  The code is all there to enable it but you need to hook it

> > up if you want to use it.  Avoids chance of a distro shipping it.

> >   

> 

> I'm fine to drop #2 as a justification point, or maybe reword the commit message

> to say, "you could also just do... but since we have it for #1 already..."

> 

> > > 

> > > While this all sounds very powerful it comes with a couple of caveats:

> > > 1. Bug reports using raw commands will not get the same level of

> > >    attention as bug reports using supported commands (via taint).

> > > 2. Supported commands will be rejected by the RAW command.  

> > 

> > Perhaps I'm missing reading this point 2 (not sure the code actually does it!)

> > 

> > As stated what worries me as it means when we add support for a new

> > bit of the spec we just broke the userspace ABI.

> >   

> 

> It does not break ABI. The agreement is userspace must always use the QUERY

> command to find out what commands are supported. If it tries to use a RAW

> command that is a supported command, it will be rejected. In the case you

> mention, that's an application bug. If there is a way to document that better

> than what's already in the UAPI kdocs, I'm open to suggestions.

> 

> Unlike perhaps other UAPI, this one only promises to give you a way to determine

> what commands you can use, not the list of what commands you can use.


*crossed fingers* on this.  Users may have a different view when their application
just stops working.  It might print a nice error message telling them why
but it still doesn't work and that way lies grumpy Linus and reverts...

Mostly we'll get away with it because no one will notice, but it's unfortunately
still risky.   Personal preference is toplay safer and not allow direct userspace
access to commands in the spec (unless we've decided they will always be available
directly to userspace).  This includes anything in the ranges reserved for future
spec usage.

Jonathan



> 

> > > 

> > > With this comes new debugfs knob to allow full access to your toes with

> > > your weapon of choice.  

> > 

> > A few trivial things inline,

> > 

> > Jonathan

> >   

> > > 

> > > Cc: Ariel Sibley <Ariel.Sibley@microchip.com>

> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

> > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> > > ---

> > >  drivers/cxl/Kconfig          |  18 +++++

> > >  drivers/cxl/mem.c            | 125 ++++++++++++++++++++++++++++++++++-

> > >  include/uapi/linux/cxl_mem.h |  12 +++-

> > >  3 files changed, 152 insertions(+), 3 deletions(-)

> > > 

> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig

> > > index c4ba3aa0a05d..08eaa8e52083 100644

> > > --- a/drivers/cxl/Kconfig

> > > +++ b/drivers/cxl/Kconfig

> > > @@ -33,6 +33,24 @@ config CXL_MEM

> > >  

> > >  	  If unsure say 'm'.

> > >  

> > > +config CXL_MEM_RAW_COMMANDS

> > > +	bool "RAW Command Interface for Memory Devices"

> > > +	depends on CXL_MEM

> > > +	help

> > > +	  Enable CXL RAW command interface.

> > > +

> > > +	  The CXL driver ioctl interface may assign a kernel ioctl command

> > > +	  number for each specification defined opcode. At any given point in

> > > +	  time the number of opcodes that the specification defines and a device

> > > +	  may implement may exceed the kernel's set of associated ioctl function

> > > +	  numbers. The mismatch is either by omission, specification is too new,

> > > +	  or by design. When prototyping new hardware, or developing / debugging

> > > +	  the driver it is useful to be able to submit any possible command to

> > > +	  the hardware, even commands that may crash the kernel due to their

> > > +	  potential impact to memory currently in use by the kernel.

> > > +

> > > +	  If developing CXL hardware or the driver say Y, otherwise say N.

> > > +

> > >  config CXL_MEM_INSECURE_DEBUG

> > >  	bool "CXL.mem debugging"

> > >  	depends on CXL_MEM

> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c

> > > index ce65630bb75e..6d766a994dce 100644

> > > --- a/drivers/cxl/mem.c

> > > +++ b/drivers/cxl/mem.c

> > > @@ -1,6 +1,8 @@

> > >  // SPDX-License-Identifier: GPL-2.0-only

> > >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */

> > >  #include <uapi/linux/cxl_mem.h>

> > > +#include <linux/security.h>

> > > +#include <linux/debugfs.h>

> > >  #include <linux/module.h>

> > >  #include <linux/mutex.h>

> > >  #include <linux/cdev.h>

> > > @@ -41,7 +43,14 @@

> > >  

> > >  enum opcode {

> > >  	CXL_MBOX_OP_INVALID		= 0x0000,

> > > +	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,

> > > +	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,

> > >  	CXL_MBOX_OP_IDENTIFY		= 0x4000,

> > > +	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,

> > > +	CXL_MBOX_OP_SET_LSA		= 0x4103,

> > > +	CXL_MBOX_OP_SET_SHUTDOWN_STATE	= 0x4204,

> > > +	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,

> > > +	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,

> > >  	CXL_MBOX_OP_MAX			= 0x10000

> > >  };

> > >  

> > > @@ -91,6 +100,8 @@ struct cxl_memdev {

> > >  

> > >  static int cxl_mem_major;

> > >  static DEFINE_IDA(cxl_memdev_ida);

> > > +static struct dentry *cxl_debugfs;

> > > +static bool raw_allow_all;

> > >  

> > >  /**

> > >   * struct cxl_mem_command - Driver representation of a memory device command

> > > @@ -132,6 +143,49 @@ struct cxl_mem_command {

> > >   */

> > >  static struct cxl_mem_command mem_commands[] = {

> > >  	CXL_CMD(IDENTIFY, NONE, 0, 0x43),

> > > +#ifdef CONFIG_CXL_MEM_RAW_COMMANDS

> > > +	CXL_CMD(RAW, NONE, ~0, ~0),

> > > +#endif

> > > +};

> > > +

> > > +/*

> > > + * Commands that RAW doesn't permit. The rationale for each:

> > > + *

> > > + * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /

> > > + * coordination of transaction timeout values at the root bridge level.

> > > + *

> > > + * CXL_MBOX_OP_SET_PARTITION_INFO: The device memory map may change live

> > > + * and needs to be coordinated with HDM updates.

> > > + *

> > > + * CXL_MBOX_OP_SET_LSA: The label storage area may be cached by the

> > > + * driver and any writes from userspace invalidates those contents.

> > > + *

> > > + * CXL_MBOX_OP_SET_SHUTDOWN_STATE: Set shutdown state assumes no writes

> > > + * to the device after it is marked clean, userspace can not make that

> > > + * assertion.

> > > + *

> > > + * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that

> > > + * is kept up to date with patrol notifications and error management.

> > > + */

> > > +static u16 disabled_raw_commands[] = {

> > > +	CXL_MBOX_OP_ACTIVATE_FW,

> > > +	CXL_MBOX_OP_SET_PARTITION_INFO,

> > > +	CXL_MBOX_OP_SET_LSA,

> > > +	CXL_MBOX_OP_SET_SHUTDOWN_STATE,

> > > +	CXL_MBOX_OP_SCAN_MEDIA,

> > > +	CXL_MBOX_OP_GET_SCAN_MEDIA,

> > > +};

> > > +

> > > +/*

> > > + * Command sets that RAW doesn't permit. All opcodes in this set are

> > > + * disabled because they pass plain text security payloads over the

> > > + * user/kernel boundary. This functionality is intended to be wrapped

> > > + * behind the keys ABI which allows for encrypted payloads in the UAPI

> > > + */

> > > +static u8 security_command_sets[] = {

> > > +	0x44, /* Sanitize */

> > > +	0x45, /* Persistent Memory Data-at-rest Security */

> > > +	0x46, /* Security Passthrough */

> > >  };

> > >  

> > >  #define cxl_for_each_cmd(cmd)                                                  \

> > > @@ -162,6 +216,16 @@ static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)

> > >  	return 0;

> > >  }

> > >  

> > > +static bool is_security_command(u16 opcode)

> > > +{

> > > +	int i;

> > > +

> > > +	for (i = 0; i < ARRAY_SIZE(security_command_sets); i++)

> > > +		if (security_command_sets[i] == (opcode >> 8))

> > > +			return true;

> > > +	return false;

> > > +}

> > > +

> > >  static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

> > >  				 struct mbox_cmd *mbox_cmd)

> > >  {

> > > @@ -170,7 +234,8 @@ static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,

> > >  	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",

> > >  		mbox_cmd->opcode, mbox_cmd->size_in);

> > >  

> > > -	if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

> > > +	if (!is_security_command(mbox_cmd->opcode) ||

> > > +	    IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {

> > >  		print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,

> > >  				     mbox_cmd->payload_in, mbox_cmd->size_in,

> > >  				     true);

> > > @@ -434,6 +499,9 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

> > >  		cxl_command_names[cmd->info.id].name, mbox_cmd.opcode,

> > >  		cmd->info.size_in);

> > >  

> > > +	dev_WARN_ONCE(dev, cmd->info.id == CXL_MEM_COMMAND_ID_RAW,

> > > +		      "raw command path used\n");

> > > +

> > >  	rc = cxl_mem_mbox_send_cmd(cxlm, &mbox_cmd);

> > >  	cxl_mem_mbox_put(cxlm);

> > >  	if (rc)

> > > @@ -464,6 +532,29 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,

> > >  	return rc;

> > >  }

> > >  

> > > +static bool cxl_mem_raw_command_allowed(u16 opcode)

> > > +{

> > > +	int i;

> > > +

> > > +	if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))

> > > +		return false;

> > > +

> > > +	if (security_locked_down(LOCKDOWN_NONE))

> > > +		return false;

> > > +

> > > +	if (raw_allow_all)

> > > +		return true;

> > > +

> > > +	if (is_security_command(opcode))  

> > Given we are mixing generic calls like security_locked_down()

> > and local cxl specific ones like this one, prefix the

> > local versions.

> > 

> > cxl_is_security_command()

> > 

> > I'd also have a slight preference to do it for cxl_disabled_raw_commands

> > and cxl_raw_allow_all though they are less important as more obviously

> > local by not being function calls.

> >   

> > > +		return false;

> > > +

> > > +	for (i = 0; i < ARRAY_SIZE(disabled_raw_commands); i++)

> > > +		if (disabled_raw_commands[i] == opcode)

> > > +			return false;

> > > +

> > > +	return true;

> > > +}

> > > +

> > >  /**

> > >   * cxl_validate_cmd_from_user() - Check fields for CXL_MEM_SEND_COMMAND.

> > >   * @cxlm: &struct cxl_mem device whose mailbox will be used.

> > > @@ -500,6 +591,29 @@ static int cxl_validate_cmd_from_user(struct cxl_mem *cxlm,

> > >  	if (send_cmd->in.size > cxlm->payload_size)

> > >  		return -EINVAL;

> > >  

> > > +	/* Checks are bypassed for raw commands but along comes the taint! */

> > > +	if (send_cmd->id == CXL_MEM_COMMAND_ID_RAW) {

> > > +		const struct cxl_mem_command temp = {

> > > +			.info = {

> > > +				.id = CXL_MEM_COMMAND_ID_RAW,

> > > +				.flags = CXL_MEM_COMMAND_FLAG_NONE,

> > > +				.size_in = send_cmd->in.size,

> > > +				.size_out = send_cmd->out.size,

> > > +			},

> > > +			.opcode = send_cmd->raw.opcode

> > > +		};

> > > +

> > > +		if (send_cmd->raw.rsvd)

> > > +			return -EINVAL;

> > > +

> > > +		if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))

> > > +			return -EPERM;

> > > +

> > > +		memcpy(out_cmd, &temp, sizeof(temp));

> > > +

> > > +		return 0;

> > > +	}

> > > +

> > >  	if (send_cmd->flags & ~CXL_MEM_COMMAND_FLAG_MASK)

> > >  		return -EINVAL;

> > >  

> > > @@ -1123,8 +1237,9 @@ static struct pci_driver cxl_mem_driver = {

> > >  

> > >  static __init int cxl_mem_init(void)

> > >  {

> > > -	int rc;

> > > +	struct dentry *mbox_debugfs;

> > >  	dev_t devt;

> > > +	int rc;  

> > 

> > Shuffle this back to the place it was introduced to reduce patch noise.

> >   

> > >  

> > >  	rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl");

> > >  	if (rc)

> > > @@ -1139,11 +1254,17 @@ static __init int cxl_mem_init(void)

> > >  		return rc;

> > >  	}

> > >  

> > > +	cxl_debugfs = debugfs_create_dir("cxl", NULL);

> > > +	mbox_debugfs = debugfs_create_dir("mbox", cxl_debugfs);

> > > +	debugfs_create_bool("raw_allow_all", 0600, mbox_debugfs,

> > > +			    &raw_allow_all);

> > > +

> > >  	return 0;

> > >  }

> > >  

> > >  static __exit void cxl_mem_exit(void)

> > >  {

> > > +	debugfs_remove_recursive(cxl_debugfs);

> > >  	pci_unregister_driver(&cxl_mem_driver);

> > >  	unregister_chrdev_region(MKDEV(cxl_mem_major, 0), CXL_MEM_MAX_DEVS);

> > >  }

> > > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h

> > > index f1f7e9f32ea5..72d1eb601a5d 100644

> > > --- a/include/uapi/linux/cxl_mem.h

> > > +++ b/include/uapi/linux/cxl_mem.h

> > > @@ -22,6 +22,7 @@

> > >  #define CXL_CMDS                                                          \

> > >  	___C(INVALID, "Invalid Command"),                                 \

> > >  	___C(IDENTIFY, "Identify Command"),                               \

> > > +	___C(RAW, "Raw device command"),                                  \

> > >  	___C(MAX, "Last command")

> > >  

> > >  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a

> > > @@ -112,6 +113,9 @@ struct cxl_mem_query_commands {

> > >   * @id: The command to send to the memory device. This must be one of the

> > >   *	commands returned by the query command.

> > >   * @flags: Flags for the command (input).

> > > + * @raw: Special fields for raw commands

> > > + * @raw.opcode: Opcode passed to hardware when using the RAW command.

> > > + * @raw.rsvd: Must be zero.

> > >   * @rsvd: Must be zero.

> > >   * @retval: Return value from the memory device (output).

> > >   * @in.size: Size of the payload to provide to the device (input).

> > > @@ -133,7 +137,13 @@ struct cxl_mem_query_commands {

> > >  struct cxl_send_command {

> > >  	__u32 id;

> > >  	__u32 flags;

> > > -	__u32 rsvd;

> > > +	union {

> > > +		struct {

> > > +			__u16 opcode;

> > > +			__u16 rsvd;

> > > +		} raw;

> > > +		__u32 rsvd;

> > > +	};

> > >  	__u32 retval;

> > >  

> > >  	struct {  

> >
diff mbox series

Patch

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index c4ba3aa0a05d..08eaa8e52083 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -33,6 +33,24 @@  config CXL_MEM
 
 	  If unsure say 'm'.
 
+config CXL_MEM_RAW_COMMANDS
+	bool "RAW Command Interface for Memory Devices"
+	depends on CXL_MEM
+	help
+	  Enable CXL RAW command interface.
+
+	  The CXL driver ioctl interface may assign a kernel ioctl command
+	  number for each specification defined opcode. At any given point in
+	  time the number of opcodes that the specification defines and a device
+	  may implement may exceed the kernel's set of associated ioctl function
+	  numbers. The mismatch is either by omission, specification is too new,
+	  or by design. When prototyping new hardware, or developing / debugging
+	  the driver it is useful to be able to submit any possible command to
+	  the hardware, even commands that may crash the kernel due to their
+	  potential impact to memory currently in use by the kernel.
+
+	  If developing CXL hardware or the driver say Y, otherwise say N.
+
 config CXL_MEM_INSECURE_DEBUG
 	bool "CXL.mem debugging"
 	depends on CXL_MEM
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index ce65630bb75e..6d766a994dce 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1,6 +1,8 @@ 
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
 #include <uapi/linux/cxl_mem.h>
+#include <linux/security.h>
+#include <linux/debugfs.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/cdev.h>
@@ -41,7 +43,14 @@ 
 
 enum opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
+	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_IDENTIFY		= 0x4000,
+	CXL_MBOX_OP_SET_PARTITION_INFO	= 0x4101,
+	CXL_MBOX_OP_SET_LSA		= 0x4103,
+	CXL_MBOX_OP_SET_SHUTDOWN_STATE	= 0x4204,
+	CXL_MBOX_OP_SCAN_MEDIA		= 0x4304,
+	CXL_MBOX_OP_GET_SCAN_MEDIA	= 0x4305,
 	CXL_MBOX_OP_MAX			= 0x10000
 };
 
@@ -91,6 +100,8 @@  struct cxl_memdev {
 
 static int cxl_mem_major;
 static DEFINE_IDA(cxl_memdev_ida);
+static struct dentry *cxl_debugfs;
+static bool raw_allow_all;
 
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
@@ -132,6 +143,49 @@  struct cxl_mem_command {
  */
 static struct cxl_mem_command mem_commands[] = {
 	CXL_CMD(IDENTIFY, NONE, 0, 0x43),
+#ifdef CONFIG_CXL_MEM_RAW_COMMANDS
+	CXL_CMD(RAW, NONE, ~0, ~0),
+#endif
+};
+
+/*
+ * Commands that RAW doesn't permit. The rationale for each:
+ *
+ * CXL_MBOX_OP_ACTIVATE_FW: Firmware activation requires adjustment /
+ * coordination of transaction timeout values at the root bridge level.
+ *
+ * CXL_MBOX_OP_SET_PARTITION_INFO: The device memory map may change live
+ * and needs to be coordinated with HDM updates.
+ *
+ * CXL_MBOX_OP_SET_LSA: The label storage area may be cached by the
+ * driver and any writes from userspace invalidates those contents.
+ *
+ * CXL_MBOX_OP_SET_SHUTDOWN_STATE: Set shutdown state assumes no writes
+ * to the device after it is marked clean, userspace can not make that
+ * assertion.
+ *
+ * CXL_MBOX_OP_[GET_]SCAN_MEDIA: The kernel provides a native error list that
+ * is kept up to date with patrol notifications and error management.
+ */
+static u16 disabled_raw_commands[] = {
+	CXL_MBOX_OP_ACTIVATE_FW,
+	CXL_MBOX_OP_SET_PARTITION_INFO,
+	CXL_MBOX_OP_SET_LSA,
+	CXL_MBOX_OP_SET_SHUTDOWN_STATE,
+	CXL_MBOX_OP_SCAN_MEDIA,
+	CXL_MBOX_OP_GET_SCAN_MEDIA,
+};
+
+/*
+ * Command sets that RAW doesn't permit. All opcodes in this set are
+ * disabled because they pass plain text security payloads over the
+ * user/kernel boundary. This functionality is intended to be wrapped
+ * behind the keys ABI which allows for encrypted payloads in the UAPI
+ */
+static u8 security_command_sets[] = {
+	0x44, /* Sanitize */
+	0x45, /* Persistent Memory Data-at-rest Security */
+	0x46, /* Security Passthrough */
 };
 
 #define cxl_for_each_cmd(cmd)                                                  \
@@ -162,6 +216,16 @@  static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
 	return 0;
 }
 
+static bool is_security_command(u16 opcode)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(security_command_sets); i++)
+		if (security_command_sets[i] == (opcode >> 8))
+			return true;
+	return false;
+}
+
 static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
 				 struct mbox_cmd *mbox_cmd)
 {
@@ -170,7 +234,8 @@  static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
 	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",
 		mbox_cmd->opcode, mbox_cmd->size_in);
 
-	if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
+	if (!is_security_command(mbox_cmd->opcode) ||
+	    IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
 		print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,
 				     mbox_cmd->payload_in, mbox_cmd->size_in,
 				     true);
@@ -434,6 +499,9 @@  static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,
 		cxl_command_names[cmd->info.id].name, mbox_cmd.opcode,
 		cmd->info.size_in);
 
+	dev_WARN_ONCE(dev, cmd->info.id == CXL_MEM_COMMAND_ID_RAW,
+		      "raw command path used\n");
+
 	rc = cxl_mem_mbox_send_cmd(cxlm, &mbox_cmd);
 	cxl_mem_mbox_put(cxlm);
 	if (rc)
@@ -464,6 +532,29 @@  static int handle_mailbox_cmd_from_user(struct cxl_memdev *cxlmd,
 	return rc;
 }
 
+static bool cxl_mem_raw_command_allowed(u16 opcode)
+{
+	int i;
+
+	if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
+		return false;
+
+	if (security_locked_down(LOCKDOWN_NONE))
+		return false;
+
+	if (raw_allow_all)
+		return true;
+
+	if (is_security_command(opcode))
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(disabled_raw_commands); i++)
+		if (disabled_raw_commands[i] == opcode)
+			return false;
+
+	return true;
+}
+
 /**
  * cxl_validate_cmd_from_user() - Check fields for CXL_MEM_SEND_COMMAND.
  * @cxlm: &struct cxl_mem device whose mailbox will be used.
@@ -500,6 +591,29 @@  static int cxl_validate_cmd_from_user(struct cxl_mem *cxlm,
 	if (send_cmd->in.size > cxlm->payload_size)
 		return -EINVAL;
 
+	/* Checks are bypassed for raw commands but along comes the taint! */
+	if (send_cmd->id == CXL_MEM_COMMAND_ID_RAW) {
+		const struct cxl_mem_command temp = {
+			.info = {
+				.id = CXL_MEM_COMMAND_ID_RAW,
+				.flags = CXL_MEM_COMMAND_FLAG_NONE,
+				.size_in = send_cmd->in.size,
+				.size_out = send_cmd->out.size,
+			},
+			.opcode = send_cmd->raw.opcode
+		};
+
+		if (send_cmd->raw.rsvd)
+			return -EINVAL;
+
+		if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
+			return -EPERM;
+
+		memcpy(out_cmd, &temp, sizeof(temp));
+
+		return 0;
+	}
+
 	if (send_cmd->flags & ~CXL_MEM_COMMAND_FLAG_MASK)
 		return -EINVAL;
 
@@ -1123,8 +1237,9 @@  static struct pci_driver cxl_mem_driver = {
 
 static __init int cxl_mem_init(void)
 {
-	int rc;
+	struct dentry *mbox_debugfs;
 	dev_t devt;
+	int rc;
 
 	rc = alloc_chrdev_region(&devt, 0, CXL_MEM_MAX_DEVS, "cxl");
 	if (rc)
@@ -1139,11 +1254,17 @@  static __init int cxl_mem_init(void)
 		return rc;
 	}
 
+	cxl_debugfs = debugfs_create_dir("cxl", NULL);
+	mbox_debugfs = debugfs_create_dir("mbox", cxl_debugfs);
+	debugfs_create_bool("raw_allow_all", 0600, mbox_debugfs,
+			    &raw_allow_all);
+
 	return 0;
 }
 
 static __exit void cxl_mem_exit(void)
 {
+	debugfs_remove_recursive(cxl_debugfs);
 	pci_unregister_driver(&cxl_mem_driver);
 	unregister_chrdev_region(MKDEV(cxl_mem_major, 0), CXL_MEM_MAX_DEVS);
 }
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index f1f7e9f32ea5..72d1eb601a5d 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -22,6 +22,7 @@ 
 #define CXL_CMDS                                                          \
 	___C(INVALID, "Invalid Command"),                                 \
 	___C(IDENTIFY, "Identify Command"),                               \
+	___C(RAW, "Raw device command"),                                  \
 	___C(MAX, "Last command")
 
 #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
@@ -112,6 +113,9 @@  struct cxl_mem_query_commands {
  * @id: The command to send to the memory device. This must be one of the
  *	commands returned by the query command.
  * @flags: Flags for the command (input).
+ * @raw: Special fields for raw commands
+ * @raw.opcode: Opcode passed to hardware when using the RAW command.
+ * @raw.rsvd: Must be zero.
  * @rsvd: Must be zero.
  * @retval: Return value from the memory device (output).
  * @in.size: Size of the payload to provide to the device (input).
@@ -133,7 +137,13 @@  struct cxl_mem_query_commands {
 struct cxl_send_command {
 	__u32 id;
 	__u32 flags;
-	__u32 rsvd;
+	union {
+		struct {
+			__u16 opcode;
+			__u16 rsvd;
+		} raw;
+		__u32 rsvd;
+	};
 	__u32 retval;
 
 	struct {