mbox series

[v2,00/13] riscv: Add support for xtheadvector

Message ID 20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.com
Headers show
Series riscv: Add support for xtheadvector | expand

Message

Charlie Jenkins June 10, 2024, 10:56 p.m. UTC
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.

vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.

There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.

Support for xtheadvector is also added to the vector kselftests.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d335e03d3134b14133f/xtheadvector.adoc

---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.

The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].

The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]

I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.

[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9a0b48
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v6-0-cb7624e65d82@rivosinc.com/
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0-9a43f1fdcbb9@rivosinc.com/

---
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
  restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
  thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.com

---
Charlie Jenkins (12):
      dt-bindings: riscv: Add xtheadvector ISA extension description
      dt-bindings: cpus: add a thead vlen register length property
      riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
      riscv: Add thead and xtheadvector as a vendor extension
      riscv: vector: Use vlenb from DT for thead
      riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
      riscv: Add xtheadvector instruction definitions
      riscv: vector: Support xtheadvector save/restore
      riscv: hwprobe: Add thead vendor extension probing
      riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
      selftests: riscv: Fix vector tests
      selftests: riscv: Support xtheadvector in vector tests

Heiko Stuebner (1):
      RISC-V: define the elements of the VCSR vector CSR

 Documentation/arch/riscv/hwprobe.rst               |  10 +
 Documentation/devicetree/bindings/riscv/cpus.yaml  |  19 ++
 .../devicetree/bindings/riscv/extensions.yaml      |  10 +
 arch/riscv/Kconfig.vendor                          |  26 ++
 arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi      |   3 +-
 arch/riscv/include/asm/cpufeature.h                |   2 +
 arch/riscv/include/asm/csr.h                       |  13 +
 arch/riscv/include/asm/hwprobe.h                   |   4 +-
 arch/riscv/include/asm/switch_to.h                 |   2 +-
 arch/riscv/include/asm/vector.h                    | 249 +++++++++++++----
 arch/riscv/include/asm/vendor_extensions/thead.h   |  42 +++
 .../include/asm/vendor_extensions/thead_hwprobe.h  |  18 ++
 .../include/asm/vendor_extensions/vendor_hwprobe.h |  37 +++
 arch/riscv/include/uapi/asm/hwprobe.h              |   3 +-
 arch/riscv/include/uapi/asm/vendor/thead.h         |   3 +
 arch/riscv/kernel/cpufeature.c                     |  51 +++-
 arch/riscv/kernel/kernel_mode_vector.c             |   8 +-
 arch/riscv/kernel/process.c                        |   4 +-
 arch/riscv/kernel/signal.c                         |   6 +-
 arch/riscv/kernel/sys_hwprobe.c                    |   5 +
 arch/riscv/kernel/vector.c                         |  25 +-
 arch/riscv/kernel/vendor_extensions.c              |  10 +
 arch/riscv/kernel/vendor_extensions/Makefile       |   2 +
 arch/riscv/kernel/vendor_extensions/thead.c        |  18 ++
 .../riscv/kernel/vendor_extensions/thead_hwprobe.c |  19 ++
 tools/testing/selftests/riscv/vector/.gitignore    |   3 +-
 tools/testing/selftests/riscv/vector/Makefile      |  17 +-
 .../selftests/riscv/vector/v_exec_initval_nolibc.c |  93 +++++++
 tools/testing/selftests/riscv/vector/v_helpers.c   |  67 +++++
 tools/testing/selftests/riscv/vector/v_helpers.h   |   7 +
 tools/testing/selftests/riscv/vector/v_initval.c   |  22 ++
 .../selftests/riscv/vector/v_initval_nolibc.c      |  68 -----
 .../selftests/riscv/vector/vstate_exec_nolibc.c    |  20 +-
 .../testing/selftests/riscv/vector/vstate_prctl.c  | 295 ++++++++++++---------
 34 files changed, 910 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423

Comments

Evan Green June 11, 2024, 3:58 p.m. UTC | #1
On Mon, Jun 10, 2024 at 3:57 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Document support for thead vendor extensions using the key
> RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 and xtheadvector extension using
> the key RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> Reviewed-by: Evan Green <evan@rivosinc.com>
> ---
>  Documentation/arch/riscv/hwprobe.rst | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> index 204cd4433af5..9c0ef8c57228 100644
> --- a/Documentation/arch/riscv/hwprobe.rst
> +++ b/Documentation/arch/riscv/hwprobe.rst
> @@ -214,3 +214,13 @@ The following keys are defined:
>
>  * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
>    represents the size of the Zicboz block in bytes.
> +
> +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the

Our recent snafoo with CPUPERF_0 popped into my memory
when reading this. Does this work properly with the WHICH_CPUS flag?
Specifically, we need hwprobe_key_is_bitmask() to return true for this
key since it's a bitmask.
Charlie Jenkins June 11, 2024, 7:47 p.m. UTC | #2
On Tue, Jun 11, 2024 at 08:58:37AM -0700, Evan Green wrote:
> On Mon, Jun 10, 2024 at 3:57 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Document support for thead vendor extensions using the key
> > RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 and xtheadvector extension using
> > the key RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > Reviewed-by: Evan Green <evan@rivosinc.com>
> > ---
> >  Documentation/arch/riscv/hwprobe.rst | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> > index 204cd4433af5..9c0ef8c57228 100644
> > --- a/Documentation/arch/riscv/hwprobe.rst
> > +++ b/Documentation/arch/riscv/hwprobe.rst
> > @@ -214,3 +214,13 @@ The following keys are defined:
> >
> >  * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> >    represents the size of the Zicboz block in bytes.
> > +
> > +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the
> 
> Our recent snafoo with CPUPERF_0 popped into my memory
> when reading this. Does this work properly with the WHICH_CPUS flag?
> Specifically, we need hwprobe_key_is_bitmask() to return true for this
> key since it's a bitmask.

Hmm yes I need to add that. Thank you.

- Charlie
Conor Dooley June 13, 2024, 2:34 p.m. UTC | #3
On Mon, Jun 10, 2024 at 03:56:39PM -0700, Charlie Jenkins wrote:
> Add a property analogous to the vlenb CSR so that software can detect
> the vector length of each CPU prior to it being brought online.
> Currently software has to assume that the vector length read from the
> boot CPU applies to all possible CPUs. On T-Head CPUs implementing
> pre-ratification vector, reading the th.vlenb CSR may produce an illegal
> instruction trap, so this property is required on such systems.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Conor Dooley June 13, 2024, 2:36 p.m. UTC | #4
On Mon, Jun 10, 2024 at 03:56:40PM -0700, Charlie Jenkins wrote:
> The D1/D1s SoCs support xtheadvector so it can be included in the
> devicetree. Also include vlenb for the cpu.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> ---
>  arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi b/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
> index 64c3c2e6cbe0..6367112e614a 100644
> --- a/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
> +++ b/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
> @@ -27,7 +27,8 @@ cpu0: cpu@0 {
>  			riscv,isa = "rv64imafdc";
>  			riscv,isa-base = "rv64i";
>  			riscv,isa-extensions = "i", "m", "a", "f", "d", "c", "zicntr", "zicsr",
> -					       "zifencei", "zihpm";
> +					       "zifencei", "zihpm", "xtheadvector";
> +			thead,vlenb = <128>;
>  			#cooling-cells = <2>;
>  
>  			cpu0_intc: interrupt-controller {
> 
> -- 
> 2.44.0
>
Conor Dooley June 13, 2024, 2:39 p.m. UTC | #5
Andy,

On Mon, Jun 10, 2024 at 03:56:46PM -0700, Charlie Jenkins wrote:
> Use alternatives to add support for xtheadvector vector save/restore
> routines.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

Could you review this please?

Cheers,
Conor.

> ---
>  arch/riscv/include/asm/csr.h           |   6 +
>  arch/riscv/include/asm/switch_to.h     |   2 +-
>  arch/riscv/include/asm/vector.h        | 249 ++++++++++++++++++++++++++-------
>  arch/riscv/kernel/cpufeature.c         |   2 +-
>  arch/riscv/kernel/kernel_mode_vector.c |   8 +-
>  arch/riscv/kernel/process.c            |   4 +-
>  arch/riscv/kernel/signal.c             |   6 +-
>  arch/riscv/kernel/vector.c             |  13 +-
>  8 files changed, 222 insertions(+), 68 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 9086639a3dde..407d4a5687f5 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -30,6 +30,12 @@
>  #define SR_VS_CLEAN	_AC(0x00000400, UL)
>  #define SR_VS_DIRTY	_AC(0x00000600, UL)
>  
> +#define SR_VS_THEAD		_AC(0x01800000, UL) /* xtheadvector Status */
> +#define SR_VS_OFF_THEAD		_AC(0x00000000, UL)
> +#define SR_VS_INITIAL_THEAD	_AC(0x00800000, UL)
> +#define SR_VS_CLEAN_THEAD	_AC(0x01000000, UL)
> +#define SR_VS_DIRTY_THEAD	_AC(0x01800000, UL)
> +
>  #define SR_XS		_AC(0x00018000, UL) /* Extension Status */
>  #define SR_XS_OFF	_AC(0x00000000, UL)
>  #define SR_XS_INITIAL	_AC(0x00008000, UL)
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 7594df37cc9f..f9cbebe372b8 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -99,7 +99,7 @@ do {							\
>  	__set_prev_cpu(__prev->thread);			\
>  	if (has_fpu())					\
>  		__switch_to_fpu(__prev, __next);	\
> -	if (has_vector())					\
> +	if (has_vector() || has_xtheadvector())		\
>  		__switch_to_vector(__prev, __next);	\
>  	if (switch_to_should_flush_icache(__next))	\
>  		local_flush_icache_all();		\
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 731dcd0ed4de..6294dcaabc6a 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -18,6 +18,27 @@
>  #include <asm/cpufeature.h>
>  #include <asm/csr.h>
>  #include <asm/asm.h>
> +#include <asm/vendorid_list.h>
> +#include <asm/vendor_extensions.h>
> +#include <asm/vendor_extensions/thead.h>
> +
> +#define __riscv_v_vstate_or(_val, TYPE) ({				\
> +	typeof(_val) _res = _val;					\
> +	if (has_xtheadvector()) \
> +		_res = (_res & ~SR_VS_THEAD) | SR_VS_##TYPE##_THEAD;	\
> +	else								\
> +		_res = (_res & ~SR_VS) | SR_VS_##TYPE;			\
> +	_res;								\
> +})
> +
> +#define __riscv_v_vstate_check(_val, TYPE) ({				\
> +	bool _res;							\
> +	if (has_xtheadvector()) \
> +		_res = ((_val) & SR_VS_THEAD) == SR_VS_##TYPE##_THEAD;	\
> +	else								\
> +		_res = ((_val) & SR_VS) == SR_VS_##TYPE;		\
> +	_res;								\
> +})
>  
>  extern unsigned long riscv_v_vsize;
>  int riscv_v_setup_vsize(void);
> @@ -40,39 +61,62 @@ static __always_inline bool has_vector(void)
>  	return riscv_has_extension_unlikely(RISCV_ISA_EXT_v);
>  }
>  
> +static __always_inline bool has_xtheadvector_no_alternatives(void)
> +{
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_XTHEADVECTOR))
> +		return riscv_isa_vendor_extension_available(THEAD_VENDOR_ID, XTHEADVECTOR);
> +	else
> +		return false;
> +}
> +
> +static __always_inline bool has_xtheadvector(void)
> +{
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_XTHEADVECTOR))
> +		return riscv_has_vendor_extension_unlikely(THEAD_VENDOR_ID,
> +							   RISCV_ISA_VENDOR_EXT_XTHEADVECTOR);
> +	else
> +		return false;
> +}
> +
>  static inline void __riscv_v_vstate_clean(struct pt_regs *regs)
>  {
> -	regs->status = (regs->status & ~SR_VS) | SR_VS_CLEAN;
> +	regs->status = __riscv_v_vstate_or(regs->status, CLEAN);
>  }
>  
>  static inline void __riscv_v_vstate_dirty(struct pt_regs *regs)
>  {
> -	regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY;
> +	regs->status = __riscv_v_vstate_or(regs->status, DIRTY);
>  }
>  
>  static inline void riscv_v_vstate_off(struct pt_regs *regs)
>  {
> -	regs->status = (regs->status & ~SR_VS) | SR_VS_OFF;
> +	regs->status = __riscv_v_vstate_or(regs->status, OFF);
>  }
>  
>  static inline void riscv_v_vstate_on(struct pt_regs *regs)
>  {
> -	regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL;
> +	regs->status = __riscv_v_vstate_or(regs->status, INITIAL);
>  }
>  
>  static inline bool riscv_v_vstate_query(struct pt_regs *regs)
>  {
> -	return (regs->status & SR_VS) != 0;
> +	return !__riscv_v_vstate_check(regs->status, OFF);
>  }
>  
>  static __always_inline void riscv_v_enable(void)
>  {
> -	csr_set(CSR_SSTATUS, SR_VS);
> +	if (has_xtheadvector())
> +		csr_set(CSR_SSTATUS, SR_VS_THEAD);
> +	else
> +		csr_set(CSR_SSTATUS, SR_VS);
>  }
>  
>  static __always_inline void riscv_v_disable(void)
>  {
> -	csr_clear(CSR_SSTATUS, SR_VS);
> +	if (has_xtheadvector())
> +		csr_clear(CSR_SSTATUS, SR_VS_THEAD);
> +	else
> +		csr_clear(CSR_SSTATUS, SR_VS);
>  }
>  
>  static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
> @@ -81,10 +125,49 @@ static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
>  		"csrr	%0, " __stringify(CSR_VSTART) "\n\t"
>  		"csrr	%1, " __stringify(CSR_VTYPE) "\n\t"
>  		"csrr	%2, " __stringify(CSR_VL) "\n\t"
> -		"csrr	%3, " __stringify(CSR_VCSR) "\n\t"
> -		"csrr	%4, " __stringify(CSR_VLENB) "\n\t"
>  		: "=r" (dest->vstart), "=r" (dest->vtype), "=r" (dest->vl),
> -		  "=r" (dest->vcsr), "=r" (dest->vlenb) : :);
> +		"=r" (dest->vcsr) : :);
> +
> +	if (has_xtheadvector()) {
> +		u32 tmp_vcsr;
> +		bool restore_fpu = false;
> +		unsigned long status = csr_read(CSR_SSTATUS);
> +
> +		/*
> +		 * CSR_VCSR is defined as
> +		 * [2:1] - vxrm[1:0]
> +		 * [0] - vxsat
> +		 * The earlier vector spec implemented by T-Head uses separate
> +		 * registers for the same bit-elements, so just combine those
> +		 * into the existing output field.
> +		 *
> +		 * Additionally T-Head cores need FS to be enabled when accessing
> +		 * the VXRM and VXSAT CSRs, otherwise ending in illegal instructions.
> +		 * Though the cores do not implement the VXRM and VXSAT fields in the
> +		 * FCSR CSR that vector-0.7.1 specifies.
> +		 */
> +		if ((status & SR_FS) == SR_FS_OFF) {
> +			csr_set(CSR_SSTATUS, (status & ~SR_FS) | SR_FS_CLEAN);
> +			restore_fpu = true;
> +		}
> +
> +		asm volatile (
> +			"csrr	%[tmp_vcsr], " __stringify(VCSR_VXRM) "\n\t"
> +			"slliw	%[vcsr], %[tmp_vcsr], " __stringify(VCSR_VXRM_SHIFT) "\n\t"
> +			"csrr	%[tmp_vcsr], " __stringify(VCSR_VXSAT) "\n\t"
> +			"or	%[vcsr], %[vcsr], %[tmp_vcsr]\n\t"
> +			: [vcsr] "=r" (dest->vcsr), [tmp_vcsr] "=&r" (tmp_vcsr));
> +
> +		dest->vlenb = riscv_v_vsize / 32;
> +
> +		if (restore_fpu)
> +			csr_set(CSR_SSTATUS, status);
> +	} else {
> +		asm volatile (
> +			"csrr	%[vcsr], " __stringify(CSR_VCSR) "\n\t"
> +			"csrr	%[vlenb], " __stringify(CSR_VLENB) "\n\t"
> +			: [vcsr] "=r" (dest->vcsr), [vlenb] "=r" (dest->vlenb));
> +	}
>  }
>  
>  static __always_inline void __vstate_csr_restore(struct __riscv_v_ext_state *src)
> @@ -95,9 +178,37 @@ static __always_inline void __vstate_csr_restore(struct __riscv_v_ext_state *src
>  		"vsetvl	 x0, %2, %1\n\t"
>  		".option pop\n\t"
>  		"csrw	" __stringify(CSR_VSTART) ", %0\n\t"
> -		"csrw	" __stringify(CSR_VCSR) ", %3\n\t"
> -		: : "r" (src->vstart), "r" (src->vtype), "r" (src->vl),
> -		    "r" (src->vcsr) :);
> +		: : "r" (src->vstart), "r" (src->vtype), "r" (src->vl));
> +
> +	if (has_xtheadvector()) {
> +		u32 tmp_vcsr;
> +		bool restore_fpu = false;
> +		unsigned long status = csr_read(CSR_SSTATUS);
> +
> +		/*
> +		 * Similar to __vstate_csr_save above, restore values for the
> +		 * separate VXRM and VXSAT CSRs from the vcsr variable.
> +		 */
> +		if ((status & SR_FS) == SR_FS_OFF) {
> +			csr_set(CSR_SSTATUS, (status & ~SR_FS) | SR_FS_CLEAN);
> +			restore_fpu = true;
> +		}
> +
> +		asm volatile (
> +			"srliw	%[tmp_vcsr], %[vcsr], " __stringify(VCSR_VXRM_SHIFT) "\n\t"
> +			"andi	%[tmp_vcsr], %[tmp_vcsr], " __stringify(VCSR_VXRM_MASK) "\n\t"
> +			"csrw	" __stringify(VCSR_VXRM) ", %[tmp_vcsr]\n\t"
> +			"andi	%[tmp_vcsr], %[vcsr], " __stringify(VCSR_VXSAT_MASK) "\n\t"
> +			"csrw	" __stringify(VCSR_VXSAT) ", %[tmp_vcsr]\n\t"
> +			: [tmp_vcsr] "=&r" (tmp_vcsr) : [vcsr] "r" (src->vcsr));
> +
> +		if (restore_fpu)
> +			csr_set(CSR_SSTATUS, status);
> +	} else {
> +		asm volatile (
> +			"csrw	" __stringify(CSR_VCSR) ", %[vcsr]\n\t"
> +			: : [vcsr] "r" (src->vcsr));
> +	}
>  }
>  
>  static inline void __riscv_v_vstate_save(struct __riscv_v_ext_state *save_to,
> @@ -107,19 +218,33 @@ static inline void __riscv_v_vstate_save(struct __riscv_v_ext_state *save_to,
>  
>  	riscv_v_enable();
>  	__vstate_csr_save(save_to);
> -	asm volatile (
> -		".option push\n\t"
> -		".option arch, +v\n\t"
> -		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> -		"vse8.v		v0, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vse8.v		v8, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vse8.v		v16, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vse8.v		v24, (%1)\n\t"
> -		".option pop\n\t"
> -		: "=&r" (vl) : "r" (datap) : "memory");
> +	if (has_xtheadvector()) {
> +		asm volatile (
> +			"mv t0, %0\n\t"
> +			THEAD_VSETVLI_T4X0E8M8D1
> +			THEAD_VSB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VSB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VSB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VSB_V_V0T0
> +			: : "r" (datap) : "memory", "t0", "t4");
> +	} else {
> +		asm volatile (
> +			".option push\n\t"
> +			".option arch, +v\n\t"
> +			"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> +			"vse8.v		v0, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vse8.v		v8, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vse8.v		v16, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vse8.v		v24, (%1)\n\t"
> +			".option pop\n\t"
> +			: "=&r" (vl) : "r" (datap) : "memory");
> +	}
>  	riscv_v_disable();
>  }
>  
> @@ -129,55 +254,77 @@ static inline void __riscv_v_vstate_restore(struct __riscv_v_ext_state *restore_
>  	unsigned long vl;
>  
>  	riscv_v_enable();
> -	asm volatile (
> -		".option push\n\t"
> -		".option arch, +v\n\t"
> -		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> -		"vle8.v		v0, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vle8.v		v8, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vle8.v		v16, (%1)\n\t"
> -		"add		%1, %1, %0\n\t"
> -		"vle8.v		v24, (%1)\n\t"
> -		".option pop\n\t"
> -		: "=&r" (vl) : "r" (datap) : "memory");
> +	if (has_xtheadvector()) {
> +		asm volatile (
> +			"mv t0, %0\n\t"
> +			THEAD_VSETVLI_T4X0E8M8D1
> +			THEAD_VLB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VLB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VLB_V_V0T0
> +			"add		t0, t0, t4\n\t"
> +			THEAD_VLB_V_V0T0
> +			: : "r" (datap) : "memory", "t0", "t4");
> +	} else {
> +		asm volatile (
> +			".option push\n\t"
> +			".option arch, +v\n\t"
> +			"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> +			"vle8.v		v0, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vle8.v		v8, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vle8.v		v16, (%1)\n\t"
> +			"add		%1, %1, %0\n\t"
> +			"vle8.v		v24, (%1)\n\t"
> +			".option pop\n\t"
> +			: "=&r" (vl) : "r" (datap) : "memory");
> +	}
>  	__vstate_csr_restore(restore_from);
>  	riscv_v_disable();
>  }
>  
>  static inline void __riscv_v_vstate_discard(void)
>  {
> -	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
> +	unsigned long vtype_inval = 1UL << (BITS_PER_LONG - 1);
>  
>  	riscv_v_enable();
> +	if (has_xtheadvector())
> +		asm volatile (THEAD_VSETVLI_X0X0E8M8D1);
> +	else
> +		asm volatile (
> +			".option push\n\t"
> +			".option arch, +v\n\t"
> +			"vsetvli	x0, x0, e8, m8, ta, ma\n\t"
> +			".option pop\n\t");
> +
>  	asm volatile (
>  		".option push\n\t"
>  		".option arch, +v\n\t"
> -		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
>  		"vmv.v.i	v0, -1\n\t"
>  		"vmv.v.i	v8, -1\n\t"
>  		"vmv.v.i	v16, -1\n\t"
>  		"vmv.v.i	v24, -1\n\t"
> -		"vsetvl		%0, x0, %1\n\t"
> +		"vsetvl		x0, x0, %0\n\t"
>  		".option pop\n\t"
> -		: "=&r" (vl) : "r" (vtype_inval) : "memory");
> +		: : "r" (vtype_inval));
> +
>  	riscv_v_disable();
>  }
>  
>  static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>  {
> -	if ((regs->status & SR_VS) == SR_VS_OFF)
> -		return;
> -
> -	__riscv_v_vstate_discard();
> -	__riscv_v_vstate_dirty(regs);
> +	if (riscv_v_vstate_query(regs)) {
> +		__riscv_v_vstate_discard();
> +		__riscv_v_vstate_dirty(regs);
> +	}
>  }
>  
>  static inline void riscv_v_vstate_save(struct __riscv_v_ext_state *vstate,
>  				       struct pt_regs *regs)
>  {
> -	if ((regs->status & SR_VS) == SR_VS_DIRTY) {
> +	if (__riscv_v_vstate_check(regs->status, DIRTY)) {
>  		__riscv_v_vstate_save(vstate, vstate->datap);
>  		__riscv_v_vstate_clean(regs);
>  	}
> @@ -186,7 +333,7 @@ static inline void riscv_v_vstate_save(struct __riscv_v_ext_state *vstate,
>  static inline void riscv_v_vstate_restore(struct __riscv_v_ext_state *vstate,
>  					  struct pt_regs *regs)
>  {
> -	if ((regs->status & SR_VS) != SR_VS_OFF) {
> +	if (riscv_v_vstate_query(regs)) {
>  		__riscv_v_vstate_restore(vstate, vstate->datap);
>  		__riscv_v_vstate_clean(regs);
>  	}
> @@ -195,7 +342,7 @@ static inline void riscv_v_vstate_restore(struct __riscv_v_ext_state *vstate,
>  static inline void riscv_v_vstate_set_restore(struct task_struct *task,
>  					      struct pt_regs *regs)
>  {
> -	if ((regs->status & SR_VS) != SR_VS_OFF) {
> +	if (riscv_v_vstate_query(regs)) {
>  		set_tsk_thread_flag(task, TIF_RISCV_V_DEFER_RESTORE);
>  		riscv_v_vstate_on(regs);
>  	}
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 077be4ab1f9a..180f7eae9086 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -789,7 +789,7 @@ void __init riscv_fill_hwcap(void)
>  		elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
>  	}
>  
> -	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> +	if (elf_hwcap & COMPAT_HWCAP_ISA_V || has_xtheadvector_no_alternatives()) {
>  		riscv_v_setup_vsize();
>  		/*
>  		 * ISA string in device tree might have 'v' flag, but
> diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
> index 6afe80c7f03a..99972a48e86b 100644
> --- a/arch/riscv/kernel/kernel_mode_vector.c
> +++ b/arch/riscv/kernel/kernel_mode_vector.c
> @@ -143,7 +143,7 @@ static int riscv_v_start_kernel_context(bool *is_nested)
>  
>  	/* Transfer the ownership of V from user to kernel, then save */
>  	riscv_v_start(RISCV_PREEMPT_V | RISCV_PREEMPT_V_DIRTY);
> -	if ((task_pt_regs(current)->status & SR_VS) == SR_VS_DIRTY) {
> +	if (__riscv_v_vstate_check(task_pt_regs(current)->status, DIRTY)) {
>  		uvstate = &current->thread.vstate;
>  		__riscv_v_vstate_save(uvstate, uvstate->datap);
>  	}
> @@ -160,7 +160,7 @@ asmlinkage void riscv_v_context_nesting_start(struct pt_regs *regs)
>  		return;
>  
>  	depth = riscv_v_ctx_get_depth();
> -	if (depth == 0 && (regs->status & SR_VS) == SR_VS_DIRTY)
> +	if (depth == 0 && __riscv_v_vstate_check(regs->status, DIRTY))
>  		riscv_preempt_v_set_dirty();
>  
>  	riscv_v_ctx_depth_inc();
> @@ -208,7 +208,7 @@ void kernel_vector_begin(void)
>  {
>  	bool nested = false;
>  
> -	if (WARN_ON(!has_vector()))
> +	if (WARN_ON(!(has_vector() || has_xtheadvector())))
>  		return;
>  
>  	BUG_ON(!may_use_simd());
> @@ -236,7 +236,7 @@ EXPORT_SYMBOL_GPL(kernel_vector_begin);
>   */
>  void kernel_vector_end(void)
>  {
> -	if (WARN_ON(!has_vector()))
> +	if (WARN_ON(!(has_vector() || has_xtheadvector())))
>  		return;
>  
>  	riscv_v_disable();
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index e4bc61c4e58a..191023decd16 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -176,7 +176,7 @@ void flush_thread(void)
>  void arch_release_task_struct(struct task_struct *tsk)
>  {
>  	/* Free the vector context of datap. */
> -	if (has_vector())
> +	if (has_vector() || has_xtheadvector())
>  		riscv_v_thread_free(tsk);
>  }
>  
> @@ -222,7 +222,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		p->thread.s[0] = 0;
>  	}
>  	p->thread.riscv_v_flags = 0;
> -	if (has_vector())
> +	if (has_vector() || has_xtheadvector())
>  		riscv_v_thread_alloc(p);
>  	p->thread.ra = (unsigned long)ret_from_fork;
>  	p->thread.sp = (unsigned long)childregs; /* kernel sp */
> diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> index 5a2edd7f027e..1d5e4b3ca9e1 100644
> --- a/arch/riscv/kernel/signal.c
> +++ b/arch/riscv/kernel/signal.c
> @@ -189,7 +189,7 @@ static long restore_sigcontext(struct pt_regs *regs,
>  
>  			return 0;
>  		case RISCV_V_MAGIC:
> -			if (!has_vector() || !riscv_v_vstate_query(regs) ||
> +			if (!(has_vector() || has_xtheadvector()) || !riscv_v_vstate_query(regs) ||
>  			    size != riscv_v_sc_size)
>  				return -EINVAL;
>  
> @@ -211,7 +211,7 @@ static size_t get_rt_frame_size(bool cal_all)
>  
>  	frame_size = sizeof(*frame);
>  
> -	if (has_vector()) {
> +	if (has_vector() || has_xtheadvector()) {
>  		if (cal_all || riscv_v_vstate_query(task_pt_regs(current)))
>  			total_context_size += riscv_v_sc_size;
>  	}
> @@ -284,7 +284,7 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
>  	if (has_fpu())
>  		err |= save_fp_state(regs, &sc->sc_fpregs);
>  	/* Save the vector state. */
> -	if (has_vector() && riscv_v_vstate_query(regs))
> +	if ((has_vector() || has_xtheadvector()) && riscv_v_vstate_query(regs))
>  		err |= save_v_state(regs, (void __user **)&sc_ext_ptr);
>  	/* Write zero to fp-reserved space and check it on restore_sigcontext */
>  	err |= __put_user(0, &sc->sc_extdesc.reserved);
> diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
> index 3ba2f2432483..83126995f61a 100644
> --- a/arch/riscv/kernel/vector.c
> +++ b/arch/riscv/kernel/vector.c
> @@ -63,7 +63,7 @@ int riscv_v_setup_vsize(void)
>  
>  void __init riscv_v_setup_ctx_cache(void)
>  {
> -	if (!has_vector())
> +	if (!(has_vector() || has_xtheadvector()))
>  		return;
>  
>  	riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
> @@ -184,7 +184,8 @@ bool riscv_v_first_use_handler(struct pt_regs *regs)
>  	u32 insn = (u32)regs->badaddr;
>  
>  	/* Do not handle if V is not supported, or disabled */
> -	if (!(ELF_HWCAP & COMPAT_HWCAP_ISA_V))
> +	if (!(ELF_HWCAP & COMPAT_HWCAP_ISA_V) &&
> +	    !(has_xtheadvector() && riscv_v_vstate_ctrl_user_allowed()))
>  		return false;
>  
>  	/* If V has been enabled then it is not the first-use trap */
> @@ -223,7 +224,7 @@ void riscv_v_vstate_ctrl_init(struct task_struct *tsk)
>  	bool inherit;
>  	int cur, next;
>  
> -	if (!has_vector())
> +	if (!(has_vector() || has_xtheadvector()))
>  		return;
>  
>  	next = riscv_v_ctrl_get_next(tsk);
> @@ -245,7 +246,7 @@ void riscv_v_vstate_ctrl_init(struct task_struct *tsk)
>  
>  long riscv_v_vstate_ctrl_get_current(void)
>  {
> -	if (!has_vector())
> +	if (!(has_vector() || has_xtheadvector()))
>  		return -EINVAL;
>  
>  	return current->thread.vstate_ctrl & PR_RISCV_V_VSTATE_CTRL_MASK;
> @@ -256,7 +257,7 @@ long riscv_v_vstate_ctrl_set_current(unsigned long arg)
>  	bool inherit;
>  	int cur, next;
>  
> -	if (!has_vector())
> +	if (!(has_vector() || has_xtheadvector()))
>  		return -EINVAL;
>  
>  	if (arg & ~PR_RISCV_V_VSTATE_CTRL_MASK)
> @@ -306,7 +307,7 @@ static struct ctl_table riscv_v_default_vstate_table[] = {
>  
>  static int __init riscv_v_sysctl_init(void)
>  {
> -	if (has_vector())
> +	if (has_vector() || has_xtheadvector())
>  		if (!register_sysctl("abi", riscv_v_default_vstate_table))
>  			return -EINVAL;
>  	return 0;
> 
> -- 
> 2.44.0
>