
[v8,12/14] iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster

Message ID f205a4e2f5971cd4b1033d7cac41683e10ebabfb.1740504232.git.nicolinc@nvidia.com
State New
Series iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ)

Commit Message

Nicolin Chen Feb. 25, 2025, 5:25 p.m. UTC
Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will
be the first use case. Since the vsid reader will be the eventq handler
that already holds a streams_mutex, reuse that to fence the vmaster too.

Also add a pair of arm_smmu_attach_prepare/commit_vmaster helpers to set
or unset the master->vmaster pointer. Put these helpers inside the existing
arm_smmu_attach_prepare/commit().

For identity/blocked ops that don't call arm_smmu_attach_prepare/commit(),
add a simpler arm_smmu_master_clear_vmaster helper to unset the vmaster.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastavat <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 28 ++++++++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 45 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 18 +++++++-
 3 files changed, 90 insertions(+), 1 deletion(-)
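
For context, a minimal sketch of the reader side that this locking scheme
serves is shown below. It is illustrative only: the actual vEVENTQ
forwarding path lands in a later patch of this series, so the function name
and the iommufd_viommu_report_event() call here are assumptions, not part
of this patch.

	/*
	 * Illustrative sketch, not part of this patch: the vsid reader.
	 * master->vmaster is only dereferenced under smmu->streams_mutex,
	 * the same lock that the commit/clear helpers take.
	 */
	static int arm_smmu_vmaster_report_event(struct arm_smmu_master *master,
						 u64 *evt)
	{
		struct arm_smmu_vmaster *vmaster = master->vmaster;

		lockdep_assert_held(&master->smmu->streams_mutex);

		if (!vmaster)
			return -EOPNOTSUPP; /* device not attached to a vSMMU */

		/* Replace the physical SID with the vsid before forwarding */
		evt[0] = (evt[0] & ~EVTQ_0_SID) |
			 FIELD_PREP(EVTQ_0_SID, vmaster->vsid);

		return iommufd_viommu_report_event(&vmaster->vsmmu->core,
						   IOMMU_VEVENTQ_TYPE_ARM_SMMUV3,
						   evt,
						   EVTQ_ENT_DWORDS * sizeof(u64));
	}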

Comments

Nicolin Chen March 11, 2025, 5:43 p.m. UTC | #1
On Tue, Mar 11, 2025 at 03:57:16PM +0000, Will Deacon wrote:
> On Tue, Feb 25, 2025 at 09:25:40AM -0800, Nicolin Chen wrote:
> > Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will
> > be the first use case. Since the vsid reader will be the eventq handler
> > that already holds a streams_mutex, reuse that to fenche the vmaster too.
> 
> "fenche"?

fence :)

And I fixed other nits too.

> > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > +				    struct arm_smmu_nested_domain *nested_domain)
> > +{
> > +	struct arm_smmu_vmaster *vmaster;
> > +	unsigned long vsid;
> > +	int ret;
> > +
> > +	iommu_group_mutex_assert(state->master->dev);
> > +
> > +	/* Skip invalid vSTE */
> > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > +		return 0;
> 
> Ok, and we don't need to set 'state->vmaster' in this case because we
> only report stage-1 faults back to the vSMMU?

This is a good question that I didn't ask myself hard enough..

I think we should probably drop it. An invalid STE should trigger
a C_BAD_STE event that is in the supported vEVENT list. I'll run
some test before removing this line from v9.

> With the nits fixed:
> 
> Acked-by: Will Deacon <will@kernel.org>

Thanks!
Nicolin
Nicolin Chen March 17, 2025, 6:49 p.m. UTC | #2
On Mon, Mar 17, 2025 at 12:44:23PM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 11, 2025 at 10:43:08AM -0700, Nicolin Chen wrote:
> > > > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > > > +				    struct arm_smmu_nested_domain *nested_domain)
> > > > +{
> > > > +	struct arm_smmu_vmaster *vmaster;
> > > > +	unsigned long vsid;
> > > > +	int ret;
> > > > +
> > > > +	iommu_group_mutex_assert(state->master->dev);
> > > > +
> > > > +	/* Skip invalid vSTE */
> > > > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > +		return 0;
> > > 
> > > Ok, and we don't need to set 'state->vmaster' in this case because we
> > > only report stage-1 faults back to the vSMMU?
> > 
> > This is a good question that I didn't ask myself hard enough..
> > 
> > I think we should probably drop it. An invalid STE should trigger
> > a C_BAD_STE event that is in the supported vEVENT list. I'll run
> > some test before removing this line from v9.
> 
> It won't trigger C_BAD_STE, recall Robin was opposed to that, so we have this:
> 
> static void arm_smmu_make_nested_domain_ste(
> 	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> 	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> {
> 	unsigned int cfg =
> 		FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
> 
> 	/*
> 	 * Userspace can request a non-valid STE through the nesting interface.
> 	 * We relay that into an abort physical STE with the intention that
> 	 * C_BAD_STE for this SID can be generated to userspace.
> 	 */
> 	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> 		cfg = STRTAB_STE_0_CFG_ABORT;
> 
> So, in the case of a non-valid STE, and a device access, the HW will
> generate one of the translation faults and that will be forwarded.
> 
> Some software component will have to transform those fault events into
> C_BAD_STE for the VM.

Hmm, double checked the spec. It does say that C_BAD_STE would be
triggered:

" V, bit [0] STE Valid.
  [...]
  Device transactions that select an STE with this field configured
  to 0 are terminated with an abort reported back to the device and
  a C_BAD_STE event is recorded."

I also did a hack test unsetting the V bit in the kernel. Then, the
HW did report C_BAD_STE (0x4) back to the VM (via vEVENTQ).

Thanks
Nicolin
Jason Gunthorpe March 17, 2025, 7:27 p.m. UTC | #3
On Mon, Mar 17, 2025 at 11:49:14AM -0700, Nicolin Chen wrote:
> On Mon, Mar 17, 2025 at 12:44:23PM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 11, 2025 at 10:43:08AM -0700, Nicolin Chen wrote:
> > > > > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > > > > +				    struct arm_smmu_nested_domain *nested_domain)
> > > > > +{
> > > > > +	struct arm_smmu_vmaster *vmaster;
> > > > > +	unsigned long vsid;
> > > > > +	int ret;
> > > > > +
> > > > > +	iommu_group_mutex_assert(state->master->dev);
> > > > > +
> > > > > +	/* Skip invalid vSTE */
> > > > > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > > +		return 0;
> > > > 
> > > > Ok, and we don't need to set 'state->vmaster' in this case because we
> > > > only report stage-1 faults back to the vSMMU?
> > > 
> > > This is a good question that I didn't ask myself hard enough..
> > > 
> > > I think we should probably drop it. An invalid STE should trigger
> > > a C_BAD_STE event that is in the supported vEVENT list. I'll run
> > > some test before removing this line from v9.
> > 
> > It won't trigger C_BAD_STE, recall Robin was opposed to that, so we have this:
> > 
> > static void arm_smmu_make_nested_domain_ste(
> > 	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> > 	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> > {
> > 	unsigned int cfg =
> > 		FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
> > 
> > 	/*
> > 	 * Userspace can request a non-valid STE through the nesting interface.
> > 	 * We relay that into an abort physical STE with the intention that
> > 	 * C_BAD_STE for this SID can be generated to userspace.
> > 	 */
> > 	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > 		cfg = STRTAB_STE_0_CFG_ABORT;
> > 
> > So, in the case of a non-valid STE, and a device access, the HW will
> > generate one of the translation faults and that will be forwarded.
> > 
> > Some software component will have to transform those fault events into
> > C_BAD_STE for the VM.
> 
> Hmm, double checked the spec. It does say that C_BAD_STE would be
> triggered:
> 
> " V, bit [0] STE Valid.
>   [...]
>   Device transactions that select an STE with this field configured
>   to 0 are terminated with an abort reported back to the device and
>   a C_BAD_STE event is recorded."
> 
> I also did a hack test unsetting the V bit in the kernel. Then, the
> HW did report C_BAD_STE (0x4) back to the VM (via vEVENTQ).

Yes, I expect that C_BAD_STE will forward just fine.

But, as above, it should never be generated by HW, because the
hypervisor kernel will never install a bad STE; we detect that and
convert it to abort.

Jason
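
Purely as an illustration of the "some software component will have to
transform those fault events into C_BAD_STE" point above: a hypothetical
VMM-side fixup (not part of this series; the function and flag are invented
for the sketch, while C_BAD_STE = 0x4 and the event ID living in bits [7:0]
of EVT word 0 come from the SMMUv3 spec and this thread) could look like:

	#include <stdbool.h>
	#include <stdint.h>

	#define EVT_ID_C_BAD_STE	0x04ULL

	/*
	 * Hypothetical VMM-side sketch (not from this series): if the guest
	 * installed a non-valid vSTE, rewrite whatever fault the kernel
	 * forwards for that vSID into C_BAD_STE before injecting it into
	 * the guest-visible event queue.
	 */
	static void vsmmu_fixup_forwarded_event(uint64_t *evt, bool vste_valid)
	{
		if (!vste_valid)
			evt[0] = (evt[0] & ~0xffULL) | EVT_ID_C_BAD_STE;
	}
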
Jason Gunthorpe April 7, 2025, 5:17 p.m. UTC | #4
On Mon, Apr 07, 2025 at 08:08:57PM +0800, Zhangfei Gao wrote:
> Hi, Nico
> 
> On Wed, 26 Feb 2025 at 01:35, Nicolin Chen <nicolinc@nvidia.com> wrote:
> >
> > Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will
> > be the first use case. Since the vsid reader will be the eventq handler
> > that already holds a streams_mutex, reuse that to fenche the vmaster too.
> >
> > Also add a pair of arm_smmu_attach_prepare/commit_vmaster helpers to set
> > or unset the master->vmaster point. Put these helpers inside the existing
> > arm_smmu_attach_prepare/commit().
> >
> > For identity/blocked ops that don't call arm_smmu_attach_prepare/commit(),
> > add a simpler arm_smmu_master_clear_vmaster helper to unset the vmaster.
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Pranjal Shrivastavat <praan@google.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> 
> >
> > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > +                                   struct arm_smmu_nested_domain *nested_domain)
> > +{
> > +       struct arm_smmu_vmaster *vmaster;
> > +       unsigned long vsid;
> > +       int ret;
> > +
> > +       iommu_group_mutex_assert(state->master->dev);
> > +
> > +       /* Skip invalid vSTE */
> > +       if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > +               return 0;
> 
> Why was this removed in v9 and 6.15-rc1?
> 
> I tested 6.15-rc1, and qemu failed to boot with qemu branch:
> for_iommufd_veventq-v8
> "failed to attach the bypass pagetable"
> 
> After adding this "skip check" back, the qemu works again.
> 
> Do we need to add this back?

Shouldn't need to. Do you know what failed to cause the failed
attach? There is nothing in this function that should cause that.

Jason

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bd9d7c85576a..36961a3579f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -799,6 +799,11 @@  struct arm_smmu_stream {
 	struct rb_node			node;
 };
 
+struct arm_smmu_vmaster {
+	struct arm_vsmmu		*vsmmu;
+	unsigned long			vsid;
+};
+
 struct arm_smmu_event {
 	u8				stall : 1,
 					ssv : 1,
@@ -824,6 +829,7 @@  struct arm_smmu_master {
 	struct arm_smmu_device		*smmu;
 	struct device			*dev;
 	struct arm_smmu_stream		*streams;
+	struct arm_smmu_vmaster		*vmaster; /* use smmu->streams_mutex */
 	/* Locked by the iommu core using the group mutex */
 	struct arm_smmu_ctx_desc_cfg	cd_table;
 	unsigned int			num_streams;
@@ -972,6 +978,7 @@  struct arm_smmu_attach_state {
 	bool disable_ats;
 	ioasid_t ssid;
 	/* Resulting state */
+	struct arm_smmu_vmaster *vmaster;
 	bool ats_enabled;
 };
 
@@ -1055,9 +1062,30 @@  struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 				       struct iommu_domain *parent,
 				       struct iommufd_ctx *ictx,
 				       unsigned int viommu_type);
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				    struct arm_smmu_nested_domain *nested_domain);
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
+void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master);
 #else
 #define arm_smmu_hw_info NULL
 #define arm_vsmmu_alloc NULL
+
+static inline int
+arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				struct arm_smmu_nested_domain *nested_domain)
+{
+	return 0; /* NOP */
+}
+
+static inline void
+arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+}
+
+static inline void
+arm_smmu_master_clear_vmaster(struct arm_smmu_master *master)
+{
+}
 #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
 
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 5aa2e7af58b4..6b712b1ab429 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -85,6 +85,51 @@  static void arm_smmu_make_nested_domain_ste(
 	}
 }
 
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				    struct arm_smmu_nested_domain *nested_domain)
+{
+	struct arm_smmu_vmaster *vmaster;
+	unsigned long vsid;
+	int ret;
+
+	iommu_group_mutex_assert(state->master->dev);
+
+	/* Skip invalid vSTE */
+	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
+		return 0;
+
+	ret = iommufd_viommu_get_vdev_id(&nested_domain->vsmmu->core,
+					 state->master->dev, &vsid);
+	if (ret)
+		return ret;
+
+	vmaster = kzalloc(sizeof(*vmaster), GFP_KERNEL);
+	if (!vmaster)
+		return -ENOMEM;
+	vmaster->vsmmu = nested_domain->vsmmu;
+	vmaster->vsid = vsid;
+	state->vmaster = vmaster;
+
+	return 0;
+}
+
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+	struct arm_smmu_master *master = state->master;
+
+	mutex_lock(&master->smmu->streams_mutex);
+	kfree(master->vmaster);
+	master->vmaster = state->vmaster;
+	mutex_unlock(&master->smmu->streams_mutex);
+}
+
+void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master)
+{
+	struct arm_smmu_attach_state state = { .master = master };
+
+	arm_smmu_attach_commit_vmaster(&state);
+}
+
 static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 				      struct device *dev)
 {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 358072b4e293..964d2cf27d3d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2803,6 +2803,7 @@  int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 	struct arm_smmu_domain *smmu_domain =
 		to_smmu_domain_devices(new_domain);
 	unsigned long flags;
+	int ret;
 
 	/*
 	 * arm_smmu_share_asid() must not see two domains pointing to the same
@@ -2832,9 +2833,18 @@  int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 	}
 
 	if (smmu_domain) {
+		if (new_domain->type == IOMMU_DOMAIN_NESTED) {
+			ret = arm_smmu_attach_prepare_vmaster(
+				state, to_smmu_nested_domain(new_domain));
+			if (ret)
+				return ret;
+		}
+
 		master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL);
-		if (!master_domain)
+		if (!master_domain) {
+			kfree(state->vmaster);
 			return -ENOMEM;
+		}
 		master_domain->master = master;
 		master_domain->ssid = state->ssid;
 		if (new_domain->type == IOMMU_DOMAIN_NESTED)
@@ -2861,6 +2871,7 @@  int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 			spin_unlock_irqrestore(&smmu_domain->devices_lock,
 					       flags);
 			kfree(master_domain);
+			kfree(state->vmaster);
 			return -EINVAL;
 		}
 
@@ -2893,6 +2904,8 @@  void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
 
 	lockdep_assert_held(&arm_smmu_asid_lock);
 
+	arm_smmu_attach_commit_vmaster(state);
+
 	if (state->ats_enabled && !master->ats_enabled) {
 		arm_smmu_enable_ats(master);
 	} else if (state->ats_enabled && master->ats_enabled) {
@@ -3162,6 +3175,7 @@  static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
 	struct arm_smmu_ste ste;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 
+	arm_smmu_master_clear_vmaster(master);
 	arm_smmu_make_bypass_ste(master->smmu, &ste);
 	arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS);
 	return 0;
@@ -3180,7 +3194,9 @@  static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain,
 					struct device *dev)
 {
 	struct arm_smmu_ste ste;
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 
+	arm_smmu_master_clear_vmaster(master);
 	arm_smmu_make_abort_ste(&ste);
 	arm_smmu_attach_dev_ste(domain, dev, &ste,
 				STRTAB_STE_1_S1DSS_TERMINATE);