
[4/5] drm/msm: Use separate ASID for each set of pgtables

Message ID 20220821181917.1188021-5-robdclark@gmail.com
State New
Series drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations

Commit Message

Rob Clark Aug. 21, 2022, 6:19 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Optimize TLB invalidation by using a different ASID for each set of
pgtables.  There can be scenarios where multiple processes end up
with the same ASID (such as >256 processes using the GPU), but this
is harmless; it will only result in some over-invalidation (but
less over-invalidation than using ASID=0 for all processes).

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
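
The allocation added here skips ASID 0 and wraps modulo 256, so at most 255 distinct values are ever handed out.  A small standalone sketch of the same arithmetic (plain userspace C, purely illustrative and not part of the patch) shows how ASIDs start repeating once more than 255 sets of pgtables exist:

/* Illustrative userspace model of the allocation in the patch below. */
#include <stdio.h>

static unsigned int counter = 1;	/* stands in for the static atomic_t */

static unsigned int alloc_asid(void)
{
	unsigned int asid = 0;

	/* Mirrors: pagetable->asid = atomic_inc_return(&asid) & 0xff; */
	while (!asid)
		asid = ++counter & 0xff;

	return asid;
}

int main(void)
{
	/* With more than 255 pagetables the values repeat: 2, 3, ... 255, 1, 2, ... */
	for (int i = 0; i < 300; i++)
		printf("pagetable %d -> asid %u\n", i, alloc_asid());

	return 0;
}

A collision just means two sets of pgtables invalidate each other's TLB entries, which is the "harmless" over-invalidation case called out above.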

Comments

Robin Murphy Aug. 22, 2022, 1:52 p.m. UTC | #1
On 2022-08-21 19:19, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Optimize TLB invalidation by using a different ASID for each set of
> pgtables.  There can be scenarios where multiple processes end up
> with the same ASID (such as >256 processes using the GPU), but this
> is harmless; it will only result in some over-invalidation (but
> less over-invalidation than using ASID=0 for all processes).

Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
whenever it switches a TTBR, then there's only ever one ASID live in the 
TLBs at once, so it really doesn't matter whether its value stays the 
same or changes. This seems like a whole chunk of complexity to achieve 
nothing :/
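
A toy model of this point (plain C, purely illustrative, and assuming the per-switch invalidate targets the incoming ASID): because every switch is paired with an invalidate-all-by-ASID, a context never finds its own entries still cached when it is switched back in, regardless of which ASID value it carries.

#include <stdio.h>
#include <stdbool.h>

#define TLB_SIZE 8

struct tlb_entry { bool valid; unsigned int asid; unsigned long va; };
static struct tlb_entry tlb[TLB_SIZE];

static void tlb_fill(unsigned int asid, unsigned long va)
{
	for (int i = 0; i < TLB_SIZE; i++)
		if (!tlb[i].valid) {
			tlb[i] = (struct tlb_entry){ true, asid, va };
			return;
		}
}

static int tlb_cached(unsigned int asid)
{
	int n = 0;

	for (int i = 0; i < TLB_SIZE; i++)
		n += tlb[i].valid && tlb[i].asid == asid;
	return n;
}

static void switch_pgtables(unsigned int asid)
{
	/* Every switch invalidates everything tagged with the incoming ASID. */
	for (int i = 0; i < TLB_SIZE; i++)
		if (tlb[i].valid && tlb[i].asid == asid)
			tlb[i].valid = false;

	printf("switch to asid %u: %d cached entries usable\n", asid, tlb_cached(asid));
}

int main(void)
{
	switch_pgtables(1); tlb_fill(1, 0x1000); tlb_fill(1, 0x2000);
	switch_pgtables(2); tlb_fill(2, 0x3000);
	switch_pgtables(1);	/* always reports 0: nothing survives a switch */
	return 0;
}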

If you could actually use multiple ASIDs in a meaningful way to avoid 
any invalidations, you'd need to do considerably more work to keep track 
of reuse, and those races would probably be a lot less benign.
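
A rough sketch of what that tracking might involve, with hypothetical helper and table names and reusing the tlb_inv_by_id callback this series adds; the locking and synchronization against in-flight map/unmap, i.e. the races mentioned above, are exactly what is left out:

/*
 * Hypothetical sketch only: an ASID can only be recycled for a new set
 * of pgtables once any TLB entries belonging to its previous owner are
 * flushed, and the old owner must stop assuming its entries persist.
 */
static struct msm_iommu_pagetable *asid_owner[256];

static void reuse_asid(struct adreno_smmu_priv *adreno_smmu,
		       struct msm_iommu_pagetable *pagetable, int asid)
{
	if (asid_owner[asid] && asid_owner[asid] != pagetable)
		adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, asid);

	asid_owner[asid] = pagetable;
	pagetable->asid = asid;
}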

Robin.

> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index a54ed354578b..94c8c09980d1 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>   		size_t size)
>   {
>   	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> +	struct adreno_smmu_priv *adreno_smmu =
> +		dev_get_drvdata(pagetable->parent->dev);
>   	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
>   	size_t unmapped = 0;
>   
> @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>   		size -= 4096;
>   	}
>   
> -	iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
> +	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>   
>   	return (unmapped == size) ? 0 : -EINVAL;
>   }
> @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
>   
>   struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>   {
> +	static atomic_t asid = ATOMIC_INIT(1);
>   	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
>   	struct msm_iommu *iommu = to_msm_iommu(parent);
>   	struct msm_iommu_pagetable *pagetable;
> @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>   	pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
>   
>   	/*
> -	 * TODO we would like each set of page tables to have a unique ASID
> -	 * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
> -	 * end up flushing the ASID used for TTBR1 pagetables, which is not
> -	 * what we want.  So for now just use the same ASID as TTBR1.
> +	 * ASID 0 is used for kernel mapped buffers in TTBR1, which we
> +	 * do not need to invalidate when unmapping from TTBR0 pgtables.
> +	 * The hw ASID is at *least* 8b, but can be 16b.  We just assume
> +	 * the worst:
>   	 */
>   	pagetable->asid = 0;
> +	while (!pagetable->asid)
> +		pagetable->asid = atomic_inc_return(&asid) & 0xff;
>   
>   	return &pagetable->base;
>   }
Robin Murphy Aug. 22, 2022, 2:38 p.m. UTC | #2
On 2022-08-22 14:52, Robin Murphy wrote:
> On 2022-08-21 19:19, Rob Clark wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> Optimize TLB invalidation by using a different ASID for each set of
>> pgtables.  There can be scenarios where multiple processes end up
>> with the same ASID (such as >256 processes using the GPU), but this
>> is harmless; it will only result in some over-invalidation (but
>> less over-invalidation than using ASID=0 for all processes).
> 
> Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
> whenever it switches a TTBR, then there's only ever one ASID live in the 
> TLBs at once, so it really doesn't matter whether its value stays the 
> same or changes. This seems like a whole chunk of complexity to achieve 
> nothing :/
> 
> If you could actually use multiple ASIDs in a meaningful way to avoid 
> any invalidations, you'd need to do considerably more work to keep track 
> of reuse, and those races would probably be a lot less benign.

Oh, and on top of that, if you did want to go down that route then 
chances are you'd then also want to start looking at using global 
mappings in TTBR1 to avoid increased TLB pressure from kernel buffers, 
and then we'd run up against some pretty horrid MMU-500 errata which so 
far I've been happy to ignore on the basis that Linux doesn't use global 
mappings. Spoiler alert: unless you can additionally convert everything 
to invalidate by VA, the workaround for #752357 most likely makes the 
whole idea moot.

Robin.

>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>> ---
>>   drivers/gpu/drm/msm/msm_iommu.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
>> index a54ed354578b..94c8c09980d1 100644
>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>> @@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>>           size_t size)
>>   {
>>       struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
>> +    struct adreno_smmu_priv *adreno_smmu =
>> +        dev_get_drvdata(pagetable->parent->dev);
>>       struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
>>       size_t unmapped = 0;
>> @@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
>>           size -= 4096;
>>       }
>> -    iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
>> +    adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
>>       return (unmapped == size) ? 0 : -EINVAL;
>>   }
>> @@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
>>   struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>>   {
>> +    static atomic_t asid = ATOMIC_INIT(1);
>>       struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
>>       struct msm_iommu *iommu = to_msm_iommu(parent);
>>       struct msm_iommu_pagetable *pagetable;
>> @@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
>>       pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
>>       /*
>> -     * TODO we would like each set of page tables to have a unique ASID
>> -     * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
>> -     * end up flushing the ASID used for TTBR1 pagetables, which is not
>> -     * what we want.  So for now just use the same ASID as TTBR1.
>> +     * ASID 0 is used for kernel mapped buffers in TTBR1, which we
>> +     * do not need to invalidate when unmapping from TTBR0 pgtables.
>> +     * The hw ASID is at *least* 8b, but can be 16b.  We just assume
>> +     * the worst:
>>        */
>>       pagetable->asid = 0;
>> +    while (!pagetable->asid)
>> +        pagetable->asid = atomic_inc_return(&asid) & 0xff;
>>       return &pagetable->base;
>>   }

Patch

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..94c8c09980d1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -33,6 +33,8 @@  static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size_t size)
 {
 	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+	struct adreno_smmu_priv *adreno_smmu =
+		dev_get_drvdata(pagetable->parent->dev);
 	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
 	size_t unmapped = 0;
 
@@ -43,7 +45,7 @@  static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
 		size -= 4096;
 	}
 
-	iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
+	adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
 
 	return (unmapped == size) ? 0 : -EINVAL;
 }
@@ -147,6 +149,7 @@  static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 
 struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 {
+	static atomic_t asid = ATOMIC_INIT(1);
 	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
 	struct msm_iommu *iommu = to_msm_iommu(parent);
 	struct msm_iommu_pagetable *pagetable;
@@ -210,12 +213,14 @@  struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 	pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
 
 	/*
-	 * TODO we would like each set of page tables to have a unique ASID
-	 * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
-	 * end up flushing the ASID used for TTBR1 pagetables, which is not
-	 * what we want.  So for now just use the same ASID as TTBR1.
+	 * ASID 0 is used for kernel mapped buffers in TTBR1, which we
+	 * do not need to invalidate when unmapping from TTBR0 pgtables.
+	 * The hw ASID is at *least* 8b, but can be 16b.  We just assume
+	 * the worst:
 	 */
 	pagetable->asid = 0;
+	while (!pagetable->asid)
+		pagetable->asid = atomic_inc_return(&asid) & 0xff;
 
 	return &pagetable->base;
 }