diff mbox series

[V4,23/23] crypto: hisilicon/qm: Workaround to enable build with RISC-V clang

Message ID 20230404182037.863533-24-sunilvl@ventanamicro.com
State Superseded
Headers show
Series Add basic ACPI support for RISC-V | expand

Commit Message

Sunil V L April 4, 2023, 6:20 p.m. UTC
With CONFIG_ACPI enabled for RISC-V, this driver gets enabled in the
allmodconfig build. The gcc toolchain builds this driver, eliminating the
inline arm64 assembly as dead code. However, clang for RISC-V still tries
to build the arm64 assembly, and the error below is seen.

drivers/crypto/hisilicon/qm.c:627:10: error: invalid output constraint '+Q' in asm
                       "+Q" (*((char __iomem *)fun_base))
                       ^
It appears that RISC-V clang is not smart enough to detect
IS_ENABLED(CONFIG_ARM64) and remove the dead code.

As a workaround, move this check to the preprocessing stage, which works
with the RISC-V clang toolchain.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
---
 drivers/crypto/hisilicon/qm.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
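
A reduced, standalone illustration of the failure mode (not part of the
patch; ARM64_GUARD below is a hypothetical stand-in for
IS_ENABLED(CONFIG_ARM64)): clang validates inline asm constraints in its
front end, before dead-code elimination, so the arm64-only "+Q" constraint
is rejected even in an unreachable branch, while gcc at the optimization
levels used for kernel builds drops the branch before checking it.

#define ARM64_GUARD 0	/* hypothetical stand-in for IS_ENABLED(CONFIG_ARM64) */

void mb_write_demo(void *dst, const void *src)
{
	unsigned long tmp0 = 0, tmp1 = 0;

	/* Dead on !arm64, but clang still checks the "+Q" constraint here
	 * and reports "invalid output constraint '+Q' in asm" when
	 * targeting RISC-V; gcc -O2 removes the branch before checking. */
	if (ARM64_GUARD)
		asm volatile("ldp %0, %1, %3\n"
			     "stp %0, %1, %2\n"
			     : "+&r" (tmp0), "+&r" (tmp1),
			       "+Q" (*(char *)dst)
			     : "Q" (*(const char *)src)
			     : "memory");
}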

Comments

Arnd Bergmann April 5, 2023, 8:16 a.m. UTC | #1
On Tue, Apr 4, 2023, at 20:20, Sunil V L wrote:
> With CONFIG_ACPI enabled for RISC-V, this driver gets enabled in the
> allmodconfig build. The gcc toolchain builds this driver, eliminating the
> inline arm64 assembly as dead code. However, clang for RISC-V still tries
> to build the arm64 assembly, and the error below is seen.
>
> drivers/crypto/hisilicon/qm.c:627:10: error: invalid output constraint 
> '+Q' in asm
>                        "+Q" (*((char __iomem *)fun_base))
>                        ^
> It appears that RISC-V clang is not smart enough to detect
> IS_ENABLED(CONFIG_ARM64) and remove the dead code.
>
> As a workaround, move this check to the preprocessing stage, which works
> with the RISC-V clang toolchain.
>
> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>

Your patch looks correct for this particular problem, but I
see that there are a couple of other issues in the same function:

> -	}
> +#if IS_ENABLED(CONFIG_ARM64)
> +	unsigned long tmp0 = 0, tmp1 = 0;
> 
>  	asm volatile("ldp %0, %1, %3\n"
>  		     "stp %0, %1, %2\n"
> @@ -627,6 +623,11 @@ static void qm_mb_write(struct hisi_qm *qm, const 
> void *src)
>  		       "+Q" (*((char __iomem *)fun_base))
>  		     : "Q" (*((char *)src))
>  		     : "memory");

For the arm64 version:

- the "dmb oshst" barrier needs to come before the stp, not after
  it, otherwise there is no guarantee that data written to memory
  is visible to the device when the mailbox gets triggered
- The input/output arguments need to be pointers to 128-bit types,
  either a struct or a __uint128_t
- this lacks a byteswap on big-endian kernels (a sketch covering the
  first two points follows below)
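
A rough, untested sketch of how the first two points might be addressed
while keeping the ldp/stp (an illustration only, reusing the fun_base and
src variables of qm_mb_write; not what was applied):

	unsigned long tmp0 = 0, tmp1 = 0;

	/* Barrier before the stp, so prior writes to normal memory are
	 * visible to the device before the mailbox MMIO store; the
	 * ldp -> stp ordering still follows from the register data
	 * dependency. */
	asm volatile("ldp %0, %1, %3\n"
		     "dmb oshst\n"
		     "stp %0, %1, %2\n"
		     : "+&r" (tmp0), "+&r" (tmp1),
		       "+Q" (*((__uint128_t __iomem *)fun_base))
		     : "Q" (*((const __uint128_t *)src))
		     : "memory");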

> +#else
> +	memcpy_toio(fun_base, src, 16);
> +	dma_wmb();
> +#endif

This version has the same problems, plus the write is not actually
atomic. I wonder if a pair of writeq() calls would just do the
right thing here for both arm64 and others, or possibly a
writeq() followed by a writeq_relaxed() to avoid the extra dmb()
in the middle.

     Arnd
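
For reference, a minimal sketch of the writeq() variant suggested above,
assuming the existing 16-byte struct qm_mailbox layout; note the reply
below, which explains why a pair of 64-bit stores is not acceptable on
this hardware because the 16-byte doorbell write must be atomic:

static void qm_mb_write(struct hisi_qm *qm, const void *src)
{
	void __iomem *fun_base = qm->io_base + QM_MB_CMD_SEND_BASE;
	const u64 *mb = src;

	/* writeq() includes the write barrier before the first store ... */
	writeq(mb[0], fun_base);
	/* ... and the relaxed variant avoids a second barrier in between. */
	writeq_relaxed(mb[1], fun_base + 8);
}
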
Weili Qian April 11, 2023, 11:42 a.m. UTC | #2
On 2023/4/5 16:16, Arnd Bergmann wrote:
> On Tue, Apr 4, 2023, at 20:20, Sunil V L wrote:
>> With CONFIG_ACPI enabled for RISC-V, this driver gets enabled in the
>> allmodconfig build. The gcc toolchain builds this driver, eliminating the
>> inline arm64 assembly as dead code. However, clang for RISC-V still tries
>> to build the arm64 assembly, and the error below is seen.
>>
>> drivers/crypto/hisilicon/qm.c:627:10: error: invalid output constraint 
>> '+Q' in asm
>>                        "+Q" (*((char __iomem *)fun_base))
>>                        ^
>> It appears that RISC-V clang is not smart enough to detect
>> IS_ENABLED(CONFIG_ARM64) and remove the dead code.
>>
>> As a workaround, move this check to the preprocessing stage, which works
>> with the RISC-V clang toolchain.
>>
>> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
> 
> Your patch looks correct for this particular problem, but I
> see that there are a couple of other issues in the same function:
> 
>> -	}
>> +#if IS_ENABLED(CONFIG_ARM64)
>> +	unsigned long tmp0 = 0, tmp1 = 0;
>>
>>  	asm volatile("ldp %0, %1, %3\n"
>>  		     "stp %0, %1, %2\n"
>> @@ -627,6 +623,11 @@ static void qm_mb_write(struct hisi_qm *qm, const 
>> void *src)
>>  		       "+Q" (*((char __iomem *)fun_base))
>>  		     : "Q" (*((char *)src))
>>  		     : "memory");
> 
> For the arm64 version:
> 
> - the "dmb oshst" barrier needs to come before the stp, not after
>   it, otherwise there is no guarantee that data written to memory
>   is visible to the device when the mailbox gets triggered
> - The input/output arguments need to be pointers to 128-bit types,
>   either a struct or a __uint128_t
> - this lacks a byteswap on big-endian kernels
Sorry for the late reply.

- The execution order relies on the data dependency between ldp and stp:
  "src" is first loaded into "tmp0" and "tmp1", and "tmp0" and "tmp1" are
  then stored to "fun_base". The "dmb oshst" is used to ensure that the
  stp instruction has completed before the CPU checks the mailbox status.
  Can the execution order not be guaranteed via this data dependency?

- The input argument "src" points to a "struct qm_mailbox".
- Before this function is called, the data has already been byteswapped:

	mailbox->w0 = cpu_to_le16((cmd) |
		((op) ? 0x1 << QM_MB_OP_SHIFT : 0) |
		(0x1 << QM_MB_BUSY_SHIFT));
	mailbox->queue_num = cpu_to_le16(queue);
	mailbox->base_l = cpu_to_le32(lower_32_bits(base));
	mailbox->base_h = cpu_to_le32(upper_32_bits(base));
	mailbox->rsvd = 0;

> 
>> +#else
>> +	memcpy_toio(fun_base, src, 16);
>> +	dma_wmb();
>> +#endif
> 
> This version has the same problems, plus the write is not actually
> atomic. I wonder if a pair of writeq() calls would just do the
> right thing here for both arm64 and others, or possibly a
> writeq() followed by a writeq_relaxed() to avoid the extra dmb()
> in the middle.
> 
>      Arnd
> .
> 
We have to do a 128-bit atomic write here to trigger a mailbox. The reason
is that the PF and its related VFs of a device cannot write the mailbox
MMIO at the same time.
For this SoC (Kunpeng), which has the QM, stp is atomic if the address is
128-bit aligned. The offset of the QM mailbox is 128-bit aligned, so it is
safe here.

Best regards,
Weili
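
If it helps to document that assumption in code, a compile-time check next
to the existing QM_MB_CMD_SEND_BASE definition could make the alignment
requirement explicit (a sketch only, not part of the patch):

/* stp is only atomic on this SoC when the target address is 16-byte
 * aligned, so assert that the mailbox MMIO offset keeps that alignment. */
static_assert((QM_MB_CMD_SEND_BASE % 16) == 0,
	      "QM mailbox offset must be 128-bit aligned for atomic stp");
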
diff mbox series

Patch

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index e4c84433a88a..a5f521529ab2 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -611,13 +611,9 @@  EXPORT_SYMBOL_GPL(hisi_qm_wait_mb_ready);
 static void qm_mb_write(struct hisi_qm *qm, const void *src)
 {
 	void __iomem *fun_base = qm->io_base + QM_MB_CMD_SEND_BASE;
-	unsigned long tmp0 = 0, tmp1 = 0;
 
-	if (!IS_ENABLED(CONFIG_ARM64)) {
-		memcpy_toio(fun_base, src, 16);
-		dma_wmb();
-		return;
-	}
+#if IS_ENABLED(CONFIG_ARM64)
+	unsigned long tmp0 = 0, tmp1 = 0;
 
 	asm volatile("ldp %0, %1, %3\n"
 		     "stp %0, %1, %2\n"
@@ -627,6 +623,11 @@  static void qm_mb_write(struct hisi_qm *qm, const void *src)
 		       "+Q" (*((char __iomem *)fun_base))
 		     : "Q" (*((char *)src))
 		     : "memory");
+#else
+	memcpy_toio(fun_base, src, 16);
+	dma_wmb();
+#endif
+
 }
 
 static int qm_mb_nolock(struct hisi_qm *qm, struct qm_mailbox *mailbox)