[22/36] cputlb: Fold TLB_RECHECK into TLB_INVALID_MASK

Message ID	20190903160858.5296-23-richard.henderson@linaro.org
State	New
Headers
Delivered-To: patch@linaro.org
Received-SPF: pass (google.com: domain of
	qemu-devel-bounces+patch=linaro.org@nongnu.org designates
	209.51.188.17 as permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Tue, 3 Sep 2019 09:08:44 -0700
Message-Id: <20190903160858.5296-23-richard.henderson@linaro.org>
In-Reply-To: <20190903160858.5296-1-richard.henderson@linaro.org>
References: <20190903160858.5296-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 22/36] cputlb: Fold TLB_RECHECK into
	TLB_INVALID_MASK
Precedence: list
Cc: peter.maydell@linaro.org
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Commit Message

Richard Henderson Sept. 3, 2019, 4:08 p.m. UTC

We had two different mechanisms to force a recheck of the tlb.

Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit
that would immediately set TLB_INVALID_MASK, which automatically
means that a second check of the tlb entry fails.

We can use the same mechanism to handle small pages.
Conserve TLB_* bits by removing TLB_RECHECK.

Reviewed-by: David Hildenbrand <david@redhat.com>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

---
 include/exec/cpu-all.h |  5 +--
 accel/tcg/cputlb.c     | 86 +++++++++++-------------------------------
 2 files changed, 24 insertions(+), 67 deletions(-)

-- 
2.17.1

Comments

Peter Maydell Sept. 6, 2019, 11:02 a.m. UTC | #1

On Tue, 3 Sep 2019 at 17:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We had two different mechanisms to force a recheck of the tlb.
>
> Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit
> that would immediately set TLB_INVALID_MASK, which automatically
> means that a second check of the tlb entry fails.
>
> We can use the same mechanism to handle small pages.
> Conserve TLB_* bits by removing TLB_RECHECK.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> @@ -1265,27 +1269,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>          if ((addr & (size - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -
> -        if (tlb_addr & TLB_RECHECK) {
> -            /*
> -             * This is a TLB_RECHECK access, where the MMU protection
> -             * covers a smaller range than a target page, and we must
> -             * repeat the MMU check here. This tlb_fill() call might
> -             * longjump out if this access should cause a guest exception.
> -             */
> -            tlb_fill(env_cpu(env), addr, size,
> -                     access_type, mmu_idx, retaddr);
> -            index = tlb_index(env, mmu_idx, addr);
> -            entry = tlb_entry(env, mmu_idx, addr);
> -
> -            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
> -            tlb_addr &= ~TLB_RECHECK;
> -            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
> -                /* RAM access */
> -                goto do_aligned_access;
> -            }
> -        }
> -
>          return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
>                          mmu_idx, addr, retaddr, access_type, op);
>      }

In the old version of this code, we do the "tlb fill if TLB_RECHECK
is set", and then we say "now we've done the refill have we actually
got RAM", and we avoid calling io_readx() if that is the case.
This is necessary because io_readx() will misbehave if you try to
call it on RAM (notably if what we have is notdirty-mem then we
need to do the read-from-actual-host-ram because the IO ops backing
notdirty-mem are intended for writes only).
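
For readers following along, the removed branch can be modelled roughly as below. This is a standalone sketch, not QEMU code: the 12-bit page size, the `FLAG_*` macros, and the function `old_recheck_path()` are assumptions made up for illustration, and `refilled` merely stands in for whatever TLB word tlb_fill() leaves behind.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for this sketch only: 12-bit target pages. */
#define PAGE_BITS     12
#define PAGE_MASK     (~(((uint64_t)1 << PAGE_BITS) - 1))
#define FLAG_MMIO     ((uint64_t)1 << (PAGE_BITS - 3))
#define FLAG_RECHECK  ((uint64_t)1 << (PAGE_BITS - 4))

enum path { PATH_RAM, PATH_IO };

/*
 * Model of the old slow-path branch in load_helper().
 * tlb_addr: the TLB word that sent us down the slow path.
 * refilled: the TLB word as it looks after the repeated fill.
 */
static enum path old_recheck_path(uint64_t tlb_addr, uint64_t refilled)
{
    /* This branch is only reached when some flag bit is set. */
    assert(tlb_addr & ~PAGE_MASK);

    if (tlb_addr & FLAG_RECHECK) {
        tlb_addr = refilled & ~FLAG_RECHECK;
        if (!(tlb_addr & ~PAGE_MASK)) {
            return PATH_RAM;    /* old code: goto do_aligned_access */
        }
    }
    return PATH_IO;             /* old code: io_readx() */
}
```

With a RECHECK-only entry the model returns `PATH_RAM`; once an MMIO bit is also present it returns `PATH_IO`. That early exit to RAM is the piece of handling I am asking about.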

With this patch applied, we seem to have lost the handling for
if the tlb_fill in a TLB_RECHECK case gives us back some real RAM.
(Similarly for store_helper().)

I think this is what's causing Mark Cave-Ayland's Solaris test
case to fail.

More generally, I don't really understand why this merging
is correct -- "TLB needs a recheck" is not the same thing as
"TLB is invalid" and I don't think we can merge the two
bits.

thanks
-- PMM

Richard Henderson Sept. 6, 2019, 2:58 p.m. UTC | #2

On 9/6/19 7:02 AM, Peter Maydell wrote:
> On Tue, 3 Sep 2019 at 17:09, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We had two different mechanisms to force a recheck of the tlb.
>>
>> Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit
>> that would immediately set TLB_INVALID_MASK, which automatically
>> means that a second check of the tlb entry fails.
>>
>> We can use the same mechanism to handle small pages.
>> Conserve TLB_* bits by removing TLB_RECHECK.
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>
>> @@ -1265,27 +1269,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>>          if ((addr & (size - 1)) != 0) {
>>              goto do_unaligned_access;
>>          }
>> -
>> -        if (tlb_addr & TLB_RECHECK) {
>> -            /*
>> -             * This is a TLB_RECHECK access, where the MMU protection
>> -             * covers a smaller range than a target page, and we must
>> -             * repeat the MMU check here. This tlb_fill() call might
>> -             * longjump out if this access should cause a guest exception.
>> -             */
>> -            tlb_fill(env_cpu(env), addr, size,
>> -                     access_type, mmu_idx, retaddr);
>> -            index = tlb_index(env, mmu_idx, addr);
>> -            entry = tlb_entry(env, mmu_idx, addr);
>> -
>> -            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
>> -            tlb_addr &= ~TLB_RECHECK;
>> -            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
>> -                /* RAM access */
>> -                goto do_aligned_access;
>> -            }
>> -        }
>> -
>>          return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
>>                          mmu_idx, addr, retaddr, access_type, op);
>>      }
>
> In the old version of this code, we do the "tlb fill if TLB_RECHECK
> is set", and then we say "now we've done the refill have we actually
> got RAM", and we avoid calling io_readx() if that is the case.

I don't think that's the case, since,

        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
                            addr & TARGET_PAGE_MASK)) {
            tlb_fill(env_cpu(env), addr, size,
                     access_type, mmu_idx, retaddr);
            index = tlb_index(env, mmu_idx, addr);
            entry = tlb_entry(env, mmu_idx, addr);
        }
        tlb_addr = code_read ? entry->addr_code : entry->addr_read;
        tlb_addr &= ~TLB_INVALID_MASK;
    }

the last line here clears INVALID.  The only bits that could remain should be
WATCHPOINT and MMIO.  (NOTDIRTY can only be set for entry->addr_write, not for
addr_read/addr_code.)

And for that matter, once we've processed the watchpoint we remove
TLB_WATCHPOINT as well, so that we only enter io_readx() if MMIO is set.
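
As a rough illustration of that argument, here is a standalone sketch (not the QEMU source). The 12-bit page size, the `FLAG_*` values, and the helper `load_goes_to_io()` are hypothetical; watchpoints are left out, and the claim that loads never see NOTDIRTY is taken from the paragraph above rather than checked by the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this sketch only: 12-bit target pages. */
#define PAGE_BITS     12
#define PAGE_MASK     (~(((uint64_t)1 << PAGE_BITS) - 1))
#define FLAG_INVALID  ((uint64_t)1 << (PAGE_BITS - 1))
#define FLAG_MMIO     ((uint64_t)1 << (PAGE_BITS - 3))

/*
 * filled: addr_read as left in the entry by the fill on the miss path.
 * Returns true if the load would go on to the io_readx() path.
 */
static bool load_goes_to_io(uint64_t filled, uint64_t addr)
{
    /* Mirrors "tlb_addr &= ~TLB_INVALID_MASK" after the fill. */
    uint64_t tlb_addr = filled & ~FLAG_INVALID;

    /* A fill that returned must have produced an entry for this page. */
    assert((tlb_addr & PAGE_MASK) == (addr & PAGE_MASK));

    /* Any flag still set sends the access down the slow path. */
    return (tlb_addr & ~PAGE_MASK) != 0;
}
```

In this model a small-page RAM entry (INVALID set, nothing else) yields false and so takes the direct host access, while the same entry with MMIO set yields true.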

> This is necessary because io_readx() will misbehave if you try to
> call it on RAM (notably if what we have is notdirty-mem then we
> need to do the read-from-actual-host-ram because the IO ops backing
> notdirty-mem are intended for writes only).
>
> With this patch applied, we seem to have lost the handling for
> if the tlb_fill in a TLB_RECHECK case gives us back some real RAM.
> (Similarly for store_helper().)

Again, I disagree.  I think there must be some other explanation.

> More generally, I don't really understand why this merging
> is correct -- "TLB needs a recheck" is not the same thing as
> "TLB is invalid" and I don't think we can merge the two
> bits.

"TLB is invalid" means that we cannot use an existing tlb entry, therefore we
must go back to tlb_fill.  "TLB needs a recheck" means we must go back to
tlb_fill -- exactly the same.

The only odd bit about "TLB is invalid" is that it applies to the *next*
lookup.  If we have just returned from tlb_fill, then the tlb entry *must* be
valid.  If it were not valid, then tlb_fill would not return at all.

So, on the paths that use tlb_fill, we clear TLB_INVALID_MASK, indicating that
the lookup has just been done.

Which, honestly, ought to have happened with TLB_RECHECK because it was not
uncommon to perform two tlb_fill in a row -- the first because of a true tlb
miss and the second because the entry supplied by the fill has TLB_RECHECK set.
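
A toy model of the fill counts, as a sketch only: the one-entry `struct model_tlb`, the `model_*` functions, the `FLAG_*` values, and the 12-bit page size are all invented for illustration. The hit test assumes the comparison masks the stored word with the page mask plus the invalid bit, so an entry carrying the invalid bit never matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this sketch only: 12-bit target pages. */
#define PAGE_BITS     12
#define PAGE_MASK     (~(((uint64_t)1 << PAGE_BITS) - 1))
#define FLAG_INVALID  ((uint64_t)1 << (PAGE_BITS - 1))
#define FLAG_RECHECK  ((uint64_t)1 << (PAGE_BITS - 4))

struct model_tlb {
    uint64_t addr_read;   /* a one-entry "TLB" */
    uint64_t small_flag;  /* flag the fill sets for a small page */
    unsigned fills;       /* number of fills performed so far */
};

static void model_fill(struct model_tlb *t, uint64_t addr)
{
    t->addr_read = (addr & PAGE_MASK) | t->small_flag;
    t->fills++;
}

/* Assumed hit test: an entry with the invalid bit set never matches. */
static bool model_hit(uint64_t tlb_addr, uint64_t addr)
{
    return (addr & PAGE_MASK) == (tlb_addr & (PAGE_MASK | FLAG_INVALID));
}

static void model_load(struct model_tlb *t, uint64_t addr)
{
    uint64_t tlb_addr = t->addr_read;

    if (!model_hit(tlb_addr, addr)) {
        model_fill(t, addr);
        /* The lookup has just been done, so drop the invalid bit. */
        tlb_addr = t->addr_read & ~FLAG_INVALID;
    }
    assert(model_hit(tlb_addr, addr));

    if (tlb_addr & FLAG_RECHECK) {
        model_fill(t, addr);   /* old scheme: repeat the MMU check */
    }
}
```

Starting from an empty entry (`addr_read` all ones), the first load costs two fills when `small_flag` is `FLAG_RECHECK` but one when it is `FLAG_INVALID`; every later load to the same small page costs one fill in either scheme.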


r~

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 8323094648..8d07ae23a5 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -329,14 +329,11 @@  CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS - 2))
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO            (1 << (TARGET_PAGE_BITS - 3))
-/* Set if TLB entry must have MMU lookup repeated for every access */
-#define TLB_RECHECK         (1 << (TARGET_PAGE_BITS - 4))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
-#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
-                         | TLB_RECHECK)
+#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index d9787cc893..c9576bebcf 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -732,11 +732,8 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     address = vaddr_page;
     if (size < TARGET_PAGE_SIZE) {
-        /*
-         * Slow-path the TLB entries; we will repeat the MMU check and TLB
-         * fill on every access.
-         */
-        address |= TLB_RECHECK;
+        /* Repeat the MMU check and TLB fill on every access.  */
+        address |= TLB_INVALID_MASK;
     }
     if (attrs.byte_swap) {
         /* Force the access through the I/O slow path.  */
@@ -1026,10 +1023,15 @@  static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
   victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
                  (ADDR) & TARGET_PAGE_MASK)
 
-/* NOTE: this function can trigger an exception */
-/* NOTE2: the returned address is not exactly the physical address: it
- * is actually a ram_addr_t (in system mode; the user mode emulation
- * version of this function returns a guest virtual address).
+/*
+ * Return a ram_addr_t for the virtual address for execution.
+ *
+ * Return -1 if we can't translate and execute from an entire page
+ * of RAM.  This will force us to execute by loading and translating
+ * one insn at a time, without caching.
+ *
+ * NOTE: This function will trigger an exception if the page is
+ * not executable.
  */
 tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 {
@@ -1043,19 +1045,20 @@  tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
             tlb_fill(env_cpu(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
             index = tlb_index(env, mmu_idx, addr);
             entry = tlb_entry(env, mmu_idx, addr);
+
+            if (unlikely(entry->addr_code & TLB_INVALID_MASK)) {
+                /*
+                 * The MMU protection covers a smaller range than a target
+                 * page, so we must redo the MMU check for every insn.
+                 */
+                return -1;
+            }
         }
         assert(tlb_hit(entry->addr_code, addr));
     }
 
-    if (unlikely(entry->addr_code & (TLB_RECHECK | TLB_MMIO))) {
-        /*
-         * Return -1 if we can't translate and execute from an entire
-         * page of RAM here, which will cause us to execute by loading
-         * and translating one insn at a time, without caching:
-         *  - TLB_RECHECK: means the MMU protection covers a smaller range
-         *    than a target page, so we must redo the MMU check every insn
-         *  - TLB_MMIO: region is not backed by RAM
-         */
+    if (unlikely(entry->addr_code & TLB_MMIO)) {
+        /* The region is not backed by RAM.  */
         return -1;
     }
 
@@ -1180,7 +1183,7 @@  static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     }
 
     /* Notice an IO access or a needs-MMU-lookup access */
-    if (unlikely(tlb_addr & (TLB_MMIO | TLB_RECHECK))) {
+    if (unlikely(tlb_addr & TLB_MMIO)) {
         /* There's really nothing that can be done to
            support this apart from stop-the-world.  */
         goto stop_the_world;
@@ -1258,6 +1261,7 @@  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             entry = tlb_entry(env, mmu_idx, addr);
         }
         tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+        tlb_addr &= ~TLB_INVALID_MASK;
     }
 
     /* Handle an IO access.  */
@@ -1265,27 +1269,6 @@  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         if ((addr & (size - 1)) != 0) {
             goto do_unaligned_access;
         }
-
-        if (tlb_addr & TLB_RECHECK) {
-            /*
-             * This is a TLB_RECHECK access, where the MMU protection
-             * covers a smaller range than a target page, and we must
-             * repeat the MMU check here. This tlb_fill() call might
-             * longjump out if this access should cause a guest exception.
-             */
-            tlb_fill(env_cpu(env), addr, size,
-                     access_type, mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-
-            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
-            tlb_addr &= ~TLB_RECHECK;
-            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
-                /* RAM access */
-                goto do_aligned_access;
-            }
-        }
-
         return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
                         mmu_idx, addr, retaddr, access_type, op);
     }
@@ -1314,7 +1297,6 @@  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         return res & MAKE_64BIT_MASK(0, size * 8);
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     switch (op) {
     case MO_UB:
@@ -1509,27 +1491,6 @@  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         if ((addr & (size - 1)) != 0) {
             goto do_unaligned_access;
         }
-
-        if (tlb_addr & TLB_RECHECK) {
-            /*
-             * This is a TLB_RECHECK access, where the MMU protection
-             * covers a smaller range than a target page, and we must
-             * repeat the MMU check here. This tlb_fill() call might
-             * longjump out if this access should cause a guest exception.
-             */
-            tlb_fill(env_cpu(env), addr, size, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-
-            tlb_addr = tlb_addr_write(entry);
-            tlb_addr &= ~TLB_RECHECK;
-            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
-                /* RAM access */
-                goto do_aligned_access;
-            }
-        }
-
         io_writex(env, &env_tlb(env)->d[mmu_idx].iotlb[index], mmu_idx,
                   val, addr, retaddr, op);
         return;
@@ -1579,7 +1540,6 @@  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         return;
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     switch (op) {
     case MO_UB:
