
[RFC,2/2] zap_pte_range: fix partial TLB flushing in response to a dirty pte

Message ID 1414496662-25202-3-git-send-email-will.deacon@arm.com
State New

Commit Message

Will Deacon Oct. 28, 2014, 11:44 a.m. UTC
When we encounter a dirty page during unmap, we force a TLB invalidation
to avoid a race with pte_mkclean and stale, dirty TLB entries in the
CPU.

This uses the same force_flush logic as the batch failure code, but
since we don't break out of the loop when finding a dirty pte, tlb->end
can be < addr as we only batch for present ptes. This can result in a
negative range being passed to subsequent TLB invalidation calls,
potentially leading to massive over-invalidation of the TLB (observed
in practice running firefox on arm64).

This patch fixes the issue by restricting the use of addr in the TLB
range calculations. The first range then ends up covering tlb->start to
min(tlb->end, addr), which corresponds to the currently batched range.
The second range then covers anything remaining, which may still lead to
a (much reduced) over-invalidation of the TLB.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 mm/memory.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Linus Torvalds Oct. 28, 2014, 3:18 p.m. UTC | #1
On Tue, Oct 28, 2014 at 4:44 AM, Will Deacon <will.deacon@arm.com> wrote:
> @@ -1194,11 +1194,10 @@ again:
>                  * then update the range to be the remaining
>                  * TLB range.
>                  */
> -               old_end = tlb->end;
> -               tlb->end = addr;
> +               tlb->end = old_end = min(tlb->end, addr);
>                 tlb_flush_mmu_tlbonly(tlb);
> -               tlb->start = addr;
> -               tlb->end = old_end;
> +               tlb->start = old_end;
> +               tlb->end = end;

I don't think this is right. Setting "tlb->end = end" looks very wrong
indeed, because "end" here inside zap_pte_range() is *not* the final
end of the zap range, it is just the end of the current set of pte's.

There's a reason the old code *saved* the old end value. You've now
ripped that out, and use the "old_end" for something else entirely.

Your arm64 version of tlb_add_flush() then hides the bug you just
introduced by updating the end range for each page you encounter. But
quite frankly, I think your problems are all fundamental to that very
issue. You're playing games with start/end during the TLB flush
itself, which is not how those things were designed to work.

So now you break everything that *doesn't* do your arm games.

                   Linus

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 3e503831e042..ea41508d41f3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1194,11 +1194,10 @@  again:
 		 * then update the range to be the remaining
 		 * TLB range.
 		 */
-		old_end = tlb->end;
-		tlb->end = addr;
+		tlb->end = old_end = min(tlb->end, addr);
 		tlb_flush_mmu_tlbonly(tlb);
-		tlb->start = addr;
-		tlb->end = old_end;
+		tlb->start = old_end;
+		tlb->end = end;
 	}
 	pte_unmap_unlock(start_pte, ptl);