[RFC,0/3] target/mips: Make the number of TLB entries a CPU property

Message ID 20201013132535.3599453-1-f4bug@amsat.org

Message

Philippe Mathieu-Daudé Oct. 13, 2020, 1:25 p.m. UTC
Yocto developers have expressed interest in running a MIPS32
CPU with a custom number of TLB entries:
https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html

Help them by making the number of TLB entries a CPU property,
keeping our set of CPU definitions in sync with real hardware.

Please test/review,

Phil.

Philippe Mathieu-Daudé (3):
  target/mips: Make cpu_mips_realize_env() propagate Error
  target/mips: Store number of TLB entries in CPUMIPSState
  target/mips: Make the number of TLB entries a CPU property

 target/mips/cpu.h                |  1 +
 target/mips/internal.h           | 10 +++++++++-
 target/mips/cpu.c                | 12 ++++++++++--
 target/mips/translate.c          | 16 ++++++++++++++--
 target/mips/translate_init.c.inc |  2 +-
 5 files changed, 35 insertions(+), 6 deletions(-)

-- 
2.26.2

Comments

Richard Henderson Oct. 13, 2020, 11:11 p.m. UTC | #1
On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
> Yocto developers have expressed interest in running MIPS32
> CPU with custom number of TLB:
> https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
> 
> Help them by making the number of TLB entries a CPU property,
> keeping our set of CPU definitions in sync with real hardware.

You mean keeping the 34kf model within qemu in sync, rather than creating a
nonsense model that doesn't exist.

Question: is this cpu parameter useful for anything else?

Because the ideal solution for a CI loop is to use one of the mips cpu models
that has the hw page table walker (CP0C3_PW).  Having qemu refill the tlb
itself is massively faster.

We do not currently implement a mips cpu that has the PW.  When I downloaded
the document set in 2014, I only got the mips64-pra and neglected to get the
mips32-pra.  So I don't actually know if the PW applies to mips32.  I do know
that there's nothing in the kernel that ifdefs around it.

So:

(1) anyone know if the PW is incompatible with mips32?
(2) if not, was there any mips32 hw built with PW
    that we could model?
(3) if not, would a cpu parameter to force-enable PW
    for any r2+ cpu be more useful than frobbing the
    number of tlb entries?


r~
Victor Kamensky (kamensky) Oct. 14, 2020, 1:36 a.m. UTC | #2
Hi Philippe,

Thank you very much for looking at this. I gave your 3-patch
series a spin in the original setup, and as expected it works
great with the '-cpu 34Kf,tlb-entries=64' option.

If nobody objects, and your patches could be merged, we
would greatly appreciate it.

Thanks,
Victor
Richard Henderson Oct. 14, 2020, 2:22 a.m. UTC | #3
On 10/13/20 4:11 PM, Richard Henderson wrote:
> On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
>> Yocto developers have expressed interest in running MIPS32
>> CPU with custom number of TLB:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>>
>> Help them by making the number of TLB entries a CPU property,
>> keeping our set of CPU definitions in sync with real hardware.
> 
> You mean keeping the 34kf model within qemu in sync, rather than creating a
> nonsense model that doesn't exist.
> 
> Question: is this cpu parameter useful for anything else?
> 
> Because the ideal solution for a CI loop is to use one of the mips cpu models
> that has the hw page table walker (CP0C3_PW).  Having qemu being able to refill
> the tlb itself is massively faster.
> 
> We do not currently implement a mips cpu that has the PW.  When I downloaded

Bah, "mips32 cpu".

We do have the P5600 that does have it, though the code is wrapped up in
TARGET_MIPS64.  I'll also note that the code could be better placed [*].

> (1) anyone know if the PW incompatible with mips32?

I've since found a copy of the mips32-pra in the wayback machine and have
answered this as "no" -- PW is documented for mips32.

> (2) if not, was there any mips32 hw built with PW
>     that we could model?

But I still don't know about this.

A further question for the Yocto folks: could you make use of a 64-bit kernel
in testing a 32-bit userspace?

And I guess maybe we should update our recommendations in the docs.  Thoughts
on this, Phil?


r~


[*] Where it is now, it can't be used for gdb (mips_cpu_get_phys_page_debug).
When used there, we should not modify cpu state, i.e. actually insert the PTE
into the MIPS TLB, but we could still make use of the information available.
Victor Kamensky (kamensky) via Oct. 14, 2020, 3:21 a.m. UTC | #4
Hi Richard,

Please forgive my cumbersome mailing agent at work.
Please look inline for 'victor>'
Richard Purdie Oct. 14, 2020, 7:14 a.m. UTC | #5
On Wed, 2020-10-14 at 01:36 +0000, Victor Kamensky (kamensky) wrote:
> Thank you very much for looking at this. I gave a spin to
> your 3 patch series in original setup, and as expected with
> '-cpu 34Kf,tlb-entries=64' option it works great.
> 
> If nobody objects, and your patches could be merged, we
> would greatly appreciate it.

Speaking as one of the Yocto Project maintainers, this is really
helpful for us, thanks!

qemumips is one of our slowest platforms for automated testing so this
performance improvement helps a lot.

Cheers,

Richard
Richard Purdie Oct. 14, 2020, 7:26 a.m. UTC | #6
On Tue, 2020-10-13 at 19:22 -0700, Richard Henderson wrote:
> On 10/13/20 4:11 PM, Richard Henderson wrote:
> > On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
> > > Yocto developers have expressed interest in running MIPS32
> > > CPU with custom number of TLB:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
> > > 
> > > Help them by making the number of TLB entries a CPU property,
> > > keeping our set of CPU definitions in sync with real hardware.
> > 
> > You mean keeping the 34kf model within qemu in sync, rather than
> > creating a
> > nonsense model that doesn't exist.
> > 
> > Question: is this cpu parameter useful for anything else?
> > 
> > Because the ideal solution for a CI loop is to use one of the mips
> > cpu models
> > that has the hw page table walker (CP0C3_PW).  Having qemu being
> > able to refill
> > the tlb itself is massively faster.
> > 
> > We do not currently implement a mips cpu that has the PW.  When I
> > downloaded
> 
> Bah, "mips32 cpu".
> 
> We do have the P5600 that does has it, though the code is wrapped up
> in TARGET_MIPS64.  I'll also note that the code could be better
> placed [*]
> 
> > (1) anyone know if the PW incompatible with mips32?
> 
> I've since found a copy of the mips32-pra in the wayback machine and
> have answered this as "no" -- PW is documented for mips32.
> 
> > (2) if not, was there any mips32 hw built with PW
> >     that we could model?
> 
> But I still don't know about this.
> 
> A further question for the Yocto folks: could you make use of a 64-
> bit kernel in testing a 32-bit userspace?

We run testing of 64 bit kernel with 64 bit userspace and 32 bit kernel
with 32 bit userspace, we've tested that for years. I know some of our
users do use 64 bit kernels with 32 bit userspace and we do limited
testing of that for multilib support.

I think we did try testing an R2 setup but found little performance
change and I think it may have been unreliable so we didn't make the
switch. We did move to 34kf relatively recently as that did perform
marginally better and matched qemu's recommendations.

We've also run into a lot of problems with 32 bit mips in general if we
go over 256MB memory since that seems to trigger highmem and hangs
regularly for us. We're working on infrastructure to save out those
hung VMs to help us debug such issues but don't have that yet. It's not
regular enough and we don't have the expertise to debug it properly as
yet, unfortunately.

There is a question of how valid a 32 bit kernel is these days,
particularly given the memory issues we seem to run into there with
larger images.

Cheers,

Richard
Jiaxun Yang Oct. 14, 2020, 10:20 a.m. UTC | #7
On 2020/10/13 21:25, Philippe Mathieu-Daudé wrote:
> Allow changing the number of TLB entries for
> testing/tuning purpose.
>
> Example to force a 34Kf cpu with 64 TLB:
>
>    $ qemu-system-mipsel -cpu 34Kf,tlb-entries=64 ...
>
> This is helpful for developers of the Yocto Project [*]:
>
>    Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
>    MIPS CI loop. It was observed that in this case CI test execution
>    time was almost twice longer than 64bit MIPS variant that runs
>    under MIPS64R2-generic model. It was investigated and concluded
>    that the difference in number of TLBs 16 in 34Kf case vs 64 in
>    MIPS64R2-generic is responsible for most of CI real time execution
>    difference. With 16 TLBs, Linux user-land thrashes the TLB more
>    and needs to execute more instructions in TLB refill handler
>    calls; as a result it runs much longer.
>
> [*] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>
> Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
> Reported-by: Victor Kamensky <kamensky@cisco.com>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
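
The effect described in the quoted commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not QEMU code: it models a fully associative TLB with LRU replacement and one page per entry, whereas real MIPS refill handlers typically use random replacement and each JTLB entry maps a pair of pages. The function name and the workload (a loop cycling over 48 pages) are hypothetical, and the cyclic pattern is a worst case for LRU.

```python
from collections import OrderedDict


def count_tlb_misses(page_refs, entries):
    """Count misses for a fully associative TLB of `entries` slots.

    Assumption: LRU replacement and one page per entry, chosen only to
    keep the illustration deterministic.
    """
    tlb = OrderedDict()
    misses = 0
    for page in page_refs:
        if page in tlb:
            # Hit: mark the entry as most recently used.
            tlb.move_to_end(page)
        else:
            # Miss: this is where a refill handler would run.
            misses += 1
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used
            tlb[page] = True
    return misses


# Hypothetical workload: 100 passes over a 48-page working set.
refs = [i % 48 for i in range(4800)]
misses_16 = count_tlb_misses(refs, 16)  # working set does not fit
misses_64 = count_tlb_misses(refs, 64)  # working set fits entirely
print(misses_16, misses_64)
```

With 16 entries every reference in this pattern misses, while with 64 entries only the first touch of each page does. Real workloads fall somewhere in between, but each extra miss costs a trip through the guest's refill handler, which is the overhead the property is meant to reduce.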
Hi Philippe,

Could this property be named vtlb-entries?

MIPS R2 introduced the dual-TLB feature, and the entries specified here
are the number of VTLB entries, while the FTLB is another set of entries
with a fixed page size.

MIPS TCG hasn't implemented dual-TLB yet, but this name would prevent
future confusion.

Thanks.

- Jiaxun
Philippe Mathieu-Daudé Oct. 14, 2020, 10:54 a.m. UTC | #8
On 10/14/20 12:20 PM, Jiaxun Yang wrote:
> On 2020/10/13 21:25, Philippe Mathieu-Daudé wrote:
>> Allow changing the number of TLB entries for
>> testing/tuning purpose.
>>
>> Example to force a 34Kf cpu with 64 TLB:
>>
>>    $ qemu-system-mipsel -cpu 34Kf,tlb-entries=64 ...
>>
>> This is helpful for developers of the Yocto Project [*]:
>>
>>    Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
>>    MIPS CI loop. It was observed that in this case CI test execution
>>    time was almost twice longer than 64bit MIPS variant that runs
>>    under MIPS64R2-generic model. It was investigated and concluded
>>    that the difference in number of TLBs 16 in 34Kf case vs 64 in
>>    MIPS64R2-generic is responsible for most of CI real time execution
>>    difference. With 16 TLBs, Linux user-land thrashes the TLB more
>>    and needs to execute more instructions in TLB refill handler
>>    calls; as a result it runs much longer.
>>
>> [*] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>>
>> Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
>> Reported-by: Victor Kamensky <kamensky@cisco.com>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
> Hi Philippe,
>
> Could this property be named vtlb-entries?
>
> MIPS R2 introduced the dual-TLB feature, and the entries specified here
> are the number of VTLB entries, while the FTLB is another set of entries
> with a fixed page size.
>
> MIPS TCG hasn't implemented dual-TLB yet, but this name would prevent
> future confusion.

Sure, good idea.

I'll follow Richard's suggestion first, having a look at the P5600.

>
> Thanks.
>
> - Jiaxun
>
Philippe Mathieu-Daudé Oct. 14, 2020, 2:53 p.m. UTC | #9
On 10/14/20 9:14 AM, Richard Purdie wrote:
> On Wed, 2020-10-14 at 01:36 +0000, Victor Kamensky (kamensky) wrote:
>> Thank you very much for looking at this. I gave a spin to
>> your 3 patch series in original setup, and as expected with
>> '-cpu 34Kf,tlb-entries=64' option it works great.
>>
>> If nobody objects, and your patches could be merged, we
>> would greatly appreciate it.
> 
> Speaking as one of the Yocto Project maintainers, this is really
> helpful for us, thanks!
> 
> qemumips is one of our slowest platforms for automated testing so this
> performance improvement helps a lot.

Could you try Richard's suggestion? Using '-cpu P5600' instead?
It is available in Linux since v5.8.

> 
> Cheers,
> 
> Richard
> 
>
Victor Kamensky (kamensky) via Oct. 14, 2020, 8:20 p.m. UTC | #10
Just to keep everything on the same thread, here is a piece of
information I found:

I looked at "MIPS32® 34Kf™ Processor Core Datasheet" [1]

Page 8 in "Joint TLB (JTLB)" section says:

"The JTLB is a fully associative TLB cache containing 16, 32,
or 64-dual-entries mapping up to 128 virtual pages to their
corresponding physical addresses."

So the "34Kf-64tlb" cpu model I proposed turns out not to be "fictitious"
after all. Having 64 TLB entries is within this CPU's spec. It is not
clear why the original 34Kf model chose the worst-case scenario with
respect to the number of TLB entries. The commit log where 34Kf was
introduced does not have many details.

So IMO on the 34Kf route we have the following choices:

1) I can rephrase the commit message and resubmit the commit for the
"34Kf-64tlb" cpu model, if it could be merged.

2) We can bump up the number of TLBs to 64 in the existing 34Kf model,
since it is within the spec.

3) Use Phil's series; the tlb-entries cpu parameter would cover all
3 variants of 16, 32, 64 TLBs allowed by the 34Kf data sheet spec.
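
To make option 3 concrete, here is a minimal sketch of what restricting the property to the datasheet values could look like. It is an illustration only: the names `ALLOWED_34KF_JTLB_ENTRIES`, `parse_tlb_entries` and `mapped_pages` are hypothetical, and the thread does not say whether Phil's series validates the value this way. The factor of two reflects the "dual-entries" wording in the datasheet quote, where each JTLB entry maps a pair of virtual pages.

```python
# JTLB sizes permitted by the 34Kf datasheet quoted above.
ALLOWED_34KF_JTLB_ENTRIES = (16, 32, 64)


def parse_tlb_entries(value):
    """Hypothetical validator for a 'tlb-entries=N' style option."""
    entries = int(value)
    if entries not in ALLOWED_34KF_JTLB_ENTRIES:
        raise ValueError(
            f"tlb-entries={entries} is outside the 34Kf spec; "
            f"expected one of {ALLOWED_34KF_JTLB_ENTRIES}"
        )
    return entries


def mapped_pages(entries):
    """Each dual entry maps two virtual pages (an even/odd pair)."""
    return 2 * entries


print(mapped_pages(parse_tlb_entries("64")))
```

For 64 entries this gives 128 pages, matching the "up to 128 virtual pages" figure in the datasheet.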

Please see inline regarding the requested '-cpu P5600' testing. Look for 'victor2>'

[1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00419-2B-34Kf-DTS-01.20.pdf
Khem Raj Oct. 14, 2020, 8:53 p.m. UTC | #11
On Wed, Oct 14, 2020 at 1:20 PM Victor Kamensky (kamensky)
<kamensky@cisco.com> wrote:
>
> In order just to keep on the same thread, here is piece of information
> I found:
>
> I looked at "MIPS32® 34Kf™ Processor Core Datasheet" [1]
>
> Page 8 in "Joint TLB (JTLB)" section says:
>
> "The JTLB is a fully associative TLB cache containing 16, 32,
> or 64-dual-entries mapping up to 128 virtual pages to their
> corresponding physical addresses."
>
> So "34Kf-64tlb" cpu model I proposed turns out not to be "fictitious"
> after all. Having 64 TLBs is all within this CPU spec. It is not clear
> why original 34Kf model choose worst case scenario wrt
> TLB numbers. Commit log where 34Kf was introduced does not
> have much details.

Thanks for digging this information out of the CPU specs. It seems using
64 TLB entries as the default might be a good option for 34Kf, then.

>
> So IMO on 34Kf route we have the following choices:
>
> 1) I can rephrase commit message and resubmit commit for
> "34Kf-64tlb" cpu model, if it could be merged
>
> 2) We can bump up number of TLBs to 64 in existing 34Kf model
> since it is within the spec.

This looks like a good option since it is within spec and is backward compatible.

>
> 3) Use Phil's series and tlb-entries cpu parameter would cover all

I agree.

> 3 variants of 16,32,64 TLBs allowed by 34Kf data sheet spec.
>
> Please see inline wrt asked '-cpu P5600' testing. Look for 'victor2>'
>
> [1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00419-2B-34Kf-DTS-01.20.pdf
>
> ________________________________________
> From: Philippe Mathieu-Daudé <philippe.mathieu.daude@gmail.com> on behalf of Philippe Mathieu-Daudé <f4bug@amsat.org>
> Sent: Wednesday, October 14, 2020 7:53 AM
> To: Richard Purdie; Victor Kamensky (kamensky); qemu-devel@nongnu.org
> Cc: Aleksandar Rikalo; Khem Raj; Aleksandar Markovic; Aurelien Jarno; Richard Henderson
> Subject: Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property
>
> On 10/14/20 9:14 AM, Richard Purdie wrote:
> > On Wed, 2020-10-14 at 01:36 +0000, Victor Kamensky (kamensky) wrote:
> >> Thank you very much for looking at this. I gave a spin to
> >> your 3 patch series in original setup, and as expected with
> >> '-cpu 34Kf,tlb-entries=64' option it works great.
> >>
> >> If nobody objects, and your patches could be merged, we
> >> would greatly appreciate it.
> >
> > Speaking as one of the Yocto Project maintainers, this is really
> > helpful for us, thanks!
> >
> > qemumips is one of our slowest platforms for automated testing so this
> > performance improvement helps a lot.
>
> Could you try Richard's suggestion? Using '-cpu P5600' instead?
> It is available in Linux since v5.8.
>
> victor2> I tried the exact image that works on the 34Kf and 34Kf-64tlb
> victor2> models with '-cpu P5600'. It does not boot: it dies in init (systemd).
> victor2> I can look under gdb with the qemu -s -S options to see what is
> victor2> going on there, but it will take time.
> victor2> If someone has some clues about what might cause it, please let
> victor2> me know. Here is high-level information about the setup:
> victor2>    - qemu version is 5.1.0
> victor2>    - kernel base version is 5.8.9
> victor2>    - systemd version is 1_246.6
> victor2>    - user land CPU related build options "-meb -meb -mabi=32 -mhard-float -march=mips32r2 -mllsc -mips32r2"
>
> Thanks,
> Victor
>
> >
> > Cheers,
> >
> > Richard
> >
> >
Victor Kamensky (kamensky) via Oct. 15, 2020, 6:56 p.m. UTC | #12
Hi Guys,

I looked at the issue with the P5600 machine under gdb
on the kernel. arch_check_elf from arch/mips/kernel/elf.c
rejects our sysroot binaries with -ENOEXEC,
since our binaries do not have the EF_MIPS_NAN2008 ELF
header flag set and this CPU model does not have
cpu_has_nan_legacy, i.e. mips_use_nan_legacy=false.
So at the very least we would need to change our
user-land ABI compilation flags to cleanly match
the EF_MIPS_NAN2008 requirements. I am not sure whether
that is an option, or how it would impact older
CPUs.
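
The rejection can be modelled roughly as follows. This is a simplified sketch of the decision described above, not the kernel's actual arch_check_elf code: the function `check_nan_mode` and its parameters are hypothetical names, and only the NaN-encoding part of the check is modelled. The flag value 0x400 is the EF_MIPS_NAN2008 definition from the MIPS ELF headers.

```python
import errno

# ELF e_flags bit marking a binary that uses IEEE 754-2008 NaN encoding.
EF_MIPS_NAN2008 = 0x00000400


def check_nan_mode(e_flags, use_nan_legacy, use_nan_2008):
    """Return 0 if the binary's NaN encoding is supported, else -ENOEXEC.

    Simplified model: a binary either asks for 2008-style NaNs (flag set)
    or legacy NaNs (flag clear), and the CPU/kernel must support the one
    it asks for.
    """
    if e_flags & EF_MIPS_NAN2008:
        return 0 if use_nan_2008 else -errno.ENOEXEC
    return 0 if use_nan_legacy else -errno.ENOEXEC


# A legacy-NaN binary on a CPU without legacy NaN support is rejected.
print(check_nan_mode(0, use_nan_legacy=False, use_nan_2008=True))
# The same binary is accepted once legacy NaN handling is allowed.
print(check_nan_mode(0, use_nan_legacy=True, use_nan_2008=True))
```

The first call corresponds to the failure seen with '-cpu P5600'; rebuilding user-land with the 2008 NaN encoding would set the flag and take the other branch.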

For the sake of experiment I added the ieee754=relaxed kernel
option to override the mips_use_nan_legacy setting,
and the system shows some signs of life after that, but
then it hangs further down the road. I briefly
tried to look at this, but it is not clear what
is going on. At first look it seems that the system
is thrashing on nested do_page_fault calls. It might
be that something is missing in our kernel config, or
we are hitting some kernel bug, or a bug in the P5600 qemu
model. It is hard to tell right now.

Is it fair to say that we have put enough effort into
exploring the P5600 route, and that it does not seem to
work for us without substantial additional
work?

Is it possible to come back to the 34Kf route, making a
very small, localized, well-defined change
of bumping the number of TLBs for a model that we know
works well for us?

Since we figured out that the 34Kf spec allows 16,
32, or 64 TLBs, my first preference
would be to use Phil's patch series with the
review comments addressed. Additionally,
it would be great to set the number of 34Kf TLBs to 64
by default. If anyone out there (IMO unlikely)
depends on the model having only 16 TLBs,
they can use the cpu parameter to put it back
to 16. My second choice is to
accept the 34Kf-64tlb model, after I rephrase the
commit message.

Thanks,
Victor
Richard Henderson Oct. 16, 2020, 5:19 p.m. UTC | #13
On 10/15/20 11:56 AM, Victor Kamensky (kamensky) wrote:
> Is it possible to come back to the 34Kf route, making a
> very small, localized, well-defined change
> of bumping the number of TLBs for a model that we know
> works well for us?

Yes, thanks for testing.

I think we should also add a property to enable Config3.PW for any cpu, and see
how that gets on.  But simply adjusting the number of tlb entries is a good
start, and is the only thing that will work for older kernels.


r~