mbox series

[v4,0/3] pseries NUMA distance rework

Message ID 20200904010439.581957-1-danielhb413@gmail.com
Headers show
Series pseries NUMA distance rework | expand

Message

Daniel Henrique Barboza Sept. 4, 2020, 1:04 a.m. UTC
Hi,

Another spin for this rework of the existing NUMA support
for pSeries, prior to incoming changes in how we calculate
NUMA distance in the pseries machine. Changes here were
made based on David Gibson's review of v3.

This patches were rebased using David's ppc-for-5.2 tree.

changes from v3:
- patches 1 to 4 in v3 got pushed to ppc-for-5.2
- patch 1 (former 5):
    * moved spapr_register_hypercall() to spapr_numa.c, turning
h_home_node_associativity back to static
- patch 2 (former 6):
    * using memcpy as suggested in the review of patch 2 of v3
- patch 3 (former 7):
    * fixed typo, variable size
    * using G_STATIC_ASSERT() instead of g_assert()

v3 link: 
https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01726.html


Daniel Henrique Barboza (3):
  spapr: move h_home_node_associativity to spapr_numa.c
  spapr_numa: create a vcpu associativity helper
  spapr_numa: use spapr_numa_get_vcpu_assoc() in home_node hcall

 hw/ppc/spapr_hcall.c | 40 -------------------
 hw/ppc/spapr_numa.c  | 92 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 82 insertions(+), 50 deletions(-)

Comments

David Gibson Sept. 4, 2020, 4:10 a.m. UTC | #1
On Thu, Sep 03, 2020 at 10:04:38PM -0300, Daniel Henrique Barboza wrote:
> The work to be done in h_home_node_associativity() intersects
> with what is already done in spapr_numa_fixup_cpu_dt(). This
> patch creates a new helper, spapr_numa_get_vcpu_assoc(), to
> be used for both spapr_numa_fixup_cpu_dt() and
> h_home_node_associativity().
> 
> While we're at it, use memcpy() instead of loop assignment
> to created the returned array.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_numa.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 368c1a494d..980a6488bf 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -71,13 +71,15 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
>                        sizeof(spapr->numa_assoc_array[nodeid]))));
>  }
>  
> -int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
> -                            int offset, PowerPCCPU *cpu)
> +static uint32_t *spapr_numa_get_vcpu_assoc(SpaprMachineState *spapr,
> +                                          PowerPCCPU *cpu,
> +                                          uint *vcpu_assoc_size)
>  {
> -    uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
> -    uint32_t vcpu_assoc[vcpu_assoc_size];
> +    uint32_t *vcpu_assoc = NULL;
>      int index = spapr_get_vcpu_id(cpu);
> -    int i;
> +
> +    *vcpu_assoc_size = (NUMA_ASSOC_SIZE + 1) * sizeof(uint32_t);
> +    vcpu_assoc = g_malloc(*vcpu_assoc_size);
>  
>      /*
>       * VCPUs have an extra 'cpu_id' value in ibm,associativity
> @@ -86,16 +88,24 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>       * cpu_id last.
>       */
>      vcpu_assoc[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS + 1);
> +    memcpy(vcpu_assoc + 1, spapr->numa_assoc_array[cpu->node_id],
> +           MAX_DISTANCE_REF_POINTS);

That needs to be MAX_DISTANCE_REF_POINTS * sizeof(uint32_t), doesn't it?

> +    vcpu_assoc[MAX_DISTANCE_REF_POINTS + 1] = cpu_to_be32(index);
>  
> -    for (i = 1; i <= MAX_DISTANCE_REF_POINTS; i++) {
> -        vcpu_assoc[i] = spapr->numa_assoc_array[cpu->node_id][i];
> -    }
> +    return vcpu_assoc;
> +}
> +
> +int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
> +                            int offset, PowerPCCPU *cpu)
> +{
> +    g_autofree uint32_t *vcpu_assoc = NULL;
> +    uint vcpu_assoc_size;
>  
> -    vcpu_assoc[vcpu_assoc_size - 1] = cpu_to_be32(index);
> +    vcpu_assoc = spapr_numa_get_vcpu_assoc(spapr, cpu, &vcpu_assoc_size);
>  
>      /* Advertise NUMA via ibm,associativity */
>      return fdt_setprop(fdt, offset, "ibm,associativity",
> -                       vcpu_assoc, sizeof(vcpu_assoc));
> +                       vcpu_assoc, vcpu_assoc_size);>  }
>  
>
Daniel Henrique Barboza Sept. 4, 2020, 10:19 a.m. UTC | #2
On 9/4/20 7:02 AM, Greg Kurz wrote:
> On Thu,  3 Sep 2020 22:04:38 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> The work to be done in h_home_node_associativity() intersects
>> with what is already done in spapr_numa_fixup_cpu_dt(). This
>> patch creates a new helper, spapr_numa_get_vcpu_assoc(), to
>> be used for both spapr_numa_fixup_cpu_dt() and
>> h_home_node_associativity().
>>
>> While we're at it, use memcpy() instead of loop assignment
>> to created the returned array.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
> 
> Hi Daniel,
> 
> A few comments below.
> 
>>   hw/ppc/spapr_numa.c | 30 ++++++++++++++++++++----------
>>   1 file changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index 368c1a494d..980a6488bf 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -71,13 +71,15 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
>>                         sizeof(spapr->numa_assoc_array[nodeid]))));
>>   }
>>   
>> -int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>> -                            int offset, PowerPCCPU *cpu)
>> +static uint32_t *spapr_numa_get_vcpu_assoc(SpaprMachineState *spapr,
>> +                                          PowerPCCPU *cpu,
>> +                                          uint *vcpu_assoc_size)
>>   {
>> -    uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
>> -    uint32_t vcpu_assoc[vcpu_assoc_size];
>> +    uint32_t *vcpu_assoc = NULL;
> 
> You don't need to initialize this pointer since it is assigned a value
> unconditionally just below.
> 
>>       int index = spapr_get_vcpu_id(cpu);
>> -    int i;
>> +
>> +    *vcpu_assoc_size = (NUMA_ASSOC_SIZE + 1) * sizeof(uint32_t);
> 
> It's a bit weird to return something that is definitely a compile
> time constant by reference... What about introducing a macro ?
> 
> #define VCPU_NUMA_ASSOC_SIZE (NUMA_ASSOC_SIZE + 1)
> 
>> +    vcpu_assoc = g_malloc(*vcpu_assoc_size);
>>   
> 
>      vcpu_assoc = g_new(uint32_t, VCPU_NUMA_ASSOC_SIZE);
> 
>>       /*
>>        * VCPUs have an extra 'cpu_id' value in ibm,associativity
>> @@ -86,16 +88,24 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>>        * cpu_id last.
>>        */
>>       vcpu_assoc[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS + 1);
>> +    memcpy(vcpu_assoc + 1, spapr->numa_assoc_array[cpu->node_id],
>> +           MAX_DISTANCE_REF_POINTS);
>> +    vcpu_assoc[MAX_DISTANCE_REF_POINTS + 1] = cpu_to_be32(index);
>>   
> 
>      memcpy(vcpu_assoc + 1, spapr->numa_assoc_array[cpu->node_id],
>             (VPCU_ASSOC_SIZE - 2) * sizeof(uint32_t));
>      vcpu_assoc[VCPU_ASSOC_SIZE - 1] = cpu_to_be32(index);
> 
> I personally find more clear than using MAX_DISTANCE_REF_POINTS in an array
> that was just allocated with NUMA_ASSOC_SIZE... one has to check spapr.h
> to see that NUMA_ASSOC_SIZE == MAX_DISTANCE_REF_POINTS + 1


That all makes sense to me. I'll introduce a VCPU_ASSOC_SIZE in spapr_numa.h
and use it when operating the associativity for vcpus, both in this patch
and also in patch 3.


Thanks,


DHB


> 
>> -    for (i = 1; i <= MAX_DISTANCE_REF_POINTS; i++) {
>> -        vcpu_assoc[i] = spapr->numa_assoc_array[cpu->node_id][i];
>> -    }
>> +    return vcpu_assoc;
>> +}
>> +
>> +int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>> +                            int offset, PowerPCCPU *cpu)
>> +{
>> +    g_autofree uint32_t *vcpu_assoc = NULL;
>> +    uint vcpu_assoc_size;
>>   
>> -    vcpu_assoc[vcpu_assoc_size - 1] = cpu_to_be32(index);
>> +    vcpu_assoc = spapr_numa_get_vcpu_assoc(spapr, cpu, &vcpu_assoc_size);
>>   
>>       /* Advertise NUMA via ibm,associativity */
>>       return fdt_setprop(fdt, offset, "ibm,associativity",
>> -                       vcpu_assoc, sizeof(vcpu_assoc));
>> +                       vcpu_assoc, vcpu_assoc_size);
> 
>      return fdt_setprop(fdt, offset, "ibm,associativity",
>                         vcpu_assoc, VCPU_NUMA_ASSOC_SIZE * sizeof(uint32_t));
> 
>>   }
>>   
>>   
>
Greg Kurz Sept. 4, 2020, 10:33 a.m. UTC | #3
On Thu,  3 Sep 2020 22:04:39 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> The current implementation of h_home_node_associativity hard codes
> the values of associativity domains of the vcpus. Let's make
> it consider the values already initialized in spapr->numa_assoc_array,
> via the spapr_numa_get_vcpu_assoc() helper.
> 
> We want to set it and forget it, and for that we also need to
> assert that we don't overflow the registers of the hypercall.
> From R4 to R9 we can squeeze in 12 associativity domains, so
> let's assert that MAX_DISTANCE_REF_POINTS isn't greater
> than that.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_numa.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 980a6488bf..0a7e07fe60 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -181,10 +181,12 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>                                                target_ulong opcode,
>                                                target_ulong *args)
>  {
> +    g_autofree uint32_t *vcpu_assoc = NULL;
>      target_ulong flags = args[0];
>      target_ulong procno = args[1];
>      PowerPCCPU *tcpu;
> -    int idx;
> +    uint vcpu_assoc_size;
> +    int idx, assoc_idx;
>  
>      /* only support procno from H_REGISTER_VPA */
>      if (flags != 0x1) {
> @@ -196,16 +198,31 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>          return H_P2;
>      }
>  
> -    /* sequence is the same as in the "ibm,associativity" property */
> +    /*
> +     * Given that we want to be flexible with the sizes and indexes,
> +     * we must consider that there is a hard limit of how many
> +     * associativities domain we can fit in R4 up to R9, which
> +     * would be 12. Assert and bail if that's not the case.
> +     */
> +    G_STATIC_ASSERT(MAX_DISTANCE_REF_POINTS <= 12);
> +
> +    vcpu_assoc = spapr_numa_get_vcpu_assoc(spapr, tcpu, &vcpu_assoc_size);
> +    vcpu_assoc_size /= sizeof(uint32_t);

Using vcpu_assoc_size both as a size-in-bytes and a number of elements in
the array is gross... Anyway since this should go away if you introduce
a macro as suggested in the previous patch.

> +    /* assoc_idx starts at 1 to skip associativity size */
> +    assoc_idx = 1;
>  
> -    idx = 0;
>  #define ASSOCIATIVITY(a, b) (((uint64_t)(a) << 32) | \
>                               ((uint64_t)(b) & 0xffffffff))
> -    args[idx++] = ASSOCIATIVITY(0, 0);
> -    args[idx++] = ASSOCIATIVITY(0, tcpu->node_id);
> -    args[idx++] = ASSOCIATIVITY(procno, -1);
> -    for ( ; idx < 6; idx++) {
> -        args[idx] = -1;
> +
> +    for (idx = 0; idx < 6; idx++) {
> +        int32_t a, b;
> +
> +        a = assoc_idx < vcpu_assoc_size ?
> +            be32_to_cpu(vcpu_assoc[assoc_idx++]) : -1;
> +        b = assoc_idx < vcpu_assoc_size ?
> +            be32_to_cpu(vcpu_assoc[assoc_idx++]) : -1;
> +
> +        args[idx] = ASSOCIATIVITY(a, b);
>      }

Ouch this change is really giving me a headache... I understand that
tcpu->node_id and procno are now being read from vcpu_assoc[] but
it's hard to check what vcpu_assoc[assoc_idx++] points to, especially
with the ternary operator... Honestly, I'd rather keep that loop
unrolled with comments telling what's being read.

>  #undef ASSOCIATIVITY
>