Message ID | 20200901125645.118026-1-danielhb413@gmail.com |
---|---|
Series | pseries NUMA distance rework |
On Tue, Sep 01, 2020 at 09:56:43AM -0300, Daniel Henrique Barboza wrote: > In a similar fashion as the previous patch, let's move the > handling of ibm,associativity-lookup-arrays from spapr.c to > spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was > created, and spapr_dt_dynamic_reconfiguration_memory() can now > use it to advertise the lookup-arrays. > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > --- > hw/ppc/spapr.c | 25 ++---------------------- > hw/ppc/spapr_numa.c | 39 +++++++++++++++++++++++++++++++++++++ > include/hw/ppc/spapr_numa.h | 2 ++ > 3 files changed, 43 insertions(+), 23 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 172f965fe0..65d2ccd578 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -535,13 +535,10 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr, > void *fdt) > { > MachineState *machine = MACHINE(spapr); > - int nb_numa_nodes = machine->numa_state->num_nodes; > - int ret, i, offset; > + int ret, offset; > uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; > uint32_t prop_lmb_size[] = {cpu_to_be32(lmb_size >> 32), > cpu_to_be32(lmb_size & 0xffffffff)}; > - uint32_t *int_buf, *cur_index, buf_len; > - int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; > MemoryDeviceInfoList *dimms = NULL; > > /* > @@ -582,25 +579,7 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr, > return ret; > } > > - /* ibm,associativity-lookup-arrays */ > - buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t); > - cur_index = int_buf = g_malloc0(buf_len); > - int_buf[0] = cpu_to_be32(nr_nodes); > - int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */ > - cur_index += 2; > - for (i = 0; i < nr_nodes; i++) { > - uint32_t associativity[] = { > - cpu_to_be32(0x0), > - cpu_to_be32(0x0), > - cpu_to_be32(0x0), > - cpu_to_be32(i) > - }; > - memcpy(cur_index, associativity, sizeof(associativity)); > - cur_index += 4; > - } > - ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf, > - (cur_index - int_buf) * sizeof(uint32_t)); > - g_free(int_buf); > + ret = spapr_numa_write_assoc_lookup_arrays(spapr, fdt, offset); > > return ret; > } > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c > index b8882d209e..9eb4bdbe80 100644 > --- a/hw/ppc/spapr_numa.c > +++ b/hw/ppc/spapr_numa.c > @@ -75,6 +75,45 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt, > vcpu_assoc, sizeof(vcpu_assoc)); > } > > + > +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt, > + int offset) > +{ > + MachineState *machine = MACHINE(spapr); > + SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); > + int nb_numa_nodes = machine->numa_state->num_nodes; > + int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; > + uint32_t *int_buf, *cur_index, buf_len; > + int ret, i, j; > + > + /* ibm,associativity-lookup-arrays */ > + buf_len = (nr_nodes * MAX_DISTANCE_REF_POINTS + 2) * sizeof(uint32_t); > + cur_index = int_buf = g_malloc0(buf_len); > + int_buf[0] = cpu_to_be32(nr_nodes); > + /* Number of entries per associativity list */ > + int_buf[1] = cpu_to_be32(MAX_DISTANCE_REF_POINTS); > + cur_index += 2; > + for (i = 0; i < nr_nodes; i++) { > + /* > + * For the lookup-array we use the ibm,associativity array, > + * from numa_assoc_array. without the first element (size). 
> + */ > + uint32_t associativity[MAX_DISTANCE_REF_POINTS]; > + > + for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) { > + associativity[j] = smc->numa_assoc_array[i][j + 1]; > + } > + > + memcpy(cur_index, associativity, sizeof(associativity)); AFAICT, you could just use a single memcpy() to copy from the numa_assoc_array() into the property here, rather than using a loop and temporary array. > + cur_index += 4; Shouldn't this be += MAX_DISTANCE_REF_POINTS? > + } > + ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf, > + (cur_index - int_buf) * sizeof(uint32_t)); > + g_free(int_buf); > + > + return ret; > +} > + > /* > * Helper that writes ibm,associativity-reference-points and > * max-associativity-domains in the RTAS pointed by @rtas > diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h > index f92fb4f28a..f6127501a6 100644 > --- a/include/hw/ppc/spapr_numa.h > +++ b/include/hw/ppc/spapr_numa.h > @@ -22,6 +22,8 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt, > int offset, int nodeid); > int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt, > int offset, PowerPCCPU *cpu); > +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt, > + int offset); > > > #endif /* HW_SPAPR_NUMA_H */
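For reference, a minimal sketch of the copy loop as posted, with only the stride changed to answer the second comment above (it keeps the temporary array, and assumes, as in the patch, that numa_assoc_array already holds be32 values):

    for (i = 0; i < nr_nodes; i++) {
        uint32_t associativity[MAX_DISTANCE_REF_POINTS];

        /* skip element 0 of ibm,associativity, which is the list size */
        for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) {
            associativity[j] = smc->numa_assoc_array[i][j + 1];
        }

        memcpy(cur_index, associativity, sizeof(associativity));
        /* advance by the declared entry count, not a hardcoded 4 */
        cur_index += MAX_DISTANCE_REF_POINTS;
    }

With MAX_DISTANCE_REF_POINTS currently equal to 4 the two strides are numerically the same, but the hardcoded constant would silently corrupt the property layout if the macro ever grows.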
On Tue, Sep 01, 2020 at 09:56:38AM -0300, Daniel Henrique Barboza wrote: > Hi, > > Following the reviews of the first version [1], specially this > reply from David [2], I decided to take a step back and refactor > all the code in hw/ppc/spapr* that operates with ibm,associativity, > ibm,associativity-reference-points and ibm,max-associativity-domains. > > A new file named 'spapr_numa.c' was created to gather all the > associativity related code into helpers that write NUMA/associativity > related info to the FDT. These helpers are then used in other > spapr_* files. This allows us to change NUMA related code in a > single location, instead of searching every file to see where is > associativity being written and how, and all the soon to get > more complex logic can be contained in spapr_numa.c. I consider > the end result to be better than what I ended up doing in v1. > > Unlike v1, there is no NUMA distance change being done in this series. > Later on, the hub of the new NUMA distance calculation will be > spapr_numa_associativity_init(), where we'll take into consideration > user input from numa_states, handle sizes to what the PAPR kernel > understands and establish assoaciativity domains between the NUMA > nodes. Patches 1..4 applied to ppc-for-5.2. Patch 5 has some nits I've commented on. > > > Changes from v1: > - all the patches that did guest visible changes were removed. They > will be re-submitted in a follow-up series after this one. > - patch 02 from v1 will be reworked and reposted in the follow-up > series as well. > - version 1 link: > https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html > > > > These patches were rebased using David's ppc-for-5.2 tree. Github > repo with the patches applied: > > https://github.com/danielhb/qemu/tree/spapr_numa_v2 > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html > [2] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg04661.html > > Daniel Henrique Barboza (7): > ppc: introducing spapr_numa.c NUMA code helper > ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static > spapr: introduce SpaprMachineClass::numa_assoc_array > spapr, spapr_numa: handle vcpu ibm,associativity > spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c > spapr_numa: move NVLink2 associativity handling to spapr_numa.c > spapr_hcall: h_home_node_associativity now reads numa_assoc_array > > hw/ppc/meson.build | 3 +- > hw/ppc/spapr.c | 91 +++--------------- > hw/ppc/spapr_hcall.c | 16 +++- > hw/ppc/spapr_numa.c | 172 ++++++++++++++++++++++++++++++++++ > hw/ppc/spapr_nvdimm.c | 37 ++++---- > hw/ppc/spapr_pci.c | 9 +- > hw/ppc/spapr_pci_nvlink2.c | 19 +--- > include/hw/ppc/spapr.h | 13 ++- > include/hw/ppc/spapr_numa.h | 32 +++++++ > include/hw/ppc/spapr_nvdimm.h | 3 +- > 10 files changed, 268 insertions(+), 127 deletions(-) > create mode 100644 hw/ppc/spapr_numa.c > create mode 100644 include/hw/ppc/spapr_numa.h >
On Thu, Sep 03, 2020 at 11:35:39AM +1000, David Gibson wrote: > On Tue, Sep 01, 2020 at 09:56:38AM -0300, Daniel Henrique Barboza wrote: > > Hi, > > > > Following the reviews of the first version [1], specially this > > reply from David [2], I decided to take a step back and refactor > > all the code in hw/ppc/spapr* that operates with ibm,associativity, > > ibm,associativity-reference-points and ibm,max-associativity-domains. > > > > A new file named 'spapr_numa.c' was created to gather all the > > associativity related code into helpers that write NUMA/associativity > > related info to the FDT. These helpers are then used in other > > spapr_* files. This allows us to change NUMA related code in a > > single location, instead of searching every file to see where is > > associativity being written and how, and all the soon to get > > more complex logic can be contained in spapr_numa.c. I consider > > the end result to be better than what I ended up doing in v1. > > > > Unlike v1, there is no NUMA distance change being done in this series. > > Later on, the hub of the new NUMA distance calculation will be > > spapr_numa_associativity_init(), where we'll take into consideration > > user input from numa_states, handle sizes to what the PAPR kernel > > understands and establish assoaciativity domains between the NUMA > > nodes. > > Patches 1..4 applied to ppc-for-5.2. Patch 5 has some nits I've > commented on. Ah, sorry, I realised I missed something. Patches 1..2 are still applied, but patch 3 has a nit large enough to call for a respin.
On 9/2/20 10:46 PM, David Gibson wrote: > On Tue, Sep 01, 2020 at 09:56:45AM -0300, Daniel Henrique Barboza wrote: >> home_node_associativity reply now uses the associativity >> values for tcpu->node_id provided by numa_assoc_array. >> >> This will avoid further changes in this code when numa_assoc_array >> changes values, but it won't be enough to prevent further changes >> if (falar aqui q se mudar o tamanho do array tem q mexer nessa >> funcao tambem, falar q a macro associativity() deixa a automacao >> de tudo mto unreadable) > > Uh.. I'm guessing that was a note to yourself you intended to > translate before publishing? Ops! Forgot to edit it :| > >> >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> >> --- >> hw/ppc/spapr_hcall.c | 16 ++++++++++++++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c >> index c1d01228c6..2ec30efdcb 100644 >> --- a/hw/ppc/spapr_hcall.c >> +++ b/hw/ppc/spapr_hcall.c >> @@ -1878,9 +1878,13 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu, >> target_ulong opcode, >> target_ulong *args) > > You could move this function to spapr_numa.c as well (the name's a bit > misleading, but spapr_hcall.c isn't really intended to hold *all* > hcall implementations, just the ones that don't have somewhere better > to live). Ok! > >> { >> + SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); >> target_ulong flags = args[0]; >> target_ulong procno = args[1]; >> PowerPCCPU *tcpu; >> + uint32_t assoc_domain1; >> + uint32_t assoc_domain2; >> + uint32_t assoc_domain3; >> int idx; >> >> /* only support procno from H_REGISTER_VPA */ >> @@ -1893,13 +1897,21 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu, >> return H_P2; >> } >> >> + /* >> + * Index 0 is the ibm,associativity size of the node, >> + * which isn't relevant here. >> + */ >> + assoc_domain1 = smc->numa_assoc_array[tcpu->node_id][1]; >> + assoc_domain2 = smc->numa_assoc_array[tcpu->node_id][2]; >> + assoc_domain3 = smc->numa_assoc_array[tcpu->node_id][3]; > > Using all these temporaries is a little ugly. Maybe do something like: > uint32_t *assoc = smc->numa_assoc_array[tcpu->node_id]; > > Then just use assoc[1], assoc[2] etc. below. > >> + >> /* sequence is the same as in the "ibm,associativity" property */ >> >> idx = 0; >> #define ASSOCIATIVITY(a, b) (((uint64_t)(a) << 32) | \ >> ((uint64_t)(b) & 0xffffffff)) >> - args[idx++] = ASSOCIATIVITY(0, 0); >> - args[idx++] = ASSOCIATIVITY(0, tcpu->node_id); >> + args[idx++] = ASSOCIATIVITY(assoc_domain1, assoc_domain2); >> + args[idx++] = ASSOCIATIVITY(assoc_domain3, tcpu->node_id); >> args[idx++] = ASSOCIATIVITY(procno, -1); >> for ( ; idx < 6; idx++) { >> args[idx] = -1; > > Better yet would be to make this handle an arbitrary length of assoc > array, further isolating this from the specifics of how we construct > the arrays. Ideally, you'd call a common path with > spapr_numa_fixup_cpu_dt() to get the assoc array for a cpu, then here > just translate it into the in-register format the hcall expects. Since we're moving this to spapr_numa.c then I guess it's ok to add more juggling to handle an arbitrary size. I got a bit nervous about adding too much stuff here in spapr_hcall.c. Thanks, DHB >
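A rough sketch of that direction follows; the helper name, signature and placement are hypothetical, not the committed code. vcpu_assoc is assumed to be the be32 list taken from numa_assoc_array for tcpu->node_id, with the size at index 0 and an even number of domains, as with MAX_DISTANCE_REF_POINTS == 4:

    static void home_node_assoc_fill_args(const uint32_t *vcpu_assoc,
                                          target_ulong procno,
                                          target_ulong *args)
    {
        int nr_domains = be32_to_cpu(vcpu_assoc[0]);
        int idx = 0, i;

        /* pack the associativity domains two per 64-bit return register */
        for (i = 1; i + 1 <= nr_domains && idx < 6; i += 2) {
            args[idx++] = ((uint64_t)be32_to_cpu(vcpu_assoc[i]) << 32) |
                          be32_to_cpu(vcpu_assoc[i + 1]);
        }
        if (idx < 6) {
            /* procno follows the domains, as in the current hcall code */
            args[idx++] = ((uint64_t)procno << 32) | 0xffffffffull;
        }
        while (idx < 6) {
            args[idx++] = -1;
        }
    }

h_home_node_associativity() would then reduce to looking up tcpu and calling the helper, and whatever builds the per-vcpu list could be shared with spapr_numa_fixup_cpu_dt(), as suggested above.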
On 9/2/20 10:34 PM, David Gibson wrote: > On Tue, Sep 01, 2020 at 09:56:43AM -0300, Daniel Henrique Barboza wrote: >> In a similar fashion as the previous patch, let's move the >> handling of ibm,associativity-lookup-arrays from spapr.c to >> spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was >> created, and spapr_dt_dynamic_reconfiguration_memory() can now >> use it to advertise the lookup-arrays. >> >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> >> --- >> hw/ppc/spapr.c | 25 ++---------------------- >> hw/ppc/spapr_numa.c | 39 +++++++++++++++++++++++++++++++++++++ >> include/hw/ppc/spapr_numa.h | 2 ++ >> 3 files changed, 43 insertions(+), 23 deletions(-) >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index 172f965fe0..65d2ccd578 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -535,13 +535,10 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr, >> void *fdt) >> { >> MachineState *machine = MACHINE(spapr); >> - int nb_numa_nodes = machine->numa_state->num_nodes; >> - int ret, i, offset; >> + int ret, offset; >> uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE; >> uint32_t prop_lmb_size[] = {cpu_to_be32(lmb_size >> 32), >> cpu_to_be32(lmb_size & 0xffffffff)}; >> - uint32_t *int_buf, *cur_index, buf_len; >> - int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; >> MemoryDeviceInfoList *dimms = NULL; >> >> /* >> @@ -582,25 +579,7 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr, >> return ret; >> } >> >> - /* ibm,associativity-lookup-arrays */ >> - buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t); >> - cur_index = int_buf = g_malloc0(buf_len); >> - int_buf[0] = cpu_to_be32(nr_nodes); >> - int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */ >> - cur_index += 2; >> - for (i = 0; i < nr_nodes; i++) { >> - uint32_t associativity[] = { >> - cpu_to_be32(0x0), >> - cpu_to_be32(0x0), >> - cpu_to_be32(0x0), >> - cpu_to_be32(i) >> - }; >> - memcpy(cur_index, associativity, sizeof(associativity)); >> - cur_index += 4; >> - } >> - ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf, >> - (cur_index - int_buf) * sizeof(uint32_t)); >> - g_free(int_buf); >> + ret = spapr_numa_write_assoc_lookup_arrays(spapr, fdt, offset); >> >> return ret; >> } >> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c >> index b8882d209e..9eb4bdbe80 100644 >> --- a/hw/ppc/spapr_numa.c >> +++ b/hw/ppc/spapr_numa.c >> @@ -75,6 +75,45 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt, >> vcpu_assoc, sizeof(vcpu_assoc)); >> } >> >> + >> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt, >> + int offset) >> +{ >> + MachineState *machine = MACHINE(spapr); >> + SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); >> + int nb_numa_nodes = machine->numa_state->num_nodes; >> + int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1; >> + uint32_t *int_buf, *cur_index, buf_len; >> + int ret, i, j; >> + >> + /* ibm,associativity-lookup-arrays */ >> + buf_len = (nr_nodes * MAX_DISTANCE_REF_POINTS + 2) * sizeof(uint32_t); >> + cur_index = int_buf = g_malloc0(buf_len); >> + int_buf[0] = cpu_to_be32(nr_nodes); >> + /* Number of entries per associativity list */ >> + int_buf[1] = cpu_to_be32(MAX_DISTANCE_REF_POINTS); >> + cur_index += 2; >> + for (i = 0; i < nr_nodes; i++) { >> + /* >> + * For the lookup-array we use the ibm,associativity array, >> + * from numa_assoc_array. without the first element (size). 
>> + */ >> + uint32_t associativity[MAX_DISTANCE_REF_POINTS]; >> + >> + for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) { >> + associativity[j] = smc->numa_assoc_array[i][j + 1]; >> + } >> + >> + memcpy(cur_index, associativity, sizeof(associativity)); > > AFAICT, you could just use a single memcpy() to copy from the > numa_assoc_array() into the property here, rather than using a loop > and temporary array. I remember that I was having some weird problems with memcpy() and numa_assoc_array and this is how I got around it. I'll try to sort it out again. > >> + cur_index += 4; > > Shouldn't this be += MAX_DISTANCE_REF_POINTS? Yeah it should. Good catch. > >> + } >> + ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf, >> + (cur_index - int_buf) * sizeof(uint32_t)); >> + g_free(int_buf); >> + >> + return ret; >> +} >> + >> /* >> * Helper that writes ibm,associativity-reference-points and >> * max-associativity-domains in the RTAS pointed by @rtas >> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h >> index f92fb4f28a..f6127501a6 100644 >> --- a/include/hw/ppc/spapr_numa.h >> +++ b/include/hw/ppc/spapr_numa.h >> @@ -22,6 +22,8 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt, >> int offset, int nodeid); >> int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt, >> int offset, PowerPCCPU *cpu); >> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt, >> + int offset); >> >> >> #endif /* HW_SPAPR_NUMA_H */ >
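For the record, the single-memcpy variant suggested above could look roughly like this (a sketch, assuming each numa_assoc_array row is a contiguous uint32_t array of at least MAX_DISTANCE_REF_POINTS + 1 be32 entries, size first):

    for (i = 0; i < nr_nodes; i++) {
        /* copy the MAX_DISTANCE_REF_POINTS domains, skipping the size at [0] */
        memcpy(cur_index, &smc->numa_assoc_array[i][1],
               MAX_DISTANCE_REF_POINTS * sizeof(uint32_t));
        cur_index += MAX_DISTANCE_REF_POINTS;
    }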
On 9/2/20 10:51 PM, David Gibson wrote: > On Tue, Sep 01, 2020 at 09:56:41AM -0300, Daniel Henrique Barboza wrote: >> The next step to centralize all NUMA/associativity handling in >> the spapr machine is to create a 'one stop place' for all >> things ibm,associativity. >> >> This patch introduces numa_assoc_array, a 2 dimensional array >> that will store all ibm,associativity arrays of all NUMA nodes. >> This array is initialized in a new spapr_numa_associativity_init() >> function, called in spapr_machine_init(). It is being initialized >> with the same values used in other ibm,associativity properties >> around spapr files (i.e. all zeros, last value is node_id). >> The idea is to remove all hardcoded definitions and FDT writes >> of ibm,associativity arrays, doing instead a call to the new >> helper spapr_numa_write_associativity_dt() helper, that will >> be able to write the DT with the correct values. >> >> We'll start small, handling the trivial cases first. The >> remaining instances of ibm,associativity will be handled >> next. >> >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > > The idea is great, but there's one small but significant problem here: > >> +void spapr_numa_associativity_init(MachineState *machine) >> +{ >> + SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine); >> + int nb_numa_nodes = machine->numa_state->num_nodes; >> + int i; >> + >> + /* >> + * For all associativity arrays: first position is the size, >> + * position MAX_DISTANCE_REF_POINTS is always the numa_id, >> + * represented by the index 'i'. >> + * >> + * This will break on sparse NUMA setups, when/if QEMU starts >> + * to support it, because there will be no more guarantee that >> + * 'i' will be a valid node_id set by the user. >> + */ >> + for (i = 0; i < nb_numa_nodes; i++) { >> + smc->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS); >> + smc->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i); > > This initialization is called on a machine *instance*, which means it > should treat the machine class as read-only. i.e. the > numa_assoc_array should be in the SpaprMachineState, rather than the > class. > > I mean, we'd get away with it in practice, since there's only ever > likely to be a single machine instance, but still we should correct > this. Got it. I'll move it to SpaprMachineState. This will also spare a handful of lines everywhere else since I was instantiating the class just to manipulate the matrix (and now, in hindsight, I figured that this was a warning about the weirdness of what I was doing). Thanks, DHB >
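A minimal sketch of the init helper after that move; the field placement and the signature are assumptions for illustration, not the final code:

    /*
     * Assumed declaration in SpaprMachineState (instead of SpaprMachineClass):
     *     uint32_t numa_assoc_array[MAX_NODES][MAX_DISTANCE_REF_POINTS + 1];
     */
    void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                       MachineState *machine)
    {
        int nb_numa_nodes = machine->numa_state->num_nodes;
        int i;

        /* element 0 is the list size, the last element is the node id */
        for (i = 0; i < nb_numa_nodes; i++) {
            spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
            spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
        }
    }

With the array on the machine instance, callers such as spapr_numa_write_associativity_dt() can use the SpaprMachineState they already have instead of fetching the class, which is the line-count saving mentioned above.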