[Xen-devel,RFC,48/49] ARM: allocate two pages for struct vcpu

Message ID 20180209143937.28866-49-andre.przywara@linaro.org
State Superseded
Headers show
Series
  • New VGIC(-v2) implementation
Related show

Commit Message

Andre Przywara Feb. 9, 2018, 2:39 p.m.
At the moment we allocate exactly one page for struct vcpu on ARM, also
have a check in place to prevent it growing beyond 4KB.
As the struct includes the state of all 32 private (per-VCPU) interrupts,
we are at 3840 bytes on arm64 at the moment already. Growing the per-IRQ
VGIC structure even slightly makes the VCPU quickly exceed the 4K limit.
The new VGIC will need more space per virtual IRQ. I spent a few hours
trying to trim this down, but couldn't get it below 4KB, even with the
nasty hacks piling up to save some bytes here and there.
It turns out that beyond efficiency, maybe, there is no real technical
reason this struct has to fit in one page, so lifting the limit to two
pages seems like the most pragmatic solution.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
---
 xen/arch/arm/domain.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Julien Grall Feb. 19, 2018, 2:07 p.m. | #1
Hi,

On 09/02/18 14:39, Andre Przywara wrote:
> At the moment we allocate exactly one page for struct vcpu on ARM, also
> have a check in place to prevent it growing beyond 4KB.
> As the struct includes the state of all 32 private (per-VCPU) interrupts,
> we are at 3840 bytes on arm64 at the moment already. Growing the per-IRQ
> VGIC structure even slightly makes the VCPU quickly exceed the 4K limit.
> The new VGIC will need more space per virtual IRQ. I spent a few hours
> trying to trim this down, but couldn't get it below 4KB, even with the
> nasty hacks piling up to save some bytes here and there.
> It turns out that beyond efficiency, maybe, there is no real technical
> reason this struct has to fit in one page, so lifting the limit to two
> pages seems like the most pragmatic solution.

I looked briefly:

sizeof(struct vcpu):
	arm32: 3809
	arm64: 5248

Clearly, bumping to 8K for 32-bit as well is a no go for me.

For arm64:

struct vgic_irq {
	spinlock_t                 irq_lock;     /*     0     8 */
	struct list_head           lpi_list;     /*     8    16 */
	struct list_head           ap_list;      /*    24    16 */
	struct vcpu *              vcpu;         /*    40     8 */
	struct vcpu *              target_vcpu;  /*    48     8 */
	u32                        intid;        /*    56     4 */
	_Bool                      line_level;   /*    60     1 */
	_Bool                      pending_latch;/*    61     1 */
	_Bool                      active;       /*    62     1 */
	_Bool                      enabled;      /*    63     1 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	_Bool                      hw;           /*    64     1 */

	/* XXX 3 bytes hole, try to pack */

	atomic_t                   refcount;     /*    68     4 */
	u32                        hwintid;      /*    72     4 */
	union {
		u8                 targets;      /*           1 */
		u32                mpidr;        /*           4 */
	};                                       /*    76     4 */
	u8                         source;       /*    80     1 */
	u8                         priority;     /*    81     1 */

	/* XXX 2 bytes hole, try to pack */

	enum vgic_irq_config       config;       /*    84     4 */

	/* size: 88, cachelines: 2, members: 17 */
	/* sum members: 83, holes: 2, sum holes: 5 */
	/* last cacheline: 24 bytes */
};

   - There are 2 holes, with a total waste of 5 bytes per IRQ.
   - The bool fields could be turned into bool :1.
   - config is 4 bytes just for 2 values! Probably worth considered a 
different type (like a bool).

You multiple that saving per 32 and it will free a bit of space. Even if 
it is not going to reach the 4K, those small optimizing are not invasive.

Looking at the other structures, struct vgic_v{2,3}_cpu_if are 
duplicating struct gic_v{2,3}. The state is restored using the latter, 
but it looks like LRs will be saved in both structure... You probably 
want to look at streamlining that.

I might be convinced we need to bump the size of vCPU allocated, but for 
that I need to see that effort have been made to minimize the size of th 
new structures.

Cheers,

> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> ---
>   xen/arch/arm/domain.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 87bd493924..4dd34393f1 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -502,10 +502,13 @@ void dump_pageframe_info(struct domain *d)
>   struct vcpu *alloc_vcpu_struct(void)
>   {
>       struct vcpu *v;
> -    BUILD_BUG_ON(sizeof(*v) > PAGE_SIZE);
> -    v = alloc_xenheap_pages(0, 0);
> -    if ( v != NULL )
> +
> +    BUILD_BUG_ON(sizeof(*v) > 2 * PAGE_SIZE);
> +    v = alloc_xenheap_pages(1, 0);
> +    if ( v != NULL ) {
>           clear_page(v);
> +        clear_page((void *)v + PAGE_SIZE);
> +    }
>       return v;
>   }
>   
>

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 87bd493924..4dd34393f1 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -502,10 +502,13 @@  void dump_pageframe_info(struct domain *d)
 struct vcpu *alloc_vcpu_struct(void)
 {
     struct vcpu *v;
-    BUILD_BUG_ON(sizeof(*v) > PAGE_SIZE);
-    v = alloc_xenheap_pages(0, 0);
-    if ( v != NULL )
+
+    BUILD_BUG_ON(sizeof(*v) > 2 * PAGE_SIZE);
+    v = alloc_xenheap_pages(1, 0);
+    if ( v != NULL ) {
         clear_page(v);
+        clear_page((void *)v + PAGE_SIZE);
+    }
     return v;
 }