[Xen-devel,v3,12/13] xen/arm: Add the property "protected-devices" in the hypervisor node

Message ID 1394552999-14171-13-git-send-email-julien.grall@linaro.org
State Superseded, archived
Headers show

Commit Message

Julien Grall March 11, 2014, 3:49 p.m.
DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
protected devices should not use it.

Only Xen is abled to know if an IOMMU protects the device. The new property
"protected-devices" is a list of device phandles protected by an IOMMU.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    This patch *MUST NOT* be applied until we agreed on a device binding
    the device tree folks. DOM0 can run safely with swiotlb on protected
    devices while LVM is not used for guest disk.

    Changes in v2:
        - Patch added
---
 xen/arch/arm/domain_build.c |   51 ++++++++++++++++++++++++++++++++++++++-----
 xen/arch/arm/kernel.h       |    3 +++
 2 files changed, 48 insertions(+), 6 deletions(-)

Comments

Ian Campbell March 18, 2014, 4:48 p.m. | #1
On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> protected devices should not use it.
> 
> Only Xen is abled to know if an IOMMU protects the device. The new property
> "protected-devices" is a list of device phandles protected by an IOMMU.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> 
> ---
>     This patch *MUST NOT* be applied until we agreed on a device binding
>     the device tree folks. DOM0 can run safely with swiotlb on protected
>     devices while LVM is not used for guest disk.

LVM works these days I think.

> 
>     Changes in v2:
>         - Patch added
> ---
>  xen/arch/arm/domain_build.c |   51 ++++++++++++++++++++++++++++++++++++++-----
>  xen/arch/arm/kernel.h       |    3 +++
>  2 files changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 2438aa0..565784a 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -324,19 +324,22 @@ static int make_memory_node(const struct domain *d,
>      return res;
>  }
>  
> -static int make_hypervisor_node(struct domain *d,
> -                                void *fdt, const struct dt_device_node *parent)
> +static int make_hypervisor_node(struct domain *d, struct kernel_info *kinfo,
> +                                const struct dt_device_node *parent)
>  {
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>          "xen,xen";
>      __be32 reg[4];
>      gic_interrupt_t intr;
> -    __be32 *cells;
> +    __be32 *cells, *_cells;
>      int res;
>      int addrcells = dt_n_addr_cells(parent);
>      int sizecells = dt_n_size_cells(parent);
>      paddr_t gnttab_start, gnttab_size;
> +    const struct dt_device_node *dev;
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +    void *fdt = kinfo->fdt;
>  
>      DPRINT("Create hypervisor node\n");
>  
> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>  
> +    if ( kinfo->num_dev_protected )
> +    {
> +        /* Don't need to take dtdevs_lock here */

Why not? Please explain in the comment.

> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
> +                              dt_size_to_cells(sizeof(dt_phandle)));
> +        if ( !cells )
> +            return -FDT_ERR_XEN(ENOMEM);
> +
> +        _cells = cells;

Odd numbers of leading _ are reserved for the compiler IIRC. Even
numbers are reserved for the libc/environment which is why we can get
away with such names in hypervisor context.

But lets just skirt the whole issue and pick a name which doesn't use a
leading _. cells_iter or c or something.

Is there no interface to say "make an fdt_property(name, size)"
returning the data to be filled in?

Ian.
Julien Grall March 18, 2014, 8:09 p.m. | #2
Hi Ian,

On 03/18/2014 04:48 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
>> protected devices should not use it.
>>
>> Only Xen is abled to know if an IOMMU protects the device. The new property
>> "protected-devices" is a list of device phandles protected by an IOMMU.
>>
>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>
>> ---
>>     This patch *MUST NOT* be applied until we agreed on a device binding
>>     the device tree folks. DOM0 can run safely with swiotlb on protected
>>     devices while LVM is not used for guest disk.
> 
> LVM works these days I think.

With this patch series applied, LVM will be broken if the hard drive is
protected by an IOMMU. It's the case on midway, the platform will crash
just after the guest begins to boot.

>> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
>>      if ( res )
>>          return res;
>>  
>> +    if ( kinfo->num_dev_protected )
>> +    {
>> +        /* Don't need to take dtdevs_lock here */
> 
> Why not? Please explain in the comment.

Because, building dom0 is only done with 1 CPU online (e.g CPU0). I
though it was obvious, I will update the comment.

>> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
>> +                              dt_size_to_cells(sizeof(dt_phandle)));
>> +        if ( !cells )
>> +            return -FDT_ERR_XEN(ENOMEM);
>> +
>> +        _cells = cells;
> 
> Odd numbers of leading _ are reserved for the compiler IIRC. Even
> numbers are reserved for the libc/environment which is why we can get
> away with such names in hypervisor context.
> 
> But lets just skirt the whole issue and pick a name which doesn't use a
> leading _. cells_iter or c or something.
> 
> Is there no interface to say "make an fdt_property(name, size)"
> returning the data to be filled in?

No, every helpers request to have an input data in parameters.

I will rename _cells into cells_iter.

Regards,
Ian Campbell March 19, 2014, 10:33 a.m. | #3
On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:48 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> >> protected devices should not use it.
> >>
> >> Only Xen is abled to know if an IOMMU protects the device. The new property
> >> "protected-devices" is a list of device phandles protected by an IOMMU.
> >>
> >> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >>
> >> ---
> >>     This patch *MUST NOT* be applied until we agreed on a device binding
> >>     the device tree folks. DOM0 can run safely with swiotlb on protected
> >>     devices while LVM is not used for guest disk.
> > 
> > LVM works these days I think.
> 
> With this patch series applied, LVM will be broken if the hard drive is
> protected by an IOMMU.

How/why?

> It's the case on midway, the platform will crash just after the guest
> begins to boot.

This configuration works today (osstest tests it) so this would be a
regression. Can you sort this please?

> >> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
> >>      if ( res )
> >>          return res;
> >>  
> >> +    if ( kinfo->num_dev_protected )
> >> +    {
> >> +        /* Don't need to take dtdevs_lock here */
> > 
> > Why not? Please explain in the comment.
> 
> Because, building dom0 is only done with 1 CPU online (e.g CPU0). I
> though it was obvious, I will update the comment.

Locking is never obvious IMHO.

> >> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
> >> +                              dt_size_to_cells(sizeof(dt_phandle)));
> >> +        if ( !cells )
> >> +            return -FDT_ERR_XEN(ENOMEM);
> >> +
> >> +        _cells = cells;
> > 
> > Odd numbers of leading _ are reserved for the compiler IIRC. Even
> > numbers are reserved for the libc/environment which is why we can get
> > away with such names in hypervisor context.
> > 
> > But lets just skirt the whole issue and pick a name which doesn't use a
> > leading _. cells_iter or c or something.
> > 
> > Is there no interface to say "make an fdt_property(name, size)"
> > returning the data to be filled in?
> 
> No, every helpers request to have an input data in parameters.

fdt_setprop_inplace isn't helpful because you need to be able to create
the empty prop, which doesn't exist. Oh well.


> I will rename _cells into cells_iter.

Thanks.

Ian.
Julien Grall April 3, 2014, 9:51 p.m. | #4
Hi Ian,

Sorry, I forgot to answer to this email...

On 19/03/14 10:33, Ian Campbell wrote:
> On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
>> Hi Ian,
>>
>> On 03/18/2014 04:48 PM, Ian Campbell wrote:
>>> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>>>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
>>>> protected devices should not use it.
>>>>
>>>> Only Xen is abled to know if an IOMMU protects the device. The new property
>>>> "protected-devices" is a list of device phandles protected by an IOMMU.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>>>
>>>> ---
>>>>      This patch *MUST NOT* be applied until we agreed on a device binding
>>>>      the device tree folks. DOM0 can run safely with swiotlb on protected
>>>>      devices while LVM is not used for guest disk.
>>>
>>> LVM works these days I think.
>>
>> With this patch series applied, LVM will be broken if the hard drive is
>> protected by an IOMMU.
>
> How/why?

If the guest is using LVM for its block, bouncing via the swiotlb with 
IOMMU enabled will result to wrong mapping. If I remember correctly it's 
because SWIOTLB DMA address is a physical address and not an IPA. So the 
IOMMU will rejected the request.

So we have to by-pass swiotlb in this case.

>
>> It's the case on midway, the platform will crash just after the guest
>> begins to boot.
>
> This configuration works today (osstest tests it) so this would be a
> regression. Can you sort this please?

As said above with IOMMU enabled I won't be able to sort it until my 
patch series for Linux will be upstreamed.

Regards,
Ian Campbell April 4, 2014, 9:40 a.m. | #5
On Thu, 2014-04-03 at 22:51 +0100, Julien Grall wrote:
> Hi Ian,
> 
> Sorry, I forgot to answer to this email...
> 
> On 19/03/14 10:33, Ian Campbell wrote:
> > On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 03/18/2014 04:48 PM, Ian Campbell wrote:
> >>> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >>>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> >>>> protected devices should not use it.
> >>>>
> >>>> Only Xen is abled to know if an IOMMU protects the device. The new property
> >>>> "protected-devices" is a list of device phandles protected by an IOMMU.
> >>>>
> >>>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >>>>
> >>>> ---
> >>>>      This patch *MUST NOT* be applied until we agreed on a device binding
> >>>>      the device tree folks. DOM0 can run safely with swiotlb on protected
> >>>>      devices while LVM is not used for guest disk.
> >>>
> >>> LVM works these days I think.
> >>
> >> With this patch series applied, LVM will be broken if the hard drive is
> >> protected by an IOMMU.
> >
> > How/why?
> 
> If the guest is using LVM for its block, bouncing via the swiotlb with 
> IOMMU enabled will result to wrong mapping. If I remember correctly it's 
> because SWIOTLB DMA address is a physical address and not an IPA. So the 
> IOMMU will rejected the request.

Can this not be made to work? The swiotlb shouldn't be breaking things
like this, regardless of whether there is an iommu which it doesn't know
about it should be returning sane results, which might imply the Xen
needs to be giving it sensible results.

At what point is the IOMMU rejecting the request? I didn't expect it to
be involved in the swiotlb path -- that's a copy into dom0 owned RAM
with a known 1:1 mapping (provided either by the literal use of a 1:1
mapping or by using the IOMMU to give the guest that impression).

> So we have to by-pass swiotlb in this case.
> 
> >
> >> It's the case on midway, the platform will crash just after the guest
> >> begins to boot.
> >
> > This configuration works today (osstest tests it) so this would be a
> > regression. Can you sort this please?
> 
> As said above with IOMMU enabled I won't be able to sort it until my 
> patch series for Linux will be upstreamed.

We really need to be able to manage this transition in a compatible way,
that means new kernels working on old hypervisors as well as old kernels
working on new hypervisors (it's obviously fine for this case to bounce
when it doesn't need to).

Ian.
Julien Grall April 4, 2014, 10:25 a.m. | #6
On 04/04/2014 10:40 AM, Ian Campbell wrote:

> We really need to be able to manage this transition in a compatible way,
> that means new kernels working on old hypervisors as well as old kernels
> working on new hypervisors (it's obviously fine for this case to bounce
> when it doesn't need to).

It's not possible because a same platform can have both protected and
non-protected devices. The Linux has to know in some way if the DMA has
to be program with IPA or PA.

The only way is to disable the IOMMU, or don't use partition disk for
guest, until the DOM0 will be able to disable swiotlb.
Ian Campbell April 4, 2014, 10:28 a.m. | #7
On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> 
> > We really need to be able to manage this transition in a compatible way,
> > that means new kernels working on old hypervisors as well as old kernels
> > working on new hypervisors (it's obviously fine for this case to bounce
> > when it doesn't need to).
> 
> It's not possible because a same platform can have both protected and
> non-protected devices. The Linux has to know in some way if the DMA has
> to be program with IPA or PA.

Then there must be a negotiation between Xen and the Linux kernel so Xen
can know which case to apply.

e.g. if the kernel does not advertise support for protected devices then
Xen must act as if no IOMMU was present. 

> The only way is to disable the IOMMU, or don't use partition disk for
> guest, until the DOM0 will be able to disable swiotlb.
>
Julien Grall April 4, 2014, 10:39 a.m. | #8
On 04/04/2014 11:28 AM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
>>
>>> We really need to be able to manage this transition in a compatible way,
>>> that means new kernels working on old hypervisors as well as old kernels
>>> working on new hypervisors (it's obviously fine for this case to bounce
>>> when it doesn't need to).
>>
>> It's not possible because a same platform can have both protected and
>> non-protected devices. The Linux has to know in some way if the DMA has
>> to be program with IPA or PA.
> 
> Then there must be a negotiation between Xen and the Linux kernel so Xen
> can know which case to apply.
> 
> e.g. if the kernel does not advertise support for protected devices then
> Xen must act as if no IOMMU was present.

How the kernel can say "I'm supporting IOMMU"? New hypercall?

Xen has to program the IOMMU quite early (e.g before Linux is booting
and use the protected device).

Backporting my patch series to support protected devices is not a big
deal. What about disabling IOMMU by default on ARM until a good support
is made in Linux?
Ian Campbell April 4, 2014, 10:48 a.m. | #9
On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
> On 04/04/2014 11:28 AM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> >> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> >>
> >>> We really need to be able to manage this transition in a compatible way,
> >>> that means new kernels working on old hypervisors as well as old kernels
> >>> working on new hypervisors (it's obviously fine for this case to bounce
> >>> when it doesn't need to).
> >>
> >> It's not possible because a same platform can have both protected and
> >> non-protected devices. The Linux has to know in some way if the DMA has
> >> to be program with IPA or PA.
> > 
> > Then there must be a negotiation between Xen and the Linux kernel so Xen
> > can know which case to apply.
> > 
> > e.g. if the kernel does not advertise support for protected devices then
> > Xen must act as if no IOMMU was present.
> 
> How the kernel can say "I'm supporting IOMMU"? New hypercall?

On x86 we use the ELF notes to communicate it at build time. We don't
currently have a similar mechanism under ARM but perhaps we need to
invent one now.

There is also __HYPERVISOR_vm_assist which is/was used on PV x86 to
signal these sorts of things, if its not too late.

> Xen has to program the IOMMU quite early (e.g before Linux is booting
> and use the protected device).

In that case an ELF note type solution might be the only option.

However, since this stuff only comes to matter when the guest comes to
do grant mapping it might be that we can defer until runtime and require
that a modern kernel calls vm_assist before making any grant calls. If
it doesn't then it is assumed to be unable to cope with the iommu.

> Backporting my patch series to support protected devices is not a big
> deal. What about disabling IOMMU by default on ARM until a good support
> is made in Linux?

I'd rather avoid this if at all possible, upgrading Xen is not supposed
to require new dom0 kernel features and it is hard to describe "support
for protected devices" as a bug fix.

Ian.
Julien Grall April 4, 2014, 11:01 a.m. | #10
On 04/04/2014 11:48 AM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
>> On 04/04/2014 11:28 AM, Ian Campbell wrote:
>>> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
>>>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
>>>>
>>>>> We really need to be able to manage this transition in a compatible way,
>>>>> that means new kernels working on old hypervisors as well as old kernels
>>>>> working on new hypervisors (it's obviously fine for this case to bounce
>>>>> when it doesn't need to).
>>>>
>>>> It's not possible because a same platform can have both protected and
>>>> non-protected devices. The Linux has to know in some way if the DMA has
>>>> to be program with IPA or PA.
>>>
>>> Then there must be a negotiation between Xen and the Linux kernel so Xen
>>> can know which case to apply.
>>>
>>> e.g. if the kernel does not advertise support for protected devices then
>>> Xen must act as if no IOMMU was present.
>>
>> How the kernel can say "I'm supporting IOMMU"? New hypercall?
> 
> On x86 we use the ELF notes to communicate it at build time. We don't
> currently have a similar mechanism under ARM but perhaps we need to
> invent one now.
> 
> There is also __HYPERVISOR_vm_assist which is/was used on PV x86 to
> signal these sorts of things, if its not too late.
> 
>> Xen has to program the IOMMU quite early (e.g before Linux is booting
>> and use the protected device).
> 
> In that case an ELF note type solution might be the only option.
> 
> However, since this stuff only comes to matter when the guest comes to
> do grant mapping it might be that we can defer until runtime and require
> that a modern kernel calls vm_assist before making any grant calls. If
> it doesn't then it is assumed to be unable to cope with the iommu.

Using vm_assist means we can't anymore denied access to invalid
transaction by default. It sounds like we want to completely disable the
IOMMU, because in this case passthrough should not be enabled.

Futhermore, I can't predict what would happen if the device is used and
the kernel decides to call vm_assist (e.g protect devices). I suppose we
can break the device at this time.

It's not possible in Xen to know if the decide is used or not.

>> Backporting my patch series to support protected devices is not a big
>> deal. What about disabling IOMMU by default on ARM until a good support
>> is made in Linux?
> 
> I'd rather avoid this if at all possible, upgrading Xen is not supposed
> to require new dom0 kernel features and it is hard to describe "support
> for protected devices" as a bug fix.

But we can chose to disable IOMMU by default on ARM. And the user will
have to decide if it's safe or not to use IOMMU.

Regards,
Ian Campbell April 4, 2014, 11:13 a.m. | #11
On Fri, 2014-04-04 at 12:01 +0100, Julien Grall wrote:
> On 04/04/2014 11:48 AM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
> >> On 04/04/2014 11:28 AM, Ian Campbell wrote:
> >>> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> >>>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> >>>>
> >>>>> We really need to be able to manage this transition in a compatible way,
> >>>>> that means new kernels working on old hypervisors as well as old kernels
> >>>>> working on new hypervisors (it's obviously fine for this case to bounce
> >>>>> when it doesn't need to).
> >>>>
> >>>> It's not possible because a same platform can have both protected and
> >>>> non-protected devices. The Linux has to know in some way if the DMA has
> >>>> to be program with IPA or PA.
> >>>
> >>> Then there must be a negotiation between Xen and the Linux kernel so Xen
> >>> can know which case to apply.
> >>>
> >>> e.g. if the kernel does not advertise support for protected devices then
> >>> Xen must act as if no IOMMU was present.
> >>
> >> How the kernel can say "I'm supporting IOMMU"? New hypercall?
> > 
> > On x86 we use the ELF notes to communicate it at build time. We don't
> > currently have a similar mechanism under ARM but perhaps we need to
> > invent one now.
> > 
> > There is also __HYPERVISOR_vm_assist which is/was used on PV x86 to
> > signal these sorts of things, if its not too late.
> > 
> >> Xen has to program the IOMMU quite early (e.g before Linux is booting
> >> and use the protected device).
> > 
> > In that case an ELF note type solution might be the only option.
> > 
> > However, since this stuff only comes to matter when the guest comes to
> > do grant mapping it might be that we can defer until runtime and require
> > that a modern kernel calls vm_assist before making any grant calls. If
> > it doesn't then it is assumed to be unable to cope with the iommu.
> 
> Using vm_assist means we can't anymore denied access to invalid
> transaction by default. It sounds like we want to completely disable the
> IOMMU, because in this case passthrough should not be enabled.

We could enable both of those things only at the point where vm_assist
was called.

And yes, if the dom0 kernel isn't capable of doing stuff with the IOMMU
enable we should turn off passthrough too.

> Futhermore, I can't predict what would happen if the device is used and
> the kernel decides to call vm_assist (e.g protect devices). I suppose we
> can break the device at this time.

I think we can reasonably require that vm_assist be called super early,
i.e. before any DMA operations occur, in Linux terms I think
early_initcall(xen_guest_init) is likely early enough, or we could move
xen_guestr_init even sooner.

We also have the option of a build time feature flag in the image itself
like on x86. It is likely that going forward we will have other cases
where we wished we had such a thing so getting it in place now would be
useful.

> It's not possible in Xen to know if the decide is used or not.

"the decide"?

> 
> >> Backporting my patch series to support protected devices is not a big
> >> deal. What about disabling IOMMU by default on ARM until a good support
> >> is made in Linux?
> > 
> > I'd rather avoid this if at all possible, upgrading Xen is not supposed
> > to require new dom0 kernel features and it is hard to describe "support
> > for protected devices" as a bug fix.
> 
> But we can chose to disable IOMMU by default on ARM. And the user will
> have to decide if it's safe or not to use IOMMU.

That's a cop out and a rubbish user experience going forward.

Ian.
Julien Grall April 4, 2014, 11:23 a.m. | #12
On 04/04/2014 12:13 PM, Ian Campbell wrote:
> We could enable both of those things only at the point where vm_assist
> was called.

[..]

> I think we can reasonably require that vm_assist be called super early,
> i.e. before any DMA operations occur, in Linux terms I think
> early_initcall(xen_guest_init) is likely early enough, or we could move
> xen_guestr_init even sooner.

IHMO, this solution is far too complicate and we will have to fork from
the current IOMMU framework.

> We also have the option of a build time feature flag in the image itself
> like on x86. It is likely that going forward we will have other cases
> where we wished we had such a thing so getting it in place now would be
> useful.

I definitely prefer this solution which seems more cleaner than the others.

Is it possible to embedded notes in a zImage? How x86 handle other
format than ELF?

> 
>> It's not possible in Xen to know if the decide is used or not.
> 
> "the decide"?

The device, I mixed multiple sentences together, sorry.
Ian Campbell April 4, 2014, 12:45 p.m. | #13
On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
> On 04/04/2014 12:13 PM, Ian Campbell wrote:
> > We could enable both of those things only at the point where vm_assist
> > was called.
> 
> [..]
> 
> > I think we can reasonably require that vm_assist be called super early,
> > i.e. before any DMA operations occur, in Linux terms I think
> > early_initcall(xen_guest_init) is likely early enough, or we could move
> > xen_guestr_init even sooner.
> 
> IHMO, this solution is far too complicate and we will have to fork from
> the current IOMMU framework.

Eh? On the Linux side? You just call this hypercall super early so that
you are sure no drivers have initialised yet and you are done, there is
no need to fork anything AFAICT. And in any case if any generic
framework needs extending to better work with Xen then we should do
that, not fork it or make a worse design to sidestep the need.

Obviously there will be complexity on the Xen side, but I think far too
complicated will be an overstatement.

> > We also have the option of a build time feature flag in the image itself
> > like on x86. It is likely that going forward we will have other cases
> > where we wished we had such a thing so getting it in place now would be
> > useful.
> 
> I definitely prefer this solution which seems more cleaner than the others.
> 
> Is it possible to embedded notes in a zImage? How x86 handle other
> format than ELF?

The x86 bzImage format embeds an ELF file, which something which was
done long ago to improve things for Xen PV x86 use which wants to boot
an ELF.

ARM zImage doesn't currently have anything like that so you would be
looking at an image format extension.

> >> It's not possible in Xen to know if the decide is used or not.
> > 
> > "the decide"?
> 
> The device, I mixed multiple sentences together, sorry.
>
Julien Grall April 4, 2014, 1:10 p.m. | #14
On 04/04/2014 01:45 PM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
>> On 04/04/2014 12:13 PM, Ian Campbell wrote:
>>> We could enable both of those things only at the point where vm_assist
>>> was called.
>>
>> [..]
>>
>>> I think we can reasonably require that vm_assist be called super early,
>>> i.e. before any DMA operations occur, in Linux terms I think
>>> early_initcall(xen_guest_init) is likely early enough, or we could move
>>> xen_guestr_init even sooner.
>>
>> IHMO, this solution is far too complicate and we will have to fork from
>> the current IOMMU framework.
> 
> Eh? On the Linux side? You just call this hypercall super early so that
> you are sure no drivers have initialised yet and you are done, there is
> no need to fork anything AFAICT. And in any case if any generic
> framework needs extending to better work with Xen then we should do
> that, not fork it or make a worse design to sidestep the need.
> 
> Obviously there will be complexity on the Xen side, but I think far too
> complicated will be an overstatement.

I was talking about Xen side. The current IOMMU platform would need some
rework to support it.

For me this solution is far too complicate compare to the other
solutions (e.g ELF note).

> 
>>> We also have the option of a build time feature flag in the image itself
>>> like on x86. It is likely that going forward we will have other cases
>>> where we wished we had such a thing so getting it in place now would be
>>> useful.
>>
>> I definitely prefer this solution which seems more cleaner than the others.
>>
>> Is it possible to embedded notes in a zImage? How x86 handle other
>> format than ELF?
> 
> The x86 bzImage format embeds an ELF file, which something which was
> done long ago to improve things for Xen PV x86 use which wants to boot
> an ELF.
> 
> ARM zImage doesn't currently have anything like that so you would be
> looking at an image format extension.

Ok. So it will be easy to add support for ELF image. However for zImage,
do you think Linux ARM maintainers will accept a change in the format?
Ian Campbell April 4, 2014, 1:18 p.m. | #15
On Fri, 2014-04-04 at 14:10 +0100, Julien Grall wrote:
> On 04/04/2014 01:45 PM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
> >> On 04/04/2014 12:13 PM, Ian Campbell wrote:
> >>> We could enable both of those things only at the point where vm_assist
> >>> was called.
> >>
> >> [..]
> >>
> >>> I think we can reasonably require that vm_assist be called super early,
> >>> i.e. before any DMA operations occur, in Linux terms I think
> >>> early_initcall(xen_guest_init) is likely early enough, or we could move
> >>> xen_guestr_init even sooner.
> >>
> >> IHMO, this solution is far too complicate and we will have to fork from
> >> the current IOMMU framework.
> > 
> > Eh? On the Linux side? You just call this hypercall super early so that
> > you are sure no drivers have initialised yet and you are done, there is
> > no need to fork anything AFAICT. And in any case if any generic
> > framework needs extending to better work with Xen then we should do
> > that, not fork it or make a worse design to sidestep the need.
> > 
> > Obviously there will be complexity on the Xen side, but I think far too
> > complicated will be an overstatement.
> 
> I was talking about Xen side. The current IOMMU platform would need some
> rework to support it.
> 
> For me this solution is far too complicate compare to the other
> solutions (e.g ELF note).
> 
> > 
> >>> We also have the option of a build time feature flag in the image itself
> >>> like on x86. It is likely that going forward we will have other cases
> >>> where we wished we had such a thing so getting it in place now would be
> >>> useful.
> >>
> >> I definitely prefer this solution which seems more cleaner than the others.
> >>
> >> Is it possible to embedded notes in a zImage? How x86 handle other
> >> format than ELF?
> > 
> > The x86 bzImage format embeds an ELF file, which something which was
> > done long ago to improve things for Xen PV x86 use which wants to boot
> > an ELF.
> > 
> > ARM zImage doesn't currently have anything like that so you would be
> > looking at an image format extension.
> 
> Ok. So it will be easy to add support for ELF image. However for zImage,
> do you think Linux ARM maintainers will accept a change in the format?

I've no idea. I'm 100% positive that they will require it to be done in
a backwards compatible way. I don't know how amenable the zImage format
is to such extensions.

On x86 IIRC the kexec people were interested in being able to get an ELF
out of a bzImage, you might find that the same is true on ARM, which
might help with motivating the change.

Ian.

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2438aa0..565784a 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -324,19 +324,22 @@  static int make_memory_node(const struct domain *d,
     return res;
 }
 
-static int make_hypervisor_node(struct domain *d,
-                                void *fdt, const struct dt_device_node *parent)
+static int make_hypervisor_node(struct domain *d, struct kernel_info *kinfo,
+                                const struct dt_device_node *parent)
 {
     const char compat[] =
         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
         "xen,xen";
     __be32 reg[4];
     gic_interrupt_t intr;
-    __be32 *cells;
+    __be32 *cells, *_cells;
     int res;
     int addrcells = dt_n_addr_cells(parent);
     int sizecells = dt_n_size_cells(parent);
     paddr_t gnttab_start, gnttab_size;
+    const struct dt_device_node *dev;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    void *fdt = kinfo->fdt;
 
     DPRINT("Create hypervisor node\n");
 
@@ -384,6 +387,39 @@  static int make_hypervisor_node(struct domain *d,
     if ( res )
         return res;
 
+    if ( kinfo->num_dev_protected )
+    {
+        /* Don't need to take dtdevs_lock here */
+        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
+                              dt_size_to_cells(sizeof(dt_phandle)));
+        if ( !cells )
+            return -FDT_ERR_XEN(ENOMEM);
+
+        _cells = cells;
+
+        DPRINT("  List of protected devices\n");
+        list_for_each_entry( dev, &hd->dt_devices, next_assigned )
+        {
+            DPRINT("    - %s\n", dt_node_full_name(dev));
+            if ( !dev->phandle )
+            {
+                printk(XENLOG_ERR "Unable to handle protected device (%s)"
+                       "with no phandle", dt_node_full_name(dev));
+                xfree(cells);
+                return -FDT_ERR_XEN(EINVAL);
+            }
+            dt_set_cell(&_cells, dt_size_to_cells(sizeof(dt_phandle)),
+                        dev->phandle);
+        }
+
+        res = fdt_property(fdt, "protected-devices", cells,
+                           sizeof (dt_phandle) * kinfo->num_dev_protected);
+
+        xfree(cells);
+        if ( res )
+            return res;
+    }
+
     res = fdt_end_node(fdt);
 
     return res;
@@ -670,7 +706,8 @@  static int make_timer_node(const struct domain *d, void *fdt,
 }
 
 /* Map the device in the domain */
-static int map_device(struct domain *d, struct dt_device_node *dev)
+static int map_device(struct domain *d, struct kernel_info *kinfo,
+                      struct dt_device_node *dev)
 {
     unsigned int nirq;
     unsigned int naddr;
@@ -695,6 +732,7 @@  static int map_device(struct domain *d, struct dt_device_node *dev)
                    dt_node_full_name(dev));
             return res;
         }
+        kinfo->num_dev_protected++;
     }
 
     /* Map IRQs */
@@ -843,7 +881,7 @@  static int handle_node(struct domain *d, struct kernel_info *kinfo,
     if ( !dt_device_type_is_equal(node, "memory") &&
          dt_device_is_available(node) )
     {
-        res = map_device(d, node);
+        res = map_device(d, kinfo, node);
 
         if ( res )
             return res;
@@ -874,7 +912,7 @@  static int handle_node(struct domain *d, struct kernel_info *kinfo,
 
     if ( node == dt_host )
     {
-        res = make_hypervisor_node(d, kinfo->fdt, node);
+        res = make_hypervisor_node(d, kinfo, node);
         if ( res )
             return res;
 
@@ -1027,6 +1065,7 @@  int construct_dom0(struct domain *d)
 
     d->max_pages = ~0U;
 
+    kinfo.num_dev_protected = 0;
     kinfo.unassigned_mem = dom0_mem;
 
     allocate_memory(d, &kinfo);
diff --git a/xen/arch/arm/kernel.h b/xen/arch/arm/kernel.h
index b48c2c9..3af5c50 100644
--- a/xen/arch/arm/kernel.h
+++ b/xen/arch/arm/kernel.h
@@ -18,6 +18,9 @@  struct kernel_info {
     paddr_t unassigned_mem; /* RAM not (yet) assigned to a bank */
     struct dt_mem_info mem;
 
+    /* Number of devices protected by an IOMMU */
+    unsigned int num_dev_protected;
+
     paddr_t dtb_paddr;
     paddr_t entry;