[00/21] Control VQ support in vDPA

Message ID 20201216064818.48239-1-jasowang@redhat.com

Message

Jason Wang Dec. 16, 2020, 6:47 a.m. UTC
Hi All:

This series tries to add support for the control virtqueue in vDPA.

The control virtqueue is used by the networking device to accept
various commands from the driver. It is a must for supporting
multiqueue and other configurations.

When used by the vhost-vDPA bus driver for a VM, the control virtqueue
should be shadowed via the userspace VMM (Qemu) instead of being
assigned directly to the guest. This is because Qemu needs to know the
device state in order to start and stop the device correctly (e.g. for
live migration).

This requires isolating the memory mapping of the control virtqueue
presented by vhost-vDPA, to prevent the guest from accessing it
directly.

To achieve this, vDPA introduces two new abstractions:

- address space: identified through an address space id (ASID); a set
                 of memory mappings is maintained per address space
- virtqueue group: the minimal set of virtqueues that must share an
                 address space

The device needs to advertise the following attributes to vDPA:

- the number of address spaces supported by the device
- the number of virtqueue groups supported by the device
- the mapping from a specific virtqueue to its virtqueue group

The mapping from virtqueues to virtqueue groups is fixed and defined
by the vDPA device driver. E.g. (a sketch follows this list):

- A device that has hardware ASID support can simply advertise a
  virtqueue group per virtqueue.
- A device that does not have hardware ASID support can simply
  advertise a single virtqueue group that contains all virtqueues. Or,
  if it wants a software emulated control virtqueue, it can advertise
  two virtqueue groups: one for the cvq and another for the rest of
  the virtqueues.
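
For reference, here is a rough sketch of how these abstractions
surface in include/linux/vdpa.h. It is simplified from patches 6-8 of
this series; see the patches for the exact signatures:

struct vdpa_device {
	...
	int nvqs;
	unsigned int ngroups;	/* number of virtqueue groups */
	unsigned int nas;	/* number of address spaces */
};

struct vdpa_config_ops {
	...
	/* fixed virtqueue -> virtqueue group mapping, defined by the parent */
	u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx);
	/* associate an address space with a virtqueue group */
	int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
			      unsigned int asid);
	/* DMA ops are extended to carry the target address space */
	int (*set_map)(struct vdpa_device *vdev, unsigned int asid,
		       struct vhost_iotlb *iotlb);
	...
};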

vDPA also allows changing the association between a virtqueue group
and an address space. So in the case of the control virtqueue, the
userspace VMM (Qemu) may use a dedicated address space for the control
virtqueue group to isolate its memory mapping.

The vhost/vhost-vDPA uAPI is also extended for userspace to:

- query the number of virtqueue groups and address spaces supported by
  the device
- query the virtqueue group for a specific virtqueue
- associate a virtqueue group with an address space
- send ASID based IOTLB commands

This helps the userspace VMM (Qemu) to detect whether the control vq
can be supported and to isolate the memory mappings of the control
virtqueue from the others.
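
A hypothetical sketch of that flow from the VMM side (the ioctl names
follow patches 12-15; the exact argument types may differ from the
final uAPI):

	/* hypothetical userspace sketch, not tested */
	__u32 ngroups, nas;
	struct vhost_vring_state state;

	/* query what the device supports */
	ioctl(vdpa_fd, VHOST_VDPA_GET_GROUP_NUM, &ngroups);
	ioctl(vdpa_fd, VHOST_VDPA_GET_AS_NUM, &nas);

	/* query the virtqueue group of the control virtqueue */
	state.index = cvq_index;
	ioctl(vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &state);

	/* associate the cvq's group with a dedicated address space */
	state.index = state.num;	/* the group returned above */
	state.num = 1;			/* the ASID to associate */
	ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &state);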

To demonstrate the usage, the vDPA simulator is extended to support
setting the MAC address via an emulated control virtqueue.
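
The emulated cvq follows the virtio-net control command layout from
the virtio spec: each buffer carries a struct virtio_net_ctrl_hdr
(class/command) and the device writes back a one byte ack. A
simplified sketch of the simulator side handling, assuming the vringh
IOTLB helpers already used by vdpa_sim (the actual patch differs in
details):

	static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
	{
		/* with a single queue pair, vq 2 is the control virtqueue */
		struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];
		struct virtio_net_ctrl_hdr ctrl;
		u8 status;

		while (vringh_getdesc_iotlb(&cvq->vring, &cvq->out_iov,
					    &cvq->in_iov, &cvq->head,
					    GFP_ATOMIC) > 0) {
			status = VIRTIO_NET_ERR;
			/* read the class/command header from the driver buffer */
			vringh_iov_pull_iotlb(&cvq->vring, &cvq->out_iov,
					      &ctrl, sizeof(ctrl));

			if (ctrl.class == VIRTIO_NET_CTRL_MAC &&
			    ctrl.cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
				/* pull the 6 byte MAC into the config space */
				vringh_iov_pull_iotlb(&cvq->vring, &cvq->out_iov,
						      vdpasim->config.mac, ETH_ALEN);
				status = VIRTIO_NET_OK;
			}

			/* write the ack byte and mark the buffer used */
			vringh_iov_push_iotlb(&cvq->vring, &cvq->in_iov,
					      &status, sizeof(status));
			vringh_complete_iotlb(&cvq->vring, cvq->head,
					      sizeof(status));
		}
	}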

Please review.

Changes since RFC:

- tweak vhost uAPI documentation
- actually switch to the device specific IOTLB in patch 4
- tweak the commit log
- fix that the ASID in vhost was claimed to be 32-bit but is actually
  16-bit
- fix a use after free when using ASID with IOTLB batching requests
- switch to Stefano's patch for having separated iov
- remove the unused "used_as" variable
- fix the iotlb/asid checking in vhost_vdpa_unmap()

Thanks

Jason Wang (20):
  vhost: move the backend feature bits to vhost_types.h
  virtio-vdpa: don't set callback if virtio doesn't need it
  vhost-vdpa: passing iotlb to IOMMU mapping helpers
  vhost-vdpa: switch to use vhost-vdpa specific IOTLB
  vdpa: add the missing comment for nvqs in struct vdpa_device
  vdpa: introduce virtqueue groups
  vdpa: multiple address spaces support
  vdpa: introduce config operations for associating ASID to a virtqueue
    group
  vhost_iotlb: split out IOTLB initialization
  vhost: support ASID in IOTLB API
  vhost-vdpa: introduce asid based IOTLB
  vhost-vdpa: introduce uAPI to get the number of virtqueue groups
  vhost-vdpa: introduce uAPI to get the number of address spaces
  vhost-vdpa: uAPI to get virtqueue group id
  vhost-vdpa: introduce uAPI to set group ASID
  vhost-vdpa: support ASID based IOTLB API
  vdpa_sim: advertise VIRTIO_NET_F_MTU
  vdpa_sim: factor out buffer completion logic
  vdpa_sim: filter destination mac address
  vdpasim: control virtqueue support

Stefano Garzarella (1):
  vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov

 drivers/vdpa/ifcvf/ifcvf_main.c   |   9 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |  11 +-
 drivers/vdpa/vdpa.c               |   8 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  | 292 ++++++++++++++++++++++++------
 drivers/vhost/iotlb.c             |  23 ++-
 drivers/vhost/vdpa.c              | 246 ++++++++++++++++++++-----
 drivers/vhost/vhost.c             |  23 ++-
 drivers/vhost/vhost.h             |   4 +-
 drivers/virtio/virtio_vdpa.c      |   2 +-
 include/linux/vdpa.h              |  42 ++++-
 include/linux/vhost_iotlb.h       |   2 +
 include/uapi/linux/vhost.h        |  25 ++-
 include/uapi/linux/vhost_types.h  |  10 +-
 13 files changed, 561 insertions(+), 136 deletions(-)

Comments

Jason Wang Dec. 17, 2020, 3:30 a.m. UTC | #1
On 2020/12/16 5:47 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 16, 2020 at 02:47:57PM +0800, Jason Wang wrote:
>> [...]
>>
>> - address space: identified through address space id (ASID) and a set
>>                  of memory mapping in maintained
>> - virtqueue group: the minimal set of virtqueues that must share an
>>                  address space
> How will this support the pretty common case where control vq
> is programmed by the kernel through the PF, and others by the VFs?



In this case, the VF parent needs to provide a software control vq,
decode the commands, and then send them to the VF.


> I actually thought the way to support it is by exposing
> something like an "inject buffers" API which sends data to a given VQ.
> Maybe an ioctl, and maybe down the road uio ring can support batching
> these ....



So the virtqueue allows the request to be processed asynchronously
(e.g. the driver may choose to use an interrupt for the control vq).
This means we need to support that at the uAPI level. And if we manage
to do that, it's just another type of virtqueue.

For virtio-vDPA, this also means extensions for queue processing,
which would be a functional duplication. Using what is proposed in
this series, we don't need any changes to the kernel virtio drivers.

More importantly, this series could be used for future features that
require DMA isolation between virtqueues:

- reporting dirty pages via a virtqueue
- sub-function level device slicing

...

Thanks


Eli Cohen Dec. 17, 2020, 7:26 a.m. UTC | #2
On Wed, Dec 16, 2020 at 02:47:57PM +0800, Jason Wang wrote:

Hi Jason,
I saw the patchset and will start reviewing it starting Dec 27. I am out
of office next week.

Michael S. Tsirkin Dec. 17, 2020, 7:58 a.m. UTC | #3
On Thu, Dec 17, 2020 at 11:30:18AM +0800, Jason Wang wrote:
> On 2020/12/16 5:47 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 16, 2020 at 02:47:57PM +0800, Jason Wang wrote:
> > > [...]
> > How will this support the pretty common case where control vq
> > is programmed by the kernel through the PF, and others by the VFs?
>
> In this case, the VF parent need to provide a software control vq and decode
> the command then send them to VF.

But how does that tie to the address space infrastructure?



> > I actually thought the way to support it is by exposing
> > something like an "inject buffers" API which sends data to a given VQ.
> > Maybe an ioctl, and maybe down the road uio ring can support batching
> > these ....
>
> So the virtuqueue allows the request to be processed asynchronously (e.g
> driver may choose to use interrupt for control vq). This means we need to
> support that in uAPI level.

I don't think we need to make it async, just a regular ioctl will do.
In fact no guest uses the asynchronous property.


> And if we manage to do that, it's just another
> type of virtqueue.
>
> For virtio-vDPA, this also means the extensions for queue processing which
> is a functional duplication.

I don't see why, just send it to the actual control vq :)

> Using what proposed in this series, we don't
> need any changes for kernel virtio drivers.
>
> What's more important, this series could be used for future features that
> requires DMA isolation between virtqueues:
>
> - report dirty pages via virtqueue
> - sub function level device slicing

I agree these are nice to have, but I am not sure basic control vq must
be tied to that.

Jason Wang Dec. 17, 2020, 9:02 a.m. UTC | #4
On 2020/12/17 3:58 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2020 at 11:30:18AM +0800, Jason Wang wrote:
>> On 2020/12/16 5:47 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 16, 2020 at 02:47:57PM +0800, Jason Wang wrote:
>>>> [...]
>>> How will this support the pretty common case where control vq
>>> is programmed by the kernel through the PF, and others by the VFs?
>> In this case, the VF parent need to provide a software control vq and decode
>> the command then send them to VF.
> But how does that tie to the address space infrastructure?



In this case, address space is not a must. But the idea is to make the
control vq work for all types of hardware:

1) control virtqueue is implemented via VF/PF communication
2) control virtqueue is implemented by the VF but not through DMA
3) control virtqueue is implemented by VF DMA; it could be either a
hardware control virtqueue or another type of DMA

The address space is a must for 3) to work and can work for both 1) and 2).


>>> I actually thought the way to support it is by exposing
>>> something like an "inject buffers" API which sends data to a given VQ.
>>> Maybe an ioctl, and maybe down the road uio ring can support batching
>>> these ....
>> So the virtuqueue allows the request to be processed asynchronously (e.g
>> driver may choose to use interrupt for control vq). This means we need to
>> support that in uAPI level.
> I don't think we need to make it async, just a regular ioctl will do.
> In fact no guest uses the asynchronous property.

It is not forbidden by the spec, so we need to support it. E.g. we
can not assume the driver doesn't assign an interrupt for the cvq.


>> And if we manage to do that, it's just another
>> type of virtqueue.
>>
>> For virtio-vDPA, this also means the extensions for queue processing which
>> is a functional duplication.
> I don't see why, just send it to the actual control vq :)

But in the case you've pointed out, there's no hardware control vq in fact.


>> What's more important, this series could be used for future features that
>> requires DMA isolation between virtqueues:
>>
>> - report dirty pages via virtqueue
>> - sub function level device slicing
> I agree these are nice to have, but I am not sure basic control vq must
> be tied to that.

If the control virtqueue is implemented via DMA through the VF, it looks
like a must.

Thanks


Michael S. Tsirkin Dec. 17, 2020, 10:28 p.m. UTC | #5
On Thu, Dec 17, 2020 at 05:02:49PM +0800, Jason Wang wrote:
> On 2020/12/17 3:58 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2020 at 11:30:18AM +0800, Jason Wang wrote:
> > > On 2020/12/16 5:47 PM, Michael S. Tsirkin wrote:
> > > > [...]
> > > > How will this support the pretty common case where control vq
> > > > is programmed by the kernel through the PF, and others by the VFs?
> > > In this case, the VF parent need to provide a software control vq and decode
> > > the command then send them to VF.
> > But how does that tie to the address space infrastructure?
> In this case, address space is not a must.

That's ok, problem is I don't see how address space is going
to work in this case at all.

There's no address space there that userspace/guest can control.
Jason Wang Dec. 18, 2020, 2:56 a.m. UTC | #6
On 2020/12/18 6:28 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2020 at 05:02:49PM +0800, Jason Wang wrote:
>> [...]
>> In this case, address space is not a must.
> That's ok, problem is I don't see how address space is going
> to work in this case at all.
>
> There's no address space there that userspace/guest can control.


The virtqueue group is mandated by the parent, but the association
between a virtqueue group and an address space is under the control of
userspace (Qemu).

A simple but common case is:

1) The device advertises two virtqueue groups: group 0 contains RX and
   TX, group 1 contains the CVQ.
2) The device advertises two address spaces.

Then, for vhost-vDPA used by a VM:

1) Associate group 0 with AS 0 and group 1 with AS 1 (via the
   vhost-vDPA VHOST_VDPA_SET_GROUP_ASID ioctl).
2) Publish the guest memory mapping via IOTLB ASID 0.
3) Publish the control virtqueue mapping via IOTLB ASID 1.

Then the DMA is totally isolated in this case.

For vhost-vDPA used by DPDK, or for virtio-vDPA:

1) Associate both group 0 and group 1 with AS 0,

since we don't need DMA isolation in this case.

In order to let this be controlled by the guest, we would need to
extend the virtio spec to support these concepts.
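
To make this concrete, a hypothetical sketch of steps 1) to 3) from
the Qemu side (the asid field in vhost_msg_v2 follows patch 10; exact
names may differ from the final uAPI):

	/* hypothetical sketch, not tested */
	struct vhost_vring_state s = {
		.index = 1,	/* the cvq's virtqueue group */
		.num = 1,	/* the ASID to associate with it */
	};
	ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &s);

	/* IOTLB updates now carry the target ASID */
	struct vhost_msg_v2 msg = {
		.type = VHOST_IOTLB_MSG_V2,
		.asid = 1,	/* mappings for the cvq address space */
		.iotlb = {
			.iova  = cvq_iova,
			.size  = cvq_size,
			.uaddr = (__u64)(uintptr_t)cvq_buf,
			.perm  = VHOST_ACCESS_RW,
			.type  = VHOST_IOTLB_UPDATE,
		},
	};
	write(vdpa_fd, &msg, sizeof(msg));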

Thanks
Eli Cohen Dec. 29, 2020, 7:28 a.m. UTC | #7
On Wed, Dec 16, 2020 at 02:48:04PM +0800, Jason Wang wrote:
> This patches introduces the multiple address spaces support for vDPA
> device. This idea is to identify a specific address space via an
> dedicated identifier - ASID.
> 
> During vDPA device allocation, vDPA device driver needs to report the
> number of address spaces supported by the device then the DMA mapping
> ops of the vDPA device needs to be extended to support ASID.
> 
> This helps to isolate the environments for the virtqueue that will not
> be assigned directly. E.g in the case of virtio-net, the control
> virtqueue will not be assigned directly to guest.
> 
> As a start, simply claim 1 virtqueue groups and 1 address spaces for
> all vDPA devices. And vhost-vDPA will simply reject the device with
> more than 1 virtqueue groups or address spaces.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |  5 +++--
>  drivers/vdpa/vdpa.c               |  4 +++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  | 10 ++++++----
>  drivers/vhost/vdpa.c              | 14 +++++++++-----
>  include/linux/vdpa.h              | 23 ++++++++++++++++-------
>  6 files changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index c629f4fcc738..8a43f562b169 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -445,7 +445,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
>  				    dev, &ifc_vdpa_ops,
> -				    IFCVF_MAX_QUEUE_PAIRS * 2, 1);
> +				    IFCVF_MAX_QUEUE_PAIRS * 2, 1, 1);
>  
>  	if (adapter == NULL) {
>  		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 719b52fcc547..7aaf0a4ee80d 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1804,7 +1804,8 @@ static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
>  	return mvdev->generation;
>  }
>  
> -static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
> +static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
> +			     struct vhost_iotlb *iotlb)
>  {
>  	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>  	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> @@ -1947,7 +1948,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
>  	max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
>  
>  	ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
> -				 2 * mlx5_vdpa_max_qps(max_vqs), 1);
> +				 2 * mlx5_vdpa_max_qps(max_vqs), 1, 1);
>  	if (IS_ERR(ndev))
>  		return ndev;
>  
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 46399746ec7c..05195fa7865d 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -63,6 +63,7 @@ static void vdpa_release_dev(struct device *d)
>   * @config: the bus operations that is supported by this device
>   * @nvqs: number of virtqueues supported by this device
>   * @ngroups: number of groups supported by this device
> + * @nas: number of address spaces supported by this device
>   * @size: size of the parent structure that contains private data
>   *
>   * Driver should use vdpa_alloc_device() wrapper macro instead of
> @@ -74,7 +75,7 @@ static void vdpa_release_dev(struct device *d)
>  struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>  					const struct vdpa_config_ops *config,
>  					int nvqs, unsigned int ngroups,
> -					size_t size)
> +					unsigned int nas, size_t size)
>  {
>  	struct vdpa_device *vdev;
>  	int err = -EINVAL;
> @@ -102,6 +103,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>  	vdev->features_valid = false;
>  	vdev->nvqs = nvqs;
>  	vdev->ngroups = ngroups;
> +	vdev->nas = nas;
>  
>  	err = dev_set_name(&vdev->dev, "vdpa%u", vdev->index);
>  	if (err)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 5d554b3cd152..140de45ffff2 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -359,7 +359,7 @@ static struct vdpasim *vdpasim_create(void)
>  		ops = &vdpasim_net_config_ops;
>  
>  	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
> -				    VDPASIM_VQ_NUM, 1);
> +				    VDPASIM_VQ_NUM, 1, 1);
>  	if (!vdpasim)
>  		goto err_alloc;
>  
> @@ -606,7 +606,7 @@ static struct vdpa_iova_range vdpasim_get_iova_range(struct vdpa_device *vdpa)
>  	return range;
>  }
>  
> -static int vdpasim_set_map(struct vdpa_device *vdpa,
> +static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
>  			   struct vhost_iotlb *iotlb)
>  {
>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> @@ -633,7 +633,8 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
>  	return ret;
>  }
>  
> -static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
> +static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
> +			   u64 iova, u64 size,
>  			   u64 pa, u32 perm)
>  {
>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> @@ -647,7 +648,8 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
>  	return ret;
>  }
>  
> -static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
> +static int vdpasim_dma_unmap(struct vdpa_device *vdpa, unsigned int asid,
> +			     u64 iova, u64 size)
>  {
>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>  
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 9bcc03d4e68b..03a9b3311c6c 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -570,10 +570,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
>  		return r;
>  
>  	if (ops->dma_map) {
> -		r = ops->dma_map(vdpa, iova, size, pa, perm);
> +		r = ops->dma_map(vdpa, 0, iova, size, pa, perm);
>  	} else if (ops->set_map) {
>  		if (!v->in_batch)
> -			r = ops->set_map(vdpa, iotlb);
> +			r = ops->set_map(vdpa, 0, iotlb);
>  	} else {
>  		r = iommu_map(v->domain, iova, pa, size,
>  			      perm_to_iommu_flags(perm));
> @@ -597,10 +597,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v,
>  	vhost_vdpa_iotlb_unmap(v, iotlb, iova, iova + size - 1);
>  
>  	if (ops->dma_map) {
> -		ops->dma_unmap(vdpa, iova, size);
> +		ops->dma_unmap(vdpa, 0, iova, size);
>  	} else if (ops->set_map) {
>  		if (!v->in_batch)
> -			ops->set_map(vdpa, iotlb);
> +			ops->set_map(vdpa, 0, iotlb);
>  	} else {
>  		iommu_unmap(v->domain, iova, size);
>  	}
> @@ -764,7 +764,7 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
>  		break;
>  	case VHOST_IOTLB_BATCH_END:
>  		if (v->in_batch && ops->set_map)
> -			ops->set_map(vdpa, iotlb);
> +			ops->set_map(vdpa, 0, iotlb);
>  		v->in_batch = false;
>  		break;
>  	default:
> @@ -1032,6 +1032,10 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
>  	int minor;
>  	int r;
>  
> +	/* Only support 1 address space and 1 groups */
> +	if (vdpa->ngroups != 1 || vdpa->nas != 1)
> +		return -ENOTSUPP;
> +
>  	/* Currently, we only accept the network devices. */
>  	if (ops->get_device_id(vdpa) != VIRTIO_ID_NET)
>  		return -ENOTSUPP;
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index bfc6790b263e..0a9a754f8180 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -43,6 +43,8 @@ struct vdpa_vq_state {
>   * @index: device index
>   * @features_valid: were features initialized? for legacy guests
>   * @nvqs: the number of virtqueues
> + * @ngroups: the number of virtqueue groups
> + * @nas: the number of address spaces


I am not sure these can be categorised as part of the state of the VQ.
It's more of a property so maybe we can have a callback to get the
properties of the VQ?

Moreover, I think they should be handled in the hardware drivers to
return 1 for both ngroups and nas.
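
A hypothetical illustration of that suggestion (none of these names
exist in the series):

	/* report the counts through a config op, defaulting to 1 in each
	 * hardware driver, instead of storing them in struct vdpa_device */
	struct vdpa_dev_props {
		unsigned int ngroups;
		unsigned int nas;
	};

	void (*get_props)(struct vdpa_device *vdev,
			  struct vdpa_dev_props *props);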

>   */
>  struct vdpa_device {
>  	struct device dev;
> @@ -52,6 +54,7 @@ struct vdpa_device {
>  	bool features_valid;
>  	int nvqs;
>  	unsigned int ngroups;
> +	unsigned int nas;
>  };
>  
>  /**
> @@ -175,6 +178,7 @@ struct vdpa_iova_range {
>   *				Needed for device that using device
>   *				specific DMA translation (on-chip IOMMU)
>   *				@vdev: vdpa device
> + *				@asid: address space identifier
>   *				@iotlb: vhost memory mapping to be
>   *				used by the vDPA
>   *				Returns integer: success (0) or error (< 0)
> @@ -183,6 +187,7 @@ struct vdpa_iova_range {
>   *				specific DMA translation (on-chip IOMMU)
>   *				and preferring incremental map.
>   *				@vdev: vdpa device
> + *				@asid: address space identifier
>   *				@iova: iova to be mapped
>   *				@size: size of the area
>   *				@pa: physical address for the map
> @@ -194,6 +199,7 @@ struct vdpa_iova_range {
>   *				specific DMA translation (on-chip IOMMU)
>   *				and preferring incremental unmap.
>   *				@vdev: vdpa device
> + *				@asid: address space identifier
>   *				@iova: iova to be unmapped
>   *				@size: size of the area
>   *				Returns integer: success (0) or error (< 0)
> @@ -240,10 +246,12 @@ struct vdpa_config_ops {
>  	struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
>  
>  	/* DMA ops */
> -	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
> -	int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
> -		       u64 pa, u32 perm);
> -	int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
> +	int (*set_map)(struct vdpa_device *vdev, unsigned int asid,
> +		       struct vhost_iotlb *iotlb);
> +	int (*dma_map)(struct vdpa_device *vdev, unsigned int asid,
> +		       u64 iova, u64 size, u64 pa, u32 perm);
> +	int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
> +			 u64 iova, u64 size);
>  
>  	/* Free device resources */
>  	void (*free)(struct vdpa_device *vdev);
> @@ -252,11 +260,12 @@ struct vdpa_config_ops {
>  struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>  					const struct vdpa_config_ops *config,
>  					int nvqs, unsigned int ngroups,
> -					size_t size);
> +					unsigned int nas, size_t size);
>  
> -#define vdpa_alloc_device(dev_struct, member, parent, config, nvqs, ngroups) \
> +#define vdpa_alloc_device(dev_struct, member, parent, config, nvqs, \
> +			  ngroups, nas)				    \
>  			  container_of(__vdpa_alloc_device( \
> -				       parent, config, nvqs, ngroups, \
> +				       parent, config, nvqs, ngroups, nas,  \
>  				       sizeof(dev_struct) + \
>  				       BUILD_BUG_ON_ZERO(offsetof( \
>  				       dev_struct, member))), \
> -- 
> 2.25.1
Eli Cohen Dec. 29, 2020, 11:41 a.m. UTC | #8
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs
> tagged via ASID via hlist. This will be used for supporting multiple
> address spaces in the following patches.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vdpa.c | 106 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 80 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index feb6a58df22d..060d5b5b7e64 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -33,13 +33,21 @@ enum {
>  
>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
>  
> +#define VHOST_VDPA_IOTLB_BUCKETS 16
> +
> +struct vhost_vdpa_as {
> +	struct hlist_node hash_link;
> +	struct vhost_iotlb iotlb;
> +	u32 id;
> +};
> +
>  struct vhost_vdpa {
>  	struct vhost_dev vdev;
>  	struct iommu_domain *domain;
>  	struct vhost_virtqueue *vqs;
>  	struct completion completion;
>  	struct vdpa_device *vdpa;
> -	struct vhost_iotlb *iotlb;
> +	struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
>  	struct device dev;
>  	struct cdev cdev;
>  	atomic_t opened;
> @@ -49,12 +57,64 @@ struct vhost_vdpa {
>  	struct eventfd_ctx *config_ctx;
>  	int in_batch;
>  	struct vdpa_iova_range range;
> +	int used_as;


This is not really used. Not in this patch and later removed.

>  };
>  
>  static DEFINE_IDA(vhost_vdpa_ida);
>  
>  static dev_t vhost_vdpa_major;
>  
> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)
> +{
> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> +	struct vhost_vdpa_as *as;
> +
> +	hlist_for_each_entry(as, head, hash_link)
> +		if (as->id == asid)
> +			return as;
> +
> +	return NULL;
> +}
> +
> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 asid)
> +{
> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> +	struct vhost_vdpa_as *as;
> +
> +	if (asid_to_as(v, asid))
> +		return NULL;
> +
> +	as = kmalloc(sizeof(*as), GFP_KERNEL);
> +	if (!as)
> +		return NULL;
> +
> +	vhost_iotlb_init(&as->iotlb, 0, 0);
> +	as->id = asid;
> +	hlist_add_head(&as->hash_link, head);
> +	++v->used_as;
> +
> +	return as;
> +}
> +
> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> +{
> +	struct vhost_vdpa_as *as = asid_to_as(v, asid);
> +
> +	/* Remove default address space is not allowed */
> +	if (asid == 0)
> +		return -EINVAL;
> +
> +	if (!as)
> +		return -EINVAL;
> +
> +	hlist_del(&as->hash_link);
> +	vhost_iotlb_reset(&as->iotlb);
> +	kfree(as);
> +	--v->used_as;
> +
> +	return 0;
> +}
> +
>  static void handle_vq_kick(struct vhost_work *work)
>  {
>  	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
>  	}
>  }
>  
> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
> -{
> -	struct vhost_iotlb *iotlb = v->iotlb;
> -
> -	vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);
> -	kfree(v->iotlb);
> -	v->iotlb = NULL;
> -}
> -
>  static int perm_to_iommu_flags(u32 perm)
>  {
>  	int flags = 0;
> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
>  	struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
>  	struct vdpa_device *vdpa = v->vdpa;
>  	const struct vdpa_config_ops *ops = vdpa->config;
> -	struct vhost_iotlb *iotlb = v->iotlb;
> +	struct vhost_vdpa_as *as = asid_to_as(v, 0);
> +	struct vhost_iotlb *iotlb = &as->iotlb;
>  	int r = 0;
>  
>  	if (asid != 0)
> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
>  	}
>  }
>  
> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)
> +{
> +	vhost_dev_cleanup(&v->vdev);
> +	kfree(v->vdev.vqs);
> +	vhost_vdpa_remove_as(v, 0);
> +}
> +
>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  {
>  	struct vhost_vdpa *v;
> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  	vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>  		       vhost_vdpa_process_iotlb_msg);
>  
> -	v->iotlb = vhost_iotlb_alloc(0, 0);
> -	if (!v->iotlb) {
> -		r = -ENOMEM;
> -		goto err_init_iotlb;
> -	}
> +	if (!vhost_vdpa_alloc_as(v, 0))
> +		goto err_alloc_as;
>  
>  	r = vhost_vdpa_alloc_domain(v);
>  	if (r)
> -		goto err_alloc_domain;
> +		goto err_alloc_as;
>  
>  	vhost_vdpa_set_iova_range(v);
>  
> @@ -902,11 +958,8 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  
>  	return 0;
>  
> -err_alloc_domain:
> -	vhost_vdpa_iotlb_free(v);
> -err_init_iotlb:
> -	vhost_dev_cleanup(&v->vdev);
> -	kfree(vqs);
> +err_alloc_as:
> +	vhost_vdpa_cleanup(v);
>  err:
>  	atomic_dec(&v->opened);
>  	return r;
> @@ -933,12 +986,10 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
>  	filep->private_data = NULL;
>  	vhost_vdpa_reset(v);
>  	vhost_dev_stop(&v->vdev);
> -	vhost_vdpa_iotlb_free(v);
>  	vhost_vdpa_free_domain(v);
>  	vhost_vdpa_config_put(v);
>  	vhost_vdpa_clean_irq(v);
> -	vhost_dev_cleanup(&v->vdev);
> -	vhost_dev_cleanup(&v->vdev);

> -	kfree(v->vdev.vqs);

> +	vhost_vdpa_cleanup(v);

>  	mutex_unlock(&d->mutex);

>  

>  	atomic_dec(&v->opened);

> @@ -1033,7 +1084,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	const struct vdpa_config_ops *ops = vdpa->config;

>  	struct vhost_vdpa *v;

>  	int minor;

> -	int r;

> +	int i, r;

>  

>  	/* Only support 1 address space and 1 groups */

>  	if (vdpa->ngroups != 1 || vdpa->nas != 1)

> @@ -1085,6 +1136,9 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	init_completion(&v->completion);

>  	vdpa_set_drvdata(vdpa, v);

>  

> +	for (i = 0; i < VHOST_VDPA_IOTLB_BUCKETS; i++)

> +		INIT_HLIST_HEAD(&v->as[i]);

> +

>  	return 0;

>  

>  err:

> -- 

> 2.25.1

>
Eli Cohen Dec. 29, 2020, 11:53 a.m. UTC | #9
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs

> tagged via ASID via hlist. This will be used for supporting multiple

> address spaces in the following patches.

> 

> Signed-off-by: Jason Wang <jasowang@redhat.com>

> ---

>  drivers/vhost/vdpa.c | 106 ++++++++++++++++++++++++++++++++-----------

>  1 file changed, 80 insertions(+), 26 deletions(-)

> 

> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c

> index feb6a58df22d..060d5b5b7e64 100644

> --- a/drivers/vhost/vdpa.c

> +++ b/drivers/vhost/vdpa.c

> @@ -33,13 +33,21 @@ enum {

>  

>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)

>  

> +#define VHOST_VDPA_IOTLB_BUCKETS 16

> +

> +struct vhost_vdpa_as {

> +	struct hlist_node hash_link;

> +	struct vhost_iotlb iotlb;

> +	u32 id;

> +};

> +

>  struct vhost_vdpa {

>  	struct vhost_dev vdev;

>  	struct iommu_domain *domain;

>  	struct vhost_virtqueue *vqs;

>  	struct completion completion;

>  	struct vdpa_device *vdpa;

> -	struct vhost_iotlb *iotlb;

> +	struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];

>  	struct device dev;

>  	struct cdev cdev;

>  	atomic_t opened;

> @@ -49,12 +57,64 @@ struct vhost_vdpa {

>  	struct eventfd_ctx *config_ctx;

>  	int in_batch;

>  	struct vdpa_iova_range range;

> +	int used_as;

>  };

>  

>  static DEFINE_IDA(vhost_vdpa_ida);

>  

>  static dev_t vhost_vdpa_major;

>  

> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)

> +{

> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];

> +	struct vhost_vdpa_as *as;

> +

> +	hlist_for_each_entry(as, head, hash_link)

> +		if (as->id == asid)

> +			return as;

> +

> +	return NULL;

> +}

> +

> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 asid)

> +{

> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];

> +	struct vhost_vdpa_as *as;

> +

> +	if (asid_to_as(v, asid))

> +		return NULL;

> +

> +	as = kmalloc(sizeof(*as), GFP_KERNEL);


kzalloc()? See comment below.

> +	if (!as)

> +		return NULL;

> +

> +	vhost_iotlb_init(&as->iotlb, 0, 0);

> +	as->id = asid;

> +	hlist_add_head(&as->hash_link, head);

> +	++v->used_as;


Although you eventually ended up removing used_as, this is a bug since
you're incrementing a random value. Maybe it's better to be on the safe
side and use kzalloc() for as above.
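
I.e. something like this (only a sketch of the allocation path, with
used_as dropped since it goes away later anyway):

	as = kzalloc(sizeof(*as), GFP_KERNEL);
	if (!as)
		return NULL;

	vhost_iotlb_init(&as->iotlb, 0, 0);
	as->id = asid;
	hlist_add_head(&as->hash_link, head);

	return as;

so that any field which is not explicitly initialized starts out zeroed
instead of holding random heap contents.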

> +

> +	return as;

> +}

> +

> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)

> +{

> +	struct vhost_vdpa_as *as = asid_to_as(v, asid);

> +

> +	/* Remove default address space is not allowed */

> +	if (asid == 0)

> +		return -EINVAL;

> +

> +	if (!as)

> +		return -EINVAL;

> +

> +	hlist_del(&as->hash_link);

> +	vhost_iotlb_reset(&as->iotlb);

> +	kfree(as);

> +	--v->used_as;

> +

> +	return 0;

> +}

> +

>  static void handle_vq_kick(struct vhost_work *work)

>  {

>  	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,

>  	}

>  }

>  

> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)

> -{

> -	struct vhost_iotlb *iotlb = v->iotlb;

> -

> -	vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);

> -	kfree(v->iotlb);

> -	v->iotlb = NULL;

> -}

> -

>  static int perm_to_iommu_flags(u32 perm)

>  {

>  	int flags = 0;

> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,

>  	struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);

>  	struct vdpa_device *vdpa = v->vdpa;

>  	const struct vdpa_config_ops *ops = vdpa->config;

> -	struct vhost_iotlb *iotlb = v->iotlb;

> +	struct vhost_vdpa_as *as = asid_to_as(v, 0);

> +	struct vhost_iotlb *iotlb = &as->iotlb;

>  	int r = 0;

>  

>  	if (asid != 0)

> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)

>  	}

>  }

>  

> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)

> +{

> +	vhost_dev_cleanup(&v->vdev);

> +	kfree(v->vdev.vqs);

> +	vhost_vdpa_remove_as(v, 0);

> +}

> +

>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  {

>  	struct vhost_vdpa *v;

> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  	vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,

>  		       vhost_vdpa_process_iotlb_msg);

>  

> -	v->iotlb = vhost_iotlb_alloc(0, 0);

> -	if (!v->iotlb) {

> -		r = -ENOMEM;

> -		goto err_init_iotlb;

> -	}

> +	if (!vhost_vdpa_alloc_as(v, 0))

> +		goto err_alloc_as;

>  

>  	r = vhost_vdpa_alloc_domain(v);

>  	if (r)

> -		goto err_alloc_domain;

> +		goto err_alloc_as;

>  

>  	vhost_vdpa_set_iova_range(v);

>  

> @@ -902,11 +958,8 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  

>  	return 0;

>  

> -err_alloc_domain:

> -	vhost_vdpa_iotlb_free(v);

> -err_init_iotlb:

> -	vhost_dev_cleanup(&v->vdev);

> -	kfree(vqs);

> +err_alloc_as:

> +	vhost_vdpa_cleanup(v);

>  err:

>  	atomic_dec(&v->opened);

>  	return r;

> @@ -933,12 +986,10 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)

>  	filep->private_data = NULL;

>  	vhost_vdpa_reset(v);

>  	vhost_dev_stop(&v->vdev);

> -	vhost_vdpa_iotlb_free(v);

>  	vhost_vdpa_free_domain(v);

>  	vhost_vdpa_config_put(v);

>  	vhost_vdpa_clean_irq(v);

> -	vhost_dev_cleanup(&v->vdev);

> -	kfree(v->vdev.vqs);

> +	vhost_vdpa_cleanup(v);

>  	mutex_unlock(&d->mutex);

>  

>  	atomic_dec(&v->opened);

> @@ -1033,7 +1084,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	const struct vdpa_config_ops *ops = vdpa->config;

>  	struct vhost_vdpa *v;

>  	int minor;

> -	int r;

> +	int i, r;

>  

>  	/* Only support 1 address space and 1 groups */

>  	if (vdpa->ngroups != 1 || vdpa->nas != 1)

> @@ -1085,6 +1136,9 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	init_completion(&v->completion);

>  	vdpa_set_drvdata(vdpa, v);

>  

> +	for (i = 0; i < VHOST_VDPA_IOTLB_BUCKETS; i++)

> +		INIT_HLIST_HEAD(&v->as[i]);

> +

>  	return 0;

>  

>  err:

> -- 

> 2.25.1

>
Eli Cohen Dec. 29, 2020, 12:05 p.m. UTC | #10
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs

> tagged via ASID via hlist. This will be used for supporting multiple

> address spaces in the following patches.

> 

> Signed-off-by: Jason Wang <jasowang@redhat.com>

> ---

>  drivers/vhost/vdpa.c | 106 ++++++++++++++++++++++++++++++++-----------

>  1 file changed, 80 insertions(+), 26 deletions(-)

> 

> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c

> index feb6a58df22d..060d5b5b7e64 100644

> --- a/drivers/vhost/vdpa.c

> +++ b/drivers/vhost/vdpa.c

> @@ -33,13 +33,21 @@ enum {

>  

>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)

>  

> +#define VHOST_VDPA_IOTLB_BUCKETS 16

> +

> +struct vhost_vdpa_as {

> +	struct hlist_node hash_link;

> +	struct vhost_iotlb iotlb;

> +	u32 id;

> +};

> +

>  struct vhost_vdpa {

>  	struct vhost_dev vdev;

>  	struct iommu_domain *domain;

>  	struct vhost_virtqueue *vqs;

>  	struct completion completion;

>  	struct vdpa_device *vdpa;

> -	struct vhost_iotlb *iotlb;

> +	struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];

>  	struct device dev;

>  	struct cdev cdev;

>  	atomic_t opened;

> @@ -49,12 +57,64 @@ struct vhost_vdpa {

>  	struct eventfd_ctx *config_ctx;

>  	int in_batch;

>  	struct vdpa_iova_range range;

> +	int used_as;

>  };

>  

>  static DEFINE_IDA(vhost_vdpa_ida);

>  

>  static dev_t vhost_vdpa_major;

>  

> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)

> +{

> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];

> +	struct vhost_vdpa_as *as;

> +

> +	hlist_for_each_entry(as, head, hash_link)

> +		if (as->id == asid)

> +			return as;

> +

> +	return NULL;

> +}

> +

> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 asid)

> +{

> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];

> +	struct vhost_vdpa_as *as;

> +

> +	if (asid_to_as(v, asid))

> +		return NULL;

> +

> +	as = kmalloc(sizeof(*as), GFP_KERNEL);

> +	if (!as)

> +		return NULL;

> +

> +	vhost_iotlb_init(&as->iotlb, 0, 0);

> +	as->id = asid;

> +	hlist_add_head(&as->hash_link, head);

> +	++v->used_as;

> +

> +	return as;

> +}

> +

> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)


The return value is never interpreted. I think it should either be made
void, or its return value should be checked by the callers.

> +{

> +	struct vhost_vdpa_as *as = asid_to_as(v, asid);

> +

> +	/* Remove default address space is not allowed */

> +	if (asid == 0)

> +		return -EINVAL;


Can you explain why? I think you have a memory leak due to this, as no
one will ever free the address space with id 0.

> +

> +	if (!as)

> +		return -EINVAL;

> +

> +	hlist_del(&as->hash_link);

> +	vhost_iotlb_reset(&as->iotlb);

> +	kfree(as);

> +	--v->used_as;

> +

> +	return 0;

> +}

> +

>  static void handle_vq_kick(struct vhost_work *work)

>  {

>  	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,

> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,

>  	}

>  }

>  

> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)

> -{

> -	struct vhost_iotlb *iotlb = v->iotlb;

> -

> -	vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);

> -	kfree(v->iotlb);

> -	v->iotlb = NULL;

> -}

> -

>  static int perm_to_iommu_flags(u32 perm)

>  {

>  	int flags = 0;

> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,

>  	struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);

>  	struct vdpa_device *vdpa = v->vdpa;

>  	const struct vdpa_config_ops *ops = vdpa->config;

> -	struct vhost_iotlb *iotlb = v->iotlb;

> +	struct vhost_vdpa_as *as = asid_to_as(v, 0);

> +	struct vhost_iotlb *iotlb = &as->iotlb;

>  	int r = 0;

>  

>  	if (asid != 0)

> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)

>  	}

>  }

>  

> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)

> +{

> +	vhost_dev_cleanup(&v->vdev);

> +	kfree(v->vdev.vqs);

> +	vhost_vdpa_remove_as(v, 0);

> +}

> +

>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  {

>  	struct vhost_vdpa *v;

> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  	vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,

>  		       vhost_vdpa_process_iotlb_msg);

>  

> -	v->iotlb = vhost_iotlb_alloc(0, 0);

> -	if (!v->iotlb) {

> -		r = -ENOMEM;

> -		goto err_init_iotlb;

> -	}

> +	if (!vhost_vdpa_alloc_as(v, 0))

> +		goto err_alloc_as;

>  

>  	r = vhost_vdpa_alloc_domain(v);

>  	if (r)

> -		goto err_alloc_domain;

> +		goto err_alloc_as;

>  

>  	vhost_vdpa_set_iova_range(v);

>  

> @@ -902,11 +958,8 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)

>  

>  	return 0;

>  

> -err_alloc_domain:

> -	vhost_vdpa_iotlb_free(v);

> -err_init_iotlb:

> -	vhost_dev_cleanup(&v->vdev);

> -	kfree(vqs);

> +err_alloc_as:

> +	vhost_vdpa_cleanup(v);

>  err:

>  	atomic_dec(&v->opened);

>  	return r;

> @@ -933,12 +986,10 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)

>  	filep->private_data = NULL;

>  	vhost_vdpa_reset(v);

>  	vhost_dev_stop(&v->vdev);

> -	vhost_vdpa_iotlb_free(v);

>  	vhost_vdpa_free_domain(v);

>  	vhost_vdpa_config_put(v);

>  	vhost_vdpa_clean_irq(v);

> -	vhost_dev_cleanup(&v->vdev);

> -	kfree(v->vdev.vqs);

> +	vhost_vdpa_cleanup(v);

>  	mutex_unlock(&d->mutex);

>  

>  	atomic_dec(&v->opened);

> @@ -1033,7 +1084,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	const struct vdpa_config_ops *ops = vdpa->config;

>  	struct vhost_vdpa *v;

>  	int minor;

> -	int r;

> +	int i, r;

>  

>  	/* Only support 1 address space and 1 groups */

>  	if (vdpa->ngroups != 1 || vdpa->nas != 1)

> @@ -1085,6 +1136,9 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)

>  	init_completion(&v->completion);

>  	vdpa_set_drvdata(vdpa, v);

>  

> +	for (i = 0; i < VHOST_VDPA_IOTLB_BUCKETS; i++)

> +		INIT_HLIST_HEAD(&v->as[i]);

> +

>  	return 0;

>  

>  err:

> -- 

> 2.25.1

>
Jason Wang Dec. 30, 2020, 4 a.m. UTC | #11
On 2020/12/29 下午3:28, Eli Cohen wrote:
>> @@ -43,6 +43,8 @@ struct vdpa_vq_state {

>>    * @index: device index

>>    * @features_valid: were features initialized? for legacy guests

>>    * @nvqs: the number of virtqueues

>> + * @ngroups: the number of virtqueue groups

>> + * @nas: the number of address spaces

> I am not sure these can be categorised as part of the state of the VQ.

> It's more of a property so maybe we can have a callback to get the

> properties of the VQ?

>

> Moreover, I think they should be handled in the hardware drivers to

> return 1 for both ngroups and nas.



We can, but those are static attributes that are not expected to be 
changed by the driver.

Introducing callbacks for such static attributes does not give us any 
advantage.

For the group ID and the notification area, since we don't have an 
abstraction of a vdpa_virtqueue, the only choice is to introduce 
callbacks for them.

Maybe it's time to introduce vdpa_virtqueue.
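
Purely as a sketch of one possible direction (the structure and field
names here are hypothetical, nothing like this exists in the series):

struct vdpa_virtqueue {
	struct vdpa_device *vdev;	/* the owning vDPA device */
	u16 index;			/* index of this virtqueue */
	u32 group;			/* virtqueue group, fixed by the parent driver */
};

Static per-virtqueue attributes such as the group could then live in
such a structure instead of being queried through dedicated config ops.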

Thanks
Jason Wang Dec. 30, 2020, 4:04 a.m. UTC | #12
On 2020/12/29 下午3:28, Eli Cohen wrote:
>> @@ -43,6 +43,8 @@ struct vdpa_vq_state {

>>    * @index: device index

>>    * @features_valid: were features initialized? for legacy guests

>>    * @nvqs: the number of virtqueues

>> + * @ngroups: the number of virtqueue groups

>> + * @nas: the number of address spaces

> I am not sure these can be categorised as part of the state of the VQ.

> It's more of a property so maybe we can have a callback to get the

> properties of the VQ?



Or maybe there's a misunderstanding of the patch.

Those two attributes actually belong to vdpa_device, not to 
vdpa_vq_state.
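
In this series they sit next to nvqs in the device structure, roughly:

struct vdpa_device {
	/* ... other members ... */
	int nvqs;
	unsigned int ngroups;	/* the number of virtqueue groups */
	unsigned int nas;	/* the number of address spaces */
};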

Thanks
Jason Wang Dec. 30, 2020, 6:23 a.m. UTC | #13
On 2020/12/29 下午7:41, Eli Cohen wrote:
> On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:

>> This patch converts the vhost-vDPA device to support multiple IOTLBs

>> tagged via ASID via hlist. This will be used for supporting multiple

>> address spaces in the following patches.

>>

>> Signed-off-by: Jason Wang <jasowang@redhat.com>

>> ---

>>   drivers/vhost/vdpa.c | 106 ++++++++++++++++++++++++++++++++-----------

>>   1 file changed, 80 insertions(+), 26 deletions(-)

>>

>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c

>> index feb6a58df22d..060d5b5b7e64 100644

>> --- a/drivers/vhost/vdpa.c

>> +++ b/drivers/vhost/vdpa.c

>> @@ -33,13 +33,21 @@ enum {

>>   

>>   #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)

>>   

>> +#define VHOST_VDPA_IOTLB_BUCKETS 16

>> +

>> +struct vhost_vdpa_as {

>> +	struct hlist_node hash_link;

>> +	struct vhost_iotlb iotlb;

>> +	u32 id;

>> +};

>> +

>>   struct vhost_vdpa {

>>   	struct vhost_dev vdev;

>>   	struct iommu_domain *domain;

>>   	struct vhost_virtqueue *vqs;

>>   	struct completion completion;

>>   	struct vdpa_device *vdpa;

>> -	struct vhost_iotlb *iotlb;

>> +	struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];

>>   	struct device dev;

>>   	struct cdev cdev;

>>   	atomic_t opened;

>> @@ -49,12 +57,64 @@ struct vhost_vdpa {

>>   	struct eventfd_ctx *config_ctx;

>>   	int in_batch;

>>   	struct vdpa_iova_range range;

>> +	int used_as;

> This is not really used. Not in this patch and later removed.



Right, will remove this.

Thanks
Jason Wang Dec. 30, 2020, 6:33 a.m. UTC | #14
On 2020/12/29 下午8:05, Eli Cohen wrote:
>> +

>> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)

> The return value is never interpreted. I think it should either be made

> void or return values checked.



Right, will make it void.


>

>> +{

>> +	struct vhost_vdpa_as *as = asid_to_as(v, asid);

>> +

>> +	/* Remove default address space is not allowed */

>> +	if (asid == 0)

>> +		return -EINVAL;

> Can you explain why? I think you have a memory leak due to this as no

> one will ever free as with id 0.

>


Looks like a bug. Will remove this.
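
With the asid == 0 check dropped and the return type made void (as
suggested in your other reply), the helper would end up roughly as:

static void vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
{
	struct vhost_vdpa_as *as = asid_to_as(v, asid);

	if (!as)
		return;

	hlist_del(&as->hash_link);
	vhost_iotlb_reset(&as->iotlb);
	kfree(as);
}

so that vhost_vdpa_cleanup() can actually free the default address
space and the leak goes away.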

Thanks
Jason Wang Dec. 30, 2020, 6:34 a.m. UTC | #15
On 2020/12/29 下午7:53, Eli Cohen wrote:
>> +

>> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 asid)

>> +{

>> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];

>> +	struct vhost_vdpa_as *as;

>> +

>> +	if (asid_to_as(v, asid))

>> +		return NULL;

>> +

>> +	as = kmalloc(sizeof(*as), GFP_KERNEL);

> kzalloc()? See comment below.

>

>> +	if (!as)

>> +		return NULL;

>> +

>> +	vhost_iotlb_init(&as->iotlb, 0, 0);

>> +	as->id = asid;

>> +	hlist_add_head(&as->hash_link, head);

>> +	++v->used_as;

> Although you eventually ended up removing used_as, this is a bug since

> you're incrementing a random value. Maybe it's better to be on the safe

> side and use kzalloc() for as above.



Yes, and used_as needs to be removed.

Thanks



>
Eli Cohen Dec. 30, 2020, 9:44 a.m. UTC | #16
On Wed, Dec 30, 2020 at 12:04:30PM +0800, Jason Wang wrote:
> 

> On 2020/12/29 下午3:28, Eli Cohen wrote:

> > > @@ -43,6 +43,8 @@ struct vdpa_vq_state {

> > >    * @index: device index

> > >    * @features_valid: were features initialized? for legacy guests

> > >    * @nvqs: the number of virtqueues

> > > + * @ngroups: the number of virtqueue groups

> > > + * @nas: the number of address spaces

> > I am not sure these can be categorised as part of the state of the VQ.

> > It's more of a property so maybe we can have a callback to get the

> > properties of the VQ?

> 

> 

> Or maybe there's a misunderstanding of the patch.

> 


Yes, I misinterpreted the hunk. No issue here.

> Those two attributes belongs to vdpa_device instead of vdpa_vq_state

> actually.

> 

> Thanks

>
Stefan Hajnoczi Jan. 4, 2021, 10:04 a.m. UTC | #17
On Wed, Dec 16, 2020 at 02:48:03PM +0800, Jason Wang wrote:
> This patch introduces virtqueue groups to vDPA device. The virtqueue

> group is the minimal set of virtqueues that must share an address

> space. And the address space identifier can only be attached to

> a specific virtqueue group.

> 

> A new mandated bus operation is introduced to get the virtqueue group

> ID for a specific virtqueue.

> 

> All the vDPA device drivers were converted to simply support a single

> virtqueue group.

> 

> Signed-off-by: Jason Wang <jasowang@redhat.com>

> ---

>  drivers/vdpa/ifcvf/ifcvf_main.c   |  9 ++++++++-

>  drivers/vdpa/mlx5/net/mlx5_vnet.c |  8 +++++++-

>  drivers/vdpa/vdpa.c               |  4 +++-

>  drivers/vdpa/vdpa_sim/vdpa_sim.c  | 11 ++++++++++-

>  include/linux/vdpa.h              | 12 +++++++++---

>  5 files changed, 37 insertions(+), 7 deletions(-)


Maybe consider calling it iotlb_group or iommu_group so the purpose of
the group is clear?
Jason Wang Jan. 5, 2021, 4:13 a.m. UTC | #18
On 2021/1/4 下午6:04, Stefan Hajnoczi wrote:
> On Wed, Dec 16, 2020 at 02:48:03PM +0800, Jason Wang wrote:

>> This patch introduces virtqueue groups to vDPA device. The virtqueue

>> group is the minimal set of virtqueues that must share an address

>> space. And the address space identifier can only be attached to

>> a specific virtqueue group.

>>

>> A new mandated bus operation is introduced to get the virtqueue group

>> ID for a specific virtqueue.

>>

>> All the vDPA device drivers were converted to simply support a single

>> virtqueue group.

>>

>> Signed-off-by: Jason Wang <jasowang@redhat.com>

>> ---

>>   drivers/vdpa/ifcvf/ifcvf_main.c   |  9 ++++++++-

>>   drivers/vdpa/mlx5/net/mlx5_vnet.c |  8 +++++++-

>>   drivers/vdpa/vdpa.c               |  4 +++-

>>   drivers/vdpa/vdpa_sim/vdpa_sim.c  | 11 ++++++++++-

>>   include/linux/vdpa.h              | 12 +++++++++---

>>   5 files changed, 37 insertions(+), 7 deletions(-)

> Maybe consider calling it iotlb_group or iommu_group so the purpose of

> the group is clear?



I'm not sure. The reason I chose "virtqueue group" is:

1) The virtqueue is the only entity that tries to issue DMA
2) "IOMMU group" may cause confusion with the existing IOMMU group 
concept, which groups devices
3) IOTLB is a vhost concept; we don't have such a definition in the 
virtio spec

Thanks
Eli Cohen Jan. 11, 2021, 12:26 p.m. UTC | #19
On Wed, Dec 16, 2020 at 02:48:18PM +0800, Jason Wang wrote:
> This patch introduces the control virtqueue support for vDPA

> simulator. This is a requirement for supporting advanced features like

> multiqueue.

> 

> A requirement for control virtqueue is to isolate its memory access

> from the rx/tx virtqueues. This is because when using vDPA device

> for VM, the control virtqueue is not directly assigned to VM. Userspace

> (Qemu) will present a shadow control virtqueue in order to record

> the device state.

> 

> The isolation is done via the virtqueue groups and ASID support in

> vDPA through vhost-vdpa. The simulator is extended to have:

> 

> 1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)

> 2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1

>    contains CVQ

> 3) two address spaces and the simulator simply implements the address

>    spaces by mapping it 1:1 to IOTLB.

> 

> For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1

> to group 1. So we have:

> 

> 1) The IOTLB for virtqueue group 0 contains the mappings of guest, so

>    RX and TX can be assigned to guest directly.

> 2) The IOTLB for virtqueue group 1 contains the mappings of CVQ which

>    are the buffers allocated and managed by VMM only. So CVQ of

>    vhost-vdpa is visible to VMM only. And Guest can not access the CVQ

>    of vhost-vdpa.

> 

> For the other use cases, since AS 0 is associated with all virtqueue

> groups by default, all virtqueues share the same mapping by default.

> 

> To demonstrate the function, VIRTIO_NET_F_CTRL_MAC_ADDR is

> implemented in the simulator for the driver to set mac address.

> 


Hi Jason,

is there any version of qemu/libvirt available where I can see the
control virtqueue in action?

> Signed-off-by: Jason Wang <jasowang@redhat.com>

> ---

>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 189 +++++++++++++++++++++++++++----

>  1 file changed, 166 insertions(+), 23 deletions(-)

> 

> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c

> index fe90a783bde4..0fd06ac491cd 100644

> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c

> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c

> @@ -60,14 +60,18 @@ struct vdpasim_virtqueue {

>  #define VDPASIM_QUEUE_MAX 256

>  #define VDPASIM_DEVICE_ID 0x1

>  #define VDPASIM_VENDOR_ID 0

> -#define VDPASIM_VQ_NUM 0x2

> +#define VDPASIM_VQ_NUM 0x3

> +#define VDPASIM_AS_NUM 0x2

> +#define VDPASIM_GROUP_NUM 0x2

>  #define VDPASIM_NAME "vdpasim-netdev"

>  

>  static u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |

>  			      (1ULL << VIRTIO_F_VERSION_1)  |

>  			      (1ULL << VIRTIO_F_ACCESS_PLATFORM) |

> +			      (1ULL << VIRTIO_NET_F_MTU) |

>  			      (1ULL << VIRTIO_NET_F_MAC) |

> -			      (1ULL << VIRTIO_NET_F_MTU);

> +			      (1ULL << VIRTIO_NET_F_CTRL_VQ) |

> +			      (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR);

>  

>  /* State of each vdpasim device */

>  struct vdpasim {

> @@ -147,11 +151,17 @@ static void vdpasim_reset(struct vdpasim *vdpasim)

>  {

>  	int i;

>  

> -	for (i = 0; i < VDPASIM_VQ_NUM; i++)

> +	spin_lock(&vdpasim->iommu_lock);

> +

> +	for (i = 0; i < VDPASIM_VQ_NUM; i++) {

>  		vdpasim_vq_reset(&vdpasim->vqs[i]);

> +		vringh_set_iotlb(&vdpasim->vqs[i].vring,

> +				 &vdpasim->iommu[0]);

> +	}

>  

> -	spin_lock(&vdpasim->iommu_lock);

> -	vhost_iotlb_reset(vdpasim->iommu);

> +	for (i = 0; i < VDPASIM_AS_NUM; i++) {

> +		vhost_iotlb_reset(&vdpasim->iommu[i]);

> +	}

>  	spin_unlock(&vdpasim->iommu_lock);

>  

>  	vdpasim->features = 0;

> @@ -191,6 +201,81 @@ static bool receive_filter(struct vdpasim *vdpasim, size_t len)

>  	return false;

>  }

>  

> +virtio_net_ctrl_ack vdpasim_handle_ctrl_mac(struct vdpasim *vdpasim,

> +					    u8 cmd)

> +{

> +	struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];

> +	virtio_net_ctrl_ack status = VIRTIO_NET_ERR;

> +	size_t read;

> +

> +	switch (cmd) {

> +	case VIRTIO_NET_CTRL_MAC_ADDR_SET:

> +		read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov,

> +					     (void *)vdpasim->config.mac,

> +					     ETH_ALEN);

> +		if (read == ETH_ALEN)

> +			status = VIRTIO_NET_OK;

> +		break;

> +	default:

> +		break;

> +	}

> +

> +	return status;

> +}

> +

> +static void vdpasim_handle_cvq(struct vdpasim *vdpasim)

> +{

> +	struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];

> +	virtio_net_ctrl_ack status = VIRTIO_NET_ERR;

> +	struct virtio_net_ctrl_hdr ctrl;

> +	size_t read, write;

> +	int err;

> +

> +	if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ)))

> +		return;

> +

> +	if (!cvq->ready)

> +		return;

> +

> +	while (true) {

> +		err = vringh_getdesc_iotlb(&cvq->vring, &cvq->in_iov,

> +					   &cvq->out_iov,

> +					   &cvq->head, GFP_ATOMIC);

> +		if (err <= 0)

> +			break;

> +

> +		read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl,

> +					     sizeof(ctrl));

> +		if (read != sizeof(ctrl))

> +			break;

> +

> +		switch (ctrl.class) {

> +		case VIRTIO_NET_CTRL_MAC:

> +			status = vdpasim_handle_ctrl_mac(vdpasim, ctrl.cmd);

> +			break;

> +		default:

> +			break;

> +		}

> +

> +		/* Make sure data is wrote before advancing index */

> +		smp_wmb();

> +

> +		write = vringh_iov_push_iotlb(&cvq->vring, &cvq->out_iov,

> +					      &status, sizeof (status));

> +		vringh_complete_iotlb(&cvq->vring, cvq->head, write);

> +		vringh_kiov_cleanup(&cvq->in_iov);

> +		vringh_kiov_cleanup(&cvq->out_iov);

> +

> +		/* Make sure used is visible before rasing the interrupt. */

> +		smp_wmb();

> +

> +		local_bh_disable();

> +		if (cvq->cb)

> +			cvq->cb(cvq->private);

> +		local_bh_enable();

> +	}

> +}

> +

>  static void vdpasim_work(struct work_struct *work)

>  {

>  	struct vdpasim *vdpasim = container_of(work, struct

> @@ -276,7 +361,7 @@ static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,

>  				   unsigned long attrs)

>  {

>  	struct vdpasim *vdpasim = dev_to_sim(dev);

> -	struct vhost_iotlb *iommu = vdpasim->iommu;

> +	struct vhost_iotlb *iommu = &vdpasim->iommu[0];

>  	u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;

>  	int ret, perm = dir_to_perm(dir);

>  

> @@ -301,7 +386,7 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,

>  			       unsigned long attrs)

>  {

>  	struct vdpasim *vdpasim = dev_to_sim(dev);

> -	struct vhost_iotlb *iommu = vdpasim->iommu;

> +	struct vhost_iotlb *iommu = &vdpasim->iommu[0];

>  

>  	spin_lock(&vdpasim->iommu_lock);

>  	vhost_iotlb_del_range(iommu, (u64)dma_addr,

> @@ -314,7 +399,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size,

>  				    unsigned long attrs)

>  {

>  	struct vdpasim *vdpasim = dev_to_sim(dev);

> -	struct vhost_iotlb *iommu = vdpasim->iommu;

> +	struct vhost_iotlb *iommu = &vdpasim->iommu[0];

>  	void *addr = kmalloc(size, flag);

>  	int ret;

>  

> @@ -344,7 +429,7 @@ static void vdpasim_free_coherent(struct device *dev, size_t size,

>  				  unsigned long attrs)

>  {

>  	struct vdpasim *vdpasim = dev_to_sim(dev);

> -	struct vhost_iotlb *iommu = vdpasim->iommu;

> +	struct vhost_iotlb *iommu = &vdpasim->iommu[0];

>  

>  	spin_lock(&vdpasim->iommu_lock);

>  	vhost_iotlb_del_range(iommu, (u64)dma_addr,

> @@ -370,14 +455,17 @@ static struct vdpasim *vdpasim_create(void)

>  	struct vdpasim *vdpasim;

>  	struct device *dev;

>  	int ret = -ENOMEM;

> +	int i;

>  

>  	if (batch_mapping)

>  		ops = &vdpasim_net_batch_config_ops;

>  	else

>  		ops = &vdpasim_net_config_ops;

>  

> +	/* 3 virtqueues, 2 address spaces, 2 virtqueue groups */

>  	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,

> -				    VDPASIM_VQ_NUM, 1, 1);

> +				    VDPASIM_VQ_NUM, VDPASIM_AS_NUM,

> +				    VDPASIM_GROUP_NUM);

>  	if (!vdpasim)

>  		goto err_alloc;

>  

> @@ -391,10 +479,14 @@ static struct vdpasim *vdpasim_create(void)

>  		goto err_iommu;

>  	set_dma_ops(dev, &vdpasim_dma_ops);

>  

> -	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);

> +	vdpasim->iommu = kmalloc_array(VDPASIM_AS_NUM,

> +				       sizeof(*vdpasim->iommu), GFP_KERNEL);

>  	if (!vdpasim->iommu)

>  		goto err_iommu;

>  

> +	for (i = 0; i < VDPASIM_AS_NUM; i++)

> +		vhost_iotlb_init(&vdpasim->iommu[i], 0, 0);

> +

>  	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);

>  	if (!vdpasim->buffer)

>  		goto err_iommu;

> @@ -409,8 +501,9 @@ static struct vdpasim *vdpasim_create(void)

>  		eth_random_addr(vdpasim->config.mac);

>  	}

>  

> -	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);

> -	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);

> +	/* Make sure that default ASID is zero */

> +	for (i = 0; i < VDPASIM_VQ_NUM; i++)

> +		vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0]);

>  

>  	vdpasim->vdpa.dma_dev = dev;

>  	ret = vdpa_register_device(&vdpasim->vdpa);

> @@ -452,7 +545,14 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)

>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

>  	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];

>  

> -	if (vq->ready)

> +	if (idx == 2) {

> +		/* Kernel virtio driver will do busy waiting for the

> +		 * result, so we can't handle cvq in the workqueue.

> +		 */

> +		spin_lock(&vdpasim->lock);

> +		vdpasim_handle_cvq(vdpasim);

> +		spin_unlock(&vdpasim->lock);

> +	} else if (vq->ready)

>  		schedule_work(&vdpasim->work);

>  }

>  

> @@ -518,7 +618,11 @@ static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)

>  

>  static u32 vdpasim_get_vq_group(struct vdpa_device *vdpa, u16 idx)

>  {

> -	return 0;

> +	/* RX and TX belongs to group 0, CVQ belongs to group 1 */

> +	if (idx == 2)

> +		return 1;

> +	else

> +		return 0;

>  }

>  

>  static u64 vdpasim_get_features(struct vdpa_device *vdpa)

> @@ -624,20 +728,52 @@ static struct vdpa_iova_range vdpasim_get_iova_range(struct vdpa_device *vdpa)

>  	return range;

>  }

>  

> +int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,

> +			   unsigned int asid)

> +{

> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

> +	struct vhost_iotlb *iommu;

> +	int i;

> +

> +	if (group > VDPASIM_GROUP_NUM)

> +		return -EINVAL;

> +

> +	if (asid > VDPASIM_AS_NUM)

> +		return -EINVAL;

> +

> +	iommu = &vdpasim->iommu[asid];

> +

> +	spin_lock(&vdpasim->lock);

> +

> +	for (i = 0; i < VDPASIM_VQ_NUM; i++)

> +		if (vdpasim_get_vq_group(vdpa, i) == group)

> +			vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu);

> +

> +	spin_unlock(&vdpasim->lock);

> +

> +	return 0;

> +}

> +

>  static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,

>  			   struct vhost_iotlb *iotlb)

>  {

>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

>  	struct vhost_iotlb_map *map;

> +	struct vhost_iotlb *iommu;

>  	u64 start = 0ULL, last = 0ULL - 1;

>  	int ret;

>  

> +	if (asid >= VDPASIM_AS_NUM)

> +		return -EINVAL;

> +

>  	spin_lock(&vdpasim->iommu_lock);

> -	vhost_iotlb_reset(vdpasim->iommu);

> +

> +	iommu = &vdpasim->iommu[asid];

> +	vhost_iotlb_reset(iommu);

>  

>  	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;

>  	     map = vhost_iotlb_itree_next(map, start, last)) {

> -		ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,

> +		ret = vhost_iotlb_add_range(iommu, map->start,

>  					    map->last, map->addr, map->perm);

>  		if (ret)

>  			goto err;

> @@ -646,7 +782,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,

>  	return 0;

>  

>  err:

> -	vhost_iotlb_reset(vdpasim->iommu);

> +	vhost_iotlb_reset(iommu);

>  	spin_unlock(&vdpasim->iommu_lock);

>  	return ret;

>  }

> @@ -658,9 +794,12 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,

>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

>  	int ret;

>  

> +	if (asid >= VDPASIM_AS_NUM)

> +		return -EINVAL;

> +

>  	spin_lock(&vdpasim->iommu_lock);

> -	ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,

> -				    perm);

> +	ret = vhost_iotlb_add_range(&vdpasim->iommu[asid], iova,

> +				    iova + size - 1, pa, perm);

>  	spin_unlock(&vdpasim->iommu_lock);

>  

>  	return ret;

> @@ -671,8 +810,11 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, unsigned int asid,

>  {

>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

>  

> +	if (asid >= VDPASIM_AS_NUM)

> +		return -EINVAL;

> +

>  	spin_lock(&vdpasim->iommu_lock);

> -	vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1);

> +	vhost_iotlb_del_range(&vdpasim->iommu[asid], iova, iova + size - 1);

>  	spin_unlock(&vdpasim->iommu_lock);

>  

>  	return 0;

> @@ -684,8 +826,7 @@ static void vdpasim_free(struct vdpa_device *vdpa)

>  

>  	cancel_work_sync(&vdpasim->work);

>  	kfree(vdpasim->buffer);

> -	if (vdpasim->iommu)

> -		vhost_iotlb_free(vdpasim->iommu);

> +	vhost_iotlb_free(vdpasim->iommu);

>  }

>  

>  static const struct vdpa_config_ops vdpasim_net_config_ops = {

> @@ -711,6 +852,7 @@ static const struct vdpa_config_ops vdpasim_net_config_ops = {

>  	.set_config             = vdpasim_set_config,

>  	.get_generation         = vdpasim_get_generation,

>  	.get_iova_range         = vdpasim_get_iova_range,

> +	.set_group_asid         = vdpasim_set_group_asid,

>  	.dma_map                = vdpasim_dma_map,

>  	.dma_unmap              = vdpasim_dma_unmap,

>  	.free                   = vdpasim_free,

> @@ -739,6 +881,7 @@ static const struct vdpa_config_ops vdpasim_net_batch_config_ops = {

>  	.set_config             = vdpasim_set_config,

>  	.get_generation         = vdpasim_get_generation,

>  	.get_iova_range         = vdpasim_get_iova_range,

> +	.set_group_asid         = vdpasim_set_group_asid,

>  	.set_map                = vdpasim_set_map,

>  	.free                   = vdpasim_free,

>  };

> -- 

> 2.25.1

>
Jason Wang Jan. 12, 2021, 3:11 a.m. UTC | #20
On 2021/1/11 下午8:26, Eli Cohen wrote:
> On Wed, Dec 16, 2020 at 02:48:18PM +0800, Jason Wang wrote:

>> This patch introduces the control virtqueue support for vDPA

>> simulator. This is a requirement for supporting advanced features like

>> multiqueue.

>>

>> A requirement for control virtqueue is to isolate its memory access

>> from the rx/tx virtqueues. This is because when using vDPA device

>> for VM, the control virtqueue is not directly assigned to VM. Userspace

>> (Qemu) will present a shadow control virtqueue in order to record

>> the device state.

>>

>> The isolation is done via the virtqueue groups and ASID support in

>> vDPA through vhost-vdpa. The simulator is extended to have:

>>

>> 1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)

>> 2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1

>>     contains CVQ

>> 3) two address spaces and the simulator simply implements the address

>>     spaces by mapping it 1:1 to IOTLB.

>>

>> For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1

>> to group 1. So we have:

>>

>> 1) The IOTLB for virtqueue group 0 contains the mappings of guest, so

>>     RX and TX can be assigned to guest directly.

>> 2) The IOTLB for virtqueue group 1 contains the mappings of CVQ which

>>     are the buffers allocated and managed by VMM only. So CVQ of

>>     vhost-vdpa is visible to VMM only. And Guest can not access the CVQ

>>     of vhost-vdpa.

>>

>> For the other use cases, since AS 0 is associated with all virtqueue

>> groups by default, all virtqueues share the same mapping by default.

>>

>> To demonstrate the function, VIRTIO_NET_F_CTRL_MAC_ADDR is

>> implemented in the simulator for the driver to set mac address.

>>

> Hi Jason,

>

> is there any version of qemu/libvirt available where I can see the

> control virtqueue in action?



Not yet, the qemu part depends on the shadow virtqueue work of Eugenio. 
But it will work as follows:

1) qemu will use a separate address space for the control virtqueue 
(shadow) exposed through vhost-vDPA
2) the commands sent through the control virtqueue by the guest driver 
will be intercepted by qemu
3) Qemu will send those commands to the shadow control virtqueue
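
For reference, the VMM side setup would then look roughly like this (a
sketch against the uAPI added by this series; vhost_vdpa_fd is just a
hypothetical name for the opened vhost-vdpa device, error handling
omitted):

	struct vhost_vring_state s;

	/* ask which group the control virtqueue (vq 2 here) belongs to */
	s.index = 2;
	ioctl(vhost_vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &s);

	/* attach that group to a dedicated address space (ASID 1) */
	s.index = s.num;	/* the group id returned above */
	s.num = 1;		/* the asid */
	ioctl(vhost_vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &s);

After that, IOTLB updates tagged with ASID 1 only affect the cvq
mappings, while the mappings visible to the guest stay in ASID 0.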

Eugenio, any ETA for the new version of shadow virtqueue support in Qemu?

Thanks


>
Eugenio Perez Martin Jan. 22, 2021, 7:43 p.m. UTC | #21
On Tue, Jan 12, 2021 at 4:12 AM Jason Wang <jasowang@redhat.com> wrote:
>

>

> On 2021/1/11 下午8:26, Eli Cohen wrote:

> > On Wed, Dec 16, 2020 at 02:48:18PM +0800, Jason Wang wrote:

> >> This patch introduces the control virtqueue support for vDPA

> >> simulator. This is a requirement for supporting advanced features like

> >> multiqueue.

> >>

> >> A requirement for control virtqueue is to isolate its memory access

> >> from the rx/tx virtqueues. This is because when using vDPA device

> >> for VM, the control virtqueue is not directly assigned to VM. Userspace

> >> (Qemu) will present a shadow control virtqueue in order to record

> >> the device state.

> >>

> >> The isolation is done via the virtqueue groups and ASID support in

> >> vDPA through vhost-vdpa. The simulator is extended to have:

> >>

> >> 1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)

> >> 2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1

> >>     contains CVQ

> >> 3) two address spaces and the simulator simply implements the address

> >>     spaces by mapping it 1:1 to IOTLB.

> >>

> >> For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1

> >> to group 1. So we have:

> >>

> >> 1) The IOTLB for virtqueue group 0 contains the mappings of guest, so

> >>     RX and TX can be assigned to guest directly.

> >> 2) The IOTLB for virtqueue group 1 contains the mappings of CVQ which

> >>     are the buffers allocated and managed by VMM only. So CVQ of

> >>     vhost-vdpa is visible to VMM only. And Guest can not access the CVQ

> >>     of vhost-vdpa.

> >>

> >> For the other use cases, since AS 0 is associated with all virtqueue

> >> groups by default, all virtqueues share the same mapping by default.

> >>

> >> To demonstrate the function, VIRTIO_NET_F_CTRL_MAC_ADDR is

> >> implemented in the simulator for the driver to set mac address.

> >>

> > Hi Jason,

> >

> > is there any version of qemu/libvirt available where I can see the

> > control virtqueue in action?

>

>

> Not yet, the qemu part depends on the shadow virtqueue work of Eugenio.

> But it will work as follows:

>

> 1) qemu will use a separate address space for the control virtqueue

> (shadow) exposed through vhost-vDPA

> 2) the commands sent through the control virtqueue by the guest driver

> will be intercepted by qemu

> 3) Qemu will send those commands to the shadow control virtqueue

>

> Eugenio, any ETA for the new version of shadow virtqueue support in Qemu?

>


Hi Jason. Sorry for the late response.

For the notification part I have addressed all the issues of the RFC
[1], except for the potential race conditions Stefan pointed out, and
tested with vdpa devices. You can find it at
https://github.com/eugpermar/qemu/tree/vdpa_sw_live_migration.d/notifications.rfc
. Since the shadow path is activated only through QMP and does not
interfere with regular operation, I could post it to the qemu list if you
prefer. The series will be smaller if merged in steps.

Adding the buffer forwarding on top should not take long.

[1] https://lkml.org/lkml/2020/9/23/1243

Thanks!
Jason Wang Jan. 25, 2021, 3:16 a.m. UTC | #22
On 2021/1/23 上午3:43, Eugenio Perez Martin wrote:
> On Tue, Jan 12, 2021 at 4:12 AM Jason Wang <jasowang@redhat.com> wrote:

>>

>> On 2021/1/11 下午8:26, Eli Cohen wrote:

>>> On Wed, Dec 16, 2020 at 02:48:18PM +0800, Jason Wang wrote:

>>>> This patch introduces the control virtqueue support for vDPA

>>>> simulator. This is a requirement for supporting advanced features like

>>>> multiqueue.

>>>>

>>>> A requirement for control virtqueue is to isolate its memory access

>>>> from the rx/tx virtqueues. This is because when using vDPA device

>>>> for VM, the control virtqueue is not directly assigned to VM. Userspace

>>>> (Qemu) will present a shadow control virtqueue in order to record

>>>> the device state.

>>>>

>>>> The isolation is done via the virtqueue groups and ASID support in

>>>> vDPA through vhost-vdpa. The simulator is extended to have:

>>>>

>>>> 1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)

>>>> 2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1

>>>>      contains CVQ

>>>> 3) two address spaces and the simulator simply implements the address

>>>>      spaces by mapping it 1:1 to IOTLB.

>>>>

>>>> For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1

>>>> to group 1. So we have:

>>>>

>>>> 1) The IOTLB for virtqueue group 0 contains the mappings of guest, so

>>>>      RX and TX can be assigned to guest directly.

>>>> 2) The IOTLB for virtqueue group 1 contains the mappings of CVQ which

>>>>      are the buffers allocated and managed by VMM only. So CVQ of

>>>>      vhost-vdpa is visible to VMM only. And Guest can not access the CVQ

>>>>      of vhost-vdpa.

>>>>

>>>> For the other use cases, since AS 0 is associated with all virtqueue

>>>> groups by default, all virtqueues share the same mapping by default.

>>>>

>>>> To demonstrate the function, VIRTIO_NET_F_CTRL_MAC_ADDR is

>>>> implemented in the simulator for the driver to set mac address.

>>>>

>>> Hi Jason,

>>>

>>> is there any version of qemu/libvirt available where I can see the

>>> control virtqueue in action?

>>

>> Not yet, the qemu part depends on the shadow virtqueue work of Eugenio.

>> But it will work as follows:

>>

>> 1) qemu will use a separate address space for the control virtqueue

>> (shadow) exposed through vhost-vDPA

>> 2) the commands sent through the control virtqueue by the guest driver

>> will be intercepted by qemu

>> 3) Qemu will send those commands to the shadow control virtqueue

>>

>> Eugenio, any ETA for the new version of shadow virtqueue support in Qemu?

>>

> Hi Jason. Sorry for the late response.

>

> For the notification part I have addressed all the issues of the RFC

> [1], except for the potential race conditions Stefan pointed out, and

> tested with vdpa devices. You can find it at

> https://github.com/eugpermar/qemu/tree/vdpa_sw_live_migration.d/notifications.rfc

> . Since the shadow path is activated only through QMP and does not

> interfere with regular operation, I could post it to the qemu list if you

> prefer. The series will be smaller if merged in steps.



Sure. Please post them.


>

> Adding the buffer forwarding on top should not take long.

>

> [1] https://lkml.org/lkml/2020/9/23/1243

>

> Thanks!



Thanks