Message ID | 1472242795-23970-3-git-send-email-loic.pallardy@st.com |
---|---|
State | New |
Headers | show |
On Fri 26 Aug 13:19 PDT 2016, Loic Pallardy wrote: > Some coprocessors request fixed memory mapping for firmware execution > and associated communication linked. > Memory resources are defined in firmware resource table. > Resource address different from 0x0 and 0xFFFFFFFF is considered as predefined Do you think we're required to support both 0 and -1 for this? > and already reserved at system level. > In that case, remoteproc core doesn't need to perform any allocation. > Memory region access can be managed using memremap/memunmap functions > > Signed-off-by: Loic Pallardy <loic.pallardy@st.com> > --- > drivers/remoteproc/remoteproc_core.c | 61 ++++++++++++++++++++++++++---------- > include/linux/remoteproc.h | 4 +++ > 2 files changed, 49 insertions(+), 16 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index 18f4286..0ddbb92 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -213,13 +213,25 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) > /* actual size of vring (in bytes) */ > size = PAGE_ALIGN(vring_size(rvring->len, rvring->align)); > > - /* > - * Allocate non-cacheable memory for the vring. In the future > - * this call will also configure the IOMMU for us > - */ > - va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); > + rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; > + > + /* check if specific memory region requested by firmware */ > + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { I think we should convert that reserved field in the vring to a "pa"; allowing this resource to not be 1:1 mapped into the remote. And if nothing else just to be consistent with the carveouts and devmem. @Suman, do you have any input on this? Regards, Bjorn
On 08/27/2016 02:32 AM, Bjorn Andersson wrote: > On Fri 26 Aug 13:19 PDT 2016, Loic Pallardy wrote: > >> Some coprocessors request fixed memory mapping for firmware execution >> and associated communication linked. >> Memory resources are defined in firmware resource table. >> Resource address different from 0x0 and 0xFFFFFFFF is considered as predefined > > Do you think we're required to support both 0 and -1 for this? Hi Bjorn, You're right, only -1 is needed. SoC can have internal RAM in 0x0 for example. I'll update in a V2. > >> and already reserved at system level. >> In that case, remoteproc core doesn't need to perform any allocation. >> Memory region access can be managed using memremap/memunmap functions >> >> Signed-off-by: Loic Pallardy <loic.pallardy@st.com> >> --- >> drivers/remoteproc/remoteproc_core.c | 61 ++++++++++++++++++++++++++---------- >> include/linux/remoteproc.h | 4 +++ >> 2 files changed, 49 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >> index 18f4286..0ddbb92 100644 >> --- a/drivers/remoteproc/remoteproc_core.c >> +++ b/drivers/remoteproc/remoteproc_core.c >> @@ -213,13 +213,25 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) >> /* actual size of vring (in bytes) */ >> size = PAGE_ALIGN(vring_size(rvring->len, rvring->align)); >> >> - /* >> - * Allocate non-cacheable memory for the vring. In the future >> - * this call will also configure the IOMMU for us >> - */ >> - va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); >> + rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; >> + >> + /* check if specific memory region requested by firmware */ >> + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { > > I think we should convert that reserved field in the vring to a "pa"; > allowing this resource to not be 1:1 mapped into the remote. And if > nothing else just to be consistent with the carveouts and devmem. In fact vring doesn't have pa because coprocessor diretly access it without help of hardware accelerator. On both carveout and devmem, hardware accelerators may be used. That's true having pa field will be more consistent from host pov. Regards, Loic > > > @Suman, do you have any input on this? > > Regards, > Bjorn >
Hi Loic, Bjorn, On 08/29/2016 03:09 AM, loic pallardy wrote: > > > On 08/27/2016 02:32 AM, Bjorn Andersson wrote: >> On Fri 26 Aug 13:19 PDT 2016, Loic Pallardy wrote: >> >>> Some coprocessors request fixed memory mapping for firmware execution >>> and associated communication linked. >>> Memory resources are defined in firmware resource table. >>> Resource address different from 0x0 and 0xFFFFFFFF is considered as >>> predefined >> >> Do you think we're required to support both 0 and -1 for this? > Hi Bjorn, > You're right, only -1 is needed. SoC can have internal RAM in 0x0 for > example. > I'll update in a V2. >> >>> and already reserved at system level. >>> In that case, remoteproc core doesn't need to perform any allocation. >>> Memory region access can be managed using memremap/memunmap functions >>> >>> Signed-off-by: Loic Pallardy <loic.pallardy@st.com> >>> --- >>> drivers/remoteproc/remoteproc_core.c | 61 >>> ++++++++++++++++++++++++++---------- >>> include/linux/remoteproc.h | 4 +++ >>> 2 files changed, 49 insertions(+), 16 deletions(-) >>> >>> diff --git a/drivers/remoteproc/remoteproc_core.c >>> b/drivers/remoteproc/remoteproc_core.c >>> index 18f4286..0ddbb92 100644 >>> --- a/drivers/remoteproc/remoteproc_core.c >>> +++ b/drivers/remoteproc/remoteproc_core.c >>> @@ -213,13 +213,25 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, >>> int i) >>> /* actual size of vring (in bytes) */ >>> size = PAGE_ALIGN(vring_size(rvring->len, rvring->align)); >>> >>> - /* >>> - * Allocate non-cacheable memory for the vring. In the future >>> - * this call will also configure the IOMMU for us >>> - */ >>> - va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); >>> + rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; >>> + >>> + /* check if specific memory region requested by firmware */ >>> + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { >> >> I think we should convert that reserved field in the vring to a "pa"; >> allowing this resource to not be 1:1 mapped into the remote. And if >> nothing else just to be consistent with the carveouts and devmem. > In fact vring doesn't have pa because coprocessor diretly access it > without help of hardware accelerator. On both carveout and devmem, > hardware accelerators may be used. > That's true having pa field will be more consistent from host pov. I agree, and I actually have a need for the pa/dma address without disturbing the da as well. > > Regards, > Loic > >> >> @Suman, do you have any input on this? I was thinking about this as well, and the way I actually envisioned this is to add additional rproc_ops with the default behavior falling back to the dma_alloc API. I had two use-cases in mind for that - one is the same as what Loic is trying to resolve here, and the other is a case where I want to allocate these memories not through DMA API, but like say allocate from an remote processor internal RAM or an on-chip internal memory. This is the case atleast for vrings and vring buffers. I think these decisions are best made in the individual platform drivers as the integration can definitely vary from one SoC to another. The other thing this series makes an assumption is that with a fixed da, it is assuming the device is not behind an MMU, and whatever da is pointing to is a bus accessible address. We have traditional meant the da as "device address" so it translated as bus address on devices that are not behind an MMU, or actual virtual addresses as seen by the device if behind an MMU. On TI SoCs on some devices, we do have an MMU and so we have a non (-1) da, but it is not valid for memremapping. At the same time, we would also need any allocated address to be filled in. regards Suman
On 08/31/2016 01:13 AM, Suman Anna wrote: > Hi Suman, Hi Loic, Bjorn, > > On 08/29/2016 03:09 AM, loic pallardy wrote: >> >> >> On 08/27/2016 02:32 AM, Bjorn Andersson wrote: >>> On Fri 26 Aug 13:19 PDT 2016, Loic Pallardy wrote: >>> >>>> Some coprocessors request fixed memory mapping for firmware execution >>>> and associated communication linked. >>>> Memory resources are defined in firmware resource table. >>>> Resource address different from 0x0 and 0xFFFFFFFF is considered as >>>> predefined >>> >>> Do you think we're required to support both 0 and -1 for this? >> Hi Bjorn, >> You're right, only -1 is needed. SoC can have internal RAM in 0x0 for >> example. >> I'll update in a V2. >>> >>>> and already reserved at system level. >>>> In that case, remoteproc core doesn't need to perform any allocation. >>>> Memory region access can be managed using memremap/memunmap functions >>>> >>>> Signed-off-by: Loic Pallardy <loic.pallardy@st.com> >>>> --- >>>> drivers/remoteproc/remoteproc_core.c | 61 >>>> ++++++++++++++++++++++++++---------- >>>> include/linux/remoteproc.h | 4 +++ >>>> 2 files changed, 49 insertions(+), 16 deletions(-) >>>> >>>> diff --git a/drivers/remoteproc/remoteproc_core.c >>>> b/drivers/remoteproc/remoteproc_core.c >>>> index 18f4286..0ddbb92 100644 >>>> --- a/drivers/remoteproc/remoteproc_core.c >>>> +++ b/drivers/remoteproc/remoteproc_core.c >>>> @@ -213,13 +213,25 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, >>>> int i) >>>> /* actual size of vring (in bytes) */ >>>> size = PAGE_ALIGN(vring_size(rvring->len, rvring->align)); >>>> >>>> - /* >>>> - * Allocate non-cacheable memory for the vring. In the future >>>> - * this call will also configure the IOMMU for us >>>> - */ >>>> - va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); >>>> + rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; >>>> + >>>> + /* check if specific memory region requested by firmware */ >>>> + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { >>> >>> I think we should convert that reserved field in the vring to a "pa"; >>> allowing this resource to not be 1:1 mapped into the remote. And if >>> nothing else just to be consistent with the carveouts and devmem. >> In fact vring doesn't have pa because coprocessor diretly access it >> without help of hardware accelerator. On both carveout and devmem, >> hardware accelerators may be used. >> That's true having pa field will be more consistent from host pov. > > I agree, and I actually have a need for the pa/dma address without > disturbing the da as well. >> >> Regards, >> Loic >> >>> >>> @Suman, do you have any input on this? > > I was thinking about this as well, and the way I actually envisioned > this is to add additional rproc_ops with the default behavior falling > back to the dma_alloc API. I had two use-cases in mind for that - one is > the same as what Loic is trying to resolve here, and the other is a case > where I want to allocate these memories not through DMA API, but like > say allocate from an remote processor internal RAM or an on-chip > internal memory. This is the case atleast for vrings and vring buffers. > I think these decisions are best made in the individual platform drivers > as the integration can definitely vary from one SoC to another. > Same use cases on ST side. Indeed we can create some new ops to alloc and release memory chunk at driver level. Driver will be free to use reserved memeory framework or simply memremap chunks. I'll propose a patch for that. > The other thing this series makes an assumption is that with a fixed da, > it is assuming the device is not behind an MMU, and whatever da is > pointing to is a bus accessible address. We have traditional meant the > da as "device address" so it translated as bus address on devices that > are not behind an MMU, or actual virtual addresses as seen by the device > if behind an MMU. On TI SoCs on some devices, we do have an MMU and so > we have a non (-1) da, but it is not valid for memremapping. > At the same time, we would also need any allocated address to be filled in. > Today da is used because there is no other field which can be used to fix this resource. But agree da is only for device address and pa is missing. If all agree I'll convert reserved field in pa. Thanks for your review. Regards, Loic > regards > Suman >
On Tue 30 Aug 16:13 PDT 2016, Suman Anna wrote: > >>> + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { [..] > >> @Suman, do you have any input on this? > Thanks Suman > I was thinking about this as well, and the way I actually envisioned > this is to add additional rproc_ops with the default behavior falling > back to the dma_alloc API. I had two use-cases in mind for that - one is > the same as what Loic is trying to resolve here, and the other is a case > where I want to allocate these memories not through DMA API, but like > say allocate from an remote processor internal RAM or an on-chip > internal memory. Are these cases a matter of mapping the chunks with ioremap, or are there more fancy setups that would affect how we load the data into them as well? Also, in the case of you mapping vrings in on-chip memory, would you use the resource table to communicate these addresses or are they simply hard coded in the loaded firmware? > This is the case atleast for vrings and vring buffers. > I think these decisions are best made in the individual platform drivers > as the integration can definitely vary from one SoC to another. > This touches upon the discussion related to how to support fixed position vring buffers. > The other thing this series makes an assumption is that with a fixed da, > it is assuming the device is not behind an MMU, and whatever da is > pointing to is a bus accessible address. But doesn't the current code do the same? Isn't the "dma" that we assign "da" the physical address of the memory? > We have traditional meant the > da as "device address" so it translated as bus address on devices that > are not behind an MMU, or actual virtual addresses as seen by the device > if behind an MMU. I like the idea of making this the uniform design among the various resource types. > On TI SoCs on some devices, we do have an MMU and so > we have a non (-1) da, but it is not valid for memremapping. > At the same time, we would also need any allocated address to be filled in. Right, so analog to the carveout case we need to allocate memory and potentially map the memory in the iommu. As this case then repeats itself for the vring (rpmsg) buffers I think we should strive for representing and handling all these memory allocations in a more uniform way. Regards, Bjorn
On 08/31/2016 11:37 AM, Bjorn Andersson wrote: > On Tue 30 Aug 16:13 PDT 2016, Suman Anna wrote: > >>>>> + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { > [..] >>>> @Suman, do you have any input on this? >> > > Thanks Suman > >> I was thinking about this as well, and the way I actually envisioned >> this is to add additional rproc_ops with the default behavior falling >> back to the dma_alloc API. I had two use-cases in mind for that - one is >> the same as what Loic is trying to resolve here, and the other is a case >> where I want to allocate these memories not through DMA API, but like >> say allocate from an remote processor internal RAM or an on-chip >> internal memory. > > Are these cases a matter of mapping the chunks with ioremap, or are > there more fancy setups that would affect how we load the data into them > as well? The loading can be handled automatically as we do provide the .da_to_va() ops for individual platform drivers to provide a translation mechanism. The default ELF loader only needs the kernel va to be able to copy the data over. In the case of fixed addresses, it is just a matter of ioremap, but if using the mmio-sram driver, the platform drivers are responsible for providing you the va and dma. > > Also, in the case of you mapping vrings in on-chip memory, would you use > the resource table to communicate these addresses or are they simply > hard coded in the loaded firmware? It really depends on how the on-chip memory get used. Unless there is a limitation to use a fixed address location, the normal usage would be used the mmio-sram driver and the gen_pool API to allocate on-chip memory. We do fill in the resource table to communicate these addresses to loaded firmware. > >> This is the case atleast for vrings and vring buffers. >> I think these decisions are best made in the individual platform drivers >> as the integration can definitely vary from one SoC to another. >> > > This touches upon the discussion related to how to support fixed > position vring buffers. Indeed, in one of the usage patterns. > >> The other thing this series makes an assumption is that with a fixed da, >> it is assuming the device is not behind an MMU, and whatever da is >> pointing to is a bus accessible address. > > But doesn't the current code do the same? > Isn't the "dma" that we assign "da" the physical address of the memory? I meant this series is making the assumption. Previously, we were ignoring the da and overwriting it with the allocated physical address field, right. > >> We have traditional meant the >> da as "device address" so it translated as bus address on devices that >> are not behind an MMU, or actual virtual addresses as seen by the device >> if behind an MMU. > > I like the idea of making this the uniform design among the various > resource types. > >> On TI SoCs on some devices, we do have an MMU and so >> we have a non (-1) da, but it is not valid for memremapping. >> At the same time, we would also need any allocated address to be filled in. > > Right, so analog to the carveout case we need to allocate memory and > potentially map the memory in the iommu. > > As this case then repeats itself for the vring (rpmsg) buffers I think > we should strive for representing and handling all these memory > allocations in a more uniform way. Yes agreed. Though there are currently some gaps w.r.t the vrings and the vring buffers mapping, as the current code doesn't have an associated iommu_map calls around the allocation. It might be that the remoteproc would require these to be allocated/mapped in a specific region for properly configuring the cacheability property around this. We are using a work-around for the moment to get around this. regards Suman
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index 18f4286..0ddbb92 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -213,13 +213,25 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) /* actual size of vring (in bytes) */ size = PAGE_ALIGN(vring_size(rvring->len, rvring->align)); - /* - * Allocate non-cacheable memory for the vring. In the future - * this call will also configure the IOMMU for us - */ - va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); + rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; + + /* check if specific memory region requested by firmware */ + if (rsc->vring[i].da != 0 && rsc->vring[i].da != FW_RSC_ADDR_ANY) { + va = memremap(rsc->vring[i].da, size, MEMREMAP_WC); + rvring->dma = rsc->vring[i].da; + rvring->memmap = true; + } else { + /* + * Allocate non-cacheable memory for the vring. In the future + * this call will also configure the IOMMU for us + */ + va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL); + rvring->dma = dma; + rsc->vring[i].da = dma; + } + if (!va) { - dev_err(dev->parent, "dma_alloc_coherent failed\n"); + dev_err(dev->parent, "Failed to get valid ving[%d] va\n", i); return -EINVAL; } @@ -231,7 +243,10 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) ret = idr_alloc(&rproc->notifyids, rvring, 0, 0, GFP_KERNEL); if (ret < 0) { dev_err(dev, "idr_alloc failed: %d\n", ret); - dma_free_coherent(dev->parent, size, va, dma); + if (rvring->memmap) + memunmap(rvring->va); + else + dma_free_coherent(dev->parent, size, va, dma); return ret; } notifyid = ret; @@ -240,7 +255,6 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) i, va, &dma, size, notifyid); rvring->va = va; - rvring->dma = dma; rvring->notifyid = notifyid; /* @@ -249,8 +263,6 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) * set up the iommu. In this case the device address (da) will * hold the physical address and not the device address. */ - rsc = (void *)rproc->table_ptr + rvdev->rsc_offset; - rsc->vring[i].da = dma; rsc->vring[i].notifyid = notifyid; return 0; } @@ -293,7 +305,11 @@ void rproc_free_vring(struct rproc_vring *rvring) int idx = rvring->rvdev->vring - rvring; struct fw_rsc_vdev *rsc; - dma_free_coherent(rproc->dev.parent, size, rvring->va, rvring->dma); + if (rvring->memmap) + memunmap(rvring->va); + else + dma_free_coherent(rproc->dev.parent, size, rvring->va, + rvring->dma); idr_remove(&rproc->notifyids, rvring->notifyid); /* reset resource entry info */ @@ -585,7 +601,15 @@ static int rproc_handle_carveout(struct rproc *rproc, if (!carveout) return -ENOMEM; - va = dma_alloc_coherent(dev->parent, rsc->len, &dma, GFP_KERNEL); + /* check if specific memory region requested by firmware */ + if (rsc->pa != 0 && rsc->pa != FW_RSC_ADDR_ANY) { + va = memremap(rsc->pa, rsc->len, MEMREMAP_WC); + carveout->memmap = true; + dma = rsc->pa; + } else { + va = dma_alloc_coherent(dev->parent, rsc->len, &dma, GFP_KERNEL); + rsc->pa = dma; + } if (!va) { dev_err(dev->parent, "failed to allocate dma memory: len 0x%x\n", rsc->len); @@ -659,7 +683,6 @@ static int rproc_handle_carveout(struct rproc *rproc, * In this case, the device address and the physical address * are the same. */ - rsc->pa = dma; carveout->va = va; carveout->len = rsc->len; @@ -673,7 +696,10 @@ static int rproc_handle_carveout(struct rproc *rproc, free_mapping: kfree(mapping); dma_free: - dma_free_coherent(dev->parent, rsc->len, va, dma); + if (carveout->memmap) + memunmap(va); + else + dma_free_coherent(dev->parent, rsc->len, va, dma); free_carv: kfree(carveout); return ret; @@ -780,8 +806,11 @@ static void rproc_resource_cleanup(struct rproc *rproc) /* clean up carveout allocations */ list_for_each_entry_safe(entry, tmp, &rproc->carveouts, node) { - dma_free_coherent(dev->parent, entry->len, entry->va, - entry->dma); + if (entry->memmap) + memunmap(entry->va); + else + dma_free_coherent(dev->parent, entry->len, entry->va, + entry->dma); list_del(&entry->node); kfree(entry); } diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index 80e1cba..ff1fb59 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -312,6 +312,7 @@ struct fw_rsc_vdev { * @len: length, in bytes * @da: device address * @priv: associated data + * @memmap: true if memory is memremapped * @node: list node */ struct rproc_mem_entry { @@ -320,6 +321,7 @@ struct rproc_mem_entry { int len; u32 da; void *priv; + bool memmap; struct list_head node; }; @@ -458,6 +460,7 @@ struct rproc { * @notifyid: rproc-specific unique vring index * @rvdev: remote vdev * @vq: the virtqueue of this vring + * @memmap: true if memory is memremapped */ struct rproc_vring { void *va; @@ -468,6 +471,7 @@ struct rproc_vring { int notifyid; struct rproc_vdev *rvdev; struct virtqueue *vq; + bool memmap; }; /**
Some coprocessors request fixed memory mapping for firmware execution and associated communication linked. Memory resources are defined in firmware resource table. Resource address different from 0x0 and 0xFFFFFFFF is considered as predefined and already reserved at system level. In that case, remoteproc core doesn't need to perform any allocation. Memory region access can be managed using memremap/memunmap functions Signed-off-by: Loic Pallardy <loic.pallardy@st.com> --- drivers/remoteproc/remoteproc_core.c | 61 ++++++++++++++++++++++++++---------- include/linux/remoteproc.h | 4 +++ 2 files changed, 49 insertions(+), 16 deletions(-) -- 1.9.1