Message ID | 20231207222944.663893-1-pierre-louis.bossart@linux.intel.com |
---|---|
Headers | show |
Series | soundwire/ASoC: speed-up downloads with BTP/BRA protocol | expand |
On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote: > The MIPI specification and most of the new codecs support the Bulk > Transfer Protocol (BTP) and specifically the Bulk Register Access > (BRA) configuration. This mode reclaims the 'audio' data space of the > SoundWire frame to send firmware/coefficients over the DataPort 0 > (DP0). So the bulk register access is accessing registers that are also visible through the one register at at time interface, just faster?
On 12/7/23 16:56, Mark Brown wrote: > On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote: > >> The MIPI specification and most of the new codecs support the Bulk >> Transfer Protocol (BTP) and specifically the Bulk Register Access >> (BRA) configuration. This mode reclaims the 'audio' data space of the >> SoundWire frame to send firmware/coefficients over the DataPort 0 >> (DP0). > > So the bulk register access is accessing registers that are also visible > through the one register at at time interface, just faster? Yes, each frame can transmit a packet with a start address, length and a bunch of data bytes protected with a CRC. With the default 50x4 frame size we use, we can send 8 contiguous bytes per frame instead of 1. With a larger frame you get even more bytes per frame. Also because we program a large buffer with all the packets pre-formatted by software, we don't have much software overhead. The packets are streamed over DMA and inserted in the frame by hardware at the relevant time. That means waiting for one DMA complete event instead of dealing with thousands of command/responses with interrupts. There are limitations though, if the frame is already transmitting audio data then obviously we have a conflict.
On 12/8/23 10:27, Charles Keepax wrote: > On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote: >> The Bulk Register Access protocol was left as a TODO topic since >> 2018. It's time to document this protocol and the design of its Linux >> support. >> >> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> >> --- >> +Concurrency between BRA and regular read/write >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +The existing 'nread/nwrite' API already relies on a notion of start >> +address and number of bytes, so it would be possible to extend this >> +API with a 'hint' requesting BPT/BRA be used. >> + >> +However BRA transfers could be quite long, and the use of a single >> +mutex for regular read/write and BRA is a show-stopper. Independent >> +operation of the control/command and BRA transfers is a fundamental >> +requirement, e.g. to change the volume level with the existing regmap >> +interface while downloading firmware. > > Is this definitely a show stopper? Not saying that it wouldn't be > desirable to do both from a speed perspective, but current > systems that download firmware (I2C/SPI) typically will block the > bus for some amount of time. There are also some desirable properties > to a single lock such as not needing to worry about accessing the > same register in the bulk transfer and a normal command transfer. There are other cases where we do want to interact with the device even if we are in the middle of downloading stuff. We currently have a msg_lock that's used to prevent multiple read/writes from being sent on the bus, I really don't think it would be wise to use this same "lightweight" lock for much longer transfers. This could impact response to alerts, changes in state reported by the hardware, etc. >> +Audio DMA support >> +----------------- >> + >> +Some DMAs, such as HDaudio, require an audio format field to be >> +set. This format is in turn used to define acceptable bursts. BPT/BRA >> +support is not fully compatible with these definitions in that the >> +format may vary between read and write commands. >> + >> +In addition, on Intel HDaudio Intel platforms the DMAs need to be >> +programmed with a PCM format matching the bandwidth of the BPT/BRA >> +transfer. The format is based on 48kHz 32-bit samples, and the number >> +of channels varies to adjust the bandwidth. The notion of channel is >> +completely notional since the data is not typical audio >> +PCM. Programming channels helps reserve enough bandwidth and adjust >> +FIFO sizes to avoid xruns. Note that the quality of service comes as a >> +cost. Since all channels need to be present as a sample block, data >> +sizes not aligned to 128-bytes are not supported. > > Apologies but could you elaborate a litte on this? I am not sure > I follow the reasoning, how does the 48k 32bit DMA implementation > result in 128-byte limitation? I would have thought 1 channel would > be 4-bytes and you are varying the channels so I would have expected > 4-byte aligned maybe 8-byte if the DMA expects stereo pairs. With the 50x4 default frame size that we're using, a BTP/BRA write requires 13.824 Mbits/s for TX and 3.072Mbits/s for RX. We have to treat these "non-audio' streams as respectively 10 and 2 channels of 48kHz 32bit data to reserve the equivalent PCM bandwidth, and that creates an alignment requirement since the DMA expects all 'channels' to be present in the same block. We could try and play with parameters, e.g. I am not sure what happens if we used 192kHz, maybe that reduces the alignment back to 32 bytes. But the point is arbitrary sizes *cannot* be handled. The HDaudio spec also mentions that for efficiency reason it's strongly recommended to use 128-byte multiples. > And what exactly do we mean by aligned, are we saying the length > all transfers needs to be a multiple of 128-bytes? Yes. > I think we might have some annoying restrictions on the block > size on our hardware as well I will go dig into that and report > back. That would be very useful indeed, we have to make sure this alignment is reasonably supported across hosts and devices. If we can't agree we'll have to make the alignment host-dependent, but that will make the logic in codec drivers more complicated.
On Fri, Dec 08, 2023 at 12:45:21PM -0600, Pierre-Louis Bossart wrote: > On 12/8/23 10:27, Charles Keepax wrote: > > I think we might have some annoying restrictions on the block > > size on our hardware as well I will go dig into that and report > > back. > That would be very useful indeed, we have to make sure this alignment is > reasonably supported across hosts and devices. > If we can't agree we'll have to make the alignment host-dependent, but > that will make the logic in codec drivers more complicated. Hopefully even if it's complicated it's something that can be factored out into library code rather than replicated.
On Thu, Dec 07, 2023 at 06:56:55PM -0600, Pierre-Louis Bossart wrote: > >> +The Device Number specified in the Header follows the SoundWire > >> +definitions, and broadcast and group addressing are > >> +permitted. However, in reality it is very unlikely that the exact same > >> +binary data needs to be provided to the two different Peripheral > >> +devices. The Linux implementation only allows for transfers to a > >> +single device at a time. > > For the firmware download case it seems likely that this won't always be > > the case, but it's definitely a thing that could be done incrementally. > One example discussed this week on the mailing list is the Cirrus Logic > case, where the firmware contains information on which speakers a given > amplifier should be doing, and each firmware is named as AMP-n. I can imagine a vendor structuring things so they've got separate firmware and coefficent/configuration images, the firmware images could be shared. > I have really not found any practical case where the same data needs to > be sent to N devices, and I don't have a burning desire to tie the codec > initialization together with all the asynchronous nature of > enumeration/probe. Like I say it does seem like something that could be done incrementally. > > These don't seem like insurmountable obstacles - they feel more like a > > copy break kind of situation where we can infer things from the pattern > > of transactions, and there's always the possibility of adding some calls > > to give regmap more high level information about the overall state of > > the device. One of the expected usage patterns with cache only mode is > > to build up a final register state then let the cache code work out the > > optimal pattern to actually write that out. > I did expect some pushback on regmap :-) > The point is really that the main use for this download is a set of > write-once memory areas which happen to be mapped as registers. There is > no real need to declare or access each memory address individually. Yeah, normally it's just write only stuff but I've seen things like controls being added in the DSP memory before - the > In addition in terms of error handling, the expectation is that the set > of writes are handled in a pass/fail manner. There's no real way to know > which of the individual writes failed. That's the case for any block writes. > > I might be missing something but those requests for redownload sound > > like things that would occur regardless of the mechanism used to perform > > the I/O? > What I meant is that the codec driver would e.g. need to fetch a > different firmware table and download it. There's no real need to > maintain a cache on the host side since the entire table will be downloaded. I mean, if that's the usage pattern surely things would just be marked volatile anyway? A cache is entirely optional. > > This feels a lot like it could map onto the regmap async interface, it > > would need a bit of a rework (mainly that currently they provide > > ordering guarantees with respect to both each other and sync I/O) but > > those could be removed with some care) but it's got the "here's a list > > of I/O, here's another call to ensure it's all actually happened" thing. > > It sounds very much like how I was thinking of the async API when I was > > writing it and the initial users. > > It's this bit that really got me thinking it could fit well into regmap. > I really don't know anything about this async interface, if you have > pointers on existing examples I am all ears....I am not aware of any > Intel platform or codec used on an Intel platform making use of this API. grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses. > At any rate the low-level behavior is to > a) allocate and configure all the SoundWire resources > b) allocate and configure all the DMA resources > c) trigger DMA and enable SoundWire transfers > d) wait for the DMA to complete > The codec API can be modified as needed, as long as there are provisions > for these 4 steps. That's not quite how the current API works, but it feels like it's close enough to the intent and there's literally one user of the actual API. > >> + (3) A Peripheral driver may want to wait until existing BRA > >> + transfers complete or deal with BRA as a background task when > >> + audio transfers stop. In this case, there would be no timeout, > >> + and the operation may not happen if the platform is suspended. > > Option 3 would be a jump for regmap. > Sorry, I don't get what a 'jump' means in this context. Big change.
>>> This feels a lot like it could map onto the regmap async interface, it >>> would need a bit of a rework (mainly that currently they provide >>> ordering guarantees with respect to both each other and sync I/O) but >>> those could be removed with some care) but it's got the "here's a list >>> of I/O, here's another call to ensure it's all actually happened" thing. >>> It sounds very much like how I was thinking of the async API when I was >>> writing it and the initial users. > >>> It's this bit that really got me thinking it could fit well into regmap. > >> I really don't know anything about this async interface, if you have >> pointers on existing examples I am all ears....I am not aware of any >> Intel platform or codec used on an Intel platform making use of this API. > > grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > or there's the rbtree cache sync code which uses a back door to go into > an async mode. Basically just variants of all the normal regmap I/O > calls with a _complete() call you can use to wait for everything to > happen. The implementation is a bit heavyweight since it was written to > work with fairly slow buses. I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages. I do see this comment in the code * @async_write: Write operation which completes asynchronously, optional and must serialise with respect to non-async I/O. But that 'must' is a requirement on the codec side, isn't it? When using regmap_raw_write_async(), the lock is released immediately. I don't see anything at the regmap level that would prevent a codec driver from using regmap_raw_write() immediately. That's probably a violation of the API to do so, but it's the same problem I was referring earlier in the conversation where 'regular' read/writes cannot happen in parallel with BTP/BRA transactions. Also is this just me spacing out or there is no regmap_raw_read_async()?
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > > grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > > or there's the rbtree cache sync code which uses a back door to go into > > an async mode. Basically just variants of all the normal regmap I/O > > calls with a _complete() call you can use to wait for everything to > > happen. The implementation is a bit heavyweight since it was written to > > work with fairly slow buses. > I spent a fair amount of time this afternoon trying to understand the > regmap_async parts, and I am not following where in the code there is an > ordering requirement/enforcement between async and sync usages. The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API. > Also is this just me spacing out or there is no regmap_raw_read_async()? Right, there was never any need.
On 12/19/23 17:53, Mark Brown wrote: > On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > >>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, >>> or there's the rbtree cache sync code which uses a back door to go into >>> an async mode. Basically just variants of all the normal regmap I/O >>> calls with a _complete() call you can use to wait for everything to >>> happen. The implementation is a bit heavyweight since it was written to >>> work with fairly slow buses. > >> I spent a fair amount of time this afternoon trying to understand the >> regmap_async parts, and I am not following where in the code there is an >> ordering requirement/enforcement between async and sync usages. > > The only actual async implementation is SPI which processes things in > order of submission, the sync API wraps the async API. > >> Also is this just me spacing out or there is no regmap_raw_read_async()? > > Right, there was never any need. ok. I am starting to think that we could have a new type of regmap, say "regmap-sdw-bra", where the use of write_raw_async() would rely on the send/wait bus primitives, and write_raw() would fallback to the regular read/write commands. We'd need a mutual exclusion to prevent parallel async/sync access to the same regmap. In other words, "memory" areas that are used for firmware downloads would be moved to a different regmap with async capabilities and no caching support.
On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: > On 12/19/23 17:53, Mark Brown wrote: > > On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > >>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > >>> or there's the rbtree cache sync code which uses a back door to go into > >>> an async mode. Basically just variants of all the normal regmap I/O > >>> calls with a _complete() call you can use to wait for everything to > >>> happen. The implementation is a bit heavyweight since it was written to > >>> work with fairly slow buses. > > > >> I spent a fair amount of time this afternoon trying to understand the > >> regmap_async parts, and I am not following where in the code there is an > >> ordering requirement/enforcement between async and sync usages. > > > > The only actual async implementation is SPI which processes things in > > order of submission, the sync API wraps the async API. > > > >> Also is this just me spacing out or there is no regmap_raw_read_async()? > > > > Right, there was never any need. > > ok. I am starting to think that we could have a new type of regmap, say > "regmap-sdw-bra", where the use of write_raw_async() would rely on the > send/wait bus primitives, and write_raw() would fallback to the regular > read/write commands. We'd need a mutual exclusion to prevent parallel > async/sync access to the same regmap. > > In other words, "memory" areas that are used for firmware downloads > would be moved to a different regmap with async capabilities and no > caching support. I would be a little inclined to say leave adding a regmap for a follow up series, whether we add it to the existing regmap or add a new one, or whatever, it should all sit happily on top of the API being added in this series. Makes it a little more contained to focus on one area at a time, and leave this series as adding core support for BRA. But that said, if we really want to I don't feel mega strongly on this one. Thanks, Charles
On 12/20/23 16:16, Charles Keepax wrote: > On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: >> On 12/19/23 17:53, Mark Brown wrote: >>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: >>>>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, >>>>> or there's the rbtree cache sync code which uses a back door to go into >>>>> an async mode. Basically just variants of all the normal regmap I/O >>>>> calls with a _complete() call you can use to wait for everything to >>>>> happen. The implementation is a bit heavyweight since it was written to >>>>> work with fairly slow buses. >>> >>>> I spent a fair amount of time this afternoon trying to understand the >>>> regmap_async parts, and I am not following where in the code there is an >>>> ordering requirement/enforcement between async and sync usages. >>> >>> The only actual async implementation is SPI which processes things in >>> order of submission, the sync API wraps the async API. >>> >>>> Also is this just me spacing out or there is no regmap_raw_read_async()? >>> >>> Right, there was never any need. >> >> ok. I am starting to think that we could have a new type of regmap, say >> "regmap-sdw-bra", where the use of write_raw_async() would rely on the >> send/wait bus primitives, and write_raw() would fallback to the regular >> read/write commands. We'd need a mutual exclusion to prevent parallel >> async/sync access to the same regmap. >> >> In other words, "memory" areas that are used for firmware downloads >> would be moved to a different regmap with async capabilities and no >> caching support. > > I would be a little inclined to say leave adding a regmap for a > follow up series, whether we add it to the existing regmap or add > a new one, or whatever, it should all sit happily on top of the > API being added in this series. Makes it a little more contained > to focus on one area at a time, and leave this series as adding > core support for BRA. But that said, if we really want to I don't > feel mega strongly on this one. Right, I was probably going too far down in the details for a December 20 post. The point I was trying to make it seems there's consensus that regmap with the async parts would be the API used by SoundWire/ASoC codecs, and the regmap implementation would rely on the bus/host send/wait routines. The regmap stuff will need joint work with codec driver folks so it should indeed be done in a second step when the SoundWire bus/host parts are available. Put differently: is there any sustained objection to the proposal of extending regmap with async BRA transfers?
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote: > Put differently: is there any sustained objection to the proposal of > extending regmap with async BRA transfers? Not from me, and I agree that it makes sense to do it once the underlying SoundWire bits are in place.
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote: > On 12/20/23 16:16, Charles Keepax wrote: > > On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: > >> On 12/19/23 17:53, Mark Brown wrote: > >>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > Put differently: is there any sustained objection to the proposal of > extending regmap with async BRA transfers? Seems good to me, was not my intention to object to the idea itself. Thanks, Charles