[RFC,00/16] soundwire/ASoC: speed-up downloads with BTP/BRA protocol

Message ID	20231207222944.663893-1-pierre-louis.bossart@linux.intel.com
Headers	show Return-Path: <alsa-devel-bounces@alsa-project.org> From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> To: linux-sound@vger.kernel.org Cc: alsa-devel@alsa-project.org, tiwai@suse.de, broonie@kernel.org, vinod.koul@intel.com, Bard liao <yung-chuan.liao@linux.intel.com>, Ranjani Sridharan <ranjani.sridharan@linux.intel.com>, Peter Ujfalusi <peter.ujfalusi@linux.intel.com>, Kai Vehmanen <kai.vehmanen@linux.intel.com>, srinivas.kandagatla@linaro.org, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>, vijendar.mukunda@amd.com, Charles Keepax <ckeepax@opensource.cirrus.com>, Richard Fitzgerald <rf@opensource.cirrus.com>, Shuming Fan <shumingf@realtek.com>, Jack Yu <jack.yu@realtek.com>, Oder Chiou <oder_chiou@realtek.com>, Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Subject: [RFC PATCH 00/16] soundwire/ASoC: speed-up downloads with BTP/BRA protocol Date: Thu, 7 Dec 2023 16:29:28 -0600 Message-Id: <20231207222944.663893-1-pierre-louis.bossart@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID-Hash: JWSG6NWRVENVRPPPMOUDJLYVRBN7XOPG Precedence: list Archived-At: <https://mailman.alsa-project.org/hyperkitty/list/alsa-devel@alsa-project.org/message/JWSG6NWRVENVRPPPMOUDJLYVRBN7XOPG/>
Series	soundwire/ASoC: speed-up downloads with BTP/BRA protocol \| expand [RFC,00/16] soundwire/ASoC: speed-up downloads with BTP/BRA protocol [RFC,01/16] Documentation: driver: add SoundWire BRA description [RFC,02/16] soundwire: cadence: add BTP support for DP0 [RFC,03/16] soundwire: stream: extend sdw_alloc_stream() to take 'type' parameter [RFC,04/16] soundwire: extend sdw_stream_type to BPT [RFC,05/16] soundwire: stream: special-case the bus compute_params() routine [RFC,06/16] soundwire: stream: reuse existing code for BPT stream [RFC,07/16] soundwire: bus: add API for BPT protocol [RFC,08/16] soundwire: bus: add bpt_stream pointer [RFC,09/16] soundwire: crc8: add constant table [RFC,10/16] soundwire: cadence: add BTP/BRA helpers to format data [RFC,11/16] soundwire: intel_auxdevice: add indirection for BPT open/close/send_async/wait [RFC,12/16] ASoC: SOF: Intel: hda-sdw-bpt: add helpers for SoundWire BPT DMA [RFC,13/16] soundwire: intel: add BPT context definition [RFC,14/16] soundwire: intel_ace2x: add BPT open/close/send_async/wait [RFC,15/16] soundwire: debugfs: add interface for BPT/BRA transfers [RFC,16/16] ASoC: rt711-sdca: add DP0 support

Pierre-Louis Bossart Dec. 7, 2023, 10:29 p.m. UTC

This RFC patchset suggests a new API for ASoC codec drivers to use for
firmware/table downloads.

Problem statement:

All existing transfers initiated by codec drivers rely on SoundWire
read/write commands, which can only support ONE byte per frame. With
the typical 48kHz frame rate, this means 384 kbits/s.

In addition, the command/control is typically handled with a FIFO and
interrupts which adds more software overhead. To give a practical
reference, sending 32Kb takes 2.5s on Intel platforms, which means
about 105kbit/s only. Additional non-audio activity is likely to
adversely impact interrupt scheduling and further decrease the
transfer speeds.

New SDCA-based codecs have a need to download tables and DSP firmware
which are typically between 20 and 256 Kb. The slow bus operation has
a direct impact on boot/resume times, and clearly having to wait more
than 300ms is a showstopper in terms of latency requirements and
user-experience.

Suggested solution:

The MIPI specification and most of the new codecs support the Bulk
Transfer Protocol (BTP) and specifically the Bulk Register Access
(BRA) configuration. This mode reclaims the 'audio' data space of the
SoundWire frame to send firmware/coefficients over the DataPort 0
(DP0).

The API suggested is rather simple, with the following sequence
expected:
open(): reserve resources and prepare hardware
send_async(): trigger DMAs and perform SoundWire bank switch
wait(): wait for DMA completion and disable DMAs
close(): release resources

Benefits:

Even after accounting for the protocol overhead, the data can be sent
8x or 16x faster on the link than with the regular commands.

With the use of DMAs, the software overhead becomes limited to the
initialization. Measured results show that transferring the same 32Kb
takes about 100ms, a 25x improvement on the baseline write() commands
with an actual bitrate of 2.6 Mbits/s. These results are a measure of
bus/hardware performance mainly, and will typically not be too
modified by the CPU activity and scheduling.

The performance for reads is similar, with a 25x speedup measured.

Limitations:

Setting up the transfers over DP0 takes time, and the reliance on DMAs
on the host side brings alignment restrictions. The BTP/BRA protocol
is really only relevant for "large" transfers done during boot/resume
*before* audio transfers take place. Mixing BTP/BRA and audio is a
nightmare, this patchset suggests a mutual-exclusion between two
usages.

Scope:

This patchset only exposes the API and a debugfs interface to initiate
commands, validate results and measure performance. The actual use of
the API is left as an exercise for codec driver developers.

This patchset depends on a number of pre-requisite patches and will
not build on top of any for-next branches. The main intent of this RFC
is to gather comments on the usage, API, benefits and restrictions.

The code and functionality was tested on an Intel LunarLake RVP platform
connected to a Realtek RT711-SDCA device.

Acknowledgements:

Thanks to Zeek Tsai at Realtek for providing test sequences that
helped reconcile the data formatted by the host driver with the
expected results on the code side.

Pierre-Louis Bossart (16):
  Documentation: driver: add SoundWire BRA description
  soundwire: cadence: add BTP support for DP0
  soundwire: stream: extend sdw_alloc_stream() to take 'type' parameter
  soundwire: extend sdw_stream_type to BPT
  soundwire: stream: special-case the bus compute_params() routine
  soundwire: stream: reuse existing code for BPT stream
  soundwire: bus: add API for BPT protocol
  soundwire: bus: add bpt_stream pointer
  soundwire: crc8: add constant table
  soundwire: cadence: add BTP/BRA helpers to format data
  soundwire: intel_auxdevice: add indirection for BPT
    open/close/send_async/wait
  ASoC: SOF: Intel: hda-sdw-bpt: add helpers for SoundWire BPT DMA
  soundwire: intel: add BPT context definition
  soundwire: intel_ace2x: add BPT open/close/send_async/wait
  soundwire: debugfs: add interface for BPT/BRA transfers
  ASoC: rt711-sdca: add DP0 support

 Documentation/driver-api/soundwire/bra.rst    | 478 +++++++++++++
 Documentation/driver-api/soundwire/index.rst  |   1 +
 Documentation/driver-api/soundwire/stream.rst |   2 +-
 .../driver-api/soundwire/summary.rst          |   5 +-
 drivers/soundwire/Kconfig                     |   1 +
 drivers/soundwire/Makefile                    |   4 +-
 drivers/soundwire/amd_manager.c               |   2 +-
 drivers/soundwire/bus.c                       |  77 +++
 drivers/soundwire/bus.h                       |  18 +
 drivers/soundwire/cadence_master.c            | 646 +++++++++++++++++-
 drivers/soundwire/cadence_master.h            |  30 +
 drivers/soundwire/crc8.c                      | 277 ++++++++
 drivers/soundwire/crc8.h                      |  11 +
 drivers/soundwire/debugfs.c                   | 122 +++-
 .../soundwire/generic_bandwidth_allocation.c  |  84 ++-
 drivers/soundwire/intel.h                     |  12 +
 drivers/soundwire/intel_ace2x.c               | 377 ++++++++++
 drivers/soundwire/intel_auxdevice.c           |  55 ++
 drivers/soundwire/qcom.c                      |   2 +-
 drivers/soundwire/stream.c                    | 137 +++-
 include/linux/soundwire/sdw.h                 |  91 ++-
 include/linux/soundwire/sdw_intel.h           |  16 +
 include/sound/hda-sdw-bpt.h                   |  76 +++
 sound/soc/codecs/rt711-sdca-sdw.c             |   8 +
 sound/soc/qcom/sdw.c                          |   2 +-
 sound/soc/sof/intel/Kconfig                   |   8 +-
 sound/soc/sof/intel/Makefile                  |   4 +
 sound/soc/sof/intel/hda-sdw-bpt.c             | 328 +++++++++
 28 files changed, 2810 insertions(+), 64 deletions(-)
 create mode 100644 Documentation/driver-api/soundwire/bra.rst
 create mode 100644 drivers/soundwire/crc8.c
 create mode 100644 drivers/soundwire/crc8.h
 create mode 100644 include/sound/hda-sdw-bpt.h
 create mode 100644 sound/soc/sof/intel/hda-sdw-bpt.c

Mark Brown Dec. 7, 2023, 10:56 p.m. UTC | #1

On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote:

> The MIPI specification and most of the new codecs support the Bulk
> Transfer Protocol (BTP) and specifically the Bulk Register Access
> (BRA) configuration. This mode reclaims the 'audio' data space of the
> SoundWire frame to send firmware/coefficients over the DataPort 0
> (DP0).

So the bulk register access is accessing registers that are also visible
through the one register at at time interface, just faster?

Pierre-Louis Bossart Dec. 7, 2023, 11:06 p.m. UTC | #2

On 12/7/23 16:56, Mark Brown wrote:
> On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote:
> 
>> The MIPI specification and most of the new codecs support the Bulk
>> Transfer Protocol (BTP) and specifically the Bulk Register Access
>> (BRA) configuration. This mode reclaims the 'audio' data space of the
>> SoundWire frame to send firmware/coefficients over the DataPort 0
>> (DP0).
> 
> So the bulk register access is accessing registers that are also visible
> through the one register at at time interface, just faster?

Yes, each frame can transmit a packet with a start address, length and a
bunch of data bytes protected with a CRC. With the default 50x4 frame
size we use, we can send 8 contiguous bytes per frame instead of 1. With
a larger frame you get even more bytes per frame.

Also because we program a large buffer with all the packets
pre-formatted by software, we don't have much software overhead. The
packets are streamed over DMA and inserted in the frame by hardware at
the relevant time. That means waiting for one DMA complete event instead
of dealing with thousands of command/responses with interrupts.

There are limitations though, if the frame is already transmitting audio
data then obviously we have a conflict.

Pierre-Louis Bossart Dec. 8, 2023, 6:45 p.m. UTC | #3

On 12/8/23 10:27, Charles Keepax wrote:
> On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote:
>> The Bulk Register Access protocol was left as a TODO topic since
>> 2018. It's time to document this protocol and the design of its Linux
>> support.
>>
>> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
>> ---
>> +Concurrency between BRA and regular read/write
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The existing 'nread/nwrite' API already relies on a notion of start
>> +address and number of bytes, so it would be possible to extend this
>> +API with a 'hint' requesting BPT/BRA be used.
>> +
>> +However BRA transfers could be quite long, and the use of a single
>> +mutex for regular read/write and BRA is a show-stopper. Independent
>> +operation of the control/command and BRA transfers is a fundamental
>> +requirement, e.g. to change the volume level with the existing regmap
>> +interface while downloading firmware.
> 
> Is this definitely a show stopper? Not saying that it wouldn't be
> desirable to do both from a speed perspective, but current
> systems that download firmware (I2C/SPI) typically will block the
> bus for some amount of time. There are also some desirable properties
> to a single lock such as not needing to worry about accessing the
> same register in the bulk transfer and a normal command transfer.

There are other cases where we do want to interact with the device even
if we are in the middle of downloading stuff. We currently have a
msg_lock that's used to prevent multiple read/writes from being sent on
the bus, I really don't think it would be wise to use this same
"lightweight" lock for much longer transfers. This could impact response
to alerts, changes in state reported by the hardware, etc.

>> +Audio DMA support
>> +-----------------
>> +
>> +Some DMAs, such as HDaudio, require an audio format field to be
>> +set. This format is in turn used to define acceptable bursts. BPT/BRA
>> +support is not fully compatible with these definitions in that the
>> +format may vary between read and write commands.
>> +
>> +In addition, on Intel HDaudio Intel platforms the DMAs need to be
>> +programmed with a PCM format matching the bandwidth of the BPT/BRA
>> +transfer. The format is based on 48kHz 32-bit samples, and the number
>> +of channels varies to adjust the bandwidth. The notion of channel is
>> +completely notional since the data is not typical audio
>> +PCM. Programming channels helps reserve enough bandwidth and adjust
>> +FIFO sizes to avoid xruns. Note that the quality of service comes as a
>> +cost. Since all channels need to be present as a sample block, data
>> +sizes not aligned to 128-bytes are not supported.
> 
> Apologies but could you elaborate a litte on this? I am not sure
> I follow the reasoning, how does the 48k 32bit DMA implementation
> result in 128-byte limitation? I would have thought 1 channel would
> be 4-bytes and you are varying the channels so I would have expected
> 4-byte aligned maybe 8-byte if the DMA expects stereo pairs.

With the 50x4 default frame size that we're using, a BTP/BRA write
requires 13.824 Mbits/s for TX and 3.072Mbits/s for RX. We have to treat
these "non-audio' streams as respectively 10 and 2 channels of 48kHz
32bit data to reserve the equivalent PCM bandwidth, and that creates an
alignment requirement since the DMA expects all 'channels' to be present
in the same block.

We could try and play with parameters, e.g. I am not sure what happens
if we used 192kHz, maybe that reduces the alignment back to 32 bytes.
But the point is arbitrary sizes *cannot* be handled.

The HDaudio spec also mentions that for efficiency reason it's strongly
recommended to use 128-byte multiples.

> And what exactly do we mean by aligned, are we saying the length
> all transfers needs to be a multiple of 128-bytes?

Yes.

> I think we might have some annoying restrictions on the block
> size on our hardware as well I will go dig into that and report
> back.

That would be very useful indeed, we have to make sure this alignment is
reasonably supported across hosts and devices.
If we can't agree we'll have to make the alignment host-dependent, but
that will make the logic in codec drivers more complicated.

Mark Brown Dec. 8, 2023, 6:55 p.m. UTC | #4

On Fri, Dec 08, 2023 at 12:45:21PM -0600, Pierre-Louis Bossart wrote:
> On 12/8/23 10:27, Charles Keepax wrote:

> > I think we might have some annoying restrictions on the block
> > size on our hardware as well I will go dig into that and report
> > back.

> That would be very useful indeed, we have to make sure this alignment is
> reasonably supported across hosts and devices.
> If we can't agree we'll have to make the alignment host-dependent, but
> that will make the logic in codec drivers more complicated.

Hopefully even if it's complicated it's something that can be factored
out into library code rather than replicated.

Mark Brown Dec. 8, 2023, 9:49 p.m. UTC | #5

On Thu, Dec 07, 2023 at 06:56:55PM -0600, Pierre-Louis Bossart wrote:

> >> +The Device Number specified in the Header follows the SoundWire
> >> +definitions, and broadcast and group addressing are
> >> +permitted. However, in reality it is very unlikely that the exact same
> >> +binary data needs to be provided to the two different Peripheral
> >> +devices. The Linux implementation only allows for transfers to a
> >> +single device at a time.

> > For the firmware download case it seems likely that this won't always be
> > the case, but it's definitely a thing that could be done incrementally.

> One example discussed this week on the mailing list is the Cirrus Logic
> case, where the firmware contains information on which speakers a given
> amplifier should be doing, and each firmware is named as AMP-n.

I can imagine a vendor structuring things so they've got separate
firmware and coefficent/configuration images, the firmware images could
be shared.

> I have really not found any practical case where the same data needs to
> be sent to N devices, and I don't have a burning desire to tie the codec
> initialization together with all the asynchronous nature of
> enumeration/probe.

Like I say it does seem like something that could be done incrementally.

> > These don't seem like insurmountable obstacles - they feel more like a
> > copy break kind of situation where we can infer things from the pattern
> > of transactions, and there's always the possibility of adding some calls
> > to give regmap more high level information about the overall state of
> > the device.  One of the expected usage patterns with cache only mode is
> > to build up a final register state then let the cache code work out the
> > optimal pattern to actually write that out.

> I did expect some pushback on regmap :-)

> The point is really that the main use for this download is a set of
> write-once memory areas which happen to be mapped as registers. There is
> no real need to declare or access each memory address individually.

Yeah, normally it's just write only stuff but I've seen things like
controls being added in the DSP memory before - the 

> In addition in terms of error handling, the expectation is that the set
> of writes are handled in a pass/fail manner. There's no real way to know
> which of the individual writes failed.

That's the case for any block writes.

> > I might be missing something but those requests for redownload sound
> > like things that would occur regardless of the mechanism used to perform
> > the I/O?

> What I meant is that the codec driver would e.g. need to fetch a
> different firmware table and download it. There's no real need to
> maintain a cache on the host side since the entire table will be downloaded.

I mean, if that's the usage pattern surely things would just be marked
volatile anyway?  A cache is entirely optional.

> > This feels a lot like it could map onto the regmap async interface, it
> > would need a bit of a rework (mainly that currently they provide
> > ordering guarantees with respect to both each other and sync I/O) but
> > those could be removed with some care) but it's got the "here's a list
> > of I/O, here's another call to ensure it's all actually happened" thing.
> > It sounds very much like how I was thinking of the async API when I was
> > writing it and the initial users.

> > It's this bit that really got me thinking it could fit well into regmap.

> I really don't know anything about this async interface, if you have
> pointers on existing examples I am all ears....I am not aware of any
> Intel platform or codec used on an Intel platform making use of this API.

grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
or there's the rbtree cache sync code which uses a back door to go into
an async mode.  Basically just variants of all the normal regmap I/O
calls with a _complete() call you can use to wait for everything to
happen.  The implementation is a bit heavyweight since it was written to
work with fairly slow buses.

> At any rate the low-level behavior is to
> a) allocate and configure all the SoundWire resources
> b) allocate and configure all the DMA resources
> c) trigger DMA and enable SoundWire transfers
> d) wait for the DMA to complete

> The codec API can be modified as needed, as long as there are provisions
> for these 4 steps.

That's not quite how the current API works, but it feels like it's close
enough to the intent and there's literally one user of the actual API.

> >> +   (3) A Peripheral driver may want to wait until existing BRA
> >> +       transfers complete or deal with BRA as a background task when
> >> +       audio transfers stop. In this case, there would be no timeout,
> >> +       and the operation may not happen if the platform is suspended.

> > Option 3 would be a jump for regmap.

> Sorry, I don't get what a 'jump' means in this context.

Big change.

Pierre-Louis Bossart Dec. 19, 2023, 4:50 p.m. UTC | #6

>>> This feels a lot like it could map onto the regmap async interface, it
>>> would need a bit of a rework (mainly that currently they provide
>>> ordering guarantees with respect to both each other and sync I/O) but
>>> those could be removed with some care) but it's got the "here's a list
>>> of I/O, here's another call to ensure it's all actually happened" thing.
>>> It sounds very much like how I was thinking of the async API when I was
>>> writing it and the initial users.
> 
>>> It's this bit that really got me thinking it could fit well into regmap.
> 
>> I really don't know anything about this async interface, if you have
>> pointers on existing examples I am all ears....I am not aware of any
>> Intel platform or codec used on an Intel platform making use of this API.
> 
> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
> or there's the rbtree cache sync code which uses a back door to go into
> an async mode.  Basically just variants of all the normal regmap I/O
> calls with a _complete() call you can use to wait for everything to
> happen.  The implementation is a bit heavyweight since it was written to
> work with fairly slow buses.

I spent a fair amount of time this afternoon trying to understand the
regmap_async parts, and I am not following where in the code there is an
ordering requirement/enforcement between async and sync usages.

I do see this comment in the code

 * @async_write: Write operation which completes asynchronously,
optional and must serialise with respect to non-async I/O.

But that 'must' is a requirement on the codec side, isn't it?

When using regmap_raw_write_async(), the lock is released immediately. I
don't see anything at the regmap level that would prevent a codec driver
from using regmap_raw_write() immediately.

That's probably a violation of the API to do so, but it's the same
problem I was referring earlier in the conversation where 'regular'
read/writes cannot happen in parallel with BTP/BRA transactions.

Also is this just me spacing out or there is no regmap_raw_read_async()?

Mark Brown Dec. 19, 2023, 4:53 p.m. UTC | #7

On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:

> > grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
> > or there's the rbtree cache sync code which uses a back door to go into
> > an async mode.  Basically just variants of all the normal regmap I/O
> > calls with a _complete() call you can use to wait for everything to
> > happen.  The implementation is a bit heavyweight since it was written to
> > work with fairly slow buses.

> I spent a fair amount of time this afternoon trying to understand the
> regmap_async parts, and I am not following where in the code there is an
> ordering requirement/enforcement between async and sync usages.

The only actual async implementation is SPI which processes things in
order of submission, the sync API wraps the async API.

> Also is this just me spacing out or there is no regmap_raw_read_async()?

Right, there was never any need.

Pierre-Louis Bossart Dec. 19, 2023, 5:08 p.m. UTC | #8

On 12/19/23 17:53, Mark Brown wrote:
> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
> 
>>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
>>> or there's the rbtree cache sync code which uses a back door to go into
>>> an async mode.  Basically just variants of all the normal regmap I/O
>>> calls with a _complete() call you can use to wait for everything to
>>> happen.  The implementation is a bit heavyweight since it was written to
>>> work with fairly slow buses.
> 
>> I spent a fair amount of time this afternoon trying to understand the
>> regmap_async parts, and I am not following where in the code there is an
>> ordering requirement/enforcement between async and sync usages.
> 
> The only actual async implementation is SPI which processes things in
> order of submission, the sync API wraps the async API.
> 
>> Also is this just me spacing out or there is no regmap_raw_read_async()?
> 
> Right, there was never any need.

ok. I am starting to think that we could have a new type of regmap, say
"regmap-sdw-bra", where the use of write_raw_async() would rely on the
send/wait bus primitives, and write_raw() would fallback to the regular
read/write commands. We'd need a mutual exclusion to prevent parallel
async/sync access to the same regmap.

In other words, "memory" areas that are used for firmware downloads
would be moved to a different regmap with async capabilities and no
caching support.

Charles Keepax Dec. 20, 2023, 3:16 p.m. UTC | #9

On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
> On 12/19/23 17:53, Mark Brown wrote:
> > On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
> >>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
> >>> or there's the rbtree cache sync code which uses a back door to go into
> >>> an async mode.  Basically just variants of all the normal regmap I/O
> >>> calls with a _complete() call you can use to wait for everything to
> >>> happen.  The implementation is a bit heavyweight since it was written to
> >>> work with fairly slow buses.
> > 
> >> I spent a fair amount of time this afternoon trying to understand the
> >> regmap_async parts, and I am not following where in the code there is an
> >> ordering requirement/enforcement between async and sync usages.
> > 
> > The only actual async implementation is SPI which processes things in
> > order of submission, the sync API wraps the async API.
> > 
> >> Also is this just me spacing out or there is no regmap_raw_read_async()?
> > 
> > Right, there was never any need.
> 
> ok. I am starting to think that we could have a new type of regmap, say
> "regmap-sdw-bra", where the use of write_raw_async() would rely on the
> send/wait bus primitives, and write_raw() would fallback to the regular
> read/write commands. We'd need a mutual exclusion to prevent parallel
> async/sync access to the same regmap.
> 
> In other words, "memory" areas that are used for firmware downloads
> would be moved to a different regmap with async capabilities and no
> caching support.

I would be a little inclined to say leave adding a regmap for a
follow up series, whether we add it to the existing regmap or add
a new one, or whatever, it should all sit happily on top of the
API being added in this series. Makes it a little more contained
to focus on one area at a time, and leave this series as adding
core support for BRA. But that said, if we really want to I don't
feel mega strongly on this one.

Thanks,
Charles

Pierre-Louis Bossart Dec. 20, 2023, 6:26 p.m. UTC | #10

On 12/20/23 16:16, Charles Keepax wrote:
> On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
>> On 12/19/23 17:53, Mark Brown wrote:
>>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
>>>>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver,
>>>>> or there's the rbtree cache sync code which uses a back door to go into
>>>>> an async mode.  Basically just variants of all the normal regmap I/O
>>>>> calls with a _complete() call you can use to wait for everything to
>>>>> happen.  The implementation is a bit heavyweight since it was written to
>>>>> work with fairly slow buses.
>>>
>>>> I spent a fair amount of time this afternoon trying to understand the
>>>> regmap_async parts, and I am not following where in the code there is an
>>>> ordering requirement/enforcement between async and sync usages.
>>>
>>> The only actual async implementation is SPI which processes things in
>>> order of submission, the sync API wraps the async API.
>>>
>>>> Also is this just me spacing out or there is no regmap_raw_read_async()?
>>>
>>> Right, there was never any need.
>>
>> ok. I am starting to think that we could have a new type of regmap, say
>> "regmap-sdw-bra", where the use of write_raw_async() would rely on the
>> send/wait bus primitives, and write_raw() would fallback to the regular
>> read/write commands. We'd need a mutual exclusion to prevent parallel
>> async/sync access to the same regmap.
>>
>> In other words, "memory" areas that are used for firmware downloads
>> would be moved to a different regmap with async capabilities and no
>> caching support.
> 
> I would be a little inclined to say leave adding a regmap for a
> follow up series, whether we add it to the existing regmap or add
> a new one, or whatever, it should all sit happily on top of the
> API being added in this series. Makes it a little more contained
> to focus on one area at a time, and leave this series as adding
> core support for BRA. But that said, if we really want to I don't
> feel mega strongly on this one.

Right, I was probably going too far down in the details for a December
20 post.

The point I was trying to make it seems there's consensus that regmap
with the async parts would be the API used by SoundWire/ASoC codecs, and
the regmap implementation would rely on the bus/host send/wait routines.

The regmap stuff will need joint work with codec driver folks so it
should indeed be done in a second step when the SoundWire bus/host parts
are available.

Put differently: is there any sustained objection to the proposal of
extending regmap with async BRA transfers?

Mark Brown Dec. 20, 2023, 6:28 p.m. UTC | #11

On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote:

> Put differently: is there any sustained objection to the proposal of
> extending regmap with async BRA transfers?

Not from me, and I agree that it makes sense to do it once the
underlying SoundWire bits are in place.

Charles Keepax Dec. 21, 2023, 9:46 a.m. UTC | #12

On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote:
> On 12/20/23 16:16, Charles Keepax wrote:
> > On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
> >> On 12/19/23 17:53, Mark Brown wrote:
> >>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
> Put differently: is there any sustained objection to the proposal of
> extending regmap with async BRA transfers?

Seems good to me, was not my intention to object to the idea itself.

Thanks,
Charles

[RFC,00/16] soundwire/ASoC: speed-up downloads with BTP/BRA protocol

Message

Comments