diff mbox series

[17/18] ALSA: emu10k1: fix playback of short wavetable samples

Message ID 20240401100742.506001-18-oswald.buddenhagen@gmx.de
State New
Headers show
Series ALSA: emu10k1 & emux: fixes related to wavetable playback | expand

Commit Message

Oswald Buddenhagen April 1, 2024, 10:07 a.m. UTC
As already anticipated in the linked commit, playback was broken for
very short samples. That's because we'd set the current position beyond
the end of the loop, so while the start would now be correct (due to the
cache lag), we'd run off the end of the sample and play garbage.

Fixing the playback position itself wouldn't be that hard (just modulo
it), but this wouldn't address pre-filling the cache with the right
data.

We could pre-fill the cache manually, but that's slow, requires
additional code for each sample width, and is made even more complex by
the driver's virtual address space having no contiguous mapping for the
CPU.

We could have the engine fill the cache piece-wise (which is really what
happens when playback is running), but that would also be complex, and
we'd need to wait for the engine to handle each piece, so it wouldn't be
that much faster than the manual fill.

For the case of requiring only one loop iteration prior to reaching the
cache size, one could leverage the engine's looping mechanism around
CCR_CACHELOOPFLAG, but this special case doesn't seem worth the
complexity.

So we just unroll the loop as far as necessary to be able to play back
the sample without any fiddling.

Pedantically, this would be incorrect for loop-until-release samples
with a low loop end which are released very quickly, but that would be
relatively harmless, is not a plausible use case in the first place, and
SoundFont sample mode 3 isn't actually implemented anyway (it's
conflated with mode 1, infinite looping).

Fixes: df335e9a8b (ALSA: emu10k1: fix synthesizer sample playback position and caching, 2023-05-18)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218625
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
---
 sound/pci/emu10k1/emu10k1_patch.c | 53 ++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 5 deletions(-)

--
2.42.0.419.g70bf8a5751
diff mbox series

Patch

diff --git a/sound/pci/emu10k1/emu10k1_patch.c b/sound/pci/emu10k1/emu10k1_patch.c
index 699aa0fec97b..dbfa89435ac2 100644
--- a/sound/pci/emu10k1/emu10k1_patch.c
+++ b/sound/pci/emu10k1/emu10k1_patch.c
@@ -31,6 +31,7 @@  snd_emu10k1_sample_new(struct snd_emux *rec, struct snd_sf_sample *sp,
 	int shift;
 	int offset;
 	int truesize, size, blocksize;
+	int loop_start, loop_end, loop_size, data_end, unroll;
 	struct snd_emu10k1 *emu;

 	emu = rec->hw;
@@ -64,12 +65,35 @@  snd_emu10k1_sample_new(struct snd_emux *rec, struct snd_sf_sample *sp,
 		}
 	}

+	loop_start = sp->v.loopstart;
+	loop_end = sp->v.loopend;
+	loop_size = loop_end - loop_start;
+	if (!loop_size)
+		return -EINVAL;
+	data_end = sp->v.end;
+
 	/* recalculate offset */
 	sp->v.start += BLANK_HEAD_SIZE;
 	sp->v.end += BLANK_HEAD_SIZE;
 	sp->v.loopstart += BLANK_HEAD_SIZE;
 	sp->v.loopend += BLANK_HEAD_SIZE;

+	// Automatic pre-filling of the cache does not work in the presence
+	// of loops (*), and we don't want to fill it manually, as that is
+	// fiddly and slow. So we unroll the loop until the loop end is
+	// beyond the cache size.
+	// (*) Strictly speaking, a single iteration is supported (that's
+	// how it works when the playback engine runs), but handling this
+	// special case is not worth it.
+	unroll = 0;
+	while (sp->v.loopend < 64) {
+		truesize += loop_size;
+		sp->v.loopstart += loop_size;
+		sp->v.loopend += loop_size;
+		sp->v.end += loop_size;
+		unroll++;
+	}
+
 	/* try to allocate a memory block */
 	blocksize = truesize << shift;
 	sp->block = snd_emu10k1_synth_alloc(emu, blocksize);
@@ -89,19 +113,38 @@  snd_emu10k1_sample_new(struct snd_emux *rec, struct snd_sf_sample *sp,
 	offset += size;

 	/* copy provided samples */
-	size = sp->v.size << shift;
-	if (snd_emu10k1_synth_copy_from_user(emu, sp->block, offset, data, size, xor)) {
-		snd_emu10k1_synth_free(emu, sp->block);
-		sp->block = NULL;
-		return -EFAULT;
+	if (unroll && loop_end <= data_end) {
+		size = loop_end << shift;
+		if (snd_emu10k1_synth_copy_from_user(emu, sp->block, offset, data, size, xor))
+			goto faulty;
+		offset += size;
+
+		data += loop_start << shift;
+		while (--unroll > 0) {
+			size = loop_size << shift;
+			if (snd_emu10k1_synth_copy_from_user(emu, sp->block, offset, data, size, xor))
+				goto faulty;
+			offset += size;
+		}
+
+		size = (data_end - loop_start) << shift;
+	} else {
+		size = data_end << shift;
 	}
+	if (snd_emu10k1_synth_copy_from_user(emu, sp->block, offset, data, size, xor))
+		goto faulty;
 	offset += size;

 	/* clear rest of samples (if any) */
 	if (offset < blocksize)
 		snd_emu10k1_synth_memset(emu, sp->block, offset, blocksize - offset, fill);

 	return 0;
+
+faulty:
+	snd_emu10k1_synth_free(emu, sp->block);
+	sp->block = NULL;
+	return -EFAULT;
 }

 /*