[v11,00/10] spi: Add support for stacked/parallel memories

Message ID	20231125092137.2948-1-amit.kumar-mahapatra@amd.com
Headers	show Return-Path: <alsa-devel-bounces@alsa-project.org> Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; pr=C From: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> To: <broonie@kernel.org>, <tudor.ambarus@linaro.org>, <pratyush@kernel.org>, <miquel.raynal@bootlin.com>, <richard@nod.at>, <vigneshr@ti.com>, <sbinding@opensource.cirrus.com>, <lee@kernel.org>, <james.schulman@cirrus.com>, <david.rhodes@cirrus.com>, <rf@opensource.cirrus.com>, <perex@perex.cz>, <tiwai@suse.com> CC: <linux-spi@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <michael@walle.cc>, <linux-mtd@lists.infradead.org>, <nicolas.ferre@microchip.com>, <alexandre.belloni@bootlin.com>, <claudiu.beznea@tuxon.dev>, <michal.simek@amd.com>, <linux-arm-kernel@lists.infradead.org>, <alsa-devel@alsa-project.org>, <patches@opensource.cirrus.com>, <linux-sound@vger.kernel.org>, <git@amd.com>, <amitrkcian2002@gmail.com>, Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Subject: [PATCH v11 00/10] spi: Add support for stacked/parallel memories Date: Sat, 25 Nov 2023 14:51:27 +0530 Message-ID: <20231125092137.2948-1-amit.kumar-mahapatra@amd.com> MIME-Version: 1.0 Content-Type: text/plain Message-ID-Hash: BSE6RY3YY4JP6I5ENMQKB7LXKQ7UMAPA Precedence: list Archived-At: <https://mailman.alsa-project.org/hyperkitty/list/alsa-devel@alsa-project.org/message/BSE6RY3YY4JP6I5ENMQKB7LXKQ7UMAPA/>
Series	spi: Add support for stacked/parallel memories \| expand [v11,00/10] spi: Add support for stacked/parallel memories [v11,03/10] spi: Add multi-cs memories support in SPI core [v11,04/10] mtd: spi-nor: Convert macros with inline functions [v11,05/10] mtd: spi-nor: Add APIs to set/get nor->params [v11,09/10] mtd: spi-nor: Add parallel memories support in spi-nor

Mahapatra, Amit Kumar Nov. 25, 2023, 9:21 a.m. UTC

This patch series updated the spi-nor, spi core and the AMD-Xilinx GQSPI 
driver to add stacked and parallel memories support.

The first nine patches
https://lore.kernel.org/all/20230119185342.2093323-1-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-2-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-3-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-4-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-5-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-6-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-7-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-8-amit.kumar-mahapatra@amd.com/
https://lore.kernel.org/all/20230310173217.3429788-9-amit.kumar-mahapatra@amd.com/
of the previous series got applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
for-next But the rest of the patches in the series did not get applied as 
failure was reported for spi driver with GPIO CS, so send the remaining 
patches in the series after rebasing it on top of for-next branch and 
fixing the issue.

CS GPIO is not tested on our hardware, but it has been tested by @Stefan 
https://lore.kernel.org/all/005001da1efc$619ad5a0$24d080e0$@opensource.cirrus.com/
---
BRANCH: for-next

Changes in v11:
- Rebased patch series on top of latest for-next branch.
- Added a new patch(1/10) to replace spi->chip_select with
  spi_get_chipselect() call in tps6594-spi.c.
- Added a new patch(2/10) to replace spi->chip_select with
  spi_get_chipseletc() call in cs35l56_hda_spi.c.
- In spi.c initialized unused CS[] to 0xff and spi->cs_index_mask
  to 0x01 in all flows.
- Updated spi_dev_check() to compare the CS of old spi device with
  the new spi device CS.
- Updated cover letter description to add information regarding GPIO CS
  testing and added Stefen's Tested-by tag in 3/10 patch.

Changes in v10:
 - Rebased patch series on top of latest for-next branch and fixed
   merge conflicts.

Changes in v9:
- Updated 1/8 patch description to add an high-level overview of
  parallel(multi-cs) & stacked design.
- Initialized all unused CS to 0xFF.
- Moved CS check from spi_add_device() to __spi_add_device().
- Updated __spi_add_device() to check to make sure that multiple logical CS
  don't map to the same physical CS and same physical CS doesn't map to
  different logical CS.
- Updated 1/8, 5/8 & 7/8 to support arbitrary number of flash devices
  connected in parallel or stacked mode.
- Updated documentation for chip_select.
- Added a new spi-nor structure member nor->num_flash to keep track of the
  number of flashes connected.
- Added a new patch in the series 4/8 to move write_enable call just before
  spi_mem ops call in SPI-NOR.
- Added comments in SPI core & SPI-NOR.
- Rebased the patch series on top of the latest for-next branch.

Changes in v8:
- Updated __spi_add_device() and spi_set_cs() to fix spi driver failure
  with GPIO CS.
- Rebased the patch series on top of latest for-next branch and fixed
  merge conflicts.
- Updated cover letter description to add information regarding GPIO CS
  testing and request Stefan to provide his Tested-by tag for 1/7 patch.
- Updated 1/7 patch description.

Changes in v7:
- Updated spi_dev_check() to avoid failures for spi driver GPIO CS and
  moved the error message from __spi_add_device() to spi_dev_check().
- Resolved code indentation issue in spi_set_cs().
- In spi_set_cs() call spi_delay_exec( ) once if the controller supports
  multi cs with both the CS backed by GPIO.
- Updated __spi_validate()to add checks for both the GPIO CS.
- Replaced cs_index_mask bit mask with SPI_CS_CNT_MAX.
- Updated struct spi_controller to represent multi CS capability of the
  spi controller through a flag bit SPI_CONTROLLER_MULTI_CS instead of
  a boolen structure member "multi_cs_cap".
- Updated 1/7 patch description .

Changes in v6:
- Rebased on top of latest v6.3-rc1 and fixed merge conflicts in
  spi-mpc512x-psc.c, sfdp.c, spansion.c files and removed spi-omap-100k.c.
- Updated spi_dev_check( ) to reject new devices if any one of the
  chipselect is used by another device.

Changes in v5:
- Rebased the patches on top of v6.3-rc1 and fixed the merge conflicts.
- Fixed compilation warnings in spi-sh-msiof.c with shmobile_defconfig

Changes in v4:
- Fixed build error in spi-pl022.c file - reported by Mark.
- Fixed build error in spi-sn-f-ospi.c file.
- Added Reviewed-by: Serge Semin <fancer.lancer@gmail.com> tag.
- Added two more patches to replace spi->chip_select with API calls in
  mpc832x_rdb.c & cs35l41_hda_spi.c files.

Changes in v3:
- Rebased the patches on top of
  https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
- Added a patch to convert spi_nor_otp_region_len(nor) &
  spi_nor_otp_n_regions(nor) macros into inline functions
- Added Reviewed-by & Acked-by tags

Changes in v2:
- Rebased the patches on top of v6.2-rc1
- Created separate patch to add get & set APIs for spi->chip_select &
  spi->cs_gpiod, and replaced all spi->chip_select and spi->cs_gpiod
  references with the API calls.
- Created separate patch to add get & set APIs for nor->params.
---
Amit Kumar Mahapatra (10):
  mfd: tps6594: Use set/get APIs to access spi->chip_select
  ALSA: hda/cs35l56: Use set/get APIs to access spi->chip_select
  spi: Add multi-cs memories support in SPI core
  mtd: spi-nor: Convert macros with inline functions
  mtd: spi-nor: Add APIs to set/get nor->params
  mtd: spi-nor: Move write enable inside specific write & erase APIs
  mtd: spi-nor: Add stacked memories support in spi-nor
  spi: spi-zynqmp-gqspi: Add stacked memories support in GQSPI driver
  mtd: spi-nor: Add parallel memories support in spi-nor
  spi: spi-zynqmp-gqspi: Add parallel memories support in GQSPI driver

 drivers/mfd/tps6594-spi.c        |   2 +-
 drivers/mtd/spi-nor/atmel.c      |  17 +-
 drivers/mtd/spi-nor/core.c       | 683 +++++++++++++++++++++++++------
 drivers/mtd/spi-nor/core.h       |   8 +
 drivers/mtd/spi-nor/debugfs.c    |   4 +-
 drivers/mtd/spi-nor/gigadevice.c |   4 +-
 drivers/mtd/spi-nor/issi.c       |  11 +-
 drivers/mtd/spi-nor/macronix.c   |  10 +-
 drivers/mtd/spi-nor/micron-st.c  |  35 +-
 drivers/mtd/spi-nor/otp.c        |  48 ++-
 drivers/mtd/spi-nor/sfdp.c       |  33 +-
 drivers/mtd/spi-nor/spansion.c   |  57 +--
 drivers/mtd/spi-nor/sst.c        |   7 +-
 drivers/mtd/spi-nor/swp.c        |  25 +-
 drivers/mtd/spi-nor/winbond.c    |   2 +-
 drivers/mtd/spi-nor/xilinx.c     |  18 +-
 drivers/spi/spi-zynqmp-gqspi.c   |  58 ++-
 drivers/spi/spi.c                | 259 ++++++++++--
 include/linux/mtd/spi-nor.h      |  21 +-
 include/linux/spi/spi.h          |  51 ++-
 sound/pci/hda/cs35l56_hda_spi.c  |   2 +-
 21 files changed, 1085 insertions(+), 270 deletions(-)

Lee Jones Dec. 1, 2023, 9:57 a.m. UTC | #1

On Sat, 25 Nov 2023 14:51:28 +0530, Amit Kumar Mahapatra wrote:
> In preparation for adding multiple CS support for a device, set/get
> functions were introduces accessing spi->chip_select in
> 'commit 303feb3cc06a ("spi: Add APIs in spi core to set/get
> spi->chip_select and spi->cs_gpiod")'.
> Replace spi->chip_select with spi_get_chipselect() API.
> 
> 
> [...]

Applied, thanks!

[01/10] mfd: tps6594: Use set/get APIs to access spi->chip_select
        commit: dd636638446c87c95c5beddcd367d95ac6764c6c

--
Lee Jones [李琼斯]

Mark Brown Dec. 1, 2023, 6:50 p.m. UTC | #2

On Fri, Dec 01, 2023 at 09:57:36AM +0000, Lee Jones wrote:
> On Sat, 25 Nov 2023 14:51:28 +0530, Amit Kumar Mahapatra wrote:
> > In preparation for adding multiple CS support for a device, set/get
> > functions were introduces accessing spi->chip_select in
> > 'commit 303feb3cc06a ("spi: Add APIs in spi core to set/get
> > spi->chip_select and spi->cs_gpiod")'.
> > Replace spi->chip_select with spi_get_chipselect() API.

> Applied, thanks!

> [01/10] mfd: tps6594: Use set/get APIs to access spi->chip_select
>         commit: dd636638446c87c95c5beddcd367d95ac6764c6c

Is there a signed tag available for this - without this change the
subsequent SPI changes introduce a build breakage.

Lee Jones Dec. 6, 2023, 1:45 p.m. UTC | #3

On Fri, 01 Dec 2023, Mark Brown wrote:

> On Fri, Dec 01, 2023 at 09:57:36AM +0000, Lee Jones wrote:
> > On Sat, 25 Nov 2023 14:51:28 +0530, Amit Kumar Mahapatra wrote:
> > > In preparation for adding multiple CS support for a device, set/get
> > > functions were introduces accessing spi->chip_select in
> > > 'commit 303feb3cc06a ("spi: Add APIs in spi core to set/get
> > > spi->chip_select and spi->cs_gpiod")'.
> > > Replace spi->chip_select with spi_get_chipselect() API.
> 
> > Applied, thanks!
> 
> > [01/10] mfd: tps6594: Use set/get APIs to access spi->chip_select
> >         commit: dd636638446c87c95c5beddcd367d95ac6764c6c
> 
> Is there a signed tag available for this - without this change the
> subsequent SPI changes introduce a build breakage.

Not yet, but I can get around to making one.

Tudor Ambarus Dec. 6, 2023, 2:30 p.m. UTC | #4

Hi, Amit,

On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
> Each flash that is connected in stacked mode should have a separate
> parameter structure. So, the flash parameter member(*params) of the spi_nor
> structure is changed to an array (*params[2]). The array is used to store
> the parameters of each flash connected in stacked configuration.
> 
> The current implementation assumes that a maximum of two flashes are
> connected in stacked mode and both the flashes are of same make but can
> differ in sizes. So, except the sizes all other flash parameters of both
> the flashes are identical.

Do you plan to add support for different flashes in stacked mode? If
not, wouldn't it be simpler to have just an array of flash sizes instead
of duplicating the entire params struct?

> 
> SPI-NOR is not aware of the chip_select values, for any incoming request
> SPI-NOR will decide the flash index with the help of individual flash size
> and the configuration type (single/stacked). SPI-NOR will pass on the flash
> index information to the SPI core & SPI driver by setting the appropriate
> bit in nor->spimem->spi->cs_index_mask. For example, if nth bit of
> nor->spimem->spi->cs_index_mask is set then the driver would
> assert/de-assert spi->chip_slect[n].
> 
> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
> ---
>  drivers/mtd/spi-nor/core.c  | 272 +++++++++++++++++++++++++++++-------
>  drivers/mtd/spi-nor/core.h  |   4 +
>  include/linux/mtd/spi-nor.h |  15 +-
>  3 files changed, 240 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> index 93ae69b7ff83..e990be7c7eb6 100644
> --- a/drivers/mtd/spi-nor/core.c
> +++ b/drivers/mtd/spi-nor/core.c

cut

> @@ -2905,7 +3007,10 @@ static void spi_nor_init_fixup_flags(struct spi_nor *nor)
>  static int spi_nor_late_init_params(struct spi_nor *nor)
>  {
>  	struct spi_nor_flash_parameter *params = spi_nor_get_params(nor, 0);
> -	int ret;
> +	struct device_node *np = spi_nor_get_flash_node(nor);
> +	u64 flash_size[SNOR_FLASH_CNT_MAX];
> +	u32 idx = 0;
> +	int rc, ret;
>  
>  	if (nor->manufacturer && nor->manufacturer->fixups &&
>  	    nor->manufacturer->fixups->late_init) {
> @@ -2937,6 +3042,44 @@ static int spi_nor_late_init_params(struct spi_nor *nor)
>  	if (params->n_banks > 1)
>  		params->bank_size = div64_u64(params->size, params->n_banks);
>  
> +	nor->num_flash = 0;
> +
> +	/*
> +	 * The flashes that are connected in stacked mode should be of same make.
> +	 * Except the flash size all other properties are identical for all the
> +	 * flashes connected in stacked mode.
> +	 * The flashes that are connected in parallel mode should be identical.
> +	 */
> +	while (idx < SNOR_FLASH_CNT_MAX) {
> +		rc = of_property_read_u64_index(np, "stacked-memories", idx, &flash_size[idx]);

This is a little late in my opinion, as we don't have any sanity check
on the flashes that are stacked on top of the first. We shall at least
read and compare the ID for all.

Cheers,
ta

Tudor Ambarus Dec. 6, 2023, 2:43 p.m. UTC | #5

On 12/6/23 14:30, Tudor Ambarus wrote:
> Hi, Amit,
> 
> On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
>> Each flash that is connected in stacked mode should have a separate
>> parameter structure. So, the flash parameter member(*params) of the spi_nor
>> structure is changed to an array (*params[2]). The array is used to store
>> the parameters of each flash connected in stacked configuration.
>>
>> The current implementation assumes that a maximum of two flashes are
>> connected in stacked mode and both the flashes are of same make but can
>> differ in sizes. So, except the sizes all other flash parameters of both
>> the flashes are identical.
> 
> Do you plan to add support for different flashes in stacked mode? If
> not, wouldn't it be simpler to have just an array of flash sizes instead
> of duplicating the entire params struct?
> 
>>
>> SPI-NOR is not aware of the chip_select values, for any incoming request
>> SPI-NOR will decide the flash index with the help of individual flash size
>> and the configuration type (single/stacked). SPI-NOR will pass on the flash
>> index information to the SPI core & SPI driver by setting the appropriate
>> bit in nor->spimem->spi->cs_index_mask. For example, if nth bit of
>> nor->spimem->spi->cs_index_mask is set then the driver would
>> assert/de-assert spi->chip_slect[n].
>>
>> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
>> ---
>>  drivers/mtd/spi-nor/core.c  | 272 +++++++++++++++++++++++++++++-------
>>  drivers/mtd/spi-nor/core.h  |   4 +
>>  include/linux/mtd/spi-nor.h |  15 +-
>>  3 files changed, 240 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
>> index 93ae69b7ff83..e990be7c7eb6 100644
>> --- a/drivers/mtd/spi-nor/core.c
>> +++ b/drivers/mtd/spi-nor/core.c
> 
> cut
> 
>> @@ -2905,7 +3007,10 @@ static void spi_nor_init_fixup_flags(struct spi_nor *nor)
>>  static int spi_nor_late_init_params(struct spi_nor *nor)
>>  {
>>  	struct spi_nor_flash_parameter *params = spi_nor_get_params(nor, 0);
>> -	int ret;
>> +	struct device_node *np = spi_nor_get_flash_node(nor);
>> +	u64 flash_size[SNOR_FLASH_CNT_MAX];
>> +	u32 idx = 0;
>> +	int rc, ret;
>>  
>>  	if (nor->manufacturer && nor->manufacturer->fixups &&
>>  	    nor->manufacturer->fixups->late_init) {
>> @@ -2937,6 +3042,44 @@ static int spi_nor_late_init_params(struct spi_nor *nor)
>>  	if (params->n_banks > 1)
>>  		params->bank_size = div64_u64(params->size, params->n_banks);
>>  
>> +	nor->num_flash = 0;
>> +
>> +	/*
>> +	 * The flashes that are connected in stacked mode should be of same make.
>> +	 * Except the flash size all other properties are identical for all the
>> +	 * flashes connected in stacked mode.
>> +	 * The flashes that are connected in parallel mode should be identical.
>> +	 */
>> +	while (idx < SNOR_FLASH_CNT_MAX) {
>> +		rc = of_property_read_u64_index(np, "stacked-memories", idx, &flash_size[idx]);

also, it's not clear to me why you read this property multiple times.
Have you sent a device tree patch somewhere? It will help me understand
what you're trying to achieve.

> 
> This is a little late in my opinion, as we don't have any sanity check
> on the flashes that are stacked on top of the first. We shall at least
> read and compare the ID for all.
> 
> Cheers,
> ta

Tudor Ambarus Dec. 7, 2023, 5:24 p.m. UTC | #6

On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
> The current implementation assumes that a maximum of two flashes are
> connected in stacked mode and both the flashes are of same make but can
> differ in sizes. So, except the sizes all other flash parameters of both
> the flashes are identical.

Too much restrictions, isn't it? Have you thought about adding a layer
on top of SPI NOR managing the stacked/parallel flashes?

Mark Brown Dec. 7, 2023, 10:35 p.m. UTC | #7

On Sat, 25 Nov 2023 14:51:27 +0530, Amit Kumar Mahapatra wrote:
> This patch series updated the spi-nor, spi core and the AMD-Xilinx GQSPI
> driver to add stacked and parallel memories support.
> 
> The first nine patches
> https://lore.kernel.org/all/20230119185342.2093323-1-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-2-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-3-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-4-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-5-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-6-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-7-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-8-amit.kumar-mahapatra@amd.com/
> https://lore.kernel.org/all/20230310173217.3429788-9-amit.kumar-mahapatra@amd.com/
> of the previous series got applied to
> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
> for-next But the rest of the patches in the series did not get applied as
> failure was reported for spi driver with GPIO CS, so send the remaining
> patches in the series after rebasing it on top of for-next branch and
> fixing the issue.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[02/10] ALSA: hda/cs35l56: Use set/get APIs to access spi->chip_select
        commit: f05e2f61fe88092e0d341ea27644a84e3386358d
[03/10] spi: Add multi-cs memories support in SPI core
        commit: 4d8ff6b0991d5e86b17b235fc46ec62e9195cb9b

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

Mahapatra, Amit Kumar Dec. 8, 2023, 5:05 p.m. UTC | #8

Hello Tudor,

> -----Original Message-----
> From: Tudor Ambarus <tudor.ambarus@linaro.org>
> Sent: Wednesday, December 6, 2023 8:00 PM
> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
> michael@walle.cc; linux-mtd@lists.infradead.org;
> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
> in spi-nor
> 
> Hi, Amit,
> 
> On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
> > Each flash that is connected in stacked mode should have a separate
> > parameter structure. So, the flash parameter member(*params) of the
> > spi_nor structure is changed to an array (*params[2]). The array is
> > used to store the parameters of each flash connected in stacked
> configuration.
> >
> > The current implementation assumes that a maximum of two flashes are
> > connected in stacked mode and both the flashes are of same make but
> > can differ in sizes. So, except the sizes all other flash parameters
> > of both the flashes are identical.
> 
> Do you plan to add support for different flashes in stacked mode? If not,

No, according to the current implementation, in stacked mode, both flashes 
must be of the same make.

> wouldn't it be simpler to have just an array of flash sizes instead of
> duplicating the entire params struct?

Yes, that is accurate. In alignment with our current stacked support use case we 
can have an array of flash sizes instead.
The primary purpose of having an array of params struct was to facilitate 
potential future extensions, allowing the addition of stacked support for 
different flashes

> 
> >
> > SPI-NOR is not aware of the chip_select values, for any incoming
> > request SPI-NOR will decide the flash index with the help of
> > individual flash size and the configuration type (single/stacked).
> > SPI-NOR will pass on the flash index information to the SPI core & SPI
> > driver by setting the appropriate bit in
> > nor->spimem->spi->cs_index_mask. For example, if nth bit of
> > nor->spimem->spi->cs_index_mask is set then the driver would
> > assert/de-assert spi->chip_slect[n].
> >
> > Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
> > ---
> >  drivers/mtd/spi-nor/core.c  | 272 +++++++++++++++++++++++++++++-------
> >  drivers/mtd/spi-nor/core.h  |   4 +
> >  include/linux/mtd/spi-nor.h |  15 +-
> >  3 files changed, 240 insertions(+), 51 deletions(-)
> >
> > diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> > index 93ae69b7ff83..e990be7c7eb6 100644
> > --- a/drivers/mtd/spi-nor/core.c
> > +++ b/drivers/mtd/spi-nor/core.c
> 
> cut
> 
> > @@ -2905,7 +3007,10 @@ static void spi_nor_init_fixup_flags(struct
> > spi_nor *nor)  static int spi_nor_late_init_params(struct spi_nor
> > *nor)  {
> >  	struct spi_nor_flash_parameter *params = spi_nor_get_params(nor,
> 0);
> > -	int ret;
> > +	struct device_node *np = spi_nor_get_flash_node(nor);
> > +	u64 flash_size[SNOR_FLASH_CNT_MAX];
> > +	u32 idx = 0;
> > +	int rc, ret;
> >
> >  	if (nor->manufacturer && nor->manufacturer->fixups &&
> >  	    nor->manufacturer->fixups->late_init) { @@ -2937,6 +3042,44 @@
> > static int spi_nor_late_init_params(struct spi_nor *nor)
> >  	if (params->n_banks > 1)
> >  		params->bank_size = div64_u64(params->size, params-
> >n_banks);
> >
> > +	nor->num_flash = 0;
> > +
> > +	/*
> > +	 * The flashes that are connected in stacked mode should be of same
> make.
> > +	 * Except the flash size all other properties are identical for all the
> > +	 * flashes connected in stacked mode.
> > +	 * The flashes that are connected in parallel mode should be identical.
> > +	 */
> > +	while (idx < SNOR_FLASH_CNT_MAX) {
> > +		rc = of_property_read_u64_index(np, "stacked-memories",
> idx,
> > +&flash_size[idx]);
> 
> This is a little late in my opinion, as we don't have any sanity check on the
> flashes that are stacked on top of the first. We shall at least read and compare
> the ID for all.

Alright, I will incorporate a sanity check for reading and comparing the 
ID of the stacked flash. Subsequently, I believe this stacked logic 
should be relocated to spi_nor_get_flash_info() where we identify the 
first flash. Please share your thoughts on this. Additionally, do you 
anticipate that SPI-NOR should throw an error if the second or any 
subsequent flash within the stacked connection is different? Or would you 
prefer it to print a warning and operate in single mode (i.e., only the 
first flash)?

Regards,
Amit
> 
> Cheers,
> ta

Tudor Ambarus Dec. 11, 2023, 3:33 a.m. UTC | #9

On 12/8/23 17:05, Mahapatra, Amit Kumar wrote:
> Hello Tudor,

Hi!

> 
>> -----Original Message-----
>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>> Sent: Wednesday, December 6, 2023 8:00 PM
>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
>> michael@walle.cc; linux-mtd@lists.infradead.org;
>> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
>> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
>> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
>> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
>> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
>> in spi-nor
>>
>> Hi, Amit,
>>
>> On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
>>> Each flash that is connected in stacked mode should have a separate
>>> parameter structure. So, the flash parameter member(*params) of the
>>> spi_nor structure is changed to an array (*params[2]). The array is
>>> used to store the parameters of each flash connected in stacked
>> configuration.
>>>
>>> The current implementation assumes that a maximum of two flashes are
>>> connected in stacked mode and both the flashes are of same make but
>>> can differ in sizes. So, except the sizes all other flash parameters
>>> of both the flashes are identical.
>>
>> Do you plan to add support for different flashes in stacked mode? If not,
> 
> No, according to the current implementation, in stacked mode, both flashes 
> must be of the same make.
> 
>> wouldn't it be simpler to have just an array of flash sizes instead of
>> duplicating the entire params struct?
> 
> Yes, that is accurate. In alignment with our current stacked support use case we 
> can have an array of flash sizes instead.
> The primary purpose of having an array of params struct was to facilitate 
> potential future extensions, allowing the addition of stacked support for 
> different flashes
> 

right. Don't do this change yet, let's decide on the overall
architecture first.

>>
>>>
>>> SPI-NOR is not aware of the chip_select values, for any incoming
>>> request SPI-NOR will decide the flash index with the help of
>>> individual flash size and the configuration type (single/stacked).
>>> SPI-NOR will pass on the flash index information to the SPI core & SPI
>>> driver by setting the appropriate bit in
>>> nor->spimem->spi->cs_index_mask. For example, if nth bit of
>>> nor->spimem->spi->cs_index_mask is set then the driver would
>>> assert/de-assert spi->chip_slect[n].
>>>
>>> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
>>> ---
>>>  drivers/mtd/spi-nor/core.c  | 272 +++++++++++++++++++++++++++++-------
>>>  drivers/mtd/spi-nor/core.h  |   4 +
>>>  include/linux/mtd/spi-nor.h |  15 +-
>>>  3 files changed, 240 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
>>> index 93ae69b7ff83..e990be7c7eb6 100644
>>> --- a/drivers/mtd/spi-nor/core.c
>>> +++ b/drivers/mtd/spi-nor/core.c
>>
>> cut
>>
>>> @@ -2905,7 +3007,10 @@ static void spi_nor_init_fixup_flags(struct
>>> spi_nor *nor)  static int spi_nor_late_init_params(struct spi_nor
>>> *nor)  {
>>>  	struct spi_nor_flash_parameter *params = spi_nor_get_params(nor,
>> 0);
>>> -	int ret;
>>> +	struct device_node *np = spi_nor_get_flash_node(nor);
>>> +	u64 flash_size[SNOR_FLASH_CNT_MAX];
>>> +	u32 idx = 0;
>>> +	int rc, ret;
>>>
>>>  	if (nor->manufacturer && nor->manufacturer->fixups &&
>>>  	    nor->manufacturer->fixups->late_init) { @@ -2937,6 +3042,44 @@
>>> static int spi_nor_late_init_params(struct spi_nor *nor)
>>>  	if (params->n_banks > 1)
>>>  		params->bank_size = div64_u64(params->size, params-
>>> n_banks);
>>>
>>> +	nor->num_flash = 0;
>>> +
>>> +	/*
>>> +	 * The flashes that are connected in stacked mode should be of same
>> make.
>>> +	 * Except the flash size all other properties are identical for all the
>>> +	 * flashes connected in stacked mode.
>>> +	 * The flashes that are connected in parallel mode should be identical.
>>> +	 */
>>> +	while (idx < SNOR_FLASH_CNT_MAX) {
>>> +		rc = of_property_read_u64_index(np, "stacked-memories",
>> idx,
>>> +&flash_size[idx]);
>>
>> This is a little late in my opinion, as we don't have any sanity check on the
>> flashes that are stacked on top of the first. We shall at least read and compare
>> the ID for all.
> 
> Alright, I will incorporate a sanity check for reading and comparing the 
> ID of the stacked flash. Subsequently, I believe this stacked logic 
> should be relocated to spi_nor_get_flash_info() where we identify the 
> first flash. Please share your thoughts on this. Additionally, do you 

I'm wondering whether we can add a layer on top of the flash type to
handle the stacked/parallel modes. This way everything will become flash
type independent. Would it be possible to stack 2 SPI NANDs? How about a
SPI NOR and a SPI NAND?

Is the datasheet of the controller public?

> anticipate that SPI-NOR should throw an error if the second or any 
> subsequent flash within the stacked connection is different? Or would you 
> prefer it to print a warning and operate in single mode (i.e., only the 
> first flash)?

Both options are fine, but I haven't yet decided on the overall
architecture.

Cheers,
ta

Tudor Ambarus Dec. 11, 2023, 9:35 a.m. UTC | #10

On 12/11/23 06:56, Mahapatra, Amit Kumar wrote:
> 
> 
>> -----Original Message-----
>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>> Sent: Monday, December 11, 2023 9:03 AM
>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
>> michael@walle.cc; linux-mtd@lists.infradead.org;
>> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
>> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
>> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
>> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
>> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
>> in spi-nor
>>
>>
>>
>> On 12/8/23 17:05, Mahapatra, Amit Kumar wrote:
>>> Hello Tudor,
>>
>> Hi!
> 
> Hello Tudor,
> 

Hi!

>>
>>>
>>>> -----Original Message-----
>>>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>>>> Sent: Wednesday, December 6, 2023 8:00 PM
>>>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>>>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>>>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>>>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>>>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>>>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> michael@walle.cc; linux-mtd@lists.infradead.org;
>>>> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
>>>> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>;
>>>> linux- arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
>>>> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
>>>> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
>>>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories
>>>> support in spi-nor
>>>>
>>>> Hi, Amit,
>>>>
>>>> On 11/25/23 09:21, Amit Kumar Mahapatra wrote:
>>>>> Each flash that is connected in stacked mode should have a separate
>>>>> parameter structure. So, the flash parameter member(*params) of the
>>>>> spi_nor structure is changed to an array (*params[2]). The array is
>>>>> used to store the parameters of each flash connected in stacked
>>>> configuration.
>>>>>
>>>>> The current implementation assumes that a maximum of two flashes are
>>>>> connected in stacked mode and both the flashes are of same make but
>>>>> can differ in sizes. So, except the sizes all other flash parameters
>>>>> of both the flashes are identical.
>>>>
>>>> Do you plan to add support for different flashes in stacked mode? If
>>>> not,
>>>
>>> No, according to the current implementation, in stacked mode, both
>>> flashes must be of the same make.
>>>
>>>> wouldn't it be simpler to have just an array of flash sizes instead
>>>> of duplicating the entire params struct?
>>>
>>> Yes, that is accurate. In alignment with our current stacked support
>>> use case we can have an array of flash sizes instead.
>>> The primary purpose of having an array of params struct was to
>>> facilitate potential future extensions, allowing the addition of
>>> stacked support for different flashes
>>>
>>
>> right. Don't do this change yet, let's decide on the overall architecture first.
> 
> Sure.
>>
>>>>
>>>>>
>>>>> SPI-NOR is not aware of the chip_select values, for any incoming
>>>>> request SPI-NOR will decide the flash index with the help of
>>>>> individual flash size and the configuration type (single/stacked).
>>>>> SPI-NOR will pass on the flash index information to the SPI core &
>>>>> SPI driver by setting the appropriate bit in
>>>>> nor->spimem->spi->cs_index_mask. For example, if nth bit of
>>>>> nor->spimem->spi->cs_index_mask is set then the driver would
>>>>> assert/de-assert spi->chip_slect[n].
>>>>>
>>>>> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-
>> mahapatra@amd.com>
>>>>> ---
>>>>>  drivers/mtd/spi-nor/core.c  | 272 +++++++++++++++++++++++++++++-----
>> --
>>>>>  drivers/mtd/spi-nor/core.h  |   4 +
>>>>>  include/linux/mtd/spi-nor.h |  15 +-
>>>>>  3 files changed, 240 insertions(+), 51 deletions(-)
>>>>>
>>>>> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
>>>>> index 93ae69b7ff83..e990be7c7eb6 100644
>>>>> --- a/drivers/mtd/spi-nor/core.c
>>>>> +++ b/drivers/mtd/spi-nor/core.c
>>>>
>>>> cut
>>>>
>>>>> @@ -2905,7 +3007,10 @@ static void spi_nor_init_fixup_flags(struct
>>>>> spi_nor *nor)  static int spi_nor_late_init_params(struct spi_nor
>>>>> *nor)  {
>>>>>  	struct spi_nor_flash_parameter *params = spi_nor_get_params(nor,
>>>> 0);
>>>>> -	int ret;
>>>>> +	struct device_node *np = spi_nor_get_flash_node(nor);
>>>>> +	u64 flash_size[SNOR_FLASH_CNT_MAX];
>>>>> +	u32 idx = 0;
>>>>> +	int rc, ret;
>>>>>
>>>>>  	if (nor->manufacturer && nor->manufacturer->fixups &&
>>>>>  	    nor->manufacturer->fixups->late_init) { @@ -2937,6 +3042,44 @@
>>>>> static int spi_nor_late_init_params(struct spi_nor *nor)
>>>>>  	if (params->n_banks > 1)
>>>>>  		params->bank_size = div64_u64(params->size, params-
>> n_banks);
>>>>>
>>>>> +	nor->num_flash = 0;
>>>>> +
>>>>> +	/*
>>>>> +	 * The flashes that are connected in stacked mode should be of
>>>>> +same
>>>> make.
>>>>> +	 * Except the flash size all other properties are identical for all the
>>>>> +	 * flashes connected in stacked mode.
>>>>> +	 * The flashes that are connected in parallel mode should be identical.
>>>>> +	 */
>>>>> +	while (idx < SNOR_FLASH_CNT_MAX) {
>>>>> +		rc = of_property_read_u64_index(np, "stacked-memories",
>>>> idx,
>>>>> +&flash_size[idx]);
>>>>
>>>> This is a little late in my opinion, as we don't have any sanity
>>>> check on the flashes that are stacked on top of the first. We shall
>>>> at least read and compare the ID for all.
>>>
>>> Alright, I will incorporate a sanity check for reading and comparing
>>> the ID of the stacked flash. Subsequently, I believe this stacked
>>> logic should be relocated to spi_nor_get_flash_info() where we
>>> identify the first flash. Please share your thoughts on this.
>>> Additionally, do you
>>
>> I'm wondering whether we can add a layer on top of the flash type to handle
> 
> When you mention "on top," are you referring to incorporating it into 
> the MTD layer? Initially, Miquel had submitted this patch to address 

I mean something above SPI MEM flashes, be it NOR, NANDs or whatever.
Instead of treating the stacked flashes as a monolithic device and treat
in SPI NOR some array of flashes, to have a layer above which probes the
SPI MEM flash driver for each stacked flash. In your case SPI NOR would
be probed twice, as you use 2 SPI NOR flashes.

> stacked/parallel handling in the MTD layer.
> https://lore.kernel.org/linux-mtd/20200114191052.0a16d116@xps13/t/
> However, the Device Tree bindings were initially not accepted. 

Okay, thanks for the pointer. I'll take a look.

> Following a series of discussions, the below bindings were 
> eventually merged.
> https://lore.kernel.org/all/20220126112608.955728-4-miquel.raynal@bootlin.com/

I saw, thanks.

> 
>> the stacked/parallel modes. This way everything will become flash type
>> independent. Would it be possible to stack 2 SPI NANDs? How about a SPI
>> NOR and a SPI NAND?
>>
>> Is the datasheet of the controller public?
> 
> Yes, https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/Quad-SPI-Controller
> 

Wonderful, I'll take a look. I see two clocks there. Does this mean that
the stacked flashes can be operated at different frequencies? Do you
know if we can combine a SPI NOR with a SPI NAND in stacked configuration?

I need to study this a bit. I'll try to involve Michael and Pratyush too.

Cheers,
ta

Michael Walle Dec. 12, 2023, 12:34 p.m. UTC | #11

> This patch series updated the spi-nor, spi core and the AMD-Xilinx 
> GQSPI
> driver to add stacked and parallel memories support.

Honestly, I'm not thrilled about how this is implemented in the core
and what the restrictions are.
First, the pattern "if (n==1) then { old behavior } else { copy old
code modify for n==2 }" is hard to maintain. There should be no copy
and the old code shall be adapted to work for both n=1 and n>1.

But I agree with Tudor, some kind of abstraction (layer) would be nice.

Also, you hardcode n=2 everywhere. Please generalize that one.

Are you aware of any other controller supporting such a feature? I've
seen you also need to modify the spi controller and intercept some
commands. Can everything be moved there?

I'm not sure we are implementing controller specific things in the
core. Hard to judge without seeing other controllers doing a similar
thing. I'd like to avoid that.

If we had some kind of abstraction here, that might be easier to adapt
in the future, but just putting everything into the core will make it
really hard to maintain. So if everything related to stacked and
parallel memory would be in drivers/mtd/spi-nor/stacked.c, we'd have
at least everything in one place with a proper interface between
that and the core.

-michael

Tudor Ambarus Dec. 15, 2023, 8:09 a.m. UTC | #12

On 15.12.2023 09:55, Mahapatra, Amit Kumar wrote:
>> Thanks! Can you share with us what flashes you used for testing in the
>> stacked and parallel configurations?
> I used SPI-NOR QSPI flashes for testing stacked and parallel.

I got that, I wanted the flash name or device ID.

What I'm interested is if each flash is in its own package. Are they?

> 
>>>> can combine a SPI NOR with a SPI NAND in stacked configuration?
>>> No, Xilinx/AMD QSPI controllers doesn't support this configuration.
>>>
>> 2 SPI NANDs shall work with the AMD controller, right?
> We never tested 2 SPI-NAND with AMD controller.

I was asking because I think the stacked layer shall be SPI MEM generic,
and not particular to SPI NOR.

>> I skimmed over the QSPI controller datasheet and wondered why one would
>> get complicated with 2 Quad Flashes when we have Octal. But then I saw that
>> the same SoC can configure an Octal controller (the Octal and Quad
>> controllers seems mutual exclusive) and that the Octal one can operate 2
>> octal flashes in stacked mode 🙂.
> Thats correct .
> 
> Please let me know how you want me to proceed on this.

I got you. Still need to allocate more time on this.

Cheers,
ta

Tudor Ambarus Dec. 15, 2023, 10:33 a.m. UTC | #13

On 12/15/23 10:02, Mahapatra, Amit Kumar wrote:
> Hello Tudor,

Hi,

> 
>> -----Original Message-----
>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>> Sent: Friday, December 15, 2023 1:40 PM
>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
>> michael@walle.cc; linux-mtd@lists.infradead.org;
>> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
>> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
>> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
>> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
>> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
>> in spi-nor
>>
>>
>>
>> On 15.12.2023 09:55, Mahapatra, Amit Kumar wrote:
>>>> Thanks! Can you share with us what flashes you used for testing in
>>>> the stacked and parallel configurations?
>>> I used SPI-NOR QSPI flashes for testing stacked and parallel.
>>
>> I got that, I wanted the flash name or device ID.
> 
> N25Q00A, MX66U2G45G, IS25LP01G & W25H02JV are some of the QSPI flashes on 
> which we tested. Additionally, we conducted tests on over 30 different 
> QSPI flashes from four distinct vendors (Miron, Winbond, Macronix, and ISSI).
> 

Great.

>> What I'm interested is if each flash is in its own package. Are they?
> 
> I'm sorry, but I don't quite understand what you mean by "if each flash in 
> its own package."
> 

There are flashes that are stacked at the physical level. It's a single
flash with multiple dies, that are all under a single physical package.

As I understand, your stacked flash model is at logical level. You have
2 flashes each in its own package. 2 different entities. Is my
understanding correct?

Cheers,
ta

Richard Weinberger Dec. 18, 2023, 10:10 p.m. UTC | #14

----- Ursprüngliche Mail -----
> Von: "Amit Kumar Mahapatra" <amit.kumar-mahapatra@amd.com>

> This patch series updated the spi-nor, spi core and the AMD-Xilinx GQSPI
> driver to add stacked and parallel memories support.

I wish the series had a real cover letter which explains the big picture
in more detail.

What I didn't really get so far, is it really necessary to support multiple
chip selects within a single mtd?
You changes introduce hard to maintain changes into the spi-nor/mtd core code
which alert me.
Why can't we have one mtd for each cs and, if needed, combine them later?
We have drivers such as mtdconcat for reasons.

Thanks,
//richard

Miquel Raynal Dec. 19, 2023, 8:12 a.m. UTC | #15

Hi Richard,

richard@nod.at wrote on Mon, 18 Dec 2023 23:10:20 +0100 (CET):

> ----- Ursprüngliche Mail -----
> > Von: "Amit Kumar Mahapatra" <amit.kumar-mahapatra@amd.com>  
> 
> > This patch series updated the spi-nor, spi core and the AMD-Xilinx GQSPI
> > driver to add stacked and parallel memories support.  
> 
> I wish the series had a real cover letter which explains the big picture
> in more detail.
> 
> What I didn't really get so far, is it really necessary to support multiple
> chip selects within a single mtd?
> You changes introduce hard to maintain changes into the spi-nor/mtd core code
> which alert me.
> Why can't we have one mtd for each cs and, if needed, combine them later?
> We have drivers such as mtdconcat for reasons.

The Xilinx SPI controller is a bit convoluted, there are two ways to
address the bits in a memory:
* Either your extend the memory range with the second chip "on
  top" of the first (which would typically be a mtd-concat use case)
* Or you use the two chips in parallel and you store the even bits
  in one device (let's say cs0) and the odd bits in the other (cs1).
  Extending mtd-concat for this might be another solution, I don't know
  how feasible it would be.

Maybe these bindings will help understanding the logic:
e2edd1b64f1c ("spi: dt-bindings: Describe stacked/parallel memories modes")
eba5368503b4 ("spi: dt-bindings: Add an example with two stacked flashes")

However I agree the changes will likely be hard to maintain given the
complexity brought with such a different controller.

Thanks,
Miquèl

Tudor Ambarus Dec. 19, 2023, 8:26 a.m. UTC | #16

On 15.12.2023 13:20, Mahapatra, Amit Kumar wrote:
> Hello Tudor,
> 

Hi!

>> -----Original Message-----
>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>> Sent: Friday, December 15, 2023 4:03 PM
>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org; michael@walle.cc;
>> linux-mtd@lists.infradead.org; nicolas.ferre@microchip.com;
>> alexandre.belloni@bootlin.com; claudiu.beznea@tuxon.dev; Simek, Michal
>> <michal.simek@amd.com>; linux-arm-kernel@lists.infradead.org; alsa-
>> devel@alsa-project.org; patches@opensource.cirrus.com; linux-
>> sound@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>;
>> amitrkcian2002@gmail.com
>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
>> in spi-nor
>>
>>
>>
>> On 12/15/23 10:02, Mahapatra, Amit Kumar wrote:
>>> Hello Tudor,
>>
>> Hi,
>>
>>>
>>>> -----Original Message-----
>>>> From: Tudor Ambarus <tudor.ambarus@linaro.org>
>>>> Sent: Friday, December 15, 2023 1:40 PM
>>>> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
>>>> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
>>>> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
>>>> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
>>>> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
>>>> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> michael@walle.cc; linux-mtd@lists.infradead.org;
>>>> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
>>>> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>;
>>>> linux- arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
>>>> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
>>>> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com
>>>> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories
>>>> support in spi-nor
>>>>
>>>>
>>>>
>>>> On 15.12.2023 09:55, Mahapatra, Amit Kumar wrote:
>>>>>> Thanks! Can you share with us what flashes you used for testing in
>>>>>> the stacked and parallel configurations?
>>>>> I used SPI-NOR QSPI flashes for testing stacked and parallel.
>>>>
>>>> I got that, I wanted the flash name or device ID.
>>>
>>> N25Q00A, MX66U2G45G, IS25LP01G & W25H02JV are some of the QSPI
>> flashes
>>> on which we tested. Additionally, we conducted tests on over 30
>>> different QSPI flashes from four distinct vendors (Miron, Winbond,
>> Macronix, and ISSI).
>>>
>>
>> Great.
>>
>>>> What I'm interested is if each flash is in its own package. Are they?
>>>
>>> I'm sorry, but I don't quite understand what you mean by "if each
>>> flash in its own package."
>>>
>>
>> There are flashes that are stacked at the physical level. It's a single flash with
>> multiple dies, that are all under a single physical package.
> 
> Got it. The W25H02JV QSPI flash I mentioned earlier is a device with 
> with four dies that are stacked at the physical level.
> 
>>
>> As I understand, your stacked flash model is at logical level. You have
>> 2 flashes each in its own package. 2 different entities. Is my understanding
>> correct?
> 
> Yes, that’s correct.
> 
> I'd like to contribute to your earlier point regarding the placement of 
> the stacked layer. As you correctly highlighted, it should be in the 
> spi-mem generic layer. For instance, when a read/write operation extends 
> across multiple flashes (whether SPI-NOR or SPI-NAND), the stacked layer 
> must handle the flash crossover. This requires setting the appropriate CS 
> index in mem->spi->cs_index_mask to select the correct slave device and 
> updating the data buffer, address & data length in spi_mem_op struct 
> variable. Does this align with your understanding?
> 

This was the initial idea, yes, but we'll have to see how mtd concat
fits in. Maybe the abstraction can be made at the mtd level, which I
suspect mtd concat does. I have to read that driver, never opened it.

Something else to consider: I see that Micron has a twin quad mode:
https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/nor-flash/serial-nor/mt25t/generation-b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500

The micron's "Separate Chip-Select and Clock Signals" resembles the
AMD's dual parallel 8-bit.
Micron's "Shared Chip-Select and Clock Signals" differs from the AMD's
stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas AMD considers
both as DQ[3:0].

I had a short chat with Michael and he highlighted that instead of the
parallel mode, one would be better of with an octal device. I wonder
whether the quad parallel is worth the effort. I see AMD can select
either quad (single/stacked/parallel) or octal (single/stacked). Is the
parallel mode considered obsolete for new IPs?

Cheers,
ta

Tudor Ambarus Feb. 9, 2024, 11:06 a.m. UTC | #17

On 12/21/23 06:54, Mahapatra, Amit Kumar wrote:
>> Something else to consider: I see that Micron has a twin quad mode:
>> https://media-www.micron.com/-
>> /media/client/global/documents/products/data-sheet/nor-flash/serial-
>> nor/mt25t/generation-
>> b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
>>
>> The micron's "Separate Chip-Select and Clock Signals" resembles the AMD's
>> dual parallel 8-bit.
> Yes, I agree.
> 
>> Micron's "Shared Chip-Select and Clock Signals" differs from the AMD's
>> stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas AMD considers
>> both as DQ[3:0].
> Yes, correct.

Amit, please help me to assess this. I assume Micron and Microchip is
using the same concepts as AMD uses for the "Dual Parallel 8-bit IO
mode", but they call it "Twin Quad Mode".

I was wrong, the AMD datasheet [1] was misleading [2], it described the
IOs for both flashes as IO[3:0], but later on in the "Table QSPI
Interface Signals" the second flash is described with IO[7:4].

The AMD's 8-bit Dual Flash Parallel Interface is using dedicated CS# and
CLK# lines for each flash. As Micron does, isn't it?

Micron says [3] that:
"The device contains two quad I/O die, each able to operate
independently for a total of eight I/Os. The memory map applies to each
die. Each die has internal registers for status, configuration, and
device protection that can be set and read independently from one other.
Micron recommends that internal configuration settings for the two die
be set identically."

it also says that:
"When using quad commands in XIO-SPI or when using QIO-SPI,
DQ[3:0]/DQ[7:4] are I/O."

So I guess the upper layers just ask for a chunk of memory to be written
and the controller handles the cs# lines automatically. How is the AMD
controller working, do you have to drive the cs# lines manually, or you
just set the parallel mode and the controller takes care of everything?

I assume this is how mchp is handling things, they seem to just set a
bit the protocol into the QSPI_IFR.PROTTYP register field and that's all
[4]. They even seem to write the registers of both flashes at the same time.

In what regards the AMD's "dual stack interface", AMD is sharing the
clock and IO lines and uses dedicated CS# lines for the flashes, whereas
Micron shares the CS# and CLK# lines with different IO lines.

Amit, please study the architectures used by mchp, micron and amd and
let us know if they are the same or they differ, and if they differ what
are the differences.

I added Conor from mchp in cc, I see Nicolas is already there, and Bean
from micron.

Thanks,
ta

[1]
https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Interface-Signals
[2]
https://docs.xilinx.com/viewer/attachment/dwmjhDJGICdJqD4swyVzcQ/fD8nv4ry78xM0_EF5kv4mA
[3]
https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/nor-flash/serial-nor/mt25t/generation-b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
[4]
https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf

Tudor Ambarus Feb. 9, 2024, 4:13 p.m. UTC | #18

On 2/9/24 11:06, Tudor Ambarus wrote:
> 
> 
> On 12/21/23 06:54, Mahapatra, Amit Kumar wrote:
>>> Something else to consider: I see that Micron has a twin quad mode:
>>> https://media-www.micron.com/-
>>> /media/client/global/documents/products/data-sheet/nor-flash/serial-
>>> nor/mt25t/generation-
>>> b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
>>>
>>> The micron's "Separate Chip-Select and Clock Signals" resembles the AMD's
>>> dual parallel 8-bit.
>> Yes, I agree.
>>
>>> Micron's "Shared Chip-Select and Clock Signals" differs from the AMD's
>>> stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas AMD considers
>>> both as DQ[3:0].
>> Yes, correct.
> 
> Amit, please help me to assess this. I assume Micron and Microchip is
> using the same concepts as AMD uses for the "Dual Parallel 8-bit IO
> mode", but they call it "Twin Quad Mode".
> 
> I was wrong, the AMD datasheet [1] was misleading [2], it described the
> IOs for both flashes as IO[3:0], but later on in the "Table QSPI
> Interface Signals" the second flash is described with IO[7:4].
> 
> The AMD's 8-bit Dual Flash Parallel Interface is using dedicated CS# and
> CLK# lines for each flash. As Micron does, isn't it?
> 
> Micron says [3] that:
> "The device contains two quad I/O die, each able to operate
> independently for a total of eight I/Os. The memory map applies to each
> die. Each die has internal registers for status, configuration, and
> device protection that can be set and read independently from one other.
> Micron recommends that internal configuration settings for the two die
> be set identically."

Amit,

I forgot to say my first conclusion about the quote from above. Even if
the dies are in the same physical package, micron asks users to
configure each die as it is a self-standing entity, IOW to configure
each die as it is a flash on its own. To me it looks like 2 concatenated
flashes at first look. Thus identical to how AMD controller works.
Please clarify this.

> 
> it also says that:
> "When using quad commands in XIO-SPI or when using QIO-SPI,
> DQ[3:0]/DQ[7:4] are I/O."

and this would be a parallel concatenation of two flashes.

Then it would be good if you let us now the similarities and differences
between how amd and mchp controller work, I scrawled few ideas below.

thanks,
ta
> 
> So I guess the upper layers just ask for a chunk of memory to be written
> and the controller handles the cs# lines automatically. How is the AMD
> controller working, do you have to drive the cs# lines manually, or you
> just set the parallel mode and the controller takes care of everything?
> 
> I assume this is how mchp is handling things, they seem to just set a
> bit the protocol into the QSPI_IFR.PROTTYP register field and that's all
> [4]. They even seem to write the registers of both flashes at the same time.
> 
> In what regards the AMD's "dual stack interface", AMD is sharing the
> clock and IO lines and uses dedicated CS# lines for the flashes, whereas
> Micron shares the CS# and CLK# lines with different IO lines.
> 
> Amit, please study the architectures used by mchp, micron and amd and
> let us know if they are the same or they differ, and if they differ what
> are the differences.
> 
> I added Conor from mchp in cc, I see Nicolas is already there, and Bean
> from micron.
> 
> Thanks,
> ta
> 
> [1]
> https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Interface-Signals
> [2]
> https://docs.xilinx.com/viewer/attachment/dwmjhDJGICdJqD4swyVzcQ/fD8nv4ry78xM0_EF5kv4mA
> [3]
> https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/nor-flash/serial-nor/mt25t/generation-b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> [4]
> https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf

Mahapatra, Amit Kumar March 13, 2024, 4:03 p.m. UTC | #19

Hello Tudor,

> -----Original Message-----
> From: Tudor Ambarus <tudor.ambarus@linaro.org>
> Sent: Friday, February 9, 2024 4:36 PM
> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
> michael@walle.cc; linux-mtd@lists.infradead.org;
> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com; Conor Dooley
> <conor.dooley@microchip.com>; beanhuo@micron.com
> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
> in spi-nor
> 
> CAUTION: This message has originated from an External Source. Please use
> proper judgment and caution when opening attachments, clicking links, or
> responding to this email.
> 
> 
> On 12/21/23 06:54, Mahapatra, Amit Kumar wrote:
> >> Something else to consider: I see that Micron has a twin quad mode:
> >> https://media-www.micron.com/-
> >> /media/client/global/documents/products/data-sheet/nor-flash/serial-
> >> nor/mt25t/generation-
> >>
> b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> >>
> >> The micron's "Separate Chip-Select and Clock Signals" resembles the
> >> AMD's dual parallel 8-bit.
> > Yes, I agree.
> >
> >> Micron's "Shared Chip-Select and Clock Signals" differs from the
> >> AMD's stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas AMD
> >> considers both as DQ[3:0].
> > Yes, correct.
> 
> Amit, please help me to assess this. I assume Micron and Microchip is using
> the same concepts as AMD uses for the "Dual Parallel 8-bit IO mode", but
> they call it "Twin Quad Mode".

That's accurate. It's the same concept.
> 
> I was wrong, the AMD datasheet [1] was misleading [2], it described the IOs
> for both flashes as IO[3:0], but later on in the "Table QSPI Interface Signals"
> the second flash is described with IO[7:4].

That’s correct.
> 
> The AMD's 8-bit Dual Flash Parallel Interface is using dedicated CS# and CLK#
> lines for each flash. As Micron does, isn't it?

Correct.
> 
> Micron says [3] that:
> "The device contains two quad I/O die, each able to operate independently
> for a total of eight I/Os. The memory map applies to each die. Each die has
> internal registers for status, configuration, and device protection that can be
> set and read independently from one other.
> Micron recommends that internal configuration settings for the two die be set
> identically."
> 
> it also says that:
> "When using quad commands in XIO-SPI or when using QIO-SPI,
> DQ[3:0]/DQ[7:4] are I/O."
> 
> So I guess the upper layers just ask for a chunk of memory to be written and
> the controller handles the cs# lines automatically. How is the AMD controller
> working, do you have to drive the cs# lines manually, or you just set the
> parallel mode and the controller takes care of everything?

https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/Word-Format
In parallel mode, the driver sets both the CS_LOWER and CS_UPPER bits 
in the Cmd_FIFO_Data register(please refer the above link), and sets 
BUS_SEL to 2’b11 to utilize all 8 IO lines. The controller then manages 
the assertion and de-assertion of the CS# lines.
> 
> I assume this is how mchp is handling things, they seem to just set a bit the
> protocol into the QSPI_IFR.PROTTYP register field and that's all [4]. They even
> seem to write the registers of both flashes at the same time.

Yes, that's accurate, but the key distinction is that in Microchip, the 
QSPI controller has one CS# (referred to as QCS), whereas the AMD QSPI 
controller has 2 CS#(referred to as CS0 & CS1).

Regards,
Amit

> 
> In what regards the AMD's "dual stack interface", AMD is sharing the clock
> and IO lines and uses dedicated CS# lines for the flashes, whereas Micron
> shares the CS# and CLK# lines with different IO lines.
> 
> Amit, please study the architectures used by mchp, micron and amd and let
> us know if they are the same or they differ, and if they differ what are the
> differences.
> 
> I added Conor from mchp in cc, I see Nicolas is already there, and Bean from
> micron.
> 
> Thanks,
> ta
> 
> [1]
> https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Interface-
> Signals
> [2]
> https://docs.xilinx.com/viewer/attachment/dwmjhDJGICdJqD4swyVzcQ/fD8nv
> 4ry78xM0_EF5kv4mA
> [3]
> https://media-www.micron.com/-
> /media/client/global/documents/products/data-sheet/nor-flash/serial-
> nor/mt25t/generation-
> b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> [4]
> https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/
> ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-
> DS60001765.pdf

Mahapatra, Amit Kumar March 13, 2024, 4:03 p.m. UTC | #20

> -----Original Message-----
> From: Tudor Ambarus <tudor.ambarus@linaro.org>
> Sent: Friday, February 9, 2024 9:44 PM
> To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
> michael@walle.cc; linux-mtd@lists.infradead.org;
> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com; Conor Dooley
> <conor.dooley@microchip.com>; beanhuo@micron.com
> Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
> in spi-nor
> 
> 
> 
> On 2/9/24 11:06, Tudor Ambarus wrote:
> >
> >
> > On 12/21/23 06:54, Mahapatra, Amit Kumar wrote:
> >>> Something else to consider: I see that Micron has a twin quad mode:
> >>> https://media-www.micron.com/-
> >>> /media/client/global/documents/products/data-sheet/nor-flash/serial-
> >>> nor/mt25t/generation-
> >>>
> b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> >>>
> >>> The micron's "Separate Chip-Select and Clock Signals" resembles the
> >>> AMD's dual parallel 8-bit.
> >> Yes, I agree.
> >>
> >>> Micron's "Shared Chip-Select and Clock Signals" differs from the
> >>> AMD's stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas AMD
> >>> considers both as DQ[3:0].
> >> Yes, correct.
> >
> > Amit, please help me to assess this. I assume Micron and Microchip is
> > using the same concepts as AMD uses for the "Dual Parallel 8-bit IO
> > mode", but they call it "Twin Quad Mode".
> >
> > I was wrong, the AMD datasheet [1] was misleading [2], it described
> > the IOs for both flashes as IO[3:0], but later on in the "Table QSPI
> > Interface Signals" the second flash is described with IO[7:4].
> >
> > The AMD's 8-bit Dual Flash Parallel Interface is using dedicated CS#
> > and CLK# lines for each flash. As Micron does, isn't it?
> >
> > Micron says [3] that:
> > "The device contains two quad I/O die, each able to operate
> > independently for a total of eight I/Os. The memory map applies to
> > each die. Each die has internal registers for status, configuration,
> > and device protection that can be set and read independently from one
> other.
> > Micron recommends that internal configuration settings for the two die
> > be set identically."
> 

Hello Tudor,

> Amit,
> 
> I forgot to say my first conclusion about the quote from above. Even if the
> dies are in the same physical package, micron asks users to configure each die
> as it is a self-standing entity, IOW to configure each die as it is a flash on its
> own. To me it looks like 2 concatenated flashes at first look. Thus identical to
> how AMD controller works.
> Please clarify this.

That’s correct, the Micron flash that you referred can communicate with 
the AMD QSPI controller in both parallel and stacked mode.
> 
> >
> > it also says that:
> > "When using quad commands in XIO-SPI or when using QIO-SPI,
> > DQ[3:0]/DQ[7:4] are I/O."
> 
> and this would be a parallel concatenation of two flashes.

That's correct.

Regards,
Amit
> 
> Then it would be good if you let us now the similarities and differences
> between how amd and mchp controller work, I scrawled few ideas below.
> 
> thanks,
> ta
> >
> > So I guess the upper layers just ask for a chunk of memory to be
> > written and the controller handles the cs# lines automatically. How is
> > the AMD controller working, do you have to drive the cs# lines
> > manually, or you just set the parallel mode and the controller takes care of
> everything?
> >
> > I assume this is how mchp is handling things, they seem to just set a
> > bit the protocol into the QSPI_IFR.PROTTYP register field and that's
> > all [4]. They even seem to write the registers of both flashes at the same
> time.
> >
> > In what regards the AMD's "dual stack interface", AMD is sharing the
> > clock and IO lines and uses dedicated CS# lines for the flashes,
> > whereas Micron shares the CS# and CLK# lines with different IO lines.
> >
> > Amit, please study the architectures used by mchp, micron and amd and
> > let us know if they are the same or they differ, and if they differ
> > what are the differences.
> >
> > I added Conor from mchp in cc, I see Nicolas is already there, and
> > Bean from micron.
> >
> > Thanks,
> > ta
> >
> > [1]
> > https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Inter
> > face-Signals
> > [2]
> >
> https://docs.xilinx.com/viewer/attachment/dwmjhDJGICdJqD4swyVzcQ/fD8nv
> > 4ry78xM0_EF5kv4mA
> > [3]
> > https://media-www.micron.com/-
> /media/client/global/documents/products/
> > data-sheet/nor-flash/serial-nor/mt25t/generation-b/mt25t_qljs_l_512_xb
> > a_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> > [4]
> >
> https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/
> Produ
> > ctDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf

Mahapatra, Amit Kumar July 26, 2024, 12:35 p.m. UTC | #21

> -----Original Message-----
> From: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>
> Sent: Wednesday, March 13, 2024 9:34 PM
> To: Tudor Ambarus <tudor.ambarus@linaro.org>; broonie@kernel.org;
> pratyush@kernel.org; miquel.raynal@bootlin.com; richard@nod.at;
> vigneshr@ti.com; sbinding@opensource.cirrus.com; lee@kernel.org;
> james.schulman@cirrus.com; david.rhodes@cirrus.com;
> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
> michael@walle.cc; linux-mtd@lists.infradead.org;
> nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
> claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>; linux-
> arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com; Conor Dooley
> <conor.dooley@microchip.com>; beanhuo@micron.com
> Subject: RE: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
> in spi-nor
> 
> 
> 
> > -----Original Message-----
> > From: Tudor Ambarus <tudor.ambarus@linaro.org>
> > Sent: Friday, February 9, 2024 9:44 PM
> > To: Mahapatra, Amit Kumar <amit.kumar-mahapatra@amd.com>;
> > broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
> > richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
> > lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
> > rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> > Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > michael@walle.cc; linux-mtd@lists.infradead.org;
> > nicolas.ferre@microchip.com; alexandre.belloni@bootlin.com;
> > claudiu.beznea@tuxon.dev; Simek, Michal <michal.simek@amd.com>;
> linux-
> > arm-kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> > patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> > Xilinx) <git@amd.com>; amitrkcian2002@gmail.com; Conor Dooley
> > <conor.dooley@microchip.com>; beanhuo@micron.com
> > Subject: Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories
> > support in spi-nor

Hello Everyone,

I would like to propose another approach for handling stacked and 
parallel connection modes and would appreciate your thoughts on it. 
But before that, here is some background on what we are trying to achieve.

The AMD QSPI controller supports two advanced connection modes(Stacked and 
Dual Parallel) which allow the controller to treat two different flashes 
as one storage.

Stacked:
Flashes share the same SPI bus, but different CS line, controller asserts 
the CS of the flash to which it needs to communicate.

Dual Parallel:
Both the flashes have their separate SPI bus CS of both the flashes will 
be asserted/de-asserted at the same time. In this mode data will be split 
across both the flashes by enabling the STRIPE setting in the controller. 
If STRIPE is not enabled, then same data will be sent to both the devices.

For more information on the modes please feel free to go through the 
controller flash interface below
https://docs.amd.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Device-Interface

Mirochip QSPI controller also supports "Dual Parallel 8-bit IO mode", but 
they call it "Twin Quad Mode".
https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf

DT binding changes were added through the following commits:
https://github.com/torvalds/linux/commit/f89504300e94524d5d5846ff8b728592ac72cec4
https://github.com/torvalds/linux/commit/eba5368503b4291db7819512600fa014ea17c5a8
https://github.com/torvalds/linux/commit/e2edd1b64f1c79e8abda365149ed62a2a9a494b4

SPI core changes were adds through the following commit:
https://github.com/torvalds/linux/commit/4d8ff6b0991d5e86b17b235fc46ec62e9195cb9b

Based on the inputs/suggestions from Tudor, i am planning to add a new 
layer between the SPI-NOR and MTD layers to support stacked and parallel 
configurations. This new layer will be part of the spi-nor and located in 
mtd/spi-nor/

This layer would perform the following tasks:
 - During probing, store information from all the connected flashes, 
   whether in stacked or parallel mode, and present it as a single device 
   to the MTD layer.
 - Register callbacks through this new layer instead of spi-nor/core.c and 
   handle MTD device registration.
 - In stacked mode, select the appropriate spi-nor flash based on the 
   address provided by the MTD layer during flash operations.
 - Manage flash crossover operations in stacked mode.
 - Ensure both connected flashes are identical in parallel mode.
 - Handle odd byte count requests from the MTD layer during flash 
   operations in parallel mode.

For implementing this the current DT binding need to be updated as 
follows.

stacked-memories DT changes:
 - Flash size information can be retrieved directly from the flash, so it 
   has been removed from the DT binding.
 - Each stacked flash will have its own flash node. This approach allows 
   flashes of different makes and sizes to be stacked together, as each 
   flash will be probed individually.
 - The stacked-memories DT bindings will contain the phandles of the flash 
   nodes connected in stacked mode.

spi@0 {
  
  flash@0 {
    compatible = "jedec,spi-nor"
    reg = <0x00>;
    stacked-memories = <&flash@0 &flash@1>;
    spi-max-frequency = <50000000>;
    ...
              partition@0 { 
        label = "qspi-0";
        reg = <0x0 0xf00000>;
    };
                        

  }
  
  flash@1 {
    compatible = "jedec,spi-nor"
              reg = <0x01>;
    spi-max-frequency = <50000000>;
    ...
              partition@0 { 
        label = "qspi-1";
        reg = <0x0 0x800000>;
    };
  }
}

parallel-memories DT changes:
 - Flash size information can be retrieved directly from the flash, so it 
   has been removed from the DT binding.
 - Each flash connected in parallel mode will have its own flash node. 
   This allows us to verify that both flashes connected in parallel are 
   identical, as each flash node will be probed separately.
 - The parallel-memories DT bindings will contain the phandles of the 
   flash nodes connected in parallel.

spi@0 {
  
  flash@0 {
    compatible = "jedec,spi-nor"
    reg = <0x00>;
    parallel-memories = <&flash@0 &flash@1>;
    spi-max-frequency = <50000000>;
    ...
              partition@0 { 
        label = "qspi-0";
        reg = <0x0 0xf00000>;
    };
                        

  }
  
  flash@1 {
    compatible = "jedec,spi-nor"
              reg = <0x01>;
    spi-max-frequency = <50000000>;
    ...
              partition@0 { 
        label = "qspi-1";
        reg = <0x0 0x800000>;
    };
  }
}

Regards,
Amit

> >
> >
> >
> > On 2/9/24 11:06, Tudor Ambarus wrote:
> > >
> > >
> > > On 12/21/23 06:54, Mahapatra, Amit Kumar wrote:
> > >>> Something else to consider: I see that Micron has a twin quad mode:
> > >>> https://media-www.micron.com/-
> > >>> /media/client/global/documents/products/data-sheet/nor-flash/seria
> > >>> l-
> > >>> nor/mt25t/generation-
> > >>>
> > b/mt25t_qljs_l_512_xba_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> > >>>
> > >>> The micron's "Separate Chip-Select and Clock Signals" resembles
> > >>> the AMD's dual parallel 8-bit.
> > >> Yes, I agree.
> > >>
> > >>> Micron's "Shared Chip-Select and Clock Signals" differs from the
> > >>> AMD's stacked mode, as Micron uses DQ[3:0] and DQ[7:4], whereas
> > >>> AMD considers both as DQ[3:0].
> > >> Yes, correct.
> > >
> > > Amit, please help me to assess this. I assume Micron and Microchip
> > > is using the same concepts as AMD uses for the "Dual Parallel 8-bit
> > > IO mode", but they call it "Twin Quad Mode".
> > >
> > > I was wrong, the AMD datasheet [1] was misleading [2], it described
> > > the IOs for both flashes as IO[3:0], but later on in the "Table QSPI
> > > Interface Signals" the second flash is described with IO[7:4].
> > >
> > > The AMD's 8-bit Dual Flash Parallel Interface is using dedicated CS#
> > > and CLK# lines for each flash. As Micron does, isn't it?
> > >
> > > Micron says [3] that:
> > > "The device contains two quad I/O die, each able to operate
> > > independently for a total of eight I/Os. The memory map applies to
> > > each die. Each die has internal registers for status, configuration,
> > > and device protection that can be set and read independently from
> > > one
> > other.
> > > Micron recommends that internal configuration settings for the two
> > > die be set identically."
> >
> 
> Hello Tudor,
> 
> > Amit,
> >
> > I forgot to say my first conclusion about the quote from above. Even
> > if the dies are in the same physical package, micron asks users to
> > configure each die as it is a self-standing entity, IOW to configure
> > each die as it is a flash on its own. To me it looks like 2
> > concatenated flashes at first look. Thus identical to how AMD controller
> works.
> > Please clarify this.
> 
> That’s correct, the Micron flash that you referred can communicate with the
> AMD QSPI controller in both parallel and stacked mode.
> >
> > >
> > > it also says that:
> > > "When using quad commands in XIO-SPI or when using QIO-SPI,
> > > DQ[3:0]/DQ[7:4] are I/O."
> >
> > and this would be a parallel concatenation of two flashes.
> 
> That's correct.
> 
> Regards,
> Amit
> >
> > Then it would be good if you let us now the similarities and
> > differences between how amd and mchp controller work, I scrawled few
> ideas below.
> >
> > thanks,
> > ta
> > >
> > > So I guess the upper layers just ask for a chunk of memory to be
> > > written and the controller handles the cs# lines automatically. How
> > > is the AMD controller working, do you have to drive the cs# lines
> > > manually, or you just set the parallel mode and the controller takes
> > > care of
> > everything?
> > >
> > > I assume this is how mchp is handling things, they seem to just set
> > > a bit the protocol into the QSPI_IFR.PROTTYP register field and
> > > that's all [4]. They even seem to write the registers of both
> > > flashes at the same
> > time.
> > >
> > > In what regards the AMD's "dual stack interface", AMD is sharing the
> > > clock and IO lines and uses dedicated CS# lines for the flashes,
> > > whereas Micron shares the CS# and CLK# lines with different IO lines.
> > >
> > > Amit, please study the architectures used by mchp, micron and amd
> > > and let us know if they are the same or they differ, and if they
> > > differ what are the differences.
> > >
> > > I added Conor from mchp in cc, I see Nicolas is already there, and
> > > Bean from micron.
> > >
> > > Thanks,
> > > ta
> > >
> > > [1]
> > > https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Int
> > > er
> > > face-Signals
> > > [2]
> > >
> >
> https://docs.xilinx.com/viewer/attachment/dwmjhDJGICdJqD4swyVzcQ/fD8nv
> > > 4ry78xM0_EF5kv4mA
> > > [3]
> > > https://media-www.micron.com/-
> > /media/client/global/documents/products/
> > > data-sheet/nor-flash/serial-nor/mt25t/generation-b/mt25t_qljs_l_512_
> > > xb
> > > a_0.pdf?rev=de70b770c5dc4da8b8ead06b57c03500
> > > [4]
> > >
> >
> https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/
> > Produ
> > > ctDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf

Michael Walle July 26, 2024, 12:55 p.m. UTC | #22

Hi,

> Based on the inputs/suggestions from Tudor, i am planning to add a new 
> layer between the SPI-NOR and MTD layers to support stacked and parallel 
> configurations. This new layer will be part of the spi-nor and located in 
> mtd/spi-nor/

Will AMD submit to maintain this layer? What happens if the
maintainer will leave AMD? TBH, personally, I don't like to
maintain such a niche feature.
I'd really like to see some use cases and performance reports for
this, like actual boards (and no evaluation boards don't count). Why
wouldn't someone just use an octal flash?

And as already mentioned there is also mtdcat, which seems to
duplicate some features?

-michael

Michal Simek July 31, 2024, 8:58 a.m. UTC | #23

Hi Michael,

On 7/26/24 14:55, Michael Walle wrote:
> Hi,
> 
>> Based on the inputs/suggestions from Tudor, i am planning to add a new
>> layer between the SPI-NOR and MTD layers to support stacked and parallel
>> configurations. This new layer will be part of the spi-nor and located in
>> mtd/spi-nor/
> 
> Will AMD submit to maintain this layer? What happens if the
> maintainer will leave AMD? TBH, personally, I don't like to
> maintain such a niche feature.
> I'd really like to see some use cases and performance reports for
> this, like actual boards (and no evaluation boards don't count). Why
> wouldn't someone just use an octal flash?

AMD/Xilinx is not creating products that's why we don't have data on actual 
boards but I don't really understand why evaluation boards don't count. A lot of 
customers are taking schematics from us and removing parts which they don't need 
and add their custom part.

But one product for parallel configuration which is publicly saying that it is 
using it is for example this SOM.
https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-Zynq-UltraScale-ZU2CG-1I-2-GByte-DDR4-SDRAM-4-x-5-cm

I am not marketing guy to tell if there is any other which publicly saying we 
are using this feature but we can only develop/support/maintain support for 
these configurations on our evaluation boards because that's what we have access 
to and what we know how it is done.

Also performance numbers from us can be only provided against our evaluation boards.

Thanks,
Michal

Michael Walle July 31, 2024, 9:19 a.m. UTC | #24

Hi Michal,

> >> Based on the inputs/suggestions from Tudor, i am planning to add a new
> >> layer between the SPI-NOR and MTD layers to support stacked and parallel
> >> configurations. This new layer will be part of the spi-nor and located in
> >> mtd/spi-nor/
> > 
> > Will AMD submit to maintain this layer? What happens if the
> > maintainer will leave AMD? TBH, personally, I don't like to
> > maintain such a niche feature.
> > I'd really like to see some use cases and performance reports for
> > this, like actual boards (and no evaluation boards don't count). Why
> > wouldn't someone just use an octal flash?
>
> AMD/Xilinx is not creating products that's why we don't have data on actual 
> boards but I don't really understand why evaluation boards don't count.

Because on an eval board the vendor just puts everything possible on
the board.

> A lot of 
> customers are taking schematics from us and removing parts which they don't need 
> and add their custom part.
>
> But one product for parallel configuration which is publicly saying that it is 
> using it is for example this SOM.
> https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-Zynq-UltraScale-ZU2CG-1I-2-GByte-DDR4-SDRAM-4-x-5-cm
>
> I am not marketing guy to tell if there is any other which publicly saying we 
> are using this feature but we can only develop/support/maintain support for 
> these configurations on our evaluation boards because that's what we have access 
> to and what we know how it is done.
>
> Also performance numbers from us can be only provided against our evaluation boards.

Which is good enough.

All I'm saying is that you shouldn't put burden on us (the SPI NOR
maintainers) for what seems to me at least as a niche. Thus I was
asking for performance numbers and users. Convince me that I'm
wrong and that is worth our time.

The first round of patches were really invasive regarding the core
code. So if there is a clean layering approach which can be enabled
as a module and you are maintaining it I'm fine with that (even if
the core code needs some changes then like hooks or so, not sure).

-michael

Michal Simek July 31, 2024, 1:40 p.m. UTC | #25

On 7/31/24 11:19, Michael Walle wrote:
> Hi Michal,
> 
>>>> Based on the inputs/suggestions from Tudor, i am planning to add a new
>>>> layer between the SPI-NOR and MTD layers to support stacked and parallel
>>>> configurations. This new layer will be part of the spi-nor and located in
>>>> mtd/spi-nor/
>>>
>>> Will AMD submit to maintain this layer? What happens if the
>>> maintainer will leave AMD? TBH, personally, I don't like to
>>> maintain such a niche feature.
>>> I'd really like to see some use cases and performance reports for
>>> this, like actual boards (and no evaluation boards don't count). Why
>>> wouldn't someone just use an octal flash?
>>
>> AMD/Xilinx is not creating products that's why we don't have data on actual
>> boards but I don't really understand why evaluation boards don't count.
> 
> Because on an eval board the vendor just puts everything possible on
> the board.
> 
>> A lot of
>> customers are taking schematics from us and removing parts which they don't need
>> and add their custom part.
>>
>> But one product for parallel configuration which is publicly saying that it is
>> using it is for example this SOM.
>> https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-Zynq-UltraScale-ZU2CG-1I-2-GByte-DDR4-SDRAM-4-x-5-cm
>>
>> I am not marketing guy to tell if there is any other which publicly saying we
>> are using this feature but we can only develop/support/maintain support for
>> these configurations on our evaluation boards because that's what we have access
>> to and what we know how it is done.
>>
>> Also performance numbers from us can be only provided against our evaluation boards.
> 
> Which is good enough.
> 
> All I'm saying is that you shouldn't put burden on us (the SPI NOR
> maintainers) for what seems to me at least as a niche. Thus I was
> asking for performance numbers and users. Convince me that I'm
> wrong and that is worth our time.

No. It is not really just feature of our evaluation boards. Customers are using 
it. I was talking to one guy from field and he confirms me that these 
configurations are used by his multiple customers in real products.

> 
> The first round of patches were really invasive regarding the core
> code. So if there is a clean layering approach which can be enabled
> as a module and you are maintaining it I'm fine with that (even if
> the core code needs some changes then like hooks or so, not sure).

That discussion started with Miquel some years ago when he was trying to to 
solve description in DT which is merged for a while in the kernel.
And Amit is trying to figure it out which way to go.
I don't want to predict where that code should be going or how it should look 
like because don't have spi-nor experience. But I hope we finally move forward 
on this topic to see the path how to upstream support for it.

Thanks,
Michal

Michael Walle July 31, 2024, 2:11 p.m. UTC | #26

> > All I'm saying is that you shouldn't put burden on us (the SPI NOR
> > maintainers) for what seems to me at least as a niche. Thus I was
> > asking for performance numbers and users. Convince me that I'm
> > wrong and that is worth our time.
>
> No. It is not really just feature of our evaluation boards. Customers are using 
> it. I was talking to one guy from field and he confirms me that these 
> configurations are used by his multiple customers in real products.

Which begs the question, do we really have to support every feature
in the core (I'd like to hear Tudors and Pratyush opinion here).
Honestly, this just looks like a concatenation of two QSPI
controllers. Why didn't you just use a normal octal controller which
is a protocol also backed by the JEDEC standard. Is it any faster?
Do you get more capacity? Does anyone really use large SPI-NOR
flashes? If so, why? I mean you've put that controller on your SoC,
you must have some convincing arguments why a customer should use
it.

> > The first round of patches were really invasive regarding the core
> > code. So if there is a clean layering approach which can be enabled
> > as a module and you are maintaining it I'm fine with that (even if
> > the core code needs some changes then like hooks or so, not sure).
>
> That discussion started with Miquel some years ago when he was trying to to 
> solve description in DT which is merged for a while in the kernel.

What's your point here? From what I can tell the DT binding is wrong
and needs to be reworked anyway.

> And Amit is trying to figure it out which way to go.
> I don't want to predict where that code should be going or how it should look 
> like because don't have spi-nor experience. But I hope we finally move forward 
> on this topic to see the path how to upstream support for it.

You still didn't answer my initial question. Will AMD support and
maintain that code? I was pushing you towards putting that code into
your own driver because then that's up to you what you are doing
there.

So how do we move forward? I'd like to see as little as core changes
possible to get your dual flash setup working. I'm fine with having a
layer in spi-nor/ (not sure that's how it will work with mtdcat
which looks like it's similar as your stacked flash) as long as it
can be a module (selected by the driver).

-michael

Michal Simek Aug. 1, 2024, 6:22 a.m. UTC | #27

On 7/31/24 16:11, Michael Walle wrote:
>>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
>>> maintainers) for what seems to me at least as a niche. Thus I was
>>> asking for performance numbers and users. Convince me that I'm
>>> wrong and that is worth our time.
>>
>> No. It is not really just feature of our evaluation boards. Customers are using
>> it. I was talking to one guy from field and he confirms me that these
>> configurations are used by his multiple customers in real products.
> 
> Which begs the question, do we really have to support every feature
> in the core (I'd like to hear Tudors and Pratyush opinion here).
> Honestly, this just looks like a concatenation of two QSPI
> controllers. 

Based on my understanding for stacked yes. For parallel no.

> Why didn't you just use a normal octal controller which
> is a protocol also backed by the JEDEC standard. 

On newer SOC octal IP core is used.
Amit please comment.

> Is it any faster?

Amit: please provide numbers.

> Do you get more capacity? Does anyone really use large SPI-NOR
> flashes? If so, why?

You get twice more capacity based on that configuration. I can't answer the 
second question because not working with field. But both of that configurations 
are used by customers. Adding Neal if he wants to add something more to it.

> I mean you've put that controller on your SoC,
> you must have some convincing arguments why a customer should use
> it.

I expect recommendation is to use single configuration but if you need bigger 
space for your application the only way to extend it is to use stacked 
configuration with two the same flashes next to each other.
If you want to have bigger size and also be faster answer is parallel 
configuration.

>>> The first round of patches were really invasive regarding the core
>>> code. So if there is a clean layering approach which can be enabled
>>> as a module and you are maintaining it I'm fine with that (even if
>>> the core code needs some changes then like hooks or so, not sure).
>>
>> That discussion started with Miquel some years ago when he was trying to to
>> solve description in DT which is merged for a while in the kernel.
> 
> What's your point here? From what I can tell the DT binding is wrong
> and needs to be reworked anyway.

I am just saying that this is not any adhoc new feature but configuration which 
has been already discussed and some steps made. If DT binding is wrong it can be 
deprecated and use new one but for that it has be clear which way to go.

>> And Amit is trying to figure it out which way to go.
>> I don't want to predict where that code should be going or how it should look
>> like because don't have spi-nor experience. But I hope we finally move forward
>> on this topic to see the path how to upstream support for it.
> 
> You still didn't answer my initial question. Will AMD support and
> maintain that code? I was pushing you towards putting that code into
> your own driver because then that's up to you what you are doing
> there.

Of course. We care about our code and about supporting our SOC and features 
which are related to it. It means yes, we will be regularly testing it and 
taking care about it.

> So how do we move forward? I'd like to see as little as core changes
> possible to get your dual flash setup working. I'm fine with having a
> layer in spi-nor/ (not sure that's how it will work with mtdcat
> which looks like it's similar as your stacked flash) as long as it
> can be a module (selected by the driver).

ok.

Thanks,
Michal

Frager, Neal Aug. 1, 2024, 6:37 a.m. UTC | #28

Hi Michael,

>>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
>>> maintainers) for what seems to me at least as a niche. Thus I was
>>> asking for performance numbers and users. Convince me that I'm
>>> wrong and that is worth our time.
>>
>> No. It is not really just feature of our evaluation boards. Customers are using
>> it. I was talking to one guy from field and he confirms me that these
>> configurations are used by his multiple customers in real products.
> 
> Which begs the question, do we really have to support every feature
> in the core (I'd like to hear Tudors and Pratyush opinion here).
> Honestly, this just looks like a concatenation of two QSPI
> controllers. 

> Based on my understanding for stacked yes. For parallel no.

> Why didn't you just use a normal octal controller which
> is a protocol also backed by the JEDEC standard. 

> On newer SOC octal IP core is used.
> Amit please comment.

> Is it any faster?

> Amit: please provide numbers.

> Do you get more capacity? Does anyone really use large SPI-NOR
> flashes? If so, why?

> You get twice more capacity based on that configuration. I can't answer the 
> second question because not working with field. But both of that configurations 
> are used by customers. Adding Neal if he wants to add something more to it.

Just to add a comment as I work directly with our customers.  The main reason
this support is important is for our older SoCs, zynq and zynqmp.

Most of our customers are using QSPI flash as the first boot memory to get
from the boot ROM to u-boot.  They then typically use other memories, such as
eMMC for the Linux kernel, OS and file system.

The issue we have on the zynq and zynqmp SoCs is that the boot ROM (code that
cannot be changed) will not boot from an OSPI flash.  It will only boot from a
QSPI flash.  This is what is forcing many of our customers down the QSPI path.
Since many of these customers are interested in additional speed and memory
size, they then end up using a parallel or stacked configuration because they
cannot use an OSPI with zynq or zynqmp.

All of our newer SoCs can boot from OSPI.  I agree with you that if someone
could choose OSPI for performance, they would, so I do not expect parallel or
stacked configurations with our newer SoCs.

I get why you see this configuration as a niche, but for us, it is a very
large niche because zynq and zynqmp are two of our most successful SoC
families.

> I mean you've put that controller on your SoC,
> you must have some convincing arguments why a customer should use
> it.

> I expect recommendation is to use single configuration but if you need bigger 
> space for your application the only way to extend it is to use stacked 
> configuration with two the same flashes next to each other.
> If you want to have bigger size and also be faster answer is parallel 
> configuration.


>>> The first round of patches were really invasive regarding the core
>>> code. So if there is a clean layering approach which can be enabled
>>> as a module and you are maintaining it I'm fine with that (even if
>>> the core code needs some changes then like hooks or so, not sure).
>>
>> That discussion started with Miquel some years ago when he was trying to to
>> solve description in DT which is merged for a while in the kernel.
> 
> What's your point here? From what I can tell the DT binding is wrong
> and needs to be reworked anyway.

> I am just saying that this is not any adhoc new feature but configuration which 
> has been already discussed and some steps made. If DT binding is wrong it can be 
> deprecated and use new one but for that it has be clear which way to go.


>> And Amit is trying to figure it out which way to go.
>> I don't want to predict where that code should be going or how it should look
>> like because don't have spi-nor experience. But I hope we finally move forward
>> on this topic to see the path how to upstream support for it.
> 
> You still didn't answer my initial question. Will AMD support and
> maintain that code? I was pushing you towards putting that code into
> your own driver because then that's up to you what you are doing
> there.

> Of course. We care about our code and about supporting our SOC and features 
> which are related to it. It means yes, we will be regularly testing it and 
> taking care about it.


> So how do we move forward? I'd like to see as little as core changes
> possible to get your dual flash setup working. I'm fine with having a
> layer in spi-nor/ (not sure that's how it will work with mtdcat
> which looks like it's similar as your stacked flash) as long as it
> can be a module (selected by the driver).

> ok.

Best regards,
Neal Frager
AMD

Mahapatra, Amit Kumar Aug. 1, 2024, 9:28 a.m. UTC | #29

Hello,


> -----Original Message-----
> From: Frager, Neal <neal.frager@amd.com>
> Sent: Thursday, August 1, 2024 12:07 PM
> To: Simek, Michal <michal.simek@amd.com>; Michael Walle
> <michael@walle.cc>; Mahapatra, Amit Kumar <amit.kumar-
> mahapatra@amd.com>; Tudor Ambarus <tudor.ambarus@linaro.org>;
> broonie@kernel.org; pratyush@kernel.org; miquel.raynal@bootlin.com;
> richard@nod.at; vigneshr@ti.com; sbinding@opensource.cirrus.com;
> lee@kernel.org; james.schulman@cirrus.com; david.rhodes@cirrus.com;
> rf@opensource.cirrus.com; perex@perex.cz; tiwai@suse.com
> Cc: linux-spi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> mtd@lists.infradead.org; nicolas.ferre@microchip.com;
> alexandre.belloni@bootlin.com; claudiu.beznea@tuxon.dev; linux-arm-
> kernel@lists.infradead.org; alsa-devel@alsa-project.org;
> patches@opensource.cirrus.com; linux-sound@vger.kernel.org; git (AMD-
> Xilinx) <git@amd.com>; amitrkcian2002@gmail.com; Conor Dooley
> <conor.dooley@microchip.com>; beanhuo@micron.com
> Subject: RE: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support
> in spi-nor
> 
> Hi Michael,
> 
> >>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
> >>> maintainers) for what seems to me at least as a niche. Thus I was
> >>> asking for performance numbers and users. Convince me that I'm wrong
> >>> and that is worth our time.
> >>
> >> No. It is not really just feature of our evaluation boards. Customers
> >> are using it. I was talking to one guy from field and he confirms me
> >> that these configurations are used by his multiple customers in real
> products.
> >
> > Which begs the question, do we really have to support every feature in
> > the core (I'd like to hear Tudors and Pratyush opinion here).
> > Honestly, this just looks like a concatenation of two QSPI
> > controllers.
> 
> > Based on my understanding for stacked yes. For parallel no.
> 
> > Why didn't you just use a normal octal controller which is a protocol
> > also backed by the JEDEC standard.
> 
> > On newer SOC octal IP core is used.
> > Amit please comment.
> 
> > Is it any faster?
> 
> > Amit: please provide numbers.
> 
Here are some QSPI performance numbers comparing a single flash mode and 
two flashes connected in parallel mode. I ran the test on a VCK190 Eval 
Board https://www.xilinx.com/products/boards-and-kits/vck190.html, 
measuring the timing for mtd_debug erase, write, and read operations. 
The QSPI bus clock was set to 150MHz, and the data size was 32MB, 
comparing a single flash setup with a two-flash parallel mode setup.

Single Flash mode:

xilinx-vck190-20242:/home/petalinux# time mtd_debug erase /dev/mtd2 0x00 0x1e00000
Erased 31457280 bytes from address 0x00000000 in flash

real	0m54.713s
user	0m0.000s
sys	0m32.639s
xilinx-vck190-20242:/home/petalinux# time mtd_debug write /dev/mtd2 0x00 0x1e00000 test.bin
Copied 31457280 bytes from test.bin to address 0x00000000 in flash

real	0m30.187s
user	0m0.000s
sys	0m16.359s
xilinx-vck190-20242:/home/petalinux# time mtd_debug read /dev/mtd2 0x00 0x1e00000 test-read.bin
Copied 31457280 bytes from address 0x00000000 in flash to test-read.bin

real	0m0.472s
user	0m0.001s
sys	0m0.040s

Two flashes connected in parallel mode:

xilinx-vck190-20242:/home/petalinux# time mtd_debug erase /dev/mtd2 0x00 0x1e00000
Erased 31457280 bytes from address 0x00000000 in flash

real	0m27.727s
user	0m0.004s
sys	0m14.923s
xilinx-vck190-20242:/home/petalinux# time mtd_debug write /dev/mtd2 0x00 0x1e00000 test.bin
Copied 31457280 bytes from test.bin to address 0x00000000 in flash

real	0m16.538s
user	0m0.000s
sys	0m8.512s
xilinx-vck190-20242:/home/petalinux# time mtd_debug read /dev/mtd2 0x00 0x1e00000 test-read.bin
Copied 31457280 bytes from address 0x00000000 in flash to test-read.bin

real	0m0.263s
user	0m0.000s
sys	0m0.044s

Regards,
Amit

> > Do you get more capacity? Does anyone really use large SPI-NOR
> > flashes? If so, why?
> 
> > You get twice more capacity based on that configuration. I can't
> > answer the second question because not working with field. But both of
> > that configurations are used by customers. Adding Neal if he wants to add
> something more to it.
> 
> Just to add a comment as I work directly with our customers.  The main
> reason this support is important is for our older SoCs, zynq and zynqmp.
> 
> Most of our customers are using QSPI flash as the first boot memory to get
> from the boot ROM to u-boot.  They then typically use other memories, such
> as eMMC for the Linux kernel, OS and file system.
> 
> The issue we have on the zynq and zynqmp SoCs is that the boot ROM (code
> that cannot be changed) will not boot from an OSPI flash.  It will only boot
> from a QSPI flash.  This is what is forcing many of our customers down the
> QSPI path.
> Since many of these customers are interested in additional speed and
> memory size, they then end up using a parallel or stacked configuration
> because they cannot use an OSPI with zynq or zynqmp.
> 
> All of our newer SoCs can boot from OSPI.  I agree with you that if someone
> could choose OSPI for performance, they would, so I do not expect parallel or
> stacked configurations with our newer SoCs.
> 
> I get why you see this configuration as a niche, but for us, it is a very large
> niche because zynq and zynqmp are two of our most successful SoC families.
> 
> > I mean you've put that controller on your SoC, you must have some
> > convincing arguments why a customer should use it.
> 
> > I expect recommendation is to use single configuration but if you need
> > bigger space for your application the only way to extend it is to use
> > stacked configuration with two the same flashes next to each other.
> > If you want to have bigger size and also be faster answer is parallel
> > configuration.
> 
> 
> >>> The first round of patches were really invasive regarding the core
> >>> code. So if there is a clean layering approach which can be enabled
> >>> as a module and you are maintaining it I'm fine with that (even if
> >>> the core code needs some changes then like hooks or so, not sure).
> >>
> >> That discussion started with Miquel some years ago when he was trying
> >> to to solve description in DT which is merged for a while in the kernel.
> >
> > What's your point here? From what I can tell the DT binding is wrong
> > and needs to be reworked anyway.
> 
> > I am just saying that this is not any adhoc new feature but
> > configuration which has been already discussed and some steps made. If
> > DT binding is wrong it can be deprecated and use new one but for that it
> has be clear which way to go.
> 
> 
> >> And Amit is trying to figure it out which way to go.
> >> I don't want to predict where that code should be going or how it
> >> should look like because don't have spi-nor experience. But I hope we
> >> finally move forward on this topic to see the path how to upstream
> support for it.
> >
> > You still didn't answer my initial question. Will AMD support and
> > maintain that code? I was pushing you towards putting that code into
> > your own driver because then that's up to you what you are doing
> > there.
> 
> > Of course. We care about our code and about supporting our SOC and
> > features which are related to it. It means yes, we will be regularly
> > testing it and taking care about it.
> 
> 
> > So how do we move forward? I'd like to see as little as core changes
> > possible to get your dual flash setup working. I'm fine with having a
> > layer in spi-nor/ (not sure that's how it will work with mtdcat which
> > looks like it's similar as your stacked flash) as long as it can be a
> > module (selected by the driver).
> 
> > ok.
> 
> Best regards,
> Neal Frager
> AMD

Michael Walle Aug. 5, 2024, 8:14 a.m. UTC | #30

Hi,

> > You get twice more capacity based on that configuration. I can't answer the 
> > second question because not working with field. But both of that configurations 
> > are used by customers. Adding Neal if he wants to add something more to it.
>
> Just to add a comment as I work directly with our customers.  The main reason
> this support is important is for our older SoCs, zynq and zynqmp.
>
> Most of our customers are using QSPI flash as the first boot memory to get
> from the boot ROM to u-boot.  They then typically use other memories, such as
> eMMC for the Linux kernel, OS and file system.

Agreed and that's probably the most prominent use case for NOR
flashes anyway.

> The issue we have on the zynq and zynqmp SoCs is that the boot ROM (code that
> cannot be changed) will not boot from an OSPI flash.  It will only boot from a
> QSPI flash.  This is what is forcing many of our customers down the QSPI path.
> Since many of these customers are interested in additional speed and memory
> size, they then end up using a parallel or stacked configuration because they
> cannot use an OSPI with zynq or zynqmp.

Above you've said, the bootloader is stored on the NOR flash and the
bulk storage is eMMC. So why do you need bigger NOR flashes (where
even the biggest NOR flash isn't enough)?

I also don't understand "the boot ROM will just boot from QSPI".
First, you cannot connect an octal flash anyway, because you only
have an QSPI controller, right? Secondly, normally the bootrom will
(at least) boot the first stage using normal (single line) SPI
commands. Is that not the case for zynq and zynqmp?

> All of our newer SoCs can boot from OSPI.  I agree with you that if someone
> could choose OSPI for performance, they would, so I do not expect parallel or
> stacked configurations with our newer SoCs.

Ok, but then the argument with bigger flashes are void, because you
are back to be bound to one OSPI flash.

> I get why you see this configuration as a niche, but for us, it is a very
> large niche because zynq and zynqmp are two of our most successful SoC
> families.

Fair enough. But please find a way to support it without butchering
the whole core.

-michael

Michael Walle Aug. 5, 2024, 8:27 a.m. UTC | #31

Hi,

> >>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
> >>> maintainers) for what seems to me at least as a niche. Thus I was
> >>> asking for performance numbers and users. Convince me that I'm
> >>> wrong and that is worth our time.
> >>
> >> No. It is not really just feature of our evaluation boards. Customers are using
> >> it. I was talking to one guy from field and he confirms me that these
> >> configurations are used by his multiple customers in real products.
> > 
> > Which begs the question, do we really have to support every feature
> > in the core (I'd like to hear Tudors and Pratyush opinion here).
> > Honestly, this just looks like a concatenation of two QSPI
> > controllers. 
>
> Based on my understanding for stacked yes. For parallel no.

See below.

> > Why didn't you just use a normal octal controller which
> > is a protocol also backed by the JEDEC standard. 
>
> On newer SOC octal IP core is used.
> Amit please comment.
>
> > Is it any faster?
>
> Amit: please provide numbers.
>
> > Do you get more capacity? Does anyone really use large SPI-NOR
> > flashes? If so, why?
>
> You get twice more capacity based on that configuration. I can't answer the 
> second question because not working with field. But both of that configurations 
> are used by customers. Adding Neal if he wants to add something more to it.
>
> > I mean you've put that controller on your SoC,
> > you must have some convincing arguments why a customer should use
> > it.
>
> I expect recommendation is to use single configuration but if you need bigger 
> space for your application the only way to extend it is to use stacked 
> configuration with two the same flashes next to each other.
> If you want to have bigger size and also be faster answer is parallel 
> configuration.

But who is using expensive NOR flash for bulk storage anyway? You're
only mentioning parallel mode. Also the performance numbers were
just about the parallel mode. What about stacked mode? Because
there's a chance that parallel mode works without modification of
the core (?).

> >>> The first round of patches were really invasive regarding the core
> >>> code. So if there is a clean layering approach which can be enabled
> >>> as a module and you are maintaining it I'm fine with that (even if
> >>> the core code needs some changes then like hooks or so, not sure).
> >>
> >> That discussion started with Miquel some years ago when he was trying to to
> >> solve description in DT which is merged for a while in the kernel.
> > 
> > What's your point here? From what I can tell the DT binding is wrong
> > and needs to be reworked anyway.
>
> I am just saying that this is not any adhoc new feature but configuration which 
> has been already discussed and some steps made. If DT binding is wrong it can be 
> deprecated and use new one but for that it has be clear which way to go.

Well, AMD could have side stepped all this if they had just
integrated a normal OSPI flash controller, which would have the same
requirements regarding the pins (if not even less) and it would have
been *easy* to integrate it into the already available ecosystem.
That was what my initial question was about. Why did you choose two
QSPI ports instead of one OSPI port.

> >> And Amit is trying to figure it out which way to go.
> >> I don't want to predict where that code should be going or how it should look
> >> like because don't have spi-nor experience. But I hope we finally move forward
> >> on this topic to see the path how to upstream support for it.
> > 
> > You still didn't answer my initial question. Will AMD support and
> > maintain that code? I was pushing you towards putting that code into
> > your own driver because then that's up to you what you are doing
> > there.
>
> Of course. We care about our code and about supporting our SOC and features 
> which are related to it. It means yes, we will be regularly testing it and 
> taking care about it.

Great!

-michael

Michal Simek Aug. 5, 2024, 11 a.m. UTC | #32

Hi,

On 8/5/24 10:27, Michael Walle wrote:
> Hi,
> 
>>>>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
>>>>> maintainers) for what seems to me at least as a niche. Thus I was
>>>>> asking for performance numbers and users. Convince me that I'm
>>>>> wrong and that is worth our time.
>>>>
>>>> No. It is not really just feature of our evaluation boards. Customers are using
>>>> it. I was talking to one guy from field and he confirms me that these
>>>> configurations are used by his multiple customers in real products.
>>>
>>> Which begs the question, do we really have to support every feature
>>> in the core (I'd like to hear Tudors and Pratyush opinion here).
>>> Honestly, this just looks like a concatenation of two QSPI
>>> controllers.
>>
>> Based on my understanding for stacked yes. For parallel no.
> 
> See below.
> 
>>> Why didn't you just use a normal octal controller which
>>> is a protocol also backed by the JEDEC standard.
>>
>> On newer SOC octal IP core is used.
>> Amit please comment.
>>
>>> Is it any faster?
>>
>> Amit: please provide numbers.
>>
>>> Do you get more capacity? Does anyone really use large SPI-NOR
>>> flashes? If so, why?
>>
>> You get twice more capacity based on that configuration. I can't answer the
>> second question because not working with field. But both of that configurations
>> are used by customers. Adding Neal if he wants to add something more to it.
>>
>>> I mean you've put that controller on your SoC,
>>> you must have some convincing arguments why a customer should use
>>> it.
>>
>> I expect recommendation is to use single configuration but if you need bigger
>> space for your application the only way to extend it is to use stacked
>> configuration with two the same flashes next to each other.
>> If you want to have bigger size and also be faster answer is parallel
>> configuration.
> 
> But who is using expensive NOR flash for bulk storage anyway? 

I expect you understand that even if I know companies which does it I am not 
allow to share their names.

But customers don't need to have other free pins to connect for example emmc.
That's why adding one more "expensive flash" can be for them only one option.

Also I bet that price for one more qspi flash is nothing compare to chip itself 
and other related expenses for low volume production.

> You're
> only mentioning parallel mode. Also the performance numbers were
> just about the parallel mode. What about stacked mode? Because
> there's a chance that parallel mode works without modification of
> the core (?).

I will let Amit to comment it.


> 
>>>>> The first round of patches were really invasive regarding the core
>>>>> code. So if there is a clean layering approach which can be enabled
>>>>> as a module and you are maintaining it I'm fine with that (even if
>>>>> the core code needs some changes then like hooks or so, not sure).
>>>>
>>>> That discussion started with Miquel some years ago when he was trying to to
>>>> solve description in DT which is merged for a while in the kernel.
>>>
>>> What's your point here? From what I can tell the DT binding is wrong
>>> and needs to be reworked anyway.
>>
>> I am just saying that this is not any adhoc new feature but configuration which
>> has been already discussed and some steps made. If DT binding is wrong it can be
>> deprecated and use new one but for that it has be clear which way to go.
> 
> Well, AMD could have side stepped all this if they had just
> integrated a normal OSPI flash controller, which would have the same
> requirements regarding the pins (if not even less) and it would have
> been *easy* to integrate it into the already available ecosystem.
> That was what my initial question was about. Why did you choose two
> QSPI ports instead of one OSPI port.

Keep in your mind that ZynqMP is 9years old SoC. Zynq 12+ years with a lot of 
internal development happening before. Not sure if ospi even exists at that 
time. Also if any IP was available for the price which they were targeting.
I don't think make sense to discuss OSPI in this context because that's not in 
these SoCs.
I have never worked with spi that's why don't know historical context to provide 
more details.

Thanks,
Michal

Mahapatra, Amit Kumar Aug. 7, 2024, 1:21 p.m. UTC | #33

Hello Michael,

> On 8/5/24 10:27, Michael Walle wrote:
> > Hi,
> >
> >>>>> All I'm saying is that you shouldn't put burden on us (the SPI NOR
> >>>>> maintainers) for what seems to me at least as a niche. Thus I was
> >>>>> asking for performance numbers and users. Convince me that I'm
> >>>>> wrong and that is worth our time.
> >>>>
> >>>> No. It is not really just feature of our evaluation boards.
> >>>> Customers are using it. I was talking to one guy from field and he
> >>>> confirms me that these configurations are used by his multiple
> customers in real products.
> >>>
> >>> Which begs the question, do we really have to support every feature
> >>> in the core (I'd like to hear Tudors and Pratyush opinion here).
> >>> Honestly, this just looks like a concatenation of two QSPI
> >>> controllers.
> >>
> >> Based on my understanding for stacked yes. For parallel no.
> >
> > See below.
> >
> >>> Why didn't you just use a normal octal controller which is a
> >>> protocol also backed by the JEDEC standard.
> >>
> >> On newer SOC octal IP core is used.
> >> Amit please comment.
> >>
> >>> Is it any faster?
> >>
> >> Amit: please provide numbers.
> >>
> >>> Do you get more capacity? Does anyone really use large SPI-NOR
> >>> flashes? If so, why?
> >>
> >> You get twice more capacity based on that configuration. I can't
> >> answer the second question because not working with field. But both
> >> of that configurations are used by customers. Adding Neal if he wants to
> add something more to it.
> >>
> >>> I mean you've put that controller on your SoC, you must have some
> >>> convincing arguments why a customer should use it.
> >>
> >> I expect recommendation is to use single configuration but if you
> >> need bigger space for your application the only way to extend it is
> >> to use stacked configuration with two the same flashes next to each other.
> >> If you want to have bigger size and also be faster answer is parallel
> >> configuration.
> >
> > But who is using expensive NOR flash for bulk storage anyway?
> 
> I expect you understand that even if I know companies which does it I am not
> allow to share their names.
> 
> But customers don't need to have other free pins to connect for example
> emmc.
> That's why adding one more "expensive flash" can be for them only one
> option.
> 
> Also I bet that price for one more qspi flash is nothing compare to chip itself
> and other related expenses for low volume production.
> 
> > You're
> > only mentioning parallel mode. Also the performance numbers were just
> > about the parallel mode. What about stacked mode? Because there's a
> > chance that parallel mode works without modification of the core (?).
> 
> I will let Amit to comment it.

The performance of the stacked configuration will be the same as that of 
the single mode. As Michal mentioned earlier, stacked mode is used for 
scenarios where the customer requires larger flash space while maintaining 
the same performance.

I want to provide some background on why I choose to handle stacked and 
parallel modes through an additional layer or file, such as 
/mtd/spi-nor/stacked.c, rather than mtd-concat. Initially, when Miquel 
began upstreaming stacked support by extending the mtd-concat driver, 
the DT binding was not accepted. He proposed a couple of DT bindings 
[1] & [2] to support stacking through mtd-concat, but none were accepted. 
Additionally, after reviewing the MTD core code, he found that adding 
stacked support through mtd-concat could be complicated and involve many 
corner cases, which he mentioned in his RFC [3]. He then suggested 
concatenating the flashes instead of the mtd partitions, and eventually, 
the current DT bindings were added. This is why I propose handling the 
stacked and parallel configurations through an additional layer or file, 
as the mtd-concat approach was already discussed and rejected.

[1] https://lore.kernel.org/all/20191113171505.26128-4-miquel.raynal@bootlin.com/
[2] https://lore.kernel.org/all/20191127105522.31445-5-miquel.raynal@bootlin.com/
[3]https://lore.kernel.org/all/20211112152411.818321-1-miquel.raynal@bootlin.com/

Regards,
Amit

> 
> 
> >
> >>>>> The first round of patches were really invasive regarding the core
> >>>>> code. So if there is a clean layering approach which can be
> >>>>> enabled as a module and you are maintaining it I'm fine with that
> >>>>> (even if the core code needs some changes then like hooks or so, not
> sure).
> >>>>
> >>>> That discussion started with Miquel some years ago when he was
> >>>> trying to to solve description in DT which is merged for a while in the
> kernel.
> >>>
> >>> What's your point here? From what I can tell the DT binding is wrong
> >>> and needs to be reworked anyway.
> >>
> >> I am just saying that this is not any adhoc new feature but
> >> configuration which has been already discussed and some steps made.
> >> If DT binding is wrong it can be deprecated and use new one but for that it
> has be clear which way to go.
> >
> > Well, AMD could have side stepped all this if they had just integrated
> > a normal OSPI flash controller, which would have the same requirements
> > regarding the pins (if not even less) and it would have been *easy* to
> > integrate it into the already available ecosystem.
> > That was what my initial question was about. Why did you choose two
> > QSPI ports instead of one OSPI port.
> 
> Keep in your mind that ZynqMP is 9years old SoC. Zynq 12+ years with a lot of
> internal development happening before. Not sure if ospi even exists at that
> time. Also if any IP was available for the price which they were targeting.
> I don't think make sense to discuss OSPI in this context because that's not in
> these SoCs.
> I have never worked with spi that's why don't know historical context to
> provide more details.
> 
> Thanks,
> Michal

Miquel Raynal Aug. 12, 2024, 7:29 a.m. UTC | #34

Hi Michael,

> > > The first round of patches were really invasive regarding the core
> > > code. So if there is a clean layering approach which can be enabled
> > > as a module and you are maintaining it I'm fine with that (even if
> > > the core code needs some changes then like hooks or so, not sure).  
> >
> > That discussion started with Miquel some years ago when he was trying to to 
> > solve description in DT which is merged for a while in the kernel.  
> 
> What's your point here? From what I can tell the DT binding is wrong
> and needs to be reworked anyway.

I'm sorry I'm now catching up, can you point at the thread explaining
what is wrong in the bindings? I didn't find where this was detailed. Or
otherwise summarize quickly what needs to change?

Thanks!
Miquèl

Michael Walle Aug. 12, 2024, 7:37 a.m. UTC | #35

Hi,

> > > > The first round of patches were really invasive regarding the core
> > > > code. So if there is a clean layering approach which can be enabled
> > > > as a module and you are maintaining it I'm fine with that (even if
> > > > the core code needs some changes then like hooks or so, not sure).  
> > >
> > > That discussion started with Miquel some years ago when he was trying to to 
> > > solve description in DT which is merged for a while in the kernel.  
> > 
> > What's your point here? From what I can tell the DT binding is wrong
> > and needs to be reworked anyway.
>
> I'm sorry I'm now catching up, can you point at the thread explaining
> what is wrong in the bindings? I didn't find where this was detailed. Or
> otherwise summarize quickly what needs to change?

Somewhere in this mega thread Tudor had some remarks about the
bindings. Amit also mentioned it here:

https://lore.kernel.org/r/IA0PR12MB769944254171C39FF4171B52DCB42@IA0PR12MB7699.namprd12.prod.outlook.com/

-michael

Miquel Raynal Aug. 12, 2024, 8:38 a.m. UTC | #36

Hi,

> Hello Everyone,
> 
> I would like to propose another approach for handling stacked and 
> parallel connection modes and would appreciate your thoughts on it. 
> But before that, here is some background on what we are trying to achieve.
> 
> The AMD QSPI controller supports two advanced connection modes(Stacked and 
> Dual Parallel) which allow the controller to treat two different flashes 
> as one storage.
> 
> Stacked:
> Flashes share the same SPI bus, but different CS line, controller asserts 
> the CS of the flash to which it needs to communicate.
> 
> Dual Parallel:
> Both the flashes have their separate SPI bus CS of both the flashes will 
> be asserted/de-asserted at the same time. In this mode data will be split 
> across both the flashes by enabling the STRIPE setting in the controller. 
> If STRIPE is not enabled, then same data will be sent to both the devices.
> 
> For more information on the modes please feel free to go through the 
> controller flash interface below
> https://docs.amd.com/r/en-US/am011-versal-acap-trm/QSPI-Flash-Device-Interface
> 
> Mirochip QSPI controller also supports "Dual Parallel 8-bit IO mode", but 
> they call it "Twin Quad Mode".
> https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765.pdf
> 
> DT binding changes were added through the following commits:
> https://github.com/torvalds/linux/commit/f89504300e94524d5d5846ff8b728592ac72cec4
> https://github.com/torvalds/linux/commit/eba5368503b4291db7819512600fa014ea17c5a8
> https://github.com/torvalds/linux/commit/e2edd1b64f1c79e8abda365149ed62a2a9a494b4
> 
> SPI core changes were adds through the following commit:
> https://github.com/torvalds/linux/commit/4d8ff6b0991d5e86b17b235fc46ec62e9195cb9b
> 
> Based on the inputs/suggestions from Tudor, i am planning to add a new 
> layer between the SPI-NOR and MTD layers to support stacked and parallel 
> configurations. This new layer will be part of the spi-nor and located in 
> mtd/spi-nor/

For now I don't think you need a totally generic implementation. As
long as there is only one controller supporting these modes, I'd say
this is not super relevant.

> This layer would perform the following tasks:
>  - During probing, store information from all the connected flashes, 
>    whether in stacked or parallel mode, and present it as a single device 
>    to the MTD layer.
>  - Register callbacks through this new layer instead of spi-nor/core.c and 
>    handle MTD device registration.
>  - In stacked mode, select the appropriate spi-nor flash based on the 
>    address provided by the MTD layer during flash operations.
>  - Manage flash crossover operations in stacked mode.
>  - Ensure both connected flashes are identical in parallel mode.
>  - Handle odd byte count requests from the MTD layer during flash 
>    operations in parallel mode.
> 
> For implementing this the current DT binding need to be updated as 
> follows.

So you want to go back to step 1 and redefine bindings? Is that worth?

> stacked-memories DT changes:
>  - Flash size information can be retrieved directly from the flash, so it 
>    has been removed from the DT binding.
>  - Each stacked flash will have its own flash node. This approach allows 
>    flashes of different makes and sizes to be stacked together, as each 
>    flash will be probed individually.

And how will you define partitions crossing device boundaries? I
believe this constraint has been totally forgotten in this proposal.
The whole idea of stacking two devices this way was to simplify the
user's life with a single device exposed and the controller handling
itself the CS changes. That is precisely what the current binding do.
The final goal being to double your storage transparently. If your goal
was to put two chips aside, then none of this was actually needed. If
you don't care about that anymore, then all the energy put into
discussing the bindings initially was useless and a controller property
could also have made the trick.

>  - The stacked-memories DT bindings will contain the phandles of the flash 
>    nodes connected in stacked mode.
> 
> spi@0 {
>   
>   flash@0 {
>     compatible = "jedec,spi-nor"
>     reg = <0x00>;
>     stacked-memories = <&flash@0 &flash@1>;
>     spi-max-frequency = <50000000>;
>     ...
>               partition@0 { 
>         label = "qspi-0";
>         reg = <0x0 0xf00000>;
>     };
>                         
> 
>   }
>   
>   flash@1 {
>     compatible = "jedec,spi-nor"
>               reg = <0x01>;
>     spi-max-frequency = <50000000>;
>     ...
>               partition@0 { 
>         label = "qspi-1";
>         reg = <0x0 0x800000>;
>     };
>   }
> }
> 
> parallel-memories DT changes:
>  - Flash size information can be retrieved directly from the flash, so it 
>    has been removed from the DT binding.
>  - Each flash connected in parallel mode will have its own flash node. 
>    This allows us to verify that both flashes connected in parallel are 
>    identical, as each flash node will be probed separately.

Well, you don't have to verify that. It's a basic hardware design
constraint for using this mode.

Otherwise same comment as above, this description doesn't allow
correct partitioning and that was one of the main constraints back when
we discussed these needs.

>  - The parallel-memories DT bindings will contain the phandles of the 
>    flash nodes connected in parallel.
> 
> spi@0 {
>   
>   flash@0 {
>     compatible = "jedec,spi-nor"
>     reg = <0x00>;
>     parallel-memories = <&flash@0 &flash@1>;
>     spi-max-frequency = <50000000>;
>     ...
>               partition@0 { 
>         label = "qspi-0";
>         reg = <0x0 0xf00000>;
>     };
>                         
> 
>   }
>   
>   flash@1 {
>     compatible = "jedec,spi-nor"
>               reg = <0x01>;
>     spi-max-frequency = <50000000>;
>     ...
>               partition@0 { 
>         label = "qspi-1";
>         reg = <0x0 0x800000>;
>     };
>   }
> }

Thanks,
Miquèl

Miquel Raynal Aug. 12, 2024, 8:39 a.m. UTC | #37

Hi Michael,

michael@walle.cc wrote on Mon, 12 Aug 2024 09:37:06 +0200:

> Hi,
> 
> > > > > The first round of patches were really invasive regarding the core
> > > > > code. So if there is a clean layering approach which can be enabled
> > > > > as a module and you are maintaining it I'm fine with that (even if
> > > > > the core code needs some changes then like hooks or so, not sure).    
> > > >
> > > > That discussion started with Miquel some years ago when he was trying to to 
> > > > solve description in DT which is merged for a while in the kernel.    
> > > 
> > > What's your point here? From what I can tell the DT binding is wrong
> > > and needs to be reworked anyway.  
> >
> > I'm sorry I'm now catching up, can you point at the thread explaining
> > what is wrong in the bindings? I didn't find where this was detailed. Or
> > otherwise summarize quickly what needs to change?  
> 
> Somewhere in this mega thread Tudor had some remarks about the
> bindings. Amit also mentioned it here:
> 
> https://lore.kernel.org/r/IA0PR12MB769944254171C39FF4171B52DCB42@IA0PR12MB7699.namprd12.prod.outlook.com/

Great. I jumped-in there. Thanks!

Miquèl

Tudor Ambarus Aug. 12, 2024, 9:45 a.m. UTC | #38

Hiya,

Amit, the thread becomes unreadable, better start a new one describing
what you plan to do.

On 8/12/24 9:38 AM, Miquel Raynal wrote:
> For now I don't think you need a totally generic implementation. As
> long as there is only one controller supporting these modes, I'd say
> this is not super relevant.

Miquel,

Microchip supports a twin quad mode too, and micron has some flashes
with similar architecture, see:

https://lore.kernel.org/linux-mtd/20231125092137.2948-1-amit.kumar-mahapatra@amd.com/T/#mbe999dde1d29371183aa53089ef6f0087d6902a7

We shall consider them from the beginning. I guess we need to read
everything all over again.

Mahapatra, Amit Kumar Aug. 14, 2024, 7:13 a.m. UTC | #39

Hello Miquel,

> > Based on the inputs/suggestions from Tudor, i am planning to add a new
> > layer between the SPI-NOR and MTD layers to support stacked and
> > parallel configurations. This new layer will be part of the spi-nor
> > and located in mtd/spi-nor/
> 
> For now I don't think you need a totally generic implementation. As long as
> there is only one controller supporting these modes, I'd say this is not super
> relevant.

IMHO, there should be a general solution since this isn't limited to just 
one controller. Any controller can support stacked mode with multiple 
native-cs or multiple gpio-cs, or with a combination of a native-cs and a 
gpio-cs. For parallel configurations, there are other controllers from 
Microchip and some flash devices that operate similarly to AMD's parallel 
mode.

> 
> > This layer would perform the following tasks:
> >  - During probing, store information from all the connected flashes,
> >    whether in stacked or parallel mode, and present it as a single device
> >    to the MTD layer.
> >  - Register callbacks through this new layer instead of spi-nor/core.c and
> >    handle MTD device registration.
> >  - In stacked mode, select the appropriate spi-nor flash based on the
> >    address provided by the MTD layer during flash operations.
> >  - Manage flash crossover operations in stacked mode.
> >  - Ensure both connected flashes are identical in parallel mode.
> >  - Handle odd byte count requests from the MTD layer during flash
> >    operations in parallel mode.
> >
> > For implementing this the current DT binding need to be updated as
> > follows.
> 
> So you want to go back to step 1 and redefine bindings? Is that worth?

The current bindings are effective if we only support identical 
flash devices or flashes of the same make but with different sizes 
connected in stacked mode. However, if we want to extend stacked support 
to include flashes of different makes in stacked mode, the current 
bindings may not be adequate. That's why I suggested updating the bindings 
to accommodate all possible scenario.

> 
> > stacked-memories DT changes:
> >  - Flash size information can be retrieved directly from the flash, so it
> >    has been removed from the DT binding.
> >  - Each stacked flash will have its own flash node. This approach allows
> >    flashes of different makes and sizes to be stacked together, as each
> >    flash will be probed individually.
> 
> And how will you define partitions crossing device boundaries? I believe this
> constraint has been totally forgotten in this proposal.

According to the new binding proposal, one of the two flash nodes will 
have a partition that crosses the device boundary.

> The whole idea of stacking two devices this way was to simplify the user's life
> with a single device exposed and the controller handling itself the CS changes.
> That is precisely what the current binding do.

That's true, but as I mentioned earlier, if we're not inclined to support 
stacked mode for flashes of different makes, then the current bindings 
are sufficient.

> The final goal being to double your storage transparently. If your goal was to
> put two chips aside, then none of this was actually needed. If you don't care
> about that anymore, then all the energy put into discussing the bindings
> initially was useless and a controller property could also have made the trick.
> 
> >  - The stacked-memories DT bindings will contain the phandles of the flash
> >    nodes connected in stacked mode.
> >
> > spi@0 {
> >
> >   flash@0 {
> >     compatible = "jedec,spi-nor"
> >     reg = <0x00>;
> >     stacked-memories = <&flash@0 &flash@1>;
> >     spi-max-frequency = <50000000>;
> >     ...
> >               partition@0 {
> >         label = "qspi-0";
> >         reg = <0x0 0xf00000>;
> >     };
> >
> >
> >   }
> >
> >   flash@1 {
> >     compatible = "jedec,spi-nor"
> >               reg = <0x01>;
> >     spi-max-frequency = <50000000>;
> >     ...
> >               partition@0 {
> >         label = "qspi-1";
> >         reg = <0x0 0x800000>;
> >     };
> >   }
> > }
> >
> > parallel-memories DT changes:
> >  - Flash size information can be retrieved directly from the flash, so it
> >    has been removed from the DT binding.
> >  - Each flash connected in parallel mode will have its own flash node.
> >    This allows us to verify that both flashes connected in parallel are
> >    identical, as each flash node will be probed separately.
> 
> Well, you don't have to verify that. It's a basic hardware design constraint for
> using this mode.

Agree

Regards,
Amit

> 
> Otherwise same comment as above, this description doesn't allow correct
> partitioning and that was one of the main constraints back when we discussed
> these needs.
> 
> >  - The parallel-memories DT bindings will contain the phandles of the
> >    flash nodes connected in parallel.
> >
> > spi@0 {
> >
> >   flash@0 {
> >     compatible = "jedec,spi-nor"
> >     reg = <0x00>;
> >     parallel-memories = <&flash@0 &flash@1>;
> >     spi-max-frequency = <50000000>;
> >     ...
> >               partition@0 {
> >         label = "qspi-0";
> >         reg = <0x0 0xf00000>;
> >     };
> >
> >
> >   }
> >
> >   flash@1 {
> >     compatible = "jedec,spi-nor"
> >               reg = <0x01>;
> >     spi-max-frequency = <50000000>;
> >     ...
> >               partition@0 {
> >         label = "qspi-1";
> >         reg = <0x0 0x800000>;
> >     };
> >   }
> > }
> 
> Thanks,
> Miquèl

Miquel Raynal Aug. 14, 2024, 8:46 a.m. UTC | #40

Hi Amit,

amit.kumar-mahapatra@amd.com wrote on Wed, 14 Aug 2024 07:13:35 +0000:

> Hello Miquel,
> 
> > > Based on the inputs/suggestions from Tudor, i am planning to add a new
> > > layer between the SPI-NOR and MTD layers to support stacked and
> > > parallel configurations. This new layer will be part of the spi-nor
> > > and located in mtd/spi-nor/  
> > 
> > For now I don't think you need a totally generic implementation. As long as
> > there is only one controller supporting these modes, I'd say this is not super
> > relevant.  
> 
> IMHO, there should be a general solution since this isn't limited to just 
> one controller. Any controller can support stacked mode with multiple 
> native-cs or multiple gpio-cs, or with a combination of a native-cs and a 
> gpio-cs.

That's not what was initially discussed. The Xilinx use case was:
a controller is managing two devices "at the same time" transparently
from the host. So the two flashes appear like a single one and thus,
are described like a single one.

You don't need anything in the bindings nor in the core to manage two
flashes connected to the same controller otherwise. The only use case
the Xilinx model was bringing, was to consider the storage bigger from
a host perspective and thus be able to store files across the device
boundary natively.

> For parallel configurations, there are other controllers from 
> Microchip and some flash devices that operate similarly to AMD's parallel 
> mode.

Yes, Tudor reminded me about these.

> > > This layer would perform the following tasks:
> > >  - During probing, store information from all the connected flashes,
> > >    whether in stacked or parallel mode, and present it as a single device
> > >    to the MTD layer.
> > >  - Register callbacks through this new layer instead of spi-nor/core.c and
> > >    handle MTD device registration.
> > >  - In stacked mode, select the appropriate spi-nor flash based on the
> > >    address provided by the MTD layer during flash operations.
> > >  - Manage flash crossover operations in stacked mode.
> > >  - Ensure both connected flashes are identical in parallel mode.
> > >  - Handle odd byte count requests from the MTD layer during flash
> > >    operations in parallel mode.
> > >
> > > For implementing this the current DT binding need to be updated as
> > > follows.  
> > 
> > So you want to go back to step 1 and redefine bindings? Is that worth?  
> 
> The current bindings are effective if we only support identical 
> flash devices or flashes of the same make but with different sizes 
> connected in stacked mode. However, if we want to extend stacked support 
> to include flashes of different makes in stacked mode,

The only actual feature the stacked mode brings is the ability to
consider two devices like one. This is abstracted by hardware, this is
a controller capability. The only way this can work is if the two
storage devices are of the same kind and accept the same commands/init
cycles.

If you consider two different devices, then there is no hardware
abstraction anymore, the controller is no longer "smart" enough and you
are back to the standard situation with two devices connected using
their own independent CS, known by the host. In this case the
"stacked-mode" bindings no longer apply. You need to describe the two
chips independently in the DT, and your stacked feature in the
controller can no longer be used.

You need other bindings to support this case because it is a different
situation. For this case, there was a mtd-concat approach (which was
never merged). But this is really something different than the stacked
mode in your controller because there is no specific hardware feature
involved, it's just pure software.

> the current 
> bindings may not be adequate. That's why I suggested updating the bindings 
> to accommodate all possible scenario.
> 
> >   
> > > stacked-memories DT changes:
> > >  - Flash size information can be retrieved directly from the flash, so it
> > >    has been removed from the DT binding.
> > >  - Each stacked flash will have its own flash node. This approach allows
> > >    flashes of different makes and sizes to be stacked together, as each
> > >    flash will be probed individually.  
> > 
> > And how will you define partitions crossing device boundaries? I believe this
> > constraint has been totally forgotten in this proposal.  
> 
> According to the new binding proposal, one of the two flash nodes will 
> have a partition that crosses the device boundary.

From a bindings perspective, it feels very awkward and I doubt it will
be accepted. From a code perspective, you're gonna need to butcher the
core...

> > The whole idea of stacking two devices this way was to simplify the user's life
> > with a single device exposed and the controller handling itself the CS changes.
> > That is precisely what the current binding do.  
> 
> That's true, but as I mentioned earlier, if we're not inclined to support 
> stacked mode for flashes of different makes, then the current bindings 
> are sufficient.

Thanks,
Miquèl

Mahapatra, Amit Kumar Aug. 14, 2024, 12:53 p.m. UTC | #41

Hello Miquel,

> That's not what was initially discussed. The Xilinx use case was:
> a controller is managing two devices "at the same time" transparently from

Yes, but in stacked mode, the controller communicates with one of the two 
connected flash devices at any given time, based on the address and data 
length. It doesn't assert both chip selects simultaneously.

> the host. So the two flashes appear like a single one and thus, are described
> like a single one.
> 
> You don't need anything in the bindings nor in the core to manage two
> flashes connected to the same controller otherwise. The only use case the
> Xilinx model was bringing, was to consider the storage bigger from a host
> perspective and thus be able to store files across the device boundary
> natively.

When adding stacked support to the SPI core, Mark also asked us to support 
the GPIO chip select use case, so it is not only restricted to native cs.

> 
> > For parallel configurations, there are other controllers from
> > Microchip and some flash devices that operate similarly to AMD's
> > parallel mode.
> 
> Yes, Tudor reminded me about these.
> 
> > > > This layer would perform the following tasks:
> > > >  - During probing, store information from all the connected flashes,
> > > >    whether in stacked or parallel mode, and present it as a single device
> > > >    to the MTD layer.
> > > >  - Register callbacks through this new layer instead of spi-nor/core.c and
> > > >    handle MTD device registration.
> > > >  - In stacked mode, select the appropriate spi-nor flash based on the
> > > >    address provided by the MTD layer during flash operations.
> > > >  - Manage flash crossover operations in stacked mode.
> > > >  - Ensure both connected flashes are identical in parallel mode.
> > > >  - Handle odd byte count requests from the MTD layer during flash
> > > >    operations in parallel mode.
> > > >
> > > > For implementing this the current DT binding need to be updated as
> > > > follows.
> > >
> > > So you want to go back to step 1 and redefine bindings? Is that worth?
> >
> > The current bindings are effective if we only support identical flash
> > devices or flashes of the same make but with different sizes connected
> > in stacked mode. However, if we want to extend stacked support to
> > include flashes of different makes in stacked mode,
> 
> The only actual feature the stacked mode brings is the ability to consider two
> devices like one. This is abstracted by hardware, this is a controller capability.

Stacked mode is a software abstraction rather than a controller feature or 
capability. At any given time, the controller communicates with one of the 
two connected flash devices, as determined by the requested address and data 
length. If an operation starts on one flash and ends on the other, the core 
needs to split it into two separate operations and adjust the data length 
accordingly.

> The only way this can work is if the two storage devices are of the same kind
> and accept the same commands/init cycles.
> 
> If you consider two different devices, then there is no hardware abstraction
> anymore, the controller is no longer "smart" enough and you are back to the
> standard situation with two devices connected using their own independent
> CS, known by the host. In this case the "stacked-mode" bindings no longer
> apply. You need to describe the two chips independently in the DT, and your
> stacked feature in the controller can no longer be used.

As stated earlier stacked is not a controller feature but rather a software 
abstraction to assert the appropriate cs as per the requested address & data 
length.

Regards,
Amit
> 
> You need other bindings to support this case because it is a different
> situation. For this case, there was a mtd-concat approach (which was never
> merged). But this is really something different than the stacked mode in your
> controller because there is no specific hardware feature involved, it's just pure
> software.
> 
> > the current
> > bindings may not be adequate. That's why I suggested updating the
> > bindings to accommodate all possible scenario.
> >
> > >
> > > > stacked-memories DT changes:
> > > >  - Flash size information can be retrieved directly from the flash, so it
> > > >    has been removed from the DT binding.
> > > >  - Each stacked flash will have its own flash node. This approach allows
> > > >    flashes of different makes and sizes to be stacked together, as each
> > > >    flash will be probed individually.
> > >
> > > And how will you define partitions crossing device boundaries? I
> > > believe this constraint has been totally forgotten in this proposal.
> >
> > According to the new binding proposal, one of the two flash nodes will
> > have a partition that crosses the device boundary.
> 
> From a bindings perspective, it feels very awkward and I doubt it will be
> accepted. From a code perspective, you're gonna need to butcher the core...
> 
> > > The whole idea of stacking two devices this way was to simplify the
> > > user's life with a single device exposed and the controller handling itself
> the CS changes.
> > > That is precisely what the current binding do.
> >
> > That's true, but as I mentioned earlier, if we're not inclined to
> > support stacked mode for flashes of different makes, then the current
> > bindings are sufficient.
> 
> Thanks,
> Miquèl

Miquel Raynal Aug. 14, 2024, 2:46 p.m. UTC | #42

Hi Amit,

> > > > > For implementing this the current DT binding need to be updated as
> > > > > follows.  
> > > >
> > > > So you want to go back to step 1 and redefine bindings? Is that worth?  
> > >
> > > The current bindings are effective if we only support identical flash
> > > devices or flashes of the same make but with different sizes connected
> > > in stacked mode. However, if we want to extend stacked support to
> > > include flashes of different makes in stacked mode,  
> > 
> > The only actual feature the stacked mode brings is the ability to consider two
> > devices like one. This is abstracted by hardware, this is a controller capability.  
> 
> Stacked mode is a software abstraction rather than a controller feature or 
> capability. At any given time, the controller communicates with one of the 
> two connected flash devices, as determined by the requested address and data 
> length. If an operation starts on one flash and ends on the other, the core 
> needs to split it into two separate operations and adjust the data length 
> accordingly.

I'm sorry, that was not my understanding, cf the initial RFC:

	Subject: [RFC PATCH 0/3] Dual stacked/parallel memories bindings
	Date: Fri, 12 Nov 2021 16:24:08 +0100

	"[...] supporting specific SPI controller modes like Xilinx's
	where the controller can highly abstract the hardware and
	provide access to a single bigger device instead [...]"

Furthermore, I rapidly checked the Zynq7000 TRM, it suggests that the
controller is capable of addressing the right memory itself based on
the address, especially in linear mode? 

	https://docs.amd.com/r/en-US/ug585-zynq-7000-SoC-TRM/Dual-SS-4-bit-Stacked-I/O

	"The lower SPI flash memory should always be connected if the
	linear Quad-SPI memory subsystem is used, and the upper flash
	memory is optional. Total address space is 32 MB with a 25-bit
	address. In IO mode, the MSB of the address is defined by
	U_PAGE which is located at bit 28 of register 0xA0 . In Linear
	address mode, AXI address bit 24 determines the upper or lower
	memory page. All of the commands will be executed by the device
	selected by U_PAGE in I/O mode and address bit 24 in linear
	mode."

Anyway, you may decide to go down the "pure software" route, which is
probably easier from an implementation perspective, but means you're
gonna have to argue -again- in favor of the representation of a purely
virtual device that is not hardware.

Cheers,
Miquèl

Mahapatra, Amit Kumar Aug. 19, 2024, 10:28 a.m. UTC | #43

Hello Miquel,

> Hi Amit,
> 
> > > > > > For implementing this the current DT binding need to be
> > > > > > updated as follows.
> > > > >
> > > > > So you want to go back to step 1 and redefine bindings? Is that worth?
> > > >
> > > > The current bindings are effective if we only support identical
> > > > flash devices or flashes of the same make but with different sizes
> > > > connected in stacked mode. However, if we want to extend stacked
> > > > support to include flashes of different makes in stacked mode,
> > >
> > > The only actual feature the stacked mode brings is the ability to
> > > consider two devices like one. This is abstracted by hardware, this is a
> controller capability.
> >
> > Stacked mode is a software abstraction rather than a controller
> > feature or capability. At any given time, the controller communicates
> > with one of the two connected flash devices, as determined by the
> > requested address and data length. If an operation starts on one flash
> > and ends on the other, the core needs to split it into two separate
> > operations and adjust the data length accordingly.
> 
> I'm sorry, that was not my understanding, cf the initial RFC:
> 
> 	Subject: [RFC PATCH 0/3] Dual stacked/parallel memories bindings
> 	Date: Fri, 12 Nov 2021 16:24:08 +0100
> 
> 	"[...] supporting specific SPI controller modes like Xilinx's
> 	where the controller can highly abstract the hardware and
> 	provide access to a single bigger device instead [...]"
> 
> Furthermore, I rapidly checked the Zynq7000 TRM, it suggests that the
> controller is capable of addressing the right memory itself based on the
> address, especially in linear mode?

Yes, this is true only when operating in Linear Mode. In this mode, the 
Zynq 7000 can only perform read operations. Erase and write operations 
are possible only in I/O mode, so while operating in I/O mode the CS pins 
must be manually asserted by the driver according to the address and data 
length
> 
> 	https://docs.amd.com/r/en-US/ug585-zynq-7000-SoC-TRM/Dual-SS-
> 4-bit-Stacked-I/O
> 
> 	"The lower SPI flash memory should always be connected if the
> 	linear Quad-SPI memory subsystem is used, and the upper flash
> 	memory is optional. Total address space is 32 MB with a 25-bit
> 	address. In IO mode, the MSB of the address is defined by
> 	U_PAGE which is located at bit 28 of register 0xA0 . In Linear
> 	address mode, AXI address bit 24 determines the upper or lower
> 	memory page. All of the commands will be executed by the device
> 	selected by U_PAGE in I/O mode and address bit 24 in linear
> 	mode."
> 
> Anyway, you may decide to go down the "pure software" route, which is
> probably easier from an implementation perspective, but means you're
> gonna have to argue -again- in favor of the representation of a purely virtual
> device that is not hardware.

Yes, that's correct, but I was planning to manage it through a new layer rather 
than using mtd-concat. As Tudor mentioned, I will start a new email thread 
outlining my proposal so we can resume the discussion on the patch moving 
forward.

Regards,
Amit
> 
> Cheers,
> Miquèl

[v11,00/10] spi: Add support for stacked/parallel memories

Message

Comments