
mtd: rawnand: denali: add more delays before latching incoming data

Message ID 20200316104307.1891-1-yamada.masahiro@socionext.com
State Superseded

Commit Message

Masahiro Yamada March 16, 2020, 10:43 a.m. UTC
The Denali IP has several registers that specify how many clock cycles
to wait between falling/rising signal edges. Programming these
registers with optimized values improves NAND access performance.

Because struct nand_sdr_timings expresses the device requirements
in picoseconds, denali_setup_data_interface() computes the register
values by dividing the device timings by the clock period.
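
As a rough illustration of that division, the sketch below converts a
timing in picoseconds into a whole number of clk_x cycles, rounding up
so that a minimum requirement is never undercut. The helper name
ps_to_clks() is hypothetical, and DIV_ROUND_UP() is redefined locally
only so the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the kernel's rounding-up division, for user-space builds. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: convert a device timing given in picoseconds
 * into clk_x cycles. Rounding up keeps the programmed delay at least
 * as long as the device requirement.
 */
static uint32_t ps_to_clks(uint32_t timing_ps, uint32_t t_x_ps)
{
	assert(t_x_ps != 0);
	return DIV_ROUND_UP(timing_ps, t_x_ps);
}
```

For example, with an assumed clock period of 5000 ps, a 16000 ps
requirement needs 4 cycles, not 3.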

Marek Vasut reported that this driver in the latest kernel does not
work on his SOCFPGA board. (The on-board NAND chip is mode 5.)

The suspicious parameter is acc_clks, so this commit relaxes it.

The Denali NAND Flash Memory Controller User's Guide describes this
register as follows:

  acc_clks
    signifies the number of bus interface clk_x clock cycles,
    controller should wait from read enable going low to sending
    out a strobe of clk_x for capturing of incoming data.

Currently, acc_clks is calculated from tREA alone, the delay on the
chip side. This does not include the additional delays that come from
the data path on the PCB and in the SoC, the load capacitance of the
pins, etc.

These delays become a relatively large factor in faster timing modes
such as mode 5.

Before the ->setup_data_interface() hook was supported (e.g. Linux
4.12), the Denali driver hacked acc_clks in a couple of ways [1] [2]
to support timing mode 5.

We do not want to go back to a hard-coded acc_clks, but we need to
include this factor in the delay somehow. Let's assume the additional
delay is 10000 picoseconds.

In the new calculation, acc_clks is determined by timings->tREA_max +
data_setup_on_host.
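
To make the effect of the extra term concrete, here is a small sketch
comparing the old and new minimum for acc_clks. The function names are
hypothetical, and the result is only the lower bound, before any
clamping to the register field width.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the kernel's rounding-up division, for user-space builds. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Host-side allowance taken from the commit message: 10000 ps. */
#define DATA_SETUP_ON_HOST_PS 10000u

/* Old calculation: chip-side access time (tREA) only. */
static uint32_t acc_clks_old(uint32_t trea_max_ps, uint32_t t_x_ps)
{
	assert(t_x_ps != 0);
	return DIV_ROUND_UP(trea_max_ps, t_x_ps);
}

/* New calculation: tREA plus the assumed delay on the host side. */
static uint32_t acc_clks_new(uint32_t trea_max_ps, uint32_t t_x_ps)
{
	assert(t_x_ps != 0);
	return DIV_ROUND_UP(trea_max_ps + DATA_SETUP_ON_HOST_PS, t_x_ps);
}
```

With illustrative inputs of tREA_max = 16000 ps and a 5000 ps clock
period (both assumed here, not taken from the report), the minimum
grows from 4 to 6 cycles.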

Also, prolong the RE# low period to make sure the data hold is met.

Finally, re-center the data latch timing for extra safety.
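
The three steps above (minimum acc_clks, longer RE# low period,
centered latch point) can be sketched as one stand-alone function. This
is a simplified user-space rendering of the arithmetic in the patch,
not the driver code itself: the struct and function names are
hypothetical, rdwr_en_hi is taken as an input, and the field limits are
supplied by the caller rather than read from the register definitions.

```c
#include <assert.h>

/* Local copy of the kernel's rounding-up division, for user-space builds. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical input bundle; all times are in picoseconds. */
struct read_timing_in {
	int t_x;            /* clk_x period */
	int trea_max;       /* chip-side access time */
	int trhoh_min;      /* data hold after RE# goes high */
	int pulse_min;      /* max(tRP_min, tWP_min) */
	int cycle_min;      /* max(tRC_min, tWC_min) */
	int rdwr_en_hi;     /* RE#/WE# high count in cycles, computed elsewhere */
	int extra_delay;    /* host-side allowance, 10000 in the commit message */
	int acc_clks_max;   /* field limit, assumed and supplied by the caller */
	int rdwr_en_lo_max; /* field limit, assumed and supplied by the caller */
};

struct read_timing_out {
	int acc_clks;
	int rdwr_en_lo;
};

static struct read_timing_out compute_read_timing(const struct read_timing_in *in)
{
	struct read_timing_out out;
	int acc_min, lo, lo_hi;

	assert(in->t_x > 0);

	/* Earliest safe latch point: tREA plus the host-side allowance. */
	acc_min = DIV_ROUND_UP(in->trea_max + in->extra_delay, in->t_x);

	/* RE#/WE# low count from the pulse width requirement. */
	lo = DIV_ROUND_UP(in->pulse_min, in->t_x);

	/* Keep RE# low long enough that data is still held at acc_min. */
	lo = max_int(lo, acc_min - in->trhoh_min / in->t_x);

	/* Low + high must also cover the full cycle time. */
	lo_hi = DIV_ROUND_UP(in->cycle_min, in->t_x);
	lo = max_int(lo, lo_hi - in->rdwr_en_hi);
	lo = min_int(lo, in->rdwr_en_lo_max);

	/* Latch halfway between the earliest point and the end of data hold. */
	out.acc_clks = (acc_min + lo + DIV_ROUND_UP(in->trhoh_min, in->t_x)) / 2;
	out.acc_clks = min_int(out.acc_clks, in->acc_clks_max);
	out.rdwr_en_lo = lo;
	return out;
}
```

With illustrative inputs (5000 ps clock period, tREA_max 16000,
tRHOH_min 15000, pulse width 10000, cycle time 20000, rdwr_en_hi of 2)
the sketch yields rdwr_en_lo = 3 and acc_clks = 6.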

[1] https://github.com/torvalds/linux/blob/v4.12/drivers/mtd/nand/denali.c#L276
[2] https://github.com/torvalds/linux/blob/v4.12/drivers/mtd/nand/denali.c#L282

Reported-by: Marek Vasut <marex@denx.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

---

 drivers/mtd/nand/raw/denali.c | 44 ++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 11 deletions(-)

-- 
2.17.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

Comments

Miquel Raynal March 20, 2020, 5:11 p.m. UTC | #1
Hi Marek,

Can you please give this patch a try and report the result?

Thanks,
Miquèl
Marek Vasut March 20, 2020, 5:14 p.m. UTC | #2
On 3/20/20 6:11 PM, Miquel Raynal wrote:
> Can you please give this patch a try and report the result?


It's on my list, don't worry.

Masahiro Yamada March 22, 2020, 5:42 p.m. UTC | #3
On Sat, Mar 21, 2020 at 2:15 AM Marek Vasut <marex@denx.de> wrote:
> It's on my list, don't worry.



Preferably, please test v2.

Thanks.


-- 
Best Regards
Masahiro Yamada

Marek Vasut March 22, 2020, 5:49 p.m. UTC | #4
On 3/22/20 6:42 PM, Masahiro Yamada wrote:
> Preferably, please test v2.


Yes, I see the V2.


Patch

diff --git a/drivers/mtd/nand/raw/denali.c b/drivers/mtd/nand/raw/denali.c
index 6a6c919b2569..ecd11c08aa2a 100644
--- a/drivers/mtd/nand/raw/denali.c
+++ b/drivers/mtd/nand/raw/denali.c
@@ -764,6 +764,7 @@  static int denali_write_page(struct nand_chip *chip, const u8 *buf,
 static int denali_setup_data_interface(struct nand_chip *chip, int chipnr,
 				       const struct nand_data_interface *conf)
 {
+	static const u32 data_setup_on_host = 10000;
 	struct denali_controller *denali = to_denali_controller(chip);
 	struct denali_chip_sel *sel;
 	const struct nand_sdr_timings *timings;
@@ -796,15 +797,6 @@  static int denali_setup_data_interface(struct nand_chip *chip, int chipnr,
 
 	sel = &to_denali_chip(chip)->sels[chipnr];
 
-	/* tREA -> ACC_CLKS */
-	acc_clks = DIV_ROUND_UP(timings->tREA_max, t_x);
-	acc_clks = min_t(int, acc_clks, ACC_CLKS__VALUE);
-
-	tmp = ioread32(denali->reg + ACC_CLKS);
-	tmp &= ~ACC_CLKS__VALUE;
-	tmp |= FIELD_PREP(ACC_CLKS__VALUE, acc_clks);
-	sel->acc_clks = tmp;
-
 	/* tRWH -> RE_2_WE */
 	re_2_we = DIV_ROUND_UP(timings->tRHW_min, t_x);
 	re_2_we = min_t(int, re_2_we, RE_2_WE__VALUE);
@@ -862,14 +854,44 @@  static int denali_setup_data_interface(struct nand_chip *chip, int chipnr,
 	tmp |= FIELD_PREP(RDWR_EN_HI_CNT__VALUE, rdwr_en_hi);
 	sel->rdwr_en_hi_cnt = tmp;
 
-	/* tRP, tWP -> RDWR_EN_LO_CNT */
+	/*
+	 * tREA -> ACC_CLKS
+	 * tRP, tWP, tRHOH, tRC, tWC -> RDWR_EN_LO_CNT
+	 */
+
+	/*
+	 * Determine the minimum of acc_clks to meet the setup timing when
+	 * capturing the incoming data.
+	 *
+	 * The delay on the chip side is well-defined as tREA, but we need to
+	 * take additional delay into account. This includes a certain degree
+	 * of uncertainty, such as signal propagation delays on the PCB and
+	 * in the SoC, load capacitance of the I/O pins, etc.
+	 */
+	acc_clks = DIV_ROUND_UP(timings->tREA_max + data_setup_on_host, t_x);
+
+	/* Determine the minimum of rdwr_en_lo_cnt from RE#/WE# pulse width */
 	rdwr_en_lo = DIV_ROUND_UP(max(timings->tRP_min, timings->tWP_min), t_x);
+
+	/* Extend rdwr_en_lo to meet the data hold timing */
+	rdwr_en_lo = max_t(int, rdwr_en_lo, acc_clks - timings->tRHOH_min / t_x);
+
+	/* Extend rdwr_en_lo to meet the requirement for RE#/WE# cycle time */
 	rdwr_en_lo_hi = DIV_ROUND_UP(max(timings->tRC_min, timings->tWC_min),
 				     t_x);
-	rdwr_en_lo_hi = max_t(int, rdwr_en_lo_hi, mult_x);
 	rdwr_en_lo = max(rdwr_en_lo, rdwr_en_lo_hi - rdwr_en_hi);
 	rdwr_en_lo = min_t(int, rdwr_en_lo, RDWR_EN_LO_CNT__VALUE);
 
+	/* Center the data latch timing for extra safety */
+	acc_clks = (acc_clks + rdwr_en_lo +
+		    DIV_ROUND_UP(timings->tRHOH_min, t_x)) / 2;
+	acc_clks = min_t(int, acc_clks, ACC_CLKS__VALUE);
+
+	tmp = ioread32(denali->reg + ACC_CLKS);
+	tmp &= ~ACC_CLKS__VALUE;
+	tmp |= FIELD_PREP(ACC_CLKS__VALUE, acc_clks);
+	sel->acc_clks = tmp;
+
 	tmp = ioread32(denali->reg + RDWR_EN_LO_CNT);
 	tmp &= ~RDWR_EN_LO_CNT__VALUE;
 	tmp |= FIELD_PREP(RDWR_EN_LO_CNT__VALUE, rdwr_en_lo);