
memsize: Make get_ram_size() work with arbitrary RAM size

Message ID 1594378641-26360-1-git-send-email-bmeng.cn@gmail.com
State New
Series memsize: Make get_ram_size() work with arbitrary RAM size

Commit Message

Bin Meng July 10, 2020, 10:57 a.m. UTC
From: Bin Meng <bin.meng at windriver.com>

Currently get_ram_size() only works with certain RAM sizes like 1GiB,
2GiB, 4GiB, 8GiB, etc. Change the code to work with any RAM size.

Signed-off-by: Bin Meng <bin.meng at windriver.com>
---

 common/memsize.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

Comments

Wolfgang Denk July 13, 2020, 5:18 p.m. UTC | #1
Dear Bin Meng,

In message <1594378641-26360-1-git-send-email-bmeng.cn at gmail.com> you wrote:
>
> Currently get_ram_size() only works with certain RAM sizes like 1GiB,
> 2GiB, 4GiB, 8GiB, etc. Change the code to work with any RAM size.

I'm afraid I don't understand this change. Can you please explain in
a bit more detail what "any RAM size" means?

The existing code should work fine with any RAM size that is a power
of two (and the algorithm used is based on this assumption, so the
code would fail if it is not met).
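
For illustration, here is a small host-side program (not part of the
patch) that simply replays the existing probe loop and prints which
byte offsets get_ram_size() would touch for an example 512 MiB window:

#include <stdio.h>

int main(void)
{
	long maxsize = 512 * 1024 * 1024;	/* example window size */
	long cnt;

	/* same halving scheme as the first loop in get_ram_size() */
	for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1)
		printf("probe at byte offset 0x%lx\n", cnt * sizeof(long));

	return 0;
}

Only one long word per power-of-two offset is touched, which is why
the test is fast enough to run on every boot.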

> -	long           save[BITS_PER_LONG - 1];
> -	long           save_base;
> -	long           cnt;
> -	long           val;
> -	long           size;
> -	int            i = 0;
> +	long save[BITS_PER_LONG - 1];
> +	long save_base;
> +	long cnt;
> +	long val;
> +	long size;
> +	int i = 0;

Why do you change the formatting here?  I can see no need for that.


> +	long n = maxsize / sizeof(long);
>  
> -	for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1) {
> +	n = __ffs(n);
> +	n = BIT(n);
> +
> +	for (cnt = n >> 1; cnt > 0; cnt >>= 1) {

I can only speculate - but do you think that this will work for a
memory size of - for example - 2.5 GiB, as might result from combining
two banks of 2 GiB and 512 MiB respectively?  I bet it doesn't.

For correct operation (as originally intended) you would always
specify a maxsize twice as large as the maximum possible memory size
of a bank of memory, and the function would return the real size it
found.
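
As a sketch of that intended use in a board's dram_init() (the base
macro and the assumed 512 MiB maximum bank size are placeholders, not
taken from any particular board):

#include <common.h>
#include <init.h>
#include <asm/global_data.h>
#include <linux/sizes.h>

DECLARE_GLOBAL_DATA_PTR;

int dram_init(void)
{
	/*
	 * Probe a window twice as large as the biggest bank this board
	 * could be fitted with (assumed 512 MiB here), so a smaller or
	 * mirrored bank is still detected.
	 */
	gd->ram_size = get_ram_size((long *)CONFIG_SYS_SDRAM_BASE, SZ_1G);

	return 0;
}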

Any other use, especially not checking one bank of memory at a time,
will not work as intended.  And I have yet to see systems where
the size of a bank of memory is not a power of two.

So I feel what you are doing here is not an improvement, but a
"workaround" for some incorrect usage.

I don't think we should accept this patch.

Best regards,

Wolfgang Denk
Bin Meng July 14, 2020, 10:35 a.m. UTC | #2
Hi Wolfgang,

On Tue, Jul 14, 2020 at 1:18 AM Wolfgang Denk <wd at denx.de> wrote:
>
> Dear Bin Meng,
>
> In message <1594378641-26360-1-git-send-email-bmeng.cn at gmail.com> you wrote:
> >
> > Currently get_ram_size() only works with certain RAM sizes like 1GiB,
> > 2GiB, 4GiB, 8GiB, etc. Change the code to work with any RAM size.
>
> I'm afraid I don't understand this change. Can you please explain in
> a bit more detail what "any RAM size" means?

I meant "any RAM size" that is not a power of two.

>
> The existing code should work fine with any RAM size that is a power
> of two (and the algorithm used is based on this assumption, so the
> code would fail if it is not met).

Unfortunately this limitation is not documented clearly.

>
> > -     long           save[BITS_PER_LONG - 1];
> > -     long           save_base;
> > -     long           cnt;
> > -     long           val;
> > -     long           size;
> > -     int            i = 0;
> > +     long save[BITS_PER_LONG - 1];
> > +     long save_base;
> > +     long cnt;
> > +     long val;
> > +     long size;
> > +     int i = 0;
>
> Why do you change the formatting here?  I can see no need for that?

I see the above are the only places in this file that do not follow
our coding practices. Since I was modifying this function, I think
it's fine to do the updates while we are here.

>
>
> > +     long n = maxsize / sizeof(long);
> >
> > -     for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1) {
> > +     n = __ffs(n);
> > +     n = BIT(n);
> > +
> > +     for (cnt = n >> 1; cnt > 0; cnt >>= 1) {
>
> I can only speculate - but do you think that this will work for a
> memory size of - for example - 2.5 GiB as might result from combining
> two banks of 2 GiB resp. 512 MiB ?   I bet it doesn't.

Correct. This issue was exposed when I was testing U-Boot with QEMU on
RISC-V. With QEMU we can instantiate an arbitrary RAM size for a given
machine.

>
> For correct operation (as originally intended) you would always
> specify a maxsize twice as large as the maximum possible memory size
> of a bank of memory, and the function would return the real size it
> found.

I don't think this function can work as expected. Even if I pass a
power-of-two RAM size to this function, it could still fail. Imagine
the target has 2GiB of memory, and I pass 8GiB as the maxsize to
this function. The function will hang when it tries to read/write
something at an address beyond 2GiB.

>
> Any other use, especially not checking one bank of memory at a time,
> will not work as intended.  And I have yet to see systems where
> the size of a bank of memory is not a power of two.
>
> So I feel what you are doing here is not an improvement, but a
> "workaround" for some incorrect usage.
>

Given the function limitations you mentioned, we should update the
code to document the power-of-two assumption above.
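
One way the assumption could be written down in common/memsize.c (the
wording below is only a suggestion):

/**
 * get_ram_size() - determine the size of one bank of RAM
 *
 * @base:    base address of the bank under test (one bank at a time)
 * @maxsize: size of the probed window in bytes; must be a power of two
 *           and should be twice the largest bank size that can be
 *           populated
 *
 * Return: detected RAM size in bytes, or 0 if the memory at @base does
 *         not work reliably
 */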

> I don't think we should accept this patch.
>

Regards,
Bin
Wolfgang Denk July 14, 2020, 12:28 p.m. UTC | #3
Dear Bin Meng,

In message <CAEUhbmWBFvb_SGobk2f_JXhvAtyZkZ1jJ0HXiWdC0xU61Hn1jg at mail.gmail.com> you wrote:
>
> > I'm afraid I don't understand this change. Can you please explain in
> > a bit more detail what "any RAM size" means?
>
> I meant "any RAM size" that is not a power of two.

I was afraid you meant this.

> > The existing code should work fine with any RAM size that is a power
> > of two (and the algorithm used is based on this assumption, so the
> > code would fail if it is not met).
>
> Unfortunately this limitation is not documented clearly.

You are right - the code is completely undocumented; it does not even
mention the fact that it should always operate on a single bank of
memory at a time.

> > For correct operation (as originally intended) you would always
> > specify a maxsize twice as large as the maximum possible memory size
> > of a bank of memory, and the function would return the real size it
> > found.
>
> I don't think this function can work as expected. Even if I pass a
> power-of-two RAM size to this function, it could still fail. Imagine
> the target has 2GiB memory size, and I pass 8GiB as the maxsize to
> this function. The function will hang when it tries to read/write
> something at an address that is beyond 2GiB.

It should not hang if used properly - but of course this depends a
lot on the hardware properties, and you have to know what you are
doing.

When the code was written, all we had to deal with was Power
Architecture systems with pretty flexible and easy to program memory
controllers.  A typical use case was a system where you could for
example populate either 64 or 128 or 256 MB of RAM.  In such a case
you would run the code with a max size of 512 MB, after configuring
the memory controller to map such a size.  In this case, accesses to
the whole 512 MB range will work without problems, with the higher
address ranges just mirroring the data of the actually populated
RAM.  This mirroring will be detected, so you get the information
about how big the populated RAM really is.
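
A simplified, destructive sketch of that detection idea (the real
get_ram_size() additionally saves and restores every location it
touches, which this sketch leaves out):

/*
 * Write markers top-down, then read back bottom-up; on mirrored memory
 * the read at the first aliased offset no longer returns its marker,
 * which reveals the real bank size.
 */
static long probe_mirrored_size(volatile long *base, long maxsize)
{
	long cnt;

	for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1)
		base[cnt] = cnt;
	base[0] = 0;

	for (cnt = 1; cnt < maxsize / sizeof(long); cnt <<= 1)
		if (base[cnt] != cnt)
			return cnt * sizeof(long);

	return maxsize;
}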

If we would only check with a max size of 256 MB, we would not be
able to detect the case when there is bigger RAM (say, 512 MB)
populated.

I am aware that there are situations where you can't program the
memory controller so freely, so we often cannot use this complete
test and lose the ability to detect working but bigger RAM.

In any case we always test only the memory locations at the
power-of-two borders - the main reason is that this test is supposed
to run on every boot, so it must be fast.

In your case this immediately shows the problem.  Assume you have a
memory configuration which is supposed to have 2 GiB + 512 MiB,
but by mistake the wrong components were fitted and instead of 512 MiB
there are only 128 MiB actually present.

U-Boot will test memory below the ... 128M, 256M, 512M, 1G, 2G (and
ideally 4G) limits.  It will tell you that there are 2 GiB of working
memory.  It will NOT test any location between 2 GiB and 4 GiB, so it
has no chance to find out whether there are 128 MiB or 512 MiB present,
or nothing at all, or whether this memory is not working at all and
returning random data.

This code works only for bank sizes that are a power of two!

> Given what you mentioned the function limitation, we should update the
> codes with the above power of two assumptions documented.

That would indeed make sense.  On the other hand, I would have
expected that this behaviour is kind of obvious from the fact that
we just bit-shift the address location used for testing.

Best regards,

Wolfgang Denk
Heinrich Schuchardt July 14, 2020, 3:53 p.m. UTC | #4
On 14.07.20 12:35, Bin Meng wrote:
> Hi Wolfgang,
>
> On Tue, Jul 14, 2020 at 1:18 AM Wolfgang Denk <wd at denx.de> wrote:
>>
>> Dear Bin Meng,
>>
>> In message <1594378641-26360-1-git-send-email-bmeng.cn at gmail.com> you wrote:
>>>
>>> Currently get_ram_size() only works with certain RAM sizes like 1GiB,
>>> 2GiB, 4GiB, 8GiB, etc. Change the code to work with any RAM size.
>>
>> I'm afraid I don't understand this change. Can you please explain in
>> a bit more detail what "any RAM size" means?
>
> I meant "any RAM size" that is not a power of two.
>
>>
>> The existing code should work fine with any RAM size that is a power
>> of two (and the algorithm used is based on this assumption, so the
>> code would fail if it is not met).
>
> Unfortunately this limitation is not documented clearly.
>
>>
>>> -     long           save[BITS_PER_LONG - 1];
>>> -     long           save_base;
>>> -     long           cnt;
>>> -     long           val;
>>> -     long           size;
>>> -     int            i = 0;
>>> +     long save[BITS_PER_LONG - 1];
>>> +     long save_base;
>>> +     long cnt;
>>> +     long val;
>>> +     long size;
>>> +     int i = 0;
>>
>> Why do you change the formatting here?  I can see no need for that?
>
> I see the above are the only places in this file that do not follow
> our coding practices. Since I was modifying this function, I think
> it's fine to do the updates while we are here.
>
>>
>>
>>> +     long n = maxsize / sizeof(long);
>>>
>>> -     for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1) {
>>> +     n = __ffs(n);
>>> +     n = BIT(n);
>>> +
>>> +     for (cnt = n >> 1; cnt > 0; cnt >>= 1) {
>>
>> I can only speculate - but do you think that this will work for a
>> memory size of - for example - 2.5 GiB, as might result from combining
>> two banks of 2 GiB and 512 MiB respectively?  I bet it doesn't.
>
> Correct. This issue was exposed when I was testing U-Boot with QEMU on
> RISC-V. With QEMU we can instantiate arbitrary RAM size for a given
> machine.
>
>>
>> For correct operation (as originally intended) you would always
>> specify a maxsize twice as large as the maximum possible memory size
>> of a bank of memory, and the function would return the real size it
>> found.
>
> I don't think this function can work as expected. Even if I pass a
> power-of-two RAM size to this function, it could still fail. Imagine
> the target has 2GiB memory size, and I pass 8GiB as the maxsize to
> this function. The function will hang when it tries to read/write
> something at an address that is beyond 2GiB.
>
>>
>> Any other use, especially not checking one bank of memory at a time,
>> will not work as intended.  And I have yet to see systems where
>> the size of a bank of memory is not a power of two.
>>
>> So I feel what you are doing here is not an improvement, but a
>> "workaround" for some incorrect usage.
>>
>
> Given the function limitations you mentioned, we should update the
> code to document the power-of-two assumption above.
>
>> I don't think we should accept this patch.
>>
>
> Regards,
> Bin
>


If we want a fast algorithm to determine the last supported address even
if the start or size is not a power of two:

unsigned long n = sizeof(unsigned long) * 8 - 1;
unsigned long addr = 0;
n = 1UL << n;
for (; n; n >>= 1) {
	unsigned long next = addr + n;

	if (next < start || next <= start + size - 1 && test(next))
		addr = next;
}
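
For reference, the same idea as a self-contained function; the probe
callback is a placeholder for whatever read-back test the caller would
supply, and the extra parentheses only spell out the precedence the
expression above already has:

/* placeholder: non-zero if the word at byte address @addr can be
 * written and read back (e.g. a volatile access wrapped in sync()) */
typedef int (*probe_fn)(unsigned long addr);

/* binary search for the last accessible address in [start, start + size) */
static unsigned long find_last_addr(unsigned long start, unsigned long size,
				    probe_fn test)
{
	unsigned long addr = 0;
	unsigned long n = 1UL << (sizeof(unsigned long) * 8 - 1);

	for (; n; n >>= 1) {
		unsigned long next = addr + n;

		if (next < start || (next <= start + size - 1 && test(next)))
			addr = next;
	}

	return addr;
}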

Best regards

Heinrich
Wolfgang Denk July 14, 2020, 4:09 p.m. UTC | #5
Dear Heinrich,

In message <53dad1c7-7684-f975-1567-6ec5e03fa4b6 at gmx.de> you wrote:
>
> If we want a fast algorithm to determine the last supported address even
> if the start or size is not a power of two:

Are you sure?  How is this supposed to work?

Running it with start = 0 and size = 0xC0000000 it will test the
memory locations

        0x80000000
        0xA0000000
        0xB0000000
        0xB8000000
        0xBC000000
        0xBE000000
        0xBF000000
        0xBF800000
        0xBFC00000
        0xBFE00000
        0xBFF00000
        0xBFF80000
        0xBFFC0000
        0xBFFE0000
        0xBFFF0000
        0xBFFF8000
        0xBFFFC000
        0xBFFFE000
        0xBFFFF000
        0xBFFFF800
        0xBFFFFC00
        0xBFFFFE00
        0xBFFFFF00
        0xBFFFFF80
        0xBFFFFFC0
        0xBFFFFFE0
        0xBFFFFFF0
        0xBFFFFFF8
        0xBFFFFFFC
        0xBFFFFFFE
        0xBFFFFFFF

What do you intend with such a sequence?
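
For the record, replaying that loop on the host with a probe that
always succeeds inside the window (and 32-bit arithmetic to match the
example) reproduces exactly the sequence listed above:

#include <stdio.h>

int main(void)
{
	unsigned int start = 0, size = 0xC0000000u;
	unsigned int addr = 0, n = 1u << 31;

	for (; n; n >>= 1) {
		unsigned int next = addr + n;

		/* test(next) assumed to succeed for every in-range address */
		if (next < start || next <= start + size - 1) {
			printf("0x%08X\n", next);
			addr = next;
		}
	}

	return 0;
}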

Best regards,

Wolfgang Denk
Wolfgang Denk July 21, 2020, 6:51 a.m. UTC | #6
Dear Heinrich,

In message <4d5aaf77-378d-4ac2-0008-599ceeeb95ed@gmx.de> you wrote:
>

> > Are you sure?  How is this supposed to work?

> >

> > Running it with start = 0 and size = 0xC0000000 it will test the

> > memory locations

> >

> >         0x80000000

> >         0xA0000000

> >         0xB0000000

> >         0xB8000000

> >         0xBC000000

> >         0xBE000000

> >         0xBF000000

> >         0xBF800000

> >         0xBFC00000

> >         0xBFE00000

> >         0xBFF00000

> >         0xBFF80000

> >         0xBFFC0000

> >         0xBFFE0000

> >         0xBFFF0000

> >         0xBFFF8000

> >         0xBFFFC000

> >         0xBFFFE000

> >         0xBFFFF000

> >         0xBFFFF800

> >         0xBFFFFC00

> >         0xBFFFFE00

> >         0xBFFFFF00

> >         0xBFFFFF80

> >         0xBFFFFFC0

> >         0xBFFFFFE0

> >         0xBFFFFFF0

> >         0xBFFFFFF8

> >         0xBFFFFFFC

> >         0xBFFFFFFE

> >         0xBFFFFFFF

>

> The last accessible byte is at 0xBFFFFFFF which matches (start + size -

> 1) in your example.

>

> The difference to the current logic is that it does not require start or

> size to be power of two.


Yes, I know, but your test strategy makes no sense.  You have to
check memory at the power-of-two boundaries to find out how big the
RAM actually is and whether your memory accesses may be mirrored at
other locations in the address space.

You are just testing the end of the expected size in some way for
which I cannot find a good explanation.

> If the size input is larger than the actually accessible memory size,
> you will get the actual memory size, e.g. let's assume that the last
> accessible address is 0x3333:

Please understand that this function is not a plain stupid size
test, but there is more intelligence in it.  We have seen many cases
where for example the memory controller was set up for a 256 MiB
address space, but only 64 MiB of memory were installed.  The current
version of the test reliably detects that there are only 64 MiB of
memory present, clearly indicating the problem.

I think your test would happily report seeing 256 MiB of memory and
miss the fact that this is only one of the 4 mirrors of the real
memory - and of course any attempt to run Linux on such a system
would result in funny crashes...

> Now the algorithm returns that the last accessible memory location is
> 0x3332.

Come on.  The purpose of this function as it was designed is to find
the size of installed memory in systems where this may change
because there may be several hardware configurations.  And memory
chips tend to come in sizes that are a power of two.

And we not only test the size, we also verify basic functionality
(including errors like shorts or breaks in the data and address
lines), and all of this runs really fast, so it can be kept in the
production version.


You drop all this very practical functionality for a feature which
is of hypothetical use at best?

> The algorithm runs in O(log(UINT_MAX)) which is the same time complexity
> as for the current algorithm.

Yes, but it makes absolutely no sense to me.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I can't say I've ever been lost, but I was bewildered once for  three
days.                                     - Daniel Boone (Attributed)

Patch

diff --git a/common/memsize.c b/common/memsize.c
index e95c682..776737a 100644
--- a/common/memsize.c
+++ b/common/memsize.c
@@ -6,6 +6,7 @@ 
 
 #include <common.h>
 #include <init.h>
+#include <linux/bitops.h>
 
 DECLARE_GLOBAL_DATA_PTR;
 
@@ -27,14 +28,18 @@  DECLARE_GLOBAL_DATA_PTR;
 long get_ram_size(long *base, long maxsize)
 {
 	volatile long *addr;
-	long           save[BITS_PER_LONG - 1];
-	long           save_base;
-	long           cnt;
-	long           val;
-	long           size;
-	int            i = 0;
+	long save[BITS_PER_LONG - 1];
+	long save_base;
+	long cnt;
+	long val;
+	long size;
+	int i = 0;
+	long n = maxsize / sizeof(long);
 
-	for (cnt = (maxsize / sizeof(long)) >> 1; cnt > 0; cnt >>= 1) {
+	n = __ffs(n);
+	n = BIT(n);
+
+	for (cnt = n >> 1; cnt > 0; cnt >>= 1) {
 		addr = base + cnt;	/* pointer arith! */
 		sync();
 		save[i++] = *addr;
@@ -53,7 +58,7 @@  long get_ram_size(long *base, long maxsize)
 		/* Restore the original data before leaving the function. */
 		sync();
 		*base = save_base;
-		for (cnt = 1; cnt < maxsize / sizeof(long); cnt <<= 1) {
+		for (cnt = 1; cnt < n; cnt <<= 1) {
 			addr  = base + cnt;
 			sync();
 			*addr = save[--i];
@@ -61,7 +66,7 @@  long get_ram_size(long *base, long maxsize)
 		return (0);
 	}
 
-	for (cnt = 1; cnt < maxsize / sizeof(long); cnt <<= 1) {
+	for (cnt = 1; cnt < n; cnt <<= 1) {
 		addr = base + cnt;	/* pointer arith! */
 		val = *addr;
 		*addr = save[--i];
@@ -72,12 +77,13 @@  long get_ram_size(long *base, long maxsize)
 			 * before leaving the function.
 			 */
 			for (cnt <<= 1;
-			     cnt < maxsize / sizeof(long);
+			     cnt < n;
 			     cnt <<= 1) {
 				addr  = base + cnt;
 				*addr = save[--i];
 			}
-			/* warning: don't restore save_base in this case,
+			/*
+			 * warning: don't restore save_base in this case,
 			 * it is already done in the loop because
 			 * base and base+size share the same physical memory
 			 * and *base is saved after *(base+size) modification