diff mbox

[v3,00/18] Add basic Minidump kernel driver support

Message ID 1683133352-10046-1-git-send-email-quic_mojha@quicinc.com
State Superseded
Headers show

Commit Message

Mukesh Ojha May 3, 2023, 5:02 p.m. UTC
Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs. Hence, the
ability to collect accurate data is only a best-effort. The data collected
could be invalid or corrupted, data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of minidump feature is part of Qualcomm's boot firmware code.
It initializes shared memory (SMEM), which is a part of DDR and
allocates a small section of SMEM to minidump table i.e also called
global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
their own table of segments to be included in the minidump and all get
their reference from G-ToC. Each segment/region has some details like
name, physical address and it's size etc. and it could be anywhere
scattered in the DDR.

Existing upstream Qualcomm remoteproc driver[1] already supports minidump
feature for remoteproc instances like ADSP, MODEM, ... where predefined
selective segments of subsystem region can be dumped as part of
coredump collection which generates smaller size artifacts compared to
complete coredump of subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump description,
the Linux driver maintains a ELF header in a segment. This segment
gets updated with section/program header whenever a new entry gets
registered.

Patch 1/18 is very trivial change.
Patch 2/18 moves the minidump specific data structure and macro to
 qcom_minidump.h so that (4/18) qcom minidump driver can use.
Patch 3/18 documents qualcomm minidump guide for users.
Patch 4/18 implements qualcomm minidump kernel driver and exports
 symbol which other minidump kernel client can use.
Patch 5/18 add pending region support for the clients who came for
registration before minidump.
Patch 6/18 add update region support for registered clients.
Patch 7/18 enables the qualcomm minidump driver.
Patch 8/18 Use the exported symbol from minidump driver in qcom_common
 for querying minidump descriptor for a subsystem.
Patch 9-13 add qcom dynamic ramoops region support via a driver which
 adds ramoops platform device and also register existing pstore
 frontend regions.
Patch 14-18 are not new and has already been through 6 versions and
reason of adding here is for minidump testing purpose and it will be rebased
automatically along with new version of minidump series.

Testing of the patches has been done on sm8450 target after enabling config like
CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.
Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
and put the device in download mode) on command prompt.

I have added download patch here numbered from 14/18 to 18/18
Earlier download mode setting patches were sent separately
https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Default storage type is set to via USB, so minidump would be downloaded with the
help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
backed minidump boot firmware support (more can be found patch 3/18)

Below patch [1] is to warm reset Qualcomm device which has upstream qcom
watchdog driver support.

After applying all patches, we can boot the device and can execute
following command.

echo mini > /sys/module/qcom_scm/parameters/download_mode
echo c > /proc/sysrq-trigger

This will make the device go to download mode and collect the minidump on to the
attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
package manager kit).
After that we will see a bunch of predefined registered region as binary blobs files
starts with md_* downloaded on the x86 machine on given location in PCAT tool from
the target device.

A sample client example to dump a linux region has been given in patch 3/18 and as
well as can be seen in patch 12/18.

[1]
--------------------------->8-------------------------------------

commit f1124ccebd47550b4c9627aa162d9cdceba2b76f
Author: Mukesh Ojha <quic_mojha@quicinc.com>
Date:   Thu Mar 16 14:08:35 2023 +0530

    do not merge: watchdog bite on panic

    Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>

--------------------------------------------------------------------------

Changes in v3:
 - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
    - Added platform device support
    - Unregister region support.
 - Added update region for clients.
 - Added pending region support.
 - Modified the documentation guide accordingly.
 - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
   device and also registers ramoops region with minidump.
 - Added download mode patch series with this minidump series. 
    https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
 - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
 - Addressed comments made by [srinivas.kandagatla]
 - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
   region in minidump.
 - Fixed issue reported by kernel test robot.

Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/


Mukesh Ojha (18):
  remoteproc: qcom: Expand MD_* as MINIDUMP_*
  remoteproc: qcom: Move minidump specific data to qcom_minidump.h
  docs: qcom: Add qualcomm minidump guide
  soc: qcom: Add Qualcomm minidump kernel driver
  soc: qcom: minidump: Add pending region registration support
  soc: qcom: minidump: Add update region support
  arm64: defconfig: Enable Qualcomm minidump driver
  remoterproc: qcom: refactor to leverage exported minidump symbol
  soc: qcom: Add qcom's pstore minidump driver support
  dt-bindings: reserved-memory: Add qcom,ramoops-minidump binding
  arm64: dts: qcom: sm8450: Add Qualcomm ramoops minidump node
  soc: qcom: Register pstore frontend region with minidump
  arm64: defconfig: Enable Qualcomm pstore minidump client driver
  firmware: qcom_scm: provide a read-modify-write function
  pinctrl: qcom: Use qcom_scm_io_update_field()
  firmware: scm: Modify only the download bits in TCSR register
  firmware: qcom_scm: Refactor code to support multiple download mode
  firmware: qcom_scm: Add multiple download mode support

 Documentation/admin-guide/qcom_minidump.rst        | 246 +++++++
 .../reserved-memory/qcom,ramoops-minidump.yaml     |  69 ++
 arch/arm64/boot/dts/qcom/sm8450.dtsi               |  11 +
 arch/arm64/configs/defconfig                       |   2 +
 drivers/firmware/Kconfig                           |  11 -
 drivers/firmware/qcom_scm.c                        |  88 ++-
 drivers/pinctrl/qcom/pinctrl-msm.c                 |  11 +-
 drivers/remoteproc/qcom_common.c                   |  75 +--
 drivers/soc/qcom/Kconfig                           |  25 +
 drivers/soc/qcom/Makefile                          |   2 +
 drivers/soc/qcom/qcom_minidump.c                   | 724 +++++++++++++++++++++
 drivers/soc/qcom/qcom_pstore_minidump.c            | 194 ++++++
 drivers/soc/qcom/smem.c                            |   8 +
 include/linux/firmware/qcom/qcom_scm.h             |   2 +
 include/soc/qcom/qcom_minidump.h                   | 130 ++++
 15 files changed, 1503 insertions(+), 95 deletions(-)
 create mode 100644 Documentation/admin-guide/qcom_minidump.rst
 create mode 100644 Documentation/devicetree/bindings/reserved-memory/qcom,ramoops-minidump.yaml
 create mode 100644 drivers/soc/qcom/qcom_minidump.c
 create mode 100644 drivers/soc/qcom/qcom_pstore_minidump.c
 create mode 100644 include/soc/qcom/qcom_minidump.h

Comments

Krzysztof Kozlowski May 4, 2023, 11:23 a.m. UTC | #1
On 03/05/2023 19:02, Mukesh Ojha wrote:
> Previous patches add the Qualcomm minidump driver support, so
> lets enable minidump config so that it can be used by kernel
> clients.
> 
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>

This patchset is split too much. Defconfig change is one change. Not two
or three.

> ---
>  arch/arm64/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index a24609e..831c942 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m
>  CONFIG_QCOM_WCNSS_CTRL=m
>  CONFIG_QCOM_APR=m
>  CONFIG_QCOM_ICC_BWMON=m
> +CONFIG_QCOM_MINIDUMP=y

This must be a module.

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 11:26 a.m. UTC | #2
On 03/05/2023 19:02, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.

You organized the patch in a way making it very hard for us to review. I
see mixed remoteproc, then soc, then defconfig (!!!), then remote proc,
then soc, then bindings (! they must be before usage...), then dts
(which should be the last), then soc then dts then... You see the point.

Bindings, docs, changes organized by subsystem. Then DTS as separate
patchset with a link to this one. If you have bisectability issues then
it's a hint something is wrongly organized or done and must be fixed anyway.

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 11:36 a.m. UTC | #3
On 03/05/2023 19:02, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined
> data for first level of debugging on end user devices running on
> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> or subsystem part of SoC crashes, due to a range of hardware and
> software bugs. Hence, the ability to collect accurate data is only
> a best-effort. The data collected could be invalid or corrupted,
> data collection itself could fail, and so on.
> 
> Qualcomm devices in engineering mode provides a mechanism for
> generating full system ramdumps for post mortem debugging. But in some
> cases it's however not feasible to capture the entire content of RAM.
> The minidump mechanism provides the means for selecting region should
> be included in the ramdump. The solution supports extracting the
> ramdump/minidump produced either over USB or stored to an attached
> storage device.
> 
> The core of minidump feature is part of Qualcomm's boot firmware code.
> It initializes shared memory(SMEM), which is a part of DDR and
> allocates a small section of it to minidump table i.e also called
> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
> their own table of segments to be included in the minidump, all
> references from a descriptor in SMEM (G-ToC). Each segment/region has
> some details like name, physical address and it's size etc. and it
> could be anywhere scattered in the DDR.
> 
> Minidump kernel driver adds the capability to add linux region to be
> dumped as part of ram dump collection. It provides appropriate symbol
> to check its enablement and register client regions.
> 
> To simplify post mortem debugging, it creates and maintain an ELF
> header as first region that gets updated upon registration
> of a new region.
> 
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
>  drivers/soc/qcom/Kconfig         |  14 +
>  drivers/soc/qcom/Makefile        |   1 +
>  drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++
>  drivers/soc/qcom/smem.c          |   8 +
>  include/soc/qcom/qcom_minidump.h |  61 +++-
>  5 files changed, 663 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/soc/qcom/qcom_minidump.c
> 
> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> index a491718..15c931e 100644
> --- a/drivers/soc/qcom/Kconfig
> +++ b/drivers/soc/qcom/Kconfig
> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE
>  	tristate
>  	select QCOM_SCM
>  
> +config QCOM_MINIDUMP
> +	tristate "QCOM Minidump Support"
> +	depends on ARCH_QCOM || COMPILE_TEST
> +	select QCOM_SMEM
> +	help
> +	  Enablement of core minidump feature is controlled from boot firmware
> +	  side, and this config allow linux to query and manages APPS minidump
> +	  table.
> +
> +	  Client drivers can register their internal data structures and debug
> +	  messages as part of the minidump region and when the SoC is crashed,
> +	  these selective regions will be dumped instead of the entire DDR.
> +	  This saves significant amount of time and/or storage space.
> +
>  endmenu
> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
> index 0f43a88..1ebe081 100644
> --- a/drivers/soc/qcom/Makefile
> +++ b/drivers/soc/qcom/Makefile
> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
>  obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
>  obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
>  obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)	+= ice.o
> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
> new file mode 100644
> index 0000000..d107a86
> --- /dev/null
> +++ b/drivers/soc/qcom/qcom_minidump.c
> @@ -0,0 +1,581 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/elf.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/string.h>
> +#include <linux/soc/qcom/smem.h>
> +#include <soc/qcom/qcom_minidump.h>
> +
> +/**
> + * struct minidump_elfhdr - Minidump table elf header
> + * @ehdr: Elf main header
> + * @shdr: Section header
> + * @phdr: Program header
> + * @elf_offset: Section offset in elf
> + * @strtable_idx: String table current index position
> + */
> +struct minidump_elfhdr {
> +	struct elfhdr		*ehdr;
> +	struct elf_shdr		*shdr;
> +	struct elf_phdr		*phdr;
> +	size_t			elf_offset;
> +	size_t			strtable_idx;
> +};
> +
> +/**
> + * struct minidump - Minidump driver private data
> + * @md_gbl_toc	: Global TOC pointer
> + * @md_apss_toc	: Application Subsystem TOC pointer
> + * @md_regions	: High level OS region base pointer
> + * @elf		: Minidump elf header
> + * @dev		: Minidump device
> + */
> +struct minidump {
> +	struct minidump_global_toc	*md_gbl_toc;
> +	struct minidump_subsystem	*md_apss_toc;
> +	struct minidump_region		*md_regions;
> +	struct minidump_elfhdr		elf;
> +	struct device			*dev;
> +};
> +
> +/*
> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
> + * as total number of supported region (including all co-processors) in
> + * minidump table out of which linux was using 201. In future, this limitation
> + * from boot firmware might get removed by allocating the region dynamically.
> + * So, keep it compatible with older devices, we can keep the current limit for
> + * Linux to 201.
> + */
> +#define MAX_NUM_ENTRIES	  201
> +#define MAX_STRTBL_SIZE	  (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
> +
> +static struct minidump *__md;

No, no file scope or global scope statics.

> +static DEFINE_MUTEX(minidump_lock);

Neither this.

Also you need to clearly express what is protected by newn lock.

> +
> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
> +{
> +	struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);
> +
> +	return &eshdr[idx];
> +}
> +
> +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
> +{
> +	struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
> +
> +	return &ephdr[idx];
> +}
> +
> +static char *elf_str_table_start(struct elfhdr *ehdr)
> +{
> +	struct elf_shdr *eshdr;
> +
> +	if (ehdr->e_shstrndx == SHN_UNDEF)
> +		return NULL;
> +
> +	eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx);
> +	return (char *)ehdr + eshdr->sh_offset;
> +}
> +
> +static char *elf_lookup_string(struct elfhdr *ehdr, int offset)
> +{
> +	char *strtab = elf_str_table_start(ehdr);
> +
> +	if (!strtab || (__md->elf.strtable_idx < offset))
> +		return NULL;
> +
> +	return strtab + offset;
> +}
> +
> +static unsigned int append_str_to_strtable(const char *name)
> +{
> +	char *strtab = elf_str_table_start(__md->elf.ehdr);
> +	unsigned int old_idx = __md->elf.strtable_idx;
> +	unsigned int ret;
> +
> +	if (!strtab || !name)
> +		return 0;
> +
> +	ret = old_idx;
> +	old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH);
> +	__md->elf.strtable_idx = old_idx + 1;
> +	return ret;
> +}
> +
> +static int
> +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region)
> +{
> +	struct minidump_region *mdr;
> +	unsigned int i;
> +	unsigned int count;
> +
> +	count = le32_to_cpu(__md->md_apss_toc->region_count);
> +	for (i = 0; i < count; i++) {
> +		mdr = &__md->md_regions[i];
> +		if (!strcmp(mdr->name, region->name))
> +			return i;
> +	}
> +	return -ENOENT;
> +}
> +
> +static void
> +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region)

You need to name everything shorter. Neither functions nor structs are
making it easy to follow.

> +{
> +	struct elfhdr *ehdr = __md->elf.ehdr;
> +	struct elf_shdr *shdr;
> +	struct elf_phdr *phdr;
> +
> +	shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++);
> +	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
> +
> +	shdr->sh_type = SHT_PROGBITS;
> +	shdr->sh_name = append_str_to_strtable(region->name);
> +	shdr->sh_addr = (elf_addr_t)region->virt_addr;
> +	shdr->sh_size = region->size;
> +	shdr->sh_flags = SHF_WRITE;
> +	shdr->sh_offset = __md->elf.elf_offset;
> +	shdr->sh_entsize = 0;
> +
> +	phdr->p_type = PT_LOAD;
> +	phdr->p_offset = __md->elf.elf_offset;
> +	phdr->p_vaddr = (elf_addr_t)region->virt_addr;
> +	phdr->p_paddr = region->phys_addr;
> +	phdr->p_filesz = phdr->p_memsz = region->size;
> +	phdr->p_flags = PF_R | PF_W;
> +	__md->elf.elf_offset += shdr->sh_size;
> +}
> +
> +static void
> +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region)
> +{
> +	struct minidump_region *mdr;
> +	unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count);
> +
> +	mdr = &__md->md_regions[region_cnt];
> +	strscpy(mdr->name, region->name, sizeof(mdr->name));
> +	mdr->address = cpu_to_le64(region->phys_addr);
> +	mdr->size = cpu_to_le64(region->size);
> +	mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID);
> +	region_cnt++;
> +	__md->md_apss_toc->region_count = cpu_to_le32(region_cnt);
> +}
> +
> +static bool
> +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region)
> +{
> +	return region &&
> +		strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH &&
> +		region->virt_addr &&
> +		region->size &&
> +		IS_ALIGNED(region->size, 4);
> +}
> +
> +static int qcom_apss_minidump_add_elf_header(void)
> +{
> +	struct qcom_apss_minidump_region elfregion;
> +	struct elfhdr *ehdr;
> +	struct elf_shdr *shdr;
> +	struct elf_phdr *phdr;
> +	unsigned int  elfh_size;
> +	unsigned int strtbl_off;
> +	unsigned int phdr_off;
> +	char *banner;
> +	unsigned int banner_len;
> +
> +	banner_len = strlen(linux_banner);
> +	/*
> +	 * Header buffer contains:
> +	 * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers,
> +	 * where, 4 additional entries, one for empty header, one for string table
> +	 * one for minidump table and one for linux banner.
> +	 *
> +	 * Linux banner is stored in minidump to aid post mortem tools to determine
> +	 * the kernel version.
> +	 */
> +	elfh_size = sizeof(*ehdr);
> +	elfh_size += MAX_STRTBL_SIZE;
> +	elfh_size += banner_len + 1;
> +	elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4));
> +	elfh_size = ALIGN(elfh_size, 4);
> +
> +	__md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL);
> +	if (!__md->elf.ehdr)
> +		return -ENOMEM;
> +
> +	/* Register ELF header as first region */
> +	strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name));
> +	elfregion.virt_addr = __md->elf.ehdr;
> +	elfregion.phys_addr = virt_to_phys(__md->elf.ehdr);
> +	elfregion.size = elfh_size;
> +	qcom_apss_minidump_add_region(&elfregion);
> +
> +	ehdr = __md->elf.ehdr;
> +	/* Assign Section/Program headers offset */
> +	__md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1);
> +	__md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES);
> +	phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES);
> +
> +	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
> +	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
> +	ehdr->e_ident[EI_DATA] = ELF_DATA;
> +	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
> +	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
> +	ehdr->e_type = ET_CORE;
> +	ehdr->e_machine  = ELF_ARCH;
> +	ehdr->e_version = EV_CURRENT;
> +	ehdr->e_ehsize = sizeof(*ehdr);
> +	ehdr->e_phoff = phdr_off;
> +	ehdr->e_phentsize = sizeof(*phdr);
> +	ehdr->e_shoff = sizeof(*ehdr);
> +	ehdr->e_shentsize = sizeof(*shdr);
> +	ehdr->e_shstrndx = 1;
> +
> +	__md->elf.elf_offset = elfh_size;
> +
> +	/*
> +	 * The zeroth index of the section header is reserved and is rarely used.
> +	 * Set the section header as null (SHN_UNDEF) and move to the next one.
> +	 * 2nd Section is String table.
> +	 */
> +	__md->elf.strtable_idx = 1;
> +	strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES);
> +	shdr++;
> +	shdr->sh_type = SHT_STRTAB;
> +	shdr->sh_offset = (elf_addr_t)strtbl_off;
> +	shdr->sh_size = MAX_STRTBL_SIZE;
> +	shdr->sh_entsize = 0;
> +	shdr->sh_flags = 0;
> +	shdr->sh_name = append_str_to_strtable("STR_TBL");
> +	shdr++;
> +
> +	/* 3rd Section is Linux banner */
> +	banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE;
> +	memcpy(banner, linux_banner, banner_len);
> +
> +	shdr->sh_type = SHT_PROGBITS;
> +	shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
> +	shdr->sh_size = banner_len + 1;
> +	shdr->sh_addr = (elf_addr_t)linux_banner;
> +	shdr->sh_entsize = 0;
> +	shdr->sh_flags = SHF_WRITE;
> +	shdr->sh_name = append_str_to_strtable("linux_banner");
> +
> +	phdr->p_type = PT_LOAD;
> +	phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
> +	phdr->p_vaddr = (elf_addr_t)linux_banner;
> +	phdr->p_paddr = virt_to_phys(linux_banner);
> +	phdr->p_filesz = phdr->p_memsz = banner_len + 1;
> +	phdr->p_flags = PF_R | PF_W;
> +
> +	/*
> +	 * Above are some prdefined sections/program header used
> +	 * for debug, update their count here.
> +	 */
> +	ehdr->e_phnum = 1;
> +	ehdr->e_shnum = 3;
> +
> +	return 0;
> +}
> +
> +/**
> + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor.
> + * @minidump_index: minidump index for a subsystem in minidump table
> + *
> + * Return: minidump subsystem descriptor address on success and error
> + * on failure
> + */
> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
> +{
> +	struct minidump_subsystem *md_ss_toc;
> +
> +	mutex_lock(&minidump_lock);
> +	if (!__md) {
> +		md_ss_toc = ERR_PTR(-EPROBE_DEFER);
> +		goto unlock;
> +	}
> +
> +	md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index];
> +unlock:
> +	mutex_unlock(&minidump_lock);
> +	return md_ss_toc;
> +}
> +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc);
> +
> +/**
> + * qcom_apss_minidump_region_register() - Register a region in Minidump table.
> + * @region: minidump region.
> + *
> + * Return: On success, it returns 0, otherwise a negative error value on failure.
> + */
> +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
> +{
> +	unsigned int num_region;
> +	int ret;
> +
> +	if (!__md)
> +		return -EPROBE_DEFER;
> +
> +	if (!qcom_apss_minidump_valid_region(region))
> +		return -EINVAL;
> +
> +	mutex_lock(&minidump_lock);
> +	ret = get_apss_minidump_region_index(region);
> +	if (ret >= 0) {
> +		dev_info(__md->dev, "%s region is already registered\n", region->name);
> +		ret = -EEXIST;
> +		goto unlock;
> +	}
> +
> +	/* Check if there is a room for a new entry */
> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
> +	if (num_region >= MAX_NUM_ENTRIES) {
> +		dev_err(__md->dev, "maximum region limit %u reached\n", num_region);
> +		ret = -ENOSPC;
> +		goto unlock;
> +	}
> +
> +	qcom_apss_minidump_add_region(region);
> +	qcom_apss_minidump_update_elf_header(region);
> +	ret = 0;
> +unlock:
> +	mutex_unlock(&minidump_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register);
> +
> +static int
> +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region)
> +{
> +	struct elfhdr *ehdr = __md->elf.ehdr;
> +	struct elf_shdr *shdr;
> +	struct elf_shdr *tmp_shdr;
> +	struct elf_phdr *phdr;
> +	struct elf_phdr *tmp_phdr;
> +	unsigned int phidx;
> +	unsigned int shidx;
> +	unsigned int len;
> +	unsigned int i;
> +	char *shname;
> +
> +	for (i = 0; i < ehdr->e_phnum; i++) {
> +		phdr = elf_phdr_entry_addr(ehdr, i);
> +		if (phdr->p_paddr == region->phys_addr &&
> +		    phdr->p_memsz == region->size)
> +			break;
> +	}
> +
> +	if (i == ehdr->e_phnum) {
> +		dev_err(__md->dev, "Cannot find program header entry in elf\n");
> +		return -EINVAL;
> +	}
> +
> +	phidx = i;
> +	for (i = 0; i < ehdr->e_shnum; i++) {
> +		shdr = elf_shdr_entry_addr(ehdr, i);
> +		shname = elf_lookup_string(ehdr, shdr->sh_name);
> +		if (shname && !strcmp(shname, region->name) &&
> +		    shdr->sh_addr == (elf_addr_t)region->virt_addr &&
> +		    shdr->sh_size == region->size)
> +			break;
> +	}
> +
> +	if (i == ehdr->e_shnum) {
> +		dev_err(__md->dev, "Cannot find section header entry in elf\n");
> +		return -EINVAL;
> +	}
> +
> +	shidx = i;
> +	if (shdr->sh_offset != phdr->p_offset) {
> +		dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name);
> +		return -EINVAL;
> +	}
> +
> +	/* Clear name in string table */
> +	len = strlen(shname) + 1;
> +	memmove(shname, shname + len,
> +		__md->elf.strtable_idx - shdr->sh_name - len);
> +	__md->elf.strtable_idx -= len;
> +
> +	/* Clear program header */
> +	tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
> +	for (i = phidx; i < ehdr->e_phnum - 1; i++) {
> +		tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
> +		phdr = elf_phdr_entry_addr(ehdr, i);
> +		memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr));
> +		phdr->p_offset = phdr->p_offset - region->size;
> +	}
> +	memset(tmp_phdr, 0, sizeof(struct elf_phdr));
> +	ehdr->e_phnum--;
> +
> +	/* Clear section header */
> +	tmp_shdr = elf_shdr_entry_addr(ehdr, shidx);
> +	for (i = shidx; i < ehdr->e_shnum - 1; i++) {
> +		tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1);
> +		shdr = elf_shdr_entry_addr(ehdr, i);
> +		memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr));
> +		shdr->sh_offset -= region->size;
> +		shdr->sh_name -= len;
> +	}
> +
> +	memset(tmp_shdr, 0, sizeof(struct elf_shdr));
> +	ehdr->e_shnum--;
> +	__md->elf.elf_offset -= region->size;
> +
> +	return 0;
> +}
> +
> +/**
> + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table.
> + * @region: minidump region.
> + *
> + * Return: On success, it returns 0 and negative error value on failure.
> + */
> +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
> +{
> +	struct minidump_region *mdr;
> +	unsigned int num_region;
> +	unsigned int idx;
> +	int ret;
> +
> +	if (!region)
> +		return -EINVAL;
> +
> +	mutex_lock(&minidump_lock);
> +	if (!__md) {
> +		ret = -EPROBE_DEFER;
> +		goto unlock;
> +	}
> +
> +	idx = get_apss_minidump_region_index(region);
> +	if (idx < 0) {
> +		dev_err(__md->dev, "%s region is not present\n", region->name);
> +		ret = idx;
> +		goto unlock;
> +	}
> +
> +	mdr = &__md->md_regions[0];
> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
> +	/*
> +	 * Left shift all the regions exist after this removed region
> +	 * index by 1 to fill the gap and zero out the last region
> +	 * present at the end.
> +	 */
> +	memmove(&mdr[idx], &mdr[idx + 1],
> +		(num_region - idx - 1) * sizeof(struct minidump_region));
> +	memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region));
> +	ret = qcom_apss_minidump_clear_header(region);
> +	if (ret) {
> +		dev_err(__md->dev, "Failed to remove region: %s\n", region->name);
> +		goto unlock;
> +	}
> +
> +	num_region--;
> +	__md->md_apss_toc->region_count = cpu_to_le32(num_region);
> +unlock:
> +	mutex_unlock(&minidump_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister);
> +
> +static int qcom_minidump_init_apss_subsystem(struct minidump *md)
> +{
> +	struct minidump_subsystem *apsstoc;
> +
> +	apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC];
> +	md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
> +				      sizeof(struct minidump_region), GFP_KERNEL);
> +	if (!md->md_regions)
> +		return -ENOMEM;
> +
> +	md->md_apss_toc = apsstoc;
> +	apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions));
> +	apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
> +	apsstoc->status = cpu_to_le32(1);
> +	apsstoc->region_count = cpu_to_le32(0);
> +
> +	/* Tell bootloader not to encrypt the regions of this subsystem */
> +	apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
> +	apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
> +
> +	return 0;
> +}
> +
> +static int qcom_minidump_probe(struct platform_device *pdev)
> +{
> +	struct minidump_global_toc *mdgtoc;
> +	struct minidump *md;
> +	size_t size;
> +	int ret;
> +
> +	md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL);
> +	if (!md)
> +		return -ENOMEM;
> +
> +	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
> +	if (IS_ERR(mdgtoc)) {
> +		ret = PTR_ERR(mdgtoc);
> +		dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret);
> +		return ret;
> +	}
> +
> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
> +		ret = -EINVAL;
> +		dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret);
> +		return ret;
> +	}
> +
> +	mutex_lock(&minidump_lock);
> +	md->dev = &pdev->dev;
> +	md->md_gbl_toc = mdgtoc;

What are you protecting here? It's not possible to have concurrent
access to md, is it?

> +	ret = qcom_minidump_init_apss_subsystem(md);
> +	if (ret) {
> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
> +		goto unlock;
> +	}
> +
> +	__md = md;

No. This is a platform device, so it can have multiple instances.

> +	/* First entry would be ELF header */
> +	ret = qcom_apss_minidump_add_elf_header();
> +	if (ret) {
> +		dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret);
> +		memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
> +		__md = NULL;
> +	}
> +
> +unlock:
> +	mutex_unlock(&minidump_lock);
> +	return ret;
> +}
> +
> +static int qcom_minidump_remove(struct platform_device *pdev)
> +{
> +	memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
> +	__md = NULL;

Don't use __ in variable names. Drop it everywhere.

> +
> +	return 0;
> +}
> +
> +static struct platform_driver qcom_minidump_driver = {
> +	.probe = qcom_minidump_probe,
> +	.remove = qcom_minidump_remove,
> +	.driver  = {
> +		.name = "qcom-minidump",
> +	},
> +};
> +
> +module_platform_driver(qcom_minidump_driver);
> +
> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
> +MODULE_LICENSE("GPL v2");
> +MODULE_ALIAS("platform:qcom-minidump");
> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
> index 6be7ea9..d459656 100644
> --- a/drivers/soc/qcom/smem.c
> +++ b/drivers/soc/qcom/smem.c
> @@ -279,6 +279,7 @@ struct qcom_smem {
>  
>  	u32 item_count;
>  	struct platform_device *socinfo;
> +	struct platform_device *minidump;
>  	struct smem_ptable *ptable;
>  	struct smem_partition global_partition;
>  	struct smem_partition partitions[SMEM_HOST_COUNT];
> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev)
>  	if (IS_ERR(smem->socinfo))
>  		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
>  
> +	smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump",
> +						      PLATFORM_DEVID_NONE, NULL,
> +						      0);
> +	if (IS_ERR(smem->minidump))
> +		dev_dbg(&pdev->dev, "failed to register minidump device\n");
> +
>  	return 0;
>  }
>  
>  static int qcom_smem_remove(struct platform_device *pdev)
>  {
>  	platform_device_unregister(__smem->socinfo);
> +	platform_device_unregister(__smem->minidump);

Wrong order. You registered first socinfo, right?

>  
>  	hwspin_lock_free(__smem->hwlock);
>  	__smem = NULL;
> diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h
> index 84c8605..1872668 100644
> --- a/include/soc/qcom/qcom_minidump.h
> +++ b/include/soc/qcom/qcom_minidump.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0-only */
>  /*
> - * Qualcomm minidump shared data structures and macros
> + * This file contain Qualcomm minidump data structures and macros shared with
> + * boot firmware and also apss minidump client's data structure
>   *
>   * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>   */
> @@ -9,12 +10,27 @@
>  #define _QCOM_MINIDUMP_H_
>  
>  #define MAX_NUM_OF_SS           10
> +#define MAX_NAME_LENGTH		12
>  #define MAX_REGION_NAME_LENGTH  16
> +
> +#define MINIDUMP_REVISION	1
>  #define SBL_MINIDUMP_SMEM_ID	602
> +
> +/* Application processor minidump descriptor */
> +#define MINIDUMP_APSS_DESC	0
> +#define SMEM_ENTRY_SIZE		40
> +
>  #define MINIDUMP_REGION_VALID		('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
> +#define MINIDUMP_REGION_INVALID		('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0)
> +#define MINIDUMP_REGION_INIT		('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0)
> +#define MINIDUMP_REGION_NOINIT		0
> +
> +#define MINIDUMP_SS_ENCR_REQ		(0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0)
> +#define MINIDUMP_SS_ENCR_NOTREQ		(0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0)
> +#define MINIDUMP_SS_ENCR_NONE		('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
>  #define MINIDUMP_SS_ENCR_DONE		('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
> +#define MINIDUMP_SS_ENCR_START		('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0)
>  #define MINIDUMP_SS_ENABLED		('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
> -

Why removing this?

>  /**
>   * struct minidump_region - Minidump region
>   * @name		: Name of the region to be dumped
> @@ -63,4 +79,45 @@ struct minidump_global_toc {
>  	struct minidump_subsystem	subsystems[MAX_NUM_OF_SS];
>  };
>  
> +/**
> + * struct qcom_apss_minidump_region - APSS Minidump region information
> + *
> + * @name:	Entry name, Minidump will dump binary with this name.
> + * @virt_addr:  Virtual address of the entry.
> + * @phys_addr:	Physical address of the entry to dump.
> + * @size:	Number of byte to dump from @address location,
> + *		and it should be 4 byte aligned.
> + */
> +struct qcom_apss_minidump_region {
> +	char		name[MAX_NAME_LENGTH];
> +	void		*virt_addr;
> +	phys_addr_t	phys_addr;
> +	size_t		size;
> +};

You expose way too much internals in global header.

> +
> +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP)
> +extern struct minidump_subsystem *

No externs.

The header is unreadable.

> +qcom_minidump_subsystem_desc(unsigned int minidump_index);
> +extern int
> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region);
> +extern int
> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region);

Blank line

> +#else

Blank line

> +static inline
> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
> +{
> +	return NULL;
> +}

Blank line

> +static inline int
> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
> +{
> +	/* Return quietly, if minidump is not enabled */
> +	return 0;
> +}


> +static inline int
> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
> +{
> +	return 0;
> +}

> +#endif

/* CONFIG_QCOM_MINIDUMP */

>  #endif  /* _QCOM_MINIDUMP_H_ */

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 11:38 a.m. UTC | #4
On 03/05/2023 19:02, Mukesh Ojha wrote:
> Move minidump specific data types and macros to a separate internal
> header(qcom_minidump.h) so that it can be shared among different
> Qualcomm drivers.

No, this is not internal header. You moved it to global header.

There is no reason driver internals should be exposed to other unrelated
subsystems.

> 
> There is no change in functional behavior after this.

It is. You made all these internal symbols available to others.

> 

This comes without justification why other drivers needs to access
private and internal data. It does not look correct design. NAK.

Best regards,
Krzysztof
Mukesh Ojha May 4, 2023, 11:45 a.m. UTC | #5
On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote:
> On 03/05/2023 19:02, Mukesh Ojha wrote:
>> Previous patches add the Qualcomm minidump driver support, so
>> lets enable minidump config so that it can be used by kernel
>> clients.
>>
>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> 
> This patchset is split too much. Defconfig change is one change. Not two
> or three.
> 
>> ---
>>   arch/arm64/configs/defconfig | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>> index a24609e..831c942 100644
>> --- a/arch/arm64/configs/defconfig
>> +++ b/arch/arm64/configs/defconfig
>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m
>>   CONFIG_QCOM_WCNSS_CTRL=m
>>   CONFIG_QCOM_APR=m
>>   CONFIG_QCOM_ICC_BWMON=m
>> +CONFIG_QCOM_MINIDUMP=y
> 
> This must be a module.

Why do you think this should be a module ?

Is it because, it is lying here among others '=m' ?

Or you have some other reasoning ? like it is for qcom specific
soc and can not be used outside ? but that is not true for
all configs mentioned here.

The reason behind making it as '=y' was, to collect information from 
core kernel data structure as well as the information like percpu data, 
run queue, irq stat kind of information on kernel crash on a target 
running some perf configuration(android phone).

-- Mukesh

> 
> Best regards,
> Krzysztof
>
Mukesh Ojha May 4, 2023, 11:58 a.m. UTC | #6
On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
> On 03/05/2023 19:02, Mukesh Ojha wrote:
>> Move minidump specific data types and macros to a separate internal
>> header(qcom_minidump.h) so that it can be shared among different
>> Qualcomm drivers.
> 
> No, this is not internal header. You moved it to global header.
> 
> There is no reason driver internals should be exposed to other unrelated
> subsystems.
> 
>>
>> There is no change in functional behavior after this.
> 
> It is. You made all these internal symbols available to others.
> 
>>
> 
> This comes without justification why other drivers needs to access
> private and internal data. It does not look correct design. NAK.

Thanks for catching outdated commit text, will fix the commit with
more descriptive reasoning.

It has to be global so that co-processor minidump and apss minidump can
share data structure and they are lying in different directory.

-Mukesh

> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski May 4, 2023, 12:03 p.m. UTC | #7
On 04/05/2023 13:58, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>> Move minidump specific data types and macros to a separate internal
>>> header(qcom_minidump.h) so that it can be shared among different
>>> Qualcomm drivers.
>>
>> No, this is not internal header. You moved it to global header.
>>
>> There is no reason driver internals should be exposed to other unrelated
>> subsystems.
>>
>>>
>>> There is no change in functional behavior after this.
>>
>> It is. You made all these internal symbols available to others.
>>
>>>
>>
>> This comes without justification why other drivers needs to access
>> private and internal data. It does not look correct design. NAK.
> 
> Thanks for catching outdated commit text, will fix the commit with
> more descriptive reasoning.
> 
> It has to be global so that co-processor minidump and apss minidump can
> share data structure and they are lying in different directory.
> 

Then you should not share all the internals of memory layout but only
few pieces necessary to talk with minidump driver. The minidump driver
should organize everything how it wants.

Best regards,
Krzysztof
Mukesh Ojha May 4, 2023, 12:26 p.m. UTC | #8
On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote:
> On 04/05/2023 13:58, Mukesh Ojha wrote:
>>
>>
>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>> Move minidump specific data types and macros to a separate internal
>>>> header(qcom_minidump.h) so that it can be shared among different
>>>> Qualcomm drivers.
>>>
>>> No, this is not internal header. You moved it to global header.
>>>
>>> There is no reason driver internals should be exposed to other unrelated
>>> subsystems.
>>>
>>>>
>>>> There is no change in functional behavior after this.
>>>
>>> It is. You made all these internal symbols available to others.
>>>
>>>>
>>>
>>> This comes without justification why other drivers needs to access
>>> private and internal data. It does not look correct design. NAK.
>>
>> Thanks for catching outdated commit text, will fix the commit with
>> more descriptive reasoning.
>>
>> It has to be global so that co-processor minidump and apss minidump can
>> share data structure and they are lying in different directory.
>>
> 
> Then you should not share all the internals of memory layout but only
> few pieces necessary to talk with minidump driver. The minidump driver
> should organize everything how it wants.

These are core data structure which is shared with boot firmware and the
one's are moved here all are required by minidump driver .

If you follow here[1], i raised by concern to make this particular one's
as private and later to avoid confusion went with single header.
But if others agree, I will keep the one that get shared with minidump
as separate one or if relative path of headers are allowed that can make
it private between these drivers(which i don't think, will be allowed or
recommended).

[1]
https://lore.kernel.org/lkml/3df1ec27-7e4d-1f84-ff20-94e8ea91c86f@quicinc.com/

-- Mukesh
> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski May 4, 2023, 12:32 p.m. UTC | #9
On 04/05/2023 13:45, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote:
>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>> Previous patches add the Qualcomm minidump driver support, so
>>> lets enable minidump config so that it can be used by kernel
>>> clients.
>>>
>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>
>> This patchset is split too much. Defconfig change is one change. Not two
>> or three.
>>
>>> ---
>>>   arch/arm64/configs/defconfig | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>>> index a24609e..831c942 100644
>>> --- a/arch/arm64/configs/defconfig
>>> +++ b/arch/arm64/configs/defconfig
>>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m
>>>   CONFIG_QCOM_WCNSS_CTRL=m
>>>   CONFIG_QCOM_APR=m
>>>   CONFIG_QCOM_ICC_BWMON=m
>>> +CONFIG_QCOM_MINIDUMP=y
>>
>> This must be a module.
> 
> Why do you think this should be a module ?
> 
> Is it because, it is lying here among others '=m' ?

Because we want and insist on everything being a module. That's the
generic rule. There are exceptions, so if this justifies being an
exception, please bring appropriate arguments.

> 
> Or you have some other reasoning ? like it is for qcom specific
> soc and can not be used outside ? but that is not true for
> all configs mentioned here.
> 
> The reason behind making it as '=y' was, to collect information from 
> core kernel data structure as well as the information like percpu data, 
> run queue, irq stat kind of information on kernel crash on a target 
> running some perf configuration(android phone).

I don't understand why =m stops you from all that. What's more, I don't
understand why do you refer to the Android here. This is a development
and debugging Linux defconfig, not Android reference config for vendors...

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 12:36 p.m. UTC | #10
On 04/05/2023 14:26, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote:
>> On 04/05/2023 13:58, Mukesh Ojha wrote:
>>>
>>>
>>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
>>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>>> Move minidump specific data types and macros to a separate internal
>>>>> header(qcom_minidump.h) so that it can be shared among different
>>>>> Qualcomm drivers.
>>>>
>>>> No, this is not internal header. You moved it to global header.
>>>>
>>>> There is no reason driver internals should be exposed to other unrelated
>>>> subsystems.
>>>>
>>>>>
>>>>> There is no change in functional behavior after this.
>>>>
>>>> It is. You made all these internal symbols available to others.
>>>>
>>>>>
>>>>
>>>> This comes without justification why other drivers needs to access
>>>> private and internal data. It does not look correct design. NAK.
>>>
>>> Thanks for catching outdated commit text, will fix the commit with
>>> more descriptive reasoning.
>>>
>>> It has to be global so that co-processor minidump and apss minidump can
>>> share data structure and they are lying in different directory.
>>>
>>
>> Then you should not share all the internals of memory layout but only
>> few pieces necessary to talk with minidump driver. The minidump driver
>> should organize everything how it wants.
> 
> These are core data structure which is shared with boot firmware and the
> one's are moved here all are required by minidump driver .

I am not sure if I understand correctly. If they are all required by
minidump driver, then this must not be in include, but stay with
minidump. Remoteproc then should not touch it.

I don't understand why internals of minidump should be important for
remoteproc. If they are, means you broken encapsulation.

> 
> If you follow here[1], i raised by concern to make this particular one's
> as private and later to avoid confusion went with single header.
> But if others agree, I will keep the one that get shared with minidump
> as separate one or if relative path of headers are allowed that can make
> it private between these drivers(which i don't think, will be allowed or
> recommended).

Let's be specific: why MD_REGION_VALID must be available for remoteproc
or any other driver after introducing qcom minidump driver?


Best regards,
Krzysztof
Mukesh Ojha May 4, 2023, 12:38 p.m. UTC | #11
On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote:
> On 03/05/2023 19:02, Mukesh Ojha wrote:
>> Minidump is a best effort mechanism to collect useful and predefined
>> data for first level of debugging on end user devices running on
>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>> or subsystem part of SoC crashes, due to a range of hardware and
>> software bugs. Hence, the ability to collect accurate data is only
>> a best-effort. The data collected could be invalid or corrupted,
>> data collection itself could fail, and so on.
>>
>> Qualcomm devices in engineering mode provides a mechanism for
>> generating full system ramdumps for post mortem debugging. But in some
>> cases it's however not feasible to capture the entire content of RAM.
>> The minidump mechanism provides the means for selecting region should
>> be included in the ramdump. The solution supports extracting the
>> ramdump/minidump produced either over USB or stored to an attached
>> storage device.
>>
>> The core of minidump feature is part of Qualcomm's boot firmware code.
>> It initializes shared memory(SMEM), which is a part of DDR and
>> allocates a small section of it to minidump table i.e also called
>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
>> their own table of segments to be included in the minidump, all
>> references from a descriptor in SMEM (G-ToC). Each segment/region has
>> some details like name, physical address and it's size etc. and it
>> could be anywhere scattered in the DDR.
>>
>> Minidump kernel driver adds the capability to add linux region to be
>> dumped as part of ram dump collection. It provides appropriate symbol
>> to check its enablement and register client regions.
>>
>> To simplify post mortem debugging, it creates and maintain an ELF
>> header as first region that gets updated upon registration
>> of a new region.
>>
>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>> ---
>>   drivers/soc/qcom/Kconfig         |  14 +
>>   drivers/soc/qcom/Makefile        |   1 +
>>   drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++
>>   drivers/soc/qcom/smem.c          |   8 +
>>   include/soc/qcom/qcom_minidump.h |  61 +++-
>>   5 files changed, 663 insertions(+), 2 deletions(-)
>>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>>
>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>> index a491718..15c931e 100644
>> --- a/drivers/soc/qcom/Kconfig
>> +++ b/drivers/soc/qcom/Kconfig
>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE
>>   	tristate
>>   	select QCOM_SCM
>>   
>> +config QCOM_MINIDUMP
>> +	tristate "QCOM Minidump Support"
>> +	depends on ARCH_QCOM || COMPILE_TEST
>> +	select QCOM_SMEM
>> +	help
>> +	  Enablement of core minidump feature is controlled from boot firmware
>> +	  side, and this config allow linux to query and manages APPS minidump
>> +	  table.
>> +
>> +	  Client drivers can register their internal data structures and debug
>> +	  messages as part of the minidump region and when the SoC is crashed,
>> +	  these selective regions will be dumped instead of the entire DDR.
>> +	  This saves significant amount of time and/or storage space.
>> +
>>   endmenu
>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>> index 0f43a88..1ebe081 100644
>> --- a/drivers/soc/qcom/Makefile
>> +++ b/drivers/soc/qcom/Makefile
>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
>>   obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
>>   obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
>>   obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)	+= ice.o
>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
>> new file mode 100644
>> index 0000000..d107a86
>> --- /dev/null
>> +++ b/drivers/soc/qcom/qcom_minidump.c
>> @@ -0,0 +1,581 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +/*
>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>> + */
>> +
>> +#include <linux/elf.h>
>> +#include <linux/err.h>
>> +#include <linux/errno.h>
>> +#include <linux/export.h>
>> +#include <linux/init.h>
>> +#include <linux/io.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/string.h>
>> +#include <linux/soc/qcom/smem.h>
>> +#include <soc/qcom/qcom_minidump.h>
>> +
>> +/**
>> + * struct minidump_elfhdr - Minidump table elf header
>> + * @ehdr: Elf main header
>> + * @shdr: Section header
>> + * @phdr: Program header
>> + * @elf_offset: Section offset in elf
>> + * @strtable_idx: String table current index position
>> + */
>> +struct minidump_elfhdr {
>> +	struct elfhdr		*ehdr;
>> +	struct elf_shdr		*shdr;
>> +	struct elf_phdr		*phdr;
>> +	size_t			elf_offset;
>> +	size_t			strtable_idx;
>> +};
>> +
>> +/**
>> + * struct minidump - Minidump driver private data
>> + * @md_gbl_toc	: Global TOC pointer
>> + * @md_apss_toc	: Application Subsystem TOC pointer
>> + * @md_regions	: High level OS region base pointer
>> + * @elf		: Minidump elf header
>> + * @dev		: Minidump device
>> + */
>> +struct minidump {
>> +	struct minidump_global_toc	*md_gbl_toc;
>> +	struct minidump_subsystem	*md_apss_toc;
>> +	struct minidump_region		*md_regions;
>> +	struct minidump_elfhdr		elf;
>> +	struct device			*dev;
>> +};
>> +
>> +/*
>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
>> + * as total number of supported region (including all co-processors) in
>> + * minidump table out of which linux was using 201. In future, this limitation
>> + * from boot firmware might get removed by allocating the region dynamically.
>> + * So, keep it compatible with older devices, we can keep the current limit for
>> + * Linux to 201.
>> + */
>> +#define MAX_NUM_ENTRIES	  201
>> +#define MAX_STRTBL_SIZE	  (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
>> +
>> +static struct minidump *__md;
> 
> No, no file scope or global scope statics.

Sorry, this is done as per recommendation given here [1] and this 
matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c
implementations.

[1]
https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/

> 
>> +static DEFINE_MUTEX(minidump_lock);
> 
> Neither this.
> 
> Also you need to clearly express what is protected by newn lock.

Sure, will keep proper reasoning why global lock reasoning.

"
Since it is possible to call this file's exported function before
this driver's probe get called or it's probe is happening in parallel
so it needs global lock to protect this case.

"

> 
>> +
>> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
>> +{
>> +	struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);
>> +
>> +	return &eshdr[idx];
>> +}
>> +
>> +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
>> +{
>> +	struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
>> +
>> +	return &ephdr[idx];
>> +}
>> +
>> +static char *elf_str_table_start(struct elfhdr *ehdr)
>> +{
>> +	struct elf_shdr *eshdr;
>> +
>> +	if (ehdr->e_shstrndx == SHN_UNDEF)
>> +		return NULL;
>> +
>> +	eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx);
>> +	return (char *)ehdr + eshdr->sh_offset;
>> +}
>> +
>> +static char *elf_lookup_string(struct elfhdr *ehdr, int offset)
>> +{
>> +	char *strtab = elf_str_table_start(ehdr);
>> +
>> +	if (!strtab || (__md->elf.strtable_idx < offset))
>> +		return NULL;
>> +
>> +	return strtab + offset;
>> +}
>> +
>> +static unsigned int append_str_to_strtable(const char *name)
>> +{
>> +	char *strtab = elf_str_table_start(__md->elf.ehdr);
>> +	unsigned int old_idx = __md->elf.strtable_idx;
>> +	unsigned int ret;
>> +
>> +	if (!strtab || !name)
>> +		return 0;
>> +
>> +	ret = old_idx;
>> +	old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH);
>> +	__md->elf.strtable_idx = old_idx + 1;
>> +	return ret;
>> +}
>> +
>> +static int
>> +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int i;
>> +	unsigned int count;
>> +
>> +	count = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	for (i = 0; i < count; i++) {
>> +		mdr = &__md->md_regions[i];
>> +		if (!strcmp(mdr->name, region->name))
>> +			return i;
>> +	}
>> +	return -ENOENT;
>> +}
>> +
>> +static void
>> +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region)
> 
> You need to name everything shorter. Neither functions nor structs are
> making it easy to follow.

Sure !!

Thought of being very much descriptive here :-)

> 
>> +{
>> +	struct elfhdr *ehdr = __md->elf.ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_phdr *phdr;
>> +
>> +	shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++);
>> +	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
>> +
>> +	shdr->sh_type = SHT_PROGBITS;
>> +	shdr->sh_name = append_str_to_strtable(region->name);
>> +	shdr->sh_addr = (elf_addr_t)region->virt_addr;
>> +	shdr->sh_size = region->size;
>> +	shdr->sh_flags = SHF_WRITE;
>> +	shdr->sh_offset = __md->elf.elf_offset;
>> +	shdr->sh_entsize = 0;
>> +
>> +	phdr->p_type = PT_LOAD;
>> +	phdr->p_offset = __md->elf.elf_offset;
>> +	phdr->p_vaddr = (elf_addr_t)region->virt_addr;
>> +	phdr->p_paddr = region->phys_addr;
>> +	phdr->p_filesz = phdr->p_memsz = region->size;
>> +	phdr->p_flags = PF_R | PF_W;
>> +	__md->elf.elf_offset += shdr->sh_size;
>> +}
>> +
>> +static void
>> +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count);
>> +
>> +	mdr = &__md->md_regions[region_cnt];
>> +	strscpy(mdr->name, region->name, sizeof(mdr->name));
>> +	mdr->address = cpu_to_le64(region->phys_addr);
>> +	mdr->size = cpu_to_le64(region->size);
>> +	mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID);
>> +	region_cnt++;
>> +	__md->md_apss_toc->region_count = cpu_to_le32(region_cnt);
>> +}
>> +
>> +static bool
>> +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region)
>> +{
>> +	return region &&
>> +		strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH &&
>> +		region->virt_addr &&
>> +		region->size &&
>> +		IS_ALIGNED(region->size, 4);
>> +}
>> +
>> +static int qcom_apss_minidump_add_elf_header(void)
>> +{
>> +	struct qcom_apss_minidump_region elfregion;
>> +	struct elfhdr *ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_phdr *phdr;
>> +	unsigned int  elfh_size;
>> +	unsigned int strtbl_off;
>> +	unsigned int phdr_off;
>> +	char *banner;
>> +	unsigned int banner_len;
>> +
>> +	banner_len = strlen(linux_banner);
>> +	/*
>> +	 * Header buffer contains:
>> +	 * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers,
>> +	 * where, 4 additional entries, one for empty header, one for string table
>> +	 * one for minidump table and one for linux banner.
>> +	 *
>> +	 * Linux banner is stored in minidump to aid post mortem tools to determine
>> +	 * the kernel version.
>> +	 */
>> +	elfh_size = sizeof(*ehdr);
>> +	elfh_size += MAX_STRTBL_SIZE;
>> +	elfh_size += banner_len + 1;
>> +	elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4));
>> +	elfh_size = ALIGN(elfh_size, 4);
>> +
>> +	__md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL);
>> +	if (!__md->elf.ehdr)
>> +		return -ENOMEM;
>> +
>> +	/* Register ELF header as first region */
>> +	strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name));
>> +	elfregion.virt_addr = __md->elf.ehdr;
>> +	elfregion.phys_addr = virt_to_phys(__md->elf.ehdr);
>> +	elfregion.size = elfh_size;
>> +	qcom_apss_minidump_add_region(&elfregion);
>> +
>> +	ehdr = __md->elf.ehdr;
>> +	/* Assign Section/Program headers offset */
>> +	__md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1);
>> +	__md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES);
>> +	phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES);
>> +
>> +	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
>> +	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
>> +	ehdr->e_ident[EI_DATA] = ELF_DATA;
>> +	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
>> +	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
>> +	ehdr->e_type = ET_CORE;
>> +	ehdr->e_machine  = ELF_ARCH;
>> +	ehdr->e_version = EV_CURRENT;
>> +	ehdr->e_ehsize = sizeof(*ehdr);
>> +	ehdr->e_phoff = phdr_off;
>> +	ehdr->e_phentsize = sizeof(*phdr);
>> +	ehdr->e_shoff = sizeof(*ehdr);
>> +	ehdr->e_shentsize = sizeof(*shdr);
>> +	ehdr->e_shstrndx = 1;
>> +
>> +	__md->elf.elf_offset = elfh_size;
>> +
>> +	/*
>> +	 * The zeroth index of the section header is reserved and is rarely used.
>> +	 * Set the section header as null (SHN_UNDEF) and move to the next one.
>> +	 * 2nd Section is String table.
>> +	 */
>> +	__md->elf.strtable_idx = 1;
>> +	strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES);
>> +	shdr++;
>> +	shdr->sh_type = SHT_STRTAB;
>> +	shdr->sh_offset = (elf_addr_t)strtbl_off;
>> +	shdr->sh_size = MAX_STRTBL_SIZE;
>> +	shdr->sh_entsize = 0;
>> +	shdr->sh_flags = 0;
>> +	shdr->sh_name = append_str_to_strtable("STR_TBL");
>> +	shdr++;
>> +
>> +	/* 3rd Section is Linux banner */
>> +	banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE;
>> +	memcpy(banner, linux_banner, banner_len);
>> +
>> +	shdr->sh_type = SHT_PROGBITS;
>> +	shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
>> +	shdr->sh_size = banner_len + 1;
>> +	shdr->sh_addr = (elf_addr_t)linux_banner;
>> +	shdr->sh_entsize = 0;
>> +	shdr->sh_flags = SHF_WRITE;
>> +	shdr->sh_name = append_str_to_strtable("linux_banner");
>> +
>> +	phdr->p_type = PT_LOAD;
>> +	phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
>> +	phdr->p_vaddr = (elf_addr_t)linux_banner;
>> +	phdr->p_paddr = virt_to_phys(linux_banner);
>> +	phdr->p_filesz = phdr->p_memsz = banner_len + 1;
>> +	phdr->p_flags = PF_R | PF_W;
>> +
>> +	/*
>> +	 * Above are some prdefined sections/program header used
>> +	 * for debug, update their count here.
>> +	 */
>> +	ehdr->e_phnum = 1;
>> +	ehdr->e_shnum = 3;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor.
>> + * @minidump_index: minidump index for a subsystem in minidump table
>> + *
>> + * Return: minidump subsystem descriptor address on success and error
>> + * on failure
>> + */
>> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
>> +{
>> +	struct minidump_subsystem *md_ss_toc;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	if (!__md) {
>> +		md_ss_toc = ERR_PTR(-EPROBE_DEFER);
>> +		goto unlock;
>> +	}
>> +
>> +	md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index];
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return md_ss_toc;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc);
>> +
>> +/**
>> + * qcom_apss_minidump_region_register() - Register a region in Minidump table.
>> + * @region: minidump region.
>> + *
>> + * Return: On success, it returns 0, otherwise a negative error value on failure.
>> + */
>> +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
>> +{
>> +	unsigned int num_region;
>> +	int ret;
>> +
>> +	if (!__md)
>> +		return -EPROBE_DEFER;
>> +
>> +	if (!qcom_apss_minidump_valid_region(region))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	ret = get_apss_minidump_region_index(region);
>> +	if (ret >= 0) {
>> +		dev_info(__md->dev, "%s region is already registered\n", region->name);
>> +		ret = -EEXIST;
>> +		goto unlock;
>> +	}
>> +
>> +	/* Check if there is a room for a new entry */
>> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	if (num_region >= MAX_NUM_ENTRIES) {
>> +		dev_err(__md->dev, "maximum region limit %u reached\n", num_region);
>> +		ret = -ENOSPC;
>> +		goto unlock;
>> +	}
>> +
>> +	qcom_apss_minidump_add_region(region);
>> +	qcom_apss_minidump_update_elf_header(region);
>> +	ret = 0;
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register);
>> +
>> +static int
>> +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct elfhdr *ehdr = __md->elf.ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_shdr *tmp_shdr;
>> +	struct elf_phdr *phdr;
>> +	struct elf_phdr *tmp_phdr;
>> +	unsigned int phidx;
>> +	unsigned int shidx;
>> +	unsigned int len;
>> +	unsigned int i;
>> +	char *shname;
>> +
>> +	for (i = 0; i < ehdr->e_phnum; i++) {
>> +		phdr = elf_phdr_entry_addr(ehdr, i);
>> +		if (phdr->p_paddr == region->phys_addr &&
>> +		    phdr->p_memsz == region->size)
>> +			break;
>> +	}
>> +
>> +	if (i == ehdr->e_phnum) {
>> +		dev_err(__md->dev, "Cannot find program header entry in elf\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	phidx = i;
>> +	for (i = 0; i < ehdr->e_shnum; i++) {
>> +		shdr = elf_shdr_entry_addr(ehdr, i);
>> +		shname = elf_lookup_string(ehdr, shdr->sh_name);
>> +		if (shname && !strcmp(shname, region->name) &&
>> +		    shdr->sh_addr == (elf_addr_t)region->virt_addr &&
>> +		    shdr->sh_size == region->size)
>> +			break;
>> +	}
>> +
>> +	if (i == ehdr->e_shnum) {
>> +		dev_err(__md->dev, "Cannot find section header entry in elf\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	shidx = i;
>> +	if (shdr->sh_offset != phdr->p_offset) {
>> +		dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Clear name in string table */
>> +	len = strlen(shname) + 1;
>> +	memmove(shname, shname + len,
>> +		__md->elf.strtable_idx - shdr->sh_name - len);
>> +	__md->elf.strtable_idx -= len;
>> +
>> +	/* Clear program header */
>> +	tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
>> +	for (i = phidx; i < ehdr->e_phnum - 1; i++) {
>> +		tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
>> +		phdr = elf_phdr_entry_addr(ehdr, i);
>> +		memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr));
>> +		phdr->p_offset = phdr->p_offset - region->size;
>> +	}
>> +	memset(tmp_phdr, 0, sizeof(struct elf_phdr));
>> +	ehdr->e_phnum--;
>> +
>> +	/* Clear section header */
>> +	tmp_shdr = elf_shdr_entry_addr(ehdr, shidx);
>> +	for (i = shidx; i < ehdr->e_shnum - 1; i++) {
>> +		tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1);
>> +		shdr = elf_shdr_entry_addr(ehdr, i);
>> +		memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr));
>> +		shdr->sh_offset -= region->size;
>> +		shdr->sh_name -= len;
>> +	}
>> +
>> +	memset(tmp_shdr, 0, sizeof(struct elf_shdr));
>> +	ehdr->e_shnum--;
>> +	__md->elf.elf_offset -= region->size;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table.
>> + * @region: minidump region.
>> + *
>> + * Return: On success, it returns 0 and negative error value on failure.
>> + */
>> +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int num_region;
>> +	unsigned int idx;
>> +	int ret;
>> +
>> +	if (!region)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	if (!__md) {
>> +		ret = -EPROBE_DEFER;
>> +		goto unlock;
>> +	}
>> +
>> +	idx = get_apss_minidump_region_index(region);
>> +	if (idx < 0) {
>> +		dev_err(__md->dev, "%s region is not present\n", region->name);
>> +		ret = idx;
>> +		goto unlock;
>> +	}
>> +
>> +	mdr = &__md->md_regions[0];
>> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	/*
>> +	 * Left shift all the regions exist after this removed region
>> +	 * index by 1 to fill the gap and zero out the last region
>> +	 * present at the end.
>> +	 */
>> +	memmove(&mdr[idx], &mdr[idx + 1],
>> +		(num_region - idx - 1) * sizeof(struct minidump_region));
>> +	memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region));
>> +	ret = qcom_apss_minidump_clear_header(region);
>> +	if (ret) {
>> +		dev_err(__md->dev, "Failed to remove region: %s\n", region->name);
>> +		goto unlock;
>> +	}
>> +
>> +	num_region--;
>> +	__md->md_apss_toc->region_count = cpu_to_le32(num_region);
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister);
>> +
>> +static int qcom_minidump_init_apss_subsystem(struct minidump *md)
>> +{
>> +	struct minidump_subsystem *apsstoc;
>> +
>> +	apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC];
>> +	md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
>> +				      sizeof(struct minidump_region), GFP_KERNEL);
>> +	if (!md->md_regions)
>> +		return -ENOMEM;
>> +
>> +	md->md_apss_toc = apsstoc;
>> +	apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions));
>> +	apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
>> +	apsstoc->status = cpu_to_le32(1);
>> +	apsstoc->region_count = cpu_to_le32(0);
>> +
>> +	/* Tell bootloader not to encrypt the regions of this subsystem */
>> +	apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
>> +	apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
>> +
>> +	return 0;
>> +}
>> +
>> +static int qcom_minidump_probe(struct platform_device *pdev)
>> +{
>> +	struct minidump_global_toc *mdgtoc;
>> +	struct minidump *md;
>> +	size_t size;
>> +	int ret;
>> +
>> +	md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL);
>> +	if (!md)
>> +		return -ENOMEM;
>> +
>> +	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
>> +	if (IS_ERR(mdgtoc)) {
>> +		ret = PTR_ERR(mdgtoc);
>> +		dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>> +		ret = -EINVAL;
>> +		dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	mutex_lock(&minidump_lock);
>> +	md->dev = &pdev->dev;
>> +	md->md_gbl_toc = mdgtoc;
> 
> What are you protecting here? It's not possible to have concurrent
> access to md, is it?

Check qcom_apss_minidump_region_{register/unregister} and it is possible
that these API gets called parallel to this probe.

I agree, i made a mistake in not protecting __md in {register} API
but did it unregister API in this patch, which i have fixed in later patch.

> 
>> +	ret = qcom_minidump_init_apss_subsystem(md);
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>> +		goto unlock;
>> +	}
>> +
>> +	__md = md;
> 
> No. This is a platform device, so it can have multiple instances.

It can have only one instance that is created from SMEM driver probe.

> 
>> +	/* First entry would be ELF header */
>> +	ret = qcom_apss_minidump_add_elf_header();
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret);
>> +		memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>> +		__md = NULL;
>> +	}
>> +
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +
>> +static int qcom_minidump_remove(struct platform_device *pdev)
>> +{
>> +	memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>> +	__md = NULL;
> 
> Don't use __ in variable names. Drop it everywhere.

As i said above, this is being followed in other drivers, so followed
it here as per recommendation.

Let @srini comeback on this.

> 
>> +
>> +	return 0;
>> +}
>> +
>> +static struct platform_driver qcom_minidump_driver = {
>> +	.probe = qcom_minidump_probe,
>> +	.remove = qcom_minidump_remove,
>> +	.driver  = {
>> +		.name = "qcom-minidump",
>> +	},
>> +};
>> +
>> +module_platform_driver(qcom_minidump_driver);
>> +
>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_ALIAS("platform:qcom-minidump");
>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
>> index 6be7ea9..d459656 100644
>> --- a/drivers/soc/qcom/smem.c
>> +++ b/drivers/soc/qcom/smem.c
>> @@ -279,6 +279,7 @@ struct qcom_smem {
>>   
>>   	u32 item_count;
>>   	struct platform_device *socinfo;
>> +	struct platform_device *minidump;
>>   	struct smem_ptable *ptable;
>>   	struct smem_partition global_partition;
>>   	struct smem_partition partitions[SMEM_HOST_COUNT];
>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev)
>>   	if (IS_ERR(smem->socinfo))
>>   		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
>>   
>> +	smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump",
>> +						      PLATFORM_DEVID_NONE, NULL,
>> +						      0);
>> +	if (IS_ERR(smem->minidump))
>> +		dev_dbg(&pdev->dev, "failed to register minidump device\n");
>> +
>>   	return 0;
>>   }
>>   
>>   static int qcom_smem_remove(struct platform_device *pdev)
>>   {
>>   	platform_device_unregister(__smem->socinfo);
>> +	platform_device_unregister(__smem->minidump);
> 
> Wrong order. You registered first socinfo, right?

Any order is fine here, they are not dependent.
But, will fix this.

> 
>>   
>>   	hwspin_lock_free(__smem->hwlock);
>>   	__smem = NULL;
>> diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h
>> index 84c8605..1872668 100644
>> --- a/include/soc/qcom/qcom_minidump.h
>> +++ b/include/soc/qcom/qcom_minidump.h
>> @@ -1,6 +1,7 @@
>>   /* SPDX-License-Identifier: GPL-2.0-only */
>>   /*
>> - * Qualcomm minidump shared data structures and macros
>> + * This file contain Qualcomm minidump data structures and macros shared with
>> + * boot firmware and also apss minidump client's data structure
>>    *
>>    * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>    */
>> @@ -9,12 +10,27 @@
>>   #define _QCOM_MINIDUMP_H_
>>   
>>   #define MAX_NUM_OF_SS           10
>> +#define MAX_NAME_LENGTH		12
>>   #define MAX_REGION_NAME_LENGTH  16
>> +
>> +#define MINIDUMP_REVISION	1
>>   #define SBL_MINIDUMP_SMEM_ID	602
>> +
>> +/* Application processor minidump descriptor */
>> +#define MINIDUMP_APSS_DESC	0
>> +#define SMEM_ENTRY_SIZE		40
>> +
>>   #define MINIDUMP_REGION_VALID		('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
>> +#define MINIDUMP_REGION_INVALID		('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0)
>> +#define MINIDUMP_REGION_INIT		('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0)
>> +#define MINIDUMP_REGION_NOINIT		0
>> +
>> +#define MINIDUMP_SS_ENCR_REQ		(0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0)
>> +#define MINIDUMP_SS_ENCR_NOTREQ		(0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0)
>> +#define MINIDUMP_SS_ENCR_NONE		('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
>>   #define MINIDUMP_SS_ENCR_DONE		('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
>> +#define MINIDUMP_SS_ENCR_START		('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0)
>>   #define MINIDUMP_SS_ENABLED		('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
>> -
> 
> Why removing this?

Ok, will keep it.

> 
>>   /**
>>    * struct minidump_region - Minidump region
>>    * @name		: Name of the region to be dumped
>> @@ -63,4 +79,45 @@ struct minidump_global_toc {
>>   	struct minidump_subsystem	subsystems[MAX_NUM_OF_SS];
>>   };
>>   
>> +/**
>> + * struct qcom_apss_minidump_region - APSS Minidump region information
>> + *
>> + * @name:	Entry name, Minidump will dump binary with this name.
>> + * @virt_addr:  Virtual address of the entry.
>> + * @phys_addr:	Physical address of the entry to dump.
>> + * @size:	Number of byte to dump from @address location,
>> + *		and it should be 4 byte aligned.
>> + */
>> +struct qcom_apss_minidump_region {
>> +	char		name[MAX_NAME_LENGTH];
>> +	void		*virt_addr;
>> +	phys_addr_t	phys_addr;
>> +	size_t		size;
>> +};
> 
> You expose way too much internals in global header.

This suppose to shared with all minidump clients.

As already discussed in earlier patch, will keep the one which get
shared with remoteproc as differnet header if the people who have
reviewed till now agree to this common stuff.

> 
>> +
>> +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP)
>> +extern struct minidump_subsystem *
> 
> No externs.

ok.

> The header is unreadable.
> 
>> +qcom_minidump_subsystem_desc(unsigned int minidump_index);
>> +extern int
>> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region);
>> +extern int
>> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region);
> 
> Blank line
> 
>> +#else
> 
> Blank line
> 
>> +static inline
>> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
>> +{
>> +	return NULL;
>> +}
> 
> Blank line
> 
>> +static inline int
>> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
>> +{
>> +	/* Return quietly, if minidump is not enabled */
>> +	return 0;
>> +}
> 
> 
>> +static inline int
>> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
>> +{
>> +	return 0;
>> +}
> 
>> +#endif
> 
> /* CONFIG_QCOM_MINIDUMP */
> 
>>   #endif  /* _QCOM_MINIDUMP_H_ */

Noted, thanks.

> 
> Best regards,
> Krzysztof
>
Mukesh Ojha May 4, 2023, 12:57 p.m. UTC | #12
On 5/4/2023 6:06 PM, Krzysztof Kozlowski wrote:
> On 04/05/2023 14:26, Mukesh Ojha wrote:
>>
>>
>> On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote:
>>> On 04/05/2023 13:58, Mukesh Ojha wrote:
>>>>
>>>>
>>>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
>>>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>>>> Move minidump specific data types and macros to a separate internal
>>>>>> header(qcom_minidump.h) so that it can be shared among different
>>>>>> Qualcomm drivers.
>>>>>
>>>>> No, this is not internal header. You moved it to global header.
>>>>>
>>>>> There is no reason driver internals should be exposed to other unrelated
>>>>> subsystems.
>>>>>
>>>>>>
>>>>>> There is no change in functional behavior after this.
>>>>>
>>>>> It is. You made all these internal symbols available to others.
>>>>>
>>>>>>
>>>>>
>>>>> This comes without justification why other drivers needs to access
>>>>> private and internal data. It does not look correct design. NAK.
>>>>
>>>> Thanks for catching outdated commit text, will fix the commit with
>>>> more descriptive reasoning.
>>>>
>>>> It has to be global so that co-processor minidump and apss minidump can
>>>> share data structure and they are lying in different directory.
>>>>
>>>
>>> Then you should not share all the internals of memory layout but only
>>> few pieces necessary to talk with minidump driver. The minidump driver
>>> should organize everything how it wants.
>>
>> These are core data structure which is shared with boot firmware and the
>> one's are moved here all are required by minidump driver .
> 
> I am not sure if I understand correctly. If they are all required by
> minidump driver, then this must not be in include, but stay with
> minidump. Remoteproc then should not touch it.
> 
> I don't understand why internals of minidump should be important for
> remoteproc. If they are, means you broken encapsulation.
> 
>>
>> If you follow here[1], i raised by concern to make this particular one's
>> as private and later to avoid confusion went with single header.
>> But if others agree, I will keep the one that get shared with minidump
>> as separate one or if relative path of headers are allowed that can make
>> it private between these drivers(which i don't think, will be allowed or
>> recommended).
> 
> Let's be specific: why MD_REGION_VALID must be available for remoteproc
> or any other driver after introducing qcom minidump driver?

Forget about this driver for a moment.

I am not sure  how much you know about existing qcom_minidump()
implementation and why is it there in first place in remoteproc
code in driver/remoteproc/qcom_common.c

The idea is, remoteproc co-processor like adsp/cdsp etc. may have their
static predefined region (segments) to be collected on their crash which 
is what exactly existing qcom_minidump() is doing.

Now, after this minidump series, APSS (linux) will have it's
own of collecting linux client region independent of whether
remoteproc minidump collection.

I think, are you hinting to move all minidump related code from 
remoteproc to qcom_minidump driver, is this what are you trying
to say ?

-- Mukesh
> 
> 
> Best regards,
> Krzysztof
>
Mukesh Ojha May 4, 2023, 2:43 p.m. UTC | #13
On 5/4/2023 6:02 PM, Krzysztof Kozlowski wrote:
> On 04/05/2023 13:45, Mukesh Ojha wrote:
>>
>>
>> On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote:
>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>> Previous patches add the Qualcomm minidump driver support, so
>>>> lets enable minidump config so that it can be used by kernel
>>>> clients.
>>>>
>>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>
>>> This patchset is split too much. Defconfig change is one change. Not two
>>> or three.
>>>
>>>> ---
>>>>    arch/arm64/configs/defconfig | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>>>> index a24609e..831c942 100644
>>>> --- a/arch/arm64/configs/defconfig
>>>> +++ b/arch/arm64/configs/defconfig
>>>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m
>>>>    CONFIG_QCOM_WCNSS_CTRL=m
>>>>    CONFIG_QCOM_APR=m
>>>>    CONFIG_QCOM_ICC_BWMON=m
>>>> +CONFIG_QCOM_MINIDUMP=y
>>>
>>> This must be a module.
>>
>> Why do you think this should be a module ?
>>
>> Is it because, it is lying here among others '=m' ?
> 
> Because we want and insist on everything being a module. That's the
> generic rule. There are exceptions, so if this justifies being an
> exception, please bring appropriate arguments.
> 
>>
>> Or you have some other reasoning ? like it is for qcom specific
>> soc and can not be used outside ? but that is not true for
>> all configs mentioned here.
>>
>> The reason behind making it as '=y' was, to collect information from
>> core kernel data structure as well as the information like percpu data,
>> run queue, irq stat kind of information on kernel crash on a target
>> running some perf configuration(android phone).
> 
> I don't understand why =m stops you from all that.

How do i get kernel symbol address from a modules
can we use kallsyms_lookup_name from modules ?

--Mukesh

  What's more, I don't
> understand why do you refer to the Android here. This is a development
> and debugging Linux defconfig, not Android reference config for vendors...
> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski May 4, 2023, 3:16 p.m. UTC | #14
On 04/05/2023 14:57, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 6:06 PM, Krzysztof Kozlowski wrote:
>> On 04/05/2023 14:26, Mukesh Ojha wrote:
>>>
>>>
>>> On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote:
>>>> On 04/05/2023 13:58, Mukesh Ojha wrote:
>>>>>
>>>>>
>>>>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote:
>>>>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>>>>> Move minidump specific data types and macros to a separate internal
>>>>>>> header(qcom_minidump.h) so that it can be shared among different
>>>>>>> Qualcomm drivers.
>>>>>>
>>>>>> No, this is not internal header. You moved it to global header.
>>>>>>
>>>>>> There is no reason driver internals should be exposed to other unrelated
>>>>>> subsystems.
>>>>>>
>>>>>>>
>>>>>>> There is no change in functional behavior after this.
>>>>>>
>>>>>> It is. You made all these internal symbols available to others.
>>>>>>
>>>>>>>
>>>>>>
>>>>>> This comes without justification why other drivers needs to access
>>>>>> private and internal data. It does not look correct design. NAK.
>>>>>
>>>>> Thanks for catching outdated commit text, will fix the commit with
>>>>> more descriptive reasoning.
>>>>>
>>>>> It has to be global so that co-processor minidump and apss minidump can
>>>>> share data structure and they are lying in different directory.
>>>>>
>>>>
>>>> Then you should not share all the internals of memory layout but only
>>>> few pieces necessary to talk with minidump driver. The minidump driver
>>>> should organize everything how it wants.
>>>
>>> These are core data structure which is shared with boot firmware and the
>>> one's are moved here all are required by minidump driver .
>>
>> I am not sure if I understand correctly. If they are all required by
>> minidump driver, then this must not be in include, but stay with
>> minidump. Remoteproc then should not touch it.
>>
>> I don't understand why internals of minidump should be important for
>> remoteproc. If they are, means you broken encapsulation.
>>
>>>
>>> If you follow here[1], i raised by concern to make this particular one's
>>> as private and later to avoid confusion went with single header.
>>> But if others agree, I will keep the one that get shared with minidump
>>> as separate one or if relative path of headers are allowed that can make
>>> it private between these drivers(which i don't think, will be allowed or
>>> recommended).
>>
>> Let's be specific: why MD_REGION_VALID must be available for remoteproc
>> or any other driver after introducing qcom minidump driver?
> 
> Forget about this driver for a moment.
> 
> I am not sure  how much you know about existing qcom_minidump()
> implementation and why is it there in first place in remoteproc
> code in driver/remoteproc/qcom_common.c
> 
> The idea is, remoteproc co-processor like adsp/cdsp etc. may have their
> static predefined region (segments) to be collected on their crash which 
> is what exactly existing qcom_minidump() is doing.
> 
> Now, after this minidump series, APSS (linux) will have it's
> own of collecting linux client region independent of whether
> remoteproc minidump collection.
> 
> I think, are you hinting to move all minidump related code from 
> remoteproc to qcom_minidump driver, is this what are you trying
> to say ?

Close, not all but the ones not necessary to identify the
regions/storage/layout. If some variable about this
region/storage/layout is the same everywhere, it means it's basically a
property of qcom minidump and you have just exposed it to consumers
breaking encapsulation.

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 3:21 p.m. UTC | #15
On 04/05/2023 14:38, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote:
>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>> Minidump is a best effort mechanism to collect useful and predefined
>>> data for first level of debugging on end user devices running on
>>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>>> or subsystem part of SoC crashes, due to a range of hardware and
>>> software bugs. Hence, the ability to collect accurate data is only
>>> a best-effort. The data collected could be invalid or corrupted,
>>> data collection itself could fail, and so on.
>>>
>>> Qualcomm devices in engineering mode provides a mechanism for
>>> generating full system ramdumps for post mortem debugging. But in some
>>> cases it's however not feasible to capture the entire content of RAM.
>>> The minidump mechanism provides the means for selecting region should
>>> be included in the ramdump. The solution supports extracting the
>>> ramdump/minidump produced either over USB or stored to an attached
>>> storage device.
>>>
>>> The core of minidump feature is part of Qualcomm's boot firmware code.
>>> It initializes shared memory(SMEM), which is a part of DDR and
>>> allocates a small section of it to minidump table i.e also called
>>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
>>> their own table of segments to be included in the minidump, all
>>> references from a descriptor in SMEM (G-ToC). Each segment/region has
>>> some details like name, physical address and it's size etc. and it
>>> could be anywhere scattered in the DDR.
>>>
>>> Minidump kernel driver adds the capability to add linux region to be
>>> dumped as part of ram dump collection. It provides appropriate symbol
>>> to check its enablement and register client regions.
>>>
>>> To simplify post mortem debugging, it creates and maintain an ELF
>>> header as first region that gets updated upon registration
>>> of a new region.
>>>
>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>> ---
>>>   drivers/soc/qcom/Kconfig         |  14 +
>>>   drivers/soc/qcom/Makefile        |   1 +
>>>   drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++
>>>   drivers/soc/qcom/smem.c          |   8 +
>>>   include/soc/qcom/qcom_minidump.h |  61 +++-
>>>   5 files changed, 663 insertions(+), 2 deletions(-)
>>>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>>>
>>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>>> index a491718..15c931e 100644
>>> --- a/drivers/soc/qcom/Kconfig
>>> +++ b/drivers/soc/qcom/Kconfig
>>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE
>>>   	tristate
>>>   	select QCOM_SCM
>>>   
>>> +config QCOM_MINIDUMP
>>> +	tristate "QCOM Minidump Support"
>>> +	depends on ARCH_QCOM || COMPILE_TEST
>>> +	select QCOM_SMEM
>>> +	help
>>> +	  Enablement of core minidump feature is controlled from boot firmware
>>> +	  side, and this config allow linux to query and manages APPS minidump
>>> +	  table.
>>> +
>>> +	  Client drivers can register their internal data structures and debug
>>> +	  messages as part of the minidump region and when the SoC is crashed,
>>> +	  these selective regions will be dumped instead of the entire DDR.
>>> +	  This saves significant amount of time and/or storage space.
>>> +
>>>   endmenu
>>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>>> index 0f43a88..1ebe081 100644
>>> --- a/drivers/soc/qcom/Makefile
>>> +++ b/drivers/soc/qcom/Makefile
>>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
>>>   obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
>>>   obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
>>>   obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)	+= ice.o
>>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
>>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
>>> new file mode 100644
>>> index 0000000..d107a86
>>> --- /dev/null
>>> +++ b/drivers/soc/qcom/qcom_minidump.c
>>> @@ -0,0 +1,581 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +/*
>>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>> + */
>>> +
>>> +#include <linux/elf.h>
>>> +#include <linux/err.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/export.h>
>>> +#include <linux/init.h>
>>> +#include <linux/io.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/module.h>
>>> +#include <linux/platform_device.h>
>>> +#include <linux/string.h>
>>> +#include <linux/soc/qcom/smem.h>
>>> +#include <soc/qcom/qcom_minidump.h>
>>> +
>>> +/**
>>> + * struct minidump_elfhdr - Minidump table elf header
>>> + * @ehdr: Elf main header
>>> + * @shdr: Section header
>>> + * @phdr: Program header
>>> + * @elf_offset: Section offset in elf
>>> + * @strtable_idx: String table current index position
>>> + */
>>> +struct minidump_elfhdr {
>>> +	struct elfhdr		*ehdr;
>>> +	struct elf_shdr		*shdr;
>>> +	struct elf_phdr		*phdr;
>>> +	size_t			elf_offset;
>>> +	size_t			strtable_idx;
>>> +};
>>> +
>>> +/**
>>> + * struct minidump - Minidump driver private data
>>> + * @md_gbl_toc	: Global TOC pointer
>>> + * @md_apss_toc	: Application Subsystem TOC pointer
>>> + * @md_regions	: High level OS region base pointer
>>> + * @elf		: Minidump elf header
>>> + * @dev		: Minidump device
>>> + */
>>> +struct minidump {
>>> +	struct minidump_global_toc	*md_gbl_toc;
>>> +	struct minidump_subsystem	*md_apss_toc;
>>> +	struct minidump_region		*md_regions;
>>> +	struct minidump_elfhdr		elf;
>>> +	struct device			*dev;
>>> +};
>>> +
>>> +/*
>>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
>>> + * as total number of supported region (including all co-processors) in
>>> + * minidump table out of which linux was using 201. In future, this limitation
>>> + * from boot firmware might get removed by allocating the region dynamically.
>>> + * So, keep it compatible with older devices, we can keep the current limit for
>>> + * Linux to 201.
>>> + */
>>> +#define MAX_NUM_ENTRIES	  201
>>> +#define MAX_STRTBL_SIZE	  (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
>>> +
>>> +static struct minidump *__md;
>>
>> No, no file scope or global scope statics.
> 
> Sorry, this is done as per recommendation given here [1] and this 
> matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c
> implementations.
> 
> [1]
> https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/

That's not true. You had the static already in v2, before Srini commented.

Look:
https://lore.kernel.org/lkml/1679491817-2498-5-git-send-email-quic_mojha@quicinc.com/

+static struct minidump minidump;
+static DEFINE_MUTEX(minidump_lock);

We do not talk about the names.


>>> +
>>> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>>> +		ret = -EINVAL;
>>> +		dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	mutex_lock(&minidump_lock);
>>> +	md->dev = &pdev->dev;
>>> +	md->md_gbl_toc = mdgtoc;
>>
>> What are you protecting here? It's not possible to have concurrent
>> access to md, is it?
> 
> Check qcom_apss_minidump_region_{register/unregister} and it is possible
> that these API gets called parallel to this probe.

Wait, you say that something can modify local variable md before it is
assigned to __md? How?

> 
> I agree, i made a mistake in not protecting __md in {register} API
> but did it unregister API in this patch, which i have fixed in later patch.

No, you are protecting random things. Nothing will concurrently modify
md and &pdev->dev in this moment. mdgtoc is allocated above, so also
cannot by modified.

Otherwise show me the hypothetical scenario.


> 
>>
>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>> +	if (ret) {
>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>> +		goto unlock;
>>> +	}
>>> +
>>> +	__md = md;
>>
>> No. This is a platform device, so it can have multiple instances.
> 
> It can have only one instance that is created from SMEM driver probe.

Anyone can instantiate more of them.... how did you solve it?


> 
>>
>>> +	/* First entry would be ELF header */
>>> +	ret = qcom_apss_minidump_add_elf_header();
>>> +	if (ret) {
>>> +		dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret);
>>> +		memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>>> +		__md = NULL;
>>> +	}
>>> +
>>> +unlock:
>>> +	mutex_unlock(&minidump_lock);
>>> +	return ret;
>>> +}
>>> +
>>> +static int qcom_minidump_remove(struct platform_device *pdev)
>>> +{
>>> +	memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>>> +	__md = NULL;
>>
>> Don't use __ in variable names. Drop it everywhere.
> 
> As i said above, this is being followed in other drivers, so followed
> it here as per recommendation.
> 
> Let @srini comeback on this.

Which part of coding style recommends __ for driver code?

> 
>>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static struct platform_driver qcom_minidump_driver = {
>>> +	.probe = qcom_minidump_probe,
>>> +	.remove = qcom_minidump_remove,
>>> +	.driver  = {
>>> +		.name = "qcom-minidump",
>>> +	},
>>> +};
>>> +
>>> +module_platform_driver(qcom_minidump_driver);
>>> +
>>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>>> +MODULE_LICENSE("GPL v2");
>>> +MODULE_ALIAS("platform:qcom-minidump");
>>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
>>> index 6be7ea9..d459656 100644
>>> --- a/drivers/soc/qcom/smem.c
>>> +++ b/drivers/soc/qcom/smem.c
>>> @@ -279,6 +279,7 @@ struct qcom_smem {
>>>   
>>>   	u32 item_count;
>>>   	struct platform_device *socinfo;
>>> +	struct platform_device *minidump;
>>>   	struct smem_ptable *ptable;
>>>   	struct smem_partition global_partition;
>>>   	struct smem_partition partitions[SMEM_HOST_COUNT];
>>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev)
>>>   	if (IS_ERR(smem->socinfo))
>>>   		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
>>>   
>>> +	smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump",
>>> +						      PLATFORM_DEVID_NONE, NULL,
>>> +						      0);
>>> +	if (IS_ERR(smem->minidump))
>>> +		dev_dbg(&pdev->dev, "failed to register minidump device\n");
>>> +
>>>   	return 0;
>>>   }
>>>   
>>>   static int qcom_smem_remove(struct platform_device *pdev)
>>>   {
>>>   	platform_device_unregister(__smem->socinfo);
>>> +	platform_device_unregister(__smem->minidump);
>>
>> Wrong order. You registered first socinfo, right?
> 
> Any order is fine here, they are not dependent.
> But, will fix this.

No, the order is always reversed from allocation. It does not matter if
they are dependent or not.

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 3:24 p.m. UTC | #16
On 04/05/2023 16:43, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 6:02 PM, Krzysztof Kozlowski wrote:
>> On 04/05/2023 13:45, Mukesh Ojha wrote:
>>>
>>>
>>> On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote:
>>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>>> Previous patches add the Qualcomm minidump driver support, so
>>>>> lets enable minidump config so that it can be used by kernel
>>>>> clients.
>>>>>
>>>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>>
>>>> This patchset is split too much. Defconfig change is one change. Not two
>>>> or three.
>>>>
>>>>> ---
>>>>>    arch/arm64/configs/defconfig | 1 +
>>>>>    1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>>>>> index a24609e..831c942 100644
>>>>> --- a/arch/arm64/configs/defconfig
>>>>> +++ b/arch/arm64/configs/defconfig
>>>>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m
>>>>>    CONFIG_QCOM_WCNSS_CTRL=m
>>>>>    CONFIG_QCOM_APR=m
>>>>>    CONFIG_QCOM_ICC_BWMON=m
>>>>> +CONFIG_QCOM_MINIDUMP=y
>>>>
>>>> This must be a module.
>>>
>>> Why do you think this should be a module ?
>>>
>>> Is it because, it is lying here among others '=m' ?
>>
>> Because we want and insist on everything being a module. That's the
>> generic rule. There are exceptions, so if this justifies being an
>> exception, please bring appropriate arguments.
>>
>>>
>>> Or you have some other reasoning ? like it is for qcom specific
>>> soc and can not be used outside ? but that is not true for
>>> all configs mentioned here.
>>>
>>> The reason behind making it as '=y' was, to collect information from
>>> core kernel data structure as well as the information like percpu data,
>>> run queue, irq stat kind of information on kernel crash on a target
>>> running some perf configuration(android phone).
>>
>> I don't understand why =m stops you from all that.
> 
> How do i get kernel symbol address from a modules
> can we use kallsyms_lookup_name from modules ?

You allow it to be a module in patch #4, so I think you solved it,
right? Otherwise it could not be a module?

Anyway, where do you use kallsyms_lookup_name()? I cannot find it in
your patch.

Best regards,
Krzysztof
Krzysztof Kozlowski May 4, 2023, 4:34 p.m. UTC | #17
On 04/05/2023 17:21, Krzysztof Kozlowski wrote:
>>>
>>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>>> +	if (ret) {
>>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>>> +		goto unlock;
>>>> +	}
>>>> +
>>>> +	__md = md;
>>>
>>> No. This is a platform device, so it can have multiple instances.
>>
>> It can have only one instance that is created from SMEM driver probe.
> 
> Anyone can instantiate more of them.... how did you solve it?

To clarify - sprinkling more of singletons makes everything tightly
coupled, difficult to debug and non-portable. You cannot have two
instances, you have to control concurrent initialization by yourself in
each of such singletons.

I understand sometimes they are unavoidable, for example when this does
not map to hardware property. However here you have the parent - smem -
which can return you valid instance. Thus you avoid entire problem of
file-scope variables.

Best regards,
Krzysztof
Mukesh Ojha May 5, 2023, 5:34 a.m. UTC | #18
On 5/4/2023 8:51 PM, Krzysztof Kozlowski wrote:
> On 04/05/2023 14:38, Mukesh Ojha wrote:
>>
>>
>> On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote:
>>> On 03/05/2023 19:02, Mukesh Ojha wrote:
>>>> Minidump is a best effort mechanism to collect useful and predefined
>>>> data for first level of debugging on end user devices running on
>>>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>>>> or subsystem part of SoC crashes, due to a range of hardware and
>>>> software bugs. Hence, the ability to collect accurate data is only
>>>> a best-effort. The data collected could be invalid or corrupted,
>>>> data collection itself could fail, and so on.
>>>>
>>>> Qualcomm devices in engineering mode provides a mechanism for
>>>> generating full system ramdumps for post mortem debugging. But in some
>>>> cases it's however not feasible to capture the entire content of RAM.
>>>> The minidump mechanism provides the means for selecting region should
>>>> be included in the ramdump. The solution supports extracting the
>>>> ramdump/minidump produced either over USB or stored to an attached
>>>> storage device.
>>>>
>>>> The core of minidump feature is part of Qualcomm's boot firmware code.
>>>> It initializes shared memory(SMEM), which is a part of DDR and
>>>> allocates a small section of it to minidump table i.e also called
>>>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
>>>> their own table of segments to be included in the minidump, all
>>>> references from a descriptor in SMEM (G-ToC). Each segment/region has
>>>> some details like name, physical address and it's size etc. and it
>>>> could be anywhere scattered in the DDR.
>>>>
>>>> Minidump kernel driver adds the capability to add linux region to be
>>>> dumped as part of ram dump collection. It provides appropriate symbol
>>>> to check its enablement and register client regions.
>>>>
>>>> To simplify post mortem debugging, it creates and maintain an ELF
>>>> header as first region that gets updated upon registration
>>>> of a new region.
>>>>
>>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>> ---
>>>>    drivers/soc/qcom/Kconfig         |  14 +
>>>>    drivers/soc/qcom/Makefile        |   1 +
>>>>    drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++
>>>>    drivers/soc/qcom/smem.c          |   8 +
>>>>    include/soc/qcom/qcom_minidump.h |  61 +++-
>>>>    5 files changed, 663 insertions(+), 2 deletions(-)
>>>>    create mode 100644 drivers/soc/qcom/qcom_minidump.c
>>>>
>>>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>>>> index a491718..15c931e 100644
>>>> --- a/drivers/soc/qcom/Kconfig
>>>> +++ b/drivers/soc/qcom/Kconfig
>>>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE
>>>>    	tristate
>>>>    	select QCOM_SCM
>>>>    
>>>> +config QCOM_MINIDUMP
>>>> +	tristate "QCOM Minidump Support"
>>>> +	depends on ARCH_QCOM || COMPILE_TEST
>>>> +	select QCOM_SMEM
>>>> +	help
>>>> +	  Enablement of core minidump feature is controlled from boot firmware
>>>> +	  side, and this config allow linux to query and manages APPS minidump
>>>> +	  table.
>>>> +
>>>> +	  Client drivers can register their internal data structures and debug
>>>> +	  messages as part of the minidump region and when the SoC is crashed,
>>>> +	  these selective regions will be dumped instead of the entire DDR.
>>>> +	  This saves significant amount of time and/or storage space.
>>>> +
>>>>    endmenu
>>>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>>>> index 0f43a88..1ebe081 100644
>>>> --- a/drivers/soc/qcom/Makefile
>>>> +++ b/drivers/soc/qcom/Makefile
>>>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
>>>>    obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
>>>>    obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
>>>>    obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)	+= ice.o
>>>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
>>>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
>>>> new file mode 100644
>>>> index 0000000..d107a86
>>>> --- /dev/null
>>>> +++ b/drivers/soc/qcom/qcom_minidump.c
>>>> @@ -0,0 +1,581 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +/*
>>>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>> + */
>>>> +
>>>> +#include <linux/elf.h>
>>>> +#include <linux/err.h>
>>>> +#include <linux/errno.h>
>>>> +#include <linux/export.h>
>>>> +#include <linux/init.h>
>>>> +#include <linux/io.h>
>>>> +#include <linux/kernel.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/platform_device.h>
>>>> +#include <linux/string.h>
>>>> +#include <linux/soc/qcom/smem.h>
>>>> +#include <soc/qcom/qcom_minidump.h>
>>>> +
>>>> +/**
>>>> + * struct minidump_elfhdr - Minidump table elf header
>>>> + * @ehdr: Elf main header
>>>> + * @shdr: Section header
>>>> + * @phdr: Program header
>>>> + * @elf_offset: Section offset in elf
>>>> + * @strtable_idx: String table current index position
>>>> + */
>>>> +struct minidump_elfhdr {
>>>> +	struct elfhdr		*ehdr;
>>>> +	struct elf_shdr		*shdr;
>>>> +	struct elf_phdr		*phdr;
>>>> +	size_t			elf_offset;
>>>> +	size_t			strtable_idx;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct minidump - Minidump driver private data
>>>> + * @md_gbl_toc	: Global TOC pointer
>>>> + * @md_apss_toc	: Application Subsystem TOC pointer
>>>> + * @md_regions	: High level OS region base pointer
>>>> + * @elf		: Minidump elf header
>>>> + * @dev		: Minidump device
>>>> + */
>>>> +struct minidump {
>>>> +	struct minidump_global_toc	*md_gbl_toc;
>>>> +	struct minidump_subsystem	*md_apss_toc;
>>>> +	struct minidump_region		*md_regions;
>>>> +	struct minidump_elfhdr		elf;
>>>> +	struct device			*dev;
>>>> +};
>>>> +
>>>> +/*
>>>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
>>>> + * as total number of supported region (including all co-processors) in
>>>> + * minidump table out of which linux was using 201. In future, this limitation
>>>> + * from boot firmware might get removed by allocating the region dynamically.
>>>> + * So, keep it compatible with older devices, we can keep the current limit for
>>>> + * Linux to 201.
>>>> + */
>>>> +#define MAX_NUM_ENTRIES	  201
>>>> +#define MAX_STRTBL_SIZE	  (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
>>>> +
>>>> +static struct minidump *__md;
>>>
>>> No, no file scope or global scope statics.
>>
>> Sorry, this is done as per recommendation given here [1] and this
>> matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c
>> implementations.
>>
>> [1]
>> https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/
> 
> That's not true. You had the static already in v2, before Srini commented.
> 
> Look:
> https://lore.kernel.org/lkml/1679491817-2498-5-git-send-email-quic_mojha@quicinc.com/
> 
> +static struct minidump minidump;
> +static DEFINE_MUTEX(minidump_lock);
> 
> We do not talk about the names.

I apologize for this.

> 
> 
>>>> +
>>>> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>>>> +		ret = -EINVAL;
>>>> +		dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	mutex_lock(&minidump_lock);
>>>> +	md->dev = &pdev->dev;
>>>> +	md->md_gbl_toc = mdgtoc;
>>>
>>> What are you protecting here? It's not possible to have concurrent
>>> access to md, is it?
>>
>> Check qcom_apss_minidump_region_{register/unregister} and it is possible
>> that these API gets called parallel to this probe.
> 
> Wait, you say that something can modify local variable md before it is
> assigned to __md? How?

No.

>>
>> I agree, i made a mistake in not protecting __md in {register} API
>> but did it unregister API in this patch, which i have fixed in later patch.
> 
> No, you are protecting random things. Nothing will concurrently modify
> md and &pdev->dev in this moment. mdgtoc is allocated above, so also
> cannot by modified.
> 
> Otherwise show me the hypothetical scenario.

You are correct, it should just protect the assignment.
__md = md;

Thanks
> 
> 
>>
>>>
>>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>>> +	if (ret) {
>>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>>> +		goto unlock;
>>>> +	}
>>>> +
>>>> +	__md = md;
>>>
>>> No. This is a platform device, so it can have multiple instances.
>>
>> It can have only one instance that is created from SMEM driver probe.
> 
> Anyone can instantiate more of them.... how did you solve it?
> 
> 
>>
>>>
>>>> +	/* First entry would be ELF header */
>>>> +	ret = qcom_apss_minidump_add_elf_header();
>>>> +	if (ret) {
>>>> +		dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret);
>>>> +		memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>>>> +		__md = NULL;
>>>> +	}
>>>> +
>>>> +unlock:
>>>> +	mutex_unlock(&minidump_lock);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int qcom_minidump_remove(struct platform_device *pdev)
>>>> +{
>>>> +	memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>>>> +	__md = NULL;
>>>
>>> Don't use __ in variable names. Drop it everywhere.
>>
>> As i said above, this is being followed in other drivers, so followed
>> it here as per recommendation.
>>
>> Let @srini comeback on this.
> 
> Which part of coding style recommends __ for driver code?

Will fix this.

> 
>>
>>>
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static struct platform_driver qcom_minidump_driver = {
>>>> +	.probe = qcom_minidump_probe,
>>>> +	.remove = qcom_minidump_remove,
>>>> +	.driver  = {
>>>> +		.name = "qcom-minidump",
>>>> +	},
>>>> +};
>>>> +
>>>> +module_platform_driver(qcom_minidump_driver);
>>>> +
>>>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>>>> +MODULE_LICENSE("GPL v2");
>>>> +MODULE_ALIAS("platform:qcom-minidump");
>>>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
>>>> index 6be7ea9..d459656 100644
>>>> --- a/drivers/soc/qcom/smem.c
>>>> +++ b/drivers/soc/qcom/smem.c
>>>> @@ -279,6 +279,7 @@ struct qcom_smem {
>>>>    
>>>>    	u32 item_count;
>>>>    	struct platform_device *socinfo;
>>>> +	struct platform_device *minidump;
>>>>    	struct smem_ptable *ptable;
>>>>    	struct smem_partition global_partition;
>>>>    	struct smem_partition partitions[SMEM_HOST_COUNT];
>>>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev)
>>>>    	if (IS_ERR(smem->socinfo))
>>>>    		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
>>>>    
>>>> +	smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump",
>>>> +						      PLATFORM_DEVID_NONE, NULL,
>>>> +						      0);
>>>> +	if (IS_ERR(smem->minidump))
>>>> +		dev_dbg(&pdev->dev, "failed to register minidump device\n");
>>>> +
>>>>    	return 0;
>>>>    }
>>>>    
>>>>    static int qcom_smem_remove(struct platform_device *pdev)
>>>>    {
>>>>    	platform_device_unregister(__smem->socinfo);
>>>> +	platform_device_unregister(__smem->minidump);
>>>
>>> Wrong order. You registered first socinfo, right?
>>
>> Any order is fine here, they are not dependent.
>> But, will fix this.
> 
> No, the order is always reversed from allocation. It does not matter if
> they are dependent or not.

Ok

> 
> Best regards,
> Krzysztof
> 

-- Mukesh
Mukesh Ojha May 8, 2023, 7:10 a.m. UTC | #19
On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote:
> On 04/05/2023 17:21, Krzysztof Kozlowski wrote:
>>>>
>>>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>>>> +	if (ret) {
>>>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>>>> +		goto unlock;
>>>>> +	}
>>>>> +
>>>>> +	__md = md;
>>>>
>>>> No. This is a platform device, so it can have multiple instances.
>>>
>>> It can have only one instance that is created from SMEM driver probe.
>>
>> Anyone can instantiate more of them.... how did you solve it?
> 
> To clarify - sprinkling more of singletons makes everything tightly
> coupled, difficult to debug and non-portable. You cannot have two
> instances, you have to control concurrent initialization by yourself in
> each of such singletons.
> 
> I understand sometimes they are unavoidable, for example when this does
> not map to hardware property. However here you have the parent - smem -
> which can return you valid instance. Thus you avoid entire problem of
> file-scope variables.

I get your point, why one's should avoid file scope variables.


This is infrastructure driver and will not have multiple instances and 
even if it happens could be avoided with with the help of global mutex 
and protect below function which i am already doing at the moment and 
fail the other probe if it is already initialized with proper logging..e.g

"already initialized..."


ret = qcom_minidump_init_apss_subsystem(md);


And this will be in-lined with

/* Pointer to the one and only smem handle */
static struct qcom_smem *__smem;

Let me know if you still disagree...and have some other way ?


-- Mukesh


> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski May 9, 2023, 7:11 a.m. UTC | #20
On 08/05/2023 09:10, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote:
>> On 04/05/2023 17:21, Krzysztof Kozlowski wrote:
>>>>>
>>>>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>>>>> +	if (ret) {
>>>>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>>>>> +		goto unlock;
>>>>>> +	}
>>>>>> +
>>>>>> +	__md = md;
>>>>>
>>>>> No. This is a platform device, so it can have multiple instances.
>>>>
>>>> It can have only one instance that is created from SMEM driver probe.
>>>
>>> Anyone can instantiate more of them.... how did you solve it?
>>
>> To clarify - sprinkling more of singletons makes everything tightly
>> coupled, difficult to debug and non-portable. You cannot have two
>> instances, you have to control concurrent initialization by yourself in
>> each of such singletons.
>>
>> I understand sometimes they are unavoidable, for example when this does
>> not map to hardware property. However here you have the parent - smem -
>> which can return you valid instance. Thus you avoid entire problem of
>> file-scope variables.
> 
> I get your point, why one's should avoid file scope variables.
> 
> 
> This is infrastructure driver and will not have multiple instances and 
> even if it happens could be avoided with with the help of global mutex 
> and protect below function which i am already doing at the moment and 

But we do not want global mutexes... so incorrect design is being
improved by more incorrect design.

> fail the other probe if it is already initialized with proper logging..e.g
> 
> "already initialized..."
> 
> 
> ret = qcom_minidump_init_apss_subsystem(md);
> 
> 
> And this will be in-lined with
> 
> /* Pointer to the one and only smem handle */
> static struct qcom_smem *__smem;
> 
> Let me know if you still disagree...and have some other way ?

Why the parent - smem - cannot return every consumer the instance it
has? There will be one smem having only one minidump, so all problems
solved?

Best regards,
Krzysztof
Trilok Soni May 14, 2023, 4:16 a.m. UTC | #21
On 5/8/2023 12:10 AM, Mukesh Ojha wrote:
> 
> 
> On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote:
>> On 04/05/2023 17:21, Krzysztof Kozlowski wrote:
>>>>>
>>>>>> +    ret = qcom_minidump_init_apss_subsystem(md);
>>>>>> +    if (ret) {
>>>>>> +        dev_err(&pdev->dev, "apss minidump initialization failed: 
>>>>>> %d\n", ret);
>>>>>> +        goto unlock;
>>>>>> +    }
>>>>>> +
>>>>>> +    __md = md;
>>>>>
>>>>> No. This is a platform device, so it can have multiple instances.
>>>>
>>>> It can have only one instance that is created from SMEM driver probe.
>>>
>>> Anyone can instantiate more of them.... how did you solve it?
>>
>> To clarify - sprinkling more of singletons makes everything tightly
>> coupled, difficult to debug and non-portable. You cannot have two
>> instances, you have to control concurrent initialization by yourself in
>> each of such singletons.
>>
>> I understand sometimes they are unavoidable, for example when this does
>> not map to hardware property. However here you have the parent - smem -
>> which can return you valid instance. Thus you avoid entire problem of
>> file-scope variables.
> 
> I get your point, why one's should avoid file scope variables.
> 
> 
> This is infrastructure driver and will not have multiple instances and 
> even if it happens could be avoided with with the help of global mutex 
> and protect below function which i am already doing at the moment and 
> fail the other probe if it is already initialized with proper logging..e.g

Another way to think here is what if you have chiplets? Two SOCs looks 
like one as a product? How does your driver will behave in those cases?

---Trilok Soni
Mukesh Ojha May 28, 2023, 11:29 a.m. UTC | #22
Hi Krzysztof,

On 5/9/2023 12:41 PM, Krzysztof Kozlowski wrote:
> On 08/05/2023 09:10, Mukesh Ojha wrote:
>>
>>
>> On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote:
>>> On 04/05/2023 17:21, Krzysztof Kozlowski wrote:
>>>>>>
>>>>>>> +	ret = qcom_minidump_init_apss_subsystem(md);
>>>>>>> +	if (ret) {
>>>>>>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>>>>>>> +		goto unlock;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	__md = md;
>>>>>>
>>>>>> No. This is a platform device, so it can have multiple instances.
>>>>>
>>>>> It can have only one instance that is created from SMEM driver probe.
>>>>
>>>> Anyone can instantiate more of them.... how did you solve it?
>>>
>>> To clarify - sprinkling more of singletons makes everything tightly
>>> coupled, difficult to debug and non-portable. You cannot have two
>>> instances, you have to control concurrent initialization by yourself in
>>> each of such singletons.
>>>
>>> I understand sometimes they are unavoidable, for example when this does
>>> not map to hardware property. However here you have the parent - smem -
>>> which can return you valid instance. Thus you avoid entire problem of
>>> file-scope variables.
>>
>> I get your point, why one's should avoid file scope variables.
>>
>>
>> This is infrastructure driver and will not have multiple instances and
>> even if it happens could be avoided with with the help of global mutex
>> and protect below function which i am already doing at the moment and
> 
> But we do not want global mutexes... so incorrect design is being
> improved by more incorrect design.
> 
>> fail the other probe if it is already initialized with proper logging..e.g
>>
>> "already initialized..."
>>
>>
>> ret = qcom_minidump_init_apss_subsystem(md);
>>
>>
>> And this will be in-lined with
>>
>> /* Pointer to the one and only smem handle */
>> static struct qcom_smem *__smem;
>>
>> Let me know if you still disagree...and have some other way ?
> 
> Why the parent - smem - cannot return every consumer the instance it
> has? There will be one smem having only one minidump, so all problems
> solved?

Sorry, I am extending this discussion but it is needed to avoid rework
in upcoming patches.

I am inline with the thought of each smem has its own minidump instance, 
which is basically one at this moment as SMEM has only instance in DT.
In that way, Client driver calling qcom_apss_minidump_region_register()
will also need to know the instance it need to register with right?

However, I do have a use case [1] where SMEM or similar region 
supporting memory mapped region could be virtualized and guest vm does
not have direct access to it, that way it will only have one backend at 
a time.But even if they exist together that can be done with below approach.

File scope variable is still needed in minidump core but can be avoided 
in backend drivers where each backend register with core and get added 
itself in the list and for list protection, list mutex would be needed.


#define SMEM       0;
#define MMIO       1;
or enum may be..

And client can call this to the instance it need to register with..
int qcom_apss_minidump_region_register(region, SMEM);
int qcom_apss_minidump_region_register(region, MMIO);

Do you agree with this approach?

[1]

            +----------------+
            |                |
            | client A-Z     |
            +-----+----------+
                  |
                  |
                  |
                  |
                  v
       +------------------------+
       |                        |                other backends
       |    minidump core       +----------------------------+
       |                        |                            |
       +--+---------------------+                            |
          |                     |                            |
          |                     |                            |
          |                     |                            |e.g,
          |                     |                            |gunyah-rm
+--------v------+        +-----v-----------+             +--+---------+
|               |        |                 |             |            |
|minidump_smem  |        | minidump_mmio   |             | .....      |
+---------------+        +-----------------+             +------------+
  SMEM backend              mmio backend where
                            smem may be virtualized


-- Mukesh
> 
> Best regards,
> Krzysztof
>
Mukesh Ojha June 2, 2023, 10:43 a.m. UTC | #23
On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote:
> On 03/05/2023 19:02, Mukesh Ojha wrote:
>> Minidump is a best effort mechanism to collect useful and predefined
>> data for first level of debugging on end user devices running on
>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>> or subsystem part of SoC crashes, due to a range of hardware and
>> software bugs. Hence, the ability to collect accurate data is only
>> a best-effort. The data collected could be invalid or corrupted,
>> data collection itself could fail, and so on.
>>
>> Qualcomm devices in engineering mode provides a mechanism for
>> generating full system ramdumps for post mortem debugging. But in some
>> cases it's however not feasible to capture the entire content of RAM.
>> The minidump mechanism provides the means for selecting region should
>> be included in the ramdump. The solution supports extracting the
>> ramdump/minidump produced either over USB or stored to an attached
>> storage device.
>>
>> The core of minidump feature is part of Qualcomm's boot firmware code.
>> It initializes shared memory(SMEM), which is a part of DDR and
>> allocates a small section of it to minidump table i.e also called
>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
>> their own table of segments to be included in the minidump, all
>> references from a descriptor in SMEM (G-ToC). Each segment/region has
>> some details like name, physical address and it's size etc. and it
>> could be anywhere scattered in the DDR.
>>
>> Minidump kernel driver adds the capability to add linux region to be
>> dumped as part of ram dump collection. It provides appropriate symbol
>> to check its enablement and register client regions.
>>
>> To simplify post mortem debugging, it creates and maintain an ELF
>> header as first region that gets updated upon registration
>> of a new region.
>>
>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
>> ---
>>   drivers/soc/qcom/Kconfig         |  14 +
>>   drivers/soc/qcom/Makefile        |   1 +
>>   drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++
>>   drivers/soc/qcom/smem.c          |   8 +
>>   include/soc/qcom/qcom_minidump.h |  61 +++-
>>   5 files changed, 663 insertions(+), 2 deletions(-)
>>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>>
>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>> index a491718..15c931e 100644
>> --- a/drivers/soc/qcom/Kconfig
>> +++ b/drivers/soc/qcom/Kconfig
>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE
>>   	tristate
>>   	select QCOM_SCM
>>   
>> +config QCOM_MINIDUMP
>> +	tristate "QCOM Minidump Support"
>> +	depends on ARCH_QCOM || COMPILE_TEST
>> +	select QCOM_SMEM
>> +	help
>> +	  Enablement of core minidump feature is controlled from boot firmware
>> +	  side, and this config allow linux to query and manages APPS minidump
>> +	  table.
>> +
>> +	  Client drivers can register their internal data structures and debug
>> +	  messages as part of the minidump region and when the SoC is crashed,
>> +	  these selective regions will be dumped instead of the entire DDR.
>> +	  This saves significant amount of time and/or storage space.
>> +
>>   endmenu
>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>> index 0f43a88..1ebe081 100644
>> --- a/drivers/soc/qcom/Makefile
>> +++ b/drivers/soc/qcom/Makefile
>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o
>>   obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) +=	kryo-l2-accessors.o
>>   obj-$(CONFIG_QCOM_ICC_BWMON)	+= icc-bwmon.o
>>   obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE)	+= ice.o
>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o
>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c
>> new file mode 100644
>> index 0000000..d107a86
>> --- /dev/null
>> +++ b/drivers/soc/qcom/qcom_minidump.c
>> @@ -0,0 +1,581 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +/*
>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>> + */
>> +
>> +#include <linux/elf.h>
>> +#include <linux/err.h>
>> +#include <linux/errno.h>
>> +#include <linux/export.h>
>> +#include <linux/init.h>
>> +#include <linux/io.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/string.h>
>> +#include <linux/soc/qcom/smem.h>
>> +#include <soc/qcom/qcom_minidump.h>
>> +
>> +/**
>> + * struct minidump_elfhdr - Minidump table elf header
>> + * @ehdr: Elf main header
>> + * @shdr: Section header
>> + * @phdr: Program header
>> + * @elf_offset: Section offset in elf
>> + * @strtable_idx: String table current index position
>> + */
>> +struct minidump_elfhdr {
>> +	struct elfhdr		*ehdr;
>> +	struct elf_shdr		*shdr;
>> +	struct elf_phdr		*phdr;
>> +	size_t			elf_offset;
>> +	size_t			strtable_idx;
>> +};
>> +
>> +/**
>> + * struct minidump - Minidump driver private data
>> + * @md_gbl_toc	: Global TOC pointer
>> + * @md_apss_toc	: Application Subsystem TOC pointer
>> + * @md_regions	: High level OS region base pointer
>> + * @elf		: Minidump elf header
>> + * @dev		: Minidump device
>> + */
>> +struct minidump {
>> +	struct minidump_global_toc	*md_gbl_toc;
>> +	struct minidump_subsystem	*md_apss_toc;
>> +	struct minidump_region		*md_regions;
>> +	struct minidump_elfhdr		elf;
>> +	struct device			*dev;
>> +};
>> +
>> +/*
>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
>> + * as total number of supported region (including all co-processors) in
>> + * minidump table out of which linux was using 201. In future, this limitation
>> + * from boot firmware might get removed by allocating the region dynamically.
>> + * So, keep it compatible with older devices, we can keep the current limit for
>> + * Linux to 201.
>> + */
>> +#define MAX_NUM_ENTRIES	  201
>> +#define MAX_STRTBL_SIZE	  (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH)
>> +
>> +static struct minidump *__md;
> 
> No, no file scope or global scope statics.
> 
>> +static DEFINE_MUTEX(minidump_lock);
> 
> Neither this.
> 
> Also you need to clearly express what is protected by newn lock.

Once i divide this driver into two (minidump core + smem backend driver
as described here[1]) so there could more backend drivers could be 
attached to minidump core. Some of the file scope variable still be 
required.

[1]
https://lore.kernel.org/lkml/c2496855-113a-56e6-f6e2-9a9bd03a1267@quicinc.com/#t

> 
>> +
>> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
>> +{
>> +	struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);
>> +
>> +	return &eshdr[idx];
>> +}
>> +
>> +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
>> +{
>> +	struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
>> +
>> +	return &ephdr[idx];
>> +}
>> +
>> +static char *elf_str_table_start(struct elfhdr *ehdr)
>> +{
>> +	struct elf_shdr *eshdr;
>> +
>> +	if (ehdr->e_shstrndx == SHN_UNDEF)
>> +		return NULL;
>> +
>> +	eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx);
>> +	return (char *)ehdr + eshdr->sh_offset;
>> +}
>> +
>> +static char *elf_lookup_string(struct elfhdr *ehdr, int offset)
>> +{
>> +	char *strtab = elf_str_table_start(ehdr);
>> +
>> +	if (!strtab || (__md->elf.strtable_idx < offset))
>> +		return NULL;
>> +
>> +	return strtab + offset;
>> +}
>> +
>> +static unsigned int append_str_to_strtable(const char *name)
>> +{
>> +	char *strtab = elf_str_table_start(__md->elf.ehdr);
>> +	unsigned int old_idx = __md->elf.strtable_idx;
>> +	unsigned int ret;
>> +
>> +	if (!strtab || !name)
>> +		return 0;
>> +
>> +	ret = old_idx;
>> +	old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH);
>> +	__md->elf.strtable_idx = old_idx + 1;
>> +	return ret;
>> +}
>> +
>> +static int
>> +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int i;
>> +	unsigned int count;
>> +
>> +	count = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	for (i = 0; i < count; i++) {
>> +		mdr = &__md->md_regions[i];
>> +		if (!strcmp(mdr->name, region->name))
>> +			return i;
>> +	}
>> +	return -ENOENT;
>> +}
>> +
>> +static void
>> +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region)
> 
> You need to name everything shorter. Neither functions nor structs are
> making it easy to follow.

Would drop "apss" from the name;

> 
>> +{
>> +	struct elfhdr *ehdr = __md->elf.ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_phdr *phdr;
>> +
>> +	shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++);
>> +	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
>> +
>> +	shdr->sh_type = SHT_PROGBITS;
>> +	shdr->sh_name = append_str_to_strtable(region->name);
>> +	shdr->sh_addr = (elf_addr_t)region->virt_addr;
>> +	shdr->sh_size = region->size;
>> +	shdr->sh_flags = SHF_WRITE;
>> +	shdr->sh_offset = __md->elf.elf_offset;
>> +	shdr->sh_entsize = 0;
>> +
>> +	phdr->p_type = PT_LOAD;
>> +	phdr->p_offset = __md->elf.elf_offset;
>> +	phdr->p_vaddr = (elf_addr_t)region->virt_addr;
>> +	phdr->p_paddr = region->phys_addr;
>> +	phdr->p_filesz = phdr->p_memsz = region->size;
>> +	phdr->p_flags = PF_R | PF_W;
>> +	__md->elf.elf_offset += shdr->sh_size;
>> +}
>> +
>> +static void
>> +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count);
>> +
>> +	mdr = &__md->md_regions[region_cnt];
>> +	strscpy(mdr->name, region->name, sizeof(mdr->name));
>> +	mdr->address = cpu_to_le64(region->phys_addr);
>> +	mdr->size = cpu_to_le64(region->size);
>> +	mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID);
>> +	region_cnt++;
>> +	__md->md_apss_toc->region_count = cpu_to_le32(region_cnt);
>> +}
>> +
>> +static bool
>> +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region)
>> +{
>> +	return region &&
>> +		strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH &&
>> +		region->virt_addr &&
>> +		region->size &&
>> +		IS_ALIGNED(region->size, 4);
>> +}
>> +
>> +static int qcom_apss_minidump_add_elf_header(void)
>> +{
>> +	struct qcom_apss_minidump_region elfregion;
>> +	struct elfhdr *ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_phdr *phdr;
>> +	unsigned int  elfh_size;
>> +	unsigned int strtbl_off;
>> +	unsigned int phdr_off;
>> +	char *banner;
>> +	unsigned int banner_len;
>> +
>> +	banner_len = strlen(linux_banner);
>> +	/*
>> +	 * Header buffer contains:
>> +	 * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers,
>> +	 * where, 4 additional entries, one for empty header, one for string table
>> +	 * one for minidump table and one for linux banner.
>> +	 *
>> +	 * Linux banner is stored in minidump to aid post mortem tools to determine
>> +	 * the kernel version.
>> +	 */
>> +	elfh_size = sizeof(*ehdr);
>> +	elfh_size += MAX_STRTBL_SIZE;
>> +	elfh_size += banner_len + 1;
>> +	elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4));
>> +	elfh_size = ALIGN(elfh_size, 4);
>> +
>> +	__md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL);
>> +	if (!__md->elf.ehdr)
>> +		return -ENOMEM;
>> +
>> +	/* Register ELF header as first region */
>> +	strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name));
>> +	elfregion.virt_addr = __md->elf.ehdr;
>> +	elfregion.phys_addr = virt_to_phys(__md->elf.ehdr);
>> +	elfregion.size = elfh_size;
>> +	qcom_apss_minidump_add_region(&elfregion);
>> +
>> +	ehdr = __md->elf.ehdr;
>> +	/* Assign Section/Program headers offset */
>> +	__md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1);
>> +	__md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES);
>> +	phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES);
>> +
>> +	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
>> +	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
>> +	ehdr->e_ident[EI_DATA] = ELF_DATA;
>> +	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
>> +	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
>> +	ehdr->e_type = ET_CORE;
>> +	ehdr->e_machine  = ELF_ARCH;
>> +	ehdr->e_version = EV_CURRENT;
>> +	ehdr->e_ehsize = sizeof(*ehdr);
>> +	ehdr->e_phoff = phdr_off;
>> +	ehdr->e_phentsize = sizeof(*phdr);
>> +	ehdr->e_shoff = sizeof(*ehdr);
>> +	ehdr->e_shentsize = sizeof(*shdr);
>> +	ehdr->e_shstrndx = 1;
>> +
>> +	__md->elf.elf_offset = elfh_size;
>> +
>> +	/*
>> +	 * The zeroth index of the section header is reserved and is rarely used.
>> +	 * Set the section header as null (SHN_UNDEF) and move to the next one.
>> +	 * 2nd Section is String table.
>> +	 */
>> +	__md->elf.strtable_idx = 1;
>> +	strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES);
>> +	shdr++;
>> +	shdr->sh_type = SHT_STRTAB;
>> +	shdr->sh_offset = (elf_addr_t)strtbl_off;
>> +	shdr->sh_size = MAX_STRTBL_SIZE;
>> +	shdr->sh_entsize = 0;
>> +	shdr->sh_flags = 0;
>> +	shdr->sh_name = append_str_to_strtable("STR_TBL");
>> +	shdr++;
>> +
>> +	/* 3rd Section is Linux banner */
>> +	banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE;
>> +	memcpy(banner, linux_banner, banner_len);
>> +
>> +	shdr->sh_type = SHT_PROGBITS;
>> +	shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
>> +	shdr->sh_size = banner_len + 1;
>> +	shdr->sh_addr = (elf_addr_t)linux_banner;
>> +	shdr->sh_entsize = 0;
>> +	shdr->sh_flags = SHF_WRITE;
>> +	shdr->sh_name = append_str_to_strtable("linux_banner");
>> +
>> +	phdr->p_type = PT_LOAD;
>> +	phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE);
>> +	phdr->p_vaddr = (elf_addr_t)linux_banner;
>> +	phdr->p_paddr = virt_to_phys(linux_banner);
>> +	phdr->p_filesz = phdr->p_memsz = banner_len + 1;
>> +	phdr->p_flags = PF_R | PF_W;
>> +
>> +	/*
>> +	 * Above are some prdefined sections/program header used
>> +	 * for debug, update their count here.
>> +	 */
>> +	ehdr->e_phnum = 1;
>> +	ehdr->e_shnum = 3;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor.
>> + * @minidump_index: minidump index for a subsystem in minidump table
>> + *
>> + * Return: minidump subsystem descriptor address on success and error
>> + * on failure
>> + */
>> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
>> +{
>> +	struct minidump_subsystem *md_ss_toc;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	if (!__md) {
>> +		md_ss_toc = ERR_PTR(-EPROBE_DEFER);
>> +		goto unlock;
>> +	}
>> +
>> +	md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index];
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return md_ss_toc;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc);
>> +
>> +/**
>> + * qcom_apss_minidump_region_register() - Register a region in Minidump table.
>> + * @region: minidump region.
>> + *
>> + * Return: On success, it returns 0, otherwise a negative error value on failure.
>> + */
>> +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
>> +{
>> +	unsigned int num_region;
>> +	int ret;
>> +
>> +	if (!__md)
>> +		return -EPROBE_DEFER;
>> +
>> +	if (!qcom_apss_minidump_valid_region(region))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	ret = get_apss_minidump_region_index(region);
>> +	if (ret >= 0) {
>> +		dev_info(__md->dev, "%s region is already registered\n", region->name);
>> +		ret = -EEXIST;
>> +		goto unlock;
>> +	}
>> +
>> +	/* Check if there is a room for a new entry */
>> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	if (num_region >= MAX_NUM_ENTRIES) {
>> +		dev_err(__md->dev, "maximum region limit %u reached\n", num_region);
>> +		ret = -ENOSPC;
>> +		goto unlock;
>> +	}
>> +
>> +	qcom_apss_minidump_add_region(region);
>> +	qcom_apss_minidump_update_elf_header(region);
>> +	ret = 0;
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register);
>> +
>> +static int
>> +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct elfhdr *ehdr = __md->elf.ehdr;
>> +	struct elf_shdr *shdr;
>> +	struct elf_shdr *tmp_shdr;
>> +	struct elf_phdr *phdr;
>> +	struct elf_phdr *tmp_phdr;
>> +	unsigned int phidx;
>> +	unsigned int shidx;
>> +	unsigned int len;
>> +	unsigned int i;
>> +	char *shname;
>> +
>> +	for (i = 0; i < ehdr->e_phnum; i++) {
>> +		phdr = elf_phdr_entry_addr(ehdr, i);
>> +		if (phdr->p_paddr == region->phys_addr &&
>> +		    phdr->p_memsz == region->size)
>> +			break;
>> +	}
>> +
>> +	if (i == ehdr->e_phnum) {
>> +		dev_err(__md->dev, "Cannot find program header entry in elf\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	phidx = i;
>> +	for (i = 0; i < ehdr->e_shnum; i++) {
>> +		shdr = elf_shdr_entry_addr(ehdr, i);
>> +		shname = elf_lookup_string(ehdr, shdr->sh_name);
>> +		if (shname && !strcmp(shname, region->name) &&
>> +		    shdr->sh_addr == (elf_addr_t)region->virt_addr &&
>> +		    shdr->sh_size == region->size)
>> +			break;
>> +	}
>> +
>> +	if (i == ehdr->e_shnum) {
>> +		dev_err(__md->dev, "Cannot find section header entry in elf\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	shidx = i;
>> +	if (shdr->sh_offset != phdr->p_offset) {
>> +		dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Clear name in string table */
>> +	len = strlen(shname) + 1;
>> +	memmove(shname, shname + len,
>> +		__md->elf.strtable_idx - shdr->sh_name - len);
>> +	__md->elf.strtable_idx -= len;
>> +
>> +	/* Clear program header */
>> +	tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
>> +	for (i = phidx; i < ehdr->e_phnum - 1; i++) {
>> +		tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
>> +		phdr = elf_phdr_entry_addr(ehdr, i);
>> +		memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr));
>> +		phdr->p_offset = phdr->p_offset - region->size;
>> +	}
>> +	memset(tmp_phdr, 0, sizeof(struct elf_phdr));
>> +	ehdr->e_phnum--;
>> +
>> +	/* Clear section header */
>> +	tmp_shdr = elf_shdr_entry_addr(ehdr, shidx);
>> +	for (i = shidx; i < ehdr->e_shnum - 1; i++) {
>> +		tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1);
>> +		shdr = elf_shdr_entry_addr(ehdr, i);
>> +		memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr));
>> +		shdr->sh_offset -= region->size;
>> +		shdr->sh_name -= len;
>> +	}
>> +
>> +	memset(tmp_shdr, 0, sizeof(struct elf_shdr));
>> +	ehdr->e_shnum--;
>> +	__md->elf.elf_offset -= region->size;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table.
>> + * @region: minidump region.
>> + *
>> + * Return: On success, it returns 0 and negative error value on failure.
>> + */
>> +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
>> +{
>> +	struct minidump_region *mdr;
>> +	unsigned int num_region;
>> +	unsigned int idx;
>> +	int ret;
>> +
>> +	if (!region)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&minidump_lock);
>> +	if (!__md) {
>> +		ret = -EPROBE_DEFER;
>> +		goto unlock;
>> +	}
>> +
>> +	idx = get_apss_minidump_region_index(region);
>> +	if (idx < 0) {
>> +		dev_err(__md->dev, "%s region is not present\n", region->name);
>> +		ret = idx;
>> +		goto unlock;
>> +	}
>> +
>> +	mdr = &__md->md_regions[0];
>> +	num_region = le32_to_cpu(__md->md_apss_toc->region_count);
>> +	/*
>> +	 * Left shift all the regions exist after this removed region
>> +	 * index by 1 to fill the gap and zero out the last region
>> +	 * present at the end.
>> +	 */
>> +	memmove(&mdr[idx], &mdr[idx + 1],
>> +		(num_region - idx - 1) * sizeof(struct minidump_region));
>> +	memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region));
>> +	ret = qcom_apss_minidump_clear_header(region);
>> +	if (ret) {
>> +		dev_err(__md->dev, "Failed to remove region: %s\n", region->name);
>> +		goto unlock;
>> +	}
>> +
>> +	num_region--;
>> +	__md->md_apss_toc->region_count = cpu_to_le32(num_region);
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister);
>> +
>> +static int qcom_minidump_init_apss_subsystem(struct minidump *md)
>> +{
>> +	struct minidump_subsystem *apsstoc;
>> +
>> +	apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC];
>> +	md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
>> +				      sizeof(struct minidump_region), GFP_KERNEL);
>> +	if (!md->md_regions)
>> +		return -ENOMEM;
>> +
>> +	md->md_apss_toc = apsstoc;
>> +	apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions));
>> +	apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
>> +	apsstoc->status = cpu_to_le32(1);
>> +	apsstoc->region_count = cpu_to_le32(0);
>> +
>> +	/* Tell bootloader not to encrypt the regions of this subsystem */
>> +	apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
>> +	apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
>> +
>> +	return 0;
>> +}
>> +
>> +static int qcom_minidump_probe(struct platform_device *pdev)
>> +{
>> +	struct minidump_global_toc *mdgtoc;
>> +	struct minidump *md;
>> +	size_t size;
>> +	int ret;
>> +
>> +	md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL);
>> +	if (!md)
>> +		return -ENOMEM;
>> +
>> +	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
>> +	if (IS_ERR(mdgtoc)) {
>> +		ret = PTR_ERR(mdgtoc);
>> +		dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>> +		ret = -EINVAL;
>> +		dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	mutex_lock(&minidump_lock);
>> +	md->dev = &pdev->dev;
>> +	md->md_gbl_toc = mdgtoc;
> 
> What are you protecting here? It's not possible to have concurrent
> access to md, is it?
> 
>> +	ret = qcom_minidump_init_apss_subsystem(md);
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret);
>> +		goto unlock;
>> +	}
>> +
>> +	__md = md;
> 
> No. This is a platform device, so it can have multiple instances.
> 
>> +	/* First entry would be ELF header */
>> +	ret = qcom_apss_minidump_add_elf_header();
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret);
>> +		memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>> +		__md = NULL;
>> +	}
>> +
>> +unlock:
>> +	mutex_unlock(&minidump_lock);
>> +	return ret;
>> +}
>> +
>> +static int qcom_minidump_remove(struct platform_device *pdev)
>> +{
>> +	memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem));
>> +	__md = NULL;
> 
> Don't use __ in variable names. Drop it everywhere.
> 
>> +
>> +	return 0;
>> +}
>> +
>> +static struct platform_driver qcom_minidump_driver = {
>> +	.probe = qcom_minidump_probe,
>> +	.remove = qcom_minidump_remove,
>> +	.driver  = {
>> +		.name = "qcom-minidump",
>> +	},
>> +};
>> +
>> +module_platform_driver(qcom_minidump_driver);
>> +
>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_ALIAS("platform:qcom-minidump");
>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c
>> index 6be7ea9..d459656 100644
>> --- a/drivers/soc/qcom/smem.c
>> +++ b/drivers/soc/qcom/smem.c
>> @@ -279,6 +279,7 @@ struct qcom_smem {
>>   
>>   	u32 item_count;
>>   	struct platform_device *socinfo;
>> +	struct platform_device *minidump;
>>   	struct smem_ptable *ptable;
>>   	struct smem_partition global_partition;
>>   	struct smem_partition partitions[SMEM_HOST_COUNT];
>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev)
>>   	if (IS_ERR(smem->socinfo))
>>   		dev_dbg(&pdev->dev, "failed to register socinfo device\n");
>>   
>> +	smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump",
>> +						      PLATFORM_DEVID_NONE, NULL,
>> +						      0);
>> +	if (IS_ERR(smem->minidump))
>> +		dev_dbg(&pdev->dev, "failed to register minidump device\n");
>> +
>>   	return 0;
>>   }
>>   
>>   static int qcom_smem_remove(struct platform_device *pdev)
>>   {
>>   	platform_device_unregister(__smem->socinfo);
>> +	platform_device_unregister(__smem->minidump);
> 
> Wrong order. You registered first socinfo, right?
> 
>>   
>>   	hwspin_lock_free(__smem->hwlock);
>>   	__smem = NULL;
>> diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h
>> index 84c8605..1872668 100644
>> --- a/include/soc/qcom/qcom_minidump.h
>> +++ b/include/soc/qcom/qcom_minidump.h
>> @@ -1,6 +1,7 @@
>>   /* SPDX-License-Identifier: GPL-2.0-only */
>>   /*
>> - * Qualcomm minidump shared data structures and macros
>> + * This file contain Qualcomm minidump data structures and macros shared with
>> + * boot firmware and also apss minidump client's data structure
>>    *
>>    * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>    */
>> @@ -9,12 +10,27 @@
>>   #define _QCOM_MINIDUMP_H_
>>   
>>   #define MAX_NUM_OF_SS           10
>> +#define MAX_NAME_LENGTH		12
>>   #define MAX_REGION_NAME_LENGTH  16
>> +
>> +#define MINIDUMP_REVISION	1
>>   #define SBL_MINIDUMP_SMEM_ID	602
>> +
>> +/* Application processor minidump descriptor */
>> +#define MINIDUMP_APSS_DESC	0
>> +#define SMEM_ENTRY_SIZE		40
>> +
>>   #define MINIDUMP_REGION_VALID		('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0)
>> +#define MINIDUMP_REGION_INVALID		('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0)
>> +#define MINIDUMP_REGION_INIT		('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0)
>> +#define MINIDUMP_REGION_NOINIT		0
>> +
>> +#define MINIDUMP_SS_ENCR_REQ		(0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0)
>> +#define MINIDUMP_SS_ENCR_NOTREQ		(0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0)
>> +#define MINIDUMP_SS_ENCR_NONE		('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
>>   #define MINIDUMP_SS_ENCR_DONE		('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0)
>> +#define MINIDUMP_SS_ENCR_START		('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0)
>>   #define MINIDUMP_SS_ENABLED		('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0)
>> -
> 
> Why removing this?
> 
>>   /**
>>    * struct minidump_region - Minidump region
>>    * @name		: Name of the region to be dumped
>> @@ -63,4 +79,45 @@ struct minidump_global_toc {
>>   	struct minidump_subsystem	subsystems[MAX_NUM_OF_SS];
>>   };
>>   
>> +/**
>> + * struct qcom_apss_minidump_region - APSS Minidump region information
>> + *
>> + * @name:	Entry name, Minidump will dump binary with this name.
>> + * @virt_addr:  Virtual address of the entry.
>> + * @phys_addr:	Physical address of the entry to dump.
>> + * @size:	Number of byte to dump from @address location,
>> + *		and it should be 4 byte aligned.
>> + */
>> +struct qcom_apss_minidump_region {
>> +	char		name[MAX_NAME_LENGTH];
>> +	void		*virt_addr;
>> +	phys_addr_t	phys_addr;
>> +	size_t		size;
>> +};
> 
> You expose way too much internals in global header.
> 
>> +
>> +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP)
>> +extern struct minidump_subsystem *
> 
> No externs.
> 
> The header is unreadable.
> 
>> +qcom_minidump_subsystem_desc(unsigned int minidump_index);
>> +extern int
>> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region);
>> +extern int
>> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region);
> 
> Blank line
> 
>> +#else
> 
> Blank line
> 
>> +static inline
>> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index)
>> +{
>> +	return NULL;
>> +}
> 
> Blank line
> 
>> +static inline int
>> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region)
>> +{
>> +	/* Return quietly, if minidump is not enabled */
>> +	return 0;
>> +}
> 
> 
>> +static inline int
>> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region)
>> +{
>> +	return 0;
>> +}
> 

Will apply the suggestion.

-- Mukesh

>> +#endif
> 
> /* CONFIG_QCOM_MINIDUMP */
> 
>>   #endif  /* _QCOM_MINIDUMP_H_ */
> 
> Best regards,
> Krzysztof
>
Bjorn Andersson July 15, 2023, 10:13 p.m. UTC | #24
On Wed, 03 May 2023 22:32:14 +0530, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.
> 
> [...]

Applied, thanks!

[01/18] remoteproc: qcom: Expand MD_* as MINIDUMP_*
        commit: 318da1371246fdc1806011a27138175cfb078687

Best regards,
Bjorn Andersson July 17, 2023, 4:21 p.m. UTC | #25
On Sun, Jul 16, 2023 at 07:15:57PM -0600, Mathieu Poirier wrote:
> On Sat, Jul 15, 2023 at 03:13:34PM -0700, Bjorn Andersson wrote:
> > 
> > On Wed, 03 May 2023 22:32:14 +0530, Mukesh Ojha wrote:
> > > Minidump is a best effort mechanism to collect useful and predefined data
> > > for first level of debugging on end user devices running on Qualcomm SoCs.
> > > It is built on the premise that System on Chip (SoC) or subsystem part of
> > > SoC crashes, due to a range of hardware and software bugs. Hence, the
> > > ability to collect accurate data is only a best-effort. The data collected
> > > could be invalid or corrupted, data collection itself could fail, and so on.
> > > 
> > > [...]
> > 
> > Applied, thanks!
> > 
> > [01/18] remoteproc: qcom: Expand MD_* as MINIDUMP_*
> >         commit: 318da1371246fdc1806011a27138175cfb078687
> >
> 
> Krzysztof asked for modifications on this patch.
> 

Krzysztof pointed out that there was no reason to trivially modify the
defines and then immediately start moving things around.

I agree with this, but as the consensus was that the rest of the series
needs more work, I find this to be a sensible cleanup, getting rid of
the cryptic "MD_" abbreviation.

Regards,
Bjorn
diff mbox

Patch

diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
index 0d2209c..767e84a 100644
--- a/drivers/watchdog/qcom-wdt.c
+++ b/drivers/watchdog/qcom-wdt.c
@@ -12,6 +12,7 @@ 
 #include <linux/platform_device.h>
 #include <linux/watchdog.h>
 #include <linux/of_device.h>
+#include <linux/panic.h>

 enum wdt_reg {
        WDT_RST,
@@ -114,12 +115,28 @@  static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd,
        return qcom_wdt_start(wdd);
 }

+static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt)
+{
+       writel(0, wdt_addr(wdt, WDT_EN));
+       writel(1, wdt_addr(wdt, WDT_BITE_TIME));
+       writel(1, wdt_addr(wdt, WDT_RST));
+       writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN));
+
+       wmb();
+
+       while(1)
+               udelay(1);
+}
+
 static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action,
                            void *data)
 {
        struct qcom_wdt *wdt = to_qcom_wdt(wdd);
        u32 timeout;

+       if (in_panic)
+               qcom_wdt_bite_on_panic(wdt);
+
        /*
         * Trigger watchdog bite:
         *    Setup BITE_TIME to be 128ms, and enable WDT.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 979b776..f913629 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -22,6 +22,7 @@  extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
+extern bool in_panic;

 extern unsigned long panic_on_taint;
 extern bool panic_on_taint_nousertaint;
diff --git a/kernel/panic.c b/kernel/panic.c
index 487f5b0..714f7f4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -65,6 +65,8 @@  static unsigned int warn_limit __read_mostly;

 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
+bool in_panic = false;
+EXPORT_SYMBOL_GPL(in_panic);

 #define PANIC_PRINT_TASK_INFO          0x00000001
 #define PANIC_PRINT_MEM_INFO           0x00000002
@@ -261,6 +263,7 @@  void panic(const char *fmt, ...)
        int old_cpu, this_cpu;
        bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;

+       in_panic = true;
        if (panic_on_warn) {
                /*
                 * This thread may hit another WARN() in the panic path.