Message ID | 1683133352-10046-1-git-send-email-quic_mojha@quicinc.com |
---|---|
State | Superseded |
Headers | show |
On 03/05/2023 19:02, Mukesh Ojha wrote: > Previous patches add the Qualcomm minidump driver support, so > lets enable minidump config so that it can be used by kernel > clients. > > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> This patchset is split too much. Defconfig change is one change. Not two or three. > --- > arch/arm64/configs/defconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig > index a24609e..831c942 100644 > --- a/arch/arm64/configs/defconfig > +++ b/arch/arm64/configs/defconfig > @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m > CONFIG_QCOM_WCNSS_CTRL=m > CONFIG_QCOM_APR=m > CONFIG_QCOM_ICC_BWMON=m > +CONFIG_QCOM_MINIDUMP=y This must be a module. Best regards, Krzysztof
On 03/05/2023 19:02, Mukesh Ojha wrote: > Minidump is a best effort mechanism to collect useful and predefined data > for first level of debugging on end user devices running on Qualcomm SoCs. > It is built on the premise that System on Chip (SoC) or subsystem part of > SoC crashes, due to a range of hardware and software bugs. Hence, the > ability to collect accurate data is only a best-effort. The data collected > could be invalid or corrupted, data collection itself could fail, and so on. You organized the patch in a way making it very hard for us to review. I see mixed remoteproc, then soc, then defconfig (!!!), then remote proc, then soc, then bindings (! they must be before usage...), then dts (which should be the last), then soc then dts then... You see the point. Bindings, docs, changes organized by subsystem. Then DTS as separate patchset with a link to this one. If you have bisectability issues then it's a hint something is wrongly organized or done and must be fixed anyway. Best regards, Krzysztof
On 03/05/2023 19:02, Mukesh Ojha wrote: > Minidump is a best effort mechanism to collect useful and predefined > data for first level of debugging on end user devices running on > Qualcomm SoCs. It is built on the premise that System on Chip (SoC) > or subsystem part of SoC crashes, due to a range of hardware and > software bugs. Hence, the ability to collect accurate data is only > a best-effort. The data collected could be invalid or corrupted, > data collection itself could fail, and so on. > > Qualcomm devices in engineering mode provides a mechanism for > generating full system ramdumps for post mortem debugging. But in some > cases it's however not feasible to capture the entire content of RAM. > The minidump mechanism provides the means for selecting region should > be included in the ramdump. The solution supports extracting the > ramdump/minidump produced either over USB or stored to an attached > storage device. > > The core of minidump feature is part of Qualcomm's boot firmware code. > It initializes shared memory(SMEM), which is a part of DDR and > allocates a small section of it to minidump table i.e also called > global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has > their own table of segments to be included in the minidump, all > references from a descriptor in SMEM (G-ToC). Each segment/region has > some details like name, physical address and it's size etc. and it > could be anywhere scattered in the DDR. > > Minidump kernel driver adds the capability to add linux region to be > dumped as part of ram dump collection. It provides appropriate symbol > to check its enablement and register client regions. > > To simplify post mortem debugging, it creates and maintain an ELF > header as first region that gets updated upon registration > of a new region. > > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > --- > drivers/soc/qcom/Kconfig | 14 + > drivers/soc/qcom/Makefile | 1 + > drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++ > drivers/soc/qcom/smem.c | 8 + > include/soc/qcom/qcom_minidump.h | 61 +++- > 5 files changed, 663 insertions(+), 2 deletions(-) > create mode 100644 drivers/soc/qcom/qcom_minidump.c > > diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig > index a491718..15c931e 100644 > --- a/drivers/soc/qcom/Kconfig > +++ b/drivers/soc/qcom/Kconfig > @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE > tristate > select QCOM_SCM > > +config QCOM_MINIDUMP > + tristate "QCOM Minidump Support" > + depends on ARCH_QCOM || COMPILE_TEST > + select QCOM_SMEM > + help > + Enablement of core minidump feature is controlled from boot firmware > + side, and this config allow linux to query and manages APPS minidump > + table. > + > + Client drivers can register their internal data structures and debug > + messages as part of the minidump region and when the SoC is crashed, > + these selective regions will be dumped instead of the entire DDR. > + This saves significant amount of time and/or storage space. > + > endmenu > diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile > index 0f43a88..1ebe081 100644 > --- a/drivers/soc/qcom/Makefile > +++ b/drivers/soc/qcom/Makefile > @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o > obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o > obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o > obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += ice.o > +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o > diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c > new file mode 100644 > index 0000000..d107a86 > --- /dev/null > +++ b/drivers/soc/qcom/qcom_minidump.c > @@ -0,0 +1,581 @@ > +// SPDX-License-Identifier: GPL-2.0-only > + > +/* > + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. > + */ > + > +#include <linux/elf.h> > +#include <linux/err.h> > +#include <linux/errno.h> > +#include <linux/export.h> > +#include <linux/init.h> > +#include <linux/io.h> > +#include <linux/kernel.h> > +#include <linux/module.h> > +#include <linux/platform_device.h> > +#include <linux/string.h> > +#include <linux/soc/qcom/smem.h> > +#include <soc/qcom/qcom_minidump.h> > + > +/** > + * struct minidump_elfhdr - Minidump table elf header > + * @ehdr: Elf main header > + * @shdr: Section header > + * @phdr: Program header > + * @elf_offset: Section offset in elf > + * @strtable_idx: String table current index position > + */ > +struct minidump_elfhdr { > + struct elfhdr *ehdr; > + struct elf_shdr *shdr; > + struct elf_phdr *phdr; > + size_t elf_offset; > + size_t strtable_idx; > +}; > + > +/** > + * struct minidump - Minidump driver private data > + * @md_gbl_toc : Global TOC pointer > + * @md_apss_toc : Application Subsystem TOC pointer > + * @md_regions : High level OS region base pointer > + * @elf : Minidump elf header > + * @dev : Minidump device > + */ > +struct minidump { > + struct minidump_global_toc *md_gbl_toc; > + struct minidump_subsystem *md_apss_toc; > + struct minidump_region *md_regions; > + struct minidump_elfhdr elf; > + struct device *dev; > +}; > + > +/* > + * In some of the Old Qualcomm devices, boot firmware statically allocates 300 > + * as total number of supported region (including all co-processors) in > + * minidump table out of which linux was using 201. In future, this limitation > + * from boot firmware might get removed by allocating the region dynamically. > + * So, keep it compatible with older devices, we can keep the current limit for > + * Linux to 201. > + */ > +#define MAX_NUM_ENTRIES 201 > +#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH) > + > +static struct minidump *__md; No, no file scope or global scope statics. > +static DEFINE_MUTEX(minidump_lock); Neither this. Also you need to clearly express what is protected by newn lock. > + > +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx) > +{ > + struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff); > + > + return &eshdr[idx]; > +} > + > +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx) > +{ > + struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff); > + > + return &ephdr[idx]; > +} > + > +static char *elf_str_table_start(struct elfhdr *ehdr) > +{ > + struct elf_shdr *eshdr; > + > + if (ehdr->e_shstrndx == SHN_UNDEF) > + return NULL; > + > + eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx); > + return (char *)ehdr + eshdr->sh_offset; > +} > + > +static char *elf_lookup_string(struct elfhdr *ehdr, int offset) > +{ > + char *strtab = elf_str_table_start(ehdr); > + > + if (!strtab || (__md->elf.strtable_idx < offset)) > + return NULL; > + > + return strtab + offset; > +} > + > +static unsigned int append_str_to_strtable(const char *name) > +{ > + char *strtab = elf_str_table_start(__md->elf.ehdr); > + unsigned int old_idx = __md->elf.strtable_idx; > + unsigned int ret; > + > + if (!strtab || !name) > + return 0; > + > + ret = old_idx; > + old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH); > + __md->elf.strtable_idx = old_idx + 1; > + return ret; > +} > + > +static int > +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region) > +{ > + struct minidump_region *mdr; > + unsigned int i; > + unsigned int count; > + > + count = le32_to_cpu(__md->md_apss_toc->region_count); > + for (i = 0; i < count; i++) { > + mdr = &__md->md_regions[i]; > + if (!strcmp(mdr->name, region->name)) > + return i; > + } > + return -ENOENT; > +} > + > +static void > +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region) You need to name everything shorter. Neither functions nor structs are making it easy to follow. > +{ > + struct elfhdr *ehdr = __md->elf.ehdr; > + struct elf_shdr *shdr; > + struct elf_phdr *phdr; > + > + shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++); > + phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++); > + > + shdr->sh_type = SHT_PROGBITS; > + shdr->sh_name = append_str_to_strtable(region->name); > + shdr->sh_addr = (elf_addr_t)region->virt_addr; > + shdr->sh_size = region->size; > + shdr->sh_flags = SHF_WRITE; > + shdr->sh_offset = __md->elf.elf_offset; > + shdr->sh_entsize = 0; > + > + phdr->p_type = PT_LOAD; > + phdr->p_offset = __md->elf.elf_offset; > + phdr->p_vaddr = (elf_addr_t)region->virt_addr; > + phdr->p_paddr = region->phys_addr; > + phdr->p_filesz = phdr->p_memsz = region->size; > + phdr->p_flags = PF_R | PF_W; > + __md->elf.elf_offset += shdr->sh_size; > +} > + > +static void > +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region) > +{ > + struct minidump_region *mdr; > + unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count); > + > + mdr = &__md->md_regions[region_cnt]; > + strscpy(mdr->name, region->name, sizeof(mdr->name)); > + mdr->address = cpu_to_le64(region->phys_addr); > + mdr->size = cpu_to_le64(region->size); > + mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID); > + region_cnt++; > + __md->md_apss_toc->region_count = cpu_to_le32(region_cnt); > +} > + > +static bool > +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region) > +{ > + return region && > + strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH && > + region->virt_addr && > + region->size && > + IS_ALIGNED(region->size, 4); > +} > + > +static int qcom_apss_minidump_add_elf_header(void) > +{ > + struct qcom_apss_minidump_region elfregion; > + struct elfhdr *ehdr; > + struct elf_shdr *shdr; > + struct elf_phdr *phdr; > + unsigned int elfh_size; > + unsigned int strtbl_off; > + unsigned int phdr_off; > + char *banner; > + unsigned int banner_len; > + > + banner_len = strlen(linux_banner); > + /* > + * Header buffer contains: > + * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers, > + * where, 4 additional entries, one for empty header, one for string table > + * one for minidump table and one for linux banner. > + * > + * Linux banner is stored in minidump to aid post mortem tools to determine > + * the kernel version. > + */ > + elfh_size = sizeof(*ehdr); > + elfh_size += MAX_STRTBL_SIZE; > + elfh_size += banner_len + 1; > + elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4)); > + elfh_size = ALIGN(elfh_size, 4); > + > + __md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL); > + if (!__md->elf.ehdr) > + return -ENOMEM; > + > + /* Register ELF header as first region */ > + strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name)); > + elfregion.virt_addr = __md->elf.ehdr; > + elfregion.phys_addr = virt_to_phys(__md->elf.ehdr); > + elfregion.size = elfh_size; > + qcom_apss_minidump_add_region(&elfregion); > + > + ehdr = __md->elf.ehdr; > + /* Assign Section/Program headers offset */ > + __md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1); > + __md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES); > + phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES); > + > + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); > + ehdr->e_ident[EI_CLASS] = ELF_CLASS; > + ehdr->e_ident[EI_DATA] = ELF_DATA; > + ehdr->e_ident[EI_VERSION] = EV_CURRENT; > + ehdr->e_ident[EI_OSABI] = ELF_OSABI; > + ehdr->e_type = ET_CORE; > + ehdr->e_machine = ELF_ARCH; > + ehdr->e_version = EV_CURRENT; > + ehdr->e_ehsize = sizeof(*ehdr); > + ehdr->e_phoff = phdr_off; > + ehdr->e_phentsize = sizeof(*phdr); > + ehdr->e_shoff = sizeof(*ehdr); > + ehdr->e_shentsize = sizeof(*shdr); > + ehdr->e_shstrndx = 1; > + > + __md->elf.elf_offset = elfh_size; > + > + /* > + * The zeroth index of the section header is reserved and is rarely used. > + * Set the section header as null (SHN_UNDEF) and move to the next one. > + * 2nd Section is String table. > + */ > + __md->elf.strtable_idx = 1; > + strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES); > + shdr++; > + shdr->sh_type = SHT_STRTAB; > + shdr->sh_offset = (elf_addr_t)strtbl_off; > + shdr->sh_size = MAX_STRTBL_SIZE; > + shdr->sh_entsize = 0; > + shdr->sh_flags = 0; > + shdr->sh_name = append_str_to_strtable("STR_TBL"); > + shdr++; > + > + /* 3rd Section is Linux banner */ > + banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE; > + memcpy(banner, linux_banner, banner_len); > + > + shdr->sh_type = SHT_PROGBITS; > + shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); > + shdr->sh_size = banner_len + 1; > + shdr->sh_addr = (elf_addr_t)linux_banner; > + shdr->sh_entsize = 0; > + shdr->sh_flags = SHF_WRITE; > + shdr->sh_name = append_str_to_strtable("linux_banner"); > + > + phdr->p_type = PT_LOAD; > + phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); > + phdr->p_vaddr = (elf_addr_t)linux_banner; > + phdr->p_paddr = virt_to_phys(linux_banner); > + phdr->p_filesz = phdr->p_memsz = banner_len + 1; > + phdr->p_flags = PF_R | PF_W; > + > + /* > + * Above are some prdefined sections/program header used > + * for debug, update their count here. > + */ > + ehdr->e_phnum = 1; > + ehdr->e_shnum = 3; > + > + return 0; > +} > + > +/** > + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor. > + * @minidump_index: minidump index for a subsystem in minidump table > + * > + * Return: minidump subsystem descriptor address on success and error > + * on failure > + */ > +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) > +{ > + struct minidump_subsystem *md_ss_toc; > + > + mutex_lock(&minidump_lock); > + if (!__md) { > + md_ss_toc = ERR_PTR(-EPROBE_DEFER); > + goto unlock; > + } > + > + md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index]; > +unlock: > + mutex_unlock(&minidump_lock); > + return md_ss_toc; > +} > +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc); > + > +/** > + * qcom_apss_minidump_region_register() - Register a region in Minidump table. > + * @region: minidump region. > + * > + * Return: On success, it returns 0, otherwise a negative error value on failure. > + */ > +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) > +{ > + unsigned int num_region; > + int ret; > + > + if (!__md) > + return -EPROBE_DEFER; > + > + if (!qcom_apss_minidump_valid_region(region)) > + return -EINVAL; > + > + mutex_lock(&minidump_lock); > + ret = get_apss_minidump_region_index(region); > + if (ret >= 0) { > + dev_info(__md->dev, "%s region is already registered\n", region->name); > + ret = -EEXIST; > + goto unlock; > + } > + > + /* Check if there is a room for a new entry */ > + num_region = le32_to_cpu(__md->md_apss_toc->region_count); > + if (num_region >= MAX_NUM_ENTRIES) { > + dev_err(__md->dev, "maximum region limit %u reached\n", num_region); > + ret = -ENOSPC; > + goto unlock; > + } > + > + qcom_apss_minidump_add_region(region); > + qcom_apss_minidump_update_elf_header(region); > + ret = 0; > +unlock: > + mutex_unlock(&minidump_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register); > + > +static int > +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region) > +{ > + struct elfhdr *ehdr = __md->elf.ehdr; > + struct elf_shdr *shdr; > + struct elf_shdr *tmp_shdr; > + struct elf_phdr *phdr; > + struct elf_phdr *tmp_phdr; > + unsigned int phidx; > + unsigned int shidx; > + unsigned int len; > + unsigned int i; > + char *shname; > + > + for (i = 0; i < ehdr->e_phnum; i++) { > + phdr = elf_phdr_entry_addr(ehdr, i); > + if (phdr->p_paddr == region->phys_addr && > + phdr->p_memsz == region->size) > + break; > + } > + > + if (i == ehdr->e_phnum) { > + dev_err(__md->dev, "Cannot find program header entry in elf\n"); > + return -EINVAL; > + } > + > + phidx = i; > + for (i = 0; i < ehdr->e_shnum; i++) { > + shdr = elf_shdr_entry_addr(ehdr, i); > + shname = elf_lookup_string(ehdr, shdr->sh_name); > + if (shname && !strcmp(shname, region->name) && > + shdr->sh_addr == (elf_addr_t)region->virt_addr && > + shdr->sh_size == region->size) > + break; > + } > + > + if (i == ehdr->e_shnum) { > + dev_err(__md->dev, "Cannot find section header entry in elf\n"); > + return -EINVAL; > + } > + > + shidx = i; > + if (shdr->sh_offset != phdr->p_offset) { > + dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name); > + return -EINVAL; > + } > + > + /* Clear name in string table */ > + len = strlen(shname) + 1; > + memmove(shname, shname + len, > + __md->elf.strtable_idx - shdr->sh_name - len); > + __md->elf.strtable_idx -= len; > + > + /* Clear program header */ > + tmp_phdr = elf_phdr_entry_addr(ehdr, phidx); > + for (i = phidx; i < ehdr->e_phnum - 1; i++) { > + tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1); > + phdr = elf_phdr_entry_addr(ehdr, i); > + memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr)); > + phdr->p_offset = phdr->p_offset - region->size; > + } > + memset(tmp_phdr, 0, sizeof(struct elf_phdr)); > + ehdr->e_phnum--; > + > + /* Clear section header */ > + tmp_shdr = elf_shdr_entry_addr(ehdr, shidx); > + for (i = shidx; i < ehdr->e_shnum - 1; i++) { > + tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1); > + shdr = elf_shdr_entry_addr(ehdr, i); > + memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr)); > + shdr->sh_offset -= region->size; > + shdr->sh_name -= len; > + } > + > + memset(tmp_shdr, 0, sizeof(struct elf_shdr)); > + ehdr->e_shnum--; > + __md->elf.elf_offset -= region->size; > + > + return 0; > +} > + > +/** > + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table. > + * @region: minidump region. > + * > + * Return: On success, it returns 0 and negative error value on failure. > + */ > +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) > +{ > + struct minidump_region *mdr; > + unsigned int num_region; > + unsigned int idx; > + int ret; > + > + if (!region) > + return -EINVAL; > + > + mutex_lock(&minidump_lock); > + if (!__md) { > + ret = -EPROBE_DEFER; > + goto unlock; > + } > + > + idx = get_apss_minidump_region_index(region); > + if (idx < 0) { > + dev_err(__md->dev, "%s region is not present\n", region->name); > + ret = idx; > + goto unlock; > + } > + > + mdr = &__md->md_regions[0]; > + num_region = le32_to_cpu(__md->md_apss_toc->region_count); > + /* > + * Left shift all the regions exist after this removed region > + * index by 1 to fill the gap and zero out the last region > + * present at the end. > + */ > + memmove(&mdr[idx], &mdr[idx + 1], > + (num_region - idx - 1) * sizeof(struct minidump_region)); > + memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region)); > + ret = qcom_apss_minidump_clear_header(region); > + if (ret) { > + dev_err(__md->dev, "Failed to remove region: %s\n", region->name); > + goto unlock; > + } > + > + num_region--; > + __md->md_apss_toc->region_count = cpu_to_le32(num_region); > +unlock: > + mutex_unlock(&minidump_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister); > + > +static int qcom_minidump_init_apss_subsystem(struct minidump *md) > +{ > + struct minidump_subsystem *apsstoc; > + > + apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC]; > + md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, > + sizeof(struct minidump_region), GFP_KERNEL); > + if (!md->md_regions) > + return -ENOMEM; > + > + md->md_apss_toc = apsstoc; > + apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions)); > + apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); > + apsstoc->status = cpu_to_le32(1); > + apsstoc->region_count = cpu_to_le32(0); > + > + /* Tell bootloader not to encrypt the regions of this subsystem */ > + apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); > + apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); > + > + return 0; > +} > + > +static int qcom_minidump_probe(struct platform_device *pdev) > +{ > + struct minidump_global_toc *mdgtoc; > + struct minidump *md; > + size_t size; > + int ret; > + > + md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL); > + if (!md) > + return -ENOMEM; > + > + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size); > + if (IS_ERR(mdgtoc)) { > + ret = PTR_ERR(mdgtoc); > + dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret); > + return ret; > + } > + > + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { > + ret = -EINVAL; > + dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret); > + return ret; > + } > + > + mutex_lock(&minidump_lock); > + md->dev = &pdev->dev; > + md->md_gbl_toc = mdgtoc; What are you protecting here? It's not possible to have concurrent access to md, is it? > + ret = qcom_minidump_init_apss_subsystem(md); > + if (ret) { > + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); > + goto unlock; > + } > + > + __md = md; No. This is a platform device, so it can have multiple instances. > + /* First entry would be ELF header */ > + ret = qcom_apss_minidump_add_elf_header(); > + if (ret) { > + dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret); > + memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); > + __md = NULL; > + } > + > +unlock: > + mutex_unlock(&minidump_lock); > + return ret; > +} > + > +static int qcom_minidump_remove(struct platform_device *pdev) > +{ > + memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); > + __md = NULL; Don't use __ in variable names. Drop it everywhere. > + > + return 0; > +} > + > +static struct platform_driver qcom_minidump_driver = { > + .probe = qcom_minidump_probe, > + .remove = qcom_minidump_remove, > + .driver = { > + .name = "qcom-minidump", > + }, > +}; > + > +module_platform_driver(qcom_minidump_driver); > + > +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); > +MODULE_LICENSE("GPL v2"); > +MODULE_ALIAS("platform:qcom-minidump"); > diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c > index 6be7ea9..d459656 100644 > --- a/drivers/soc/qcom/smem.c > +++ b/drivers/soc/qcom/smem.c > @@ -279,6 +279,7 @@ struct qcom_smem { > > u32 item_count; > struct platform_device *socinfo; > + struct platform_device *minidump; > struct smem_ptable *ptable; > struct smem_partition global_partition; > struct smem_partition partitions[SMEM_HOST_COUNT]; > @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev) > if (IS_ERR(smem->socinfo)) > dev_dbg(&pdev->dev, "failed to register socinfo device\n"); > > + smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump", > + PLATFORM_DEVID_NONE, NULL, > + 0); > + if (IS_ERR(smem->minidump)) > + dev_dbg(&pdev->dev, "failed to register minidump device\n"); > + > return 0; > } > > static int qcom_smem_remove(struct platform_device *pdev) > { > platform_device_unregister(__smem->socinfo); > + platform_device_unregister(__smem->minidump); Wrong order. You registered first socinfo, right? > > hwspin_lock_free(__smem->hwlock); > __smem = NULL; > diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h > index 84c8605..1872668 100644 > --- a/include/soc/qcom/qcom_minidump.h > +++ b/include/soc/qcom/qcom_minidump.h > @@ -1,6 +1,7 @@ > /* SPDX-License-Identifier: GPL-2.0-only */ > /* > - * Qualcomm minidump shared data structures and macros > + * This file contain Qualcomm minidump data structures and macros shared with > + * boot firmware and also apss minidump client's data structure > * > * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. > */ > @@ -9,12 +10,27 @@ > #define _QCOM_MINIDUMP_H_ > > #define MAX_NUM_OF_SS 10 > +#define MAX_NAME_LENGTH 12 > #define MAX_REGION_NAME_LENGTH 16 > + > +#define MINIDUMP_REVISION 1 > #define SBL_MINIDUMP_SMEM_ID 602 > + > +/* Application processor minidump descriptor */ > +#define MINIDUMP_APSS_DESC 0 > +#define SMEM_ENTRY_SIZE 40 > + > #define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0) > +#define MINIDUMP_REGION_INVALID ('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0) > +#define MINIDUMP_REGION_INIT ('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0) > +#define MINIDUMP_REGION_NOINIT 0 > + > +#define MINIDUMP_SS_ENCR_REQ (0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0) > +#define MINIDUMP_SS_ENCR_NOTREQ (0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0) > +#define MINIDUMP_SS_ENCR_NONE ('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) > #define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) > +#define MINIDUMP_SS_ENCR_START ('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0) > #define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0) > - Why removing this? > /** > * struct minidump_region - Minidump region > * @name : Name of the region to be dumped > @@ -63,4 +79,45 @@ struct minidump_global_toc { > struct minidump_subsystem subsystems[MAX_NUM_OF_SS]; > }; > > +/** > + * struct qcom_apss_minidump_region - APSS Minidump region information > + * > + * @name: Entry name, Minidump will dump binary with this name. > + * @virt_addr: Virtual address of the entry. > + * @phys_addr: Physical address of the entry to dump. > + * @size: Number of byte to dump from @address location, > + * and it should be 4 byte aligned. > + */ > +struct qcom_apss_minidump_region { > + char name[MAX_NAME_LENGTH]; > + void *virt_addr; > + phys_addr_t phys_addr; > + size_t size; > +}; You expose way too much internals in global header. > + > +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP) > +extern struct minidump_subsystem * No externs. The header is unreadable. > +qcom_minidump_subsystem_desc(unsigned int minidump_index); > +extern int > +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region); > +extern int > +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region); Blank line > +#else Blank line > +static inline > +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) > +{ > + return NULL; > +} Blank line > +static inline int > +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) > +{ > + /* Return quietly, if minidump is not enabled */ > + return 0; > +} > +static inline int > +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) > +{ > + return 0; > +} > +#endif /* CONFIG_QCOM_MINIDUMP */ > #endif /* _QCOM_MINIDUMP_H_ */ Best regards, Krzysztof
On 03/05/2023 19:02, Mukesh Ojha wrote: > Move minidump specific data types and macros to a separate internal > header(qcom_minidump.h) so that it can be shared among different > Qualcomm drivers. No, this is not internal header. You moved it to global header. There is no reason driver internals should be exposed to other unrelated subsystems. > > There is no change in functional behavior after this. It is. You made all these internal symbols available to others. > This comes without justification why other drivers needs to access private and internal data. It does not look correct design. NAK. Best regards, Krzysztof
On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote: > On 03/05/2023 19:02, Mukesh Ojha wrote: >> Previous patches add the Qualcomm minidump driver support, so >> lets enable minidump config so that it can be used by kernel >> clients. >> >> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > > This patchset is split too much. Defconfig change is one change. Not two > or three. > >> --- >> arch/arm64/configs/defconfig | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig >> index a24609e..831c942 100644 >> --- a/arch/arm64/configs/defconfig >> +++ b/arch/arm64/configs/defconfig >> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m >> CONFIG_QCOM_WCNSS_CTRL=m >> CONFIG_QCOM_APR=m >> CONFIG_QCOM_ICC_BWMON=m >> +CONFIG_QCOM_MINIDUMP=y > > This must be a module. Why do you think this should be a module ? Is it because, it is lying here among others '=m' ? Or you have some other reasoning ? like it is for qcom specific soc and can not be used outside ? but that is not true for all configs mentioned here. The reason behind making it as '=y' was, to collect information from core kernel data structure as well as the information like percpu data, run queue, irq stat kind of information on kernel crash on a target running some perf configuration(android phone). -- Mukesh > > Best regards, > Krzysztof >
On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: > On 03/05/2023 19:02, Mukesh Ojha wrote: >> Move minidump specific data types and macros to a separate internal >> header(qcom_minidump.h) so that it can be shared among different >> Qualcomm drivers. > > No, this is not internal header. You moved it to global header. > > There is no reason driver internals should be exposed to other unrelated > subsystems. > >> >> There is no change in functional behavior after this. > > It is. You made all these internal symbols available to others. > >> > > This comes without justification why other drivers needs to access > private and internal data. It does not look correct design. NAK. Thanks for catching outdated commit text, will fix the commit with more descriptive reasoning. It has to be global so that co-processor minidump and apss minidump can share data structure and they are lying in different directory. -Mukesh > > Best regards, > Krzysztof >
On 04/05/2023 13:58, Mukesh Ojha wrote: > > > On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: >> On 03/05/2023 19:02, Mukesh Ojha wrote: >>> Move minidump specific data types and macros to a separate internal >>> header(qcom_minidump.h) so that it can be shared among different >>> Qualcomm drivers. >> >> No, this is not internal header. You moved it to global header. >> >> There is no reason driver internals should be exposed to other unrelated >> subsystems. >> >>> >>> There is no change in functional behavior after this. >> >> It is. You made all these internal symbols available to others. >> >>> >> >> This comes without justification why other drivers needs to access >> private and internal data. It does not look correct design. NAK. > > Thanks for catching outdated commit text, will fix the commit with > more descriptive reasoning. > > It has to be global so that co-processor minidump and apss minidump can > share data structure and they are lying in different directory. > Then you should not share all the internals of memory layout but only few pieces necessary to talk with minidump driver. The minidump driver should organize everything how it wants. Best regards, Krzysztof
On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote: > On 04/05/2023 13:58, Mukesh Ojha wrote: >> >> >> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: >>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>> Move minidump specific data types and macros to a separate internal >>>> header(qcom_minidump.h) so that it can be shared among different >>>> Qualcomm drivers. >>> >>> No, this is not internal header. You moved it to global header. >>> >>> There is no reason driver internals should be exposed to other unrelated >>> subsystems. >>> >>>> >>>> There is no change in functional behavior after this. >>> >>> It is. You made all these internal symbols available to others. >>> >>>> >>> >>> This comes without justification why other drivers needs to access >>> private and internal data. It does not look correct design. NAK. >> >> Thanks for catching outdated commit text, will fix the commit with >> more descriptive reasoning. >> >> It has to be global so that co-processor minidump and apss minidump can >> share data structure and they are lying in different directory. >> > > Then you should not share all the internals of memory layout but only > few pieces necessary to talk with minidump driver. The minidump driver > should organize everything how it wants. These are core data structure which is shared with boot firmware and the one's are moved here all are required by minidump driver . If you follow here[1], i raised by concern to make this particular one's as private and later to avoid confusion went with single header. But if others agree, I will keep the one that get shared with minidump as separate one or if relative path of headers are allowed that can make it private between these drivers(which i don't think, will be allowed or recommended). [1] https://lore.kernel.org/lkml/3df1ec27-7e4d-1f84-ff20-94e8ea91c86f@quicinc.com/ -- Mukesh > > Best regards, > Krzysztof >
On 04/05/2023 13:45, Mukesh Ojha wrote: > > > On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote: >> On 03/05/2023 19:02, Mukesh Ojha wrote: >>> Previous patches add the Qualcomm minidump driver support, so >>> lets enable minidump config so that it can be used by kernel >>> clients. >>> >>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >> >> This patchset is split too much. Defconfig change is one change. Not two >> or three. >> >>> --- >>> arch/arm64/configs/defconfig | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig >>> index a24609e..831c942 100644 >>> --- a/arch/arm64/configs/defconfig >>> +++ b/arch/arm64/configs/defconfig >>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m >>> CONFIG_QCOM_WCNSS_CTRL=m >>> CONFIG_QCOM_APR=m >>> CONFIG_QCOM_ICC_BWMON=m >>> +CONFIG_QCOM_MINIDUMP=y >> >> This must be a module. > > Why do you think this should be a module ? > > Is it because, it is lying here among others '=m' ? Because we want and insist on everything being a module. That's the generic rule. There are exceptions, so if this justifies being an exception, please bring appropriate arguments. > > Or you have some other reasoning ? like it is for qcom specific > soc and can not be used outside ? but that is not true for > all configs mentioned here. > > The reason behind making it as '=y' was, to collect information from > core kernel data structure as well as the information like percpu data, > run queue, irq stat kind of information on kernel crash on a target > running some perf configuration(android phone). I don't understand why =m stops you from all that. What's more, I don't understand why do you refer to the Android here. This is a development and debugging Linux defconfig, not Android reference config for vendors... Best regards, Krzysztof
On 04/05/2023 14:26, Mukesh Ojha wrote: > > > On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote: >> On 04/05/2023 13:58, Mukesh Ojha wrote: >>> >>> >>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: >>>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>>> Move minidump specific data types and macros to a separate internal >>>>> header(qcom_minidump.h) so that it can be shared among different >>>>> Qualcomm drivers. >>>> >>>> No, this is not internal header. You moved it to global header. >>>> >>>> There is no reason driver internals should be exposed to other unrelated >>>> subsystems. >>>> >>>>> >>>>> There is no change in functional behavior after this. >>>> >>>> It is. You made all these internal symbols available to others. >>>> >>>>> >>>> >>>> This comes without justification why other drivers needs to access >>>> private and internal data. It does not look correct design. NAK. >>> >>> Thanks for catching outdated commit text, will fix the commit with >>> more descriptive reasoning. >>> >>> It has to be global so that co-processor minidump and apss minidump can >>> share data structure and they are lying in different directory. >>> >> >> Then you should not share all the internals of memory layout but only >> few pieces necessary to talk with minidump driver. The minidump driver >> should organize everything how it wants. > > These are core data structure which is shared with boot firmware and the > one's are moved here all are required by minidump driver . I am not sure if I understand correctly. If they are all required by minidump driver, then this must not be in include, but stay with minidump. Remoteproc then should not touch it. I don't understand why internals of minidump should be important for remoteproc. If they are, means you broken encapsulation. > > If you follow here[1], i raised by concern to make this particular one's > as private and later to avoid confusion went with single header. > But if others agree, I will keep the one that get shared with minidump > as separate one or if relative path of headers are allowed that can make > it private between these drivers(which i don't think, will be allowed or > recommended). Let's be specific: why MD_REGION_VALID must be available for remoteproc or any other driver after introducing qcom minidump driver? Best regards, Krzysztof
On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote: > On 03/05/2023 19:02, Mukesh Ojha wrote: >> Minidump is a best effort mechanism to collect useful and predefined >> data for first level of debugging on end user devices running on >> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >> or subsystem part of SoC crashes, due to a range of hardware and >> software bugs. Hence, the ability to collect accurate data is only >> a best-effort. The data collected could be invalid or corrupted, >> data collection itself could fail, and so on. >> >> Qualcomm devices in engineering mode provides a mechanism for >> generating full system ramdumps for post mortem debugging. But in some >> cases it's however not feasible to capture the entire content of RAM. >> The minidump mechanism provides the means for selecting region should >> be included in the ramdump. The solution supports extracting the >> ramdump/minidump produced either over USB or stored to an attached >> storage device. >> >> The core of minidump feature is part of Qualcomm's boot firmware code. >> It initializes shared memory(SMEM), which is a part of DDR and >> allocates a small section of it to minidump table i.e also called >> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has >> their own table of segments to be included in the minidump, all >> references from a descriptor in SMEM (G-ToC). Each segment/region has >> some details like name, physical address and it's size etc. and it >> could be anywhere scattered in the DDR. >> >> Minidump kernel driver adds the capability to add linux region to be >> dumped as part of ram dump collection. It provides appropriate symbol >> to check its enablement and register client regions. >> >> To simplify post mortem debugging, it creates and maintain an ELF >> header as first region that gets updated upon registration >> of a new region. >> >> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >> --- >> drivers/soc/qcom/Kconfig | 14 + >> drivers/soc/qcom/Makefile | 1 + >> drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++ >> drivers/soc/qcom/smem.c | 8 + >> include/soc/qcom/qcom_minidump.h | 61 +++- >> 5 files changed, 663 insertions(+), 2 deletions(-) >> create mode 100644 drivers/soc/qcom/qcom_minidump.c >> >> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig >> index a491718..15c931e 100644 >> --- a/drivers/soc/qcom/Kconfig >> +++ b/drivers/soc/qcom/Kconfig >> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE >> tristate >> select QCOM_SCM >> >> +config QCOM_MINIDUMP >> + tristate "QCOM Minidump Support" >> + depends on ARCH_QCOM || COMPILE_TEST >> + select QCOM_SMEM >> + help >> + Enablement of core minidump feature is controlled from boot firmware >> + side, and this config allow linux to query and manages APPS minidump >> + table. >> + >> + Client drivers can register their internal data structures and debug >> + messages as part of the minidump region and when the SoC is crashed, >> + these selective regions will be dumped instead of the entire DDR. >> + This saves significant amount of time and/or storage space. >> + >> endmenu >> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile >> index 0f43a88..1ebe081 100644 >> --- a/drivers/soc/qcom/Makefile >> +++ b/drivers/soc/qcom/Makefile >> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o >> obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o >> obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o >> obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += ice.o >> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o >> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c >> new file mode 100644 >> index 0000000..d107a86 >> --- /dev/null >> +++ b/drivers/soc/qcom/qcom_minidump.c >> @@ -0,0 +1,581 @@ >> +// SPDX-License-Identifier: GPL-2.0-only >> + >> +/* >> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >> + */ >> + >> +#include <linux/elf.h> >> +#include <linux/err.h> >> +#include <linux/errno.h> >> +#include <linux/export.h> >> +#include <linux/init.h> >> +#include <linux/io.h> >> +#include <linux/kernel.h> >> +#include <linux/module.h> >> +#include <linux/platform_device.h> >> +#include <linux/string.h> >> +#include <linux/soc/qcom/smem.h> >> +#include <soc/qcom/qcom_minidump.h> >> + >> +/** >> + * struct minidump_elfhdr - Minidump table elf header >> + * @ehdr: Elf main header >> + * @shdr: Section header >> + * @phdr: Program header >> + * @elf_offset: Section offset in elf >> + * @strtable_idx: String table current index position >> + */ >> +struct minidump_elfhdr { >> + struct elfhdr *ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + size_t elf_offset; >> + size_t strtable_idx; >> +}; >> + >> +/** >> + * struct minidump - Minidump driver private data >> + * @md_gbl_toc : Global TOC pointer >> + * @md_apss_toc : Application Subsystem TOC pointer >> + * @md_regions : High level OS region base pointer >> + * @elf : Minidump elf header >> + * @dev : Minidump device >> + */ >> +struct minidump { >> + struct minidump_global_toc *md_gbl_toc; >> + struct minidump_subsystem *md_apss_toc; >> + struct minidump_region *md_regions; >> + struct minidump_elfhdr elf; >> + struct device *dev; >> +}; >> + >> +/* >> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300 >> + * as total number of supported region (including all co-processors) in >> + * minidump table out of which linux was using 201. In future, this limitation >> + * from boot firmware might get removed by allocating the region dynamically. >> + * So, keep it compatible with older devices, we can keep the current limit for >> + * Linux to 201. >> + */ >> +#define MAX_NUM_ENTRIES 201 >> +#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH) >> + >> +static struct minidump *__md; > > No, no file scope or global scope statics. Sorry, this is done as per recommendation given here [1] and this matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c implementations. [1] https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/ > >> +static DEFINE_MUTEX(minidump_lock); > > Neither this. > > Also you need to clearly express what is protected by newn lock. Sure, will keep proper reasoning why global lock reasoning. " Since it is possible to call this file's exported function before this driver's probe get called or it's probe is happening in parallel so it needs global lock to protect this case. " > >> + >> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx) >> +{ >> + struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff); >> + >> + return &eshdr[idx]; >> +} >> + >> +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx) >> +{ >> + struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff); >> + >> + return &ephdr[idx]; >> +} >> + >> +static char *elf_str_table_start(struct elfhdr *ehdr) >> +{ >> + struct elf_shdr *eshdr; >> + >> + if (ehdr->e_shstrndx == SHN_UNDEF) >> + return NULL; >> + >> + eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx); >> + return (char *)ehdr + eshdr->sh_offset; >> +} >> + >> +static char *elf_lookup_string(struct elfhdr *ehdr, int offset) >> +{ >> + char *strtab = elf_str_table_start(ehdr); >> + >> + if (!strtab || (__md->elf.strtable_idx < offset)) >> + return NULL; >> + >> + return strtab + offset; >> +} >> + >> +static unsigned int append_str_to_strtable(const char *name) >> +{ >> + char *strtab = elf_str_table_start(__md->elf.ehdr); >> + unsigned int old_idx = __md->elf.strtable_idx; >> + unsigned int ret; >> + >> + if (!strtab || !name) >> + return 0; >> + >> + ret = old_idx; >> + old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH); >> + __md->elf.strtable_idx = old_idx + 1; >> + return ret; >> +} >> + >> +static int >> +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int i; >> + unsigned int count; >> + >> + count = le32_to_cpu(__md->md_apss_toc->region_count); >> + for (i = 0; i < count; i++) { >> + mdr = &__md->md_regions[i]; >> + if (!strcmp(mdr->name, region->name)) >> + return i; >> + } >> + return -ENOENT; >> +} >> + >> +static void >> +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region) > > You need to name everything shorter. Neither functions nor structs are > making it easy to follow. Sure !! Thought of being very much descriptive here :-) > >> +{ >> + struct elfhdr *ehdr = __md->elf.ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + >> + shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++); >> + phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++); >> + >> + shdr->sh_type = SHT_PROGBITS; >> + shdr->sh_name = append_str_to_strtable(region->name); >> + shdr->sh_addr = (elf_addr_t)region->virt_addr; >> + shdr->sh_size = region->size; >> + shdr->sh_flags = SHF_WRITE; >> + shdr->sh_offset = __md->elf.elf_offset; >> + shdr->sh_entsize = 0; >> + >> + phdr->p_type = PT_LOAD; >> + phdr->p_offset = __md->elf.elf_offset; >> + phdr->p_vaddr = (elf_addr_t)region->virt_addr; >> + phdr->p_paddr = region->phys_addr; >> + phdr->p_filesz = phdr->p_memsz = region->size; >> + phdr->p_flags = PF_R | PF_W; >> + __md->elf.elf_offset += shdr->sh_size; >> +} >> + >> +static void >> +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count); >> + >> + mdr = &__md->md_regions[region_cnt]; >> + strscpy(mdr->name, region->name, sizeof(mdr->name)); >> + mdr->address = cpu_to_le64(region->phys_addr); >> + mdr->size = cpu_to_le64(region->size); >> + mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID); >> + region_cnt++; >> + __md->md_apss_toc->region_count = cpu_to_le32(region_cnt); >> +} >> + >> +static bool >> +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region) >> +{ >> + return region && >> + strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH && >> + region->virt_addr && >> + region->size && >> + IS_ALIGNED(region->size, 4); >> +} >> + >> +static int qcom_apss_minidump_add_elf_header(void) >> +{ >> + struct qcom_apss_minidump_region elfregion; >> + struct elfhdr *ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + unsigned int elfh_size; >> + unsigned int strtbl_off; >> + unsigned int phdr_off; >> + char *banner; >> + unsigned int banner_len; >> + >> + banner_len = strlen(linux_banner); >> + /* >> + * Header buffer contains: >> + * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers, >> + * where, 4 additional entries, one for empty header, one for string table >> + * one for minidump table and one for linux banner. >> + * >> + * Linux banner is stored in minidump to aid post mortem tools to determine >> + * the kernel version. >> + */ >> + elfh_size = sizeof(*ehdr); >> + elfh_size += MAX_STRTBL_SIZE; >> + elfh_size += banner_len + 1; >> + elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4)); >> + elfh_size = ALIGN(elfh_size, 4); >> + >> + __md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL); >> + if (!__md->elf.ehdr) >> + return -ENOMEM; >> + >> + /* Register ELF header as first region */ >> + strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name)); >> + elfregion.virt_addr = __md->elf.ehdr; >> + elfregion.phys_addr = virt_to_phys(__md->elf.ehdr); >> + elfregion.size = elfh_size; >> + qcom_apss_minidump_add_region(&elfregion); >> + >> + ehdr = __md->elf.ehdr; >> + /* Assign Section/Program headers offset */ >> + __md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1); >> + __md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES); >> + phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES); >> + >> + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); >> + ehdr->e_ident[EI_CLASS] = ELF_CLASS; >> + ehdr->e_ident[EI_DATA] = ELF_DATA; >> + ehdr->e_ident[EI_VERSION] = EV_CURRENT; >> + ehdr->e_ident[EI_OSABI] = ELF_OSABI; >> + ehdr->e_type = ET_CORE; >> + ehdr->e_machine = ELF_ARCH; >> + ehdr->e_version = EV_CURRENT; >> + ehdr->e_ehsize = sizeof(*ehdr); >> + ehdr->e_phoff = phdr_off; >> + ehdr->e_phentsize = sizeof(*phdr); >> + ehdr->e_shoff = sizeof(*ehdr); >> + ehdr->e_shentsize = sizeof(*shdr); >> + ehdr->e_shstrndx = 1; >> + >> + __md->elf.elf_offset = elfh_size; >> + >> + /* >> + * The zeroth index of the section header is reserved and is rarely used. >> + * Set the section header as null (SHN_UNDEF) and move to the next one. >> + * 2nd Section is String table. >> + */ >> + __md->elf.strtable_idx = 1; >> + strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES); >> + shdr++; >> + shdr->sh_type = SHT_STRTAB; >> + shdr->sh_offset = (elf_addr_t)strtbl_off; >> + shdr->sh_size = MAX_STRTBL_SIZE; >> + shdr->sh_entsize = 0; >> + shdr->sh_flags = 0; >> + shdr->sh_name = append_str_to_strtable("STR_TBL"); >> + shdr++; >> + >> + /* 3rd Section is Linux banner */ >> + banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE; >> + memcpy(banner, linux_banner, banner_len); >> + >> + shdr->sh_type = SHT_PROGBITS; >> + shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); >> + shdr->sh_size = banner_len + 1; >> + shdr->sh_addr = (elf_addr_t)linux_banner; >> + shdr->sh_entsize = 0; >> + shdr->sh_flags = SHF_WRITE; >> + shdr->sh_name = append_str_to_strtable("linux_banner"); >> + >> + phdr->p_type = PT_LOAD; >> + phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); >> + phdr->p_vaddr = (elf_addr_t)linux_banner; >> + phdr->p_paddr = virt_to_phys(linux_banner); >> + phdr->p_filesz = phdr->p_memsz = banner_len + 1; >> + phdr->p_flags = PF_R | PF_W; >> + >> + /* >> + * Above are some prdefined sections/program header used >> + * for debug, update their count here. >> + */ >> + ehdr->e_phnum = 1; >> + ehdr->e_shnum = 3; >> + >> + return 0; >> +} >> + >> +/** >> + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor. >> + * @minidump_index: minidump index for a subsystem in minidump table >> + * >> + * Return: minidump subsystem descriptor address on success and error >> + * on failure >> + */ >> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) >> +{ >> + struct minidump_subsystem *md_ss_toc; >> + >> + mutex_lock(&minidump_lock); >> + if (!__md) { >> + md_ss_toc = ERR_PTR(-EPROBE_DEFER); >> + goto unlock; >> + } >> + >> + md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index]; >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return md_ss_toc; >> +} >> +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc); >> + >> +/** >> + * qcom_apss_minidump_region_register() - Register a region in Minidump table. >> + * @region: minidump region. >> + * >> + * Return: On success, it returns 0, otherwise a negative error value on failure. >> + */ >> +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) >> +{ >> + unsigned int num_region; >> + int ret; >> + >> + if (!__md) >> + return -EPROBE_DEFER; >> + >> + if (!qcom_apss_minidump_valid_region(region)) >> + return -EINVAL; >> + >> + mutex_lock(&minidump_lock); >> + ret = get_apss_minidump_region_index(region); >> + if (ret >= 0) { >> + dev_info(__md->dev, "%s region is already registered\n", region->name); >> + ret = -EEXIST; >> + goto unlock; >> + } >> + >> + /* Check if there is a room for a new entry */ >> + num_region = le32_to_cpu(__md->md_apss_toc->region_count); >> + if (num_region >= MAX_NUM_ENTRIES) { >> + dev_err(__md->dev, "maximum region limit %u reached\n", num_region); >> + ret = -ENOSPC; >> + goto unlock; >> + } >> + >> + qcom_apss_minidump_add_region(region); >> + qcom_apss_minidump_update_elf_header(region); >> + ret = 0; >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register); >> + >> +static int >> +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region) >> +{ >> + struct elfhdr *ehdr = __md->elf.ehdr; >> + struct elf_shdr *shdr; >> + struct elf_shdr *tmp_shdr; >> + struct elf_phdr *phdr; >> + struct elf_phdr *tmp_phdr; >> + unsigned int phidx; >> + unsigned int shidx; >> + unsigned int len; >> + unsigned int i; >> + char *shname; >> + >> + for (i = 0; i < ehdr->e_phnum; i++) { >> + phdr = elf_phdr_entry_addr(ehdr, i); >> + if (phdr->p_paddr == region->phys_addr && >> + phdr->p_memsz == region->size) >> + break; >> + } >> + >> + if (i == ehdr->e_phnum) { >> + dev_err(__md->dev, "Cannot find program header entry in elf\n"); >> + return -EINVAL; >> + } >> + >> + phidx = i; >> + for (i = 0; i < ehdr->e_shnum; i++) { >> + shdr = elf_shdr_entry_addr(ehdr, i); >> + shname = elf_lookup_string(ehdr, shdr->sh_name); >> + if (shname && !strcmp(shname, region->name) && >> + shdr->sh_addr == (elf_addr_t)region->virt_addr && >> + shdr->sh_size == region->size) >> + break; >> + } >> + >> + if (i == ehdr->e_shnum) { >> + dev_err(__md->dev, "Cannot find section header entry in elf\n"); >> + return -EINVAL; >> + } >> + >> + shidx = i; >> + if (shdr->sh_offset != phdr->p_offset) { >> + dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name); >> + return -EINVAL; >> + } >> + >> + /* Clear name in string table */ >> + len = strlen(shname) + 1; >> + memmove(shname, shname + len, >> + __md->elf.strtable_idx - shdr->sh_name - len); >> + __md->elf.strtable_idx -= len; >> + >> + /* Clear program header */ >> + tmp_phdr = elf_phdr_entry_addr(ehdr, phidx); >> + for (i = phidx; i < ehdr->e_phnum - 1; i++) { >> + tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1); >> + phdr = elf_phdr_entry_addr(ehdr, i); >> + memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr)); >> + phdr->p_offset = phdr->p_offset - region->size; >> + } >> + memset(tmp_phdr, 0, sizeof(struct elf_phdr)); >> + ehdr->e_phnum--; >> + >> + /* Clear section header */ >> + tmp_shdr = elf_shdr_entry_addr(ehdr, shidx); >> + for (i = shidx; i < ehdr->e_shnum - 1; i++) { >> + tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1); >> + shdr = elf_shdr_entry_addr(ehdr, i); >> + memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr)); >> + shdr->sh_offset -= region->size; >> + shdr->sh_name -= len; >> + } >> + >> + memset(tmp_shdr, 0, sizeof(struct elf_shdr)); >> + ehdr->e_shnum--; >> + __md->elf.elf_offset -= region->size; >> + >> + return 0; >> +} >> + >> +/** >> + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table. >> + * @region: minidump region. >> + * >> + * Return: On success, it returns 0 and negative error value on failure. >> + */ >> +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int num_region; >> + unsigned int idx; >> + int ret; >> + >> + if (!region) >> + return -EINVAL; >> + >> + mutex_lock(&minidump_lock); >> + if (!__md) { >> + ret = -EPROBE_DEFER; >> + goto unlock; >> + } >> + >> + idx = get_apss_minidump_region_index(region); >> + if (idx < 0) { >> + dev_err(__md->dev, "%s region is not present\n", region->name); >> + ret = idx; >> + goto unlock; >> + } >> + >> + mdr = &__md->md_regions[0]; >> + num_region = le32_to_cpu(__md->md_apss_toc->region_count); >> + /* >> + * Left shift all the regions exist after this removed region >> + * index by 1 to fill the gap and zero out the last region >> + * present at the end. >> + */ >> + memmove(&mdr[idx], &mdr[idx + 1], >> + (num_region - idx - 1) * sizeof(struct minidump_region)); >> + memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region)); >> + ret = qcom_apss_minidump_clear_header(region); >> + if (ret) { >> + dev_err(__md->dev, "Failed to remove region: %s\n", region->name); >> + goto unlock; >> + } >> + >> + num_region--; >> + __md->md_apss_toc->region_count = cpu_to_le32(num_region); >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister); >> + >> +static int qcom_minidump_init_apss_subsystem(struct minidump *md) >> +{ >> + struct minidump_subsystem *apsstoc; >> + >> + apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC]; >> + md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, >> + sizeof(struct minidump_region), GFP_KERNEL); >> + if (!md->md_regions) >> + return -ENOMEM; >> + >> + md->md_apss_toc = apsstoc; >> + apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions)); >> + apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); >> + apsstoc->status = cpu_to_le32(1); >> + apsstoc->region_count = cpu_to_le32(0); >> + >> + /* Tell bootloader not to encrypt the regions of this subsystem */ >> + apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); >> + apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); >> + >> + return 0; >> +} >> + >> +static int qcom_minidump_probe(struct platform_device *pdev) >> +{ >> + struct minidump_global_toc *mdgtoc; >> + struct minidump *md; >> + size_t size; >> + int ret; >> + >> + md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL); >> + if (!md) >> + return -ENOMEM; >> + >> + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size); >> + if (IS_ERR(mdgtoc)) { >> + ret = PTR_ERR(mdgtoc); >> + dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret); >> + return ret; >> + } >> + >> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >> + ret = -EINVAL; >> + dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret); >> + return ret; >> + } >> + >> + mutex_lock(&minidump_lock); >> + md->dev = &pdev->dev; >> + md->md_gbl_toc = mdgtoc; > > What are you protecting here? It's not possible to have concurrent > access to md, is it? Check qcom_apss_minidump_region_{register/unregister} and it is possible that these API gets called parallel to this probe. I agree, i made a mistake in not protecting __md in {register} API but did it unregister API in this patch, which i have fixed in later patch. > >> + ret = qcom_minidump_init_apss_subsystem(md); >> + if (ret) { >> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >> + goto unlock; >> + } >> + >> + __md = md; > > No. This is a platform device, so it can have multiple instances. It can have only one instance that is created from SMEM driver probe. > >> + /* First entry would be ELF header */ >> + ret = qcom_apss_minidump_add_elf_header(); >> + if (ret) { >> + dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret); >> + memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >> + __md = NULL; >> + } >> + >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> + >> +static int qcom_minidump_remove(struct platform_device *pdev) >> +{ >> + memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >> + __md = NULL; > > Don't use __ in variable names. Drop it everywhere. As i said above, this is being followed in other drivers, so followed it here as per recommendation. Let @srini comeback on this. > >> + >> + return 0; >> +} >> + >> +static struct platform_driver qcom_minidump_driver = { >> + .probe = qcom_minidump_probe, >> + .remove = qcom_minidump_remove, >> + .driver = { >> + .name = "qcom-minidump", >> + }, >> +}; >> + >> +module_platform_driver(qcom_minidump_driver); >> + >> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >> +MODULE_LICENSE("GPL v2"); >> +MODULE_ALIAS("platform:qcom-minidump"); >> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c >> index 6be7ea9..d459656 100644 >> --- a/drivers/soc/qcom/smem.c >> +++ b/drivers/soc/qcom/smem.c >> @@ -279,6 +279,7 @@ struct qcom_smem { >> >> u32 item_count; >> struct platform_device *socinfo; >> + struct platform_device *minidump; >> struct smem_ptable *ptable; >> struct smem_partition global_partition; >> struct smem_partition partitions[SMEM_HOST_COUNT]; >> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev) >> if (IS_ERR(smem->socinfo)) >> dev_dbg(&pdev->dev, "failed to register socinfo device\n"); >> >> + smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump", >> + PLATFORM_DEVID_NONE, NULL, >> + 0); >> + if (IS_ERR(smem->minidump)) >> + dev_dbg(&pdev->dev, "failed to register minidump device\n"); >> + >> return 0; >> } >> >> static int qcom_smem_remove(struct platform_device *pdev) >> { >> platform_device_unregister(__smem->socinfo); >> + platform_device_unregister(__smem->minidump); > > Wrong order. You registered first socinfo, right? Any order is fine here, they are not dependent. But, will fix this. > >> >> hwspin_lock_free(__smem->hwlock); >> __smem = NULL; >> diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h >> index 84c8605..1872668 100644 >> --- a/include/soc/qcom/qcom_minidump.h >> +++ b/include/soc/qcom/qcom_minidump.h >> @@ -1,6 +1,7 @@ >> /* SPDX-License-Identifier: GPL-2.0-only */ >> /* >> - * Qualcomm minidump shared data structures and macros >> + * This file contain Qualcomm minidump data structures and macros shared with >> + * boot firmware and also apss minidump client's data structure >> * >> * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >> */ >> @@ -9,12 +10,27 @@ >> #define _QCOM_MINIDUMP_H_ >> >> #define MAX_NUM_OF_SS 10 >> +#define MAX_NAME_LENGTH 12 >> #define MAX_REGION_NAME_LENGTH 16 >> + >> +#define MINIDUMP_REVISION 1 >> #define SBL_MINIDUMP_SMEM_ID 602 >> + >> +/* Application processor minidump descriptor */ >> +#define MINIDUMP_APSS_DESC 0 >> +#define SMEM_ENTRY_SIZE 40 >> + >> #define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0) >> +#define MINIDUMP_REGION_INVALID ('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0) >> +#define MINIDUMP_REGION_INIT ('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0) >> +#define MINIDUMP_REGION_NOINIT 0 >> + >> +#define MINIDUMP_SS_ENCR_REQ (0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0) >> +#define MINIDUMP_SS_ENCR_NOTREQ (0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0) >> +#define MINIDUMP_SS_ENCR_NONE ('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) >> #define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) >> +#define MINIDUMP_SS_ENCR_START ('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0) >> #define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0) >> - > > Why removing this? Ok, will keep it. > >> /** >> * struct minidump_region - Minidump region >> * @name : Name of the region to be dumped >> @@ -63,4 +79,45 @@ struct minidump_global_toc { >> struct minidump_subsystem subsystems[MAX_NUM_OF_SS]; >> }; >> >> +/** >> + * struct qcom_apss_minidump_region - APSS Minidump region information >> + * >> + * @name: Entry name, Minidump will dump binary with this name. >> + * @virt_addr: Virtual address of the entry. >> + * @phys_addr: Physical address of the entry to dump. >> + * @size: Number of byte to dump from @address location, >> + * and it should be 4 byte aligned. >> + */ >> +struct qcom_apss_minidump_region { >> + char name[MAX_NAME_LENGTH]; >> + void *virt_addr; >> + phys_addr_t phys_addr; >> + size_t size; >> +}; > > You expose way too much internals in global header. This suppose to shared with all minidump clients. As already discussed in earlier patch, will keep the one which get shared with remoteproc as differnet header if the people who have reviewed till now agree to this common stuff. > >> + >> +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP) >> +extern struct minidump_subsystem * > > No externs. ok. > The header is unreadable. > >> +qcom_minidump_subsystem_desc(unsigned int minidump_index); >> +extern int >> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region); >> +extern int >> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region); > > Blank line > >> +#else > > Blank line > >> +static inline >> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) >> +{ >> + return NULL; >> +} > > Blank line > >> +static inline int >> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) >> +{ >> + /* Return quietly, if minidump is not enabled */ >> + return 0; >> +} > > >> +static inline int >> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) >> +{ >> + return 0; >> +} > >> +#endif > > /* CONFIG_QCOM_MINIDUMP */ > >> #endif /* _QCOM_MINIDUMP_H_ */ Noted, thanks. > > Best regards, > Krzysztof >
On 5/4/2023 6:06 PM, Krzysztof Kozlowski wrote: > On 04/05/2023 14:26, Mukesh Ojha wrote: >> >> >> On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote: >>> On 04/05/2023 13:58, Mukesh Ojha wrote: >>>> >>>> >>>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: >>>>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>>>> Move minidump specific data types and macros to a separate internal >>>>>> header(qcom_minidump.h) so that it can be shared among different >>>>>> Qualcomm drivers. >>>>> >>>>> No, this is not internal header. You moved it to global header. >>>>> >>>>> There is no reason driver internals should be exposed to other unrelated >>>>> subsystems. >>>>> >>>>>> >>>>>> There is no change in functional behavior after this. >>>>> >>>>> It is. You made all these internal symbols available to others. >>>>> >>>>>> >>>>> >>>>> This comes without justification why other drivers needs to access >>>>> private and internal data. It does not look correct design. NAK. >>>> >>>> Thanks for catching outdated commit text, will fix the commit with >>>> more descriptive reasoning. >>>> >>>> It has to be global so that co-processor minidump and apss minidump can >>>> share data structure and they are lying in different directory. >>>> >>> >>> Then you should not share all the internals of memory layout but only >>> few pieces necessary to talk with minidump driver. The minidump driver >>> should organize everything how it wants. >> >> These are core data structure which is shared with boot firmware and the >> one's are moved here all are required by minidump driver . > > I am not sure if I understand correctly. If they are all required by > minidump driver, then this must not be in include, but stay with > minidump. Remoteproc then should not touch it. > > I don't understand why internals of minidump should be important for > remoteproc. If they are, means you broken encapsulation. > >> >> If you follow here[1], i raised by concern to make this particular one's >> as private and later to avoid confusion went with single header. >> But if others agree, I will keep the one that get shared with minidump >> as separate one or if relative path of headers are allowed that can make >> it private between these drivers(which i don't think, will be allowed or >> recommended). > > Let's be specific: why MD_REGION_VALID must be available for remoteproc > or any other driver after introducing qcom minidump driver? Forget about this driver for a moment. I am not sure how much you know about existing qcom_minidump() implementation and why is it there in first place in remoteproc code in driver/remoteproc/qcom_common.c The idea is, remoteproc co-processor like adsp/cdsp etc. may have their static predefined region (segments) to be collected on their crash which is what exactly existing qcom_minidump() is doing. Now, after this minidump series, APSS (linux) will have it's own of collecting linux client region independent of whether remoteproc minidump collection. I think, are you hinting to move all minidump related code from remoteproc to qcom_minidump driver, is this what are you trying to say ? -- Mukesh > > > Best regards, > Krzysztof >
On 5/4/2023 6:02 PM, Krzysztof Kozlowski wrote: > On 04/05/2023 13:45, Mukesh Ojha wrote: >> >> >> On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote: >>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>> Previous patches add the Qualcomm minidump driver support, so >>>> lets enable minidump config so that it can be used by kernel >>>> clients. >>>> >>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >>> >>> This patchset is split too much. Defconfig change is one change. Not two >>> or three. >>> >>>> --- >>>> arch/arm64/configs/defconfig | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig >>>> index a24609e..831c942 100644 >>>> --- a/arch/arm64/configs/defconfig >>>> +++ b/arch/arm64/configs/defconfig >>>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m >>>> CONFIG_QCOM_WCNSS_CTRL=m >>>> CONFIG_QCOM_APR=m >>>> CONFIG_QCOM_ICC_BWMON=m >>>> +CONFIG_QCOM_MINIDUMP=y >>> >>> This must be a module. >> >> Why do you think this should be a module ? >> >> Is it because, it is lying here among others '=m' ? > > Because we want and insist on everything being a module. That's the > generic rule. There are exceptions, so if this justifies being an > exception, please bring appropriate arguments. > >> >> Or you have some other reasoning ? like it is for qcom specific >> soc and can not be used outside ? but that is not true for >> all configs mentioned here. >> >> The reason behind making it as '=y' was, to collect information from >> core kernel data structure as well as the information like percpu data, >> run queue, irq stat kind of information on kernel crash on a target >> running some perf configuration(android phone). > > I don't understand why =m stops you from all that. How do i get kernel symbol address from a modules can we use kallsyms_lookup_name from modules ? --Mukesh What's more, I don't > understand why do you refer to the Android here. This is a development > and debugging Linux defconfig, not Android reference config for vendors... > > Best regards, > Krzysztof >
On 04/05/2023 14:57, Mukesh Ojha wrote: > > > On 5/4/2023 6:06 PM, Krzysztof Kozlowski wrote: >> On 04/05/2023 14:26, Mukesh Ojha wrote: >>> >>> >>> On 5/4/2023 5:33 PM, Krzysztof Kozlowski wrote: >>>> On 04/05/2023 13:58, Mukesh Ojha wrote: >>>>> >>>>> >>>>> On 5/4/2023 5:08 PM, Krzysztof Kozlowski wrote: >>>>>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>>>>> Move minidump specific data types and macros to a separate internal >>>>>>> header(qcom_minidump.h) so that it can be shared among different >>>>>>> Qualcomm drivers. >>>>>> >>>>>> No, this is not internal header. You moved it to global header. >>>>>> >>>>>> There is no reason driver internals should be exposed to other unrelated >>>>>> subsystems. >>>>>> >>>>>>> >>>>>>> There is no change in functional behavior after this. >>>>>> >>>>>> It is. You made all these internal symbols available to others. >>>>>> >>>>>>> >>>>>> >>>>>> This comes without justification why other drivers needs to access >>>>>> private and internal data. It does not look correct design. NAK. >>>>> >>>>> Thanks for catching outdated commit text, will fix the commit with >>>>> more descriptive reasoning. >>>>> >>>>> It has to be global so that co-processor minidump and apss minidump can >>>>> share data structure and they are lying in different directory. >>>>> >>>> >>>> Then you should not share all the internals of memory layout but only >>>> few pieces necessary to talk with minidump driver. The minidump driver >>>> should organize everything how it wants. >>> >>> These are core data structure which is shared with boot firmware and the >>> one's are moved here all are required by minidump driver . >> >> I am not sure if I understand correctly. If they are all required by >> minidump driver, then this must not be in include, but stay with >> minidump. Remoteproc then should not touch it. >> >> I don't understand why internals of minidump should be important for >> remoteproc. If they are, means you broken encapsulation. >> >>> >>> If you follow here[1], i raised by concern to make this particular one's >>> as private and later to avoid confusion went with single header. >>> But if others agree, I will keep the one that get shared with minidump >>> as separate one or if relative path of headers are allowed that can make >>> it private between these drivers(which i don't think, will be allowed or >>> recommended). >> >> Let's be specific: why MD_REGION_VALID must be available for remoteproc >> or any other driver after introducing qcom minidump driver? > > Forget about this driver for a moment. > > I am not sure how much you know about existing qcom_minidump() > implementation and why is it there in first place in remoteproc > code in driver/remoteproc/qcom_common.c > > The idea is, remoteproc co-processor like adsp/cdsp etc. may have their > static predefined region (segments) to be collected on their crash which > is what exactly existing qcom_minidump() is doing. > > Now, after this minidump series, APSS (linux) will have it's > own of collecting linux client region independent of whether > remoteproc minidump collection. > > I think, are you hinting to move all minidump related code from > remoteproc to qcom_minidump driver, is this what are you trying > to say ? Close, not all but the ones not necessary to identify the regions/storage/layout. If some variable about this region/storage/layout is the same everywhere, it means it's basically a property of qcom minidump and you have just exposed it to consumers breaking encapsulation. Best regards, Krzysztof
On 04/05/2023 14:38, Mukesh Ojha wrote: > > > On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote: >> On 03/05/2023 19:02, Mukesh Ojha wrote: >>> Minidump is a best effort mechanism to collect useful and predefined >>> data for first level of debugging on end user devices running on >>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >>> or subsystem part of SoC crashes, due to a range of hardware and >>> software bugs. Hence, the ability to collect accurate data is only >>> a best-effort. The data collected could be invalid or corrupted, >>> data collection itself could fail, and so on. >>> >>> Qualcomm devices in engineering mode provides a mechanism for >>> generating full system ramdumps for post mortem debugging. But in some >>> cases it's however not feasible to capture the entire content of RAM. >>> The minidump mechanism provides the means for selecting region should >>> be included in the ramdump. The solution supports extracting the >>> ramdump/minidump produced either over USB or stored to an attached >>> storage device. >>> >>> The core of minidump feature is part of Qualcomm's boot firmware code. >>> It initializes shared memory(SMEM), which is a part of DDR and >>> allocates a small section of it to minidump table i.e also called >>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has >>> their own table of segments to be included in the minidump, all >>> references from a descriptor in SMEM (G-ToC). Each segment/region has >>> some details like name, physical address and it's size etc. and it >>> could be anywhere scattered in the DDR. >>> >>> Minidump kernel driver adds the capability to add linux region to be >>> dumped as part of ram dump collection. It provides appropriate symbol >>> to check its enablement and register client regions. >>> >>> To simplify post mortem debugging, it creates and maintain an ELF >>> header as first region that gets updated upon registration >>> of a new region. >>> >>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >>> --- >>> drivers/soc/qcom/Kconfig | 14 + >>> drivers/soc/qcom/Makefile | 1 + >>> drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++ >>> drivers/soc/qcom/smem.c | 8 + >>> include/soc/qcom/qcom_minidump.h | 61 +++- >>> 5 files changed, 663 insertions(+), 2 deletions(-) >>> create mode 100644 drivers/soc/qcom/qcom_minidump.c >>> >>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig >>> index a491718..15c931e 100644 >>> --- a/drivers/soc/qcom/Kconfig >>> +++ b/drivers/soc/qcom/Kconfig >>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE >>> tristate >>> select QCOM_SCM >>> >>> +config QCOM_MINIDUMP >>> + tristate "QCOM Minidump Support" >>> + depends on ARCH_QCOM || COMPILE_TEST >>> + select QCOM_SMEM >>> + help >>> + Enablement of core minidump feature is controlled from boot firmware >>> + side, and this config allow linux to query and manages APPS minidump >>> + table. >>> + >>> + Client drivers can register their internal data structures and debug >>> + messages as part of the minidump region and when the SoC is crashed, >>> + these selective regions will be dumped instead of the entire DDR. >>> + This saves significant amount of time and/or storage space. >>> + >>> endmenu >>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile >>> index 0f43a88..1ebe081 100644 >>> --- a/drivers/soc/qcom/Makefile >>> +++ b/drivers/soc/qcom/Makefile >>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o >>> obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o >>> obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o >>> obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += ice.o >>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o >>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c >>> new file mode 100644 >>> index 0000000..d107a86 >>> --- /dev/null >>> +++ b/drivers/soc/qcom/qcom_minidump.c >>> @@ -0,0 +1,581 @@ >>> +// SPDX-License-Identifier: GPL-2.0-only >>> + >>> +/* >>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >>> + */ >>> + >>> +#include <linux/elf.h> >>> +#include <linux/err.h> >>> +#include <linux/errno.h> >>> +#include <linux/export.h> >>> +#include <linux/init.h> >>> +#include <linux/io.h> >>> +#include <linux/kernel.h> >>> +#include <linux/module.h> >>> +#include <linux/platform_device.h> >>> +#include <linux/string.h> >>> +#include <linux/soc/qcom/smem.h> >>> +#include <soc/qcom/qcom_minidump.h> >>> + >>> +/** >>> + * struct minidump_elfhdr - Minidump table elf header >>> + * @ehdr: Elf main header >>> + * @shdr: Section header >>> + * @phdr: Program header >>> + * @elf_offset: Section offset in elf >>> + * @strtable_idx: String table current index position >>> + */ >>> +struct minidump_elfhdr { >>> + struct elfhdr *ehdr; >>> + struct elf_shdr *shdr; >>> + struct elf_phdr *phdr; >>> + size_t elf_offset; >>> + size_t strtable_idx; >>> +}; >>> + >>> +/** >>> + * struct minidump - Minidump driver private data >>> + * @md_gbl_toc : Global TOC pointer >>> + * @md_apss_toc : Application Subsystem TOC pointer >>> + * @md_regions : High level OS region base pointer >>> + * @elf : Minidump elf header >>> + * @dev : Minidump device >>> + */ >>> +struct minidump { >>> + struct minidump_global_toc *md_gbl_toc; >>> + struct minidump_subsystem *md_apss_toc; >>> + struct minidump_region *md_regions; >>> + struct minidump_elfhdr elf; >>> + struct device *dev; >>> +}; >>> + >>> +/* >>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300 >>> + * as total number of supported region (including all co-processors) in >>> + * minidump table out of which linux was using 201. In future, this limitation >>> + * from boot firmware might get removed by allocating the region dynamically. >>> + * So, keep it compatible with older devices, we can keep the current limit for >>> + * Linux to 201. >>> + */ >>> +#define MAX_NUM_ENTRIES 201 >>> +#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH) >>> + >>> +static struct minidump *__md; >> >> No, no file scope or global scope statics. > > Sorry, this is done as per recommendation given here [1] and this > matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c > implementations. > > [1] > https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/ That's not true. You had the static already in v2, before Srini commented. Look: https://lore.kernel.org/lkml/1679491817-2498-5-git-send-email-quic_mojha@quicinc.com/ +static struct minidump minidump; +static DEFINE_MUTEX(minidump_lock); We do not talk about the names. >>> + >>> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >>> + ret = -EINVAL; >>> + dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret); >>> + return ret; >>> + } >>> + >>> + mutex_lock(&minidump_lock); >>> + md->dev = &pdev->dev; >>> + md->md_gbl_toc = mdgtoc; >> >> What are you protecting here? It's not possible to have concurrent >> access to md, is it? > > Check qcom_apss_minidump_region_{register/unregister} and it is possible > that these API gets called parallel to this probe. Wait, you say that something can modify local variable md before it is assigned to __md? How? > > I agree, i made a mistake in not protecting __md in {register} API > but did it unregister API in this patch, which i have fixed in later patch. No, you are protecting random things. Nothing will concurrently modify md and &pdev->dev in this moment. mdgtoc is allocated above, so also cannot by modified. Otherwise show me the hypothetical scenario. > >> >>> + ret = qcom_minidump_init_apss_subsystem(md); >>> + if (ret) { >>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>> + goto unlock; >>> + } >>> + >>> + __md = md; >> >> No. This is a platform device, so it can have multiple instances. > > It can have only one instance that is created from SMEM driver probe. Anyone can instantiate more of them.... how did you solve it? > >> >>> + /* First entry would be ELF header */ >>> + ret = qcom_apss_minidump_add_elf_header(); >>> + if (ret) { >>> + dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret); >>> + memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >>> + __md = NULL; >>> + } >>> + >>> +unlock: >>> + mutex_unlock(&minidump_lock); >>> + return ret; >>> +} >>> + >>> +static int qcom_minidump_remove(struct platform_device *pdev) >>> +{ >>> + memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >>> + __md = NULL; >> >> Don't use __ in variable names. Drop it everywhere. > > As i said above, this is being followed in other drivers, so followed > it here as per recommendation. > > Let @srini comeback on this. Which part of coding style recommends __ for driver code? > >> >>> + >>> + return 0; >>> +} >>> + >>> +static struct platform_driver qcom_minidump_driver = { >>> + .probe = qcom_minidump_probe, >>> + .remove = qcom_minidump_remove, >>> + .driver = { >>> + .name = "qcom-minidump", >>> + }, >>> +}; >>> + >>> +module_platform_driver(qcom_minidump_driver); >>> + >>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >>> +MODULE_LICENSE("GPL v2"); >>> +MODULE_ALIAS("platform:qcom-minidump"); >>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c >>> index 6be7ea9..d459656 100644 >>> --- a/drivers/soc/qcom/smem.c >>> +++ b/drivers/soc/qcom/smem.c >>> @@ -279,6 +279,7 @@ struct qcom_smem { >>> >>> u32 item_count; >>> struct platform_device *socinfo; >>> + struct platform_device *minidump; >>> struct smem_ptable *ptable; >>> struct smem_partition global_partition; >>> struct smem_partition partitions[SMEM_HOST_COUNT]; >>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev) >>> if (IS_ERR(smem->socinfo)) >>> dev_dbg(&pdev->dev, "failed to register socinfo device\n"); >>> >>> + smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump", >>> + PLATFORM_DEVID_NONE, NULL, >>> + 0); >>> + if (IS_ERR(smem->minidump)) >>> + dev_dbg(&pdev->dev, "failed to register minidump device\n"); >>> + >>> return 0; >>> } >>> >>> static int qcom_smem_remove(struct platform_device *pdev) >>> { >>> platform_device_unregister(__smem->socinfo); >>> + platform_device_unregister(__smem->minidump); >> >> Wrong order. You registered first socinfo, right? > > Any order is fine here, they are not dependent. > But, will fix this. No, the order is always reversed from allocation. It does not matter if they are dependent or not. Best regards, Krzysztof
On 04/05/2023 16:43, Mukesh Ojha wrote: > > > On 5/4/2023 6:02 PM, Krzysztof Kozlowski wrote: >> On 04/05/2023 13:45, Mukesh Ojha wrote: >>> >>> >>> On 5/4/2023 4:53 PM, Krzysztof Kozlowski wrote: >>>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>>> Previous patches add the Qualcomm minidump driver support, so >>>>> lets enable minidump config so that it can be used by kernel >>>>> clients. >>>>> >>>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >>>> >>>> This patchset is split too much. Defconfig change is one change. Not two >>>> or three. >>>> >>>>> --- >>>>> arch/arm64/configs/defconfig | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig >>>>> index a24609e..831c942 100644 >>>>> --- a/arch/arm64/configs/defconfig >>>>> +++ b/arch/arm64/configs/defconfig >>>>> @@ -1250,6 +1250,7 @@ CONFIG_QCOM_STATS=m >>>>> CONFIG_QCOM_WCNSS_CTRL=m >>>>> CONFIG_QCOM_APR=m >>>>> CONFIG_QCOM_ICC_BWMON=m >>>>> +CONFIG_QCOM_MINIDUMP=y >>>> >>>> This must be a module. >>> >>> Why do you think this should be a module ? >>> >>> Is it because, it is lying here among others '=m' ? >> >> Because we want and insist on everything being a module. That's the >> generic rule. There are exceptions, so if this justifies being an >> exception, please bring appropriate arguments. >> >>> >>> Or you have some other reasoning ? like it is for qcom specific >>> soc and can not be used outside ? but that is not true for >>> all configs mentioned here. >>> >>> The reason behind making it as '=y' was, to collect information from >>> core kernel data structure as well as the information like percpu data, >>> run queue, irq stat kind of information on kernel crash on a target >>> running some perf configuration(android phone). >> >> I don't understand why =m stops you from all that. > > How do i get kernel symbol address from a modules > can we use kallsyms_lookup_name from modules ? You allow it to be a module in patch #4, so I think you solved it, right? Otherwise it could not be a module? Anyway, where do you use kallsyms_lookup_name()? I cannot find it in your patch. Best regards, Krzysztof
On 04/05/2023 17:21, Krzysztof Kozlowski wrote: >>> >>>> + ret = qcom_minidump_init_apss_subsystem(md); >>>> + if (ret) { >>>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>>> + goto unlock; >>>> + } >>>> + >>>> + __md = md; >>> >>> No. This is a platform device, so it can have multiple instances. >> >> It can have only one instance that is created from SMEM driver probe. > > Anyone can instantiate more of them.... how did you solve it? To clarify - sprinkling more of singletons makes everything tightly coupled, difficult to debug and non-portable. You cannot have two instances, you have to control concurrent initialization by yourself in each of such singletons. I understand sometimes they are unavoidable, for example when this does not map to hardware property. However here you have the parent - smem - which can return you valid instance. Thus you avoid entire problem of file-scope variables. Best regards, Krzysztof
On 5/4/2023 8:51 PM, Krzysztof Kozlowski wrote: > On 04/05/2023 14:38, Mukesh Ojha wrote: >> >> >> On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote: >>> On 03/05/2023 19:02, Mukesh Ojha wrote: >>>> Minidump is a best effort mechanism to collect useful and predefined >>>> data for first level of debugging on end user devices running on >>>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >>>> or subsystem part of SoC crashes, due to a range of hardware and >>>> software bugs. Hence, the ability to collect accurate data is only >>>> a best-effort. The data collected could be invalid or corrupted, >>>> data collection itself could fail, and so on. >>>> >>>> Qualcomm devices in engineering mode provides a mechanism for >>>> generating full system ramdumps for post mortem debugging. But in some >>>> cases it's however not feasible to capture the entire content of RAM. >>>> The minidump mechanism provides the means for selecting region should >>>> be included in the ramdump. The solution supports extracting the >>>> ramdump/minidump produced either over USB or stored to an attached >>>> storage device. >>>> >>>> The core of minidump feature is part of Qualcomm's boot firmware code. >>>> It initializes shared memory(SMEM), which is a part of DDR and >>>> allocates a small section of it to minidump table i.e also called >>>> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has >>>> their own table of segments to be included in the minidump, all >>>> references from a descriptor in SMEM (G-ToC). Each segment/region has >>>> some details like name, physical address and it's size etc. and it >>>> could be anywhere scattered in the DDR. >>>> >>>> Minidump kernel driver adds the capability to add linux region to be >>>> dumped as part of ram dump collection. It provides appropriate symbol >>>> to check its enablement and register client regions. >>>> >>>> To simplify post mortem debugging, it creates and maintain an ELF >>>> header as first region that gets updated upon registration >>>> of a new region. >>>> >>>> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >>>> --- >>>> drivers/soc/qcom/Kconfig | 14 + >>>> drivers/soc/qcom/Makefile | 1 + >>>> drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++ >>>> drivers/soc/qcom/smem.c | 8 + >>>> include/soc/qcom/qcom_minidump.h | 61 +++- >>>> 5 files changed, 663 insertions(+), 2 deletions(-) >>>> create mode 100644 drivers/soc/qcom/qcom_minidump.c >>>> >>>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig >>>> index a491718..15c931e 100644 >>>> --- a/drivers/soc/qcom/Kconfig >>>> +++ b/drivers/soc/qcom/Kconfig >>>> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE >>>> tristate >>>> select QCOM_SCM >>>> >>>> +config QCOM_MINIDUMP >>>> + tristate "QCOM Minidump Support" >>>> + depends on ARCH_QCOM || COMPILE_TEST >>>> + select QCOM_SMEM >>>> + help >>>> + Enablement of core minidump feature is controlled from boot firmware >>>> + side, and this config allow linux to query and manages APPS minidump >>>> + table. >>>> + >>>> + Client drivers can register their internal data structures and debug >>>> + messages as part of the minidump region and when the SoC is crashed, >>>> + these selective regions will be dumped instead of the entire DDR. >>>> + This saves significant amount of time and/or storage space. >>>> + >>>> endmenu >>>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile >>>> index 0f43a88..1ebe081 100644 >>>> --- a/drivers/soc/qcom/Makefile >>>> +++ b/drivers/soc/qcom/Makefile >>>> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o >>>> obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o >>>> obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o >>>> obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += ice.o >>>> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o >>>> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c >>>> new file mode 100644 >>>> index 0000000..d107a86 >>>> --- /dev/null >>>> +++ b/drivers/soc/qcom/qcom_minidump.c >>>> @@ -0,0 +1,581 @@ >>>> +// SPDX-License-Identifier: GPL-2.0-only >>>> + >>>> +/* >>>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >>>> + */ >>>> + >>>> +#include <linux/elf.h> >>>> +#include <linux/err.h> >>>> +#include <linux/errno.h> >>>> +#include <linux/export.h> >>>> +#include <linux/init.h> >>>> +#include <linux/io.h> >>>> +#include <linux/kernel.h> >>>> +#include <linux/module.h> >>>> +#include <linux/platform_device.h> >>>> +#include <linux/string.h> >>>> +#include <linux/soc/qcom/smem.h> >>>> +#include <soc/qcom/qcom_minidump.h> >>>> + >>>> +/** >>>> + * struct minidump_elfhdr - Minidump table elf header >>>> + * @ehdr: Elf main header >>>> + * @shdr: Section header >>>> + * @phdr: Program header >>>> + * @elf_offset: Section offset in elf >>>> + * @strtable_idx: String table current index position >>>> + */ >>>> +struct minidump_elfhdr { >>>> + struct elfhdr *ehdr; >>>> + struct elf_shdr *shdr; >>>> + struct elf_phdr *phdr; >>>> + size_t elf_offset; >>>> + size_t strtable_idx; >>>> +}; >>>> + >>>> +/** >>>> + * struct minidump - Minidump driver private data >>>> + * @md_gbl_toc : Global TOC pointer >>>> + * @md_apss_toc : Application Subsystem TOC pointer >>>> + * @md_regions : High level OS region base pointer >>>> + * @elf : Minidump elf header >>>> + * @dev : Minidump device >>>> + */ >>>> +struct minidump { >>>> + struct minidump_global_toc *md_gbl_toc; >>>> + struct minidump_subsystem *md_apss_toc; >>>> + struct minidump_region *md_regions; >>>> + struct minidump_elfhdr elf; >>>> + struct device *dev; >>>> +}; >>>> + >>>> +/* >>>> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300 >>>> + * as total number of supported region (including all co-processors) in >>>> + * minidump table out of which linux was using 201. In future, this limitation >>>> + * from boot firmware might get removed by allocating the region dynamically. >>>> + * So, keep it compatible with older devices, we can keep the current limit for >>>> + * Linux to 201. >>>> + */ >>>> +#define MAX_NUM_ENTRIES 201 >>>> +#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH) >>>> + >>>> +static struct minidump *__md; >>> >>> No, no file scope or global scope statics. >> >> Sorry, this is done as per recommendation given here [1] and this >> matches both driver/firmware/qcom_scm.c and driver/soc/qcom/smem.c >> implementations. >> >> [1] >> https://lore.kernel.org/lkml/f74dfcde-e59b-a9b3-9bbc-a8de644f6740@linaro.org/ > > That's not true. You had the static already in v2, before Srini commented. > > Look: > https://lore.kernel.org/lkml/1679491817-2498-5-git-send-email-quic_mojha@quicinc.com/ > > +static struct minidump minidump; > +static DEFINE_MUTEX(minidump_lock); > > We do not talk about the names. I apologize for this. > > >>>> + >>>> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >>>> + ret = -EINVAL; >>>> + dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret); >>>> + return ret; >>>> + } >>>> + >>>> + mutex_lock(&minidump_lock); >>>> + md->dev = &pdev->dev; >>>> + md->md_gbl_toc = mdgtoc; >>> >>> What are you protecting here? It's not possible to have concurrent >>> access to md, is it? >> >> Check qcom_apss_minidump_region_{register/unregister} and it is possible >> that these API gets called parallel to this probe. > > Wait, you say that something can modify local variable md before it is > assigned to __md? How? No. >> >> I agree, i made a mistake in not protecting __md in {register} API >> but did it unregister API in this patch, which i have fixed in later patch. > > No, you are protecting random things. Nothing will concurrently modify > md and &pdev->dev in this moment. mdgtoc is allocated above, so also > cannot by modified. > > Otherwise show me the hypothetical scenario. You are correct, it should just protect the assignment. __md = md; Thanks > > >> >>> >>>> + ret = qcom_minidump_init_apss_subsystem(md); >>>> + if (ret) { >>>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>>> + goto unlock; >>>> + } >>>> + >>>> + __md = md; >>> >>> No. This is a platform device, so it can have multiple instances. >> >> It can have only one instance that is created from SMEM driver probe. > > Anyone can instantiate more of them.... how did you solve it? > > >> >>> >>>> + /* First entry would be ELF header */ >>>> + ret = qcom_apss_minidump_add_elf_header(); >>>> + if (ret) { >>>> + dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret); >>>> + memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >>>> + __md = NULL; >>>> + } >>>> + >>>> +unlock: >>>> + mutex_unlock(&minidump_lock); >>>> + return ret; >>>> +} >>>> + >>>> +static int qcom_minidump_remove(struct platform_device *pdev) >>>> +{ >>>> + memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >>>> + __md = NULL; >>> >>> Don't use __ in variable names. Drop it everywhere. >> >> As i said above, this is being followed in other drivers, so followed >> it here as per recommendation. >> >> Let @srini comeback on this. > > Which part of coding style recommends __ for driver code? Will fix this. > >> >>> >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static struct platform_driver qcom_minidump_driver = { >>>> + .probe = qcom_minidump_probe, >>>> + .remove = qcom_minidump_remove, >>>> + .driver = { >>>> + .name = "qcom-minidump", >>>> + }, >>>> +}; >>>> + >>>> +module_platform_driver(qcom_minidump_driver); >>>> + >>>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >>>> +MODULE_LICENSE("GPL v2"); >>>> +MODULE_ALIAS("platform:qcom-minidump"); >>>> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c >>>> index 6be7ea9..d459656 100644 >>>> --- a/drivers/soc/qcom/smem.c >>>> +++ b/drivers/soc/qcom/smem.c >>>> @@ -279,6 +279,7 @@ struct qcom_smem { >>>> >>>> u32 item_count; >>>> struct platform_device *socinfo; >>>> + struct platform_device *minidump; >>>> struct smem_ptable *ptable; >>>> struct smem_partition global_partition; >>>> struct smem_partition partitions[SMEM_HOST_COUNT]; >>>> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev) >>>> if (IS_ERR(smem->socinfo)) >>>> dev_dbg(&pdev->dev, "failed to register socinfo device\n"); >>>> >>>> + smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump", >>>> + PLATFORM_DEVID_NONE, NULL, >>>> + 0); >>>> + if (IS_ERR(smem->minidump)) >>>> + dev_dbg(&pdev->dev, "failed to register minidump device\n"); >>>> + >>>> return 0; >>>> } >>>> >>>> static int qcom_smem_remove(struct platform_device *pdev) >>>> { >>>> platform_device_unregister(__smem->socinfo); >>>> + platform_device_unregister(__smem->minidump); >>> >>> Wrong order. You registered first socinfo, right? >> >> Any order is fine here, they are not dependent. >> But, will fix this. > > No, the order is always reversed from allocation. It does not matter if > they are dependent or not. Ok > > Best regards, > Krzysztof > -- Mukesh
On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote: > On 04/05/2023 17:21, Krzysztof Kozlowski wrote: >>>> >>>>> + ret = qcom_minidump_init_apss_subsystem(md); >>>>> + if (ret) { >>>>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>>>> + goto unlock; >>>>> + } >>>>> + >>>>> + __md = md; >>>> >>>> No. This is a platform device, so it can have multiple instances. >>> >>> It can have only one instance that is created from SMEM driver probe. >> >> Anyone can instantiate more of them.... how did you solve it? > > To clarify - sprinkling more of singletons makes everything tightly > coupled, difficult to debug and non-portable. You cannot have two > instances, you have to control concurrent initialization by yourself in > each of such singletons. > > I understand sometimes they are unavoidable, for example when this does > not map to hardware property. However here you have the parent - smem - > which can return you valid instance. Thus you avoid entire problem of > file-scope variables. I get your point, why one's should avoid file scope variables. This is infrastructure driver and will not have multiple instances and even if it happens could be avoided with with the help of global mutex and protect below function which i am already doing at the moment and fail the other probe if it is already initialized with proper logging..e.g "already initialized..." ret = qcom_minidump_init_apss_subsystem(md); And this will be in-lined with /* Pointer to the one and only smem handle */ static struct qcom_smem *__smem; Let me know if you still disagree...and have some other way ? -- Mukesh > > Best regards, > Krzysztof >
On 08/05/2023 09:10, Mukesh Ojha wrote: > > > On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote: >> On 04/05/2023 17:21, Krzysztof Kozlowski wrote: >>>>> >>>>>> + ret = qcom_minidump_init_apss_subsystem(md); >>>>>> + if (ret) { >>>>>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>>>>> + goto unlock; >>>>>> + } >>>>>> + >>>>>> + __md = md; >>>>> >>>>> No. This is a platform device, so it can have multiple instances. >>>> >>>> It can have only one instance that is created from SMEM driver probe. >>> >>> Anyone can instantiate more of them.... how did you solve it? >> >> To clarify - sprinkling more of singletons makes everything tightly >> coupled, difficult to debug and non-portable. You cannot have two >> instances, you have to control concurrent initialization by yourself in >> each of such singletons. >> >> I understand sometimes they are unavoidable, for example when this does >> not map to hardware property. However here you have the parent - smem - >> which can return you valid instance. Thus you avoid entire problem of >> file-scope variables. > > I get your point, why one's should avoid file scope variables. > > > This is infrastructure driver and will not have multiple instances and > even if it happens could be avoided with with the help of global mutex > and protect below function which i am already doing at the moment and But we do not want global mutexes... so incorrect design is being improved by more incorrect design. > fail the other probe if it is already initialized with proper logging..e.g > > "already initialized..." > > > ret = qcom_minidump_init_apss_subsystem(md); > > > And this will be in-lined with > > /* Pointer to the one and only smem handle */ > static struct qcom_smem *__smem; > > Let me know if you still disagree...and have some other way ? Why the parent - smem - cannot return every consumer the instance it has? There will be one smem having only one minidump, so all problems solved? Best regards, Krzysztof
On 5/8/2023 12:10 AM, Mukesh Ojha wrote: > > > On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote: >> On 04/05/2023 17:21, Krzysztof Kozlowski wrote: >>>>> >>>>>> +Â Â Â ret = qcom_minidump_init_apss_subsystem(md); >>>>>> +Â Â Â if (ret) { >>>>>> +Â Â Â Â Â Â Â dev_err(&pdev->dev, "apss minidump initialization failed: >>>>>> %d\n", ret); >>>>>> +Â Â Â Â Â Â Â goto unlock; >>>>>> +Â Â Â } >>>>>> + >>>>>> +Â Â Â __md = md; >>>>> >>>>> No. This is a platform device, so it can have multiple instances. >>>> >>>> It can have only one instance that is created from SMEM driver probe. >>> >>> Anyone can instantiate more of them.... how did you solve it? >> >> To clarify - sprinkling more of singletons makes everything tightly >> coupled, difficult to debug and non-portable. You cannot have two >> instances, you have to control concurrent initialization by yourself in >> each of such singletons. >> >> I understand sometimes they are unavoidable, for example when this does >> not map to hardware property. However here you have the parent - smem - >> which can return you valid instance. Thus you avoid entire problem of >> file-scope variables. > > I get your point, why one's should avoid file scope variables. > > > This is infrastructure driver and will not have multiple instances and > even if it happens could be avoided with with the help of global mutex > and protect below function which i am already doing at the moment and > fail the other probe if it is already initialized with proper logging..e.g Another way to think here is what if you have chiplets? Two SOCs looks like one as a product? How does your driver will behave in those cases? ---Trilok Soni
Hi Krzysztof, On 5/9/2023 12:41 PM, Krzysztof Kozlowski wrote: > On 08/05/2023 09:10, Mukesh Ojha wrote: >> >> >> On 5/4/2023 10:04 PM, Krzysztof Kozlowski wrote: >>> On 04/05/2023 17:21, Krzysztof Kozlowski wrote: >>>>>> >>>>>>> + ret = qcom_minidump_init_apss_subsystem(md); >>>>>>> + if (ret) { >>>>>>> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >>>>>>> + goto unlock; >>>>>>> + } >>>>>>> + >>>>>>> + __md = md; >>>>>> >>>>>> No. This is a platform device, so it can have multiple instances. >>>>> >>>>> It can have only one instance that is created from SMEM driver probe. >>>> >>>> Anyone can instantiate more of them.... how did you solve it? >>> >>> To clarify - sprinkling more of singletons makes everything tightly >>> coupled, difficult to debug and non-portable. You cannot have two >>> instances, you have to control concurrent initialization by yourself in >>> each of such singletons. >>> >>> I understand sometimes they are unavoidable, for example when this does >>> not map to hardware property. However here you have the parent - smem - >>> which can return you valid instance. Thus you avoid entire problem of >>> file-scope variables. >> >> I get your point, why one's should avoid file scope variables. >> >> >> This is infrastructure driver and will not have multiple instances and >> even if it happens could be avoided with with the help of global mutex >> and protect below function which i am already doing at the moment and > > But we do not want global mutexes... so incorrect design is being > improved by more incorrect design. > >> fail the other probe if it is already initialized with proper logging..e.g >> >> "already initialized..." >> >> >> ret = qcom_minidump_init_apss_subsystem(md); >> >> >> And this will be in-lined with >> >> /* Pointer to the one and only smem handle */ >> static struct qcom_smem *__smem; >> >> Let me know if you still disagree...and have some other way ? > > Why the parent - smem - cannot return every consumer the instance it > has? There will be one smem having only one minidump, so all problems > solved? Sorry, I am extending this discussion but it is needed to avoid rework in upcoming patches. I am inline with the thought of each smem has its own minidump instance, which is basically one at this moment as SMEM has only instance in DT. In that way, Client driver calling qcom_apss_minidump_region_register() will also need to know the instance it need to register with right? However, I do have a use case [1] where SMEM or similar region supporting memory mapped region could be virtualized and guest vm does not have direct access to it, that way it will only have one backend at a time.But even if they exist together that can be done with below approach. File scope variable is still needed in minidump core but can be avoided in backend drivers where each backend register with core and get added itself in the list and for list protection, list mutex would be needed. #define SMEM 0; #define MMIO 1; or enum may be.. And client can call this to the instance it need to register with.. int qcom_apss_minidump_region_register(region, SMEM); int qcom_apss_minidump_region_register(region, MMIO); Do you agree with this approach? [1] +----------------+ | | | client A-Z | +-----+----------+ | | | | v +------------------------+ | | other backends | minidump core +----------------------------+ | | | +--+---------------------+ | | | | | | | | | |e.g, | | |gunyah-rm +--------v------+ +-----v-----------+ +--+---------+ | | | | | | |minidump_smem | | minidump_mmio | | ..... | +---------------+ +-----------------+ +------------+ SMEM backend mmio backend where smem may be virtualized -- Mukesh > > Best regards, > Krzysztof >
On 5/4/2023 5:06 PM, Krzysztof Kozlowski wrote: > On 03/05/2023 19:02, Mukesh Ojha wrote: >> Minidump is a best effort mechanism to collect useful and predefined >> data for first level of debugging on end user devices running on >> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >> or subsystem part of SoC crashes, due to a range of hardware and >> software bugs. Hence, the ability to collect accurate data is only >> a best-effort. The data collected could be invalid or corrupted, >> data collection itself could fail, and so on. >> >> Qualcomm devices in engineering mode provides a mechanism for >> generating full system ramdumps for post mortem debugging. But in some >> cases it's however not feasible to capture the entire content of RAM. >> The minidump mechanism provides the means for selecting region should >> be included in the ramdump. The solution supports extracting the >> ramdump/minidump produced either over USB or stored to an attached >> storage device. >> >> The core of minidump feature is part of Qualcomm's boot firmware code. >> It initializes shared memory(SMEM), which is a part of DDR and >> allocates a small section of it to minidump table i.e also called >> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has >> their own table of segments to be included in the minidump, all >> references from a descriptor in SMEM (G-ToC). Each segment/region has >> some details like name, physical address and it's size etc. and it >> could be anywhere scattered in the DDR. >> >> Minidump kernel driver adds the capability to add linux region to be >> dumped as part of ram dump collection. It provides appropriate symbol >> to check its enablement and register client regions. >> >> To simplify post mortem debugging, it creates and maintain an ELF >> header as first region that gets updated upon registration >> of a new region. >> >> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> >> --- >> drivers/soc/qcom/Kconfig | 14 + >> drivers/soc/qcom/Makefile | 1 + >> drivers/soc/qcom/qcom_minidump.c | 581 +++++++++++++++++++++++++++++++++++++++ >> drivers/soc/qcom/smem.c | 8 + >> include/soc/qcom/qcom_minidump.h | 61 +++- >> 5 files changed, 663 insertions(+), 2 deletions(-) >> create mode 100644 drivers/soc/qcom/qcom_minidump.c >> >> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig >> index a491718..15c931e 100644 >> --- a/drivers/soc/qcom/Kconfig >> +++ b/drivers/soc/qcom/Kconfig >> @@ -279,4 +279,18 @@ config QCOM_INLINE_CRYPTO_ENGINE >> tristate >> select QCOM_SCM >> >> +config QCOM_MINIDUMP >> + tristate "QCOM Minidump Support" >> + depends on ARCH_QCOM || COMPILE_TEST >> + select QCOM_SMEM >> + help >> + Enablement of core minidump feature is controlled from boot firmware >> + side, and this config allow linux to query and manages APPS minidump >> + table. >> + >> + Client drivers can register their internal data structures and debug >> + messages as part of the minidump region and when the SoC is crashed, >> + these selective regions will be dumped instead of the entire DDR. >> + This saves significant amount of time and/or storage space. >> + >> endmenu >> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile >> index 0f43a88..1ebe081 100644 >> --- a/drivers/soc/qcom/Makefile >> +++ b/drivers/soc/qcom/Makefile >> @@ -33,3 +33,4 @@ obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o >> obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o >> obj-$(CONFIG_QCOM_ICC_BWMON) += icc-bwmon.o >> obj-$(CONFIG_QCOM_INLINE_CRYPTO_ENGINE) += ice.o >> +obj-$(CONFIG_QCOM_MINIDUMP) += qcom_minidump.o >> diff --git a/drivers/soc/qcom/qcom_minidump.c b/drivers/soc/qcom/qcom_minidump.c >> new file mode 100644 >> index 0000000..d107a86 >> --- /dev/null >> +++ b/drivers/soc/qcom/qcom_minidump.c >> @@ -0,0 +1,581 @@ >> +// SPDX-License-Identifier: GPL-2.0-only >> + >> +/* >> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >> + */ >> + >> +#include <linux/elf.h> >> +#include <linux/err.h> >> +#include <linux/errno.h> >> +#include <linux/export.h> >> +#include <linux/init.h> >> +#include <linux/io.h> >> +#include <linux/kernel.h> >> +#include <linux/module.h> >> +#include <linux/platform_device.h> >> +#include <linux/string.h> >> +#include <linux/soc/qcom/smem.h> >> +#include <soc/qcom/qcom_minidump.h> >> + >> +/** >> + * struct minidump_elfhdr - Minidump table elf header >> + * @ehdr: Elf main header >> + * @shdr: Section header >> + * @phdr: Program header >> + * @elf_offset: Section offset in elf >> + * @strtable_idx: String table current index position >> + */ >> +struct minidump_elfhdr { >> + struct elfhdr *ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + size_t elf_offset; >> + size_t strtable_idx; >> +}; >> + >> +/** >> + * struct minidump - Minidump driver private data >> + * @md_gbl_toc : Global TOC pointer >> + * @md_apss_toc : Application Subsystem TOC pointer >> + * @md_regions : High level OS region base pointer >> + * @elf : Minidump elf header >> + * @dev : Minidump device >> + */ >> +struct minidump { >> + struct minidump_global_toc *md_gbl_toc; >> + struct minidump_subsystem *md_apss_toc; >> + struct minidump_region *md_regions; >> + struct minidump_elfhdr elf; >> + struct device *dev; >> +}; >> + >> +/* >> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300 >> + * as total number of supported region (including all co-processors) in >> + * minidump table out of which linux was using 201. In future, this limitation >> + * from boot firmware might get removed by allocating the region dynamically. >> + * So, keep it compatible with older devices, we can keep the current limit for >> + * Linux to 201. >> + */ >> +#define MAX_NUM_ENTRIES 201 >> +#define MAX_STRTBL_SIZE (MAX_NUM_ENTRIES * MAX_REGION_NAME_LENGTH) >> + >> +static struct minidump *__md; > > No, no file scope or global scope statics. > >> +static DEFINE_MUTEX(minidump_lock); > > Neither this. > > Also you need to clearly express what is protected by newn lock. Once i divide this driver into two (minidump core + smem backend driver as described here[1]) so there could more backend drivers could be attached to minidump core. Some of the file scope variable still be required. [1] https://lore.kernel.org/lkml/c2496855-113a-56e6-f6e2-9a9bd03a1267@quicinc.com/#t > >> + >> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx) >> +{ >> + struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff); >> + >> + return &eshdr[idx]; >> +} >> + >> +static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx) >> +{ >> + struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff); >> + >> + return &ephdr[idx]; >> +} >> + >> +static char *elf_str_table_start(struct elfhdr *ehdr) >> +{ >> + struct elf_shdr *eshdr; >> + >> + if (ehdr->e_shstrndx == SHN_UNDEF) >> + return NULL; >> + >> + eshdr = elf_shdr_entry_addr(ehdr, ehdr->e_shstrndx); >> + return (char *)ehdr + eshdr->sh_offset; >> +} >> + >> +static char *elf_lookup_string(struct elfhdr *ehdr, int offset) >> +{ >> + char *strtab = elf_str_table_start(ehdr); >> + >> + if (!strtab || (__md->elf.strtable_idx < offset)) >> + return NULL; >> + >> + return strtab + offset; >> +} >> + >> +static unsigned int append_str_to_strtable(const char *name) >> +{ >> + char *strtab = elf_str_table_start(__md->elf.ehdr); >> + unsigned int old_idx = __md->elf.strtable_idx; >> + unsigned int ret; >> + >> + if (!strtab || !name) >> + return 0; >> + >> + ret = old_idx; >> + old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH); >> + __md->elf.strtable_idx = old_idx + 1; >> + return ret; >> +} >> + >> +static int >> +get_apss_minidump_region_index(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int i; >> + unsigned int count; >> + >> + count = le32_to_cpu(__md->md_apss_toc->region_count); >> + for (i = 0; i < count; i++) { >> + mdr = &__md->md_regions[i]; >> + if (!strcmp(mdr->name, region->name)) >> + return i; >> + } >> + return -ENOENT; >> +} >> + >> +static void >> +qcom_apss_minidump_update_elf_header(const struct qcom_apss_minidump_region *region) > > You need to name everything shorter. Neither functions nor structs are > making it easy to follow. Would drop "apss" from the name; > >> +{ >> + struct elfhdr *ehdr = __md->elf.ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + >> + shdr = elf_shdr_entry_addr(ehdr, ehdr->e_shnum++); >> + phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++); >> + >> + shdr->sh_type = SHT_PROGBITS; >> + shdr->sh_name = append_str_to_strtable(region->name); >> + shdr->sh_addr = (elf_addr_t)region->virt_addr; >> + shdr->sh_size = region->size; >> + shdr->sh_flags = SHF_WRITE; >> + shdr->sh_offset = __md->elf.elf_offset; >> + shdr->sh_entsize = 0; >> + >> + phdr->p_type = PT_LOAD; >> + phdr->p_offset = __md->elf.elf_offset; >> + phdr->p_vaddr = (elf_addr_t)region->virt_addr; >> + phdr->p_paddr = region->phys_addr; >> + phdr->p_filesz = phdr->p_memsz = region->size; >> + phdr->p_flags = PF_R | PF_W; >> + __md->elf.elf_offset += shdr->sh_size; >> +} >> + >> +static void >> +qcom_apss_minidump_add_region(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int region_cnt = le32_to_cpu(__md->md_apss_toc->region_count); >> + >> + mdr = &__md->md_regions[region_cnt]; >> + strscpy(mdr->name, region->name, sizeof(mdr->name)); >> + mdr->address = cpu_to_le64(region->phys_addr); >> + mdr->size = cpu_to_le64(region->size); >> + mdr->valid = cpu_to_le32(MINIDUMP_REGION_VALID); >> + region_cnt++; >> + __md->md_apss_toc->region_count = cpu_to_le32(region_cnt); >> +} >> + >> +static bool >> +qcom_apss_minidump_valid_region(const struct qcom_apss_minidump_region *region) >> +{ >> + return region && >> + strnlen(region->name, MAX_NAME_LENGTH) < MAX_NAME_LENGTH && >> + region->virt_addr && >> + region->size && >> + IS_ALIGNED(region->size, 4); >> +} >> + >> +static int qcom_apss_minidump_add_elf_header(void) >> +{ >> + struct qcom_apss_minidump_region elfregion; >> + struct elfhdr *ehdr; >> + struct elf_shdr *shdr; >> + struct elf_phdr *phdr; >> + unsigned int elfh_size; >> + unsigned int strtbl_off; >> + unsigned int phdr_off; >> + char *banner; >> + unsigned int banner_len; >> + >> + banner_len = strlen(linux_banner); >> + /* >> + * Header buffer contains: >> + * ELF header, (MAX_NUM_ENTRIES + 4) of Section and Program ELF headers, >> + * where, 4 additional entries, one for empty header, one for string table >> + * one for minidump table and one for linux banner. >> + * >> + * Linux banner is stored in minidump to aid post mortem tools to determine >> + * the kernel version. >> + */ >> + elfh_size = sizeof(*ehdr); >> + elfh_size += MAX_STRTBL_SIZE; >> + elfh_size += banner_len + 1; >> + elfh_size += ((sizeof(*shdr) + sizeof(*phdr)) * (MAX_NUM_ENTRIES + 4)); >> + elfh_size = ALIGN(elfh_size, 4); >> + >> + __md->elf.ehdr = kzalloc(elfh_size, GFP_KERNEL); >> + if (!__md->elf.ehdr) >> + return -ENOMEM; >> + >> + /* Register ELF header as first region */ >> + strscpy(elfregion.name, "KELF_HEADER", sizeof(elfregion.name)); >> + elfregion.virt_addr = __md->elf.ehdr; >> + elfregion.phys_addr = virt_to_phys(__md->elf.ehdr); >> + elfregion.size = elfh_size; >> + qcom_apss_minidump_add_region(&elfregion); >> + >> + ehdr = __md->elf.ehdr; >> + /* Assign Section/Program headers offset */ >> + __md->elf.shdr = shdr = (struct elf_shdr *)(ehdr + 1); >> + __md->elf.phdr = phdr = (struct elf_phdr *)(shdr + MAX_NUM_ENTRIES); >> + phdr_off = sizeof(*ehdr) + (sizeof(*shdr) * MAX_NUM_ENTRIES); >> + >> + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); >> + ehdr->e_ident[EI_CLASS] = ELF_CLASS; >> + ehdr->e_ident[EI_DATA] = ELF_DATA; >> + ehdr->e_ident[EI_VERSION] = EV_CURRENT; >> + ehdr->e_ident[EI_OSABI] = ELF_OSABI; >> + ehdr->e_type = ET_CORE; >> + ehdr->e_machine = ELF_ARCH; >> + ehdr->e_version = EV_CURRENT; >> + ehdr->e_ehsize = sizeof(*ehdr); >> + ehdr->e_phoff = phdr_off; >> + ehdr->e_phentsize = sizeof(*phdr); >> + ehdr->e_shoff = sizeof(*ehdr); >> + ehdr->e_shentsize = sizeof(*shdr); >> + ehdr->e_shstrndx = 1; >> + >> + __md->elf.elf_offset = elfh_size; >> + >> + /* >> + * The zeroth index of the section header is reserved and is rarely used. >> + * Set the section header as null (SHN_UNDEF) and move to the next one. >> + * 2nd Section is String table. >> + */ >> + __md->elf.strtable_idx = 1; >> + strtbl_off = sizeof(*ehdr) + ((sizeof(*phdr) + sizeof(*shdr)) * MAX_NUM_ENTRIES); >> + shdr++; >> + shdr->sh_type = SHT_STRTAB; >> + shdr->sh_offset = (elf_addr_t)strtbl_off; >> + shdr->sh_size = MAX_STRTBL_SIZE; >> + shdr->sh_entsize = 0; >> + shdr->sh_flags = 0; >> + shdr->sh_name = append_str_to_strtable("STR_TBL"); >> + shdr++; >> + >> + /* 3rd Section is Linux banner */ >> + banner = (char *)ehdr + strtbl_off + MAX_STRTBL_SIZE; >> + memcpy(banner, linux_banner, banner_len); >> + >> + shdr->sh_type = SHT_PROGBITS; >> + shdr->sh_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); >> + shdr->sh_size = banner_len + 1; >> + shdr->sh_addr = (elf_addr_t)linux_banner; >> + shdr->sh_entsize = 0; >> + shdr->sh_flags = SHF_WRITE; >> + shdr->sh_name = append_str_to_strtable("linux_banner"); >> + >> + phdr->p_type = PT_LOAD; >> + phdr->p_offset = (elf_addr_t)(strtbl_off + MAX_STRTBL_SIZE); >> + phdr->p_vaddr = (elf_addr_t)linux_banner; >> + phdr->p_paddr = virt_to_phys(linux_banner); >> + phdr->p_filesz = phdr->p_memsz = banner_len + 1; >> + phdr->p_flags = PF_R | PF_W; >> + >> + /* >> + * Above are some prdefined sections/program header used >> + * for debug, update their count here. >> + */ >> + ehdr->e_phnum = 1; >> + ehdr->e_shnum = 3; >> + >> + return 0; >> +} >> + >> +/** >> + * qcom_minidump_subsystem_desc() - Get minidump subsystem descriptor. >> + * @minidump_index: minidump index for a subsystem in minidump table >> + * >> + * Return: minidump subsystem descriptor address on success and error >> + * on failure >> + */ >> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) >> +{ >> + struct minidump_subsystem *md_ss_toc; >> + >> + mutex_lock(&minidump_lock); >> + if (!__md) { >> + md_ss_toc = ERR_PTR(-EPROBE_DEFER); >> + goto unlock; >> + } >> + >> + md_ss_toc = &__md->md_gbl_toc->subsystems[minidump_index]; >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return md_ss_toc; >> +} >> +EXPORT_SYMBOL_GPL(qcom_minidump_subsystem_desc); >> + >> +/** >> + * qcom_apss_minidump_region_register() - Register a region in Minidump table. >> + * @region: minidump region. >> + * >> + * Return: On success, it returns 0, otherwise a negative error value on failure. >> + */ >> +int qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) >> +{ >> + unsigned int num_region; >> + int ret; >> + >> + if (!__md) >> + return -EPROBE_DEFER; >> + >> + if (!qcom_apss_minidump_valid_region(region)) >> + return -EINVAL; >> + >> + mutex_lock(&minidump_lock); >> + ret = get_apss_minidump_region_index(region); >> + if (ret >= 0) { >> + dev_info(__md->dev, "%s region is already registered\n", region->name); >> + ret = -EEXIST; >> + goto unlock; >> + } >> + >> + /* Check if there is a room for a new entry */ >> + num_region = le32_to_cpu(__md->md_apss_toc->region_count); >> + if (num_region >= MAX_NUM_ENTRIES) { >> + dev_err(__md->dev, "maximum region limit %u reached\n", num_region); >> + ret = -ENOSPC; >> + goto unlock; >> + } >> + >> + qcom_apss_minidump_add_region(region); >> + qcom_apss_minidump_update_elf_header(region); >> + ret = 0; >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_register); >> + >> +static int >> +qcom_apss_minidump_clear_header(const struct qcom_apss_minidump_region *region) >> +{ >> + struct elfhdr *ehdr = __md->elf.ehdr; >> + struct elf_shdr *shdr; >> + struct elf_shdr *tmp_shdr; >> + struct elf_phdr *phdr; >> + struct elf_phdr *tmp_phdr; >> + unsigned int phidx; >> + unsigned int shidx; >> + unsigned int len; >> + unsigned int i; >> + char *shname; >> + >> + for (i = 0; i < ehdr->e_phnum; i++) { >> + phdr = elf_phdr_entry_addr(ehdr, i); >> + if (phdr->p_paddr == region->phys_addr && >> + phdr->p_memsz == region->size) >> + break; >> + } >> + >> + if (i == ehdr->e_phnum) { >> + dev_err(__md->dev, "Cannot find program header entry in elf\n"); >> + return -EINVAL; >> + } >> + >> + phidx = i; >> + for (i = 0; i < ehdr->e_shnum; i++) { >> + shdr = elf_shdr_entry_addr(ehdr, i); >> + shname = elf_lookup_string(ehdr, shdr->sh_name); >> + if (shname && !strcmp(shname, region->name) && >> + shdr->sh_addr == (elf_addr_t)region->virt_addr && >> + shdr->sh_size == region->size) >> + break; >> + } >> + >> + if (i == ehdr->e_shnum) { >> + dev_err(__md->dev, "Cannot find section header entry in elf\n"); >> + return -EINVAL; >> + } >> + >> + shidx = i; >> + if (shdr->sh_offset != phdr->p_offset) { >> + dev_err(__md->dev, "Invalid entry details for region: %s\n", region->name); >> + return -EINVAL; >> + } >> + >> + /* Clear name in string table */ >> + len = strlen(shname) + 1; >> + memmove(shname, shname + len, >> + __md->elf.strtable_idx - shdr->sh_name - len); >> + __md->elf.strtable_idx -= len; >> + >> + /* Clear program header */ >> + tmp_phdr = elf_phdr_entry_addr(ehdr, phidx); >> + for (i = phidx; i < ehdr->e_phnum - 1; i++) { >> + tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1); >> + phdr = elf_phdr_entry_addr(ehdr, i); >> + memcpy(phdr, tmp_phdr, sizeof(struct elf_phdr)); >> + phdr->p_offset = phdr->p_offset - region->size; >> + } >> + memset(tmp_phdr, 0, sizeof(struct elf_phdr)); >> + ehdr->e_phnum--; >> + >> + /* Clear section header */ >> + tmp_shdr = elf_shdr_entry_addr(ehdr, shidx); >> + for (i = shidx; i < ehdr->e_shnum - 1; i++) { >> + tmp_shdr = elf_shdr_entry_addr(ehdr, i + 1); >> + shdr = elf_shdr_entry_addr(ehdr, i); >> + memcpy(shdr, tmp_shdr, sizeof(struct elf_shdr)); >> + shdr->sh_offset -= region->size; >> + shdr->sh_name -= len; >> + } >> + >> + memset(tmp_shdr, 0, sizeof(struct elf_shdr)); >> + ehdr->e_shnum--; >> + __md->elf.elf_offset -= region->size; >> + >> + return 0; >> +} >> + >> +/** >> + * qcom_apss_minidump_region_unregister() - Unregister region from Minidump table. >> + * @region: minidump region. >> + * >> + * Return: On success, it returns 0 and negative error value on failure. >> + */ >> +int qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) >> +{ >> + struct minidump_region *mdr; >> + unsigned int num_region; >> + unsigned int idx; >> + int ret; >> + >> + if (!region) >> + return -EINVAL; >> + >> + mutex_lock(&minidump_lock); >> + if (!__md) { >> + ret = -EPROBE_DEFER; >> + goto unlock; >> + } >> + >> + idx = get_apss_minidump_region_index(region); >> + if (idx < 0) { >> + dev_err(__md->dev, "%s region is not present\n", region->name); >> + ret = idx; >> + goto unlock; >> + } >> + >> + mdr = &__md->md_regions[0]; >> + num_region = le32_to_cpu(__md->md_apss_toc->region_count); >> + /* >> + * Left shift all the regions exist after this removed region >> + * index by 1 to fill the gap and zero out the last region >> + * present at the end. >> + */ >> + memmove(&mdr[idx], &mdr[idx + 1], >> + (num_region - idx - 1) * sizeof(struct minidump_region)); >> + memset(&mdr[num_region - 1], 0, sizeof(struct minidump_region)); >> + ret = qcom_apss_minidump_clear_header(region); >> + if (ret) { >> + dev_err(__md->dev, "Failed to remove region: %s\n", region->name); >> + goto unlock; >> + } >> + >> + num_region--; >> + __md->md_apss_toc->region_count = cpu_to_le32(num_region); >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(qcom_apss_minidump_region_unregister); >> + >> +static int qcom_minidump_init_apss_subsystem(struct minidump *md) >> +{ >> + struct minidump_subsystem *apsstoc; >> + >> + apsstoc = &md->md_gbl_toc->subsystems[MINIDUMP_APSS_DESC]; >> + md->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, >> + sizeof(struct minidump_region), GFP_KERNEL); >> + if (!md->md_regions) >> + return -ENOMEM; >> + >> + md->md_apss_toc = apsstoc; >> + apsstoc->regions_baseptr = cpu_to_le64(virt_to_phys(md->md_regions)); >> + apsstoc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); >> + apsstoc->status = cpu_to_le32(1); >> + apsstoc->region_count = cpu_to_le32(0); >> + >> + /* Tell bootloader not to encrypt the regions of this subsystem */ >> + apsstoc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); >> + apsstoc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); >> + >> + return 0; >> +} >> + >> +static int qcom_minidump_probe(struct platform_device *pdev) >> +{ >> + struct minidump_global_toc *mdgtoc; >> + struct minidump *md; >> + size_t size; >> + int ret; >> + >> + md = devm_kzalloc(&pdev->dev, sizeof(*md), GFP_KERNEL); >> + if (!md) >> + return -ENOMEM; >> + >> + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size); >> + if (IS_ERR(mdgtoc)) { >> + ret = PTR_ERR(mdgtoc); >> + dev_err(&pdev->dev, "Couldn't find minidump smem item: %d\n", ret); >> + return ret; >> + } >> + >> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >> + ret = -EINVAL; >> + dev_err(&pdev->dev, "minidump table is not initialized: %d\n", ret); >> + return ret; >> + } >> + >> + mutex_lock(&minidump_lock); >> + md->dev = &pdev->dev; >> + md->md_gbl_toc = mdgtoc; > > What are you protecting here? It's not possible to have concurrent > access to md, is it? > >> + ret = qcom_minidump_init_apss_subsystem(md); >> + if (ret) { >> + dev_err(&pdev->dev, "apss minidump initialization failed: %d\n", ret); >> + goto unlock; >> + } >> + >> + __md = md; > > No. This is a platform device, so it can have multiple instances. > >> + /* First entry would be ELF header */ >> + ret = qcom_apss_minidump_add_elf_header(); >> + if (ret) { >> + dev_err(&pdev->dev, "Failed to add elf header: %d\n", ret); >> + memset(md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >> + __md = NULL; >> + } >> + >> +unlock: >> + mutex_unlock(&minidump_lock); >> + return ret; >> +} >> + >> +static int qcom_minidump_remove(struct platform_device *pdev) >> +{ >> + memset(__md->md_apss_toc, 0, sizeof(struct minidump_subsystem)); >> + __md = NULL; > > Don't use __ in variable names. Drop it everywhere. > >> + >> + return 0; >> +} >> + >> +static struct platform_driver qcom_minidump_driver = { >> + .probe = qcom_minidump_probe, >> + .remove = qcom_minidump_remove, >> + .driver = { >> + .name = "qcom-minidump", >> + }, >> +}; >> + >> +module_platform_driver(qcom_minidump_driver); >> + >> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >> +MODULE_LICENSE("GPL v2"); >> +MODULE_ALIAS("platform:qcom-minidump"); >> diff --git a/drivers/soc/qcom/smem.c b/drivers/soc/qcom/smem.c >> index 6be7ea9..d459656 100644 >> --- a/drivers/soc/qcom/smem.c >> +++ b/drivers/soc/qcom/smem.c >> @@ -279,6 +279,7 @@ struct qcom_smem { >> >> u32 item_count; >> struct platform_device *socinfo; >> + struct platform_device *minidump; >> struct smem_ptable *ptable; >> struct smem_partition global_partition; >> struct smem_partition partitions[SMEM_HOST_COUNT]; >> @@ -1151,12 +1152,19 @@ static int qcom_smem_probe(struct platform_device *pdev) >> if (IS_ERR(smem->socinfo)) >> dev_dbg(&pdev->dev, "failed to register socinfo device\n"); >> >> + smem->minidump = platform_device_register_data(&pdev->dev, "qcom-minidump", >> + PLATFORM_DEVID_NONE, NULL, >> + 0); >> + if (IS_ERR(smem->minidump)) >> + dev_dbg(&pdev->dev, "failed to register minidump device\n"); >> + >> return 0; >> } >> >> static int qcom_smem_remove(struct platform_device *pdev) >> { >> platform_device_unregister(__smem->socinfo); >> + platform_device_unregister(__smem->minidump); > > Wrong order. You registered first socinfo, right? > >> >> hwspin_lock_free(__smem->hwlock); >> __smem = NULL; >> diff --git a/include/soc/qcom/qcom_minidump.h b/include/soc/qcom/qcom_minidump.h >> index 84c8605..1872668 100644 >> --- a/include/soc/qcom/qcom_minidump.h >> +++ b/include/soc/qcom/qcom_minidump.h >> @@ -1,6 +1,7 @@ >> /* SPDX-License-Identifier: GPL-2.0-only */ >> /* >> - * Qualcomm minidump shared data structures and macros >> + * This file contain Qualcomm minidump data structures and macros shared with >> + * boot firmware and also apss minidump client's data structure >> * >> * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. >> */ >> @@ -9,12 +10,27 @@ >> #define _QCOM_MINIDUMP_H_ >> >> #define MAX_NUM_OF_SS 10 >> +#define MAX_NAME_LENGTH 12 >> #define MAX_REGION_NAME_LENGTH 16 >> + >> +#define MINIDUMP_REVISION 1 >> #define SBL_MINIDUMP_SMEM_ID 602 >> + >> +/* Application processor minidump descriptor */ >> +#define MINIDUMP_APSS_DESC 0 >> +#define SMEM_ENTRY_SIZE 40 >> + >> #define MINIDUMP_REGION_VALID ('V' << 24 | 'A' << 16 | 'L' << 8 | 'I' << 0) >> +#define MINIDUMP_REGION_INVALID ('I' << 24 | 'N' << 16 | 'V' << 8 | 'A' << 0) >> +#define MINIDUMP_REGION_INIT ('I' << 24 | 'N' << 16 | 'I' << 8 | 'T' << 0) >> +#define MINIDUMP_REGION_NOINIT 0 >> + >> +#define MINIDUMP_SS_ENCR_REQ (0 << 24 | 'Y' << 16 | 'E' << 8 | 'S' << 0) >> +#define MINIDUMP_SS_ENCR_NOTREQ (0 << 24 | 0 << 16 | 'N' << 8 | 'R' << 0) >> +#define MINIDUMP_SS_ENCR_NONE ('N' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) >> #define MINIDUMP_SS_ENCR_DONE ('D' << 24 | 'O' << 16 | 'N' << 8 | 'E' << 0) >> +#define MINIDUMP_SS_ENCR_START ('S' << 24 | 'T' << 16 | 'R' << 8 | 'T' << 0) >> #define MINIDUMP_SS_ENABLED ('E' << 24 | 'N' << 16 | 'B' << 8 | 'L' << 0) >> - > > Why removing this? > >> /** >> * struct minidump_region - Minidump region >> * @name : Name of the region to be dumped >> @@ -63,4 +79,45 @@ struct minidump_global_toc { >> struct minidump_subsystem subsystems[MAX_NUM_OF_SS]; >> }; >> >> +/** >> + * struct qcom_apss_minidump_region - APSS Minidump region information >> + * >> + * @name: Entry name, Minidump will dump binary with this name. >> + * @virt_addr: Virtual address of the entry. >> + * @phys_addr: Physical address of the entry to dump. >> + * @size: Number of byte to dump from @address location, >> + * and it should be 4 byte aligned. >> + */ >> +struct qcom_apss_minidump_region { >> + char name[MAX_NAME_LENGTH]; >> + void *virt_addr; >> + phys_addr_t phys_addr; >> + size_t size; >> +}; > > You expose way too much internals in global header. > >> + >> +#if IS_ENABLED(CONFIG_QCOM_MINIDUMP) >> +extern struct minidump_subsystem * > > No externs. > > The header is unreadable. > >> +qcom_minidump_subsystem_desc(unsigned int minidump_index); >> +extern int >> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region); >> +extern int >> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region); > > Blank line > >> +#else > > Blank line > >> +static inline >> +struct minidump_subsystem *qcom_minidump_subsystem_desc(unsigned int minidump_index) >> +{ >> + return NULL; >> +} > > Blank line > >> +static inline int >> +qcom_apss_minidump_region_register(const struct qcom_apss_minidump_region *region) >> +{ >> + /* Return quietly, if minidump is not enabled */ >> + return 0; >> +} > > >> +static inline int >> +qcom_apss_minidump_region_unregister(const struct qcom_apss_minidump_region *region) >> +{ >> + return 0; >> +} > Will apply the suggestion. -- Mukesh >> +#endif > > /* CONFIG_QCOM_MINIDUMP */ > >> #endif /* _QCOM_MINIDUMP_H_ */ > > Best regards, > Krzysztof >
On Wed, 03 May 2023 22:32:14 +0530, Mukesh Ojha wrote: > Minidump is a best effort mechanism to collect useful and predefined data > for first level of debugging on end user devices running on Qualcomm SoCs. > It is built on the premise that System on Chip (SoC) or subsystem part of > SoC crashes, due to a range of hardware and software bugs. Hence, the > ability to collect accurate data is only a best-effort. The data collected > could be invalid or corrupted, data collection itself could fail, and so on. > > [...] Applied, thanks! [01/18] remoteproc: qcom: Expand MD_* as MINIDUMP_* commit: 318da1371246fdc1806011a27138175cfb078687 Best regards,
On Sun, Jul 16, 2023 at 07:15:57PM -0600, Mathieu Poirier wrote: > On Sat, Jul 15, 2023 at 03:13:34PM -0700, Bjorn Andersson wrote: > > > > On Wed, 03 May 2023 22:32:14 +0530, Mukesh Ojha wrote: > > > Minidump is a best effort mechanism to collect useful and predefined data > > > for first level of debugging on end user devices running on Qualcomm SoCs. > > > It is built on the premise that System on Chip (SoC) or subsystem part of > > > SoC crashes, due to a range of hardware and software bugs. Hence, the > > > ability to collect accurate data is only a best-effort. The data collected > > > could be invalid or corrupted, data collection itself could fail, and so on. > > > > > > [...] > > > > Applied, thanks! > > > > [01/18] remoteproc: qcom: Expand MD_* as MINIDUMP_* > > commit: 318da1371246fdc1806011a27138175cfb078687 > > > > Krzysztof asked for modifications on this patch. > Krzysztof pointed out that there was no reason to trivially modify the defines and then immediately start moving things around. I agree with this, but as the consensus was that the rest of the series needs more work, I find this to be a sensible cleanup, getting rid of the cryptic "MD_" abbreviation. Regards, Bjorn
diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c index 0d2209c..767e84a 100644 --- a/drivers/watchdog/qcom-wdt.c +++ b/drivers/watchdog/qcom-wdt.c @@ -12,6 +12,7 @@ #include <linux/platform_device.h> #include <linux/watchdog.h> #include <linux/of_device.h> +#include <linux/panic.h> enum wdt_reg { WDT_RST, @@ -114,12 +115,28 @@ static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd, return qcom_wdt_start(wdd); } +static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt) +{ + writel(0, wdt_addr(wdt, WDT_EN)); + writel(1, wdt_addr(wdt, WDT_BITE_TIME)); + writel(1, wdt_addr(wdt, WDT_RST)); + writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN)); + + wmb(); + + while(1) + udelay(1); +} + static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action, void *data) { struct qcom_wdt *wdt = to_qcom_wdt(wdd); u32 timeout; + if (in_panic) + qcom_wdt_bite_on_panic(wdt); + /* * Trigger watchdog bite: * Setup BITE_TIME to be 128ms, and enable WDT. diff --git a/include/linux/panic.h b/include/linux/panic.h index 979b776..f913629 100644 --- a/include/linux/panic.h +++ b/include/linux/panic.h @@ -22,6 +22,7 @@ extern int panic_on_oops; extern int panic_on_unrecovered_nmi; extern int panic_on_io_nmi; extern int panic_on_warn; +extern bool in_panic; extern unsigned long panic_on_taint; extern bool panic_on_taint_nousertaint; diff --git a/kernel/panic.c b/kernel/panic.c index 487f5b0..714f7f4 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -65,6 +65,8 @@ static unsigned int warn_limit __read_mostly; int panic_timeout = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout); +bool in_panic = false; +EXPORT_SYMBOL_GPL(in_panic); #define PANIC_PRINT_TASK_INFO 0x00000001 #define PANIC_PRINT_MEM_INFO 0x00000002 @@ -261,6 +263,7 @@ void panic(const char *fmt, ...) int old_cpu, this_cpu; bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers; + in_panic = true; if (panic_on_warn) { /* * This thread may hit another WARN() in the panic path.
Minidump is a best effort mechanism to collect useful and predefined data for first level of debugging on end user devices running on Qualcomm SoCs. It is built on the premise that System on Chip (SoC) or subsystem part of SoC crashes, due to a range of hardware and software bugs. Hence, the ability to collect accurate data is only a best-effort. The data collected could be invalid or corrupted, data collection itself could fail, and so on. Qualcomm devices in engineering mode provides a mechanism for generating full system ramdumps for post mortem debugging. But in some cases it's however not feasible to capture the entire content of RAM. The minidump mechanism provides the means for selecting which snippets should be included in the ramdump. The core of minidump feature is part of Qualcomm's boot firmware code. It initializes shared memory (SMEM), which is a part of DDR and allocates a small section of SMEM to minidump table i.e also called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has their own table of segments to be included in the minidump and all get their reference from G-ToC. Each segment/region has some details like name, physical address and it's size etc. and it could be anywhere scattered in the DDR. Existing upstream Qualcomm remoteproc driver[1] already supports minidump feature for remoteproc instances like ADSP, MODEM, ... where predefined selective segments of subsystem region can be dumped as part of coredump collection which generates smaller size artifacts compared to complete coredump of subsystem on crash. [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 In addition to managing and querying the APSS minidump description, the Linux driver maintains a ELF header in a segment. This segment gets updated with section/program header whenever a new entry gets registered. Patch 1/18 is very trivial change. Patch 2/18 moves the minidump specific data structure and macro to qcom_minidump.h so that (4/18) qcom minidump driver can use. Patch 3/18 documents qualcomm minidump guide for users. Patch 4/18 implements qualcomm minidump kernel driver and exports symbol which other minidump kernel client can use. Patch 5/18 add pending region support for the clients who came for registration before minidump. Patch 6/18 add update region support for registered clients. Patch 7/18 enables the qualcomm minidump driver. Patch 8/18 Use the exported symbol from minidump driver in qcom_common for querying minidump descriptor for a subsystem. Patch 9-13 add qcom dynamic ramoops region support via a driver which adds ramoops platform device and also register existing pstore frontend regions. Patch 14-18 are not new and has already been through 6 versions and reason of adding here is for minidump testing purpose and it will be rebased automatically along with new version of minidump series. Testing of the patches has been done on sm8450 target after enabling config like CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and and put the device in download mode) on command prompt. I have added download patch here numbered from 14/18 to 18/18 Earlier download mode setting patches were sent separately https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ Default storage type is set to via USB, so minidump would be downloaded with the help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has backed minidump boot firmware support (more can be found patch 3/18) Below patch [1] is to warm reset Qualcomm device which has upstream qcom watchdog driver support. After applying all patches, we can boot the device and can execute following command. echo mini > /sys/module/qcom_scm/parameters/download_mode echo c > /proc/sysrq-trigger This will make the device go to download mode and collect the minidump on to the attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm package manager kit). After that we will see a bunch of predefined registered region as binary blobs files starts with md_* downloaded on the x86 machine on given location in PCAT tool from the target device. A sample client example to dump a linux region has been given in patch 3/18 and as well as can be seen in patch 12/18. [1] --------------------------->8------------------------------------- commit f1124ccebd47550b4c9627aa162d9cdceba2b76f Author: Mukesh Ojha <quic_mojha@quicinc.com> Date: Thu Mar 16 14:08:35 2023 +0530 do not merge: watchdog bite on panic Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> -------------------------------------------------------------------------- Changes in v3: - Addressed most of the comments by Srini on v2 and refactored the minidump driver. - Added platform device support - Unregister region support. - Added update region for clients. - Added pending region support. - Modified the documentation guide accordingly. - Added qcom_pstore_ramdump client driver which happen to add ramoops platform device and also registers ramoops region with minidump. - Added download mode patch series with this minidump series. https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. - Addressed comments made by [srinivas.kandagatla] - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore region in minidump. - Fixed issue reported by kernel test robot. Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ Mukesh Ojha (18): remoteproc: qcom: Expand MD_* as MINIDUMP_* remoteproc: qcom: Move minidump specific data to qcom_minidump.h docs: qcom: Add qualcomm minidump guide soc: qcom: Add Qualcomm minidump kernel driver soc: qcom: minidump: Add pending region registration support soc: qcom: minidump: Add update region support arm64: defconfig: Enable Qualcomm minidump driver remoterproc: qcom: refactor to leverage exported minidump symbol soc: qcom: Add qcom's pstore minidump driver support dt-bindings: reserved-memory: Add qcom,ramoops-minidump binding arm64: dts: qcom: sm8450: Add Qualcomm ramoops minidump node soc: qcom: Register pstore frontend region with minidump arm64: defconfig: Enable Qualcomm pstore minidump client driver firmware: qcom_scm: provide a read-modify-write function pinctrl: qcom: Use qcom_scm_io_update_field() firmware: scm: Modify only the download bits in TCSR register firmware: qcom_scm: Refactor code to support multiple download mode firmware: qcom_scm: Add multiple download mode support Documentation/admin-guide/qcom_minidump.rst | 246 +++++++ .../reserved-memory/qcom,ramoops-minidump.yaml | 69 ++ arch/arm64/boot/dts/qcom/sm8450.dtsi | 11 + arch/arm64/configs/defconfig | 2 + drivers/firmware/Kconfig | 11 - drivers/firmware/qcom_scm.c | 88 ++- drivers/pinctrl/qcom/pinctrl-msm.c | 11 +- drivers/remoteproc/qcom_common.c | 75 +-- drivers/soc/qcom/Kconfig | 25 + drivers/soc/qcom/Makefile | 2 + drivers/soc/qcom/qcom_minidump.c | 724 +++++++++++++++++++++ drivers/soc/qcom/qcom_pstore_minidump.c | 194 ++++++ drivers/soc/qcom/smem.c | 8 + include/linux/firmware/qcom/qcom_scm.h | 2 + include/soc/qcom/qcom_minidump.h | 130 ++++ 15 files changed, 1503 insertions(+), 95 deletions(-) create mode 100644 Documentation/admin-guide/qcom_minidump.rst create mode 100644 Documentation/devicetree/bindings/reserved-memory/qcom,ramoops-minidump.yaml create mode 100644 drivers/soc/qcom/qcom_minidump.c create mode 100644 drivers/soc/qcom/qcom_pstore_minidump.c create mode 100644 include/soc/qcom/qcom_minidump.h