
[0/3] x86: make pat and mtrr independent from each other

Message ID 20220715142549.25223-1-jgross@suse.com
Series x86: make pat and mtrr independent from each other

Message

Juergen Gross July 15, 2022, 2:25 p.m. UTC
Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as a Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.

As an additional complexity, the availability of PAT can't be queried
via pat_enabled() in the Xen PV case, because the lack of MTRR causes
PAT to be marked as disabled. This leads some drivers to believe that
not all cache modes are available, resulting in failures or degraded
functionality.

The same applies to a kernel built without MTRR support: it won't
allow use of the PAT MSR, even though there is no technical reason for
that. The only obstacle is that setting up PAT the same way on all
CPUs (a requirement of the processor's cache management) relies on
some MTRR-specific code.
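To make the coupling concrete, here is a minimal, hypothetical C model (not kernel code) of the decision described above: PAT is only reported as enabled when MTRR support is both compiled in and enabled at run time, so a Xen PV guest, where the hypervisor already provides PAT but MTRR is disabled, still ends up with pat_enabled() reporting false. The struct and function names are illustrative assumptions; in the kernel this logic is spread across mtrr_bp_init(), pat_disable() and pat_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the boot environment (illustrative only). */
struct boot_env {
	bool config_mtrr;	/* kernel built with CONFIG_MTRR=y */
	bool mtrr_enabled;	/* MTRRs usable at run time (false in a Xen PV guest) */
	bool cpu_has_pat;	/* the CPU (or hypervisor) provides a working PAT */
};

/*
 * Model of the pre-series behaviour: PAT setup depends on the MTRR code,
 * so PAT is reported as disabled whenever MTRR is not in use, even when
 * PAT itself would work.
 */
static bool model_pat_enabled_before(const struct boot_env *env)
{
	assert(env != NULL);

	if (!env->config_mtrr)
		return false;	/* no CONFIG_MTRR: PAT disabled at boot */
	if (!env->mtrr_enabled)
		return false;	/* MTRR off at run time: PAT disabled as well */
	return env->cpu_has_pat;
}
```

With { .config_mtrr = true, .mtrr_enabled = false, .cpu_has_pat = true }, which is the Xen PV situation, the model returns false even though PAT is usable; that is the misleading answer drivers get.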

Fix all of that by:

- moving the code needed by PAT out of the MTRR-specific code, one
  level up
- adding a PAT indirection layer supporting the three cases "no or
  disabled PAT", "PAT under kernel control" and "PAT under Xen control"
- removing the dependency of PAT on MTRR
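The indirection layer itself is not shown in this message. The following is a hypothetical C sketch of the idea: a per-mode table for the three cases listed above, with the mode chosen once from platform facts and MTRR availability deliberately left out of that decision. The enum, struct and function names are assumptions made for illustration, not the names used in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three cases listed in the cover letter (names are hypothetical). */
enum pat_mode {
	PAT_MODE_DISABLED,	/* no PAT, or PAT disabled */
	PAT_MODE_KERNEL,	/* PAT under kernel control: kernel writes the PAT MSR */
	PAT_MODE_XEN,		/* PAT under Xen control: kernel can't modify the MSR */
};

struct pat_mode_info {
	bool enabled;		/* what a pat_enabled()-style query would report */
	bool kernel_writes_msr;	/* may the kernel program the PAT MSR itself? */
};

static const struct pat_mode_info pat_mode_table[] = {
	[PAT_MODE_DISABLED] = { .enabled = false, .kernel_writes_msr = false },
	[PAT_MODE_KERNEL]   = { .enabled = true,  .kernel_writes_msr = true  },
	[PAT_MODE_XEN]      = { .enabled = true,  .kernel_writes_msr = false },
};

/* MTRR availability is deliberately not an input to this decision. */
static enum pat_mode select_pat_mode(bool cpu_has_pat, bool xen_pv_guest)
{
	if (!cpu_has_pat)
		return PAT_MODE_DISABLED;
	return xen_pv_guest ? PAT_MODE_XEN : PAT_MODE_KERNEL;
}

static const struct pat_mode_info *pat_mode_info_for(enum pat_mode mode)
{
	assert((size_t)mode < sizeof(pat_mode_table) / sizeof(pat_mode_table[0]));
	return &pat_mode_table[mode];
}
```

Because callers only consult the layer, a Xen PV guest can report PAT as enabled while the kernel never writes the MSR, and a kernel built without MTRR support can still use PAT.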

Juergen Gross (3):
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86: add wrapper functions for mtrr functions handling also pat
  x86: decouple pat and mtrr handling

 arch/x86/include/asm/memtype.h     |  13 ++-
 arch/x86/include/asm/mtrr.h        |  27 ++++--
 arch/x86/include/asm/processor.h   |  10 +++
 arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
 arch/x86/kernel/setup.c            |  12 +--
 arch/x86/kernel/smpboot.c          |   8 +-
 arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
 arch/x86/power/cpu.c               |   2 +-
 arch/x86/xen/enlighten_pv.c        |   4 +
 12 files changed, 289 insertions(+), 186 deletions(-)

Comments

Rafael J. Wysocki July 15, 2022, 4:41 p.m. UTC | #1
On Fri, Jul 15, 2022 at 4:25 PM Juergen Gross <jgross@suse.com> wrote:
>
> There are several MTRR functions which also do PAT handling. In order
> to support PAT handling without MTRR in the future, add some wrappers
> for those functions.
>
> Cc: <stable@vger.kernel.org> # 5.17
> Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Do I understand correctly that this particular patch doesn't change
the behavior?

If so, it would be good to mention that in the changelog.

> ---
>  arch/x86/include/asm/mtrr.h      |  2 --
>  arch/x86/include/asm/processor.h |  7 +++++
>  arch/x86/kernel/cpu/common.c     | 44 +++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/mtrr.c  | 25 +++---------------
>  arch/x86/kernel/setup.c          |  5 +---
>  arch/x86/kernel/smpboot.c        |  8 +++---
>  arch/x86/power/cpu.c             |  2 +-
>  7 files changed, 59 insertions(+), 34 deletions(-)
>
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index 12a16caed395..900083ac9f60 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned long size);
>  extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
>  extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
>  extern void mtrr_ap_init(void);
> -extern void set_mtrr_aps_delayed_init(void);
>  extern void mtrr_aps_init(void);
>  extern void mtrr_bp_restore(void);
>  extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
> @@ -86,7 +85,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
>  {
>  }
>  #define mtrr_ap_init() do {} while (0)
> -#define set_mtrr_aps_delayed_init() do {} while (0)
>  #define mtrr_aps_init() do {} while (0)
>  #define mtrr_bp_restore() do {} while (0)
>  #define mtrr_disable() do {} while (0)
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 5c934b922450..e2140204fb7e 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -865,7 +865,14 @@ bool arch_is_platform_page(u64 paddr);
>  #define arch_is_platform_page arch_is_platform_page
>  #endif
>
> +extern bool cache_aps_delayed_init;
> +
>  void cache_disable(void);
>  void cache_enable(void);
> +void cache_bp_init(void);
> +void cache_ap_init(void);
> +void cache_set_aps_delayed_init(void);
> +void cache_aps_init(void);
> +void cache_bp_restore(void);
>
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e43322f8a4ef..0a1bd14f7966 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1929,7 +1929,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_32
>         enable_sep_cpu();
>  #endif
> -       mtrr_ap_init();
> +       cache_ap_init();
>         validate_apic_and_package_id(c);
>         x86_spec_ctrl_setup_ap();
>         update_srbds_msr();
> @@ -2403,3 +2403,45 @@ void cache_enable(void) __releases(cache_disable_lock)
>
>         raw_spin_unlock(&cache_disable_lock);
>  }
> +
> +void __init cache_bp_init(void)
> +{
> +       if (IS_ENABLED(CONFIG_MTRR))
> +               mtrr_bp_init();
> +       else
> +               pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
> +}
> +
> +void cache_ap_init(void)
> +{
> +       if (cache_aps_delayed_init)
> +               return;
> +
> +       mtrr_ap_init();
> +}
> +
> +bool cache_aps_delayed_init;
> +
> +void cache_set_aps_delayed_init(void)
> +{
> +       cache_aps_delayed_init = true;
> +}
> +
> +void cache_aps_init(void)
> +{
> +       /*
> +        * Check if someone has requested the delay of AP cache initialization,
> +        * by doing cache_set_aps_delayed_init(), prior to this point. If not,
> +        * then we are done.
> +        */
> +       if (!cache_aps_delayed_init)
> +               return;
> +
> +       mtrr_aps_init();
> +       cache_aps_delayed_init = false;
> +}
> +
> +void cache_bp_restore(void)
> +{
> +       mtrr_bp_restore();
> +}
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 2746cac9d8a9..c1593cfae641 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>  static DEFINE_MUTEX(mtrr_mutex);
>
>  u64 size_or_mask, size_and_mask;
> -static bool mtrr_aps_delayed_init;
>
>  static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
>
> @@ -176,7 +175,8 @@ static int mtrr_rendezvous_handler(void *info)
>         if (data->smp_reg != ~0U) {
>                 mtrr_if->set(data->smp_reg, data->smp_base,
>                              data->smp_size, data->smp_type);
> -       } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
> +       } else if ((use_intel() && cache_aps_delayed_init) ||
> +                  !cpu_online(smp_processor_id())) {
>                 mtrr_if->set_all();
>         }
>         return 0;
> @@ -789,7 +789,7 @@ void mtrr_ap_init(void)
>         if (!mtrr_enabled())
>                 return;
>
> -       if (!use_intel() || mtrr_aps_delayed_init)
> +       if (!use_intel())
>                 return;
>
>         /*
> @@ -823,16 +823,6 @@ void mtrr_save_state(void)
>         smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
>  }
>
> -void set_mtrr_aps_delayed_init(void)
> -{
> -       if (!mtrr_enabled())
> -               return;
> -       if (!use_intel())
> -               return;
> -
> -       mtrr_aps_delayed_init = true;
> -}
> -
>  /*
>   * Delayed MTRR initialization for all AP's
>   */
> @@ -841,16 +831,7 @@ void mtrr_aps_init(void)
>         if (!use_intel() || !mtrr_enabled())
>                 return;
>
> -       /*
> -        * Check if someone has requested the delay of AP MTRR initialization,
> -        * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
> -        * then we are done.
> -        */
> -       if (!mtrr_aps_delayed_init)
> -               return;
> -
>         set_mtrr(~0U, 0, 0, 0);
> -       mtrr_aps_delayed_init = false;
>  }
>
>  void mtrr_bp_restore(void)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bd6c6fd373ae..27d61f73c68a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1001,10 +1001,7 @@ void __init setup_arch(char **cmdline_p)
>         max_pfn = e820__end_of_ram_pfn();
>
>         /* update e820 for memory not covered by WB MTRRs */
> -       if (IS_ENABLED(CONFIG_MTRR))
> -               mtrr_bp_init();
> -       else
> -               pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
> +       cache_bp_init();
>
>         if (mtrr_trim_uncached_memory(max_pfn))
>                 max_pfn = e820__end_of_ram_pfn();
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5e7f9532a10d..535d73a47062 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1432,7 +1432,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>
>         uv_system_init();
>
> -       set_mtrr_aps_delayed_init();
> +       cache_set_aps_delayed_init();
>
>         smp_quirk_init_udelay();
>
> @@ -1443,12 +1443,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>
>  void arch_thaw_secondary_cpus_begin(void)
>  {
> -       set_mtrr_aps_delayed_init();
> +       cache_set_aps_delayed_init();
>  }
>
>  void arch_thaw_secondary_cpus_end(void)
>  {
> -       mtrr_aps_init();
> +       cache_aps_init();
>  }
>
>  /*
> @@ -1491,7 +1491,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
>
>         nmi_selftest();
>         impress_friends();
> -       mtrr_aps_init();
> +       cache_aps_init();
>  }
>
>  static int __initdata setup_possible_cpus = -1;
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index bb176c72891c..21e014715322 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -261,7 +261,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>         do_fpu_end();
>         tsc_verify_tsc_adjust(true);
>         x86_platform.restore_sched_clock_state();
> -       mtrr_bp_restore();
> +       cache_bp_restore();
>         perf_restore_debug_store();
>
>         c = &cpu_data(smp_processor_id());
> --
> 2.35.3
>
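The delayed AP initialisation handshake that this patch moves from mtrr.c into common.c can be modelled in isolation. The sketch below is a simplified user-space model, not kernel code: the real MTRR/PAT work is replaced by counters, and every name prefixed with model_ is hypothetical. It mirrors the control flow of cache_set_aps_delayed_init(), cache_ap_init() and cache_aps_init() as they appear in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for the real MTRR/PAT work (hypothetical). */
static int model_single_ap_inits;	/* cf. mtrr_ap_init() on one AP */
static int model_batched_ap_inits;	/* cf. mtrr_aps_init() for all APs */

static bool model_aps_delayed_init;	/* cf. cache_aps_delayed_init */

/* cf. cache_set_aps_delayed_init(): request batched AP setup. */
static void model_set_aps_delayed_init(void)
{
	model_aps_delayed_init = true;
}

/* cf. cache_ap_init(): called as each AP is brought up. */
static void model_ap_init(void)
{
	if (model_aps_delayed_init)
		return;		/* deferred to model_aps_init() */
	model_single_ap_inits++;
}

/* cf. cache_aps_init(): called once all APs are online. */
static void model_aps_init(void)
{
	if (!model_aps_delayed_init)
		return;		/* no delay was requested: nothing to do */
	model_batched_ap_inits++;
	model_aps_delayed_init = false;
}
```

In the diff, the boot path brackets AP bring-up with native_smp_prepare_cpus() and native_smp_cpus_done(), and the resume path with arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end(). APs coming up in between are set up in one batch, while an AP brought up after the flag has been cleared takes the per-AP path.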
Chuck Zmudzinski July 16, 2022, 11:32 a.m. UTC | #2
On 7/15/2022 10:25 AM, Juergen Gross wrote:
> [...]

This patch series seems related to the regression reported
here on May 5, 2022:

https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

I am experiencing that regression and could test this patch
on my system.

Can you confirm that with this patch series you are trying
to fix that regression?

Chuck
Borislav Petkov July 16, 2022, 11:42 a.m. UTC | #3
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> Can you confirm that with this patch series you are trying
> to fix that regression?

Yes, this patchset is aimed to fix the whole situation but please don't
do anything yet - I need to find time and look at the whole approach
before you can test it. Just be patient and we'll ping you when the time
comes.

Thx.
Chuck Zmudzinski July 16, 2022, 12:01 p.m. UTC | #4
On 7/16/2022 7:32 AM, Chuck Zmudzinski wrote:
> On 7/15/2022 10:25 AM, Juergen Gross wrote:
> > [...]
>
> This patch series seems related to the regression reported
> here on May 5, 2022:

I'm sorry, the date of that report was May 4, 2022, not
May 5, 2022 - just to avoid any doubt about which regression
I am referring to.

Chuck

>
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> I am experiencing that regression 

or a very similar regression that is caused by the same commit:

bdd8b6c98239cad
("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

> and could test this patch
> on my system.
>
> Can you confirm that with this patch series you are trying
> to fix that regression?
>
> Chuck

Chuck
Chuck Zmudzinski July 17, 2022, 4:06 a.m. UTC | #5
On 7/16/2022 7:42 AM, Borislav Petkov wrote:
> On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> > Can you confirm that with this patch series you are trying
> > to fix that regression?
>
> Yes, this patchset is aimed to fix the whole situation but please don't
> do anything yet - I need to find time and look at the whole approach
> before you can test it. Just be patient and we'll ping you when the time
> comes.
>
> Thx.
>

I will wait until I get the ping before trying it.

Thanks,

Chuck
Thorsten Leemhuis July 17, 2022, 7:55 a.m. UTC | #6
Hi Juergen!

On 15.07.22 16:25, Juergen Gross wrote:
> [...]

Thx for working on this. If you need to respin these patches for one
reason or another, could you do me a favor and add proper 'Link:' tags
pointing to all reports about this issue? e.g. like this:

 Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

These tags are considered important by Linus[1] and others, as they
allow anyone to look into the backstory weeks or years from now. That is
why they should be placed in cases like this, as
Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail. I care
personally, because these tags make my regression tracking efforts a
whole lot easier, as they allow my tracking bot 'regzbot' to
automatically connect reports with patches posted or committed to fix
tracked regressions.

[1] see for example:
https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

BTW, let me tell regzbot to monitor this thread:

#regzbot ^backmonitor:
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Chuck Zmudzinski July 18, 2022, 11:32 a.m. UTC | #7
On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> Hi Juergen!
>
> On 15.07.22 16:25, Juergen Gross wrote:
> > [...]
>
> Thx for working on this. If you need to respin these patches for one
> reason or another, could you do me a favor and add proper 'Link:' tags
> pointing to all reports about this issue? e.g. like this:
>
>  Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> These tags are considered important by Linus[1] and others, as they
> allow anyone to look into the backstory weeks or years from now. That is
> why they should be placed in cases like this, as
> Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail. I care
> personally, because these tags make my regression tracking efforts a
> whole lot easier, as they allow my tracking bot 'regzbot' to
> automatically connect reports with patches posted or committed to fix
> tracked regressions.
>
> [1] see for example:
> https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>

I echo Thorsten's thx for starting on this now instead of waiting until
September, which I think is when Juergen said last week he could start
working on this. I agree with Thorsten that Link tags are needed.
Since multiple patches have been proposed to fix this regression,
perhaps add a Link to each proposed patch, and a note that
the original report identified a specific commit which, when reverted,
also fixes it. IMO, this is all part of the backstory Thorsten refers to.

It looks like with this approach, a fix will not be coming any time soon,
and Borislav Petkov also discouraged me from testing this
patch set until I receive a ping telling me it is ready for testing,
which seems to confirm that this regression will not be fixed
very soon. Please correct me if I am wrong about how long
it will take to fix it with this approach.

Also, is there any guarantee that this approach is endorsed by
all the maintainers who will need to sign off, especially
Linus? I say this because some of the discussion on the
earlier proposed patches makes me doubt this. I am especially
referring to this discussion:

https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.info/

and also, here:

https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/

where Borislav Petkov argues that Linux should not be
patched at all to fix this regression but instead the fix
should come by patching the Xen hypervisor.

So I have several questions, presuming at least that the fix is going
to be delayed for some time, and also presuming that this approach
does not yet have the blessing of the maintainers who will need
to sign off:

1. Can you estimate when the patch series will be ready for
testing and suitable for a prepatch or RC release?

2. Can you estimate when the patch series will be ready to be
merged into the mainline release? Is there any hope it will be
fixed before the next longterm release hosted on kernel.org?

3. Since a fix is likely not coming soon, can you explain
why the commit that was mentioned in the original
report cannot be reverted as a temporary solution while
we wait for the full fix to come later? I can say that
reverting that commit (it was a commit affecting
drm/i915) does fix the issue on my system with no
negative side effects at all. In such a case, it seems
contrary to Linus' regression rule not to revert the
offending commit, even if reverting it is not going to
be the final solution. IOW, I am arguing that an
important corollary to the Linus regression rule is that
we revert commits that introduce regressions, especially
when reverting the offending commit has no negative
effects. Why are we not doing that in this case?

4. Can you explain why this patch series is superior
to the other proposed patches that are much simpler
and have been reported to fix the regression?

5. This approach seems way too aggressive for backporting
to the stable releases. Is that correct, or will the patches
be backported to the stable releases? When I submitted a
patch to fix this regression, one that identified a specific
five-year-old commit it would fix, I was told that backports
to the stable releases are needed to keep things consistent
across all the supported versions.

Remember, this is a regression that is really bothering
people now. For example, I am now in a position where
I cannot install the Linux kernel updates that Debian
pushes out to me without building my own private kernel
with one of the known fixes that have already been
identified as ways to work around this regression while
we wait for the full solution that will hopefully come
later.

Chuck

Chuck Zmudzinski July 19, 2022, 1:16 p.m. UTC | #8
On 7/18/2022 7:32 AM, Chuck Zmudzinski wrote:
> On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> > Hi Juergen!
> >
> > On 15.07.22 16:25, Juergen Gross wrote:
> > > Today PAT can't be used without MTRR being available, unless MTRR is at
> > > least configured via CONFIG_MTRR and the system is running as Xen PV
> > > guest. In this case PAT is automatically available via the hypervisor,
> > > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> > > 
> > > As an additional complexity the availability of PAT can't be queried
> > > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > > to be disabled. This leads to some drivers believing that not all cache
> > > modes are available, resulting in failures or degraded functionality.
> > > 
> > > The same applies to a kernel built with no MTRR support: it won't
> > > allow to use the PAT MSR, even if there is no technical reason for
> > > that, other than setting up PAT on all cpus the same way (which is a
> > > requirement of the processor's cache management) is relying on some
> > > MTRR specific code.
> > > 
> > > Fix all of that by:
> > > 
> > > - moving the function needed by PAT from MTRR specific code one level
> > >   up
> > > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> > >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > > - removing the dependency of PAT on MTRR
> >
> > Thx for working on this. If you need to respin these patches for one
> > reason or another, could you do me a favor and add proper 'Link:' tags
> > pointing to all reports about this issue? e.g. like this:
> >
> >  Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> >
> > These tags are considered important by Linus[1] and others, as they
> > allow anyone to look into the backstory weeks or years from now. That is
> > why they should be placed in cases like this, as
> > Documentation/process/submitting-patches.rst and
> > Documentation/process/5.Posting.rst explain in more detail. I care
> > personally, because these tags make my regression tracking efforts a
> > whole lot easier, as they allow my tracking bot 'regzbot' to
> > automatically connect reports with patches posted or committed to fix
> > tracked regressions.
> >
> > [1] see for example:
> > https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> > https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> > https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
>
> I echo Thorsten's thx for starting on this now instead of waiting until
> September which I think is when Juergen said he could start working
> on this last week. I agree with Thorsten that Link tags are needed.
> Since multiple patches have been proposed to fix this regression,
> perhaps a Link to each proposed patch, and a note that
> the original report identified a specific commit which when reverted
> also fixes it. IMO, this is all part of the backstory Thorsten refers to.
>
> It looks like with this approach, a fix will not be coming real soon,
> and Borislav Petkov also discouraged me from testing this
> patch set until I receive a ping telling me it is ready for testing,
> which seems to confirm that this regression will not be fixed
> very soon. Please correct me if I am wrong about how long
> it will take to fix it with this approach.
>
> Also, is there any guarantee this approach is endorsed by
> all the maintainers who will need to sign-off, especially
> Linus? I say this because some of the discussion on the
> earlier proposed patches makes me doubt this. I am especially
> referring to this discussion:
>
> https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.info/
>
> and also, here:
>
> https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
>
> where Borislav Petkov argues that Linux should not be
> patched at all to fix this regression but instead the fix
> should come by patching the Xen hypervisor.
>
> So I have several questions, presuming at least the fix is going
> to be delayed for some time, and also presuming this approach
> is not yet an approach that has the blessing of the maintainers
> who will need to sign-off:
>
> 1. Can you estimate when the patch series will be ready for
> testing and suitable for a prepatch or RC release?
>
> 2. Can you estimate when the patch series will be ready to be
> merged into the mainline release? Is there any hope it will be
> fixed before the next longterm release hosted on kernel.org?
>
> 3. Since a fix is likely not coming soon, can you explain
> why the commit that was mentioned in the original
> report cannot be reverted as a temporary solution while
> we wait for the full fix to come later? I can say that
> reverting that commit (It was a commit affecting
> drm/i915) does fix the issue on my system with no
> negative side effects at all. In such a case, it seems
> contrary to Linus' regression rule to not revert the
> offending commit, even if reverting the offending
> commit is not going to be the final solution. IOW,
> I am trying to argue that an important corollary to
> the Linus regression rule is that we revert commits
> that introduce regressions, especially when there
> are no negative effects when reverting the offending
> commit. Why are we not doing that in this case?
>
> 4. Can you explain why this patch series is superior
> to the other proposed patches that are much simpler
> and have been reported to fix the regression?
>
> 5. This approach seems way too aggressive for backporting
> to the stable releases. Is that correct? Or, will the patches
> be backported to the stable releases? I was told that
> backports to the stable releases are needed to keep things
> consistent across all the supported versions when I submitted
> a patch to fix this regression that identified a specific five-year-old
> commit that my proposed patch would fix.
>
> Remember, this is a regression that is really bothering
> people now. For example, I am now in a position where
> I cannot install the updates of the Linux kernel that Debian
> pushes out to me without patching the kernel with my
> own private build that has one of the known fixes that
> have already been identified as ways to work around this
> regression while we wait for the full solution that will
> hopefully come later.
>
> Chuck
>
> > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > reports and sometimes miss something important when writing mails like
> > this. If that's the case here, don't hesitate to tell me in a public
> > reply, it's in everyone's interest to set the public record straight.
> >
> > BTW, let me tell regzbot to monitor this thread:
> >
> > #regzbot ^backmonitor:
> > https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>

OK, the comments Boris made on the individual patches of
this patch set answer most of my questions. Thx, Boris.

Chuck
Chuck Zmudzinski Aug. 13, 2022, 4:56 p.m. UTC | #9
On 7/17/22 3:55 AM, Thorsten Leemhuis wrote:
> Hi Juergen!
>
> On 15.07.22 16:25, Juergen Gross wrote:
> > Today PAT can't be used without MTRR being available, unless MTRR is at
> > least configured via CONFIG_MTRR and the system is running as a Xen PV
> > guest. In this case PAT is automatically available via the hypervisor,
> > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> > 
> > As an additional complexity the availability of PAT can't be queried
> > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > to be disabled. This leads to some drivers believing that not all cache
> > modes are available, resulting in failures or degraded functionality.
> > 
> > The same applies to a kernel built with no MTRR support: it won't
> > allow the PAT MSR to be used, even though there is no technical reason
> > for that, other than that setting up PAT the same way on all CPUs
> > (which is a requirement of the processor's cache management) relies on
> > some MTRR-specific code.
> > 
> > Fix all of that by:
> > 
> > - moving the function needed by PAT from MTRR specific code one level
> >   up
> > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > - removing the dependency of PAT on MTRR
>
> Thx for working on this. If you need to respin these patches for one
> reason or another, could you do me a favor and add proper 'Link:' tags
> pointing to all reports about this issue? e.g. like this:
>
>  Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> These tags are considered important by Linus[1] and others, as they
> allow anyone to look into the backstory weeks or years from now. That is
> why they should be placed in cases like this, as
> Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail. I care
> personally, because these tags make my regression tracking efforts a
> whole lot easier, as they allow my tracking bot 'regzbot' to
> automatically connect reports with patches posted or committed to fix
> tracked regressions.
>
> [1] see for example:
> https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
>
> BTW, let me tell regzbot to monitor this thread:
>
> #regzbot ^backmonitor:
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

Hi Thorsten,

This appears stalled again, and we are now over three months
from the first report of the regression. The only excuse for
ignoring your comments, and other comments on the patches
in this patch series, for this long would be that the issue it fixes
for some reason cannot be considered a true regression. If this is a
regression, then, IMHO, this needs to have a higher priority by the
maintainers, or the maintainers need to explain why this regression
cannot be fixed in a more timely manner. But continued silence
by the maintainers is unacceptable, IMHO. This is especially true
in this case when multiple fixes for the regression have been
identified and the maintainers have not yet clearly explained why
at least a fix, even if temporary, cannot be applied immediately
while we wait for a more comprehensive fix.

At the very least, I would expect Juergen to reply here and say that
he is delayed but does plan to spin up an updated version and include
the necessary links in the new version to facilitate your tracking of
the regression. Why the silence from Juergen here?

Best regards,

Chuck
Chuck Zmudzinski Aug. 14, 2022, 7:42 a.m. UTC | #10
On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote:
> On 7/17/22 3:55 AM, Thorsten Leemhuis wrote:
> > Hi Juergen!
> >
> > On 15.07.22 16:25, Juergen Gross wrote: ...
>
> Hi Thorsten,
>
> This appears stalled again, and we are now over three months
> from the first report of the regression. The only excuse for
> ignoring your comments, and other comments on the patches
> in this patch series, for this long would be that the issue it fixes
> for some reason cannot be considered a true regression. If this is a
> regression, then, IMHO, this needs to have a higher priority by the
> maintainers, or the maintainers need to explain why this regression
> cannot be fixed in a more timely manner. But continued silence
> by the maintainers is unacceptable, IMHO. This is especially true
> in this case when multiple fixes for the regression have been
> identified and the maintainers have not yet clearly explained why
> at least a fix, even if temporary, cannot be applied immediately
> while we wait for a more comprehensive fix.
>
> At the very least, I would expect Juergen to reply here and say that
> he is delayed but does plan to spin up an updated version and include
> the necessary links in the new version to facilitate your tracking of
> the regression. Why the silence from Juergen here?

This is a fairly long message but I think what I need to say
here is important for the future success of Linux and open
source software, so here goes....

Update: I accept Boris Petkov's response to me yesterday as reasonable
and acceptable if within two weeks he at least explains on the public
mailing lists how he and Juergen have privately agreed to fix this regression
"soon" if he does not actually fix the regression by then with a commit,
patch set, or merge. The two-week time frame is from here:

https://www.kernel.org/doc/html/latest/process/handling-regressions.html

where developers and maintainers are exhorted as follows: "Try to fix
regressions quickly once the culprit has been identified; fixes for most
regressions should be merged within two weeks, but some need to be
resolved within two or three days."

I also think there is a private agreement between Juergen and Boris to
fix this regression because AFAICT there is no evidence in the public
mailing lists that such an agreement has been reached, yet Boris yesterday
told me on the public mailing lists in this thread to be "patient" and that
"we will fix this soon." Unless I am missing something, and I hope I am,
the only way that a fix could be coming "soon" would be to presume
that Juergen and Boris have agreed to a fix for the regression in private.

However, AFAICT, keeping their solution private would be a violation of
netiquette as described here:

https://people.kernel.org/tglx/notes-about-netiquette

where a whole section is devoted to the importance of keeping the
discussion of changes to the kernel in public, with private discussions
being a violation of the netiquette that governs the discussions that
take place between persons interested in the Linux kernel project and
other open source projects.

Yet, in one of his messages to me yesterday, Boris appended the link
to the netiquette rules, thus implicitly accusing me of violating the
netiquette rules when in fact he is the one who at least seems to be
violating the rule forbidding private discussions of changes to the
kernel once a patch set is already up for discussion on the public
mailing lists.

Of course Boris can exonerate himself completely if within two
weeks he at least explains on the public mailing lists how he and
Juergen have agreed to fix the regression. I sincerely hope he at
least does that within the next two weeks, or even better, that he
exonerates himself by actually committing the official fix for the
regression within the next two weeks.

However, I will only believe it when I see it. When it comes to the
Linux kernel, I go by what I see in the performance of the Linux
kernel in my computing environments, what I see on the public
mailing lists and in the official documentation, and by what I
see in the source code itself. I do not go by blind faith in any
single developer. I am not religious when it comes to the Linux
kernel. Instead, I am scientific and practical about it.

Finally, please forgive me also if I am mistaken in my assumption
that these rules of netiquette apply no less to the developers and
maintainers of the Linux kernel than to others who wish to offer
their contributions to the development of the Linux kernel. If the
rules of netiquette do not apply to the developers and maintainers
of the kernel, then, IMHO, the great advantage of open source
software development is totally lost, because the advantage of the
open source software development model depends at least as
much on free and open access to the discussions about the
source code conducted by the developers and maintainers as it
does on the freedom to have access to the source code itself.
If someone here tells me that those rules of netiquette need
not be followed by the developers and maintainers I certainly
hope someone else will come to the defense of those same
wise rules that have allowed such a successful open source
software ecosystem to flourish and thrive around this project,
the Linux kernel.

IMHO, the day someone makes the decision to stop enforcing these
wise rules is the day that the open source development model will
begin to lose its advantage over proprietary software development
models. And perhaps the most important rule of all for the continued
success of Linux and open source software development is the Linus
regression rule, with the rule that discussions about changes
to the source code must be done in public being a close second in
importance to the Linus regression rule.

Best Regards,

Chuck
Juergen Gross Aug. 14, 2022, 8:08 a.m. UTC | #11
On 14.08.22 09:42, Chuck Zmudzinski wrote:
> On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote:
>> On 7/17/22 3:55 AM, Thorsten Leemhuis wrote:
>>> Hi Juergen!
>>>
>>> On 15.07.22 16:25, Juergen Gross wrote: ...
>>
>> Hi Thorsten,
>>
>> This appears stalled again, and we are now over three months
>> from the first report of the regression. The only excuse for
>> ignoring your comments, and other comments on the patches
>> in this patch series, for this long would be that the issue it fixes
>> for some reason cannot be considered a true regression. If this is a
>> regression, then, IMHO, this needs to have a higher priority by the
>> maintainers, or the maintainers need to explain why this regression
>> cannot be fixed in a more timely manner. But continued silence
>> by the maintainers is unacceptable, IMHO. This is especially true
>> in this case when multiple fixes for the regression have been
>> identified and the maintainers have not yet clearly explained why
>> at least a fix, even if temporary, cannot be applied immediately
>> while we wait for a more comprehensive fix.
>>
>> At the very least, I would expect Juergen to reply here and say that
>> he is delayed but does plan to spin up an updated version and include
>> the necessary links in the new version to facilitate your tracking of
>> the regression. Why the silence from Juergen here?
> 
> This is a fairly long message but I think what I need to say
> here is important for the future success of Linux and open
> source software, so here goes....
> 
> Update: I accept Boris Petkov's response to me yesterday as reasonable
> and acceptable if within two weeks he at least explains on the public
> mailing lists how he and Juergen have privately agreed to fix this regression
> "soon" if he does not actually fix the regression by then with a commit,
> patch set, or merge. The two-week time frame is from here:
> 
> https://www.kernel.org/doc/html/latest/process/handling-regressions.html
> 
> where developers and maintainers are exhorted as follows: "Try to fix
> regressions quickly once the culprit has been identified; fixes for most
> regressions should be merged within two weeks, but some need to be
> resolved within two or three days."

And some more citations from the same document:

"Prioritize work on handling regression reports and fixing regression over all
other Linux kernel work, unless the latter concerns acute security issues or
bugs causing data loss or damage."

First thing to note here: "over all other Linux kernel work". I'm not only working
on the kernel, but I also have other responsibilities, e.g. in the Xen community,
where I have been sending patches to fix a regression and where I'm quite busy
doing security-related work. Apart from that, I'm of course responsible for
handling SUSE customers' bug reports at a rather high priority. So please stop
accusing me of ignoring the responses to these patches. This is not really
motivating me to continue interacting with you.

"Always consider reverting the culprit commits and reapplying them later
together with necessary fixes, as this might be the least dangerous and quickest
way to fix a regression."

I didn't introduce the regression, nor was it introduced in my area of
maintainership. It just happened to hit Xen. So I stepped up after Jan's patches
were not deemed to be the way to go, and I wrote the patches despite
having other urgent work to do. If you feel so strongly about fixing
the regression, why don't you ask for the patch introducing it to be reverted
instead? Accusing me and Boris is not acceptable at all!

> I also think there is a private agreement between Juergen and Boris to
> fix this regression because AFAICT there is no evidence in the public
> mailing lists that such an agreement has been reached, yet Boris yesterday
> told me on the public mailing lists in this thread to be "patient" and that
> "we will fix this soon." Unless I am missing something, and I hope I am,
> the only way that a fix could be coming "soon" would be to presume
> that Juergen and Boris have agreed to a fix for the regression in private.
> 
> However, AFAICT, keeping their solution private would be a violation of
> netiquette as described here:
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 
> where a whole section is devoted to the importance of keeping the
> discussion of changes to the kernel in public, with private discussions
> being a violation of the netiquette that governs the discussions that
> take place between persons interested in the Linux kernel project and
> other open source projects.

Another uncalled-for attack.

After sending the patches I just told Boris via IRC that I wouldn't react
to any responses soon, as I was about to start my vacation. This was just a
hint for him, as he was rather busy at that time handling kernel security
issues.

I won't comment on the rest of your absolutely unacceptable accusations.

I will continue with the patches as soon as I find time to do so.


Juergen
Chuck Zmudzinski Aug. 14, 2022, 9:19 a.m. UTC | #12
On 8/14/2022 3:42 AM, Chuck Zmudzinski wrote:
> On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote:
> > On 7/17/22 3:55 AM, Thorsten Leemhuis wrote:
> > > Hi Juergen!
> > >
> > > On 15.07.22 16:25, Juergen Gross wrote: ...
> >
> > Hi Thorsten,
> >
> > This appears stalled again, and we are now over three months
> > from the first report of the regression. The only excuse for
> > ignoring your comments, and other comments on the patches
> > in this patch series, for this long would be that the issue it fixes
> > for some reason cannot be considered a true regression. If this is a
> > regression, then, IMHO, this needs to have a higher priority by the
> > maintainers, or the maintainers need to explain why this regression
> > cannot be fixed in a more timely manner. But continued silence
> > by the maintainers is unacceptable, IMHO. This is especially true
> > in this case when multiple fixes for the regression have been
> > identified and the maintainers have not yet clearly explained why
> > at least a fix, even if temporary, cannot be applied immediately
> > while we wait for a more comprehensive fix.
> >
> > At the very least, I would expect Juergen to reply here and say that
> > he is delayed but does plan to spin up an updated version and include
> > the necessary links in the new version to facilitate your tracking of
> > the regression. Why the silence from Juergen here?
>
> This is a fairly long message but I think what I need to say
> here is important for the future success of Linux and open
> source software, so here goes....
>
> Update: I accept Boris Petkov's response to me yesterday as reasonable
> and acceptable if within two weeks he at least explains on the public
> mailing lists how he and Juergen have privately agreed to fix this regression
> "soon" if he does not actually fix the regression by then with a commit,
> patch set, or merge. The two-week time frame is from here:
>
> https://www.kernel.org/doc/html/latest/process/handling-regressions.html
>
> where developers and maintainers are exhorted as follows: "Try to fix
> regressions quickly once the culprit has been identified; fixes for most
> regressions should be merged within two weeks, but some need to be
> resolved within two or three days."
>
> I also think there is a private agreement between Juergen and Boris to
> fix this regression because AFAICT there is no evidence in the public
> mailing lists that such an agreement has been reached, yet Boris yesterday
> told me on the public mailing lists in this thread to be "patient" and that
> "we will fix this soon." Unless I am missing something, and I hope I am,
> the only way that a fix could be coming "soon" would be to presume
> that Juergen and Boris have agreed to a fix for the regression in private.
>
> However, AFAICT, keeping their solution private would be a violation of
> netiquette as described here:
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
> where a whole section is devoted to the importance of keeping the
> discussion of changes to the kernel in public, with private discussions
> being a violation of the netiquette that governs the discussions that
> take place between persons interested in the Linux kernel project and
> other open source projects.
>
> Yet, in one of his messages to me yesterday, Boris appended the link
> to the netiquette rules, thus implicitly accusing me of violating the
> netiquette rules when in fact he is the one who at least seems to be
> violating the rule forbidding private discussions of changes to the
> kernel once a patch set is already up for discussion on the public
> mailing lists.
>
> Of course Boris can exonerate himself completely if within two
> weeks he at least explains on the public mailing lists how he and
> Juergen have agreed to fix the regression. I sincerely hope he at
> least does that within the next two weeks, or even better, that he
> exonerates himself by actually committing the official fix for the
> regression within the next two weeks.
>
> However, I will only believe it when I see it. When it comes to the
> Linux kernel, I go by what I see in the performance of the Linux
> kernel in my computing environments, what I see on the public
> mailing lists and in the official documentation, and by what I
> see in the source code itself. I do not go by blind faith in any
> single developer. I am not religious when it comes to the Linux
> kernel. Instead, I am scientific and practical about it.
>
> Finally, please forgive me also if I am mistaken in my assumption
> that these rules of netiquette apply no less to the developers and
> maintainers of the Linux kernel than to others who wish to offer
> their contributions to the development of the Linux kernel. If the
> rules of netiquette do not apply to the developers and maintainers
> of the kernel, then, IMHO, the great advantage of open source
> software development is totally lost, because the advantage of the
> open source software development model depends at least as
> much on free and open access to the discussions about the
> source code conducted by the developers and maintainers as it
> does on the freedom to have access to the source code itself.
> If someone here tells me that those rules of netiquette need
> not be followed by the developers and maintainers I certainly
> hope someone else will come to the defense of those same
> wise rules that have allowed such a successful open source
> software ecosystem to flourish and thrive around this project,
> the Linux kernel.
>
> IMHO, the day someone makes the decision to stop enforcing these
> wise rules is the day that the open source development model will
> begin to lose its advantage over proprietary software development
> models. And perhaps the most important rule of all for the continued
> success of Linux and open source software development is the Linus
> regression rule, with the rule that discussions about changes
> to the source code must be done in public being a close second in
> importance to the Linus regression rule.
>
> Best Regards,
>
> Chuck

Hi Thorsten,

Well, that did not take long. Juergen responded with a message,
which is encrypted and not delivered to my mailbox because I do not
have the PGP keys, presumably to make it difficult for me to continue
the discussion and defend myself after I was accused of violating
the netiquette rules yesterday by Boris:

https://lore.kernel.org/lkml/c88ea08c-a9d5-ef6a-333a-db9e00c6da6f@suse.com/raw

Fortunately, lore.kernel.org did decrypt Juergen's message, so you can read
what he wrote in response to my message there. I don't think what Juergen said
there is very constructive, although I am not surprised he seeks to defend himself,
and he makes many valid points that are good for developers and Linux insiders
but not so good for users and the long-term success of the Linux kernel project.
So I am not going to reproduce what he said in this message, but I think you
need to read it to understand why this regression is not being fixed
in a timely manner:

https://lore.kernel.org/lkml/c88ea08c-a9d5-ef6a-333a-db9e00c6da6f@suse.com/

Sorry for the trouble, but I am just a user trying to understand why this
regression has not been fixed for over three months. If this is the best the
Linux kernel community can do in response to my questions about this regression,
then in the long run, I can assure you, the open source development model is
doomed to a slow, long, and eventually painful death.

Best regards,

Chuck
Greg Kroah-Hartman Aug. 14, 2022, 9:50 a.m. UTC | #13
On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote:
> Well, that did not take long. Juergen responded with a message,
> which is encrypted and not delivered to my mailbox because I do not
> have the PGP keys, presumably to make it difficult for me to continue
> the discussion and defend myself after I was accused of violating
> the netiquette rules yesterday by Boris:

The message was signed, not encrypted.  Odd that your email client could
not read it, perhaps you need to use a different one?

thanks,

greg k-h
Chuck Zmudzinski Aug. 14, 2022, 12:08 p.m. UTC | #14
On 8/14/2022 5:50 AM, Greg KH wrote:
> On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote:
> > Well, that did not take long. Juergen responded with a message,
> > which is encrypted and not delivered to my mailbox because I do not
> > have the PGP keys, presumably to make it difficult for me to continue
> > the discussion and defend myself after I was accused of violating
> > the netiquette rules yesterday by Boris:
>
> The message was signed, not encrypted.  Odd that your email client could
> not read it, perhaps you need to use a different one?
>
> thanks,
>
> greg k-h

It's not that my e-mail client could not read it, there is no evidence it
was ever sent to me. I use aol.com which is administered by Yahoo!. It
did not even appear in the web interface for my e-mail service, so it
was never delivered to my e-mail client, which is Thunderbird. Neither
the Windows nor the Linux client can retrieve a message not delivered
to the Yahoo! servers! I also checked the Junk and Spam folders and it
was not there either. But I received your message and other messages
normally. It is as if the message was sent to everyone else on the To:
and Cc: lists except for me. I think the problem was on the sender's end
or with my e-mail service, Yahoo!, which apparently does not accept signed
messages without some special configuration that I have not done
with Yahoo! yet. I will look into it next week.

Chuck
Greg Kroah-Hartman Aug. 14, 2022, 1:01 p.m. UTC | #15
On Sun, Aug 14, 2022 at 08:08:30AM -0400, Chuck Zmudzinski wrote:
> On 8/14/2022 5:50 AM, Greg KH wrote:
> > On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote:
> > > Well, that did not take long. Juergen responded with a message,
> > > which is encrypted and not delivered to my mailbox because I do not
> > > have the PGP keys, presumably to make it difficult for me to continue
> > > the discussion and defend myself after I was accused of violating
> > > the netiquette rules yesterday by Boris:
> >
> > The message was signed, not encrypted.  Odd that your email client could
> > not read it, perhaps you need to use a different one?
> >
> > thanks,
> >
> > greg k-h
> 
> It's not that my e-mail client could not read it, there is no evidence it
> was ever sent to me.

The To: line had your address in it, so it was sent to you, and again,
it was not encrypted as you claimed, but rather just signed to verify he
was the sender.  That's not making anything difficult for anyone, so I
think you owe him an apology here, especially as you are asking him to
do work for you.

best of luck!

greg k-h
Chuck Zmudzinski Aug. 14, 2022, 4:03 p.m. UTC | #16
On 8/14/2022 9:01 AM, Greg KH wrote:
> On Sun, Aug 14, 2022 at 08:08:30AM -0400, Chuck Zmudzinski wrote:
> > On 8/14/2022 5:50 AM, Greg KH wrote:
> > > On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote:
> > > > Well, that did not take long. Juergen responded with a message,
> > > > which is encrypted and not delivered to my mailbox because I do not
> > > > have the PGP keys, presumably to make it difficult for me to continue
> > > > the discussion and defend myself after I was accused of violating
> > > > the netiquette rules yesterday by Boris:
> > >
> > > The message was signed, not encrypted.  Odd that your email client could
> > > not read it, perhaps you need to use a different one?
> > >
> > > thanks,
> > >
> > > greg k-h
> > 
> > It's not that my e-mail client could not read it, there is no evidence it
> > was ever sent to me.
>
> The To: line had your address in it, so it was sent to you, and again,
> it was not encrypted as you claimed, but rather just signed to verify he
> was the sender.  That's not making anything difficult for anyone, so I
> think you owe him an apology here, especially as you are asking him to
> do work for you.
>
> best of luck!
>
> greg k-h

Dear Greg,

Thanks for the advice. I appreciate it. Below is my apology to Juergen
and Thorsten and some additional comments for anyone willing to hear what
I am trying to say as I continue to try to participate in the discussion of this
regression...

Dear Juergen and Thorsten,

I do apologize since I agree there is not enough evidence to conclude that
Juergen purposely made it difficult for me to respond to and defend myself
against the negative things he said about me in the e-mail I never received
from him.

I am not going to try to defend myself either, since it is not necessary and
would probably be impossible for me to do successfully here in
this forum. The e-mail you tried and failed to send to me is currently
publicly available on more than one public mailing list, and it speaks for
itself. Each person who reads it and the other relevant messages in the
thread will decide for himself or herself what that message means.

So far I am inclined to think most people who will even take the time to
read the thread will judge me to be in the wrong, and I also am inclined
to think many who are Cc'd on this thread are already ignoring me
because they consider me to be a total jerk. That's fine, but that's just
their opinion, especially if they base their opinion only on a custom
of hazing users who dare to say what they think on the Linux public
mailing lists.

But since you are the persons who create the Linux kernel, I will express
my opinion that your decision to reject my efforts to help the kernel
developers and maintainers work better together with each other and
with users like me who are brave enough to say what they think on these
public mailing lists is the wrong decision if your goal is really to make
Linux and open source software development able to continue to produce
high quality software that is actually useful to people.

I say that because I am trying to scream to you as loud as I can: "Linux
software is no longer useful to me." No one here seems willing to hear
that message. I wonder if Linus even cares about that anymore. And that is
sad, because Linux was a great project. Unfortunately, now, it is clear to
me it is going to die a slow, painful death. The Linux kernel is a big and
powerful enough project to survive for quite a while, and I probably won't
live to see its death, but unless the people who define the Linux kernel
community change, it will eventually die.

Best regards and good luck to all of you,

Chuck
Chuck Zmudzinski Aug. 14, 2022, 7:52 p.m. UTC | #17
On 8/14/2022 9:01 AM, Greg KH wrote:
> The To: line had your address in it, so it was sent to you, and again,
> it was not encrypted as you claimed, but rather just signed to verify he
> was the sender.  That's not making anything difficult for anyone, so I
> think you owe him an apology here, especially as you are asking him to
> do work for you.

You misunderstand me completely. I am not here to ask Juergen to do any
work for me, he is the one who volunteered to fix a regression that affects
my computer, so I am interested in what he has to say, and I am on this mailing
list to find out if he, and other Linux developers and maintainers, are the kind
of people I want to have writing the software that runs on my computers.
I don't have to tell you what my decision about that is, but do you really think
I want people who refuse to answer my questions about the software they
are writing for my computers to continue to be the ones I rely on for the
security and stability of my computer systems? If you think I am that stupid,
I suppose you also think I am too stupid to receive an e-mail message that
Juergen tried to send me earlier today. The fact is, Juergen is the only
person I am aware of who has tried and failed to get an e-mail message
delivered to me during the past thirty years since I started using e-mail.
That's quite an accomplishment for Juergen to achieve!

Best regards,

Chuck
Chuck Zmudzinski Aug. 15, 2022, 3:23 a.m. UTC | #18
On 8/14/22 4:08 AM, Juergen Gross wrote:
> > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote:
> > 
> > This is a fairly long message but I think what I need to say
> > here is important for the future success of Linux and open
> > source software, so here goes....
> > 
> > Update: I accept Boris Petkov's response to me yesterday as reasonable
> > and acceptable if within two weeks he at least explains on the public
> > mailing lists how he and Juergen have privately agreed to fix this regression
> > "soon" if he does not actually fix the regression by then with a commit,
> > patch set, or merge. The two-week time frame is from here:
> > 
> > https://www.kernel.org/doc/html/latest/process/handling-regressions.html
> > 
> > where developers and maintainers are exhorted as follows: "Try to fix
> > regressions quickly once the culprit has been identified; fixes for most
> > regressions should be merged within two weeks, but some need to be
> > resolved within two or three days."
>
> And some more citations from the same document:
>
> "Prioritize work on handling regression reports and fixing regression over all
> other Linux kernel work, unless the latter concerns acute security issues or
> bugs causing data loss or damage."
>
> First thing to note here: "over all Linux kernel work". I'm not only working
> on the kernel, but I have other responsibilities e.g. in the Xen community,
> where I was sending patches for fixing a regression and where I'm quite busy
> doing security related work. Apart from that I'm of course responsible for
> handling SUSE customers' bug reports at a rather high priority. So please stop
> accusing me of ignoring the responses to these patches. This is just not really
> motivating me to continue interacting with you.

You are busy, and that is always true for someone with your responsibilities.
That is an acceptable reason to delay your responses for a time.

>
> "Always consider reverting the culprit commits and reapplying them later
> together with necessary fixes, as this might be the least dangerous and quickest
> way to fix a regression."
>
> I didn't introduce the regression, nor was it introduced in my area of
> maintainership. It just happened to hit Xen. So I stepped up after Jan's patches
> were not deemed to be the way to go, and I wrote the patches in spite of me
> having other urgent work to do. In case you are feeling so strong about the fix
> of the regression, why don't you ask for the patch introducing it to be reverted
> instead? 

I have asked for this on more than one occasion, but I was either
ignored or shot down every time. The fact is, among the persons
who have the power to actually commit a fix, only you and Boris
are currently indicating any willingness to actually fix the regression.
I will say the greater responsibility for this falls on Boris because
he is an x86 maintainer, and you have every right to walk away
and say "I will not work on a fix," and I would not blame you or accuse
you of doing anything wrong if you did that. You are under no obligation
to fix this. Boris is the one who must fix it, or the Intel developers,
by reverting the commit that was originally identified as the bad
commit.

If it is any consolation to you, Juergen, I think the greatest problem
is the silence of the drm/i915 maintainers, and Thorsten also expressed
some dissatisfaction because of that, but since there is also some
consensus that the fix should be done in x86 or x86/pat instead of
in drm/i915, another problem is the lack of initiative by the x86
developers to fix it. If they do not know how to fix it and need to
rely on someone with Xen expertise, they should be giving you
more assistance and feedback than they currently are. So far, only
Boris shows any interest, and now my only critique of your behavior
is that in your message, you chose to engage in an ad hominem attack
against me instead of taking the same amount of time to at least
briefly answer the questions Boris raised about your patch set over
three weeks ago. Your decision to attack me instead of working on
the fix was, IMHO, not helpful and constructive.
> Accusing me and Boris is not acceptable at all!

OK, I understand, now we are even. I have said it is unacceptable to
not give greater priority to the regression fix or at least keep interested
persons informed if there is a reason to continue to delay a fix, which
ordinarily should only take two weeks, but now we are at more than
three months. Now, you are saying it is unacceptable for me to accuse
you and Boris. OK, so we are even. We each think the other is acting
in an unacceptable way. I still think it is unacceptable to not work on
the fix and instead engage in ad hominem attacks. Maybe I am wrong.
Maybe maintainers are supposed to attack persons who are not
maintainers when such outsiders try to help and encourage better
cooperation and end the hostile silence by the maintainers who are
responsible for fixing this. But that does not make sense to me. It makes
sense to hold accountable those persons who are responsible for fixing
this (and you, Juergen, are not the one that needs to be held accountable).
AFAICT, that is not being done and instead I am being attacked for trying
to get work towards a fix rolling again.

>
> > I also think there is a private agreement between Juergen and Boris to
> > fix this regression because AFAICT there is no evidence in the public
> > mailing lists that such an agreement has been reached, yet Boris yesterday
> > told me on the public mailing lists in this thread to be "patient" and that
> > "we will fix this soon." Unless I am missing something, and I hope I am,
> > the only way that a fix could be coming "soon" would be to presume
> > that Juergen and Boris have agreed to a fix for the regression in private.
> > 
> > However, AFAICT, keeping their solution private would be a violation of
> > netiquette as described here:
> > 
> > https://people.kernel.org/tglx/notes-about-netiquette
> > 
> > where a whole section is devoted to the importance of keeping the
> > discussion of changes to the kernel in public, with private discussions
> > being a violation of the netiquette that governs the discussions that
> > take place between persons interested in the Linux kernel project and
> > other open source projects.
>
> Another uncalled-for attack.

I am just asking for some transparency and an indication that
a fix is really and truly in sight. It would only take you a few
minutes to fulfill what I am asking you to do now. The fact is,
Boris commented on your patches over three weeks ago and
asked you if you accepted the approach he outlined and you
have remained silent. That does not indicate you and Boris
are close to coming to a fix even though Boris stated that a fix
is coming soon. Based on what has been said on the mailing
lists, I just don't see the fix coming soon. That's all I can say
about it now.

>
> After sending the patches I just told Boris via IRC that I wouldn't react
> to any responses soon, as I was about to start my vacation.

That is certainly a valid reason to delay work on this - you were on
vacation. I hope you enjoyed yourself and had a good time. But I
had no way of knowing this because I was not part of the IRC
communication, so I cannot be blamed for not knowing this.

> I will continue with the patches as soon as I find time to do so.

I am willing to wait patiently for you to get back to these patches,
and I hope you can agree that you should find a few minutes
to confirm or deny Boris' statement that a fix is coming "soon"
by posting a public message to this thread within the next two
weeks, given that this regression has not been fixed for over three
months. I will not be upset if you say something like: "it looks like
it might take a while for Boris and me to work out the details of a fix,
it might take until the end of the year," and briefly explain why there
will be a delay. Boris might not like that because it would contradict
his statement that a fix is coming "soon" but I would rather be told
the truth - that the fix is going to be delayed, than be told a lie - that
a fix is coming soon.

Thanks for all the work you do.

Best regards,

Chuck
Chuck Zmudzinski Aug. 15, 2022, 4:56 p.m. UTC | #19
Hi Thorsten,

I am forwarding this to you to help you cut through the noise. Unfortunately
the discussion of fixes for this regression has degenerated into ad hominem
attacks. I admit that I started complaining about the response of the
maintainers to this regression and now they are attacking me. I do apologize,
but I do not want to over-apologize. I do not apologize for trying to get
the fix for this regression rolling again. After all, it has been over three months
since the regression was first reported. I don't think I should be accused of
doing anything wrong just for asking for some transparency, honesty, and
a realistic estimate for how long it will take before a fix is committed from the
maintainers responsible for and working on a fix for this regression. I do want
you to provide some feedback here on the public mailing lists.

I present the following message which cuts out the noise and I think describes
fairly completely the problems that are preventing a fix for this regression from
getting merged into the mainline kernel. Can you weigh in with your opinion
about what should be done now?

Best regards,

Chuck

On 8/14/2022 11:23 PM, Chuck Zmudzinski wrote:
> On 8/14/22 4:08 AM, Juergen Gross wrote:
> > > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote:
> > > 
> > > This is a fairly long message but I think what I need to say
> > > here is important for the future success of Linux and open
> > > source software, so here goes....
> > > 
> > > Update: I accept Boris Petkov's response to me yesterday as reasonable
> > > and acceptable if within two weeks he at least explains on the public
> > > mailing lists how he and Juergen have privately agreed to fix this regression
> > > "soon" if he does not actually fix the regression by then with a commit,
> > > patch set, or merge. The two-week time frame is from here:
> > > 
> > > https://www.kernel.org/doc/html/latest/process/handling-regressions.html
> > > 
> > > where developers and maintainers are exhorted as follows: "Try to fix
> > > regressions quickly once the culprit has been identified; fixes for most
> > > regressions should be merged within two weeks, but some need to be
> > > resolved within two or three days."
> >
> > And some more citations from the same document:
> >
> > "Prioritize work on handling regression reports and fixing regression over all
> > other Linux kernel work, unless the latter concerns acute security issues or
> > bugs causing data loss or damage."
> >
> > First thing to note here: "over all Linux kernel work". I'm not only working
> > on the kernel, but I have other responsibilities e.g. in the Xen community,
> > where I was sending patches for fixing a regression and where I'm quite busy
> > doing security related work. Apart from that I'm of course responsible for
> > handling SUSE customers' bug reports at a rather high priority. So please stop
> > accusing me of ignoring the responses to these patches. This is just not really
> > motivating me to continue interacting with you.
>
> You are busy, and that is always true for someone with your responsibilities.
> That is an acceptable reason to delay your responses for a time.
>
> >
> > "Always consider reverting the culprit commits and reapplying them later
> > together with necessary fixes, as this might be the least dangerous and quickest
> > way to fix a regression."
> >
> > I didn't introduce the regression, nor was it introduced in my area of
> > maintainership. It just happened to hit Xen. So I stepped up after Jan's patches
> > were not deemed to be the way to go, and I wrote the patches in spite of me
> > having other urgent work to do. In case you are feeling so strong about the fix
> > of the regression, why don't you ask for the patch introducing it to be reverted
> > instead? 
>
> I have asked for this on more than one occasion, but I was either
> ignored or shot down every time. The fact is, among the persons
> who have the power to actually commit a fix, only you and Boris
> are currently indicating any willingness to actually fix the regression.
> I will say the greater responsibility for this falls on Boris because
> he is an x86 maintainer, and you have every right to walk away
> and say "I will not work on a fix," and I would not blame you or accuse
> you of doing anything wrong if you did that. You are under no obligation
> to fix this. Boris is the one who must fix it, or the Intel developers,
> by reverting the commit that was originally identified as the bad
> commit.
>
> If it is any consolation to you, Juergen, I think the greatest problem
> is the silence of the drm/i915 maintainers, and Thorsten also expressed
> some dissatisfaction because of that, but since there is also some
> consensus that the fix should be done in x86 or x86/pat instead of
> in drm/i915, another problem is the lack of initiative by the x86
> developers to fix it. If they do not know how to fix it and need to
> rely on someone with Xen expertise, they should be giving you
> more assistance and feedback than they currently are. So far, only
> Boris shows any interest, and now my only critique of your behavior
> is that in your message, you chose to engage in an ad hominem attack
> against me instead of taking the same amount of time to at least
> briefly answer the questions Boris raised about your patch set over
> three weeks ago. Your decision to attack me instead of working on
> the fix was, IMHO, not helpful and constructive.
> > Accusing me and Boris is not acceptable at all!
>
> OK, I understand, now we are even. I have said it is unacceptable to
> not give greater priority to the regression fix or at least keep interested
> persons informed if there is a reason to continue to delay a fix, which
> ordinarily should only take two weeks, but now we are at more than
> three months. Now, you are saying it is unacceptable for me to accuse
> you and Boris. OK, so we are even. We each think the other is acting
> in an unacceptable way. I still think it is unacceptable to not work on
> the fix and instead engage in ad hominem attacks. Maybe I am wrong.
> Maybe maintainers are supposed to attack persons who are not
> maintainers when such outsiders try to help and encourage better
> cooperation and end the hostile silence by the maintainers who are
> responsible for fixing this. But that does not make sense to me. It makes
> sense to hold accountable those persons who are responsible for fixing
> this (and you, Juergen, are not the one that needs to be held accountable).
> AFAICT, that is not being done and instead I am being attacked for trying
> to get work towards a fix rolling again.
>
> >
> > > I also think there is a private agreement between Juergen and Boris to
> > > fix this regression because AFAICT there is no evidence in the public
> > > mailing lists that such an agreement has been reached, yet Boris yesterday
> > > told me on the public mailing lists in this thread to be "patient" and that
> > > "we will fix this soon." Unless I am missing something, and I hope I am,
> > > the only way that a fix could be coming "soon" would be to presume
> > > that Juergen and Boris have agreed to a fix for the regression in private.
> > > 
> > > However, AFAICT, keeping their solution private would be a violation of
> > > netiquette as described here:
> > > 
> > > https://people.kernel.org/tglx/notes-about-netiquette
> > > 
> > > where a whole section is devoted to the importance of keeping the
> > > discussion of changes to the kernel in public, with private discussions
> > > being a violation of the netiquette that governs the discussions that
> > > take place between persons interested in the Linux kernel project and
> > > other open source projects.
> >
> > Another uncalled-for attack.
>
> I am just asking for some transparency and an indication that
> a fix is really and truly in sight. It would only take you a few
> minutes to fulfill what I am asking you to do now. The fact is,
> Boris commented on your patches over three weeks ago and
> asked you if you accepted the approach he outlined and you
> have remained silent. That does not indicate you and Boris
> are close to coming to a fix even though Boris stated that a fix
> is coming soon. Based on what has been said on the mailing
> lists, I just don't see the fix coming soon. That's all I can say
> about it now.
>
> >
> > After sending the patches I just told Boris via IRC that I wouldn't react
> > to any responses soon, as I was about to start my vacation.
>
> That is certainly a valid reason to delay work on this - you were on
> vacation. I hope you enjoyed yourself and had a good time. But I
> had no way of knowing this because I was not part of the IRC
> communication, so I cannot be blamed for not knowing this.
>
> > I will continue with the patches as soon as I find time to do so.
>
> I am willing to wait patiently for you to get back to these patches,
> and I hope you can agree that you should find a few minutes
> to confirm or deny Boris' statement that a fix is coming "soon"
> by posting a public message to this thread within the next two
> weeks, given that this regression has not been fixed for over three
> months. I will not be upset if you say something like: "it looks like
> it might take a while for Boris and me to work out the details of a fix,
> it might take until the end of the year," and briefly explain why there
> will be a delay. Boris might not like that because it would contradict
> his statement that a fix is coming "soon" but I would rather be told
> the truth - that the fix is going to be delayed, than be told a lie - that
> a fix is coming soon.
>
> Thanks for all the work you do.
>
> Best regards,
>
> Chuck
Thorsten Leemhuis Aug. 15, 2022, 6 p.m. UTC | #20
Hi Chuck!

On 15.08.22 18:56, Chuck Zmudzinski wrote:
> 
> I am forwarding this to you to help you cut through the noise.

Sorry for not replying earlier, I ignored this thread and all other
non-urgent mail in the past two weeks: I was on vacation until a few
days ago and when I came home I had to deal with some other stuff first.

> I do not apologize for trying to get
> the fix for this regression rolling again.

Yeah, it's important to ensure regressions don't simply fall through the
cracks, but my advice in this case: let things rest for a few days now;
the right people have the issue on their radar again. Give them time to
breathe and work out a solution: it's not something that can be fixed
easily within a few minutes by one person alone, as previous discussions
have shown (also keep in mind that the merge window was open until
yesterday, which keeps many maintainers quite busy).

And FWIW: I've seen indicators that a solution to resolve this is
hopefully pretty close now.

>  After all, it has been over three months
> since the regression was first reported.

Yes, things take/took too long, as a few things about how this regression
was dealt with were far from ideal. But that happens sometimes; we're all
just humans and make errors. I made a few as well and learned a thing or
two from them. Due to that I'll do a few things slightly differently in
the future to hopefully get similar situations resolved a lot quicker.

Ciao, Thorsten
Thorsten Leemhuis Aug. 16, 2022, 2:41 p.m. UTC | #21
On 15.08.22 20:17, Chuck Zmudzinski wrote:
> On 8/15/2022 2:00 PM, Thorsten Leemhuis wrote:
>
>> the right people have the issue on their radar again; give them time to
>> breathe and work out a solution: it's not something that can be fixed
>> easily within a few minutes by one person alone, as previous discussions
>> have shown (also keep in mind that the merge window was open until
>> yesterday, which keeps many maintainers quite busy).
>>
>> And FWIW: I've seen indicators that a solution to resolve this is
>> hopefully pretty close now.
> 
> That's good to know. But I must ask, can you provide a link to a public
> discussion that indicates a fix is close?

I just searched for the commit id of the culprit yesterday like this:
https://lore.kernel.org/all/?q=bdd8b6c982*

Which brought me to this message, which looks like Boris applied a
slightly(?) modified version of Jan's patch to a branch that afaik is
regularly pushed to Linus:
https://lore.kernel.org/all/166055884287.401.612271624942869534.tip-bot2@tip-bot2/

So unless problems show up in linux-next I expect this will land in
master soon (and a bit later be backported to stable due to the CC
stable tag).

> Or do you know a fix is close
> because of private discussions? That distinction is important to me
> because open source software is much less useful to me if the solutions
> to problems are not discussed openly (except, of course, for solutions
> to security vulnerabilities that are not yet public).

IMHO you are expecting a bit too much here. Solutions to problems
in open source software get discussed on various, sometimes private
channels all the time. Just take conferences for example, where people
discuss them during talks, meetings, or in one-to-ones over coffee;
sometimes they are the only way to solve complex problems. But as you
can see from above link it's not like anybody is trying to sneak things
into the kernel.

Ciao, Thorsten
Thorsten Leemhuis Aug. 16, 2022, 4:53 p.m. UTC | #22
On 16.08.22 18:16, Chuck Zmudzinski wrote:
> On 8/16/2022 10:41 AM, Thorsten Leemhuis wrote:
>> On 15.08.22 20:17, Chuck Zmudzinski wrote:
>>> On 8/15/2022 2:00 PM, Thorsten Leemhuis wrote:
>>>
>>>> And FWIW: I've seen indicators that a solution to resolve this is
>>>> hopefully pretty close now.
>>> That's good to know. But I must ask, can you provide a link to a public
>>> discussion that indicates a fix is close?
>> I just searched for the commit id of the culprit yesterday like this:
>> https://lore.kernel.org/all/?q=bdd8b6c982*
>>
>> Which brought me to this message, which looks like Boris applied a
>> slightly(?) modified version of Jan's patch to a branch that afaik is
>> regularly pushed to Linus:
>> https://lore.kernel.org/all/166055884287.401.612271624942869534.tip-bot2@tip-bot2/
>>
>> So unless problems show up in linux-next I expect this will land in
>> master soon (and a bit later be backported to stable due to the CC
>> stable tag).
> 
> OK, that's exactly the kind of thing I am looking for. It would be
> nice if regzbot could have found that patch in that tree and
> display it in the web interface as a notable patch. Currently,
> regzbot is only linking to a dead patch that does not even fix
> the regression as a notable patch associated with this regression.
> 
> If regzbot is not yet smart enough to find it, could you take the
> time to manually intervene with a regzbot command so that
> patch is displayed as a notable patch for this regression?

regzbot will notice when the patch hits linux-next, where many changes
land and hang around for a few days before they hit mainline. Watching
all the different development trees would be possible and would catch
this patch earlier, but I'm not sure that's worth the work. Maybe
regzbot will do that one day, but there are more important missing
features on my todo list for now.

Ciao, Thorsten
Chuck Zmudzinski Aug. 18, 2022, 6:54 p.m. UTC | #23
On 8/16/22 1:28 PM, Chuck Zmudzinski wrote:
> On 8/16/2022 12:53 PM, Thorsten Leemhuis wrote:
>
> >
> > regzbot will notice when the patch hits linux-next,
>
> IIUC, regzbot might not notice because the patch lacks a Link: tag
> to the original regression report. The Link tag is to Jan's patch
> that was posted sometime in April, I think, which also lacks the
> Link tag to the original report of the regression which did not
> happen until May 4. If regzbot is smart enough to notice that the
> patch also has a Fixes: tag for the commit that was identified as
> bad in the original regression report, then I expect regzbot will
> find it.

Hey, I see the patch hit linux-next and regzbot noticed and
now lists the patch as an incoming fix. Great job with regzbot!

By the way, I think regzbot is a great idea, and I think any resources
devoted to develop it more would pay handsome returns for the
quality of Linux. If no one but you is working on it, I actually might
be willing to volunteer some time to help you develop it.

Best regards,

Chuck