Message ID | 20220715142549.25223-1-jgross@suse.com |
---|---|
Series | x86: make pat and mtrr independent from each other |
On Fri, Jul 15, 2022 at 4:25 PM Juergen Gross <jgross@suse.com> wrote:
>
> There are several MTRR functions which also do PAT handling. In order
> to support PAT handling without MTRR in the future, add some wrappers
> for those functions.
>
> Cc: <stable@vger.kernel.org> # 5.17
> Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Do I understand correctly that this particular patch doesn't change
the behavior?

If so, it would be good to mention that in the changelog.

> ---
>  arch/x86/include/asm/mtrr.h      |  2 --
>  arch/x86/include/asm/processor.h |  7 +++++
>  arch/x86/kernel/cpu/common.c     | 44 +++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/mtrr.c  | 25 +++---------------
>  arch/x86/kernel/setup.c          |  5 +---
>  arch/x86/kernel/smpboot.c        |  8 +++---
>  arch/x86/power/cpu.c             |  2 +-
>  7 files changed, 59 insertions(+), 34 deletions(-)
>
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index 12a16caed395..900083ac9f60 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned long size);
>  extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
>  extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
>  extern void mtrr_ap_init(void);
> -extern void set_mtrr_aps_delayed_init(void);
>  extern void mtrr_aps_init(void);
>  extern void mtrr_bp_restore(void);
>  extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
> @@ -86,7 +85,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
>  {
>  }
>  #define mtrr_ap_init() do {} while (0)
> -#define set_mtrr_aps_delayed_init() do {} while (0)
>  #define mtrr_aps_init() do {} while (0)
>  #define mtrr_bp_restore() do {} while (0)
>  #define mtrr_disable() do {} while (0)
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 5c934b922450..e2140204fb7e 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -865,7 +865,14 @@ bool arch_is_platform_page(u64 paddr);
>  #define arch_is_platform_page arch_is_platform_page
>  #endif
>
> +extern bool cache_aps_delayed_init;
> +
>  void cache_disable(void);
>  void cache_enable(void);
> +void cache_bp_init(void);
> +void cache_ap_init(void);
> +void cache_set_aps_delayed_init(void);
> +void cache_aps_init(void);
> +void cache_bp_restore(void);
>
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e43322f8a4ef..0a1bd14f7966 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1929,7 +1929,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_32
>  	enable_sep_cpu();
>  #endif
> -	mtrr_ap_init();
> +	cache_ap_init();
>  	validate_apic_and_package_id(c);
>  	x86_spec_ctrl_setup_ap();
>  	update_srbds_msr();
> @@ -2403,3 +2403,45 @@ void cache_enable(void) __releases(cache_disable_lock)
>
>  	raw_spin_unlock(&cache_disable_lock);
>  }
> +
> +void __init cache_bp_init(void)
> +{
> +	if (IS_ENABLED(CONFIG_MTRR))
> +		mtrr_bp_init();
> +	else
> +		pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
> +}
> +
> +void cache_ap_init(void)
> +{
> +	if (cache_aps_delayed_init)
> +		return;
> +
> +	mtrr_ap_init();
> +}
> +
> +bool cache_aps_delayed_init;
> +
> +void cache_set_aps_delayed_init(void)
> +{
> +	cache_aps_delayed_init = true;
> +}
> +
> +void cache_aps_init(void)
> +{
> +	/*
> +	 * Check if someone has requested the delay of AP cache initialization,
> +	 * by doing cache_set_aps_delayed_init(), prior to this point. If not,
> +	 * then we are done.
> +	 */
> +	if (!cache_aps_delayed_init)
> +		return;
> +
> +	mtrr_aps_init();
> +	cache_aps_delayed_init = false;
> +}
> +
> +void cache_bp_restore(void)
> +{
> +	mtrr_bp_restore();
> +}
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 2746cac9d8a9..c1593cfae641 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>  static DEFINE_MUTEX(mtrr_mutex);
>
>  u64 size_or_mask, size_and_mask;
> -static bool mtrr_aps_delayed_init;
>
>  static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
>
> @@ -176,7 +175,8 @@ static int mtrr_rendezvous_handler(void *info)
>  	if (data->smp_reg != ~0U) {
>  		mtrr_if->set(data->smp_reg, data->smp_base,
>  			     data->smp_size, data->smp_type);
> -	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
> +	} else if ((use_intel() && cache_aps_delayed_init) ||
> +		   !cpu_online(smp_processor_id())) {
>  		mtrr_if->set_all();
>  	}
>  	return 0;
> @@ -789,7 +789,7 @@ void mtrr_ap_init(void)
>  	if (!mtrr_enabled())
>  		return;
>
> -	if (!use_intel() || mtrr_aps_delayed_init)
> +	if (!use_intel())
>  		return;
>
>  	/*
> @@ -823,16 +823,6 @@ void mtrr_save_state(void)
>  	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
>  }
>
> -void set_mtrr_aps_delayed_init(void)
> -{
> -	if (!mtrr_enabled())
> -		return;
> -	if (!use_intel())
> -		return;
> -
> -	mtrr_aps_delayed_init = true;
> -}
> -
>  /*
>   * Delayed MTRR initialization for all AP's
>   */
> @@ -841,16 +831,7 @@ void mtrr_aps_init(void)
>  	if (!use_intel() || !mtrr_enabled())
>  		return;
>
> -	/*
> -	 * Check if someone has requested the delay of AP MTRR initialization,
> -	 * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
> -	 * then we are done.
> -	 */
> -	if (!mtrr_aps_delayed_init)
> -		return;
> -
>  	set_mtrr(~0U, 0, 0, 0);
> -	mtrr_aps_delayed_init = false;
>  }
>
>  void mtrr_bp_restore(void)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bd6c6fd373ae..27d61f73c68a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1001,10 +1001,7 @@ void __init setup_arch(char **cmdline_p)
>  	max_pfn = e820__end_of_ram_pfn();
>
>  	/* update e820 for memory not covered by WB MTRRs */
> -	if (IS_ENABLED(CONFIG_MTRR))
> -		mtrr_bp_init();
> -	else
> -		pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
> +	cache_bp_init();
>
>  	if (mtrr_trim_uncached_memory(max_pfn))
>  		max_pfn = e820__end_of_ram_pfn();
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5e7f9532a10d..535d73a47062 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1432,7 +1432,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>
>  	uv_system_init();
>
> -	set_mtrr_aps_delayed_init();
> +	cache_set_aps_delayed_init();
>
>  	smp_quirk_init_udelay();
>
> @@ -1443,12 +1443,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>
>  void arch_thaw_secondary_cpus_begin(void)
>  {
> -	set_mtrr_aps_delayed_init();
> +	cache_set_aps_delayed_init();
>  }
>
>  void arch_thaw_secondary_cpus_end(void)
>  {
> -	mtrr_aps_init();
> +	cache_aps_init();
>  }
>
>  /*
> @@ -1491,7 +1491,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
>
>  	nmi_selftest();
>  	impress_friends();
> -	mtrr_aps_init();
> +	cache_aps_init();
>  }
>
>  static int __initdata setup_possible_cpus = -1;
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index bb176c72891c..21e014715322 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -261,7 +261,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>  	do_fpu_end();
>  	tsc_verify_tsc_adjust(true);
>  	x86_platform.restore_sched_clock_state();
> -	mtrr_bp_restore();
> +	cache_bp_restore();
>  	perf_restore_debug_store();
>
>  	c = &cpu_data(smp_processor_id());
> --
> 2.35.3
>

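For readers following the delayed-init logic in the quoted patch, here is a minimal user-space sketch of how the `cache_aps_delayed_init` flag gates the two initialization paths. It is only a model: `stub_mtrr_ap_init()`, `stub_mtrr_aps_init()`, the two counters and `simulate_smp_boot()` are hypothetical stand-ins that do not exist in the kernel, and the `use_intel()` / `mtrr_enabled()` checks of the real MTRR code are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters: they only record which path was taken. */
int per_ap_inits;	/* an AP programmed its cache settings immediately */
int bulk_inits;		/* all APs were programmed in one delayed pass */

/* Hypothetical stand-ins for mtrr_ap_init() and mtrr_aps_init(). */
static void stub_mtrr_ap_init(void)  { per_ap_inits++; }
static void stub_mtrr_aps_init(void) { bulk_inits++; }

/* Same shape as the wrappers added to common.c in the quoted patch. */
bool cache_aps_delayed_init;

void cache_set_aps_delayed_init(void)
{
	cache_aps_delayed_init = true;
}

void cache_ap_init(void)
{
	/* While a delay is requested, each AP skips its own init. */
	if (cache_aps_delayed_init)
		return;

	stub_mtrr_ap_init();
}

void cache_aps_init(void)
{
	/* Nothing to do unless someone asked for the delay earlier. */
	if (!cache_aps_delayed_init)
		return;

	stub_mtrr_aps_init();
	cache_aps_delayed_init = false;
}

/*
 * Hypothetical driver: request the delay, bring up nr_aps secondary
 * CPUs, then run the single delayed pass, mirroring the order of calls
 * in native_smp_prepare_cpus() and native_smp_cpus_done().
 */
void simulate_smp_boot(int nr_aps)
{
	assert(nr_aps >= 0);

	cache_set_aps_delayed_init();
	for (int i = 0; i < nr_aps; i++)
		cache_ap_init();
	cache_aps_init();
}
```

In this model, a boot with the delay requested performs one bulk pass and no per-AP passes, while a CPU brought up afterwards (flag cleared) initializes itself directly.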
On 7/15/2022 10:25 AM, Juergen Gross wrote:
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as Xen PV
> guest. In this case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>
> As an additional complexity the availability of PAT can't be queried
> via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> to be disabled. This leads to some drivers believing that not all cache
> modes are available, resulting in failures or degraded functionality.
>
> The same applies to a kernel built with no MTRR support: it won't
> allow to use the PAT MSR, even if there is no technical reason for
> that, other than setting up PAT on all cpus the same way (which is a
> requirement of the processor's cache management) is relying on some
> MTRR specific code.
>
> Fix all of that by:
>
> - moving the function needed by PAT from MTRR specific code one level
>   up
> - adding a PAT indirection layer supporting the 3 cases "no or disabled
>   PAT", "PAT under kernel control", and "PAT under Xen control"
> - removing the dependency of PAT on MTRR
>
> Juergen Gross (3):
>   x86: move some code out of arch/x86/kernel/cpu/mtrr
>   x86: add wrapper functions for mtrr functions handling also pat
>   x86: decouple pat and mtrr handling
>
>  arch/x86/include/asm/memtype.h     |  13 ++-
>  arch/x86/include/asm/mtrr.h        |  27 ++++--
>  arch/x86/include/asm/processor.h   |  10 +++
>  arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
>  arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
>  arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
>  arch/x86/kernel/setup.c            |  12 +--
>  arch/x86/kernel/smpboot.c          |   8 +-
>  arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
>  arch/x86/power/cpu.c               |   2 +-
>  arch/x86/xen/enlighten_pv.c        |   4 +
>  12 files changed, 289 insertions(+), 186 deletions(-)
>

This patch series seems related to the regression reported
here on May 5, 2022:

https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

I am experiencing that regression and could test this patch
on my system.

Can you confirm that with this patch series you are trying
to fix that regression?

Chuck

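The three cases named in the cover letter ("no or disabled PAT", "PAT under kernel control", "PAT under Xen control") could be modeled roughly as below. This is a sketch under stated assumptions, not the code of patch 3, which is not quoted in this thread: the enum, its constants and all three function names are hypothetical. The only behavior taken from the cover letter is that under Xen PV the PAT is available but its MSR cannot be modified by the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the real indirection layer may look different. */
enum pat_mode_sketch {
	PAT_SKETCH_DISABLED,	/* no PAT, or PAT disabled */
	PAT_SKETCH_KERNEL,	/* PAT MSR is programmed by the kernel */
	PAT_SKETCH_XEN,		/* PAT is set up by the hypervisor */
};

/* What a driver would want to learn from a pat_enabled()-style query. */
bool pat_usable_sketch(enum pat_mode_sketch mode)
{
	return mode != PAT_SKETCH_DISABLED;
}

/* Only in the kernel-controlled case may the kernel write the PAT MSR. */
bool pat_msr_writable_sketch(enum pat_mode_sketch mode)
{
	return mode == PAT_SKETCH_KERNEL;
}

/* Spell out the property that matters for Xen PV guests. */
void pat_sketch_selfcheck(void)
{
	assert(pat_usable_sketch(PAT_SKETCH_XEN));
	assert(!pat_msr_writable_sketch(PAT_SKETCH_XEN));
}
```

Keeping "is PAT usable" separate from "may the kernel write the MSR" is the point of the sketch: the regression discussed in this thread stems from drivers seeing PAT reported as disabled in the Xen PV case even though the cache modes are available.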
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> Can you confirm that with this patch series you are trying
> to fix that regression?

Yes, this patchset is aimed to fix the whole situation but please don't
do anything yet - I need to find time and look at the whole approach
before you can test it. Just be patient and we'll ping you when the time
comes.

Thx.

On 7/16/2022 7:32 AM, Chuck Zmudzinski wrote:
> On 7/15/2022 10:25 AM, Juergen Gross wrote:
> > Today PAT can't be used without MTRR being available, unless MTRR is at
> > least configured via CONFIG_MTRR and the system is running as Xen PV
> > guest. In this case PAT is automatically available via the hypervisor,
> > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> >
> > As an additional complexity the availability of PAT can't be queried
> > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > to be disabled. This leads to some drivers believing that not all cache
> > modes are available, resulting in failures or degraded functionality.
> >
> > The same applies to a kernel built with no MTRR support: it won't
> > allow to use the PAT MSR, even if there is no technical reason for
> > that, other than setting up PAT on all cpus the same way (which is a
> > requirement of the processor's cache management) is relying on some
> > MTRR specific code.
> >
> > Fix all of that by:
> >
> > - moving the function needed by PAT from MTRR specific code one level
> >   up
> > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > - removing the dependency of PAT on MTRR
> >
> > Juergen Gross (3):
> >   x86: move some code out of arch/x86/kernel/cpu/mtrr
> >   x86: add wrapper functions for mtrr functions handling also pat
> >   x86: decouple pat and mtrr handling
> >
> >  arch/x86/include/asm/memtype.h     |  13 ++-
> >  arch/x86/include/asm/mtrr.h        |  27 ++++--
> >  arch/x86/include/asm/processor.h   |  10 +++
> >  arch/x86/kernel/cpu/common.c       | 123 +++++++++++++++++++++++++++-
> >  arch/x86/kernel/cpu/mtrr/generic.c |  90 ++------------------
> >  arch/x86/kernel/cpu/mtrr/mtrr.c    |  58 ++++---------
> >  arch/x86/kernel/cpu/mtrr/mtrr.h    |   1 -
> >  arch/x86/kernel/setup.c            |  12 +--
> >  arch/x86/kernel/smpboot.c          |   8 +-
> >  arch/x86/mm/pat/memtype.c          | 127 +++++++++++++++++++++--------
> >  arch/x86/power/cpu.c               |   2 +-
> >  arch/x86/xen/enlighten_pv.c        |   4 +
> >  12 files changed, 289 insertions(+), 186 deletions(-)
> >
>
> This patch series seems related to the regression reported
> here on May 5, 2022:

I'm sorry, the date of that report was May 4, 2022, not
May 5, 2022 - just to avoid any doubt about which regression
I am referring to.

Chuck

>
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> I am experiencing that regression

or a very similar regression that is caused by the same commit:

bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

> and could test this patch
> on my system.
>
> Can you confirm that with this patch series you are trying
> to fix that regression?
>
> Chuck

Chuck

On 7/16/2022 7:42 AM, Borislav Petkov wrote:
> On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> > Can you confirm that with this patch series you are trying
> > to fix that regression?
>
> Yes, this patchset is aimed to fix the whole situation but please don't
> do anything yet - I need to find time and look at the whole approach
> before you can test it. Just be patient and we'll ping you when the time
> comes.
>
> Thx.
>

I will wait until I get the ping before trying it.

Thanks,

Chuck

Hi Juergen!

On 15.07.22 16:25, Juergen Gross wrote:
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as Xen PV
> guest. In this case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>
> As an additional complexity the availability of PAT can't be queried
> via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> to be disabled. This leads to some drivers believing that not all cache
> modes are available, resulting in failures or degraded functionality.
>
> The same applies to a kernel built with no MTRR support: it won't
> allow to use the PAT MSR, even if there is no technical reason for
> that, other than setting up PAT on all cpus the same way (which is a
> requirement of the processor's cache management) is relying on some
> MTRR specific code.
>
> Fix all of that by:
>
> - moving the function needed by PAT from MTRR specific code one level
>   up
> - adding a PAT indirection layer supporting the 3 cases "no or disabled
>   PAT", "PAT under kernel control", and "PAT under Xen control"
> - removing the dependency of PAT on MTRR

Thx for working on this. If you need to respin these patches for one
reason or another, could you do me a favor and add proper 'Link:' tags
pointing to all reports about this issue? e.g. like this:

Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

These tags are considered important by Linus[1] and others, as they
allow anyone to look into the backstory weeks or years from now. That is
why they should be placed in cases like this, as
Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail. I care
personally, because these tags make my regression tracking efforts a
whole lot easier, as they allow my tracking bot 'regzbot' to
automatically connect reports with patches posted or committed to fix
tracked regressions.

[1] see for example:
https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

BTW, let me tell regzbot to monitor this thread:

#regzbot ^backmonitor:
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> Hi Juergen!
>
> On 15.07.22 16:25, Juergen Gross wrote:
> > Today PAT can't be used without MTRR being available, unless MTRR is at
> > least configured via CONFIG_MTRR and the system is running as Xen PV
> > guest. In this case PAT is automatically available via the hypervisor,
> > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> >
> > As an additional complexity the availability of PAT can't be queried
> > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > to be disabled. This leads to some drivers believing that not all cache
> > modes are available, resulting in failures or degraded functionality.
> >
> > The same applies to a kernel built with no MTRR support: it won't
> > allow to use the PAT MSR, even if there is no technical reason for
> > that, other than setting up PAT on all cpus the same way (which is a
> > requirement of the processor's cache management) is relying on some
> > MTRR specific code.
> >
> > Fix all of that by:
> >
> > - moving the function needed by PAT from MTRR specific code one level
> >   up
> > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > - removing the dependency of PAT on MTRR
>
> Thx for working on this. If you need to respin these patches for one
> reason or another, could you do me a favor and add proper 'Link:' tags
> pointing to all reports about this issue? e.g. like this:
>
> Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> These tags are considered important by Linus[1] and others, as they
> allow anyone to look into the backstory weeks or years from now. That is
> why they should be placed in cases like this, as
> Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail. I care
> personally, because these tags make my regression tracking efforts a
> whole lot easier, as they allow my tracking bot 'regzbot' to
> automatically connect reports with patches posted or committed to fix
> tracked regressions.
>
> [1] see for example:
> https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>

I echo Thorsten's thx for starting on this now instead of waiting until
September which I think is when Juergen said he could start working
on this last week. I agree with Thorsten that Link tags are needed.
Since multiple patches have been proposed to fix this regression,
perhaps a Link to each proposed patch, and a note that
the original report identified a specific commit which when reverted
also fixes it. IMO, this is all part of the backstory Thorsten refers to.

It looks like with this approach, a fix will not be coming real soon,
and Borislav Petkov also discouraged me from testing this
patch set until I receive a ping telling me it is ready for testing,
which seems to confirm that this regression will not be fixed
very soon. Please correct me if I am wrong about how long
it will take to fix it with this approach.

Also, is there any guarantee this approach is endorsed by
all the maintainers who will need to sign-off, especially
Linus? I say this because some of the discussion on the
earlier proposed patches makes me doubt this. I am especially
referring to this discussion:

https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.info/

and also, here:

https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/

where Borislav Petkov argues that Linux should not be
patched at all to fix this regression but instead the fix
should come by patching the Xen hypervisor.

So I have several questions, presuming at least the fix is going
to be delayed for some time, and also presuming this approach
is not yet an approach that has the blessing of the maintainers
who will need to sign-off:

1. Can you estimate when the patch series will be ready for
testing and suitable for a prepatch or RC release?

2. Can you estimate when the patch series will be ready to be
merged into the mainline release? Is there any hope it will be
fixed before the next longterm release hosted on kernel.org?

3. Since a fix is likely not coming soon, can you explain
why the commit that was mentioned in the original
report cannot be reverted as a temporary solution while
we wait for the full fix to come later? I can say that
reverting that commit (It was a commit affecting
drm/i915) does fix the issue on my system with no
negative side effects at all. In such a case, it seems
contrary to Linus' regression rule to not revert the
offending commit, even if reverting the offending
commit is not going to be the final solution. IOW,
I am trying to argue that an important corollary to
the Linus regression rule is that we revert commits
that introduce regressions, especially when there
are no negative effects when reverting the offending
commit. Why are we not doing that in this case?

4. Can you explain why this patch series is superior
to the other proposed patches that are much more
simple and have been reported to fix the regression?

5. This approach seems way too aggressive for backporting
to the stable releases. Is that correct? Or, will the patches
be backported to the stable releases? I was told that
backports to the stable releases are needed to keep things
consistent across all the supported versions when I submitted
a patch to fix this regression that identified a specific five year
old commit that my proposed patch would fix.

Remember, this is a regression that is really bothering
people now. For example, I am now in a position where
I cannot install the updates of the Linux kernel that Debian
pushes out to me without patching the kernel with my
own private build that has one of the known fixes that
have already been identified as ways to workaround this
regression while we wait for the full solution that will
hopefully come later.

Chuck

> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
>
> BTW, let me tell regzbot to monitor this thread:
>
> #regzbot ^backmonitor:
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

On 7/18/2022 7:32 AM, Chuck Zmudzinski wrote:
> On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> > Hi Juergen!
> >
> > On 15.07.22 16:25, Juergen Gross wrote:
> > > Today PAT can't be used without MTRR being available, unless MTRR is at
> > > least configured via CONFIG_MTRR and the system is running as Xen PV
> > > guest. In this case PAT is automatically available via the hypervisor,
> > > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> > >
> > > As an additional complexity the availability of PAT can't be queried
> > > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > > to be disabled. This leads to some drivers believing that not all cache
> > > modes are available, resulting in failures or degraded functionality.
> > >
> > > The same applies to a kernel built with no MTRR support: it won't
> > > allow to use the PAT MSR, even if there is no technical reason for
> > > that, other than setting up PAT on all cpus the same way (which is a
> > > requirement of the processor's cache management) is relying on some
> > > MTRR specific code.
> > >
> > > Fix all of that by:
> > >
> > > - moving the function needed by PAT from MTRR specific code one level
> > >   up
> > > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> > >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > > - removing the dependency of PAT on MTRR
> >
> > Thx for working on this. If you need to respin these patches for one
> > reason or another, could you do me a favor and add proper 'Link:' tags
> > pointing to all reports about this issue? e.g. like this:
> >
> > Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> >
> > These tags are considered important by Linus[1] and others, as they
> > allow anyone to look into the backstory weeks or years from now. That is
> > why they should be placed in cases like this, as
> > Documentation/process/submitting-patches.rst and
> > Documentation/process/5.Posting.rst explain in more detail. I care
> > personally, because these tags make my regression tracking efforts a
> > whole lot easier, as they allow my tracking bot 'regzbot' to
> > automatically connect reports with patches posted or committed to fix
> > tracked regressions.
> >
> > [1] see for example:
> > https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> > https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> > https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
>
> I echo Thorsten's thx for starting on this now instead of waiting until
> September which I think is when Juergen said he could start working
> on this last week. I agree with Thorsten that Link tags are needed.
> Since multiple patches have been proposed to fix this regression,
> perhaps a Link to each proposed patch, and a note that
> the original report identified a specific commit which when reverted
> also fixes it. IMO, this is all part of the backstory Thorsten refers to.
>
> It looks like with this approach, a fix will not be coming real soon,
> and Borislav Petkov also discouraged me from testing this
> patch set until I receive a ping telling me it is ready for testing,
> which seems to confirm that this regression will not be fixed
> very soon. Please correct me if I am wrong about how long
> it will take to fix it with this approach.
>
> Also, is there any guarantee this approach is endorsed by
> all the maintainers who will need to sign-off, especially
> Linus? I say this because some of the discussion on the
> earlier proposed patches makes me doubt this. I am especially
> referring to this discussion:
>
> https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.info/
>
> and also, here:
>
> https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
>
> where Borislav Petkov argues that Linux should not be
> patched at all to fix this regression but instead the fix
> should come by patching the Xen hypervisor.
>
> So I have several questions, presuming at least the fix is going
> to be delayed for some time, and also presuming this approach
> is not yet an approach that has the blessing of the maintainers
> who will need to sign-off:
>
> 1. Can you estimate when the patch series will be ready for
> testing and suitable for a prepatch or RC release?
>
> 2. Can you estimate when the patch series will be ready to be
> merged into the mainline release? Is there any hope it will be
> fixed before the next longterm release hosted on kernel.org?
>
> 3. Since a fix is likely not coming soon, can you explain
> why the commit that was mentioned in the original
> report cannot be reverted as a temporary solution while
> we wait for the full fix to come later? I can say that
> reverting that commit (It was a commit affecting
> drm/i915) does fix the issue on my system with no
> negative side effects at all. In such a case, it seems
> contrary to Linus' regression rule to not revert the
> offending commit, even if reverting the offending
> commit is not going to be the final solution. IOW,
> I am trying to argue that an important corollary to
> the Linus regression rule is that we revert commits
> that introduce regressions, especially when there
> are no negative effects when reverting the offending
> commit. Why are we not doing that in this case?
>
> 4. Can you explain why this patch series is superior
> to the other proposed patches that are much more
> simple and have been reported to fix the regression?
>
> 5. This approach seems way too aggressive for backporting
> to the stable releases. Is that correct? Or, will the patches
> be backported to the stable releases? I was told that
> backports to the stable releases are needed to keep things
> consistent across all the supported versions when I submitted
> a patch to fix this regression that identified a specific five year
> old commit that my proposed patch would fix.
>
> Remember, this is a regression that is really bothering
> people now. For example, I am now in a position where
> I cannot install the updates of the Linux kernel that Debian
> pushes out to me without patching the kernel with my
> own private build that has one of the known fixes that
> have already been identified as ways to workaround this
> regression while we wait for the full solution that will
> hopefully come later.
>
> Chuck
>
> > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > reports and sometimes miss something important when writing mails like
> > this. If that's the case here, don't hesitate to tell me in a public
> > reply, it's in everyone's interest to set the public record straight.
> >
> > BTW, let me tell regzbot to monitor this thread:
> >
> > #regzbot ^backmonitor:
> > https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>

OK, the comments Boris made on the individual patches of
this patch set answer most of my questions.

Thx, Boris.

Chuck

On 7/17/22 3:55 AM, Thorsten Leemhuis wrote: > Hi Juergen! > > On 15.07.22 16:25, Juergen Gross wrote: > > Today PAT can't be used without MTRR being available, unless MTRR is at > > least configured via CONFIG_MTRR and the system is running as Xen PV > > guest. In this case PAT is automatically available via the hypervisor, > > but the PAT MSR can't be modified by the kernel and MTRR is disabled. > > > > As an additional complexity the availability of PAT can't be queried > > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT > > to be disabled. This leads to some drivers believing that not all cache > > modes are available, resulting in failures or degraded functionality. > > > > The same applies to a kernel built with no MTRR support: it won't > > allow to use the PAT MSR, even if there is no technical reason for > > that, other than setting up PAT on all cpus the same way (which is a > > requirement of the processor's cache management) is relying on some > > MTRR specific code. > > > > Fix all of that by: > > > > - moving the function needed by PAT from MTRR specific code one level > > up > > - adding a PAT indirection layer supporting the 3 cases "no or disabled > > PAT", "PAT under kernel control", and "PAT under Xen control" > > - removing the dependency of PAT on MTRR > > Thx for working on this. If you need to respin these patches for one > reason or another, could you do me a favor and add proper 'Link:' tags > pointing to all reports about this issue? e.g. like this: > > Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/ > > These tags are considered important by Linus[1] and others, as they > allow anyone to look into the backstory weeks or years from now. That is > why they should be placed in cases like this, as > Documentation/process/submitting-patches.rst and > Documentation/process/5.Posting.rst explain in more detail. 
I care > personally, because these tags make my regression tracking efforts a > whole lot easier, as they allow my tracking bot 'regzbot' to > automatically connect reports with patches posted or committed to fix > tracked regressions. > > [1] see for example: > https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/ > https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/ > https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/ > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > > P.S.: As the Linux kernel's regression tracker I deal with a lot of > reports and sometimes miss something important when writing mails like > this. If that's the case here, don't hesitate to tell me in a public > reply, it's in everyone's interest to set the public record straight. > > BTW, let me tell regzbot to monitor this thread: > > #regzbot ^backmonitor: > https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/ Hi Thorsten, This appears stalled again and we are now over three months from the first report of the regression. The only excuse for ignoring your comments, and other comments on the patches in this patch series for this long a time is that the patch series for some reason cannot be considered a true regression. If this is a regression, then, IMHO, this needs to have a higher priority by the maintainers, or the maintainers need to explain why this regression cannot be fixed in a more timely manner. But continued silence by the maintainers is unacceptable, IMHO. This is especially true in this case when multiple fixes for the regression have been identified and the maintainers have not yet clearly explained why at least a fix, even if temporary, cannot be applied immediately while we wait for a more comprehensive fix.
At the very least, I would expect Juergen to reply here and say that he is delayed but does plan to spin up an updated version and include the necessary links in the new version to facilitate your tracking of the regression. Why the silence from Juergen here? Best regards, Chuck
On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote: > On 7/17/22 3:55 AM, Thorsten Leemhuis wrote: > > Hi Juergen! > > > > On 15.07.22 16:25, Juergen Gross wrote: ... > > Hi Thorsten, > > This appears stalled again and we are now over three months > from the first report of the regression, The only excuse for > ignoring your comments, and other comments on the patches > in this patch series for this long a time is that the patch series > for some reason cannot be considered a true regression. If this is a > regression, then, IMHO, this needs to have a higher priority by the > maintainers, or the maintainers need to explain why this regression > cannot be fixed in a more timely manner. But continued silence > by the maintainers is unacceptable, IMHO. This is especially true > in this case when multiple fixes for the regression have been > identified and the maintainers have not yet clearly explained why > at least a fix, even if temporary, cannot be applied immediately > while we wait for a more comprehensive fix. > > At the very least, I would expect Juergen to reply here and say that > he is delayed but does plan to spin up an updated version and include > the necessary links in the new version to facilitate your tracking of > the regression. Why the silence from Juergen here? This is a fairly long message but I think what I need to say here is important for the future success of Linux and open source software, so here goes.... Update: I accept Boris Petkov's response to me yesterday as reasonable and acceptable if within two weeks he at least explains on the public mailing lists how he and Juergen have privately agreed to fix this regression "soon" if he does not actually fix the regression by then with a commit, patch set, or merge. 
The two-week time frame is from here: https://www.kernel.org/doc/html/latest/process/handling-regressions.html where developers and maintainers are exhorted as follows: "Try to fix regressions quickly once the culprit has been identified; fixes for most regressions should be merged within two weeks, but some need to be resolved within two or three days." I also think there is a private agreement between Juergen and Boris to fix this regression because AFAICT there is no evidence in the public mailing lists that such an agreement has been reached, yet Boris yesterday told me on the public mailing lists in this thread to be "patient" and that "we will fix this soon." Unless I am missing something, and I hope I am, the only way that a fix could be coming "soon" would be to presume that Juergen and Boris have agreed to a fix for the regression in private. However, AFAICT, keeping their solution private would be a violation of netiquette as described here: https://people.kernel.org/tglx/notes-about-netiquette where a whole section is devoted to the importance of keeping the discussion of changes to the kernel in public, with private discussions being a violation of the netiquette that governs the discussions that take place between persons interested in the Linux kernel project and other open source projects. Yet, in one of his messages to me yesterday, Boris appended the link to the netiquette rules, thus implicitly accusing me of violating the netiquette rules when in fact he is the one who at least seems to be violating the rule forbidding private discussions of changes to the kernel once a patch set is already up for discussion on the public mailing lists. Of course Boris can exonerate himself completely if within two weeks he at least explains on the public mailing lists how he and Juergen have agreed to fix the regression. 
I sincerely hope he at least does that within the next two weeks, or even better, that he exonerates himself by actually committing the official fix for the regression within the next two weeks. However, I will only believe it when I see it. When it comes to the Linux kernel, I go by what I see in the performance of the Linux kernel in my computing environments, what I see on the public mailing lists and in the official documentation, and by what I see in the source code itself. I do not go by blind faith in any single developer. I am not religious when it comes to the Linux kernel. Instead, I am scientific and practical about it. Finally, please forgive me also if I am mistaken in my assumption that these rules of netiquette apply no less to the developers and maintainers of the Linux kernel than to others who wish to offer their contributions to the development of the Linux kernel. If the rules of netiquette do not apply to the developers and maintainers of the kernel, then, IMHO, the great advantage of open source software development is totally lost, because the advantage of the open source software development model depends at least as much on free and open access to the discussions about the source code conducted by the developers and maintainers as it does on the freedom to have access to the source code itself. If someone here tells me that those rules of netiquette need not be followed by the developers and maintainers I certainly hope someone else will come to the defense of those same wise rules that have allowed such a successful open source software ecosystem to flourish and thrive around this project, the Linux kernel. IMHO, the day someone makes the decision to stop enforcing these wise rules is the day that the open source development model will begin to lose its advantage over proprietary software development models.
And perhaps the most important rule of all for the continued success of Linux and open source software development is the Linus regression rule, with the rule that discussions about changes to the source code must be done in public being a close second in importance to the Linus regression rule. Best Regards, Chuck
On 14.08.22 09:42, Chuck Zmudzinski wrote: > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote: >> On 7/17/22 3:55 AM, Thorsten Leemhuis wrote: >>> Hi Juergen! >>> >>> On 15.07.22 16:25, Juergen Gross wrote: ... >> >> Hi Thorsten, >> >> This appears stalled again and we are now over three months >> from the first report of the regression, The only excuse for >> ignoring your comments, and other comments on the patches >> in this patch series for this long a time is that the patch series >> for some reason cannot be considered a true regression. If this is a >> regression, then, IMHO, this needs to have a higher priority by the >> maintainers, or the maintainers need to explain why this regression >> cannot be fixed in a more timely manner. But continued silence >> by the maintainers is unacceptable, IMHO. This is especially true >> in this case when multiple fixes for the regression have been >> identified and the maintainers have not yet clearly explained why >> at least a fix, even if temporary, cannot be applied immediately >> while we wait for a more comprehensive fix. >> >> At the very least, I would expect Juergen to reply here and say that >> he is delayed but does plan to spin up an updated version and include >> the necessary links in the new version to facilitate your tracking of >> the regression. Why the silence from Juergen here? > > This is a fairly long message but I think what I need to say > here is important for the future success of Linux and open > source software, so here goes.... > > Update: I accept Boris Petkov's response to me yesterday as reasonable > and acceptable if within two weeks he at least explains on the public > mailing lists how he and Juergen have privately agreed to fix this regression > "soon" if he does not actually fix the regression by then with a commit, > patch set, or merge. 
The two-week time frame is from here: > > https://www.kernel.org/doc/html/latest/process/handling-regressions.html > > where developers and maintainers are exhorted as follows: "Try to fix > regressions quickly once the culprit has been identified; fixes for most > regressions should be merged within two weeks, but some need to be > resolved within two or three days." And some more citations from the same document: "Prioritize work on handling regression reports and fixing regression over all other Linux kernel work, unless the latter concerns acute security issues or bugs causing data loss or damage." First thing to note here: "over all Linux kernel work". I'm not only working on the kernel, but I have other responsibilities e.g. in the Xen community, where I was sending patches for fixing a regression and where I'm quite busy doing security related work. Apart from that I'm of course responsible to handle SUSE customers' bug reports at a rather high priority. So please stop accusing me of ignoring the responses to these patches. This is just not really motivating me to continue interacting with you. "Always consider reverting the culprit commits and reapplying them later together with necessary fixes, as this might be the least dangerous and quickest way to fix a regression." I didn't introduce the regression, nor was it introduced in my area of maintainership. It just happened to hit Xen. So I stepped up after Jan's patches were not deemed to be the way to go, and I wrote the patches in spite of me having other urgent work to do. In case you are feeling so strongly about the fix of the regression, why don't you ask for the patch introducing it to be reverted instead? Accusing me and Boris is not acceptable at all!
> I also think there is a private agreement between Juergen and Boris to > fix this regression because AFAICT there is no evidence in the public > mailing lists that such an agreement has been reached, yet Boris yesterday > told me on the public mailing lists in this thread to be "patient" and that > "we will fix this soon." Unless I am missing something, and I hope I am, > the only way that a fix could be coming "soon" would be to presume > that Juergen and Boris have agreed to a fix for the regression in private. > > However, AFAICT, keeping their solution private would be a violation of > netiquette as described here: > > https://people.kernel.org/tglx/notes-about-netiquette > > where a whole section is devoted to the importance of keeping the > discussion of changes to the kernel in public, with private discussions > being a violation of the netiquette that governs the discussions that > take place between persons interested in the Linux kernel project and > other open source projects. Another uncalled-for attack. After sending the patches I just told Boris via IRC that I wouldn't react to any responses soon, as I was about to start my vacation. This was just a hint for him, as he was rather busy at that time handling kernel security issues. I won't comment on the rest of your absolutely unacceptable accusations. I will continue with the patches as soon as I find time to do so. Juergen
On 8/14/2022 3:42 AM, Chuck Zmudzinski wrote: > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote: > > On 7/17/22 3:55 AM, Thorsten Leemhuis wrote: > > > Hi Juergen! > > > > > > On 15.07.22 16:25, Juergen Gross wrote: ... > > > > Hi Thorsten, > > > > This appears stalled again and we are now over three months > > from the first report of the regression, The only excuse for > > ignoring your comments, and other comments on the patches > > in this patch series for this long a time is that the patch series > > for some reason cannot be considered a true regression. If this is a > > regression, then, IMHO, this needs to have a higher priority by the > > maintainers, or the maintainers need to explain why this regression > > cannot be fixed in a more timely manner. But continued silence > > by the maintainers is unacceptable, IMHO. This is especially true > > in this case when multiple fixes for the regression have been > > identified and the maintainers have not yet clearly explained why > > at least a fix, even if temporary, cannot be applied immediately > > while we wait for a more comprehensive fix. > > > > At the very least, I would expect Juergen to reply here and say that > > he is delayed but does plan to spin up an updated version and include > > the necessary links in the new version to facilitate your tracking of > > the regression. Why the silence from Juergen here? > > This is a fairly long message but I think what I need to say > here is important for the future success of Linux and open > source software, so here goes.... > > Update: I accept Boris Petkov's response to me yesterday as reasonable > and acceptable if within two weeks he at least explains on the public > mailing lists how he and Juergen have privately agreed to fix this regression > "soon" if he does not actually fix the regression by then with a commit, > patch set, or merge. 
The two-week time frame is from here: > > https://www.kernel.org/doc/html/latest/process/handling-regressions.html > > where developers and maintainers are exhorted as follows: "Try to fix > regressions quickly once the culprit has been identified; fixes for most > regressions should be merged within two weeks, but some need to be > resolved within two or three days." > > I also think there is a private agreement between Juergen and Boris to > fix this regression because AFAICT there is no evidence in the public > mailing lists that such an agreement has been reached, yet Boris yesterday > told me on the public mailing lists in this thread to be "patient" and that > "we will fix this soon." Unless I am missing something, and I hope I am, > the only way that a fix could be coming "soon" would be to presume > that Juergen and Boris have agreed to a fix for the regression in private. > > However, AFAICT, keeping their solution private would be a violation of > netiquette as described here: > > https://people.kernel.org/tglx/notes-about-netiquette > > where a whole section is devoted to the importance of keeping the > discussion of changes to the kernel in public, with private discussions > being a violation of the netiquette that governs the discussions that > take place between persons interested in the Linux kernel project and > other open source projects. > > Yet, in one of his messages to me yesterday, Boris appended the link > to the netiquette rules, thus implicitly accusing me of violating the > netiquette rules when in fact he is the one who at least seems to be > violating the rule forbidding private discussions of changes to the > kernel once a patch set is already up for discussion on the public > mailing lists. > > Of course Boris can exonerate himself completely if within two > weeks he at least explains on the public mailing lists how he and > Juergen have agreed to fix the regression. 
I sincerely hope he at > least does that within the next two weeks, or even better, that he > exonerates himself by actually committing the official fix for the > regression within the next two weeks. > > However, I will only believe it when I see it. When it comes to the > Linux kernel, I go by what I see in the performance of the Linux > kernel in my computing environments, what I see on the public > mailing lists and in the official documentation, and by what I > see in the source code itself. I do not go by blind faith in any > single developer. I am not religious when it comes to the Linux > kernel. Instead, I am scientific and practical about it. > > Finally, please forgive me also if I am mistaken in my assumption > that these rules of netiquette apply no less to the developers and > maintainers of the Linux kernel than to others who wish to offer > their contributions to the development of the Linux kernel. If the > rules of netiquette do not apply to the developers and maintainers, > of the kernel, then, IMHO, the great advantage of open source > software development is totally lost, because the advantage of the > open source software development model depends at least as > much on free and open access to the discussions about the > source code conducted by the developers and maintainers as it > does on the freedom to have access to the source code itself. > If someone here tells me that those rules of netiquette need > not be followed by the developers and maintainers I certainly > hope someone else will come to the defense of those same > wise rules that have allowed such a successful open source > software ecosystem to flourish and thrive around this project, > the Linux kernel. > > IMHO, the day someone make the decision to stop enforcing these > wise rules is the day that the open source development model will > begin to lose its advantage over proprietary software development > models. 
And perhaps the most important rule of all for the continued > success of Linux and open source software development is the Linus > regression rule, with the rule that discussions about changes > to the source code must be done in public being a close second in > importance to the Linus regression rule. > > Best Regards, > > Chuck Hi Thorsten, Well, that did not take long. Juergen responded with a message, which is encrypted and not delivered to my mailbox because I do not have the PGP keys, presumably to make it difficult for me to continue the discussion and defend myself after I was accused of violating the netiquette rules yesterday by Boris: https://lore.kernel.org/lkml/c88ea08c-a9d5-ef6a-333a-db9e00c6da6f@suse.com/raw Fortunately, lore.kernel.org did decrypt Juergen's message so you can read what he wrote in response to my message there. I don't think what Juergen said there is very constructive although I am not surprised he seeks to defend himself, and he makes many valid points that are good for developers and Linux insiders but not so good for users and the long-term success of the Linux kernel project, so I am not going to reproduce what he said in this message, but I think you need to read it to help you understand why this regression is not being fixed in a timely manner: https://lore.kernel.org/lkml/c88ea08c-a9d5-ef6a-333a-db9e00c6da6f@suse.com/ Sorry for the trouble, but I am just a user trying to understand why this regression has not been fixed for over three months. If this is the best the Linux kernel community can do in response to my questions about this regression, then in the long run, I can assure you, the open source development model is doomed to a slow, long, and eventually painful death. Best regards, Chuck
On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote: > Well, that did not take long. Juergen responded with a message, > which is encrypted and not delivered to my mailbox because I do not > have the PGP keys, presumably to make it difficult for me to continue > the discussion and defend myself after I was accused of violating > the netiquette rules yesterday by Boris: The message was signed, not encrypted. Odd that your email client could not read it, perhaps you need to use a different one? thanks, greg k-h
On 8/14/2022 5:50 AM, Greg KH wrote: > On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote: > > Well, that did not take long. Juergen responded with a message, > > which is encrypted and not delivered to my mailbox because I do not > > have the PGP keys, presumably to make it difficult for me to continue > > the discussion and defend myself after I was accused of violating > > the netiquette rules yesterday by Boris: > > The message was signed, not encrypted. Odd that your email client could > not read it, perhaps you need to use a different one? > > thanks, > > greg k-h It's not that my e-mail client could not read it, there is no evidence it was ever sent to me. I use aol.com which is administered by Yahoo!. It did not even appear in the web interface for my e-mail service, so it was never delivered to my e-mail client, which is Thunderbird. Neither the Windows nor the Linux client can retrieve a message not delivered to the Yahoo! servers! I also checked the Junk and Spam folders and it was not there either. But I received your message and other messages normally. It is as if the message was sent to everyone else on the To: and Cc: lists except for me. I think the problem was on the sender's end or with my e-mail service, Yahoo!, which apparently does not accept signed messages without some special configuration that I have not done with Yahoo! yet. I will look into it next week. Chuck
On Sun, Aug 14, 2022 at 08:08:30AM -0400, Chuck Zmudzinski wrote: > On 8/14/2022 5:50 AM, Greg KH wrote: > > On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote: > > > Well, that did not take long. Juergen responded with a message, > > > which is encrypted and not delivered to my mailbox because I do not > > > have the PGP keys, presumably to make it difficult for me to continue > > > the discussion and defend myself after I was accused of violating > > > the netiquette rules yesterday by Boris: > > > > The message was signed, not encrypted. Odd that your email client could > > not read it, perhaps you need to use a different one? > > > > thanks, > > > > greg k-h > > It's not that my e-mail client could not read it, there is no evidence it > was ever sent to me. The To: line had your address in it, so it was sent to you, and again, it was not encrypted as you claimed, but rather just signed to verify he was the sender. That's not making anything difficult for anyone, so I think you owe him an apology here, especially as you are asking him to do work for you. best of luck! greg k-h
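A side note on the distinction Greg draws above between a signed and an encrypted message. Under PGP/MIME (RFC 3156), a signed mail is a `multipart/signed` container whose first part is the ordinary, readable body and whose second part is a detached signature, while an encrypted mail is a `multipart/encrypted` container whose payload is ciphertext. The following is only a minimal Python sketch of how the two can be told apart from the headers. It is not something posted in the thread, and the function name `pgp_kind` and both sample messages are made up for illustration.

```python
from email import message_from_string


def pgp_kind(raw: str) -> str:
    """Classify a raw message as 'signed', 'encrypted', or 'plain' (PGP/MIME only)."""
    msg = message_from_string(raw)
    ctype = msg.get_content_type()  # lower-cased, e.g. "multipart/signed"
    protocol = (msg.get_param("protocol") or "").lower()
    if ctype == "multipart/signed" and protocol == "application/pgp-signature":
        return "signed"
    if ctype == "multipart/encrypted" and protocol == "application/pgp-encrypted":
        return "encrypted"
    return "plain"


# Hypothetical sample: the body part stays readable without any key.
SIGNED = (
    "From: sender@example.org\n"
    "To: reader@example.org\n"
    'Content-Type: multipart/signed; micalg=pgp-sha256; '
    'protocol="application/pgp-signature"; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "Readable by any mail client.\n"
    "--b1\n"
    "Content-Type: application/pgp-signature\n"
    "\n"
    "(detached signature placeholder)\n"
    "--b1--\n"
)

# Hypothetical sample: the payload is ciphertext and needs the recipient's key.
ENCRYPTED = (
    "From: sender@example.org\n"
    "To: reader@example.org\n"
    'Content-Type: multipart/encrypted; '
    'protocol="application/pgp-encrypted"; boundary="b2"\n'
    "\n"
    "--b2\n"
    "Content-Type: application/pgp-encrypted\n"
    "\n"
    "Version: 1\n"
    "--b2\n"
    "Content-Type: application/octet-stream\n"
    "\n"
    "(ciphertext placeholder)\n"
    "--b2--\n"
)
```

Calling `pgp_kind(SIGNED)` and `pgp_kind(ENCRYPTED)` separates the two cases. The sketch covers PGP/MIME only: inline PGP and S/MIME use different markers and would be reported as plain here.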
On 8/14/2022 9:01 AM, Greg KH wrote: > On Sun, Aug 14, 2022 at 08:08:30AM -0400, Chuck Zmudzinski wrote: > > On 8/14/2022 5:50 AM, Greg KH wrote: > > > On Sun, Aug 14, 2022 at 05:19:12AM -0400, Chuck Zmudzinski wrote: > > > > Well, that did not take long. Juergen responded with a message, > > > > which is encrypted and not delivered to my mailbox because I do not > > > > have the PGP keys, presumably to make it difficult for me to continue > > > > the discussion and defend myself after I was accused of violating > > > > the netiquette rules yesterday by Boris: > > > > > > The message was signed, not encrypted. Odd that your email client could > > > not read it, perhaps you need to use a different one? > > > > > > thanks, > > > > > > greg k-h > > > > It's not that my e-mail client could not read it, there is no evidence it > > was ever sent to me. > > The To: line had your address in it, so it was sent to you, and again, > it was not encrypted as you claimed, but rather just signed to verify he > was the sender. That's not making anything difficult for anyone, so I > think you owe him an apology here, especially as you are asking him to > do work for you. > > best of luck! > > greg k-h Dear Greg, Thanks for the advice. I appreciate it. Below follows my apology to Juergen and Thorsten and some additional comments for anyone willing to hear what I am trying to say as I continue to try to participate in the discussion of this regression... Dear Juergen and Thorsten, I do apologize since I agree there is not enough evidence to conclude that Juergen purposely made it difficult for me to respond to and defend myself against the negative things he said about me in the e-mail I never received from him. I am not going to try to defend myself either, since it is not necessary and is probably an impossible task for me to succeed in defending myself here in this forum.
The e-mail you tried and failed to send to me is currently publicly available on more than one public mailing list and it speaks for itself. Each person who reads it and the other relevant messages in the thread will decide for himself or herself what that message means. So far I am inclined to think most people who will even take the time to read the thread will judge me to be in the wrong, and I also am inclined to think many who are Cc'd on this thread are already ignoring me because they consider me to be a total jerk. That's fine, but that's just their opinion, especially if they base their opinion only on a custom of hazing users who dare to say what they think on the Linux public mailing lists. But since you are the persons who create the Linux kernel, I will express my opinion that your decision to reject my efforts to help the kernel developers and maintainers work better together with each other and with users like me who are brave enough to say what they think on these public mailing lists is the wrong decision if your goal is really to make Linux and open source software development able to continue to produce high quality software that is actually useful to people. I say that because I am trying to scream to you as loud as I can: "Linux software is no longer useful to me." No one here seems willing to hear that message. I wonder if Linus even cares about that anymore. And that is sad, because Linux was a great project. Unfortunately, now, it is clear to me it is going to die a slow, painful death. The Linux kernel is a big and powerful enough project to survive for quite a while, and I probably won't live to see its death, but unless the people who define the Linux kernel community change, it will eventually die. Best regards and good luck to all of you, Chuck
On 8/14/2022 9:01 AM, Greg KH wrote: > The To: line had your address in it, so it was sent to you, and again, > it was not encrypted as you claimed, but rather just signed to verify he > was the sender. That's not making anything difficult for anyone, so I > think you owe him an apology here, especially as you are asking him to > do work for you. You misunderstand me completely. I am not here to ask Juergen to do any work for me, he is the one who volunteered to fix a regression that affects my computer, so I am interested in what he has to say, and I am on this mailing list to find out if he, and other Linux developers and maintainers, are the kind of people I want to have writing the software that runs on my computers. I don't have to tell you what my decision about that is, but do you really think I want people who refuse to answer my questions about the software they are writing for my computers to continue to be the ones I rely on for the security and stability of my computer systems? If you think I am that stupid, I suppose you also think I am too stupid to receive an e-mail message that Juergen tried to send me earlier today. The fact is, Juergen is the only person I am aware of who has tried and failed to get an e-mail message delivered to me during the past thirty years since I started using e-mail. That's quite an accomplishment for Juergen to achieve! Best regards, Chuck
On 8/14/22 4:08 AM, Juergen Gross wrote: > > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote: > > > > This is a fairly long message but I think what I need to say > > here is important for the future success of Linux and open > > source software, so here goes.... > > > > Update: I accept Boris Petkov's response to me yesterday as reasonable > > and acceptable if within two weeks he at least explains on the public > > mailing lists how he and Juergen have privately agreed to fix this regression > > "soon" if he does not actually fix the regression by then with a commit, > > patch set, or merge. The two-week time frame is from here: > > > > https://www.kernel.org/doc/html/latest/process/handling-regressions.html > > > > where developers and maintainers are exhorted as follows: "Try to fix > > regressions quickly once the culprit has been identified; fixes for most > > regressions should be merged within two weeks, but some need to be > > resolved within two or three days." > > And some more citations from the same document: > > "Prioritize work on handling regression reports and fixing regression over all > other Linux kernel work, unless the latter concerns acute security issues or > bugs causing data loss or damage." > > First thing to note here: "over all Linux kernel work". I' not only working > on the kernel, but I have other responsibilities e.g. in the Xen community, > where I was sending patches for fixing a regression and where I'm quite busy > doing security related work. Apart from that I'm of course responsible to > handle SUSE customers' bug reports at a rather high priority. So please stop > accusing me to ignore the responses to these patches. This is just not really > motivating me to continue interacting with you. You are busy, and that is always true for someone with your responsibilities. That is an acceptable reason to delay your responses for a time. 
> > "Always consider reverting the culprit commits and reapplying them later > together with necessary fixes, as this might be the least dangerous and quickest > way to fix a regression." > > I didn't introduce the regression, nor was it introduced in my area of > maintainership. It just happened to hit Xen. So I stepped up after Jan's patches > were not deemed to be the way to go, and I wrote the patches in spite of me > having other urgent work to do. In case you are feeling so strong about the fix > of the regression, why don't you ask for the patch introducing it to be reverted > instead? I have asked for this on more than one occasion, but I was either ignored or shot down every time. The fact is, among the persons who have the power to actually commit a fix, only you and Boris are currently indicating any willingness to actually fix the regression. I will say the greater responsibility for this falls on Boris because he is an x86 maintainer, and you have every right to walk away and say "I will not work on a fix," and I would not blame you or accuse you of doing anything wrong if you did that. You are under no obligation to fix this. Boris is the one who must fix it, or the Intel developers, by reverting the commit that was originally identified as the bad commit. If it is any consolation to you, Juergen, I think the greatest problem is the silence of the drm/i915 maintainers, and Thorsten also expressed some dissatisfaction because of that, but since there is also some consensus that the fix should be done in x86 or x86/pat instead of in drm/i915, another problem is the lack of initiative by the x86 developers to fix it. If they do not know how to fix it and need to rely on someone with Xen expertise, they should be giving you more assistance and feedback than they currently are. 
So far, only Boris shows any interest, and now my only critique of your behavior is that in your message, you chose to engage in an ad hominem attack against me instead of taking the same amount of time to at least briefly answer the questions Boris raised about your patch set over three weeks ago. Your decision to attack me instead of working on the fix was, IMHO, not helpful and constructive. > Accusing me and Boris is not acceptable at all! OK, I understand, now we are even. I have said it is unacceptable to not give greater priority to the regression fix or at least keep interested persons informed if there is a reason to continue to delay a fix, which ordinarily should only take two weeks, but now we are at more than three months. Now, you are saying it is unacceptable for me to accuse you and Boris. OK, so we are even. We each think the other is acting in an unacceptable way. I still think it is unacceptable to not work on the fix and instead engage in ad hominem attacks. Maybe I am wrong. Maybe maintainers are supposed to attack persons who are not maintainers when such outsiders try to help and encourage better cooperation and end the hostile silence by the maintainers who are responsible to fix this. But that does not make sense to me. It makes sense to hold accountable those persons who are responsible for fixing this (and you, Juergen, are not the one that needs to be held accountable). AFAICT, that is not being done and instead I am being attacked for trying to get work towards a fix rolling again. > > > I also think there is a private agreement between Juergen and Boris to > > fix this regression because AFAICT there is no evidence in the public > > mailing lists that such an agreement has been reached, yet Boris yesterday > > told me on the public mailing lists in this thread to be "patient" and that > > "we will fix this soon." 
Unless I am missing something, and I hope I am, > > the only way that a fix could be coming "soon" would be to presume > > that Juergen and Boris have agreed to a fix for the regression in private. > > > > However, AFAICT, keeping their solution private would be a violation of > > netiquette as described here: > > > > https://people.kernel.org/tglx/notes-about-netiquette > > > > where a whole section is devoted to the importance of keeping the > > discussion of changes to the kernel in public, with private discussions > > being a violation of the netiquette that governs the discussions that > > take place between persons interested in the Linux kernel project and > > other open source projects. > > Another uncalled for attack. I am just asking for some transparency and an indication that a fix is really and truly in sight. It would only take you a few minutes to fulfill what I am asking you to do now. The fact is, Boris commented on your patches over three weeks ago and asked you if you accepted the approach he outlined and you have remained silent. That does not indicate you and Boris are close to coming to a fix even though Boris stated that a fix is coming soon. Based on what has been said on the mailing lists, I just don't see the fix coming soon. That's all I can say about it now. > > After sending the patches I just told Boris via IRC that I wouldn't react > to any responses soon, as I was about to start my vacation. That is certainly a valid reason to delay work on this - you were on vacation. I hope you enjoyed yourself and had a good time. But I had no way of knowing this because I was not part of the IRC communication, so I cannot be blamed for not knowing this. > I will continue with the patches as soon as I find time to do so. 
I am willing to wait patiently for you to get back to these patches, and I hope you can agree that you should find a few minutes to confirm or deny Boris' statement that a fix is coming "soon" by posting a public message to this thread within the next two weeks, given that this regression has not been fixed for over three months. I will not be upset if you say something like: "it looks like it might take a while for Boris and me to work out the details of a fix, it might take until the end of the year," and briefly explain why there will be a delay. Boris might not like that because it would contradict his statement that a fix is coming "soon" but I would rather be told the truth - that the fix is going to be delayed, than be told a lie - that a fix is coming soon. Thanks for all the work you do. Best regards, Chuck
Hi Thorsten, I am forwarding this to you to help you cut through the noise. Unfortunately the discussion of fixes for this regression has degenerated into ad hominem attacks. I admit that I started complaining about the response of the maintainers to this regression and now they are attacking me. I do apologize, but I do not want to over-apologize. I do not apologize for trying to get the fix for this regression rolling again. After all, it has been over three months since the regression was first reported. I don't think I should be accused of doing anything wrong just for asking for some transparency, honesty, and a realistic estimate for how long it will take before a fix is committed from the maintainers responsible for and working on a fix for this regression. I do want you to provide some feedback here on the public mailing lists. I present the following message which cuts out the noise and I think describes fairly completely the problems that are preventing a fix for this regression from getting merged into the mainline kernel. Can you weigh in with your opinion about what should be done now? Best regards, Chuck On 8/14/2022 11:23 PM, Chuck Zmudzinski wrote: > On 8/14/22 4:08 AM, Juergen Gross wrote: > > > On 8/13/2022 12:56 PM, Chuck Zmudzinski wrote: > > > > > > This is a fairly long message but I think what I need to say > > > here is important for the future success of Linux and open > > > source software, so here goes.... > > > > > > Update: I accept Boris Petkov's response to me yesterday as reasonable > > > and acceptable if within two weeks he at least explains on the public > > > mailing lists how he and Juergen have privately agreed to fix this regression > > > "soon" if he does not actually fix the regression by then with a commit, > > > patch set, or merge. 
The two-week time frame is from here: > > > > > > https://www.kernel.org/doc/html/latest/process/handling-regressions.html > > > > > > where developers and maintainers are exhorted as follows: "Try to fix > > > regressions quickly once the culprit has been identified; fixes for most > > > regressions should be merged within two weeks, but some need to be > > > resolved within two or three days." > > > > And some more citations from the same document: > > > > "Prioritize work on handling regression reports and fixing regression over all > > other Linux kernel work, unless the latter concerns acute security issues or > > bugs causing data loss or damage." > > > > First thing to note here: "over all Linux kernel work". I'm not only working > > on the kernel, but I have other responsibilities e.g. in the Xen community, > > where I was sending patches for fixing a regression and where I'm quite busy > > doing security related work. Apart from that I'm of course responsible for > > handling SUSE customers' bug reports at a rather high priority. So please stop > > accusing me of ignoring the responses to these patches. This is just not really > > motivating me to continue interacting with you. > > You are busy, and that is always true for someone with your responsibilities. > That is an acceptable reason to delay your responses for a time. > > > > > "Always consider reverting the culprit commits and reapplying them later > > together with necessary fixes, as this might be the least dangerous and quickest > > way to fix a regression." > > > > I didn't introduce the regression, nor was it introduced in my area of > > maintainership. It just happened to hit Xen. So I stepped up after Jan's patches > > were not deemed to be the way to go, and I wrote the patches in spite of me > > having other urgent work to do. In case you are feeling so strong about the fix > > of the regression, why don't you ask for the patch introducing it to be reverted > > instead? 
> > I have asked for this on more than one occasion, but I was either > ignored or shot down every time. The fact is, among the persons > who have the power to actually commit a fix, only you and Boris > are currently indicating any willingness to actually fix the regression. > I will say the greater responsibility for this falls on Boris because > he is an x86 maintainer, and you have every right to walk away > and say "I will not work on a fix," and I would not blame you or accuse > you of doing anything wrong if you did that. You are under no obligation > to fix this. Boris is the one who must fix it, or the Intel developers, > by reverting the commit that was originally identified as the bad > commit. > > If it is any consolation to you, Juergen, I think the greatest problem > is the silence of the drm/i915 maintainers, and Thorsten also expressed > some dissatisfaction because of that, but since there is also some > consensus that the fix should be done in x86 or x86/pat instead of > in drm/i915, another problem is the lack of initiative by the x86 > developers to fix it. If they do not know how to fix it and need to > rely on someone with Xen expertise, they should be giving you > more assistance and feedback than they currently are. So far, only > Boris shows any interest, and now my only critique of your behavior > is that in your message, you chose to engage in an ad hominem attack > against me instead of taking the same amount of time to at least > briefly answer the questions Boris raised about your patch set over > three weeks ago. Your decision to attack me instead of working on > the fix was, IMHO, not helpful and constructive. > > Accusing me and Boris is not acceptable at all! > > OK, I understand, now we are even. 
I have said it is unacceptable to > not give greater priority to the regression fix or at least keep interested > persons informed if there is a reason to continue to delay a fix, which > ordinarily should only take two weeks, but now we are at more than > three months. Now, you are saying it is unacceptable for me to accuse > you and Boris. OK, so we are even. We each think the other is acting > in an unacceptable way. I still think it is unacceptable to not work on > the fix and instead engage in ad hominem attacks. Maybe I am wrong. > Maybe maintainers are supposed to attack persons who are not > maintainers when such outsiders try to help and encourage better > cooperation and end the hostile silence by the maintainers who are > responsible to fix this. But that does not make sense to me. It makes > sense to hold accountable those persons who are responsible for fixing > this (and you, Juergen, are not the one that needs to be held accountable). > AFAICT, that is not being done and instead I am being attacked for trying > to get work towards a fix rolling again. > > > > > > I also think there is a private agreement between Juergen and Boris to > > > fix this regression because AFAICT there is no evidence in the public > > > mailing lists that such an agreement has been reached, yet Boris yesterday > > > told me on the public mailing lists in this thread to be "patient" and that > > > "we will fix this soon." Unless I am missing something, and I hope I am, > > > the only way that a fix could be coming "soon" would be to presume > > > that Juergen and Boris have agreed to a fix for the regression in private. 
> > > > > > However, AFAICT, keeping their solution private would be a violation of > > > netiquette as described here: > > > > > > https://people.kernel.org/tglx/notes-about-netiquette > > > > > > where a whole section is devoted to the importance of keeping the > > > discussion of changes to the kernel in public, with private discussions > > > being a violation of the netiquette that governs the discussions that > > > take place between persons interested in the Linux kernel project and > > > other open source projects. > > > > Another uncalled for attack. > > I am just asking for some transparency and an indication that > a fix is really and truly in sight. It would only take you a few > minutes to fulfill what I am asking you to do now. The fact is, > Boris commented on your patches over three weeks ago and > asked you if you accepted the approach he outlined and you > have remained silent. That does not indicate you and Boris > are close to coming to a fix even though Boris stated that a fix > is coming soon. Based on what has been said on the mailing > lists, I just don't see the fix coming soon. That's all I can say > about it now. > > > > > After sending the patches I just told Boris via IRC that I wouldn't react > > to any responses soon, as I was about to start my vacation. > > That is certainly a valid reason to delay work on this - you were on > vacation. I hope you enjoyed yourself and had a good time. But I > had no way of knowing this because I was not part of the IRC > communication, so I cannot be blamed for not knowing this. > > > I will continue with the patches as soon as I find time to do so. > > I am willing to wait patiently for you to get back to these patches, > and I hope you can agree that you should find a few minutes > to confirm or deny Boris' statement that a fix is coming "soon" > by posting a public message to this thread within the next two > weeks, given that this regression has not been fixed for over three > months. 
I will not be upset if you say something like: "it looks like > it might take a while for Boris and me to work out the details of a fix, > it might take until the end of the year," and briefly explain why there > will be a delay. Boris might not like that because it would contradict > his statement that a fix is coming "soon" but I would rather be told > the truth - that the fix is going to be delayed, than be told a lie - that > a fix is coming soon. > > Thanks for all the work you do. > > Best regards, > > Chuck
Hi Chuck! On 15.08.22 18:56, Chuck Zmudzinski wrote: > > I am forwarding this to you to help you cut through the noise. Sorry for not replying earlier, I ignored this thread and all other non-urgent mail in the past two weeks: I was on vacation until a few days ago and when I came home I had to deal with some other stuff first. > I do not apologize for trying to get > the fix for this regression rolling again. Yeah, it's important to ensure regressions don't simply fall through the cracks, but my advice in this case: let things rest for a few days now, the right people have the issue on their radar again; give them time to breathe and work out a solution: it's not something that can be fixed easily within a few minutes by one person alone, as previous discussions have shown (also keep in mind that the merge window was open until yesterday, which keeps many maintainers quite busy). And FWIW: I've seen indicators that a solution to resolve this is hopefully pretty close now. > After all, it has been over three months > since the regression was first reported. Yes, things take/took too long, as a few things were far from ideal in how this regression was dealt with. But that happens sometimes, we're all just human and make errors. I made a few as well and learned a thing or two from them. Due to that I'll do a few things slightly differently in the future to hopefully get similar situations resolved a lot quicker. Ciao, Thorsten
On 15.08.22 20:17, Chuck Zmudzinski wrote: > On 8/15/2022 2:00 PM, Thorsten Leemhuis wrote: > >> the right people have the issue on their radar again; give them time to >> breathe and work out a solution: it's not something that can be fixed >> easily within a few minutes by one person alone, as previous discussions >> have shown (also keep in mind that the merge window was open until >> yesterday, which keeps many maintainers quite busy). >> >> And FWIW: I've seen indicators that a solution to resolve this is >> hopefully pretty close now. > > That's good to know. But I must ask, can you provide a link to a public > discussion that indicates a fix is close? I just searched for the commit id of the culprit yesterday like this: https://lore.kernel.org/all/?q=bdd8b6c982* Which brought me to this message, which looks like Boris applied a slightly(?) modified version of Jan's patch to a branch that afaik is regularly pushed to Linus: https://lore.kernel.org/all/166055884287.401.612271624942869534.tip-bot2@tip-bot2/ So unless problems show up in linux-next I expect this will land in master soon (and a bit later be backported to stable due to the CC stable tag). > Or do you know a fix is close > because of private discussions? That distinction is important to me > because open source software is much less useful to me if the solutions > to problems are not discussed openly (except, of course, for solutions > to security vulnerabilities that are not yet public). IMHO you are expecting a bit too much here. Solutions to problems in open source software get discussed on various, sometimes private channels all the time. Just take conferences for example, where people discuss them during talks, meetings, or in one-to-ones over coffee; sometimes they are the only way to solve complex problems. But as you can see from the link above it's not like anybody is trying to sneak things into the kernel. Ciao, Thorsten
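The search Thorsten describes can be reproduced for any culprit commit. The following is only a small sketch of how such a query URL could be built; the helper name `lore_search_url` and the default prefix length of 10 are assumptions made for illustration, chosen so that the result matches the URL quoted in the message above.

```python
from urllib.parse import quote

# Base URL of the "all lists" archive, as used in the message above.
LORE_ALL = "https://lore.kernel.org/all/"


def lore_search_url(commit_id, prefix_len=10):
    """Build a search URL for an abbreviated commit id with a trailing wildcard.

    Hypothetical helper: prefix_len=10 is an assumption that mirrors the
    abbreviation used in the thread, not a rule of the archive.
    """
    term = commit_id[:prefix_len] + "*"
    # Keep "*" unescaped so the URL reads like the one quoted in the thread.
    return LORE_ALL + "?q=" + quote(term, safe="*")


print(lore_search_url("bdd8b6c98239"))
```

Truncating the id and appending a wildcard lets the query match messages that cite the commit with a longer abbreviation.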
On 16.08.22 18:16, Chuck Zmudzinski wrote: > On 8/16/2022 10:41 AM, Thorsten Leemhuis wrote: >> On 15.08.22 20:17, Chuck Zmudzinski wrote: >>> On 8/15/2022 2:00 PM, Thorsten Leemhuis wrote: >>> >>>> And FWIW: I've seen indicators that a solution to resolve this is >>>> hopefully pretty close now. >>> That's good to know. But I must ask, can you provide a link to a public >>> discussion that indicates a fix is close? >> I just searched for the commit id of the culprit yesterday like this: >> https://lore.kernel.org/all/?q=bdd8b6c982* >> >> Which brought me to this message, which looks like Boris applied a >> slightly(?) modified version of Jan's patch to a branch that afaik is >> regularly pushed to Linus: >> https://lore.kernel.org/all/166055884287.401.612271624942869534.tip-bot2@tip-bot2/ >> >> So unless problems show up in linux-next I expect this will land in >> master soon (and a bit later be backported to stable due to the CC >> stable tag). > > OK, that's exactly the kind of thing I am looking for. It would be > nice if regzbot could have found that patch in that tree and > display it in the web interface as a notable patch. Currently, > regzbot is only linking to a dead patch that does not even fix > the regression as a notable patch associated with this regression. > > If regzbot is not yet smart enough to find it, could you take the > time to manually intervene with a regzbot command so that > patch is displayed as a notable patch for this regression? regzbot will notice when the patch hits linux-next, where many changes land and hang around for a few days before they hit mainline. Watching all the different development trees would be possible and would catch this patch earlier, but I'm not sure that's worth the work. Maybe regzbot will do that one day, but there are more important missing features on my todo list for now. Ciao, Thorsten
On 8/16/22 1:28 PM, Chuck Zmudzinski wrote: > On 8/16/2022 12:53 PM, Thorsten Leemhuis wrote: > > > > > regzbot will notice when the patch hits linux-next, > > IIUC, regzbot might not notice because the patch lacks a Link: tag > to the original regression report. The Link tag is to Jan's patch > that was posted sometime in April, I think, which also lacks the > Link tag to the original report of the regression which did not > happen until May 4. If regzbot is smart enough to notice that the > patch also has a Fixes: tag for the commit that was identified as > bad in the original regression report, then I expect regzbot will > find it. Hey, I see the patch hit linux-next and regzbot noticed and now lists the patch as an incoming fix. Great job with regzbot! By the way, I think regzbot is a great idea, and I think any resources devoted to developing it further would pay handsome returns for the quality of Linux. If no one but you is working on it, I actually might be willing to volunteer some time to help you develop it. Best regards, Chuck
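The two ways a patch can point back at a regression, as discussed above (a Link: tag to the report, or a Fixes: tag naming the culprit commit), can be illustrated with a short sketch. This is not regzbot's actual implementation, which the thread does not show; the function names `parse_trailers` and `match_regression` and the `example.invalid` URLs are hypothetical and exist only for the example. The commit id and its subject are the ones from the Fixes: tag quoted earlier in this thread.

```python
import re

# Matches "Fixes:" and "Link:" trailer lines in a commit message.
TRAILER_RE = re.compile(r"^(Fixes|Link):[ \t]*(\S.*)$", re.MULTILINE)


def parse_trailers(message):
    """Collect the values of Fixes: and Link: trailers, keyed by trailer name."""
    trailers = {"Fixes": [], "Link": []}
    for name, value in TRAILER_RE.findall(message):
        trailers[name].append(value.strip())
    return trailers


def match_regression(message, culprit, report_url):
    """Return "link", "fixes", or None, depending on how the patch refers to the regression."""
    trailers = parse_trailers(message)
    # Preferred case: the patch links directly to the regression report.
    if report_url in trailers["Link"]:
        return "link"
    # Fallback: the patch names the culprit commit in a Fixes: tag.
    for value in trailers["Fixes"]:
        commit_id = value.split()[0]
        # Either id may be abbreviated, so accept a prefix match in both directions.
        if culprit.startswith(commit_id) or commit_id.startswith(culprit):
            return "fixes"
    return None


# Hypothetical commit message: the Link: tag points at an earlier patch,
# not at the regression report, so only the Fixes: tag can match.
example = """x86/PAT: hypothetical subject line

Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
Link: https://example.invalid/earlier-patch
"""

print(match_regression(example, "bdd8b6c98239", "https://example.invalid/original-report"))
```

In this sketch the patch is still associated with the regression through its Fixes: tag even though its Link: tag points elsewhere, which is the situation described in the message above.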