Message ID: aa33f1e4-5a91-aaaf-70f1-557148b29b38@linaro.org
State: New
On 5/29/2016 11:11 PM, Minchan Kim wrote:
> On Fri, May 27, 2016 at 11:16:41AM -0700, Shi, Yang wrote:
>
> <snip>
>
>>>
>>> If we go this way, how do we guarantee against this race?
>>
>> Thanks for pointing this out. It sounds reasonable. However, this
>> should only be possible on 32 bit, since only the 32-bit version of
>> page_is_idle() calls lookup_page_ext(); it doesn't do so on 64 bit.
>>
>> And such a race condition should exist regardless of whether
>> DEBUG_VM is enabled, right?
>>
>> RCU might be good enough to protect it.
>>
>> A quick fix may look like:
>>
>> diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
>> index 8f5d4ad..bf0cd6a 100644
>> --- a/include/linux/page_idle.h
>> +++ b/include/linux/page_idle.h
>> @@ -77,8 +77,12 @@ static inline bool test_and_clear_page_young(struct page *page)
>>  static inline bool page_is_idle(struct page *page)
>>  {
>>  	struct page_ext *page_ext;
>> +
>> +	rcu_read_lock();
>>  	page_ext = lookup_page_ext(page);
>> +	rcu_read_unlock();
>> +
>>  	if (unlikely(!page_ext))
>>  		return false;
>>
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 56b160f..94927c9 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -183,7 +183,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>>  {
>>  	unsigned long pfn = page_to_pfn(page);
>>  	struct mem_section *section = __pfn_to_section(pfn);
>> -#if defined(CONFIG_DEBUG_VM) || defined(CONFIG_PAGE_POISONING)
>>  	/*
>>  	 * The sanity checks the page allocator does upon freeing a
>>  	 * page can reach here before the page_ext arrays are
>> @@ -195,7 +194,7 @@ struct page_ext *lookup_page_ext(struct page *page)
>>  	 */
>>  	if (!section->page_ext)
>>  		return NULL;
>> -#endif
>> +
>>  	return section->page_ext + pfn;
>>  }
>>
>> @@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn)
>>  		return;
>>  	base = ms->page_ext + pfn;
>>  	free_page_ext(base);
>> -	ms->page_ext = NULL;
>> +	rcu_assign_pointer(ms->page_ext, NULL);
>> +	synchronize_rcu();
>
> How does it fix the problem?
> I cannot understand your point.

Assigning a NULL pointer to page_ext will be blocked until the
rcu_read_lock() critical section is done, so the lookup and write
operations will be serialized. And rcu_read_lock() disables
preemption too.

Yang

>
>>  }
>>
>>  static int __meminit online_page_ext(unsigned long start_pfn,
>>
>> Thanks,
>> Yang
>>
>>>
>>> kpageflags_read
>>>   stable_page_flags
>>>     page_is_idle
>>>       lookup_page_ext
>>>         section = __pfn_to_section(pfn)
>>>                               offline_pages
>>>                               memory_notify(MEM_OFFLINE)
>>>                                 offline_page_ext
>>>                                   ms->page_ext = NULL
>>>         section->page_ext + pfn
>>>
>>>>
>>>> Thanks.
>>>>
>>
On 6/1/2016 10:00 PM, Minchan Kim wrote:
> On Wed, Jun 01, 2016 at 01:40:48PM -0700, Shi, Yang wrote:
>> On 5/29/2016 11:11 PM, Minchan Kim wrote:
>>> On Fri, May 27, 2016 at 11:16:41AM -0700, Shi, Yang wrote:
>>>
>>> <snip>
>>>
>>>>>
>>>>> If we go this way, how do we guarantee against this race?
>>>>
>>>> Thanks for pointing this out. It sounds reasonable. However, this
>>>> should only be possible on 32 bit, since only the 32-bit version
>>>> of page_is_idle() calls lookup_page_ext(); it doesn't do so on
>>>> 64 bit.
>>>>
>>>> And such a race condition should exist regardless of whether
>>>> DEBUG_VM is enabled, right?
>>>>
>>>> RCU might be good enough to protect it.
>>>>
>>>> A quick fix may look like:
>>>>
>>>> <snip - same page_idle.h/page_ext.c diff as quoted above>
>>>>
>>>> @@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn)
>>>>  		return;
>>>>  	base = ms->page_ext + pfn;
>>>>  	free_page_ext(base);
>>>> -	ms->page_ext = NULL;
>>>> +	rcu_assign_pointer(ms->page_ext, NULL);
>>>> +	synchronize_rcu();
>>>
>>> How does it fix the problem?
>>> I cannot understand your point.
>>
>> Assigning a NULL pointer to page_ext will be blocked until the
>> rcu_read_lock() critical section is done, so the lookup and write
>> operations will be serialized. And rcu_read_lock() disables
>> preemption too.
>
> I meant your rcu_read_lock in page_idle should cover the test_bit op.

Yes, definitely. Thanks for catching it.

> One more thing, you should use rcu_dereference.

I will check which one is the best since I saw some use
rcu_assign_pointer.

> As well, please cover the memory onlining case I mentioned in another
> thread as well as memory offlining.

I will look into it too.

Thanks,
Yang

> Anyway, to me, every caller of page_ext should prepare for
> lookup_page_ext returning NULL anytime, and they should use
> rcu_read_[un]lock, which is not good. :(
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 8f5d4ad..bf0cd6a 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -77,8 +77,12 @@ static inline bool test_and_clear_page_young(struct page *page)
 static inline bool page_is_idle(struct page *page)
 {
 	struct page_ext *page_ext;
+
+	rcu_read_lock();
 	page_ext = lookup_page_ext(page);
+	rcu_read_unlock();
+
 	if (unlikely(!page_ext))
 		return false;

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 56b160f..94927c9 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -183,7 +183,6 @@ struct page_ext *lookup_page_ext(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page);
 	struct mem_section *section = __pfn_to_section(pfn);
-#if defined(CONFIG_DEBUG_VM) || defined(CONFIG_PAGE_POISONING)
 	/*
 	 * The sanity checks the page allocator does upon freeing a
 	 * page can reach here before the page_ext arrays are
@@ -195,7 +194,7 @@ struct page_ext *lookup_page_ext(struct page *page)
 	 */
 	if (!section->page_ext)
 		return NULL;
-#endif
+
 	return section->page_ext + pfn;
 }

@@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn)
 		return;
 	base = ms->page_ext + pfn;
 	free_page_ext(base);
-	ms->page_ext = NULL;
+	rcu_assign_pointer(ms->page_ext, NULL);
+	synchronize_rcu();
 }

 static int __meminit online_page_ext(unsigned long start_pfn,