diff mbox series

scripts/checkpatch.pl: Enforce multiline comment syntax

Message ID 20180809160011.17225-1-peter.maydell@linaro.org
State Superseded
Headers show
Series scripts/checkpatch.pl: Enforce multiline comment syntax | expand

Commit Message

Peter Maydell Aug. 9, 2018, 4 p.m. UTC
We now require Linux-kernel-style multiline comments:
    /*
     * line one
     * line two
     */

Enforce this in checkpatch.pl, by backporting the relevant
parts of the Linux kernel's checkpatch.pl. (The only changes
needed are that Linux's checkpatch.pl WARN() function takes
an extra argument that ours does not.)

The kernel's checkpatch does not enforce "leading /* on
a line of its own, so that part is unique to QEMU's checkpatch.

Sample warning output:

WARNING: Block comments use a leading /* on a separate line
#34: FILE: hw/intc/arm_gicv3_common.c:39:
+    /* Older versions of QEMU had a bug in the handling of state save/restore

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

---
I'm still not used to the leeading-/*-on-it's-own style,
so having checkpatch catch my lapses is handy...

I used WARN level severity mostly because Linux does.
---
 scripts/checkpatch.pl | 48 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

-- 
2.17.1

Comments

Paolo Bonzini Aug. 9, 2018, 4:43 p.m. UTC | #1
On 09/08/2018 18:00, Peter Maydell wrote:
> The kernel's checkpatch does not enforce "leading /* on

> a line of its own, so that part is unique to QEMU's checkpatch.


Yeah, that's because for some reason the Linux network subsystem uses
the style without the lone leading "/*".  Which is actually what QEMU is
using in most of the code, I think, so...

> I'm still not used to the leeading-/*-on-it's-own style,

> so having checkpatch catch my lapses is handy...


... if it's not what we are using, why enforce it?

Paolo
Peter Maydell Aug. 9, 2018, 5:03 p.m. UTC | #2
On 9 August 2018 at 17:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 09/08/2018 18:00, Peter Maydell wrote:

>> The kernel's checkpatch does not enforce "leading /* on

>> a line of its own, so that part is unique to QEMU's checkpatch.

>

> Yeah, that's because for some reason the Linux network subsystem uses

> the style without the lone leading "/*".


Yes. Linux checkpatch warns if you use lone-leading-/* in the
network subsystem. What it doesn't do but should is warn you
if you *don't* use lone-leading-/* anywhere else. (I've had
kernel patches nitpicked for getting this wrong before.)
If I'm feeling enthusiastic enough I might submit a patch
to Linux checkpatch to do

   if (networking subsystem) {
        warn if lone leading opener used;
   } else {
        warn if lone leading opener not used;
   }

> Which is actually what QEMU is

> using in most of the code, I think, so...

>

>> I'm still not used to the leeading-/*-on-it's-own style,

>> so having checkpatch catch my lapses is handy...

>

> ... if it's not what we are using, why enforce it?


See the enormous long threads on the recent changes to CODING_STYLE:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00696.html
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02717.html

Basically, I wanted to rule out things like

  /* this
     weirdness */

and lots of other people wanted (a) to not have

  /* this thing
   * which I think is fine
   */

and (b) to consistently define only one format as OK.

So I accepted having my personal preferred format not being
permitted in order to get consensus on getting rid of the
formats I think are really ugly :-)

thanks
-- PMM
Markus Armbruster Aug. 10, 2018, 6:22 a.m. UTC | #3
Peter Maydell <peter.maydell@linaro.org> writes:

> We now require Linux-kernel-style multiline comments:

>     /*

>      * line one

>      * line two

>      */

>

> Enforce this in checkpatch.pl, by backporting the relevant

> parts of the Linux kernel's checkpatch.pl. (The only changes

> needed are that Linux's checkpatch.pl WARN() function takes

> an extra argument that ours does not.)


Really?  [*]

>

> The kernel's checkpatch does not enforce "leading /* on

> a line of its own, so that part is unique to QEMU's checkpatch.

>

> Sample warning output:

>

> WARNING: Block comments use a leading /* on a separate line

> #34: FILE: hw/intc/arm_gicv3_common.c:39:

> +    /* Older versions of QEMU had a bug in the handling of state save/restore

>

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

> I'm still not used to the leeading-/*-on-it's-own style,

> so having checkpatch catch my lapses is handy...


Yes, please!

> I used WARN level severity mostly because Linux does.

> ---

>  scripts/checkpatch.pl | 48 +++++++++++++++++++++++++++++++++++++++++++

>  1 file changed, 48 insertions(+)

>

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl

> index 42e1c50dd80..18bc3c59c85 100755

> --- a/scripts/checkpatch.pl

> +++ b/scripts/checkpatch.pl

> @@ -1566,6 +1566,54 @@ sub process {

>  # check we are in a valid C source file if not then ignore this hunk

>  		next if ($realfile !~ /\.(h|c|cpp)$/);

>  

> +# Block comment styles

> +

> +		# Block comments use /* on a line of its own

> +		if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/

> +		    $rawline =~ m@^\+.*/\*\*?[ \t]*.+[ \t]*$@) { # /* or /** non-blank

> +			WARN("Block comments use a leading /* on a separate line\n" . $herecurr);

> +		}

> +


[*] The kernel's has

   # Networking with an initial /*
                   if ($realfile =~ m@^(drivers/net/|net/)@ &&
                       $prevrawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&
                       $rawline =~ /^\+[ \t]*\*/ &&
                       $realline > 2) {
                           WARN("NETWORKING_BLOCK_COMMENT_STYLE",
                                "networking block comments don't use an empty /* line, use /* Comment...\n" . $hereprev);
                   }

which makes no sense for us.  Your code looks good to me, but your
commit message claims it doesn't exist :)

The remainder matches the kernel's version.

> +# Block comments use * on subsequent lines

> +		if ($prevline =~ /$;[ \t]*$/ &&			#ends in comment

> +		    $prevrawline =~ /^\+.*?\/\*/ &&		#starting /*

> +		    $prevrawline !~ /\*\/[ \t]*$/ &&		#no trailing */

> +		    $rawline =~ /^\+/ &&			#line is new

> +		    $rawline !~ /^\+[ \t]*\*/) {		#no leading *

> +			WARN("Block comments use * on subsequent lines\n" . $hereprev);

> +		}

> +

> +# Block comments use */ on trailing lines

> +		if ($rawline !~ m@^\+[ \t]*\*/[ \t]*$@ &&	#trailing */

> +		    $rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/

> +		    $rawline !~ m@^\+.*\*{2,}/[ \t]*$@ &&	#trailing **/

> +		    $rawline =~ m@^\+[ \t]*.+\*\/[ \t]*$@) {	#non blank */

> +			WARN("Block comments use a trailing */ on a separate line\n" . $herecurr);

> +		}

> +

> +# Block comment * alignment

> +		if ($prevline =~ /$;[ \t]*$/ &&			#ends in comment

> +		    $line =~ /^\+[ \t]*$;/ &&			#leading comment

> +		    $rawline =~ /^\+[ \t]*\*/ &&		#leading *

> +		    (($prevrawline =~ /^\+.*?\/\*/ &&		#leading /*

> +		      $prevrawline !~ /\*\/[ \t]*$/) ||		#no trailing */

> +		     $prevrawline =~ /^\+[ \t]*\*/)) {		#leading *

> +			my $oldindent;

> +			$prevrawline =~ m@^\+([ \t]*/?)\*@;

> +			if (defined($1)) {

> +				$oldindent = expand_tabs($1);

> +			} else {

> +				$prevrawline =~ m@^\+(.*/?)\*@;

> +				$oldindent = expand_tabs($1);

> +			}

> +			$rawline =~ m@^\+([ \t]*)\*@;

> +			my $newindent = $1;

> +			$newindent = expand_tabs($newindent);

> +			if (length($oldindent) ne length($newindent)) {

> +				WARN("Block comments should align the * on each line\n" . $hereprev);

> +			}

> +		}

> +

>  # Check for potential 'bare' types

>  		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,

>  		    $realline_next);


With a corrected commit message:
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Paolo Bonzini Aug. 10, 2018, 8:34 a.m. UTC | #4
On 09/08/2018 19:03, Peter Maydell wrote:
> On 9 August 2018 at 17:43, Paolo Bonzini <pbonzini@redhat.com> wrote:

>>> I'm still not used to the leeading-/*-on-it's-own style,

>>> so having checkpatch catch my lapses is handy...

>>

>> ... if it's not what we are using, why enforce it?

> 

> See the enormous long threads on the recent changes to CODING_STYLE:

> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00696.html

> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02717.html

> 

> Basically, I wanted to rule out things like

> 

>   /* this

>      weirdness */

> 

> and lots of other people wanted (a) to not have

> 

>   /* this thing

>    * which I think is fine

>    */

> 

> and (b) to consistently define only one format as OK.

> 

> So I accepted having my personal preferred format not being

> permitted in order to get consensus on getting rid of the

> formats I think are really ugly :-)


This is one of the cases where we are decently consistent:

Lone "/*" or "/**": 9986 cases
	of which in the first column: 7617
	of which the first line in the file (license headers): 2834
	regex: ^[ \t]*/\*\*?[ \t]*$

"/*" with the first line of the comment: 11246
	of which in the first column: 4985
	of which the first line in the file: 97
	regex: ^[ \t]*/\*\*?+(?:(?!\*/).)+?$

License headers almost always have the "lone /*" format.  Apart from
license headers, 63% of the comments have the now-deprecated format.

Inside functions, 73% of the comments have the now-deprecated format.
Outside functions it's 50-50.  That's because there are 2024 doc
comments, which in turn are 50% of the comments that are 1) outside the
functions 2) using a lone "/*".

So my proposal, which is actually consistent with what QEMU is doing, is
the following:

1) the first line of a file should always be "/*", otherwise warn

2) a comment that starts with "/**" should have it on a lone line

3) every other multiline comment should start with
"/*<whitespace><something>"

Yes, there is overlap between QEMU and Linux developers, but really only
in a few subsystems (s390, pSeries, networking---which uses the "other"
comment style), and I don't see why we should pretend that QEMU and
Linux use similar coding styles.  In fact they couldn't be more
different: spaces vs. tabs, indent-4 vs. indent-8, camelcase struct
names with typedefs...  Basically the only thing that is the same is
lowercase for variable names and braces on the same line as the
statement.  Linux's checkpatch was a useful base not because Linux and
QEMU are similar, but only because of the complex expression parsing
stuff that really is the same for _any_ sane coding style (even GNU ;)).

Paolo
Peter Maydell Aug. 10, 2018, 9:07 a.m. UTC | #5
On 10 August 2018 at 07:22, Markus Armbruster <armbru@redhat.com> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:

>

>> We now require Linux-kernel-style multiline comments:

>>     /*

>>      * line one

>>      * line two

>>      */

>>

>> Enforce this in checkpatch.pl, by backporting the relevant

>> parts of the Linux kernel's checkpatch.pl. (The only changes

>> needed are that Linux's checkpatch.pl WARN() function takes

>> an extra argument that ours does not.)

>

> Really?  [*]


Yes; the parts that I have taken from checkpatch.pl
are only modified by adjusting the WARN() function calls.

>> The kernel's checkpatch does not enforce "leading /* on

>> a line of its own, so that part is unique to QEMU's checkpatch.

>>

>> Sample warning output:

>>

>> WARNING: Block comments use a leading /* on a separate line

>> #34: FILE: hw/intc/arm_gicv3_common.c:39:

>> +    /* Older versions of QEMU had a bug in the handling of state save/restore

>>

>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>> ---

>> I'm still not used to the leeading-/*-on-it's-own style,

>> so having checkpatch catch my lapses is handy...

>

> Yes, please!

>

>> I used WARN level severity mostly because Linux does.

>> ---

>>  scripts/checkpatch.pl | 48 +++++++++++++++++++++++++++++++++++++++++++

>>  1 file changed, 48 insertions(+)

>>

>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl

>> index 42e1c50dd80..18bc3c59c85 100755

>> --- a/scripts/checkpatch.pl

>> +++ b/scripts/checkpatch.pl

>> @@ -1566,6 +1566,54 @@ sub process {

>>  # check we are in a valid C source file if not then ignore this hunk

>>               next if ($realfile !~ /\.(h|c|cpp)$/);

>>

>> +# Block comment styles

>> +

>> +             # Block comments use /* on a line of its own

>> +             if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&      #inline /*...*/

>> +                 $rawline =~ m@^\+.*/\*\*?[ \t]*.+[ \t]*$@) { # /* or /** non-blank

>> +                     WARN("Block comments use a leading /* on a separate line\n" . $herecurr);

>> +             }

>> +


This is the bit that is "unique to QEMU's checkpatch",
per the commit message.


> [*] The kernel's has

>

>    # Networking with an initial /*

>                    if ($realfile =~ m@^(drivers/net/|net/)@ &&

>                        $prevrawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&

>                        $rawline =~ /^\+[ \t]*\*/ &&

>                        $realline > 2) {

>                            WARN("NETWORKING_BLOCK_COMMENT_STYLE",

>                                 "networking block comments don't use an empty /* line, use /* Comment...\n" . $hereprev);

>                    }

>

> which makes no sense for us.


This is a part which I did not take from kernel checkpatch;
it is not a "relevant part", per the commit message.

> With a corrected commit message:


The commit message still makes sense to me.

> Reviewed-by: Markus Armbruster <armbru@redhat.com>


thanks
-- PMM
Peter Maydell Aug. 10, 2018, 9:10 a.m. UTC | #6
On 10 August 2018 at 09:34, Paolo Bonzini <pbonzini@redhat.com> wrote:
> This is one of the cases where we are decently consistent:

>

> Lone "/*" or "/**": 9986 cases

>         of which in the first column: 7617

>         of which the first line in the file (license headers): 2834

>         regex: ^[ \t]*/\*\*?[ \t]*$

>

> "/*" with the first line of the comment: 11246

>         of which in the first column: 4985

>         of which the first line in the file: 97

>         regex: ^[ \t]*/\*\*?+(?:(?!\*/).)+?$

>

> License headers almost always have the "lone /*" format.  Apart from

> license headers, 63% of the comments have the now-deprecated format.

>

> Inside functions, 73% of the comments have the now-deprecated format.

> Outside functions it's 50-50.  That's because there are 2024 doc

> comments, which in turn are 50% of the comments that are 1) outside the

> functions 2) using a lone "/*".

>

> So my proposal, which is actually consistent with what QEMU is doing, is

> the following:

>

> 1) the first line of a file should always be "/*", otherwise warn

>

> 2) a comment that starts with "/**" should have it on a lone line

>

> 3) every other multiline comment should start with

> "/*<whitespace><something>"


Personally I would prefer your suggestion, but as I say, there
was no consensus in the thread for it, and there was consensus
for "use the kernel's style here". I don't think we gain much
from reopening the debate at this point.

thanks
-- PMM
Markus Armbruster Aug. 10, 2018, 9:41 a.m. UTC | #7
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/08/2018 19:03, Peter Maydell wrote:

>> On 9 August 2018 at 17:43, Paolo Bonzini <pbonzini@redhat.com> wrote:

>>>> I'm still not used to the leeading-/*-on-it's-own style,

>>>> so having checkpatch catch my lapses is handy...

>>>

>>> ... if it's not what we are using, why enforce it?

>> 

>> See the enormous long threads on the recent changes to CODING_STYLE:

>> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00696.html

>> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02717.html

>> 

>> Basically, I wanted to rule out things like

>> 

>>   /* this

>>      weirdness */

>> 

>> and lots of other people wanted (a) to not have

>> 

>>   /* this thing

>>    * which I think is fine

>>    */

>> 

>> and (b) to consistently define only one format as OK.

>> 

>> So I accepted having my personal preferred format not being

>> permitted in order to get consensus on getting rid of the

>> formats I think are really ugly :-)

>

> This is one of the cases where we are decently consistent:

>

> Lone "/*" or "/**": 9986 cases

> 	of which in the first column: 7617

> 	of which the first line in the file (license headers): 2834

> 	regex: ^[ \t]*/\*\*?[ \t]*$

>

> "/*" with the first line of the comment: 11246

> 	of which in the first column: 4985

> 	of which the first line in the file: 97

> 	regex: ^[ \t]*/\*\*?+(?:(?!\*/).)+?$

>

> License headers almost always have the "lone /*" format.  Apart from

> license headers, 63% of the comments have the now-deprecated format.

>

> Inside functions, 73% of the comments have the now-deprecated format.

> Outside functions it's 50-50.  That's because there are 2024 doc

> comments, which in turn are 50% of the comments that are 1) outside the

> functions 2) using a lone "/*".

>

> So my proposal, which is actually consistent with what QEMU is doing, is

> the following:

>

> 1) the first line of a file should always be "/*", otherwise warn

>

> 2) a comment that starts with "/**" should have it on a lone line

>

> 3) every other multiline comment should start with

> "/*<whitespace><something>"


(3) is not consistent with what QEMU is doing in this case, since
according to your data there is no consistency in what QEMU is doing.

Moreover, requiring winged comment start in some, but not all places is
more complicated than simply requiring it everywhere.

The "enourmous long threads" quoted by Peter resulted in strong
agreement we don't want

   /* this
      weirdness */

and rough consensus to go with

   /*
    * Wing your multiline comment
    * at both ends, please.
    */

going forward.

If there's actually no consensus on this, then the issue devolves to
subsystem maintainers.

For what it's worth, I've been demanding "wing your multi-line comments
at both ends", and intend to continue doing so, unless we adopt a
different project-wide multi-line comment style, enforced by checkpatch.

[...]
Markus Armbruster Aug. 10, 2018, 12:08 p.m. UTC | #8
Peter Maydell <peter.maydell@linaro.org> writes:

> On 10 August 2018 at 07:22, Markus Armbruster <armbru@redhat.com> wrote:

>> Peter Maydell <peter.maydell@linaro.org> writes:

>>

>>> We now require Linux-kernel-style multiline comments:

>>>     /*

>>>      * line one

>>>      * line two

>>>      */

>>>

>>> Enforce this in checkpatch.pl, by backporting the relevant

>>> parts of the Linux kernel's checkpatch.pl. (The only changes

>>> needed are that Linux's checkpatch.pl WARN() function takes

>>> an extra argument that ours does not.)

>>

>> Really?  [*]

>

> Yes; the parts that I have taken from checkpatch.pl

> are only modified by adjusting the WARN() function calls.

>

>>> The kernel's checkpatch does not enforce "leading /* on

>>> a line of its own, so that part is unique to QEMU's checkpatch.

>>>

>>> Sample warning output:

>>>

>>> WARNING: Block comments use a leading /* on a separate line

>>> #34: FILE: hw/intc/arm_gicv3_common.c:39:

>>> +    /* Older versions of QEMU had a bug in the handling of state save/restore

>>>

>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>>> ---

>>> I'm still not used to the leeading-/*-on-it's-own style,

>>> so having checkpatch catch my lapses is handy...

>>

>> Yes, please!

>>

>>> I used WARN level severity mostly because Linux does.

>>> ---

>>>  scripts/checkpatch.pl | 48 +++++++++++++++++++++++++++++++++++++++++++

>>>  1 file changed, 48 insertions(+)

>>>

>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl

>>> index 42e1c50dd80..18bc3c59c85 100755

>>> --- a/scripts/checkpatch.pl

>>> +++ b/scripts/checkpatch.pl

>>> @@ -1566,6 +1566,54 @@ sub process {

>>>  # check we are in a valid C source file if not then ignore this hunk

>>>               next if ($realfile !~ /\.(h|c|cpp)$/);

>>>

>>> +# Block comment styles

>>> +

>>> +             # Block comments use /* on a line of its own

>>> +             if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&      #inline /*...*/

>>> +                 $rawline =~ m@^\+.*/\*\*?[ \t]*.+[ \t]*$@) { # /* or /** non-blank

>>> +                     WARN("Block comments use a leading /* on a separate line\n" . $herecurr);

>>> +             }

>>> +

>

> This is the bit that is "unique to QEMU's checkpatch",

> per the commit message.

>

>

>> [*] The kernel's has

>>

>>    # Networking with an initial /*

>>                    if ($realfile =~ m@^(drivers/net/|net/)@ &&

>>                        $prevrawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&

>>                        $rawline =~ /^\+[ \t]*\*/ &&

>>                        $realline > 2) {

>>                            WARN("NETWORKING_BLOCK_COMMENT_STYLE",

>>                                 "networking block comments don't use an empty /* line, use /* Comment...\n" . $hereprev);

>>                    }

>>

>> which makes no sense for us.

>

> This is a part which I did not take from kernel checkpatch;

> it is not a "relevant part", per the commit message.

>

>> With a corrected commit message:

>

> The commit message still makes sense to me.


It confused me.  Looks like I'm too easily confused today.

>> Reviewed-by: Markus Armbruster <armbru@redhat.com>


Feel free to use this R-by even without commit message improvements.
Paolo Bonzini Aug. 10, 2018, 12:44 p.m. UTC | #9
On 10/08/2018 11:41, Markus Armbruster wrote:
>> Lone "/*" or "/**": 9986 cases

>> 	of which in the first column: 7617

>> 	of which the first line in the file (license headers): 2834

>> 	regex: ^[ \t]*/\*\*?[ \t]*$

>>

>> "/*" with the first line of the comment: 11246

>> 	of which in the first column: 4985

>> 	of which the first line in the file: 97

>> 	regex: ^[ \t]*/\*\*?+(?:(?!\*/).)+?$

>>

>> License headers almost always have the "lone /*" format.  Apart from

>> license headers, 63% of the comments have the now-deprecated format.

>>

>> Inside functions, 73% of the comments have the now-deprecated format.

>> Outside functions it's 50-50.  That's because there are 2024 doc

>> comments, which in turn are 50% of the comments that are 1) outside the

>> functions 2) using a lone "/*".

>>

>> So my proposal, which is actually consistent with what QEMU is doing, is

>> the following:

>>

>> 1) the first line of a file should always be "/*", otherwise warn

>>

>> 2) a comment that starts with "/**" should have it on a lone line

>>

>> 3) every other multiline comment should start with

>> "/*<whitespace><something>"

> (3) is not consistent with what QEMU is doing in this case, since

> according to your data there is no consistency in what QEMU is doing.


65-35, and 75-25 inside functions (comments outside should be doc
comments so that's again a different story) is pretty consistent
actually.  So no patch would be better in your opinion?

Paolo
Paolo Bonzini Aug. 10, 2018, 12:45 p.m. UTC | #10
On 10/08/2018 11:10, Peter Maydell wrote:
>> So my proposal, which is actually consistent with what QEMU is doing, is

>> the following:

>>

>> 1) the first line of a file should always be "/*", otherwise warn

>>

>> 2) a comment that starts with "/**" should have it on a lone line

>>

>> 3) every other multiline comment should start with

>> "/*<whitespace><something>"

> Personally I would prefer your suggestion, but as I say, there

> was no consensus in the thread for it, and there was consensus

> for "use the kernel's style here". I don't think we gain much

> from reopening the debate at this point.


What we lose is that 3000 more new warnings appear.  So if we make an
exception and convert all of the comments, I'm okay.

But otherwise, at least Eric, you, me (only now I admit), Thomas
expressed a preference for the other style; on the other side it's
Markus, Stefan, Conny and Alex, some of whom were okay with applying
maintainer discretion; John and rth wanted a third one but disagreed on
their second choice.  I appreciate your writing the patch, but I'm not
sure that's consensus...

Paolo
Peter Maydell Aug. 10, 2018, 12:53 p.m. UTC | #11
On 10 August 2018 at 13:45, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/08/2018 11:10, Peter Maydell wrote:

>>> So my proposal, which is actually consistent with what QEMU is doing, is

>>> the following:

>>>

>>> 1) the first line of a file should always be "/*", otherwise warn

>>>

>>> 2) a comment that starts with "/**" should have it on a lone line

>>>

>>> 3) every other multiline comment should start with

>>> "/*<whitespace><something>"

>> Personally I would prefer your suggestion, but as I say, there

>> was no consensus in the thread for it, and there was consensus

>> for "use the kernel's style here". I don't think we gain much

>> from reopening the debate at this point.

>

> What we lose is that 3000 more new warnings appear.  So if we make an

> exception and convert all of the comments, I'm okay.

>

> But otherwise, at least Eric, you, me (only now I admit), Thomas

> expressed a preference for the other style; on the other side it's

> Markus, Stefan, Conny and Alex, some of whom were okay with applying

> maintainer discretion; John and rth wanted a third one but disagreed on

> their second choice.  I appreciate your writing the patch, but I'm not

> sure that's consensus...


What I strongly want is that checkpatch should (a) catch stuff
I get wrong and (b) catch stuff that other people get wrong,
so I don't have to nitpick over coding style myself (which I
have done for multiline comment style in the past). So to me
the current situation (checkpatch doesn't warn at all about
out-of-style multiline comments) is no good.

Nobody runs checkpatch on the whole existing codebase anyway,
do they? So I think "3000 new warnings" is a red herring.

thanks
-- PMM
Markus Armbruster Aug. 10, 2018, 1:19 p.m. UTC | #12
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/08/2018 11:41, Markus Armbruster wrote:

>>> Lone "/*" or "/**": 9986 cases

>>> 	of which in the first column: 7617

>>> 	of which the first line in the file (license headers): 2834

>>> 	regex: ^[ \t]*/\*\*?[ \t]*$

>>>

>>> "/*" with the first line of the comment: 11246

>>> 	of which in the first column: 4985

>>> 	of which the first line in the file: 97

>>> 	regex: ^[ \t]*/\*\*?+(?:(?!\*/).)+?$

>>>

>>> License headers almost always have the "lone /*" format.  Apart from

>>> license headers, 63% of the comments have the now-deprecated format.

>>>

>>> Inside functions, 73% of the comments have the now-deprecated format.

>>> Outside functions it's 50-50.  That's because there are 2024 doc

>>> comments, which in turn are 50% of the comments that are 1) outside the

>>> functions 2) using a lone "/*".

>>>

>>> So my proposal, which is actually consistent with what QEMU is doing, is

>>> the following:

>>>

>>> 1) the first line of a file should always be "/*", otherwise warn

>>>

>>> 2) a comment that starts with "/**" should have it on a lone line

>>>

>>> 3) every other multiline comment should start with

>>> "/*<whitespace><something>"

>> (3) is not consistent with what QEMU is doing in this case, since

>> according to your data there is no consistency in what QEMU is doing.

>

> 65-35, and 75-25 inside functions (comments outside should be doc

> comments so that's again a different story) is pretty consistent

> actually.  So no patch would be better in your opinion?


I'm fine with Peter's patch as is.  I'd also be fine with a tree-wide
conversion of comments to the preferred style on top.

I'd prefer not to reopen the debate.

If we reopen it, and reach a different consensus, we need a patch
enforcing it.

If we reopen it, and fail to reach consensus, there's nothing for a
patch to enforce.

Note that "always wing both ends, and star every line in between" and
"wing only the end, not the start, unless file's first line or /** doc
comment, and star every line in between" are not the only options.  A
weaker style rule would be "may wing the start, must wing the end, must
star every line in between".

That said, I reiterate my preference not to reopen the debate :)
Paolo Bonzini Aug. 10, 2018, 5:06 p.m. UTC | #13
On 10/08/2018 14:53, Peter Maydell wrote:
>> But otherwise, at least Eric, you, me (only now I admit), Thomas

>> expressed a preference for the other style; on the other side it's

>> Markus, Stefan, Conny and Alex, some of whom were okay with applying

>> maintainer discretion; John and rth wanted a third one but disagreed on

>> their second choice.  I appreciate your writing the patch, but I'm not

>> sure that's consensus...

> What I strongly want is that checkpatch should (a) catch stuff

> I get wrong and (b) catch stuff that other people get wrong,

> so I don't have to nitpick over coding style myself (which I

> have done for multiline comment style in the past). So to me

> the current situation (checkpatch doesn't warn at all about

> out-of-style multiline comments) is no good.

> 

> Nobody runs checkpatch on the whole existing codebase anyway,

> do they? So I think "3000 new warnings" is a red herring.


It's just a proxy for how many file would become inconsistent in the long
run, by respecting the new rule for now code and not changing the existing
code.  So let's try doing it: :)

  git ls-tree -r HEAD | grep -v 'tests/tcg' | grep -v disas | \
    grep -v capstone | grep -v libxl | grep -v libdecnumber | \
    grep '\.[ch]$' | while read f g h i; do \
      scripts/checkpatch.pl -f $i;
  done | tee checkpatch.list

First of all, there were no hangs and only one file took a long time (ui/vgafont.h)
because it hits a quadratic loop; that was a nice start.

We have:

	Tested files: 3392
	Zero errors and zero warnings: 1938
	Zero errors, some warnings: 152
	Total errors + warnings: 82700 (only ~4000 warnings)

	Top reports:
	44425 ERROR: code indent should never use tabs
	19072 ERROR: space required / space prohibited
	4893 ERROR: braces {} are necessary for all arms
	3573 WARNING: line over 80 characters
	1198 ERROR: do not use C99 // comments
	1090 ERROR: line over 90 characters
	1053 ERROR: trailing statements should be on next line

So about 65% of the files pass before the patch, which is much better 
than I actually thought.

A note about tabs: only 360 lines have a space-tab sequence, and
perhaps we could fix those.  18950 lines have tabs only at the beginning
of the line.  23405 lines have a single sequence of tabs in the middle
of the line but the indentation is otherwise absent or space-based.
But I digress.

With Peter's patch:

	Zero errors and zero warnings: 1269
	Zero errors, some warnings: 895

	New warnings:
	13214 WARNING: Block comments use a leading /* on a separate line
	4392 WARNING: Block comments use a trailing */ on a separate line
	2762 WARNING: Block comments use * on subsequent lines
	117 WARNING: Block comments should align the * on each line

With alternate rule (and dispensation for line 1):

	Zero errors and warnings: 1435
	Zero errors, some warnings: 831

	New warnings:
	4891 WARNING: Block comments use a leading /* on a separate line
	4392 WARNING: Block comments use a trailing */ on a separate line
	2762 WARNING: Block comments use * on subsequent lines
	117 WARNING: Block comments should align the * on each line

Given the number of warnings about trailing */, the 2-line format is much more
prevalent than I thought.  We can assume that there are no comments with
separate "/*" and joined "*/", and then by comparing the different number
of warnings in the two cases you get the following estimates:

- 13713 multiline comments (13214+4891-4392)
- 1630 multiline comments in "2-line" format with asterisks (4392-2762)
- 2762 multiline comments in "2-line" format without asterisks
- 4430 multiline comments in "3-line" format
- 2024 doc comments (should be in "4-line" format, I spot checked a few dozen;
  this was from a separate grep for "/\*\*$")
- 2867 multiline comments in "4-line" format (the rest)

So in conclusion: I'm not opposed to this patch, as long as someone converts
at least all 3-line comments to 4-line (including mine) and then the WARN can
become an ERROR.  If someone wants to do that, the checkpatch output from the
script at the top is probably a good start.  Otherwise the chosen format
should be the most common, considering that most maintainers have been using
it for 5 years or more.

If we want to accept both it's certainly fine by me, and then it's enough not to
warn at all on the leading line.  In that case only 289 files (baseline is 152)
would have zero errors and some warnings.  I think it's the best compromise.

I attach the incremental patch I used for the alternate rule.

Paolo

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 52ab18bfce..0f182670b8 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1568,10 +1568,18 @@ sub process {
 
 # Block comment styles
 
-		# Block comments use /* on a line of its own
-		if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/
-		    $rawline =~ m@^\+.*/\*\*?[ \t]*.+[ \t]*$@) { # /* or /** non-blank
-			WARN("Block comments use a leading /* on a separate line\n" . $herecurr);
+		# Doc comments use /** on a line of its own
+		if ($rawline =~ m@/\*\*@ && $rawline !~ m@^\+[ \t]*/\*\*$@) { # /* or /** non-blank
+			ERROR("Documentation comments use a leading /** on a separate line\n" . $herecurr);
+		}
+		if ($rawline =~ m@\*\*/@ && $rawline !~ m@^\+[ \t]*\*\*\/@) { # /* or /** non-blank
+			ERROR("Documentation comments use a trailing */ on a separate line\n" . $herecurr);
+		}
+
+		# Block comments use /* on the first line, except for the license header
+		if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/
+		    $rawline =~ m@^\+.*/\*[ \t]*$@ && $realline > 1) { # /* or /** non-blank
+			WARN("Block comment with leading /* on a separate line\n" . $herecurr);
 		}
 
 # Block comments use * on subsequent lines
Thomas Huth Aug. 13, 2018, 6:18 a.m. UTC | #14
On 08/10/2018 02:45 PM, Paolo Bonzini wrote:
> On 10/08/2018 11:10, Peter Maydell wrote:

>>> So my proposal, which is actually consistent with what QEMU is doing, is

>>> the following:

>>>

>>> 1) the first line of a file should always be "/*", otherwise warn

>>>

>>> 2) a comment that starts with "/**" should have it on a lone line

>>>

>>> 3) every other multiline comment should start with

>>> "/*<whitespace><something>"

>> Personally I would prefer your suggestion, but as I say, there

>> was no consensus in the thread for it, and there was consensus

>> for "use the kernel's style here". I don't think we gain much

>> from reopening the debate at this point.

> 

> What we lose is that 3000 more new warnings appear.  So if we make an

> exception and convert all of the comments, I'm okay.


Why do you want to enforce to convert all of them in one go? For example
we still have also some TABs in some source files, and nobody cares
about converting them all in one go. If we enforce the comments in new
code, that should IMHO be good enough. Or make it a BiteSizeTask for the
next GSoC maybe.

> But otherwise, at least Eric, you, me (only now I admit), Thomas

> expressed a preference for the other style;


No, wait, you've got that wrong. I said that I don't like the asymetric
style that you're now trying to propagate again, but I'm fine with either

 /* dense comments
  * on multiple lines */

or

 /*
  * no dense comments
  * on multiple lines
  */

> on the other side it's

> Markus, Stefan, Conny and Alex, some of whom were okay with applying

> maintainer discretion; John and rth wanted a third one but disagreed on

> their second choice.  I appreciate your writing the patch, but I'm not

> sure that's consensus...


Count me in for the consensus on this other side, I really don't want
the asymetric style.

 Thomas
Thomas Huth Aug. 13, 2018, 6:26 a.m. UTC | #15
On 08/09/2018 06:00 PM, Peter Maydell wrote:
> We now require Linux-kernel-style multiline comments:

>     /*

>      * line one

>      * line two

>      */

> 

> Enforce this in checkpatch.pl, by backporting the relevant

> parts of the Linux kernel's checkpatch.pl. (The only changes

> needed are that Linux's checkpatch.pl WARN() function takes

> an extra argument that ours does not.)

> 

> The kernel's checkpatch does not enforce "leading /* on

> a line of its own, so that part is unique to QEMU's checkpatch.

> 

> Sample warning output:

> 

> WARNING: Block comments use a leading /* on a separate line

> #34: FILE: hw/intc/arm_gicv3_common.c:39:

> +    /* Older versions of QEMU had a bug in the handling of state save/restore

> 

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

> I'm still not used to the leeading-/*-on-it's-own style,

> so having checkpatch catch my lapses is handy...

> 

> I used WARN level severity mostly because Linux does.

> ---

>  scripts/checkpatch.pl | 48 +++++++++++++++++++++++++++++++++++++++++++

>  1 file changed, 48 insertions(+)


Acked-by: Thomas Huth <thuth@redhat.com>
Paolo Bonzini Aug. 13, 2018, 8:01 a.m. UTC | #16
On 13/08/2018 08:18, Thomas Huth wrote:
> On 08/10/2018 02:45 PM, Paolo Bonzini wrote:

>> On 10/08/2018 11:10, Peter Maydell wrote:

>>>> So my proposal, which is actually consistent with what QEMU is doing, is

>>>> the following:

>>>>

>>>> 1) the first line of a file should always be "/*", otherwise warn

>>>>

>>>> 2) a comment that starts with "/**" should have it on a lone line

>>>>

>>>> 3) every other multiline comment should start with

>>>> "/*<whitespace><something>"

>>> Personally I would prefer your suggestion, but as I say, there

>>> was no consensus in the thread for it, and there was consensus

>>> for "use the kernel's style here". I don't think we gain much

>>> from reopening the debate at this point.

>>

>> What we lose is that 3000 more new warnings appear.  So if we make an

>> exception and convert all of the comments, I'm okay.

> 

> Why do you want to enforce to convert all of them in one go? For example

> we still have also some TABs in some source files, and nobody cares

> about converting them all in one go. If we enforce the comments in new

> code, that should IMHO be good enough. Or make it a BiteSizeTask for the

> next GSoC maybe.


Because TABs are usually very few in a file and crept in by mistake (100
files have 5 or fewer TABs).  If there are many, it's almost always in
files that nobody touches (exception: linux-user/syscall.c and
linux-user/syscall_defs.h).  For what it's worth, I'd be in favor of
changing TABs to spaces in files that only have a few of them or in
those two linux-user files.

The 3-line comment style is in files that are actively developed and
would become inconsistent over a very short time.

Paolo
diff mbox series

Patch

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 42e1c50dd80..18bc3c59c85 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1566,6 +1566,54 @@  sub process {
 # check we are in a valid C source file if not then ignore this hunk
 		next if ($realfile !~ /\.(h|c|cpp)$/);
 
+# Block comment styles
+
+		# Block comments use /* on a line of its own
+		if ($rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/
+		    $rawline =~ m@^\+.*/\*\*?[ \t]*.+[ \t]*$@) { # /* or /** non-blank
+			WARN("Block comments use a leading /* on a separate line\n" . $herecurr);
+		}
+
+# Block comments use * on subsequent lines
+		if ($prevline =~ /$;[ \t]*$/ &&			#ends in comment
+		    $prevrawline =~ /^\+.*?\/\*/ &&		#starting /*
+		    $prevrawline !~ /\*\/[ \t]*$/ &&		#no trailing */
+		    $rawline =~ /^\+/ &&			#line is new
+		    $rawline !~ /^\+[ \t]*\*/) {		#no leading *
+			WARN("Block comments use * on subsequent lines\n" . $hereprev);
+		}
+
+# Block comments use */ on trailing lines
+		if ($rawline !~ m@^\+[ \t]*\*/[ \t]*$@ &&	#trailing */
+		    $rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/
+		    $rawline !~ m@^\+.*\*{2,}/[ \t]*$@ &&	#trailing **/
+		    $rawline =~ m@^\+[ \t]*.+\*\/[ \t]*$@) {	#non blank */
+			WARN("Block comments use a trailing */ on a separate line\n" . $herecurr);
+		}
+
+# Block comment * alignment
+		if ($prevline =~ /$;[ \t]*$/ &&			#ends in comment
+		    $line =~ /^\+[ \t]*$;/ &&			#leading comment
+		    $rawline =~ /^\+[ \t]*\*/ &&		#leading *
+		    (($prevrawline =~ /^\+.*?\/\*/ &&		#leading /*
+		      $prevrawline !~ /\*\/[ \t]*$/) ||		#no trailing */
+		     $prevrawline =~ /^\+[ \t]*\*/)) {		#leading *
+			my $oldindent;
+			$prevrawline =~ m@^\+([ \t]*/?)\*@;
+			if (defined($1)) {
+				$oldindent = expand_tabs($1);
+			} else {
+				$prevrawline =~ m@^\+(.*/?)\*@;
+				$oldindent = expand_tabs($1);
+			}
+			$rawline =~ m@^\+([ \t]*)\*@;
+			my $newindent = $1;
+			$newindent = expand_tabs($newindent);
+			if (length($oldindent) ne length($newindent)) {
+				WARN("Block comments should align the * on each line\n" . $hereprev);
+			}
+		}
+
 # Check for potential 'bare' types
 		my ($stat, $cond, $line_nr_next, $remain_next, $off_next,
 		    $realline_next);