checkpatch.pl: Add SPDX license tag check

Message ID 20171109011047.20040-1-robh@kernel.org
State New
Headers show
Series
  • checkpatch.pl: Add SPDX license tag check
Related show

Commit Message

Rob Herring Nov. 9, 2017, 1:10 a.m.
Add a check warning if SPDX-License-Identifier tags are not used in
newly added files.

Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rob Herring <robh@kernel.org>

---
I rewrote my previous version to check more than just dts files. It also 
now looks for a tag in added files rather than trying a fuzzy match on 
freeform license text.

 scripts/checkpatch.pl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

-- 
2.14.1

Comments

Joe Perches Nov. 9, 2017, 2:10 a.m. | #1
On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:
> Add a check warning if SPDX-License-Identifier tags are not used in

> newly added files.


If this is to be done, and I think it's not a great idea,
there are better ways of doing this that emit this warning
on a per-file basis instead of a per-patch.
gregkh Nov. 9, 2017, 9:12 a.m. | #2
On Wed, Nov 08, 2017 at 06:10:19PM -0800, Joe Perches wrote:
> On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

> > Add a check warning if SPDX-License-Identifier tags are not used in

> > newly added files.

> 

> If this is to be done, and I think it's not a great idea,

> there are better ways of doing this that emit this warning

> on a per-file basis instead of a per-patch.


Any hints as to how to do that?  :)

thanks,

greg k-h
Rob Herring Nov. 9, 2017, 1:47 p.m. | #3
On Wed, Nov 8, 2017 at 8:10 PM, Joe Perches <joe@perches.com> wrote:
> On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

>> Add a check warning if SPDX-License-Identifier tags are not used in

>> newly added files.

>

> If this is to be done, and I think it's not a great idea,


Which part? SPDX tags or checking new files or just using checkpatch for this?

> there are better ways of doing this that emit this warning

> on a per-file basis instead of a per-patch.


You had mentioned using something like checkincludes.pl before. The
problem I see with that is few people run those tools. Lots of people
run checkpatch.pl. We want this to be correct when added, not after
the fact by someone else who is not the author. It fits in the
workflow, because if checkpatch doesn't catch it, then I have to in
reviews.

I do agree though that the implementation is a bit ugly given the line
by line way checkpatch works.

Rob
Joe Perches Nov. 9, 2017, 1:55 p.m. | #4
On Thu, 2017-11-09 at 10:12 +0100, Greg Kroah-Hartman wrote:
> On Wed, Nov 08, 2017 at 06:10:19PM -0800, Joe Perches wrote:

> > On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

> > > Add a check warning if SPDX-License-Identifier tags are not used in

> > > newly added files.

> > 

> > If this is to be done, and I think it's not a great idea,

> > there are better ways of doing this that emit this warning

> > on a per-file basis instead of a per-patch.

> 

> Any hints as to how to do that?  :)


Sure, but only after it's decided if adding spdx license tag
to all compilable source files is appropriate.

And if that's true, then that requirement also has to be
added in the Documentation directory somewhere.
Joe Perches Nov. 9, 2017, 3:39 p.m. | #5
On Thu, 2017-11-09 at 07:47 -0600, Rob Herring wrote:
> On Wed, Nov 8, 2017 at 8:10 PM, Joe Perches <joe@perches.com> wrote:

> > On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

> > > Add a check warning if SPDX-License-Identifier tags are not used in

> > > newly added files.

> > 

> > If this is to be done, and I think it's not a great idea,

> 

> Which part? SPDX tags or checking new files or just using checkpatch for this?


SPDX tags in all files.

There's no real way to check a patch for this.

You have to check the entire file.

checkpatch could, as you've done, scan for new files
against /dev/null, but a single patch can add
multiple files and each newly added file should have
a missing SPDX indicator check.

My concern is that there are ~50,000 files in the
kernel source tree and, after that scripted patch
adding the tags, only about a quarter of them have
an SPDX tag.

So which files actually _need_ a SPDX tag?

files in -next with an SPDX tag:

$ git grep --name-only -i -P "spdx-licen[cs]e-identifier" | \
  while read file ; do basename $file ; done | \
  sed -r -e 's/^.*(\..*)/\1/' | \
  sort | uniq -c | sort -rn | head -10
   7514 .h
   3435 .c
   1193 Makefile
    486 .S
    221 .dts
    186 Kconfig
    185 .dtsi
     97 .sh
     34 .tc
     24 .debug

vs all files in -next (not Documentation/)

$ git ls-files | grep -v "^Documentation/" | \
  while read file ; do basename $file ; done | \
  sed -r -e 's/^.*(\..*)/\1/' | \
  sort | uniq -c | sort -rn | head -10
  25946 .c
  20360 .h
   2437 Makefile
   1454 .S
   1442 .dts
   1380 Kconfig
   1099 .dtsi
    207 .json
    204 .gitignore
    194 .sh
Rob Herring Nov. 9, 2017, 6:12 p.m. | #6
On Thu, Nov 9, 2017 at 9:39 AM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2017-11-09 at 07:47 -0600, Rob Herring wrote:

>> On Wed, Nov 8, 2017 at 8:10 PM, Joe Perches <joe@perches.com> wrote:

>> > On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

>> > > Add a check warning if SPDX-License-Identifier tags are not used in

>> > > newly added files.

>> >

>> > If this is to be done, and I think it's not a great idea,

>>

>> Which part? SPDX tags or checking new files or just using checkpatch for this?

>

> SPDX tags in all files.

>

> There's no real way to check a patch for this.

>

> You have to check the entire file.


Changing existing files is a separate problem. There is a script for
that (though the data file is not public). I'm only worried with new
files here because that's what I review and have to tell folks to
replace their 2 pages of license text with SPDX tags. (It will be much
easier to just tell them to run checkpatch. ;) ).

> checkpatch could, as you've done, scan for new files

> against /dev/null, but a single patch can add

> multiple files and each newly added file should have

> a missing SPDX indicator check.


I was going with the easy route of just giving one warning per patch.
I'd hope that's enough info for folks to figure out what's needed from
there. However, it should be possible to make it per file. The main
complication is we need to look for either '^+++' or the end of the
patch which I didn't see an easy/clean way to do.

> My concern is that there are ~50,000 files in the

> kernel source tree and, after that scripted patch

> adding the tags, only about a quarter of them have

> an SPDX tag.

>

> So which files actually _need_ a SPDX tag?

>

> files in -next with an SPDX tag:

>

> $ git grep --name-only -i -P "spdx-licen[cs]e-identifier" | \

>   while read file ; do basename $file ; done | \

>   sed -r -e 's/^.*(\..*)/\1/' | \

>   sort | uniq -c | sort -rn | head -10

>    7514 .h

>    3435 .c

>    1193 Makefile

>     486 .S

>     221 .dts

>     186 Kconfig

>     185 .dtsi

>      97 .sh

>      34 .tc

>      24 .debug

>

> vs all files in -next (not Documentation/)

>

> $ git ls-files | grep -v "^Documentation/" | \

>   while read file ; do basename $file ; done | \

>   sed -r -e 's/^.*(\..*)/\1/' | \

>   sort | uniq -c | sort -rn | head -10

>   25946 .c

>   20360 .h

>    2437 Makefile

>    1454 .S

>    1442 .dts

>    1380 Kconfig

>    1099 .dtsi

>     207 .json

>     204 .gitignore

>     194 .sh

>
Joe Perches Nov. 9, 2017, 6:27 p.m. | #7
On Thu, 2017-11-09 at 12:12 -0600, Rob Herring wrote:
> On Thu, Nov 9, 2017 at 9:39 AM, Joe Perches <joe@perches.com> wrote:

> > On Thu, 2017-11-09 at 07:47 -0600, Rob Herring wrote:

> > > On Wed, Nov 8, 2017 at 8:10 PM, Joe Perches <joe@perches.com> wrote:

> > > > On Wed, 2017-11-08 at 19:10 -0600, Rob Herring wrote:

> > > > > Add a check warning if SPDX-License-Identifier tags are not used in

> > > > > newly added files.

> > > > 

> > > > If this is to be done, and I think it's not a great idea,

> > > 

> > > Which part? SPDX tags or checking new files or just using checkpatch for this?

> > 

> > SPDX tags in all files.


Is having an SPDX tag in every file really desired?

> > 

> > There's no real way to check a patch for this.

> > 

> > You have to check the entire file.

> 

> Changing existing files is a separate problem. There is a script for

> that (though the data file is not public). I'm only worried with new

> files here because that's what I review and have to tell folks to

> replace their 2 pages of license text with SPDX tags. (It will be much

> easier to just tell them to run checkpatch. ;) ).

> 

> > checkpatch could, as you've done, scan for new files

> > against /dev/null, but a single patch can add

> > multiple files and each newly added file should have

> > a missing SPDX indicator check.

> 

> I was going with the easy route of just giving one warning per patch.

> I'd hope that's enough info for folks to figure out what's needed from

> there. However, it should be possible to make it per file. The main

> complication is we need to look for either '^+++' or the end of the

> patch which I didn't see an easy/clean way to do.


EOF is easy.
There already is a $realfile test for start of file.

> > My concern is that there are ~50,000 files in the

> > kernel source tree and, after that scripted patch

> > adding the tags, only about a quarter of them have

> > an SPDX tag.

> > 

> > So which files actually _need_ a SPDX tag?

> > 

> > files in -next with an SPDX tag:

> > 

> > $ git grep --name-only -i -P "spdx-licen[cs]e-identifier" | \

> >   while read file ; do basename $file ; done | \

> >   sed -r -e 's/^.*(\..*)/\1/' | \

> >   sort | uniq -c | sort -rn | head -10

> >    7514 .h

> >    3435 .c

> >    1193 Makefile

> >     486 .S

> >     221 .dts

> >     186 Kconfig

> >     185 .dtsi

> >      97 .sh

> >      34 .tc

> >      24 .debug

> > 

> > vs all files in -next (not Documentation/)

> > 

> > $ git ls-files | grep -v "^Documentation/" | \

> >   while read file ; do basename $file ; done | \

> >   sed -r -e 's/^.*(\..*)/\1/' | \

> >   sort | uniq -c | sort -rn | head -10

> >   25946 .c

> >   20360 .h

> >    2437 Makefile

> >    1454 .S

> >    1442 .dts

> >    1380 Kconfig

> >    1099 .dtsi

> >     207 .json

> >     204 .gitignore

> >     194 .sh

> >

Patch

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 8b80bac055e4..6665735123e5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2185,6 +2185,8 @@  sub process {
 	my $commit_log_has_diff = 0;
 	my $reported_maintainer_file = 0;
 	my $non_utf8_charset = 0;
+	my $added_file = 0;
+	my $missing_spdx_license = 0;
 
 	my $last_blank_line = 0;
 	my $last_coalesced_string_linenr = -1;
@@ -2368,6 +2370,16 @@  sub process {
 		$here = "#$linenr: " if (!$file);
 		$here = "#$realline: " if ($file);
 
+		# determine if this is a new file
+		if ($line =~ m@^\-\-\-\s/@) {
+			if ($line =~ m@/dev/null@) {
+				$added_file = 1;
+				$missing_spdx_license++;
+			} else {
+				$added_file = 0;
+			}
+		}
+
 		my $found_file = 0;
 		# extract the filename as it passes
 		if ($line =~ /^diff --git.*?(\S+)$/) {
@@ -2865,6 +2877,14 @@  sub process {
 			}
 		}
 
+# check for using SPDX tag instead of free form license text
+		if ($added_file &&
+		    ($rawline =~ /\bSPDX-License-Identifier/ ||
+		     $realfile =~ /Documentation/)) {
+			$missing_spdx_license--;
+			$added_file = 0;
+		}
+
 # check we are in a valid source file if not then ignore this hunk
 		next if ($realfile !~ /\.(h|c|s|S|sh|dtsi|dts)$/);
 
@@ -6399,6 +6419,11 @@  sub process {
 		      "Missing Signed-off-by: line(s)\n");
 	}
 
+	if ($missing_spdx_license) {
+		WARN("SPDX_LICENSE_TAG",
+		     "Missing SPDX-License-Identifier tags in added files. Use tags instead of full license text.\n");
+	}
+
 	print report_dump();
 	if ($summary && !($clean == 1 && $quiet == 1)) {
 		print "$filename " if ($summary_file);