62 Commits

Author SHA1 Message Date
Joel Brobecker
b05bcaf46d ensure_iso_8859_15_only: Remove Python-2.x-only handling
Change-Id: I8d7ca476c55573f26c74a7aaa3f5d8a8f0e3f037
TN: U530-006
2021-11-30 17:41:32 +04:00
Joel Brobecker
9c82498e7b Introduce (the concept of) git command output decoding
This commit is preparation work for the transition to Python 3.x,
where the output obtained by running Git commands will become
bytes as opposed to a string. In the vast majority of cases,
we'll want to decode that output into a string. Ideally, we would
want to do this in a way that is both compatible with Python 2.x
and Python 3.x, but we have found that this requires a lot of
work with many changes spread all over the code. So, instead,
what this commit does is introduce the concept of decoding
the output, but with the decoding only occurring when running
under Python 3.x.

That way, we can make progress towards Python 3.x while preserving
the behavior under Python 2.x intact.

Change-Id: I189577798ee96cba1fa55c7356babf102575642f
TN: U530-006
2021-10-06 11:27:20 -07:00
Joel Brobecker
d87407ca43 ensure_iso_8859_15_only: Enhance for Python 3.x (unicode revlog lines)
This is another commit to prepare for the transition to Python 3.x,
where the revlog of the given Commit object will be (unicode) strings.
In this situation decoding before checking the string against
the ISO-8859-15 is not necessary.

This commit therefore enhances the function to handle the case
where the revlog is already unicode by checking whether it offers
the "decode" method or not. The code is written in such a way that
the Python 3.x case doesn't use a temporary variable. That way, when
Python 2.x support is removed, we can simply remove the corresponding
half of the if-else block, and just keep the other half untouched.

TN: U530-006
Change-Id: I674c639eb7af656cdc084cc510b9eca90751f18f
2021-10-04 08:16:01 -07:00
Joel Brobecker
f16eb39b2d pre_commit_checks.py: Show filename collisions in alphabetical order
This commit enhances the error message being generated when
filename collisions are detected to list the collisions in
alphabetical order. While this doesn't make a big difference
for the users, it ensures that the error message is deterministic.

Found while working on the transition to Python 3.x (U530-006)

Change-Id: Ie6787022a1833ea7f0db23a17c79d3bb9ce2ba96
2021-07-04 14:06:32 -07:00
Joel Brobecker
bd94624cec style_check_commit: Convert to tuple return value from call to "filter"
This is a preparation patch for the transition to Python 3.x.

Currently, we have the following code:

    | files_to_check = filter(needs_style_check_p, files_to_check)
    | if not files_to_check:
    |     debug("style_check_commit: no files to style-check")
    |     return

With Python 2.x, the "filter" function constructs a list, so
we can indeed check for emptiness using the logical operator "not".

With Python 3.x, however, filter returns an iterator, and using
the logical operator "not" no longer has the same semantics at all.

The easiest way to handle this is to simply convert the result to
a tuple.

TN: U530-006
Change-Id: I334f2657728405226d7da577e893b7c64a3220fb
2021-06-15 05:54:49 -07:00
Joel Brobecker
987d0692f0 style_check_files: Allow "filename_list" param to be any kind of iterable
The "filename_list" of the style_check_files function is expected
to be a list. However, as it happens, we have a call to this function
where the filename_list parameter was computed using the "filter"
function.

    files_to_check = filter(needs_style_check_p, files_to_check)
    style_check_files(files_to_check, new_rev, project_name)

This is no problem when trying to iterate on the list alone, but
at one point inside the "style_check_files" funtion, we try to
add to it like so:

    for filename in filename_list + aux_files:

With Python 3.x, this causes a crash because the "filter" function
now returns a generator rather than a list. Seems it seems desirable
to allow "style_check_files" to accept any iterable as "filename_list",
this commit modifies the code to allow it.

TN: U530-006
Change-Id: I8e5caaf09d177096788c522d7bff8168eb158ec7
2021-06-05 19:27:39 -07:00
Joel Brobecker
a075b1653e reformat all the code using black
Change-Id: Idbc70777233ab2d40ab59765abb9cbbeeb88ec63
2021-04-18 14:59:01 +04:00
Joel Brobecker
784e71d0b4 hooks/pre_commit_checks.py: Avoid unnecessary list comprehension
This was flagged by flake8, under error C416.

Change-Id: I5e3c49d0db57d32d42cb854c7fe1b95c1d4024f4
2021-04-18 14:51:34 +04:00
Joel Brobecker
fe8c5b5009 Add support for new config option: hooks.max-filepath-length
Note that, as part of the implementation, this commit introduces
a couple of calls to Git for each commit meant to retrieve the entire
list of files in a given commit. Luckily, one of these two calls
was already part of another check (filename collisions), and
the result of that call is cached in a CommitInfo object, so
the delta is only one extra Git call. Some performance measurements
seem to indicate that this is a fast-enough operation that users
should not notice the difference in performance, even for repositories
with a very large number of files [1]. If real-life usage proves
this assessment wrong, however, we'll revisit the documented behavior
so as to allow the use of a cheaper approach (e.g perform the check
on all modified files, rather than just the new ones).

[1]: Tested for instance on the GCC repository, which has more than
     87,000 files in total; the first call took roughly 100ms, and
     the next ones took about 60-70ms.

Change-Id: If0eea2c4990a945a3006cdd74d7a9aca2fc770d9
TN: T811-015
2020-09-11 10:47:51 -07:00
Joel Brobecker
3df0ef1658 CommitInfo.all_files: New method
This commit enhances the CommitInfo class by adding a new method that
returns the list of all files in the repository for the given commit.
This factorizes some code which already exists, and allows for the
result to be cached in this object for reuse, something we intend
to do on a followup commit. In the meantime, this commit should be
a nop in practice.

Change-Id: I3f9328dd9e89f8d82ad2ad6fdac69b5e2fbc0992
TN: T811-015
2020-09-02 17:28:28 -07:00
Joel Brobecker
111da4bad3 Add support for new config option: hooks.no-rh-character-range-check
Change-Id: Ic64b5b0f9007605bd339402b93689a944fd13485
TN: T829-004
2020-08-30 09:23:10 -07:00
Joel Brobecker
07642e1591 Factorize code to call third-party hooks even more
This commit introduces a new class (ThirdPartyHook), which centralizes
the handling of third-party hooks (scripts that project set up via
the configuration file). Most of the code is simply a move of the code
from maybe_call_thirdparty_hook, with a couple of enhancements:
  - Ability to specify the directory from which to call the hook;
  - Handling of the situation when calling the hook itself fails.

We can already see the benefits of this in function style_check_files,
where the code now necessary to call the style_checker is both clearer
and more compact.

In the future, the intent is to push this class one step further to
allow users to specify hooks to be fetched from the repository, rather
than requiring that the hooks be installed (usually by an admin) on
the machine hosting the repository. The current class' API should
allow us to implement this in a way that's transparent to all users.

Change-Id: Ie8bf8accbbfd75a91b914628fad27d780532dac4
TN: T209-005
2020-07-19 19:38:40 -07:00
Joel Brobecker
ff8d81cb4a compute each commit's revlog once and then cache it
This is preparatory work for being to pass a certain amount of data
to user-defined hooks. The intent of this change is to limit the number
of times we compute that information to at most once.

Generally speaking, the main part of this change consists in adding
the following new methods to class CommitInfo:
  - raw_revlog;
  - raw_revlog_lines; and

The rest of this change is mostly adjustments to the code that needs
to access commits' rev logs to get them from a shared CommitInfo object
rather than from a play revision (SHA1).

Additionally, the function is_revert_commit in git.py, which took
a commit revision as a paramenter and needed a call to "git log"
to get the commit's body, has been replaced by a new method in
class CommitInfo. An alternative approach might have been to
keep the function, and change its parameter to be a CommitInfo object.
But it seemed more natural to make this a method of the CommitInfo
class instead, so this is what this change does.

Change-Id: Ia4bf23f24226d1e9eddafc61afd37db37f0f5287
TN: T209-005
2020-07-12 17:24:39 -07:00
Thomas Quinot
56f3ed28a7 Minor error message rewording 2020-06-15 11:32:45 -07:00
Joel Brobecker
7644236b43 show correct location when identifying non-ISO-8859-15 characters in rev log
There was a bug when returning an error message indicating that
the revision contains non-ISO-8859-15, that would always show
the first line has having the unwanted character, even when
that characters was actually on a different line.

And while at it, I improved the error message to include the line
number as well. Just in case showing the text itself is either not
sufficient or just not as efficient.

Credit for finding the bug: Thomas Quinot (thank you!).

Followup on QB25-001.
2018-01-10 15:41:06 +04:00
Joel Brobecker
24ebc361d6 Add support for hooks.style-checker-config-file config
For QC13-022.
2017-12-13 17:35:05 +04:00
Joel Brobecker
0ce5aa8f51 reject commits with non-ISO-8859-15 characters.
For QB25-001.
2017-12-13 15:39:06 +04:00
Joel Brobecker
d79d7b0e66 Require TNs to be on word boundaries to be accepted.
For QB25-001.
2017-12-13 15:38:58 +04:00
Joel Brobecker
3fed778872 rename check_commit and check_files to style_check_{commit,files}
This makes it clear what this functions are meant to do.
2017-08-04 16:06:47 -07:00
Joel Brobecker
407bcd597e remove support for the word "minor" as substitute for a TN
Users are now invited to use "no-tn-check" exclusively for that.

For Q728-031.
2017-07-28 16:24:55 -07:00
Joel Brobecker
7cedde9495 Only call style_checker once per commit (or group of commits)
This is a performance enhancement which reduces the number of calls
to style_checker to, at most, once per commit. For repositories which
are configured to combine the commits for style-checking, the
style_checker should only be called once at most.

In terms of performance gains, the new approach increases
the performance by a factor of about 5 to 8 (measured using
a set of 10, 100, 1000 and 4500 files, the latter being the typical
commit in the GDB repository when adjusting the copyright year in
all GDB sources).

Roughly speaking, on an Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz,
style-checking 4,500 text files, for which no check is actually
performed, took 31 seconds instead of 226 seconds. Using 4,500 Ada
files, for which we actually run the GNAT compiler and perform some
rulific checks, the new approach took 61 seconds instead of 230
seconds. The relative gain in the Ada case may appear to decrease
with the number of files to check, but 61 secs instead of 230 secs
is the difference between completing before the gerrit timeout
(around 2 minutes) or being aborted by the gerrit timeout.

For Q530-009.
2017-07-05 09:11:02 -07:00
Joel Brobecker
e4fa3fd38b pre_commit_checks: pass the filename via stdin rather than the command line
This is preparation work towards calling the style checker only once
for an entire commit, rather than once per file.

This commit also adds a new testcase, allowing the code coverage to
remain at 100%.

For Q530-009.
2017-07-05 09:10:59 -07:00
Joel Brobecker
8333edae90 simplify module name in call to style_checker
Thanks to the transition to the new style checker, we can now
call it directly with the module name, without having to fake
the SVN path to that module.

Part of NA17-007.
2017-05-18 09:34:28 -07:00
Joel Brobecker
168de73a97 transition default style checker to style_checker
For NA17-007.
2017-05-18 09:34:27 -07:00
Joel Brobecker
048462bc05 do not require '(' and ')' around no-rh-check or no-precommit-check
The team felt that requiring the extra '(' and ')' around the keywords
we can use in revision logs aren't really all that necessary, so
this patch makes it all consistent within the git-hooks, but
the svn hooks as well.

This patch also adjusts the two corresponding tests
(M424-008__no_rh_check_keyword, and P426-035__no_precommit_check_in_rh)
to verify that the relaxed requirement is actually relaxed.

Part of OA23-034.
2017-03-17 14:25:55 -07:00