This commit is preparation work for the transition to Python 3.x,
where the output obtained by running Git commands will become
bytes as opposed to a string. In the vast majority of cases,
we'll want to decode that output into a string. Ideally, we would
want to do this in a way that is both compatible with Python 2.x
and Python 3.x, but we have found that this requires a lot of
work with many changes spread all over the code. So, instead,
what this commit does is introduce the concept of decoding
the output, but with the decoding only occurring when running
under Python 3.x.
That way, we can make progress towards Python 3.x while preserving
the behavior under Python 2.x intact.
Change-Id: I189577798ee96cba1fa55c7356babf102575642f
TN: U530-006
This is another commit to prepare for the transition to Python 3.x,
where the revlog of the given Commit object will be (unicode) strings.
In this situation decoding before checking the string against
the ISO-8859-15 is not necessary.
This commit therefore enhances the function to handle the case
where the revlog is already unicode by checking whether it offers
the "decode" method or not. The code is written in such a way that
the Python 3.x case doesn't use a temporary variable. That way, when
Python 2.x support is removed, we can simply remove the corresponding
half of the if-else block, and just keep the other half untouched.
TN: U530-006
Change-Id: I674c639eb7af656cdc084cc510b9eca90751f18f
This commit enhances the error message being generated when
filename collisions are detected to list the collisions in
alphabetical order. While this doesn't make a big difference
for the users, it ensures that the error message is deterministic.
Found while working on the transition to Python 3.x (U530-006)
Change-Id: Ie6787022a1833ea7f0db23a17c79d3bb9ce2ba96
This is a preparation patch for the transition to Python 3.x.
Currently, we have the following code:
| files_to_check = filter(needs_style_check_p, files_to_check)
| if not files_to_check:
| debug("style_check_commit: no files to style-check")
| return
With Python 2.x, the "filter" function constructs a list, so
we can indeed check for emptiness using the logical operator "not".
With Python 3.x, however, filter returns an iterator, and using
the logical operator "not" no longer has the same semantics at all.
The easiest way to handle this is to simply convert the result to
a tuple.
TN: U530-006
Change-Id: I334f2657728405226d7da577e893b7c64a3220fb
The "filename_list" of the style_check_files function is expected
to be a list. However, as it happens, we have a call to this function
where the filename_list parameter was computed using the "filter"
function.
files_to_check = filter(needs_style_check_p, files_to_check)
style_check_files(files_to_check, new_rev, project_name)
This is no problem when trying to iterate on the list alone, but
at one point inside the "style_check_files" funtion, we try to
add to it like so:
for filename in filename_list + aux_files:
With Python 3.x, this causes a crash because the "filter" function
now returns a generator rather than a list. Seems it seems desirable
to allow "style_check_files" to accept any iterable as "filename_list",
this commit modifies the code to allow it.
TN: U530-006
Change-Id: I8e5caaf09d177096788c522d7bff8168eb158ec7
Note that, as part of the implementation, this commit introduces
a couple of calls to Git for each commit meant to retrieve the entire
list of files in a given commit. Luckily, one of these two calls
was already part of another check (filename collisions), and
the result of that call is cached in a CommitInfo object, so
the delta is only one extra Git call. Some performance measurements
seem to indicate that this is a fast-enough operation that users
should not notice the difference in performance, even for repositories
with a very large number of files [1]. If real-life usage proves
this assessment wrong, however, we'll revisit the documented behavior
so as to allow the use of a cheaper approach (e.g perform the check
on all modified files, rather than just the new ones).
[1]: Tested for instance on the GCC repository, which has more than
87,000 files in total; the first call took roughly 100ms, and
the next ones took about 60-70ms.
Change-Id: If0eea2c4990a945a3006cdd74d7a9aca2fc770d9
TN: T811-015
This commit enhances the CommitInfo class by adding a new method that
returns the list of all files in the repository for the given commit.
This factorizes some code which already exists, and allows for the
result to be cached in this object for reuse, something we intend
to do on a followup commit. In the meantime, this commit should be
a nop in practice.
Change-Id: I3f9328dd9e89f8d82ad2ad6fdac69b5e2fbc0992
TN: T811-015
This commit introduces a new class (ThirdPartyHook), which centralizes
the handling of third-party hooks (scripts that project set up via
the configuration file). Most of the code is simply a move of the code
from maybe_call_thirdparty_hook, with a couple of enhancements:
- Ability to specify the directory from which to call the hook;
- Handling of the situation when calling the hook itself fails.
We can already see the benefits of this in function style_check_files,
where the code now necessary to call the style_checker is both clearer
and more compact.
In the future, the intent is to push this class one step further to
allow users to specify hooks to be fetched from the repository, rather
than requiring that the hooks be installed (usually by an admin) on
the machine hosting the repository. The current class' API should
allow us to implement this in a way that's transparent to all users.
Change-Id: Ie8bf8accbbfd75a91b914628fad27d780532dac4
TN: T209-005
This is preparatory work for being to pass a certain amount of data
to user-defined hooks. The intent of this change is to limit the number
of times we compute that information to at most once.
Generally speaking, the main part of this change consists in adding
the following new methods to class CommitInfo:
- raw_revlog;
- raw_revlog_lines; and
The rest of this change is mostly adjustments to the code that needs
to access commits' rev logs to get them from a shared CommitInfo object
rather than from a play revision (SHA1).
Additionally, the function is_revert_commit in git.py, which took
a commit revision as a paramenter and needed a call to "git log"
to get the commit's body, has been replaced by a new method in
class CommitInfo. An alternative approach might have been to
keep the function, and change its parameter to be a CommitInfo object.
But it seemed more natural to make this a method of the CommitInfo
class instead, so this is what this change does.
Change-Id: Ia4bf23f24226d1e9eddafc61afd37db37f0f5287
TN: T209-005
There was a bug when returning an error message indicating that
the revision contains non-ISO-8859-15, that would always show
the first line has having the unwanted character, even when
that characters was actually on a different line.
And while at it, I improved the error message to include the line
number as well. Just in case showing the text itself is either not
sufficient or just not as efficient.
Credit for finding the bug: Thomas Quinot (thank you!).
Followup on QB25-001.
This is a performance enhancement which reduces the number of calls
to style_checker to, at most, once per commit. For repositories which
are configured to combine the commits for style-checking, the
style_checker should only be called once at most.
In terms of performance gains, the new approach increases
the performance by a factor of about 5 to 8 (measured using
a set of 10, 100, 1000 and 4500 files, the latter being the typical
commit in the GDB repository when adjusting the copyright year in
all GDB sources).
Roughly speaking, on an Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz,
style-checking 4,500 text files, for which no check is actually
performed, took 31 seconds instead of 226 seconds. Using 4,500 Ada
files, for which we actually run the GNAT compiler and perform some
rulific checks, the new approach took 61 seconds instead of 230
seconds. The relative gain in the Ada case may appear to decrease
with the number of files to check, but 61 secs instead of 230 secs
is the difference between completing before the gerrit timeout
(around 2 minutes) or being aborted by the gerrit timeout.
For Q530-009.
This is preparation work towards calling the style checker only once
for an entire commit, rather than once per file.
This commit also adds a new testcase, allowing the code coverage to
remain at 100%.
For Q530-009.
Thanks to the transition to the new style checker, we can now
call it directly with the module name, without having to fake
the SVN path to that module.
Part of NA17-007.
The team felt that requiring the extra '(' and ')' around the keywords
we can use in revision logs aren't really all that necessary, so
this patch makes it all consistent within the git-hooks, but
the svn hooks as well.
This patch also adjusts the two corresponding tests
(M424-008__no_rh_check_keyword, and P426-035__no_precommit_check_in_rh)
to verify that the relaxed requirement is actually relaxed.
Part of OA23-034.