Commit Graph

575 Commits

Author SHA1 Message Date
Simon Marchi
a2909b2c8d Convert print statements to print function
In preparation for the transition to Python 3, convert to using the
print function.  The special import

    from __future__ import print_function

in Python 2.7 makes "print" become the function and not the statement.
In Python 3, it has no effect.  Once the transition is done and we
exclusively use Python 3, we can simply remove the imports.

The testsuite shows no regressions.
2020-11-22 19:29:38 -05:00
Joel Brobecker
ee21c3e032 TB15-003__non_ascii_char_in_diff: New testcase.
This testcase was an attempt at trying to reproduce the problem
reported in https://github.com/AdaCore/git-hooks/issues/13.

As it happens, this testcase is not sufficient to reproduce
the problem above; but it's sufficiently interesting on its own
that it's worth adding to our testsuite nonetheless, while we continue
trying to find another reproducer for the GitHub issue above.

Change-Id: I1c63d64797ebb9ccd79eca52e45e179e741c9132
TN: TB15-003
2020-11-15 19:38:31 +04:00
Joel Brobecker
ac8f4d6a54 update.py crash when not commits to style-check and combined-style-checking
This commit fixes a crash which occurs during the validation phase
(when the "update" hook is called). This only happens when the repository
is configured with the combined-style-checking config option set.
So far, we found a couple of scenarios where this could happen:

  - The user pushes a single commit, which happens to be a "revert"
    commit (i.e. a commit which is created via the "git revert"
    command); those commits are purposefully excluded from the list
    of commits we have to check;

  - The user pushes a tag, where the tag points to a commit which
    already exists in the repository (something very much typical
    of people pushing tags).

In both cases, we land in the AbstractUpdate.__do_style_checks method,
doing the following:

        | added = self.commits_to_check
        | if git_config('hooks.combined-style-checking'):
        |     # This project prefers to perform the style check on
        |     # the cumulated diff, rather than commit-per-commit.
        |     # Behave as if the update only added one commit (new_rev),
        |     # with a single parent being old_rev.  If old_rev is nul
        |     # (branch creation), then use the first parent of the oldest
        |     # added commit.
        |     debug('(combined style checking)')
 !!! -> |     if not added[-1].pre_existing_p:
        |         [...]

The problem in the scenarios mentioned above is that, because there are
no commits to check, added is the empty list, and so trying to get
the last element of that list does not work.

This commit fixes the problem by checking the list of added commits
early, and returning right away if there are no commits to check.

As a result of this early return, a couple of tests saw their output
change, because a debug log is no longer printed. I verified that
these were for two tests where the push did not send any commits
that are new to the repository. So the change of behavior is expected.

Change-Id: I28e5ffe61ec19b8acf46b6da5c0bfe5ed0139d2c
TN: TA21-016
2020-10-22 07:09:08 +04:00
Joel Brobecker
e424e08c0e a_tag_before_commit: Adjust erroneous comment
See the "a_tag_before_commit" testcase where we have a very similar
situation, and where the comment explicitly confirms that we should
not be checking any commit during this push.

Found while working on TA21-016.

Change-Id: I758ac7d7559cc9c76e981e47700674b163ffbcc6
2020-10-22 07:07:36 +04:00
Joel Brobecker
e536bf865a centralize the computation of the list of commits to apply checks on
This change is mostly a cleanup which simplifies a bit the iteration
over the list of commits for which to perform the various checks,
by having that information provided via an attribute whose name
is very specific about the semantics of what to use that list for.

This was motivated by an issue we discovered recently where we forgot
to exclude some commits when calling the commit-extra-checker hook.
This happened because, no matter how simple the algorithm for
computing the list of commits to check was, that simple algorithm
was duplicated in many places. This avoids that, in the hopes that
this will prevent this same kind of issues again in the future.

Another way to view the benefits of this cleanup is that this will
help ensure consistency in terms of the list of commits to which
the various checks are applied.

Note that we could have logically split this patch into two steps,
with the first step simply renaming the "added_commits" attribute
into "new_commits_for_ref", and then a second step introducing
the "commits_to_check" attribute. In the end, both are combined
in a single commit because it seems easier to review every location
where the "added_commits" attribute was used, and make sure from
the context that the correct list of commits was chosen in each
instance the "added_commits" list used to be referenced.
Otherwise, it's harder to review the second commit adding
the "commits_to_check" attribute, because the reviewer then has to
audit the entire set of sources himself in order to make sure
no spot was missed.

No actual behavior change should result from this change, and
therefore no test is being added.

Change-Id: I93c206968800dc738d3ebe4f5424f9201875383b
TN: T929-030
2020-10-02 11:09:37 -07:00
Joel Brobecker
8b48cfec4f catch merge commits masquerading as revert commits
The git-hooks currently incorrectly disable the check against
merge commits for commits that looks like (from the revision log)
merge commits. This commit fixes this problem and adds a new
testcase to verify this is correctly handled.

Change-Id: Ie2d12de818c43cc11971734ea12ae3dc3fc79039
TN: TA02-034.
2020-10-02 10:29:00 -07:00
Joel Brobecker
35192940c7 AbstractUpdate.pre_commit_checks: Remove unnecessary early return
This method computes a list of commits that need to be checked,
and then makes the assumption that all the checks should be applied
to the same list. This is mostly correct, except in the case of
merge commits: The list of commits to check excludes commits
which have a revision log suggesting that they are revert commits.
In the interest of keeping a given branch free of merge commits,
we still want to reject the revert commits that are merges. So
the list of commits to check for merge-commit violations is
different from the list used by most checks.

In practice, this early return doesn't contribute anything tangible
in terms of performance, and doesn't allow each check to decide
what list of commits it should operate on.

So this commit removes it. There should be no user-visible effect
coming from this change.

Seen while working on TA02-034.

Change-Id: I3a7e826ed4dba36f75da2fc088360356e5a8fa70
2020-10-02 10:28:29 -07:00
Joel Brobecker
f1414aed4a Do not call the hooks.commit-extra-checker for pre-existing commits
A new testcase for this change is not necessary, because an existing
testcase was actually already exercising this scenario (and is therefore
adjusted to expect the correct behavior). An extra test within the same
testcase is also added to exercise the scenario where all commits being
pushed are new to the repository. To make the branch names clearer
in terms of which has all-new commits, and which has some preexisting
ones, the multiple-commits-accept and future-multiple-commits-accept
have been renamed to add a "-some-prexisting" suffix to them. And
in the new test, we use the "-all-new" suffix.

Change-Id: I784a5f2fb3a4619c94d667f301d5a948cf843ac0
TN: T929-030
2020-10-02 09:06:46 -07:00
Joel Brobecker
fe8c5b5009 Add support for new config option: hooks.max-filepath-length
Note that, as part of the implementation, this commit introduces
a couple of calls to Git for each commit meant to retrieve the entire
list of files in a given commit. Luckily, one of these two calls
was already part of another check (filename collisions), and
the result of that call is cached in a CommitInfo object, so
the delta is only one extra Git call. Some performance measurements
seem to indicate that this is a fast-enough operation that users
should not notice the difference in performance, even for repositories
with a very large number of files [1]. If real-life usage proves
this assessment wrong, however, we'll revisit the documented behavior
so as to allow the use of a cheaper approach (e.g perform the check
on all modified files, rather than just the new ones).

[1]: Tested for instance on the GCC repository, which has more than
     87,000 files in total; the first call took roughly 100ms, and
     the next ones took about 60-70ms.

Change-Id: If0eea2c4990a945a3006cdd74d7a9aca2fc770d9
TN: T811-015
2020-09-11 10:47:51 -07:00
Joel Brobecker
3df0ef1658 CommitInfo.all_files: New method
This commit enhances the CommitInfo class by adding a new method that
returns the list of all files in the repository for the given commit.
This factorizes some code which already exists, and allows for the
result to be cached in this object for reuse, something we intend
to do on a followup commit. In the meantime, this commit should be
a nop in practice.

Change-Id: I3f9328dd9e89f8d82ad2ad6fdac69b5e2fbc0992
TN: T811-015
2020-09-02 17:28:28 -07:00
Joel Brobecker
111da4bad3 Add support for new config option: hooks.no-rh-character-range-check
Change-Id: Ic64b5b0f9007605bd339402b93689a944fd13485
TN: T829-004
2020-08-30 09:23:10 -07:00
Joel Brobecker
a28f519d07 not_ISO_8859-15: Add bad-after-subject branch to bare repository
This commit just changes this testcase's bare repository to add
a new branch, called bad-after-subject, pointing to the same commit
as the master branch. This new branch name was chosen to match
the name of one of the two branches used by this testcase.
The testcase is then changed to try to push its bad-after-subject
branch to bad-after-subject, rather than re-using master as
the destination again.

This change doesn't make a difference at the moment, because neither
of the two push operations performed by the test are expected to
succeed. But they are preparation work for a followup change which
provides the ability to skip the verifications causing these pushes
to fail. In that followup commit, we'll have the testcase change
the test repository's configuration, and then re-attempt the pushes,
expecting them to succeed this time. When we do that, we'll want
to push each of the two branches to different branches on the remote.
That's why this commit introduces the second branch on the remote.

Change-Id: I86e0c18ae094818821202e545e646cbcad595c25
TN: T829-004
2020-08-29 14:56:45 -07:00
Joel Brobecker
19f08564f1 not_ISO_8859-15: merge all testing into a single method
The issue with having two methods is that the order in which those
methods are executed is not visual and can create situations
where the test is mislead because the order chosen by unittest to
execute the testing methods is different from the order they appear
in the test script. For this test, as of now, this does not matter,
because nothing gets pushed to the bare repository. However, in
an upcoming change where we add a new feature allowing users to
push commits with revision logs containing characters outside
the range allowed by default, the order will matter. Prevent
this issue by merging the two methods into a single one. The downside
is that a failure in the first test will prevent the second one
from running. But since we never allow changes that fail a test
to get pushed to production, this shouldn't be an issue in practice.

Change-Id: I4f8f17b8ae0a73055c1207de62e9615500dd9b57
TN: T829-004
2020-08-29 14:20:55 -07:00
Joel Brobecker
a62e4bcad0 Add support for new config option hooks.email-new-commits-only
This enhancement should answer issue #9 on GitHub:
https://github.com/AdaCore/git-hooks/issues/9

Change-Id: I4b45f8fc62cb858eefd8b8b417bbdbaa7e80bf91
TN: T815-009
2020-08-15 14:20:30 -07:00
Joel Brobecker
da02784ca7 Move check for hooks.no-emails config to __set_send_email_p_attr
Currently, all users of AbstractUpdate.__set_send_email_p_attr first
verify that the reference being updated isn't on the hooks.no-emails
list, so __set_send_email_p_attr could arguably assume that this is
the case, and add an assertion. However, from a functional perspective,
this method is perfectly capable of doing its jobs in the case of
hooks.no-emails references as well, and doing so avoid a self-induced
potential crash, which means better resilience.

Therefore, this commit moves the check in __check_max_commit_emails
down to __set_send_email_p_attr.

This change makes the hooks slightly less effective in terms of
performance in the case of references with the hooks.no-emails and
the hooks.no-precommit-checks config options set (as demonstrated
by the slight change of behavior in one of our testcases showing that
we end up making one additional to Git): However, the impact is very
minimal, and only concerns updates for which we know the hooks have
very little to do, and therefore where performance is not key.

In the case of references that only have the hooks.no-emails config
option set, the overhead consists in setting one attribute for all
commits in a list and then counting the number of commits in that list.
This is offset to some degree by the fact that we save ourselves the
need to add an assertion check which would check the hooks.no-email
config option (which consists of regular expressions). Either way,
it should be negligible.

The upside is that future users of the __set_send_email_p_attr method
can save themselves a check of the hooks.no-emails config option.

Change-Id: Ib2216c639967fe6abf2dde57db7b6f0c7e5f554e
TN: T815-009
2020-08-15 14:20:30 -07:00
Joel Brobecker
632ea61e68 Update incorrect comment in AbstractUpdate.__check_max_commit_emails
This commit updates a comment which hasn't been true anymore for
quite a while. It is written in such a way that it is neutral about
how the policy used to determine which commits trigger an email
and which don't, so as to avoid being wrong again the next time
we introduce new policies.

Change-Id: Icbbf7d5451e7ce2bf6b3fd0e4e10cdca902b38b4
TN: T815-009
2020-08-15 14:20:30 -07:00
Joel Brobecker
1457710a9e clarify the effect of the `no-precommit-check' string in commit rev logs
The current description makes it sound like this keyword disables
all precommit checks, but in fact, it only disables the style checks.
This commit clarifies this.

Change-Id: I8f8b8245b60d0e01d1ef48e5e416adeecddefe6a
TN: P426-035
2020-08-12 14:25:35 -07:00
Joel Brobecker
2ebb722e13 Allow projects to customzie the contents of commit emails
This commit introduces a new config option hooks.commit-email-formatter
which allows projects to customize the emails being sent for new commits.

Change-Id: Ibd6b4991f15e204e0a076bb705fc8106f08ff8b4
TN: T209-006
2020-08-10 06:35:24 -07:00
Joel Brobecker
4c982600a3 AbstractUpdate.commit_data_for_hook: New method to factorize code
This commit factorizes some code which is currently defined locally
when handling the hooks.commit-extra-checker config option. The intent
is to build on it when we add support for a new option allowing users
to customize the contents of the commit emails being sent (the info
we will want to pass to the new hook to customer emails will be
a superset of the information passed here).

Change-Id: I3cc9659b80b0b03752d227966776c804d3c5203a
TN: T209-006
2020-08-07 18:30:02 -07:00
Joel Brobecker
ff8859e9ac Incorrect Content-Type charset for emails with non-ascii chars in email body
Prior to this commit, emails are always sent with the default charset.
This is a problem when the email body contains some characters which
are not ASCII (e.g. accented letters).

This commit fixes the issue by checking the contents of the email body,
makes a best-guess as to the encoding being used for that body, and
then sets the email's charset accordingly.

A new testcase is not added, because an existing testcase already
demonstrates the issue and thus needed to be adjusted to verify
that this change has the desired effect: email_header_encoding.
This testcase verifies the behavior when sending emails for two
commits, and those two commits do indeed use different encodings
for the commit author (the first one is UTF-8, and the second is
indeed iso8859-1).

Change-Id: I20e4659ac416845f5970448d5cf97d96e061fc5d
TN: S507-007
2020-07-26 15:37:52 -07:00
Joel Brobecker
3e7cb792fe move code guessing encoding into own new function (guess_encoding)
This is preparation work towards using that function in the context
of improving the handling of non-ascii characters in the email body

Change-Id: Ic64ae4ad5dedeb531704245243708c2c7a96e7f9
TN: S507-007
2020-07-26 15:37:52 -07:00
Joel Brobecker
1109d5c75e README.md: Fix thinko in name of hooks.commit-extra-checker option
Change-Id: I6770c2a8ac9bd7d72750e6f95bbc04225589a8f4
TN: T209-005
2020-07-26 08:45:29 -07:00
Joel Brobecker
1de3eb14c7 Add support for new config option hooks.commit-extra-check
This option allows projects to ask the git-hooks to perform custom
checks of any kind to validate every new commit being pushed.

This commit also contains a change in hooks/updates/emails.py which
enhances the email-sending debug tracing mechanism used by the testsuite,
adding support for a tracing mode where the traces are just one line
per email. Technically, this change is independent from the rest,
but it is merged with this commit nonetheless because these changes
would show up as uncovered by the testsuite without the testcases
we're adding here. That way, we maintain the principle that we always
have 100% testing coverage for any commit.

Change-Id: I0e446a95d05f5578f977af3eaa547144187eb86b
TN: T209-005
2020-07-19 19:38:58 -07:00
Joel Brobecker
07642e1591 Factorize code to call third-party hooks even more
This commit introduces a new class (ThirdPartyHook), which centralizes
the handling of third-party hooks (scripts that project set up via
the configuration file). Most of the code is simply a move of the code
from maybe_call_thirdparty_hook, with a couple of enhancements:
  - Ability to specify the directory from which to call the hook;
  - Handling of the situation when calling the hook itself fails.

We can already see the benefits of this in function style_check_files,
where the code now necessary to call the style_checker is both clearer
and more compact.

In the future, the intent is to push this class one step further to
allow users to specify hooks to be fetched from the repository, rather
than requiring that the hooks be installed (usually by an admin) on
the machine hosting the repository. The current class' API should
allow us to implement this in a way that's transparent to all users.

Change-Id: Ie8bf8accbbfd75a91b914628fad27d780532dac4
TN: T209-005
2020-07-19 19:38:40 -07:00
Joel Brobecker
9db584955e pass the RefKind and object_type when creating Update objects
As part of determining which Update class it needs to instantiate
for the reference udpate being evaluated, the udpate factory
computes the following information:
  - The kind of reference being updated (branch? notes? tag?)
  - What operation on the branch is being performed (create? delete?
    update?)
  - The type of object the commit targeted by the reference
    (commit? tag?)

In the context of enhancing the git-hooks to allow projects to provide
custom-checks on commits, the first and third items seem like these
could be useful information to pass to those hooks. In order to allow
this, without recomputing that information, this commit enhances
the AbstractUpdate class __init__ method to add those as additional
required paramaters, and then stores them as two new attributes.

As a bonus side-effect of this change, the new_rev_type attribute
is no longer necessary, which allows us to save one external call
to Git.

In the meantime, having these new attributes means that we can use
those to cross-check, within each AbstractUpdate child class'
self_sanity_check method, that the ref_kind and object_type values
correspond to each class' expectation (in other words, we cross-check
that the factory instantiated the correct class).

One other side-effect of that change is that we are no longer calling
the get_object_type function with a null SHA1 anymore. We could
modify the function's implementation to only accept non-null SHA1
revisions, but this would denature the function, in my opinion.
There is a genuine chance that perhaps, one day, we'll need that
again. So, instead, we cover that function by adding a new unit test
instead.

Change-Id: I8fd1ce180a6e17c0401b4dee07b9bc07d2abfdda
TN: T209-005
2020-07-19 19:30:55 -07:00