Two kind of mistakes:
1. Missed space. After concatenating there is no space between words.
2. Missed comma. Causes unintentional concatenating in a list of strings.
(cherry picked from commit 34fd4c2019)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Fix typos found by codespell in docs, docstrings, and comments.
(cherry picked from commit c3d9508ff2)
Co-authored-by: Leo Arias <leo.arias@canonical.com>
The original algorithm tried to delegate the folding to the tokens so
that those tokens whose folding rules differed could specify the
differences. However, this resulted in a lot of duplicated code because
most of the rules were the same.
The new algorithm moves all folding logic into a set of functions
external to the token classes, but puts the information about which
tokens can be folded in which ways on the tokens...with the exception of
mime-parameters, which are a special case (which was not even
implemented in the old folder).
This algorithm can still probably be improved and hopefully simplified
somewhat.
Note that some of the test expectations are changed. I believe the
changes are toward more desirable and consistent behavior: in general
when (re) folding a line the canonical version of the tokens is
generated, rather than preserving errors or extra whitespace.
Leading whitespace was incorrectly dropped during folding of certain lines in the _header_value_parser's folding algorithm. This makes the whitespace handling code consistent.
This could use more edge case tests, but the basic functionality is tested.
(Note that this changeset does not add tailored support for the RFC 6532
message/global MIME type, but the email package generic facilities will handle
it.)
Reviewed by Maciej Szulik.
This mimics get_param's error handling for the most part. It is slightly
better in some regards as get_param can produce some really weird results for
duplicate *0* parts. It departs from get_param slightly in that if we have a
mix of non-extended and extended pieces for the same parameter name, the new
parser assumes they were all supposed to be extended and concatenates all the
values, whereas get_param always picks the non-extended parameter value. All
of this error recovery is pretty much arbitrary decisions...
This applies only to the new parser. The old parser decodes encoded words
inside quoted strings already, although it gets the whitespace wrong
when it does so.
This version of the patch only handles the most common case (a single encoded
word surrounded by quotes), but I haven't seen any other variations of this in
the wild yet, so its good enough for now.
There is more to be done here in terms of accepting RFC invalid
input that some mailers accept, but this covers the valid
RFC places where encoded words can occur in structured headers.
The problem was I was only checking for decimal digits after the third '?',
not for *hex* digits :(.
This changeset also fixes a couple of comment typos, deletes an unused
function relating to encoded word parsing, and removed an invalid
'if' test from the folding function that was revealed by the tests
written to validate this issue.