The functions to retrieve and print process cmdlines were based on the
assumption that they contain printable ASCII, and everything else
should be filtered out. That assumption doesn't hold in today's world,
where people are free to use unicode everywhere.
This replaces the custom cmdline reading code with a more generic approach
using utf8_escape_non_printable_full().
For kernel threads, truncation is done on the parenthesized name, so we'll
get "[worker]", "[worker…]", …, "[w…]", "[…", "…" as we reduce the number of
available columns.
This implementation is most likely slower for very long cmdlines, but I don't
think this is very important. The common case is to have short commandlines,
and should print those properly. Absurdly long cmdlines are the exception,
which needs to be handled correctly and safely, but speed is not too important.
Fixes#12532.
v2:
- use size_t for the number of columns. This change propagates into various
other functions that call get_process_cmdline(), increasing the size of the
patch, but the changes are rather trivial.
Unfortunately the warning must be known, or otherwise the pragma generates a
warning or an error. So let's do a meson check for it.
Is it worth doing this to silence the warning? I think so, because apparently
the warning was already emitted by gcc-8.1, and with the recent push in gcc to
catch more such cases, we'll most likely only get more of those.
v2: fix error in free_and_strndup()
When the orignal and copied message were the same, but shorter than specified
length l, memory read past the end of the buffer would be performed. A test
case is included: a string that had an embedded NUL ("q\0") is used to replace
"q".
v3: Fix one more bug in free_and_strndup and add tests.
v4: Some style fixed based on review, one more use of free_and_replace, and
make the tests more comprehensive.
According to RFC2616[1], HTTP header names are case-insensitive. So
it's totally valid to have a header starting with either `Date:` or
`date:`.
However, when systemd-importd pulls an image from an HTTP server, it
parses HTTP headers by comparing header names as-is, without any
conversion. That causes failures when some HTTP servers return headers
with different combinations of upper-/lower-cases.
An example:
https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2 returns `Etag: "pe89so9oir60"`,
while https://alpha.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
returns `ETag: "f03372edea9a1e7232e282c346099857"`.
Since systemd-importd expects to see `ETag`, the etag for the Container Linux image
is correctly interpreted as a part of the hidden file name.
However, it cannot parse etag for Flatcar Linux, so the etag the Flatcar Linux image
is not appended to the hidden file name.
```
$ sudo ls -al /var/lib/machines/
-r--r--r-- 1 root root 3303014400 Aug 21 20:07 '.raw-https:\x2f\x2falpha\x2erelease\x2ecore-os\x2enet\x2famd64-usr\x2fcurrent\x2fcoreos_developer_container\x2ebin\x2ebz2.\x22f03372edea9a1e7232e282c346099857\x22.raw'
-r--r--r-- 1 root root 3303014400 Aug 17 06:15 '.raw-https:\x2f\x2falpha\x2erelease\x2eflatcar-linux\x2enet\x2famd64-usr\x2fcurrent\x2fflatcar_developer_container\x2ebin\x2ebz2.raw'
```
As a result, when the Flatcar image is removed and downloaded again,
systemd-importd is not able to determine if the file has been already
downloaded, so it always download it again. Then it fails to rename it
to an expected name, because there's already a hidden file.
To fix this issue, let's introduce a new helper function
`memory_startswith_no_case()`, which compares memory regions in a
case-insensitive way. Use this function in `curl_header_strdup()`.
See also https://github.com/kinvolk/kube-spawn/issues/304
[1]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
We shouldn't just log arbitrary stuff, in particular newlines and control chars
Now:
Unknown dunder line __CURSORFACILITY=6\nSYSLOG_IDENTIFIER=/USR/SBIN/CRON\nMES…, ignoring.
Unknown dunder line __REALTIME_TIME[TAMP=1404101101501874\n__MONOTONIC_TIMEST…, ignoring.
It's not supposed to be the most efficient, but instead fast and simple to use.
I kept the logic in ellipsize_mem() to use unicode ellipsis even in non-unicode
locales. I'm not quite convinced things should be this way, especially that with
this patch it'd actually be simpler to always use "…" in unicode locale and "..."
otherwise, but Lennart wanted it this way for some reason.