Maciej Borzecki 9e2e5e206a o/snapstate: handling of unexpected runtime restart (#14002)
* o/snapstate: when invoked for rollback, check for any changes related to snapd

When the snapd.service unit fails to be activated, an OnFailure handler
executes snap-failure, which in turn starts the snapd process from the previous
revision of the snap with SNAPD_REVERT_TO_REV set in its environment. The snapd
unit may also fail at runtime without an associated change to the snapd snap.
snap-failure is not able to detect such a case, so the snapd process started in
its context would continue to run. Avoid this by extending the logic within
snapd to check whether it has been started by snap-failure with the intention of
handling a rollback and, if so, whether there is a change related to the snapd
snap in the state. When these conditions are not met, snapd exits and
snap-failure continues to restart the snapd service.
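
The check described above can be sketched roughly as follows. This is a minimal
illustration, not the code from this change: the change type and both helper
functions are hypothetical stand-ins for snapd's real state handling, and only
the SNAPD_REVERT_TO_REV variable name is taken from the description.

```go
package main

import (
	"fmt"
	"os"
)

// change is a hypothetical, simplified stand-in for a change recorded in
// snapd's state; the real state types carry much more information.
type change struct {
	Snaps []string // names of the snaps affected by the change
}

// startedForRollback reports whether the process appears to have been started
// by snap-failure, which sets SNAPD_REVERT_TO_REV in the environment.
func startedForRollback(getenv func(string) string) bool {
	return getenv("SNAPD_REVERT_TO_REV") != ""
}

// rollbackWarranted reports whether any change in the (simplified) state is
// related to the snapd snap.
func rollbackWarranted(changes []change) bool {
	for _, chg := range changes {
		for _, name := range chg.Snaps {
			if name == "snapd" {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumed example state: one change that does not involve snapd.
	changes := []change{{Snaps: []string{"core22"}}}

	if startedForRollback(os.Getenv) && !rollbackWarranted(changes) {
		fmt.Println("no snapd change in state, exiting so snap-failure can restart snapd")
		return
	}
	fmt.Println("continuing startup")
}
```

Passing the environment lookup as a function keeps the check easy to exercise
in unit tests without modifying the process environment.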

Related issues: SNAPDENG-21605

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* tests/core/snapd-failover: account for improved snap-failure behavior

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* overlord: add Is() to startup error

Add Is() support to startupError, so that the error can be inspected at runtime.

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* o/snapstate: return an explicit error from StartUp() when no recovery was detected

When snapd is invoked in the context of recovery but the state does not reflect
this, return an explicit error indicating that further startup should not be carried out.

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* daemon: return an explicit error when startup in recovery context was aborted

Return an explicit error when startup in failure recovery context was aborted
due to lack of operations in the state which may have triggered it.

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* cmd/snapd: handle unnecessary failure recovery

Gracefully handle unnecessary failure recovery by exiting with 0 status, so
that snap-failure may continue with cleanup and restart.

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* o/snapstate: improve the check for asserting if restart was warranted

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* overlord: add managers test for handling of runtime restart with failure handling

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* many: tweak names and comments

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* o/snapstate: tweak unit test names

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* cmd/snapd: use fmt.Fprintln

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

* overlord/snapstate: tweak naming

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>

---------

Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>
2024-06-03 15:18:49 +02:00