Commit Graph

202 Commits

Author SHA1 Message Date
Benjamin Berg
2f681ba4b3 um: move thread info into task
This selects the THREAD_INFO_IN_TASK option for UM and changes the way
that the current task is discovered. This is trivial though, as UML
already tracks the current task in cpu_tasks[] and this can be used to
retrieve it.

Also remove the signal handler code that copies the thread information
into the IRQ stack. It is obsolete now, which also means that the
mentioned race condition cannot happen anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Hajime Tazaki <thehajime@gmail.com>
Link: https://patch.msgid.link/20241111102910.46512-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-12 14:50:31 +01:00
Benjamin Berg
ce6e85a186 um: remove broken double fault detection
The show_stack function had some code to detect double faults. However,
the logic is wrong and it would e.g. trigger if a WARNING happened
inside an IRQ.

Remove it without trying to add a new logic. The current behaviour,
which will just fault repeatedly until the IRQ stack is used up and the
host kills UML, seems to be good enough.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07 17:36:31 +01:00
Benjamin Berg
b69f22dfd6 um: remove duplicate UM_NSEC_PER_SEC definition
Just remove the first entry as there is a second later on.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07 17:36:31 +01:00
Benjamin Berg
37c691151e um: remove file sync for stub data
There is no need to sync the stub code to "disk" for the other process
to see the correct memory. Drop the fsync there and remove the helper
function.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07 17:36:30 +01:00
Benjamin Berg
2f278b5957 um: always include kconfig.h and compiler-version.h
Since commit a95b37e20d ("kbuild: get <linux/compiler_types.h> out of
<linux/kconfig.h>") we can safely include these files in userspace code.
Doing so simplifies matters as options do not need to be exported via
asm-offsets.h anymore.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241103150506.1367695-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07 17:36:30 +01:00
Tiwei Bie
4e5adbe447 um: Add os_set_pdeathsig helper function
This helper can be used to set the parent-death signal of the calling
process to SIGKILL to ensure that the process will be killed if the
UML kernel dies unexpectedly without proper cleanup. This helper will
be used in the follow-up patches.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241024142828.2612828-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-25 11:34:54 +02:00
Benjamin Berg
0b8b2668f9 um: insert scheduler ticks when userspace does not yield
In time-travel mode userspace can do a lot of work without any time
passing. Unfortunately, this can result in OOM situations as the RCU
core code will never be run.

Work around this by keeping track of userspace processes that do not
yield for a lot of operations. When this happens, insert a jiffie into
the sched_clock clock to account time against the process and cause the
bookkeeping to run.

As sched_clock is used for tracing, it is useful to keep it in sync
between the different VMs. As such, try to remove added ticks again when
the actual clock ticks.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:49 +02:00
Tiwei Bie
2717c6b649 um: Abandon the _PAGE_NEWPROT bit
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will
always be set. And the corresponding page will always be mapped or
unmapped depending on whether the PTE is present or not. The check
on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will
allow us to simplify the code and remove the unreachable code.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:49 +02:00
Johannes Berg
188b64f288 um: remove fault_catcher infrastructure
This was perhaps intended to do _nofault copies, but the
real reason is lost to history. Remove this, it's not
needed, and using longjmp() out of the middle of the
signal handler with all the state it has modified is
not going to be a good idea anyway.

Link: https://patch.msgid.link/20241010224513.901c4d390b3e.Ia74742668b44603c1ca23dd36f90e964e6e7ee55@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 09:52:46 +02:00
Benjamin Berg
68b9883cc1 um: Discover host_task_size from envp
When loading the UML binary, the host kernel will place the stack at the
highest possible address. It will then map the program name and
environment variables onto the start of the stack.

As such, an easy way to figure out the host_task_size is to use the
highest pointer to an environment variable as a reference.

Ensure that this works by disabling address layout randomization and
re-executing UML in case it was enabled.

This increases the available TASK_SIZE for 64 bit UML considerably.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-9-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 13:37:22 +02:00
Benjamin Berg
32e8eaf263 um: use execveat to create userspace MMs
Using clone will not undo features that have been enabled by libc. An
example of this already happening is rseq, which could cause the kernel
to read/write memory of the userspace process. In the future the
standard library might also use mseal by default to protect itself,
which would also thwart our attempts at unmapping everything.

Solve all this by taking a step back and doing an execve into a tiny
static binary that sets up the minimal environment required for the
stub without using any standard library. That way we have a clean
execution environment that is fully under the control of UML.

Note that this changes things a bit as the FDs are not anymore shared
with the kernel. Instead, we explicitly share the FDs for the physical
memory and all existing iomem regions. Doing this is fine, as iomem
regions cannot be added at runtime.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net
[use pipe() instead of pipe2(), remove unneeded close() calls]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 13:37:16 +02:00
Benjamin Berg
c6ce72005d um: remove auxiliary FP registers
We do not need the extra save/restore of the FP registers when getting
the fault information. This was originally added in commit 2f56debd77
("uml: fix FP register corruption") but at that time the code was not
saving/restoring the FP registers when switching to userspace. This was
fixed in commit fbfe9c847e ("um: Save FPU registers between task
switches") and since then the auxiliary registers have not been useful.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:10:30 +02:00
Benjamin Berg
5a6951273e um: always use the internal copy of the FP registers
When switching from userspace to the kernel, all registers including the
FP registers are copied into the kernel and restored later on. As such,
the true source for the FP register state is actually already in the
kernel and they should never be grabbed from the userspace process.

Change the various places to simply copy the data from the internal FP
register storage area. Note that on i386 the format of PTRACE_GETFPREGS
and PTRACE_GETFPXREGS is different enough that conversion would be
needed. With this patch, -EINVAL is returned if the non-native format is
requested.

The upside is, that this patchset fixes setting registers via ptrace
(which simply did not work before) as well as fixing setting floating
point registers using the mcontext on signal return on i386.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913133845.964292-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:03:55 +02:00
Tiwei Bie
242fef3610 um: Fix the definition for physmem_size
Currently physmem_size is defined as long long but declared locally
as unsigned long long before using it in separate .c files. Make them
match by defining physmem_size as unsigned long long and also move
the declaration to a common header to allow the compiler to check it.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:13 +02:00
Tiwei Bie
cd05cbed42 um: Remove highmem leftovers
Highmem was only supported on UML/i386. And the support has been
removed by commit a98a6d864d ("um: Remove broken highmem support").
Remove the leftovers and stop UML from trying to setup highmem when
the sum of physmem_size and iomem_size exceeds max_physmem.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240916045950.508910-4-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:13 +02:00
Benjamin Berg
71fae9dfa7 um: Remove unused os_getpgrp function
The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:04 +02:00
Benjamin Berg
377c23c558 um: Remove unused os_stop_process
The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:04 +02:00
Benjamin Berg
47e174969c um: Remove unused os_process_parent
The function is not used anywhere.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:04 +02:00
Benjamin Berg
7852ee068a um: Remove unused os_process_pc
The function is not used anywhere in the codebase.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240913134442.967599-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10 12:02:04 +02:00
Tiwei Bie
fe6abeba24 um: Remove the declaration of user_thread function
This function has never been defined since its declaration was
introduced by commit 1da177e4c3 ("Linux-2.6.12-rc2").

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12 20:43:26 +02:00
Tiwei Bie
59376fb2a7 um: Remove unused mm_fd field from mm_id
It's no longer used since the removal of the SKAS3/4 support.

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12 20:36:22 +02:00
Gaosheng Cui
2df8c8d118 um: Remove obsoleted declaration for execute_syscall_skas
The execute_syscall_skas() have been removed since
commit e32dacb9f4 ("[PATCH] uml: system call path cleanup"),
and now it is useless, so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12 20:23:33 +02:00
Benjamin Berg
bcf3d957c6 um: refactor TLB update handling
Conceptually, we want the memory mappings to always be up to date and
represent whatever is in the TLB. To ensure that, we need to sync them
over in the userspace case and for the kernel we need to process the
mappings.

The kernel will call flush_tlb_* if page table entries that were valid
before become invalid. Unfortunately, this is not the case if entries
are added.

As such, change both flush_tlb_* and set_ptes to track the memory range
that has to be synchronized. For the kernel, we need to execute a
flush_tlb_kern_* immediately but we can wait for the first page fault in
case of set_ptes. For userspace in contrast we only store that a range
of memory needs to be synced and do so whenever we switch to that
process.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:50 +02:00
Benjamin Berg
573a446fc8 um: simplify and consolidate TLB updates
The HVC update was mostly used to compress consecutive calls into one.
This is mostly relevant for userspace where it is already handled by the
syscall stub code.

Simplify the whole logic and consolidate it for both kernel and
userspace. This does remove the sequential syscall compression for the
kernel, however that shouldn't be the main factor in most runs.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:50 +02:00
Benjamin Berg
3c83170d7c um: Delay flushing syscalls until the thread is restarted
As running the syscalls is expensive due to context switches, we should
do so as late as possible in case more syscalls need to be queued later
on. This will also benefit a later move to a SECCOMP enabled userspace
as in that case the need for extra context switches is removed entirely.

Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-9-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03 17:09:49 +02:00