kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Linus Torvalds	29c395c77a	Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation	2021-02-24 16:32:23 -08:00
Jens Axboe	4727dc20e0	arch: setup PF_IO_WORKER threads like PF_KTHREAD PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the sense that we don't assign ->set_child_tid with our own structure. Just ensure that every arch sets up the PF_IO_WORKER threads like kthreads in the arch implementation of copy_thread(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-02-21 17:25:22 -07:00
Thomas Gleixner	3aac798a91	um: Enforce the usage of asm-generic/softirq_stack.h The recent rework of the X86 irq stack switching mechanism broke UM as UM pulls in the X86 specific variant of softirq_stack.h. Enforce the usage of the asm-generic variant. Fixes: `72f40a2823` ("x86/softirq/64: Inline do_softirq_own_stack()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Weinberger <richard@nod.at>	2021-02-16 10:23:14 +01:00
Johannes Berg	ddad5187fc	um: irq.h: include <asm-generic/irq.h> This will get the (no-op) definition of irq_canonicalize() which some code might want. We could define that ourselves, but it seems like we'd likely want generic extensions in the future, if any. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:40:14 +01:00
Johannes Berg	cc3ac20fc2	um: io.h: include <linux/types.h> This may be needed for size_t if something doesn't get it included elsewhere before including <asm/io.h>, so add the include. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:39:37 +01:00
Johannes Berg	dde8b58d51	um: add a pseudo RTC Add a pseudo RTC that simply is able to send an alarm signal waking up the system at a given time in the future. Since apparently timerfd_create() FDs don't support SIGIO, we use the sigio-creating helper thread, which just learned to do suspend/resume properly in the previous patch. For time-travel mode, OTOH, just add an event at the specified time in the future, and that's already sufficient to wake up the system at that point in time since suspend will just be in an "endless wait". For s2idle support also call pm_system_wakeup(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:38:52 +01:00
Johannes Berg	bfc58e2b98	um: remove process stub VMA This mostly reverts the old commit `3963333fe6` ("uml: cover stubs with a VMA") which had added a VMA to the existing PTEs. However, there's no real reason to have the PTEs in the first place and the VMA cannot be 'fixed' in place, which leads to bugs that userspace could try to unmap them and be forcefully killed, or such. Also, there's a bit of an ugly hole in userspace's address space. Simplify all this: just install the stub code/page at the top of the (inner) address space, i.e. put it just above TASK_SIZE. The pages are simply hard-coded to be mapped in the userspace process we use to implement an mm context, and they're out of reach of the inner mmap/munmap/mprotect etc. since they're above TASK_SIZE. Getting rid of the VMA also makes vma_merge() no longer hit one of the VM_WARN_ON()s there because we installed a VMA while the code assumes the stack VMA is the first one. It also removes a lockdep warning about mmap_sem usage since we no longer have uml_setup_stubs() and thus no longer need to do any manipulation that would require mmap_sem in activate_mm(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:37:38 +01:00
Johannes Berg	9f0b4807a4	um: rework userspace stubs to not hard-code stub location The userspace stacks mostly have a stack (and in the case of the syscall stub we can just set their stack pointer) that points to the location of the stub data page already. Rework the stubs to use the stack pointer to derive the start of the data page, rather than requiring it to be hard-coded. In the clone stub, also integrate the int3 into the stack remap, since we really must not use the stack while we remap it. This prepares for putting the stub at a variable location that's not part of the normal address space of the userspace processes running inside the UML machine. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:35:02 +01:00
Johannes Berg	84b2789d61	um: separate child and parent errors in clone stub If the two are mixed up, then it looks as though the parent returned an error if the child failed (before) the mmap(), and then the resulting process never gets killed. Fix this by splitting the child and parent errors, reporting and using them appropriately. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:34:33 +01:00
Johannes Berg	a7d48886ca	um: defer killing userspace on page table update failures In some cases we can get to fix_range_common() with mmap_sem held, and in others we get there without it being held. For example, we get there with it held from sys_mprotect(), and without it held from fork_handler(). Avoid any issues in this and simply defer killing the task until it runs the next time. Do it on the mm so that another task that shares the same mm can't continue running afterwards. Cc: stable@vger.kernel.org Fixes: `468f65976a` ("um: Fix hung task in fix_range_common()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:32:04 +01:00
Johannes Berg	47da29763e	um: mm: check more comprehensively for stub changes If userspace tries to change the stub, we need to kill it, because otherwise it can escape the virtual machine. In a few cases the stub checks weren't good, e.g. if userspace just tries to mmap(0x100000 - 0x1000, 0x3000, ...) it could succeed to get a new private/anonymous mapping replacing the stubs. Fix this by checking everywhere, and checking for _overlap_, not just direct changes. Cc: stable@vger.kernel.org Fixes: `3963333fe6` ("uml: cover stubs with a VMA") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:31:08 +01:00
Johannes Berg	e1e22d0d91	um: print register names in wait_for_stub Since we're basically debugging the userspace (it runs in ptrace) it's useful to dump out the registers - but they're not readable, so if something goes wrong it's hard to say what. Print the names of registers in the register dump so it's easier to look at. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:30:19 +01:00
Christophe Leroy	731ecea3e5	mm: Remove arch_remap() and mm-arch-hooks.h powerpc was the last provider of arch_remap() and the last user of mm-arch-hooks.h. Since commit `526a9c4a72` ("powerpc/vdso: Provide vdso_remap()"), arch_remap() hence mm-arch-hooks.h are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:27:43 +01:00
Colin Ian King	3a5f415474	um: fix spelling mistake in Kconfig "privleges" -> "privileges" There is a spelling mistake in the Kconfig help text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:26:20 +01:00
Johannes Berg	1fcf9da389	um: virtio: allow devices to be configured for wakeup With all the IRQ machinery being in place, we can allow virtio devices to additionally be configured as wakeup sources, in which case basically any interrupt from them wakes us up. Note that this requires a call FD because the VQs are all disabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:25:46 +01:00
Johannes Berg	c8177aba37	um: time-travel: rework interrupt handling in ext mode In external time-travel mode, where time is controlled via the controller application socket, interrupt handling is a little tricky. For example on virtio, the following happens: * we receive a message (that requires an ACK) on the vhost-user socket * we add a time-travel event to handle the interrupt (this causes communication on the time socket) * we ACK the original vhost-user message * we then handle the interrupt once the event is triggered This protocol ensures that the sender of the interrupt only continues to run in the simulation when the time-travel event has been added. So far, this was only done in the virtio driver, but it was actually wrong, because only virtqueue interrupts were handled this way, and config change interrupts were handled immediately. Additionally, the messages were actually handled in the real Linux interrupt handler, but Linux interrupt handlers are part of the simulation and shouldn't run while there's no time event. To really do this properly and only handle all kinds of interrupts in the time-travel event when we are scheduled to run in the simulation, rework this to plug in to the lower interrupt layers in UML directly: Add a um_request_irq_tt() function that let's a time-travel aware driver request an interrupt with an additional timetravel_handler() that is called outside of the context of the simulation, to handle the message only. It then adds an event to the time-travel calendar if necessary, and no "real" Linux code runs outside of the time simulation. This also hooks in with suspend/resume properly now, since this new timetravel_handler() can run while Linux is suspended and interrupts are disabled, and decide to wake up (or not) the system based on the message it received. Importantly in this case, it ACKs the message before the system even resumes and interrupts are re-enabled, thus allowing the simulation to progress properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:24:27 +01:00
Johannes Berg	9b84512cfe	um: virtio: disable VQs during suspend If the system is suspended, the device shouldn't be able to send anything to it. Disable virtqueues in suspend to simulate this, and as we might be only using s2idle (kernel services are still on), prevent sending anything on them as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:22:32 +01:00
Johannes Berg	10c2b5aeb2	um: virtio: fix handling of messages without payload If we have a message without payload, we call full_read() with len set to 0, which causes it to return -ECONNRESET. Catch this case and explicitly return 0 for it so we can actually use the zero-size config-changed message. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:21:52 +01:00
Johannes Berg	74e919d230	um: virtio: clean up a comment There's no 'simtime' device, because implementing that through virtio was just too much complexity. Clean up the comment that still refers to it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-02-12 21:21:17 +01:00
Johannes Berg	7f3414226b	um: time: fix initialization in time-travel mode In time-travel mode, since my previous patch, the start time was initialized too late, so that the system would read it before we set it, thus always starting system time at 0 (1970-01-01). This happens because timekeeping_init() reads the time and is called before time_init(). Unfortunately, I didn't see this before because I was testing it only with the RTC patch applied (and enabled), and then the time is read again by the RTC a little - after time_init() this time. Fix this by just doing the initialization whenever necessary. Fixes: `2701c1bd91` ("um: time: Fix read_persistent_clock64() in time-travel") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:38 +01:00
Johannes Berg	9868c2081d	um: fix os_idle_sleep() to not hang Changing os_idle_sleep() to use pause() (I accidentally described it as an empty select() in the commit log because I had changed it from that to pause() in a later revision) exposed a race condition in the idle code. The following can happen: timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=624017}}, NULL) = 0 ... <SIGALRM is delivered but we're already on the way to idle> pause() and we now hang forever. This was previously possible as well, but it could never cause UML to hang for more than a second since we could only sleep for that much, so at most you'd notice a "hiccup" in the UML. Obviously, any sort of external interrupt also "saves" it and interrupts pause(). Fix this by properly handling the race, rather than papering over it again: - first, block SIGALRM, and obtain the old signal set - check the timer - suspend, waiting for any signal out of the old set, if, and only if, the timer will fire in the future - restore the old signal mask This ensures race-free operation: as it's blocked, the signal won't be delivered while we're looking at the timer even if it were to be triggered right _after_ we've returned from timer_gettime() with a non-zero value (telling us the timer will trigger). Thus, despite getting to sigsuspend() because timer_gettime() told us we're still waiting, we'll not hang because sigsuspend() will return immediately due to the pending signal. Fixes: `49da38a3ef` ("um: Simplify os_idle_sleep() and sleep longer") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:38 +01:00
Johannes Berg	a31e9c4e72	Revert "um: support some of ARCH_HAS_SET_MEMORY" This reverts commit `963285b0b4` ("um: support some of ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not working (due to um never using the protection bits in the page tables) but also corrupts the page tables if used on a non-vmalloc page, since um never allocates proper page tables for the 'physmem' in the first place. Fixing all this will take more effort, so for now revert it. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: `963285b0b4` ("um: support some of ARCH_HAS_SET_MEMORY") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:38 +01:00
Johannes Berg	2fcb4090cd	Revert "um: allocate a guard page to helper threads" This reverts commit `ef4459a6da` ("um: allocate a guard page to helper threads"), it's broken in multiple ways: 1) the free no longer matches the alloc; and 2) more importantly, the set_memory_ro() causes allocation of page tables for the normal memory that doesn't have any, and that later causes corruption and crashes (usually but not always in vfree()). We could fix the first bug and use vmalloc() to work around the second, but set_memory_ro() actually doesn't do anything either so I'll just revert that as well. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: `ef4459a6da` ("um: allocate a guard page to helper threads") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:38 +01:00
Johannes Berg	f4172b0843	um: virtio: free vu_dev only with the contained struct device Since struct device is refcounted, we shouldn't free the vu_dev immediately when it's removed from the platform device, but only when the references actually all go away. Move the freeing to the release to accomplish that. Fixes: `5d38f32499` ("um: drivers: Add virtio vhost-user driver") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:38 +01:00
Thomas Meyer	e23fe90dec	um: kmsg_dumper: always dump when not tty console With the addition of the ttynull console driver, the chance that a console driver was already registerd did increase. Refine the logic when to dump the kernel message buffer: always dump the buffer, when the UML stdio console driver is not active and the preferred console. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-01-26 22:11:37 +01:00

2486 Commits