With this change, on SYSCALL64 code path we are now populating
pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and
therefore don't need to do that in FIXUP_TOP_OF_STACK.
We lose a number of large instructions there:
text data bss dec hex filename
13298 0 0 13298 33f2 entry_64_before.o
12978 0 0 12978 32b2 entry_64.o
What's more important, we convert two "MOVQ $imm,off(%rsp)" to
"PUSH $imm" (the ones which fill pt_regs->cs,ss).
Before this patch, placing them on fast path was slowing it down
by two cycles: this form of MOV is very large, 12 bytes, and
this probably reduces decode bandwidth to one instruction per cycle
when CPU sees them.
Therefore they were living in FIXUP_TOP_OF_STACK instead (away
from fast path).
"PUSH $imm" is a small 2-byte instruction. Moving it to fast path does
not slow it down in my measurements.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.
Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.
Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.
This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...
Net result in generated code is that constants in several insns
are changed.
This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.
Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".
While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.
Improve comment atop THREAD_INFO macro definition.
This patch does not change generated code (verified by objdump).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.
Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.
In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.
These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.
The following commit:
76f5df43ca ("Always allocate a complete "struct pt_regs" on the kernel stack")
recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.
This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull bugfix for md from Neil Brown:
"One fix for md in 4.0-rc4
Regression in recent patch causes crash on error path"
* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
md: fix problems with freeing private data after ->run failure.
Pull driver core fixes from Greg KH:
"Here are two bugfixes for things reported. One regression in kernfs,
and another issue fixed in the LZ4 code that was fixed in the
"upstream" codebase that solves a reported kernel crash
Both have been in linux-next for a while"
* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
LZ4 : fix the data abort issue
kernfs: handle poll correctly on 'direct_read' files.
Pull char/misc fixes from Greg KH:
"Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that
were merged in 4.0-rc1 that cause regressions. So let's revert them
for now and they will be reworked and resent sometime in the future.
All have been tested in linux-next for a while"
* tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "pcmcia: add a new resource manager for non ISA systems"
Revert "pcmcia: fix incorrect bracketing on a test"
Revert "pcmcia: add missing include for new pci resource handler"
Pull staging driver fixes from Greg KH:
"Here are four small staging driver fixes, all for the vt6656 and
vt6655 drivers, that resolve some reported issues with them.
All of these patches have been in linux next for a while"
* tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
vt6655: Fix late setting of byRFType.
vt6655: RFbSetPower fix missing rate RATE_12M
staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
staging: vt6655: vnt_tx_packet fix dma_idx selection.
Pull tty/serial driver fix from Greg KH:
"Here's a single 8250 serial driver that fixes a reported deadlock with
the serial console and the tty driver.
It's been in linux-next for a while now"
* tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_dw: Fix deadlock in LCR workaround
Pull USB / PHY driver fixes from Greg KH:
"Here's a number of USB and PHY driver fixes for 4.0-rc5.
The largest thing here is a revert of a gadget function driver patch
that removes 500 lines of code. Other than that, it's a number of
reported bugs fixes and new quirk/id entries.
All have been in linux-next for a while"
* tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: common: otg-fsm: only signal connect after switching to peripheral
uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
USB: ehci-atmel: rework clk handling
MAINTAINERS: add entry for USB OTG FSM
usb: chipidea: otg: add a_alt_hnp_support response for B device
phy: omap-usb2: Fix missing clk_prepare call when using old dt name
phy: ti/omap: Fix modalias
phy: core: Fixup return value of phy_exit when !pm_runtime_enabled
phy: miphy28lp: Convert to devm_kcalloc and fix wrong sizof
phy: miphy365x: Convert to devm_kcalloc and fix wrong sizeof
phy: twl4030-usb: Remove redundant assignment for twl->linkstat
phy: exynos5-usbdrd: Fix off-by-one valid value checking for args->args[0]
phy: Find the right match in devm_phy_destroy()
phy: rockchip-usb: Fixup rockchip_usb_phy_power_on failure path
phy: ti-pipe3: Simplify ti_pipe3_dpll_wait_lock implementation
phy: samsung-usb2: Remove NULL terminating entry from phys array
phy: hix5hd2-sata: Check return value of platform_get_resource
phy: exynos-dp-video: Kill exynos_dp_video_phy_pwr_isol function
Revert "usb: gadget: zero: Add support for interrupt EP"
Revert "xhci: Clear the host side toggle manually when endpoint is 'soft reset'"
...