Commit Graph

735133 Commits

Author SHA1 Message Date
John Fastabend b11a632c44 net: add a UID to use for ULP socket assignment
Create a UID field and enum that can be used to assign ULPs to
sockets. This saves a set of string comparisons if the ULP id
is known.

For sockmap, which is added in the next patches, a ULP is used to
hook into TCP sockets close state. In this case the ULP being added
is done at map insert time and the ULP is known and done on the kernel
side. In this case the named lookup is not needed. Because we don't
want to expose psock internals to user space socket options a user
visible flag is also added. For TLS this is set for BPF it will be
cleared.

Alos remove pr_notice, user gets an error code back and should check
that rather than rely on logs.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-06 11:39:31 +01:00
Yonghong Song 7b4eb53d95 tools/bpf: fix batch-mode test failure of test_xdp_redirect.sh
The tests at tools/testing/selftests/bpf can run in patch mode, e.g.,
    make -C tools/testing/selftests/bpf run_tests

With the batch mode, I experimented intermittent test failure of
test_xdp_redirect.sh.
    ....
    selftests: test_xdp_redirect [PASS]
    selftests: test_xdp_redirect.sh [PASS]
    RTNETLINK answers: File exists
    selftests: test_xdp_meta [FAILED]
    selftests: test_xdp_meta.sh [FAIL]
    ....

The following illustrates what caused the failure:
     (1). test_xdp_redirect creates veth pairs (veth1,veth11) and
          (veth2,veth22), and assign veth11 and veth22 to namespace
          ns1 and ns2 respectively.
     (2). at the end of test_xdp_redirect test, ns1 and ns2 are
          deleted. During this process, the deletion of actual
          namespace resources, including deletion of veth1{1} and veth2{2},
          is put into a workqueue to be processed asynchronously.
     (3). test_xdp_meta tries to create veth pair (veth1, veth2).
          The previous veth deletions in step (2) have not finished yet,
          and veth1 or veth2 may be still valid in the kernel, thus
          causing the failure.

The fix is to explicitly delete the veth pair before test_xdp_redirect
exits. Only one end of veth needs deletion as the kernel will delete
the other end automatically. Also test_xdp_meta is also fixed in
similar manner to avoid future potential issues.

Fixes: 996139e801 ("selftests: bpf: add a test for XDP redirect")
Fixes: 22c8852624 ("bpf: improve selftests and add tests for meta pointer")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-06 11:34:42 +01:00
Yonghong Song 09584b4067 bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
With CONFIG_BPF_JIT_ALWAYS_ON is defined in the config file,
tools/testing/selftests/bpf/test_kmod.sh failed like below:
  [root@localhost bpf]# ./test_kmod.sh
  sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
  [ JIT enabled:0 hardened:0 ]
  [  132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [  132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:0 ]
  [  133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [  133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:1 ]
  [  134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [  135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:2 ]
  [  136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [  136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [root@localhost bpf]#

The test_kmod.sh load/remove test_bpf.ko multiple times with different
settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297
of test_bpf.ko is designed such that JIT always fails.

Commit 290af86629 (bpf: introduce BPF_JIT_ALWAYS_ON config)
introduced the following tightening logic:
    ...
        if (!bpf_prog_is_dev_bound(fp->aux)) {
                fp = bpf_int_jit_compile(fp);
    #ifdef CONFIG_BPF_JIT_ALWAYS_ON
                if (!fp->jited) {
                        *err = -ENOTSUPP;
                        return fp;
                }
    #endif
    ...
With this logic, Test #297 always gets return value -ENOTSUPP
when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.

This patch fixed the failure by marking Test #297 as expected failure
when CONFIG_BPF_JIT_ALWAYS_ON is defined.

Fixes: 290af86629 (bpf: introduce BPF_JIT_ALWAYS_ON config)
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-05 00:31:57 +01:00
David S. Miller a6b88814ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2018-02-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) support XDP attach in libbpf, from Eric.

2) minor fixes, from Daniel, Jakub, Yonghong, Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-04 16:46:58 -05:00
Linus Torvalds 35277995e1 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull spectre/meltdown updates from Thomas Gleixner:
 "The next round of updates related to melted spectrum:

   - The initial set of spectre V1 mitigations:

       - Array index speculation blocker and its usage for syscall,
         fdtable and the n180211 driver.

       - Speculation barrier and its usage in user access functions

   - Make indirect calls in KVM speculation safe

   - Blacklisting of known to be broken microcodes so IPBP/IBSR are not
     touched.

   - The initial IBPB support and its usage in context switch

   - The exposure of the new speculation MSRs to KVM guests.

   - A fix for a regression in x86/32 related to the cpu entry area

   - Proper whitelisting for known to be safe CPUs from the mitigations.

   - objtool fixes to deal proper with retpolines and alternatives

   - Exclude __init functions from retpolines which speeds up the boot
     process.

   - Removal of the syscall64 fast path and related cleanups and
     simplifications

   - Removal of the unpatched paravirt mode which is yet another source
     of indirect unproteced calls.

   - A new and undisputed version of the module mismatch warning

   - A couple of cleanup and correctness fixes all over the place

  Yet another step towards full mitigation. There are a few things still
  missing like the RBS underflow mitigation for Skylake and other small
  details, but that's being worked on.

  That said, I'm taking a belated christmas vacation for a week and hope
  that everything is magically solved when I'm back on Feb 12th"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM/x86: Add IBPB support
  KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
  x86/pti: Mark constant arrays as __initconst
  x86/spectre: Simplify spectre_v2 command line parsing
  x86/retpoline: Avoid retpolines for built-in __init functions
  x86/kvm: Update spectre-v1 mitigation
  KVM: VMX: make MSR bitmaps per-VCPU
  x86/paravirt: Remove 'noreplace-paravirt' cmdline option
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
  x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
  x86/spectre: Report get_user mitigation for spectre_v1
  nl80211: Sanitize array index in parse_txq_params
  vfs, fdtable: Prevent bounds-check bypass via speculative execution
  x86/syscall: Sanitize syscall table de-references under speculation
  x86/get_user: Use pointer masking to limit speculation
  ...
2018-02-04 11:45:55 -08:00
Linus Torvalds 0a646e9c99 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of changes:

   - a fixup for kexec related to 5-level paging mode. That covers most
     of the cases except kexec from a 5-level kernel to a 4-level
     kernel. The latter needs more work and is going to come in 4.17

   - two trivial fixes for build warnings triggered by LTO and gcc-8"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/power: Fix swsusp_arch_resume prototype
  x86/dumpstack: Avoid uninitlized variable
  x86/kexec: Make kexec (mostly) work in 5-level paging mode
2018-02-04 11:43:30 -08:00
Linus Torvalds f74a127f66 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "Two small changes:

   - a fix for a interrupt regression caused by the vector management
     changes in 4.15 affecting museum pieces which rely on interrupt
     probing for legacy (e.g. parallel port) devices.

     One of the startup calls in the autoprobe code was not changed to
     the new activate_and_startup() function resulting in a warning and
     as a consequence failing to discover the device interrupt.

   - a trivial update to the copyright/license header of the STM32 irq
     chip driver"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Make legacy autoprobing work again
  irqchip/stm32: Fix copyright
2018-02-04 11:41:31 -08:00
Linus Torvalds 64b28683de Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
 "Most of this is fixes and not new code/features:

   - skd fix from Arnd, fixing a build error dependent on sla allocator
     type.

   - blk-mq scheduler discard merging fixes, one from me and one from
     Keith. This fixes a segment miscalculation for blk-mq-sched, where
     we mistakenly think two segments are physically contigious even
     though the request isn't carrying real data. Also fixes a bio-to-rq
     merge case.

   - Don't re-set a bit on the buffer_head flags, if it's already set.
     This can cause scalability concerns on bigger machines and
     workloads. From Kemi Wang.

   - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
     distuingish between a local (device related) resource starvation
     and a global one. The latter might happen without IO being in
     flight, so it has to be handled a bit differently. From Ming"

* tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
  block: skd: fix incorrect linux/slab_def.h inclusion
  buffer: Avoid setting buffer bits that are already set
  blk-mq-sched: Enable merging discard bio into request
  blk-mq: fix discard merge with scheduler attached
  blk-mq: introduce BLK_STS_DEV_RESOURCE
2018-02-04 11:16:35 -08:00
Linus Torvalds d3658c2266 Merge tag 'ntb-4.16' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
 "Bug fixes galore, removal of the ntb atom driver, and updates to the
  ntb tools and tests to support the multi-port interface"

* tag 'ntb-4.16' of git://github.com/jonmason/ntb: (37 commits)
  NTB: ntb_perf: fix cast to restricted __le32
  ntb_perf: Fix an error code in perf_copy_chunk()
  ntb_hw_switchtec: Make function switchtec_ntb_remove() static
  NTB: ntb_tool: fix memory leak on 'buf' on error exit path
  NTB: ntb_perf: fix printing of resource_size_t
  NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology
  NTB: ntb_test: Update ntb_perf tests
  NTB: ntb_test: Update ntb_tool MW tests
  NTB: ntb_test: Add ntb_tool Message tests
  NTB: ntb_test: Update ntb_tool Scratchpad tests
  NTB: ntb_test: Update ntb_tool DB tests
  NTB: ntb_test: Update ntb_tool link tests
  NTB: ntb_test: Add ntb_tool port tests
  NTB: ntb_test: Safely use paths with whitespace
  NTB: ntb_perf: Add full multi-port NTB API support
  NTB: ntb_tool: Add full multi-port NTB API support
  NTB: ntb_pp: Add full multi-port NTB API support
  NTB: Fix UB/bug in ntb_mw_get_align()
  NTB: Set dma mask and dma coherent mask to NTB devices
  NTB: Rename NTB messaging API methods
  ...
2018-02-04 11:13:49 -08:00
Linus Torvalds 8ac4840a3c Merge tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
 "Misc driver changes only:

   - TI-MsgMgr: Fix print format for a printk

   - TI-MSgMgr: SPDX license switch for the driver

   - QCOM-IPC: Convert driver to use regmap

   - QCOM-IPC: Spawn sibling clock device from mailbox driver"

* tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  dt-bindings: mailbox: qcom: Document the APCS clock binding
  mailbox: qcom: Create APCS child device for clock controller
  mailbox: qcom: Convert APCS IPC driver to use regmap
  mailbox: ti-msgmgr: Use %zu for size_t print format
  mailbox: ti-msgmgr: Switch to SPDX Licensing
2018-02-04 11:11:23 -08:00
Linus Torvalds 4141cf676b Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "I2C has the following changes for you:

   - new flag to mark DMA safe buffers in i2c_msg. Also, some
     infrastructure around it. And docs.

   - huge refactoring of the at24 driver led by the new maintainer
     Bartosz

   - update I2C bus recovery to send STOP after recovery

   - conversion from gpio to gpiod for I2C bus recovery

   - adding a fault-injector to the i2c-gpio driver

   - lots of small driver improvements, and bigger ones to
     i2c-sh_mobile"

* 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (99 commits)
  i2c: mv64xxx: Add myself as maintainer for this driver
  i2c: mv64xxx: Fix clock resource by adding an optional bus clock
  i2c: mv64xxx: Remove useless test before clk_disable_unprepare
  i2c: mxs: use true and false for boolean values
  i2c: meson: update doc description to fix build warnings
  i2c: meson: add configurable divider factors
  dt-bindings: i2c: update documentation for the Meson-AXG
  i2c: imx-lpi2c: add runtime pm support
  i2c: rcar: fix some trivial typos in comments
  i2c: davinci: fix the cpufreq transition
  i2c: rk3x: add proper kerneldoc header
  i2c: rk3x: account for const type of of_device_id.data
  i2c: acorn: remove outdated path from file header
  i2c: acorn: add MODULE_LICENSE tag
  i2c: rcar: implement bus recovery
  i2c: send STOP after successful bus recovery
  i2c: ensure SDA is released in recovery if SDA is controllable
  i2c: add 'set_sda' to bus_recovery_info
  i2c: add identifier in declarations for i2c_bus_recovery
  i2c: make kerneldoc about bus recovery more precise
  ...
2018-02-04 10:57:43 -08:00
Linus Torvalds 3462ac5703 Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
 "Refactor support for encrypted symlinks to move common code to fscrypt"

Ted also points out about the merge:
 "This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
  from the fscrypt tree. This will end up dropping the kzalloc() ->
  f2fs_kzalloc() change, which means the fscrypt-specific allocation
  won't get tested by f2fs's kmalloc error injection system; which is
  fine"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
  fscrypt: fix build with pre-4.6 gcc versions
  fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
  fscrypt: document symlink length restriction
  fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
  fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
  fscrypt: calculate NUL-padding length in one place only
  fscrypt: move fscrypt_symlink_data to fscrypt_private.h
  fscrypt: remove fscrypt_fname_usr_to_disk()
  ubifs: switch to fscrypt_get_symlink()
  ubifs: switch to fscrypt ->symlink() helper functions
  ubifs: free the encrypted symlink target
  f2fs: switch to fscrypt_get_symlink()
  f2fs: switch to fscrypt ->symlink() helper functions
  ext4: switch to fscrypt_get_symlink()
  ext4: switch to fscrypt ->symlink() helper functions
  fscrypt: new helper function - fscrypt_get_symlink()
  fscrypt: new helper functions for ->symlink()
  fscrypt: trim down fscrypt.h includes
  fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
  fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
  ...
2018-02-04 10:43:12 -08:00
Georgi Djakov 0ae7d327a6 dt-bindings: mailbox: qcom: Document the APCS clock binding
Update the binding documentation for APCS to mention that the APCS
hardware block also expose a clock controller functionality.

The APCS clock controller is a mux and half-integer divider. It has the
main CPU PLL as an input and provides the clock for the application CPU.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2018-02-04 12:16:34 +05:30
Georgi Djakov c815d769b5 mailbox: qcom: Create APCS child device for clock controller
There is a clock controller functionality provided by the APCS hardware
block of msm8916 devices. The device-tree would represent an APCS node
with both mailbox and clock provider properties.
Create a platform child device for the clock controller functionality so
the driver can probe and use APCS as parent.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2018-02-04 12:16:23 +05:30
Georgi Djakov c6a8b171ca mailbox: qcom: Convert APCS IPC driver to use regmap
This hardware block provides more functionalities that just IPC. Convert
it to regmap to allow other child platform devices to use the same regmap.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2018-02-04 12:16:09 +05:30
Linus Torvalds 617aebe6a9 Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab
  cache objects. This is good, but still leaves a lot of kernel memory
  available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates
  a way to whitelist specific areas of a given slab cache object for
  copying to/from userspace, allowing much finer granularity of access
  control.

  Slab caches that are never exposed to userspace can declare no
  whitelist for their objects, thereby keeping them unavailable to
  userspace via dynamic copy operations. (Note, an implicit form of
  whitelisting is the use of constant sizes in usercopy operations and
  get_user()/put_user(); these bypass all hardened usercopy checks since
  these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over
  the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test overage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...
2018-02-03 16:25:42 -08:00
KarimAllah Ahmed b2ac58f905 KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]

... basically doing exactly what we do for VMX:

- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
2018-02-03 23:06:52 +01:00
KarimAllah Ahmed d28b387fb7 KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
2018-02-03 23:06:52 +01:00
KarimAllah Ahmed 28c1c9fabf KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
contents will come directly from the hardware, but user-space can still
override it.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
2018-02-03 23:06:52 +01:00
Ashok Raj 15d4507152 KVM/x86: Add IBPB support
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require a IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate
  - Either it can be compiled with retpoline
  - If its going through context switch, and has set !dumpable then
    there is a IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
    Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When host kernel is using retpoline it is safe against these attacks.
  If host kernel isn't using retpoline we might need to do a IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here to get documentation about mitigations.

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()]
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
2018-02-03 23:06:51 +01:00
KarimAllah Ahmed b7b27aa011 KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
[dwmw2: Stop using KF() for bits in it, too]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de
2018-02-03 23:06:51 +01:00
Linus Torvalds 0771ad44a2 Merge tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:
 "Only a header cleanup this release; nice and quiet. :)

   - clean up hardirq header usage (Yang Shi)"

* tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs: pstore: remove unused hardirq.h
2018-02-03 13:55:01 -08:00
Linus Torvalds 23aedc4b9b Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Only miscellaneous cleanups and bug fixes for ext4 this cycle"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: create ext4_kset dynamically
  ext4: create ext4_feat kobject dynamically
  ext4: release kobject/kset even when init/register fail
  ext4: fix incorrect indentation of if statement
  ext4: correct documentation for grpid mount option
  ext4: use 'sbi' instead of 'EXT4_SB(sb)'
  ext4: save error to disk in __ext4_grp_locked_error()
  jbd2: fix sphinx kernel-doc build warnings
  ext4: fix a race in the ext4 shutdown path
  mbcache: make sure c_entry_count is not decremented past zero
  ext4: no need flush workqueue before destroying it
  ext4: fixed alignment and minor code cleanup in ext4.h
  ext4: fix ENOSPC handling in DAX page fault handler
  dax: pass detailed error code from dax_iomap_fault()
  mbcache: revert "fs/mbcache.c: make count_objects() more robust"
  mbcache: initialize entry->e_referenced in mb_cache_entry_create()
  ext4: fix up remaining files with SPDX cleanups
2018-02-03 13:49:22 -08:00
Linus Torvalds 85b8bac957 Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi subsystem updates/fixes from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: handle missing DMI data gracefully
  firmware: dmi_scan: Fix handling of empty DMI strings
  firmware: dmi_scan: Drop dmi_initialized
  firmware: dmi: Optimize dmi_matches
2018-02-03 13:46:14 -08:00
Linus Torvalds 1726aa70e7 Merge branch 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity fixes from James Morris:

-  add James Bottommley as a Trusted Keys maintainer.

 - IMA: re-initialize iint->atomic_flags on iint_free(), from Mimi.

* 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: re-initialize iint->atomic_flags
  maintainers: update trusted keys
2018-02-03 13:44:29 -08:00