mirror of
https://github.com/ukui/kernel.git
synced 2026-03-09 10:07:04 -07:00
81d2c6f81996e01fbcd2b5aeefbb519e21c806e9
7984 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e524d16e7e |
Merge tag 'copy-struct-from-user-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull copy_struct_from_user() helper from Christian Brauner: "This contains the copy_struct_from_user() helper which got split out from the openat2() patchset. It is a generic interface designed to copy a struct from userspace. The helper will be especially useful for structs versioned by size of which we have quite a few. This allows for backwards compatibility, i.e. an extended struct can be passed to an older kernel, or a legacy struct can be passed to a newer kernel. For the first case (extended struct, older kernel) the new fields in an extended struct can be set to zero and the struct safely passed to an older kernel. The most obvious benefit is that this helper lets us get rid of duplicate code present in at least sched_setattr(), perf_event_open(), and clone3(). More importantly it will also help to ensure that users implementing versioning-by-size end up with the same core semantics. This point is especially crucial since we have at least one case where versioning-by-size is used but with slighly different semantics: sched_setattr(), perf_event_open(), and clone3() all do do similar checks to copy_struct_from_user() while rt_sigprocmask(2) always rejects differently-sized struct arguments. With this pull request we also switch over sched_setattr(), perf_event_open(), and clone3() to use the new helper" * tag 'copy-struct-from-user-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: usercopy: Add parentheses around assignment in test_copy_struct_from_user perf_event_open: switch to copy_struct_from_user() sched_setattr: switch to copy_struct_from_user() clone3: switch to copy_struct_from_user() lib: introduce copy_struct_from_user() helper |
||
|
|
af0622f6ae |
Merge tag 'for-linus-20191003' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3/pidfd fixes from Christian Brauner:
"This contains a couple of fixes:
- Fix pidfd selftest compilation (Shuah Kahn)
Due to a false linking instruction in the Makefile compilation for
the pidfd selftests would fail on some systems.
- Fix compilation for glibc on RISC-V systems (Seth Forshee)
In some scenarios linux/uapi/linux/sched.h is included where
__ASSEMBLY__ is defined causing a build failure because struct
clone_args was not guarded by an #ifndef __ASSEMBLY__.
- Add missing clone3() and struct clone_args kernel-doc (Christian Brauner)
clone3() and struct clone_args were missing kernel-docs. (The goal
is to use kernel-doc for any function or type where it's worth it.)
For struct clone_args this also contains a comment about the fact
that it's versioned by size"
* tag 'for-linus-20191003' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
sched: add kernel-doc for struct clone_args
fork: add kernel-doc for clone3
selftests: pidfd: Fix undefined reference to pthread_create()
sched: Add __ASSEMBLY__ guards around struct clone_args
|
||
|
|
768b47b7a9 |
Merge tag 'drm-fixes-2019-10-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Been offline for 3 days, got back and had some fixes queued up.
Nothing too major, the i915 dp-mst fix is important, and amdgpu has a
bulk move speedup fix and some regressions, but nothing too insane for
an rc2 pull. The intel fixes are also 2 weeks worth, they missed the
boat last week.
core:
- writeback fixes
i915:
- Fix DP-MST crtc_mask
- Fix dsc dpp calculations
- Fix g4x sprite scaling stride check with GTT remapping
- Fix concurrence on cases where requests where getting retired at
same time as resubmitted to HW
- Fix gen9 display resolutions by setting the right max plane width
- Fix GPU hang on preemption
- Mark contents as dirty on a write fault. This was breaking cursor
sprite with dumb buffers.
komeda:
- memory leak fix
tilcdc:
- include fix
amdgpu:
- Enable bulk moves
- Power metrics fixes for Navi
- Fix S4 regression
- Add query for tcc disabled mask
- Fix several leaks in error paths
- randconfig fixes
- clang fixes"
* tag 'drm-fixes-2019-10-04' of git://anongit.freedesktop.org/drm/drm: (21 commits)
Revert "drm/i915: Fix DP-MST crtc_mask"
drm/omap: fix max fclk divider for omap36xx
drm/i915: Fix g4x sprite scaling stride check with GTT remapping
drm/i915/dp: Fix dsc bpp calculations, v5.
drm/amd/display: fix dcn21 Makefile for clang
drm/amd/display: hide an unused variable
drm/amdgpu: display_mode_vba_21: remove uint typedef
drm/amdgpu: hide another #warning
drm/amdgpu: make pmu support optional, again
drm/amd/display: memory leak
drm/amdgpu: fix multiple memory leaks in acp_hw_init
drm/amdgpu: return tcc_disabled_mask to userspace
drm/amdgpu: don't increment vram lost if we are in hibernation
Revert "drm/amdgpu: disable stutter mode for renoir"
drm/amd/powerplay: add sensor lock support for smu
drm/amd/powerplay: change metrics update period from 1ms to 100ms
drm/amdgpu: revert "disable bulk moves for now"
drm/tilcdc: include linux/pinctrl/consumer.h again
drm/komeda: prevent memory leak in komeda_wb_connector_add
drm: Clear the fence pointer when writeback job signaled
...
|
||
|
|
c4bd70e8c9 |
Merge tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Mandate timespec64 for the io_uring timeout ABI (Arnd)
- Set of NVMe changes via Sagi:
- controller removal race fix from Balbir
- quirk additions from Gabriel and Jian-Hong
- nvme-pci power state save fix from Mario
- Add 64bit user commands (for 64bit registers) from Marta
- nvme-rdma/nvme-tcp fixes from Max, Mark and Me
- Minor cleanups and nits from James, Dan and John
- Two s390 dasd fixes (Jan, Stefan)
- Have loop change block size in DIO mode (Martijn)
- paride pg header ifdef guard (Masahiro)
- Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned
devices and suboptimal performance on others (Ming)
* tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits)
block: sed-opal: fix sparse warning: convert __be64 data
block: sed-opal: fix sparse warning: obsolete array init.
block: pg: add header include guard
Revert "s390/dasd: Add discard support for ESE volumes"
s390/dasd: Fix error handling during online processing
io_uring: use __kernel_timespec in timeout ABI
loop: change queue block size to match when using DIO
blk-mq: apply normal plugging for HDD
blk-mq: honor IO scheduler for multiqueue devices
nvme-rdma: fix possible use-after-free in connect timeout
nvme: Move ctrl sqsize to generic space
nvme: Add ctrl attributes for queue_count and sqsize
nvme: allow 64-bit results in passthru commands
nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
nvmet-tcp: remove superflous check on request sgl
Added QUIRKs for ADATA XPG SX8200 Pro 512GB
nvme-rdma: Fix max_hw_sectors calculation
nvme: fix an error code in nvme_init_subsystem()
nvme-pci: Save PCI state before putting drive into deepest state
nvme-tcp: fix wrong stop condition in io_work
...
|
||
|
|
78f6face5a |
sched: add kernel-doc for struct clone_args
Add kernel-doc for struct clone_args for the clone3() syscall. Link: https://lore.kernel.org/r/20191001114701.24661-3-christian.brauner@ubuntu.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> |
||
|
|
3a4b46c3bc |
block: pg: add header include guard
Add a header include guard just in case. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
815fb4c9d7 |
drm/amdgpu: return tcc_disabled_mask to userspace
UMDs need this for correct programming of harvested chips. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> |
||
|
|
f14c234b4b |
clone3: switch to copy_struct_from_user()
Switch clone3() syscall from it's own copying struct clone_args from userspace to the new dedicated copy_struct_from_user() helper. The change is very straightforward, and helps unify the syscall interface for struct-from-userspace syscalls. Additionally, explicitly define CLONE_ARGS_SIZE_VER0 to match the other users of the struct-extension pattern. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> [christian.brauner@ubuntu.com: improve commit message] Link: https://lore.kernel.org/r/20191001011055.19283-3-cyphar@cyphar.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> |
||
|
|
61129dd29f |
sched: Add __ASSEMBLY__ guards around struct clone_args
The addition of struct clone_args to uapi/linux/sched.h is not protected
by __ASSEMBLY__ guards, causing a failure to build from source for glibc
on RISC-V. Add the guards to fix this.
Fixes:
|
||
|
|
02dc96ef6c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, it's semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet snaity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Crapenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
|
||
|
|
298fb76a55 |
Merge tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new knfsd file cache, so that we don't have to open and close
on each (NFSv2/v3) READ or WRITE. This can speed up read and write
in some cases. It also replaces our readahead cache.
- Prevent silent data loss on write errors, by treating write errors
like server reboots for the purposes of write caching, thus forcing
clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving, so
that NFSv4.1 mounts are less likely to hang when a server already
has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
limited only by the backend filesystem and the maximum RPC size.
- Allow the server to enforce use of the correct kerberos credentials
when a client reclaims state after a reboot.
And some miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
sunrpc: clean up indentation issue
nfsd: fix nfs read eof detection
nfsd: Make nfsd_reset_boot_verifier_locked static
nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
nfsd: handle drc over-allocation gracefully.
nfsd: add support for upcall version 2
nfsd: add a "GetVersion" upcall for nfsdcld
nfsd: Reset the boot verifier on all write I/O errors
nfsd: Don't garbage collect files that might contain write errors
nfsd: Support the server resetting the boot verifier
nfsd: nfsd_file cache entries should be per net namespace
nfsd: eliminate an unnecessary acl size limit
Deprecate nfsd fault injection
nfsd: remove duplicated include from filecache.c
nfsd: Fix the documentation for svcxdr_tmpalloc()
nfsd: Fix up some unused variable warnings
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: rip out the raparms cache
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
...
|
||
|
|
8f744bdee4 |
Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse virtio-fs support from Miklos Szeredi: "Virtio-fs allows exporting directory trees on the host and mounting them in guest(s). This isn't actually a new filesystem, but a glue layer between the fuse filesystem and a virtio based back-end. It's similar in functionality to the existing virtio-9p solution, but significantly faster in benchmarks and has better POSIX compliance. Further permformance improvements can be achieved by sharing the page cache between host and guest, allowing for faster I/O and reduced memory use. Kata Containers have been including the out-of-tree virtio-fs (with the shared page cache patches as well) since version 1.7 as an experimental feature. They have been active in development and plan to switch from virtio-9p to virtio-fs as their default solution. There has been interest from other sources as well. The userspace infrastructure is slated to be merged into qemu once the kernel part hits mainline. This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi" * tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: virtio-fs: add virtiofs filesystem virtio-fs: add Documentation/filesystems/virtiofs.rst fuse: reserve values for mapping protocol |
||
|
|
568d850e3c |
Merge tag 'riscv/for-v5.4-rc1-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Paul Walmsley:
"Some additional RISC-V updates.
This includes one significant fix:
- Prevent interrupts from being unconditionally re-enabled during
exception handling if they were disabled in the context in which
the exception occurred
Also a few other fixes:
- Fix a build error when sparse memory support is manually enabled
- Prevent CPUs beyond CONFIG_NR_CPUS from being enabled in early boot
And a few minor improvements:
- DT improvements: in the FU540 SoC DT files, improve U-Boot
compatibility by adding an "ethernet0" alias, drop an unnecessary
property from the DT files, and add support for the PWM device
- KVM preparation: add a KVM-related macro for future RISC-V KVM
support, and export some symbols required to build KVM support as
modules
- defconfig additions: build more drivers by default for QEMU
configurations"
* tag 'riscv/for-v5.4-rc1-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Avoid interrupts being erroneously enabled in handle_exception()
riscv: dts: sifive: Drop "clock-frequency" property of cpu nodes
riscv: dts: sifive: Add ethernet0 to the aliases node
RISC-V: Export kernel symbols for kvm
KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
arch/riscv: disable excess harts before picking main boot hart
RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
RISC-V: Fix building error when CONFIG_SPARSEMEM_MANUAL=y
riscv: dts: Add DT support for SiFive FU540 PWM driver
|
||
|
|
8bbe0dec38 |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"x86 KVM changes:
- The usual accuracy improvements for nested virtualization
- The usual round of code cleanups from Sean
- Added back optimizations that were prematurely removed in 5.2 (the
bare minimum needed to fix the regression was in 5.3-rc8, here
comes the rest)
- Support for UMWAIT/UMONITOR/TPAUSE
- Direct L2->L0 TLB flushing when L0 is Hyper-V and L1 is KVM
- Tell Windows guests if SMT is disabled on the host
- More accurate detection of vmexit cost
- Revert a pvqspinlock pessimization"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (56 commits)
KVM: nVMX: cleanup and fix host 64-bit mode checks
KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386
KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot()
KVM: x86: Drop ____kvm_handle_fault_on_reboot()
KVM: VMX: Add error handling to VMREAD helper
KVM: VMX: Optimize VMX instruction error and fault handling
KVM: x86: Check kvm_rebooting in kvm_spurious_fault()
KVM: selftests: fix ucall on x86
Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
kvm: nvmx: limit atomic switch MSRs
kvm: svm: Intercept RDPRU
kvm: x86: Add "significant index" flag to a few CPUID leaves
KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero
KVM: x86/mmu: Explicitly track only a single invalid mmu generation
KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first""
KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages""
KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages""
KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints""
...
|
||
|
|
2d5ba0c712 |
Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus
Pull NVMe changes from Sagi: "This set consists of various fixes and cleanups: - controller removal race fix from Balbir - quirk additions from Gabriel and Jian-Hong - nvme-pci power state save fix from Mario - Add 64bit user commands (for 64bit registers) from Marta - nvme-rdma/nvme-tcp fixes from Max, Mark and Me - Minor cleanups and nits from James, Dan and John" * 'nvme-5.4' of git://git.infradead.org/nvme: nvme-rdma: fix possible use-after-free in connect timeout nvme: Move ctrl sqsize to generic space nvme: Add ctrl attributes for queue_count and sqsize nvme: allow 64-bit results in passthru commands nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T nvmet-tcp: remove superflous check on request sgl Added QUIRKs for ADATA XPG SX8200 Pro 512GB nvme-rdma: Fix max_hw_sectors calculation nvme: fix an error code in nvme_init_subsystem() nvme-pci: Save PCI state before putting drive into deepest state nvme-tcp: fix wrong stop condition in io_work nvme-pci: Fix a race in controller removal nvmet: change ppl to lpp |
||
|
|
2df4de1681 |
ptp: correctly disable flags on old ioctls
Commit
|
||
|
|
c5f095baa8 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Add NFT_CHAIN_POLICY_UNSET to replace hardcoded -1 to specify that the chain policy is unset. The chain policy field is actually defined as an 8-bit unsigned integer. 2) Remove always true condition reported by smatch in chain policy check. 3) Fix element lookup on dynamic sets, from Florian Westphal. 4) Use __u8 in ebtables uapi header, from Masahiro Yamada. 5) Bogus EBUSY when removing flowtable after chain flush, from Laura Garcia Liebana. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
cbafe18c71 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ... |
||
|
|
1a4e58cce8 |
mm: introduce MADV_PAGEOUT
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ [minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9c276cc65a |
mm: introduce MADV_COLD
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
541be05095 |
linux/coff.h: add include guard
Add a header include guard just in case. My motivation is to allow Kbuild to detect missing include guard: https://patchwork.kernel.org/patch/11063011/ Before I enable this checker I want to fix as many headers as possible. Link: http://lkml.kernel.org/r/20190728154728.11126-1-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
65e68edce0 |
nvme: allow 64-bit results in passthru commands
It is not possible to get 64-bit results from the passthru commands, what prevents from getting for the Capabilities (CAP) property value. As a result, it is not possible to implement IOL's NVMe Conformance test 4.3 Case 1 for Fabrics targets [1] (page 123). This issue has been already discussed [2], but without a solution. This patch solves the problem by adding new ioctls with a new passthru structure, including 64-bit results. The older ioctls stay unchanged. [1] https://www.iol.unh.edu/sites/default/files/testsuites/nvme/UNH-IOL_NVMe_Conformance_Test_Suite_v11.0.pdf [2] http://lists.infradead.org/pipermail/linux-nvme/2018-June/018791.html Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> |
||
|
|
7b1373dd6e |
Merge tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi: - Continue separating the transport (user/kernel communication) and the filesystem layers of fuse. Getting rid of most layering violations will allow for easier cleanup and optimization later on. - Prepare for the addition of the virtio-fs filesystem. The actual filesystem will be introduced by a separate pull request. - Convert to new mount API. - Various fixes, optimizations and cleanups. * tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits) fuse: Make fuse_args_to_req static fuse: fix memleak in cuse_channel_open fuse: fix beyond-end-of-page access in fuse_parse_cache() fuse: unexport fuse_put_request fuse: kmemcg account fs data fuse: on 64-bit store time in d_fsdata directly fuse: fix missing unlock_page in fuse_writepage() fuse: reserve byteswapped init opcodes fuse: allow skipping control interface and forced unmount fuse: dissociate DESTROY from fuseblk fuse: delete dentry if timeout is zero fuse: separate fuse device allocation and installation in fuse_conn fuse: add fuse_iqueue_ops callbacks fuse: extract fuse_fill_super_common() fuse: export fuse_dequeue_forget() function fuse: export fuse_get_unique() fuse: export fuse_send_init_request() fuse: export fuse_len_args() fuse: export fuse_end_request() fuse: fix request limit ... |
||
|
|
20ff1cb506 |
netfilter: ebtables: use __u8 instead of uint8_t in uapi header
When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
make sure they can be included from user-space.
Currently, linux/netfilter_bridge/ebtables.h is excluded from the test
coverage. To make it join the compile-test, we need to fix the build
errors attached below.
For a case like this, we decided to use __u{8,16,32,64} variable types
in this discussion:
https://lkml.org/lkml/2019/6/5/18
Build log:
CC usr/include/linux/netfilter_bridge/ebtables.h.s
In file included from <command-line>:32:0:
./usr/include/linux/netfilter_bridge/ebtables.h:126:4: error: unknown type name ‘uint8_t’
uint8_t revision;
^~~~~~~
./usr/include/linux/netfilter_bridge/ebtables.h:139:4: error: unknown type name ‘uint8_t’
uint8_t revision;
^~~~~~~
./usr/include/linux/netfilter_bridge/ebtables.h:152:4: error: unknown type name ‘uint8_t’
uint8_t revision;
^~~~~~~
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
||
|
|
b6cb84b4fc |
Merge tag 'for-5.4/io_uring-2019-09-24' of git://git.kernel.dk/linux-block
Pull more io_uring updates from Jens Axboe:
"A collection of later fixes and additions, that weren't quite ready
for pushing out with the initial pull request.
This contains:
- Fix potential use-after-free of shadow requests (Jackie)
- Fix potential OOM crash in request allocation (Jackie)
- kmalloc+memcpy -> kmemdup cleanup (Jackie)
- Fix poll crash regression (me)
- Fix SQ thread not being nice and giving up CPU for !PREEMPT (me)
- Add support for timeouts, making it easier to do epoll_wait()
conversions, for instance (me)
- Ensure io_uring works without f_ops->read_iter() and
f_ops->write_iter() (me)"
* tag 'for-5.4/io_uring-2019-09-24' of git://git.kernel.dk/linux-block:
io_uring: correctly handle non ->{read,write}_iter() file_operations
io_uring: IORING_OP_TIMEOUT support
io_uring: use cond_resched() in sqthread
io_uring: fix potential crash issue due to io_get_req failure
io_uring: ensure poll commands clear ->sqe
io_uring: fix use-after-free of shadow_req
io_uring: use kmemdup instead of kmalloc and memcpy
|