Commit Graph

7474 Commits

Author SHA1 Message Date
Jesse Brandeburg
ee1444b5e1 dim: initialize all struct fields
The W=2 build pointed out that the code wasn't initializing all the
variables in the dim_cq_moder declarations with the struct initializers.
The net change here is zero since these structs were already static
const globals and were initialized with zeros by the compiler, but
removing compiler warnings has value in and of itself.

lib/dim/net_dim.c: At top level:
lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers]
   54 |         NET_DIM_RX_EQE_PROFILES,
      |         ^~~~~~~~~~~~~~~~~~~~~~~
In file included from lib/dim/net_dim.c:6:
./include/linux/dim.h:45:13: note: ‘comps’ declared here
   45 |         u16 comps;
      |             ^~~~~

and repeats for the tx struct, and once you fix the comps entry then
the cq_period_mode field needs the same treatment.

Use the commonly accepted style to indicate to the compiler that we
know what we're doing, and add a comma at the end of each struct
initializer to clean up the issue, and use explicit initializers
for the fields we are initializing which makes the compiler happy.

While here and fixing these lines, clean up the code slightly with
a fix for the super long lines by removing the word "_MODERATION" from a
couple defines only used in this file.

Fixes: f8be17b81d ("lib/dim: Fix -Wunused-const-variable warnings")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20220507011038.14568-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09 17:20:37 -07:00
Linus Torvalds
b2da7df52e Merge tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is
   solely controlled by the hypervisor

 - A build fix to make the function prototype (__warn()) as visible as
   the definition itself

 - A bunch of objtool annotation fixes which have accumulated over time

 - An ORC unwinder fix to handle bad input gracefully

 - Well, we thought the microcode gets loaded in time in order to
   restore the microcode-emulated MSRs but we thought wrong. So there's
   a fix for that to have the ordering done properly

 - Add new Intel model numbers

 - A spelling fix

* tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  bug: Have __warn() prototype defined unconditionally
  x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config
  objtool: Use offstr() to print address of missing ENDBR
  objtool: Print data address for "!ENDBR" data warnings
  x86/xen: Add ANNOTATE_NOENDBR to startup_xen()
  x86/uaccess: Add ENDBR to __put_user_nocheck*()
  x86/retpoline: Add ANNOTATE_NOENDBR for retpolines
  x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline
  objtool: Enable unreachable warnings for CLANG LTO
  x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE
  x86,objtool: Mark cpu_startup_entry() __noreturn
  x86,xen,objtool: Add UNWIND hint
  lib/strn*,objtool: Enforce user_access_begin() rules
  MAINTAINERS: Add x86 unwinding entry
  x86/unwind/orc: Recheck address range after stack info was updated
  x86/cpu: Load microcode during restore_processor_state()
  x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-05-01 10:03:36 -07:00
Mikulas Patocka
e4d8a29997 hex2bin: fix access beyond string end
If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character.  This patch fixes it.

Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes: b78049831f ("lib: add error checking to hex2bin")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-27 10:57:33 -07:00
Mikulas Patocka
e5be15767e hex2bin: make the function hex_to_bin constant-time
The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity.  It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.

This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.

Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.

I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-27 10:57:33 -07:00
Matthew Wilcox (Oracle)
63b1898fff XArray: Disallow sibling entries of nodes
There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption.  Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().

The anatomy of this race is:

thread A			thread B
order-9 page is stored at index 0x200
				lookup of page at index 0x274
page split starts
				load of sibling entry at offset 9
stores nodes at offsets 8-15
				load of entry at offset 8

The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274.  This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries.  Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes.  This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.

The test suite continues to pass, but I have not added a new test for
this bug.

Reported-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Tested-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Fixes: 6b24ca4a1a ("mm: Use multi-index entries in the page cache")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-04-22 15:35:40 -04:00
Peter Zijlstra
226d44acf6 lib/strn*,objtool: Enforce user_access_begin() rules
Apparently GCC can fail to inline a 'static inline' single caller
function:

  lib/strnlen_user.o: warning: objtool: strnlen_user()+0x33: call to do_strnlen_user() with UACCESS enabled
  lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x33: call to do_strncpy_from_user() with UACCESS enabled

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220408094718.262932488@infradead.org
2022-04-19 21:58:47 +02:00
Linus Torvalds
33563138ac Merge tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here are two small driver core changes for 5.18-rc2.

  They are the final bits in the removal of the default_attrs field in
  struct kobj_type. I had to wait until after 5.18-rc1 for all of the
  changes to do this came in through different development trees, and
  then one new user snuck in. So this series has two changes:

   - removal of the default_attrs field in the powerpc/pseries/vas code.

     The change has been acked by the PPC maintainers to come through
     this tree

   - removal of default_attrs from struct kobj_type now that all
     in-kernel users are removed.

     This cleans up the kobject code a little bit and removes some
     duplicated functionality that confused people (now there is only
     one way to do default groups)

  Both of these have been in linux-next for all of this week with no
  reported problems"

* tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kobject: kobj_type: remove default_attrs
  powerpc/pseries/vas: use default_groups in kobj_type
2022-04-10 09:55:09 -10:00
Guo Xuenan
eafc0a0239 lz4: fix LZ4_decompress_safe_partial read out of bound
When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for following match.

In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur.  As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding.  lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.

current decompression routine was ported from lz4 v1.8.3, bumping
lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd
better fix it first.

[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] c5d6f8a8be#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/

Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-08 14:20:36 -10:00
Greg Kroah-Hartman
cdb4f26a63 kobject: kobj_type: remove default_attrs
Now that all in-kernel users of default_attrs for the kobj_type are gone
and converted to properly use the default_groups pointer instead, it can
be safely removed.

There is one standard way to create sysfs files in a kobj_type, and not
two like before, causing confusion as to which should be used.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-05 15:39:19 +02:00
Linus Torvalds
d589ae0d44 Merge tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Either fixes or a few additions that got missed in the initial merge
  window pull. In detail:

   - List iterator fix to avoid leaking value post loop (Jakob)

   - One-off fix in minor count (Christophe)

   - Fix for a regression in how io priority setting works for an
     exiting task (Jiri)

   - Fix a regression in this merge window with blkg_free() being called
     in an inappropriate context (Ming)

   - Misc fixes (Ming, Tom)"

* tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block:
  blk-wbt: remove wbt_track stub
  block: use dedicated list iterator variable
  block: Fix the maximum minor value is blk_alloc_ext_minor()
  block: restore the old set_task_ioprio() behaviour wrt PF_EXITING
  block: avoid calling blkg_free() in atomic context
  lib/sbitmap: allocate sb->map via kvzalloc_node
2022-04-01 16:20:00 -07:00
Linus Torvalds
5a3fe95d76 Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray
Pull XArray updates from Matthew Wilcox:

 - Documentation update

 - Fix test-suite build after move of bitmap.h

 - Fix xas_create_range() when a large entry is already present

 - Fix xas_split() of a shadow entry

* tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray:
  XArray: Update the LRU list in xas_split()
  XArray: Fix xas_create_range() when multi-order entry present
  XArray: Include bitmap.h from xarray.h
  XArray: Document the locking requirement for the xa_state
2022-04-01 13:40:44 -07:00
Linus Torvalds
e8b767f5e0 Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - Devicetree support (for testing)

 - Various cleanups and fixes: UBD, port_user, uml_mconsole

 - Maintainer update

* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: run_helper: Write error message to kernel log on exec failure on host
  um: port_user: Improve error handling when port-helper is not found
  um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
  um: port_user: Search for in.telnetd in PATH
  um: clang: Strip out -mno-global-merge from USER_CFLAGS
  docs: UML: Mention telnetd for port channel
  um: Remove unused timeval_to_ns() function
  um: Fix uml_mconsole stop/go
  um: Cleanup syscall_handler_t definition/cast, fix warning
  uml: net: vector: fix const issue
  um: Fix WRITE_ZEROES in the UBD Driver
  um: Migrate vector drivers to NAPI
  um: Fix order of dtb unflatten/early init
  um: fix and optimize xor select template for CONFIG64 and timetravel mode
  um: Document dtb command line option
  lib/logic_iomem: correct fallback config references
  um: Remove duplicated include in syscalls_64.c
  MAINTAINERS: Update UserModeLinux entry
2022-03-31 16:16:58 -07:00
Matthew Wilcox (Oracle)
3ed4bb7715 XArray: Update the LRU list in xas_split()
When splitting a value entry, we may need to add the new nodes to the LRU
list and remove the parent node from the LRU list.  The WARN_ON checks
in shadow_lru_isolate() catch this oversight.  This bug was latent
until we stopped splitting folios in shrink_page_list() with commit
820c4e2e6f ("mm/vmscan: Free non-shmem folios without splitting them").
That allows the creation of large shadow entries, and subsequently when
trying to page in a small page, we will split the large shadow entry
in __filemap_add_folio().

Fixes: 8fc75643c5 ("XArray: add xas_split")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-31 08:52:57 -04:00
Dan Carpenter
dc0ce6cc4b lib/test: use after free in register_test_dev_kmod()
The "test_dev" pointer is freed but then returned to the caller.

Fixes: d9c6a72d6f ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-03-29 15:13:36 -07:00
Matthew Wilcox (Oracle)
3e3c658055 XArray: Fix xas_create_range() when multi-order entry present
If there is already an entry present that is of order >= XA_CHUNK_SHIFT
when we call xas_create_range(), xas_create_range() will misinterpret
that entry as a node and dereference xa_node->parent, generally leading
to a crash that looks something like this:

general protection fault, probably for non-canonical address 0xdffffc0000000001:
0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725

It's deterministically reproducable once you know what the problem is,
but producing it in a live kernel requires khugepaged to hit a race.
While the problem has been present since xas_create_range() was
introduced, I'm not aware of a way to hit it before the page cache was
converted to use multi-index entries.

Fixes: 6b24ca4a1a ("mm: Use multi-index entries in the page cache")
Reported-by: syzbot+0d2b0bf32ca5cfd09f2e@syzkaller.appspotmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-28 19:25:11 -04:00
Linus Torvalds
4be240b18a Merge tag 'memcpy-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull FORTIFY_SOURCE updates from Kees Cook:
 "This series consists of two halves:

   - strict compile-time buffer size checking under FORTIFY_SOURCE for
     the memcpy()-family of functions (for extensive details and
     rationale, see the first commit)

   - enabling FORTIFY_SOURCE for Clang, which has had many overlapping
     bugs that we've finally worked past"

* tag 'memcpy-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fortify: Add Clang support
  fortify: Make sure strlen() may still be used as a constant expression
  fortify: Use __diagnose_as() for better diagnostic coverage
  fortify: Make pointer arguments const
  Compiler Attributes: Add __diagnose_as for Clang
  Compiler Attributes: Add __overloadable for Clang
  Compiler Attributes: Add __pass_object_size for Clang
  fortify: Replace open-coded __gnu_inline attribute
  fortify: Update compile-time tests for Clang 14
  fortify: Detect struct member overflows in memset() at compile-time
  fortify: Detect struct member overflows in memmove() at compile-time
  fortify: Detect struct member overflows in memcpy() at compile-time
2022-03-26 12:19:04 -07:00
Linus Torvalds
3f7282139f Merge tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block
Pull block layer 64-bit data integrity support from Jens Axboe:
 "This adds support for 64-bit data integrity in the block layer and in
  NVMe"

* tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block:
  crypto: fix crc64 testmgr digest byte order
  nvme: add support for enhanced metadata
  block: add pi for extended integrity
  crypto: add rocksoft 64b crc guard tag framework
  lib: add rocksoft model crc64
  linux/kernel: introduce lower_48_bits function
  asm-generic: introduce be48 unaligned accessors
  nvme: allow integrity on extended metadata formats
  block: support pi with extended metadata
2022-03-26 12:01:35 -07:00
Linus Torvalds
29c8c18363 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is the material which was staged after willystuff in linux-next.

  Subsystems affected by this patch series: mm (debug, selftests,
  pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
  and selftests"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
  selftests: kselftest framework: provide "finished" helper
  mm: madvise: MADV_DONTNEED_LOCKED
  mm: fix race between MADV_FREE reclaim and blkdev direct IO read
  mm: generalize ARCH_HAS_FILTER_PGPROT
  mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
  mm: warn on deleting redirtied only if accounted
  mm/huge_memory: remove stale locking logic from __split_huge_pmd()
  mm/huge_memory: remove stale page_trans_huge_mapcount()
  mm/swapfile: remove stale reuse_swap_page()
  mm/khugepaged: remove reuse_swap_page() usage
  mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
  mm: streamline COW logic in do_swap_page()
  mm: slightly clarify KSM logic in do_swap_page()
  mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
  mm: optimize do_wp_page() for exclusive pages in the swapcache
  mm/huge_memory: make is_transparent_hugepage() static
  userfaultfd/selftests: enable hugetlb remap and remove event testing
  selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  mm: enable MADV_DONTNEED for hugetlb mappings
  kasan: disable LOCKDEP when printing reports
  ...
2022-03-25 10:21:20 -07:00
Peter Collingbourne
2dfd1bd992 kasan: update function name in comments
The function kasan_global_oob was renamed to kasan_global_oob_right, but
the comments referring to it were not updated.  Do so.

Link: https://linux-review.googlesource.com/id/I20faa90126937bbee77d9d44709556c3dd4b40be
Link: https://lkml.kernel.org/r/20220219012433.890941-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Andrey Konovalov
ed6d74446c kasan: test: support async (again) and asymm modes for HW_TAGS
Async mode support has already been implemented in commit e80a76aa1a
("kasan, arm64: tests supports for HW_TAGS async mode") but then got
accidentally broken in commit 99734b535d ("kasan: detect false-positives
in tests").

Restore the changes removed by the latter patch and adapt them for asymm
mode: add a sync_fault flag to kunit_kasan_expectation that only get set
if the MTE fault was synchronous, and reenable MTE on such faults in
tests.

Also rename kunit_kasan_expectation to kunit_kasan_status and move its
definition to mm/kasan/kasan.h from include/linux/kasan.h, as this
structure is only internally used by KASAN.  Also put the structure
definition under IS_ENABLED(CONFIG_KUNIT).

Link: https://lkml.kernel.org/r/133970562ccacc93ba19d754012c562351d4a8c8.1645033139.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Andrey Konovalov
1a2473f0cb kasan: improve vmalloc tests
Update the existing vmalloc_oob() test to account for the specifics of the
tag-based modes.  Also add a few new checks and comments.

Add new vmalloc-related tests:

 - vmalloc_helpers_tags() to check that exported vmalloc helpers can
   handle tagged pointers.

 - vmap_tags() to check that SW_TAGS mode properly tags vmap() mappings.

 - vm_map_ram_tags() to check that SW_TAGS mode properly tags
   vm_map_ram() mappings.

 - vmalloc_percpu() to check that SW_TAGS mode tags regions allocated
   for __alloc_percpu(). The tagging of per-cpu mappings is best-effort;
   proper tagging is tracked in [1].

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215019

[sfr@canb.auug.org.au: similar to "kasan: test: fix compatibility with FORTIFY_SOURCE"]
  Link: https://lkml.kernel.org/r/20220128144801.73f5ced0@canb.auug.org.au
  Link: https://lkml.kernel.org/r/865c91ba49b90623ab50c7526b79ccb955f544f0.1644950160.git.andreyknvl@google.com
[andreyknvl@google.com: set_memory_rw/ro() are not exported to modules]
  Link: https://lkml.kernel.org/r/019ac41602e0c4a7dfe96dc8158a95097c2b2ebd.1645554036.git.andreyknvl@google.com
[akpm@linux-foundation.org: fix build]

Cc: Andrey Konovalov <andreyknvl@gmail.com>
[andreyknvl@google.com: vmap_tags() and vm_map_ram_tags() pass invalid page array size]
Link: https://lkml.kernel.org/r/bbdc1c0501c5275e7f26fdb8e2a7b14a40a9f36b.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Andrey Konovalov
fbefb423f8 kasan: allow enabling KASAN_VMALLOC and SW/HW_TAGS
Allow enabling CONFIG_KASAN_VMALLOC with SW_TAGS and HW_TAGS KASAN modes.

Also adjust CONFIG_KASAN_VMALLOC description:

 - Mention HW_TAGS support.

 - Remove unneeded internal details: they have no place in Kconfig
   description and are already explained in the documentation.

Link: https://lkml.kernel.org/r/bfa0fdedfe25f65e5caa4e410f074ddbac7a0b59.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Waiman Long
ef62c8ff1d lib/vsprintf: avoid redundant work with 0 size
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4.

While debugging the constant increase in percpu memory consumption on a
system that spawned large number of containers, it was found that a lot
of offline mem_cgroup structures remained in place without being freed.
Further investigation indicated that those mem_cgroup structures were
pinned by some pages.

In order to find out what those pages are, the existing page_owner
debugging tool is extended to show memory cgroup information and whether
those memcgs are offline or not.  With the enhanced page_owner tool, the
following is a typical page that pinned the mem_cgroup structure in my
test case:

  Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns
  PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
    prep_new_page+0xac/0xe0
    get_page_from_freelist+0x1327/0x14d0
    __alloc_pages+0x191/0x340
    alloc_pages_vma+0x84/0x250
    shmem_alloc_page+0x3f/0x90
    shmem_alloc_and_acct_page+0x76/0x1c0
    shmem_getpage_gfp+0x281/0x940
    shmem_write_begin+0x36/0xe0
    generic_perform_write+0xed/0x1d0
    __generic_file_write_iter+0xdc/0x1b0
    generic_file_write_iter+0x5d/0xb0
    new_sync_write+0x11f/0x1b0
    vfs_write+0x1ba/0x2a0
    ksys_write+0x59/0xd0
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
  Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8.

So the page was not freed because it was part of a shmem segment.  That
is useful information that can help users to diagnose similar problems.

With cgroup v1, /proc/cgroups can be read to find out the total number
of memory cgroups (online + offline).  With cgroup v2, the cgroup.stat
of the root cgroup can be read to find the number of dying cgroups (most
likely pinned by dying memcgs).

The page_owner feature is not supposed to be enabled for production
system due to its memory overhead.  However, if it is suspected that
dying memcgs are increasing over time, a test environment with
page_owner enabled can then be set up with appropriate workload for
further analysis on what may be causing the increasing number of dying
memcgs.

This patch (of 4):

For *scnprintf(), vsnprintf() is always called even if the input size is
0.  That is a waste of time, so just return 0 in this case.

Note that vsnprintf() will never return -1 to indicate an error.  So
skipping the call to vsnprintf() when size is 0 will have no functional
impact at all.

Link: https://lkml.kernel.org/r/20220202203036.744010-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20220202203036.744010-2-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:44 -07:00
Linus Torvalds
b9132c32e0 Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
 "This development cycle extends the subsystem to discover CXL resources
  throughout a CXL/PCIe switch topology and respond to hot add/remove
  events anywhere in that topology.

  This is more foundational infrastructure in preparation for dynamic
  memory region provisioning support. Recall that CXL memory regions, as
  the new "Theory of Operation" section of
  Documentation/driver-api/cxl/memory-devices.rst describes, bring
  storage volume striping semantics to memory.

  The hot add/remove behavior is validated with extensions to the
  cxl_test unit test environment and this test in the cxl-cli test
  suite:

      https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh

  Summary:

   - Add a driver for 'struct cxl_memdev' objects responsible for
     CXL.mem operation as distinct from 'cxl_pci' mailbox operations.

     Its primary responsibility is enumerating an endpoint 'struct
     cxl_port' and all the 'struct cxl_port' instances between an
     endpoint and the CXL platform root.

   - Add a driver for 'struct cxl_port' objects responsible for
     enumerating and operating all Host-managed Device Memory (HDM)
     decoder resources between the platform-level CXL memory
     description, all intervening host bridges / switches, and the HDM
     resources in endpoints.

   - Update the cxl_pci driver to validate CXL.mem operation precursors
     to HDM decoder operation like ready-polling, and legacy CXL 1.1
     DVSEC based CXL.mem configuration.

   - Add basic lockdep coverage for usage of device_lock() on CXL
     subsystem objects similar to what exists for LIBNVDIMM. Include a
     compile-time switch for which subsystem to validate at run-time.

   - Update cxl_test to emulate a one level switch topology.

   - Document a "Theory of Operation" for the subsystem.

   - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs

   - Include miscellaneous fixes for spec / QEMU CXL emulation
     compatibility and static analysis reports"

* tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits)
  cxl/core/port: Fix NULL but dereferenced coccicheck error
  cxl/port: Hold port reference until decoder release
  cxl/port: Fix endpoint refcount leak
  cxl/core: Fix cxl_device_lock() class detection
  cxl/core/port: Fix unregister_port() lock assertion
  cxl/regs: Fix size of CXL Capability Header Register
  cxl/core/port: Handle invalid decoders
  cxl/core/port: Fix / relax decoder target enumeration
  tools/testing/cxl: Add a physical_node link
  tools/testing/cxl: Enumerate mock decoders
  tools/testing/cxl: Mock one level of switches
  tools/testing/cxl: Fix root port to host bridge assignment
  tools/testing/cxl: Mock dvsec_ranges()
  cxl/core/port: Add endpoint decoders
  cxl/core: Move target_list out of base decoder attributes
  cxl/mem: Add the cxl_mem driver
  cxl/core/port: Add switch port enumeration
  cxl/memdev: Add numa_node attribute
  cxl/pci: Emit device serial number
  cxl/pci: Implement wait for media active
  ...
2022-03-24 18:07:03 -07:00
Linus Torvalds
52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00