Commit Graph

201 Commits

Author SHA1 Message Date
Christoph Hellwig 782e6769c0 dma-mapping: provide a generic dma-noncoherent implementation
Add a new dma_map_ops implementation that uses dma-direct for the
address mapping of streaming mappings, and which requires arch-specific
implemenations of coherent allocate/free.

Architectures have to provide flushing helpers to ownership trasnfers
to the device and/or CPU, and can provide optional implementations of
the coherent mmap functionality, and the cache_flush routines for
non-coherent long term allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
2018-05-19 08:46:12 +02:00
Christoph Hellwig 35ddb69cd2 dma-mapping: simplify Kconfig dependencies
ARCH_DMA_ADDR_T_64BIT is always true for 64-bit architectures now, so we
can skip the clause requiring it.  'n' is the default default, so no need
to explicitly state it.

Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-19 08:46:12 +02:00
Christoph Hellwig 09230cbc1b swiotlb: move the SWIOTLB config symbol to lib/Kconfig
This way we have one central definition of it, and user can select it as
needed.  The new option is not user visible, which is the behavior
it had in most architectures, with a few notable exceptions:

 - On x86_64 and mips/loongson3 it used to be user selectable, but
   defaulted to y.  It now is unconditional, which seems like the right
   thing for 64-bit architectures without guaranteed availablity of
   IOMMUs.
 - on powerpc the symbol is user selectable and defaults to n, but
   many boards select it.  This change assumes no working setup
   required a manual selection, but if that turned out to be wrong
   we'll have to add another select statement or two for the respective
   boards.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-09 06:58:01 +02:00
Christoph Hellwig 4965a68780 arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig
Define this symbol if the architecture either uses 64-bit pointers or the
PHYS_ADDR_T_64BIT is set.  This covers 95% of the old arch magic.  We only
need an additional select for Xen on ARM (why anyway?), and we now always
set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing
instead of only doing it when highmem is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>
2018-05-09 06:57:04 +02:00
Christoph Hellwig f616ab59c2 dma-mapping: move the NEED_DMA_MAP_STATE config symbol to lib/Kconfig
This way we have one central definition of it, and user can select it as
needed.  Note that we now also always select it when CONFIG_DMA_API_DEBUG
is select, which fixes some incorrect checks in a few network drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09 06:56:08 +02:00
Christoph Hellwig 86596f0a28 scatterlist: move the NEED_SG_DMA_LENGTH config symbol to lib/Kconfig
This way we have one central definition of it, and user can select it as
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09 06:55:59 +02:00
Christoph Hellwig a4ce5a48d7 iommu-helper: move the IOMMU_HELPER config symbol to lib/
This way we have one central definition of it, and user can select it as
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2018-05-09 06:55:51 +02:00
Zhichang Yuan 031e360186 lib: Add generic PIO mapping method
41f8bba7f5 ("of/pci: Add pci_register_io_range() and
pci_pio_to_address()") added support for PCI I/O space mapped into CPU
physical memory space.  With that support, the I/O ranges configured for
PCI/PCIe hosts on some architectures can be mapped to logical PIO and
converted easily between CPU address and the corresponding logical PIO.
Based on this, PCI I/O port space can be accessed via in/out accessors that
use memory read/write.

But on some platforms, there are bus hosts that access I/O port space with
host-local I/O port addresses rather than memory addresses.

Add a more generic I/O mapping method to support those devices.  With this
patch, both the CPU addresses and the host-local port can be mapped into
the logical PIO space with different logical/fake PIOs.  After this, all
the I/O accesses to either PCI MMIO devices or host-local I/O peripherals
can be unified into the existing I/O accessors defined in asm-generic/io.h
and be redirected to the right device-specific hooks based on the input
logical PIO.

Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
[bhelgaas: remove -EFAULT return from logic_pio_register_range() per
https://lkml.kernel.org/r/20180403143909.GA21171@ulmo, fix NULL pointer
checking per https://lkml.kernel.org/r/20180403211505.GA29612@embeddedor.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2018-03-21 17:18:34 -05:00
Linus Torvalds 2382dc9a3e Merge tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
 "Except for a runtime warning fix from Christian this is all about
  consolidation of the generic no-IOMMU code, a well as the glue code
  for swiotlb.

  All the code is based on the x86 implementation with hooks to allow
  all architectures that aren't cache coherent to use it.

  The x86 conversion itself has been deferred because the x86
  maintainers were a little busy in the last months"

* tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping: (57 commits)
  MAINTAINERS: add the iommu list for swiotlb and xen-swiotlb
  arm64: use swiotlb_alloc and swiotlb_free
  arm64: replace ZONE_DMA with ZONE_DMA32
  mips: use swiotlb_{alloc,free}
  mips/netlogic: remove swiotlb support
  tile: use generic swiotlb_ops
  tile: replace ZONE_DMA with ZONE_DMA32
  unicore32: use generic swiotlb_ops
  ia64: remove an ifdef around the content of pci-dma.c
  ia64: clean up swiotlb support
  ia64: use generic swiotlb_ops
  ia64: replace ZONE_DMA with ZONE_DMA32
  swiotlb: remove various exports
  swiotlb: refactor coherent buffer allocation
  swiotlb: refactor coherent buffer freeing
  swiotlb: wire up ->dma_supported in swiotlb_dma_ops
  swiotlb: add common swiotlb_map_ops
  swiotlb: rename swiotlb_free to swiotlb_exit
  x86: rename swiotlb_dma_ops
  powerpc: rename swiotlb_dma_ops
  ...
2018-01-31 11:32:27 -08:00
Christoph Hellwig 002e67454f dma-direct: rename dma_noop to dma_direct
The trivial direct mapping implementation already does a virtual to
physical translation which isn't strictly a noop, and will soon learn
to do non-direct but linear physical to dma translations through the
device offset and a few small tricks.  Rename it to a better fitting
name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
2018-01-15 09:35:06 +01:00
Bart Van Assche e80a0af475 lib/scatterlist: Introduce sgl_alloc() and sgl_free()
Many kernel drivers contain code that allocates and frees both a
scatterlist and the pages that populate that scatterlist.
Introduce functions in lib/scatterlist.c that perform these tasks
instead of duplicating this functionality in multiple drivers.
Only include these functions in the build if CONFIG_SGL_ALLOC=y
to avoid that the kernel size increases if this functionality is
not used.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
Linus Torvalds 14b661ebb6 Merge tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd
Pull MTD updates from Richard Weinberger:
 "General changes:
   -  Unconfuse get_unmapped_area and point/unpoint driver methods
   -  New partition parser: sharpslpart
   -  Kill GENERIC_IO
   -  Various fixes

  NAND changes:
   -  Add a flag to mark NANDs that require 3 address cycles to encode a
      page address
   -  Set a default ECC/free layout when NAND_ECC_NONE is requested
   -  Fix a bug in panic_nand_write()
   -  Another batch of cleanups for the denali driver
   -  Fix PM support in the atmel driver
   -  Remove support for platform data in the omap driver
   -  Fix subpage write in the omap driver
   -  Fix irq handling in the mtk driver
   -  Change link order of mtk_ecc and mtk_nand drivers to speed up boot
      time
   -  Change log level of ECC error messages in the mxc driver
   -  Patch the pxa3xx driver to support Armada 8k platforms
   -  Add BAM DMA support to the qcom driver
   -  Convert gpio-nand to the GPIO desc API
   -  Fix ECC handling in the mt29f driver

  SPI-NOR changes:
   -  Introduce system power management support
   -  New mechanism to select the proper .quad_enable() hook by JEDEC
      ID, when needed, instead of only by manufacturer ID
   -  Add support to new memory parts from Gigadevice, Winbond, Macronix
      and Everspin
   -  Maintainance for Cadence, Intel, Mediatek and STM32 drivers"

*  tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd: (85 commits)
  mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid
  mtd: sharpslpart: Add sharpslpart partition parser
  mtd: Add sanity checks in mtd_write/read_oob()
  mtd: remove the get_unmapped_area method
  mtd: implement mtd_get_unmapped_area() using the point method
  mtd: chips/map_rom.c: implement point and unpoint methods
  mtd: chips/map_ram.c: implement point and unpoint methods
  mtd: mtdram: properly handle the phys argument in the point method
  mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD'
  mtd: slram: use memremap() instead of ioremap()
  kconfig: kill off GENERIC_IO option
  mtd: Fix C++ comment in include/linux/mtd/mtd.h
  mtd: constify mtd_partition
  mtd: plat-ram: Replace manual resource management by devm
  mtd: nand: Fix writing mtdoops to nand flash.
  mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID
  mtd: nand: mtk: fix infinite ECC decode IRQ issue
  mtd: spi-nor: Add support for mr25h128
  mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o
  mtd: spi-nor: enable 4B opcodes for mx66l51235l
  ...
2017-11-22 20:46:06 -10:00
Geert Uytterhoeven d6b28e0996 lib: add module support to string tests
Extract the string test code into its own source file, to allow
compiling it either to a loadable module, or built into the kernel.

Fixes: 03270c13c5 ("lib/string.c: add testcases for memset16/32/64")
Link: http://lkml.kernel.org/r/1505397744-3387-1-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:01 -08:00
Rob Herring 9de8da4774 kconfig: kill off GENERIC_IO option
The GENERIC_IO option is set for every architecture except tile and score
as those define NO_IOMEM. The option only controls visibility of
CONFIG_MTD which doesn't appear to be necessary for any reason, so let's
just remove GENERIC_IO.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-11-13 21:39:15 +01:00
Palmer Dabbelt b35cd9884f lib: Add shared copies of some GCC library routines
Many ports (m32r, microblaze, mips, parisc, score, and sparc) use
functionally identical copies of various GCC library routine files,
which came up as we were submitting the RISC-V port (which also uses
some of these).

This patch adds a new copy of these library routine files, which are
functionally identical to the various other copies.  These are
availiable via Kconfig as CONFIG_GENERIC_$ROUTINE, which currently isn't
used anywhere.

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2017-09-25 15:50:57 -07:00
Linus Torvalds e7cdb60fd2 Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull zstd support from Chris Mason:
 "Nick Terrell's patch series to add zstd support to the kernel has been
  floating around for a while. After talking with Dave Sterba, Herbert
  and Phillip, we decided to send the whole thing in as one pull
  request.

  zstd is a big win in speed over zlib and in compression ratio over
  lzo, and the compression team here at FB has gotten great results
  using it in production. Nick will continue to update the kernel side
  with new improvements from the open source zstd userland code.

  Nick has a number of benchmarks for the main zstd code in his lib/zstd
  commit:

      I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
      of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
      Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using
      `silesia.tar` [3], which is 211,988,480 B large. Run the following
      commands for the benchmark:

        sudo modprobe zstd_compress_test
        sudo mknod zstd_compress_test c 245 0
        sudo cp silesia.tar zstd_compress_test

      The time is reported by the time of the userland `cp`.
      The MB/s is computed with

        1,536,217,008 B / time(buffer size, hash)

      which includes the time to copy from userland.
      The Adjusted MB/s is computed with

        1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

      The memory reported is the amount of memory the compressor
      requests.

        | Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
        |----------|----------|----------|-------|---------|----------|----------|
        | none     | 11988480 |    0.100 |     1 | 2119.88 |        - |        - |
        | zstd -1  | 73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
        | zstd -3  | 66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
        | zstd -5  | 65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
        | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
        | zstd -15 | 58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
        | zstd -19 | 54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
        | zlib -1  | 77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
        | zlib -3  | 72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
        | zlib -6  | 68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
        | zlib -9  | 67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |

      I benchmarked zstd decompression using the same method on the same
      machine. The benchmark file is located in the upstream zstd repo
      under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
      memory reported is the amount of memory required to decompress
      data compressed with the given compression level. If you know the
      maximum size of your input, you can reduce the memory usage of
      decompression irrespective of the compression level.

        | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
        |----------|----------|---------|---------------|-------------|
        | none     |    0.025 | 8479.54 |             - |           - |
        | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
        | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
        | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
        | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
        | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
        | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
        | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
        | zlib -9  |    0.837 |  253.27 |        287.64 |        0.04 |

  I ran a long series of tests and benchmarks on the btrfs side and the
  gains are very similar to the core benchmarks Nick ran"

* 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  squashfs: Add zstd support
  btrfs: Add zstd support
  lib: Add zstd modules
  lib: Add xxhash module
2017-09-14 17:30:49 -07:00
Linus Torvalds 89fd915c40 Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm from Dan Williams:
 "A rework of media error handling in the BTT driver and other updates.
  It has appeared in a few -next releases and collected some late-
  breaking build-error and warning fixups as a result.

  Summary:

   - Media error handling support in the Block Translation Table (BTT)
     driver is reworked to address sleeping-while-atomic locking and
     memory-allocation-context conflicts.

   - The dax_device lookup overhead for xfs and ext4 is moved out of the
     iomap hot-path to a mount-time lookup.

   - A new 'ecc_unit_size' sysfs attribute is added to advertise the
     read-modify-write boundary property of a persistent memory range.

   - Preparatory fix-ups for arm and powerpc pmem support are included
     along with other miscellaneous fixes"

* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, btt: fix format string warnings
  libnvdimm, btt: clean up warning and error messages
  ext4: fix null pointer dereference on sbi
  libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
  dax: fix FS_DAX=n BLOCK=y compilation
  libnvdimm: fix integer overflow static analysis warning
  libnvdimm, nd_blk: remove mmio_flush_range()
  libnvdimm, btt: rework error clearing
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: cache sector_size in arena_info
  libnvdimm, btt: ensure that flags were also unchanged during a map_read
  libnvdimm, btt: refactor map entry operations with macros
  libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
  ext4: perform dax_device lookup at mount
  ext2: perform dax_device lookup at mount
  xfs: perform dax_device lookup at mount
  dax: introduce a fs_dax_get_by_bdev() helper
  libnvdimm, btt: check memory allocation failure
  libnvdimm, label: fix index block size calculation
  ...
2017-09-11 13:10:57 -07:00
Matthew Wilcox 03270c13c5 lib/string.c: add testcases for memset16/32/64
[akpm@linux-foundation.org: minor tweaks]
Link: http://lkml.kernel.org/r/20170720184539.31609-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Robin Murphy 5deb67f77a libnvdimm, nd_blk: remove mmio_flush_range()
mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe90 ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-31 15:05:10 -07:00
Nick Terrell 73f3d1b48f lib: Add zstd modules
Add zstd compression and decompression kernel modules.
zstd offers a wide varity of compression speed and quality trade-offs.
It can compress at speeds approaching lz4, and quality approaching lzma.
zstd decompressions at speeds more than twice as fast as zlib, and
decompression speed remains roughly the same across all compression levels.

The code was ported from the upstream zstd source repository. The
`linux/zstd.h` header was modified to match linux kernel style.
The cross-platform and allocation code was stripped out. Instead zstd
requires the caller to pass a preallocated workspace. The source files
were clang-formatted [1] to match the Linux Kernel style as much as
possible. Otherwise, the code was unmodified. We would like to avoid
as much further manual modification to the source code as possible, so it
will be easier to keep the kernel zstd up to date.

I benchmarked zstd compression as a special character device. I ran zstd
and zlib compression at several levels, as well as performing no
compression, which measure the time spent copying the data to kernel space.
Data is passed to the compresser 4096 B at a time. The benchmark file is
located in the upstream zstd source repository under
`contrib/linux-kernel/zstd_compress_test.c` [2].

I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
211,988,480 B large. Run the following commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    sudo cp silesia.tar zstd_compress_test

The time is reported by the time of the userland `cp`.
The MB/s is computed with

    1,536,217,008 B / time(buffer size, hash)

which includes the time to copy from userland.
The Adjusted MB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
|----------|----------|----------|-------|---------|----------|----------|
| none     | 11988480 |    0.100 |     1 | 2119.88 |        - |        - |
| zstd -1  | 73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
| zstd -3  | 66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
| zstd -5  | 65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
| zlib -1  | 77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
| zlib -3  | 72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
| zlib -6  | 68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |

I benchmarked zstd decompression using the same method on the same machine.
The benchmark file is located in the upstream zstd repo under
`contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
the amount of memory required to decompress data compressed with the given
compression level. If you know the maximum size of your input, you can
reduce the memory usage of decompression irrespective of the compression
level.

| Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
|----------|----------|---------|---------------|-------------|
| none     |    0.025 | 8479.54 |             - |           - |
| zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
| zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
| zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
| zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
| zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
| zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
| zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
| zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
| zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
| zlib -9  |    0.837 |  253.27 |        287.64 |        0.04 |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
`contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
with ASAN, UBSAN, and MSAN. Additionaly, it was tested while testing the
BtrFS and SquashFS patches coming next.

[1] https://clang.llvm.org/docs/ClangFormat.html
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
[5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
[6] http://llvm.org/docs/LibFuzzer.html
[7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
[8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c

zstd source repository: https://github.com/facebook/zstd

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-08-15 09:02:08 -07:00
Nick Terrell 5d2405227a lib: Add xxhash module
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it out from zstd since it is useful on its
own. I copied the code from the upstream XXHash source repository and
translated it into kernel style. I ran benchmarks and tests in the kernel
and tests in userland.

I benchmarked xxhash as a special character device. I ran in four modes,
no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
hashes on the copied data. I also ran it with four different buffer sizes.
The benchmark file is located in the upstream zstd source repository under
`contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using the file `filesystem.squashfs`
from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

The time is reported by the time of the userland `cp`.
The GB/s is computed with

    1,536,217,008 B / time(buffer size, hash)

which includes the time to copy from userland.
The Normalized GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

| Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
|            1024 | none  |    0.408 | 3.77 |             - |
|            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
|            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
|            1024 | crc32 |    1.290 | 1.19 |          1.74 |
|            4096 | none  |    0.380 | 4.04 |             - |
|            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
|            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
|            4096 | crc32 |    1.168 | 1.32 |          1.95 |
|            8192 | none  |    0.351 | 4.38 |             - |
|            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
|            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
|            8192 | crc32 |    1.163 | 1.32 |          1.89 |
|           16384 | none  |    0.346 | 4.43 |             - |
|           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
|           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
|           16384 | crc32 |    1.183 | 1.30 |          1.84 |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. A line in each branch of every function in `xxhash.c`
was commented out to ensure that the test-suite fails. Additionally
tested while testing zstd and with SMHasher [3].

[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-08-15 09:02:07 -07:00
Linus Torvalds b6ffe9ba46 Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with ->copy_from_iter() and ->flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...
2017-07-07 09:44:06 -07:00
Dan Williams 0aed55af88 x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09 09:09:56 -07:00
Jeremy Kerr 0cbaa44841 lib: Add crc4 module
Add a little helper for crc4 calculations. This works 4-bits-at-a-time,
using a simple table approach.

We will need this in the FSI core code, as well as any master
implementations that need to calculate CRCs in software.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Chris Bostic <cbostic@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 11:52:07 +02:00
Linus Torvalds c2eca00fec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't save TIPC header values before the header has been validated,
    from Jon Paul Maloy.

 2) Fix memory leak in RDS, from Zhu Yanjun.

 3) We miss to initialize the UID in the flow key in some paths, from
    Julian Anastasov.

 4) Fix latent TOS masking bug in the routing cache removal from years
    ago, also from Julian.

 5) We forget to set the sockaddr port in sctp_copy_local_addr_list(),
    fix from Xin Long.

 6) Missing module ref count drop in packet scheduler actions, from
    Roman Mashak.

 7) Fix RCU annotations in rht_bucket_nested, from Herbert Xu.

 8) Fix use after free which happens because L2TP's ipv4 support returns
    non-zero values from it's backlog_rcv function which ipv4 interprets
    as protocol values. Fix from Paul Hüber.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  qed: Don't use attention PTT for configuring BW
  qed: Fix race with multiple VFs
  l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
  xfrm: provide correct dst in xfrm_neigh_lookup
  rhashtable: Fix RCU dereference annotation in rht_bucket_nested
  rhashtable: Fix use before NULL check in bucket_table_free
  net sched actions: do not overwrite status of action creation.
  rxrpc: Kernel calls get stuck in recvmsg
  net sched actions: decrement module reference count after table flush.
  lib: Allow compile-testing of parman
  ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt
  sctp: set sin_port for addr param when checking duplicate address
  net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
  netfilter: nft_set_bitmap: incorrect bitmap size
  net: s2io: fix typo argumnet argument
  net: vxge: fix typo argumnet argument
  netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
  ipv4: mask tos for input route
  ipv4: add missing initialization for flowi4_uid
  lib: fix spelling mistake: "actualy" -> "actually"
  ...
2017-02-28 10:00:39 -08:00