CMA has been enabled unconditionally on all ARMv6+ systems to solve the
long standing issue of double kernel mappings for all dma coherent
buffers. This however created a dependency on CONFIG_EXPERIMENTAL for
the whole ARM architecture what should be really avoided. This patch
removes this dependency and lets one use old, well-tested dma-mapping
implementation also on ARMv6+ systems without the need to use
EXPERIMENTAL stuff.
Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Pull CMA and ARM DMA-mapping updates from Marek Szyprowski:
"These patches contain two major updates for DMA mapping subsystem
(mainly for ARM architecture). First one is Contiguous Memory
Allocator (CMA) which makes it possible for device drivers to allocate
big contiguous chunks of memory after the system has booted.
The main difference from the similar frameworks is the fact that CMA
allows to transparently reuse the memory region reserved for the big
chunk allocation as a system memory, so no memory is wasted when no
big chunk is allocated. Once the alloc request is issued, the
framework migrates system pages to create space for the required big
chunk of physically contiguous memory.
For more information one can refer to nice LWN articles:
- 'A reworked contiguous memory allocator':
http://lwn.net/Articles/447405/
- 'CMA and ARM':
http://lwn.net/Articles/450286/
- 'A deep dive into CMA':
http://lwn.net/Articles/486301/
- and the following thread with the patches and links to all previous
versions:
https://lkml.org/lkml/2012/4/3/204
The main client for this new framework is ARM DMA-mapping subsystem.
The second part provides a complete redesign in ARM DMA-mapping
subsystem. The core implementation has been changed to use common
struct dma_map_ops based infrastructure with the recent updates for
new dma attributes merged in v3.4-rc2. This allows to use more than
one implementation of dma-mapping calls and change/select them on the
struct device basis. The first client of this new infractructure is
dmabounce implementation which has been completely cut out of the
core, common code.
The last patch of this redesign update introduces a new, experimental
implementation of dma-mapping calls on top of generic IOMMU framework.
This lets ARM sub-platform to transparently use IOMMU for DMA-mapping
calls if one provides required IOMMU hardware.
For more information please refer to the following thread:
http://www.spinics.net/lists/arm-kernel/msg175729.html
The last patch merges changes from both updates and provides a
resolution for the conflicts which cannot be avoided when patches have
been applied on the same files (mainly arch/arm/mm/dma-mapping.c)."
Acked by Andrew Morton <akpm@linux-foundation.org>:
"Yup, this one please. It's had much work, plenty of review and I
think even Russell is happy with it."
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits)
ARM: dma-mapping: use PMD size for section unmap
cma: fix migration mode
ARM: integrate CMA with DMA-mapping subsystem
X86: integrate CMA with DMA-mapping subsystem
drivers: add Contiguous Memory Allocator
mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
mm: extract reclaim code from __alloc_pages_direct_reclaim()
mm: Serialize access to min_free_kbytes
mm: page_isolation: MIGRATE_CMA isolation functions added
mm: mmzone: MIGRATE_CMA migration type added
mm: page_alloc: change fallbacks array handling
mm: page_alloc: introduce alloc_contig_range()
mm: compaction: export some of the functions
mm: compaction: introduce isolate_freepages_range()
mm: compaction: introduce map_pages()
mm: compaction: introduce isolate_migratepages_range()
mm: page_alloc: remove trailing whitespace
ARM: dma-mapping: add support for IOMMU mapper
ARM: dma-mapping: use alloc, mmap, free from dma_ops
ARM: dma-mapping: remove redundant code and do the cleanup
...
Conflicts:
arch/x86/include/asm/dma-mapping.h
Pull arm-soc power management changes from Olof Johansson:
"Power management changes here are mostly for the omap platform, but
also include cpuidle changes for ux500 and suspend/resume code for
mmp."
* tag 'pm' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (48 commits)
ARM: OMAP2+: WDTIMER integration: fix !PM boot crash, disarm timer after hwmod reset
ARM: OMAP2/3: hwmod data: Add 32k-sync timer data to hwmod database
ARM: OMAP4: hwmod_data: Name the common irq for McBSP ports
ARM: OMAP4: hwmod data: I2C: add flag for context restore
ARM: OMAP3: hwmod_data: Rename the common irq for McBSP ports
ARM: OMAP2xxx: hwmod data: add HDQ/1-wire hwmod
ARM: OMAP3: hwmod data: add HDQ/1-wire hwmod
ARM: OMAP2+: hwmod data: add HDQ/1-wire hwmod shared data
ARM: OMAP2+: HDQ1W: add custom reset function
ARM: OMAP2420: hwmod data: Add MMC hwmod data for 2420
arm: omap3: clockdomain data: Remove superfluous commas from gfx_sgx_3xxx_wkdeps[]
ARM: OMAP2+: powerdomain: Get rid off duplicate pwrdm_clkdm_state_switch() API
ARM: OMAP3: clock data: add clockdomain for HDQ functional clock
ARM: OMAP3+: dpll: Configure autoidle mode only if it's supported
ARM: OMAP2+: dmtimer: cleanup iclk usage
ARM: OMAP4+: Add prm and cm base init function.
ARM: OMAP2/3: Add idle_st bits for ST_32KSYNC timer to prcm-common header
ARM: OMAP3: Fix CM register bit masks
ARM: OMAP: clock: convert AM3517/3505 detection/flags to AM35xx
ARM: OMAP3: clock data: treat all AM35x devices the same
...
The dma_contiguous_remap() function clears existing section maps using
the wrong size (PGDIR_SIZE instead of PMD_SIZE). This is a bug which
does not affect non-LPAE systems, where PGDIR_SIZE and PMD_SIZE are the same.
On LPAE systems, however, this bug causes the kernel to hang at this point.
This fix has been tested on both LPAE and non-LPAE kernel builds.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
This patch adds support for CMA to dma-mapping subsystem for ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their private memory areas if required (they can be
created with dma_declare_contiguous() function during board
initialisation).
Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, a low memory kernel mapping
is updated to to match requested memory access type.
GFP_ATOMIC allocations are performed from special pool which is created
early during boot. This way remapping page attributes is not needed on
allocation time.
CMA has been enabled unconditionally for ARMv6+ systems.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
This patch add a complete implementation of DMA-mapping API for
devices which have IOMMU support.
This implementation tries to optimize dma address space usage by remapping
all possible physical memory chunks into a single dma address space chunk.
DMA address space is managed on top of the bitmap stored in the
dma_iommu_mapping structure stored in device->archdata. Platform setup
code has to initialize parameters of the dma address space (base address,
size, allocation precision order) with arm_iommu_create_mapping() function.
To reduce the size of the bitmap, all allocations are aligned to the
specified order of base 4 KiB pages.
dma_alloc_* functions allocate physical memory in chunks, each with
alloc_pages() function to avoid failing if the physical memory gets
fragmented. In worst case the allocated buffer is composed of 4 KiB page
chunks.
dma_map_sg() function minimizes the total number of dma address space
chunks by merging of physical memory chunks into one larger dma address
space chunk. If requested chunk (scatter list entry) boundaries
match physical page boundaries, most calls to dma_map_sg() requests will
result in creating only one chunk in dma address space.
dma_map_page() simply creates a mapping for the given page(s) in the dma
address space.
All dma functions also perform required cache operation like their
counterparts from the arm linear physical memory mapping version.
This patch contains code and fixes kindly provided by:
- Krishna Reddy <vdumpa@nvidia.com>,
- Andrzej Pietrasiewicz <andrzej.p@samsung.com>,
- Hiroshi DOYU <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch converts dma_alloc/free/mmap_{coherent,writecombine}
functions to use generic alloc/free/mmap methods from dma_map_ops
structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute have been
introduced to implement writecombine methods.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch just performs a global cleanup in DMA mapping implementation
for ARM architecture. Some of the tiny helper functions have been moved
to the caller code, some have been merged together.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch removes dma bounce hooks from the common dma mapping
implementation on ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch converts all dma_sg methods to be generic (independent of the
current DMA mapping implementation for ARM architecture). All dma sg
operations are now implemented on top of respective
dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.
Before this patch there were custom methods for all scatter/gather
related operations. They iterated over the whole scatter list and called
cache related operations directly (which in turn checked if we use dma
bounce code or not and called respective version). This patch changes
them not to use such shortcut. Instead it provides similar loop over
scatter list and calls methods from the device's dma_map_ops structure.
This enables us to use device dependent implementations of cache related
operations (direct linear or dma bounce) depending on the provided
dma_map_ops structure.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch modifies dma-mapping implementation on ARM architecture to
use common dma_map_ops structure and asm-generic/dma-mapping-common.h
helpers.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch removes the need for the offset parameter in dma bounce
functions. This is required to let dma-mapping framework on ARM
architecture to use common, generic dma_map_ops based dma-mapping
helpers.
Background and more detailed explaination:
dma_*_range_* functions are available from the early days of the dma
mapping api. They are the correct way of doing a partial syncs on the
buffer (usually used by the network device drivers). This patch changes
only the internal implementation of the dma bounce functions to let
them tunnel through dma_map_ops structure. The driver api stays
unchanged, so driver are obliged to call dma_*_range_* functions to
keep code clean and easy to understand.
The only drawback from this patch is reduced detection of the dma api
abuse. Let us consider the following code:
dma_addr = dma_map_single(dev, ptr, 64, DMA_TO_DEVICE);
dma_sync_single_range_for_cpu(dev, dma_addr+16, 0, 32, DMA_TO_DEVICE);
Without the patch such code fails, because dma bounce code is unable
to find the bounce buffer for the given dma_address. After the patch
the above sync call will be equivalent to:
dma_sync_single_range_for_cpu(dev, dma_addr, 16, 32, DMA_TO_DEVICE);
which succeeds.
I don't consider this as a real problem, because DMA API abuse should be
caught by debug_dma_* function family. This patch lets us to simplify
the internal low-level implementation without chaning the driver visible
API.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
A zero value for prot_sect in the memory types table implies that
section mappings should never be created for the memory type in question.
This is checked for in alloc_init_section().
With LPAE, we set a bit to mask access flag faults for kernel mappings.
This breaks the aforementioned (!prot_sect) check in alloc_init_section().
This patch fixes this bug by first checking for a non-zero
prot_sect before setting the PMD_SECT_AF flag.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For the SOC chips using tauros2 cache, will need disable
and resume tauros2 cache for SOC suspend/resume.
Signed-off-by: Chao Xie <chao.xie@marvell.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
When enable ARCH_SUSPEND_POSSIBLE, it need defintion of
cpu_mohawk_do_suspend and cpu_mohawk_do_resume
Signed-off-by: Chao Xie <chao.xie@marvell.com>
Signed-off-by: Haojian Zhuang <<haojian.zhuang@gmail.com>
This patch removes support for ARMv3 CPUs, which haven't worked properly
for quite some time (see the FIXME comment in arch/arm/mm/fault.c). The
only V3 parts left is the cache model for ARMv3, which is needed for some
odd reason by ARM740T CPUs, and being able to build with -march=armv3,
which is required for the RiscPC platform due to its bus structure.
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cacheflush syscall can fail for two reasons:
(1) The arguments are invalid (nonsensical address range or no VMA)
(2) The region generates a translation fault on a VIPT or PIPT cache
This patch allows do_cache_op to return an error code to userspace in
the case of the above. The various coherent_user_range implementations
are modified to return 0 in the case of VIVT caches or -EFAULT in the
case of an abort on v6/v7 cores.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>