Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "Main features this time are:
   - BAM v1.3.0 support for qcom bam dma
   - support for Allwinner sun8i dma
   - Atmel's eXtended DMA Controller driver
   - chancnt cleanup by Maxime
   - fixes spread over drivers"

* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (56 commits)
  dmaenegine: Delete a check before free_percpu()
  dmaengine: ioatdma: fix dma mapping errors
  dma: cppi41: add a delay while setting the TD bit
  dma: cppi41: wait longer for the HW to return the descriptor
  dmaengine: fsl-edma: fixup reg offset and hw S/G support in big-endian model
  dmaengine: fsl-edma: fix calculation of remaining bytes
  drivers/dma/pch_dma: declare pch_dma_id_table as static
  dmaengine: ste_dma40: fix error return code
  dma: imx-sdma: clarify about firmware not found error
  Documentation: devicetree: Fix Xilinx VDMA specification
  dmaengine: pl330: update author info
  dmaengine: clarify the issue_pending expectations
  dmaengine: at_xdmac: Add DMA_PRIVATE
  ARM: dts: at_xdmac: fix bad value of dma-cells in documentation
  dmaengine: at_xdmac: fix missing spin_unlock
  dmaengine: at_xdmac: fix a bug in transfer residue computation
  dmaengine: at_xdmac: fix software lockup at_xdmac_tx_status()
  dmaengine: at_xdmac: remove chancnt affectation
  dmaengine: at_xdmac: prefer usage of readl/writel_relaxed
  dmaengine: xdmac: fix print warning on dma_addr_t variable
  ...
2026-05-01 15:00:59 -07:00 · 2014-12-12 14:59:53 -08:00
parent eea0cf3fcd a9507ca3fb
commit 87c779baab
47 changed files with 2413 additions and 268 deletions
@@ -0,0 +1,54 @@
+* Atmel Extensible Direct Memory Access Controller (XDMAC)
+
+* XDMA Controller
+Required properties:
+- compatible: Should be "atmel,<chip>-dma".
+  <chip> compatible description:
+  - sama5d4: first SoC adding the XDMAC
+- reg: Should contain DMA registers location and length.
+- interrupts: Should contain DMA interrupt.
+- #dma-cells: Must be <1>, used to represent the number of integer cells in
+the dmas property of client devices.
+  - The 1st cell specifies the channel configuration register:
+    - bit 13: SIF, source interface identifier, used to get the memory
+    interface identifier,
+    - bit 14: DIF, destination interface identifier, used to get the peripheral
+    interface identifier,
+    - bit 30-24: PERID, peripheral identifier.
+
+Example:
+
+dma1: dma-controller@f0004000 {
+	compatible = "atmel,sama5d4-dma";
+	reg = <0xf0004000 0x200>;
+	interrupts = <50 4 0>;
+	#dma-cells = <1>;
+};
+
+
+* DMA clients
+DMA clients connected to the Atmel XDMA controller must use the format
+described in the dma.txt file, using a one-cell specifier for each channel.
+Each dmas entry thus has two cells (phandle plus specifier), in order:
+1. A phandle pointing to the DMA controller.
+2. Channel configuration register. Configurable fields are:
+    - bit 13: SIF, source interface identifier, used to get the memory
+    interface identifier,
+    - bit 14: DIF, destination interface identifier, used to get the peripheral
+    interface identifier,
+    - bit 30-24: PERID, peripheral identifier.
+
+Example:
+
+i2c2: i2c@f8024000 {
+	compatible = "atmel,at91sam9x5-i2c";
+	reg = <0xf8024000 0x4000>;
+	interrupts = <34 4 6>;
+	dmas = <&dma1
+		(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
+		 | AT91_XDMAC_DT_PERID(6))>,
+	       <&dma1
+		(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
+		| AT91_XDMAC_DT_PERID(7))>;
+	dma-names = "tx", "rx";
+};
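A minimal C sketch of how the one-cell specifier in the binding above is packed, using only the bit positions the binding text gives (bit 13 SIF, bit 14 DIF, bits 30-24 PERID). The helper names are hypothetical; they are not the AT91_XDMAC_DT_* macros, whose definitions do not appear in this diff, and the mapping of "memory interface" to SIF and "peripheral interface" to DIF is an assumption taken from the binding's wording.

```c
#include <stdint.h>

/* Hypothetical helpers; bit positions come from the binding text above. */
static inline uint32_t xdmac_cell_sif(uint32_t mem_if)
{
	return (mem_if & 0x1u) << 13;	/* bit 13: SIF, memory interface */
}

static inline uint32_t xdmac_cell_dif(uint32_t per_if)
{
	return (per_if & 0x1u) << 14;	/* bit 14: DIF, peripheral interface */
}

static inline uint32_t xdmac_cell_perid(uint32_t perid)
{
	return (perid & 0x7fu) << 24;	/* bits 30-24: PERID */
}

/* Build the specifier cell that follows the controller phandle. */
static inline uint32_t xdmac_spec_cell(uint32_t mem_if, uint32_t per_if,
				       uint32_t perid)
{
	return xdmac_cell_sif(mem_if) | xdmac_cell_dif(per_if) |
	       xdmac_cell_perid(perid);
}
```

Under these assumptions, the "tx" entry of the i2c2 example (memory interface 0, peripheral interface 1, PERID 6) would encode as 0x06004000.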
@@ -48,6 +48,7 @@ The full ID of peripheral types can be found below.
 	21	ESAI
 	22	SSI Dual FIFO	(needs firmware ver >= 2)
 	23	Shared ASRC
+	24	SAI

 The third cell specifies the transfer priority as below.

@@ -1,7 +1,9 @@
 QCOM BAM DMA controller

 Required properties:
- compatible: must contain "qcom,bam-v1.4.0" for MSM8974
+- compatible: must be one of the following:
+ * "qcom,bam-v1.4.0" for MSM8974, APQ8074 and APQ8084
+ * "qcom,bam-v1.3.0" for APQ8064, IPQ8064 and MSM8960
 - reg: Address range for DMA registers
 - interrupts: Should contain the one interrupt shared by all channels
 - #dma-cells: must be <1>, the cell in the dmas property of the client device
@@ -4,7 +4,7 @@ This driver follows the generic DMA bindings defined in dma.txt.

 Required properties:

- compatible:	Must be "allwinner,sun6i-a31-dma"
+- compatible:	Must be "allwinner,sun6i-a31-dma" or "allwinner,sun8i-a23-dma"
 - reg:		Should contain the registers base address and length
 - interrupts:	Should contain a reference to the interrupt used by this device
 - clocks:	Should contain a reference to the parent AHB clock
@@ -0,0 +1,366 @@
+DMAengine controller documentation
+==================================
+
+Hardware Introduction
+++++++++++++++++++++
+
+Most of the Slave DMA controllers have the same general principles of
+operations.
+
+They have a given number of channels to use for the DMA transfers, and
+a given number of request lines.
+
+Requests and channels are pretty much orthogonal. A channel can be
+used to serve several requests, or even any of them. To simplify,
+channels are the entities doing the copy, and requests identify which
+endpoints are involved.
+
+The request lines actually correspond to physical lines going from the
+DMA-eligible devices to the controller itself. Whenever the device
+will want to start a transfer, it will assert a DMA request (DRQ) by
+asserting that request line.
+
+A very simple DMA controller would only take into account a single
+parameter: the transfer size. At each clock cycle, it would transfer a
+byte of data from one buffer to another, until the transfer size has
+been reached.
+
+That wouldn't work well in the real world, since slave devices might
+require a specific number of bits to be transferred in a single
+cycle. For example, we may want to transfer as much data as the
+physical bus allows to maximize performance when doing a simple
+memory copy operation, but our audio device could have a narrower FIFO
+that requires data to be written exactly 16 or 24 bits at a time. This
+is why most if not all of the DMA controllers can adjust this, using a
+parameter called the transfer width.
+
+Moreover, some DMA controllers, whenever the RAM is used as a source
+or destination, can group the reads or writes in memory into a buffer,
+so instead of having a lot of small memory accesses, which is not
+really efficient, you'll get several bigger transfers. This is done
+using a parameter called the burst size, that defines how many single
+reads/writes it's allowed to do without the controller splitting the
+transfer into smaller sub-transfers.
+
+Our theoretical DMA controller would then only be able to do transfers
+that involve a single contiguous block of data. However, some of the
+transfers we usually have are not, and want to copy data from
+non-contiguous buffers to a contiguous buffer, which is called
+scatter-gather.
+
+DMAEngine, at least for mem2dev transfers, requires support for
+scatter-gather. So we're left with two cases here: either we have a
+quite simple DMA controller that doesn't support it, and we'll have to
+implement it in software, or we have a more advanced DMA controller,
+that implements scatter-gather in hardware.
+
+The latter are usually programmed using a collection of chunks to
+transfer, and whenever the transfer is started, the controller will go
+over that collection, doing whatever we programmed there.
+
+This collection is usually either a table or a linked list. You will
+then push either the address of the table and its number of elements,
+or the first item of the list to one channel of the DMA controller,
+and whenever a DRQ will be asserted, it will go through the collection
+to know where to fetch the data from.
+
+Either way, the format of this collection is completely dependent on
+your hardware. Each DMA controller will require a different structure,
+but all of them will require, for every chunk, at least the source and
+destination addresses, whether it should increment these addresses or
+not and the three parameters we saw earlier: the burst size, the
+transfer width and the transfer size.
+
+The one last thing is that usually, slave devices won't issue DRQ by
+default, and you have to enable this in your slave device driver first
+whenever you're willing to use DMA.
+
+These were just the general memory-to-memory (also called mem2mem) or
+memory-to-device (mem2dev) kinds of transfers. Most devices also
+support other kinds of transfers or memory operations that dmaengine
+supports and that will be detailed later in this document.
+
+DMA Support in Linux
++++++++++++++++++++
+
+Historically, DMA controller drivers have been implemented using the
+async TX API, to offload operations such as memory copy, XOR,
+cryptography, etc., basically any memory to memory operation.
+
+Over time, the need for memory to device transfers arose, and
+dmaengine was extended. Nowadays, the async TX API is written as a
+layer on top of dmaengine, and acts as a client. Still, dmaengine
+accommodates that API in some cases, and made some design choices to
+ensure that it stayed compatible.
+
+For more information on the Async TX API, please look at the relevant
+documentation file in Documentation/crypto/async-tx-api.txt.
+
+DMAEngine Registration
++++++++++++++++++++++
+
+struct dma_device Initialization
+--------------------------------
+
+Just like any other kernel framework, the whole DMAEngine registration
+relies on the driver filling a structure and registering against the
+framework. In our case, that structure is dma_device.
+
+The first thing you need to do in your driver is to allocate this
+structure. Any of the usual memory allocators will do, but you'll also
+need to initialize a few fields in there:
+
+  * channels:	should be initialized as a list using the
+		INIT_LIST_HEAD macro for example
+
+  * dev: 	should hold the pointer to the struct device associated
+		to your current driver instance.
+
+Supported transaction types
+---------------------------
+
+The next thing you need is to set which transaction types your device
+(and driver) supports.
+
+Our dma_device structure has a field called cap_mask that holds the
+various types of transaction supported, and you need to modify this
+mask using the dma_cap_set function, with various flags depending on
+transaction types you support as an argument.
+
+All those capabilities are defined in the dma_transaction_type enum,
+in include/linux/dmaengine.h
+
+Currently, the types available are:
+  * DMA_MEMCPY
+    - The device is able to do memory to memory copies
+
+  * DMA_XOR
+    - The device is able to perform XOR operations on memory areas
+    - Used to accelerate XOR intensive tasks, such as RAID5
+
+  * DMA_XOR_VAL
+    - The device is able to perform parity check using the XOR
+      algorithm against a memory buffer.
+
+  * DMA_PQ
+    - The device is able to perform RAID6 P+Q computations, P being a
+      simple XOR, and Q being a Reed-Solomon algorithm.
+
+  * DMA_PQ_VAL
+    - The device is able to perform parity check using RAID6 P+Q
+      algorithm against a memory buffer.
+
+  * DMA_INTERRUPT
+    - The device is able to trigger a dummy transfer that will
+      generate periodic interrupts
+    - Used by the client drivers to register a callback that will be
+      called on a regular basis through the DMA controller interrupt
+
+  * DMA_SG
+    - The device supports memory to memory scatter-gather
+      transfers.
+    - Even though a plain memcpy can look like a particular case of a
+      scatter-gather transfer, with a single chunk to transfer, it's a
+      distinct transaction type in the mem2mem transfers case
+
+  * DMA_PRIVATE
+    - The device only supports slave transfers, and as such isn't
+      available for async transfers.
+
+  * DMA_ASYNC_TX
+    - Must not be set by the device, and will be set by the framework
+      if needed
+    - /* TODO: What is it about? */
+
+  * DMA_SLAVE
+    - The device can handle device to memory transfers, including
+      scatter-gather transfers.
+    - While in the mem2mem case we were having two distinct types to
+      deal with a single chunk to copy or a collection of them, here,
+      we just have a single transaction type that is supposed to
+      handle both.
+    - If you want to transfer a single contiguous memory buffer,
+      simply build a scatter list with only one item.
+
+  * DMA_CYCLIC
+    - The device can handle cyclic transfers.
+    - A cyclic transfer is a transfer where the chunk collection will
+      loop over itself, with the last item pointing to the first.
+    - It's usually used for audio transfers, where you want to operate
+      on a single ring buffer that you will fill with your audio data.
+
+  * DMA_INTERLEAVE
+    - The device supports interleaved transfer.
+    - These transfers can transfer data from a non-contiguous buffer
+      to a non-contiguous buffer, as opposed to DMA_SLAVE that can
+      transfer data from a non-contiguous data set to a contiguous
+      destination buffer.
+    - It's usually used for 2d content transfers, in which case you
+      want to transfer a portion of uncompressed data directly to the
+      display to print it
+
+These various types will also affect how the source and destination
+addresses change over time.
+
+Addresses pointing to RAM are typically incremented (or decremented)
+after each transfer. In case of a ring buffer, they may loop
+(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
+are typically fixed.
+
+Device operations
+-----------------
+
+Our dma_device structure also requires a few function pointers in
+order to implement the actual logic, now that we described what
+operations we were able to perform.
+
+The functions that we have to fill in there, and hence have to
+implement, obviously depend on the transaction types you reported as
+supported.
+
+   * device_alloc_chan_resources
+   * device_free_chan_resources
+     - These functions will be called whenever a driver will call
+       dma_request_channel or dma_release_channel for the first/last
+       time on the channel associated to that driver.
+     - They are in charge of allocating/freeing all the needed
+       resources in order for that channel to be useful for your
+       driver.
+     - These functions can sleep.
+
+   * device_prep_dma_*
+     - These functions are matching the capabilities you registered
+       previously.
+     - These functions all take the buffer or the scatterlist relevant
+       for the transfer being prepared, and should create a hardware
+       descriptor or a list of hardware descriptors from it
+     - These functions can be called from an interrupt context
+     - Any allocation you might do should be using the GFP_NOWAIT
+       flag, in order not to potentially sleep, but without depleting
+       the emergency pool either.
+     - Drivers should try to pre-allocate any memory they might need
+       during the transfer setup at probe time to avoid putting too
+       much pressure on the nowait allocator.
+
+     - It should return a unique instance of the
+       dma_async_tx_descriptor structure, that further represents this
+       particular transfer.
+
+     - This structure can be initialized using the function
+       dma_async_tx_descriptor_init.
+     - You'll also need to set two fields in this structure:
+       + flags:
+		TODO: Can it be modified by the driver itself, or
+		should it be always the flags passed in the arguments
+
+       + tx_submit:	A pointer to a function you have to implement,
+			that is supposed to push the current
+			transaction descriptor to a pending queue, waiting
+			for issue_pending to be called.
+
+   * device_issue_pending
+     - Takes the first transaction descriptor in the pending queue,
+       and starts the transfer. Whenever that transfer is done, it
+       should move to the next transaction in the list.
+     - This function can be called in an interrupt context
+
+   * device_tx_status
+     - Should report the bytes left to go over on the given channel
+     - Should only care about the transaction descriptor passed as
+       argument, not the currently active one on a given channel
+     - The tx_state argument might be NULL
+     - Should use dma_set_residue to report it
+     - In the case of a cyclic transfer, it should only take into
+       account the current period.
+     - This function can be called in an interrupt context.
+
+   * device_control
+     - Used by client drivers to control and configure the channel it
+       has a handle on.
+     - Called with a command and an argument
+       + The command is one of the values listed by the enum
+         dma_ctrl_cmd. The valid commands are:
+         + DMA_PAUSE
+           + Pauses a transfer on the channel
+           + This command should operate synchronously on the channel,
+             pausing right away the work of the given channel
+         + DMA_RESUME
+           + Restarts a transfer on the channel
+           + This command should operate synchronously on the channel,
+             resuming right away the work of the given channel
+         + DMA_TERMINATE_ALL
+           + Aborts all the pending and ongoing transfers on the
+             channel
+           + This command should operate synchronously on the channel,
+             terminating all the transfers right away
+         + DMA_SLAVE_CONFIG
+           + Reconfigures the channel with passed configuration
+           + This command should NOT perform synchronously, or on any
+             currently queued transfers, but only on subsequent ones
+           + In this case, the function will receive a
+             dma_slave_config structure pointer as an argument, that
+             will detail which configuration to use.
+           + Even though that structure contains a direction field,
+             this field is deprecated in favor of the direction
+             argument given to the prep_* functions
+         + FSLDMA_EXTERNAL_START
+           + TODO: Why does that even exist?
+       + The argument is an opaque unsigned long. This actually is a
+         pointer to a struct dma_slave_config that should be used only
+         in the DMA_SLAVE_CONFIG case.
+
+  * device_slave_caps
+    - Called through the framework by client drivers in order to have
+      an idea of what are the properties of the channel allocated to
+      them.
+    - Such properties are the buswidth, available directions, etc.
+    - Required for every generic layer doing DMA transfers, such as
+      ASoC.
+
+Misc notes (stuff that should be documented, but don't really know
+where to put them)
+------------------------------------------------------------------
+  * dma_run_dependencies
+    - Should be called at the end of an async TX transfer, and can be
+      ignored in the slave transfers case.
+    - Makes sure that dependent operations are run before marking it
+      as complete.
+
+  * dma_cookie_t
+    - it's a DMA transaction ID that will increment over time.
+    - Not really relevant any more since the introduction of virt-dma
+      that abstracts it away.
+
+  * DMA_CTRL_ACK
+    - Undocumented feature
+    - No one really has an idea of what it's about, besides being
+      related to reusing the DMA transaction descriptors or having
+      additional transactions added to it in the async-tx API
+    - Useless in the case of the slave API
+
+General Design Notes
+--------------------
+
+Most of the DMAEngine drivers you'll see are based on a similar design
+that handles the end of transfer interrupts in the handler, but defers
+most work to a tasklet, including the start of a new transfer whenever
+the previous transfer ended.
+
+This is a rather inefficient design though, because the inter-transfer
+latency will be not only the interrupt latency, but also the
+scheduling latency of the tasklet, which will leave the channel idle
+in between, which will slow down the global transfer rate.
+
+You should avoid this kind of practice, and instead of electing a new
+transfer in your tasklet, move that part to the interrupt handler in
+order to have a shorter idle window (that we can't really avoid
+anyway).
+
+Glossary
+--------
+
+Burst: 		A number of consecutive read or write operations
+		that can be queued to buffers before being flushed to
+		memory.
+Chunk:		A contiguous collection of bursts
+Transfer:	A collection of chunks (be it contiguous or not)
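The chunk collection described in the hardware introduction of the document above (a "transfer" in its glossary) can be modelled in plain C as a linked list that a controller walks from head to tail. This is a software sketch under stated assumptions: struct sg_chunk and sg_run_transfer() are hypothetical names, memcpy() stands in for the hardware copy, and real descriptor layouts depend entirely on the controller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical software model of one chunk; real layouts are hardware-specific. */
struct sg_chunk {
	const void *src;	/* source address of this chunk */
	void *dst;		/* destination address of this chunk */
	size_t len;		/* transfer size in bytes */
	struct sg_chunk *next;	/* next chunk, or NULL at the end */
};

/* Walk the list as a controller would, returning the bytes copied. */
static size_t sg_run_transfer(const struct sg_chunk *chunk)
{
	size_t total = 0;

	for (; chunk; chunk = chunk->next) {
		memcpy(chunk->dst, chunk->src, chunk->len);
		total += chunk->len;
	}
	return total;
}
```

Gathering two separate source buffers into one contiguous destination is then a two-element list whose dst pointers are adjacent; a single contiguous buffer is the same list with one item, as the DMA_SLAVE entry suggests.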
@@ -1722,6 +1722,13 @@ F:	drivers/dma/at_hdmac.c
 F:	drivers/dma/at_hdmac_regs.h
 F:	include/linux/platform_data/dma-atmel.h

+ATMEL XDMA DRIVER
+M:	Ludovic Desroches <ludovic.desroches@atmel.com>
+L:	linux-arm-kernel@lists.infradead.org
+L:	dmaengine@vger.kernel.org
+S:	Supported
+F:	drivers/dma/at_xdmac.c
+
 ATMEL I2C DRIVER
 M:	Ludovic Desroches <ludovic.desroches@atmel.com>
 L:	linux-i2c@vger.kernel.org
@@ -3162,7 +3169,8 @@ Q:	https://patchwork.kernel.org/project/linux-dmaengine/list/
 S:	Maintained
 F:	drivers/dma/
 F:	include/linux/dma*
-T:	git git://git.infradead.org/users/vkoul/slave-dma.git (slave-dma)
+F:	Documentation/dmaengine/
+T:	git git://git.infradead.org/users/vkoul/slave-dma.git

 DME1737 HARDWARE MONITOR DRIVER
 M:	Juerg Haefliger <juergh@gmail.com>
@@ -107,6 +107,13 @@ config AT_HDMAC
 	help
 	  Support the Atmel AHB DMA controller.

+config AT_XDMAC
+	tristate "Atmel XDMA support"
+	depends on ARCH_AT91
+	select DMA_ENGINE
+	help
+	  Support the Atmel XDMA controller.
+
 config FSL_DMA
 	tristate "Freescale Elo series DMA support"
 	depends on FSL_SOC
@@ -395,12 +402,12 @@ config XILINX_VDMA

 config DMA_SUN6I
 	tristate "Allwinner A31 SoCs DMA support"
-	depends on MACH_SUN6I || COMPILE_TEST
+	depends on MACH_SUN6I || MACH_SUN8I || COMPILE_TEST
 	depends on RESET_CONTROLLER
 	select DMA_ENGINE
 	select DMA_VIRTUAL_CHANNELS
 	help
-	  Support for the DMA engine for Allwinner A31 SoCs.
+	  Support for the DMA engine first found in Allwinner A31 SoCs.

 config NBPFAXI_DMA
 	tristate "Renesas Type-AXI NBPF DMA support"
@@ -16,6 +16,7 @@ obj-$(CONFIG_PPC_BESTCOMM) += bestcomm/
 obj-$(CONFIG_MV_XOR) += mv_xor.o
 obj-$(CONFIG_DW_DMAC_CORE) += dw/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
+obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_MX3_IPU) += ipu/
 obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o
 obj-$(CONFIG_SH_DMAE_BASE) += sh/
@@ -2164,7 +2164,6 @@ static int pl08x_probe(struct amba_device *adev, const struct amba_id *id)
 			 __func__, ret);
 		goto out_no_memcpy;
 	}
-	pl08x->memcpy.chancnt = ret;

 	/* Register slave channels */
 	ret = pl08x_dma_init_virtual_channels(pl08x, &pl08x->slave,
@@ -2175,7 +2174,6 @@ static int pl08x_probe(struct amba_device *adev, const struct amba_id *id)
 				__func__, ret);
 		goto out_no_slave;
 	}
-	pl08x->slave.chancnt = ret;

 	ret = dma_async_device_register(&pl08x->memcpy);
 	if (ret) {
@@ -525,8 +525,6 @@ static int bcm2835_dma_chan_init(struct bcm2835_dmadev *d, int chan_id, int irq)
 	vchan_init(&c->vc, &d->ddev);
 	INIT_LIST_HEAD(&c->node);

-	d->ddev.chancnt++;
-
 	c->chan_base = BCM2835_DMA_CHANIO(d->base, chan_id);
 	c->ch = chan_id;
 	c->irq_number = irq;
@@ -694,7 +692,6 @@ static struct platform_driver bcm2835_dma_driver = {
 	.remove	= bcm2835_dma_remove,
 	.driver = {
 		.name = "bcm2835-dma",
-		.owner = THIS_MODULE,
 		.of_match_table = of_match_ptr(bcm2835_dma_of_match),
 	},
 };
@@ -1,3 +1,4 @@
+#include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
 #include <linux/platform_device.h>
@@ -567,7 +568,7 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
 		reg |= GCR_TEARDOWN;
 		cppi_writel(reg, c->gcr_reg);
 		c->td_queued = 1;
-		c->td_retry = 100;
+		c->td_retry = 500;
 	}

 	if (!c->td_seen || !c->td_desc_seen) {
@@ -603,12 +604,16 @@ static int cppi41_tear_down_chan(struct cppi41_channel *c)
 	 * descriptor before the TD we fetch it from enqueue, it has to be
 	 * there waiting for us.
 	 */
-	if (!c->td_seen && c->td_retry)
+	if (!c->td_seen && c->td_retry) {
+		udelay(1);
 		return -EAGAIN;
-
+	}
 	WARN_ON(!c->td_retry);
+
 	if (!c->td_desc_seen) {
 		desc_phys = cppi41_pop_desc(cdd, c->q_num);
+		if (!desc_phys)
+			desc_phys = cppi41_pop_desc(cdd, c->q_comp_num);
 		WARN_ON(!desc_phys);
 	}

@@ -1088,7 +1093,6 @@ static struct platform_driver cpp41_dma_driver = {
 	.remove = cppi41_dma_remove,
 	.driver = {
 		.name = "cppi41-dma-engine",
-		.owner = THIS_MODULE,
 		.pm = &cppi41_pm_ops,
 		.of_match_table = of_match_ptr(cppi41_dma_ids),
 	},
@@ -563,10 +563,9 @@ static int jz4740_dma_probe(struct platform_device *pdev)
 	dd->device_prep_dma_cyclic = jz4740_dma_prep_dma_cyclic;
 	dd->device_control = jz4740_dma_control;
 	dd->dev = &pdev->dev;
-	dd->chancnt = JZ_DMA_NR_CHANS;
 	INIT_LIST_HEAD(&dd->channels);

-	for (i = 0; i < dd->chancnt; i++) {
+	for (i = 0; i < JZ_DMA_NR_CHANS; i++) {
 		chan = &dmadev->chan[i];
 		chan->id = i;
 		chan->vchan.desc_free = jz4740_dma_desc_free;
@@ -608,7 +607,6 @@ static struct platform_driver jz4740_dma_driver = {
 	.remove = jz4740_dma_remove,
 	.driver = {
 		.name = "jz4740-dma",
-		.owner = THIS_MODULE,
 	},
 };
 module_platform_driver(jz4740_dma_driver);
@@ -330,8 +330,7 @@ static int __init dma_channel_table_init(void)
 	if (err) {
 		pr_err("initialization failure\n");
 		for_each_dma_cap_mask(cap, dma_cap_mask_all)
-			if (channel_table[cap])
-				free_percpu(channel_table[cap]);
+			free_percpu(channel_table[cap]);
 	}

 	return err;
@@ -118,17 +118,17 @@
 				BIT(DMA_SLAVE_BUSWIDTH_8_BYTES)

 struct fsl_edma_hw_tcd {
-	u32	saddr;
-	u16	soff;
-	u16	attr;
-	u32	nbytes;
-	u32	slast;
-	u32	daddr;
-	u16	doff;
-	u16	citer;
-	u32	dlast_sga;
-	u16	csr;
-	u16	biter;
+	__le32	saddr;
+	__le16	soff;
+	__le16	attr;
+	__le32	nbytes;
+	__le32	slast;
+	__le32	daddr;
+	__le16	doff;
+	__le16	citer;
+	__le32	dlast_sga;
+	__le16	csr;
+	__le16	biter;
 };

 struct fsl_edma_sw_tcd {
@@ -175,18 +175,12 @@ struct fsl_edma_engine {
 };

 /*
- * R/W functions for big- or little-endian registers
- * the eDMA controller's endian is independent of the CPU core's endian.
+ * R/W functions for big- or little-endian registers:
+ * The eDMA controller's endian is independent of the CPU core's endian.
+ * For the big-endian IP module, the offset for 8-bit or 16-bit registers
+ * should also be swapped opposite to that in little-endian IP.
 */

-static u16 edma_readw(struct fsl_edma_engine *edma, void __iomem *addr)
-{
-	if (edma->big_endian)
-		return ioread16be(addr);
-	else
-		return ioread16(addr);
-}
-
 static u32 edma_readl(struct fsl_edma_engine *edma, void __iomem *addr)
 {
 	if (edma->big_endian)
@@ -197,13 +191,18 @@ static u32 edma_readl(struct fsl_edma_engine *edma, void __iomem *addr)

 static void edma_writeb(struct fsl_edma_engine *edma, u8 val, void __iomem *addr)
 {
-	iowrite8(val, addr);
+	/* swap the reg offset for these in big-endian mode */
+	if (edma->big_endian)
+		iowrite8(val, (void __iomem *)((unsigned long)addr ^ 0x3));
+	else
+		iowrite8(val, addr);
 }

 static void edma_writew(struct fsl_edma_engine *edma, u16 val, void __iomem *addr)
 {
+	/* swap the reg offset for these in big-endian mode */
 	if (edma->big_endian)
-		iowrite16be(val, addr);
+		iowrite16be(val, (void __iomem *)((unsigned long)addr ^ 0x2));
 	else
 		iowrite16(val, addr);
 }
@@ -254,13 +253,12 @@ static void fsl_edma_chan_mux(struct fsl_edma_chan *fsl_chan,
 	chans_per_mux = fsl_chan->edma->n_chans / DMAMUX_NR;
 	ch_off = fsl_chan->vchan.chan.chan_id % chans_per_mux;
 	muxaddr = fsl_chan->edma->muxbase[ch / chans_per_mux];
+	slot = EDMAMUX_CHCFG_SOURCE(slot);

 	if (enable)
-		edma_writeb(fsl_chan->edma,
-				EDMAMUX_CHCFG_ENBL | EDMAMUX_CHCFG_SOURCE(slot),
-				muxaddr + ch_off);
+		iowrite8(EDMAMUX_CHCFG_ENBL | slot, muxaddr + ch_off);
 	else
-		edma_writeb(fsl_chan->edma, EDMAMUX_CHCFG_DIS, muxaddr + ch_off);
+		iowrite8(EDMAMUX_CHCFG_DIS, muxaddr + ch_off);
 }

 static unsigned int fsl_edma_get_tcd_attr(enum dma_slave_buswidth addr_width)
@@ -286,9 +284,8 @@ static void fsl_edma_free_desc(struct virt_dma_desc *vdesc)

 	fsl_desc = to_fsl_edma_desc(vdesc);
 	for (i = 0; i < fsl_desc->n_tcds; i++)
-			dma_pool_free(fsl_desc->echan->tcd_pool,
-					fsl_desc->tcd[i].vtcd,
-					fsl_desc->tcd[i].ptcd);
+		dma_pool_free(fsl_desc->echan->tcd_pool, fsl_desc->tcd[i].vtcd,
+			      fsl_desc->tcd[i].ptcd);
 	kfree(fsl_desc);
 }

@@ -363,8 +360,8 @@ static size_t fsl_edma_desc_residue(struct fsl_edma_chan *fsl_chan,

 	/* calculate the total size in this desc */
 	for (len = i = 0; i < fsl_chan->edesc->n_tcds; i++)
-		len += edma_readl(fsl_chan->edma, &(edesc->tcd[i].vtcd->nbytes))
-			* edma_readw(fsl_chan->edma, &(edesc->tcd[i].vtcd->biter));
+		len += le32_to_cpu(edesc->tcd[i].vtcd->nbytes)
+			* le16_to_cpu(edesc->tcd[i].vtcd->biter);

 	if (!in_progress)
 		return len;
@@ -376,17 +373,15 @@ static size_t fsl_edma_desc_residue(struct fsl_edma_chan *fsl_chan,

 	/* figure out the finished and calculate the residue */
 	for (i = 0; i < fsl_chan->edesc->n_tcds; i++) {
-		size = edma_readl(fsl_chan->edma, &(edesc->tcd[i].vtcd->nbytes))
-			* edma_readw(fsl_chan->edma, &(edesc->tcd[i].vtcd->biter));
+		size = le32_to_cpu(edesc->tcd[i].vtcd->nbytes)
+			* le16_to_cpu(edesc->tcd[i].vtcd->biter);
 		if (dir == DMA_MEM_TO_DEV)
-			dma_addr = edma_readl(fsl_chan->edma,
-					&(edesc->tcd[i].vtcd->saddr));
+			dma_addr = le32_to_cpu(edesc->tcd[i].vtcd->saddr);
 		else
-			dma_addr = edma_readl(fsl_chan->edma,
-					&(edesc->tcd[i].vtcd->daddr));
+			dma_addr = le32_to_cpu(edesc->tcd[i].vtcd->daddr);

 		len -= size;
-		if (cur_addr > dma_addr && cur_addr < dma_addr + size) {
+		if (cur_addr >= dma_addr && cur_addr < dma_addr + size) {
 			len += dma_addr + size - cur_addr;
 			break;
 		}
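The effect of changing the lower bound from `>` to `>=` in the hunk above can be seen in a simplified model of the residue loop. This is a sketch, not the driver code: struct seg and desc_residue() are hypothetical, and each segment's size stands in for nbytes * biter of one TCD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one TCD: start address and size. */
struct seg {
	uint32_t addr;	/* DMA address the segment starts at */
	uint32_t size;	/* nbytes * biter for this segment */
};

/*
 * Bytes left in a descriptor of n segments when the engine is at cur_addr.
 * Mirrors the loop in the hunk above, with the inclusive lower bound.
 */
static size_t desc_residue(const struct seg *segs, size_t n, uint32_t cur_addr)
{
	size_t len = 0;
	size_t i;

	/* Total size of the descriptor. */
	for (i = 0; i < n; i++)
		len += segs[i].size;

	/* Drop finished segments, then add back the rest of the current one. */
	for (i = 0; i < n; i++) {
		len -= segs[i].size;
		if (cur_addr >= segs[i].addr &&
		    cur_addr < segs[i].addr + segs[i].size) {
			len += segs[i].addr + segs[i].size - cur_addr;
			break;
		}
	}
	return len;
}
```

In this model, a cur_addr sitting exactly at the start of a segment matches that segment. With a strict `>` comparison it would match none, and the loop would fall through with len already decremented to 0.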
@@ -424,55 +419,67 @@ static enum dma_status fsl_edma_tx_status(struct dma_chan *chan,
 	return fsl_chan->status;
 }

-static void fsl_edma_set_tcd_params(struct fsl_edma_chan *fsl_chan,
-		u32 src, u32 dst, u16 attr, u16 soff, u32 nbytes,
-		u32 slast, u16 citer, u16 biter, u32 doff, u32 dlast_sga,
-		u16 csr)
+static void fsl_edma_set_tcd_regs(struct fsl_edma_chan *fsl_chan,
+				  struct fsl_edma_hw_tcd *tcd)
 {
+	struct fsl_edma_engine *edma = fsl_chan->edma;
 	void __iomem *addr = fsl_chan->edma->membase;
 	u32 ch = fsl_chan->vchan.chan.chan_id;

 	/*
-	 * TCD parameters have been swapped in fill_tcd_params(),
-	 * so just write them to registers in the cpu endian here
+	 * TCD parameters are stored in struct fsl_edma_hw_tcd in little
+	 * endian format. However, we need to load the TCD registers in
+	 * big- or little-endian obeying the eDMA engine model endian.
 	 */
-	writew(0, addr + EDMA_TCD_CSR(ch));
-	writel(src, addr + EDMA_TCD_SADDR(ch));
-	writel(dst, addr + EDMA_TCD_DADDR(ch));
-	writew(attr, addr + EDMA_TCD_ATTR(ch));
-	writew(soff, addr + EDMA_TCD_SOFF(ch));
-	writel(nbytes, addr + EDMA_TCD_NBYTES(ch));
-	writel(slast, addr + EDMA_TCD_SLAST(ch));
-	writew(citer, addr + EDMA_TCD_CITER(ch));
-	writew(biter, addr + EDMA_TCD_BITER(ch));
-	writew(doff, addr + EDMA_TCD_DOFF(ch));
-	writel(dlast_sga, addr + EDMA_TCD_DLAST_SGA(ch));
-	writew(csr, addr + EDMA_TCD_CSR(ch));
+	edma_writew(edma, 0, addr + EDMA_TCD_CSR(ch));
+	edma_writel(edma, le32_to_cpu(tcd->saddr), addr + EDMA_TCD_SADDR(ch));
+	edma_writel(edma, le32_to_cpu(tcd->daddr), addr + EDMA_TCD_DADDR(ch));
+
+	edma_writew(edma, le16_to_cpu(tcd->attr), addr + EDMA_TCD_ATTR(ch));
+	edma_writew(edma, le16_to_cpu(tcd->soff), addr + EDMA_TCD_SOFF(ch));
+
+	edma_writel(edma, le32_to_cpu(tcd->nbytes), addr + EDMA_TCD_NBYTES(ch));
+	edma_writel(edma, le32_to_cpu(tcd->slast), addr + EDMA_TCD_SLAST(ch));
+
+	edma_writew(edma, le16_to_cpu(tcd->citer), addr + EDMA_TCD_CITER(ch));
+	edma_writew(edma, le16_to_cpu(tcd->biter), addr + EDMA_TCD_BITER(ch));
+	edma_writew(edma, le16_to_cpu(tcd->doff), addr + EDMA_TCD_DOFF(ch));
+
+	edma_writel(edma, le32_to_cpu(tcd->dlast_sga), addr + EDMA_TCD_DLAST_SGA(ch));
+
+	edma_writew(edma, le16_to_cpu(tcd->csr), addr + EDMA_TCD_CSR(ch));
 }

-static void fill_tcd_params(struct fsl_edma_engine *edma,
-		struct fsl_edma_hw_tcd *tcd, u32 src, u32 dst,
-		u16 attr, u16 soff, u32 nbytes, u32 slast, u16 citer,
-		u16 biter, u16 doff, u32 dlast_sga, bool major_int,
-		bool disable_req, bool enable_sg)
+static inline
+void fsl_edma_fill_tcd(struct fsl_edma_hw_tcd *tcd, u32 src, u32 dst,
+		       u16 attr, u16 soff, u32 nbytes, u32 slast, u16 citer,
+		       u16 biter, u16 doff, u32 dlast_sga, bool major_int,
+		       bool disable_req, bool enable_sg)
 {
 	u16 csr = 0;

 	/*
-	 * eDMA hardware SGs require the TCD parameters stored in memory
-	 * the same endian as the eDMA module so that they can be loaded
-	 * automatically by the engine
+	 * eDMA hardware SGs require the TCDs to be stored in little
+	 * endian format irrespective of the register endian model.
+	 * So we put the value in little endian in memory, waiting
+	 * for fsl_edma_set_tcd_regs doing the swap.
 	 */
-	edma_writel(edma, src, &(tcd->saddr));
-	edma_writel(edma, dst, &(tcd->daddr));
-	edma_writew(edma, attr, &(tcd->attr));
-	edma_writew(edma, EDMA_TCD_SOFF_SOFF(soff), &(tcd->soff));
-	edma_writel(edma, EDMA_TCD_NBYTES_NBYTES(nbytes), &(tcd->nbytes));
-	edma_writel(edma, EDMA_TCD_SLAST_SLAST(slast), &(tcd->slast));
-	edma_writew(edma, EDMA_TCD_CITER_CITER(citer), &(tcd->citer));
-	edma_writew(edma, EDMA_TCD_DOFF_DOFF(doff), &(tcd->doff));
-	edma_writel(edma, EDMA_TCD_DLAST_SGA_DLAST_SGA(dlast_sga), &(tcd->dlast_sga));
-	edma_writew(edma, EDMA_TCD_BITER_BITER(biter), &(tcd->biter));
+	tcd->saddr = cpu_to_le32(src);
+	tcd->daddr = cpu_to_le32(dst);
+
+	tcd->attr = cpu_to_le16(attr);
+
+	tcd->soff = cpu_to_le16(EDMA_TCD_SOFF_SOFF(soff));
+
+	tcd->nbytes = cpu_to_le32(EDMA_TCD_NBYTES_NBYTES(nbytes));
+	tcd->slast = cpu_to_le32(EDMA_TCD_SLAST_SLAST(slast));
+
+	tcd->citer = cpu_to_le16(EDMA_TCD_CITER_CITER(citer));
+	tcd->doff = cpu_to_le16(EDMA_TCD_DOFF_DOFF(doff));
+
+	tcd->dlast_sga = cpu_to_le32(EDMA_TCD_DLAST_SGA_DLAST_SGA(dlast_sga));
+
+	tcd->biter = cpu_to_le16(EDMA_TCD_BITER_BITER(biter));
 	if (major_int)
 		csr |= EDMA_TCD_CSR_INT_MAJOR;

@@ -482,7 +489,7 @@ static void fill_tcd_params(struct fsl_edma_engine *edma,
 	if (enable_sg)
 		csr |= EDMA_TCD_CSR_E_SG;

-	edma_writew(edma, csr, &(tcd->csr));
+	tcd->csr = cpu_to_le16(csr);
 }

 static struct fsl_edma_desc *fsl_edma_alloc_desc(struct fsl_edma_chan *fsl_chan,
@@ -558,9 +565,9 @@ static struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 			doff = fsl_chan->fsc.addr_width;
 		}

-		fill_tcd_params(fsl_chan->edma, fsl_desc->tcd[i].vtcd, src_addr,
-				dst_addr, fsl_chan->fsc.attr, soff, nbytes, 0,
-				iter, iter, doff, last_sg, true, false, true);
+		fsl_edma_fill_tcd(fsl_desc->tcd[i].vtcd, src_addr, dst_addr,
+				  fsl_chan->fsc.attr, soff, nbytes, 0, iter,
+				  iter, doff, last_sg, true, false, true);
 		dma_buf_next += period_len;
 	}

@@ -607,16 +614,16 @@ static struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		iter = sg_dma_len(sg) / nbytes;
 		if (i < sg_len - 1) {
 			last_sg = fsl_desc->tcd[(i + 1)].ptcd;
-			fill_tcd_params(fsl_chan->edma, fsl_desc->tcd[i].vtcd,
-					src_addr, dst_addr, fsl_chan->fsc.attr,
-					soff, nbytes, 0, iter, iter, doff, last_sg,
-					false, false, true);
+			fsl_edma_fill_tcd(fsl_desc->tcd[i].vtcd, src_addr,
+					  dst_addr, fsl_chan->fsc.attr, soff,
+					  nbytes, 0, iter, iter, doff, last_sg,
+					  false, false, true);
 		} else {
 			last_sg = 0;
-			fill_tcd_params(fsl_chan->edma, fsl_desc->tcd[i].vtcd,
-					src_addr, dst_addr, fsl_chan->fsc.attr,
-					soff, nbytes, 0, iter, iter, doff, last_sg,
-					true, true, false);
+			fsl_edma_fill_tcd(fsl_desc->tcd[i].vtcd, src_addr,
+					  dst_addr, fsl_chan->fsc.attr, soff,
+					  nbytes, 0, iter, iter, doff, last_sg,
+					  true, true, false);
 		}
 	}

@@ -625,17 +632,13 @@ static struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(

 static void fsl_edma_xfer_desc(struct fsl_edma_chan *fsl_chan)
 {
-	struct fsl_edma_hw_tcd *tcd;
 	struct virt_dma_desc *vdesc;

 	vdesc = vchan_next_desc(&fsl_chan->vchan);
 	if (!vdesc)
 		return;
 	fsl_chan->edesc = to_fsl_edma_desc(vdesc);
-	tcd = fsl_chan->edesc->tcd[0].vtcd;
-	fsl_edma_set_tcd_params(fsl_chan, tcd->saddr, tcd->daddr, tcd->attr,
-			tcd->soff, tcd->nbytes, tcd->slast, tcd->citer,
-			tcd->biter, tcd->doff, tcd->dlast_sga, tcd->csr);
+	fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd);
 	fsl_edma_enable_request(fsl_chan);
 	fsl_chan->status = DMA_IN_PROGRESS;
 }
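
Note on the fsl-edma hunks above: the new comments say that in-memory TCDs are always kept little-endian, and that values are converted back to CPU order only when they are loaded into the registers. The user-space sketch below illustrates that round trip under stated assumptions. Every demo_* name is hypothetical and is not part of the driver; the two helpers merely stand in for the kernel's cpu_to_le32() and le32_to_cpu(), and the struct carries only two of the real TCD fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_le32(): returns a value whose bytes in
 * memory are in little-endian order, whatever the host byte order is. */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical stand-in for le32_to_cpu(): the inverse conversion. */
static uint32_t demo_le32_to_cpu(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Reduced, hypothetical descriptor with two of the TCD fields. */
struct demo_tcd {
	uint32_t saddr;
	uint32_t daddr;
};

/* Mirrors the idea of fsl_edma_fill_tcd(): store little-endian in memory. */
static void demo_fill_tcd(struct demo_tcd *tcd, uint32_t src, uint32_t dst)
{
	assert(tcd != NULL);
	tcd->saddr = demo_cpu_to_le32(src);
	tcd->daddr = demo_cpu_to_le32(dst);
}

/* Returns the first byte of saddr as it sits in memory. */
static uint8_t demo_first_byte(const struct demo_tcd *tcd)
{
	uint8_t b;

	memcpy(&b, &tcd->saddr, 1);
	return b;
}
```

After demo_fill_tcd(&t, 0x11223344, ...), the lowest-addressed byte of t.saddr is 0x44 on any host, and demo_le32_to_cpu(t.saddr) gives back 0x11223344. That is the value a register-load step would then pass to an accessor that applies the engine's own endianness.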
@@ -1337,7 +1337,6 @@ static int fsl_dma_chan_probe(struct fsldma_device *fdev,

 	/* Add the channel to DMA device channel list */
 	list_add_tail(&chan->common.device_node, &fdev->common.channels);
-	fdev->common.chancnt++;

 	dev_info(fdev->dev, "#%d (%s), irq %d\n", chan->id, compatible,
 		 chan->irq != NO_IRQ ? chan->irq : fdev->irq);
@@ -729,6 +729,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
 	case IMX_DMATYPE_CSPI:
 	case IMX_DMATYPE_EXT:
 	case IMX_DMATYPE_SSI:
+	case IMX_DMATYPE_SAI:
 		per_2_emi = sdma->script_addrs->app_2_mcu_addr;
 		emi_2_per = sdma->script_addrs->mcu_2_app_addr;
 		break;
@@ -1287,7 +1288,8 @@ static void sdma_load_firmware(const struct firmware *fw, void *context)
 	unsigned short *ram_code;

 	if (!fw) {
-		dev_err(sdma->dev, "firmware not found\n");
+		dev_info(sdma->dev, "external firmware not found, using ROM firmware\n");
+		/* In this case we just use the ROM firmware. */
 		return;
 	}

@@ -1346,7 +1348,7 @@ static int sdma_get_firmware(struct sdma_engine *sdma,
 	return ret;
 }

-static int __init sdma_init(struct sdma_engine *sdma)
+static int sdma_init(struct sdma_engine *sdma)
 {
 	int i, ret;
 	dma_addr_t ccb_phys;
@@ -1265,9 +1265,17 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)
 	op = IOAT_OP_XOR;

 	dest_dma = dma_map_page(dev, dest, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+	if (dma_mapping_error(dev, dest_dma))
+		goto dma_unmap;
+
 	for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
+		dma_srcs[i] = DMA_ERROR_CODE;
+	for (i = 0; i < IOAT_NUM_SRC_TEST; i++) {
 		dma_srcs[i] = dma_map_page(dev, xor_srcs[i], 0, PAGE_SIZE,
 					   DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, dma_srcs[i]))
+			goto dma_unmap;
+	}
 	tx = dma->device_prep_dma_xor(dma_chan, dest_dma, dma_srcs,
 				      IOAT_NUM_SRC_TEST, PAGE_SIZE,
 				      DMA_PREP_INTERRUPT);
@@ -1298,7 +1306,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)
 		goto dma_unmap;
 	}

-	dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
 	for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
 		dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE, DMA_TO_DEVICE);

@@ -1313,6 +1320,8 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)
 	}
 	dma_sync_single_for_device(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);

+	dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
+
 	/* skip validate if the capability is not present */
 	if (!dma_has_cap(DMA_XOR_VAL, dma_chan->device->cap_mask))
 		goto free_resources;
@@ -1327,8 +1336,13 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)
 	xor_val_result = 1;

 	for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
+		dma_srcs[i] = DMA_ERROR_CODE;
+	for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
 		dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
 					   DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, dma_srcs[i]))
+			goto dma_unmap;
+	}
 	tx = dma->device_prep_dma_xor_val(dma_chan, dma_srcs,
 					  IOAT_NUM_SRC_TEST + 1, PAGE_SIZE,
 					  &xor_val_result, DMA_PREP_INTERRUPT);
@@ -1374,8 +1388,13 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)

 	xor_val_result = 0;
 	for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
+		dma_srcs[i] = DMA_ERROR_CODE;
+	for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
 		dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
 					   DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, dma_srcs[i]))
+			goto dma_unmap;
+	}
 	tx = dma->device_prep_dma_xor_val(dma_chan, dma_srcs,
 					  IOAT_NUM_SRC_TEST + 1, PAGE_SIZE,
 					  &xor_val_result, DMA_PREP_INTERRUPT);
@@ -1417,14 +1436,18 @@ static int ioat_xor_val_self_test(struct ioatdma_device *device)
 	goto free_resources;
 dma_unmap:
 	if (op == IOAT_OP_XOR) {
-		dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
+		if (dest_dma != DMA_ERROR_CODE)
+			dma_unmap_page(dev, dest_dma, PAGE_SIZE,
+				       DMA_FROM_DEVICE);
 		for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
-			dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
-				       DMA_TO_DEVICE);
+			if (dma_srcs[i] != DMA_ERROR_CODE)
+				dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
+					       DMA_TO_DEVICE);
 	} else if (op == IOAT_OP_XOR_VAL) {
 		for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
-			dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
-				       DMA_TO_DEVICE);
+			if (dma_srcs[i] != DMA_ERROR_CODE)
+				dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
+					       DMA_TO_DEVICE);
 	}
 free_resources:
 	dma->device_free_chan_resources(dma_chan);
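
Note on the ioatdma hunks above: they pre-fill dma_srcs[] with DMA_ERROR_CODE, check every mapping, and unmap only the entries that were actually mapped. The user-space sketch below shows that cleanup pattern in isolation. It is a minimal illustration, not the driver code: all demo_* names, the sentinel value, and the fake mapper are assumptions. It also compares handles against the sentinel directly, where the driver uses dma_mapping_error() for the check after mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for DMA_ERROR_CODE. */
#define DEMO_ERROR_CODE ((uint64_t)~0ULL)
#define DEMO_NUM_SRC 4

/* Number of mappings currently held; used to detect leaks. */
static int demo_live_mappings;

/* Hypothetical mapper: fails for the entry whose index equals fail_at. */
static uint64_t demo_map(int index, int fail_at)
{
	if (index == fail_at)
		return DEMO_ERROR_CODE;
	demo_live_mappings++;
	return 0x1000u * (uint64_t)(index + 1);
}

/* Hypothetical unmapper: releases one previously mapped handle. */
static void demo_unmap(uint64_t handle)
{
	(void)handle;
	assert(demo_live_mappings > 0);
	demo_live_mappings--;
}

/* Returns 0 on success, -1 if a mapping failed. No mapping leaks either way. */
static int demo_map_all(int fail_at)
{
	uint64_t srcs[DEMO_NUM_SRC];
	int i, ret = 0;

	/* Mark every slot as "not mapped" before mapping anything. */
	for (i = 0; i < DEMO_NUM_SRC; i++)
		srcs[i] = DEMO_ERROR_CODE;

	for (i = 0; i < DEMO_NUM_SRC; i++) {
		srcs[i] = demo_map(i, fail_at);
		if (srcs[i] == DEMO_ERROR_CODE) {
			ret = -1;
			goto unmap;
		}
	}

	/* A real self-test would submit the DMA operation here. */

unmap:
	/* Only slots that hold a valid handle are unmapped. */
	for (i = 0; i < DEMO_NUM_SRC; i++)
		if (srcs[i] != DEMO_ERROR_CODE)
			demo_unmap(srcs[i]);
	return ret;
}
```

Calling demo_map_all(2) maps entries 0 and 1, fails on entry 2, and the cleanup loop releases exactly those two mappings. Without the pre-fill, the loop would read uninitialized handles for entries 2 and 3.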