Pull dmaengine fixes from Dan Williams:
1/ regression fix for Xen as it now trips over a broken assumption
about the dma address size on 32-bit builds
2/ new quirk for netdma to ignore dma channels that cannot meet
netdma alignment requirements
3/ fixes for two long standing issues in ioatdma (ring size overflow)
and iop-adma (potential stack corruption)
* tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
netdma: adding alignment check for NETDMA ops
ioatdma: DMA copy alignment needed to address IOAT DMA silicon errata
ioat: ring size variables need to be 32bit to avoid overflow
iop-adma: Corrected array overflow in RAID6 Xscale(R) test.
ioat: fix size of 'completion' for Xen
Silicon errata where when RAID and legacy descriptors are mixed, the legacy
(memcpy and friends) operation must have alignment of 64 bytes to avoid
hanging. This effects Intel Xeon C55xx, C35xx, E5-2600.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The alloc order can be up to 16 and 1 << 16 will over flow the 16bit
integer. Change the appropriate variables to 16bit to avoid overflow.
Reported-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Starting with v3.2 Jonathan reports that Xen crashes loading the ioatdma
driver. A debug run shows:
ioatdma 0000:00:16.4: desc[0]: (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
...
ioatdma 0000:00:16.4: ioat_get_current_completion: phys_complete: 0xcc7000
...which shows that in this environment GFP_KERNEL memory may be backed
by a 64-bit dma address. This breaks the driver's assumption that an
unsigned long should be able to contain the physical address for
descriptor memory. Switch to dma_addr_t which beyond being the right
size, is the true type for the data i.e. an io-virtual address
inidicating the engine's last processed descriptor.
[stable: 3.2+]
Cc: <stable@vger.kernel.org>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: William Dauchy <wdauchy@gmail.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ensure all DMA engine drivers initialize their cookies in the same way,
so that they all behave in a similar fashion. This means their first
issued cookie will be 2 rather than 1, and will increment to INT_MAX
before returning 1 and starting over.
In connection with this, Dan Williams said:
> Russell King wrote:
> > Secondly, some DMA engine drivers initialize the dma_chan cookie to 0,
> > others to 1. Is there a reason for this, or are these all buggy?
>
> I know that ioat and iop-adma expect 0 to mean "I have cleaned up this
> descriptor and it is idle", and would break if zero was an in-flight
> cookie value. The reserved usage of zero is an driver internal
> concern, but I have no problem formalizing it as a reserved value.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
[imx-sdma.c & mxs-dma.c]
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Now that we have the completed cookie in the dma_chan structure, we
can consolidate the tx_status functions by providing a function to set
the txstate structure and returning the DMA status. We also provide
a separate helper to set the residue for cookies which are still in
progress.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
[imx-sdma.c & mxs-dma.c]
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
For versions of the device that implement operation-types 0x87, 0x88
(IOAT_OP_XOR, IOAT_OP_XOR_VAL) this map determines whether a given
source is located in the base or extended descriptor. Source addresses
6 through 8 require an extended descriptor, hence 0xe0, not 0xd0. No
shipping hardware currently implements these operation types.
Reported-by: Evgueni Smogailov <evgueni.smogailov@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
const __read_mostly is not legal and causes section type conflicts.
That's because the read.mostly section is not read only.
Simply drop the __read_mostly designation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[drop __read_mostly instead of const]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (33 commits)
x86: poll waiting for I/OAT DMA channel status
maintainers: add dma engine tree details
dmaengine: add TODO items for future work on dma drivers
dmaengine: Add API documentation for slave dma usage
dmaengine/dw_dmac: Update maintainer-ship
dmaengine: move link order
dmaengine/dw_dmac: implement pause and resume in dwc_control
dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback
dmaengine/dw_dmac: Divide one sg to many desc, if sg len is greater than DWC_MAX_COUNT
dmaengine/dw_dmac: set residue as total len in dwc_tx_status if status is !DMA_SUCCESS
dmaengine/dw_dmac: don't call callback routine in case dmaengine_terminate_all() is called
dmaengine: at_hdmac: pause: no need to wait for FIFO empty
pch_dma: modify pci device table definition
pch_dma: Support new device ML7223 IOH
pch_dma: Support I2S for ML7213 IOH
pch_dma: Fix DMA setting issue
pch_dma: modify for checkpatch
pch_dma: fix dma direction issue for ML7213 IOH video-in
dmaengine: at_hdmac: use descriptor chaining help function
dmaengine: at_hdmac: implement pause and resume in atc_control
...
Fix up trivial conflict in drivers/dma/dw_dmac.c
For certain system configurations a 5 usec udelay before checking I/OAT DMA
channel status is sometimes not sufficient, resulting in a false failure
status and unnecessary freeing of channel resources. Conversely, for many
configurations 5 usec is longer than necessary.
Loop for up to 20 usec waiting for successful status before failing.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout. Give
them an explicit include via the following $0.02 script.
=========================================
#!/bin/bash
MANUAL=""
for i in `git grep -l 'prefetch(.*)' .` ; do
grep -q '<linux/prefetch.h>' $i
if [ $? = 0 ] ; then
continue
fi
( echo '?^#include <linux/?a'
echo '#include <linux/prefetch.h>'
echo .
echo w
echo q
) | ed -s $i > /dev/null 2>&1
if [ $? != 0 ]; then
echo $i needs manual fixup
MANUAL="$i $MANUAL"
fi
done
echo ------------------- 8\<----------------------
echo vi $MANUAL
=========================================
Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
non-network drivers and the fib_trie.c case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0793448 "DMAENGINE: generic channel status v2" changed the interface for
how dma channel progress is retrieved. It inadvertently exported an internal
helper function ioat_tx_status() instead of ioat_dma_tx_status(). The latter
polls the hardware to get the latest completion state, while the helper just
evaluates the current state without touching hardware. The effect is that we
end up waiting for completion timeouts or descriptor allocation errors before
the completion state is updated.
iperf (before fix):
[SUM] 0.0-41.3 sec 364 MBytes 73.9 Mbits/sec
iperf (after fix):
[SUM] 0.0- 4.5 sec 499 MBytes 940 Mbits/sec
This is a regression starting with 2.6.35.
Cc: <stable@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Richard Scobie <richard@sauce.co.nz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On some platforms (MacPro3,1) the BIOS assigns the ioatdma device to the
incorrect iommu causing faults when the driver initializes. Add a quirk
to catch this misconfiguration and try falling back to untranslated
operation (which works in the MacPro3,1 case).
Assuming there are other platforms with misconfigured iommus teach the
ioatdma driver to treat initialization failures as non-fatal (just fail
the driver load and emit a warning instead of triggering a BUG_ON).
This can be classified as a boot regression since 2.6.32 on affected
platforms since the ioatdma module did not autoload prior to that
kernel.
Cc: <stable@kernel.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Reported-by: Chris Li <lkml@chrisli.org>
Tested-by: Chris Li <lkml@chrisli.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
There are cases where cacheline-unaligned raid operations can hang the
dma channel. Simply disable these operations by increasing the
alignment constraints published to async_tx. The raid456 driver always
issues page aligned requests, so the only in-kernel user of the ioatdma
driver that is affected by this change is dmatest.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use separate locks for the descriptor prep (producer) and descriptor
cleanup (consumer) paths. Allows the producer path to run concurrently
with the cleanup path. Inspired by Documentation/circular-buffer.txt.
Cc: David Howells <dhowells@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>