Flash filesystems like JFFS2, UBIFS and MTD block layer can provide
vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
can potentially be in memory region above 32bit addressable region(ie
buffers belonging to memory region backed by LPAE) of DMA, implement
spi_flash_can_dma() interface to inform SPI core not to map such
buffers.
When buffers are not mapped for DMA, then use a pre allocated bounce
buffer(64K = typical flash erase sector size) to read from flash and
then do a copy to actual destination buffer. This is approach is much
faster than using memcpy using CPU and also reduces CPU load.
With this patch, UBIFS read speed is ~18MB/s and CPU utilization <20% on
DRA74 Rev H EVM. Performance degradation is negligible when compared
with non bounce buffer case while using UBIFS.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
commit 1351aaeb50 ("spi: spi-ti-qspi: Use dma_engine wrapper for dma
memcpy call") introduced this warning:
drivers/spi/spi-ti-qspi.c: In function 'ti_qspi_dma_xfer':
drivers/spi/spi-ti-qspi.c:398:21: warning: unused variable 'dma_dev' [-Wunused-variable]
struct dma_device *dma_dev = chan->device;
Fix it by removing the unused variable.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mark Brown <broonie@kernel.org>
Instead of calling device_prep_dma_memcpy() directly with dma_device
pointer, use the newly introduced dmaengine_prep_dma_memcpy() wrapper
API.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
'dma_request_chan_by_mask()' can not return NULL.
Try to keep the logic in 'no_dma:' by resetting 'qspi->rx_chan' in case
of error.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
call spi_master_put() in case of failures after spi_alloc_master().
call pm_runtime_disable() in case of failures after pm_runtime_enable().
Signed-off-by: Prahlad V <prahlad.eee@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Use mem-to-mem DMA to read from flash when reading in mmap mode. This
gives improved read performance and reduces CPU load.
With this patch the raw-read throughput is ~16MB/s on DRA74 EVM. And CPU
load is <20%. UBIFS read throughput ~13 MB/s.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Before disabling the pm_runtime, we must ensure that there is no transfer
in progress nor will a new one be started. Otherwise the message pump will
fail and in the end, the process requesting the transfer will be stuck.
This behavior has been observed when transferring data from a SPI flash
with dd while removing the module on a DRA7x-evm.
Signed-off-by: Jean-Jacques Hiblot <jjhiblot@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
We clamp frame_len_words to a maximum of 4096, but do not actually
limit the number of words written or read through the DATA registers
or the length added to spi_message::actual_length. This results in
silent data corruption for commands longer than this maximum.
Recalculate the length of each transfer, taking frame_len_words into
account. Use this length in qspi_{read,write}_msg(), and to increment
spi_message::actual_length.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Each transfer can specify 8, 16 or 32 bits per word independently of
the default for the device being addressed. However, currently we
calculate the number of words in the frame assuming that the word size
is the device default.
If multiple transfers in the same message have differing
bits_per_word, we bitwise-or the different values in the WLEN register
field.
Fix both of these. Also rename 'frame_length' to 'frame_len_words' to
make clear that it's not a byte count like spi_message::frame_length.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
ti-qspi controller provides mmap port to read data from SPI flashes.
mmap port is enabled in QSPI_SPI_SWITCH_REG. ctrl module register may
also need to be accessed for some SoCs. The QSPI_SPI_SETUP_REGx needs to
be populated with flash specific information like read opcode, read
mode(quad, dual, normal), address width and dummy bytes. Once,
controller is in mmap mode, the whole flash memory is available as a
memory region at SoC specific address. This region can be accessed using
normal memcpy() (or mem-to-mem dma copy). The ti-qspi controller hardware
will internally communicate with SPI flash over SPI bus and get the
requested data.
Implement spi_flash_read() callback to support mmap read over SPI
flash devices. With this, the read throughput increases from ~100kB/s to
~2.5 MB/s.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
there's no need to call pm_runtime_get_sync()
followed by pm_runtime_put(). We should, instead,
just call pm_runtime_put_sync() and pm_runtime_disable().
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently word completion interrupt is fired for transfer of every
word(8bit to 128bit in size). This adds a lot of overhead, and decreases
r/w throughput. It hardly takes 3us(@48MHz) for 128bit r/w to complete,
hence its better to poll on word complete bit to be set in
QSPI_SPI_STATUS_REG instead of using interrupts.
This increases the throughput by 30% in both read and write case.
So, switch to polling mode instead of interrupts to determine completion
of word transfer.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Writing invalid command to QSPI_SPI_CMD_REG will terminate current
transfer and de-assert the chip select. This has to be done before
calling spi_finalize_current_message(). Because
spi_finalize_current_message() will mark the end of current message
transfer and schedule the next transfer. If the chipselect is not
de-asserted before calling spi_finalize_current_message() then the next
transfer will overlap with the previous transfer leading to data
corruption.
__spi_pump_message() can be called either from kthread worker context or
directly from the calling process's context. It is possible that these
two calls can race against each other. But race is serialized by
checking whether master->cur_msg == NULL (pointer to msg being handled
by transfer_one() at present). The master->cur_msg is set to NULL when
spi_finalize_current_message() is called on that message, which means
calling spi_finalize_current_message() allows __spi_sync() to pump next
message in calling process context.
Now if spi-ti-qspi calls spi_finalize_current_message() before we
terminate transfer at hardware side, if __spi_pump_message() is called
from process context then the successive transactions can overlap.
Fix this by moving writing invalid command to QSPI_SPI_CMD_REG to
before calling spi_finalize_current_message() call.
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
TI QSPI has four 32 bit data regsiters which can be used to transfer 16
bytes of data at once. The register group QSPI_SPI_DATA_REG_3,
QSPI_SPI_DATA_REG_2, QSPI_SPI_DATA_REG_1 and QSPI_SPI_DATA_REG is
treated as a single 128-bit word for shifting data in and out. The bit
at QSPI_SPI_DATA_REG_3[31] position is the first bit to be shifted out
in case of 128 bit transfer mode. Therefore the first byte to be written
to flash should be at QSPI_SPI_DATA_REG_3[31-25] position.
Instead of writing 1 byte at a time when interacting with spi-nor flash,
make use of all the four registers so that 16 bytes can be transferred
in one go. This reduces number of register writes and Word Complete
interrupts for a given transfer message size, thereby increasing the
write performance.
Without this patch the raw flash write speed is ~100KB/s, with this
patch the write speed increases to ~400 kB/s on DRA74 EVM.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Data corruption is seen while reading/writing large data from/to qspi
device because the data register is over written or read before data
is ready which is denoted by busy bit in status register. SO adding
a busy bit check before writing/reading data to/from qspi device.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
return type of wait_for_completion_timeout is unsigned long not int, this
patch uses the return value of wait_for_completion_timeout in the condition
directly rather than assigning it to an incorrect type variable.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
mmap resource requirement is only for memory mapped operations.
If the user does not populate mmap resource, dont call return,
instead we go on for normal spi mode operations.
Signed-off-by: Sourav Poddar <sourav.poddar@ti.com>
Signed-off-by: Mark Brown <broonie@linaro.org>