For almost two decades, the maximum request size was limited to 512KiB because
of SDMA's 512KiB boundary limit.
ADMA2 and ADMA3 have no such limit and were designed so that transfers of any
block count would not burden the host with interrupts and management overhead.
Limiting them to 512KiB effectively downgrades these DMA modes to SDMA.
Fix that by actually following the spec:
When ADMA is selected, tuning mode is advised.
On lesser modes, 4MiB is selected as the maximum transfer size so that
re-tuning, whether triggered by the timer or requested by a host interrupt,
can be done in time.
Otherwise, the only limit is the size of the variable types used.
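The selection described above can be sketched roughly as follows. This is a
minimal illustration and not the driver code: the `dma_mode` enum, the
`retune_limits_length` flag, the `REQ_*` constants and `pick_max_req_size()`
are hypothetical names, and keeping SDMA at 512KiB is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
enum dma_mode { DMA_SDMA, DMA_ADMA2, DMA_ADMA3 };

#define REQ_SDMA_MAX   (512u * 1024u)        /* SDMA boundary limit */
#define REQ_RETUNE_MAX (4u * 1024u * 1024u)  /* cap on lesser tuning modes */
#define REQ_ADMA_MAX   (16u * 1024u * 1024u) /* ceiling chosen in this change */

static size_t pick_max_req_size(enum dma_mode dma, bool retune_limits_length)
{
	/* Assumption: SDMA keeps its 512KiB boundary limit. */
	if (dma == DMA_SDMA)
		return REQ_SDMA_MAX;

	/* Lesser tuning modes: cap at 4MiB so re-tuning can happen in time. */
	if (retune_limits_length)
		return REQ_RETUNE_MAX;

	/* Otherwise only the chosen ceiling applies. */
	return REQ_ADMA_MAX;
}
```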
In this implementation, 16MiB is used as the maximum since tests showed
diminishing returns beyond that point.
Also, in the worst case, when the card is an eMMC with a generous maximum
speed of 350MiB/s, 16MiB requests generate an interrupt roughly every 45ms on
huge data transfers.
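The 45ms figure follows from dividing the request size by the sustained
throughput. A small sketch of that arithmetic, assuming one completion
interrupt per request (the helper name `request_time_us()` is hypothetical):

```c
#include <stdint.h>

/* Time to move one request at a sustained rate, in microseconds. */
static uint64_t request_time_us(uint64_t req_bytes, uint64_t bytes_per_sec)
{
	/* Multiply first to keep precision; 64-bit avoids overflow here. */
	return req_bytes * 1000000ull / bytes_per_sec;
}
```

With `req_bytes` = 16MiB and `bytes_per_sec` = 350MiB/s this gives about 45.7ms
per request. At the same rate, 512KiB requests complete about every 1.4ms.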
For example, in local tests that combined rigorous CPU/GPU burn-in with abrupt
cut-offs to produce large temperature swings (upwards/downwards) on the card,
the tested host was fine at up to 128MB/s transfers on slow cards using SDR104
bus timing without re-tuning.
In that case, the 4MiB limit was overridden with a more than safe 8MiB value.
Across all tested cases and boards, the change brought the following:
Depending on bus timing and eMMC/SD specs:
* Max read throughput increased by 2-20%
* Max write throughput increased by 50-200%
Depending on CPU frequency and transfer sizes:
* mmcqd CPU core usage reduced by 4-50%
Signed-off-by: CTCaer <ctcaer@gmail.com>