dma_range_map is freed to early, which might cause an oops when
a driver probe fails.
Call trace:
is_free_buddy_page+0xe4/0x1d4
__free_pages+0x2c/0x88
dma_free_contiguous+0x64/0x80
dma_direct_free+0x38/0xb4
dma_free_attrs+0x88/0xa0
dmam_release+0x28/0x34
release_nodes+0x78/0x8c
devres_release_all+0xa8/0x110
really_probe+0x118/0x2d0
__driver_probe_device+0xc8/0xe0
driver_probe_device+0x54/0xec
__driver_attach+0xe0/0xf0
bus_for_each_dev+0x7c/0xc8
driver_attach+0x30/0x3c
bus_add_driver+0x17c/0x1c4
driver_register+0xc0/0xf8
__platform_driver_register+0x34/0x40
...
This issue is introduced by commit d0243bbd5d ("drivers core:
Free dma_range_map when driver probe failed"). It frees
dma_range_map before the call to devres_release_all, which is too
early. The solution is to free dma_range_map only after
devres_release_all.
Fixes: d0243bbd5d ("drivers core: Free dma_range_map when driver probe failed")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Filip Schauer <filip@mg6.at>
Link: https://lore.kernel.org/r/20210727112311.GA7645@DESKTOP-E8BN1B0.localdomain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is intended as a replacement API for device_bind_driver(). It has at
least the following benefits:
- Internal locking. Few of the users of device_bind_driver() follow the
locking rules
- Calls device driver probe() internally. Notably this means that devm
support for probe works correctly as probe() error will call
devres_release_all()
- struct device_driver -> dev_groups is supported
- Simplified calling convention, no need to manually call probe().
The general usage is for situations that already know what driver to bind
and need to ensure the bind is synchronized with other logic. Call
device_driver_attach() after device_add().
If probe() returns a failure then this will be preserved up through to the
error return of device_driver_attach().
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210617142218.1877096-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
EPROBE_DEFER is an internal kernel error code and it should not be leaked
to userspace via the bind_store() sysfs. Userspace doesn't have this
constant and cannot understand it.
Further, it doesn't really make sense to have userspace trigger a deferred
probe via bind_store(), which could eventually succeed, while
simultaneously returning an error back.
Resolve this by splitting driver_probe_device so that the version used
by the sysfs binding that turns EPROBE_DEFER into -EAGAIN, while the one
used for internally binding keeps the error code, and calls
driver_deferred_probe_add where needed. This also allows to nicely split
out the defer_all_probes / probe_count checks so that they actually allow
for full device_{block,unblock}_probing protection while not bothering
the sysfs bind case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210617142218.1877096-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently really_probe() returns 1 on success and 0 if the probe() call
fails. This return code arrangement is designed to be useful for
__device_attach_driver() which is walking the device list and trying every
driver. 0 means to keep trying.
However, it is not useful for the other places that call through to
really_probe() that do actually want to see the probe() return code.
For instance bind_store() would be better to return the actual error code
from the driver's probe method, not discarding it and returning -ENODEV.
Reorganize things so that really_probe() returns the error code from
->probe as a (inverted) positive number, and 0 for successful attach.
With this, __device_attach_driver can ignore the (positive) probe errors,
return 1 to exit the loop for a successful binding and pass on the
other negative errors, while device_driver_attach simplify inverts the
positive errors and returns all errors to the sysfs code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Checking if the dev is dead or if the dev is already bound is a required
precondition to invoking driver_probe_device(). All the call chains
leading here duplicate these checks.
Add it directly to driver_probe_device() so the precondition is clear and
remove the checks from device_driver_attach() and
__driver_attach_async_helper().
The other call chain going through __device_attach_driver() does have
these same checks but they are inlined into logic higher up the call stack
and can't be removed.
The sysfs uAPI call chain starting at bind_store() is a bit confused
because it reads dev->driver unlocked and returns -ENODEV if it is !NULL,
otherwise it reads it again under lock and returns 0 if it is !NULL. Fix
this to always return -EBUSY and always read dev->driver under its lock.
Done in preparation for the next patches which will add additional
callers to driver_probe_device() and will need these checks as well.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[hch: drop the extra checks in device_driver_attach and bind_store]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-2-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
deferred_probe_timeout kernel commandline parameter allows probing of
consumer devices if the supplier devices don't have any drivers.
fw_devlink=on will indefintely block probe() calls on a device if all
its suppliers haven't probed successfully. This completely skips calls
to driver_deferred_probe_check_state() since that's only called when a
.probe() function calls framework APIs. So fw_devlink=on breaks
deferred_probe_timeout.
deferred_probe_timeout in its current state also ignores a lot of
information that's now available to the kernel. It assumes all suppliers
that haven't probed when the timer expires (or when initcalls are done
on a static kernel) will never probe and fails any calls to acquire
resources from these unprobed suppliers.
However, this assumption by deferred_probe_timeout isn't true under many
conditions. For example:
- If the consumer happens to be before the supplier in the deferred
probe list.
- If the supplier itself is waiting on its supplier to probe.
This patch fixes both these issues by relaxing device links between
devices only if the supplier doesn't have any driver that could match
with (NOT bound to) the supplier device. This way, we only fail attempts
to acquire resources from suppliers that truly don't have any driver vs
suppliers that just happen to not have probed yet.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210402040342.2944858-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Deferred probe usually runs only on pinned kworkers, which might take
longer time if a device contains multiple sub-devices. One such case
is of sound card on mobile devices, where we have good number of
mixers and controls per mixer.
We observed boot up improvement - deferred probes take ~600ms when bound
to little core kworker and ~200ms when deferred probe is queued on
unbound wq. This is due to scheduler moving the worker running deferred
probe work to big CPUs. Without this change, we see the worker is running
on LITTLE CPU due to affinity.
Since kworker runs deferred probe of several devices, the locality may
not be important. Also, init thread executing driver initcalls, can
potentially migrate as it has cpu affinity set to all cpus.In addition
to this, async probes use unbounded workqueue. So, using unbounded wq for
deferred probes looks to be similar to these w.r.t. scheduling behavior.
Signed-off-by: Yogesh Lal <ylal@codeaurora.org>
Link: https://lore.kernel.org/r/1616583698-6398-1-git-send-email-ylal@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When retrying a deferred probe, any old defer reason string should be
discarded. Otherwise, if the probe is deferred again at a different spot,
but without setting a message, the now incorrect probe reason will remain.
This was observed with the i.MX I2C driver, which ultimately failed
to probe due to lack of the GPIO driver. The probe defer for GPIO
doesn't record a message, but a previous probe defer to clock_get did.
This had the effect that /sys/kernel/debug/devices_deferred listed
a misleading probe deferral reason.
Cc: stable <stable@vger.kernel.org>
Fixes: d090b70ede ("driver core: add deferring probe reason to devices_deferred property")
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20210319110459.19966-1-a.fatoum@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There will be memory leak if driver probe failed. Trace as below:
backtrace:
[<000000002415258f>] kmemleak_alloc+0x3c/0x50
[<00000000f447ebe4>] __kmalloc+0x208/0x530
[<0000000048bc7b3a>] of_dma_get_range+0xe4/0x1b0
[<0000000041e39065>] of_dma_configure_id+0x58/0x27c
[<000000006356866a>] platform_dma_configure+0x2c/0x40
......
[<000000000afcf9b5>] ret_from_fork+0x10/0x3c
This issue is introduced by commit e0d072782c73("dma-mapping:
introduce DMA range map, supplanting dma_pfn_offset "). It doesn't
free dma_range_map when driver probe failed and cause above
memory leak. So, add code to free it in error path.
Fixes: e0d072782c ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset ")
Cc: stable@vger.kernel.org
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Link: https://lore.kernel.org/r/20210105070927.14968-1-Meng.Li@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Device drivers usually depend on the fact that the devices that they
control are suspended in the same order that they were probed in. In
most cases this is already guaranteed via deferred probe.
However, there's one case where this can still break: if a device is
instantiated before a dependency (for example if it appears before the
dependency in device tree) but gets probed only after the dependency is
probed. Instantiation order would cause the dependency to get probed
later, in which case probe of the original device would be deferred and
the suspend/resume queue would get reordered properly. However, if the
dependency is provided by a built-in driver and the device depending on
that driver is controlled by a loadable module, which may only get
loaded after the root filesystem has become available, we can be faced
with a situation where the probe order ends up being different from the
suspend/resume order.
One example where this happens is on Tegra186, where the ACONNECT is
listed very early in device tree (sorted by unit-address) and depends on
BPMP (listed very late because it has no unit-address) for power domains
and clocks/resets. If the ACONNECT driver is built-in, there is no
problem because it will be probed before BPMP, causing a probe deferral
and that in turn reorders the suspend/resume queue. However, if built as
a module, it will end up being probed after BPMP, and therefore not
result in a probe deferral, and therefore the suspend/resume queue will
stay in the instantiation order. This in turn causes problems because
ACONNECT will be resumed before BPMP, which will result in a hang
because the ACONNECT's power domain cannot be powered on as long as the
BPMP is still suspended.
Fix this by always reordering devices on successful probe. This ensures
that the suspend/resume queue is always in probe order and hence meets
the natural expectations of drivers vs. their dependencies.
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Acked-by: Rafael. J. Wysocki <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20201203175756.1405564-1-thierry.reding@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 716a7a2596.
The fw_devlink_pause/resume() APIs added by the commit being reverted
were a first cut attempt at optimizing boot time. But these APIs don't
fully solve the problem and are very fragile (can only be used for the
top level devices being added). This series replaces them with a much
better optimization that works for all device additions and also has the
benefit of reducing the complexity of the firmware (DT, EFI) specific
code and abstracting out common code to driver core.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-7-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>