Commit Graph

91 Commits

Author SHA1 Message Date
Kirill Tkhai 2f635ceeb2 net: Drop pernet_operations::async
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:09 -04:00
Christian Brauner 692ec06d7c netns: send uevent messages
This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets
to allow sending uevent messages into the network namespace the socket
belongs to.

Currently non-initial network namespaces are already isolated and don't
receive uevents. There are a number of cases where it is beneficial for a
sufficiently privileged userspace process to send a uevent into a network
namespace.

One such use case would be debugging and fuzzing of a piece of software
which listens and reacts to uevents. By running a copy of that software
inside a network namespace, specific uevents could then be presented to it.
More concretely, this would allow for easy testing of udevd/ueventd.

This will also allow some piece of software to run components inside a
separate network namespace and then effectively filter what that software
can receive. Some examples of software that do directly listen to uevents
and that we have in the past attempted to run inside a network namespace
are rbd (CEPH client) or the X server.

Implementation:
The implementation has been kept as simple as possible from the kernel's
perspective. Specifically, a simple input method uevent_net_rcv() is added
to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing
af_netlink infrastructure and does neither add an additional netlink family
nor requires any user-visible changes.

For example, by using netlink_rcv_skb() we can make use of existing netlink
infrastructure to report back informative error messages to userspace.

Furthermore, this implementation does not introduce any overhead for
existing uevent generating codepaths. The struct netns got a new uevent
socket member that records the uevent socket associated with that network
namespace including its position in the uevent socket list. Since we record
the uevent socket for each network namespace in struct net we don't have to
walk the whole uevent socket list. Instead we can directly retrieve the
relevant uevent socket and send the message. At exit time we can now also
trivially remove the uevent socket from the uevent socket list. This keeps
the codepath very performant without introducing needless overhead and even
makes older codepaths faster.

Uevent sequence numbers are kept global. When a uevent message is sent to
another network namespace the implementation will simply increment the
global uevent sequence number and append it to the received uevent. This
has the advantage that the kernel will never need to parse the received
uevent message to replace any existing uevent sequence numbers. Instead it
is up to the userspace process to remove any existing uevent sequence
numbers in case the uevent message to be sent contains any.

Security:
In order for a caller to send uevent messages to a target network namespace
the caller must have CAP_SYS_ADMIN in the owning user namespace of the
target network namespace. Additionally, any received uevent message is
verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space
needed to append the uevent sequence number.

Testing:
This patch has been tested and verified to work with the following udev
implementations:
1. CentOS 6 with udevd version 147
2. Debian Sid with systemd-udevd version 237
3. Android 7.1.1 with ueventd

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 11:16:43 -04:00
Christian Brauner 94e5e3087a net: add uevent socket member
This commit adds struct uevent_sock to struct net. Since struct uevent_sock
records the position of the uevent socket in the uevent socket list we can
trivially remove it from the uevent socket list during cleanup. This speeds
up the old removal codepath.
Note, list_del() will hit __list_del_entry_valid() in its call chain which
will validate that the element is a member of the list. If it isn't it will
take care that the list is not modified.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 11:16:42 -04:00
Kirill Tkhai 15898a011b net: Convert uevent_net_ops
uevent_net_init() and uevent_net_exit() create and
destroy netlink socket, and these actions serialized
in netlink code.

Parallel execution with other pernet_operations
makes the socket disappear earlier from uevent_sock_list
on ->exit. As userspace can't be interested in broadcast
messages of dying net, and, as I see, no one in kernel
listen them, we may safely make uevent_net_ops async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 10:36:07 -05:00
Greg Kroah-Hartman 8c9076b07c Merge 4.15-rc6 into driver-core-next
We want the fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 14:56:51 +01:00
Dmitry Torokhov 9b3fa47d4a kobject: fix suppressing modalias in uevents delivered over netlink
The commit 4a336a23d6 ("kobject: copy env blob in one go") optimized
constructing uevent data for delivery over netlink by using the raw
environment buffer, instead of reconstructing it from individual
environment pointers. Unfortunately in doing so it broke suppressing
MODALIAS attribute for KOBJ_UNBIND events, as the code that suppressed this
attribute only adjusted the environment pointers, but left the buffer
itself alone. Let's fix it by making sure the offending attribute is
obliterated form the buffer as well.

Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Casey Leedom <leedom@chelsio.com>
Fixes: 4a336a23d6 ("kobject: copy env blob in one go")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-21 11:10:33 +01:00
Greg Kroah-Hartman 045c5f75b7 kobject: Remove redundant license text
Now that the SPDX tag is in all kobject files, that identifies the
license in a specific and legally-defined manner.  So the extra GPL text
wording can be removed as it is no longer needed at all.

This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text.  And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.

No copyright headers or other non-license-description text was removed.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-07 18:36:43 +01:00
Greg Kroah-Hartman d9d16e16a3 kobject: add SPDX identifiers to all kobject files
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.

Update the kobject files files with the correct SPDX license identifier
based on the license text in the file itself.  The SPDX identifier is a
legally binding shorthand, which can be used instead of the full boiler
plate text.

This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-07 18:36:43 +01:00
David S. Miller 53954cf8c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:19:22 -07:00
Eric Dumazet d464e84eed kobject: factorize skb setup in kobject_uevent_net_broadcast()
We can build one skb and let it be cloned in netlink.

This is much faster, and use less memory (all clones will
share the same skb->head)

Tested:

time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]

real    0m24.227s # instead of 0m52.554s
user    0m0.329s
sys 0m23.753s # instead of 0m51.375s

    14.77%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
    14.56%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
    11.65%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     6.19%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     4.97%       ip  [kernel.kallsyms]  [k] memset_erms
     4.67%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
     4.41%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     3.59%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
     3.13%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.55%       ip  [kernel.kallsyms]  [k] __wake_up
     1.20%       ip  [kernel.kallsyms]  [k] strlen
     1.03%       ip  [kernel.kallsyms]  [k] __wake_up_common
     0.93%       ip  [kernel.kallsyms]  [k] consume_skb
     0.92%       ip  [kernel.kallsyms]  [k] netlink_trim
     0.87%       ip  [kernel.kallsyms]  [k] insert_header
     0.63%       ip  [kernel.kallsyms]  [k] unmap_page_range

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet 4a336a23d6 kobject: copy env blob in one go
No need to iterate over strings, just copy in one efficient memcpy() call.

Tested:
time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]

real    0m52.554s  # instead of 1m7.492s
user    0m0.309s
sys 0m51.375s # instead of 1m6.875s

     9.88%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
     8.86%       ip  [kernel.kallsyms]  [k] string
     7.37%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
     5.68%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     5.52%       ip  [kernel.kallsyms]  [k] memcpy_erms
     4.76%       ip  [kernel.kallsyms]  [k] __alloc_skb
     4.54%       ip  [kernel.kallsyms]  [k] vsnprintf
     3.94%       ip  [kernel.kallsyms]  [k] format_decode
     3.80%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node_trace
     3.71%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node
     3.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     3.38%       ip  [kernel.kallsyms]  [k] strlen
     2.65%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     2.20%       ip  [kernel.kallsyms]  [k] kfree
     2.09%       ip  [kernel.kallsyms]  [k] memset_erms
     2.07%       ip  [kernel.kallsyms]  [k] ___cache_free
     1.95%       ip  [kernel.kallsyms]  [k] kmem_cache_free
     1.91%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     1.45%       ip  [kernel.kallsyms]  [k] ksize
     1.25%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.00%       ip  [kernel.kallsyms]  [k] widen_string

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet 16dff336b3 kobject: add kobject_uevent_net_broadcast()
This removes some #ifdef pollution and will ease follow up patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Dmitry Torokhov 6878e7de6a driver core: suppress sending MODALIAS in UNBIND uevents
The current udev rules cause modules to be loaded on all device events save
for "remove". With the introduction of KOBJ_BIND/KOBJ_UNBIND this causes
issues, as driver modules that have devices bound to their drivers get
immediately reloaded, and it appears to the user that module unloading doe
snot work.

The standard udev matching rule is foillowing:

ENV{MODALIAS}=="?*", RUN{builtin}+="kmod load $env{MODALIAS}"

Given that MODALIAS data is not terribly useful for UNBIND event, let's zap
it from the generated uevent environment until we get userspace updated
with the correct udev rule that only loads modules on "add" event.

Reported-by: Jakub Kicinski <kubakici@wp.pl>
Tested-by: Jakub Kicinski <kubakici@wp.pl>
Fixes: 1455cf8dbf ("driver core: emit uevents when device is bound ...")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18 16:48:33 +02:00
Greg Kroah-Hartman 1af824f085 Merge branch 'bind_unbind' into driver-core-next
This merges the bind_unbind driver core feature into the
driver-core-next branch.  bind_unbind is a branch so that others can
pull and work off of it safely.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-22 12:00:15 +02:00
Dmitry Torokhov 1455cf8dbf driver core: emit uevents when device is bound to a driver
There are certain touch controllers that may come up in either normal
(application) or boot mode, depending on whether firmware/configuration is
corrupted when they are powered on. In boot mode the kernel does not create
input device instance (because it does not necessarily know the
characteristics of the input device in question).

Another number of controllers does not store firmware in a non-volatile
memory, and they similarly need to have firmware loaded before input device
instance is created. There are also other types of devices with similar
behavior.

There is a desire to be able to trigger firmware loading via udev, but it
has to happen only when driver is bound to a physical device (i2c or spi).
These udev actions can not use ADD events, as those happen too early, so we
are introducing BIND and UNBIND events that are emitted at the right
moment.

Also, many drivers create additional driver-specific device attributes
when binding to the device, to provide userspace with additional controls.
The new events allow userspace to adjust these driver-specific attributes
without worrying that they are not there yet.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-22 11:59:23 +02:00
Peter Rajnoha f36776fafb kobject: support passing in variables for synthetic uevents
This patch makes it possible to pass additional arguments in addition
to uevent action name when writing /sys/.../uevent attribute. These
additional arguments are then inserted into generated synthetic uevent
as additional environment variables.

Before, we were not able to pass any additional uevent environment
variables for synthetic uevents. This made it hard to identify such uevents
properly in userspace to make proper distinction between genuine uevents
originating from kernel and synthetic uevents triggered from userspace.
Also, it was not possible to pass any additional information which would
make it possible to optimize and change the way the synthetic uevents are
processed back in userspace based on the originating environment of the
triggering action in userspace. With the extra additional variables, we are
able to pass through this extra information needed and also it makes it
possible to synchronize with such synthetic uevents as they can be clearly
identified back in userspace.

The format for writing the uevent attribute is following:

    ACTION [UUID [KEY=VALUE ...]

There's no change in how "ACTION" is recognized - it stays the same
("add", "change", "remove"). The "ACTION" is the only argument required
to generate synthetic uevent, the rest of arguments, that this patch
adds support for, are optional.

The "UUID" is considered as transaction identifier so it's possible to
use the same UUID value for one or more synthetic uevents in which case
we logically group these uevents together for any userspace listeners.
The "UUID" is expected to be in "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
format where "x" is a hex digit. The value appears in uevent as
"SYNTH_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" environment variable.

The "KEY=VALUE" pairs can contain alphanumeric characters only. It's
possible to define zero or more more pairs - each pair is then delimited
by a space character " ". Each pair appears in synthetic uevents as
"SYNTH_ARG_KEY=VALUE" environment variable. That means the KEY name gains
"SYNTH_ARG_" prefix to avoid possible collisions with existing variables.
To pass the "KEY=VALUE" pairs, it's also required to pass in the "UUID"
part for the synthetic uevent first.

If "UUID" is not passed in, the generated synthetic uevent gains
"SYNTH_UUID=0" environment variable automatically so it's possible to
identify this situation in userspace when reading generated uevent and so
we can still make a difference between genuine and synthetic uevents.

Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 18:30:51 +02:00
Julia Lawall edbbf994bd kobject: improve function-level documentation
In the first case, rename the second variable to correspond to the name
found in the function parameter list.

In the remaining cases, reorder the variables to correspond to their order
in the parameter list.

Issue detected using Coccinelle (http://coccinelle.lip6.fr/)

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 02:42:10 -04:00
Rasmus Villemoes a69ae45c26 lib/kobject_uevent.c: remove redundant include
The file doesn't seem to use anything from linux/user_namespace.h, and
removing it yields byte-identical object code and strictly fewer
dependencies in the .cmd file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Michael Marineau 86d56134f1 kobject: Make support for uevent_helper optional.
Support for uevent_helper, aka hotplug, is not required on many systems
these days but it can still be enabled via sysfs or sysctl.

Reported-by: Darren Shepherd <darren.s.shepherd@gmail.com>
Signed-off-by: Michael Marineau <mike@marineau.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-25 12:00:49 -07:00
Vladimir Davydov bcccff93af kobject: don't block for each kobject_uevent
Currently kobject_uevent has somewhat unpredictable semantics.  The
point is, since it may call a usermode helper and wait for it to execute
(UMH_WAIT_EXEC), it is impossible to say for sure what lock dependencies
it will introduce for the caller - strictly speaking it depends on what
fs the binary is located on and the set of locks fork may take.  There
are quite a few kobject_uevent's users that do not take this into
account and call it with various mutexes taken, e.g.  rtnl_mutex,
net_mutex, which might potentially lead to a deadlock.

Since there is actually no reason to wait for the usermode helper to
execute there, let's make kobject_uevent start the helper asynchronously
with the aid of the UMH_NO_WAIT flag.

Personally, I'm interested in this, because I really want kobject_uevent
to be called under the slab_mutex in the slub implementation as it used
to be some time ago, because it greatly simplifies synchronization and
automatically fixes a kmemcg-related race.  However, there was a
deadlock detected on an attempt to call kobject_uevent under the
slab_mutex (see https://lkml.org/lkml/2012/1/14/45), which was reported
to be fixed by releasing the slab_mutex for kobject_uevent.

Unfortunately, there was no information about who exactly blocked on the
slab_mutex causing the usermode helper to stall, neither have I managed
to find this out or reproduce the issue.

BTW, this is not the first attempt to make kobject_uevent use
UMH_NO_WAIT.  Previous one was made by commit f520360d93 ("kobject:
don't block for each kobject_uevent"), but it was wrong (it passed
arguments allocated on stack to async thread) so it was reverted in
05f54c13cd ("Revert "kobject: don't block for each kobject_uevent".").
It targeted on speeding up the boot process though.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg KH <greg@kroah.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Weilong Chen 82ef3d5d5f net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 20:02:02 -08:00
Pablo Neira Ayuso 9f00d9776b netlink: hide struct module parameter in netlink_kernel_create
This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08 18:46:30 -04:00
Pablo Neira Ayuso 9785e10aed netlink: kill netlink_set_nonroot
Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08 18:45:27 -04:00
Pablo Neira Ayuso a31f2d17b3 netlink: add netlink_kernel_cfg parameter to netlink_kernel_create
This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 16:46:02 -07:00
Linus Torvalds 11bcb32848 Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
 "Fix up files in fs/ and lib/ dirs to only use module.h if they really
  need it.

  These are trivial in scope vs the work done previously.  We now have
  things where any few remaining cleanups can be farmed out to arch or
  subsystem maintainers, and I have done so when possible.  What is
  remaining here represents the bits that don't clearly lie within a
  single arch/subsystem boundary, like the fs dir and the lib dir.

  Some duplicate includes arising from overlapping fixes from
  independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  lib: reduce the use of module.h wherever possible
  fs: reduce the use of module.h wherever possible
  includecheck: delete any duplicate instances of module.h
2012-03-24 10:24:31 -07:00