introduce new setsockopt() command:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.
The same eBPF program can be attached to multiple sockets.
User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ieee802154/fakehard.c
A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.
Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the debian procenv package, which called
shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus lead to a segfault
later on.
Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32bit userspace up to now.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.10+
This patch reduces the value of SIGRTMIN on PARISC from 37 to 32, thus
increasing the number of available RT signals and bring it in sync with other
Linux architectures.
Historically we wanted to natively support HP-UX 32bit binaries with the
PA-RISC Linux port. Because of that we carried the various available signals
from HP-UX (e.g. SIGEMT and SIGLOST) and folded them in between the native
Linux signals. Although this was the right decision at that time, this
required us to increase SIGRTMIN to at least 37 which left us with 27 (64-37)
RT signals.
Those 27 RT signals haven't been a problem in the past, but with the upcoming
importance of systemd we now got the problem that systemd alloctes (hardcoded)
signals up to SIGRTMIN+29 which is beyond our NSIG of 64. Because of that we
have not been able to use systemd on the PARISC Linux port yet.
Of course we could ask the systemd developers to not use those hardcoded
values, but this change is very unlikely, esp. with PA-RISC being a niche
architecture.
The other possibility would be to increase NSIG to e.g. 128, but this would
mean to duplicate most of the existing Linux signal handling code into the
parisc specific Linux kernel tree which would most likely introduce lots of new
bugs beside the code duplication.
The third option is to drop some HP-UX signals and shuffle some other signals
around to bring SIGRTMIN to 32. This is of course an ABI change, but testing
has shown that existing Linux installations are not visibly affected by this
change - most likely because we move those signals around which are rarely used
and move them to slots which haven't been used in Linux yet. In an existing
installation I was able to exchange either the Linux kernel or glibc (or both)
without affecting the boot process and installed applications.
Dropping the HP-UX signals isn't an issue either, since support for HP-UX was
basically dropped a few months back with Kernel 3.14 in commit
f5a408d53e already, when we changed EWOULDBLOCK
to be equal to EAGAIN.
So, even if this is an ABI change, it's better to change it now and thus bring
PARISC Linux in sync with other architectures to avoid other issues in the
future.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Carlos O'Donell <carlos@systemhalted.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: PARISC Linux Kernel Mailinglist <linux-parisc@vger.kernel.org>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Commit: e676253b19 (serial/8250: Add
support for RS485 IOCTLs), adds support for RS485 ioctls for 825_core on
all the archs. Unfortunaltely the definition of TIOCSRS485 and
TIOCGRS485 was missing on the ioctls.h file
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
There are only a couple of architectures that override _STK_LIM_MAX to
a non-infinity value. This changes the stack allocation semantics in
subtle ways. For example, GNU make changes its stack allocation to the
hard maximum defined by _STK_LIM_MAX. As a results, threads executed
by processes running under make are allocated a stack size of
_STK_LIM_MAX rather than a sensible default value. This causes various
thread stress tests to fail when they can't muster more than about 50
threads.
The attached change implements the default behavior used by the
majority of architectures.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Helge Deller <deller@gmx.de>
We seem to be nearly the only platform which does not provide the
sys_utimes syscall. Adding it now makes our life much easier with
userspace applications (like dietlibc and e2fsprogs) since we then
behave like all other platforms too and don't need extra patches which
are hard to get upstream anyway because we are not a mainstream
architecture.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13
On Linux, only parisc uses a different value for EWOULDBLOCK which
causes a lot of troubles for applications not checking for both values.
Since the hpux compat is long dead, make EWOULDBLOCK behave the same as
all other architectures.
Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
Signed-off-by: Helge Deller <deller@gmx.de>
The stat.h header file is exported to userspace. Some userspace
applications failed to compile due to missing/unknown types, so we
better convert it to use native types only (like it's done on other
architectures too).
Signed-off-by: Helge Deller <deller@gmx.de>
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.
Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.
As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SO_MAX_PACING_RATE definition on parisc got a typo.
Its not too late to fix it, before 3.13 is official.
Fixes: 62748f32d5 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Break SOCK_NONBLOCK out to its own asm-file as other arches do. This
fixes build errors with auditd and probably other packages.
Signed-off-by: Helge Deller <deller@gmx.de>
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...