Commit Graph

316 Commits

Author SHA1 Message Date
蔡正龙
a9302e8439 alpha: Enable system-call auditing support.
Signed-off-by: Zhenglong.cai <zhenglong.cai@cs2c.com.cn>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2014-01-31 09:21:55 -08:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Michal Sekletar
ea02f9411d net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 19:08:58 -08:00
Peter Zijlstra
93ea02bb84 arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
We're going to be adding a few new barrier primitives, and in order to
avoid endless duplication make more agressive use of
asm-generic/barrier.h.

Change the asm-generic/barrier.h such that it allows partial barrier
definitions and fills out the rest with defaults.

There are a few architectures (m32r, m68k) that could probably
do away with their barrier.h file entirely but are kept for now due to
their unconventional nop() implementation.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:37:15 +01:00
David S. Miller
e3fec2f74f lib: Add missing arch generic-y entries for asm-generic/hash.h
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 21:26:19 -05:00
Linus Torvalds
d5bdaf4f68 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
 "It contains a few fixes and some work from Richard to make alpha
  emulation under QEMU much more usable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: Prevent a NULL ptr dereference in csum_partial_copy.
  alpha: perf: fix out-of-bounds array access triggered from raw event
  alpha: Use qemu+cserve provided high-res clock and alarm.
  alpha: Switch to GENERIC_CLOCKEVENTS
  alpha: Enable the rpcc clocksource for single processor
  alpha: Reorganize rtc handling
  alpha: Primitive support for CPU power down.
  alpha: Allow HZ to be configured
  alpha: Notice if we're being run under QEMU
  alpha: Eliminate compiler warning from memset macro
2013-11-20 15:03:45 -08:00
Linus Torvalds
4007162647 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Ingo Molnar:
 "This is a multi-arch cleanup series from Thomas Gleixner, which we
  kept to near the end of the merge window, to not interfere with
  architecture updates.

  This series (motivated by the -rt kernel) unifies more aspects of IRQ
  handling and generalizes PREEMPT_ACTIVE"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Make PREEMPT_ACTIVE generic
  sparc: Use preempt_schedule_irq
  ia64: Use preempt_schedule_irq
  m32r: Use preempt_schedule_irq
  hardirq: Make hardirq bits generic
  m68k: Simplify low level interrupt handling code
  genirq: Prevent spurious detection for unconditionally polled interrupts
2013-11-19 10:40:00 -08:00
Richard Henderson
4914d7b458 alpha: Use qemu+cserve provided high-res clock and alarm.
QEMU provides a high-resolution timer and alarm; use this for
a clock source and clock event source when available.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-16 16:33:21 -08:00
Richard Henderson
85d0b3a573 alpha: Reorganize rtc handling
Discontinue use of GENERIC_CMOS_UPDATE; rely on the RTC subsystem.

The marvel platform requires that the rtc only be touched from the
boot cpu.  This had been partially implemented with hooks for
get/set_rtc_time, but read/update_persistent_clock were not handled.
Move the hooks from the machine_vec to a special rtc_class_ops struct.

We had read_persistent_clock managing the epoch against which the
rtc hw is based, but this didn't apply to get_rtc_time or set_rtc_time.
This resulted in incorrect values when hwclock(8) gets involved.

Allow the epoch to be set from the kernel command-line, overriding
the autodetection, which is doomed to fail in 2020.  Further, by
implementing the rtc ioctl function, we can expose this epoch to
userland.

Elide the alarm functions that RTC_DRV_CMOS implements.  This was
highly questionable on Alpha, since the interrupt is used by the
system timer.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-16 16:33:16 -08:00
Richard Henderson
7f3bbb82e0 alpha: Primitive support for CPU power down.
Use WTINT to wait for the next interrupt.  Squash the WTINT call
if the PALcode doesn't support it (e.g. MILO).  No attempt is yet
made to skip clock ticks during normal scheduling in order to stay
in power down mode longer.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-16 16:33:15 -08:00
Richard Henderson
994dcf7055 alpha: Notice if we're being run under QEMU
When building a generic kernel, do a run-time check on the serial
number, like we do for MILO.  When building a custom kernel, make
this a configure-time check.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-16 16:33:12 -08:00
Richard Henderson
a47e5bb576 alpha: Eliminate compiler warning from memset macro
Compiling with GCC 4.8 yields several instances of

crypto/vmac.c: In function ‘vmac_final’:
crypto/vmac.c:616:9: warning: value computed is not used [-Wunused-value]
  memset(&mac, 0, sizeof(vmac_t));
         ^
arch/alpha/include/asm/string.h:31:25: note: in definition of macro ‘memset’
     ? __builtin_memset((s),0,(n))          \
                         ^
Converting the macro to an inline function eliminates this problem.

However, doing only that causes problems with the GCC 3.x series.  The
inline function cannot be named "memset", as otherwise we wind up with
recursion via __builtin_memset.  Solve this by adjusting the symbols
such that __memset is the inline, and ___memset is the real function.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-16 16:33:09 -08:00
Kirill A. Shutemov
3fd681b68c alpha: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:16 +09:00
Thomas Gleixner
00d1a39e69 preempt: Make PREEMPT_ACTIVE generic
No point in having this bit defined by architecture.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130917183629.090698799@linutronix.de
2013-11-13 20:21:47 +01:00
Linus Torvalds
42a2d923cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more.  Many drivers have been cleaned
    up in this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
2013-11-13 17:40:34 +09:00
Eric Sandeen
0ca4343518 errno.h: remove "NFS" from descriptions in comments
glibc recently changed the error string for ESTALE to remove "NFS" -

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=96945714ec61951cc748da2b4b8a80cf02127ee9

from: [ERR_REMAP (ESTALE)] = N_("Stale NFS file handle"),
to:   [ERR_REMAP (ESTALE)] = N_("Stale file handle"),

And some have expressed concern that the kernel's errno.h
comments still refer to NFS.

So make that change... note that this is a comment-only change,
and has no functional difference.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:12 +09:00
Eric Dumazet
62748f32d5 net: introduce SO_MAX_PACING_RATE
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28 15:35:41 -07:00
Peter Zijlstra
a787870924 sched, arch: Create asm/preempt.h
In order to prepare to per-arch implementations of preempt_count move
the required bits into an asm-generic header and use this for all
archs.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:50 +02:00
Richard Henderson
46931bf67c alpha: Force the user-visible HZ to a constant 1024.
This kernel/user split was done long ago for other architectures.

Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-19 13:54:26 -07:00
Richard Henderson
748a76b515 alpha: Implement atomic64_dec_if_positive
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-19 13:54:24 -07:00
Richard Henderson
6da7539734 alpha: Improve atomic_add_unless
Use ll/sc loops instead of C loops around cmpxchg.
Update the atomic64_add_unless block comment to match the code.

Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-19 13:54:24 -07:00
Richard Henderson
01350eb6c0 alpha: Add kcmp and finit_module syscalls
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-19 13:54:23 -07:00
Will Deacon
8b3ed3c002 alpha: locks: remove unused arch_*_relax operations
The arch_{spin,read,write}_relax macros are not used anywhere in the
kernel and are typically just aliases for cpu_relax().

This patch removes the unused definitions for Alpha.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2013-07-19 13:54:23 -07:00
Linus Torvalds
41d9884c44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs stuff from Al Viro:
 "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
  making simple_lookup() usable for filesystems with non-NULL s_d_op,
  which allows us to get rid of quite a bit of ugliness"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sunrpc: now we can just set ->s_d_op
  cgroup: we can use simple_lookup() now
  efivarfs: we can use simple_lookup() now
  make simple_lookup() usable for filesystems that set ->s_d_op
  configfs: don't open-code d_alloc_name()
  __rpc_lookup_create_exclusive: pass string instead of qstr
  rpc_create_*_dir: don't bother with qstr
  llist: llist_add() can use llist_add_batch()
  llist: fix/simplify llist_add() and llist_add_batch()
  fput: turn "list_head delayed_fput_list" into llist_head
  fs/file_table.c:fput(): add comment
  Safer ABI for O_TMPFILE
2013-07-14 11:42:26 -07:00
Al Viro
bb458c644a Safer ABI for O_TMPFILE
[suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
that will fail on old kernels in a lot more cases than what I came up with.
And make sure O_CREAT doesn't get there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-13 13:26:37 +04:00