Commit Graph

406 Commits

Author SHA1 Message Date
Eric W. Biederman
eeb1bd5c40 net: Add a struct net parameter to sock_create_kern
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Johannes Berg
053c095a82 netlink: make nlmsg_end() and genlmsg_end() void
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:03:45 -05:00
Linus Torvalds
cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
David Teigland
2ab4bd8ea3 dlm: adopt orphan locks
A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock.  Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.

An adopting process must specify the resource name and mode
of the lock it wants to adopt.  If a matching lock is found,
the lock is moved to the caller's 's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.

If an orphan with a different mode is found, then -EAGAIN is
returned.  If no orphan lock is found on the resource, then
-ENOENT is returned.  No async completion is used because
the result is immediately available.

Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.

Signed-off-by: David Teigland <teigland@redhat.com>
2014-11-19 14:48:02 -06:00
Joe Perches
f365ef9b79 dlm: Use seq_puts() instead of seq_printf() for constant strings
Convert the seq_printf output with constant strings to seq_puts.

Link: http://lkml.kernel.org/p/b416b016f4a6e49115ba736cad6ea2709a8bc1c4.1412031505.git.joe@perches.com

Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:13:09 -05:00
Joe Perches
d6d906b234 dlm: Remove seq_printf() return checks and use seq_has_overflowed()
The seq_printf() return is going away soon and users of it should
check seq_has_overflowed() to see if the buffer is full and will
not accept any more data.

Convert functions returning int to void where seq_printf() is used.

Link: http://lkml.kernel.org/p/43590057bcb83846acbbcc1fe641f792b2fb7773.1412031505.git.joe@perches.com
Link: http://lkml.kernel.org/r/20141029220107.939492048@goodmis.org

Acked-by: David Teigland <teigland@redhat.com>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:12:38 -05:00
Neale Ferguson
c07127b48c dlm: fix missing endian conversion of rcom_status flags
The flags are already converted to le when being sent,
but are not being converted back to cpu when received.

Signed-off-by: Neale Ferguson <neale@sinenomine.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-10-14 15:11:48 -05:00
Joe Perches
d0449b90f8 locks: Remove unused conf argument from lm_grant
This argument is always NULL so don't pass it around.

[jlayton: remove dependencies on previous patches in series]

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:06 -04:00
Fabian Frederick
e0d9bf4cc0 fs/dlm/debug_fs.c: remove unnecessary null test before debugfs_remove
This fixes checkpatch warning:

  WARNING: debugfs_remove(NULL) is safe this check is probably not required

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:27 -07:00
Lidong Zhong
883854c545 dlm: keep listening connection alive with sctp mode
The connection struct with nodeid 0 is the listening socket,
not a connection to another node.  The sctp resend function
was not checking that the nodeid was valid (non-zero), so it
would mistakenly get and resend on the listening connection
when nodeid was zero.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-06-12 10:26:14 -05:00
Fabian Frederick
c1d4518c4e fs/dlm/debug_fs.c: replace seq_printf by seq_puts
Replace seq_printf where possible.  This patch also fixes the following
checkpatch warning "unnecessary whitespace before a quoted newline"

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:18 -07:00
Fabian Frederick
6edb56871a fs/dlm/lockspace.c: convert simple_str to kstr
Replace obsolete functions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:18 -07:00
Fabian Frederick
4f4c337fb7 fs/dlm/config.c: convert simple_str to kstr
Replace obsolete functions

simple_strtoul/kstrtouint
simple_strtol/kstrtoint
(kstr __must_check requires the right function to be applied)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:17 -07:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
David Teigland
075f01775f dlm: use INFO for recovery messages
The log messages relating to the progress of recovery
are minimal and very often useful.  Change these to
the KERN_INFO level so they are always available.

Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-14 11:54:44 -06:00
Rashika Kheria
9505857103 fs: Include appropriate header file in dlm/ast.c
Include appropriate header file fs/dlm/ast.h in fs/dlm/ast.c because it
contains function prototypes of some functions defined in fs/dlm/ast.c.

This also eliminates the following warning in fs/dlm/ast:
fs/dlm/ast.c:52:5: warning: no previous prototype for ‘dlm_add_lkb_callback’ [-Wmissing-prototypes]
fs/dlm/ast.c:113:5: warning: no previous prototype for ‘dlm_rem_lkb_callback’ [-Wmissing-prototypes]
fs/dlm/ast.c:174:6: warning: no previous prototype for ‘dlm_add_cb’ [-Wmissing-prototypes]
fs/dlm/ast.c:212:6: warning: no previous prototype for ‘dlm_callback_work’ [-Wmissing-prototypes]
fs/dlm/ast.c:267:5: warning: no previous prototype for ‘dlm_callback_start’ [-Wmissing-prototypes]
fs/dlm/ast.c:278:6: warning: no previous prototype for ‘dlm_callback_stop’ [-Wmissing-prototypes]
fs/dlm/ast.c:284:6: warning: no previous prototype for ‘dlm_callback_suspend’ [-Wmissing-prototypes]
fs/dlm/ast.c:292:6: warning: no previous prototype for ‘dlm_callback_resume’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12 15:44:19 -06:00
Dan Carpenter
e8243f32f2 dlm: silence a harmless use after free warning
We pass the freed "r" pointer back to the caller.  It's harmless but it
upsets the static checkers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12 15:44:03 -06:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
wangweidong
048ed4b626 sctp: remove macros sctp_{lock|release}_sock
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:41:36 -08:00
Dongmao Zhang
ece35848c1 dlm: set zero linger time on sctp socket
The recovery time for a failed node was taking a long
time because the failed node could not perform the full
shutdown process.  Removing the linger time speeds this
up.  The dlm does not care what happens to messages to
or from the failed node.

Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-12-16 09:52:34 -06:00
Johannes Berg
c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Bart Van Assche
a97f4a66d8 dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY
When dlm_release_lockspace(ls, 1) is invoked on a busy system
immediately after the last dlm_unlock() AST has finished it can occur
that lkb_idr_is_local() is invoked for the unlocked LKB since removal
from ls_lkbidr only occurs after the AST has returned. If that happens
dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing
the lockspace. Fix this race condition by changing lkb_idr_is_local()
such that it only returns true for LKB's that have not yet been
unlocked.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-10-16 10:32:42 -05:00
David Teigland
c6ca7bc91d dlm: remove signal blocking
The signal blocking was incorrect and unnecessary
so just remove it.

Signed-off-by: David Teigland <teigland@redhat.com>
2013-08-12 15:22:43 -05:00
Tejun Heo
ededf305a8 dlm: WQ_NON_REENTRANT is meaningless and going away
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
WQ_NON_REENTRANT no-op and the flag is going away.  Remove its usages.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-07-30 09:24:24 -05:00
Bart Van Assche
cfa805f6f1 dlm: Avoid LVB truncation
For lockspaces with an LVB length above 64 bytes, avoid truncating
the LVB while exchanging it with another node in the cluster.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-26 11:38:02 -05:00