linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Jiri Pirko	fddd8b501c	netfilter: push reasm skb through instead of original frag skbs [ Upstream commit `6aafeef03b` ] Pushing original fragments through causes several problems. For example for matching, frags may not be matched correctly. Take the following example: <example> On HOSTA do: ip6tables -I INPUT -p icmpv6 -j DROP ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT and on HOSTB you do: ping6 HOSTA -s2000 (MTU is 1500) Incoming echo requests will be filtered out on HOSTA. This issue does not occur with packets smaller than the MTU (where fragmentation does not happen) </example> As was discussed previously, the only correct solution seems to be to use reassembled skb instead of separate frags. Doing this has positive side effects in reducing sk_buff by one pointer (nfct_reasm) and also the reasm dances in ipvs and conntrack can be removed. Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c entirely and use code in net/ipv6/reassembly.c instead. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-12-08 07:29:25 -08:00
Ansis Atteka	68a9e70789	ip: generate unique IP identificator if local fragmentation is allowed [ Upstream commit `703133de33` ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-13 16:08:30 -07:00
Julian Anastasov	06f3d7f973	ipvs: SCTP ports should be writable in ICMP packets Make sure that SCTP ports are writable when embedded in ICMP from client, so that ip_vs_nat_icmp can translate them safely. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-19 09:53:52 +09:00
Dan Carpenter	a8241c6351	ipvs: info leak in __ip_vs_get_dest_entries() The entry struct has a 2 byte hole after ->port and another 4 byte hole after ->stats.outpkts. You must have CAP_NET_ADMIN in your namespace to hit this information leak. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-06-10 14:53:00 +02:00
Jan Beulich	a70b9641e6	ipvs: ip_vs_sh: fix build kfree_rcu() requires offsetof(..., rcu_head) < 4096, which can get violated with a sufficiently high CONFIG_IP_VS_SH_TAB_BITS. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-29 17:50:39 +02:00
Grzegorz Lyczba	dc7b3eb900	ipvs: Fix reuse connection if real server is dead Expire cached connection for new TCP/SCTP connection if real server is down. Otherwise, IPVS uses the dead server for the reused connection, instead of a new working one. Signed-off-by: Grzegorz Lyczba <grzegorz.lyczba@gmail.com> Acked-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-05-27 13:00:45 +02:00
David S. Miller	58717686cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/net/ethernet/emulex/benet/be.h include/net/tcp.h net/mac802154/mac802154.h Most conflicts were minor overlapping stuff. The be2net driver brought in some fixes that added __vlan_put_tag calls, which in net-next take an additional argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-30 03:55:20 -04:00
Hans Schillstrom	f7a1dd6e3a	ipvs: ip_vs_sip_fill_param() BUG: bad check of return value The reason for this patch is a crash in kmemdup caused by returning from get_callid with uninitialized matchoff and matchlen. Removing Zero check of matchlen since it's done by ct_sip_get_header() BUG: unable to handle kernel paging request at ffff880457b5763f IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35 PGD 27f6067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core CPU 5 Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5 /S1200KP RIP: 0010:[<ffffffff810df7fc>] [<ffffffff810df7fc>] kmemdup+0x2e/0x35 RSP: 0018:ffff8803fea03648 EFLAGS: 00010282 RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003 RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0 RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011 R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90 FS: 0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480) Stack: ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000 ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000 Call Trace: <IRQ> [<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip] [<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs] [<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697 [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d [<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf [<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs] ... Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-29 11:35:30 -04:00
Simon Horman	38561437d0	ipvs: Use network byte order for sync message size struct ip_vs_sync_mesg and ip_vs_sync_mesg_v0 are both sent across the wire and used internally to store IPVS synchronisation messages. Up until now the scheme used has been to convert the size field to network byte order before sending a message on the wire and convert it to host byte order when receiving a message. This patch changes that scheme to always treat the field as being network byte order. This seems appropriate as the structure is sent across the wire. And by consistently treating the field as network byte order it is now possible to take advantage of sparse to flag any future misuse. Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Dan Carpenter	4bfbfbf91f	ipvs: off by one in set_sctp_state() The sctp_events[] come from sch->type in set_sctp_state(). They are between 0-255 so that means we need 256 elements in the array. I believe that because of how the code is aligned there is normally a hole after sctp_events[] so this patch doesn't actually change anything. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Simon Horman	9c37510b8f	ipvs: Use min3() in ip_vs_dbg_callid() There are two motivations for this: 1. It improves readability to my eyes 2. Using nested min() calls results in a shadowed _min1 variable, which is a bit untidy. Sparse complained about this. I have also replaced (size_t)64 with a variable of type size_t and value 64. This also improves readability to my eyes. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Simon Horman	9fd0fa7ac3	ipvs: Avoid shadowing net variable in ip_vs_leave() Flagged by sparse. Compile and sparse tested only. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:06 +09:00
Julian Anastasov	0a925864c1	ipvs: fix sparse warnings for some parameters Some service fields are in network order: - netmask: used once in network order and also as prefix len for IPv6 - port Other parameters are in host order: - struct ip_vs_flags: flags and mask moved between user and kernel only - sync state: moved between user and kernel only - syncid: sent over network as single octet Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	f33c8b94fd	ipvs: fix sparse warnings in lblc and lblcr kbuild test robot reports for sparse warnings in commits `c2a4ffb70e` ("ipvs: convert lblc scheduler to rcu") and `c5549571f9` ("ipvs: convert lblcr scheduler to rcu"). Fix it by removing extra __rcu annotation. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	371990eeec	ipvs: fix the remaining sparse warnings in ip_vs_ctl.c - RCU annotations for ip_vs_info_seq_start and _stop - __percpu for cpustats - properly dereference svc->pe in ip_vs_genl_fill_service Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	7cf2eb7bcc	ipvs: fix sparse warnings for ip_vs_conn listing kbuild test robot reports for sparse warnings in commit `088339a57d` ("ipvs: convert connection locking"): net/netfilter/ipvs/ip_vs_conn.c:962:13: warning: context imbalance in 'ip_vs_conn_array' - wrong count at exit include/linux/rcupdate.h:326:30: warning: context imbalance in 'ip_vs_conn_seq_next' - unexpected unlock include/linux/rcupdate.h:326:30: warning: context imbalance in 'ip_vs_conn_seq_stop' - unexpected unlock Fix it by running ip_vs_conn_array under RCU lock to avoid conditional locking and by adding proper RCU annotations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	d717bb2a98	ipvs: properly dereference dest_dst in ip_vs_forget_dev Use rcu_dereference_protected to resolve sparse warning, found by kbuild test robot: net/netfilter/ipvs/ip_vs_ctl.c:1464:35: warning: dereference of noderef expression Problem from commit `026ace060d` ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-23 11:43:05 +09:00
Julian Anastasov	ac69269a45	ipvs: do not disable bh for long time We used a global BH disable in LOCAL_OUT hook. Add _bh suffix to all places that need it and remove the disabling from LOCAL_OUT and sync code. Functions like ip_defrag need protection from BH, so add it. As for nf_nat_mangle_tcp_packet, it needs RCU lock. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:58 +02:00
Julian Anastasov	ceec4c3816	ipvs: convert services to rcu This is the final step in RCU conversion. Things that are removed: - svc->usecnt: now svc is accessed under RCU read lock - svc->inc: and some unused code - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE - __ip_vs_svc_lock: replaced with RCU - IP_VS_WAIT_WHILE: now readers lookup svcs and dests under RCU and work in parallel with configuration Other changes: - before now, a RCU read-side critical section included the calling of the schedule method, now it is extended to include service lookup - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist - svc->pe and svc->scheduler remain to the end (of grace period), the schedulers are prepared for such RCU readers even after done_service is called but they need to use synchronize_rcu because last ip_vs_scheduler_put can happen while RCU read-side critical sections use an outdated svc->scheduler pointer - as planned, update_service is removed - empty services can be freed immediately after grace period. If dests were present, the services are freed from the dest trash code Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:58 +02:00
Julian Anastasov	413c2d04e9	ipvs: convert dests to rcu In previous commits the schedulers started to access svc->destinations with _rcu list traversal primitives because the IP_VS_WAIT_WHILE macro still plays the role of grace period. Now it is time to finish the updating part, i.e. adding and deleting of dests with _rcu suffix before removing the IP_VS_WAIT_WHILE in next commit. We use the same rule for conns as for the schedulers: dests can be searched in RCU read-side critical section where ip_vs_dest_hold can be called by ip_vs_bind_dest. Some things are not perfect, for example, calling functions like ip_vs_lookup_dest from updating code under RCU, just because we use some function both from reader and from updater. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:57 +02:00
Julian Anastasov	ba3a3ce14e	ipvs: convert sched_lock to spin lock As all read_locks are gone spin lock is preferred. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:56 +02:00
Julian Anastasov	ed3ffc4e48	ipvs: do not expect result from done_service This method releases the scheduler state, it can not fail. Such change will help to properly replace the scheduler in following patch. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:56 +02:00
Julian Anastasov	578bc3ef1e	ipvs: reorganize dest trash All dests will go to trash, no exceptions. But we have to use new list node t_list for this, due to RCU changes in following patches. Dests will wait there initial grace period and later all conns and schedulers to put their reference. The dests don't get reference for staying in dest trash as before. As result, we do not load ip_vs_dest_put with extra checks for last refcnt and the schedulers do not need to play games with atomic_inc_not_zero while selecting best destination. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:55 +02:00
Julian Anastasov	08cb2d032f	ipvs: convert wrr scheduler to rcu The schedule method now needs _rcu list-traversal primitive for svc->destinations. As the weight for some dest can be reduced during dest selection, change the algorithm to check weights by using minimum weights in the 1 .. max_weight-(di-1) range, with the same step (di). By this way we ensure that there will be always a weight >= 1 check before claiming that all destinations are overloaded. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:54 +02:00
Julian Anastasov	b310faad3e	ipvs: convert wlc scheduler to rcu The schedule method now needs _rcu list-traversal primitive for svc->destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:23:54 +02:00
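
The info-leak entry above (`a8241c6351`, "info leak in __ip_vs_get_dest_entries()") turns on padding holes inside a struct that is copied out to a caller. As a minimal userspace sketch of that class of bug and one common remedy, the block below zeroes the whole struct before filling its fields. The struct `demo_dest_entry` and both helper functions are hypothetical and do not reproduce the kernel's layout or the actual patch. The C standard also leaves padding values unspecified after member stores, so this shows the usual practice with mainstream compilers rather than a formal guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a dest entry; NOT the kernel's struct layout. */
struct demo_dest_entry {
    uint32_t addr;
    uint16_t port;   /* on common ABIs a 2-byte padding hole follows */
    uint32_t weight;
};

/*
 * Fill an entry that will later be copied to a less-privileged reader.
 * Zeroing first means the padding hole carries no stale memory contents.
 */
static void demo_fill_entry(struct demo_dest_entry *e,
                            uint32_t addr, uint16_t port, uint32_t weight)
{
    memset(e, 0, sizeof(*e));
    e->addr = addr;
    e->port = port;
    e->weight = weight;
}

/* Count nonzero bytes in the gap between 'port' and 'weight', if any. */
static size_t demo_dirty_padding(const struct demo_dest_entry *e)
{
    const unsigned char *raw = (const unsigned char *)e;
    size_t start = offsetof(struct demo_dest_entry, port) + sizeof(e->port);
    size_t end = offsetof(struct demo_dest_entry, weight);
    size_t dirty = 0;

    for (size_t i = start; i < end; i++)
        if (raw[i] != 0)
            dirty++;
    return dirty;
}
```

Without the `memset`, whatever bytes previously occupied the hole would be copied out along with the real fields, which is the leak the commit message describes.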

368 Commits