Commit 6f4c618ddb ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8144fb1c>] skb_put+0x5c/0x70
[<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<ffffffff81496ac5>] ip_rcv+0x275/0x350
[<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<ffffffff81460588>] netif_receive_skb+0x58/0x60
This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>
... where ASCONF chunk of length 280 contains 2 parameters ...
1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)
... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.
The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.
In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.
When walking to the next parameter this time, we proceed
with ...
length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;
... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.
Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.
Joint work with Vlad Yasevich.
Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: Unable to send and receive Ethernet packets with Micrel PHY.
Affected devices:
KSZ8031RNL (commercial temp)
KSZ8031RNLI (industrial temp)
Description:
PHY device is correctly detected during probe.
PHY power-up default is 25MHz crystal clock input
and output 50MHz RMII clock to MAC.
Reconfiguration of PHY to input 50MHz RMII clock from MAC
causes PHY to become unresponsive if clock source is changed
after Operation Mode Strap Override (OMSO) register setup.
Cause:
Long lead times on parts where clock setup match circuit design
forces the usage of similar parts with wrong default setup.
Solution:
Swapped KSZ8031 register setup and added phy_write return code validation.
Tested with Freescale i.MX28 Fast Ethernet Controler (fec).
Signed-off-by: Bruno Thomsen <bth@kamstrup.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: bcmgenet & systemport fixes
This patch series fixes an off-by-one error introduced during a previous
change, and the two other fixes fix a wake depth imbalance situation for
the Wake-on-LAN interrupt line.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:
1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt
After step 4), SYSTEMPORT would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Multiple enable_irq_wake() calls will keep increasing the IRQ
wake_depth, which ultimately leads to the following types of
situation:
1) enable Wake-on-LAN interrupt w/o password
2) enable Wake-on-LAN interrupt w/ password
3) enable Wake-on-LAN interrupt w/o password
4) disable Wake-on-LAN interrupt
After step 4), GENET would always wake-up the system no matter what
wake-up device we use, which is not what we want. Fix this by making
sure there are no unbalanced enable_irq_wake() calls.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b629be5c83 ("net: bcmgenet: check
harder for out of memory conditions") moved the increment of the local
read pointer *before* reading from the hardware descriptor using
dmadesc_get_length_status(), which creates an off-by-one situation.
Fix this by moving again the read_ptr increment after we have read the
hardware descriptor to get both the control block and the read pointer
back in sync.
Fixes: b629be5c83 ("net: bcmgenet: check harder for out of memory conditions")
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: fix races accessing page->_count
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
The only case it is valid is when page->_count is 0, we can use this in
__netdev_alloc_frag()
Note that I never seen crashes caused by these races, the issue was reported
by Andres Lagar-Cavilla and Hugh Dickins.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
The only case it is valid is when page->_count is 0
Fixes: 540eb7bf0b ("net: Update alloc frag to reduce get/put page usage and recycle pages")
Signed-off-by: Eric Dumaze <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The KSZ8021 and KSZ8031 support RMII reference input clocks of 25MHz
and 50MHz. Both PHYs differ in the default frequency they expect
after reset. If this differs from the actual input clock, then
register 0x1f bit 7 must be changed.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter. Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.
In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present. In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER.
2. Remove wrong comments about storing intermediate value.
3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's
comments
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Baron says:
====================
macvlan: optimize receive path
So after porting this optimization to net-next, I found that the netperf
results of TCP_RR regress right at the maximum peak of transactions/sec. That
is as I increase the number of threads via the first argument to super_netperf,
the number of transactions/sec keep increasing, peak, and then start
decreasing. It is right at the peak, that I see a small regression with this
patch (see results in patch 2/2).
Without the patch, the ksoftirqd threads are the top cpu consumers threads on
the system, since the extra 'netif_rx()', is queuing more softirq work, whereas
with the patch, the ksoftirqd threads are below all of the 'netserver' threads
in terms of their cpu usage. So there appears to be some interaction between how
softirqs are serviced at the peak here and this patch. I think the test results
are still supportive of this approach, but I wanted to be clear on my findings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netif_rx() call on the fast path of macvlan_handle_frame() appears to
be there to ensure that we properly throttle incoming packets. However, it
would appear as though the proper throttling is already in place for all
possible ingress paths, and that the call is redundant. If packets are arriving
from the physical NIC, we've already throttled them by this point. Otherwise,
if they are coming via macvlan_queue_xmit(), it calls either
'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in
the broadcast case, we are throttling via macvlan_broadcast_enqueue().
The test results below are from off the box to an lxc instance running macvlan.
Once the tranactions/sec stop increasing, the cpu idle time has gone to 0.
Results are from a quad core Intel E3-1270 V2@3.50GHz box with bnx2x 10G card.
for i in {10,100,200,300,400,500};
do super_netperf $i -H $ip -t TCP_RR; done
Average of 5 runs.
trans/sec trans/sec
(3.17-rc7-net-next) (3.17-rc7-net-next + this patch)
---------- ----------
208101 211534 (+1.6%)
839493 850162 (+1.3%)
845071 844053 (-.12%)
816330 819623 (+.4%)
778700 789938 (+1.4%)
735984 754408 (+2.5%)
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass last argument to macvlan_count_rx() as the correct bool type.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian says:
====================
Add 10GbE support to APM X-Gene SoC ethernet driver
Adding 10GbE support to APM X-Gene SoC ethernet driver.
v4: Address comments from v3
* dtb: resolved merge conflict for the net tree
v3: Address comments from v2
* dtb: changed to use all-zeros for the mac address
v2: Address comments from v1
* created preparatory patch to review before adding new functionality
* dtb: updated to use tabs consistently
v1:
* Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Added 10GbE support
- Removed unused macros/variables
- Moved mac_init call to the end of hardware init
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Rearranged code to pave the way for adding 10GbE support
- Added mac_ops structure containing function pointers for mac specific functions
- Added port_ops structure containing function pointers for port specific functions
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>