linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-25 10:59:05 +00:00

Author	SHA1	Message	Date
David S. Miller	95f873f2ff	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 16:59:56 -08:00
Christoph Hellwig	06539d3071	net: don't OOPS on socket aio Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 12:25:33 -08:00
Herbert Xu	8ea65f4a2d	netlink: Kill redundant net argument in netlink_insert The socket already carries the net namespace with it so there is no need to be passing another net around. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:30:30 -08:00
David S. Miller	bf693f7beb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2015-01-26 Just two small fixes for _decode_session6() where we might decode to wrong header information in some rare situations. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:28:38 -08:00
Hannes Frederic Sowa	6e9e16e614	ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: \| [root@rhel7-5 lkundrak]# sh -x lal \| + ip link add dev0 type dummy \| + ip link set dev0 up \| + ip link add dev1 type dummy \| + ip link set dev1 up \| + ip addr add 2001:db8:8086::2/64 dev dev0 \| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 \| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 \| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 \| + ip link del dev0 type dummy \| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 \| \| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: `4a287eba2d` ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:22:14 -08:00
subashab@codeaurora.org	fc752f1f43	ping: Fix race in free in receive path An exception is seen in ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases socket reference with sock_put() and this internally frees up the socket. Later icmp_rcv() will try to free the skb and as part of this, skb destructor is called and which leads to a kernel panic as the socket is freed already in ping_rcv(). -->\|exception -007\|sk_mem_uncharge -007\|sock_rfree -008\|skb_release_head_state -009\|skb_release_all -009\|__kfree_skb -010\|kfree_skb -011\|icmp_rcv -012\|ip_local_deliver_finish Fix this incorrect free by cloning this skb and processing this cloned skb instead. This patch was suggested by Eric Dumazet Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:04:53 -08:00
Herbert Xu	86f3cddbc3	udp_diag: Fix socket skipping within chain While working on rhashtable walking I noticed that the UDP diag dumping code is buggy. In particular, the socket skipping within a chain never happens, even though we record the number of sockets that should be skipped. As this code was supposedly copied from TCP, this patch does what TCP does and resets num before we walk a chain. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:02:41 -08:00
David S. Miller	7d63585bf0	Another set of last-minute fixes: * fix station double-removal when suspending while associating * fix the HT (802.11n) header length calculation * fix the CCK radiotap flag used for monitoring, a pretty old regression but a simple one-liner * fix per-station group-key handling -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUwmkTAAoJEDBSmw7B7bqr3X0P/3Y/2UnQSBSi2LnqmsuV8igR noAwaPuCf2ASs25v9nQv64JM1qC0kiemYesi3fFWHiqPhoxx9WrMSvjXdarynrQN LwBG74tcQVEeW70w+36yeKCS+wMU35eTzDDxdnYOfJeNOhW2rQE7WVT2drkc6lVL hvGRhqlBx0mVLQ3VCW5c1kmMN/VAttZKx5WWWurmA8OSQzcHJGZW6ZTuqwpDx4RT iqwi5uBtMkqTQV+22Zb4xI+YxLGwcGiYjy15KksPsrSZZeS5pJzeyovrDhACBVpE aVj3K+OAGVmXCN7HVpQ3tTqCaQea5o+EDqtnT/IQrjnmw6p4mC3co6ShiVn+DkTX 6nv/ka92k9q1LMGIzl+lje2cNmaerVrDa84gUqnHXbTr80+ugKu4NpZowQgdW4yG Qu4kQALkTRhUoNf+RUzXKcFqAW2qLKf83qLcrhbQWkSrY0hzLVWKI/1/FAkCyo6I zKhsB44mHMMaKGVq5qsHTD0E89PtiJuizDLiU04uJExYJCi1FCh9NDlc+HaVlTnt 5LHFbrsPDuAUKoKSiJmJUIyueL+Of5N34+epuLRZb55aE2AQhApg6Nxd+FMuro1m 0aeHEREEO04QZ7IUrYgBA4G7L1dsZJDD8auU6Y4V3chAg6ArbswbtzAv8ON1tAk+ pr0kThACKqkfg2tN5fny =DhIp -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Another set of last-minute fixes: * fix station double-removal when suspending while associating * fix the HT (802.11n) header length calculation * fix the CCK radiotap flag used for monitoring, a pretty old regression but a simple one-liner * fix per-station group-key handling Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 17:32:24 -08:00
Hannes Frederic Sowa	df4d92549f	ipv4: try to cache dst_entries which would cause a redirect Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit `f886497212` ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 17:28:27 -08:00
Daniel Borkmann	600ddd6825	net: sctp: fix slab corruption from use after free on INIT collisions When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit `1be9a950c6` ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit `1be9a950c6` ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit `4184b2a79a` ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: `730fc3d05c` ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 17:02:05 -08:00
Erik Hugne	08bfc9cb76	flow_dissector: add tipc support The flows are hashed on the sending node address, which allows us to spread out the TIPC link processing to RPS enabled cores. There is no point to include the destination address in the hash as that will always be the same for all inbound links. We have experimented with a 3-tuple hash over [srcnode, sport, dport], but this showed to give slightly lower performance because of increased lock contention when the same link was handled by multiple cores. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 16:58:09 -08:00
Erik Hugne	3fa9cacd69	tipc: fix excessive network event logging If a large number of namespaces is spawned on a node and TIPC is enabled in each of these, the excessive printk tracing of network events will cause the system to grind down to a near halt. The traces are still of debug value, so instead of removing them completely we fix it by changing the link state and node availability logging debug traces. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 16:58:08 -08:00
Daniel Borkmann	fd3e646c87	net: act_bpf: fix size mismatch on filter preparation Similarly as in cls_bpf, also this code needs to reject mismatches. Reference: http://article.gmane.org/gmane.linux.network/347406 Fixes: `d23b8ad8ab` ("tc: add BPF based action") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 16:08:55 -08:00
Daniel Borkmann	1c1bc6bdb7	net: cls_basic: return from walking on match in basic_get As soon as we've found a matching handle in basic_get(), we can return it. There's no need to continue walking until the end of a filter chain, since they are unique anyway. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 16:08:55 -08:00
Daniel Borkmann	3f2ab13594	net: cls_bpf: fix auto generation of per list handles When creating a bpf classifier in tc with priority collisions and invoking automatic unique handle assignment, cls_bpf_grab_new_handle() will return a wrong handle id which in fact is non-unique. Usually altering of specific filters is being addressed over major id, but in case of collisions we result in a filter chain, where handle ids address individual cls_bpf_progs inside the classifier. Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle in cls_bpf_get() and in case we found a free handle, we're supposed to use exactly head->hgen. In case of insufficient numbers of handles, we bail out later as handle id 0 is not allowed. Fixes: `7d1d65cb84` ("net: sched: cls_bpf: add BPF-based classifier") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:50:19 -08:00
Daniel Borkmann	7913ecf69e	net: cls_bpf: fix size mismatch on filter preparation In cls_bpf_modify_existing(), we read out the number of filter blocks, do some sanity checks, allocate a block on that size, and copy over the BPF instruction blob from user space, then pass everything through the classic BPF checker prior to installation of the classifier. We should reject mismatches here, there are 2 scenarios: the number of filter blocks could be smaller than the provided instruction blob, so we do a partial copy of the BPF program, and thus the instructions will either be rejected from the verifier or a valid BPF program will be run; in the other case, we'll end up copying more than we're supposed to, and most likely the trailing garbage will be rejected by the verifier as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be last instruction, load/stores must be correct, etc); in case not, we would leak memory when dumping back instruction patterns. The code should have only used nla_len() as Dave noted to avoid this from the beginning. Anyway, lets fix it by rejecting such load attempts. Fixes: `7d1d65cb84` ("net: sched: cls_bpf: add BPF-based classifier") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:50:18 -08:00
Joe Stringer	74ed7ab926	openvswitch: Add support for unique flow IDs. Previously, flows were manipulated by userspace specifying a full, unmasked flow key. This adds significant burden onto flow serialization/deserialization, particularly when dumping flows. This patch adds an alternative way to refer to flows using a variable-length "unique flow identifier" (UFID). At flow setup time, userspace may specify a UFID for a flow, which is stored with the flow and inserted into a separate table for lookup, in addition to the standard flow table. Flows created using a UFID must be fetched or deleted using the UFID. All flow dump operations may now be made more terse with OVS_UFID_F_* flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to omit the flow key from a datapath operation if the flow has a corresponding UFID. This significantly reduces the time spent assembling and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags enabled, the datapath only returns the UFID and statistics for each flow during flow dump, increasing ovs-vswitchd revalidator performance by 40% or more. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:45:50 -08:00
Joe Stringer	272c2cf841	openvswitch: Use sw_flow_key_range for key ranges. These minor tidyups make a future patch a little tidier. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:45:50 -08:00
Joe Stringer	d29ab6f8a9	openvswitch: Refactor ovs_flow_tbl_insert(). Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert(). This tidies up a future patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:45:49 -08:00
Joe Stringer	5b4237bbc9	openvswitch: Refactor ovs_nla_fill_match(). Refactor the ovs_nla_fill_match() function into separate netlink serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask' parameter - all callers need to nest the flow, and callers have better knowledge about whether it is serializing a mask or not. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 15:45:49 -08:00
Eric Dumazet	1dc7b90f7c	ipv6: tcp: fix race in IPV6_2292PKTOPTIONS IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r() to charge the skb to socket. It means that destructor must be called while socket is locked. Therefore, we cannot use skb_get() or atomic_inc(&skb->users) to protect ourselves : kfree_skb() might race with other users manipulating sk->sk_forward_alloc Fix this race by holding socket lock for the duration of ip6_datagram_recv_ctl() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 00:44:08 -08:00
Dan Carpenter	1b846f9282	bridge: simplify br_getlink() a bit Static checkers complain that we should maybe set "ret" before we do the "goto out;". They interpret the NULL return from br_port_get_rtnl() as a failure and forgetting to set the error code is a common bug in this situation. The code is confusing but it's actually correct. We are returning zero deliberately. Let's re-write it a bit to be more clear. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 23:32:49 -08:00
Martin KaFai Lau	b0a1ba5992	ipv6: Fix __ip6_route_redirect In my last commit (`a3c00e4`: ipv6: Remove BACKTRACK macro), the changes in __ip6_route_redirect is incorrect. The following case is missed: 1. The for loop tries to find a valid gateway rt. If it fails to find one, rt will be NULL. 2. When rt is NULL, it is set to the ip6_null_entry. 3. The newly added 'else if', from `a3c00e4`, will stop the backtrack from happening. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 22:09:51 -08:00
Vivien Didelot	24df8986f3	net: dsa: set slave MII bus PHY mask When registering a mdio bus, Linux assumes than every port has a PHY and tries to scan it. If a switch port has no PHY registered, DSA will fail to register the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch PHYs mask. As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no registered PHYs for port 5 and port 6), with the following declared names: static struct dsa_chip_data switch_cdata = { [...] .port_names[0] = "sw0", .port_names[1] = "sw1", .port_names[2] = "sw2", .port_names[3] = "sw3", .port_names[4] = "sw4", .port_names[5] = "cpu", }; DSA will fail to create the switch instance. With the PHY mask set for the slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will be successfully created. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 16:00:54 -08:00
Harout Hedeshian	c2943f1453	net: ipv6: Add sysctl entry to disable MTU updates from RA The kernel forcefully applies MTU values received in router advertisements provided the new MTU is less than the current. This behavior is undesirable when the user space is managing the MTU. Instead a sysctl flag 'accept_ra_mtu' is introduced such that the user space can control whether or not RA provided MTU updates should be applied. The default behavior is unchanged; user space must explicitly set this flag to 0 for RA MTUs to be ignored. Signed-off-by: Harout Hedeshian <harouth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:54:41 -08:00
Alexander Duyck	64c6272351	fib_trie: Various clean-ups for handling slen While doing further work on the fib_trie I noted a few items. First I was using calls that were far more complicated than they needed to be for determining when to push/pull the suffix length. I have updated the code to reflect the simplier logic. The second issue is that I realised we weren't necessarily handling the case of a leaf_info struct surviving a flush. I have updated the logic so that now we will call pull_suffix in the event of having a leaf info value left in the leaf after flushing it. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:16 -08:00
Alexander Duyck	02525368f4	fib_trie: Move fib_find_alias to file where it is used The function fib_find_alias is only accessed by functions in fib_trie.c as such it makes sense to relocate it and cast it as static so that the compiler can take advantage of optimizations it can do to it as a local function. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:16 -08:00
Alexander Duyck	30cfe7c9c8	fib_trie: Use empty_children instead of counting empty nodes in stats collection It doesn't make much sense to count the pointers ourselves when empty_children already has a count for the number of NULL pointers stored in the tnode. As such save ourselves the cycles and just use empty_children. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:16 -08:00
Alexander Duyck	95f60ea3e9	fib_trie: Add collapse() and should_collapse() to resize This patch really does two things. First it pulls the logic for determining if we should collapse one node out of the tree and the actual code doing the collapse into a separate pair of functions. This helps to make the changes to these areas more readable. Second it encodes the upper 32b of the empty_children value onto the full_children value in the case of bits == KEYLENGTH. By doing this we are able to handle the case of a 32b node where empty_children would appear to be 0 when it was actually 1ul << 32. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:16 -08:00
Alexander Duyck	a80e89d4c6	fib_trie: Fall back to slen update on inflate/halve failure This change corrects an issue where if inflate or halve fails we were exiting the resize function without at least updating the slen for the node. To correct this I have moved the update of max_size into the while loop so that it is only decremented on a successful call to either inflate or halve. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:16 -08:00
Alexander Duyck	69fa57b1e4	fib_trie: Fix RCU bug and merge similar bits of inflate/halve This patch addresses two issues. The first issue is the fact that I believe I had the RCU freeing sequence slightly out of order. As a result we could get into an issue if a caller went into a child of a child of the new node, then backtraced into the to be freed parent, and then attempted to access a child of a child that may have been consumed in a resize of one of the new nodes children. To resolve this I have moved the resize after we have freed the oldtnode. The only side effect of this is that we will now be calling resize on more nodes in the case of inflate due to the fact that we don't have a good way to test to see if a full_tnode on the new node was there before or after the allocation. This should have minimal impact however since the node should already be correctly size so it is just the cost of calling should_inflate that we will be taking on the node which is only a couple of cycles. The second issue is the fact that inflate and halve were essentially doing the same thing after the new node was added to the trie replacing the old one. As such it wasn't really necessary to keep the code in both functions so I have split it out into two other functions, called replace and update_children. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:15 -08:00
Alexander Duyck	b3832117b4	fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits In doing performance testing and analysis of the changes I recently found that by shifting the index I had created an unnecessary dependency. I have updated the code so that we instead shift a mask by bits and then just test against that as that should save us about 2 CPU cycles since we can generate the mask while the key and pos are being processed. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 14:47:15 -08:00
Sasha Levin	6b8d9117cc	net: llc: use correct size for sysctl timeout entries The timeout entries are sizeof(int) rather than sizeof(long), which means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-25 00:23:21 -08:00
Tom Herbert	af33c1adae	vxlan: Eliminate dependency on UDP socket in transmit path In the vxlan transmit path there is no need to reference the socket for a tunnel which is needed for the receive side. We do, however, need the vxlan_dev flags. This patch eliminate references to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the vxlan_sock flags. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-24 23:15:40 -08:00
Tom Herbert	d998f8efa4	udp: Do not require sock in udp_tunnel_xmit_skb The UDP tunnel transmit functions udp_tunnel_xmit_skb and udp_tunnel6_xmit_skb include a socket argument. The socket being passed to the functions (from VXLAN) is a UDP created for receive side. The only thing that the socket is used for in the transmit functions is to get the setting for checksum (enabled or zero). This patch removes the argument and and adds a nocheck argument for checksum setting. This eliminates the unnecessary dependency on a UDP socket for UDP tunnel transmit. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-24 23:15:40 -08:00
Nicolas Dichtel	193523bf93	vxlan: advertise netns of vxlan dev in fdb msg Netlink FDB messages are sent in the link netns. The header of these messages contains the ifindex (ndm_ifindex) of the netdevice, but this ifindex is unusable in case of x-netns vxlan. I named the new attribute NDA_NDM_IFINDEX_NETNSID, to avoid confusion with NDA_IFINDEX. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-23 17:51:15 -08:00
Nicolas Dichtel	1f17257b1f	vlan: advertise link netns via netlink Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-23 17:51:15 -08:00
Nicolas Dichtel	3390e39761	ip6gretap: advertise link netns via netlink Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-23 17:51:14 -08:00
Nicolas Dichtel	bdef279b99	rtnl: fix error path when adding an iface with a link net If an error occurs when the netdevice is moved to the link netns, a full cleanup must be done. Fixes: `317f4810e4` ("rtnl: allow to create device with IFLA_LINK_NETNSID set") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-23 17:51:14 -08:00
Thomas Graf	d7924450e1	act_connmark: Add missing dependency on NF_CONNTRACK_MARK Depending on NETFILTER is not sufficient to ensure the presence of the 'mark' field in nf_conn, also needs to depend on NF_CONNTRACK_MARK. Fixes: 22a5dc ("net: sched: Introduce connmark action") Cc: Felix Fietkau <nbd@openwrt.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-23 17:46:09 -08:00
Johannes Berg	0fa7b39131	nl80211: fix per-station group key get/del and memory leak In case userspace attempts to obtain key information for or delete a unicast key, this is currently erroneously rejected unless the driver sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it was never noticed. Fix that, and while at it fix a potential memory leak: the error path in the get_key() function was placed after allocating a message but didn't free it - move it to a better place. Luckily admin permissions are needed to call this operation. Cc: stable@vger.kernel.org Fixes: `e31b82136d` ("cfg80211/mac80211: allow per-station GTKs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-23 11:21:02 +01:00
Mathy Vanhoef	3a5c5e81d8	mac80211: properly set CCK flag in radiotap Fix a regression introduced by commit `a5e70697d0` ("mac80211: add radiotap flag and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by using the CCK flag again. Cc: stable@vger.kernel.org Fixes: `a5e70697d0` ("mac80211: add radiotap flag and handling for 5/10 MHz") Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-23 10:53:58 +01:00
Fred Chou	fb142f4bbb	mac80211: correct header length calculation HT Control field may also be present in management frames, as defined in 8.2.4.1.10 of 802.11-2012. Account for this in calculation of header length. Signed-off-by: Fred Chou <fred.chou.nd@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-23 10:52:48 +01:00
Luciano Coelho	2af81d6718	mac80211: only roll back station states for WDS when suspending In normal cases (i.e. when we are fully associated), cfg80211 takes care of removing all the stations before calling suspend in mac80211. But in the corner case when we suspend during authentication or association, mac80211 needs to roll back the station states. But we shouldn't roll back the station states in the suspend function, because this is taken care of in other parts of the code, except for WDS interfaces. For AP types of interfaces, cfg80211 takes care of disconnecting all stations before calling the driver's suspend code. For station interfaces, this is done in the quiesce code. For WDS interfaces we still need to do it here, so move the code into a new switch case for WDS. Cc: stable@kernel.org [3.15+] Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-01-23 10:47:40 +01:00
David S. Miller	0c49087462	Some further updates for net-next: * fix network-manager which was broken by the previous changes * fix delete-station events, which were broken by me making the genlmsg_end() mistake * fix a timer left running during suspend in some race conditions that would cause an annoying (but harmless) warning * (less important, but in the tree already) remove 80+80 MHz rate reporting since the spec doesn't distinguish it from 160 MHz; as the bitrate they're both 160 MHz bandwidth -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUvUZlAAoJEDBSmw7B7bqrfNMQAJT5jjOSjmwW8Zdvujvda/qt bFpYa9t0NsN3izzMpjPSrCwPrHN5qE86ZA8TcZrIzejPH4rpltjaXB6JNHZardVo deCUWU9xotoPELoE0Xex9mHPEkYYvOaht/P8A/88qP1S2PykMmj9fqNeijyUvwuo Jlsh0wKe4Jq6bCmdxvy/bde84ceAQcuh2TKNov1S0tB38tRY9qSu2n6ZGpoMNcEe CWuW+23jL1uAvt6Ljk2fTKdLR8iyXykfM0UMX2/4R2PMnJXK/dQqV/eeXTjpxtMk UV4aIMcSS19D7HxICKOXOdZLdMMuXaFUnUMlGLBtXZz3n9lZoP7RtVIHoib8VBXZ tY7xQrK6YNvwZ4SZZPuv/yr6YWP2+vBM2FUfXjzD+or6uYsej203a5q0RsOY+3Tp 6Yklr+mQNlrOtpMsHMSgrBUUZsAK1I95ALMVVqaq1hgf1cDvRIDHlOo4A7bjwNFw eA3L1o4O1/E/IGp4v6+2Iukc9rIwm11sNr/wuD8dDkZTradUPH1iY6J5sxJNb2Nq YdpCnQ/lNXj650+z9/G2omSA6DTTTOtXJPxKR+FOHZVKDpZYtF6TxKb0S79fINps 6ZlWIna5bUiXF1b6xad1x+vtyjNMgTvkg6mR+WQnvF57Ri8hucbtpv5wpA5bhYUQ Fbz9VZF2nfMeIbXfTaWi =Bvmr -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Some further updates for net-next: * fix network-manager which was broken by the previous changes * fix delete-station events, which were broken by me making the genlmsg_end() mistake * fix a timer left running during suspend in some race conditions that would cause an annoying (but harmless) warning * (less important, but in the tree already) remove 80+80 MHz rate reporting since the spec doesn't distinguish it from 160 MHz; as the bitrate they're both 160 MHz bandwidth Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 16:22:19 -05:00
Johannes Berg	926e9878a3	phonet netlink: allow multiple messages per skb in route dump My previous patch to this file changed the code to be bug-compatible towards userspace. Unless userspace (which I wasn't able to find) implements the dump reader by hand in a wrong way, this isn't needed. If it uses libnl or similar code putting multiple messages into a single SKB is far more efficient. Change the code to do this. While at it, also clean it up and don't use so many variables - just store the address in the callback args directly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 16:20:17 -05:00
Felix Fietkau	22a5dc0e5e	net: sched: Introduce connmark action This tc action allows you to retrieve the connection tracking mark This action has been used heavily by openwrt for a few years now. There are known limitations currently: doesn't work for initial packets, since we only query the ct table. Fine given use case is for returning packets no implicit defrag. frags should be rare so fix later.. won't work for more complex tasks, e.g. lookup of other extensions since we have no means to store results we still have a 2nd lookup later on via normal conntrack path. This shouldn't break anything though since skb->nfct isn't altered. V2: remove unnecessary braces (Jiri) change the action identifier to 14 (Jiri) Fix some stylistic issues caught by checkpatch V3: Move module params to bottom (Cong) Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong) Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 16:02:06 -05:00
Florian Fainelli	8db0a2ee2c	net: bridge: reject DSA-enabled master netdevices as bridge members DSA-enabled master network devices with a switch tagging protocol should strip the protocol specific format before handing the frame over to higher layer. When adding such a DSA master network device as a bridge member, we go through the following code path when receiving a frame: __netif_receive_skb_core -> first ptype check against ptype_all is not returning any handler for this skb -> check and invoke rx_handler: -> deliver frame to the bridge layer: br_handle_frame DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is called after the bridge-layer rx_handler has run. br_handle_frame() tries to parse the frame it received from the DSA master network device, and will not be able to match any of its conditions and jumps straight at the end of the end of br_handle_frame() and returns RX_HANDLER_CONSUMED there. Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops RX processing for this frame and returns NET_RX_SUCCESS, so we never get a chance to call our switch tag packet processing logic and deliver frames to the DSA slave network devices, and so we do not get any functional bridge members at all. Instead of cluttering the bridge receive path with DSA-specific checks, and rely on assumptions about how __netif_receive_skb_core() is processing frames, we simply deny adding the DSA master network device (conduit interface) as a bridge member, leaving only the slave DSA network devices to be bridge members, since those will work correctly in all circumstances. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 15:45:10 -05:00
Florian Fainelli	728c02089a	net: ipv4: handle DSA enabled master network devices The logic to configure a network interface for kernel IP auto-configuration is very simplistic, and does not handle the case where a device is stacked onto another such as with DSA. This causes the kernel not to open and configure the master network device in a DSA switch tree, and therefore slave network devices using this master network devices as conduit device cannot be open. This restriction comes from a check in net/dsa/slave.c, which is basically checking the master netdev flags for IFF_UP and returns -ENETDOWN if it is not the case. Automatically bringing-up DSA master network devices allows DSA slave network devices to be used as valid interfaces for e.g: NFS root booting by allowing kernel IP autoconfiguration to succeed on these interfaces. On the reverse path, make sure we do not attempt to close a DSA-enabled device as this would implicitely prevent the slave DSA network device from operating. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 15:45:10 -05:00
Hagen Paul Pfeifer	9d289715eb	ipv6: stop sending PTB packets for MTU < 1280 Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-19 14:52:07 -05:00

1 2 3 4 5 ...

36088 Commits