linux

mirror of https://github.com/FEX-Emu/linux.git synced 2025-01-08 02:21:18 +00:00

Author	SHA1	Message	Date
Xin Long	cee360ab4d	sctp: define the member stream as an object instead of pointer in asoc As Marcelo's suggestion, stream is a fixed size member of asoc and would not grow with more streams. To avoid an allocation for it, this patch is to define it as an object instead of pointer and update the places using it, also create sctp_stream_update() called in sctp_assoc_update() to migrate the stream info from one stream to another. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-02 13:56:26 -04:00
Vivien Didelot	717ffbfb28	net: dsa: remove dsa_uses_tagged_protocol Since dev->dsa_ptr is a pointer to a dsa_switch_tree, there is no need to have another inline helper just to check rcv. Remove dsa_uses_tagged_protocol and check dsa_ptr && dsa_ptr->rcv together at the same time. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-01 17:34:56 -04:00
Vivien Didelot	73a7ece8f7	net: dsa: comment hot path requirements The DSA layer uses inline helpers and copy of the tagging functions for faster access in hot path. Add comments to detail that. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-01 17:34:56 -04:00
Woojung Huh	8b8010fb78	dsa: add support for Microchip KSZ tail tagging Adding support for the Microchip KSZ switch family tail tagging. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-31 20:56:31 -04:00
Jakub Kicinski	d897a638e9	sched: add helper for updating statistics on all actions Forgetting to disable preemption around tcf_action_stats_update() seems to be a common mistake. Add a helper function for updating stats on all actions of a filter. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-31 17:58:13 -04:00
Vivien Didelot	23c9ee4934	net: dsa: remove dev arg of dsa_register_switch The current dsa_register_switch function takes a useless struct device pointer argument, which always equals ds->dev. Drivers either call it with ds->dev, or with the same device pointer passed to dsa_switch_alloc, which ends up being assigned to ds->dev. This patch removes the second argument of the dsa_register_switch and _dsa_register_switch functions. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-31 12:35:43 -04:00
David Ahern	9ae2872748	net: add extack arg to lwtunnel build state Pass extack arg down to lwtunnel_build_state and the build_state callbacks. Add messages for failures in lwtunnel_build_state, and add the extack to nla_parse where possible in the build_state callbacks. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-30 11:55:32 -04:00
David Ahern	c255bd681d	net: lwtunnel: Add extack to encap attr validation Pass extack down to lwtunnel_valid_encap_type and lwtunnel_valid_encap_type_attr. Add messages for unknown or unsupported encap types. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-30 11:55:31 -04:00
David Ahern	7805599895	net: ipv4: Add extack message for invalid prefix or length Add extack error message for invalid prefix length and invalid prefix. Example of the latter is a route spec containing 172.16.100.1/24, where the /24 mask means the lower 8-bits should be 0. Amazing how easy that one is to overlook when an EINVAL is returned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-30 11:55:31 -04:00
Vlad Yasevich	7a7e96e09d	bonding: Prevent duplicate userspace notification Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA notification is generated which results in a rtnetlink message being sent. While running 'ip monitor', we can actually see 2 messages, one a result of the event, and the other a result of state change that is generated by netdev_state_change(). However, this is not always the case. If bonding changes were done via sysfs or ifenslave (old ioctl interface), then only 1 message is seen. This patch removes duplicate messages in the case of using netlink to configure bonding. It introduces a separate function that triggers a netdev event and uses that function in the sysfs and ioctl cases. This was discovered while auditing all the different events and continues the effort of cleaning up duplicated netlink messages. CC: David Ahern <dsa@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-27 18:51:41 -04:00
David S. Miller	34aa83c2fc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net' restricting a HW workaround alongside cleanups in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-26 20:46:35 -04:00
Eric Dumazet	3fb07daff8	ipv4: add reference counting to metrics Andrey Konovalov reported crashes in ipv4_mtu() I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 : 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & 2) At the same time run following loop : while : do ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 done Cong Wang attempted to add back rt->fi in commit `82486aa6f1` ("ipv4: restore rt->fi for reference counting") but this proved to add some issues that were complex to solve. Instead, I suggested to add a refcount to the metrics themselves, being a standalone object (in particular, no reference to other objects) I tried to make this patch as small as possible to ease its backport, instead of being super clean. Note that we believe that only ipv4 dst need to take care of the metric refcount. But if this is wrong, this patch adds the basic infrastructure to extend this to other families. Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang for his efforts on this problem. Fixes: `2860583fe8` ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-26 14:57:07 -04:00
David Ahern	6ffd903415	net: ipv4: Save trie prefix to fib lookup result Prefix is needed for returning matching route spec on get route request. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-26 14:12:50 -04:00
David Ahern	5510cdf7be	net: ipv4: refactor ip_route_input_noref A later patch wants access to the fib result on an input route lookup with the rcu lock held. Refactor ip_route_input_noref pushing the logic between rcu_read_lock ... rcu_read_unlock into a new helper that takes the fib_result as an input arg. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-26 14:12:49 -04:00
David Ahern	3abd1ade67	net: ipv4: refactor __ip_route_output_key_hash A later patch wants access to the fib result on an output route lookup with the rcu lock held. Refactor __ip_route_output_key_hash, pushing the logic between rcu_read_lock ... rcu_read_unlock into a new helper with the fib_result as an input arg. To keep the name length under control remove the leading underscores from the name and add _rcu to the name of the new helper indicating it is called with the rcu read lock held. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-26 14:12:49 -04:00
David S. Miller	52c05fc744	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-05-23 Here's the first Bluetooth & 802.15.4 pull request targeting the 4.13 kernel release. - Bluetooth 5.0 improvements (Data Length Extensions and alternate PHY) - Support for new Intel Bluetooth adapter [8087:0aaa] - Various fixes to ieee802154 code - Various fixes to HCI UART code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-25 12:54:49 -04:00
WANG Cong	367a8ce896	net_sched: only create filter chains for new filters/actions tcf_chain_get() always creates a new filter chain if not found in existing ones. This is totally unnecessary when we get or delete filters, new chain should be only created for new filters (or new actions). Fixes: `5bc1701881` ("net: sched: introduce multichain support for filters") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-25 12:15:05 -04:00
Jiri Pirko	ac4bb5de27	net: flow_dissector: add support for dissection of tcp flags Add support for dissection of tcp flags. Uses similar function call to tcp dissection function as arp, mpls and others. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-24 16:22:11 -04:00
David S. Miller	3f6b123bcc	Merge tag 'mlx5-fixes-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2017-05-23 Some TC offloads fixes from Or Gerlitz. From Erez, mlx5 IPoIB RX fix to improve GRO. From Mohamad, Command interface fix to improve mitigation against FW commands timeouts. From Tariq, Driver load Tolerance against affinity settings failures. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-24 15:43:57 -04:00
Alexey Dobriyan	417ccf6b5b	net: make struct request_sock_ops::obj_size unsigned This field is sizeof of corresponding kmem_cache so it can't be negative. Space will be saved after 32-bit kmem_cache_create() patch. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-23 11:13:19 -04:00
Alexey Dobriyan	4c0ebd6fed	net: make struct inet_frags::qsize unsigned This field is sizeof of corresponding kmem_cache so it can't be negative. Prepare for 32-bit kmem_cache_create(). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-23 11:13:19 -04:00
David S. Miller	2f9bfd3399	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-05-23 1) Fix wrong header offset for esp4 udpencap packets. 2) Fix a stack access out of bounds when creating a bundle with sub policies. From Sabrina Dubroca. 3) Fix slab-out-of-bounds in pfkey due to an incorrect sadb_x_sec_len calculation. 4) We checked the wrong feature flags when taking down an interface with IPsec offload enabled. Fix from Ilan Tayari. 5) Copy the anti replay sequence numbers when doing a state migration, otherwise we get out of sync with the sequence numbers. Fix from Antony Antony. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-23 10:51:32 -04:00
Or Gerlitz	3aa4266405	net/sched: act_csum: Add accessors for offloading drivers Add the accessors for realizing if this is a csum action, and for which fields checksum is needed. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-05-23 16:23:31 +03:00
David S. Miller	218b6a5b23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-05-22 23:32:48 -04:00
Vivien Didelot	52c96f9d70	net: dsa: move notifier info to private header The DSA notifier events and info structure definitions are not meant for DSA drivers and users, but only used internally by the DSA core files. Move them from the public net/dsa.h file to the private dsa_priv.h file. Also use this opportunity to turn the events into an anonymous enum, because we don't care about the values, and this will prevent future conflicts when adding (and sorting) new events. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-22 19:37:32 -04:00
David Ahern	333c430167	net: ipv6: Plumb extack through route add functions Plumb extack argument down to route add functions. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-22 12:12:20 -04:00
David Ahern	6d8422a175	net: ipv4: Plumb extack through route add functions Plumb extack argument down to route add functions. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-22 12:12:19 -04:00
David S. Miller	23416e2304	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in use. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate data for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-21 13:00:02 -04:00
David S. Miller	c6cd850d65	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-05-18 16:11:32 -04:00
Vivien Didelot	438ff53739	net: dsa: use switchdev_obj_dump_cb_t everywhere Now that the DSA public header includes switchdev.h, use the provided switchdev_obj_dump_cb_t typedef for the object dump callback. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-18 10:40:12 -04:00
Vivien Didelot	f0c24ccf49	net: dsa: include switchdev.h only once DSA drivers and core use switchdev. Include switchdev.h only once, in the dsa.h public header, so that inclusion in DSA drivers or forward declarations of switchdev structures is not necessary anymore. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-18 10:40:12 -04:00
Alexey Dobriyan	667271455f	net: make struct dst_entry::dev first member struct dst_entry::dev is used most often. Move it so it can be accessed without imm8 offset on x86_64. add/remove: 0/0 grow/shrink: 9/239 up/down: 52/-413 (-361) function old new delta dst_rcu_free 126 138 +12 fnhe_flush_routes 211 219 +8 rt_set_nexthop 747 754 +7 rt_cache_route 85 91 +6 rt6_release 209 215 +6 dst_release 107 111 +4 dst_destroy_rcu 29 33 +4 dn_dst_check_expire 329 333 +4 dn_insert_route 484 485 +1 xfrm_resolve_and_create_bundle 2991 2990 -1 ... ip_route_me_harder 1163 1157 -6 __ip_append_data.isra 2730 2724 -6 ip6_forward 3052 3045 -7 callforward_do_filter 659 651 -8 dst_gc_task 571 549 -22 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-18 10:30:36 -04:00
linzhang	64df6d525f	net: x25: fix one potential use-after-free issue The function x25_init does not properly unregister related resources in its error handler. This will result in a kernel oops if x25_init fails, so add the proper unregister calls to the error handler. Also, I adjust the coding style and make x25_register_sysctl properly return failure. Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-18 10:05:40 -04:00
Marcel Holtmann	de2ba3039c	Bluetooth: Set LE Default PHY preferences If the LE Set Default PHY command is supported, then indicate to the controller that the host has no preferences for transmitter PHY or receiver PHY selection. Issuing this command gives the controller a clear indication that other PHY can be selected if available. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2017-05-18 13:52:49 +02:00
Marcel Holtmann	9756d33b85	Bluetooth: Enable LE Channel Selection Algorithm event If the Channel Selection Algorithm #2 feature is supported, then enable the new LE Channel Selection Algorithm event. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2017-05-18 13:52:49 +02:00
Eric Dumazet	9a568de481	tcp: switch TCP TS option (RFC 7323) to 1ms clock TCP Timestamps option is defined in RFC 7323 Traditionally on linux, it has been tied to the internal 'jiffies' variable, because it had been a cheap and good enough generator. For TCP flows on the Internet, 1 ms resolution would be much better than 4ms or 10ms (HZ=250 or HZ=100 respectively) For TCP flows in the DC, Google has used usec resolution for more than two years with great success [1] Receive size autotuning (DRS) is indeed more precise and converges faster to optimal window size. This patch converts tp->tcp_mstamp to a plain u64 value storing a 1 usec TCP clock. This choice will allow us to upstream the 1 usec TS option as discussed in IETF 97. [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 16:06:01 -04:00
Eric Dumazet	70eabf0e1b	tcp: use tcp_jiffies32 for rcv_tstamp and lrcvtime Use tcp_jiffies32 instead of tcp_time_stamp, since tcp_time_stamp will soon be only used for TCP TS option. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 16:06:01 -04:00
Eric Dumazet	d635fbe27e	tcp: use tcp_jiffies32 to feed tp->lsndtime Use tcp_jiffies32 instead of tcp_time_stamp to feed tp->lsndtime. tcp_time_stamp will soon be a little bit more expensive than simply reading 'jiffies'. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 16:06:01 -04:00
Eric Dumazet	ec66eda82d	tcp: introduce tcp_jiffies32 We abuse tcp_time_stamp for two different cases : 1) base to generate TCP Timestamp options (RFC 7323) 2) A 32bit version of jiffies since some TCP fields are 32bit wide to save memory. Since we want in the future to have 1ms TCP TS clock, regardless of HZ value, we want to cleanup things. tcp_jiffies32 is the truncated jiffies value, which will be used only in places where we want a 'host' timestamp. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 16:06:01 -04:00
Jiri Pirko	db50514f9a	net: sched: add termination action to allow goto chain Introduce new type of termination action called "goto_chain". This allows user to specify a chain to be processed. This action type is then processed as a return value in tcf_classify loop in similar way as "reclassify" is, only it does not reset to the first filter in chain but rather reset to the first filter of the desired chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Jiri Pirko	9fb9f251d2	net: sched: push tp down to action init Tp pointer will be needed by the next patch in order to get the chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Jiri Pirko	5bc1701881	net: sched: introduce multichain support for filters Instead of having only one filter per block, introduce a list of chains for every block. Create chain 0 by default. UAPI is extended so the user can specify which chain he wants to change. If the new attribute is not specified, chain 0 is used. That allows to maintain backward compatibility. If chain does not exist and user wants to manipulate with it, new chain is created with specified index. Also, when last filter is removed from the chain, the chain is destroyed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Jiri Pirko	2190d1d094	net: sched: introduce helpers to work with filter chains Introduce struct tcf_chain object and set of helpers around it. Wraps up insertion, deletion and search in the filter chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Jiri Pirko	6529eaba33	net: sched: introduce tcf block infractructure Currently, the filter chains are directly put into the private structures of qdiscs. In order to be able to have multiple chains per qdisc and to allow filter chains sharing among qdiscs, there is a need for a common object that would hold the chains. This introduces such an object and calls it "tcf_block". Helpers to get and put the blocks are provided to be called from individual qdisc code. Also, the original filter_list pointers are left in qdisc privs to allow the entry into tcf_block processing without any added overhead of possible multiple pointer dereference on fast path. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Jiri Pirko	87d83093bf	net: sched: move tc_classify function to cls_api.c Move tc_classify function to cls_api.c where it belongs, rename it to fit the namespace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:22:13 -04:00
Andrew Lunn	eb7b721129	net: dsa: Sort DSA tagging protocol drivers With more tag protocols being added, regain some order by sorting the entries in various places. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 15:19:40 -04:00
Vivien Didelot	8b0d3ea555	net: dsa: store CPU port pointer in the tree A dsa_switch_tree instance holds a dsa_switch pointer and a port index to identify the switch port to which the CPU is attached. Now that the DSA layer has a dsa_port structure to hold this data, use it to point the switch CPU port. This patch simply substitutes s/dst->cpu_switch/dst->cpu_dp->ds/ and s/dst->cpu_port/dst->cpu_dp->index/. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-17 14:19:12 -04:00
Eric Dumazet	218af599fa	tcp: internal implementation for pacing BBR congestion control depends on pacing, and pacing is currently handled by sch_fq packet scheduler for performance reasons, and also because implementing pacing with FQ was convenient to truly avoid bursts. However there are many cases where this packet scheduler constraint is not practical. - Many linux hosts are not focusing on handling thousands of TCP flows in the most efficient way. - Some routers use fq_codel or other AQM, but still would like to use BBR for the few TCP flows they initiate/terminate. This patch implements an automatic fallback to internal pacing. Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option. If sch_fq happens to be in the egress path, pacing is delegated to the qdisc, otherwise pacing is done by TCP itself. One advantage of pacing from TCP stack is to get more precise rtt estimations, and less work done from TX completion, since TCP Small queue limits are not generally hit. Setups with single TX queue but many cpus might even benefit from this. Note that unlike sch_fq, we do not take into account header sizes. Taking care of these headers would add additional complexity for no practical differences in behavior. Some performance numbers using 800 TCP_STREAM flows rate limited to ~48 Mbit per second on 40Gbit NIC.
If MQ+pfifo_fast is used on the NIC : $ sar -n DEV 1 5 | grep eth 14:48:44 eth0 725743.00 2932134.00 46776.76 4335184.68 0.00 0.00 1.00 14:48:45 eth0 725349.00 2932112.00 46751.86 4335158.90 0.00 0.00 0.00 14:48:46 eth0 725101.00 2931153.00 46735.07 4333748.63 0.00 0.00 0.00 14:48:47 eth0 725099.00 2931161.00 46735.11 4333760.44 0.00 0.00 1.00 14:48:48 eth0 725160.00 2931731.00 46738.88 4334606.07 0.00 0.00 0.00 Average: eth0 725290.40 2931658.20 46747.54 4334491.74 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 4 0 0 259825920 45644 2708324 0 0 21 2 247 98 0 0 100 0 0 4 0 0 259823744 45644 2708356 0 0 0 0 2400825 159843 0 19 81 0 0 0 0 0 259824208 45644 2708072 0 0 0 0 2407351 159929 0 19 81 0 0 1 0 0 259824592 45644 2708128 0 0 0 0 2405183 160386 0 19 80 0 0 1 0 0 259824272 45644 2707868 0 0 0 32 2396361 158037 0 19 81 0 0 Now use MQ+FQ : lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc lpaa23:~# tc qdisc replace dev eth0 root mq $ sar -n DEV 1 5 | grep eth 14:49:57 eth0 678614.00 2727930.00 43739.13 4033279.14 0.00 0.00 0.00 14:49:58 eth0 677620.00 2723971.00 43674.69 4027429.62 0.00 0.00 1.00 14:49:59 eth0 676396.00 2719050.00 43596.83 4020125.02 0.00 0.00 0.00 14:50:00 eth0 675197.00 2714173.00 43518.62 4012938.90 0.00 0.00 1.00 14:50:01 eth0 676388.00 2719063.00 43595.47 4020171.64 0.00 0.00 0.00 Average: eth0 676843.00 2720837.40 43624.95 4022788.86 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 259832240 46008 2710912 0 0 21 2 223 192 0 1 99 0 0 1 0 0 259832896 46008 2710744 0 0 0 0 1702206 198078 0 17 82 0 0 0 0 0 259830272 46008 2710596 0 0 0 0 1696340 197756 1 17 83 0 0 4 0 0 259829168 46024 2710584 0 0 16 0 1688472 197158 1 17 82 0 0 3 0 0 259830224 46024 2710408 0 0 0 0 1692450 197212 0 18 82 0 0 As
expected, number of interrupts per second is very different. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-16 15:43:31 -04:00
Paolo Abeni	2276f58ac5	udp: use a separate rx queue for packet reception Under udp flood the sk_receive_queue spinlock is heavily contended. This patch tries to reduce the contention on that lock by adding a second receive queue to the udp sockets; recvmsg() looks first in that queue and, only if it is empty, tries to fetch the data from sk_receive_queue. The latter is spliced into the newly added queue every time the receive path has to acquire the sk_receive_queue lock. The accounting of forward allocated memory is still protected with the sk_receive_queue lock, so udp_rmem_release() needs to acquire both locks when the forward deficit is flushed. On specific scenarios we can end up acquiring and releasing the sk_receive_queue lock multiple times; that will be covered by the next patch. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-16 15:41:29 -04:00
Paolo Abeni	65101aeca5	net/sock: factor out dequeue/peek with offset code And update __sk_queue_drop_skb() to work on the specified queue. This will help the udp protocol to use an additional private rx queue in a later patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-16 15:41:29 -04:00
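
Commit 7805599895 in the listing above describes a route spec such as 172.16.100.1/24 being rejected because the /24 mask requires the lower 8 bits to be 0. The check can be sketched in plain userspace C as below. This is an illustration only, not the kernel's code: the helper names prefix_mask, prefix_is_valid and ipv4 are hypothetical, and addresses are assumed to be held in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): build a host-byte-order
 * netmask for a prefix length in the range 0..32. */
static uint32_t prefix_mask(unsigned int plen)
{
    /* Shifting a 32-bit value by 32 is undefined in C, so a zero-length
     * prefix is handled separately. */
    return plen == 0 ? 0 : (uint32_t)(~UINT32_C(0) << (32 - plen));
}

/* Hypothetical helper: true when addr/plen is a well-formed prefix, i.e.
 * plen is at most 32 and no bits are set below the mask. */
static bool prefix_is_valid(uint32_t addr, unsigned int plen)
{
    if (plen > 32)
        return false;               /* invalid prefix length */
    return (addr & ~prefix_mask(plen)) == 0;  /* host bits must be clear */
}

/* Hypothetical helper: pack a dotted quad into a host-byte-order address. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

With these helpers, prefix_is_valid(ipv4(172, 16, 100, 1), 24) is false because the last octet is non-zero, while the same call with 172.16.100.0 is true. In the kernel the failure surfaces as EINVAL, which is why the commit adds an extack message to say which of the two conditions was violated.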

10513 Commits