linux/kernel
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
..
debug kernel/debug: Make use of KGDB_REASON_NMI 2012-07-31 08:16:43 -05:00
events perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled 2012-09-04 17:29:53 +02:00
gcov
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-08-03 10:56:44 -07:00
power Revert "NMI watchdog: fix for lockup detector breakage on resume" 2012-08-08 20:49:45 +02:00
sched sched: Fix kernel-doc warnings in kernel/sched/fair.c 2012-09-04 14:30:49 +02:00
time sched: Add missing call to calc_load_exit_idle() 2012-09-04 14:30:29 +02:00
trace tracing/syscalls: Fix perf syscall tracing when syscall_nr == -1 2012-08-17 15:19:46 -04:00
.gitignore
acct.c
async.c [SCSI] async: make async_synchronize_full() flush all work regardless of domain 2012-07-20 09:07:37 +01:00
audit_tree.c audit: clean up refcounting in audit-tree 2012-08-15 12:55:22 +02:00
audit_watch.c get rid of kern_path_parent() 2012-07-14 16:35:02 +04:00
audit.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup_freezer.c
cgroup.c Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-07-24 17:47:44 -07:00
compat.c
configs.c
cpu_pm.c
cpu.c mm/hotplug: correctly setup fallback zonelists when creating new pgdat 2012-07-31 18:42:44 -07:00
cpuset.c cpusets: Remove/update outdated comments 2012-07-24 13:53:28 +02:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
extable.c
fork.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
freezer.c
futex_compat.c
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-07-24 16:02:57 +02:00
groups.c
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c
irq_work.c
itimer.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kdump: append newline to the last lien of vmcoreinfo note 2012-07-30 17:25:20 -07:00
kfifo.c
kmod.c kmod: avoid deadlock from recursive kmod call 2012-07-30 17:25:20 -07:00
kprobes.c
ksysfs.c
kthread.c kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed 2012-07-22 10:15:28 -07:00
latencytop.c
lglock.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c
Makefile
module.c
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c
pid_namespace.c pidns: Export free_pid_ns 2012-08-14 21:49:35 -07:00
pid.c net ip6 flowlabel: Make owner a union of struct pid * and kuid_t 2012-08-14 21:49:25 -07:00
posix-cpu-timers.c
posix-timers.c
printk.c printk: Fix calculation of length used to discard records 2012-08-12 21:25:50 +03:00
profile.c
ptrace.c
range.c
rcu.h
rcupdate.c rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations 2012-07-02 12:34:23 -07:00
rcutiny_plugin.h rcu: Fix code-style issues involving "else" 2012-07-06 06:01:48 -07:00
rcutiny.c rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU 2012-07-02 12:34:25 -07:00
rcutorture.c rcu: Fix broken strings in RCU's source code. 2012-07-06 06:01:49 -07:00
rcutree_plugin.h rcu: Fix code-style issues involving "else" 2012-07-06 06:01:48 -07:00
rcutree_trace.c rcu: Fix broken strings in RCU's source code. 2012-07-06 06:01:49 -07:00
rcutree.c rcu: Fix code-style issues involving "else" 2012-07-06 06:01:48 -07:00
rcutree.h Merge branches 'bigrtm.2012.07.04a', 'doctorture.2012.07.02a', 'fixes.2012.07.06a' and 'fnh.2012.07.02a' into HEAD 2012-07-06 05:59:30 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c
resource.c resource: make sure requested range is included in the root range 2012-07-30 17:25:21 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
seccomp.c
semaphore.c
signal.c signal: make sure we don't get stopped with pending task_work 2012-07-22 23:57:54 +04:00
smp.c
smpboot.c
smpboot.h smpboot: Remove leftover declaration 2012-06-11 15:07:52 +02:00
softirq.c mm: allow PF_MEMALLOC from softirq context 2012-07-31 18:42:45 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c kernel/sys.c: avoid argv_free(NULL) 2012-07-30 17:25:13 -07:00
sysctl_binary.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
sysctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
task_work.c task_work: add a scheduling point in task_work_run() 2012-08-21 09:11:44 -07:00
taskstats.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
test_kprobes.c
time.c
timeconst.pl
timer.c alpha: take a bunch of syscalls into osf_sys.c 2012-08-19 08:41:19 -07:00
tracepoint.c
tsacct.c
uid16.c
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
wait.c
watchdog.c Revert "NMI watchdog: fix for lockup detector breakage on resume" 2012-08-08 20:49:45 +02:00
workqueue_sched.h
workqueue.c workqueue: fix possible idle worker depletion across CPU hotplug 2012-09-10 10:05:54 -07:00