linux/Documentation
Aaron Lu a2468cc9bf swap: choose swap device according to numa node
If the system has more than one swap device and swap device has the node
information, we can make use of this information to decide which swap
device to use in get_swap_pages() to get better performance.

The current code uses a priority based list, swap_avail_list, to decide
which swap device to use and if multiple swap devices share the same
priority, they are used round robin.  This patch changes the previous
single global swap_avail_list into a per-numa-node list, i.e.  for each
numa node, it sees its own priority based list of available swap
devices.  Swap device's priority can be promoted on its matching node's
swap_avail_list.

The current swap device's priority is set as: user can set a >=0 value,
or the system will pick one starting from -1 then downwards.  The
priority value in the swap_avail_list is the negated value of the swap
device's due to plist being sorted from low to high.  The new policy
doesn't change the semantics for priority >=0 cases, the previous
starting from -1 then downwards now becomes starting from -2 then
downwards and -1 is reserved as the promoted value.

Take 4-node EX machine as an example, suppose 4 swap devices are
available, each sit on a different node:
swapA on node 0
swapB on node 1
swapC on node 2
swapD on node 3

After they are all swapped on in the sequence of ABCD.

Current behaviour:
their priorities will be:
swapA: -1
swapB: -2
swapC: -3
swapD: -4
And their position in the global swap_avail_list will be:
swapA   -> swapB   -> swapC   -> swapD
prio:1     prio:2     prio:3     prio:4

New behaviour:
their priorities will be(note that -1 is skipped):
swapA: -2
swapB: -3
swapC: -4
swapD: -5
And their positions in the 4 swap_avail_lists[nid] will be:
swap_avail_lists[0]: /* node 0's available swap device list */
swapA   -> swapB   -> swapC   -> swapD
prio:1     prio:3     prio:4     prio:5
swap_avali_lists[1]: /* node 1's available swap device list */
swapB   -> swapA   -> swapC   -> swapD
prio:1     prio:2     prio:4     prio:5
swap_avail_lists[2]: /* node 2's available swap device list */
swapC   -> swapA   -> swapB   -> swapD
prio:1     prio:2     prio:3     prio:5
swap_avail_lists[3]: /* node 3's available swap device list */
swapD   -> swapA   -> swapB   -> swapC
prio:1     prio:2     prio:3     prio:4

To see the effect of the patch, a test that starts N process, each mmap
a region of anonymous memory and then continually write to it at random
position to trigger both swap in and out is used.

On a 2 node Skylake EP machine with 64GiB memory, two 170GB SSD drives
are used as swap devices with each attached to a different node, the
result is:

runtime=30m/processes=32/total test size=128G/each process mmap region=4G
kernel         throughput
vanilla        13306
auto-binding   15169 +14%

runtime=30m/processes=64/total test size=128G/each process mmap region=2G
kernel         throughput
vanilla        11885
auto-binding   14879 +25%

[aaron.lu@intel.com: v2]
  Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com
  Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com
[akpm@linux-foundation.org: use kmalloc_array()]
Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com
Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
..
ABI mm, swap: add sysfs interface for VMA based swap readahead 2017-09-06 17:27:29 -07:00
accounting
acpi This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
admin-guide mm, page_alloc: rip out ZONELIST_ORDER_ZONE 2017-09-06 17:27:25 -07:00
aoe
arm Documentation: arm: Replace use of virt_to_phys with __pa_symbol 2017-07-17 13:43:58 -06:00
arm64 arm64: Expose DC CVAP to userspace 2017-08-09 11:00:35 +01:00
auxdisplay
backlight
blackfin
block bio-integrity: fold bio_integrity_enabled to bio_integrity_prep 2017-07-03 16:56:24 -06:00
blockdev zram: add config and doc file for writeback feature 2017-09-06 17:27:25 -07:00
bus-devices
cdrom
cgroup-v1 mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
cma
connector
console
core-api Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 08:13:52 -07:00
cpu-freq
cpuidle
cris
crypto KEYS: Add documentation for asymmetric keyring restrictions 2017-07-14 11:01:38 +10:00
dev-tools docs: disable KASLR when debugging kernel 2017-07-17 14:49:01 -06:00
device-mapper dm raid: bump target version 2017-07-25 14:54:20 -04:00
devicetree Power management updates for v4.14-rc1 2017-09-05 12:19:08 -07:00
dmaengine
doc-guide sphinx.rst: Allow Sphinx version 1.6 at the docs 2017-08-26 15:50:27 -06:00
driver-api This is the bulk of the GPIO changes for the v4.14 cycle: 2017-09-05 11:49:48 -07:00
driver-model genirq/irq_sim: Add a devres variant of irq_sim_init() 2017-08-16 16:40:02 +02:00
early-userspace
EDID
extcon
fault-injection fault-inject: add /proc/<pid>/fail-nth 2017-07-14 15:05:13 -07:00
fb efifb: allow user to disable write combined mapping. 2017-07-31 18:45:41 +02:00
features docs/features: parisc implements tracehook 2017-08-07 14:18:40 -06:00
filesystems fscache: remove unused ->now_uncached callback 2017-09-06 17:27:26 -07:00
firmware_class
fmc
fpga
frv
gpio pinctrl: generic: update references to Documentation/pinctrl.txt 2017-08-07 15:26:34 +02:00
gpu Merge tag 'drm-intel-next-2017-08-18' of git://anongit.freedesktop.org/git/drm-intel into drm-next 2017-08-22 10:03:07 +10:00
hid
hwmon hwmon: (pmbus/lm25066) Add support for TI LM5066I 2017-08-30 06:31:13 -07:00
i2c
ia64
ide
iio iio: adc: New driver for Cirrus Logic EP93xx ADC 2017-07-25 19:56:23 +01:00
infiniband Documentation: Hardware tag matching 2017-08-29 08:30:21 -04:00
input Documentation:input: fix typo 2017-08-30 15:18:24 -06:00
ioctl
isdn
kbuild kbuild: Update example for ccflags-y usage 2017-08-07 14:24:28 -06:00
kdump kexec/kdump: minor Documentation updates for arm64 and Image 2017-07-12 16:26:00 -07:00
kernel-hacking There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
laptops
leds
lightnvm
livepatch
locking Merge branch 'linus' into locking/core, to fix up conflicts 2017-09-04 11:01:18 +02:00
m68k
md
media Merge branch 'docs-next' of git://git.lwn.net/linux 2017-09-03 21:07:29 -07:00
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking Merge branch 'docs-next' of git://git.lwn.net/linux 2017-09-03 21:07:29 -07:00
nfc
nios2
nvdimm
nvmem NVMEM documentation fix: A minor typo 2017-08-24 13:31:58 -06:00
parisc
PCI
pcmcia
perf
phy
platform
power Merge branch 'pm-sleep' 2017-09-04 00:06:02 +02:00
powerpc powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
pps
process docs: process: drop git snapshots from applying-patches.rst 2017-08-30 15:25:30 -06:00
pti
ptp
rapidio
RCU doc: Set down RCU's scheduling-clock-interrupt needs 2017-08-17 07:31:14 -07:00
s390
scheduler
scsi
security docs: ReSTify table of contents in core.rst 2017-08-30 15:27:58 -06:00
serial
sh
sound sound updates for 4.13-rc1 2017-07-06 10:56:51 -07:00
sparc
sphinx Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale 2017-08-31 13:36:28 -06:00
sphinx-static docs RTD theme: code-block with line nos - lines and line numbers don't line up. 2017-07-17 13:48:45 -06:00
spi
sysctl mm, page_alloc: rip out ZONELIST_ORDER_ZONE 2017-09-06 17:27:25 -07:00
target
thermal
timers
trace stm class: Document the stm_ftrace 2017-08-25 17:58:34 +03:00
translations Merge branch 'linus' into locking/core, to fix up conflicts 2017-09-04 11:01:18 +02:00
usb
userspace-api
virtual kvm: x86: hyperv: make VP_INDEX managed by userspace 2017-07-14 16:28:18 +02:00
vm swap: choose swap device according to numa node 2017-09-06 17:27:30 -07:00
w1
watchdog
wimax
x86 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 13:56:37 -07:00
xtensa
.gitignore
00-INDEX linux-kselftest-4.13-rc1-update 2017-07-07 14:04:47 -07:00
atomic_bitops.txt Documentation/locking/atomic: Add documents for new atomic_t APIs 2017-08-10 12:29:00 +02:00
atomic_t.txt Documentation/locking/atomic: Finish the document... 2017-08-25 11:06:33 +02:00
bcache.txt bcache.txt: standardize document format 2017-07-14 13:51:27 -06:00
bt8xxgpio.txt bt8xxgpio.txt: standardize document format 2017-07-14 13:51:27 -06:00
btmrvl.txt btmrvl.txt: standardize document format 2017-07-14 13:51:27 -06:00
bus-virt-phys-mapping.txt bus-virt-phys-mapping.txt: standardize document format 2017-07-14 13:51:28 -06:00
cachetlb.txt cachetlb.txt: standardize document format 2017-07-14 13:51:28 -06:00
cgroup-v2.txt cgroup-v2.txt: standardize document format 2017-07-14 13:58:13 -06:00
Changes
circular-buffers.txt circular-buffers.txt: standardize document format 2017-07-14 13:51:29 -06:00
clk.txt clk.txt: standardize document format 2017-07-14 13:51:29 -06:00
CodingStyle
conf.py docs-rst: fix verbatim font size on tables 2017-08-26 15:50:20 -06:00
cpu-load.txt cpu-load: standardize document format 2017-07-14 13:51:30 -06:00
cputopology.txt cputopology.txt: standardize document format 2017-07-14 13:51:30 -06:00
crc32.txt crc32.txt: standardize document format 2017-07-14 13:51:30 -06:00
dcdbas.txt dcdbas.txt: standardize document format 2017-07-14 13:51:31 -06:00
debugging-modules.txt
debugging-via-ohci1394.txt debugging-via-ohci1394.txt: standardize document format 2017-07-14 13:51:34 -06:00
dell_rbu.txt dell_rbu.txt: standardize document format 2017-07-14 13:58:12 -06:00
digsig.txt digsig.txt: standardize document format 2017-07-14 13:51:31 -06:00
DMA-API-HOWTO.txt DMA-API-HOWTO.txt: standardize document format 2017-07-14 13:51:32 -06:00
DMA-API.txt DMA-API.txt: standardize document format 2017-07-14 13:51:32 -06:00
DMA-attributes.txt DMA-attributes.txt: standardize document format 2017-07-14 13:51:33 -06:00
DMA-ISA-LPC.txt DMA-ISA-LPC.txt: standardize document format 2017-07-14 13:51:33 -06:00
docutils.conf
dontdiff GCC plugin updates: 2017-07-05 11:46:59 -07:00
efi-stub.txt efi-stub.txt: standardize document format 2017-07-14 13:51:34 -06:00
eisa.txt eisa.txt: standardize document format 2017-07-14 13:51:34 -06:00
flexible-arrays.txt flexible-arrays.txt: standardize document format 2017-07-14 13:51:35 -06:00
futex-requeue-pi.txt futex-requeue-pi.txt: standardize document format 2017-07-14 13:51:35 -06:00
gcc-plugins.txt gcc-plugins.txt: standardize document format 2017-07-14 13:51:36 -06:00
highuid.txt highuid.txt: standardize document format 2017-07-14 13:51:36 -06:00
hw_random.txt hw_random.txt: standardize document format 2017-07-14 13:51:37 -06:00
hwspinlock.txt hwspinlock.txt: standardize document format 2017-07-14 13:51:37 -06:00
index.rst
intel_txt.txt intel_txt.txt: standardize document format 2017-07-14 13:51:38 -06:00
Intel-IOMMU.txt Intel-IOMMU.txt: standardize document format 2017-07-14 13:51:38 -06:00
io_ordering.txt io_ordering.txt: standardize document format 2017-07-14 13:51:39 -06:00
io-mapping.txt io-mapping.txt: standardize document format 2017-07-14 13:51:38 -06:00
iostats.txt iostats.txt: update it to cover recent Kernels 2017-07-14 13:51:40 -06:00
IPMI.txt IPMI.txt: standardize document format 2017-07-14 13:51:40 -06:00
IRQ-affinity.txt IRQ-affinity.txt: standardize document format 2017-07-14 13:51:41 -06:00
IRQ-domain.txt IRQ-domain.txt: standardize document format 2017-07-14 13:51:41 -06:00
IRQ.txt IRQ.txt: add a markup for its title 2017-07-14 13:51:42 -06:00
irqflags-tracing.txt irqflags-tracing.txt: standardize document format 2017-07-14 13:51:42 -06:00
isa.txt isa.txt: standardize document format 2017-07-14 13:51:43 -06:00
isapnp.txt isapnp.txt: promote title level 2017-07-14 13:51:43 -06:00
kernel-doc-nano-HOWTO.txt
kernel-per-CPU-kthreads.txt kernel-per-CPU-kthreads.txt: standardize document format 2017-07-14 13:51:43 -06:00
kobject.txt kobject.txt: standardize document format 2017-07-14 13:51:44 -06:00
kprobes.txt docs: kprobes.txt: Fix whitespacing 2017-07-14 13:58:14 -06:00
kref.txt kref.txt: standardize document format 2017-07-14 13:51:45 -06:00
ldm.txt ldm.txt: standardize document format 2017-07-14 13:51:45 -06:00
lockup-watchdogs.txt lockup-watchdogs.txt: standardize document format 2017-07-14 13:51:46 -06:00
logo.gif
logo.txt
lsm.txt
lzo.txt lzo.txt: standardize document format 2017-07-14 13:51:46 -06:00
mailbox.txt mailbox.txt: standardize document format 2017-07-14 13:51:47 -06:00
Makefile doc: Makefile: if sphinx is not found, run a check script 2017-08-24 13:18:30 -06:00
memory-barriers.txt Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-04 11:52:29 -07:00
memory-hotplug.txt memory-hotplug.txt: standardize document format 2017-07-14 13:57:53 -06:00
men-chameleon-bus.txt men-chameleon-bus.txt: standardize document format 2017-07-14 13:57:54 -06:00
nommu-mmap.txt nommu-mmap.txt: don't use all upper case on titles 2017-07-14 13:57:55 -06:00
ntb.txt This series converts a number of top-level documents to the RST format 2017-07-15 12:58:58 -07:00
numastat.txt numastat.txt: standardize document format 2017-07-14 13:57:56 -06:00
padata.txt padata.txt: standardize document format 2017-07-14 13:57:56 -06:00
parport-lowlevel.txt parport-lowlevel.txt: standardize document format 2017-07-14 13:57:57 -06:00
percpu-rw-semaphore.txt percpu-rw-semaphore.txt: standardize document format 2017-07-14 13:57:58 -06:00
phy.txt phy.txt: standardize document format 2017-07-14 13:57:58 -06:00
pi-futex.txt pi-futex.txt: standardize document format 2017-07-14 13:57:59 -06:00
pnp.txt pnp.txt: standardize document format 2017-07-14 13:57:59 -06:00
preempt-locking.txt preempt-locking.txt: standardize document format 2017-07-14 13:58:00 -06:00
printk-formats.txt printk-formats.txt: Add examples for %pF and %pS usage 2017-08-24 18:49:52 +02:00
pwm.txt pwm: Standardize document format 2017-07-06 08:23:30 +02:00
rbtree.txt rbtree.txt: standardize document format 2017-07-14 13:58:01 -06:00
remoteproc.txt remoteproc.txt: standardize document format 2017-07-14 13:58:02 -06:00
rfkill.txt rfkill.txt: standardize document format 2017-07-14 13:58:02 -06:00
robust-futex-ABI.txt robust-futex-ABI.txt: standardize document format 2017-07-14 13:58:03 -06:00
robust-futexes.txt robust-futexes.txt: standardize document format 2017-07-14 13:58:03 -06:00
rpmsg.txt rpmsg.txt: standardize document format 2017-07-14 13:58:04 -06:00
rtc.txt rtc: add generic nvmem support 2017-07-07 13:14:14 +02:00
SAK.txt SAK.txt: standardize document format 2017-07-14 13:58:04 -06:00
sgi-ioc4.txt sgi-ioc4.txt: standardize document format 2017-07-14 13:58:05 -06:00
siphash.txt siphash.txt: standardize document format 2017-07-14 13:58:06 -06:00
SM501.txt SM501.txt: standardize document format 2017-07-14 13:58:06 -06:00
smsc_ece1099.txt smsc_ece1099.txt: standardize document format 2017-07-14 13:58:07 -06:00
static-keys.txt jump_label: Provide hotplug context variants 2017-08-10 12:28:59 +02:00
SubmittingPatches
svga.txt svga.txt: standardize document format 2017-07-14 13:58:08 -06:00
switchtec.txt
sync_file.txt
tee.txt tee.txt: standardize document format 2017-07-14 13:58:14 -06:00
this_cpu_ops.txt this_cpu_ops.txt: standardize document format 2017-07-14 13:58:08 -06:00
unaligned-memory-access.txt unaligned-memory-access.txt: standardize document format 2017-07-14 13:58:09 -06:00
vfio-mediated-device.txt vfio-mediated-device.txt: standardize document format 2017-07-14 13:58:10 -06:00
vfio.txt vfio.txt: standardize document format 2017-07-14 13:58:10 -06:00
video-output.txt
xillybus.txt xillybus.txt: standardize document format 2017-07-14 13:58:11 -06:00
xz.txt xz.txt: standardize document format 2017-07-14 13:58:11 -06:00
zorro.txt zorro.txt: standardize document format 2017-07-14 13:58:12 -06:00