linux/Documentation
Oza Pawandeep 7e9084b367 PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices
PCIe ERR_FATAL errors mean the Link is unreliable.  Components on the Link
may need to be reset to return to reliable operation (PCIe r4.0, sec
6.2.2).  We previously handled these errors much differently depending on
whether the platform supports Downstream Port Containment (DPC) (PCIe r4.0,
sec 6.2.10) or not.

The AER driver has historically logged the error details, called
driver-supplied pci_error_handlers callbacks, and reset the Link.  This
reset downstream devices, but did not remove them from the PCI subsystem,
re-enumerate them, or call their driver .remove() or .probe() methods.

DPC is different because the hardware automatically disables the Link when
it detects ERR_FATAL, which resets downstream devices.  There's no
opportunity for pci_error_handlers callbacks before resetting the Link.
The DPC driver removes affected devices (which calls their driver .remove()
methods), brings the Link back up, and re-enumerates (which calls driver
.probe() methods).

Align AER ERR_FATAL handling with DPC by resetting the Link in software,
skipping the driver pci_error_handlers callbacks, removing the devices from
the PCI subsystem, and re-enumerating.  The idea is that drivers and
devices should see the same behavior for ERR_FATAL events, regardless of
whether they're handled by AER or DPC.

Here are the basic ERR_FATAL recovery steps, showing the previous AER
behavior, the AER behavior after this patch, and the DPC behavior:

                          AER        AER      DPC
                          previous   new      behavior
                          --------   ---      --------
  Log error               yes        yes      yes (minimal)
  drv.error_detected()    yes        no       no
  Reset Link              yes        yes      yes
  drv.mmio_enabled()      yes        no       no
  drv.slot_reset()        yes        no       no
  drv.resume()            yes        no       no
  Remove PCI devices      no         yes      yes
    (calls drv.remove())
  Re-enumerate            no         yes      yes
    (calls drv.probe())

N.B. With DPC, the Link reset happens before the driver .remove() calls,
while with AER, the reset happens *after* the .remove() calls.  The goal is
to eventually do the reset before .remove() for AER as well.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, squash doc patch into this, remove unused
"result_data"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2018-05-17 16:44:13 -05:00
..
ABI RTC for 4.17 2018-04-10 10:22:27 -07:00
accelerators ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL 2018-03-02 13:02:15 +11:00
accounting
acpi ACPI / LPIT: Add Low Power Idle Table (LPIT) support 2017-10-11 15:38:10 +02:00
admin-guide ARM: 2018-04-09 11:42:31 -07:00
aoe
arm MTD changes: 2018-04-06 12:15:41 -07:00
arm64 ARM: 2018-04-09 11:42:31 -07:00
auxdisplay
backlight
block block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP 2017-11-14 20:13:33 -07:00
blockdev SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
bpf bpf, doc: add description wrt native/bpf clang target and pointer size 2018-03-20 15:47:45 -07:00
bus-devices
cdrom Documentation/cdrom: fix German sharp s in LaTex 2018-03-08 19:35:29 -07:00
cgroup-v1 page cache: use xa_lock 2018-04-11 10:28:39 -07:00
cma
connector
console
core-api Documentation: Mention why %p prints ptrval 2018-03-23 12:42:18 -06:00
cpu-freq cpufreq: Drop cpufreq_table_validate_and_show() 2018-04-10 08:40:45 +02:00
cpuidle cpuidle: Add definition of residency to sysfs documentation 2018-04-09 13:44:37 +02:00
crypto crypto: doc - clarify hash callbacks state machine 2018-03-31 01:33:02 +08:00
dev-tools There's been a fair amount of activity in Documentation/ this time around: 2018-04-03 13:35:51 -07:00
device-mapper dm verity: add 'check_at_most_once' option to only validate hashes once 2018-04-03 15:04:29 -04:00
devicetree The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
doc-guide doc-guide: kernel-doc: add comment about formatting verification 2018-02-23 08:06:15 -07:00
driver-api MTD changes: 2018-04-06 12:15:41 -07:00
driver-model serdev: Introduce devm_serdev_device_open() 2018-01-08 10:08:34 +00:00
early-userspace
EDID
extcon
fault-injection Documentation: nvme: Documentation for nvme fault injection 2018-03-26 08:53:43 -06:00
fb documentation: fb: update list of available compiled-in fonts 2017-11-08 03:39:52 -07:00
features arch: remove obsolete architecture ports 2018-04-02 20:20:12 -07:00
filesystems Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2018-04-13 16:55:41 -07:00
firmware_class
fmc
fpga fpga: mgr: separate getting/locking FPGA manager 2017-11-28 16:30:37 +01:00
gpio Documentation: gpio: Move drivers-on-gpio.txt to driver-api 2018-03-23 04:22:29 +01:00
gpu Linux 4.16-rc7 2018-03-28 14:30:41 +10:00
hid Documentation: fix input related doc refs 2017-10-12 11:14:06 -06:00
hwmon hwmon: (lm92) Add max6635 to lm92_id[] 2018-03-22 09:33:24 -07:00
i2c i2c: i801: Add missing documentation entries for Braswell and Kaby Lake 2018-02-21 09:17:20 +01:00
ia64 ia64: doc: tweak whitespace for 'console=' parameter 2018-03-05 14:41:38 -08:00
ide
iio iio: adc: New driver for Cirrus Logic EP93xx ADC 2017-07-25 19:56:23 +01:00
infiniband Documentation/ABI: update infiniband sysfs interfaces 2018-02-23 08:18:33 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-04-05 13:21:57 -07:00
ioctl arch: remove tile port 2018-03-16 10:56:03 +01:00
isdn Documentation/isdn: check and fix dead links ... 2018-03-26 12:31:13 -04:00
kbuild Kconfig updates for v4.17 2018-04-03 16:28:01 -07:00
kdump
kernel-hacking Documentation: Fix misconversion of #if 2018-01-17 16:45:01 -07:00
laptops Documentation: fix admin-guide doc refs 2017-10-12 11:13:28 -06:00
leds Documentation: leds: Update 00-INDEX file 2017-10-23 20:17:03 +02:00
lightnvm
livepatch livepatch: Remove immediate feature 2018-01-11 10:58:03 +01:00
locking Linux 4.16-rc2 2018-02-21 09:57:55 +01:00
m68k
maintainer docs: Add an intro note to the maintainers handbook 2017-12-11 14:46:10 -07:00
md raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
media media: extended-controls.rst: transmitter -> receiver 2018-04-04 08:05:20 -04:00
memory-devices
mic
mips Documentation: mips: Update AU1xxx_IDE Kconfig dependencies 2018-02-01 12:45:35 -07:00
misc-devices
mmc
mtd mtd: spi-nor: add an API to restore the status of SPI flash chip 2017-12-13 00:36:00 +01:00
namespaces
netlabel
networking Staging/IIO patches for 4.17-rc1 2018-04-04 18:56:27 -07:00
nfc
nios2
nvdimm
nvmem NVMEM documentation fix: A minor typo 2017-08-24 13:31:58 -06:00
openrisc Documentation: openrisc: Updates to README 2017-10-30 21:37:53 +09:00
parisc
PCI PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices 2018-05-17 16:44:13 -05:00
pcmcia
perf drivers/bus: Move Arm CCN PMU driver 2018-03-06 17:26:15 +01:00
phy
platform
power PM / hibernate: Make passing hibernate offsets more friendly 2018-03-30 12:01:20 +02:00
powerpc
pps drivers/pps: aesthetic tweaks to PPS-related content 2017-09-08 18:26:51 -07:00
process Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-15 16:12:35 -07:00
pti
ptp ptp: Fix documentation to match code. 2018-03-26 12:13:21 -04:00
rapidio Documentation: rapidio: move sysfs interface to ABI 2018-02-23 08:25:45 -07:00
RCU Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', 'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD 2017-12-11 09:21:58 -08:00
s390 vfio-ccw: update documentation 2018-03-01 17:32:14 +01:00
scheduler sched/deadline: Fix the description of runtime accounting in the documentation 2017-11-16 09:00:35 +01:00
scsi scsi: documentation: Obsolete documentation references 2018-03-21 18:34:20 -04:00
security selinux: Update SELinux SCTP documentation 2018-03-20 16:26:15 -04:00
serial
sh
sound sound updates for 4.15-rc1 2017-11-14 18:01:46 -08:00
sparc sparc64: Add support for ADI (Application Data Integrity) 2018-03-18 07:38:48 -07:00
sphinx Documentation/sphinx: Fix Directive import error 2018-03-09 10:46:14 -07:00
sphinx-static docs RTD theme: code-block with line nos - lines and line numbers don't line up. 2017-07-17 13:48:45 -06:00
spi
sysctl taint: add taint for randstruct 2018-04-11 10:28:35 -07:00
target
thermal thermal: Add cooling device's statistics in sysfs 2018-04-02 21:49:01 +08:00
timers sched/isolation: Eliminate NO_HZ_FULL_ALL 2018-02-15 15:40:37 -08:00
trace New features: 2018-04-10 11:27:30 -07:00
translations Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
usb Documentation updates for 4.16. New stuff includes refcount_t 2018-01-31 19:25:25 -08:00
userspace-api seccomp: Implement SECCOMP_RET_KILL_PROCESS action 2017-08-14 13:46:50 -07:00
virtual KVM: trivial documentation cleanups 2018-03-28 22:47:06 +02:00
vm page cache: use xa_lock 2018-04-11 10:28:39 -07:00
w1 Documentation updates for 4.16. New stuff includes refcount_t 2018-01-31 19:25:25 -08:00
watchdog watchdog: remove bfin_wdt driver 2018-03-26 15:57:04 +02:00
wimax
x86 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-15 16:12:35 -07:00
xtensa xtensa: add support for KASAN 2017-12-16 22:37:12 -08:00
.gitignore
00-INDEX CRIS: Drop support for the CRIS port 2018-03-16 10:56:05 +01:00
atomic_bitops.txt locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit() 2018-02-13 14:55:53 +01:00
atomic_t.txt Documentation/locking/atomic: Finish the document... 2017-08-25 11:06:33 +02:00
bcache.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
cachetlb.txt
cgroup-v2.txt cgroup, docs: document the root cgroup behavior of cpu and io controllers 2018-01-16 08:07:09 -08:00
Changes
circular-buffers.txt doc: De-emphasize smp_read_barrier_depends 2017-12-05 11:57:53 -08:00
clearing-warn-once.txt kernel debug: support resetting WARN*_ONCE 2017-11-17 16:10:00 -08:00
clk.txt Documentation: clk: enable lock is not held for clk_is_enabled API 2018-03-16 15:44:43 -07:00
CodingStyle
conf.py docs: Remove "could not extract kernel version" warning 2017-12-11 15:20:04 -07:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags 2017-09-01 11:59:17 +02:00
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff Remove gperf usage from toolchain 2017-08-19 11:02:53 -07:00
efi-stub.txt
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst Documentation: add Linux tracing to Sphinx TOC tree 2018-03-07 10:22:53 -07:00
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt ipmi: Make IPMI panic strings always available 2017-09-27 16:03:45 -05:00
IRQ-affinity.txt
IRQ-domain.txt irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG 2018-01-24 12:32:58 +01:00
IRQ.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt
kobject.txt
kprobes.txt kprobes/docs: Remove jprobes related documents 2017-10-20 11:02:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
Makefile Documentation: add script and build target to check for broken file references 2017-10-12 11:07:42 -06:00
memory-barriers.txt locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more 2018-03-10 10:22:22 +01:00
memory-hotplug.txt
men-chameleon-bus.txt
nommu-mmap.txt
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt Documentation: fix locking rt-mutex doc refs 2017-10-19 12:56:44 -06:00
pnp.txt
preempt-locking.txt
pwm.txt
rbtree.txt rbtree: cache leftmost node internally 2017-09-08 18:26:48 -07:00
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt Documentation: rtc: move iotcl interface documentation to ABI 2018-01-12 00:20:41 +01:00
SAK.txt
sgi-ioc4.txt
siphash.txt
SM501.txt
smsc_ece1099.txt
speculation.txt Documentation: Document array_index_nospec 2018-01-30 21:54:28 +01:00
static-keys.txt jump_label: Provide hotplug context variants 2017-08-10 12:28:59 +02:00
SubmittingPatches
svga.txt documentation/svga.txt: update outdated file 2017-11-20 10:45:50 -07:00
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with notes for NTB 2017-11-18 20:37:13 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt
vfio.txt
video-output.txt
xillybus.txt
xz.txt
zorro.txt