Provides a way for NMI reported errors on x86 to notify the EDAC
subsystem pending ECC errors by writing to a software state variable.
Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select error detection
mode via module parameter and eliminate the kernel compile option.
Please review/test. Thx!
Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fix/change returns the offset into the page for the ce/ue error, instead
of just 0. The e752x dram controller reads 34:6 of the linear address with
the error.
Signed-off-by: Mike Chan <mikechan@google.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reading of the DRA registers should be a byte at a time (one register at a
time) instead of 4 bytes at a time (four registers). Reading a dword at a
time retrieves erroneous information from all but the first register. A
change was made to read in each register in a loop prior to using the data in
those registers.
Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fatal vs. non-fatal mask for the sysbus FERR status is incorrect
according to the E7520 datasheet. This patch corrects the mask to correctly
handle fatal and non-fatal errors.
Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the quoted module name in the sysfs for EDAC modules and reported by several
people.
Instead of ../_edac_e752x_/ now the following will be presented, like other
modules: ../edac_e752x/
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add lower-level functions that handle various parts of the initialization
done by the xxx_probe1() functions. Some of the xxx_probe1() functions are
much too long and complicated (see "Chapter 5: Functions" in
Documentation/CodingStyle).
- Cleanup of probe1() functions in EDAC
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove add_mc_to_global_list(). In next patch, this function will be
reimplemented with different semantics.
1 Reimplement add_mc_to_global_list() with semantics that allow the caller to
determine the ID number for a mem_ctl_info structure. Then modify
edac_mc_add_mc() so that the caller specifies the ID number for the new
mem_ctl_info structure. Platform-specific code should be able to assign the
ID numbers in a platform-specific manner. For instance, on Opteron it makes
sense to have the ID of the mem_ctl_info structure match the ID of the node
that the memory controller belongs to.
2 Modify callers of edac_mc_add_mc() so they use the new semantics.
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change MC drivers from using CVS revision strings for their version number,
Now each driver has its own local string.
Remove some PCI dependencies from the core EDAC module. Made the code 'struct
device' centric instead of 'struct pci_dev' Most of the code changes here are
from a patch by Dave Jiang. It may be best to eventually move the
PCI-specific code into a separate source file.
Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address the issue of EDAC/BIOS coexistence for the e752x chip-sets.
We have found a problem where the BIOS will start the system with the error
registers (dev0:fun1) hidden and assuming it has exclusive access to them.
The edac driver violates this assumption.
The workaround this patch offers is to honor the hidden-ness as an
indication that it is not safe to use those registers.
Signed-off-by: Mark Gross <mark.gross@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Dave Jiang <djiang@mvista.com>: Fix EDAC e752x driver so it
outputs sysbus-specific error message when sysbus error detected.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cosmetic indentation/formatting cleanup for EDAC code. Make sure we
are using tabs rather than spaces to indent, etc.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix code so we always hold mem_ctls_mutex while we are stepping
through the list of mem_ctl_info structures. Otherwise bad things
may happen if one task is stepping through the list while another
task is modifying it. We may eventually want to use reference
counting to manage the mem_ctl_info structures. In the meantime we
may as well fix this bug.
- Don't disable interrupts while we are walking the list of
mem_ctl_info structures in check_mc_devices(). This is unnecessary.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix xxx_probe1() functions so they call xxx_get_error_info() functions
to clear initial errors. This is simpler and cleaner than duplicating
the low-level code for accessing PCI config space.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add ctl_dev field to "struct e752x_dev_info". Then we can eliminate
ugly switch statement from e752x_probe1().
- Remove code from e752x_probe1() that clears initial PCI bus parity
errors. The core EDAC module already does this.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Perform the following name substitutions on all source files:
sed 's/BS_MOD_STR/EDAC_MOD_STR/g'
sed 's/bs_thread_info/edac_thread_info/g'
sed 's/bs_thread/edac_thread/g'
sed 's/bs_xstr/edac_xstr/g'
sed 's/bs_str/edac_str/g'
The names that start with BS_ or bs_ are artifacts of when the code
was called "bluesmoke".
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This implements the following idea:
On Monday 30 January 2006 19:22, Eric W. Biederman wrote:
> One piece missing from this conversation is the issue that we need errors
> in a uniform format. That is why edac_mc has helper functions.
>
> However there will always be errors that don't fit any particular model.
> Could we add a edac_printk(dev, ); That is similar to dev_printk but
> prints out an EDAC header and the device on which the error was found?
> Letting the rest of the string be user specified.
>
> For actual control that interface may be to blunt, but at least for people
> looking in the logs it allows all of the errors to be detected and
> harvested.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.
The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.
From: doug thompson <norsk5@xmission.com>
This converts EDAC to sysfs and is the final chunk neccessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>