640 Commits

Author SHA1 Message Date
Borislav Petkov
cb9d5ecdff EDAC, MCE: Add F12h NB MCE decoder
F12h is completely covered by the generic path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
e7281eb37d EDAC, MCE: Add F12h IC MCE decoder
... which is the same as for K8 and F10h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
9be0bb1072 EDAC, MCE: Add F12h DC MCE decoder
F12h DC MCE signatures are a subset of F10h's so reuse them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
f0157b3afd EDAC, MCE: Add support for F11h MCEs
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
9530d608ef EDAC, MCE: Enable MCE decoding on F14h
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
fe4ea2623b EDAC, MCE: Fix FR MCEs decoding
Those are N/A on K8, so don't decode them there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
5ce88f6ea6 EDAC, MCE: Complete NB MCE decoders
Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
ded5062328 EDAC, MCE: Warn about LS MCEs on F14h
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
dd53bce4e8 EDAC, MCE: Adjust IC decoders to F14h
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:01 +02:00
Borislav Petkov
888ab8e6eb EDAC, MCE: Adjust DC decoders to F14h
Add per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
47ca08a40b EDAC, MCE: Rename files
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
9cdeb404a1 EDAC, MCE: Rework MCE injection
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.

Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
30e1f7a812 EDAC: Export edac sysfs class to users.
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
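The refcounting idea described in 30e1f7a812 above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `demo_class_get()`, `demo_class_put()`, and the two counters are hypothetical names, not the EDAC core's API, and real kernel code would also need locking or atomic counters.

```c
#include <assert.h>

/* Hypothetical state: number of users and whether the class is registered. */
static int demo_users;
static int demo_registered;

/* First user triggers registration; later users only bump the count. */
static int demo_class_get(void)
{
    if (demo_users++ == 0)
        demo_registered = 1;   /* stand-in for the real registration call */
    return 0;
}

/* Last user triggers unregistration. */
static void demo_class_put(void)
{
    assert(demo_users > 0);    /* a put without a matching get is a bug */
    if (--demo_users == 0)
        demo_registered = 0;   /* stand-in for the real unregistration call */
}
```

Folding registration into the get/put pair means callers never have to know whether they are the first or the last user of the class.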
Borislav Petkov
7cfd4a8744 EDAC, MCE: Pass complete MCE info to decoders
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
6337583d7d EDAC, MCE: Sanitize error codes
Clean up error codes names, shorten to mnemonics, add RRRR boundary
checking.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
0ee8efa8f4 EDAC, MCE: Remove unused function parameter
Remove remains from previous functionality.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
c9f281fd96 EDAC, MCE: Add HW_ERR prefix
... so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
ca755e0a49 EDAC: Fix error return
We should return a negative value when we cannot get the toplevel edac
sysfs class.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:56 +02:00
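The convention behind ca755e0a49 above (failure is signalled by a negative errno value, success by zero) can be illustrated with a minimal userspace sketch. `demo_get_class()` and `demo_register()` are hypothetical helpers, and `-ENODEV` is an assumed choice of error code; the commit message does not say which value the actual fix returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_class {
    int unused;
};

/* Hypothetical lookup: returns NULL when the class is not available. */
static struct demo_class *demo_get_class(int available)
{
    static struct demo_class cls;

    return available ? &cls : NULL;
}

/* Returns 0 on success or a negative errno value on failure. */
static int demo_register(int available)
{
    struct demo_class *cls = demo_get_class(available);

    if (cls == NULL)
        return -ENODEV;   /* negative, so callers testing "ret < 0" see it */

    return 0;
}
```

Callers that check `if (ret < 0)` would silently miss a positive error value, which is why the sign matters.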
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Marcin Slusarz
64aab720bd i7core_edac: fix panic in udimm sysfs attributes registration
Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.

  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
  BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
  IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
  RIP: 0010:[<ffffffff81330b36>]  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  (...)
  Call Trace:
   [<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
   [<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
   [<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
   [<ffffffff81428901>] i7core_probe+0xccf/0xec0
  RIP  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  ---[ end trace 20de320855b81d78 ]---
  Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:58 -07:00
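The class of bug fixed in 64aab720bd above can be shown with a small userspace sketch of a sentinel-terminated array. `struct demo_attr` and `demo_count_attrs()` are hypothetical stand-ins, not the kernel's sysfs attribute types; the point is only that a consumer walking the array relies on the trailing NULL entry to know where to stop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs attribute; not the kernel's struct. */
struct demo_attr {
    const char *name;
};

/*
 * Count entries by walking until the sentinel. Without the trailing
 * { NULL } entry this loop would read past the end of the array.
 */
static size_t demo_count_attrs(const struct demo_attr *attrs)
{
    size_t n = 0;

    assert(attrs != NULL);
    while (attrs[n].name != NULL)
        n++;

    return n;
}

static const struct demo_attr demo_udimm_attrs[] = {
    { "udimm0" },
    { "udimm1" },
    { "udimm2" },
    { NULL },   /* sentinel: marks the end of the array */
};
```

Dropping the last initializer would leave the loop reading whatever memory follows the array, which matches the "dereference of random memory" described in the commit message.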
Borislav Petkov
00740c5854 amd64_edac: Fix driver module removal
f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had set up the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-09-27 12:52:58 +02:00
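A "balancing check" of the kind mentioned in 00740c5854 above might look like the following userspace sketch. The names (`struct demo_mci`, `poll_armed`, `demo_teardown()`) are hypothetical and the cancel step is simulated with a counter; the commit message does not show what condition the actual driver tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-controller state. */
struct demo_mci {
    bool poll_armed;   /* true only if the polling work was set up */
    int cancels;       /* counts simulated cancel operations */
};

/* Setup path: only some configurations arm the polling work. */
static void demo_setup_poll(struct demo_mci *mci)
{
    mci->poll_armed = true;
}

/* Removal path: cancel the work only if setup actually happened. */
static void demo_teardown(struct demo_mci *mci)
{
    if (!mci->poll_armed)
        return;            /* nothing was initialized, nothing to cancel */

    mci->cancels++;        /* stand-in for cancelling the delayed work */
    mci->poll_armed = false;
}
```

The removal path mirrors the setup path: whatever condition guards initialization also guards teardown, so an uninitialized work item is never touched.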
Mauro Carvalho Chehab
e6649cc629 i7300_edac: Properly initialize per-csrow memory size
Due to the current edac-core limits, we cannot represent a per-channel
memory size for FB-DIMM drivers. So, we need to sum up all values
for each slot, in order to properly represent the total amount of
memory found by the i7300 driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-09-24 14:16:12 -03:00
Mauro Carvalho Chehab
1aa4a7b6b0 V4L/DVB: i7300_edac: better initialize page counts
It is still somewhat fake, as the pages may not be on this exact order,
and may even be used in mirror mode, but this is a better guess than the
other random fake values.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-09-24 14:16:12 -03:00
Andreas Herrmann
23ac4ae827 x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
The file names are somewhat misleading as the code is not specific to
AMD K8 CPUs anymore. The files accommodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-20 14:22:58 -07:00
Andreas Herrmann
900f9ac9f1 x86, k8-gart: Decouple handling of garts and northbridges
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-17 13:26:21 -07:00
Mauro Carvalho Chehab
9c6f6b65d2 i7300-edac: CodingStyle cleanup
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:06 -03:00
Mauro Carvalho Chehab
d091a6eb17 i7300_edac: Improve comments
This is basically a cleanup patch, improving the comments for each
function.

While here, do a few cleanups.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:05 -03:00
Mauro Carvalho Chehab
b4552aceb3 i7300_edac: Cleanup: reorganize the file contents
This patch should introduce no functional change. It just rearranges the
contents of the C file, in order to make it easier to understand and
maintain.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:04 -03:00
Mauro Carvalho Chehab
37b69cf91c i7300_edac: Properly detect channel on CE errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:03 -03:00
Mauro Carvalho Chehab
32f9472613 i7300_edac: enrich FBD error info for corrected errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:02 -03:00
Mauro Carvalho Chehab
8199d8cc65 i7300_edac: enrich FBD error info for fatal errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:01 -03:00
Mauro Carvalho Chehab
85580ea4f7 i7300_edac: pre-allocate a buffer used to prepare err messages
Instead of dynamically allocating a buffer for it where needed,
just allocate it once. As we'll use the same buffer also during
fatal and non-fatal errors, it is very risky to dynamically allocate
it during an error.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:59 -03:00
Mauro Carvalho Chehab
28c2ce7c8b i7300_edac: Fix MTR x4/x8 detection logic
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:58 -03:00
Mauro Carvalho Chehab
3b330f6758 i7300_edac: Make the debug messages coherent with the others
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:57 -03:00
Mauro Carvalho Chehab
f427742248 i7300_edac: Cleanup: remove get_error_info logic
As the error logic in this driver came from the i5400 driver, it
was using one function to get errors and another to display them.
Let's make it simpler and avoid doing it in two steps.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:56 -03:00
Mauro Carvalho Chehab
e432760509 i7300_edac: Add a code to cleanup error registers
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:55 -03:00
Mauro Carvalho Chehab
57021918aa i7300_edac: Add support for reporting FBD errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:54 -03:00
Mauro Carvalho Chehab
15154c57c6 i7300_edac: Properly detect the type of error correction
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:52 -03:00
Mauro Carvalho Chehab
bb81a21637 i7300_edac: Detect if the device is on single mode
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:51 -03:00
Mauro Carvalho Chehab
d7de2bdb0e i7300_edac: Adds detection for enhanced scrub mode on x8
While here, do some cleanup by adding some macros to check
for device features.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:50 -03:00
Mauro Carvalho Chehab
86002324cf i7300_edac: Clear the error bit after reading
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:49 -03:00
Mauro Carvalho Chehab
5de6e07ed7 i7300_edac: Add error detection code for global errors
There's no mention in the datasheet about how to enable global error
reporting. So, I'm assuming that those errors are always enabled.
Maybe I'm plain wrong about that ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:48 -03:00
Mauro Carvalho Chehab
3e57eef64c i7300_edac: Better name PCI devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:47 -03:00
Mauro Carvalho Chehab
116389ed21 i7300_edac: Add a FIXME note about the error correction type
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:45 -03:00
Mauro Carvalho Chehab
c3af2eaf7a i7300_edac: add global error registers
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:44 -03:00
Mauro Carvalho Chehab
af3d8831e7 i7300_edac: display info if ECC is enabled or not
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:43 -03:00
Mauro Carvalho Chehab
fcaf780b2a i7300_edac: start a driver for i7300 chipset (Clarksboro)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:42 -03:00
Borislav Petkov
37b7370a8d amd64_edac: Do not report error overflow as a separate error
When the Overflow MCi_STATUS bit is set, EDAC reports the lost error
with a "no information available" message which often puzzles users
parsing the dmesg. This doesn't make much sense since this error has
been lost anyway so no need for reporting it separately. Thus, report
the overflow bit setting in the MCE dump instead. While at it, remove
reporting of MiscV and ErrorEnable (en) which are superfluous.

Now it looks like this:

[ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
[ 1501.666887] Northbridge Error, node 2

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-26 12:46:03 +02:00
Borislav Petkov
e045c29126 MCE, AMD: Limit MCE decoding to current families for now
Limit MCE error decoding to current and older families only (K8-F11h).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-24 18:06:54 +02:00
Linus Torvalds
58d4ea65b9 Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  mmc_spi: Fix unterminated of_match_table
  of/sparc: fix build regression from of_device changes
  of/device: Replace struct of_device with struct platform_device
2010-08-12 09:11:31 -07:00