Commit Graph

132297 Commits

Author SHA1 Message Date
Dave Airlie
1847a549ac drm: fix warnings about new mappings in info code.
This fixes up the warnings in the debugfs code that conflicted
with the mapping fixups.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:19 +10:00
Hannes Eder
8f497aade8 drm/radeon: NULL noise: drivers/gpu/drm/radeon/radeon_*.c
Fix this sparse warning:
  drivers/gpu/drm/radeon/r600_cp.c:1811:52: warning: Using plain integer as NULL pointer
  drivers/gpu/drm/radeon/radeon_cp.c:1363:52: warning: Using plain integer as NULL pointer
  drivers/gpu/drm/radeon/radeon_state.c:1983:61: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:19 +10:00
Dave Airlie
a763d7dc0a drm/radeon: fix r600 pci mapping calls.
This realigns the r600 pci mapping calls with the ati pcigart ones,
fixing the direction and using the correct interface.

Suggested by Jerome Glisse.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:18 +10:00
Alex Deucher
08932156cc drm/radeon: r6xx/r7xx: fix possible oops in r600_page_table_cleanup()
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:17 +10:00
Dave Airlie
53c379e946 radeon: call the correct idle function, logic got inverted.
This calls the correct idle function for the R600 and previous chips.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:17 +10:00
Alex Deucher
800b699511 drm/radeon: RS600: fix interrupt handling
the checks weren't updated when RS600 support
was added.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:16 +10:00
Dave Airlie
a7d13ad0e2 drm/r600: fix rptr address along lines of previous fixes to radeon.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:16 +10:00
Dave Airlie
eb1d91954e drm/r600: fixup r600 gart table accessor like ati_pcigart.c
This attempts to fixup the r600 GART accessors so they work on other arches.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:15 +10:00
Dave Airlie
6abf66018f drm/ati_pcigart: use memset_io to reset the memory
Also don't setup pci_gart if we aren't going to need it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:14 +10:00
Dave Airlie
87f0da5535 drm: add DRM_READ/WRITE64 wrappers around readq/writeq.
The readq/writeq stuff is from Dave Miller, and he
warns users to be careful about using these. Plans are only
r600 to use it so far.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:14 +10:00
Alex Deucher
8ced9c7516 radeon: add RS600 pci ids
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:13 +10:00
Alex Deucher
c1556f7151 radeon: add support for rs600 GPUs
RS600s are an AMD IGP for Intel CPUs, that look like RS690s from
a lot of perspectives but look like r600s from a memory controller
point of view.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:13 +10:00
Alex Deucher
7659e9804b radeon: fix r600 AGP support
This fixes the ioremap issues with r600 AGP.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:12 +10:00
Alex Deucher
7335aafa30 radeon: add R6xx/R7xx pci ids
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:12 +10:00
Alex Deucher
c05ce0834a drm/radeon: add initial support for R6xx/R7xx GPUs
This adds support for 2D/Xv acceleration in the X.org 2D driver,
to the drm. It doesn't yet provide any 3D support hooks.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:11 +10:00
Alex Deucher
80b3334a4d drm/radeon: add r6xx/r7xx microcode
This uses the same microcode system as the current radeon code.

It should be converted to the new microcode loader I suppose,
though really I need a lot more proof of the worth of me maintaining
firmware blobs externally.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:10 +10:00
Alex Deucher
befb73c232 drm/radeon: prep for r6xx/r7xx support
- add r6xx/r7xx regs and macros
- add r6xx/r7xx chip families
- fix register access for regs with offsets >= 0x10000

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:10 +10:00
Owain G. Ainsworth
995e37cafb i915/drm: Remove two redundant agp_chipset_flushes
agp_chipset_flush() is for flushing the intel GMCH write cache via the
IFP, these two uses are for when we're getting the object into the cpu
READ domain, and thus should not be needed. This confused me when I was
getting my head around the code.

With thanks to airlied for helping me check my mental picture of how the
flushes and clflushes are supposed to be used.

Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:09 +10:00
Chris Wilson
87ba7c663a drm/i915: Display fence register state in debugfs i915_gem_fence_regs node.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:08 +10:00
Eric Anholt
97d479e77b drm/i915: Add information on pinning and fencing to the i915 list debug.
This was inspired by a patch by Chris Wilson, though none of it applied in any
way due to the debugfs work and I decided to change the formatting of the
new information anyway.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:08 +10:00
Ben Gamari
30106f97a6 drm/i915: Consolidate gem object list dumping
Here we eliminate a few functions in favor of using a single function
to dump from all of the object lists.

Signed-Off-By: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:07 +10:00
Ben Gamari
955b12def4 drm: Convert proc files to seq_file and introduce debugfs
The old mechanism to formatting proc files is extremely ugly. The
seq_file API was designed specifically for cases like this and greatly
simplifies the process.

Also, most of the files in /proc really don't belong there. This patch
introduces the infrastructure for putting these into debugfs and exposes
all of the proc files in debugfs as well.

This contains the i915 hooks rewrite as well, to make bisectability better.

Signed-off-by: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:07 +10:00
Dave Airlie
dd8d7cb49e drm/radeon: split busmaster enable out to a separate function
this is just a code cleanup from the kms tree.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:06 +10:00
Dave Airlie
4247ca942a drm/radeon: align ring writes to 16 dwords boundaries.
On some radeon GPUs this appears to introduce another level of
stability around interacting with the ring.

Its pretty much what fglrx appears to do.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:05 +10:00
Benjamin Herrenschmidt
cd00f95aff drm/radeon: Print PCI ID of cards when probing
This is usedul when you have multiple cards to figure out which
one is which minor.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:05 +10:00
David Miller
09e40d65d0 drm: Only use DRM_IOCTL_UPDATE_DRAW compat wrapper for compat X86.
Only X86 32-bit uses a different alignment for "unsigned long long"
than it's 64-bit counterpart.

Therefore this compat translation is only correct, and only needed,
when either CONFIG_X86 or CONFIG_IA64.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:04 +10:00
David Miller
958a6f8ccb drm: radeon: Fix unaligned access in r300_scratch().
In compat mode, the cmdbuf->buf 64-bit address cookie can
potentially be only 32-bit aligned.  Dereferencing this as
64-bit causes expensive unaligned traps on platforms like
sparc64.

Use get_unaligned() to fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:04 +10:00
David Miller
f1a2a9b618 drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings.
Platforms such as sparc64 have D-cache aliasing issues.  We
cannot allow virtual mappings in different contexts to be such
that two cache lines can be loaded for the same backing data.
Updates to one cache line won't be seen by accesses to the other
cache line.

Code in sparc64 and other architectures solve this problem by
making sure that all userland mappings of MAP_SHARED objects have
the same virtual address base.  They implement this by keying
off of the page offset, and using that to choose a suitably
consistent virtual address for mmap() requests.

Making things even worse, getting this wrong on sparc64 can result
in hangs during DRM lock acquisition.  This is because, at least on
UltraSPARC-III, normal loads consult the D-cache but atomics such
as 'cas' (which is what cmpxchg() is implement using) only consult
the L2 cache.  So if a D-cache alias is inserted, the load can
see different data than the atomic, and we'll loop forever because
the atomic compare-and-exchange will never complete successfully.

So to make this all work properly, we need to make sure that the
hash address computed by drm_map_handle() preserves the SHMLBA
relevant bits, and that's what this patch does for _DRM_SHM mappings.

As a historical note, many years ago this bug didn't exist because we
used to just use the low 32-bits of the address as the hash and just
hope for the best.  This preserved the SHMLBA bits properly.  But when
the hashtab code was added to DRM, this was no longer the case.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-03-13 14:24:03 +10:00
David Miller
d30333bbab drm: ati_pcigart: Fix limit check in drm_ati_pcigart_init().
The variable 'max_pages' is ambiguous.  There are two concepts
of "pages" being used in this function.

First, we have ATI GART pages which are always 4096 bytes.
Then, we have system pages which are of size PAGE_SIZE.

Eliminate the confusion by creating max_ati_pages and
max_real_pages.  Calculate and use them as appropriate.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:03 +10:00
David Miller
6abf6bb0ff drm: radeon: Use surface for PCI GART table.
This allocates a physical surface for the PCI GART table, this way no
matter what other surface configurations exist the GART table will
always be seen by the hardware properly.

We encode the file pointer of the virtual surface allocate using a
special cookie value, called PCIGART_FILE_PRIV.  On the last close, we
release that surface.

Just to be doubly safe, we run the pcigart table setup with the main
surface control register clear.

Based upon ideas from David Airlie and Ben Benjamin Herrenschmidt.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:02 +10:00
David Miller
e8a894372b drm: radeon: Fix calculation of RB_RPTR_ADDR in non-AGP case.
The address needs to be a GART relative address, rather than a PCI
DMA address.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:02 +10:00
David Miller
b266503072 drm: radeon: Fix RADEON_*_EMITED defines.
These are not supposed to be booleans, they are
supposed to be bit masks.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:01 +10:00
David Miller
b07fa022ec drm: radeon: Fix ring_rptr accesses.
The memory behind ring_rptr can either be in ioremapped memory
or a vmalloc() normal kernel memory buffer.

However, the code unconditionally uses DRM_{READ,WRITE}32() (and thus
readl() and writel()) to access it.

Basically, if RADEON_IS_AGP then it's ioremap()'d memory else it's
vmalloc'd memory.

Adjust all of the ring_rptr access code as needed.

While we're here, kill the 'scratch' pointer in drm_radeon_private.
It's only used in the one place where it is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:00 +10:00
David Miller
296c6ae0e9 drm: ati_pcigart: Need to use PCI_DMA_BIDIRECTIONAL.
The buffers mapped by the PCI GART can be written to by the device,
not just read.

For example, this happens via the RB_RPTR writeback on Radeon.

So we can't use PCI_DMA_TODEVICE else we'll get protection faults
on IOMMU platforms.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:24:00 +10:00
David Miller
5a7aad9a55 drm: ati_pcigart: Do not access I/O MEM space using pointer derefs.
The PCI GART table initialization code treats the GART table mapping
unconditionally as a kernel virtual address.

But it could be in the framebuffer, for example, and thus we're
dealing with a PCI MEM space ioremap() cookie.  Treating that as a
virtual address is illegal and will crash some system types (such as
sparc64 where the ioremap() return value is actually a physical I/O
address).

So access the area correctly, using gart_info->gart_table_location as
our guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:59 +10:00
Kristian Høgsberg
8e1004580e drm: Drop unused and broken dri_library_name sysfs attribute.
The kernel shouldn't be in the business of telling user space which
driver to load.  The kernel defers mapping PCI IDs to module names
to user space and we should do the same for DRI drivers.

And in fact, that's how it does work today.  Nothing uses the
dri_library_name attribute, and the attribute is in fact broken.
For intel devices, it falls back to the default behaviour of returning
the kernel module name as the DRI driver name, which doesn't work for
i965 devices.  Nobody has ever hit this problem or filed a bug about this.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:58 +10:00
Kristian Høgsberg
112b715e8e drm: claim PCI device when running in modesetting mode.
Under kernel modesetting, we manage the device at all times, regardless
of VT switching and X servers, so the only decent thing to do is to
claim the PCI device.  In that case, we call the suspend/resume hooks
directly from the pci driver hooks instead of the current class device detour.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:58 +10:00
Benjamin Herrenschmidt
41c2e75e60 drm: Make drm_local_map use a resource_size_t offset
This changes drm_local_map to use a resource_size for its "offset"
member instead of an unsigned long, thus allowing 32-bit machines
with a >32-bit physical address space to be able to store there
their register or framebuffer addresses when those are above 4G,
such as when using a PCI video card on a recent AMCC 440 SoC.

This patch isn't as "trivial" as it sounds: A few functions needed
to have some unsigned long/int changed to resource_size_t and a few
printk's had to be adjusted.

But also, because userspace isn't capable of passing such offsets,
I had to modify drm_find_matching_map() to ignore the offset passed
in for maps of type _DRM_FRAMEBUFFER or _DRM_REGISTERS.

If we ever support multiple _DRM_FRAMEBUFFER or _DRM_REGISTERS maps
for a given device, we might have to change that trick, but I don't
think that happens on any current driver.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:57 +10:00
Benjamin Herrenschmidt
f77d390c97 drm: Split drm_map and drm_local_map
Once upon a time, the DRM made the distinction between the drm_map
data structure exchanged with user space and the drm_local_map used
in the kernel.

For some reasons, while the BSD port still has that "feature", the
linux part abused drm_map for kernel internal usage as the local
map only existed as a typedef of the struct drm_map.

This patch fixes it by declaring struct drm_local_map separately
(though its content is currently identical to the userspace variant),
and changing the kernel code to only use that, except when it's a
user<->kernel interface (ie. ioctl).

This allows subsequent changes to the in-kernel format

I've also replaced the use of drm_local_map_t with struct drm_local_map
in a couple of places. Mostly by accident but they are the same (the
former is a typedef of the later) and I have some remote plans and
half finished patch to completely kill the drm_local_map_t typedef
so I left those bits in.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:56 +10:00
Benjamin Herrenschmidt
d883f7f1b7 drm: Use resource_size_t for drm_get_resource_{start, len}
The DRM uses its own wrappers to obtain resources from PCI devices,
which currently convert the resource_size_t into an unsigned long.

This is broken on 32-bit platforms with >32-bit physical address
space.

This fixes them, along with a few occurences of unsigned long used
to store such a resource in drivers.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-13 14:23:56 +10:00
Linus Torvalds
041b62374c Linus 2.6.29-rc8 2009-03-12 19:39:28 -07:00
Linus Torvalds
aa8e4fc68d bitmap: fix end condition in bitmap_find_free_region
Guennadi Liakhovetski noticed that the end condition for the loop in
bitmap_find_free_region() is wrong, and the "return if error" was also
using the wrong conditional that would only trigger if the bitmap was an
exact multiple of the allocation size, which is not necessarily the case
with dma_alloc_from_coherent().

Such a failure would end up in bitmap_find_free_region() accessing
beyond the end of the bitmap.

Reported-by: Guennadi Liakhovetski <lg@denx.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 19:32:51 -07:00
Linus Torvalds
9ead64974b Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: remove unused -r option for module-init-tool depmod
  kbuild: fix 'make rpm' when CONFIG_LOCALVERSION_AUTO=y and using SCM tree
  kbuild: fix mkspec to cleanup RPM_BUILD_ROOT
  kbuild: fix C libary confusion in unifdef.c due to getline()
2009-03-12 16:35:26 -07:00
Linus Torvalds
0b80e3adc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  cpumask: mm_cpumask for accessing the struct mm_struct's cpu_vm_mask.
  cpumask: tsk_cpumask for accessing the struct task_struct's cpus_allowed.
2009-03-12 16:34:59 -07:00
Linus Torvalds
188de5ec56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: Valid filesystems are flagged as bad by the corrupted fs patch
2009-03-12 16:32:36 -07:00
Linus Torvalds
5216a3c6d1 Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  hwmon: (f75375s) Remove unnecessary and confusing initialization
  hwmon: (it87) Properly decode -128 degrees C temperature
  hwmon: (lm90) Document support for the MAX6648/6692 chips
  hwmon: (abituguru3) Fix I/O error handling
2009-03-12 16:25:04 -07:00
Jody McIntyre
ab03eca8d4 trivial: fix bad links in the ext2 and ext3 documentation
Trivial patch to fix bad links in the ext2 and ext3 documentation.

Signed-off-by: Jody McIntyre <scjody@sun.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 16:24:25 -07:00
Linus Torvalds
8be3e1f1ca Merge branch 'fixes-20090312' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/pci
* 'fixes-20090312' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/pci:
  PCIe: portdrv: call pci_disable_device during remove
  pci: Fix typo in message while disabling HT MSI mapping
  pci: don't disable too many HT MSI mapping
  powerpc/pseries: The RPA PCI hotplug driver depends on EEH
  PCIe: AER: during disable, check subordinate before walking
  PCI: Add PCI quirk to disable L0s ASPM state for 82575 and 82598
2009-03-12 16:22:51 -07:00
Faisal Latif
c12e56ef69 RDMA/nes: Don't allow userspace QPs to use STag zero
STag zero is a special STag that allows consumers to access any bus
address without registering memory.  The nes driver unfortunately
allows STag zero to be used even with QPs created by unprivileged
userspace consumers, which means that any process with direct verbs
access to the nes device can read and write any memory accessible to
the underlying PCI device (usually any memory in the system).  Such
access is usually given for cluster software such as MPI to use, so
this is a local privilege escalation bug on most systems running this
driver.

The driver was using STag zero to receive the last streaming mode
data; to allow STag zero to be disabled for unprivileged QPs, the
driver now registers a special MR for this data.

Cc: <stable@kernel.org>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 16:21:41 -07:00
Nick Piggin
7ef0d7377c fs: new inode i_state corruption fix
There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to memory scribble, or
synchronisation problme.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running.  Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 16:20:24 -07:00