This flag currently only affects whether we allow "zero-copy" writes
with signing enabled. Typically we map pages in the pagecache directly
into the write request. If signing is enabled however and the contents
of the page change after the signature is calculated but before the
write is sent then the signature will be wrong. Servers typically
respond to this by closing down the socket.
Still, this can provide a performance benefit so the "Experimental" flag
was overloaded to allow this. That's really not a good place for this
option however since it's not clear what that flag does.
Move that flag instead to a new module parameter that better describes
its purpose. That's also better since it can be set at module insertion
time by configuring modprobe.d.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_close doesn't check that the filp->private_data is non-NULL before
trying to put it. That can cause an oops in certain error conditions
that can occur on open or lookup before the private_data is set.
Reported-by: Ben Greear <greearb@candelatech.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: use proper interfaces for on-stack plugging
xfs: fix xfs_debug warnings
xfs: fix variable set but not used warnings
xfs: convert log tail checking to a warning
xfs: catch bad block numbers freeing extents.
xfs: push the AIL from memory reclaim and periodic sync
xfs: clean up code layout in xfs_trans_ail.c
xfs: convert the xfsaild threads to a workqueue
xfs: introduce background inode reclaim work
xfs: convert ENOSPC inode flushing to use new syncd workqueue
xfs: introduce a xfssyncd workqueue
xfs: fix extent format buffer allocation size
xfs: fix unreferenced var error in xfs_buf.c
Also, applied patch from Tony Luck that fixes ia64:
xfs_destroy_workqueues() should not be tagged with__exit
in the branch before merging.
ia64 throws away .exit sections for the built-in CONFIG case, so routines
that are used in other circumstances should not be tagged as __exit.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix data corruption regression by reverting commit 6de9843dab
ext4: Allow indirect-block file to grow the file size to max file size
ext4: allow an active handle to be started when freezing
ext4: sync the directory inode in ext4_sync_parent()
ext4: init timer earlier to avoid a kernel panic in __save_error_info
jbd2: fix potential memory leak on transaction commit
ext4: fix a double free in ext4_register_li_request
ext4: fix credits computing for indirect mapped files
ext4: remove unnecessary [cm]time update of quota file
jbd2: move bdget out of critical section
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
dt/fsldma: fix build warning caused by of_platform_device changes
spi: Fix race condition in stop_queue()
gpio/pch_gpio: Fix output value of pch_gpio_direction_output()
gpio/ml_ioh_gpio: Fix output value of ioh_gpio_direction_output()
gpio/pca953x: fix error handling path in probe() call
Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS.
Remove XEN_SAVE_RESTORE dependency from PM_SLEEP.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Xen save/restore is going to use hibernate device callbacks for
quiescing devices and putting them back to normal operations and it
would need to select CONFIG_HIBERNATION for this purpose. However,
that also would cause the hibernate interfaces for user space to be
enabled, which might confuse user space, because the Xen kernels
don't support hibernation. Moreover, it would be wasteful, as it
would make the Xen kernels include a substantial amount of code that
they would never use.
To address this issue introduce new power management Kconfig option
CONFIG_HIBERNATE_CALLBACKS, such that it will only select the code
that is necessary for the hibernate device callbacks to work and make
CONFIG_HIBERNATION select it. Then, Xen save/restore will be able to
select CONFIG_HIBERNATE_CALLBACKS without dragging the entire
hibernate code along with it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
In commit 13583b1659 ("PCI: refactor io size calculation code") Ram
had a thinko in the refactorization of the code: the end result used the
variable 'align' for the bus alignment, but the original code used
'min_align'.
Since then, another use of that 'align' variable got introduced by
commit c8adf9a3e8 ("PCI: pre-allocate additional resources to devices
only after successful allocation of essential resources.")
Fix both of those uses to use 'min_align' as they should.
Daniel Hellstrom <daniel@gaisler.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net: Add support for SMSC LAN9530, LAN9730 and LAN89530
mlx4_en: Restoring RX buffer pointer in case of failure
mlx4: Sensing link type at device initialization
ipv4: Fix "Set rt->rt_iif more sanely on output routes."
MAINTAINERS: add entry for Xen network backend
be2net: Fix suspend/resume operation
be2net: Rename some struct members for clarity
pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
dsa/mv88e6131: add support for mv88e6085 switch
ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
be2net: Fix a potential crash during shutdown.
bna: Fix for handling firmware heartbeat failure
can: mcp251x: Allow pass IRQ flags through platform data.
smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
iwlwifi: accept EEPROM version 0x423 for iwl6000
rt2x00: fix cancelling uninitialized work
rtlwifi: Fix some warnings/bugs
p54usb: IDs for two new devices
wl12xx: fix potential buffer overflow in testmode nvs push
zd1211rw: reset rx idle timer from tasklet
...
If the request_fn ends up blocking, we could be re-entering
the plug flush. Since the list is protected by explicitly
not allowing schedule events, this isn't a terribly good idea.
Additionally, it can cause us to recurse. As request_fn called by
__blk_run_queue is allowed to 'schedule()' (after dropping the queue
lock of course), it is possible to get a recursive call:
schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
-> __blk_run_queue -> request_fn -> schedule
We must make sure that the second schedule does not call into
blk_flush_plug again. So instead of leaving the list of requests on
blk_plug->list, move them to a separate list leaving blk_plug->list
empty.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Commit 000061245a, "dt/powerpc:
Eliminate users of of_platform_{,un}register_driver" forgot to convert
the type of structure passed into platform_device_register() when it
was converted from of_platform_device_register. Fix it.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
drm/nvc0: improve vm flush function
drm/nv50-nvc0: remove some code that doesn't belong here
drm/nv50: use "nv86" tlb flush method on everything except 0x50/0xac
drm/nouveau: quirk for XFX GT-240X-YA
drm/nv50-nvc0: work around an evo channel hang that some people see
drm/nouveau: implement init table opcode 0x5c
drm/nouveau: fix oops on unload with disabled LVDS panel
nv30: Fix parsing of perf table
drm/nouveau: correct memtiming table parsing for nv4x
Revert commit 6de9843dab, since it
caused a data corruption regression with BitTorrent downloads. Thanks
to Damien for discovering and bisecting to find the problem commit.
https://bugzilla.kernel.org/show_bug.cgi?id=32972
Reported-by: Damien Grassart <damien@grassart.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We can create 4402345721856 byte file with indirect block mapping.
However, if we grow an indirect-block file to the size with ftruncate(),
we can see an ext4 warning. The following patch fixes this problem.
How to reproduce:
# dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
# tail -n 1 /var/log/messages
Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4_journal_start_sb() should not prevent an active handle from being
started due to s_frozen. Otherwise, deadlock is easy to happen, below
is a situation.
================================================
freeze | truncate
================================================
| ext4_ext_truncate()
freeze_super() | starts a handle
sets s_frozen |
| ext4_ext_truncate()
| holds i_data_sem
ext4_freeze() |
waits for updates |
| ext4_free_blocks()
| calls dquot_free_block()
|
| dquot_free_blocks()
| calls ext4_dirty_inode()
|
| ext4_dirty_inode()
| trys to start an active
| handle
|
| block due to s_frozen
================================================
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@users.sf.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
ext4 has taken the stance that, in the absence of a journal,
when an fsync/fdatasync of an inode is done, the parent
directory should be sync'ed if this inode entry is new.
ext4_sync_parent(), which implements this, does indeed sync
the dirent pages for parent directories, but it does not
sync the directory *inode*. This patch fixes this.
Also now return error status from ext4_sync_parent().
I tested this using a power fail test, which panics a
machine running a file server getting requests from a
client. Without this patch, on about every other test run,
the server is missing many, many files that had been synced.
With this patch, on > 6 runs, I see zero files being lost.
Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds support for SMSC's LAN9530, LAN9730 and LAN89530 USB
ethernet controllers to the existing smsc95xx driver by adding
their new USB VID/PID pairs.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Don't query connections for widgets have no connections
ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)
ALSA: hda - HDMI: Fix MCP7x audio infoframe checksums
ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable
ALSA: HDA: Fix dock mic for Lenovo X220-tablet
ASoC: format_register_str: Don't clip register values
ASoC: PXA: Fix oops in __pxa2xx_pcm_prepare
ASoC: zylonite: set .codec_dai_name in initializer
* git://git.infradead.org/mtd-2.6:
mtd: atmel_nand: use CPU I/O when buffer is in vmalloc(ed) region
mtd: atmel_nand: modify test case for using DMA operations
mtd: atmel_nand: fix support for CPUs that do not support DMA access
mtd: atmel_nand: trivial: change DMA usage information trace
mtd: mtdswap: fix printk format warning
Switch some errors to debug output. These are generally harmless
and tend to confuse users.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is necessary even with PCI(e) GART, and it makes writeback work even with
AGP on my PowerBook. Might still be unreliable with older revisions of UniNorth
and other AGP bridges though.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Reviewed-by: Alex Deucher <alex.deucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This has always used a big hammer, but that hammer is probably
too big, I'm also not sure its necessary but at least this
should be safe.
Should fix: https://bugzilla.kernel.org/show_bug.cgi?id=23592
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
NFS: Fix a signed vs. unsigned secinfo bug
Revert "net/sunrpc: Use static const char arrays"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Don't write quota info in dquot_commit()
ext3: Fix writepage credits computation for ordered mode
Add proper blk_start_plug/blk_finish_plug pairs for the two places where
we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and
xfs_buf_iowait, given that context switches already flush the per-process
plugging lists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
effect in xfs_debug:
fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
The reason for that is that the various new xfs message functions have a
return value which is never used, and in case of the non-debug build
xfs_debug the macro evaluates to a plain 0 which produces the above
warnings. This can be fixed by turning xfs_debug into an inline function
instead of a macro, but in addition to that I've also changed all the
message helpers to return void as we never use their return values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
GCC 4.6 now warnings about variables set but not used. Fix the trivially
fixable warnings of this sort.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
If not done, second attempt to open the RX ring would cause memory corruption.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bringing the port up, performing a SENSE_PORT command
To try and check to which physical link type (IB or Ethernet) the physical
port is connected.
In case there is no valid link partner, the port will come up as its
supported default.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the Power platform, the log tail debug checks fire excessively
causing the system to panic early in testing. The debug checks are
known to be racy, though on x86_64 there is no evidence that they
trigger at all.
We want to keep the checks active on debug systems to alert us to
problems with log space accounting, but we need to reduce the impact
of a racy check on testing on the Power platform.
As a result, convert the ASSERT conditions to warnings, and
allow them to fire only once per filesystem mount. This will prevent
false positives from interfering with testing, whilst still
providing us with the indication that they may be a problem with log
space accounting should that occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
A fuzzed filesystem crashed a kernel when freeing an extent with a
block number beyond the end of the filesystem. Convert all the debug
asserts in xfs_free_extent() to active checks so that we catch bad
extents and return that the filesytsem is corrupted rather than
crashing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
When we are short on memory, we want to expedite the cleaning of
dirty objects. Hence when we run short on memory, we need to kick
the AIL flushing into action to clean as many dirty objects as
quickly as possible. To implement this, sample the lsn of the log
item at the head of the AIL and use that as the push target for the
AIL flush.
Further, we keep items in the AIL that are dirty that are not
tracked any other way, so we can get objects sitting in the AIL that
don't get written back until the AIL is pushed. Hence to get the
filesystem to the idle state, we might need to push the AIL to flush
out any remaining dirty objects sitting in the AIL. This requires
the same push mechanism as the reclaim push.
This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
match the new xfs_ail_max_lsn() function introduced in this patch.
Similarly for xfs_trans_ail_push -> xfs_ail_push.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
This patch rearranges the location of functions in xfs_trans_ail.c
to remove the need for forward declarations of those functions in
preparation for adding new functions without the need for forward
declarations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Similar to the xfssyncd, the per-filesystem xfsaild threads can be
converted to a global workqueue and run periodically by delayed
works. This makes sense for the AIL pushing because it uses
variable timeouts depending on the work that needs to be done.
By removing the xfsaild, we simplify the AIL pushing code and
remove the need to spread the code to implement the threading
and pushing across multiple files.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Background inode reclaim needs to run more frequently that the XFS
syncd work is run as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.
Background inode reclaim is kicked by the act of marking inodes for
reclaim. When an AG is first marked as having reclaimable inodes,
the background reclaim work is kicked. It will continue to run
periodically untill it detects that there are no more reclaimable
inodes. It will be kicked again when the first inode is queued for
reclaim.
To ensure shrinker based inode reclaim throttles to the inode
cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the
background inode reclaim so that when we are low on memory we are
trying to reclaim inodes as efficiently as possible. This kick shoul
d not be necessary, but it will protect against failures to kick the
background reclaim when inodes are first dirtied.
To provide the rate throttling, make the shrinker pass do
synchronous inode reclaim so that it blocks on inodes under IO. This
means that the shrinker will reclaim inodes rather than just
skipping over them, but it does not adversely affect the rate of
reclaim because most dirty inodes are already under IO due to the
background reclaim work the shrinker kicked.
These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload trigger for the OOM killer invocation is
where there are more threads than CPUs all unlinking files in an
extremely memory constrained environment. Unlike other solutions,
this one does not have a performance impact on performance when
memory is not constrained or the number of concurrent threads
operating is <= to the number of CPUs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
On of the problems with the current inode flush at ENOSPC is that we
queue a flush per ENOSPC event, regardless of how many are already
queued. Thi can result in hundreds of queued flushes, most of
which simply burn CPU scanned and do no real work. This simply slows
down allocation at ENOSPC.
We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.
On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.
This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can al be
managed by a global workqueue now they manage concurrency
effectively.
Introduce a new gglobal xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next running of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next running of the sync work.
For laptop mode, we wait on completion for the sync works, so ensure
that the sync work queuing interface can flush and wait for work to
complete to enable the work queue infrastructure to replace the
current sequence number and wakeup that is used.
Because the sync work does non-trivial amounts of work, mark the
new work queue as CPU intensive.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.
As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list which, due to the fact the
inode is in extent format, is limited to the size of the literal
area of the inode. However, we can have thousands of delalloc
extents, resulting in an allocation size orders of magnitude larger
than is needed to hold all the real extents.
Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Commit 1018b5c016 ("Set rt->rt_iif more
sanely on output routes.") breaks rt_is_{output,input}_route.
This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".
To fix it, this does:
1) Add "int rt_route_iif;" to struct rtable
2) For input routes, always set rt_route_iif to same value as rt_iif
3) For output routes, always set rt_route_iif to zero. Set rt_iif
as it is done currently.
4) Change rt_is_{output,input}_route() to test rt_route_iif
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>