Commit Graph

11107 Commits

Author SHA1 Message Date
Dean Nelson
a460ef8d0a [IA64] fix possible XPC deadlock when disconnecting
This patch eliminates a potential deadlock that is possible when XPC
disconnects a channel to a partition that has gone down. This deadlock will
occur if at least one of the kthreads created by XPC for the purpose of making
callouts to the channel's registerer is detained in the registerer and will
not be returning back to XPC until some registerer request occurs on the now
downed partition. The potential for a deadlock is removed by ensuring that
there always is a kthread available to make the channel disconnecting callout
to the registerer.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-12-12 11:48:53 -08:00
Linus Torvalds
d224a93d91 Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (29 commits)
  sh: Fixup SH-2 BUG() trap handling.
  sh: Use early_param() for earlyprintk parsing.
  sh: Fix .empty_zero_page alignment for PAGE_SIZE > 4096.
  sh: Fixup .data.page_aligned.
  sh: Hook up SH7722 scif ipr interrupts.
  sh: Fixup sh_bios() trap handling.
  sh: SH-MobileR SH7722 CPU support.
  sh: Fixup dma_cache_sync() callers.
  sh: Convert remaining remap_area_pages() users to ioremap_page_range().
  sh: Fixup kernel_execve() for syscall cleanups.
  sh: Fix get_wchan().
  sh: BUG() handling through trapa vector.
  rtc: rtc-sh: alarm support.
  rtc: rtc-sh: fix rtc for out-by-one for the month.
  sh: Kill off unused SE7619 I/O ops.
  serial: sh-sci: Shut up various sci_rxd_in() gcc4 warnings.
  sh: Split out atomic ops logically.
  sh: Fix Solution Engine 7619 build.
  sh: Trivial build fixes for SH-2 support.
  sh: IPR IRQ updates for SH7619/SH7206.
  ...
2006-12-12 08:14:46 -08:00
Ingo Molnar
99a3eb3845 [PATCH] lockdep: fix seqlock_init()
seqlock_init() needs to use spin_lock_init() for dynamic locks, so that
lockdep is notified about the presence of a new lock.

(this is a fallout of the recent networking merge, which started using
the so-far unused seqlock_init() API.)

This fix solves the following lockdep-internal warning on current -git:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
     __lock_acquire+0x10c/0x9f9
     lock_acquire+0x56/0x72
     _spin_lock+0x35/0x42
     neigh_destroy+0x9d/0x12e
     neigh_periodic_timer+0x10a/0x15c
     run_timer_softirq+0x126/0x18e
     __do_softirq+0x6b/0xe6
     do_softirq+0x64/0xd2
     ksoftirqd+0x82/0x138

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-12 08:10:44 -08:00
Boaz Harrosh
2b02a17920 [PATCH] remove blk_queue_activity_fn
While working on bidi support at struct request level
I have found that blk_queue_activity_fn is actually never used.
The only user is in ide-probe.c with this code:

	/* enable led activity for disk drives only */
	if (drive->media == ide_disk && hwif->led_act)
		blk_queue_activity_fn(q, hwif->led_act, drive);

And led_act is never initialized anywhere.
(Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18)
Unless it is all for future use off course.
(this patch is against linux-2.6-block.git as off 2006/12/4)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-12 10:22:23 +01:00
Linus Torvalds
4259cb25d4 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  [NETPOLL]: Fix local_bh_enable() warning.
  [IPVS]: Make ip_vs_sync.c <= 80col wide.
  [IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep()
  [HAMRADIO]: Fix baycom_epp.c compile failure.
  [DCCP]: Whitespace cleanups
  [DCCP] ccid3: Fixup some type conversions related to rtts
  [DCCP] ccid3: BUG-FIX - conversion errors
  [DCCP] ccid3: Reorder packet history source file
  [DCCP] ccid3: Reorder packet history header file
  [DCCP] ccid3: Make debug output consistent
  [DCCP] ccid3: Perform history operations only after packet has been sent
  [DCCP] ccid3: TX history - remove unused field
  [DCCP] ccid3: Shift window counter computation
  [DCCP] ccid3: Sanity-check RTT samples
  [DCCP] ccid3: Initialise RTT values
  [DCCP] ccid: Deprecate ccid_hc_tx_insert_options
  [DCCP]: Warn when discarding packet due to internal errors
  [DCCP]: Only deliver to the CCID rx side in charge
  [DCCP]: Simplify TFRC calculation
  [DCCP]: Debug timeval operations
  ...
2006-12-11 18:35:17 -08:00
Linus Torvalds
cd39301a68 Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32
* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
  [AVR32] Add missing #include <linux/param.h> to delay.c
  [AVR32] Pass dev parameter to dma_cache_sync()
  [AVR32] Implement intc_get_pending()
  [AVR32] Don't include <asm/delay.h>
  [AVR32] Put the chip in "stop" mode when halting the system
  [AVR32] Set flow handler for external interrupts
  [AVR32] Remove unused file
  [AVR32] Remove mii_phy_addr and eth_addr from eth_platform_data
  [AVR32] Move ethernet tag parsing to board-specific code
  [AVR32] Add macb1 platform_device
  [AVR32] Portmux API update
2006-12-11 18:28:59 -08:00
Linus Torvalds
13d7d84e07 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits)
  [POWERPC] Generic BUG for powerpc
  [PPC] Fix compile failure do to introduction of PHY_POLL
  [POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
  [POWERPC] Remove old dcr.S
  [POWERPC] Fix SPU coredump code for max_fdset removal
  [POWERPC] Fix irq routing on some 32-bit PowerMacs
  [POWERPC] ps3: Add vuart support
  [POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
  [POWERPC] dont allow pSeries_probe to succeed without initialising MMU
  [POWERPC] micro optimise pSeries_probe
  [POWERPC] Add SPURR SPR to sysfs
  [POWERPC] Add DSCR SPR to sysfs
  [POWERPC] Fix 440SPe CPU table entry
  [POWERPC] Add support for FP emulation for the e300c2 core
  [POWERPC] of_device_register: propagate device_create_file return code
  [POWERPC] Fix mmap of PCI resource with hack for X
  [POWERPC] iSeries: head_64.o needs to depend on lparmap.s
  [POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group
  [POWERPC] Remove QE header files from lite5200.c
  [POWERPC] of_platform_make_bus_id(): make `magic' int
  ...
2006-12-11 18:24:58 -08:00
Ralf Baechle
8b2f35504d [MIPS] IP27: Don't drag <asm/sn/arch.h> into topology.h.
Another way that old SGI types were getting dragged into generic code.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-12 01:46:24 +00:00
Ralf Baechle
2bbc5bdfb1 [MIPS] IP27: Move definition of nic_t to its sole user.
This also fixes the duplicate definition of nic_t in the s2io driver.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-12 01:46:24 +00:00
Ralf Baechle
2f3643aecd [MIPS] IP27: Don't include <asm/sn/arch.h>.
Nothing <asm/sn/arch.h> defines is used.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-12 01:46:24 +00:00
Ralf Baechle
b723782587 [MIPS] compat.h uses struct pt_regs so needs to include ptrace.h.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-12 01:46:24 +00:00
Paul Mundt
41504c3972 sh: SH-MobileR SH7722 CPU support.
This adds CPU support for the SH7722.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:09 +09:00
Paul Mundt
5432143464 sh: Fixup dma_cache_sync() callers.
This now takes a struct device, update all of the callers.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:09 +09:00
Paul Mundt
dc34d312c7 sh: BUG() handling through trapa vector.
Previously we haven't been doing anything with verbose BUG() reporting,
and we've been relying on the oops path for handling BUG()'s, which is
rather sub-optimal.

This switches BUG handling to use a fixed trapa vector (#0x3e) where we
construct a small bug frame post trapa instruction to get the context
right. This also makes it trivial to wire up a DIE_BUG for the atomic
die chain, which we couldn't really do before.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:08 +09:00
Paul Mundt
ec723fbe7e sh: Split out atomic ops logically.
We have a few different ways to do the atomic operations, so split
them out in to different headers rather than bloating atomic.h.
Kernelspace gUSA will take this up to a third implementation.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:08 +09:00
Paul Mundt
b6250e3729 sh: landisk board build fixes.
Get the landisk board building again..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:07 +09:00
Paul Mundt
fce3a24e70 sh: push-switch fixups for work_struct API damage.
INIT_WORK() dropped the data arg, so now we have to stash an extra
pointer and backpedal instead.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:07 +09:00
Paul Mundt
b482ad5dae sh: Shut up csum_ipv6_magic() warnings.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:07 +09:00
Paul Mundt
b9b382dabb sh: Reworked swap cache entry encoding for SH-X2 MMU.
In the 64-bit PTE case there's no point in restricting the encoding
to the low bits of the PTE, we can instead bump all of this up to
the high 32 bits and extend PTE_FILE_MAX_BITS to 32, adopting the
same convention used by x86 PAE.

There's a minor discrepency between the number of bits used for the
swap type encoding between 32 and 64-bit PTEs, but this is unlikely
to cause any problem given the extended offset.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-12 08:42:06 +09:00
Ralf Baechle
f654c854d1 [HAMRADIO]: Fix baycom_epp.c compile failure.
Fix foobar in 15b1c0e822 and
e8cc49bb0f patch series.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:35:01 -08:00
Arnaldo Carvalho de Melo
8109b02b53 [DCCP]: Whitespace cleanups
That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:35:00 -08:00
Gerrit Renker
1a21e49a8d [DCCP] ccid3: Finer-grained resolution of sending rates
This patch
 * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
 * simplifies the present overflow problems in calculations

Current sending rate X and the cached value X_recv of the receiver-estimated
sending rate are both scaled by 64 (2^6) in order to
 * cope with low sending rates (minimally 1 byte/second)
 * allow upgrading to use a packets-per-second implementation of CCID 3
 * avoid calculation errors due to integer arithmetic cut-off

The patch implements a revised strategy from
http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html

The only difference with regard to that strategy is that t_ipi is already
used in the calculation of the nofeedback timeout, which saves one division.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:42 -08:00
David Howells
022416967a [PATCH] LOG2: Make powerpc's __ilog2_u64() take a 64-bit argument
Make powerpc's __ilog2_u64() take a 64-bit argument.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 12:29:27 -08:00
Linus Torvalds
8d610dd52d Make sure we populate the initroot filesystem late enough
We should not initialize rootfs before all the core initializers have
run.  So do it as a separate stage just before starting the regular
driver initializers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 12:12:04 -08:00
Linus Torvalds
8993780a6e Make SLES9 "get_kernel_version" work on the kernel binary again
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary.  It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)

That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba:
"[PATCH] Fix linux banner utsname information").

This just restores "linux_banner" as a static string, which should fix
the version finding.  And /proc/version simply uses a different string.

To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.

Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 11:34:11 -08:00
Jeremy Fitzhardinge
73c9ceab40 [POWERPC] Generic BUG for powerpc
This makes powerpc use the generic BUG machinery.  The biggest reports the
function name, since it is redundant with kallsyms, and not needed in general.

There is an overall reduction of code, since module_32/64 duplicated several
functions.

Unfortunately there's no way to tell gcc that BUG won't return, so the BUG
macro includes a goto loop.  This will generate a real jmp instruction, which
is never used.

[akpm@osdl.org: build fix]
[paulus@samba.org: remove infinite loop in BUG_ON]
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-11 16:35:07 +11:00
Paul Mackerras
973c1fabc7 Merge branch 'for_paulus' of master.kernel.org:/pub/scm/linux/kernel/git/galak/powerpc 2006-12-11 16:31:42 +11:00
Kumar Gala
d10f73480b [PPC] Fix compile failure do to introduction of PHY_POLL
PHY_POLL is defined in <linux/phy.h> include it in <linux/fsl_devices.h> so
board code will have it defined.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2006-12-10 23:26:16 -06:00
Kumar Gala
c86c676cca Merge branch '85xx' into for_paulus 2006-12-10 23:16:08 -06:00
Kumar Gala
45d8e7aaf4 [POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
On 85xx we don't build in dcr support because the core doesn't implement the
instructions.  This caused problems when building an 85xx kernel.  Additionally
made it so we only build __mtdcr/__mfdcr if we are CONFIG_PPC_DCR_NATIVE.

The 85xx build issue wasPointed out by Dai Haruki.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2006-12-10 23:15:47 -06:00
Ralf Baechle
2d911e9a4e [MIPS] Move die and die_if_kernel() from system.h to ptrace.h
This eleminates the need to include ptrace.h into system.h and fixes a
harmless namespace conflict on the PC symbol in bpck.c.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-10 21:52:11 +00:00
Ralf Baechle
5b1d221e62 [MIPS] Fix build of several IDE drivers by providing pci_get_legacy_ide_irq
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-10 21:52:11 +00:00
Linus Torvalds
edb16bec41 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Fix several kprobes bugs.
  [SPARC64]: Update defconfig.
  [SPARC64]: dma remove extra brackets
  [SPARC{32,64}]: Propagate ptrace_traceme() return value.
  [SPARC64]: Replace kmalloc+memset with kzalloc
  [SPARC]: Check kzalloc() return value in SUN4D irq/iommu init.
  [SPARC]: Replace kmalloc+memset with kzalloc
  [SPARC64]: Run ctrl-alt-del action for sun4v powerdown request.
  [SPARC64]: Unaligned accesses to userspace are hard errors.
  [SPARC64]: Call do_mathemu on illegal instruction traps too.
  [SPARC64]: Update defconfig.
  [SPARC64]: Add irqtrace/stacktrace/lockdep support.
2006-12-10 10:00:00 -08:00
Linus Torvalds
bb7320d1d9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits)
  V4L/DVB 4949b: Fix container_of pointer retreival
  V4L/DVB (4949a): Fix INIT_WORK
  V4L/DVB (4949): Cxusb: codingstyle cleanups
  V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach
  V4L/DVB (4947): Cx88: trivial cleanups
  V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area
  V4L/DVB (4945): Cx88: consolidate cx22702_config structs
  V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach
  V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners
  V4L/DVB (4953): Usbvision minor fixes
  V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP
  V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations
  V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting
  V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach
  V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations
  V4L/DVB (4942): Whitespace cleanups
  V4L/DVB (4937): Usbvision cleanup and code reorganization
  V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal
  V4L/DVB (4935): Added the capability of selecting fm gain by tuner
  V4L/DVB (4934): Usbvision radio requires GainNormal at e register
  ...
2006-12-10 09:59:18 -08:00
Avi Kivity
6aa8b732ca [PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture.  The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace.  Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process.  kill(1), nice(1), top(1) work as expected.  In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode.  Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm).  Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests.  All combinations are
allowed except x86_64 guest on i386 host.  For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported.  At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch.  We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU.  Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization.  Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC.  We are working on a fix.  A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work.  That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Daniel Walker
f5f1a24a2c [PATCH] clocksource: small cleanup
Mostly changing alignment.  Just some general cleanup.

[akpm@osdl.org: build fix]
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Arjan van de Ven
4c36a5dec2 [PATCH] round_jiffies infrastructure
Introduce a round_jiffies() function as well as a round_jiffies_relative()
function.  These functions round a jiffies value to the next whole second.
The primary purpose of this rounding is to cause all "we don't care exactly
when" timers to happen at the same jiffy.

This avoids multiple timers firing within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep sleep CPU
sleep states and thus waste power.

The exact wakeup moment is skewed by the cpu number, to avoid all cpus from
waking up at the exact same time (and hitting the same lock/cachelines
there)

[akpm@osdl.org: fix variable type]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
5466b456ed [PATCH] fdtable: Implement new pagesize-based fdtable allocator
This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries.  The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

    pagesize / 4
    pagesize / 2
    pagesize      <- memory allocator switch point
    pagesize * 2
    pagesize * 4
    ...etc...

Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.

Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop.  Even together, they will still be, by far, smaller
than the fdarray.  The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
4fd45812cb [PATCH] fdtable: Remove the free_files field
An fdtable can either be embedded inside a files_struct or standalone (after
being expanded).  When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded.  We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Vadim Lobanov
bbea9f6966 [PATCH] fdtable: Make fdarray and fdsets equal in size
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00
Raz Ben-Jehuda(caro)
46031f9a38 [PATCH] md: allow reads that have bypassed the cache to be retried on failure
If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery precedures.

update 1:

From: NeilBrown <neilb@suse.de>

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev to so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown <neilb@suse.de>

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bioes-that-still-need-to-be-retried.
   When find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:20 -08:00
Alan Cox
ee2f344b33 [PATCH] ide-cd: Handle strange interrupt on the Intel ESB2
The ESB2 appears to emit spurious DMA interrupts when configured for native
mode and handling ATAPI devices.  Stratus were able to pin this bug down and
produce a patch.  This is a rework which applies the fixup only to the ESB2
(for now).  We can apply it to other chips later if the same problem is found.

This code has been tested and confirmed to fix the problem on the tested
systems.

Signed-off-by: Alan Cox <alan@redhat.com>
(Most of the hard work done by Stratus however)
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:20 -08:00
Chen, Kenneth W
06066714f6 [PATCH] sched: remove lb_stopbalance counter
Remove scheduler stats lb_stopbalance counter.  This counter can be
calculated by: lb_balanced - lb_nobusyg - lb_nobusyq.  There is no need to
create gazillion counters while we can derive the value.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Siddha, Suresh B
783609c6cb [PATCH] sched: decrease number of load balances
Currently at a particular domain, each cpu in the sched group will do a
load balance at the frequency of balance_interval.  More the cores and
threads, more the cpus will be in each sched group at SMP and NUMA domain.
And we endup spending quite a bit of time doing load balancing in those
domains.

Fix this by making only one cpu(first idle cpu or first cpu in the group if
all the cpus are busy) in the sched group do the load balance at that
particular sched domain and this load will slowly percolate down to the
other cpus with in that group(when they do load balancing at lower
domains).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Christoph Lameter
08c183f31b [PATCH] sched: add option to serialize load balancing
Large sched domains can be very expensive to scan.  Add an option SD_SERIALIZE
to the sched domain flags.  If that flag is set then we make sure that no
other such domain is being balanced.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:43 -08:00
Christoph Lameter
c9819f4593 [PATCH] sched: use softirq for load balancing
Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.

We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:42 -08:00
Siddha, Suresh B
ac7d550499 [PATCH] sched domain: increase the SMT busy rebalance interval
With SMT, if the logical processor is busy, load balance happens for every
8msec(min)-16msec(max).  There is no need to do this often, as this is just
for fairness(to maintain uniform runqueue lengths) and default time slice
anyhow is 100msec.

Appended patch increases this interval to 64msec(min)-128msec(max) when the
logical processor is busy.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:42 -08:00
Andrew Morton
4a7864ca63 [PATCH] io-accounting: via taskstats
Deliver IO accounting via taskstats.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andrew Morton
f2f1f8a3b8 [PATCH] cleanup taskstats.h
Fix weird whitespace mangling in taskstats.h

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andrew Morton
7c3ab7381e [PATCH] io-accounting: core statistics
The present per-task IO accounting isn't very useful.  It simply counts the
number of bytes passed into read() and write().  So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.

(David Wright had some comments on the applicability of the present logical IO accounting:

  For billing purposes it is useless but for workload analysis it is very
  useful

  read_bytes/read_calls  average read request size
  write_bytes/write_calls average write request size

  read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
  write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                be missed

  I often look for logical larger than physical to see filesystem cache
  problems.  And the bytes/cpusec can help find applications that are
  dominating the cache and causing slow interactive response from page cache
  contention.

  I want to find the IO intensive applications and make sure they are doing
  efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
  IO commands).

This patchset adds new accounting which tries to be more accurate.  We account
for three things:

reads:

  attempt to count the number of bytes which this process really did cause
  to be fetched from the storage layer.  Done at the submit_bio() level, so it
  is accurate for block-backed filesystems.  I also attempt to wire up NFS and
  CIFS.

writes:

  attempt to count the number of bytes which this process caused to be sent
  to the storage layer.  This is done at page-dirtying time.

  The big inaccuracy here is truncate.  If a process writes 1MB to a file
  and then deletes the file, it will in fact perform no writeout.  But it will
  have been accounted as having caused 1MB of write.

  So...

cancelled_writes:

  account the number of bytes which this process caused to not happen, by
  truncating pagecache.

  We _could_ just subtract this from the process's `write' accounting.  But
  that means that some processes would be reported to have done negative
  amounts of write IO, which is silly.

  So we just report the raw number and punt this decision up to userspace.

Now, we _could_ account for writes at the physical I/O level.  But

- This would require that we track memory-dirtying tasks at the per-page
  level (would require a new pointer in struct page).

- It would mean that IO statistics for a process are usually only available
  long after that process has exitted.  Which means that we probably cannot
  communicate this info via taskstats.

This patch:

Wire up the kernel-private data structures and the accessor functions to
manipulate them.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00