In the NUMA or memory hot-add case where system memory has been
partitioned up, we immediately run in to a situation where the existing
PMB entry doesn't cover the new range (primarily as a result of the entry
size being shrunk to match the node size early in the initialization). In
order to fix this up it's necessary to preload a PMB mapping for the new
range prior to activation in order to circumvent reset by MMU.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The minimum section size for the PMB is 16M, so just always error
out early if the specified size is too small. This permits us to
unconditionally call in to pmb_bolt_mapping() with variable sizes
without wasting a TLB and cache flush for the range.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This reworks much of the bootmem setup and initialization code allowing
us to get rid of duplicate work between the NUMA and non-NUMA cases. The
end result is that we end up with a much more flexible interface for
supporting more complex topologies (fake NUMA, highmem, etc, etc.) which
is entirely LMB backed. This is an incremental step for more NUMA work as
well as gradually enabling migration off of bootmem entirely.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This reworks the memory limit handling to tie in through the available
LMB infrastructure. This requires a bit of reordering as we need to have
all of the LMB reservations taken care of prior to establishing the
limits.
While we're at it, the crash kernel reservation semantics are reworked
so that we allocate from the bottom up and reduce the risk of having
to disable the memory limit due to a clash with the crash kernel
reservation.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This plugs in a memory init callback in the machvec to permit boards to
wire up various bits of memory directly in to LMB. A generic machvec
implementation is provided that simply wraps around the normal
Kconfig-derived memory start/size.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This bumps up the extra LMB reservations in ordering so that they're
accounted for prior to iterating over the region list. This ensures that
reservations are visible both within the LMB and bootmem context.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Various boot loaders go to various extents to thwart the initrd detection
logic (mostly on account of not being able to be bothered with adhering
to the established boot ABI), so we make the detection logic a bit more
robust. This makes it possible to work around the SDK7786's firmware's
attempts to thwart compressed image booting. Victory is mine.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
arch/sh/kernel/smp.c:164: error: conflicting types for 'native_cpu_disable'
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/smp.h:48: error: previous declaration of 'native_cpu_disable' was here
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add CONFIG_VIRTUALIZATION to the SH architecture
and include the virtio code there. Used to enable
the virtio drivers under QEMU.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The UP dependency was inherited from ARM, which seems to have run in to
it due to the stacktrace code not being available for SMP in certain
cases, as we don't have this particular limitation there is no specific
need to block on the SMP dependency.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This wires up CPU hotplug for SH-X3 SMP CPUs. Presently only secondary
cores can be hotplugged given that the boot CPU has to contend with the
broadcast timer. When real local timers are implemented this restriction
can be lifted.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This provides a cache of the secondary CPUs idle loop for the cases where
hotplug simply enters a low power state instead of resetting or powering
off the core.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
smp_store_cpu_info() is presently flagged as __init, but is called by
start_secondary() which is __cpuinit, fix it up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This iterates over the maximum number of CPUs we plan to support and
makes sure they're all set in the present CPU map.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This converts from cpu_set() for the online map to set_cpu_online().
The two online map modifiers were the last remaining manual map
manipulation bits, with this in place everything now goes through
cpumask.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
With the platform ops migration, the definitions still need to be
included in the CONFIG_SMP=n case, so make the asm/smp.h include
explicit.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When registering CPUs in the topology initialization ensure that all of
the present CPUs are flagged as hotpluggable.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Now that compressed image loading is possible for sdk7786, drop the
vmlinux.bin default image target and update the defconfig accordingly.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
If channel allocation is failing, mark the channel unused and give PM a chance
to power down the hardware.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch updates the sh7751 pci code to handle io ports
correctly. The code is based on the sh7788x implementation.
Tested on a R2D-1 board with CONFIG_8139TOO_PIO=y.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Update the R2D defconfigs to bump up the maximum
number of SCIF ports on the system.
Fixes a broken serial console regression added
by cd5f107628ab89c5dec5ad923f1c27f4cba41972.
Reported-by: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Alexandre Courbot <alex@dcl.info.waseda.ac.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
CHCR_TS_HIGH_SHIFT is defined as a shift of TS high bits in CHCR register,
relative to low bits. The TS_INDEX2VAL() macro has to take this into account.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mfleming/sh-2.6:
sh: Use correct mask when comparing PMB DATA array values
sh: Do not try merging two 128MB PMB mappings
sh: Fix zImage load address when CONFIG_32BIT=y
sh: Fix address to decompress at when CONFIG_32BIT=y
sh: Assembly friendly __pa and __va definitions
Lists of DMA channels and slaves are not changed, make them constant. Besides,
SH7724 channel and slave configuration of both DMA controllers is identical,
remove the extra copy of the configuration data.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.
Cc: linux-sh@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Commit fda48a0d7a84 (tcp: bind() fix when many ports are bound)
introduced a bug on IPV6 part.
We should not call ipv6_addr_any(inet6_rcv_saddr(sk2)) but
ipv6_addr_any(inet6_rcv_saddr(sk)) because sk2 can be IPV4, while sk is
IPV6.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we were masking the PMB DATA array values with the value of
__MEMORY_START | PMB_V, which misses some PFN bits off the mask.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
There is a logic error in pmb_merge() that means we will incorrectly try
to merge two 128MB PMB mappings into one mapping. However, 256MB isn't a
valid PMB map size and pmb_merge() will actually drop the second 128MB
mapping.
This patch allows my SDK7786 board to boot when configured with
CONFIG_MEMORY_SIZE=0x10000000.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
We can't necessarily use the P1SEG region to access RAM when running in
32BIT mode, so use CONFIG_MEMORY_START as the base address.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
When running in 32BIT mode the P1SEG region doesn't necessarily provide
a window onto RAM (it depends how the bootloader setup the PMB). The
correct location to place the decompressed kernel is the physical
address of _text.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
This patch defines ___pa and ___va which return the physical and virtual
address of an address, respectively. These macros are suitable for
calling from assembly because they don't include the C casts required by
__pa and __va.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Issue the discard operation *before* releasing the blocks to be reused
ext4: Fix buffer head leaks after calls to ext4_get_inode_loc()
ext4: Fix possible lost inode write in no journal mode
Nothing stops the workqueue being left to run in parallel with close or a
few other operations. This causes double unmaps and the like.
See kerneloops.org #1041230 for an example
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] use max load in conservative governor
[CPUFREQ] fix a lockdep warning
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
gianfar: Fix potential oops during OF address translation
fsl_pq_mdio: Fix kernel oops during OF address translation
tcp: bind() fix when many ports are bound
rdma: potential ERR_PTR dereference
rtnetlink: potential ERR_PTR dereference
net: ipv6 bind to device issue
ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU
drivers/net/usb: Add new driver ipheth
cxgb3: fix linkup issue
X25 fix dead unaccepted sockets
KS8851: NULL pointer dereference if list is empty
net: 3c574_cs fix stats.tx_bytes counter
xfrm6: ensure to use the same dev when building a bundle
can: Fix possible NULL pointer dereference in ems_usb.c
net: Fix an RCU warning in dev_pick_tx()
ipv6: Fix tcp_v6_send_response transport header setting.
bridge: add a missing ntohs()
8139too: Fix a typo in the function name.
mac80211: pass HT changes to driver when off channel
mac80211: remove bogus TX agg state assignment
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: Ensure we re-enable devices on resume
x86/PCI: parse additional host bridge window resource types
PCI: revert broken device warning
PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
eeepc-laptop: add missing sparse_keymap_free
eeepc-wmi: Build fix
asus: don't modify bluetooth/wlan on boot
dell-wmi: Fix memory leak
eeepc-wmi: add backlight support
eeepc-wmi: use a platform device as parent device of all sub-devices
eeepc-wmi: add an eeepc_wmi context structure
The unpack routine fails to handle the decompress_method() returning
unrecognised decompressor (compress_name == NULL). This results in the
routine looping eventually oopsing on an out of bounds memory access.
Note this bug is usually hidden, only triggering on trailing junk after
one or more correct compressed blocks. The case of the compressed archive
being complete junk is (by accident?) caught by the if (state != Reset)
check because state is initialised to Start, but not updated due to the
decompressor not having been called. Obviously if the junk is trailing a
correctly decompressed buffer, state == Reset from the previous call to
the decompressor.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The follow_page() function can potentially return -EFAULT so I added
checks for this.
Also I silenced an uninitialized variable warning on my version of gcc
(version 4.3.2).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a standalone version of VMware Balloon driver. Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation). In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor. Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.
We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.
There were also some concerns whether current ballooning technique is the
right thing. If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.
We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.
The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline. The driver will only activate
if host is VMware so everyone else should not be affected at all.
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are seeing a large regression in database performance on recent
kernels. The database opens a block device with O_DIRECT|O_SYNC and a
number of threads write to different regions of the file at the same time.
A simple test case is below. I haven't defined DEVICE since getting it
wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we
see about 17MB/sec and only a few threads in IO wait:
procs -----io---- -system-- -----cpu------
r b bi bo in cs us sy id wa st
0 3 0 16170 656 2259 0 0 86 14 0
0 2 0 16704 695 2408 0 0 92 8 0
0 2 0 17308 744 2653 0 0 86 14 0
0 2 0 17933 759 2777 0 0 89 10 0
Most threads are blocking in vfs_fsync_range, which has:
mutex_lock(&mapping->host->i_mutex);
err = fop->fsync(file, dentry, datasync);
if (!ret)
ret = err;
mutex_unlock(&mapping->host->i_mutex);
commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
some explanation of what is going on:
Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
Thanks Jan for such a good commit message! As well as dropping i_mutex,
Christoph suggests we should remove the call to sync_blockdev():
> sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
> the block device inode, which is exactly what we did just before calling
> into ->fsync
The patch below incorporates both suggestions. With it the testcase improves
from 17MB/s to 68M/sec:
procs -----io---- -system-- -----cpu------
r b bi bo in cs us sy id wa st
0 7 0 65536 1000 3878 0 0 70 30 0
0 34 0 69632 1016 3921 0 1 46 53 0
0 57 0 69632 1000 3921 0 0 55 45 0
0 53 0 69640 754 4111 0 0 81 19 0
Testcase:
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define NR_THREADS 64
#define BUFSIZE (64 * 1024)
#define DEVICE "/dev/mapper/XXXXXX"
#define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
static int fd;
static void *doit(void *arg)
{
unsigned long offset = (long)arg;
char *b, *buf;
b = malloc(BUFSIZE + 1024);
buf = (char *)ALIGN((unsigned long)b, 1024);
memset(buf, 0, BUFSIZE);
while (1)
pwrite(fd, buf, BUFSIZE, offset);
}
int main(int argc, char *argv[])
{
int flags = O_RDWR|O_DIRECT;
int i;
unsigned long offset = 0;
if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
flags |= O_SYNC;
fd = open(DEVICE, flags);
if (fd == -1) {
perror("open");
exit(1);
}
for (i = 0; i < NR_THREADS-1; i++) {
pthread_t tid;
pthread_create(&tid, NULL, doit, (void *)offset);
offset += BUFSIZE;
}
doit((void *)offset);
return 0;
}
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>