725 Commits

Author SHA1 Message Date
Konrad Rzeszutek Wilk
63a741757d xen/swiotlb: Use page alignment for early buffer allocation.
This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:

ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO

The issue was the Xen-SWIOTLB was allocated such as that the end of
buffer was stradling a page (and also above 4GB). The fix was
spotted by Kalev Leonid  which was to piggyback on git commit
e79f86b2ef9c0a8c47225217c1018b7d3d90101c "swiotlb: Use page alignment
for early buffer allocation" which:

	We could call free_bootmem_late() if swiotlb is not used, and
	it will shrink to page alignment.

	So alloc them with page alignment at first, to avoid lose two pages

And doing that fixes the outstanding issue.

CC: stable@kernel.org
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-12-15 11:28:46 -05:00
Kay Sievers
0706802183 xen-balloon: convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14 15:32:50 -08:00
Konrad Rzeszutek Wilk
f21ffe9f6d swiotlb: Expose swiotlb_nr_tlb function to modules
As a mechanism to detect whether SWIOTLB is enabled or not.
We also fix the spelling - it was swioltb instead of
swiotlb.

CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-12-06 10:38:03 +00:00
Annie Li
c123799a41 xen/granttable: Keep code format clean
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-22 09:26:20 -05:00
Annie Li
85ff6acb07 xen/granttable: Grant tables V2 implementation
Receiver-side copying of packets is based on this implementation, it gives
better performance and better CPU accounting. It totally supports three types:
full-page, sub-page and transitive grants.

However this patch does not cover sub-page and transitive grants, it mainly
focus on Full-page part and implements grant table V2 interfaces corresponding
to what already exists in grant table V1, such as: grant table V2
initialization, mapping, releasing and exported interfaces.

Each guest can only supports one type of grant table type, every entry in grant
table should be the same version. It is necessary to set V1 or V2 version before
initializing the grant table.

Grant table exported interfaces of V2 are same with those of V1, Xen is
responsible to judge what grant table version guests are using in every grant
operation.

V2 fulfills the same role of V1, and it is totally backwards compitable with V1.
If dom0 support grant table V2, the guests runing on it can run with either V1
or V2.

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
[v1: Modified alloc_vm_area call (new parameters), indentation, and cleanpatch
     warnings]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-22 09:24:51 -05:00
Annie Li
b1e495b2fa xen/granttable: Refactor some code
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-22 09:24:00 -05:00
Annie Li
0f9f5a9588 xen/granttable: Introducing grant table V2 stucture
This patch introduces new structures of grant table V2, grant table V2 is an
extension from V1. Grant table is shared between guest and Xen, and Xen is
responsible to do corresponding work for grant operations, such as: figure
out guest's grant table version, perform different actions based on
different grant table version, etc. Although full-page structure of V2
is different from V1, it play the same role as V1.

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-22 09:23:44 -05:00
Daniel De Graaf
420eb554d5 xen/event: Add reference counting to event channels
Event channels exposed to userspace by the evtchn module may be used by
other modules in an asynchronous manner, which requires that reference
counting be used to prevent the event channel from being closed before
the signals are delivered.

The reference count on new event channels defaults to -1 which indicates
the event channel is not referenced outside the kernel; evtchn_get fails
if called on such an event channel. The event channels made visible to
userspace by evtchn have a normal reference count.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-21 17:14:48 -05:00
Daniel De Graaf
0cc678f850 xen/gnt{dev,alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for
notification needs to be reserved to avoid it being deallocated prior to
sending the notification.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-21 17:14:48 -05:00
Daniel De Graaf
8ca19a8937 xen/gntalloc: Change gref_lock to a mutex
The event channel release function cannot be called under a spinlock
because it can attempt to acquire a mutex due to the event channel
reference acquired when setting up unmap notifications.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-21 17:14:47 -05:00
Dan Carpenter
99cb2ddcc6 xen-gntalloc: signedness bug in add_grefs()
gref->gref_id is unsigned so the error handling didn't work.
gnttab_grant_foreign_access() returns an int type, so we can add a
cast here, and it doesn't cause any problems.
gnttab_grant_foreign_access() can return a variety of errors
including -ENOSPC, -ENOSYS and -ENOMEM.

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:48 -05:00
Dan Carpenter
21643e69a4 xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
On 32 bit systems a high value of op.count could lead to an integer
overflow in the kzalloc() and gref_ids would be smaller than
expected.  If the you triggered another integer overflow in
"if (gref_size + op.count > limit)" then you'd probably get memory
corruption inside add_grefs().

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:47 -05:00
Dan Carpenter
fc6e0c3b90 xen-gntdev: integer overflow in gntdev_alloc_map()
The multiplications here can overflow resulting in smaller buffer
sizes than expected.  "count" comes from a copy_from_user().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:46 -05:00
Daniel De Graaf
72e9cf2ab1 xen/balloon: Avoid OOM when requesting highmem
If highmem pages are requested from the balloon on a system without
highmem, the implementation of alloc_xenballooned_pages will allocate
all available memory trying to find highmem pages to return. Allow
low memory to be returned when highmem pages are requested to avoid
this loop.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:43 -05:00
David Vrabel
cd12909cb5 xen: map foreign pages for shared rings by updating the PTEs directly
When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
2011-11-16 12:13:08 -05:00
Linus Torvalds
daedd8708f Merge branch 'stable/cleanups-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/cleanups-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: use static initializers in xen-balloon.c
  Xen: fix braces and tabs coding style issue in xenbus_probe.c
  Xen: fix braces coding style issue in xenbus_probe.h
  Xen: fix whitespaces,tabs coding style issue in drivers/xen/pci.c
  Xen: fix braces coding style issue in gntdev.c and grant-table.c
  Xen: fix whitespaces,tabs coding style issue in drivers/xen/events.c
  Xen: fix whitespaces,tabs coding style issue in drivers/xen/balloon.c

Fix up trivial whitespace-conflicts in
 drivers/xen/{balloon.c,pci.c,xenbus/xenbus_probe.c}
2011-11-06 20:13:34 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
06d381484f Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()
2011-11-06 18:31:36 -08:00
Matthew Wilcox
0b934ccd70 Xen: Export xen_biovec_phys_mergeable
When Xen is enabled, using BIOVEC_PHYS_MERGEABLE in a module
causes xen_biovec_phys_mergeable to be referenced, so it needs
to be exported.

Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:41:27 -04:00
Paul Gortmaker
63c9744b9a xen: Add export.h for THIS_MODULE/EXPORT_SYMBOL to various xen users.
Things like THIS_MODULE and EXPORT_SYMBOL were simply everywhere
because module.h was also everywhere.  But we are fixing the latter.
So we need to call out the real users in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:11 -04:00
Paul Gortmaker
72ee5112a0 xen: Add module.h to modular drivers/xen users.
Previously these drivers just got module.h implicitly, but we
are cleaning that up and it will be no longer.  Call out the
real users of it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:11 -04:00
Linus Torvalds
1f6e05171b Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
  genirq: Fix fatfinered fixup really
  genirq: percpu: allow interrupt type to be set at enable time
  genirq: Add support for per-cpu dev_id interrupts
  genirq: Add IRQCHIP_SKIP_SET_WAKE flag
2011-10-26 16:44:09 +02:00
Linus Torvalds
8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Linus Torvalds
04a8752485 Merge branches 'stable/drivers-3.2', 'stable/drivers.bugfixes-3.2' and 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xenbus: don't rely on xen_initial_domain to detect local xenstore
  xenbus: Fix loopback event channel assuming domain 0
  xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
  xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
  xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
  xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive

* 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Check if the device is found instead of blindly assuming so.
  xen/pciback: Do not dereference psdev during printk when it is NULL.
  xen: remove XEN_PLATFORM_PCI config option
  xen: XEN_PVHVM depends on PCI
  xen/pciback: double lock typo
  xen/pciback: use mutex rather than spinlock in vpci backend
  xen/pciback: Use mutexes when working with Xenbus state transitions.
  xen/pciback: miscellaneous adjustments
  xen/pciback: use mutex rather than spinlock in passthrough backend
  xen/pciback: use resource_size()

* 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pci: support multi-segment systems
  xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
  xen/pci: make bus notifier handler return sane values
  xen-swiotlb: fix printk and panic args
  xen-swiotlb: Fix wrong panic.
  xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
  xen-pcifront: Update warning comment to use 'e820_host' option.
2011-10-25 09:19:36 +02:00
Linus Torvalds
31018acd4c Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m/debugfs: Make type_name more obvious.
  xen/p2m/debugfs: Fix potential pointer exception.
  xen/enlighten: Fix compile warnings and set cx to known value.
  xen/xenbus: Remove the unnecessary check.
  xen/irq: If we fail during msi_capability_init return proper error code.
  xen/events: Don't check the info for NULL as it is already done.
  xen/events: BUG() when we can't allocate our event->irq array.

* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix selfballooning and ensure it doesn't go too far
  xen/gntdev: Fix sleep-inside-spinlock
  xen: modify kernel mappings corresponding to granted pages
  xen: add an "highmem" parameter to alloc_xenballooned_pages
  xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
  xen/p2m: Make debug/xen/mmu/p2m visible again.
  Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
2011-10-25 09:17:47 +02:00
Konrad Rzeszutek Wilk
d98b15db37 xen/xenbus: Remove the unnecessary check.
.. we check whether 'xdev' is NULL - but there is no need for
it as the 'dev' check is done before. The 'dev' is embedded in
the 'xdev' so having xdev != NULL with dev being being checked
is not going to happen.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:29 -04:00
Konrad Rzeszutek Wilk
e6599225db xen/irq: If we fail during msi_capability_init return proper error code.
There are three different modes: PV, HVM, and initial domain 0. In all
the cases we would return -1 for failure instead of a proper error code.
Fix this by propagating the error code from the generic IRQ code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:28 -04:00
Konrad Rzeszutek Wilk
9bb9efe4ba xen/events: Don't check the info for NULL as it is already done.
The list operation checks whether the 'info' structure that is
retrieved from the list is NULL (otherwise it would not been able
to retrieve it). This check is not neccessary.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:26 -04:00
Konrad Rzeszutek Wilk
9d093e2958 xen/events: BUG() when we can't allocate our event->irq array.
In case we can't allocate we are doomed. We should BUG_ON
instead of trying to dereference it later on.

Acked-by: Ian Campbell <ian.campbell@citrix.com>
[v1: Use BUG_ON instead of BUG]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:25 -04:00
Konrad Rzeszutek Wilk
4645bf3067 xen/pciback: Check if the device is found instead of blindly assuming so.
Just in case it is not found, don't try to dereference it.

[v1: Added WARN_ON, suggested by Jan Beulich <JBeulich@suse.com>]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:01:10 -04:00
Konrad Rzeszutek Wilk
72bf809a19 xen/pciback: Do not dereference psdev during printk when it is NULL.
.. instead use BUG_ON() as all the callers of the kill_domain_by_device
check for psdev.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 16:58:17 -04:00
Ian Campbell
9bab0b7fba genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.

Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.

This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
Cc: stable@kernel.org (at least to 2.6.32.y)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-17 11:42:49 +02:00
Dan Magenheimer
38a1ed4f03 xen: Fix selfballooning and ensure it doesn't go too far
The balloon driver's "current_pages" is very different from
totalram_pages.  Self-ballooning needs to be driven by
the latter.  Also, Committed_AS doesn't account for pages
used by the kernel so:
1) Add totalreserve_pages to Committed_AS for the normal target.
2) Enforce a floor for when there are little or no user-space threads
   using memory (e.g. single-user mode) to avoid OOMs.  The floor
   function includes a "min_usable_mb" tuneable in case we discover
   later that the floor function is still too aggressive in some
   workloads, though likely it will not be needed.

Changes since version 4:
- change floor calculation so that it is not as aggressive; this version
  uses a piecewise linear function similar to minimum_target in the 2.6.18
  balloon driver, but modified to add to totalreserve_pages instead of
  subtract from max_pfn, the 2.6.18 version causes OOMs on recent kernels
  because the kernel has expanded over time
- change safety_margin to min_usable_mb and comment on its use
- since committed_as does NOT include kernel space (and other reserved
  pages), totalreserve_pages is now added to committed_as.  The result is
  less aggressive self-ballooning, but theoretically more appropriate.
Changes since version 3:
- missing include causes compile problem when CONFIG_FRONTSWAP is disabled
- add comments after includes
Changes since version 2:
- missing include causes compile problem only on 32-bit
Changes since version 1:
- tuneable safety margin added

[v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs]
[v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled]
[v3: guru.anbalagane@oracle.com: fix 32-bit compile]
[v2: konrad.wilk@oracle.com: make safety margin tuneable]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v1: Altered description and added an extra include]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 12:36:08 -04:00
Daniel De Graaf
1f1503ba09 xen/gntdev: Fix sleep-inside-spinlock
BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm
1 lock held by qemu-dm/3256:
 #0:  (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5
Pid: 3256, comm: qemu-dm Tainted: G        W   3.1.0-rc8+ #5
Call Trace:
 [<ffffffff81054594>] __might_sleep+0x131/0x135
 [<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45
 [<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1
 [<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb
 [<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a
 [<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5
 [<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5
 [<ffffffff81007d62>] ? check_events+0x12/0x20
 [<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7
 [<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4
 [<ffffffff8109168b>] ? lock_release+0x21c/0x229
 [<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32
 [<ffffffff81143452>] sys_ioctl+0x47/0x6a
 [<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b

gntdev_put_map tries to acquire a mutex when freeing pages back to the
xenballoon pool, so it cannot be called with a spinlock held. In
gntdev_release, the spinlock is not needed as we are freeing the
structure later; in the ioctl, only the list manipulation needs to be
under the lock.

Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 10:02:10 -04:00
Daniel De Graaf
e4184aaf3b xenbus: don't rely on xen_initial_domain to detect local xenstore
The xenstore daemon does not have to run in the xen initial domain;
however, Linux currently uses xen_initial_domain to test if a loopback
event channel should be used instead of the event channel provided in
Xen's start_info structure. Instead, if the event channel passed in the
start_info structure is not valid, assume that this domain will run
xenstored locally and set up the event channel.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 09:25:18 -04:00
Daniel De Graaf
77447991b6 xenbus: Fix loopback event channel assuming domain 0
The xenbus event channel established in xenbus_init is intended to be a
loopback channel, but the remote domain was hardcoded to 0; this will
cause the channel to be unusable when xenstore is not being run in
domain 0.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 09:25:17 -04:00
David Vrabel
4dcaebbf65 xen: use generic functions instead of xen_{alloc, free}_vm_area()
Replace calls to the Xen-specific xen_alloc_vm_area() and
xen_free_vm_area() functions with the generic equivalent
(alloc_vm_area() and free_vm_area()).

On x86, these were identical already.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 15:02:18 -04:00
David Vrabel
8b5d44a5ac xen: allow balloon driver to use more than one memory region
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory.  This will allow platforms to provide
(for example) a region of low memory and a region of high memory.

The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata.  This is safe
as both xen_memory_setup() and balloon_init() are in __init.

The balloon regions themselves are not altered (i.e., there is still
only the one region).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:10 -04:00
David Vrabel
b1cbf9b1d6 xen/balloon: simplify test for the end of usable RAM
When initializing the balloon only max_pfn needs to be checked
(max_pfn will always be <= e820_end_of_ram_pfn()) and improve the
confusing comment.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:09 -04:00
David Vrabel
aa24411b67 xen/balloon: account for pages released during memory setup
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen.  This reduces the domain's current page count in
the hypervisor.  The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this.  If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.

This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.

Fix this by accouting for the released pages when starting the balloon
driver.

If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 11:12:09 -04:00
Stefano Stabellini
5fbdc10395 xen: remove XEN_PLATFORM_PCI config option
Xen PVHVM needs xen-platform-pci, on the other hand xen-platform-pci is
useless in any other cases.
Therefore remove the XEN_PLATFORM_PCI config option and compile
xen-platform-pci built-in if XEN_PVHVM is selected.

Changes to v1:

- remove xen-platform-pci.o and just use platform-pci.o since it is not
externally visible anymore.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:52:16 -04:00
Dan Carpenter
e1db4cef89 xen/pciback: double lock typo
We called mutex_lock() twice instead of unlocking.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:50:26 -04:00
Stefano Stabellini
0930bba674 xen: modify kernel mappings corresponding to granted pages
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.

In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.

Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:32:58 -04:00
Stefano Stabellini
693394b8c3 xen: add an "highmem" parameter to alloc_xenballooned_pages
Add an highmem parameter to alloc_xenballooned_pages, to allow callers to
request lowmem or highmem pages.

Fix the code style of free_xenballooned_pages' prototype.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 09:56:52 -04:00
Konrad Rzeszutek Wilk
cf177fd049 xen/pciback: Add flag indicating device has been assigned by Xen
Device drivers that create and destroy SR-IOV virtual functions via
calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
failures if they attempt to destroy VFs while they are assigned to
guest virtual machines.  By adding a flag for use by the Xen PCI back
to indicate that a device is assigned a device driver can check that
flag and avoid destroying VFs while they are assigned and avoid system
failures.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27 00:48:34 -04:00
Konrad Rzeszutek Wilk
5b25d89e19 xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
Randy found a compile error when using make randconfig to trigger

drivers/xen/xenbus/xenbus_xs.c:909:2: error: implicit declaration of function 'xen_hvm_domain'

it is unclear which of the CONFIG options triggered this. This
patch fixes the error.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-26 13:17:55 -04:00
Olaf Hering
ddacf5ef68 xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
Add new xs_reset_watches function to shutdown watches from old kernel after
kexec boot.  The old kernel does not unregister all watches in the
shutdown path.  They are still active, the double registration can not
be detected by the new kernel.  When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL).  An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.

With this change the xenstored is instructed to wipe all active watches
for the guest.  However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.

[v5: use xs_single instead of passing a dummy string to xs_talkv]
[v4: ignore -EEXIST in xs_reset_watches]
[v3: use XS_RESET_WATCHES instead of XS_INTRODUCE]
[v2: move all code which deals with XS_INTRODUCE into xs_introduce()
    (based on feedback from Ian Campbell); remove casts from kvec assignment]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v1: Redid the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-22 16:32:24 -04:00
Jan Beulich
55e901fc1f xen/pci: support multi-segment systems
Now that the hypercall interface changes are in -unstable, make the
kernel side code not ignore the segment (aka domain) number anymore
(which results in pretty odd behavior on such systems). Rather, if
only the old interfaces are available, don't call them for devices on
non-zero segments at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Edited git description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-22 16:23:46 -04:00
Konrad Rzeszutek Wilk
74d33dedc2 xen/pciback: use mutex rather than spinlock in vpci backend
Similar to the "xen/pciback: use mutex rather than spinlock in passthrough backend"
this patch converts the vpci backend to use a mutex instead of
a spinlock. Note that the code taking the lock won't ever get called
from non-sleepable context

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-21 18:18:18 -04:00
Konrad Rzeszutek Wilk
b1766b6289 xen/pciback: Use mutexes when working with Xenbus state transitions.
The caller that orchestrates the state changes is xenwatch_thread
and it takes a mutex. In our processing of Xenbus states we can take
the luxery of going to sleep on a mutex, so lets do that and
also fix this bug:

BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 32, name: xenwatch
2 locks held by xenwatch/32:
 #0:  (xenwatch_mutex){......}, at: [<ffffffff813856ab>] xenwatch_thread+0x4b/0x180
 #1:  (&(&pdev->dev_lock)->rlock){......}, at: [<ffffffff8138f05b>] xen_pcibk_disconnect+0x1b/0x80
Pid: 32, comm: xenwatch Not tainted 3.1.0-rc6-00015-g3ce340d #2
Call Trace:
 [<ffffffff810892b2>] __might_sleep+0x102/0x130
 [<ffffffff8163b90f>] mutex_lock_nested+0x2f/0x50
 [<ffffffff81382c1c>] unbind_from_irq+0x2c/0x1b0
 [<ffffffff8110da66>] ? free_irq+0x56/0xb0
 [<ffffffff81382dbc>] unbind_from_irqhandler+0x1c/0x30
 [<ffffffff8138f06b>] xen_pcibk_disconnect+0x2b/0x80
 [<ffffffff81390348>] xen_pcibk_frontend_changed+0xe8/0x140
 [<ffffffff81387ac2>] xenbus_otherend_changed+0xd2/0x150
 [<ffffffff810895c1>] ? get_parent_ip+0x11/0x50
 [<ffffffff81387de0>] frontend_changed+0x10/0x20
 [<ffffffff81385712>] xenwatch_thread+0xb2/0x180

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-21 18:18:16 -04:00