This allows the use of direct calls to the helpers,
and a direct branch back to the epilogue.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This will allow backends to make intelligent choices about how
to implement GUEST_BASE.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement the "functions may be omitted with NULL pointer"
interface mentioned in the function block comment by transforming
NULL entries in the read/write arrays into calls to the
unassigned_mem family of functions.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
exec.c has a comment 'XXX: optimize' for lduw_phys/stw_phys,
so let's do it, along the lines of stl_phys.
The reason to address 16 bit accesses specifically is that virtio relies
on these accesses to be done atomically, using memset as we do now
breaks this assumption, which is reported to cause qemu with kvm
to read wrong index values under stress.
https://bugzilla.redhat.com/show_bug.cgi?id=525323
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The usermode PAGE_RESERVED code is not required by the current mmap
implementation, and is already broken when guest_base != 0.
Unfortunately the bsd emulation still uses the old mmap implementation,
so we can't rip it out altogether.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Greatly simplify the subpage implementation by not supporting
multiple devices at the same address at different widths. We
don't need full copies of mem_read/mem_write/opaque for each
address, only a single index back into the main io_mem_* arrays.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
V2 that uses endaddr = end-of-guest-address-space if !h2g_valid(endaddr)
after I found out that indeed works; and also disables the FreeBSD 6.x
/compat/linux/proc/self/maps fallback because it can return partial lines
if (at least I think that's the reason) the mappings change between
subsequent read() calls.
Signed-off-by: Juergen Lock <nox@jelal.kn-bremen.de>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Replaces direct phys_ram_dirty access with wrapper functions to prevent
direct access to the phys_ram_dirty bitmap.
Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
Reviewed-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Historically the qemu tlb "addend" field was used for both RAM and IO accesses,
so needed to be able to hold both host addresses (unsigned long) and guest
physical addresses (target_phys_addr_t). However since the introduction of
the iotlb field it has only been used for RAM accesses.
This means we can change the type of addend to unsigned long, and remove
associated hacks in the big-endian TCG backends.
We can also remove the host dependence from target_phys_addr_t.
Signed-off-by: Paul Brook <paul@codesourcery.com>
On ia64, the default memory alignement is not enough for a code
alignement. To fix that, force static_code_gen_buffer alignment
to CODE_GEN_ALIGN.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When the host page size is bigger that the target one, unprotecting a
page should:
- mark all the target pages corresponding to the host page as writable
- invalidate all tb corresponding to the host page (and not the target
page)
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use kinfo_getvmmap(3) on FeeBSD >= 7.x and /compat/linux/proc on older
FreeBSD. (kinfo_getvmmap is preferred since /compat/linux/proc is
usually only mounted on hosts also using the Linuxolator.)
This patch is a bit hacky because the includes needed for kinfo_getvmmap
conflict with other definitions in exec.c by default so I had to `trick
around' a little, but I built the result in FreeBSD 6.4-stable and
7.2-stable tbs and on 8-stable on the host so the hacks at least
should be stable. (If this is a problem maybe we could also move the
kinfo_getvmmap invocations into a seperate source file but that would
be more work...)
Signed-off-by: Juergen Lock <nox@jelal.kn-bremen.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Arrange various declarations so that also non-CPU code can access
them, adjust users.
Move CPU specific code to cpus.c.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
QEMU uses a fixed page size for the CPU TLB. If the guest uses large
pages then we effectively split these into multiple smaller pages, and
populate the corresponding TLB entries on demand.
When the guest invalidates the TLB by virtual address we must invalidate
all entries covered by the large page. However the address used to
invalidate the entry may not be present in the QEMU TLB, so we do not
know which regions to clear.
Implementing a full vaiable size TLB is hard and slow, so just keep a
simple address/mask pair to record which addresses may have been mapped by
large pages. If the guest invalidates this region then flush the
whole TLB.
Signed-off-by: Paul Brook <paul@codesourcery.com>
The multi-level pagetable code fails to iterate ove all entries because
of the L2_BITS v.s. L2_SIZE thinko.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Fixes warning:
CC sparc-bsd-user/exec.o
/src/qemu/exec.c: In function `page_check_range':
/src/qemu/exec.c:2375: warning: comparison is always true due to limited range of data type
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The page tracking code in exec.c is used by both userspace and system
emulation. Userspace emulation uses it to track virtual pages, and
system emulation to track ram pages. Introduce a new type to hold this
kind of address.
Signed-off-by: Paul Brook <paul@codesourcery.com>
The addr < end comparison prevents iterating over the last
page in the guest address space; an iteration based on
length avoids this problem.
At the same time, assert that the given address is in the
guest address space.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Define L1_MAP_ADDR_SPACE_BITS to be either the virtual address size
(in user mode) or physical address size (in system mode), and use
that to size l1_map. This rewrites page_find_alloc, page_flush_tb,
and walk_memory_regions.
Use TARGET_PHYS_ADDR_SPACE_BITS for the physical memory map based
off of l1_phys_map. This rewrites page_phys_find_alloc and
phys_page_for_each.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Removes a set of ifdefs from exec.c.
Introduce TARGET_VIRT_ADDR_SPACE_BITS for all targets other
than Alpha. This will be used for page_find_alloc, which is
supposed to be using virtual addresses in the first place.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Port qemu-kvm's -mem-path and -mem-prealloc options. These are useful
for backing guest memory with huge pages via hugetlbfs.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: john cooper <john.cooper@redhat.com>
Userspace doesn't have physical memory, so cpu_physical_memory_rw
makes no sense. This is only used to implement cpu_memory_rw_debug, so
just implement that directly instead.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Userspace emulation doesn't have a physical address space, so
l1_phys_map makes no sense. This code is never actually used, so don't
try and build it.
Signed-off-by: Paul Brook <paul@codesourcery.com>
remove direct kvm calls from exec.c, make
kvm use memory notifiers framework instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds notifiers for phys memory changes: a set of callbacks that
vhost can register and update kernel accordingly. Down the road, kvm
code can be switched to use these as well, instead of calling kvm code
directly from exec.c as is done now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Qemu may hang in host_signal_handler after qemu has done a
seppuku with cpu_abort(). But at this stage we are not really
interested in target process coredump anymore, so unregister
host_signal_handler to die grafefully.
Signed-off-by: Riku Voipio <riku.voipio@nokia.com>
The default action of coalesced MMIO is, cache the writing in buffer, until:
1. The buffer is full.
2. Or the exit to QEmu due to other reasons.
But this would result in a very late writing in some condition.
1. The each time write to MMIO content is small.
2. The writing interval is big.
3. No need for input or accessing other devices frequently.
This issue was observed in a experimental embbed system. The test image
simply print "test" every 1 seconds. The output in QEmu meets expectation,
but the output in KVM is delayed for seconds.
Per Avi's suggestion, I hooked flushing coalesced MMIO buffer in VGA update
handler. By this way, We don't need vcpu explicit exit to QEmu to
handle this issue.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.
Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fixes receiving signals when guest code is being executed in a tight
loop. For an example, try interrupting the following code with ctrl-c.
http://nchipin.kos.to/test-loop.c
The tight loop is ofcourse brainless, but it is also exactly how the waitpid* testcases
are implemented.
Signed-off-by: Riku Voipio <riku.voipio@nokia.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The limit of iomem areas is quite low. Without the
debug print, it is quite hard to figure out why more
devices are not getting registered.
Signed-off-by: Riku Voipio <riku.voipio@nokia.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
/tmp doesn't exist under win32. Ease the pain of win32 development slightly.
From: Juha Riihimäki <juha.riihimaki@nokia.com>
Signed-off-by: Riku Voipio <riku.voipio@nokia.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
KVM on S390x requires the virtual address space of the guest's RAM to be
within the first 256GB.
The general direction I'd like to see KVM on S390 move is that this requirement
is losened, but for now that's what we're stuck with.
So let's just hack up qemu_ram_alloc until KVM behaves nicely :-).
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Call MADV_MERGEABLE on guest memory allocations. MADV_MERGABLE will be
available starting in Linux 2.6.32. This system call registers a region of
virtual address space with Linux as a candidate for transparent memory
sharing.
Patchworks-ID: 35447
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>