SFC: Disable color blending for first hires pixel with accuracy PPU
(fixes a green scanline on the left-edge of Jurassic Park)
libco: Don't include <sys/mman.h> when not using mprotect
nall: Detect Windows without invoking uname [Alcaro]
Added fully working deterministic save state support (non-portable.)
Rewind is now 100% deterministic, disk save states are still portable.
Added run-ahead support.
byuu says:
- bsnes: added video filters from bsnes v082
- bsnes: added ZSNES snow effect option when games paused or unloaded
(no, I'm not joking)
- bsnes: added 7-zip support (LZMA 19.00 SDK)
[Recent higan WIPs have also mentioned bsnes changes, although the higan code
no longer includes the bsnes code. These changes include:
- higan, bsnes: added EXLOROM, EXLOROM-RAM, EXHIROM mappings
- higan, bsnes: focus the viewport after leaving fullscreen exclusive
mode
- bsnes: re-added mightymo's cheat code database
- bsnes: improved make install rules for the game and cheat code
databases
- bsnes: delayed construction of hiro::Window objects to properly show
bsnes window icons
- Ed.]
byuu says:
First 32 instructions implemented in the TLCS900H disassembler. Only 992
to go!
I removed the use of anonymous namespaces in nall. It was something I
rarely used, because it rarely did what I wanted.
I updated all nested namespaces to use C++17-style namespace Foo::Bar {}
syntax instead of classic C++-style namespace Foo { namespace Bar {}}.
I updated ruby::Video::acquire() to return a struct, so we can use C++17
structured bindings. Long term, I want to get away from all functions
that take references for output only. Even though C++ botched structured
bindings by not allowing you to bind to existing variables, it's even
worse to have function calls that take arguments by reference and then
write to them. From the caller side, you can't tell the value is being
written, nor that the value passed in doesn't matter, which is terrible.
byuu says:
This synchronizes bsnes/higan with many recent internal nall changes.
This will be the last WIP until I am situated in Japan. Apologies for the
bugfixes that didn't get applied yet, I ran out of time.
byuu says:
Changelog:
- sfc/ppu-fast: added hires mode 7 option (doubles the sampling rate
of mode 7 pixels to reduce aliasing)
- sfc/ppu-fast: fixed mode 7 horizontal screen flip [hex_usr]
- bsnes: added capture screenshot function and path selection
- for now, it saves as BMP. I need a deflate implementation that
won't add an external dependency for PNG
- the output resolution is from the emulator: (256 or 512)x(240 or
480 minus overscan cropping if enabled)
- it captures the NEXT output frame, not the current one ... but
it may be wise to change this behavior
- it'd be a problem if the core were to exit and an image was
captured halfway through frame rendering
- bsnes: recovery state renamed to undo state
- bsnes: added manifest viewer tool
- bsnes: mention if game has been verified or not on the status bar
message at load time
- bsnes, nall: fixed a few missing function return values
[SuperMikeMan]
- bsnes: guard more strongly against failure to load games to avoid
crashes
- hiro, ruby: various fixes for macOS [Sintendo]
- hiro/Windows: paint on `WM_ERASEBKGND` to prevent status bar
flickering at startup
- icarus: SPC7110 heuristics fixes [hex_usr]
Errata:
- sfc/ppu-fast: remove debug hires mode7 force disable comment from
PPU::power()
[The `WM_ERASEBKGND` fix was already present in the 106r44 public
beta -Ed.]
byuu says:
Changelog:
- nall: renamed array to adaptive_array; marked it as deprecated
- nall: created new array class; which is properly static (ala
std::array) with optional bounds-checking
- sfc/ppu-fast: converted unmanaged arrays to use nall/array (no speed
penalty)
- bsnes: rewrote the cheat code editor to a new design
- nall: string class can stringify pointer types directly now, so
pointer() was removed
- nall: added array_view and pointer types (still unsure if/how I'll
use pointer)
byuu says:
Changelog:
- higan: Emulator::Interface::videoSize() renamed to videoResolution()
- higan: Emulator::Interface::rtcsync() renamed to rtcSynchronize()
- higan: added video display rotation support to Video
- GBA: substantially improved audio mixing
- fixed bug with FIFO 50%/100% volume setting
- now properly using SOUNDBIAS amplitude to control output
frequencies
- reduced quantization noise
- corrected relative volumes between PSG and FIFO channels
- both PSG and FIFO values cached based on amplitude; resulting in
cleaner PCM samples
- treating PSG volume=3 as 200% volume instead of 0% volume now
(unverified: to match mGBA)
- GBA: properly initialize ALL CPU state; including the vital
prefetch.wait=1 (fixes Classic NES series games)
- GBA: added video rotation with automatic key translation support
- PCE: reduced output resolution scalar from 285x242 to 285x240
- the extra two scanlines won't be visible on most TVs; and they
make all other cores look worse
- this is because all other cores output at 240p or less; so they
were all receiving black bars in windowed mode
- tomoko: added "Rotate Display" hotkey setting
- tomoko: changed hotkey multi-key logic to OR instead of AND
- left support for flipping it back inside the core; for those so
inclined; by uncommenting one line in input.hpp
- tomoko: when choosing Settings→Configuration, it will
automatically select the currently loaded system
- for instance, if you're playing a Game Gear game, it'll take you
to the Game Gear input settings
- if no games are loaded, it will take you to the hotkeys panel
instead
- WS(C): merged "Hardware-Vertical", "Hardware-Horizontal" controls
into combined "Hardware"
- WS(C): converted rotation support from being inside the core to
using Emulator::Video
- this lets WS(C) video content scale larger now that it's not
bounded by a 224x224 square box
- WS(C): added automatic key rotation support
- WS(C): removed emulator "Rotate" key (use the general hotkey
instead; I recommend F8 for this)
- nall: added serializer support for nall::Boolean (boolean) types
- although I will probably prefer the usage of uint1 in most cases
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
byuu says:
Changelog:
- (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel
like they were contributing enough to be worth it]
- cleaned up nall::integer,natural,real functionality
- toInteger, toNatural, toReal for parsing strings to numbers
- fromInteger, fromNatural, fromReal for creating strings from numbers
- (string,Markup::Node,SQL-based-classes)::(integer,natural,real)
left unchanged
- template<typename T> numeral(T value, long padding, char padchar)
-> string for print() formatting
- deduces integer,natural,real based on T ... cast the value if you
want to override
- there still exists binary,octal,hex,pointer for explicit print()
formatting
- lstring -> string_vector [but using lstring = string_vector; is
declared]
- would be nice to remove the using lstring eventually ... but that'd
probably require 10,000 lines of changes >_>
- format -> string_format [no using here; format was too ambiguous]
- using integer = Integer<sizeof(int)*8>; and using natural =
Natural<sizeof(uint)*8>; declared
- for consistency with boolean. These three are meant for creating
zero-initialized values implicitly (various uses)
- R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees
up struct IO {} io; naming]
- SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {}
(status,registers); now
- still some CPU::Status status values ... they didn't really fit into
IO functionality ... will have to think about this more
- SFC CPU, PPU, SMP now use step() exclusively instead of addClocks()
calling into step()
- SFC CPU joypad1_bits, joypad2_bits were unused; killed them
- SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it
- SFC PPU OAM moved into PPU::Object; since nothing else uses it
- the raw uint8[544] array is gone. OAM::read() constructs values from
the OAM::Object[512] table now
- this avoids having to determine how we want to sub-divide the two
OAM memory sections
- this also eliminates the OAM::synchronize() functionality
- probably more I'm forgetting
The FPS fluctuations are driving me insane. This WIP went from 128fps to
137fps. Settled on 133.5fps for the final build. But nothing I changed
should have affected performance at all. This level of fluctuation makes
it damn near impossible to know whether I'm speeding things up or slowing
things down with changes.
byuu says:
This is a few days old, but oh well.
This WIP changes nall,hiro,ruby,icarus back to (u)int(8,16,32,64)_t.
I'm slowly pushing for (u)int(8,16,32,64) to use my custom
Integer<Size>/Natural<Size> classes instead. But it's going to be one
hell of a struggle to get that into higan.
byuu says:
New update. Most of the work today went into eliminating hiro::Image
from all objects in all ports, replacing with nall::image. That took an
eternity.
Changelog:
- fixed crashing bug when loading games [thanks endrift!!]
- toggling "show status bar" option adjusts window geometry (not
supposed to recenter the window, though)
- button sizes improved; icon-only button icons no longer being cut off
byuu says:
Changelog:
- entire GBA core ported to auto function() -> return; syntax
- fixed GBA BLDY bug that was causing flickering in a few games
- replaced nall/config usage with nall/string/markup/node
- this merges all configuration files to a unified settings.bml file
- added "Ignore Manifests" option to the advanced setting tab
- this lets you keep a manifest.bml for an older version of higan; if
you want to do regression testing
Be sure to remap your controller/hotkey inputs, and for SNES, choose
"Gamepad" from "Controller Port 1" in the system menu. Otherwise you
won't get any input. No need to blow away your old config files, unless
you want to.
byuu says:
Changelog:
- fixed I/O register reads; perfect score on endrift's I/O tests now
- fixed mouse capture clipping on Windows [Cydrak]
- several hours of code maintenance work done on the SFC core
All higan/sfc files should now use the auto fn() -> ret; syntax. Haven't
converted all unsigned->uint yet. Also, probably won't do sfc/alt as
that's mostly just speed hack stuff.
Errata:
- forgot auto& instead of just auto on SuperFamicom::Video::draw_cursor,
which makes Super Scope / Justifier crash. Will be fixed in the next
WIP.
byuu says:
Changelog:
- synchronizes lots of nall changes
- changes displayed program title from tomoko to higan(*)
- browser dialog sort is case-insensitive
- .sys folders look at user-selected library path; no longer hard-coded
Tried to get rid of the file modes from the Windows browser dialog, but
it was being a bitch so I left it on for now.
- The storage locations and binary still use tomoko. I'm not really sure
what to do here. The idea is there may be more than one "higan" UI in
the future, but I don't want people to go around calling the entire
program by the UI name. For official Windows releases, I can rename
the binaries to "higan-{profile}.exe", and by putting the config files
with the binary, they won't ever see the tomoko folder. Linux is of
course trickier.
Note: Windows users will need to edit hiro/components.hpp and comment
out these lines:
#define Hiro_Console
#define Hiro_IconView
#define Hiro_SourceView
#define Hiro_TreeView
I forgot to do that, and too lazy to upload another WIP.
byuu says:
Changelog:
- added Cocoa target: higan can now be compiled for OS X Lion
[Cydrak, byuu]
- SNES/accuracy profile hires color blending improvements - fixes
Marvelous text [AWJ]
- fixed a slight bug in SNES/SA-1 VBR support caused by a typo
- added support for multi-pass shaders that can load external textures
(requires OpenGL 3.2+)
- added game library path (used by ananke->Import Game) to
Settings->Advanced
- system profiles, shaders and cheats database can be stored in "all
users" shared folders now (eg /usr/share on Linux)
- all configuration files are in BML format now, instead of XML (much
easier to read and edit this way)
- main window supports drag-and-drop of game folders (but not game files
/ ZIP archives)
- audio buffer clears when entering a modal loop on Windows (prevents
audio repetition with DirectSound driver)
- a substantial amount of code clean-up (probably the biggest
refactoring to date)
One highly desired target for this release was to default to the optimal
drivers instead of the safest drivers, but because AMD drivers don't
seem to like my OpenGL 3.2 driver, I've decided to postpone that. AMD
has too big a market share. Hopefully with v093 officially released, we
can get some public input on what AMD doesn't like.
byuu describes the changes since v067:
This release officially introduces the accuracy and performance cores,
alongside the previously-existing compatibility core. The accuracy core
allows the most accurate SNES emulation ever seen, with every last
processor running at the lowest possible clock synchronization level.
The performance core allows slower computers the chance to finally use
bsnes. It is capable of attaining 60fps in standard games even on an
entry-level Intel Atom processor, commonly found in netbooks.
The accuracy core is absolutely not meant for casual gaming at all. It
is meant solely for getting as close to 100% perfection as possible, no
matter the cost to speed. It should only be used for testing,
development or debugging.
The compatibility core is identical to bsnes v067 and earlier, but is
now roughly 10% faster. This is the default and recommended core for
casual gaming.
The performance core contains an entirely new S-CPU core, with
range-tested IRQs; and uses blargg's heavily-optimized S-DSP core
directly. Although there are very minor accuracy tradeoffs to increase
speed, I am confident that the performance core is still more accurate
and compatible than any other SNES emulator. The S-CPU, S-SMP, S-DSP,
SuperFX and SA-1 processors are all clock-based, just as in the accuracy
and compatibility cores; and as always, there are zero game-specific
hacks. Its compatibility is still well above 99%, running even the most
challenging games flawlessly.
If you have held off from using bsnes in the past due to its system
requirements, please give the performance core a try. I think you will
be impressed. I'm also not finished: I believe performance can be
increased even further.
I would also strongly suggest Windows Vista and Windows 7 users to take
advantage of the new XAudio2 driver by OV2. Not only does it give you
a performance boost, it also lowers latency and provides better sound by
way of skipping an API emulation layer.
Changelog:
- Split core into three profiles: accuracy, compatibility and
performance
- Accuracy core now takes advantage of variable-bitlength integers (eg
uint24_t)
- Performance core uses a new S-CPU core, written from scratch for speed
- Performance core uses blargg's snes_dsp library for S-DSP emulation
- Binaries are now compiled using GCC 4.5
- Added a workaround in the SA-1 core for a bug in GCC 4.5+
- The clock-based S-PPU renderer has greatly improved OAM emulation;
fixing Winter Gold and Megalomania rendering issues
- Corrected pseudo-hires color math in the clock-based S-PPU renderer;
fixing Super Buster Bros backgrounds
- Fixed a clamping bug in the Cx4 16-bit triangle operation [Jonas
Quinn]; fixing Mega Man X2 "gained weapon" star background effect
- Updated video renderer to properly handle mixed-resolution screens
with interlace enabled; fixing Air Strike Patrol level briefing screen
- Added mightymo's 2010-08-19 cheat code pack
- Windows port: added XAudio2 output support [OV2]
- Source: major code restructuring; virtual base classes for processor
- cores removed, build system heavily modified, etc.