View topic - Emulators and accuracy questions

  • Joined: 08 Mar 2007
  • Posts: 92
Emulators and accuracy questions
Post Posted: Thu Dec 11, 2008 6:12 pm
Hi,
I'd like to know which of Fusion and Emukon, in your opinion, most closely matches a real SMS and/or GG system, both in the accuracy of the internal emulation and, obviously, in the final experience. I can't consider Meka because it suffers from sound issues on every system I own.

Regarding accuracy, I'd also like to know whether the emulators that support BIOS images really use them to help the emulation once the game starts, or only to show the initial splash screen without making any other difference.

Thank you.
Bye
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Thu Dec 11, 2008 10:16 pm
Both emulators are very accurate, I doubt either has any incompatibilities. The BIOS on the Master System has no role in emulation, it merely does cartridge detection/cartridge checking/region checking on startup, so the "support" is meaningless.
  • Joined: 21 Mar 2005
  • Posts: 51
  • Location: United Kingdom
Post Posted: Thu Dec 11, 2008 11:53 pm
Just to add to Maxim's point, Z80 emulation is at a very advanced stage (and has been for some time) so I would have thought that all of the main SMS emulators would in fact be very accurate. Don't forget that the Z80 is used in a wide range of consoles, arcade machines and personal computers so over the last few years there have been literally hundreds of people working on Z80 emulation.

The issue with sound emulation is true for probably every emulator because the sound you get out of a real SMS or arcade rig incorporates a range of 'active' analogue characteristics that - in real terms - degrade the quality of the original audio signals generated by the sound chip. This is why many of the older arcade games emulated by MAME reproduce the audio via samples rather than emulation, mainly because of the 'thinness' of the original chip output which inevitably had to be beefed up by amplification and sympathetic speaker selection.

Given the vast amount of effort that thousands of people put into emulating games consoles and arcade machines (in their free time, don't forget) let's not get into any kind of "emulator war" please :) They're all good, they're all very accurate and they allow you to play thousands of games that you'd never have dreamt would have been possible even a few years ago. If you want these people to make you the ultimate SMS emulator then put some money behind it, ok? Otherwise - to quote Muhammad Ali:

"Look at all you're getting, for free!"

Regards,

Neil
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Fri Dec 12, 2008 9:31 am
To be fair, Meka's sound architecture is essentially broken on a lot of computers and fairly screwed up on a lot of others. It's a port of a DOS-friendly interrupt-driven sound system to a preemptively multitasking operating system with myriad sound drivers that are certainly not optimised for its usage of them.

One day someone will come and sort it all out...
  • Joined: 08 Mar 2007
  • Posts: 92
Post Posted: Fri Dec 12, 2008 9:48 am
Thanks for all the answers. I only wanted to know your opinions, not to demand anything. I would pay money towards a new project if one existed. Regarding the Z80 CPU, I thought that the SMS and GG ones might differ from those found everywhere else, so a generic emulation could do the job but with differences in accuracy. On the sound side, I think that Emukon has something that convinces me its final result is the closest to the original.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Fri Dec 12, 2008 10:03 am
Kega's sound is probably more accurate than Emukon's, I think the latter does nothing to avoid aliasing at high frequencies. Compare Aztec Adventure in PSG mode and listen to the tunefulness of the high notes; or use the PSG tester program and listen for the smoothness when you change between high frequencies.
  • Joined: 22 Feb 2006
  • Posts: 39
Post Posted: Fri Dec 12, 2008 12:27 pm
I have a related question about Dega (used for speedruns, seems pretty accurate). Maybe someone here can answer it.

There's this Enhanced Psg mode in it which makes the music sound great, but what does it do exactly? It's not FM.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Fri Dec 12, 2008 1:28 pm
Dega is probably the least accurate modern emulator.

In its "enhance PSG" mode, it takes some hard-coded wave samples and scales them to fit into the square wave length. It's effectively replacing square waves with samples, somewhat like "wavetable" FM did back in prehistoric times.
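As a rough illustration of that idea (not Dega's actual code), the sketch below stretches or squeezes one stored waveform cycle so that it occupies the same number of output samples as the square wave it replaces. The table contents, the name render_cycle and the nearest-neighbour scaling are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stored waveform (one full cycle); values are invented. */
static const int WAVE[8] = { 0, 90, 127, 90, 0, -90, -127, -90 };
#define WAVE_LEN (sizeof WAVE / sizeof WAVE[0])

/* Fill out[0..period-1] with one cycle of WAVE scaled to `period` output
   samples, where `period` is the length the PSG square wave would have had.
   Nearest-neighbour scaling is used to keep the sketch short. */
void render_cycle(int *out, size_t period)
{
    assert(out != NULL && period > 0);
    for (size_t i = 0; i < period; i++) {
        size_t src = i * WAVE_LEN / period;  /* output position -> table index */
        out[i] = WAVE[src];
    }
}
```

A longer period (lower note) repeats table entries and a shorter one skips them, so the pitch still follows the PSG's tone register while the timbre comes from the table.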
Aamir
  • Guest
Post Posted: Mon Dec 15, 2008 10:38 am
Hi,
Quote
I would pay money towards a new project if one existed.

Really?? You should've met me earlier :D .

Seriously though, if I don't get lazy I'll be releasing a new build of Regen with pretty accurate SMS/GG/SG/SC emulation in a few days. It can pass all of FluBBa's VDP tests, runs PSG and FM at their original rates with very high quality FIR resampling, emulates the BIOS, includes support for just about every controller (paddle, light phaser, sports pad, etc.) and keyboard, and I've yet to encounter a game which fails to run. Though only time will tell how accurate it is once I release it.

stay safe,

AamirM
 
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8661
  • Location: Paris, France
Post Posted: Mon Dec 15, 2008 10:58 am
Aamir wrote
Seriously though, if I don't get lazy I'll be releasing a new build of Regen with pretty accurate SMS/GG/SG/SC emulation in a few days. It can pass all of FluBBa's VDP tests, runs PSG and FM at their original rates with very high quality FIR resampling, emulates the BIOS, includes support for just about every controller (paddle, light phaser, sports pad, etc.) and keyboard, and I've yet to encounter a game which fails to run. Though only time will tell how accurate it is once I release it.

Do you plan to release source code?

There are several good emulators around (Fusion, for example), but they lose a lot by being closed source (and generally closed-minded when they are the work of one leading person). I wish we could have a good new emulator that is full-featured and can be improved on (say, like Meka but for the 21st century).
  • Joined: 11 Feb 2009
  • Posts: 12
  • Location: Chatham, On, Canada
Post Posted: Tue May 12, 2009 12:36 pm
Not to nitpick, just wanted to defend Steve Snake on this one. He was an actual programmer back in the day (NBA Jam!), and has spent countless years in the emulation scene from the early 486/Pentium (remember KGen?) days to now. So I think he's earned the right to keep Kega 'his' and I'm not sure how many spare-time programmers could actually improve what he's done with KEGA :)
  • Joined: 14 Oct 2006
  • Posts: 256
  • Location: NYC
Post Posted: Thu May 14, 2009 1:41 am
You're more than welcome to pick apart my source code and help. Quite a few people from this community have helped out directly (vbt, brom, benryves) and indirectly. It's based off of brom's emulator and he's helped out immensely even in this new project.

Very much love to vbt on adding sound. :) !!!
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8661
  • Location: Paris, France
Post Posted: Thu May 14, 2009 3:46 am
Mach-X wrote
Not to nitpick, just wanted to defend Steve Snake on this one. He was an actual programmer back in the day (NBA Jam!), and has spent countless years in the emulation scene from the early 486/Pentium (remember KGen?) days to now. So I think he's earned the right to keep Kega 'his' and I'm not sure how many spare-time programmers could actually improve what he's done with KEGA :)

I agree that KEGA is awesome, but there's a lot more that could be done with it if it were a public project. It would still be up to the maintainer to keep control over the direction it goes. Even if only one person contributes someday, that's better than nothing. Keeping software "yours", as you put it, hurts the community. If KEGA were open source I could be adding proper debugging features or emulation of the more obscure 8-bit systems and peripherals.
PoorAussie is now working on an emulator that seems great and looks like the way of doing things in 2010, but he seems (not sure) to be going the same closed-source way, and that's a wasted opportunity IMHO.
Just look at MAME.
  • Joined: 16 May 2002
  • Posts: 1356
  • Location: italy
Post Posted: Thu May 14, 2009 8:23 am
MAME fails on so many levels, imo. But we may have different opinions about that.

Not to go offtopic but I'm still waiting for someone to add vgm dumping capabilities to MAME, for games which use either the 2151, the 2612, or the 2413. I'd kill for an arcade Outrun vgm set.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Thu May 14, 2009 9:06 am
MAME fails because it is not trying to be usable, it is trying to be "accurate" (even though it often isn't, considering its roots as a 90s-era emulator). If they figured out some way to make it a pluggable back-end, then maybe people could add custom extensions and GUIs to it without huge amounts of work. As it is, you have to live with the MAME way of doing things.

Out Run uses a YM2151 and a PCM chip. I'm not sure how much of the music you'd get without the latter. A lot of arcade games use more than one sound chip and although VGM nominally supports this - "simply" by having multiple VGM data streams in the same file - no player supports it.
  • Joined: 01 Apr 2005
  • Posts: 252
  • Location: Almere, The Netherlands
Post Posted: Thu May 14, 2009 9:53 am
Speaking of Out Run BGM, there are OST CDs out there. That's probably not what one would want, but nevertheless.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Thu May 14, 2009 9:57 am
I have a new SMS emulator coming out with cycle-accurate components for the PSG, VDP and Z80. To many it seems that a cycle-accurate Z80 emulator is a waste, and in some ways it is; it is also somewhat slower! But I spent the time developing and debugging it, so I may as well release it at some point (I'm distracted working on a Genesis core atm).... The SMS Z80 runs at roughly 3.5 million cycles per second, and the VDP runs at 3 times that, or about 10 million cycles per second. Emulating them both at these speeds is considerably intensive... it's on par with a modern "instruction accurate" Genesis emulator.

I am aiming for it to run at full speed on a 2 GHz P4 or so, and this emulator is basically indistinguishable from a real machine... (I also emulate variances between models, which includes BIOS emulation). There are, however, gaps in knowledge about the machine and my guesses may be inaccurate, but luckily, if they are, it is easy to fix if someone has the info.

Compared to something like MEKA, which runs very nicely with high existing compatibility on very low-end systems, it is understandable why no one until recently has decided to do this, as most people wouldn't be able to use it. Regen was apparently supposed to be using a cycle-accurate Z80 for its next release too, so I am also looking forward to aamir's work and how it compares to my own.

It is also rather unsatisfying to realize that an extremely accurate emulator is barely better than something like MEKA when it comes to compatibility with existing software, but in the name of accuracy and future software, we should go the extra step... at some point.
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8661
  • Location: Paris, France
Post Posted: Thu May 14, 2009 10:14 am
PoorAussie wrote
It is also rather unsatisfying to realize that an extremely accurate emulator is barely better than something like MEKA when it comes to compatibility with existing software, but in the name of accuracy and future software, we should go the extra step... at some point.

It is a great step. Has this been attempted before for other systems?
  • Joined: 25 Jul 2007
  • Posts: 733
  • Location: Melbourne, Australia
Post Posted: Thu May 14, 2009 10:37 am
PoorAussie wrote
It is also rather unsatisfying to realize that an extremely accurate emulator is barely better than something like MEKA when it comes to compatibility with existing software, but in the name of accuracy and future software, we should go the extra step... at some point.


For what it's worth, I appreciate the effort, because like you I believe that, while not necessary, a cycle-accurate emulator is the least we can do for a system that endeared itself to many of us growing up. If it weren't for such fanaticism this site would not exist and none of us would be here in the first place.

Also, with no disrespect to Bock, because Meka is a grand achievement in SMS emulation, but it is also old, and so very broken that it really needs rewriting from the ground up, and if you're going to do that then what better place to start than with your project.
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8661
  • Location: Paris, France
Post Posted: Thu May 14, 2009 11:35 am
djbass wrote
Also, with no disrespect to Bock, because Meka is a grand achievement in SMS emulation, but it is also old, and so very broken that it really needs rewriting from the ground up, and if you're going to do that then what better place to start than with your project.

Oh, no disrespect taken, I think the same thing. This is the reason why I am hoping someone will release the source of an emulator that is well written and a huge step forward (like PoorAussie's) so we could move on to it. Meka is so 1999.

The reason I am maintaining Meka is mainly to add emulation of obscure stuff or tools/debugging features, because no other emulator comes anywhere close in this area now. I don't think many people are as obsessed with Sega 8-bit as we are here - Steve Snake certainly isn't, and PoorAussie, as just mentioned, is already moving on to the Mega Drive. So there is very little chance that someone will join in with a great closed-source emulator and support it to the extent Meka was supported back in 2000-ish. Hence why we need open source, to share the load. I have neither the time nor the competence to write a cycle-accurate emulator from scratch now, but I could contribute greatly to one.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Thu May 14, 2009 11:42 am
Bock wrote
PoorAussie wrote
It is also rather unsatisfying to realize that an extremely accurate emulator is barely better than something like MEKA when it comes to compatibility with existing software, but in the name of accuracy and future software, we should go the extra step... at some point.

It is a great step. Has this been attempted before for other systems?


Other NES emulators have done cycle-accurate CPU/PPU/audio, and so have I (though others have had it for a while now; mine is more recent). The NES CPU is much less complex to emulate and runs at a much lower speed than the Z80 in the SMS, so it's somewhat less intensive to emulate at the cycle level (though some cartridges add extra complexity). Some MSX emulators purport to be cycle accurate but I'm not sure if they actually have cycle-accurate Z80s.

For the extra cost that cycle emulating the Z80 costs, with the only benefit being support for tightly written games, it is a difficult one to justify. The NES library is full of examples which are tightly coupled to the CPU/PPU, the SMS library not so much.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Thu May 14, 2009 12:45 pm
Last edited by PoorAussie on Tue Jul 21, 2009 3:05 am; edited 1 time in total
Maxim wrote
MAME fails because it is not trying to be usable, it is trying to be "accurate" (even though it often isn't, considering its roots as a 90s-era emulator). If they figured out some way to make it a pluggable back-end, then maybe people could add custom extensions and GUIs to it without huge amounts of work. As it is, you have to live with the MAME way of doing things.


Yes, MAME is very limiting. The project's aim is so-called accuracy, but a duality exists within the project: on one hand they want it to play games (the new dynamic recompiling engine is for increased speed at the cost of accuracy), and on the other they want it to document the system. What better documents a system isn't actual code; it is the documents that Charles and others submit. Documenting by code relies upon you understanding the chaotic nature of all these developers and their habits. MAME is very far from accurate too: instruction-accurate emulators just aren't good enough for many arcade games, and most MAME developers stop when 90% of the game is working; the rest of the hardware is unimportant if it doesn't give them some big bang for the buck.

Groups like SMSPower and a few others are just worlds apart in the kind of community they have and the general goodwill they show to other people.
  • Joined: 08 Dec 2005
  • Posts: 488
  • Location: Melbourne, Australia
Post Posted: Sat May 30, 2009 12:58 pm
PoorAussie wrote
For the extra cost that cycle emulating the Z80 costs...


Where does the extra cost of cycle-accurate emulation come from? Or, put another way: How does cycle-accurate emulation differ from the usual

for(;;) {
    RunZ80(228 /*cycles*/);
    RunVDP(1 /*scanline*/);
}

approach?
  • Joined: 14 Oct 2008
  • Posts: 513
Post Posted: Sat May 30, 2009 2:28 pm
Cycle-accurate has more processing time.

Not sure if it's the best example of cycle-accurate, but there is the SNES emulator bsnes.
A 600 MHz PC can run most emulators of this system, but this one requires 3 GHz+ for full speed.
(reportedly cycle-accurate for everything except the graphics chip, as the author reports you cannot yet buy a CPU fast enough to be able to play anything).
  • Joined: 08 Dec 2005
  • Posts: 488
  • Location: Melbourne, Australia
Post Posted: Sat May 30, 2009 4:10 pm
KingMike wrote
Cycle-accurate has more processing time.


Where does the extra processing time go? What does "cycle-accurate" emulation involve that the standard approach does not?

Perhaps I'm really asking: What exactly does "cycle-accurate emulation" mean?
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14755
  • Location: London
Post Posted: Sat May 30, 2009 7:17 pm
I think cycle-accurate means two things:

1. Emulating on a clock cycle level, i.e. doing the CPU fetch/decode/execute stages explicitly (although there is some guesswork involved), as well as doing something similar for the VDP - fetching from VRAM at the right times, building internal buffers at the right times, etc. You could do the same for the audio, maybe even for the I/O chip or even the cartridge mappers.

2. Synchronise everything between these different clock rates (maybe needing to go up to the ~54MHz master clock to get a "common denominator"), instead of running the CPU in chunks of 1 line, 1 pixel or 1 opcode.

The result is that timing sensitivities can be accurately emulated without hacks or workarounds, so long as you manage to get everything emulated just right.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Sun May 31, 2009 4:02 am
Paul Baker wrote
PoorAussie wrote
For the extra cost that cycle emulating the Z80 costs...


Where does the extra cost of cycle-accurate emulation come from? Or, put another way: How does cycle-accurate emulation differ from the usual


It depends on the implementation. But most of the extra cost of cycle emulation comes from code outside the actual emulation, i.e. code which maintains the "state" of the machine. Though in some cases, like the SMS, emulators in the past have simply been able to ignore certain aspects (like VDP memory lookups) whose emulation adds considerable expense.

Emulators for NES can simply add a few hooks on reading/writing memory to call a "cycle" update callback function which updates certain things (ppu/audio/cartridges/etc). Their inner loops basically just call the cpu emulator.

BSNES seems to have a very weird and overly complex threading mechanism which the author believes is the best way to get speed. BSNES cannot save states because of all the complexity and because its design is not something you can easily stop and snapshot. The arguments the BSNES author uses don't really convince me of his method, since the SNES runs at a relatively low MHz and today's machines can easily handle 20 million cycles per second in a state machine without much sweat. He says his method is more readable, but if anyone here has actually looked at the source I am sure they will be amazed at the complexity and unreadability.

The way I do it for my Z80 is to have an internal table of steps that is added to on instruction decode, and basically you switch() on this table for every step. So instead of, say, ~800K complex switch()s per second of emulation you have 3.5 million less complex ones. There is actually very little guesswork when it comes to cycle emulating the Z80, as a lot of documents have been released which detail the steps. The undocumented instructions take some guesswork, but since they all follow similar logic it's easy to understand once you've been in the depths writing the core.
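To make the step-table idea concrete, here is a toy sketch (not the actual core, which isn't public): a queue of per-cycle steps that decode adds to, and one small switch() per clock cycle. MiniCpu, the Step names and the placement of each action within the instruction are simplifications invented for the example; only NOP (4 cycles) and LD A,n (7 cycles) are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { QUEUE_LEN = 8 };

typedef enum { STEP_IDLE, STEP_FETCH, STEP_DECODE, STEP_LOAD_A_IMM } Step;

typedef struct {
    uint8_t mem[256];
    uint8_t pc, a, opcode;
    Step    queue[QUEUE_LEN];  /* pending per-cycle steps (ring buffer) */
    int     head, count;
    long    cycles;
} MiniCpu;

void cpu_reset(MiniCpu *c) { memset(c, 0, sizeof *c); }

static void push(MiniCpu *c, Step s)
{
    assert(c->count < QUEUE_LEN);
    c->queue[(c->head + c->count) % QUEUE_LEN] = s;
    c->count++;
}

/* Advance the CPU by exactly one clock cycle. */
void cpu_cycle(MiniCpu *c)
{
    if (c->count == 0) {  /* new instruction: 4-cycle opcode fetch */
        push(c, STEP_IDLE); push(c, STEP_FETCH);
        push(c, STEP_IDLE); push(c, STEP_DECODE);
    }
    Step s = c->queue[c->head];
    c->head = (c->head + 1) % QUEUE_LEN;
    c->count--;
    c->cycles++;

    switch (s) {
    case STEP_IDLE:
        break;
    case STEP_FETCH:
        c->opcode = c->mem[c->pc++];
        break;
    case STEP_DECODE:  /* decode adds the remaining steps to the table */
        if (c->opcode == 0x3E) {  /* LD A,n: 3 more cycles */
            push(c, STEP_IDLE); push(c, STEP_IDLE); push(c, STEP_LOAD_A_IMM);
        }
        break;  /* 0x00 (NOP) queues nothing */
    case STEP_LOAD_A_IMM:
        c->a = c->mem[c->pc++];
        break;
    }
}
```

Because the caller gets control back after every cycle, the VDP or anything else can be stepped in between, which is what the per-instruction approach cannot offer.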

The biggest speed curse of emulating the SMS is the VDP, which runs at ~10 MHz, so compared to other emulators, which have no concept of the VDP running at its real clock speed, it takes considerably more time to emulate the VDP correctly. Basically, for the VDP an internal cycle buffer wasn't the best approach, but I do use a similar method for the VRAM memory lookups as I do for the Z80. The VDP is also largely undocumented, so it takes considerable trial and error (and help from guys like FluBBa) to find the right "settings".

The audio on the SMS is output only, so you can simply log writes and form a wave later, which saves some state machine expense. At the moment on my 3.2 GHz Core 2 Duo, I get about 500% of the speed of a real machine running everything cycle accurate (with a few sound filters which chew a bit more time) in release mode. MEKA runs at 100% on a Pentium 200, and Massage was running on 486s.
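The "log writes and form a wave later" trick could look roughly like this. Everything here is a hypothetical sketch: the names are invented, and the "chip" is reduced to a single level that holds the last value written, where a real PSG would update tone, noise and volume registers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LOG_MAX = 1024 };

typedef struct { uint32_t cycle; uint8_t value; } PsgWrite;

typedef struct {
    PsgWrite entries[LOG_MAX];
    size_t   count;
} PsgLog;

/* Called from the port-write handler: record only, no synthesis yet.
   Returns 0 if the log is full, 1 otherwise. */
int psg_log_write(PsgLog *log, uint32_t cycle, uint8_t value)
{
    assert(log != NULL);
    if (log->count == LOG_MAX) return 0;
    log->entries[log->count].cycle = cycle;
    log->entries[log->count].value = value;
    log->count++;
    return 1;
}

/* Called once per frame: replay the log in timestamp order while
   generating n output samples, then empty the log. */
void psg_render(PsgLog *log, uint8_t *level, uint32_t cycles_per_sample,
                uint8_t *out, size_t n)
{
    assert(log != NULL && level != NULL && out != NULL);
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)i * cycles_per_sample;  /* time of this sample */
        while (next < log->count && log->entries[next].cycle <= t)
            *level = log->entries[next++].value;
        out[i] = *level;
    }
    while (next < log->count)  /* late writes still update the state */
        *level = log->entries[next++].value;
    log->count = 0;
}
```

The CPU side pays only for an array append per write, and the timestamps keep the audio sample-accurate without stepping the sound chip every cycle.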

BTW I am looking for any SMS addicts that have decent systems as beta testers, so if you want to help me test/debug my emulator send me a message. Preferably people with machines (2 GHz+) running Vista or Win7 on DX10-capable hardware, as that is my testing environment at this stage. Though it will be a cross-platform emulator when it gets released.
  • Joined: 08 Dec 2005
  • Posts: 488
  • Location: Melbourne, Australia
Post Posted: Sun May 31, 2009 9:24 am
Quote
The way I do it for my Z80 is to have an internal table of steps that is added to on instruction decode, and basically you switch() on this table for every step. So instead of, say, ~800K complex switch()s per second of emulation you have 3.5 million less complex ones.


I see - no wonder there is additional overhead if you are emulating the internals of the Z80 on a per-cycle rather than per-instruction basis. However, I would have thought that in order to correctly emulate timing sensitive games it would be sufficient to emulate only the externals of the Z80 in a per-cycle manner - in other words, the interaction between the Z80 and the other components (excluding memory). For example, no game could be dependent on the timing of memory access by the Z80 since memory is not accessed by any other component. However, a game could be dependent on the VDP having generated exactly the correct number of pixels in a scanline when the Z80 accesses a VDP port.

I'm interested to know why you decided to emulate the Z80 in such a way. Have I misunderstood: Is there in fact a need for such accuracy in order to maximise compatibility?

Quote
BTW I am looking for any SMS addicts that have decent systems as beta testers, so if you want to help me test/debug my emulator send me a message.


I'm curious about your emulator: Are you, as mentioned earlier, emulating the Genesis as well? If so, do you also have a cycle-accurate 68k core?

Finally, are you planning to release the emulator as open-source so that those in this thread (including Bock) who have expressed an interest in contributing will be able to do so?
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Sun May 31, 2009 10:49 am
Paul Baker wrote
I see - no wonder there is additional overhead if you are emulating the internals of the Z80 on a per-cycle rather than per-instruction basis. However, I would have thought that in order to correctly emulate timing sensitive games it would be sufficient to emulate only the externals of the Z80 in a per-cycle manner - in other words, the interaction between the Z80 and the other components (excluding memory). For example, no game could be dependent on the timing of memory access by the Z80 since memory is not accessed by any other component. However, a game could be dependent on the VDP having generated exactly the correct number of pixels in a scanline when the Z80 accesses a VDP port.


Good question really. I emulated the Z80 in this manner because I wanted a cycle accurate Z80 emulator for this project and others (like genesis). Until you get multi-cpu some aspects don't need to be emulated as you stated. However the timing of ins/outs is something which affects SMS games, as are the interrupts, you could simulate this behaviour somewhat by modifying an existing core with callbacks, though it would be a lot more hacky.

To me there is no point in having a z80 emulator which works for one system but not another, if I was going to invest time in writing one I'd like it to be something I do once and that is the end of it.


Paul Baker wrote
I'm curious about your emulator: Are you, as mentioned earlier, emulating the Genesis as well? If so, do you also have a cycle-accurate 68k core?

Finally, are you planning to release the emulator as open-source so that those in this thread (including Bock) who have expressed an interest in contributing will be able to do so?


I doubt I'm going to open source my main project, simply because there are too many libraries (non emulation related) that have taken many years of my life. But you can never say never, Bock likely never anticipated open sourcing MEKA and he did. I think people also put too much faith in what open source brings, it hasn't done too much for MEKA unfortunately and there is a large community that loves it! When projects are very large it takes people of certain skillsets to be able to contribute to it, and these people are usually very busy making money instead of doing charity work which is fair enough.

However I plan to release the information I have about the SMS/VDP internals (once I have finalized it) and open source my tools relating to the new rom format I developed.

I'm not sure if a cycle-accurate 68K is on the cards yet; baseline CPUs aren't fast enough to do Genesis emulation at the cycle level yet. Whilst people think cycle-level SNES is harder than cycle-level Genesis, they are incorrect in my opinion. The SNES has a lot of low-MHz components whilst the Genesis is the opposite. The 68K runs at ~8 MHz and the VDP at about double that; combined with a Z80 it is quite a bit of emulation. That said, if you do it in a kind of hacky way it might be possible to do it on a 3 GHz CPU. The instruction-accurate Genesis emulators out at the moment use significant amounts of CPU as it is when they are being accurate (I think some are now partly cycle based in the VDP).
  • Joined: 25 Jul 2007
  • Posts: 733
  • Location: Melbourne, Australia
Post Posted: Mon Jun 01, 2009 9:29 am
PoorAussie wrote
I think people also put too much faith in what open source brings, it hasn't done too much for MEKA unfortunately and there is a large community that loves it! When projects are very large it takes people of certain skillsets to be able to contribute to it, and these people are usually very busy making money instead of doing charity work which is fair enough.


That depends entirely on how well structured the project is.

The problems with contributing to Meka are that it doesn't compile out of the box and that it uses libraries that aren't very well supported on newer platforms.

If I had to give an example of an open-source project done right, I would always point to the Quake source code as a gold standard: variables use Hungarian notation and descriptive names, making it easy to follow; it's well commented; the file names are reasonably descriptive and broken down into logical sections of code; and for the most part it has VS & makefiles that actually work (the only outside component requiring setup is the execution path to NASM).
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Mon Jun 01, 2009 10:36 am
djbass wrote
That depends entirely on how well structured the project is.

The problems with contributing to Meka are that it doesn't compile out of the box and that it uses libraries that aren't very well supported on newer platforms.

If I had to give an example of an open-source project done right, I would always point to the Quake source code as a gold standard: variables use Hungarian notation and descriptive names, making it easy to follow; it's well commented; the file names are reasonably descriptive and broken down into logical sections of code; and for the most part it has VS & makefiles that actually work (the only outside component requiring setup is the execution path to NASM).


I've had no real problems looking at MEKA source but I haven't been interested in compiling it, just seeing how it ticks. I don't think it is that badly organized, but it was designed originally for DOS which will always bring about a few issues when it comes to modernizing it. It certainly has a charm about it though, probably because it was written by someone very passionate.

There are plenty of open source emulators like Project64, 1964, etc., that have gone nowhere because the people who can work on them just don't want to. It's not really so much about how a project is structured as it is about how easy it is to do something with it. It's easy to tweak something in Quake, like a weapon; for an emulator, not so much.
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8661
  • Location: Paris, France
Post Posted: Mon Jun 01, 2009 10:48 am
PoorAussie wrote
There are plenty of open source emulators like Project64, 1964, etc., that have gone nowhere because the people who can work on them just don't want to. It's not really so much about how a project is structured as it is about how easy it is to do something with it. It's easy to tweak something in Quake, like a weapon; for an emulator, not so much.

Yes but in our specific case (Sega 8-bit community) we have an increasing need to have a modern emulator that can be improved and tweaked by passionate people. A few of the people here would likely do that actively if given the opportunity (I am on line).

Would you consider closed-source, shared among a selected number of developers?
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Mon Jun 01, 2009 11:53 am
Bock wrote
Would you consider closed-source, shared among a selected number of developers?


Yeah I understand what you are saying. My project is a bit different than the usual emulator though so I'm not sure how well suited it will be and what my goals are yet.

I think something like Regen (made by aamir who visits here) would be better suited to what the 8bit community is after. It already has debuggers and things of this nature, and aamir said it will be cycle accurate in the next release.
  • Joined: 25 Jul 2007
  • Posts: 733
  • Location: Melbourne, Australia
Post Posted: Mon Jun 01, 2009 12:06 pm
PoorAussie wrote
There are plenty of open source emulators like Project64, 1964, etc., that have gone nowhere because the people who can work on them just don't want to. It's not really so much about how a project is structured as it is about how easy it is to do something with it. It's easy to tweak something in Quake, like a weapon; for an emulator, not so much.


That's generally what I meant when I referred to structure, willingness to contribute to a project will be proportional to how easy the project is to work with.

If someone who is normally too busy to work on a project for any significant amount of time can jump in at any point and add little features or optimisations here and there, then real progress can be made.
  • Joined: 11 Jun 2009
  • Posts: 5
Post Posted: Thu Jun 11, 2009 11:37 pm
Friend sent me a link to this thread. Hope you don't mind my registering to reply.

Quote
BSNES cannot save states because of all the complexity and its design not being something you can easily stop and snapshot.


The same would be true of any emulator that used a threaded design. You'll see this more in next-gen emulators, I'm sure. We also have many ideas for save states anyway, I just don't have time myself to try them out.

If OSes weren't so lazy and had native per-thread or per-process hibernation, save states would be absolutely free and not require complex serialization functions, and we wouldn't have problems like "oops I forgot cpu.reg.somerandomvalue so sometimes the reload corrupts the graphics".

Quote
todays machines can easily handle 20 million cycles in a state machine per second without much sweat


Of course. In truth it was more my method of a state machine: the main CPU loop would call a state machine on the current opcode. That would call into a function with its own state machine on the current cycle. That would check an internal boolean to see if we were starting the bus hold or if the hold was completed and we needed to advance to the next cycle. This could all be unwound into one gigantic function of doom and only need one very fast state select. But I did indeed get a ~15-20% speed boost simply from moving from my old triple state machine to threading. You can get the two old versions and compare yourself if you don't believe it.

Quote
He says his method is more readable, but if anyone here has actually looked at the source I am sure they will be amazed at the complexity and unreadability.


... are you being serious? Below is an implementation using libco, followed by one using state machines.

//note: L is a shorthand #define for last_cycle();
//it just keeps the clutter out while simulating the two-stage pipeline of the 65c816 CPU IRQs

//opcode for <op> ($nn,s),y: load stack-indexed indirect+y; 16-bit version
//where <op> is adc, sbc, lda, and, ora, eor, cmp.
//my entire CPU core is 23.9kb and implements 256 instructions in both 8-bit and 16-bit variants

template<op> void CPUcore::op_read_isry_w() {
  sp = op_readpc();
  op_io();
  aa.l = op_readsp(sp + 0);
  aa.h = op_readsp(sp + 1);
  op_io();
  rd.l = op_readdbr(aa.w + regs.y.w + 0);
L rd.h = op_readdbr(aa.w + regs.y.w + 1);
  call(op);
}

template<op> void CPUcore::op_read_isry_w() {
  switch(state.cycle) {
    case 0: {
      if(state.hold == 0) add_clocks(4);
      else { sp = op_readpc(); state.cycle++; }
      state.hold ^= 1;
    } break;
    case 1: {
      op_io();
      state.cycle++;
    } break;
    case 2: {
      if(state.hold == 0) add_clocks(4);
      else { aa.l = op_readsp(sp + 0); state.cycle++; }
      state.hold ^= 1;
    } break;
    case 3: {
      if(state.hold == 0) add_clocks(4);
      else { aa.h = op_readsp(sp + 1); state.cycle++; }
      state.hold ^= 1;
    } break;
    case 4: {
      op_io();
      state.cycle++;
    } break;
    case 5: {
      if(state.hold == 0) add_clocks(4);
      else { rd.l = op_readdbr(aa.w + regs.y.w + 0); state.cycle++; }
      state.hold ^= 1;
    } break;
    case 6: {
      if(state.hold == 0) add_clocks(4);
      else { L rd.h = op_readdbr(aa.w + regs.y.w + 1); call(op); state.cycle = 0; }
      state.hold ^= 1;
    } break;
  }
}


Yes, I know you can partially hide the switch / case bloat with very clever #define tricks (case __LINE__:, etc). But you were saying my approach was more complex. Mine uses absolutely no pre-processor macros.

Both of these will synchronize to the S-SMP and S-PPU and have correct timings even for bus hold delays (yes, they matter. They affect the latch counter values for the S-PPU. The difference is slight, but it's the difference between exactly matching real hardware and not. If you can't get your timing counters right, you can't get more complex timing behaviors right.)

Further, my approach doesn't even need to iterate at a half-cycle step. Inside op_read(), it calls into a memory dispatch table. The entries in that table for the S-SMP will realize the S-CPU is trying to access S-SMP registers. They will then determine if the S-CPU is ahead of the S-SMP. If not, the read simply happens right away. Only if so will it call a simple co_switch(thread_smp) function. That's it. Now the S-SMP will run until it tries to access the S-CPU. The S-CPU thread that called co_switch() will resume right where it left off, but only after the CPU is now behind the SMP (i.e. it is safe to access the variable).
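
The "only switch when the caller is ahead" test described above can be sketched in isolation. This is a minimal illustration, not bsnes code: the `Scheduler`, `SmpChip` and `CpuChip` types, the single signed `clock` counter and the cycle costs are all assumptions made for the example, and a plain function call stands in for the real `co_switch()` context switch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-chip scheduler. 'clock' is the CPU's lead over the SMP:
// positive means the CPU is ahead, zero or negative means the SMP has caught up.
struct Scheduler {
  std::int64_t clock = 0;
  unsigned switches = 0;  // how many times control had to be handed over
};

struct SmpChip {
  Scheduler& sched;
  std::uint8_t port = 0;  // register visible to the CPU
  explicit SmpChip(Scheduler& s) : sched(s) {}

  // Stand-in for co_switch(thread_smp): run until the SMP is no longer behind.
  void run_until_caught_up() {
    while (sched.clock > 0) {
      port++;            // pretend each SMP step changes the port
      sched.clock -= 3;  // and costs 3 time units (arbitrary)
    }
  }
};

struct CpuChip {
  Scheduler& sched;
  SmpChip& smp;
  CpuChip(Scheduler& s, SmpChip& m) : sched(s), smp(m) {}

  void add_clocks(unsigned n) { sched.clock += n; }

  // Ordinary work never touches the SMP, so it never synchronizes.
  void op_io() { add_clocks(6); }

  // The sync test lives only here, in the shared-register read.
  std::uint8_t read_smp_port() {
    if (sched.clock > 0) {  // CPU is ahead, so the SMP's state is stale
      sched.switches++;
      smp.run_until_caught_up();
    }
    assert(sched.clock <= 0);  // now safe to observe the port
    add_clocks(6);
    return smp.port;
  }
};
```

In this sketch a thousand `op_io()` calls cause no switches at all; the first `read_smp_port()` triggers exactly one catch-up run. That is the just-in-time behaviour the post describes, under the simplifications stated above.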

Believe it or not, my S-CPU actually executes up to ~20,000 opcodes sometimes before switching to the S-SMP to catch up; and vice versa. The overhead comes from many things: modifying the stack pointer is *brutal* on modern pipelined processors, my entire emulator core is based heavily on small building blocks, inheritance and separation over enslavement, and I write code to be readable / maintainable more so than to be fast.

Absolutely all of the synchronization is 100% transparent to the chips. Each chip is implemented and looks exactly like it's the only thing running with zero knowledge of other chips' existence. All synchronization is handled via system/scheduler, comprising 4.32kb of code for the whole emulator. How can you say with a straight face that this is more complex than the traditional approach? Even libco's co_switch() itself is only 11 assembly instructions long.

One more example: the SuperFX shares the cartridge ROM with the main CPU and its own processor. Only one can assert the bus at a time. The CPU sets a special bit called RON to pass control to the SFX. If the CPU needs to touch ROM during an SFX program run, it can clear RON at any point while the SFX is running, and then read from ROM, and then set it again and the SFX will resume right where it left off. The SFX only freezes if it attempts to access ROM while the CPU has control of it. How would you go about emulating all that? Here's how I do it:

uint8 SuperFXGSUROM::read(unsigned addr) {
  while(!superfx.regs.scmr.ron) superfx.add_clocks(2);
  return memory::cartrom.read(addr);
}


But the real point of threading was to work around a serious edge case involving S-CPU DMA synchronization timing, where DMA can execute for ten full video frames, inside the middle of an opcode, and you have to know the *next* cycle's timing to properly sync the bus back up again afterward. It'd take me many pages to explain, but suffice to say it was absolute hell to code that into a state machine. It's *transparent* with my approach. Also note that we have up to *six* processors running in parallel at a time. I'm not sure if you can relate with the SMS, so my approach probably seems like overkill.

If you still believe bsnes is too complex, please cite specific examples and explain how it would be easier with a state machine.

Lastly, I apologize if I come off harsh or anything. Not my intention, I'm just kind of direct. Truly, if you really do have a better way of doing things, I would be appreciative and honestly go so far as to rewrite my entire emulator core to use it. I've done so at least a half-dozen times in the past to keep improving its readability and friendliness.

I'm honestly shocked that you find my code unreadable.

ADDENDUM: sorry to make this so long. Want to make sure I cover everything.

Sure, you can reduce the case statements in my second opcode example by merging the volatile functions (that don't affect other chips) like op_io(), and if you really wanted, you could even use case fall through when a read/write doesn't affect another chip. But to do that, you'd have to put that test right inside the opcode so it can conditionally fall through to the next case. And now you have various fall throughs that someone just looking at the opcode core won't understand at all. And I hope you don't make any subtle mistakes. libco makes it all transparent. libco does the same test for fallthrough, but not inside each of 256 opcodes. Not even inside each of ~12 various read functions or the global bus read/write functions. But inside the memory dispatcher ONLY for SMP register accesses. That is to say, the code to see if we need to sync the SMP to the CPU is inside the SMP register read function. Exactly where you would logically expect it to be.

Bottom line is this: your model also needs a scheduler to determine which state machine to enter. My model of cothreading is exactly like your model, only without the state machines. And that's it. It's not "XYZ in place of state machines", it's "sans state machines." If all we're doing is removing the most tedious and red-tape-ish parts of writing an emulator, how can that possibly be more complex?

Now, if you want to argue my overall programming is sloppy, we can agree to disagree. But threading is flat out a great idea for an emulator, and a major improvement over state machines for both readability and maintainability. It's a far more logical design. The downsides are also obvious: slower and harder to make save states with. It's up to you what is more important. My use of cooperative over pre-emptive means I can only use one core, but it also means there are no problems with deadlocks or variable stomping.

Not sure if it's the best example of cycle-accurate, but there is the SNES emulator bsnes.
A 600MHz PC can run most emus of this system, but this one requires 3GHz+ for full-speed.


Come on, it's not that bad :P
A 1.6GHz Core Solo will get well over 60fps. My 3GHz Core 2 Duo ($149 a year and a half ago when I bought it) gets over 100fps in Yoshi's Island, a SuperFX2 game; and over 168fps in Zelda 3.
My minimum requirements are over-exaggerated so as to ensure a good experience.

Now a Pentium 4 needing 3GHz? Sure. Those aren't worth the silicon they're fabbed on.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Fri Jun 12, 2009 2:46 am
Hi byuu, sorry if my post(s) offended you in any way.

byuu wrote
The same would be true of any emulator that used a threaded design. You'll see this more in next-gen emulators, I'm sure. We also have many ideas for save states anyway, I just don't have time myself to try them out.

If OSes weren't so lazy and had native per-thread or per-process hibernation, save states would be absolutely free and not require complex serialization functions, and we wouldn't have problems like "oops I forgot cpu.reg.somerandomvalue so sometimes the reload corrupts the graphics".


The methods you could use for saving a state would be platform specific and pretty complex right?

I don't think we will be seeing many multithreaded emulators in regards to how you have done it in BSNES until our host CPUs get better threading instructions/mechanisms, and then finally better OS support for them. Realistically imo we won't be seeing much progress done on intel/amd cpus until 32+ cores are standard/common, which is probably 4-5 years away.

When you have enough cores, you could have a thread running every chip (including memory controllers, I/O, etc.), and with better threading mechanisms, a way to run threads in lockstep so they never drift far (or at all) out of step with each other.

Either way, in my opinion saving the state (or rather being able to on cycle accuracy) is extremely important, mostly because without it you are very limited in what you can do with the emulator.

byuu wrote
But threading is flat out a great idea for an emulator, and a major improvement over state machines for both readability and maintainability


Just reading your "thread" version of the instruction, am I supposed to guess where the cycle break ups are? You may argue that it's pointless to know, but then I have to read other things to understand why you are doing what you are doing and when. Also you have a lot of boilerplate code with that state.hold stuff in the "non thread" which makes it look excessively ugly. At least with the latter it is a lot more clear what is going on for each cycle.

byuu wrote
Of course. In truth it was more my method of a state machine: the main CPU loop would call a state machine on the current opcode. That would call into a function with its own state machine on the current cycle. That would check an internal boolean to see if we were starting the bus hold or if the hold was completed and we needed to advance to the next cycle. This could all be unwound into one gigantic function of doom and only need one very fast state select. But I did indeed get a ~15-20% speed boost simply from moving from my old triple state machine to threading. You can get the two old versions and compare yourself if you don't believe it.


The method you used or at least the method you showed here is probably the "first state method" people think of when it comes to solving this problem. However I believe there are better ways.

The Z80 has ~1300 unique instructions, though many are simple variations of each other.

UINT8 Z80_op0xFF[]={Z80C_INC,Z80C_ENDM1,Z80C_INC,Z80C_PUSH16H,Z80C_ENDM,Z80C_INC,Z80C_PUSH16L,Z80C_ENDINST};


That is a buffer which contains every cycle for the RST instruction. If I ever need to know what RST does I can look at this and know instantly. On cycle 3 of every instruction when the instruction is being decoded on a real Z80 I add a buffer like this to a static decode buffer.

So there is a single switch for the "steps" that I have broken the Z80 into. These steps are shared among MANY instructions, so my single switch, which executes every one of the 1300 instructions, is actually pretty small: about 150 lines.

Where I decode each instruction and put it into the static cycle buffer I have this. CYCLEBUFFER(Z80_op0xFF);Z80.addr.d=Z80PC;Z80PC=0x38;

Adding a bit of code to say "use this register" simplifies the main switch making it more compact and faster.
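
The per-cycle buffer idea described above can be sketched as follows. This is an illustration only: the micro-op names, the five-entry table and the `MiniCpu` type are invented for the example and do not reflect the poster's actual `Z80C_*` identifiers or real Z80 timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical micro-op set; the names are illustrative.
enum MicroOp : std::uint8_t { IDLE, PUSH_HIGH, PUSH_LOW, END_INSTRUCTION };

// One table per instruction, one entry per clock cycle (timing is made up).
static const MicroOp kRstCycles[] = {IDLE, PUSH_HIGH, IDLE, PUSH_LOW, END_INSTRUCTION};

struct MiniCpu {
  std::uint16_t pc = 0, sp = 0, addr = 0;
  std::vector<std::uint8_t> ram = std::vector<std::uint8_t>(0x10000, 0);
  const MicroOp* cycles = nullptr;  // pending per-cycle steps
  std::size_t pos = 0;
  bool busy = false;

  // Decode step for a restart-style instruction: latch the return address,
  // queue the per-cycle steps, and change PC immediately.
  void begin_rst(std::uint16_t target) {
    addr = pc;
    pc = target;
    cycles = kRstCycles;
    pos = 0;
    busy = true;
  }

  // Advance exactly one clock cycle; this single switch serves every instruction.
  void step_cycle() {
    assert(busy);
    switch (cycles[pos++]) {
      case IDLE: break;
      case PUSH_HIGH: ram[--sp] = static_cast<std::uint8_t>(addr >> 8); break;
      case PUSH_LOW: ram[--sp] = static_cast<std::uint8_t>(addr & 0xFF); break;
      case END_INSTRUCTION: busy = false; break;
    }
  }
};
```

Calling `begin_rst(0x38)` and then `step_cycle()` until `busy` clears pushes the old PC high byte first, then the low byte. Because the pending work is just a table pointer and an index, the core can be paused between any two cycles, which is the property the post is arguing for.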

byuu wrote
If you still believe bsnes is too complex, please cite specific examples and explain how it would be easier with a state machine.


Sorry I don't have enough time to devote to citing specific examples right now, which may sound like a cop out but I don't really see this as that important. I'm not the only one who has this opinion though, if you ask a few SNES "emulator insiders" about this maybe they will give you their opinion on it. I must also add I haven't checked recent versions, I think the last version I checked was 18-24 months ago, maybe it's improved since then.
  • Joined: 11 Jun 2009
  • Posts: 5
Post Posted: Fri Jun 12, 2009 7:29 am
Quote
Hi byuu, sorry if my post(s) offended you in any way.


No worries, even if I sound like it, I don't take this unpaid hobby very seriously.

I'm used to it with my unorthodox approaches to programming. But code cleanliness was one issue I didn't expect anyone to bring up. I take that very, very seriously. It's the raison d'être of an emulator that exists solely as a reference design. It would be like if I were the author of ZSNES and you called my code hopelessly slow.

Quote
The methods you could use for saving a state would be platform specific and pretty complex right?


There would probably be one method for Windows, and one method for everything else. A lot like every other API. It would actually be a lot easier if the OS supported it: the long list of code saving and restoring every integer (or even taking the cheap route and fwrite()'ing entire structs at a time, which makes assumptions about data alignment and endianness) would be replaced with a few lines of code in one spot for the entire app. It could be as simple as hibernate_thread(id, buffer, size) + resume_thread(id, buffer, size).
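
For contrast, the "long list" style of save state mentioned above looks roughly like this. It is a minimal sketch with a made-up `Regs` struct and helper names; every field is written byte by byte, so the result does not depend on struct padding or host endianness, at the cost of one hand-written line per field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file; the field names are illustrative.
struct Regs {
  std::uint16_t pc = 0;
  std::uint16_t sp = 0;
  std::uint8_t flags = 0;
};

// Store a 16-bit value in little-endian order regardless of the host CPU.
inline void put16(std::vector<std::uint8_t>& out, std::uint16_t v) {
  out.push_back(static_cast<std::uint8_t>(v & 0xFF));
  out.push_back(static_cast<std::uint8_t>(v >> 8));
}

inline std::uint16_t get16(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  assert(pos + 2 <= in.size());
  std::uint16_t lo = in[pos], hi = in[pos + 1];
  pos += 2;
  return static_cast<std::uint16_t>(lo | (hi << 8));
}

std::vector<std::uint8_t> save_state(const Regs& r) {
  std::vector<std::uint8_t> out;
  put16(out, r.pc);
  put16(out, r.sp);
  out.push_back(r.flags);  // leaving out a line like this is the "forgot a register" bug
  return out;
}

Regs load_state(const std::vector<std::uint8_t>& in) {
  Regs r;
  std::size_t pos = 0;
  r.pc = get16(in, pos);
  r.sp = get16(in, pos);
  assert(pos < in.size());
  r.flags = in[pos++];
  return r;
}
```

Nothing checks that `save_state()` and `load_state()` cover every member of `Regs`, which is the maintenance hazard the post attributes to this approach.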

But yes, I agree those APIs likely won't exist anytime soon, if ever. They need to happen in the kernel too.

Quote
Either way, in my opinion saving the state (or rather being able to on cycle accuracy) is extremely important, mostly because without it you are very limited in what you can do with the emulator.


Cheat in video games, or watch other people cheat in video game movies? There's a small benefit to being able to save in games that don't naturally have saves (platformers), and a large benefit to debuggers for the 0.1% of us who develop for these consoles still.

I don't consider it very limiting at all, as it's not something the real hardware could do anyway. So what we have is a difference of opinion. I respect that.

Quote
Just reading your "thread" version of the instruction, am I supposed to guess where the cycle break ups are?


There aren't any break-ups, the compiled code does not enter and exit state machines, it is processed linearly. Any emu author should know how chips communicate with one another, and that they sync at those points. Thusly, an op_read*() will sync when it accesses another chip, as will op_write*(). It's just-in-time synchronization instead of just-in-case breaking everything into individual cycle-based state machines.

Quote
Also you have a lot of boilerplate code with that state.hold stuff in the "non thread" which makes it look excessively ugly. At least with the latter it is a lot more clear what is going on for each cycle.


I explained that it could be reduced in my addendum. And that you are arguing in favor of the latter shows that we simply are not going to see eye to eye here.

But at least I've stated my counter arguments, so others here can decide for themselves (if they care.)

Quote
If I ever need to know what RST does I can look at this and know instantly.


You take an interpretive language at heart, and stick extremely abbreviated micro-op identifiers into an array, and that's supposed to be easier?

UINT8 Z80_op0xFF[]={Z80C_INC,Z80C_ENDM1,Z80C_INC,Z80C_PUSH16H,Z80C_ENDM,Z80C_INC,Z80C_PUSH16L,Z80C_ENDINST};

//and to execute it
CYCLEBUFFER(Z80_op0xFF);Z80.addr.d=Z80PC;Z80PC=0x38;

vs.

void Z80::op_rst() {
  //store program counter on stack, so RET will return after the RST op
  stack_push( regs.PC.hi );
  stack_push( regs.PC.lo );
}

//and to execute it
opcode[0xff]();


Mine looks like C++ code, yours looks like data. Seriously, to anyone else here, which one is more understandable to you?

What is the difference between Z80C_ENDM1 and Z80C_ENDM? In fact, what does either mean? Why are there three Z80C_INC calls in an op that pushes two bytes on the stack? Looks like one increments the counter and two increment the stack, which if true would mean you can't even assume what the micro-ops mean, you have to take them in context of their surrounding ops.

You said in my state machine that at least we could see the cycle break-ups. I can't see bus hold delays in your example. Nor am I sure if Z80C_INC or Z80C_ENDINST is supposed to consume clock time. Your buffer lacks the same information you say is a problem with my approach.

In all honesty, your version is indeed very clever. The difference is you understand your own design better, as do I mine. But I don't make a point of saying "if anyone here has actually looked at the source I am sure they will be amazed at the complexity and unreadability" about your work on various internet forums.

Another thing about my version's sync points. With maybe a half-dozen lines of changes, they can be converted to sync at every bus hold, at every cycle edge or even at every opcode edge. That's because the opcode implementation and synchronization (and in your case, sync+state machine) are separate from one another, as it should be. Much easier to understand what a read() and write() function do than to have that boilerplate inside each and every micro-op of yours. Each of which is prone to subtle logic bugs.

I'm all about abstraction, it minimizes the points of failure. I can have an entire processor emulator with only one single spot that simulates bus hold delays. I don't believe it's better to have that inside every opcode cycle individually so people can "quick reference" that. If they can look at the opcode, they can look at op_read().

Quote
Where I decode each instruction and put it into the static cycle buffer I have this. CYCLEBUFFER(Z80_op0xFF);Z80.addr.d=Z80PC;Z80PC=0x38;


I take it CYCLEBUFFER() is what executes each entry in your list, and this code is inside the switch(opcode) table? And what is Z80PC=0x38 for?

Quote
Sorry I don't have enough time to devote to citing specific examples right now, which may sound like a cop out but I don't really see this as that important.


Of course you don't have to. But you're the one who started this by making a bold claim about my code with no factual evidence.

Quote
I'm not the only one who has this opinion though, if you ask a few SNES "emulator insiders" about this maybe they will give you their opinion on it.


And many SMS "emulation insiders" have agreed with me about the ridiculous complexity of your code and its "cycle buffers". I can't name any names, or be bothered to cite any specific examples, of course.

You see how specious that is? Sorry, I don't have the temperament of "oh no, some unnamed "insider" doesn't like my code; I better go ask everyone to find out who."
To hell with cowardly behind-the-back comments. If you have a legitimate point and show me, I listen. Deathlike2 advised my video filtering code was rather poor. I agreed and rewrote it all using much better OO-design principles. GIGO showed me a superior way to implement opcodes with templates instead of pre-processing. Just this week I rewrote my entire CPU and SMP opcode cores with the new design. I see merit in your cycle buffers and I may be able to use something similar in my cycle-based PPU core.

If you want to cop-out and not follow up with actual facts in your argument, that's fine. But you really shouldn't bring things up that you aren't willing to debate. Likewise, if you're not going to name names, then you should not state "some anonymous individual says this." For all anyone here knows, you're making things up.

I've cited specific, real code from my work and detailed, technical explanations. You've made a few damaging claims with no evidence. In the journalism world, this is called citing your research and sources. The kind of people who could agree with the argument you've presented here are the same that believe those "National Enquirer" magazines.

If I seem upset at all, it's because some people hear these things and they stick in the back of their minds. So a few weeks later someone here or on another forum remarks, "oh yeah, I remember another emulator author saying bsnes was programmed really poorly, actually." And that's bullshit. I'm damn proud of my code and how clean it is, and I'm willing to give up major features like speed and save states to achieve that. And I'm the first person to admit my flaws: my emulator is slow as hell, lacks an untold number of features, and is not suitable for general gaming. I'm also the first to recommend SNESGT to people who want to actually play games.

I could say everything you've said about your emulator verbatim on my forum, and you'd be in the same position I am: completely unable to defend yourself because there's no substance to counter. I would expect (and even ignore) such comments from a random nobody, but from a respected emulator author I'm extremely disappointed and forced to defend myself.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Fri Jun 12, 2009 9:44 am
byuu wrote
Cheat in video games, or watch other people cheat in video game movies? There's a small benefit to being able to save in games that don't naturally have saves (platformers), and a large benefit to debuggers for the 0.1% of us who develop for these consoles still.

I don't consider it very limiting at all, as it's not something the real hardware could do anyway. So what we have is a difference of opinion. I respect that.


Netplay, rewinding live gameplay, movies, saving/loading instantly. These are features people now expect from emulators.


byuu wrote
There aren't any break-ups, the compiled code does not enter and exit state machines, it is processed linearly. Any emu author should know how chips communicate with one another, and that they sync at those points. Thusly, an op_read*() will sync when it accesses another chip, as will op_write*(). It's just-in-time synchronization instead of just-in-case breaking everything into individual cycle-based state machines.


No, my point was that it's not obvious what happens on a cycle level within that code compared to the other code.


byuu wrote
You take an interpretive language at heart, and stick extremely abbreviated micro-op identifiers into an array, and that's supposed to be easier?

Mine looks like C++ code, yours looks like data. Seriously, to anyone else here, which one is more understandable to you?


It's easier once you know the "micro-ops", I can easily see in a single line what makes up each instruction and when it happens.

byuu wrote
What is the difference between Z80C_ENDM1 and Z80C_ENDM? In fact, what does either mean? Why are there three Z80C_INC calls in an op that pushes two bytes on the stack? Looks like one increments the counter and two increment the stack, which if true would mean you can't even assume what the micro-ops mean, you have to take them in context of their surrounding ops.


Some special events happen at the end of the first M-cycle; Z80C_INC is basically a NOP. I'm not expecting you to know what those micro-op IDs mean because I haven't gone over my code yet to make the IDs pretty. If I made them more descriptive then it would be easier to understand them without any other reading.

byuu wrote
You said in my state machine that at least we could see the cycle break-ups. I can't see bus hold delays in your example. Nor am I sure if Z80C_INC or Z80C_ENDINST is supposed to consume clock time. Your buffer lacks the same information you say is a problem with my approach.


The Z80 has a pin which allows you to insert wait states when reading memory and another to halt the Z80. These are checked at the correct times in the cycle breakdown only in ONE place. Similar to how all your logic is in one place when it comes to such things.

I don't understand why you needed to split everything up into threads to have cores which didn't need to know each other. I communicate to each core the same way the hardware does, by simulating pins that are physically connected, in this case it's a simple pointer. My VDP core gets a pointer that is just like a pin, it sets it to 1 whenever it needs to and whatever reads that "pin"/"pointer" knows an IRQ is set. In the SMS case it is only the CPU that it's connected to, but in the Genesis case it's a bit different.
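
The "pin as pointer" wiring described above might look like the sketch below. The `VdpChip`, `CpuChip` and `Board` types and the scanline-192 trigger are assumptions made for illustration; the point is only that neither chip refers to the other, and both see the same `bool`.

```cpp
#include <cassert>

// Hypothetical wiring: one bool stands for one physical line on the board.
struct VdpChip {
  bool* irq_out = nullptr;  // the line this chip drives
  int line = 0;
  void step_scanline() {
    assert(irq_out);
    if (++line == 192) *irq_out = true;  // arbitrary trigger for the sketch
  }
};

struct CpuChip {
  const bool* irq_in = nullptr;  // the line this chip samples
  unsigned interrupts_taken = 0;
  void step() {
    assert(irq_in);
    if (*irq_in) interrupts_taken++;  // acknowledging the line is left out
  }
};

// The board owns the wire and connects both ends. It is not copyable because
// the chips hold pointers into this object.
struct Board {
  bool irq_wire = false;
  VdpChip vdp;
  CpuChip cpu;
  Board() {
    vdp.irq_out = &irq_wire;
    cpu.irq_in = &irq_wire;
  }
  Board(const Board&) = delete;
  Board& operator=(const Board&) = delete;
};
```

Rewiring for a different machine would then mean pointing `irq_out` at a different wire in a different board type, without touching either chip. Whether that scales to chips that must observe each other mid-instruction is the question the two posters are debating.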


byuu wrote
In all honesty, your version is indeed very clever. The difference is you understand your own design better, as do I mine. But I don't make a point of saying "if anyone here has actually looked at the source I am sure they will be amazed at the complexity and unreadability" about your work on various internet forums.


Well my code isn't available for you to comment on, so any comments you made would be rather ignorant I'm sure you would agree. ;)

byuu wrote
Another thing about my version's sync points. With maybe a half-dozen lines of changes, they can be converted to sync at every bus hold, at every cycle edge or even at every opcode edge. That's because the opcode implementation and synchronization (and in your case, sync+state machine) are separate from one another, as it should be. Much easier to understand what a read() and write() function do than to have that boilerplate inside each and every micro-op of yours. Each of which is prone to subtle logic bugs.

I'm all about abstraction, it minimizes the points of failure. I can have an entire processor emulator with only one single spot that simulates bus hold delays. I don't believe it's better to have that inside every opcode cycle individually so people can "quick reference" that. If they can look at the opcode, they can look at op_read().


Not sure what you are saying, my opcode reads only occur in one place, memory reads all get mapped through one function, cpu halts and wait states only occur in one area. Because I loop through something every cycle I can do these things all in the one place and at the correct cycle.

byuu wrote
I take it CYCLEBUFFER() is what executes each entry in your list, and this code is inside the switch(opcode) table? And what is Z80PC=0x38 for?


No, CYCLEBUFFER() is a macro for inserting the data into the Z80 state. Basically, at any one point in time there are usually a few cycles already stored in the state to be executed. Z80PC=0x38 is a shortcut to simply change the PC. In this case it doesn't matter when the PC is changed (with regard to the instruction), so I change it immediately and it saves time.

byuu wrote
Of course you don't have to. But you're the one who started this by making a bold claim about my code with no factual evidence.

And many SMS "emulation insiders" have agreed with me about the ridiculous complexity of your code and its "cycle buffers". I can't name any names, or be bothered to cite any specific examples, of course.

You see how specious that is? Sorry, I don't have the temperament of "oh no, some unnamed "insider" doesn't like my code; I better go ask everyone to find out who."


Of course I know how this sounds, but it is only my opinion. Most people here won't be able to understand my reasons and your reasons so it's fruitless anyhow except to the coders who care here. The coders here could also read your source code and form their own opinion if they wanted.

byuu wrote
To hell with cowardly behind-the-back comments. If you have a legitimate point and show me, I listen. Deathlike2 advised my video filtering code was rather poor. I agreed and rewrote it all using much better OO-design principles. GIGO showed me a superior way to implement opcodes with templates instead of pre-processing. Just this week I rewrote my entire CPU and SMP opcode cores with the new design. I see merit in your cycle buffers and I may be able to use something similar in my cycle-based PPU core.


As I said previously, maybe you have improved the code from what I read upwards of 2 years ago! I don't know until I check out your code again which I don't have much time to do right now. It's also not high on my priority list to help you out when I have my own problems.

byuu wrote
If I seem upset at all, it's because some people hear these things and they stick in the back of their minds. So a few weeks later someone here or on another forum remarks, "oh yeah, I remember another emulator author saying bsnes was programmed really poorly, actually." And that's bullshit. I'm damn proud of my code and how clean it is, and I'm willing to give up major features like speed and save states to achieve that. And I'm the first person to admit my flaws: my emulator is slow as hell, lacks an untold number of features, and is not suitable for general gaming. I'm also the first to recommend SNESGT to people who want to actually play games.

I could say everything you've said about your emulator verbatim on my forum, and you'd be in the same position I am: completely unable to defend yourself because there's no substance to counter. I would expect (and even ignore) such comments from a random nobody, but from a respected emulator author I'm extremely disappointed and forced to defend myself.


Well, the thing is, your emulator, as stated by yourself, is slower than it could be and lacks features other emulators have. In fact, personally I find almost no reasoning behind why you chose to do what you do when there are better alternatives in regards to the design. In your opinion your code is clear and easy to maintain, and that may be so now, but I'm not sure why this matters if, say, competing code is just as clean, has more features and is faster.

And I do not think I am some respected emulator author, that would be someone like Bock. Just because I know a lot about emulators and writing code doesn't mean I am respected by others. Maybe because I know a lot about coding and emulators that gives my opinion more merit to some I don't know.

If you think people like myself just post what you consider a negative opinion for no reason other than boredom, you probably need to reassess the situation. You've even stated yourself that you've made major changes to the code when others criticized it, so you need to stop being so defensive about some coder's opinion on your code from 24 months ago. Like most humans, my opinion will only change when I see evidence to the contrary, and unfortunately for you I don't have time to look right now. If I do, I will make a point of coming back to this thread and updating my opinion for the others here, if you like.
  • Joined: 12 Jun 2009
  • Posts: 2
Post Posted: Fri Jun 12, 2009 2:29 pm
PoorAussie wrote
Netplay, Rewinding live gameplay, movies, saving/loading instantly. These are features people now expect from emulators.


And people's expectations are unrealistic and sometimes nonsensical.

The minority of emulators support multiplayer over the internet, and the ones that do usually don't do a very good job of it. It can't be an easy thing to do or maintain. I've essentially given up on having it across all platforms, and it's especially far from my mind when emulators still have so many problems running the actual games correctly.

Movies are essentially emulator-specific and version-specific because of the emulation differences. That really blows a hole in the "share and archive" aspect of the format. Since you can't expect people to have 50 versions of an emulator on their computer or remake old movies every time a new release comes out, you're pretty much coerced as an author into decreasing the frequency of official releases. Personally, I'd rather people just use video capture software and post the result on youtube if they want to share gameplay with people.

Then there's the small group of people who have made a cult out of completing games faster than anyone else using cheat features. Supporting a feature for a reason that makes no sense isn't going to be a +1 for you, those people are just weird. For personal use, using savestates to instantly rectify some mistake you made is overrated, because games without challenge or any fear of loss aren't fun. What makes it a +1 is that they can make unnecessarily hard games enjoyable, and they're convenient for old games that used passwords instead of an SRAM chip to cut costs. So sure, savestates are the one feature that makes some sense. But requiring every program author to spend time creating a custom solution that might also compromise their preferred design is just dumb when it can be done at the kernel level.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Fri Jun 12, 2009 4:36 pm
FitzRoy wrote
And people's expectations are unrealistic and sometimes nonsensical.

The minority of emulators support multiplayer over the internet, and the ones that do usually don't do a very good job of it. It can't be an easy thing to do or maintain. I've essentially given up on having it across all platforms, and it's especially far from my mind when emulators still have so many problems running the actual games correctly.

Movies are essentially emulator-specific and version-specific because of the emulation differences. That really blows a hole in the "share and archive" aspect of the format. Since you can't expect people to have 50 versions of an emulator on their computer or remake old movies every time a new release comes out, you're pretty much coerced as an author into decreasing the frequency of official releases. Personally, I'd rather people just use video capture software and post the result on youtube if they want to share gameplay with people.

Then there's the small group of people who have made a cult out of completing games faster than anyone else using cheat features. Supporting a feature for a reason that makes no sense isn't going to be a +1 for you, those people are just weird. For personal use, using savestates to instantly rectify some mistake you made is overrated, because games without challenge or any fear of loss aren't fun. What makes it a +1 is that they can make unnecessarily hard games enjoyable, and they're convenient for old games that used passwords instead of an SRAM chip to cut costs. So sure, savestates are the one feature that makes some sense. But requiring every program author to spend time creating a custom solution that might also compromise their preferred design is just dumb when it can be done at the kernel level.


Hi Fitzroy, have you written an emulator? I assume so since you alluded to it but it's always helpful to know. I'm not sure why you would need to change the state of your emulator so often that it is breaking games left right and center, but it is a concern most definitely. Something I have thought about for a while actually. There is no real solution except to provide an ultra accurate emulator from version 1.0, one that has as much planning as possible in regards to what you don't currently know "for sure" and things that may change in the future.

The NES is an absolute beast to correctly save the state of due to all the cartridge hardware which all have their own requirements. I'm not sure why some authors have such a hard time with getting their emulators saving states reliably, I guess buggy cores is the biggest fault. One byte overwritten on occasion can ruin a game state.
  • Joined: 11 Jun 2009
  • Posts: 5
Post Posted: Fri Jun 12, 2009 5:03 pm
Got a bit worked up over the "insiders" thing yesterday, sorry. Thanks for keeping things civil.

Quote
No my point was it's not obvious what happens on a cycle level within that code compared to the other code.


Ah. It seems odd to point out a flaw in my code when yours has the same :/

Quote
It's easier once you know the "micro-ops", I can easily see in a single line what makes up each instruction and when it happens.


And mine's easier once you know one bus read and one bus write function*. I can see the same thing, but it's not as packed together. I don't see why getting an entire instruction on a single line is a benefit. We could reduce yours by turning those micro-ops into chars. By that logic, if we had:
char opFF[]="+1+H0+L"; would it be even better? Now you don't need the ENDINST, just look for the \0 terminator.

uint8 sCPU::op_read(unsigned addr) {
  clockcount = speed(addr);    //different memory runs at different speed
  cycle_edge();                //process events that occur on a cycle boundary, eg DMA
  add_clocks(clockcount - 4);  //simulate bus hold delay
  regs.mdr = bus.read(addr);   //perform the actual read, sync to other CPU if it is behind and may write to this address
  add_clocks(4);               //add bus lead out time
  return regs.mdr;             //return the results in CPU's memory data register (to support "open bus")
}


Quote
If I made them more descriptive then it would be easier to understand them without any other reading.


If you're going to make them more descriptive (read: longer), why not just use a line break for each micro-op? And if you do that, why not append () to each one? Now you have my code.

Quote
I don't understand why you needed to split everything up into threads to have cores which didn't need to know each other.


I don't. Enslavement works just as well. Separating it is a means of making the code "appear" more like a real processor would. I find it cleaner that way. I know how subjective that is, hence the quotes. But it lets you do a lot of neat stuff: I can change three bytes and turn the SA-1's 65c816 into a 21MHz Sony SMP. Or I can add another processor to the system with a half-dozen lines. Or I can with a single line swap out the DSP or PPU with more/less accurate versions, even if their internal design is very different. Even at run-time. Better yet, an Apple II GS emulator could use my CPU core with no need to remove SMP and PPU bindings. If you think my approach is strange, you're probably not a fan of MAME or MESS. I'd like my cores to be useful to eg MAME for Super System support, and for the arcade games that use the ST-0018 co-processor one day.

I'm aware some of what I mention can be done with a state machine as well, but it's more difficult there. I get the above for free with threading.

This is kind of like the debate between monolithic and micro kernels. You can pack it all together for a massive speedup, or make each component a small, unique module for simplicity.

Quote
Not sure what you are saying


Sorry, I wasn't very clear. Let's say there are three common designs for an interpretive emulator:
- sync at every bus hold point (lowest level, ~2-16 parts per opcode)
- sync at every cycle boundary (lower level, ~2-8 parts per opcode)
- sync at every opcode boundary (highest level, 1 part per opcode)

With my threading model, you can write code that looks like the highest level, but that synchronizes at any level you choose by changing two lines of code.
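
A rough C++ sketch of that claim, with loud caveats: real cooperative threads are left out entirely, and sync() is only a counter standing in for "yield to the other processor if it is behind". The Cpu struct, SyncLevel enum and the one-call-equals-one-cycle timing are assumptions for illustration, not bsnes code; the bus-hold level is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>

enum class SyncLevel { Opcode, Cycle };

struct Cpu {
    SyncLevel level = SyncLevel::Opcode;
    std::uint64_t clock = 0;
    std::uint64_t syncs = 0;               // how often we would switch threads
    std::uint8_t stack[4] = {0x30, 0x00, 0x80, 0x01};
    unsigned sp = 0;
    std::uint8_t p = 0;
    std::uint32_t pc = 0;

    // Stand-in for a cooperative thread switch.
    void sync() { ++syncs; }

    // The only place that knows about cycle-level synchronization.
    void cycleEdge() {
        ++clock;
        if (level == SyncLevel::Cycle) sync();
    }

    void opIo() { cycleEdge(); }

    std::uint8_t opReadStack() {
        assert(sp < 4);
        cycleEdge();
        return stack[sp++];
    }

    // The opcode body reads the same regardless of the sync level chosen.
    void opRti() {
        opIo();
        opIo();
        p = opReadStack();
        pc = opReadStack();
        pc |= static_cast<std::uint32_t>(opReadStack()) << 8;
        pc |= static_cast<std::uint32_t>(opReadStack()) << 16;
    }

    // The only place that knows about opcode-level synchronization.
    void execute() {
        opRti();
        if (level == SyncLevel::Opcode) sync();
    }
};
```

Switching granularity only touches cycleEdge() and execute(); opRti() is untouched, and the resulting register values are the same either way.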

You mention it's harder to see the cycle boundaries in my code: I disagree as they are evident by the read/write/io() calls. I also posit that it's not information that should be in every opcode anyway. If you can abstract to a higher level, you should. My code still does everything in the right order. Use the WDC 65816 manual if you really want a break-down of each bus action.

I don't think anyone would agree with you that a cycle-based emulator's opcode implementations are less complex than an opcode-based one. With threading, you get the latter design with accuracy better than the former.

I mean really, what's the point of an opcode? To do one simple operation. Someone looking at an opcode in an emulator probably wants to look at the operation, not the fine details of cycle or bus boundaries. How is abstracting that away more complex / unreadable?

Quote
Of course I know how this sounds, but it is only my opinion.


And you're saying there's at least one "SNES insider" with the same opinion. I've been talking to all active SNES devs for the last five years. I know what they think of my work. But your comment with no names makes others think there's this group of people who agree with you, and I'm being unreasonable. All with no proof.

It's a big pet peeve of mine when people tell me what someone else said about me or my work without saying who it was. Don't bring it up if you want to keep their identity a secret.

Quote
Well the thing is your emulator as stated by yourself is slower than it could be and lacks features other emulators have.


Because it's designed to. Code clarity comes first. And that's the thing you're saying is overly complex and unreadable. If that were even remotely true, then my entire emulator would be pointless garbage.

Quote
In fact, personally I find almost no reasoning behind why you chose to do what you do when there are better alternatives in regards to the design.


That sounds more like bias than objectivity. State machines are not a better alternative to modeling each processor than an independent and autonomous threaded design. It sounds like you're letting the importance of save states cloud your judgment on what is more readable in code form. I would challenge you to find an established emulator author familiar with threaded emulation who agrees with you. I can name at least Aaron Giles who designed something very similar to me, but decided against it for the sole reason that they wanted save states and such more. And Nemesis, who very much agrees with me but goes the pre-emptive route instead.

What a state machine does by its very nature is control program flow using variables. And what else controls program flow? A processor's instruction counter and stack. I move the state machine from a software implementation onto the actual native processor hardware.
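
For comparison, this is roughly what "controlling program flow using variables" looks like when an RTI-style opcode is written as an explicit state machine in C++. It is a sketch under assumptions: the RtiStateMachine struct, the one-step-per-cycle layout and the four-byte stack are invented for illustration and are not taken from any emulator discussed here.

```cpp
#include <cassert>
#include <cstdint>

struct RtiStateMachine {
    std::uint8_t stack[4] = {0x30, 0x00, 0x80, 0x01};
    unsigned sp = 0;
    unsigned step = 0;      // software "instruction counter" for this opcode
    std::uint8_t p = 0;
    std::uint32_t pc = 0;

    std::uint8_t readStack() {
        assert(sp < 4);
        return stack[sp++];
    }

    // Advance the opcode by one cycle; returns true once it has finished.
    bool tick() {
        switch (step++) {
        case 0: return false;                       // internal I/O cycle
        case 1: return false;                       // internal I/O cycle
        case 2: p = readStack(); return false;
        case 3: pc = readStack(); return false;
        case 4: pc |= static_cast<std::uint32_t>(readStack()) << 8; return false;
        case 5: pc |= static_cast<std::uint32_t>(readStack()) << 16; return true;
        default: return true;                       // already finished
        }
    }
};
```

The trade-off is visible in the shape of the code: the position inside the opcode is just the step field, so copying the struct mid-opcode gives a resumable snapshot, at the price of the switch boilerplate that a straight-line function leaves to the host CPU's own program counter and stack.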

As a parallel analogy, consider a shunting yard algorithm versus a recursive descent parser. Both parse LL(1) grammar. The former simulates a stack in C code, while the latter relies on the hardware-native stack. Which is easier to work with? The one that doesn't have the red tape of simulating something a processor natively supports. Your argument parallels that since we can't resume the state in the middle of an RDP computation (because people expect this ability) that it's not as readable and is more complex.

Quote
In your opinion your code is clear and easy to maintain and that may be so now, but I'm not sure why this matters if say competing code is the same cleanliness ...


Which SNES emulator are you referring to?

ZSNES?
%macro RTIMacro 0
    cmp byte[nmistatus],3
    jne .nodis658162
    test byte[curexecstate],01h
    jz .nodis65816
    and byte[curexecstate],0FEh
.nodis65816
    cmp byte[curexecstate],0
    jne .nn
    xor dh,dh
.nn
.nodis658162
    mov byte[curnmi],0
    test byte[xe],1
    jne near emulRTI

    mov cx,[xs]
    inc cx
    and cx,word[stackand]
    call membank0r8
    mov [xs],cx
    mov dl,al
    restoredl

    mov cx,[xs]
    inc cx
    and cx,word[stackand]
    xor eax,eax
    call membank0r8
    mov [xpc],al

    inc cx
    and cx,word[stackand]
    xor eax,eax
    call membank0r8
    mov [xpc+1],al

    inc cx
    and cx,word[stackand]
    xor eax,eax
    call membank0r8
    mov [xpb],al
    mov [xs],cx

    xor bh,bh
    xor eax,eax
    mov ax,[xpc]
    mov bl,dl
    mov edi,[tablead+ebx*4]
    mov bl,[xpb]
    mov [xpc],ax
    test eax,8000h
    jz .loweraddr
    mov esi,[snesmmap+ebx*4]
    mov [initaddrl],esi
    add esi,eax
    test dl,00010000b
    jnz .setx
    endloop
.loweraddr
    mov esi,[snesmap2+ebx*4]
    cmp eax,4300h
    jae .upperlower
    mov [initaddrl],esi
    add esi,eax
    cmp byte[esi],0CBh
    jne .notwai
    mov byte[intrset],2
.notwai
    test dl,00010000b
    jnz .setx
    endloop
.setx
    mov byte[xx+1],0
    mov byte[xy+1],0
    endloop
.upperlower
    cmp dword[memtabler8+ebx*4],regaccessbankr8
    je .dma
    mov byte[doirqnext],0
    mov [initaddrl],esi
    add esi,eax
    cmp byte[esi],0CBh
    jne .notwai2
    mov byte[intrset],2
.notwai2
    test dl,00010000b
    jnz .setx
    endloop
.dma
    mov esi,dmadata-4300h
    mov [initaddrl],esi
    add esi,eax
    test dl,00010000b
    jnz .setx
    endloop

emulRTI
    mov cx,[xs]
    inc cx
    and cx,word[stackand]
    call membank0r8
    mov [xs],cx

    mov dl,al
    or dl,00110000b
    restoredl

    mov cx,[xs]
    inc cx
    and cx,word[stackand]
    xor eax,eax
    call membank0r8
    mov [xpc],al

    inc cx
    and cx,word[stackand]
    xor eax,eax
    call membank0r8
    mov [xpc+1],al
    mov [xs],cx


    xor bh,bh
    xor eax,eax
    mov ax,[xpc]
    mov bl,dl
    mov edi,[tablead+ebx*4]
    xor bl,bl
    mov [xpc],ax
    test eax,8000h
    jz .loweraddr
    mov esi,[snesmmap+ebx*4]
    mov [initaddrl],esi
    add esi,eax
    endloop
.loweraddr
    mov esi,[snesmap2+ebx*4]
    mov [initaddrl],esi
    add esi,eax
    endloop
%endmacro


How about Snes9X?
static void Op40Slow (void) {
    AddCycles(TWO_CYCLES);
    if (!CheckEmulation()) {
        PullB (Registers.PL);
        S9xUnpackStatus ();
        PullW (Registers.PCw);
        PullB (Registers.PB);
        OpenBus = Registers.PB;
        ICPU.ShiftedPB = Registers.PB << 16;
    } else {
        PullBE (Registers.PL);
        S9xUnpackStatus ();
        PullWE (Registers.PCw);
        OpenBus = Registers.PCh;
        SetFlags (MemoryFlag | IndexFlag);
        missing.emulate6502 = 1;
    }
    S9xSetPCBase (Registers.PBPC);

    if (CheckIndex ()) {
        Registers.XH = 0;
        Registers.YH = 0;
    }
    S9xFixCycles();
/*    CHECK_FOR_IRQ(); */
}


For reference, here's mine:
void CPUcore::op_rti_n() {
  op_io();
  op_io();
  regs.p = op_readstack();
  if(regs.p.x) {
    regs.x.h = 0x00;
    regs.y.h = 0x00;
  }
  regs.pc.l = op_readstack();
  regs.pc.h = op_readstack();
L regs.pc.b = op_readstack();
}


If you can get code as nice as mine, while being just as accurate and faster ... please, by all means, I will pay you money to do it. And I will immediately adopt your design myself and admit I was wrong here.

I care about hardware accuracy, even when the accuracy doesn't affect any games. Remember how I said we could speed things up by assuming fetching opcode operands would always be volatile (not requiring syncs), because 100% of known software executes out of ROM or RAM?

I ran into a problem recently where IRQs were throwing off timing by a few clocks every now and then. To track it down, I had to write a test program that jumped the program counter directly into the SMP registers. It fetched MMIO values to execute opcodes. It was needed to hit a WRAM increment port immediately after to determine if the IRQ handler was actually reading from memory or performing an I/O cycle. The official WDC docs said it was always an I/O cycle, I proved it would be converted to reads under certain conditions and emulated it.

Now that improvement can be applied to all emulators with no speed loss and it will help compatibility.

That wouldn't be possible if I took the aforementioned shortcut. If all I worried about was whether games relied on certain behaviors, I wouldn't be able to uncover higher level behaviors like that.

And that is the point of my emulator. If you look at it as a means to play games, it's no wonder you don't like it. I'm disappointed that I constantly have to defend an emulator that is focused on rigid, perfect accuracy. You already have ZSNES. Do you really want me to write an emulator that mimics ZSNES in design, features and speed? Now that is truly pointless. What I'm doing will benefit ZSNES and others.

Quote
If you think people like myself just post what you consider a negative opinion for no reason other than boredom you probably need to reassess the situation.


I know you have a reason. Which is why I wanted to understand why you would say that, and now I believe I know. You place a much higher emphasis on save states than I do. I believe that is biasing your opinion. I don't believe anyone will agree with you that your cycle buffer is cleaner than an implementation as a function. It's more terse, it's faster, but it's not cleaner.

Quote
You've even stated yourself that you've made major changes to the code when others criticized it, so you need to stop being so defensive about some coder's opinion on your code from 24 months ago.


I don't mind your opinion. I mind that you provided no evidence to back it up. I mind that others may blindly echo your comments elsewhere, damaging my code's reputation. I mind that you're saying random SNES insiders agree with you.

Quote
If I do I will make a point to come back to this thread and update to others my opinion for you if you like.


That's up to you. I genuinely do care and will work on any valid point you raise. I would appreciate any feedback you can give, but it's not your obligation to help improve my work.
  • Joined: 12 Jun 2009
  • Posts: 2
Post Posted: Fri Jun 12, 2009 6:55 pm
PoorAussie wrote
Hi Fitzroy, have you written an emulator? I assume so since you alluded to it but it's always helpful to know.


No, I'm just a studious observer.

Quote
I'm not sure why you would need to change the state of your emulator so often that it is breaking games left right and center, but it is a concern most definitely.


Well, let's say you simply change your mind about something midway, like enslavement or how you go about handling IRQs. Or let's say your code is so overwhelming and scatterbrained that you can't trace the cause of a game bug. I wouldn't think a first-time author would fully appreciate the best strategies available to him until he had already written many versions. Just like when I'm writing prose, my rough drafts end up looking nothing like the finished product.

Quote
Something I have thought about for a while actually. There is no real solution except to provide an ultra accurate emulator from version 1.0, one that has as much planning as possible in regards to what you don't currently know "for sure" and things that may change in the future.


Well, why would you expect your first effort to be 100% compatible with games? That's not so much a solution as it is a hope. Are you prepared to test 3000 games yourself to find which ones have bugs, knowing that if you fix those, it will trigger timing bugs in games that previously and serendipitously avoided them only because the emulation was too inaccurate? And are you then prepared to test the 3000 again, noting specific titles that seem to be very picky? I realize the SMS library is nowhere near as large, but this is what you have to do on the SNES side to be sure you're increasing compatibility instead of just shifting it around.
  • Joined: 11 Jun 2009
  • Posts: 5
Post Posted: Sat Jun 13, 2009 5:36 am
I'm sorry to everyone here for derailing this thread with my discussion.

Back on topic ...

Quote
Where does the extra cost of cycle-accurate emulation come from? Or, put another way: How does cycle-accurate emulation differ from the usual approach?

Where does the extra processing time go? What does "cycle-accurate" emulation involve that the standard approach does not?


I think the main problem is that people look at emulation overhead as the sum of all the processor speeds times a magic number. Eg (2.58MHz CPU + 5.37MHz video) * 20 = ~160MHz needed to emulate it. Not true.

The real devil in emulation is synchronization: when processor A writes to memory that processor B can read, we have to make sure B is caught up, otherwise B may end up reading the wrong value. And vice versa. And the same when one reads from the other. Say each processor executes ~1,000,000 cycles a second, and an opcode consists of ~5 cycles. With opcode-based emulation you only check and catch up each processor ~200,000 times per second at most. But with cycle-based emulation, you have to do that all ~1,000,000 times a second.
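
The arithmetic can be checked with a small C++ sketch. It is only a model under assumptions: Processor, runUntil and countSyncs are made-up names, "running" a processor is reduced to advancing its clock, and every step of processor A is treated as needing a catch-up of processor B (the worst case).

```cpp
#include <cassert>
#include <cstdint>

struct Processor {
    std::uint64_t clock = 0;

    // Stand-in for "execute until caught up"; a real core would run opcodes here.
    void runUntil(std::uint64_t target) {
        if (clock < target) clock = target;
    }
};

// Run processor A for totalCycles, catching B up after every `granularity`
// cycles of A. Returns how many check-and-catch-up operations were needed.
std::uint64_t countSyncs(std::uint64_t totalCycles, std::uint64_t granularity) {
    assert(granularity > 0);
    Processor a;
    Processor b;
    std::uint64_t syncs = 0;
    while (a.clock < totalCycles) {
        a.clock += granularity;   // A runs one opcode (or one cycle)
        b.runUntil(a.clock);      // B must be caught up before A continues
        ++syncs;
    }
    return syncs;
}
```

With these assumptions, countSyncs(1000000, 5) models opcode-level syncing (~5 cycles per opcode) and countSyncs(1000000, 1) models cycle-level syncing, so the second does five times as many switches for the same emulated time.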

To sync to another processor, you have to completely exit out of the processor you're in, and go to a totally different section of code, and start executing there. This absolutely kills modern CPUs that have massive pipelines, very tiny L1 instruction caches, branch prediction tables, etc. All of that ends up getting flushed out when you switch over to a different emulated processor. And not only that, but you have to execute all of that boilerplate code that checks and syncs processors, and switches between them, five times more frequently.

It all adds up quickly to enormous overhead. Thus, the cost of accuracy usually grows exponentially relative to the gains, that is:
If you can emulate a 10MHz CPU with a 100MHz PC at ~90% accuracy, then you need:
~200MHz for ~95% accuracy
~400MHz for ~98% accuracy
~800MHz for ~99% accuracy
~1.6GHz for ~99.5% accuracy
etc until you're down to the lowest possible timing levels of your emulated CPU.

And as you gain less accuracy while expending more and more speed, the majority of people will get very upset and go to other, faster emulators instead. Because it's very hard to perceive the difference between 99% and 99.5% accuracy, and if they don't notice the difference, most people don't care that it's there.

Note that these numbers are made up examples. Results will vary but that's the gist of it.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Wed Jun 17, 2009 3:43 pm
FitzRoy wrote
No, I'm just a studious observer.


I doubt that for some reason ;)


FitzRoy wrote
Well, let's say you simply change your mind about something midway, like enslavement or how you go about handling IRQs. Or let's say your code is so overwhelming and scatterbrained that you can't trace the cause of a game bug. I wouldn't think a first-time author would fully appreciate the best strategies available to him until he had already written many versions. Just like when I'm writing prose, my rough drafts end up looking nothing like the finished product.


Yeah, but any inexperienced programmer is going to have issues like that with any project.

FitzRoy wrote
Well, why would you expect your first effort to be 100% compatible with games? That's not so much a solution as it is a hope.


Once you go down to a deep enough level where you are simulating every single memory access on every single device, its latency, mistimed writes, bus conflicts, etc and *any* software works on it you know you are pretty close to a perfect emulator. Of course there are usually bugs in things, so testing is important.

Luckily for the SMS and NES nearly everything about them is known, something like the SNES is probably still somewhat off, at least for the addon hardware games. So for me, I might be able to release something initially which is very close to 100% machine accuracy because of the machines I have targeted and the work I have done with them over the past years.
  • Joined: 26 Aug 2008
  • Posts: 292
  • Location: Australia
Post Posted: Wed Jun 17, 2009 4:35 pm
byuu wrote
I don't. Enslavement works just as well. Separating it is a means of making the code "appear" more like a real processor would. I find it cleaner that way. I know how subjective that is, hence the quotes. But it lets you do a lot of neat stuff: I can change three bytes and turn the SA-1's 65c816 into a 21MHz Sony SMP. Or I can add another processor to the system with a half-dozen lines. Or I can with a single line swap out the DSP or PPU with more/less accurate versions, even if their internal design is very different. Even at run-time. Better yet, an Apple II GS emulator could use my CPU core with no need to remove SMP and PPU bindings. If you think my approach is strange, you're probably not a fan of MAME or MESS. I'd like my cores to be useful to eg MAME for Super System support, and for the arcade games that use the ST-0018 co-processor one day.


I'm a fan of MAME, the theory of it anyhow. However I'm not sure why you are implying what I am doing is enslavement when my cores have zero idea about each other? They communicate through "pins" just like real machines do, hence I can swap whatever I want in there provided I connect the "pins" like the real hardware.
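
As a rough illustration of cores talking only through "pins", here is a small C++ sketch. Everything in it is hypothetical (the Pins, Ram, ToyCpu and boardTick names, the 256-byte RAM, the two-cycle copy program); it is not the actual emulator code, which isn't public.

```cpp
#include <cassert>
#include <cstdint>

// The only thing the devices share: a set of signal lines.
struct Pins {
    std::uint16_t address = 0;
    std::uint8_t data = 0;
    bool read = false;
    bool write = false;
};

// A memory device that reacts to the pins and knows nothing about the CPU.
struct Ram {
    std::uint8_t bytes[256] = {};

    void tick(Pins& pins) {
        assert(!(pins.read && pins.write));   // both strobes at once is a bus conflict
        if (pins.read) pins.data = bytes[pins.address & 0xFF];
        if (pins.write) bytes[pins.address & 0xFF] = pins.data;
    }
};

// A toy CPU that copies one byte from 0x10 to 0x20 and knows nothing about Ram.
struct ToyCpu {
    unsigned phase = 0;
    std::uint8_t latch = 0;

    void tick(Pins& pins) {
        switch (phase) {
        case 0:                     // cycle 1: request a read
            pins.address = 0x10;
            pins.read = true;
            pins.write = false;
            break;
        case 1:                     // cycle 2: latch the result, request a write
            latch = pins.data;
            pins.address = 0x20;
            pins.data = latch;
            pins.read = false;
            pins.write = true;
            break;
        default:                    // afterwards: release the bus
            pins.read = false;
            pins.write = false;
            break;
        }
        ++phase;
    }
};

// The "board" wires the devices together once per cycle.
void boardTick(ToyCpu& cpu, Ram& ram, Pins& pins) {
    cpu.tick(pins);
    ram.tick(pins);
}
```

Swapping in a different memory device or CPU only means providing another type with a tick(Pins&) method; neither side needs the other's header.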

My emulator is as much enslaved as yours is. ;)

When better threaded CPUs get here with faster instructions/mechanisms I'll be able to easily take them into that direction. There is little point atm when it comes to speed.


byuu wrote
I don't think anyone would agree with you that a cycle-based emulator's opcode implementations are less complex than an opcode-based one. With threading, you get the latter design with accuracy better than the former.


When you go cycle based you have to break down a bigger problem into smaller ones along more well defined boundaries. Whilst I can read your code and guess as to where the cycle breakdowns are happening I have no way of knowing without reading some other material. This is also one of the biggest drawbacks to how you do things, no cycle boundaries means no stopping on cycle boundaries.

My philosophy with emulation is if you aren't the total god of the emulated machine what you are doing is hacking your way to a solution.

byuu wrote
I mean really, what's the point of an opcode? To do one simple operation. Someone looking at an opcode in an emulator probably wants to look at the operation, not the fine details of cycle or bus boundaries. How is abstracting that away more complex / unreadable?


Someone looking at a cycle accurate emulator wants to know how the hardware works on a cycle level. The way you are doing things is no different than MAME/ZSNES/SNES9X in regards to this information. Whilst for memory reads/writes/obvious things people can easily work it out, in some instructions there are internal ops which you have no idea how long are taking without going off to another source.


byuu wrote
And you're saying there's at least one "SNES insider" with the same opinion. I've been talking to all active SNES devs for the last five years. I know what they think of my work. But your comment with no names makes others think there's this group of people who agree with you, and I'm being unreasonable. All with no proof.

It's a big pet peeve of mine when people tell me what someone else said about me or my work without saying who it was. Don't bring it up if you want to keep their identity a secret.


Yes well when I first said it I thought the people I had spoken to had already told you these things. So it was more a reference to something I thought you knew about, and I wasn't going to be naming names for everyone else here. It is up to everyone here to draw their own opinions, I don't think they are some mindless bunch of zombies that follow what one guy says. This community here is good. :)

byuu wrote
Because it's designed to. Code clarity comes first. And that's the thing you're saying is overly complex and unreadable. If that were even remotely true, then my entire emulator would be pointless garbage.


The written emulation code isn't all you have to understand if you rely on libraries and other things to do what you do. You have even stated yourself that your approach is somewhat unique and by this admission you would understand that something unique takes more time to learn how it is working. Whilst that complexity may not be in written emulation code it is there, in the background.


byuu wrote
That sounds more like bias than objectivity. State machines are not a better alternative to modeling each processor than an independent and autonomous threaded design. It sounds like you're letting the importance of save states cloud your judgment on what is more readable in code form. I would challenge you to find an established emulator author familiar with threaded emulation who agrees with you. I can name at least Aaron Giles who designed something very similar to me, but decided against it for the sole reason that they wanted save states and such more. And Nemesis, who very much agrees with me but goes the pre-emptive route instead.


Your way lacks features and always will, which is why someone as talented as Aaron Giles decided not to use a method similar to yours. The fact that you place such a low value on save states or cycle boundaries will always place you into the "fringe" category when it comes to these things.

What I do is fast and, to me, very understandable in regards to the code. It isn't some hodge podge thing which is hard to extend or anything of this nature. I treat cores like actual hardware devices that don't know each other, which is what you do also. So when I think about why you chose your method, and your only reason is "code cleanliness", it kinda makes me wonder why you wouldn't choose a method like mine, when mine is faster, can be threaded at a future date with minimal work, and supports cycle boundaries and save states.

byuu wrote
What a state machine does by its very nature is control program flow using variables. And what else controls program flow? A processor's instruction counter and stack. I move the state machine from a software implementation onto the actual native processor hardware.
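To make the quoted point concrete, here is a minimal sketch of one hypothetical three-cycle read-modify-write opcode written both ways. It uses Python generators as a stand-in for threads; the opcode, the class and the function names are all invented for illustration and are not taken from bsnes or any other emulator.

```python
# Hypothetical three-cycle "read-modify-write" opcode, written two ways.
# Neither version is taken from bsnes or any other emulator.

class StateMachineOp:
    """Explicit state machine: a variable records where the opcode is."""

    def __init__(self, memory, addr):
        self.memory = memory
        self.addr = addr
        self.state = 0      # software "program counter" for this opcode
        self.value = 0

    def step(self):
        """Run exactly one cycle; return True while more cycles remain."""
        if self.state == 0:
            self.value = self.memory[self.addr]        # cycle 1: read
        elif self.state == 1:
            self.value = (self.value + 1) & 0xFF       # cycle 2: internal op
        elif self.state == 2:
            self.memory[self.addr] = self.value        # cycle 3: write
        self.state += 1
        return self.state < 3


def coroutine_op(memory, addr):
    """Same opcode as a generator: each yield marks a cycle boundary."""
    value = memory[addr]            # cycle 1: read
    yield
    value = (value + 1) & 0xFF      # cycle 2: internal op
    yield
    memory[addr] = value            # cycle 3: write
    yield


def run_state_machine(memory, addr):
    """Step the state machine to completion and return the cycle count."""
    op = StateMachineOp(memory, addr)
    cycles = 1
    while op.step():
        cycles += 1
    return cycles


def run_coroutine(memory, addr):
    """Resume the generator to completion and return the cycle count."""
    cycles = 0
    for _ in coroutine_op(memory, addr):
        cycles += 1
    return cycles
```

Both versions can be paused after any cycle. The difference is where the "which cycle am I on" information lives: in `self.state` and `self.value` for the first, in the interpreter's own saved frame for the second.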


And by your own admission it's slower? What exactly was the benefit of going this route again? If state machine code is over complicating the opcodes I don't think the state machine is very well designed!


byuu wrote
Which SNES emulator are you referring to?


None, I was comparing it to my current code. Snes9x/Zsnes are 12+ years old; your code looks much better, I agree, but an instruction accurate emulator could easily look just as "nice" as yours.

byuu wrote
If you can get code as nice as mine, while being just as accurate and faster ... please, by all means, I will pay you money to do it. And I will immediately adopt your design myself and admit I was wrong here.

I care about hardware accuracy, even when the accuracy doesn't affect any games. Remember how I said we could speed things up by assuming fetching opcode operands would always be volatile (not requiring syncs), because 100% of known software executes out of ROM or RAM?


Why would you pay me money to do it when BSNES isn't a commercial thing? ;)

I care about hardware accuracy too, maybe too much. I'm emulating the SMS VDP's internal sprite/tile grabbing even though no current games push the boundaries, simply because I don't want to have to come back in the future and add it! If there is a bug in my design it will be easy to fix because everything the hardware does is already in my code. So I can empathize with you on this one.


byuu wrote
And that is the point of my emulator. If you look at it as a means to play games, it's no wonder you don't like it. I'm disappointed that I constantly have to defend an emulator that is focused on rigid, perfect accuracy. You already have ZSNES. Do you really want me to write an emulator that mimics ZSNES in design, features and speed? Now that is truly pointless. What I'm doing will benefit ZSNES and others.


I'm not really a big user of SNES emulators, so my comments aren't based around usability. I'm simply talking about your design and why you did certain things; to me it simply doesn't make much sense. I understand that some of your users have probably frustrated you with these things in the past, though, making it a more sensitive subject.

However I certainly like your approach to hardware accuracy, and your goals seem to be the same as mine. I will possibly be tackling the SNES sometime in the future, so it will be interesting comparing our emulators at this date if it does happen. You've already broken a lot of ground with the SNES and made available a lot of information so I am one person who is very thankful for the work you have done, as I am with all contributors to the "scene". Just don't think a disagreement about your code makes someone not appreciative of the good work you have done!
  • Joined: 11 Jun 2009
  • Posts: 5
Post Posted: Wed Jun 17, 2009 8:40 pm
Quote
my cores have zero idea about each other


Oh, awesome. Another assumption based around usual SNES dev.

Quote
This is also one of the biggest drawbacks to how you do things, no cycle boundaries means no stopping on cycle boundaries.


I can stop anywhere, it's just that resuming at a totally different point requires you to be in the entry function or restore the thread's stack first.

Quote
Someone looking at a cycle accurate emulator wants to know how the hardware works on a cycle level.


We're repeating ourselves. Either we simply have different preferences on where abstraction should occur (are you a C programmer by chance?), or one of us is very wrong. I'm going with the former unless someone else chimes in.

Quote
I thought the people I had spoken to had already told you these things


Yes, but the person who did was in no place to talk. So if you thought I knew this already, why'd you bring it up in public?

Quote
The written emulation code isn't all you have to understand if you rely on libraries and other things to do what you do


new = co_create(entry, stacksize), co_delete(old), co_switch(target). That's it. Someone who doesn't understand threading has no business working with emulators. And different != complex or unreadable, it just means different.
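As a rough analogy only, the three calls can be mimicked with Python generators: calling the generator function plays the role of co_create, `next()` the role of co_switch, and `close()` the role of co_delete. The `chip` and `run` names below are hypothetical, and the sketch is weaker than real cothreads in one important way: a plain generator can only yield from its own top-level frame, whereas a cothread with its own stack can switch away from inside nested calls.

```python
def chip(name, cycles_per_step, steps, log):
    """A chip's main loop. Each yield hands control back to the scheduler
    and reports how many cycles the chip just consumed."""
    for _ in range(steps):
        log.append(name)
        yield cycles_per_step


def run(chips):
    """Resume whichever chip has the smallest local clock until all finish.

    `chips` maps a name to a generator; creating the generator plays the
    role of co_create, next() the role of co_switch, close() of co_delete.
    """
    clocks = {name: 0 for name in chips}
    while chips:
        # Pick the chip that is furthest behind in emulated time.
        name = min(chips, key=lambda n: clocks[n])
        try:
            clocks[name] += next(chips[name])
        except StopIteration:
            chips.pop(name).close()
    return clocks
```

For example, a "cpu" that takes 2 cycles per step and an "apu" that takes 3 will interleave so that neither runs far ahead of the other:

```python
log = []
clocks = run({"cpu": chip("cpu", 2, 3, log), "apu": chip("apu", 3, 2, log)})
```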

Quote
Whilst that complexity may not be in written emulation code it is there, in the background.


All of the "complexity" consists of maybe ~5kb of code, that saves me a good ~150kb of code and lots of #defines spent on state machines and their maintenance. All code has some cost.

Quote
Your way lacks features and always will


Oh? Don't tell that to the WIP testers who are using save states in bsnes right now ;)

Quote
The fact that you place such a low value on save states or cycle boundaries will always place you into the "fringe" category when it comes to these things.


I can't believe you're still talking about cycle boundaries. They work exactly the same for every single type of memory read or write, why should I carve out the details in all 512 opcodes when someone can reference two 5-line functions and get the same information?

You're arguing that 3+3+3+3+3+3+3+3 is clearer than 3*8 because I'm hiding the individual additions behind a higher-level abstraction (multiplication.)
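A minimal sketch of that argument, assuming a toy CPU in which every bus access costs the same fixed number of clock ticks. The class, the `BUS_COST` value and the opcode are illustrative assumptions, not real 65c816 timings and not bsnes code. The cycle accounting lives in the two small helpers, and each opcode's timing follows from how many times it calls them.

```python
class Cpu:
    """Toy CPU where every bus access costs a fixed number of clock ticks.

    BUS_COST and the opcode below are illustrative assumptions, not real
    65c816 timings.
    """
    BUS_COST = 3

    def __init__(self, memory):
        self.memory = memory
        self.clock = 0
        self.pc = 0
        self.a = 0

    def op_read(self, addr):
        self.clock += self.BUS_COST     # the cycle boundary lives here
        return self.memory[addr]

    def op_write(self, addr, value):
        self.clock += self.BUS_COST     # ...and here, for every opcode
        self.memory[addr] = value & 0xFF

    def fetch(self):
        """Read the byte at pc through op_read and advance pc."""
        value = self.op_read(self.pc)
        self.pc += 1
        return value

    def op_lda_absolute(self):
        """Load A from a 16-bit address; timing falls out of the helpers."""
        lo = self.fetch()
        hi = self.fetch()
        self.a = self.op_read((hi << 8) | lo)
```

The opcode body never mentions cycles, yet running it advances `clock` by three bus accesses times `BUS_COST`. A reader who wants the per-cycle detail reads `op_read` and `op_write` once rather than seeing it repeated in every opcode.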

Quote
It isn't some hodge podge thing which is hard to extend or anything of this nature.


Where do I add an fprintf() debugging statement in the middle of your op0xFF example?

Quote
And by your own admission it's slower?


Yes, modern CPUs do not like having their stack pointer changed. This will be addressed in time, as it's a big part of multi-threaded programming. I'm also very partial to the idea of time-stamped writes to avoid synchronization entirely. My model would actually stack very well with that, while a state machine has to constantly maintain itself after each cycle (lest it not be able to potentially break there), even when it's not needed.
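The post does not say how time-stamped writes would be implemented, so the following is only one possible reading of the idea: the writing chip records each write together with its own clock value, and the receiving chip applies the queued writes lazily once its own clock has caught up. The `TimestampedPort` class and its methods are hypothetical names.

```python
import heapq


class TimestampedPort:
    """Queue of (time, addr, value) writes from a producer chip.

    The consumer applies them lazily when it catches up, so the producer
    never has to stop and synchronise at the moment of the write.
    """

    def __init__(self):
        self._pending = []
        self._seq = 0           # tie-breaker keeps same-time writes in order
        self.registers = {}

    def write(self, time, addr, value):
        """Record a write stamped with the producer's clock."""
        heapq.heappush(self._pending, (time, self._seq, addr, value))
        self._seq += 1

    def catch_up(self, now):
        """Apply every queued write stamped at or before `now`."""
        while self._pending and self._pending[0][0] <= now:
            _, _, addr, value = heapq.heappop(self._pending)
            self.registers[addr] = value
```

The sketch only covers writes. A read that depends on the other chip's current state would still need the two chips to be synchronised at that point.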

Quote
If state machine code is over complicating the opcodes I don't think the state machine is very well designed!


We're talking from different worlds. I don't think you fully understand the complexity of the SNES 65c816. The SMS has no comparison. When I look at every other chip in the SNES, I agree it'd be absolutely trivial to state machine it.

That's why everyone else does exactly that, and enslaves them to the S-CPU. The S-CPU just isn't practical for a state machine. You'll quickly find yourself thirteen levels deep in nested function calls before finding out you need to sync to another chip. Any less and your code will be full of absolutely massive unrolled functions that are an eyesore to read.
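The nesting problem can be sketched at small scale with Python's `yield from`, which lets a sync point sit several calls deep while every intermediate frame keeps its local variables. All helper names here are hypothetical, and three levels stand in for the thirteen mentioned above. In an explicit state machine, `lo`, `hi` and `target` would each have to be stored in persistent state, together with a record of which helper was active when the sync happened.

```python
def bus_read(memory, addr):
    """Lowest-level helper: one bus access, one synchronisation point."""
    yield "sync"                    # hand control to the scheduler
    return memory[addr]


def read_word(memory, addr):
    """Read a little-endian 16-bit value using two bus accesses."""
    lo = yield from bus_read(memory, addr)
    hi = yield from bus_read(memory, addr + 1)
    return (hi << 8) | lo


def read_pointer(memory, addr):
    """Follow a 16-bit pointer stored at addr, then read the byte it names."""
    target = yield from read_word(memory, addr)
    return (yield from bus_read(memory, target))


def run_to_completion(gen):
    """Drive a generator, counting sync points, and return (result, syncs)."""
    syncs = 0
    while True:
        try:
            next(gen)
            syncs += 1
        except StopIteration as stop:
            return stop.value, syncs
```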

Quote
Why would you pay me money to do it when BSNES isn't a commercial thing? ;)


Not for my personal profit. I've spent roughly $1,000 on hardware, what's a little more on the software? :)

Quote
So I can empathize with you on this one.


At least you know what it's like to be in the minority on some things ;)

Quote
I understand that some of your users have probably frustrated you with these things in the past


You would be the first, actually. Other people don't always like my design, but none have found it more complex than state machines.

Quote
I will possibly be tackling the SNES sometime in the future, so it will be interesting comparing our emulators at this date if it does happen.


I would greatly look forward to that. You may get a better appreciation for the realities of SNES hardware and understand more of where I'm coming from that way.

I'm not looking to have the best emulator, I just wanted to reinvigorate a scene that had been stagnant since ~1998. If you can do better than me, I will be greatly appreciative and help in any way I can. Assuming you're not closed source. No sense helping you if nobody reaps the benefits.
Chris
  • Guest
Post Posted: Fri Jun 19, 2009 9:25 am
PoorAussie, since you have so many ideas about what good code design should be, what about posting part of your emulator's source code?
It would be fairer, imo, than throwing some concepts on the table and judging other people's code... (personally I think your presentation of the cycle buffer is quite limited, without any context in which to apply it, and in this state I don't really see how differently someone would implement instruction cycle distinction; it seems a very common concept imo)

Or maybe some kind of test release so we could all "judge" its current accuracy and features?

I'm just kidding here, of course, but seriously: technical discussion is always interesting, but constant declarations about how "contrary to xxx or yy, my emulator is designed to be very accurate" deserve some proof and stronger arguments ;-)


Also, no offense, but it seems "extreme accuracy" is the new credo of "recent" emulator authors, and even if I appreciate the fact that people keep working on emulating those lovely systems, there are some things they should not forget:

1/ there is no "best" design for accurately emulating systems and microcontrollers in general; the important thing is that you have fun designing your own. From experience, a design is the best until it is proven to have some flaws (and you decide to rewrite it from scratch), so never be presumptuous and think you have the "best" idea about how to emulate something until you really manage to get something 100% working...

2/ 100% accuracy is not possible and never will be (I think byuu already figured this out some time ago and felt a little bit frustrated about it...). No matter how hard you try and how deep your design goes into chip latencies/synchronizations, there is no way you could *perfectly* emulate communicating hardware using only software (which is predictable by nature).

3/ extreme accuracy is often not required to emulate most games decently; it is more a way for developers to have some fun trying out different hardware vs software theories and new emulation techniques. This automatically makes any debate about such an emulator's lack of speed or features (like savestates, video recording or other goodies like that) totally irrelevant.
 