PS3 Cell processor info ... good read.


  • #16
    Originally posted by DGhost
    even if it was broken down into stages, each APU has to wait until the APU working on the preceding stage is done. if the input is not a constant, it *has* to wait until the value is computed before it can operate on it. The only exception to this is when using branch prediction, and even then it generally only operates on cases where there are only two branches - ie, true/false pairs.

    most of the APUs would still sit idle because you cannot get around that fact.
    This makes sense, but it's actually FALSE.

    Techniques like loop unrolling can help here a lot. Idle cells can also do things like memory prefetch, or speculative processing. Modern processors will calculate multiple paths - not just branch prediction, but multi-pathing - calculating possible answers and choosing the correct one as soon as possible.
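    Rough sketch of what I mean, in plain C (hypothetical code, nothing Cell-specific): unrolling breaks the serial dependency between iterations, and "compute both answers, pick one later" is the multi-path idea in miniature.

    /* Plain loop: every add depends on the previous one finishing. */
    float sum_plain(const float *a, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Unrolled by four with independent partial sums: the four adds in each
       pass don't depend on each other, so the hardware can overlap them
       instead of waiting for the previous result. */
    float sum_unrolled(const float *a, int n) {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)                   /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

    /* Multi-path in miniature: compute both outcomes, then select. Compilers
       can usually turn this into a conditional move rather than a branch, so
       nothing stalls waiting to learn which way the condition went. */
    float scale(float x, int big) {
        float if_big   = x * 2.0f;
        float if_small = x * 0.5f;
        return big ? if_big : if_small;
    }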
    Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.



    • #17
      Originally posted by Wombat
      This makes sense, but it's actually FALSE.

      Techniques like loop unrolling can help here a lot. Idle cells can also do things like memory prefetch, or speculative processing. Modern processors will calculate multiple paths - not just branch prediction, but multi-pathing - calculating possible answers and choosing the correct one as soon as possible.
      lol, again my understanding of modern microprocessor design stops about 15 years ago. most of this new high-speed shit is, quite simply put, beyond me. i knew they had some pretty advanced branch prediction, and for some reason loop unrolling had entirely skipped my mind. multi-pathing... that sounds like some good shit. the cell architecture definitely has the potential to improve performance with this. it just doesn't sound like a good solution to a problem... more like a good way to pick up a little bit of a performance boost when nothing else is going on. call me skeptical, but it just seems like it would become worthless once the processors came under load.

      I was, however, waiting for you to show up on this topic.


      Spadnos - it definitely has its advantages, but many of those things can easily be faked without the need for massive parallelism. for games, what benefit are you going to see from each NPC making its own decisions? I recall a developer interview on the topic of Morrowind, where they were discussing features they pulled before the final release. In one of their early demonstrations they had a town they had taken an excruciating amount of time to detail. It included characters that would randomly wander the city, going to bed and waking up, visiting the bar, etc. fairly quickly into development of the actual game they pulled a lot of that work and instead made the characters far more static. Why? they found it detracted from the gameplay for people to have to go hunt NPCs all over town. People play games to be entertained, not to spend 15 minutes talking to NPCs trying to figure out where a quest NPC (or merchant) went. cool for demonstrations, uncool for playing.


      the sorts of tasks you are talking about that can be broken down amongst different threads easily are all essentially batch processing problems - the computer is operating on a large set of data at once, and it is easily parallelized because each piece of data can be operated on or computed independently of the others.
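      a minimal sketch of what i mean by batch processing (hypothetical C with pthreads, not real engine code): every worker gets its own slice of the data, so nobody ever waits on anybody else's result.

      #include <pthread.h>
      #include <stddef.h>

      #define NUM_WORKERS 4

      struct slice { float *data; size_t count; };

      /* each worker scales its own chunk; the chunks don't overlap, so there
         is no synchronization needed until the final join. */
      static void *scale_slice(void *arg) {
          struct slice *s = (struct slice *)arg;
          for (size_t i = 0; i < s->count; i++)
              s->data[i] *= 2.0f;
          return NULL;
      }

      void scale_all(float *data, size_t n) {
          pthread_t workers[NUM_WORKERS];
          struct slice slices[NUM_WORKERS];
          size_t chunk = n / NUM_WORKERS;

          for (int w = 0; w < NUM_WORKERS; w++) {
              slices[w].data  = data + (size_t)w * chunk;
              slices[w].count = (w == NUM_WORKERS - 1) ? n - (size_t)w * chunk : chunk;
              pthread_create(&workers[w], NULL, scale_slice, &slices[w]);
          }
          for (int w = 0; w < NUM_WORKERS; w++)   /* wait for everyone to finish */
              pthread_join(workers[w], NULL);
      }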

      when you get into games (specifically) and other interactive programs, you begin to run into thread synchronization issues. to take a game, something that can operate quite effectively and quite efficiently on a single thread, break it down into many smaller threads, and then have them all operating in sync with each other.... that is the sort of thing that lends itself to fairly hefty performance penalties if coded or compiled wrong. most of the work outside of the actual rendering of the scene does not lend itself to multi-threading.
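      and the synchronization problem, sketched the same way (again hypothetical): if the renderer can't touch the world until the update thread finishes the frame, the two threads end up serialized anyway, and you pay for the locks on top of it.

      #include <pthread.h>
      #include <stdbool.h>

      /* shared between the game-update thread and the render thread */
      static pthread_mutex_t frame_lock    = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  frame_ready   = PTHREAD_COND_INITIALIZER;
      static bool            world_updated = false;

      /* update thread: simulate one frame, then signal the renderer */
      void finish_update(void) {
          pthread_mutex_lock(&frame_lock);
          world_updated = true;
          pthread_cond_signal(&frame_ready);
          pthread_mutex_unlock(&frame_lock);
      }

      /* render thread: it can't draw a half-updated world, so it blocks here
         until the update thread says the frame is consistent - which is
         exactly the serialization (and locking overhead) described above. */
      void wait_for_update(void) {
          pthread_mutex_lock(&frame_lock);
          while (!world_updated)
              pthread_cond_wait(&frame_ready, &frame_lock);
          world_updated = false;              /* consume the frame */
          pthread_mutex_unlock(&frame_lock);
      }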

      Quake 3 made attempts at this. the end result, for whatever reason, was that your average frame rate would increase far more than your max frame rate. why? it didn't enable the game to render faster... it only reduced the number of times that the processor was directly the bottleneck, thus raising the minimum frame rate in certain scenes. Same with most SMP platforms out there.

      the majority of the time, you will see a Massive Performance Improvement with this sort of architecture when doing batch processing. other than that, it seems it will either 1) make things run smoother in the background (not a bad thing), or 2) enable more separate things to be done at the same time without a performance hit on any of them (not a bad thing either). The problem with both of those is that for a single application to take advantage of it, you wind up increasing the complexity of the program. it becomes harder to develop, harder to optimize, and easier for performance problems to crop up in.

      sidenote: i am ignoring things such as database/webserver benchmarks at the moment. to me they fit the idea of batch processing, just backwards - doing many different things with the same data instead of the same thing with many pieces of data.

      KISS. Keep It Simple, Stupid.

      Similar ideas have been tried before. This is a rather unique solution, from what I know about it (not a whole lot), and could succeed where others have failed. Dunno. Historically, the simpler, cheaper solution that works well 75% of the time has always won out over the faster, more technically impressive solution that costs more.

      Some developers may hate the platform unless they have some damn good development tools. Console game programmers have historically preferred a platform that can execute a series of instructions in a reliable and predictable manner and time. This has been one of the major differences between PC and console games, and one of the biggest reasons why the PC gaming market is really starting to lose out to the console gaming market.

      aaanyways... late night ramblings... feel free to point out corrections as necessary...
      "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



      • #18
        We're running into the limits of silicon: we're not going to be getting many more GHz, and process sizes aren't going to shrink much more.
        But we may get larger cores, with more hardware put toward doing the job instead of just running faster, and dual core fits into that category.



        • #19
          Once again, people have been saying that for the better part of a decade.
          Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.



          • #20
            Some more info, including pictures: http://www.electronicsweekly.com/art...rticleID=38754



            • #21
              http://www.realworldtech.com/page.cf...WT021005084318



              • #22
                Interesting...

                Gameplay code will get slower and harder to write on the next generation of consoles. Modern CPUs use out-of-order execution, which is there to make crappy code run fast. This was really good for the industry when it happened, although it annoyed many assembly language wizards in Sweden. Xenon and Cell are both in-order chips. What does this mean? It’s cheaper for them to do this. They can drop a lot of cores. One out-of-order core is about four times [did I catch that right?Alice] the size of an in-order core. What does this do to our code? It’s great for grinding on floating point, but for anything else it totally sucks. Rumours from people actually working on these chips – straight-line runs 1/3 to 1/10th the performance at the same clock speed. This sucks.
                This is a more or less paraphrased excerpt of what Chris Hecker said during the IDGA panel at the last GDC.

                interesting...
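                a rough illustration of why that hurts (hypothetical C, the numbers in the quote are Hecker's, not mine): typical gameplay code is a chain of loads and branches where each step depends on the one before it, so an in-order core mostly sits and waits.

                /* a caricature of gameplay code: chase a pointer, branch on what
                   you find. each load feeds the next iteration, so an in-order
                   core stalls on memory at every step, while an out-of-order
                   core can at least overlap some of the surrounding work. */
                struct entity {
                    int            health;
                    int            flags;
                    struct entity *next;
                };

                int count_alive(const struct entity *e) {
                    int alive = 0;
                    while (e) {
                        if (e->health > 0 && !(e->flags & 0x1))
                            alive++;            /* branchy and data-dependent */
                        e = e->next;            /* can't start the next pass until this load returns */
                    }
                    return alive;
                }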
                "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



                • #23
                  I can hear the game companies now: "But we need processors that can make up for our crappy programming!"

                  I'd love to see a 3 GHz 680x0 style CPU...



                  • #24
                    Originally posted by Jon P. Inghram
                    I can hear the game companies now: "But we need processors that can make up for our crappy programming!"

                    I'd love to see a 3 GHz 680x0 style CPU...
                    Speaking as both a processor engineer, and a code monkey:

                    It's not "crappy" programming; it's that the level of abstraction has moved up. Look, if people want to stick to putting every single bit together by hand, fine. But I'll take the "inefficiency" of OOO units, automated memory management and protection, and object-oriented coding in order to actually produce something "modern."

                    In short: Less "efficient" but correct and functional, today -- that's far better than "fast as hell but we never were able to finish it."
                    Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.



                    • #25
                      In short: Less "efficient" but correct and functional, today -- that's far better than "fast as hell but we never were able to finish it."
                      Too bad there are so many examples of "slow as hell and we never were able to finish it" around today.



                      • #26
                        Originally posted by Jon P. Inghram
                        I can hear the game companies now: "But we need processors that can make up for our crappy programming!"

                        I'd love to see a 3 GHz 680x0 style CPU...
                        actually, that's what processor manufacturers have been doing. they looked for ways to take the chunky portions of code that are not easily optimized and make them execute faster. they devoted entire architectures to being able to handle non-linear code quickly and efficiently. they decided that was the best way to make an architecture that was better than the competition's, and thus more attractive to developers and end-users. and it worked.

                        portions of Quake and Quake II were still being written in ASM because they could still optimize the shit out of very linear, in-order code (aka graphics rendering). but for the other 99% of the game they didn't even bother with ASM, because processors had caught up to the point where there was no difference in performance. and once the switch to dedicated hardware renderers was made, it definitely made no sense to keep it.

                        which is slightly funny, because one of the biggest problems with the shader languages is that they are a throwback to the old ASM graphics routines. getting optimal performance requires hand-tuned code that might as well be ASM, just far more advanced. HLSL will eventually be the death of that, forcing graphics processors to become something far more general and capable of handling different styles of code. in the meantime though, you can see the performance differences it can cause in games like Doom 3 and Half-Life 2.

                        anyways, as Wombat said - i would take a platform that is slightly slower if it meant less development time and less effort spent optimizing and tweaking the shit out of low-level ASM routines in order to get solid performance.

                        anyways, the person in question earlier defined crappy code not as code that is poorly written, but as code that is messy, complex, and in general not a fixed-function piece of software like a pixel shader. it isn't that the code is crappy, it's just that from a processing standpoint it is incredibly difficult to make it run in a very linear way so that traditional platforms can run it well. as such it is an incredible pain in the ass to work with and thus incredibly crappy to have to deal with.
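                        to put that distinction in code (hypothetical C, not from the transcript): the first function is the streamlined, pixel-shader-ish stuff these chips love, and the second is perfectly reasonable gameplay code that just refuses to run in a straight line.

                        /* the "fixed function" flavor: pure math over a big array,
                           no branches, no pointer chasing. */
                        void shade(float *out, const float *in, int n, float gain) {
                            for (int i = 0; i < n; i++)
                                out[i] = in[i] * gain + 0.5f;
                        }

                        /* the "crappy" flavor: nothing wrong with it as code, it just
                           won't flow in a straight line - lookups, branches, early exits. */
                        int pick_target(const int *threat, int n, int min_threat) {
                            int best = -1, best_threat = min_threat;
                            for (int i = 0; i < n; i++) {
                                if (threat[i] < 0)
                                    continue;               /* dead slot, skip it */
                                if (threat[i] > best_threat) {
                                    best_threat = threat[i];
                                    best = i;
                                }
                            }
                            return best;
                        }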

                        oh well. lots of good info in the notes - IDGA Session Partial Transcript/Notes
                        "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



                        • #27
                          Originally posted by Jon P. Inghram
                          Too bad there are so many examples of "slow as hell and we never were able to finish it" around today.
                          just because platform architects have been working on making things idiot-proof doesn't mean it's impossible to completely screw things up.

                          edit: fixed grammar
                          Last edited by DGhost; 14 March 2005, 23:43.
                          "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz

