Supercomputer VS. Home computer

  • Supercomputer VS. Home computer

    I've been trying to search for information about this, but so far I've failed to find everything I need.

    The questions are:

    1. How many FLOPS did older supercomputers have (the Cray-1, for example)?
    2. What's the difference between SPECfp95 and SPECfp2000?
    3. Are home computers and supercomputers measured by the exact same benchmarks (see question 2)?
    4. How many MFLOPS do current mass-production CPUs (Athlon XP 1600-2600, P4 1.6-3.2 GHz, latest SPARC) have?
    5. How much (if at all) does memory architecture affect the outcome of those tests?

  • #2
    Found it!

    Computing Performance in FLOPS
    <table width="400" border="1" style="font-size:10pt">
    <tr><td><b>Performance Tier</b></td><td><b>FLOPS equivalent</b></td><td><b>Key Platforms</b></td></tr>
    <tr><td>kiloflops (KFLOPS)</td><td>1,000 FLOPS</td><td>IBM 701 (1953)<br>IBM 704 (1955)<br>Apple II (1977)</td></tr>
    <tr><td>megaflops (MFLOPS)</td><td>1,000,000 FLOPS</td><td>CDC 6600 (1966)<br>Cray 1 (1976)<br>Intel Pentium (1993)</td></tr>
    <tr><td>gigaflops (GFLOPS)</td><td>1,000,000,000 FLOPS</td><td>Cray 2 (1985)<br>Thinking Machines CM-2 (1987)<br>Microsoft Xbox (2001)</td></tr>
    <tr><td>teraflops (TFLOPS)</td><td>1,000,000,000,000 FLOPS</td><td>Intel ASCI Red (1996)<br>IBM ASCI Blue Pacific (1998)<br>IBM ASCI White (2000)<br>NEC Earth Simulator (2002)</td></tr>
    <tr><td>petaflops (PFLOPS)</td><td>1,000,000,000,000,000 FLOPS</td><td>IBM Blue Gene (2005-2010?)</td></tr></table>
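
    The tiers in the table step up by powers of 1,000. A minimal sketch of mapping a raw FLOPS figure onto these tier names (the `flops_tier` helper is hypothetical, just to illustrate the scale):

```python
# Tier thresholds from the table above, largest first.
TIERS = [
    (1e15, "petaflops"),
    (1e12, "teraflops"),
    (1e9, "gigaflops"),
    (1e6, "megaflops"),
    (1e3, "kiloflops"),
]

def flops_tier(flops):
    """Return the name of the largest tier whose threshold the figure reaches."""
    for threshold, name in TIERS:
        if flops >= threshold:
            return name
    return "flops"

print(flops_tier(160e6))   # Cray-1 class (~160 MFLOPS peak) -> megaflops
print(flops_tier(1.3e12))  # ASCI Red class -> teraflops
```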

    Comment


    • #3
      An interesting note: check the difference in years between the supercomputers and the home computers in each tier.

      1st block: 22-24 years

      2nd block: 17-27 years

      3rd block: 14-16 years

      It's getting shorter every time.
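
      The gaps can be recomputed directly from the years in the table in post #2; a quick sketch comparing the first supercomputer in each tier against that tier's home/consumer platform:

```python
# Years from the table in post #2: (first supercomputer, home/consumer platform) per tier.
tier_years = {
    "KFLOPS": (1953, 1977),  # IBM 701 -> Apple II
    "MFLOPS": (1966, 1993),  # CDC 6600 -> Intel Pentium
    "GFLOPS": (1985, 2001),  # Cray 2 -> Microsoft Xbox
}

for tier, (super_year, home_year) in tier_years.items():
    print(tier, home_year - super_year)  # gap in years: 24, 27, 16
```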

      Comment


      • #4
        Last note before going home (writing this at work):

        Supercomputer graphics usually take their power from the CPU, unlike home computers. So for 3D output, our home computers with T&L-enabled graphics accelerators should be at least as strong as supercomputers from 7-8 years ago.

        Comment


        • #5
          Rendering farms still don't use any 3D chips, because precision is higher with CPUs.
          I don't think that has changed in the meantime, or has it?
          no matrox, no matroxusers.

          Comment


          • #6
            The ATI 9700 supports rendering at up to 128-bit FP precision (although it's not a full 128-bit pipeline). The GeForce FX has a full 128-bit pipeline as well. This, I believe, either equals or exceeds what most movies use.

            Render farms will start being consolidated down to a handful of dedicated graphics cards within the next few years...
            "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz

            Comment


            • #7
              Re: Supercomputer VS. Home computer

              Originally posted by Dogbert
              4. How many MFLOPS do current mass-production CPUs (Athlon XP 1600-2600, P4 1.6-3.2 GHz, latest SPARC) have?
              5. How much (if at all) does memory architecture affect the outcome of those tests?
              Although the Athlon XP has 3 FPU pipelines, I believe its IPC for FPU operations is closer to 2 (someone correct me, please...). This means a basic XP 1900+ (1600 MHz) rates from around 3.2 to 4.8 GFLOPS.

              The P4 is harder to place on the GFLOPS scale. Basically it should be able to execute 1-2 FPU operations per clock cycle, but it is probably closest to 1. If SSE2 is applicable to your application, those SIMD instructions should push the IPC much closer to 2. So a basic 2.4 GHz P4 should rate around 2.4 to 4.8 GFLOPS.
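
              These peak figures all come from the same back-of-the-envelope formula: clock rate times FP operations retired per cycle. A minimal sketch (the per-cycle IPC values are the post's guesses, not vendor specifications):

```python
def peak_gflops(clock_mhz, flops_per_cycle):
    """Peak GFLOPS = clock (cycles/s) * FP operations per cycle."""
    return clock_mhz * 1e6 * flops_per_cycle / 1e9

# Athlon XP 1900+ (1600 MHz), assuming 2-3 FP ops/cycle as above:
print(peak_gflops(1600, 2), peak_gflops(1600, 3))  # 3.2 4.8
# P4 at 2.4 GHz, 1-2 FP ops/cycle (closer to 2 with SSE2):
print(peak_gflops(2400, 1), peak_gflops(2400, 2))  # 2.4 4.8
```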

              The UltraSPARC is a generation behind the Athlon/P4 in FPU performance, but it does have 64-bit as standard.



              Memory architecture does not affect peak GFLOPS, but it definitely affects SPEC scores.

              Regards,
              lurqa

              Comment


              • #8
                I know there are a number of commonly used legacy FPU instructions that take 3+ cycles to execute on the P4 but used to take 1-2 cycles on the P5/P6 architecture. One of the trade-offs of reaching those high clock speeds... The P4 really does shine when using SSE2, though.
                "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz

                Comment


                • #9
                  I remember reading about the Cray 2, and from what I read it wasn't as fast as a GFLOP... maybe half that. I remember finding that its performance was roughly equal to a 400 MHz Pentium II in GFLOPS. When we saw the Cray 2 back in the mid-80s and thought what an awesome machine it was, I wonder if anyone would have imagined it being outperformed by a gaming console 15 years later.

                  Comment


                  • #10
                    Originally posted by DGhost
                    The ATI 9700 supports rendering at up to 128-bit FP precision (although it's not a full 128-bit pipeline). The GeForce FX has a full 128-bit pipeline as well. This, I believe, either equals or exceeds what most movies use.

                    Render farms will start being consolidated down to a handful of dedicated graphics cards within the next few years...
                    A colleague of mine recently went to a conference on computing (it covered caches, optimisations, parallelism...) and told me there was a presentation there on exactly this topic. A researcher had written a program to perform matrix calculations on a video card. Whereas execution time rose quasi-linearly for 10x10, 100x100, ... matrices on a P4, it remained almost constant for the same calculations on the video card. The main issue was the delay in putting the data on the video card and retrieving it afterwards. Of course, the calculations were limited, both due to the instruction set (lack of certain control instructions that would be useful, ...) and the limited program size, but the performance was awesome.
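
                    That transfer delay dominating for small matrices can be sketched with a back-of-the-envelope split between bus time and compute time. The bandwidth and compute-rate figures below are illustrative assumptions, not measurements:

```python
def offload_times(n, bus_bytes_per_s, gpu_flops_per_s):
    """Rough time split for offloading an n x n float32 matrix multiply:
    move two inputs and one result over the bus, then ~2*n^3 FLOPs on the card."""
    bytes_moved = 3 * n * n * 4          # two input matrices + one result, 4 bytes each
    flops = 2 * n ** 3                   # multiply-add count for a dense matmul
    return bytes_moved / bus_bytes_per_s, flops / gpu_flops_per_s

# Illustrative numbers: ~1 GB/s bus (AGP-class), ~10 GFLOPS card.
for n in (10, 100, 1000):
    t_bus, t_calc = offload_times(n, 1e9, 10e9)
    print(n, t_bus, t_calc)  # transfer dominates at n=10; compute dominates at n=1000
```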


                    Jörg
                    pixar
                    Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)

                    Comment


                    • #11
                      One of the problems is that none of the video cards are designed with moving that data back in mind. AGP allows for fast reads and writes, but the video cards are built only to grab information; their output back over the bus is painfully slow.
                      Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.

                      Comment
