Hidden surface removal

  • Hidden surface removal

    Now I'm beginning to get a big, big headache.

    - Hidden surface removal has been used since the beginning of 3D computer graphics.
    - HSR algorithms are so common that I've found a lot of them in a book printed in the early '90s.
    - HSR is so basic that it's idiotic not to use it.


    SO WHY THE HELL HASN'T IT BEEN USED IN THE GAMING MARKET UNTIL NOW???


    However... http://www4.anandtech.com/showdoc.html?i=1382
    Sitting on a pile of the dead, I enjoy my oysters.

  • #2
    Do you do hardware design?
    Do you understand that "simple" algorithms sometimes consume a huge amount of area, speed, or both?
    Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.

    • #3
      3DFX uses a software HSR solution. This means that it is only useful when you've got plenty of CPU time to spare and your graphics board is the bottleneck.
      If this is not the case, software HSR will do more harm than good.

      • #4
        Wombat: I have only a faint idea of what hardware design is, so I can't really imagine with much precision how much a simple algorithm consumes.

        Anyway, I'm a computer programmer, and I tend to be a practical person.
        And the practical person in me keeps asking himself WHY, if those algorithms have been known for 40 years, neither a card manufacturer nor M$ has considered using them to improve 3D performance, instead of reinventing a whole new way to do 3D.

        'Cause, is it really better to add a lot of texture units and T&L and this and that, than to have a way to draw half the polygons?


        Honestly, I do not think that implementing an HSR algorithm in hardware is more expensive (in terms of transistors) than adding more rendering pipelines dedicated to rendering invisible polygons, widening the bus because the z-buffer is a bottleneck, adding faster memory to solve the z-buffer bottleneck, and so on.



        (a series of smileys, to give the message a friendlier tone)
        Sitting on a pile of the dead, I enjoy my oysters.

        • #5
          Have you noticed that most sites who look at this say that other manufacturers should use the same idea? Has anybody considered that the Z-buffer information might be stored in the T-Buffer, which only 3dfx has at the moment?

          If the Z-buffer info, which MUST be used for this HSR to work, is stored in texture memory, then you don't gain anything, because you have the same bandwidth bottleneck. Well, you might gain some speed, but probably not enough for it to be worthwhile.

          I think you will find that 3dfx will have another buffer in their next chip that is used just for HSR, and most other manufacturers will probably do the same.

          In other words, don't expect Matrox to come out with new drivers next week that have HSR and give a 50% speed increase to our old G400s.

          Ali (although I would LOVE to be proven wrong).

          • #6

            Ali:
            Well, the main difference is that the Z-buffer is a buffer that contains a z value for each pixel on the screen, not for each polygon.

            So (the following is only personal speculation): if you consider triangular polygons, each one has three vertices, and each vertex needs 6 bytes to represent;
            3 vertices are 18 bytes; 10,000 polys are about 175 KB; 175 KB is a very small amount, and should be no bottleneck for 3D cards.
            And this structure is already in RAM, because it is the 3D scene representation (apart from info on materials).

            The same thing with a z-buffer: on an 800x600 screen, a 16-bit z-buffer (the same precision as above) costs 937 KB.
            And if you switch to 1600x1200, the HSR algorithm uses the same amount of memory, while the z-buffer uses 3750 KB.
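
            A quick sketch of that arithmetic (my own back-of-the-envelope code, assuming 16-bit values and plain triangle lists with no vertex sharing):

            ```cpp
            #include <cstdio>

            int main() {
                const double kb = 1024.0;

                // Geometry: 3 coordinates * 2 bytes = 6 bytes per vertex,
                // 3 vertices = 18 bytes per triangle.
                const int bytesPerTriangle = 3 * 3 * 2;
                const int triangles = 10000;
                std::printf("geometry:           %.0f KB\n", triangles * bytesPerTriangle / kb);

                // Z-buffer: one 16-bit depth value per pixel, so the cost scales
                // with resolution while the geometry cost stays constant.
                std::printf("z-buffer 800x600:   %.0f KB\n", 800.0 * 600 * 2 / kb);
                std::printf("z-buffer 1600x1200: %.0f KB\n", 1600.0 * 1200 * 2 / kb);
                return 0;
            }
            ```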

            What is the conclusion, after all these calculations?
            That Matrox's engineers are just a "bit" more capable than me, so there must be something wrong in my reasoning ^_^;;;;
            Sitting on a pile of the dead, I enjoy my oysters.

            • #7
              If the Z-buffer only uses that much space, why does ATI's HyperZ make so much difference?

              I have read somewhere that over half the bandwidth on modern vid cards is Z-buffer info, but I have no idea where I read that, or how true it is.

              Thinking logically, each triangle has 3 points, each point needs an X, Y, and Z co-ordinate at, say, 16-bit precision; that would be (16*3)*3 bits for each triangle (144 bits, or 18 bytes), plus whatever information is needed to 'point' each corner to the one it should connect to (does that make sense?). Say we make that 18 bytes per triangle, like you say.

              When you draw something in 3D, say a cube, you will always be seeing at least 2 triangles (using 2 triangles to make one square face), and if you are looking down on a corner you will be seeing 6 triangles, and 6 triangles will be hidden.

              But for those 6 triangles you only need 7 points, not the 18 that you would need if you did 3 points for each triangle. (I think that's what HyperZ does.)

              Anyway, for HSR to work, you would have to look at each triangle in the cube and work out which one is in front of the others. This would have to be done for all 12 triangles.

              Wouldn't that mean that all the z-buffer info is looked at 12 times for each cube on the screen?

              So it would go like this:
              Is triangle 1 in front of triangle 2?
              Yes.
              Is t1 in front of t3?
              Yes.
              Is t1 in front of t4?
              Yes.
              Etc., etc.
              And all the answers have to be stored somewhere until the full list is done. That's 12 answers stored in some buffer somewhere, and then once you have done that calculation for that single cube, you have to see if that cube is hidden behind another cube. If you have, say, 1000 cubes on screen, then this will be very slow unless you have a nice fast cache to store the info.
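
              That brute-force scheme would look something like this (a toy sketch with a hypothetical Triangle type and a deliberately crude occludes() placeholder; with n triangles it does n*(n-1) tests, which is exactly the blow-up real HSR algorithms try to avoid):

              ```cpp
              #include <algorithm>
              #include <cstddef>
              #include <vector>

              // Hypothetical triangle: three vertices, 16-bit coordinates each.
              struct Triangle { short x[3], y[3], z[3]; };

              // Placeholder test (a real one would also check projected overlap):
              // say "a occludes b" when a is entirely nearer the camera.
              bool occludes(const Triangle& a, const Triangle& b) {
                  short aFar  = std::max({a.z[0], a.z[1], a.z[2]});
                  short bNear = std::min({b.z[0], b.z[1], b.z[2]});
                  return aFar < bNear;  // assumes smaller z = closer
              }

              // Compare every triangle against every other one and store the
              // answers -- the "12 answers in some buffer" described above.
              std::vector<bool> findHidden(const std::vector<Triangle>& tris) {
                  std::vector<bool> hidden(tris.size(), false);
                  for (std::size_t i = 0; i < tris.size(); ++i)
                      for (std::size_t j = 0; j < tris.size(); ++j)
                          if (i != j && occludes(tris[j], tris[i]))
                              hidden[i] = true;
                  return hidden;
              }
              ```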

              Now I'm sure there are some very clever algorithms that would do this much faster and more efficiently, but you still need somewhere to store the info. If it's going back to video RAM, then you have to overcome the bandwidth bottleneck that already exists. If it's only going to an on-chip cache (t-buffer), then it's not much of a problem.

              This is getting too long, and I don't really know what I'm talking about, so it's probably all wrong anyway, but it just seems to me that it would be very hard to implement without a performance decrease.

              Ali

              • #8
                Drizzt was on the right track with his calculations. The whole reason the Z-buffer is such a bandwidth sucker is that for each _PIXEL_ you render, you must look up the current Z value in the Z-buffer to see if the one you are rendering at the moment is in front.

                In other words, if you figure out which triangles are in front _first_ (which is hidden surface removal), then you can just render away, without having to look up Z's all the time during rasterization.
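
                That per-pixel cost, as a minimal sketch (my own hypothetical buffers; this is just the classic z-test, not any particular card's pipeline):

                ```cpp
                #include <cstddef>
                #include <cstdint>
                #include <vector>

                // Hypothetical frame state: one depth and one colour per pixel.
                struct Frame {
                    std::vector<std::uint16_t> zbuf;   // 16-bit depth per pixel
                    std::vector<std::uint32_t> color;  // 32-bit colour per pixel
                };

                // Classic z-buffered pixel write: every pixel pays a z-buffer
                // READ (the lookup described above) and, if it passes, a z-buffer
                // write plus a framebuffer write. Doing HSR up front means the
                // read never happens.
                inline void plot(Frame& f, std::size_t idx,
                                 std::uint16_t z, std::uint32_t rgba) {
                    if (z < f.zbuf[idx]) {    // read: is this fragment in front?
                        f.zbuf[idx]  = z;     // write: new nearest depth
                        f.color[idx] = rgba;  // write: the visible colour
                    }
                }
                ```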

                BTW, the Kyro chip (PowerVR's newest generation) essentially does hidden surface removal in the form of tile-based rendering.

                AlgoRhythm

                • #9
                  *disclaimer* I'm making some educated guesses from second-hand info I've read on the internet. However, I've no reason not to believe what I've read.


                  This is the classic battle between brains and brawn...

                  In the beginning, the entire PC graphics industry (excluding PowerVR) implemented the z-buffer method. All software was designed with a z-buffer in mind.

                  That method was better at the time: the video chip was the limitation, and there was spare memory bandwidth to use a z-buffer.

                  Only with the current generations are manufacturers hitting the memory bandwidth limit. The problem is, all software is still set up for the z-buffer. This means that triangles get pushed to the vid card as soon as the CPU generates the 3 verts. However, in order to do HSR, you have to generate entire frames at a time and then pass all the info across to the video card. It's extremely difficult to design a (properly working) HSR implementation in a market that optimizes for the z-buffer.

                  This is the main reason I assume PowerVR went off to do the Dreamcast: that system was designed from the ground up with HSR graphics in mind. If you make a Dreamcast game, you'll be optimizing for the PowerVR2 exclusively.

                  However, now that it's economical to spend large amounts on driver development (instead of on expensive speedy memory chips), we're hearing more about HSR implementations in future products.

                  Personally, I'm more interested in eDRAM. Of course, an eDRAM chip with HSR capabilities wouldn't be bad either.

                  • #10
                    Oh, to reply to some other posts about the fill-rate implementation. First, every pixel does have a 16- or 32-bit entry in the z-buffer. For a 32-bit z-buffer on a 1024x768 screen, you're using a bare minimum of 50 Mb/frame of memory bandwidth: 1024x768 at 32 bits is 25 Mb, and in order to write a pixel you have to do one read (to compare the depth of the pixel being written against the one currently in the z-buffer) and one write, so the memory is accessed twice.

                    Now, I say a minimum of 50 Mb/frame because the z-buffer technique also has 'overdraw'. Say a video card is drawing a large character in a game: it will draw the entire background scene, then overwrite that info with the character. If the character takes up 90% of the screen, then your z-buffer usage is up to 95 Mb/frame. And not only do you need to do the z-buffer work, you also have to write to the framebuffer. Both the HSR and z-buffer techniques require 25 Mb/frame of bandwidth to write to the framebuffer (at 32-bit colour), but the z-buffer technique has overdraw, so that further inefficiency costs the z-buffer method another ~23 Mb/frame (from our example above).

                    So, adding up all the bandwidth, the one frame with the large character requires 143 megabits of bandwidth. If you want this simple scene rendered at just 30 fps, that's 536 megabytes/s of bandwidth the scene is using.

                    However, a properly implemented HSR card would do the same scene using just 25 Mb/frame of bandwidth: one write to the framebuffer. Of course, this is a very simple scene optimized to show the benefits of HSR. Things like multi-pass textures (windows, reflections, Quake 3 skies) will also eat up quite a bit of bandwidth on both kinds of card.
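
                    The arithmetic above, worked through (my own sketch; "overdraw" here is just the fraction of the screen that gets drawn twice):

                    ```cpp
                    #include <cstdio>

                    int main() {
                        // Assumptions from the post: 1024x768, 32-bit depth and colour,
                        // a character covering 90% of the screen drawn over the background.
                        const double pixels   = 1024.0 * 768;
                        const double mbits    = pixels * 32 / 1e6;  // ~25 Mb per full-screen pass
                        const double overdraw = 0.9;

                        double zbuffer = mbits * 2;                 // one read + one write per pixel
                        zbuffer += zbuffer * overdraw;              // redrawn region pays again: ~95 Mb
                        double framebuf = mbits * (1 + overdraw);   // colour writes + overdraw: ~48 Mb

                        double total = zbuffer + framebuf;          // ~143 Mb/frame
                        std::printf("per frame: %.0f Mb\n", total);
                        std::printf("at 30 fps: %.0f MB/s\n", total * 30 / 8);  // roughly the 536 MB/s above
                        return 0;
                    }
                    ```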

                    • #11
                      Originally posted by Rob M.:
                      Say a video card is drawing a large character in a game: it will draw the entire background scene, then overwrite that info with the character. If the character takes up 90% of the screen, then your z-buffer usage is up to 95 Mb/frame.

                      OK, but I've got a question. In 3DMark 2K, for example, in the second test (the one that looks like an adventure game), there's a little hallway the camera passes through before arriving at the place with the 3 boats. While we're in that hallway I get 40-50 FPS, and when we're on the other side (when we see the 3 boats) I get 12-15 FPS. Shouldn't I get fewer frames than that in the hallway? Because if we removed the wall, we would see the 3 boats, and those seem to slow the G400 down a lot.

                      Spazm

                      P3-667@810 retail, Asus CUSL2-C, 2*128 mb PC-133(generic), G400DH 16mb, SBLive value, HollyWood+, 1*Realtek 8029(AS) and 1*Realtek 8039C, Quantum 30g, Pioneer DVD-115f

                      • #12
                        From what I understand, and that's not much, it's a very crude system: video cards (except for the TCL variety) don't get enough information sent to them to do anything fancy. They get already-transformed polygons, that is, arrays of pixels with individual coordinates and z values. I'm not sure some cards can even determine what a polygon is, rather than just seeing a pixel stream. Each pixel has x/y coordinates, an alpha channel, texture lookup indexes, and a z-buffer value. Given information like that, you can't apply any algorithm worth a damn.
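
                        Roughly the per-pixel packet being described (a hypothetical layout of my own, just to make the point that polygon structure is long gone by this stage):

                        ```cpp
                        #include <cstdint>

                        // Hypothetical "what the card sees" fragment: by the time it
                        // arrives, any notion of a polygon or object has been flattened
                        // away, so there is nothing left for a scene-level HSR algorithm
                        // to work with.
                        struct Fragment {
                            std::uint16_t x, y;   // screen coordinates
                            std::uint16_t z;      // depth value, destined for the z-buffer test
                            std::uint8_t  alpha;  // blending weight
                            std::uint16_t u, v;   // texture lookup indexes
                        };
                        ```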

                        MetaByte, the folks who hack in 3D-glasses support for video cards by injecting a hook into the pixel stream, have to reverse-engineer the data going to the video card to generate each view. That's why you really want the support in the video card drivers.

                        • #13
                          Object culling is done in the game engine, BTW.

                          As for HSR in drivers, any driver can implement that; there is no hardware limitation there, you just don't send the invisible polygon sections along to the video card.
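
                          A toy sketch of that filtering (my own code; the hypothetical visible() here is just a screen-space backface cull standing in for whatever test a driver would really use):

                          ```cpp
                          #include <vector>

                          struct Triangle { float v[3][3]; };  // three xyz vertices, already projected

                          // Placeholder visibility test: backface cull via the signed area of the
                          // projected triangle (negative = facing away, given CCW winding). A real
                          // driver-level HSR would need occlusion information too.
                          bool visible(const Triangle& t) {
                              float area = (t.v[1][0] - t.v[0][0]) * (t.v[2][1] - t.v[0][1])
                                         - (t.v[2][0] - t.v[0][0]) * (t.v[1][1] - t.v[0][1]);
                              return area > 0.0f;
                          }

                          // "Just don't send invisible polygons": forward only triangles that
                          // pass the test, so the card never spends fill rate or z-buffer
                          // bandwidth on the rest.
                          std::vector<Triangle> cullForSubmit(const std::vector<Triangle>& in) {
                              std::vector<Triangle> out;
                              for (const Triangle& t : in)
                                  if (visible(t))
                                      out.push_back(t);
                              return out;
                          }
                          ```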

                          • #14
                            The engine will have some built-in "HSR". Generally, most engines will draw everything in your room, plus the adjoining hallways. That's why a lot of games have the 'room-hallway-room-hallway-etc.' structure.

                            I don't know how 3DMark works, but I do know Quake 3. It works like this: the 'air' inside each level is broken up into areas, called 'leafs'. Each leaf has nothing inside it, but generally touches one or more walls. If you can see any part of a leaf, the engine has the graphics card draw the walls it touches.

                            So for that 3DMark boat/hallway scene: you can see through the door to the docks, so all that's being rendered is the docks and some barrels. I'm sure there's an invisible division between the docks and the ships, so that from inside the hallway you can see the leafs touching the docks (thus the docks are drawn), but you can't see any leaf touching the ships. Only the docks are being drawn.
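
                            A sketch of that leaf scheme (hypothetical structures of my own, in the spirit of Quake 3's precomputed visibility rather than its actual code):

                            ```cpp
                            #include <vector>

                            // Each leaf of empty space knows which wall surfaces it touches and
                            // which other leafs are potentially visible from it (precomputed
                            // offline, the way Quake 3 bakes its visibility set).
                            struct Leaf {
                                std::vector<int> wallIds;       // surfaces this leaf touches
                                std::vector<int> visibleLeafs;  // leafs you might see from here
                            };

                            // Draw pass: submit only walls touched by leafs visible from the
                            // camera's leaf -- ships behind the "invisible division" never reach
                            // the card at all. (A real engine would also de-duplicate walls
                            // shared between leafs.)
                            void drawVisible(const std::vector<Leaf>& leafs, int cameraLeaf,
                                             void (*submitWall)(int)) {
                                for (int li : leafs[cameraLeaf].visibleLeafs)
                                    for (int wall : leafs[li].wallIds)
                                        submitWall(wall);
                            }
                            ```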

                            • #15
                              Originally posted by Himself:
                              As for HSR in drivers, any driver can implement that; there is no hardware limitation there, you just don't send the invisible polygon sections along to the video card.

                              There is a big difference between the HSR and z-buffer methods, though: HSR sends the entire frame to the video card only once it's finished. This presents problems for games that expect to be able to read from the z-buffer; HSR drivers have to pull some fancy tricks to emulate this ability that z-buffer cards have.

                              I'm positive there are also huge gains to be made from hardware optimization. Current cards are built to have a constant flow of triangles and textures fed to them until the frame is finished, at which point the framebuffer is drawn to the screen, flushed, and the next frame is started.
                              HSR drivers send the finished frame to the card, the framebuffer is drawn to the screen, and then the next frame is sent to the card, etc. It seems to me that HSR cards would be designed to handle bursts better than a z-buffer card.

                              Oh, and I think if you want an HSR card with T&L, you'd surely need new hardware..

                              [This message has been edited by Rob M. (edited 15 December 2000).]
