'Fusion' cards

  • Uhmmm... HSR is a software trick. It doesn't really do anything beneficial except reduce the number of polys a card has to process - at the expense of processing them all before they get rendered... it is more useful to implement in the engine of the game than at the driver or card level...

    Bus speeds are too low, yes. But a second card/coprocessor solution doesn't help, as all the data still has to be moved across one bus or another. Pick your poison.

    Faster buses may help, but you need an architecture that can handle all of it, and that goes for every single component inside a computer. That is why SGI workstations (especially the original visual workstations, where they used a completely different architecture) are so fast.

    But a normal desktop PC architecture is so horribly off balance it's not even funny. Your hard drives can now (supposedly) do 100MB/s burst - suppose you could ever do that sustained. Since the hard drive controller is a PCI device, you just ate 3/4 of the PCI bus bandwidth. Add something like a network card doing 100mbit/sec (which is *much* more likely to happen than a hard drive sustaining its burst rate) and you have no more PCI bandwidth. Doing firewire video capture and transferring it across a network? Oops. You are very likely to run out of bandwidth (see the sketch at the end of this post).

    New architectures are 1) very expensive initially, and 2) require a redesign of all existing peripherals (and the market favors the cheaper version of a spec - it's cheaper to do a 10/100 hub than a 10/100 switch).

    There is a limit to how far radically different thoughts, concepts, and designs can go in this market.
    "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



    • superfly: there's a *HUGE* difference between pre-rendered movies and real-time 3D graphics. If you want movie-style graphics, then everything has to be upgraded: CPU speed, main memory speed, main memory size, AGP speed, GPU speed, video card memory speed, video card memory size. Plus whatever else I missed.

      However, today, the biggest bottleneck by far is video memory speed. Remember, theoretical AGP bandwidth is just as much as the theoretical bandwidth of PC133 memory (1066MB/s), so it stands to reason it's certainly not holding back the speed of the system (a quick sketch after this post works out those numbers).

      I've made a few quake3 levels, and when building them, one of the things you have to watch is the tri count - the number of triangles drawn on the screen. When this number goes too high, it's people with slow CPUs that get hurt the most, not those using PCI video cards.


      DGhost: Tell the PowerVR folks that HSR is a software trick - they've been building video cards with hardware implementations of HSR for a while now.
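
      A quick sketch of that bandwidth claim; the bus widths and clocks are the standard published figures, not something taken from the post itself:

      [code]
      /* AGP 4x and PC133 SDRAM have the same theoretical peak. */
      #include <stdio.h>

      int main(void) {
          /* AGP 4x: 32-bit bus, 66.6 MHz, 4 transfers per clock */
          double agp4x = 32.0 / 8.0 * 66.6 * 4.0;
          /* PC133: 64-bit bus, 133.3 MHz, 1 transfer per clock */
          double pc133 = 64.0 / 8.0 * 133.3;
          printf("AGP 4x: %.0f MB/s, PC133: %.0f MB/s\n", agp4x, pc133);
          return 0;
      }
      [/code]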



      • BTW, the big draw for HSR is the fact that hidden surfaces aren't drawn (thus "Hidden Surface Removal")

        The reason why this is a huge advantage is that whenever you draw a pixel to the screen, you use up video memory bandwidth. Calculating precisely which pixels need to be drawn saves between 20% and 50% of the memory bandwidth, so you have more available to, say, go up to the next resolution... (a rough sketch follows this post)
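
        A rough sketch of where that saving comes from, assuming a resolution, color depth, and overdraw factor for illustration:

        [code]
        /* Framebuffer write bandwidth with and without overdraw. */
        #include <stdio.h>

        int main(void) {
            const double pixels   = 1024.0 * 768.0; /* frame at 1024x768 */
            const double bytes    = 4.0;            /* 32-bit color */
            const double fps      = 60.0;
            const double overdraw = 2.0; /* avg times a pixel is drawn */

            double naive = pixels * bytes * fps * overdraw / 1e6;
            double hsr   = pixels * bytes * fps / 1e6;
            printf("no HSR: %.0f MB/s, ideal HSR: %.0f MB/s (%.0f%% saved)\n",
                   naive, hsr, (1.0 - hsr / naive) * 100.0);
            return 0;
        }
        [/code]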



        • Perhaps I should rephrase that - HSR is a z-buffer hack. It is a modification of algorithms. It can be implemented anywhere in a system, in hardware or software, as any algorithm can. But implementing it doesn't particularly change the architecture of the video card; you change an algorithm used on it.
          "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



          • Might I point out that transforming geometric data into pixels is done *much* later in the graphics pipeline (the *very* last stage), by which point the z-buffer has already been used to figure out which polygons are visible and which are not.

            The major advantage of HSR is that it eliminates running polygons that are not visible through things like lighting calculations, texturing, and all the other effects that are applied to a scene before it is rendered.

            HSR does save memory in the z-buffer, but on most video cards that buffer is in system memory. On the Kyro it is on the card, so there it is a good thing to use, and does save video card memory bandwidth and AGP bandwidth. But the memory on a normal graphics card is not used to hold the z-buffer, so HSR does not usually benefit video memory bandwidth (a minimal z-test sketch follows this post).
            "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



              • It's not quite that way...

                Today's CPUs are very powerful and don't have any problem generating scenes with 40,000+ polys per frame; the question is getting that data to the video card.

                The only reason Q3 can use more polys than most games is that at least part of that geometry work can be offloaded directly to the video card if it has a T&L engine.

                Q3 uses about 10,000 polys per frame on average on high quality settings. For instance, here are some settings that greatly increase the average poly count of Q3 to about 25,000 per frame:

                /r_subdivisions 1 (default is 4)
                /r_lodCurveError 10000 (default is 250)
                /r_lodbias -2 (default is 0)

                With my card (GF2 64 meg), the performance hit is about 5~10 fps on average, but only because some of the T&L operations are offloaded directly to the video card.

                Try it with the most powerful CPU available and a non-T&L card, and I assure you the performance hit will be a lot more than 5~10 fps - and yes, I've tried it myself.

                Why does that happen? Because the extra data goes twice over the same bus (the chipset~CPU one) before it makes its way to the video card (the sketch at the end of this post puts rough numbers on this).

                Need more proof? Today's CPUs are 20~30 times more powerful than when the first Pentium CPUs showed up about 7~8 years ago, but how much faster have bus speeds become since then???...

                No more than 5 times faster, and that's including the P4's bus. In fact, even the first Pentiums were already running on a 64-bit 66MHz system bus, and PCI bus speeds haven't increased since then either.

                So you're telling me that just because most systems now run a 133MHz bus (excluding P4s), and we have AGP 4x, bus speeds aren't an issue anymore???...not a chance.

                Here's a link to Voodoo Extreme, where you'll see a question that was asked of Tim Sweeney relating to this very subject. It's on the front page, but you'll have to scroll down a bit to see it.

              www.voodooextreme.com

              note to self...

              Assumption is the mother of all f***ups....

                Primary system:
                P4 2.8GHz, 1 gig DDR PC2700 (Kingston), Radeon 9700 (stock clock), Audigy Platinum and SCSI all the way...
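
                A rough sketch of the double-transfer point above; the poly count comes from this post, while bytes-per-vertex and frame rate are assumptions:

                [code]
                /* Without hardware T&L the CPU pulls the geometry in,
                 * transforms it, and pushes it back out, so the same
                 * vertices cross the CPU-chipset bus twice. */
                #include <stdio.h>

                int main(void) {
                    const double polys = 25000.0; /* per frame (tweaked Q3) */
                    const double verts = polys * 3.0;
                    const double bpv   = 32.0;    /* bytes/vertex, assumed */
                    const double fps   = 60.0;

                    double once = verts * bpv * fps / 1e6;
                    printf("one pass: %.0f MB/s\n", once);
                    printf("no T&L : %.0f MB/s (same data, twice)\n",
                           2.0 * once);
                    return 0;
                }
                [/code]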



              • Originally posted by DGhost:
                "Perhaps I should rephrase that - HSR is a z-buffer hack. It is a modification of algorithms. It can be implemented anywhere in a system, in hardware or software, as any algorithm can. But implementing it doesn't particularly change the architecture of the video card; you change an algorithm used on it."
                True, but as with a Sony PlayStation emulator for the PC, you're better off designing hardware built for that specific algorithm.



                • Tim Sweeney was referring to the fact that CPU speed is increasing faster than bus speed. This is hardly new: the P120 was only marginally faster than the P100, since the P120 had a 60MHz bus while the P100's bus ran at 66MHz. However, this is not the major limitation in today's games.

                  Take a look at the P4: with its quad-pumped 100MHz CPU bus and dual-channel Rambus memory, it has almost 3x the bandwidth the P3 has, and yet it only outperforms the P3 in extremely bandwidth-heavy applications (pre-rendering videos, mostly) - ones that ONLY use memory and CPU bandwidth. In applications that don't require that bandwidth (the other 99% of programs out there), the P4 doesn't perform any better than the P3.

                  Look at it this way: quake3 sends the planes to your video card. Assume you've got 32-bit co-ordinates for your quake3 high-poly scene of 250,000 planes; a plane is made up of 3 co-ordinates, plus assume another 32-bit 'texture' ID to tell the video card which texture the triangle is using. Assume you've got a fast video card that's running at 60 fps.

                  32 x 4 x 250,000 x 60 / 8 = 240MB/s (restated in the sketch below). That's a far cry from even PCI's max 512MB/s bandwidth, and I'm not even optimizing the bandwidth: quake3 sends only 2 vertices for many planes. The limitation is the CPU in this case; that's the reason why your T&L card helps so much.
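
                  The same arithmetic, spelled out using only the post's own assumptions:

                  [code]
                  /* 4 x 32-bit values per plane (3 coords + texture ID),
                   * 250,000 planes per frame, 60 frames per second. */
                  #include <stdio.h>

                  int main(void) {
                      double bits = 32.0 * 4.0 * 250000.0 * 60.0;
                      printf("%.0f MB/s\n", bits / 8.0 / 1e6); /* 240 */
                      return 0;
                  }
                  [/code]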



                  • Originally posted by DGhost:
                    "Might I point out that transforming geometric data into pixels is done *much* later in the graphics pipeline (the *very* last stage), by which point the z-buffer has already been used to figure out which polygons are visible and which are not.

                    The major advantage of HSR is that it eliminates running polygons that are not visible through things like lighting calculations, texturing, and all the other effects that are applied to a scene before it is rendered.

                    HSR does save memory in the z-buffer, but on most video cards that buffer is in system memory. On the Kyro it is on the card, so there it is a good thing to use, and does save video card memory bandwidth and AGP bandwidth. But the memory on a normal graphics card is not used to hold the z-buffer, so HSR does not usually benefit video memory bandwidth."
                    ?

                    A z-buffer is ALWAYS on the video card, except on HSR cards, which don't even REQUIRE (let alone have) z-buffers. Check out my post over here for more details on what HSR is.



                    • Rob,

                      You are incorrect; the P4 outperforms mostly in games, and that can be attributed to its higher FSB bandwidth.

                      Rags



                      • Yes, you're right. According to the nv15demo that Tom benchmarked, the P4 1.5GHz system beat out the Athlon 1.2GHz system by 6.8 percent.
                        That's in Quake, possibly the most bandwidth-intensive game out there.

                        In Unreal (Tim Sweeney's game, you might notice), it was only 1.2 percent faster than a PC100 system!

                        Considering the P4 has 1.5 times the CPU bandwidth of even the 133MHz Athlon boards, I'd expect much higher numbers if today's software were bandwidth-limited.



                        • BTW, I should also mention that those tests were optimal for showing off CPU speed: Quake and Unreal were tested at 640x480x16. Anything higher and you'll see all the CPUs performing pretty much exactly the same. The reason: the computers become limited by video memory bandwidth.



                          • UT is a piss-poor benchmark for CPU speed, period. There have been numerous other tests out there that show games performing better all around on a P4 system, and with the P4 processor itself being slower clock-for-clock than the Athlon and P3, I would say that the FSB bandwidth is the saving feature there. You can argue all you want, but the fact is FSB and memory bandwidth are limiting factors now; there is only so much the limited amount of on-die cache can do to make up for the lack of progression in this area.

                            Rags



                            [This message has been edited by Rags (edited 05 February 2001).]



                            • PCI is limited to 133MByte/sec (I'm actually correcting myself from earlier - I said it was 133mbit...).

                              32bit x 33MHz = 1056mbit/sec - divide by 8 for ~132MByte/sec.
                              64-bit 66MHz PCI is of course different, but since those are high-end devices and not graphics cards...

                              And maybe I was mincing words with 'z-buffer' and 'depth buffer', but since the z axis is usually the depth into the screen that a polygon sits at, I figured they were the same. Usually the computer keeps a buffer recording where polygons are in the scene... HSR acts by eliminating polys that are not visible from the scene before it is passed to the video card (a toy sketch of that culling follows this post)... that buffer is kept in the computer's own memory, because the memory on most video cards is used for (usually) 1) the frame buffer and 2) texture data. Some cards also use it to hold other things, but it usually breaks down to those two...

                              The Kyro is contrary to this and holds the depth buffer on the video card in its RAM... it's a bit more intensive on the card's memory, but it is faster than going through AGP to system memory...

                              The Sharky Extreme article on the Kyro mentions this, btw.
                              "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz



                              • UT is such a piss-poor benchmark... notice how most benchmarkers use Q3 instead?

                                Also, you are referencing Tom's Hardware. Tom's Hardware is ultimately a joke - he cannot make up his mind on so many things, and it is so obvious that he is getting paid to test the hardware he reviews...

                                The rendering engine in UT was designed for Glide. On any other API it runs like ass.

                                And as far as CPU speed goes, 640x480 is not pushing the video card...

                                At higher resolutions it has to process each pixel on the screen... how many pixels it can process is not so much dependent on the speed of the memory as on the speed of the core. Memory speed will make a difference, but as long as it can keep up with the core, it's the core that's the limiting factor...

                                At lower resolutions, though, it is the system memory and FSB that affect performance the most (the sketch after this post compares the two regimes)...
                                "And yet, after spending 20+ years trying to evolve the user interface into something better, what's the most powerful improvement Apple was able to make? They finally put a god damned shell back in." -jwz
