Performance Predictions


  • #61
    Rugger:

    I just noticed something in an earlier post of yours: saying that assembly-level programming isn't necessary anymore is, well, just plain wrong. Until someone can write a _really good_ vectorizing compiler, ASM will become more and more important, with the ever-growing number of SIMD extensions to most (if not all) major instruction sets.

    Steve

    PS. Admittedly I'm somewhat of an ASM nut, but I can beat GCC, MSVC(.net) and Intel's compilers easily, and sometimes by a lot.
    Distributed Computing just for fun...
    www.team-tnt.net

    Check us out.

    Comment


    • #62
      Rugger, I think most 64-bit CPUs have a 128-bit data bus. (I think the Hammers can be set up for either 64-bit or 128-bit, using two 64-bit memory controllers.)

      It is true that a lot of FPUs can crunch at a similar rate to the integer units on a given CPU (mainly because of all the extra silicon they take up).

      Comment


      • #63
        Originally posted by otlg22
        Wombat,

        IA64 runs x86 poorly, and I'm not sure how well it would do at PA-RISC (although probably much better, as PA-RISC is a tad closer to IA-64)

        Intel's EPIC (which I'm sorry is really just VLIW) is like every other VLIW processor I have seen so far.. great for DSP-like functions and miserable for general-purpose computing. Of course I haven't had the chance to see much of McKinley, and things may change (a first generation of a new architecture usually sucks rocks hard), but I have visions of the iAPX 432, which while a technically brilliant chip was a market disaster (although they did learn some interesting things in the process, so all wasn't for nought).

        Regards,

        Steve
        No, the Merced runs x86 poorly. The architecture says that IA-64 chips must support x86; it doesn't tell people how to implement it. There will be different implementations as time passes.

        Also, it's not Intel's EPIC, it's HP's, and a descendant of WideWord. The "I'm sorry" isn't necessary either, since you're not using comparable words. You might as well say "x86 (which I'm sorry is really CISC)".
        As for me, I've seen tons of McKinley.
        Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.

        Comment


        • #64
          Wombat,

          Not looking to start a flame war here, so I'm very close to agreeing to disagree, but that said, EPIC/IA-64 and VLIW are so close they are almost the same thing.. the core concept is the same. It's just another approach to VLIW.

          How are the compilers for targeting IA-64 coming along anyway? That's the biggest issue there.. and will there be any hardware OOP support in McKinley?

          Steve

          PS. I am assuming you are @ Intel or somewhere close, so if you are NDA'd just say so.
          Distributed Computing just for fun...
          www.team-tnt.net

          Check us out.

          Comment


          • #65
            Originally posted by otlg22
            Rugger:

            I just noticed something in an earlier post of yours: saying that assembly-level programming isn't necessary anymore is, well, just plain wrong. Until someone can write a _really good_ vectorizing compiler, ASM will become more and more important, with the ever-growing number of SIMD extensions to most (if not all) major instruction sets.

            Steve

            PS. Admittedly I'm somewhat of an ASM nut, but I can beat GCC, MSVC(.net) and Intel's compilers easily, and sometimes by a lot.
            Ah, you are right about SIMD instruction sets. All of that will require assembly for quite some time, since compilers will always find it difficult to extract SIMD operations from SISD code (you simply can't describe SIMD in a language like C). However, most of the code in any program doesn't need to access SIMD, and those parts that do should be well encapsulated so they can be ripped out and replaced if needed.

            Also, about beating compilers, I said I was a decidedly average assembly programmer. I had no doubt that there are experts that are much better than me. My main problems with writing x86 assembly are the lack of registers and weird processor limitations (like partial register stall on the P6 ) you have to deal with. The only times I have been able to beat the compiler successfully over multiple types of x86 cpu is where I was able to use features of the cpu that aren't described in the C language (eg CPU carry and borrow flags for subtraction).
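To illustrate the carry/borrow point: plain C has no way to name the CPU's borrow flag, so a multi-word subtraction has to rebuild the borrow with comparisons. The sketch below is only an illustration under that assumption; `sub_with_borrow` and its limb layout are hypothetical names, not anything from the posts above. On x86 the same loop body is typically a single SUB followed by SBB instructions, which is where hand-written assembly can come out ahead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes out = a - b for two n-limb numbers stored
 * little-endian (limb 0 is the least significant 32 bits).
 * Returns the final borrow: 1 if b was larger than a, otherwise 0. */
uint32_t sub_with_borrow(uint32_t *out, const uint32_t *a,
                         const uint32_t *b, size_t n)
{
    uint32_t borrow = 0;

    assert(out != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        uint32_t d1 = a[i] - b[i];   /* wraps around if a[i] < b[i]      */
        uint32_t b1 = a[i] < b[i];   /* borrow produced by a[i] - b[i]   */
        uint32_t d2 = d1 - borrow;   /* subtract the incoming borrow     */
        uint32_t b2 = d1 < borrow;   /* borrow produced by that step     */

        out[i] = d2;
        borrow = b1 | b2;            /* at most one of the two can be set */
    }
    return borrow;
}
```

The two comparisons per limb are the cost of not being able to see the flag from C.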
            80% of people think I should be in a Mental Institute

            Comment


            • #66
              Originally posted by otlg22
              Wombat,

              Not looking to start a flame war here, so I'm very close to agreeing to disagree, but that said, EPIC/IA-64 and VLIW are so close they are almost the same thing.. the core concept is the same. It's just another approach to VLIW.

              How are the compilers for targeting IA-64 coming along anyway? That's the biggest issue there.. and will there be any hardware OOP support in McKinley?

              Steve

              PS. I am assuming you are @ Intel or somewhere close, so if you are NDA'd just say so.
              I am going to stay away from this particular battle, and just wait to see what happens. McKinley will tell the story of whether IA-64 is stillborn or not. As I said, I like the ideas behind it; I am just not sure whether it works in the real world.
              80% of people think I should be in a Mental Institute

              Comment


              • #67
                Originally posted by Marshmallowman
                Rugger, I think most 64-bit CPUs have a 128-bit data bus. (I think the Hammers can be set up for either 64-bit or 128-bit, using two 64-bit memory controllers.)

                It is true that a lot of FPUs can crunch at a similar rate to the integer units on a given CPU (mainly because of all the extra silicon they take up).
                You are confusing the internal CPU bit size with the data bus leading out of the CPU. A 32-bit CPU could easily use a 128-bit interface to the rest of the world (to supply the L1 and L2 caches).

                The reason x86 only has a 64-bit data bus is because 128-bit data buses are very hard and expensive to create (compared to 64-bit buses).

                Workstation 64-bit cpus can afford to have massive 128-bit external buses. Desktop cpus can't yet.
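As a rough illustration of why bus width is a price/performance choice rather than a function of word size: peak bandwidth depends only on how wide the bus is and how many transfers it makes per second. The function name and the 100 million transfers per second used in the note below are made-up values for illustration, not figures for any real chip.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second = (bus width in bits / 8) * transfers per second.
 * The CPU's internal word size does not appear in the formula at all. */
uint64_t peak_bandwidth_bytes(uint32_t bus_width_bits,
                              uint64_t transfers_per_sec)
{
    assert(bus_width_bits % 8 == 0);   /* whole bytes only */
    return (uint64_t)(bus_width_bits / 8) * transfers_per_sec;
}
```

With the hypothetical 100 million transfers per second, a 64-bit bus peaks at 800 MB/s and a 128-bit bus at 1600 MB/s, whatever the register width of the CPU attached to it.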
                80% of people think I should be in a Mental Institute

                Comment


                • #68
                  Simplest explanation:

                  Multiply a 64-bit number by another 64-bit number and you get a 128-bit number, so you need 128 bits to store the result.

                  Comment


                  • #69
                    Originally posted by Marshmallowman
                    Simplest explanation:

                    Multiply a 64-bit number by another 64-bit number and you get a 128-bit number, so you need 128 bits to store the result.
                    Simplest explanation of what exactly?
                    80% of people think I should be in a Mental Institute

                    Comment


                    • #70
                      Of why the current crop of 64-bit CPUs will probably have 128-bit data buses. 64-bit CPU operations will generate 128-bit results in a lot of circumstances, so they will have 128-bit registers for the results.

                      When bit widths change you may get an external bus of the same size for cost reasons, but with dual-bank memory systems cropping up everywhere (Prestonia, nForce, Hammer) I can see no technical reason that 128-bit solutions will not establish themselves as the norm very quickly.
                      In my opinion AMD (or some chipset manufacturer) will probably release a 64-bit bus version for the low end (one that only uses one of its memory controllers).

                      PS: what is the data width of the Itanium?

                      Comment


                      • #71
                        Oops, Itanium has 64-bit buses... that sucks.
                        I was really only talking about the memory bus, not the data bus (as in PCI).
                        Well, at least AMD can be configured with a 128-bit-wide memory bus...

                        Well, at least all current normal desktop systems (32-bit) have 64-bit memory buses (Pentium/K6 to Pentium 4/Athlon).

                        I should stop digging.

                        Comment


                        • #72
                          Huh

                          Originally posted by Marshmallowman
                          Of why the current crop of 64-bit CPUs will probably have 128-bit data buses. 64-bit CPU operations will generate 128-bit results in a lot of circumstances, so they will have 128-bit registers for the results.

                          When bit widths change you may get an external bus of the same size for cost reasons, but with dual-bank memory systems cropping up everywhere (Prestonia, nForce, Hammer) I can see no technical reason that 128-bit solutions will not establish themselves as the norm very quickly.
                          In my opinion AMD (or some chipset manufacturer) will probably release a 64-bit bus version for the low end (one that only uses one of its memory controllers).

                          PS: what is the data width of the Itanium?
                          Have you done ANY assembly programming? If you have, you would soon realize that:

                          1) The result of the multiply instruction is usually split into 2 halves and put into different registers. From there, you treat them the same as any other registers and process them at the same size as the CPU bit size. So the fact you have a 128-bit result is meaningless.

                          2) You usually (on RISC processors; x86 is different) cannot access memory directly. You must load and store memory explicitly to and from registers. Therefore, all interactions with the outside world of the CPU happen at the word size of the processor (actually, that is a lie: all external data transfers actually occur across the L1 and L2 caches first).

                          However, I haven't kept up with AMD Hammer info, so I don't know if it is using a 128-bit bus or not.
                          80% of people think I should be in a Mental Institute

                          Comment


                          • #73
                            Yes, I have done plenty of assembly.

                            A 32-bit CPU typically has 64-bit registers, but they can be accessed as two 32-bit registers, lo and high, and are used as pairs, as has always been the case. (This is the case for RISC and most others.)

                            But let's get back to the original subject.
                            A 256-bit external bus will certainly have excellent bandwidth, but how well does that work with 3D?
                            I suppose it can transfer multiple displacement and texture maps simultaneously, with some extra data to spare.

                            Comment


                            • #74
                              Originally posted by Marshmallowman
                              Yes, I have done plenty of assembly.

                              A 32-bit CPU typically has 64-bit registers, but they can be accessed as two 32-bit registers, lo and high, and are used as pairs, as has always been the case. (This is the case for RISC and most others.)
                              No, you can fudge 2 32-bit registers together to do some 64-bit operations. But can you actually use store and load operations on the pairs? (I will probably end up with my foot in my mouth over this again.)

                              Anyway, the only point I wanted to make is that the width of the external data bus of a CPU isn't strongly affected by word size inside the CPU, but rather by what price/performance point you want the CPU to perform at. Hence you can't say that bigger CPU data buses are a feature of bigger word size CPUs.
                              Originally posted by Marshmallowman
                              But let's get back to the original subject.
                              With a 256-bit external bus it will not be bandwidth limited, but how well does that work with 3D?
                              I suppose it can transfer multiple displacement and texture maps simultaneously, with some extra data to spare.
                              Most immediate-mode rendering 3D adapters basically work like this (more or less):

                              Starting from the polygons furthest away:
                              1) Get triangle from CPU or GPU
                              2) Apply texture(s) to triangle and any other transforms
                              3) Draw triangle to screen and z-buffer.

                              And furthermore, graphics processors can do this to lots of pixels in one graphics clock. This requires bandwidth, lots of it, to move textures, z-buffer memory and display memory in and out!
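A toy sketch of the per-pixel work in step 3, to show where the memory traffic comes from. Everything here (the `Framebuffer` layout, the 4x4 size, the function names) is a hypothetical illustration, not how any particular adapter is built: each candidate pixel costs a depth read, and each visible one costs a depth write and a colour write on top of the texture fetches.

```c
#include <assert.h>
#include <stdint.h>

#define FB_WIDTH  4   /* tiny, made-up framebuffer size */
#define FB_HEIGHT 4

typedef struct {
    uint32_t color[FB_WIDTH * FB_HEIGHT];  /* display memory  */
    float    depth[FB_WIDTH * FB_HEIGHT];  /* z-buffer memory */
} Framebuffer;

/* Reset every pixel to the background colour and the farthest depth. */
void fb_clear(Framebuffer *fb, uint32_t background)
{
    for (int i = 0; i < FB_WIDTH * FB_HEIGHT; i++) {
        fb->color[i] = background;
        fb->depth[i] = 1.0f;
    }
}

/* Try to draw one pixel of a triangle. Smaller z means closer.
 * Returns 1 if the pixel was visible and written, 0 if it was hidden. */
int fb_plot(Framebuffer *fb, int x, int y, float z, uint32_t color)
{
    assert(x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT);

    int i = y * FB_WIDTH + x;
    if (z >= fb->depth[i]) {   /* one z-buffer read per candidate pixel */
        return 0;
    }
    fb->depth[i] = z;          /* one z-buffer write */
    fb->color[i] = color;      /* one display-memory write */
    return 1;
}
```

Multiply those reads and writes by every pixel of every triangle in every frame and the appetite for a wide memory bus follows.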
                              80% of people think I should be in a Mental Institute

                              Comment


                              • #75
                                What I wonder is, what will GPU have next... That's the interesting question.
                                Distributed Computing just for fun...
                                www.team-tnt.net

                                Check us out.

                                Comment
