A 64-Bit ULEAD MediaStudio Pro



    ...leap.


    Here are some excerpts from a fascinating discussion in a VERY recent rec.video.desktop thread:




    Dave Haynie (ex-Commodore guru and Vegas Video user) wrote:




    "32-bit CPUs have been doing 64-bit floating point, and later, MMX, ops for quite some time now."


    "Some ops, in SSE or AltiVec (on the PowerPC) happen with 128-bit registers, today, on 32-bit CPUs."


    "So it's important to know what you're talking about."




    Martin Atkinson-Barr (a Ph.D. physicist & programmer & truly brilliant fellow) responded:




    "It's true that x86 processors can move 64 bits, even 128 bits, at a time using MMX or SSE, but these are tough to use in general-purpose routines because those registers are fixed at 64 and 128 bits respectively."


    "Generally, in a string move (or compare or scan) you drop off the odd bytes or words and do a REP MOVSD (also CMPSD, SCASD) for the bulk of the transfer."


    "Then you do a MOVSW and a mov for the remaining word or byte."


    "That is a little difficult for an arbitrary string using MMX or SSE."


    "It's not a problem using RAX in x86-64, since mostly the registers are addressable as 64, 32, 16 and 8 bits wide as RAX, EAX, AX, AL, and so on for the other GP regs."


    "Having written some assembler MPEG2 routines I can assure you there is a BIG advantage to x86-64."


    "If you want the source just e-mail me."

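The bulk-then-remainder copy described above can be sketched in portable C. This is only an illustration, not Atkinson-Barr's assembler: the function name `copy_bulk_then_tail` is hypothetical, and the 8-byte bulk step stands in for what a 64-bit string move (REP MOVSQ) would do on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the thread: copy n bytes from src to dst,
 * moving the bulk in 8-byte words and the leftover in 4-, 2- and 1-byte
 * steps. Returns the number of 8-byte words moved in the bulk loop. */
size_t copy_bulk_then_tail(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n / 8;

    /* Bulk transfer: the C analogue of a 64-bit REP string move. */
    for (size_t i = 0; i < words; i++) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* memcpy sidesteps alignment problems */
        memcpy(d, &w, sizeof w);
        s += 8;
        d += 8;
    }

    /* Remainder: at most one doubleword, one word and one byte. */
    size_t tail = n % 8;
    if (tail & 4) { memcpy(d, s, 4); s += 4; d += 4; }
    if (tail & 2) { memcpy(d, s, 2); s += 2; d += 2; }
    if (tail & 1) { *d = *s; }

    return words;
}
```

A 31-byte buffer, for example, would go as three 8-byte words followed by a 4-byte, a 2-byte and a 1-byte move. An optimizing compiler may well turn the whole thing into its own block-copy sequence, so this shows the idea rather than the generated code.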



    Dave Haynie:




    "64-bit machines don't help much, if any, in string processing."


    "In the simplest case, you're reading or writing a single 8-bit or 16-bit character, one-at-a-time."


    "There's no need for larger registers, and in fact, unless you have special, ultra-CISCy string instructions (sure, the x86 has a few of these, don't know about x86-64) or some serious optimizations, that's precisely what the programmer does in their code."




    Martin Atkinson-Barr's response:




    "No, you generally work a row at a time, or a column at a time."


    "In integer format these are often 128 bits wide, as 8 x 16-bit integers."


    "In my routines I convert these to floats for processing."


    "Go to AMD's website and they will send you, for FREE, the manuals for x86-64."


    "Great manuals too."


    "I'm not sure what you mean by CISCy string instructions, I presume the REP type I mentioned above?"


    "These are fully 64-bit in x86-64."

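The row-at-a-time idea in this reply (eight 16-bit integers, 128 bits in all, converted to floats for processing) might look like this in plain C. His actual routines are assembler and are not shown in the thread, so `ROW_LEN` and `row_to_float` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROW_LEN = 8 };  /* 8 samples x 16 bits = 128 bits per row */

/* Hypothetical helper: widen one row of 16-bit integer samples to floats. */
void row_to_float(const int16_t in[ROW_LEN], float out[ROW_LEN])
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < ROW_LEN; i++) {
        /* Every int16_t value fits exactly in a 32-bit float. */
        out[i] = (float)in[i];
    }
}
```

Note that the float row takes 256 bits, twice the width of the integer row, which is part of why register width and register count matter for this kind of work.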



    Haynie:




    "Code isn't the whole story."


    "For one, all of the modern desktop computers, x86 and PPC as well as MIPS or Alpha, have been running on 64-bit (or greater) buses for years."


    "In fact, the Alpha went from a 128-bit conventional bus to the 64-bit, double-clocked EV6 bus that was adopted for the AMD Athlon."




    Martin Atkinson-Barr:




    "Well, there are busses and there are busses."


    "The internal chip busses may be 128 or 256 bits wide."


    "The external bus to memory is 64-bit currently in Athlons/P4s etc."


    "Hypertransport allows busses to be up to 32 bits wide but implemented as LVDS (low voltage differential) and so the clock rate is MUCH higher, giving up to 12GBytes/sec - compare that to the memory interface of current x86, which is maxed out at about 600MBytes/sec - TWENTY times slower."




    Haynie:




    "All modern memory for the last 10+ years has been able to go faster, taking advantage of locality of reference in the memory system."


    "This started back in the 80386/MC68030 days..."


    "I designed my first '030 product back in 1988, pretty long ago."


    "Anyway, when an Athlon reads or writes memory, it's optimized to run four-clock cycles, but each of these is composed of two data cycles of 64 bits each."


    "So the usual memory access hits 64 * 2 * 4 = 512 bits of memory per full cycle."


    "The Pentium 4 is optimized for running four-clock cycles as well, but on its preferred bus, there are four data cycles per clock, so it's actually grabbing 1024 bits of memory per full cycle."


    "Essentially, whether you grab that first character in a string as a byte, word, longword, or quadword in software, the hardware's going to suck 512 or 1024 bits right into cache for you."

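Haynie's burst arithmetic is easy to check: bits per full cycle = bus width x data cycles per clock x clocks per burst. A small C sketch, with the hypothetical function name `burst_bits` and his figures plugged in:

```c
#include <assert.h>

/* Hypothetical helper: bits moved in one full memory burst. */
unsigned burst_bits(unsigned bus_width_bits,
                    unsigned data_cycles_per_clock,
                    unsigned clocks_per_burst)
{
    assert(bus_width_bits > 0);
    return bus_width_bits * data_cycles_per_clock * clocks_per_burst;
}
```

With his numbers, `burst_bits(64, 2, 4)` gives 512 for the Athlon and `burst_bits(64, 4, 4)` gives 1024 for the Pentium 4, which is 64 and 128 bytes respectively.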



    Martin Atkinson-Barr:




    "I think your math is off here."


    "Usual cycles are 64 bits wide even if the bus rate is running at 533MHz - actually 4x133 MHz, a quad pumped bus."


    "One read is ~4 cycles so you get max 64 bits at 133MHz or a maximum of 1066MBytes/sec."


    "Bus overhead drops this to a real, measured peak of 600MBytes/sec."


    "This is the rate from main memory to cache, which is important for video."


    "It ignores latency, which is the killer (how fast to start each transfer, not the max when you get going)."


    "The Hammer chips have an on-board memory controller which really drops latency."

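The bandwidth figures in this reply can be reproduced with a couple of lines of C. This is a rough sketch: the function names are hypothetical, the clock is taken as the nominal 133.33 MHz (an assumption; the post just says 133MHz, which gives 1064 rather than 1066), and a MByte is counted as one million bytes.

```c
#include <assert.h>

/* Hypothetical helper: peak rate in MBytes/sec for one transfer of
 * bus_width_bits per clock at clock_mhz. */
double peak_mbytes_per_sec(unsigned bus_width_bits, double clock_mhz)
{
    assert(bus_width_bits % 8 == 0);
    return (bus_width_bits / 8) * clock_mhz;
}

/* Hypothetical helper: fraction of the peak rate actually achieved. */
double bus_efficiency(double measured_mbytes, double peak_mbytes)
{
    assert(peak_mbytes > 0.0);
    return measured_mbytes / peak_mbytes;
}
```

On those assumptions a 64-bit transfer per 133.33 MHz clock comes to roughly 1066 MBytes/sec, and the measured 600 MBytes/sec he quotes works out to a bit over half of that peak.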



    Dave Haynie:




    "No one should be hand coding for modern processors -- the compilers do it better."




    Martin Atkinson-Barr:




    "Not so, the critical parts are still hand-assembled."


    "You get a factor of ~4 improvement by hand."


    "Compilers spend a lot of time running back to memory for data and writing back variables; you can see the assembler code they generate easily."


    "That puts pressure on the memory bus that is talked about above."


    "Keeping data in registers eases the critical memory bottleneck that so affects video processing."


    "It's one of the reasons why the Alpha does so well, because it has so many more registers."


    "In my MPEG2 routines, all of the data is kept in registers throughout, something a compiler could never do."



    Haynie (Vegas Video advocate):



    "For Ulead in particular, I think they had better spend time on fixing bugs, rather than 64-bit coding."


    "That would be of far more general value to their customers."


    (Of course, Haynie doesn't mention that Sonic Foundry already has a 64-bit version of Vegas Video in the works.)




    Martin Atkinson-Barr:




    "I suggest that Ulead go with the x86-64 architecture and standardize on that."


    "Nobody serious about video processing is going to want a 32-bit processor come next summer."



    _______________________________




    Martin Atkinson-Barr is correct; Ulead needs to get moving on 64-bit.


    The clock is ticking...


    Good luck, Ulead.


    Jerry Jones

    Last edited by Jerry Jones; 15 August 2002, 23:23.