Gaming and Media Encoding Performance



Based on gaming performance, the 3000+ rating of the new 512KB-cache Athlon64 is very conservative. The 3200+ with 1MB of cache is faster in our standard game benchmarks, but only by a very small margin. Compare the game benchmarks for the 3000+ with those of the 3.2GHz and 3.0GHz Pentium 4 processors: the 3000+ holds its own against the best from Intel. Intel does lead the MPEG encoding benchmark in our tests. While the FX is competitive in this vintage encoding benchmark, the single-channel A64 solutions do not do nearly as well as dual-channel designs.

The FX51 is clearly the best performing processor in gaming, but results should be kept in perspective. At about 1/4 the price of the FX, half the price of the 3200+, and $150 less than the 3.2 P4, the 3000+ is providing outstanding gaming performance. For those who have wanted an Athlon64 or top P4 CPU, but just couldn’t handle the cost, the A64 3000+ will be like a breath of fresh air.
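
One way to see why doubling the L2 cache from 512KB to 1MB might buy only a small gaming gain is the standard average-access-time model: every access pays the cache lookup, and misses additionally pay the trip to main memory. The sketch below is illustrative only. The latencies and hit rates are invented assumptions, not measurements of any Athlon64 or Pentium 4, and the function names are hypothetical.

```python
# Illustrative model only: every number below is a made-up assumption,
# not a measurement of any real processor.

def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average access time: the cache lookup cost, plus the memory
    penalty weighted by how often the cache misses."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * memory_ns


def break_even_hit_rate(cache_ns, memory_ns):
    """Hit rate above which checking the cache first is faster than
    going straight to main memory on every access."""
    return cache_ns / memory_ns


if __name__ == "__main__":
    CACHE_NS = 5.0    # hypothetical L2 lookup latency
    MEMORY_NS = 60.0  # hypothetical main-memory latency
    # Hypothetical hit rates for a smaller and a larger L2.
    for label, hit_rate in [("smaller L2", 0.95), ("larger L2", 0.97)]:
        t = average_access_time(hit_rate, CACHE_NS, MEMORY_NS)
        print(f"{label}: hit rate {hit_rate:.0%} -> {t:.1f} ns average")
    print(f"break-even hit rate: {break_even_hit_rate(CACHE_NS, MEMORY_NS):.1%}")
```

Under these assumed numbers, a larger cache that lifts an already high hit rate by a couple of points shaves only a little off the average, while a cache stops paying for itself only when the hit rate falls below the ratio of cache latency to memory latency.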


75 Comments


  • Reflex - Saturday, December 27, 2003 - link

    um, this conversation has taken a turn into the absurd...machine man? WTF is wrong with you..?

    Um, heh, I think if I stay in this debate any longer I may have to turn to drinking in order to understand what is being said...
  • Pumpkinierre - Friday, December 26, 2003 - link

    The passages I have quoted are well known for those that lived in that era and stand fully by themselves in relevance to the overall article. The rest of the article is about methods of required information prediction etc. BUT as the excerpts state this is NOT RELEVANT to data/commands that have a low probability of being reused in cache and one of the examples given of apps that have this occurrence are games. This is clear- not complicated. The only addition is the fact that since the cacheless celeron the difference between memory and cpu speed has increased substantially making caches more useful but this in no way negates the above argument.
    They also refer to writing back to memory once data has been changed by the CPU. One method is for the cache to hang on to the data until it is deemed no longer useful, whereupon it must write the data back to RAM. This is what I was referring to as cache purging, which a previous post hit me on, saying all you needed to do was overwrite.

    With the BIOS on my i875 (ABIT IC7-G) I can't switch any of the caches off (not that it's relevant, because the P4 L1 cache is 8K, definitely too small). I know I could do it on HX mobos and can't remember on the VIA/K6 boards, but there was a utility, tweakbios, which hasn't been updated for later boards (last time I looked), that gave you full access to all functions. Also, the powerleap utility allowed you to switch one of the caches off.

    Just as an aside, I've come across a site that does the testing I suggest: tweaktown.com. They test the HIS 128MB 9800pro and 9600XT against a 256MB 9800XT (AT post 3 days ago):
    http://www.tweaktown.com/document.php?dType=articl...

    "To retrieve the most accurate frames per second (FPS) in our benchmarking suite, we fired up a game then played until we reached a scene containing average amounts of action. We then recorded the FPS during a one minute time period and took the mean FPS. This gave us a fairly adequate ranking of what “Excalibur” of performance these graphic cards are at."

    They are a little bit loose on the statistics and don't define their settings too well, but they are heading in the right direction. They give anecdotal information and include 3DMark03 results, but they themselves state in a note that they don't give them much credence. They use an FX51/SKV8 testbed, which is hardly your budget gamer's system but at least excludes CPU limitations. The responses of the different cards in different games and settings are exactly as I said they would be: different in every situation, with the 9600XT close to the 9800pro in many gaming tests (even DX9 Halo, but slaughtered in 3DMark03) and the 9800pro beating the 9800XT by 5% in 'Call of Duty' at 1280x. Even though the tests were manual and subjective, general trends were still obvious while not losing the system/game synergy anomalies that CAN ONLY BE DISCOVERED BY ACTUAL GAME PLAY. I praise them for their understanding of the complexity of system/software testing and hence the meaninglessness of demo testing.

    Back to P4. Your argument on the P4 not being server oriented and having super high latency is false. The Xeon (P4 version) may have been released later, but the P4 was the experimental testbed for it. Why else would Intel release a CPU that was roundly criticised by the media as being less powerful than the 1.0 P3? I know: they were having problems getting the P3 higher than 1Gig na na na-Bsht! The Athlon was only at 1.2 and you lost your Tbird within 4secs if your fan failed. The P3 went on to 1.4G, same as Tbirds, within the next year once problems were sorted. So the weak P4 was put out to test the new technologies (P4, i820/840, RAMBUS) with their eye firmly on the server market. With the K8, AMD are trying to run this battle the other way round, servers first then desktop. All I can say is good luck!
    On latency, RAMBUS was meant to deliver on this front (no, not just bandwidth) but failed. It was meant to be better at lower latency than SDRAM or DDR DRAM, and many articles stated this at the time. WHY else would Intel do this when RAMBUS was 5 TIMES the cost of SDRAM at the time?! But it was found (against DDR DRAM) not to be so, for reasons explained in your quoted ARS article (target byte transfer etc.). This is why Intel dropped it. RAMBUS has been at the same price as DDR for over two years now, bandwidth is good, and the i850 RAMBUS mobo can still match it with i865/875, so RAMBUS should be coming into its own. Intel have killed it because of latency considerations. The P4/mobo was always destined to be a low latency, high bandwidth system. It now is, under the i875 mobo with PAT. This is the lowest latency system outside of the K8. The i875 is actually a small server board handling ECC reg. memory and, I think, able to be set up as a dually (Asus mobo?). The P3 was loved by business and server personnel for its coolness and reliability, not latency or bandwidth, where it was quite poor. The distrust of the P4 is only a result of the extremely conservative nature of these people, and that is what AMD is facing up to with the Opteron. The present Xeon has really only started to be trusted in the last year with reliable DDR mobos.

    The serverfarms you mention are a misnomer; they are workstation farms. A server does what its name says: it serves out files/programs to intelligent terminals, basically optimising storage. Any other combination can be defined as a mainframe or a workstation. Both of these require a CPU with a powerful FPU. A mainframe requires good bandwidth and latency if several terminals are attached, while a workstation only requires bandwidth if media streaming, and low latency if operator-driven testing of 3D virtual worlds is being carried out. A computer that renders is a workstation. A database computer is a mainframe. The K8 is ideal as a gaming chip and low to middle (MP'ed) workstation, with the exclusion of media streaming apps (high bandwidth). It is not meant for servers as defined.

    Personally the term server especially applied to the K8 makes me puke- only the machine men in AMD who have long since sold their brass monkeys could have thought that one up as the fate of a brilliant gaming processor. That's why we need big Arnie to go in there and sort it out but he's tied up with california. In the meantime, Reflex you know I speak the truth, we need more to come on side and demand what we've been waiting 4yrs for: the new 300 celeron 'not quite cacheless wonder'. I detect in your life story a hint of a failed machine man (probably why you frequent AT). If this is so, you have passed the first test of turning your back on all the disconnected abstract mumbo jumbo that plagued the last century and I urge you to continue on having faith in what you sense is real not what is dished up to you as being right.

    Right now, with the 512K A64, the break has occurred. Already the price has gone from $217 to $240 in the latest AT price roundup. We must make sure the price stays down and demand is met. Bickering and argument over old quarrels is pointless. We all know the K8's destiny: for the masses. So rally to the call of Jefferson and Arnie:

    "No, no we're not going to take these high priced bloated cache K8s anymore".

    or for Voltaire- la marseillaise:

    "Marchons, marchons enfants de la GAMING COMMUNITY
    notre K8 de gloire est arrive"

    Happy New Year!
  • Reflex - Friday, December 26, 2003 - link

    You can do this test now, actually. Most BIOS implementations allow you to disable both L1 and L2 cache. I have actually done this test. Performance drops through the floor, including in gaming. As I stated before, you do not have to actually cache things to make them faster, cache is primarily used for pointers to locations in memory, which seriously reduces latencies.

    You are basically chopping tiny little sections out of the article that you think support your claims without simply reading the whole thing and seeing what it tells you. I am not going to cut and paste replies: either read the whole thing and understand why cache is so important, or continue to be ignorant on the topic. I highly recommend that anyone reading this go read the article for themselves; the excerpts listed above are very out of context.

    Secondly, if the only purpose of a server is to serve up web pages, then you are correct that a strong FPU is not needed. However, companies like Pixar use larger serverfarms(renderfarms as they like to call them) with tons of dual CPU systems. Since those servers are used for rendering images/video, a strong FPU is very very important. Several companies have switched to K7/K8 based servers for their superior FPU, including Pixar and I believe ILM.

    Furthermore, the Xeon *is* an enhanced P4, not the other way around. I am not sure how to put it to you, but I was personally involved in the development process. I am a former Microsoft engineer who worked on the Windows 2000 and XP kernel; I do know what I am talking about. I had my hands on P4's long before they hit the market, as well as K8's. I can pretty much tell you the development cycle of any CPU made since 1999 and what order they were developed in. The P3 Xeon continued as the primary server CPU from Intel for a year past the release of the P4 simply because it took that long for Intel to finish enhancing the chipset and cache algorithms of the P4, as well as validating multi-CPU support. The P4 was a purely consumer CPU, and its server uses were an afterthought. If that had not been the case, it would have been a low latency rather than a super high latency design. The high latency design has crippled their competitiveness in a lot of situations, namely database servers, which rely on super low latencies. In many cases, even to this day, corporations prefer P3 Xeons both for their lower power/lower heat and because it takes 2GHz or more from a P4 to compete with a P3 for a very optimized DB, and the heat/power requirements just don't make it worth it to change over to a P4 above 2GHz...

    Anyone wishing to test the BS being spewed above in their favorite games, go into your BIOS and disable your cache. Use FRAPS or some other counter to measure your minimum, maximum and average framerate. Then turn your cache on and repeat. There is no need for debate, anyone can run this test if they wish...
  • Pumpkinierre - Friday, December 26, 2003 - link

    "When an app fills up the cache with data that doesn't really need to be cached because it won't be used again and as a result winds up bumping out of the cache data that will be reused, that app is said to "pollute the cache." Media apps, games, and the like are big cache polluters, which is why they weren't too affected by the original Celeron's lack of cache. Because they were streaming data through the CPU at a very fast rate, they didn't actually even care that their data wasn't being cached. Since this data wasn't going to be needed again anytime soon, the fact that it wasn't in a readily accessible cache didn't really matter."
    That's from your stated article #71, Reflex, which is saying what I have been saying. I don't totally agree with the media app and media streaming part of the explanation, but in essence it describes what I and others have observed in regard to gaming. Try this from the same article on the cacheless Celeron:
    "Along with its overclockability, there was one peculiar feature of the "cacheless wonder," as the Celeron was then called, that blew everyone's mind: it performed almost as well on Quake benchmarks as the cache-endowed PII."
    I think someone hammered me earlier about the P2 slaughtering the celeron 300.

    If you've got an A64 out there (I've yet to see one even in a shop), disable your L2 cache, maybe in BIOS or using a utility (powerleap?), and run any FS or FPS/Doom type game, as long as the exe file and configuration/data files fit wholly within DRAM. Run this memory at its fastest (latency) settings, preferably with a high FSB if the clock is unlocked (lucky these days, the apparatchiks are also going to lock the FSB). You most likely will notice little difference in game play as long as the memory pipeline hasn't been crippled by the L2 disconnection, which I doubt will happen (although there might be an overhead from the L1 checking on a non-existent L2 when the CPU requests memory content).
    On the server front, even Intel is still trying to break into the market, garnering a section of the low end market with Xeons. These chips are X-86 based architecture from 25 years ago, effectively out of date and meant for desktops (64KB blocks and 640K maximum memory etc.). The P4 is a rebadged Xeon (not the other way around as some purport), but Intel has followed this route from its non-server days, the 8088 and 80886. Servers require high memory bandwidth and low system latency to respond quickly to multiple requests and stream the data/programs. They don't require a big FPU. You're not modelling the world's weather, and a server is not a mainframe. The change in direction of Intel with the P4 was a result of its intended server pedigree: poor FPU, quad pumped data bus (not needed for desktop) and expected low latency with RAMBUS. It has only held its own by the phenomenal ramp in speed, which is nothing to do with the 20 execution unit pipelines. With the Itanium, Intel have broken away from the X-86 mold under the pretext of a new start with 64bit. From all reports the first one was very slow in all modes. The second one I don't know much about, but if it has to simulate X-86 it won't be lightning fast. With this processor they hope to attack the middle range server market, going against IBM, SUN etc., who use K8 and Xeons (and possibly Itaniums) in their low end systems.
    K8 is also X-86 (with a 64bit extension set) and hence not designed as a server chip. Further, it has only medium bandwidth, even with double pumped Opterons. It does have very low latency, which also helps a bit with the bandwidth deficiency. So at best they are only going to be able to enter the low end server market. I know: 8way systems, 3HT links, separate memory for each processor blah blah blah, but all this is going to take a long time to work out. Even with duallies, mobos have got the memory running off one CPU. AMD haven't got time, and the server market wants turnkey reliability. It doesn't like being the experimental testbed.
    The requirements of a good gaming chip are a powerful FPU and low (system) latency. The 3D virtual world of games is made up from small data input files with a large exe file requiring heavy CPU number crunching to create that world (that's why I differ with the ARS Tech article about data streaming), so massive bandwidth is not required. But fast response is, and as the ARS article points out, with caches a CPU read or write request must be checked in all levels of cache first before going to main memory. This adds latency if the probability of the data being in the cache is low.
    So a small L1 cached K8 fulfills these requirements perfectly and solves the prod. capacity problem, which in turn should get the price into the sub 150 sweet zone. I'll buy one if it makes it there in the next 3 months. I really didn't want my P4. I wanted what I described, but got sick of waiting and have been even sicker with AMD's meanderings since.

    The K8's 12 execution unit pipelines have been optimised and tuned, making it the most powerful FPU for the money; at equal speeds it would eat a K7 for breakfast. The G5 is supposed to be okay, but from what I remember of Apple PowerPCs they are not all that hot on the latency front. Further, it's not X86. Bill likes X86, his whole company is founded on it, so the K8 is going in XBOX2. I'll bet you 50 bucks on it.

    However what AMD need is Arnie-both internally against the apparatchiks and externally- twin gatlings hammering out the cacheless society message.
  • PrinceGaz - Thursday, December 25, 2003 - link

    That's a most excellent article you linked to there, Reflex, and it helped fill in a few things I wasn't sure of, thanks for that. I hope Pumpkin... reads it thoroughly and learns something from it.
  • Reflex - Thursday, December 25, 2003 - link

    http://arstechnica.com/paedia/c/caching/caching-1....

    A specific article on caches, how they work and what they do for a CPU.
  • Reflex - Thursday, December 25, 2003 - link

    Pumpkin: Once again, you are wrong. Show me a link with this 'rumor'. IBM themselves stated that the Xbox2 was going to use a variant of their Power5 architecture. It is NOT going to be an x86 chip in any way, shape or form, so the K8 is not going to be in the XB2.

    And I believe you just got your low cost A64; that's what the 3000+ is. However, as a 'gaming' CPU I think you're missing the point: the K8 is a great architecture for a number of uses. In fact, while it is *marginally* better than the P4 at gaming, it absolutely kills the P4/Xeon when it comes to servers. So the choice by AMD to target servers (high margins) as a priority, and the consumer level (very low margin) as a secondary market, was pretty much a no-brainer. Furthermore, the lower volumes of the server market allowed them time to figure out how specifically to tweak certain aspects of the BIOS, drivers, etc. to fully take advantage of the architecture when it hit the mass market. They could play very conservatively with the Opteron: it was already considerably better than the competition without tweaks, and that market is always willing to sacrifice a bit of performance for stability. This essentially gave them several months to tweak while they made money from the architecture. Not a bad plan (plus it gave them more time to refine manufacturing).

    Honestly, the argument that it's a 'gaming tuned CPU' is ridiculous; it's no more gaming tuned than the K7. If a strong FPU is the main argument for making it a gaming CPU, that darned Alpha really shoulda been the master of Quake. And hey, let's not forget the Itanium, which has one of the strongest FPU units ever devised. The K8 has the exact same FPU unit as the K7 did, in fact.

    http://arstechnica.com/cpu/index.html

    Read the articles on that link for a long list of very explicit architecture overviews of different CPU's, and comparisons between them. There is quite a bit there on the K7, K8 and Power5(PowerPC 970).

    Come back after you have read these articles, it will make what you have to say far more relevant.
  • Pumpkinierre - Thursday, December 25, 2003 - link

    And on the jingle:

    "No,no we're not going to take these large caches anymore"

    I can add failed poet to the CV!
  • Pumpkinierre - Thursday, December 25, 2003 - link

    errata again should be:
    "servers which require low latency and LARGE memory bandwidth"
  • Reflex - Thursday, December 25, 2003 - link

    HammerFan: I take it back, this is not a useless debate. If you look at it from the point of convincing Pumpkin, then yeah it would be useless. But as the previous post demonstrates, a lot of useful information is coming out that will help the novice who is perhaps curious about the workings of different aspects of CPU's that are mentioned in different articles understand things better.

    For anyone who wants to take this into further depth, I highly recommend reading the articles on Ars Technica relating to CPU architecture (http://www.arstechnica.com). They have a very very good overview of both the K7 and K8 core, and I believe there is an article on the P4 there as well.

    Anyways, keep reading if you wanna know more about how things work I guess. ;)
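
The cache on/off test proposed in the comments above comes down to comparing minimum, maximum and average framerate between two runs of the same game scene. As a rough sketch of that comparison, assuming you have per-frame times in milliseconds from whatever frame counter you use, something like the following Python would do. The function names and the sample frame times are hypothetical, not measurements from any system.

```python
def fps_summary(frame_times_ms):
    """Return (min_fps, max_fps, avg_fps) for a list of per-frame
    render times given in milliseconds."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    if any(t <= 0 for t in frame_times_ms):
        raise ValueError("frame times must be positive")
    per_frame_fps = [1000.0 / t for t in frame_times_ms]
    # Average FPS is total frames over total time, which is not the
    # same as the mean of the per-frame FPS values.
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    return min(per_frame_fps), max(per_frame_fps), avg_fps


def percent_change(baseline, other):
    """Change from baseline to other, as a percentage of baseline."""
    return 100.0 * (other - baseline) / baseline


if __name__ == "__main__":
    # Invented sample data for illustration only.
    cache_off = [40.0, 50.0, 45.0, 60.0]
    cache_on = [10.0, 12.0, 11.0, 15.0]
    off_min, off_max, off_avg = fps_summary(cache_off)
    on_min, on_max, on_avg = fps_summary(cache_on)
    print(f"cache off: min {off_min:.1f}, max {off_max:.1f}, avg {off_avg:.1f} FPS")
    print(f"cache on:  min {on_min:.1f}, max {on_max:.1f}, avg {on_avg:.1f} FPS")
    print(f"average FPS change with cache on: {percent_change(off_avg, on_avg):+.0f}%")
```

Recording the same scene for the same length of time in both runs keeps the comparison fair, and the minimum framerate is worth reporting alongside the average because stalls on memory tend to show up there first.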
