AMD A8-3850 APU Review: Llano Hits the Desktop

MAC

Associate Review Editor
Joined
Nov 8, 2006
Messages
1,086
Location
Montreal
Years ago AMD took a leap of faith that surprised many: they purchased the popular graphics chip manufacturer ATI. Soon after that purchase the reason behind AMD’s action became apparent: a new class of processors would be created based off of an architecture called Fusion. The goal of Fusion would be to create a mutually beneficial synergy between x86 CPU cores and a graphics processing engine in order to benefit from each design’s strengths. Remember, this initiative was announced years ago and the market has been eagerly awaiting the fruits of AMD’s labor ever since.

LLANO-4.jpg

We have already seen the release of AMD’s first line of Accelerated Processing Units (or APUs) in the form of the Brazos platform. Consisting of the Ontario and Zacate APUs, this platform is targeted towards the entry level mobile and desktop markets but has proven to be extremely popular nonetheless.

The Llano series of APUs was next with the launch of the Sabine mobile platform which is now being followed by the desktop Lynx platform. A typical Lynx platform will consist of the A8, A6, A4 and E2 series of Accelerated Processing Units along with FM1-socket equipped A55 and A75 motherboards. For those of you wondering, this new FM1 socket isn’t compatible with past AMD processors.

LLANO-1.png

What AMD has done with the Llano series of processors is combine a number of items onto a single die. Much like Intel did starting with Lynnfield, the typical functions of the Northbridge (PCI-E functionality, the socket to southbridge interconnect, etc.) have been incorporated into the APU. A native quad core CPU along with a dedicated DX11 graphics core have also been added to a 32nm die that only measures 228 sq.mm.

LLANO-5.png

Initially AMD will be releasing two A-series quad core processors, the A8-3850 and A6-3650, neither of which features AMD’s Turbo Core technology. Alongside the 3850 and 3650 will be a pair of APUs sans the xx50 branding that support Turbo Core but will make do with clock speed reductions in order to reach a lower TDP. All of these processors feature 4MB of L2 cache (1MB per core), support for 1866MHz DDR3 and Dual Graphics (more on this in an upcoming section). For the purposes of this initial review, we will be looking at the A8-3850.

The main differentiating factor between the A8 and A6 series is the graphics co-processor installed onto the APU die. With the A8 APUs there’s a 400 core, 600MHz HD 6550D included while the A6 uses a HD 6530D sporting 320 cores and lower clock speeds. AMD describes a “Discrete-Class GPU Experience” for these Llano APUs but remember, there are many different levels of discrete GPUs and a mere 320-400 Radeon cores won’t be enough for most experienced gamers.

Closer to the lower end of the market, an A4-3400 will make its way into AMD’s lineup. This budget friendly APU is essentially half of an A6-3650 with two processor cores, slightly higher clock speeds, a cut down GPU core and 1MB of L2 cache (512KB per core). It lacks both Turbo Core and official 1866MHz memory support, but it boasts an impressive TDP of just 65W and retains compatibility with AMD’s Dual Graphics technology.

Finally we have the E2-3200 which will begin its assault on the entry level desktop market within a few weeks and will be directly targeting Intel’s entry level Pentium series. With two cores, lower clock speeds and a lack of Dual Graphics support, it brings up the tail end of AMD’s Lynx platform but it will likely be highly appealing for HTPC users.

LLANO-3.jpg

As we have come to expect from them, AMD is targeting their new APUs towards some highly affordable price points. The A8-3850 APU will hit a retail price of around $135 USD which puts it up against Intel’s Core i3 2100, 2105 and even i3 2120. The A6 processor meanwhile will be priced somewhere between $100 and $115. What really makes Llano stand out though is the fact that it is a tremendous value. $135 for the top-end A8-3850 is a great deal when you consider that it is packing a GPU that would easily retail for $50-60.

Intel realized the need for higher performance onboard graphics and their current generation of Sandy Bridge processors has taken a drastic step forward in this respect. AMD’s Fusion Accelerated Processing Units (or APUs) are actually aiming to one-up Intel in a number of areas but it remains to be seen whether they have what it takes to become market leaders.

LLANO-2.png
 


Inside the Llano APU Architecture


AMD’s Llano chips may have been a long time in the making but the technology they use is cutting edge. There were however serious hurdles that needed to be overcome before these new Accelerated Processing Units could be launched.

One of the main challenges AMD faced was the seemingly impossible task of cramming an impressive array of previously separate items onto a single die. On a single chip they needed to implement a GPU with its audio and video I/O needs, four CPU cores with their own L2 cache, a DDR3 memory controller and an integrated Northbridge to ensure low latency communication. In order to accomplish this, AMD turned to GlobalFoundries’ new 32nm manufacturing process and the end result is indeed an impressive looking architecture. However, the challenges will likely be even greater as AMD begins designing Trinity, their next generation APUs which will feature even more x86 CPU cores and additional GPU computational power.

LLANO-9.png

Upon first glance the specifications between the A-series APUs and Intel’s Sandy Bridge architecture are quite similar. Both use a leading edge 32nm manufacturing process, have approximately one billion transistors and sport similar die sizes. The TDP of these two architectures is also comparable with a range of chips from 65W to 100W being available (though Intel’s highest end 2600K chips have a TDP of 95W).

It should be mentioned that AMD is moving to the 32nm manufacturing process a full 18 months after Intel released their Clarkdale series but they simply could not have released A-series APUs without making this transition since these are transistor-packed processors. The four-core Llano variants have a whopping 1.45 billion transistors which is almost 300 million transistors more than Intel’s six-core Gulftown processors and those chips pack a huge 12MB of L3 cache. Nevertheless, despite having very high transistor counts these Llanos are quite compact when you consider that quad-core Sandy Bridge processors are a mere 5% smaller (216mm2) despite only featuring 995 million transistors.
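To put the "quite compact" remark in numbers, here is a minimal sketch that divides the transistor counts by the die areas quoted above. The dictionary layout and function name are illustrative only; the figures are the ones given in this review.

```python
# Figures quoted in the paragraph above (transistors in millions, area in mm^2).
chips = {
    "Llano (quad-core)": {"transistors_m": 1450, "die_mm2": 228},
    "Sandy Bridge (quad-core)": {"transistors_m": 995, "die_mm2": 216},
}


def density(chip):
    """Return millions of transistors per square millimetre."""
    return chip["transistors_m"] / chip["die_mm2"]


for name, chip in chips.items():
    print(f"{name}: {density(chip):.2f} M transistors/mm^2")

# Relative die-size difference between the two chips.
size_gap = 1 - 216 / 228
print(f"Sandy Bridge die is {size_gap:.1%} smaller than Llano")
```

On these numbers Llano packs noticeably more transistors into each square millimetre, which is what one would expect from a die dominated by dense GPU logic.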

With this new manufacturing process, we would have expected some kind of frequency bump, but that clearly is not the case when it comes to Llano. For the time being the A-series APUs are surprisingly low clocked when you consider that even low-end 45nm ‘Propus’ models have reached up to 3.2GHz. Obviously, AMD ran into some power or thermal limitations due to the addition of the large graphics/media portion to the CPU die. The size of the die was also probably the reason why they opted not to have any L3 cache, a design shortcoming that has proven to have a significant impact on gaming performance. Having said that, Llano processors will still have better gaming performance than any other IGP-toting competition thanks to their impressive integrated GPUs.

LLANO-6.png

When seen from a top-down view, the A-series 32nm die quickly sheds its secrets. Along the left hand side there are four x86 cores which are based off of AMD’s current Phenom II architecture and whose capabilities can be closely compared with those found on Athlon II X4 “Propus” CPUs. As we already mentioned, AMD Turbo CORE technology is supported on the Llano CPU cores but only some processors will take advantage of it.

The CPU cores are paired up with 1MB of L2 cache per core while the typical L1 cache is directly integrated onto each CPU section. Much like Athlon II processors, there is no L3 cache but the vastly improved L2 cache (up from 512 KB per core on Propus-series CPUs) should give the A8 and A6 series APUs a performance advantage in some scenarios.

The typical Northbridge functionality along with its associated PCI-E lane controllers has also been built onto the APU die along with a dual channel memory controller and display interfaces. By bringing these items on-die, AMD is able to drastically simplify motherboard designs which should keep costs down and allow for better control of I/O signals.

In keeping with AMD’s design philosophy for Fusion, you can see just how much space the GPU and multimedia I/O section takes up on the die. This is without a doubt the most complicated part of any APU and sacrifices had to be made on lower-end A-series processors by cutting down the SIMD array in order to lower TDP and simplify the manufacturing process.

Whether or not this design can be called a true “fusion” between the CPU and GPU is open for debate since the two processing areas remain distinctly separate from one another. As we will see below, the interconnect between the CPU and GPU remains largely unchanged from past generations though some serious efforts have been made to cut down on interconnect latency.

LLANO-7.png

The diagram above may look complicated but its intentions are straightforward as it shows how AMD has simplified the x86 operational cycles to increase process efficiency. The vast majority of complex instructions are now sent directly through the primary logic units and onto the load / store queue instead of meandering through the chip. We can also see a clear interconnect between the northbridge interface and the L1 and L2 caching hierarchy which should help improve overall computational performance.

The result of these small changes along with the larger L2 cache structure is an average IPC (instructions per clock) improvement of about 6% when compared to the previous generation.

LLANO-8.png

The links between each section of the APU follow in the same footsteps as the previous generation but AMD has refined certain interconnects with the goal of speeding up information transfers. The AMD Fusion Compute Link is a medium bandwidth connection which manages the complex interaction between the onboard GPU, the CPU’s cache and the system memory. For the time being, AMD hasn’t completely fleshed out this pathway but upcoming generations of Fusion will be able to take full advantage of its potential bandwidth.

The Radeon Memory Bus on the other hand is the all-important link between the onboard graphics coprocessor and the primary on-chip memory controller. Rather than acting like a traffic cop (a la Fusion Compute Link) which tries to direct the flow of information, this memory bus is all about the GPU having unhindered high bandwidth access to the system’s memory controller.

In the previous generation of AMD IGPs, the Northbridge’s graphics processor had to jump through a series of hoops before gaining access to onboard memory which is partially why 128MB of “SidePort” memory was sometimes added. This single chip all in one solution allows for the elimination of many potential bottlenecks and results in an average of four times more bandwidth between the GPU and memory when compared to past solutions.

Speaking of memory, AMD allows for up to 64GB to be installed and officially supports speeds of up to 1866 MHz.
 

Codename 'Llano' - A Quick Look at the A8-3850 APU

Codename 'Llano' - AMD A-Series APUs


1.png
2.png
3.png

Although AMD did not yet have any retail packaging to send or show us, this is what the logos for the A-series APUs look like, along with a quick glimpse of the "FX" branding that will make its way into other processors. AMD's Vision branding has now been carried into the processor market too, so expect a drastic change from the ordinary on this front as well.

Gone are the usual "Phenom" and "Athlon" monikers since it was decided that returning to a standard number / letter scheme would help clear up some confusing product overlap.

Llano_A83850_2th.gif
Llano_A83850_3th.gif

As you can see, our A8-3850 sample is physically identical to the previous Socket AM3 Athlon II and Phenom II processors that we are all familiar with. However, these new chips have 905 pins instead of 938 pins, hence the reason for the new FM1 socket. As we explained in our preview of the GIGABYTE A75-UD4H motherboard, AMD have wisely decided to keep the same AM2/AM3 mounting bracket for the FM1 socket, so all your previous CPU coolers will be re-usable. True enthusiasts will notice that this sample was manufactured in the 19th week of 2011, so it is some very fresh silicon.

Llano_A83850_8th.gif
Llano_A83850_9th.gif

Much like Intel have done with Sandy Bridge, AMD have elected for a 100MHz reference clock on the A-series APUs, which they aptly call the APU bus. Unlike Intel’s approach though, this new APU bus has a good bit of overclocking headroom, with early results ranging from 133MHz to 150MHz.

Thanks to some new power-saving features, and the 32nm manufacturing process, these new chips can undervolt themselves by an incredible amount when idle. As you can see in our screenshot, despite the fact that our sample would usually default to 1.00V, it would often dip down to 0.44V when idle. Under full load, the core voltage would shoot up to about 1.39V, which is almost exactly the same full load voltage that we've seen with our recent Phenom II processors.

Along with the A8-3850 sample, AMD sent us an ASUS F1A75-M PRO motherboard as a test bed.


This little ASUS micro-ATX motherboard is based on the new AMD A75 Fusion Controller Hub (FCH), and is typically well featured considering its price doesn't even hit the $150 mark. It comes with two PCI-E x16 slots (which support Crossfire at x8 / x8 links) along with native USB 3.0, SATA 6 Gb/s and some wicked overclocking headroom. If you want to know more about the new chipsets for the Lynx platform, check out the following page.
 


The A55 & A75 Fusion Controller Hub (FCH)


LLANO-46.gif

Prior to Llano, AMD used a two chip Northbridge / Southbridge solution on their motherboards (http://www.hardwarecanucks.com/foru...s-890fx-chipset-evolution-am3-platform-3.html) where the Northbridge was responsible for tasks like supplying the primary PCI-E lanes, facilitating communication between the Southbridge and the CPU and providing a platform for onboard graphics on some chipsets. The Southbridge meanwhile was used primarily as an I/O hub for USB, SATA, audio and networking connections while supplying a few additional PCI / PCI-E lanes for general purpose connectivity. With Llano and the FM1 socket things have changed somewhat as all of the Northbridge’s functions are now built directly into the CPU die.

With the Northbridge’s functionality built directly into the APU, the general layout of A-series motherboards begins to look a lot like Intel’s have since Clarkdale was introduced. The DDR3 memory controller, display outputs for the graphics controller and 16 PCI-E lanes for discrete graphics cards originate from the APU die. This x16 layout can be split into two x8 slots for Crossfire support but for the time being SLI hasn’t been certified for AMD’s APU motherboards. Four additional PCI-E 2.0 lanes for General Purpose Ports also originate from the APU and are used for offloading high demand I/O functions from the UMI interface.

AMD has transferred the typical Southbridge functions into an all-in-one solution called the Fusion Controller Hub of which there are two models: the A75 and the lower end A55. Connecting the FCH to the APU is the Unified Media Interface (or UMI) which consists of four PCI-E 2.0 lanes for I/O transfers and system management. This results in an interconnect bandwidth of 2 GB/s which is a far cry from the 4.16 GB/s (5.2 GT/s) of the previous generation’s Northbridge / CPU pathway. However, the high bandwidth of HyperTransport 3.0 between what amounts to a glorified Southbridge and the APU isn’t needed since the FCH contains none of the Northbridge’s demanding PCI-E lanes.
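For anyone wondering where the 2 GB/s UMI figure comes from, here is a small sketch of the arithmetic. It assumes the standard PCI-E 2.0 parameters (5 GT/s per lane with 8b/10b encoding) and reports bandwidth per direction; the constant and function names are made up for this example.

```python
PCIE2_TRANSFER_RATE_GT = 5.0   # PCI-E 2.0 signalling rate per lane, in GT/s
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b encoding: 8 data bits per 10 bits on the wire


def pcie2_bandwidth_gb_s(lanes):
    """Usable bandwidth in GB/s, per direction, for a PCI-E 2.0 link."""
    gbit_per_lane = PCIE2_TRANSFER_RATE_GT * ENCODING_EFFICIENCY
    return lanes * gbit_per_lane / 8  # convert bits to bytes


print(f"UMI (x4): {pcie2_bandwidth_gb_s(4):.1f} GB/s per direction")
print(f"Graphics slot (x16): {pcie2_bandwidth_gb_s(16):.1f} GB/s per direction")
```

The same function shows why the x16 graphics lanes stay on the APU rather than hanging off the FCH: they need four times what the UMI link can carry.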

LLANO-41.png

The Fusion Controller Hub is a deceptively simple 65nm chip which is responsible for a host of functions. It incorporates functionality for SATA, USB and HD audio (in case the A/V outputs on the GPU aren’t used) and controls up to three legacy PCI slots. One of the more interesting features which AMD has built in to the FCH is native compatibility with consumer IR (CIR) devices like remotes and this could prove to be a boon for HTPC users. The addition of a 16MB BIOS chip onto A75 and A55 boards means UEFI support is possible and some manufacturers like ASUS have implemented just that.

LLANO-44.png

The differences between the A75 and A55 FCH may not seem all that apparent upon first glance but the lower end chips do have some key features missing. To begin with, the A75 FCH is the first controller on the market to natively support both USB 3.0 (four ports) and SATA 6 Gb/s (six ports) while the A55 doesn’t natively support either format. Any board sporting an A55 FCH can still implement USB 3.0 or SATA 6 Gb/s but it will need somewhat expensive third party controller chips to do so. The only other real difference between these two chipsets is the A55’s lack of FIS-Based Switching for RAID setups.

LLANO-45.png

According to AMD, the implementation of native USB 3.0 ports running directly off of the A75 chipset will allow for higher performance levels than any current third party controller can supply due to a virtual elimination of processing overhead. Most users likely won’t feel a difference between native and non-native designs but there is a difference on paper if one was to look closely at performance.

LLANO-43.png

One area in which AMD seems to have a clear advantage over Intel is within the display output selection. Currently, Intel’s onboard graphics solutions officially support resolutions of up to 1920 x 1200 (we have actually had issues getting this resolution to work on some Intel setups) through a single link DVI port or HDMI connector. AMD’s onboard graphics processor on the other hand fully supports resolutions up to 2560 x 1600 via DVI or DisplayPort.

In addition to compatibility with high resolution monitors, the Llano APUs have a flexible display interface which allows PCI-E lanes to be configured on the fly in order to expand the desktop onto a pair of screens if needed. The Fusion Controller Hub also has an integrated DAC which supports a single VGA display which can be used in lieu of precious PCI-E bandwidth.
 


An In-Depth Look at the HD 6000 Series IGPs


LLANO-17.png

Llano is unique in the graphics department due to its use of the first ever integrated DX11 processing unit. Both Sandy Bridge’s IGPs and AMD’s last generation IGPs (the HD 4000-series which graced the Leo and Dorado platforms) were nothing more than warmed over DX10 parts. Intel’s own HD 3000 and HD 2000 Platform Graphics Controller was certainly a step in the right direction since it included support for DX10.1, Blu Ray 3D and a host of other features which brought it up to roughly the same level as an AMD HD 5450 DDR3. Unfortunately for Intel, the unit within the A8 series of APUs is an order of magnitude more powerful than anything previously installed as an integrated graphics processor.

LLANO-14.png

At the heart of higher end Llano APUs beats an updated Redwood core which is code named Sumo. Many will remember the Redwood architecture from the popular HD 5600 series of discrete GPUs. While Sumo retains its TeraScale 2 DX11 architecture and VLIW5 design, a few improvements have been made along the way in order to bring this design up to modern standards.

One of the largest differences between Redwood and Sumo is the latter’s fabrication on GlobalFoundries’ new 32nm HKMG manufacturing process. This cuts down on the physical die size while decreasing power consumption and thermal leakage and has allowed AMD to cram more transistors into a limited on-die area.

Since AMD has done away with their 2-chip motherboard solution in favor of integrated Northbridge functionality onto the APU die, some changes to Redwood’s communication structure were necessary. The GPU core now interfaces with the built-in Northbridge via the 29.8 GB/s (and aptly named) Radeon Memory Bus to ensure latency is reduced to an absolute minimum.
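The review doesn't say how the 29.8 GB/s figure is derived, but it lines up with the theoretical peak of dual channel DDR3-1866. The sketch below works under that assumption, using a standard 64-bit DDR3 channel; the function name and defaults are illustrative.

```python
def ddr3_peak_bandwidth_gb_s(data_rate_mt_s, channels=2, bus_width_bits=64):
    """Theoretical peak DDR3 bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8   # a 64-bit channel moves 8 bytes per transfer
    return data_rate_mt_s * bytes_per_transfer * channels / 1000


for rate in (1066, 1333, 1600, 1866):
    print(f"DDR3-{rate}: {ddr3_peak_bandwidth_gb_s(rate):.1f} GB/s")
```

Keep in mind this is a ceiling shared with the CPU cores, so the GPU will see less in practice, and the ceiling drops along with the memory speed.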

LLANO-13.png

The Sumo architecture adds two additional features which were lacking from the older AMD GPU designs: UVD3 and dynamic power gating. UVD3 brings increased HD output capabilities (which are detailed on an upcoming page) while dynamic power gating allows the GPU to shift through different power modes on the fly.

The primary use for power gating is to increase the battery longevity in the Sabine platform notebooks but it also has its uses in the desktop space. In order to heighten overall efficiency, the graphics core will downclock to extremely low levels when an idle state is detected and increase clocks in proportion to the needs of the program being run.

LLANO-11.png

The HD6550D graphics processor on the A8-series of APUs boasts 400 of the recently renamed “Radeon Cores” along with 20 texture units and 32 ROPs which basically makes it a Redwood XT core. Meanwhile, the slightly lower spec’d HD 6530D in the A6 APUs uses a design that’s similar to the Redwood LE but as with the HD 6550D, it uses slightly lower clock speeds in order to keep thermal output within reasonable limits. The HD 5670 / HD 6570 and HD 5550 are the discrete card analogs for these two cores and that’s actually quite impressive when you consider the perfectly respectable gaming performance each delivers.

LLANO-18.png

The BIOS on any A75 or A55 board allows you to set aside a portion of the system memory to be used explicitly as the graphics core’s framebuffer. Anything from 32MB to 2GB can be selected in most motherboard BIOSes.

Using onboard memory as a means to interact with the graphics core means that both the system memory and the portion set aside for the GPU will be running at the same speeds. Therefore, AMD recommends that higher speed DDR3 is used. We’ll be testing the effect of memory bandwidth and capacity on in-game performance a bit later in this article.

LLANO-12.png

While the power of AMD’s new class of onboard graphics controllers should be apparent by now, there are several periphery benefits as well. With UVD3’s proper support of stereo 3D content at a possible 120Hz, these APUs are better prepared for upcoming display technologies. As we can see above, anisotropic filtering is also a clear win for AMD but hopefully these impressive hardware specifications and a feature rich architecture translate into useful real world performance gains.
 

The Effect of Memory Bandwidth on IGP Performance


Since Llano’s Sumo graphics controller is literally tied at the hip to the system memory through the 29.8 GB/s Radeon Memory Bus, it goes without saying that DDR3 DRAM speeds, capacity and timings will affect performance. Indeed, you’ll likely see every major DRAM manufacturer release some form of 2x2GB or more likely a 2x4GB Lynx platform certified memory kit in the coming weeks. But how much can your choice in memory impact in-game framerates and overall 3D performance? That’s what this section is all about, folks.

LLANO-18.png

This bears repeating for anyone who skipped to this section: every A75 and A55 motherboard should have a setting within the BIOS which allows for the selection of preset frame buffer sizes along with an Auto default. Each step reserves an increasingly large chunk of the system memory explicitly for UMA (IGP) use and the Auto setting determines a frame buffer size based off of the amount of installed memory (usually a quarter of the system RAM will be reserved in this case).

LLANO-19.png

The results as seen above are pretty straightforward. The 32MB and 64MB settings just don’t set aside enough memory, which results in the GPU being absolutely starved for bandwidth. Even the 128MB setting returns unacceptable performance levels, particularly at 1920 x 1080. At 256MB, things start to look much better but the minimum framerates suffer when the resolution is increased past the 720P mark. Both 512MB and 1GB seem to be the sweet spots for this game (and others we tested) as they both returned playable framerates at 720P but higher resolutions proved to be a bridge too far for the underlying GPU architecture.

Things started to get particularly interesting when setting aside a whopping 2GB of system memory for the GPU’s use. Our average results saw very little impact but even at 720P, this game chugged along in certain parts of the test race. In addition to the Windows 7 page file, our 4GB equipped system only had about 360MB of memory left for the CPU cores to utilize which led to processor bottlenecking and very low minimum framerates. To make matters even worse, the 2GB of memory is dedicated to the GPU at all times so Windows startup / shut down took far too long and even programs like Word, AutoCAD and Outlook were sluggish at best.

Boosting the onboard system memory to 8GB alleviated every one of the issues but judging from the performance we saw, the HD 6550D GPU just can’t take advantage of anything more than 1GB of memory anyways. Meanwhile, the Auto setting exhibited perfectly capable performance so we recommend sticking with the system defaults. Just be sure to check AMD's System Monitor to ensure the Auto setting is reserving the correct amount of memory.
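As a rough illustration of how the frame buffer setting eats into system memory, here is a minimal sketch. It assumes the Auto setting reserves a quarter of installed RAM (the "usual" behaviour described earlier) and caps it at the 2GB maximum preset; the cap and both function names are our assumptions, not documented BIOS behaviour.

```python
MAX_FRAME_BUFFER_MB = 2048  # largest preset mentioned in this review (assumed cap for Auto)


def auto_frame_buffer_mb(system_ram_mb):
    """Estimate the Auto frame buffer: a quarter of installed RAM, capped."""
    return min(system_ram_mb // 4, MAX_FRAME_BUFFER_MB)


def ram_left_for_system_mb(system_ram_mb, frame_buffer_mb):
    """RAM visible to the OS once the frame buffer is carved out."""
    return system_ram_mb - frame_buffer_mb


for installed in (4096, 8192):
    auto = auto_frame_buffer_mb(installed)
    print(f"{installed} MB installed: Auto reserves ~{auto} MB, "
          f"leaving {ram_left_for_system_mb(installed, auto)} MB")

# The manual 2GB setting on our 4GB test system:
print(ram_left_for_system_mb(4096, 2048), "MB left before Windows takes its share")
```

The last line shows why the 2GB setting hurt so much on a 4GB system: half the memory is gone before the OS and applications load anything.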

Next up, memory speed and latency performance…

LLANO-20.png

LLANO-21.png

There’s no doubt about it: increasing memory speeds can have a drastic effect on performance but only up to 1600 MHz. Above that, tightening timings has a minimal effect at best and even between 1600 7-7-7 and 1866 6-6-6 there is very little improvement. This is likely due to the GPU’s inability to take advantage of higher memory speeds, particularly at higher resolutions. But if you are playing on a sub-1080P screen, a bit of extra performance can be found with ultra high memory speeds and tight timings. One thing that you will most likely want to avoid is a speed of 1066 which is ironically what most motherboards will read as the default SPD speed.
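Timings such as 7-7-7 are counted in memory clock cycles, so they only become comparable across speeds once converted to time. The sketch below converts the first number (CAS latency) to nanoseconds using the usual DDR relationship, where the memory clock is half the data rate; the function name is illustrative and this says nothing about why the measured framerates levelled off.

```python
def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is data_rate / 2
    and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000 / data_rate_mt_s


for rate, cl in ((1600, 7), (1866, 6)):
    print(f"DDR3-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```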

However, memory speeds can be a double edged sword as well. There are a small number of DDR3 kits that can effectively hit 1866 at CL6 and the few that do tend to carry a hefty price premium since their ICs are binned for overclocking. They’re usually so expensive that you’d be better off investing in less expensive memory and buying a dedicated entry level GPU like the AMD HD 6570 rather than trying to squeeze every last drop of performance from the IGP.

In our opinion, this is one of the few cases where faster isn’t better. For a perfect mix of performance and pricing, we’d recommend picking up a 4GB kit of 1600 MHz DDR3 which has the ability to hit CL7 at a reasonable voltage setting. Anything above that would be money wasted but remember to hightail it into the BIOS as soon as possible and change the memory speed since every module we’ve tested (even ones certified by AMD) defaulted to 1066 MHz.
 


Dual Graphics: Hybrid Crossfire Done Right?


Some of you may remember two competing technologies that were originally introduced years ago: Hybrid SLI and Hybrid Crossfire. Their goal was to have the system’s integrated graphics processor and a discrete GPU work in tandem to increase in-game performance over what an IGP alone could provide. Unfortunately, the AMD / ATI solution didn’t work all that well due to the somewhat obsolete IGPs being used while Hybrid SLI never really caught on outside of the notebook market. NVIDIA has since gone on to implement Optimus (an evolution of the original Hybrid SLI concept) on Sandy Bridge platforms while AMD has now finally introduced their own similar technology called Radeon Dual Graphics.

LLANO-28.png

Much like NVIDIA’s technology, Dual Graphics only works under the Windows 7 OS and is able to dynamically apply GPU acceleration when it’s needed. However, what it does that technologies like Optimus (and the desktop version Synergy) and Virtu can’t is leverage the rendering power of both the IGP and dGPU for increased performance. In layman’s terms, AMD’s drivers now allow for mixed Crossfire configurations between certain discrete GPUs and the graphics coprocessor in A8, A6 and A4 series APUs. The APU acts as the primary display output source while the discrete GPU sends its signals through the onboard PCI-E interface and onto the dedicated I/O pathways.

LLANO-27.png

This may sound simple and straightforward but there is a somewhat complicated set of compatibility requirements that need to be addressed before Radeon Dual Graphics will work with A-series APUs. In short, the graphics controllers on the A8 (HD 6550D IGP) and A6-series (HD 6530D IGP) processors are compatible with any AMD graphics card based off of the Turks and Caicos cores (HD 6670, HD 6570 and HD 6450) while the HD 6410D IGP in the A4 branded APUs will only work with the HD 6450 and HD 6350 cards. The E2 series APUs aren’t compatible with Dual Graphics due to their entry level market positioning.

Once AMD’s Vision Engine Control Center picks up a Radeon Dual Graphics compatible system along with a supported discrete GPU, it will then assign the Crossfire grouping a new name. For example, combining a HD 6670 with the A8 APU will result in the system displaying HD 6690D2 as the primary display controller once Crossfire is enabled. The chart above illustrates this for other IGP / dGPU combinations as well.
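The compatibility rules described above boil down to a simple lookup table. Here is one way to sketch it, using only the pairings listed in this review; the dictionary and function names are hypothetical and this is in no way how AMD's drivers actually perform the check.

```python
# Pairings as listed in this review; not an exhaustive or official list.
DUAL_GRAPHICS_COMPAT = {
    "A8": {"igp": "HD 6550D", "cards": {"HD 6670", "HD 6570", "HD 6450"}},
    "A6": {"igp": "HD 6530D", "cards": {"HD 6670", "HD 6570", "HD 6450"}},
    "A4": {"igp": "HD 6410D", "cards": {"HD 6450", "HD 6350"}},
    "E2": {"igp": None, "cards": set()},  # no Dual Graphics support; IGP name not given above
}


def supports_dual_graphics(apu_series, discrete_card):
    """Return True if the review lists this APU series / card pairing as supported."""
    entry = DUAL_GRAPHICS_COMPAT.get(apu_series)
    return entry is not None and discrete_card in entry["cards"]


print(supports_dual_graphics("A8", "HD 6670"))
print(supports_dual_graphics("A4", "HD 6670"))
print(supports_dual_graphics("E2", "HD 6450"))
```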

Alongside the potential performance benefits of this technology, multi monitor outputs can also be augmented. With the APU’s ability to feed two monitors at once and AMD’s 6000-series discrete GPUs able to output three display signals, up to FIVE monitors are supported with Radeon Dual Graphics.

Naturally, with all of this signal routing and GPU switching taking place, there are a million and one things that can go wrong somewhere along the line. So we decided to test just how well Dual Graphics works in the real world….

LLANO-22.png

First up we have 3DMark11 and indeed it looks like the APU and discrete GPU are able to work in tandem for a cumulative performance increase. Considering the Dual Graphics configuration was deceptively simple to set up, this first result bodes well for the technology’s future.

LLANO-23.png

LLANO-24.png

In a real world gameplay scenario, the results of this technology are impressive to say the least. At 1920 x 1080 the combination is able to more than double the performance of the APU by itself while adding about 40% to the framerates achieved by the HD 6570. It seems like Dual Graphics works quite well since there’s a massive increase over the rudimentary performance granted by the APU itself.

LLANO-25.png

LLANO-26.png

Unfortunately, in F1 2010 we see the other side of the coin since compatibility with this title is less than stellar. At 720P the Dual Graphics setup struggles to beat the IGP’s number while being trounced by the HD 6570 on its own. Things go a bit better at a higher resolution but the framerate increase isn’t anywhere close to the results in Just Cause 2.

LLANO-29.gif

When it works, Radeon Dual Graphics works well, but its fluid integration lives and dies by the driver stack’s game compatibility. There are obviously some teething pains, which boil down to driver issues rather than a fundamental problem with Dual Graphics, but that really shouldn’t lead you away from a Dual Graphics solution. We’re told that AMD is still working out some kinks but, to be honest with you, we were left with a very positive first impression of this new technology regardless of the hiccup within F1 2010 and some uneven load balancing (as seen in the image above). Plus, with AMD’s new Crossfire app profiles being rolled out with some regularity, any problems should be easily overcome with a few simple downloadable tweaks.
 

From GPGPU to APP - Simplified Parallel Processing


One of the primary benefits of AMD’s Fusion architecture is its ability to leverage the serial computing benefits of x86 processing cores alongside the massively parallel capabilities of GPU computing. In short, Llano APUs should be able to dedicate the appropriate resources (be it CPU or GPU) to a given program instead of using the CPU cores for every task.

HD6800-213.jpg

In order to carry on the torch from ATI’s Stream Compute moniker, AMD Accelerated Parallel Processing or APP has been created. This is now an all-encompassing term which can be used for graphics cards as well as the Fusion APUs.

AMD’s new APP SDK v2.2 has now hit developers’ doorsteps and with it there should now be a seamless integration of OpenCL computing for both x86-based CPUs and GPUs. This is one of APP’s major benefits over NVIDIA’s competing solution as it can be leveraged for a heterogeneous environment where specific tasks are sent towards whichever APU processor will complete them most efficiently.
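The idea of sending each task to whichever processor will finish it soonest can be illustrated with a toy cost model. To be clear, this is not OpenCL code and not how the APP SDK schedules work: the `Device` fields, the timing numbers and the `pick_device` helper are all hypothetical, and a real program would query its devices through the OpenCL runtime instead.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Device:
    name: str
    lanes: int              # work items processed at once (made-up figure)
    time_per_item: float    # seconds per work item on one lane (made-up figure)
    launch_overhead: float  # fixed cost of handing work to the device (made-up)


def estimated_time(device: Device, work_items: int) -> float:
    """Rough completion time: fixed overhead plus one pass per batch of lanes."""
    batches = -(-work_items // device.lanes)  # ceiling division
    return device.launch_overhead + batches * device.time_per_item


def pick_device(devices: Sequence[Device], work_items: int) -> Device:
    """Choose the device with the lowest estimated completion time."""
    return min(devices, key=lambda d: estimated_time(d, work_items))


# Hypothetical devices: few fast lanes versus many slower lanes with overhead.
CPU = Device("x86 cores", lanes=4, time_per_item=1e-6, launch_overhead=0.0)
GPU = Device("GPU", lanes=400, time_per_item=4e-6, launch_overhead=1e-4)

if __name__ == "__main__":
    for items in (8, 1_000_000):
        print(items, "work items ->", pick_device([CPU, GPU], items).name)
```

With these placeholder numbers a tiny job stays on the CPU cores because the launch overhead dominates, while a large parallel job goes to the GPU, which is the basic trade-off a heterogeneous platform is meant to exploit.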

LLANO-32.png

With an architecture that is tailor made for low latency communication between the processor cores and the onboard GPU, AMD has been able to drastically increase the compute capacity of their A-series APUs. It may not sound all that important but higher compute performance can benefit everything from HD decoding to video transcoding to in-game physics acceleration.

LLANO-35.png

Naturally, OpenCL 1.1 and DirectCompute compliant programs will be the cornerstones of AMD’s Fusion generation and there are already quite a few on the market as programmers come to realize the benefits of GPU acceleration. Since a program itself needs to support APP GPU acceleration, the real potential of Fusion APUs will live and die at the hands of software developer support rather than driver revisions.

LLANO-36.png

Not all GPU accelerated programs are created equal and the algorithms used by some tend to benefit certain compute architectures more than others. One example of this is Cyberlink’s MediaEspresso, which clearly favors Intel’s new Sandy Bridge IGPs. We can however see that transcoding on the A8’s HD 6550D GPU does net some substantial time savings over using just the CPU.


Touching Upon AMD’s New Steady Video

LLANO-33.png

AMD has recently announced a new APP accelerated feature for their GPUs and APUs called Steady Video. With such an unassuming name, it may be glossed over by many users but its benefits are far reaching and can be attained seamlessly through the Vision Engine Control Center.

Basically, through the use of GPU computing, Steady Video removes the shaking and wiggling from videos that use DXVA or Flash 10.2. This means any jerky handycam or amateur footage hosted on sites which use streamed Flash 10.2 content like YouTube and Vimeo will be automatically smoothed out.
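AMD has not detailed the algorithm behind Steady Video, so the following is only a generic illustration of how video stabilization is commonly approached: estimate the camera's position in each frame, smooth that path, and shift each frame by the difference. The moving-average filter, the `radius` parameter and both function names are our own assumptions, and the sample path is made up.

```python
from typing import List, Sequence


def smooth_path(path: Sequence[float], radius: int = 2) -> List[float]:
    """Moving average of the camera position over neighboring frames."""
    smoothed = []
    for i in range(len(path)):
        lo = max(0, i - radius)
        hi = min(len(path), i + radius + 1)
        window = path[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def stabilizing_shifts(path: Sequence[float], radius: int = 2) -> List[float]:
    """Per-frame shift that moves each frame onto the smoothed path."""
    return [s - p for s, p in zip(smooth_path(path, radius), path)]


if __name__ == "__main__":
    # Made-up horizontal camera positions (in pixels) for a shaky clip.
    shaky = [0.0, 3.0, -3.0, 3.0, -3.0, 3.0, 0.0]
    print("smoothed path:", smooth_path(shaky))
    print("shifts to apply:", stabilizing_shifts(shaky))
```

A larger `radius` gives a steadier picture but also flattens deliberate camera movement, which is one plausible way a stabilizer can end up working against intentional motion.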

LLANO-34.png

For the time being, Steady Video is an exclusive technology for AMD’s APUs. Its real world benefits may not be apparent at first glance but it will have a noticeable effect upon your online video watching experience. It was so unobtrusive throughout testing that we forgot it was active…right up until we switched back to our standard test system and were greeted by the usual shaky YouTube videos. As a side note, there were some situations where we found that having Steady Video enabled tended to detract from the director’s original intent.
 

High Definition Content for the Masses


The Llano APUs may use an older graphics architecture but it has been slightly overhauled to bring it up to modern standards. One of the areas that received some attention was high definition playback compatibility: the ubiquitous Universal Video Decoder has been brought up to the third generation spec seen on AMD’s HD 6000-series. This is particularly important since it brings hardware acceleration of high definition content to much lower price points.

HD6800-206.jpg

AMD’s Universal Video Decoder has been around for years and is known as one of the most capable video processing platforms currently available. Recently, UVD took the next logical step forward with an expanded list of accelerated codecs in addition to the ones already in place from past iterations.

One of the main features which has been added to the newly minted UVD3 is the ability to decode videos which use MVC encoding. As part of the H.264 / MPEG-4 AVC codec, MVC is responsible for creating the dual video bitstreams which are essential for stereoscopic 3D output. Supporting this standard gives AMD’s APUs the ability to process Blu Ray 3D movies through an HDMI 1.4a connector. MPEG-4 Part 2 hardware acceleration for DivX and Xvid codecs has also been added but the Nero Digital codec is still MIA for the most part.

HD6800-221.jpg

AMD’s main focus for Llano’s HD content features was to go beyond simply decoding HD content and give users high end image quality improvements which are processed prior to the signal reaching the display. Through the use of the compute resources within a given system, additional pre and post-processing can be done before outputting an HD video stream. The result is some impressive HD playback capabilities for Llano…

LLANO-37.png

LLANO-39.png

In our quick testing, the A8-3850 had absolutely no problem decoding high bitrate content and compatibility with Dolby TrueHD and DTS MA tracks was seamless provided the necessary options were selected in PowerDVD. AMD’s new processor also thoroughly trounces the Intel HD 2000 IGP in both movies. Also note that any “CPU Only” result above does not include HD audio track decoding.

LLANO-38.png

Decoding the dual MVC HD bitstreams necessary for Blu Ray 3D playback is just too much to handle for your typical processor and even the Sandy Bridge IGP displayed some catastrophic results. We’re not sure if it was a lack of driver support or something else but the i3 2120’s IGP flat out refused to run any stereo 3D content through the HDMI 1.4 connector. Hopefully, this will be rectified soon with a driver revision.

Despite the load being put on the system by stereo 3D content and a DTS-HD audio track, the A8’s IGP had no issue in this test which is a testament to AMD’s refined video processing engine and codec support. Granted, the processor load was quite high but we can’t expect anything less when the engine is crunching through so much data.
 

Test Setups & Methodology


For this review, we have prepared five different test setups, representing all the popular platforms at the moment, as well as most of the best-selling processors. As much as possible, the five test setups feature identical components, memory timings, drivers, etc. Aside from manually selecting memory frequencies and timings, every option in the BIOS was at its default setting.

AMD Llano FM1 Test Setup

Llano_A83850_20.jpg

AMD Phenom II AM3 Test Setup

Llano_A83850_21.jpg

Intel Core i5/i7 LGA1155 Test Setup

Llano_A83850_22.jpg

Intel Core i3/i5/i7 LGA1156 Test Setup

Llano_A83850_23.jpg

Intel Core i7 LGA1366 Test Setup

Llano_A83850_24.jpg

*Although Windows Vista SP1 was our principal OS for the majority of benchmarks, we did use Windows 7 (with all the latest updates) when benchmarking AIDA64.*

For all of the benchmarks, appropriate lengths are taken to ensure an equal comparison through methodical setup, installation, and testing. The following outlines our testing methodology:

A) Windows is installed using a full format.

B) Chipset drivers and accessory hardware drivers (audio, network, GPU) are installed followed by a defragment and a reboot.

C) To ensure consistent results, a few tweaks were applied to Windows Vista and the NVIDIA control panel:
  • Sidebar – Disabled
  • UAC – Disabled
  • System Protection/Restore – Disabled
  • Problem & Error Reporting – Disabled
  • Remote Desktop/Assistance - Disabled
  • Windows Security Center Alerts – Disabled
  • Windows Defender – Disabled
  • Windows Search – Disabled
  • Indexing – Disabled
  • Screensaver – Disabled
  • Power Plan - High Performance
  • NVIDIA PhysX – Disabled
  • V-Sync – Off

D) Programs and games are then installed & updated followed by another defragment.

E) Windows Update is then run to install all available updates, followed by a defragment.

F) Each benchmark is run three times, with a clean reboot before every iteration unless otherwise stated, and the results are then averaged. If there were any clearly anomalous results, the 3-loop run was repeated. If they remained, we mentioned it in the individual benchmark write-up.
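For anyone wanting to reproduce step F with a script, the run-three-times-and-average routine can be sketched as follows. Note the assumptions: we judged anomalies by eye, so the 10% deviation-from-median threshold here is an arbitrary stand-in, and `run_once` is a hypothetical callable that would perform the reboot and return a single benchmark score.

```python
from statistics import mean, median
from typing import Callable, Sequence, Tuple


def is_anomalous(results: Sequence[float], tolerance: float = 0.10) -> bool:
    """True if any run deviates from the median by more than `tolerance`."""
    mid = median(results)
    return any(abs(r - mid) > tolerance * abs(mid) for r in results)


def benchmark_score(
    run_once: Callable[[], float], runs: int = 3, tolerance: float = 0.10
) -> Tuple[float, bool]:
    """Average `runs` results; repeat the whole loop once if it looks anomalous.

    Returns (average, still_anomalous). A True flag means the oddity remained
    after the repeat and should be mentioned alongside the result.
    """
    results = [run_once() for _ in range(runs)]
    still_anomalous = False
    if is_anomalous(results, tolerance):
        results = [run_once() for _ in range(runs)]  # repeat the 3-loop run
        still_anomalous = is_anomalous(results, tolerance)
    return mean(results), still_anomalous


if __name__ == "__main__":
    # Made-up scores: the first loop contains an outlier, the repeat is clean.
    scores = iter([100, 101, 150, 100, 101, 99])
    print(benchmark_score(lambda: next(scores)))
```

The median is used as the reference point so that a single wild run cannot drag the comparison value along with it.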

Here is a full list of the applications that we utilized in our benchmarking suite:
  • AIDA64 Extreme Edition v1.50.1200 & v1.80.1459 Beta (Windows 7)
  • ScienceMark 2.0 32-bit
  • MaxxMEM2 Preview
  • wPrime Benchmark v2.03
  • HyperPI 0.99b
  • PCMark Vantage Advanced 64-bit Edition (1.0.2.0)
  • Cinebench R10 64-bit
  • Cinebench R11.5.2.9 64-bit
  • WinRAR 3.94 x64
  • Photoshop CS4 64-bit
  • Lame Front-End 1.0
  • x264 Benchmark HD (2nd pass)
  • 7-Zip 9.20 x64
  • POV-Ray v3.7 beta 40
  • Deep Fritz 12
  • 3DMark06 v1.2.0
  • 3DMark Vantage v1.0.2
  • Crysis v1.21
  • Far Cry 2 1.02
  • Left 4 Dead version 1.0.2.3
  • Valve Particle Simulation Benchmark
  • World in Conflict v1.0.0.0
  • Resident Evil 5 1.0.0.129
  • X3: Terran Conflict 1.2.0.0


That is about all you need to know methodology wise, so let's get to the good stuff!
 