Thursday, October 15, 2015

AMD's ARM-based "Hierofalcon" SoC sighted

On the same OSADL site that once provided some first signs of life of a 2 GHz Jaguar-based APU, there is now an engineering sample of one of AMD's ARM-based embedded processors, called "Hierofalcon". The processor can be found in rack #a slot #3. According to the tables and logs, it also runs at 2 GHz and has 8 cores. If you haven't heard of this processor, just check these two slides. I actually included the first one only because of the bird. ;)

I prepared some charts from the numbers given there. If you check the site, you won't find a directly linked latency chart for it. But their 1337 page allows you to compare rack #a slot #3 with another rack/slot, and it returns the missing latency chart of the "Hierofalcon", which looks like this (updated daily):
Latency plot of AMD "Hierofalcon" ES
The only available performance numbers are some daily updated Unixbench results. So I took them, combined the rack names with the CPU strings, sorted them, chose some CPUs for comparison, and normalized the results to the CPU in question.
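For anyone who wants to redo the normalization step, here is a minimal Python sketch. It assumes the Unixbench results have already been matched to CPU names and collected in a dict mapping a CPU label to its score; the function name and the sample labels and scores are hypothetical placeholders, not the OSADL data.

```python
from typing import Dict, List, Tuple


def normalize_to_reference(results: Dict[str, float], reference: str) -> List[Tuple[str, float]]:
    """Express each CPU's score as a multiple of the reference CPU's score.

    The reference CPU ends up at exactly 1.0; the output is sorted from the
    highest to the lowest relative score.
    """
    ref_score = results[reference]
    if ref_score <= 0:
        raise ValueError("reference score must be positive")
    relative = [(cpu, score / ref_score) for cpu, score in results.items()]
    return sorted(relative, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT the OSADL Unixbench results.
    sample = {"Hierofalcon ES": 1000.0, "CPU A": 1500.0, "CPU B": 800.0}
    for cpu, ratio in normalize_to_reference(sample, "Hierofalcon ES"):
        print(f"{cpu:16s} {ratio:.2f}")
```

Dividing by the reference score, rather than plotting raw Unixbench numbers, puts every bar on the same scale, so a value above 1.0 simply means "faster than Hierofalcon" in that test.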

The first chart shows that, on a per-clock basis, AMD's other CPUs lag behind in the simple integer code of the old kDhrystone benchmark. The floating-point-based Whetstone benchmark draws a somewhat different picture, with per-clock performance more evenly distributed, except for the old K10-based Phenom II. The next three benchmarks, Execl (not Excel!), kCopy, and kPipe, test OS functions like spawning processes, copying files, or using pipes. The Index is a combined result.
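The per-clock comparison divides each score by the clock frequency before normalizing to the reference. A minimal sketch, assuming one nominal clock per CPU; whether base or turbo clocks are used is a choice that changes the result, and the function name is hypothetical.

```python
def per_clock_relative(score: float, clock_ghz: float,
                       ref_score: float, ref_clock_ghz: float) -> float:
    """Score per GHz of one CPU divided by the score per GHz of the reference.

    1.0 means the same work per clock as the reference; below 1.0 means less.
    The clock passed in is an assumption (a nominal base clock here); using a
    higher turbo clock instead would lower the per-clock figure.
    """
    if clock_ghz <= 0 or ref_clock_ghz <= 0 or ref_score <= 0:
        raise ValueError("clocks and reference score must be positive")
    return (score / clock_ghz) / (ref_score / ref_clock_ghz)


if __name__ == "__main__":
    # Hypothetical example: a 4 GHz CPU scoring 1.5x a 2 GHz reference.
    print(per_clock_relative(1.5, 4.0, 1.0, 2.0))
```

With these hypothetical numbers, the 4 GHz CPU is faster in absolute terms but ends up at 0.75 per clock, which is the kind of gap the kDhrystone per-clock chart shows.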

In the next chart we can see the raw performance of all cores, again normalized to Hierofalcon.

Even then, the 8 cores of the ARM-based processor hold up well in the first two benchmarks, while in the OS benchmarks it roughly keeps up with Kaveri and Bulldozer, both of which run at much higher clock speeds.

ARM-based CPUs like this one are meant to combine many lower-power cores. To get a first impression of that effect, I used the listed TDP numbers, the only metric available for all CPUs. Here are the power efficiency numbers:

I think in this case the Hierofalcon bars are really easy to spot, even though I used the maximum listed TDP of 30 W. Only the already power-optimized Sandy Bridge variants and the 9 W Kabini are able to keep up in some of the tests. Real power measurements would, of course, shift the numbers a bit.
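The efficiency chart divides each score by the CPU's TDP before normalizing to Hierofalcon. A minimal sketch, assuming scores and TDPs are kept in two dicts keyed by the same CPU labels; the function name and sample values are hypothetical, and TDP is a rating, not measured power.

```python
from typing import Dict

# Max listed TDP used for Hierofalcon in this post.
HIEROFALCON_MAX_TDP_W = 30.0


def efficiency_table(scores: Dict[str, float], tdp_w: Dict[str, float],
                     reference: str) -> Dict[str, float]:
    """Score per TDP watt for each CPU, relative to the reference CPU (= 1.0)."""
    for cpu in scores:
        if tdp_w.get(cpu, 0.0) <= 0:
            raise ValueError(f"missing or non-positive TDP for {cpu!r}")
    ref_eff = scores[reference] / tdp_w[reference]
    return {cpu: (scores[cpu] / tdp_w[cpu]) / ref_eff for cpu in scores}


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT the OSADL Unixbench results.
    scores = {"Hierofalcon ES": 1.0, "CPU X": 2.0}
    tdps = {"Hierofalcon ES": HIEROFALCON_MAX_TDP_W, "CPU X": 95.0}
    print(efficiency_table(scores, tdps, "Hierofalcon ES"))
```

For example, a hypothetical CPU with twice the reference score at a 95 W TDP would land at 60/95, roughly 0.63 of Hierofalcon's efficiency. Using the maximum listed TDP for Hierofalcon is the conservative choice: a lower real power draw would only raise its bars.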

Of course, many (including me) would like to see more interesting benchmarks, but these are the first numbers we've got, and they aren't bad at all.


Onkel_Dithmeyer said...

Did you take the turbo clock rates into account? If not, it looks like Hierofalcon has an aggressive turbo. Both the Phenom and Bulldozer gain much more from multithreading. Or, on second thought, Hierofalcon goes into throttling due to thermal or power issues.

Matthias Waldhauer said...

I had a look at this possibility. The Whetstone benchmark (which uses the non-SIMD FPU) scales almost perfectly from one to 8 cores (799%).

So I think there is no turbo mode in Hierofalcon. Being an embedded SoC (or a misnamed Seattle), it might not need a turbo mode at all (only power gating).
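The turbo check in the reply above boils down to comparing the 8-core score with the single-core score. A minimal sketch (the function name is hypothetical): a speedup close to the core count means each core delivers about the same throughput whether one or all of them are busy, which is hard to reconcile with an aggressive single-core turbo or with heavy throttling under full load.

```python
from typing import Tuple


def scaling(single_core_score: float, multi_core_score: float,
            cores: int) -> Tuple[float, float]:
    """Return (speedup, parallel efficiency) of a multi-core run over a single-core run.

    An efficiency near 1.0 means near-linear scaling across all cores.
    """
    if single_core_score <= 0 or cores < 1:
        raise ValueError("need a positive single-core score and at least one core")
    speedup = multi_core_score / single_core_score
    return speedup, speedup / cores


if __name__ == "__main__":
    # The 799% Whetstone figure quoted above, expressed as a ratio.
    speedup, efficiency = scaling(1.0, 7.99, 8)
    print(f"speedup {speedup:.2f}, efficiency {efficiency:.1%}")
```

For the 799% Whetstone figure, this gives a speedup of 7.99 and an efficiency of about 99.9%.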

James Deng said...

I'm really struggling to see why anybody would choose this over another ARM offering.

Matthias Waldhauer said...

James, first, these values have little to do with the intended use cases of Hierofalcon/Seattle. Second, these are standard and outdated cores (AMD is a bit late here). Third, they need to develop the environment too, and for that you need some pipe cleaner. If they make them replaceable by K12, this might work well.
