Monday, February 29, 2016

New AMD Zen core details emerged

Just one week after my last blog posting, which provided a hint at the maximum number of Zen cores supported per socket, a wave of news hit the web about details of Zen based server processors given in a CERN researcher's presentation. According to his CERN profile, he works in the institution's Platform Competence Centre (PCC) and manages the integration of predominantly prototype hardware. So it can be assumed that anything he says about server platforms might have been provided by representatives of the different processor and server OEMs. The 8 memory channels haven't been mentioned before in a leak or patch. And the 32 core number is not related to my posting, as the CERN talk was held on the 29th of January, while I published my posting (unaware of the talk) on the 1st of February, after first mentioning the patch back in December.

Now a new series of patches provides further information about the Zen core's IP blocks. They were posted on 16/02/16 on the Linux kernel mailing list by an AMD employee, following an earlier round of patches in January, which even mention a "ZP" target, very likely an abbreviation for "Zeppelin". The more recent patches cover additions to AMD's implementation of a scalable Machine Check Architecture (MCA) and the handling of deferred errors. This is implemented in the Linux EDAC kernel module, which is responsible for hardware error detection and correction. The most interesting patch contains the following sections, with some details highlighted:

+/*
+ * Enumerating new IP types and HWID values
+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum ip_types {
+ F17H_CORE = 0, /* Core errors */
+ DF,  /* Data Fabric */
+ UMC,  /* Unified Memory Controller */
+ FUSE,  /* FUSE subsystem */
+ PSP,  /* Platform Security Processor */
+ SMU,  /* System Management Unit */
+ N_IP_TYPES
+};

+enum core_mcatypes {
+ LS = 0,  /* Load Store */
+ IF,  /* Instruction Fetch */
+ L2_CACHE, /* L2 cache */
+ DE,  /* Decoder unit */
+ RES,  /* Reserved */
+ EX,  /* Execution unit */
+ FP,  /* Floating Point */
+ L3_CACHE /* L3 cache */
+};
+
+enum df_mcatypes {
+ CS = 0,  /* Coherent Slave */
+ PIE  /* Power management, Interrupts, etc */
+};
+#endif

The interconnect subsystem is called "Data Fabric", which, according to the last enumeration, includes so-called coherent slaves. The "FUSE subsystem" might be replaced by a different name such as "Parameter block", as it just means a block managing the processor's configuration.

The second list of enumerations contains blocks found in the Zen core or close to it. I think the highlighted "RES" element might actually stand for a real IP block, as it doesn't make much sense to have it sitting amid the other elements and not at the end. According to some other code in the patch, the L2 cache is seen as part of the core, while the L3 cache is not (as expected):

+ case F17H_CORE:
+  pr_emerg(HW_ERR "%s Error: ",
+    (mca_type == L3_CACHE) ? "L3 Cache" : "F17h Core");
+  decode_f17hcore_errors(xec, mca_type);
+  break;
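
To make the two observations above concrete, here is a minimal standalone sketch. The enumeration is copied from the quoted patch; the helper core_bank_label() is a hypothetical name of mine and not part of the patch. It only mirrors the ternary expression in the quoted F17H_CORE case. The static assertions spell out that "RES" occupies value 4, between the decoder and the execution unit, and that L3_CACHE is the last value.

```c
#include <assert.h>

/* Copied from the quoted patch: MCA bank types reported under F17H_CORE. */
enum core_mcatypes {
    LS = 0,     /* Load Store */
    IF,         /* Instruction Fetch */
    L2_CACHE,   /* L2 cache */
    DE,         /* Decoder unit */
    RES,        /* Reserved */
    EX,         /* Execution unit */
    FP,         /* Floating Point */
    L3_CACHE    /* L3 cache */
};

/* Enumerators count up from LS = 0, so the reserved slot is value 4. */
static_assert(RES == 4, "RES sits between DE (3) and EX (5)");
static_assert(L3_CACHE == 7, "L3_CACHE is the last core bank type");

/*
 * Hypothetical helper (not in the patch): returns the label that the
 * quoted F17H_CORE case would print for a given bank type.
 */
static const char *core_bank_label(enum core_mcatypes mca_type)
{
    return (mca_type == L3_CACHE) ? "L3 Cache" : "F17h Core";
}
```

With this helper, core_bank_label(L2_CACHE) yields "F17h Core", while only core_bank_label(L3_CACHE) yields "L3 Cache", which is the distinction described above.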

Now let's go through some of the error string lists, beginning with those dedicated to the load/store unit:

+/* Scalable MCA error strings */
+
+static const char * const f17h_ls_mce_desc[] = {
+ "Load queue parity",
+ "Store queue parity",
+ "Miss address buffer payload parity",
+ "L1 TLB parity",
+ "",      /* reserved */
+ "DC tag error type 6",
+ "DC tag error type 1",

This is the first of many lists containing error strings, in this case for the load/store unit. Similar to the enumeration above, there is a reserved element, possibly hiding something, as this is a public mailing list. The strings I left out don't contain any surprises compared to the Bulldozer family. But overall I get the impression that AMD has significantly improved the RAS capabilities, which are very important for server processors. The following block contains error strings related to the instruction fetch block ("if"):

+static const char * const f17h_if_mce_desc[] = {
+ "microtag probe port parity error",
+ "IC microtag or full tag multi-hit error",
+ "IC full tag parity",
+ "IC data array parity",
+ "Decoupling queue phys addr parity error",
+ "L0 ITLB parity error",
+ "L1 ITLB parity error",
+ "L2 ITLB parity error",
+ "BPQ snoop parity on Thread 0",
+ "BPQ snoop parity on Thread 1",
+ "L1 BTB multi-match error",
+ "L2 BTB multi-match error",
+};
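
How such a table is consumed is not visible in the excerpts quoted here. The following is a minimal sketch under the assumption that the extended error code (called xec in the quoted code) is used directly as an index into the string table. The helper if_error_desc() and the macro N_IF_ERRORS are hypothetical names; only the string table itself is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the quoted patch: instruction fetch error strings. */
static const char * const f17h_if_mce_desc[] = {
    "microtag probe port parity error",
    "IC microtag or full tag multi-hit error",
    "IC full tag parity",
    "IC data array parity",
    "Decoupling queue phys addr parity error",
    "L0 ITLB parity error",
    "L1 ITLB parity error",
    "L2 ITLB parity error",
    "BPQ snoop parity on Thread 0",
    "BPQ snoop parity on Thread 1",
    "L1 BTB multi-match error",
    "L2 BTB multi-match error",
};

/* Hypothetical macro: number of entries in the table above. */
#define N_IF_ERRORS (sizeof(f17h_if_mce_desc) / sizeof(f17h_if_mce_desc[0]))

static_assert(N_IF_ERRORS == 12, "the quoted excerpt lists twelve strings");

/*
 * Hypothetical helper (not in the patch). Assumption: xec indexes the
 * table directly; codes beyond the table get a fallback string.
 */
static const char *if_error_desc(unsigned int xec)
{
    if (xec >= N_IF_ERRORS)
        return "unknown IF error";
    return f17h_if_mce_desc[xec];
}
```

Under that assumption, if_error_desc(5) returns "L0 ITLB parity error", and any code of 12 or higher falls through to the fallback string instead of reading past the end of the array.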

There is a new L0 ITLB, which is the only level 0 structure mentioned so far, while VR World mentioned level 0 caches (besides other somewhat strange rumoured facts like no L3 cache in the APU variant, although one has been shown on the leaked Fudzilla slide). The only thing resembling such an L0 cache is a uOp cache, which has clearly been named in the new patch in a section related to the decode/dispatch block (indicated by "de"):

+static const char * const f17h_de_mce_desc[] = {
+ "uop cache tag parity error",
+ "uop cache data parity error",
+ "Insn buffer parity error",
+ "Insn dispatch queue parity error",
+ "Fetch address FIFO parity",
+ "Patch RAM data parity",
+ "Patch RAM sequencer parity",
+ "uop buffer parity"
+};

There are strings for both a "uop cache" and a "uop buffer". So far I knew about this uop buffer patent filed by AMD in 2012, which describes several related techniques aimed at saving power, e.g. when executing loops, or at keeping the buffer physically small by leaving immediate and displacement data of decoded instructions in an instruction byte buffer ("Insn buffer") sitting between instruction fetch and decode. The "uop cache" clearly seems to be a separate unit. Even without knowing how many uops per cycle that cache can provide, it will help to save power and remove an occasional fetch/decode bottleneck when running two threads. The next interesting block is about the execution units:

+static const char * const f17h_ex_mce_desc[] = {
+ "Watchdog timeout error",
+ "Phy register file parity",
+ "Flag register file parity",
+ "Immediate displacement register file parity",
+ "Address generator payload parity",
+ "EX payload parity",
+ "Checkpoint queue parity",
+ "Retire dispatch queue parity",
+};

Here is a first confirmation of a checkpoint mechanism. This has been described in several patents and might also be an enabler for hardware transactional memory, which was proposed in the form of ASF back in 2009. Another use case is quick recovery from branch mispredictions, where program flow can be redirected to a checkpoint created right before evaluating a difficult-to-predict branch condition.
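
The branch recovery use case can be illustrated with a toy model. This is purely a sketch of the general technique: the patch only names a "Checkpoint queue" and says nothing about what it stores. I assume here that a checkpoint is a saved copy of a register rename map. All names and sizes below are hypothetical and not Zen parameters.

```c
#include <assert.h>

#define NUM_ARCH_REGS   4   /* toy size, not a Zen parameter */
#define MAX_CHECKPOINTS 2   /* toy size, not a Zen parameter */

/* Toy rename map: architectural register -> physical register. */
struct rename_map {
    int phys[NUM_ARCH_REGS];
};

/* Toy checkpoint queue holding saved copies of the rename map. */
struct checkpoint_queue {
    struct rename_map saved[MAX_CHECKPOINTS];
    int count;
};

/*
 * Save the current map before a hard-to-predict branch.
 * Returns the slot used, or -1 if the queue is full.
 */
static int take_checkpoint(struct checkpoint_queue *q,
                           const struct rename_map *map)
{
    if (q->count == MAX_CHECKPOINTS)
        return -1;
    q->saved[q->count] = *map;
    return q->count++;
}

/*
 * On a misprediction, restore the saved map and drop this checkpoint
 * together with all younger ones.
 */
static void restore_checkpoint(struct checkpoint_queue *q, int slot,
                               struct rename_map *map)
{
    assert(slot >= 0 && slot < q->count);
    *map = q->saved[slot];
    q->count = slot;
}
```

In this model, any speculative changes made to the map after take_checkpoint() are undone by a single copy in restore_checkpoint(), instead of being rolled back instruction by instruction. That is the reason such a mechanism makes misprediction recovery quick.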

Let me continue with some random picks:

+ "L3 victim queue parity",
...
+ "Atomic request parity",
+ "ECC error on probe filter access",
...
+ "Error on GMI link",

This confirms the "GMI link" shown on an already leaked slide, which stated a bandwidth of 25 GB/s per link. The term "Data Fabric" was also used on that slide.

When reporting on the 32 core support, I wrote that some patents used the same wording. The term in the patents is actually "core processing complex" (CPC), which can contain multiple compute units (like Zen cores). So they are not the same. AMD patent filings using the term are US20150277521, US20150120978, and US20140331069.

Last but not least, I have updated the Zen core diagram based on this new information and some very likely related patents and papers:



Notable changes are:
  • uOp Cache has been added based on the new patch
  • FMUL/FADD for FMAC pairing removed, based on some corrections of the znver1 pipeline description.
  • 4x parallel Page Table Walkers added, based on US20150121046
  • 128b FP datapaths (also to/from the L1 D$) based on "direct" decode for 128b wide SIMD and "double" decode for 256b AVX/AVX2 instructions
  • 32kB L1 I$ has been mentioned in some patents. With enough ways, a fast L2$ and a uOp cache this should be enough, I think.
  • issue port descriptions and more data paths added
  • 2R1W and 4 cycle load-to-use latency added for the L1 D$, based on info found on a LinkedIn profile and the given cycle differences in the znver1 pipeline description
  • Stack Cache speculatively added based on patents and some interesting papers. This doesn't help so much with performance, but a lot with power efficiency.
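
The reasoning behind the 128b datapath entry can be sketched in a few lines: if a 256b AVX operation is decoded as a "double", it can be carried out as two operations on 128b wide hardware. The following toy model shows only this splitting idea. The types and function names are hypothetical, and nothing here claims to describe how Zen actually sequences the two halves.

```c
#include <assert.h>

/* Toy vector types: four and eight single-precision lanes. */
struct vec128 { float lane[4]; };
struct vec256 { struct vec128 half[2]; };   /* half[0] = low, half[1] = high */

static_assert(sizeof(struct vec256) == 2 * sizeof(struct vec128),
              "a 256b value is exactly two 128b halves");

/* One 128b wide add, i.e. what a single 128b datapath handles at once. */
static struct vec128 add128(struct vec128 a, struct vec128 b)
{
    struct vec128 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* A 256b add carried out as two independent 128b operations. */
static struct vec256 add256(struct vec256 a, struct vec256 b)
{
    struct vec256 r;
    r.half[0] = add128(a.half[0], b.half[0]);   /* low half */
    r.half[1] = add128(a.half[1], b.half[1]);   /* high half */
    return r;
}
```

The result is the same as with a full 256b unit. The cost in such a design is that each 256b instruction occupies two operation slots instead of one.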
It's still an interesting question what the first mention of the fp3 port for FMAC operations was good for. I thought it was a typo, but rather of the kind "fp3" instead of "fp2" in one case. It could still be related to register file port usage and/or bridged FMA, but that is probably not that useful to tell the compiler. Because of the correction patch I'm still looking further into the FPU topic, as promised earlier. I'll cover that in a follow-up posting.

Finally, there is a hint at good hardware prefetcher performance (or bad interference?), as AMD recommends switching off default software prefetching for the znver1 target in GCC.

BTW have you ever heard of a processor core having 2 front ends and one shared back end?

Update: A new version of the aforementioned patches was posted on the same day as this blog entry. You can see it here. So far I haven't seen any significant additions other than cleanups and fixes.

14 comments:

Nintendo Maniac 64 said...

"16/02/16"

Hah, that's quite the date! I must admit though, it really did throw me for a loop for a moment, especially combined with the use of a two-digit year.

Though, if you use a 4-digit year (that being 2016/02/16) then you still end up with quite the funky date where it's still only using the numbers 0, 1, 2, and 6, and using each number exactly twice.

Unknown said...

https://patchwork.ozlabs.org/patch/599066/

(define_insn_reservation "znver1_sseimul_avx256" 4
(and (eq_attr "cpu" "znver1")
(and (eq_attr "mode" "OI")
(and (eq_attr "type" "sseimul")
(eq_attr "memory" "none"))))
"znver1-double,znver1-fp0*4")

(define_insn_reservation "znver1_sseimul_avx256_load" 8
(and (eq_attr "cpu" "znver1")
(and (eq_attr "mode" "OI")
(and (eq_attr "type" "sseimul")
(eq_attr "memory" "load"))))
"znver1-double,znver1-load,znver1-fp0*4")

(define_insn_reservation "znver1_sseimul_avx256_load" 11
(and (eq_attr "cpu" "znver1")
(and (eq_attr "mode" "OI")
(and (eq_attr "type" "sseimul")
(eq_attr "type" "sseimul"))))
"znver1-direct,znver1-fp0*3")

znver1_sseimul_avx256_load don't need load, from double to direct. L1 cache data: 7 clock?

Lo Absoluto said...

Just as i was saying, 128b high and 128b low operands at FPU.

Powerrush...
