Update: After some of the pages were removed at SiSoftware, I removed the links, so that the full resolution screenshots are available again.
Before doing some analysis, here are the raw results:
mov eax, [edi]
add eax, ecx
imul edi, eax
cmp [ebp+08h], edi
jnz out
This code can't actually be executed out of order. All the logic put into an out-of-order scheduler would be a waste of energy in this case, and multiple execution units of the same type for parallel issue wouldn't speed it up either. A single scheduler with an integer execution unit (IEU) and an address generation unit (AGU) would be enough. The latter wouldn't even need a separate scheduler, similar to the K7, K8 and K10 series of CPUs. This could be one reason for Zen's individual schedulers: an identified dependency chain could be sent to a single integer scheduler, plus one AGU scheduler if there are memory operands. The other schedulers might even be clock gated then.
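The serial nature of this sequence can be checked with a small register-dependency model. The following Python sketch is only an illustration: the read/write sets are written out by hand, only read-after-write dependencies through registers and flags are tracked, memory is not modelled, and every instruction is assumed to take one step. The names chain_depths and INSTRS are made up for this example.

```python
# Hand-written (text, reads, writes) entries for the five instructions above.
# "flags" stands in for the x86 status flags; memory is not modelled.
INSTRS = [
    ("mov eax, [edi]",     {"edi"},        {"eax"}),
    ("add eax, ecx",       {"eax", "ecx"}, {"eax", "flags"}),
    ("imul edi, eax",      {"edi", "eax"}, {"edi", "flags"}),
    ("cmp [ebp+08h], edi", {"ebp", "edi"}, {"flags"}),
    ("jnz out",            {"flags"},      set()),
]


def chain_depths(instrs):
    """Return the earliest step each instruction could issue in, counting
    only read-after-write dependencies and one step per instruction."""
    ready_after = {}  # register -> step in which its latest value is produced
    depths = []
    for _text, reads, writes in instrs:
        depth = 1 + max((ready_after.get(r, 0) for r in reads), default=0)
        for w in writes:
            ready_after[w] = depth
        depths.append(depth)
    return depths


depths = chain_depths(INSTRS)
for (text, _reads, _writes), d in zip(INSTRS, depths):
    print(f"step {d}: {text}")

# If the longest chain is as long as the instruction count, there is
# nothing to overlap in this model, however many execution units exist.
print("instructions:", len(INSTRS), "critical path:", max(depths))
```

In this simplified model every instruction waits for the one before it, so the critical path equals the instruction count.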
Zen core size estimation based on Excavator data
Labelled Zeppelin die photo (stitched)
"The most exciting part is core clock. The 8c/95W variant's base clock is 2.8GHz, all core boost is 3.05GHz and maximum boost is 3.2GHz."
This hasn't been confirmed in this way before (although I have heard it is true). Until now, nothing was available that actually supported the information posted there.
Bus-Numb-Fun IRQ Vendor-Dev-Sub_OEM-Rev Class (9:255) Vendor and Device Description Showing 39 of 39
[0 - 00 - 0] 1022-1450-14501022-00 Host Bridge AMD
[0 - 01 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 01 - 2] 1022-1453-00000000-00 PCI Bridge (0-1) x4 (x4) AMD
[0 - 02 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 03 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 04 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 07 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 07 - 1] 1022-1454-00000000-00 PCI Bridge (0-8) x16 (x16) AMD
[0 - 08 - 0] 1022-1452-00000000-00 Host Bridge AMD
[0 - 08 - 1] 1022-1454-00000000-00 PCI Bridge (0-9) x16 (x16) AMD
[0 - 20 - 0] 1022-790B-790B1022-59 SMBus Controller AMD
[0 - 20 - 3] 1022-790E-790E1022-51 ISA Bridge AMD
[0 - 20 - 6] 1022-7906-79061022-51 SD Host DMA Controller AMD
[0 - 24 - 0] 1022-1460-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Link Control
[0 - 24 - 1] 1022-1461-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Address Map Configuration
[0 - 24 - 2] 1022-1462-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor DRAM Controll
[0 - 24 - 3] 1022-1463-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Miscellaneous Control
[0 - 24 - 4] 1022-1464-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Link Control
[0 - 24 - 5] 1022-1465-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Function 5 Configuration
[0 - 24 - 6] 1022-1466-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Function 6 Configuration
[0 - 24 - 7] 1022-1467-00000000-00 Host Bridge AMD Summit Ridge (K17) Processor Function 7 Configuration
[1 - 00 - 0] 1022-43B9-11421B21-02 XHCI Controller x4 (x4) AMD Promotory USB 3.1 XHCI Host Controller
[1 - 00 - 1] 1022-43B5-10621B21-02 SATA (AHCI 1.0) x4 (x4) AMD
[1 - 00 - 2] 1022-43B0-00000000-02 PCI Bridge (1-2) x4 (x4) AMD
[2 - 00 - 0] 1022-43B4-00000000-02 PCI Bridge (2-3) x1 (x1) AMD
[2 - 01 - 0] 1022-43B4-00000000-02 PCI Bridge (2-4) x1 (x1) AMD
[2 - 02 - 0] 1022-43B4-00000000-02 PCI Bridge (2-5) x1 (x1) AMD
[2 - 03 - 0] 1022-43B4-00000000-02 PCI Bridge (2-6) x1 (x1) AMD
[2 - 04 - 0] 1022-43B4-00000000-02 PCI Bridge (2-7) x0 (x4) AMD
[3 - 00 - 0] 14E4-1687-168714E4-10 Ethernet Controller x1 (x1) Broadcom NetXtreme BCM5762 Gigabit Ethernet PCIe
[3 - 00 - 1] 14E4-1640-164014E4-10 SD Host DMA Controller x1 (x1) Broadcom
[5 - 00 - 0] 1002-68F9-010E1002-00 VGA Controller x1 (x16) AMD Cedar Pro [Radeon HD 5450/Radeon HD 6350] [GPU-0]
[5 - 00 - 1] 1002-AA68-AA681002-00 High Def Audio x1 (x16) AMD Cedar/Park HDMI Audio
[8 - 00 - 0] 1022-145A-145A1022-00 Other (0x130000) x16 (x16) AMD
[8 - 00 - 2] 1022-1456-14561022-00 Other Encryption x16 (x16) AMD
[8 - 00 - 3] 1022-145C-145C1022-00 XHCI Controller x16 (x16) AMD
[9 - 00 - 0] 1022-1455-14551022-00 Other (0x130000) x16 (x16) AMD
[9 - 00 - 2] 1022-7901-79011022-51 SATA (AHCI 1.0) x16 (x16) AMD
[9 - 00 - 3] 1022-1457-14571022-00 High Def Audio x16 (x16) AMD
Total of 7 PCI buses and 39 PCI devices in 0.040 seconds.
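For anyone who wants to work with this listing programmatically, here is a minimal Python sketch that splits one device line into its fields. It assumes the layout described by the header line (bus, device number and function in brackets, then vendor, device, subsystem and revision IDs in hex) and assumes that the bracketed numbers are decimal. The function name parse_pci_line is invented for this example.

```python
import re

# [bus - device - function] VVVV-DDDD-SSSSSSSS-RR description
LINE_RE = re.compile(
    r"\[(\d+) - (\d+) - (\d+)\]\s+"
    r"([0-9A-Fa-f]{4})-([0-9A-Fa-f]{4})-([0-9A-Fa-f]{8})-([0-9A-Fa-f]{2})\s+"
    r"(.*)"
)


def parse_pci_line(line):
    """Parse one device line of the listing; return None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    bus, dev, fn, vendor, device, subsys, rev, rest = m.groups()
    return {
        "bus": int(bus),                # assumed decimal
        "device_number": int(dev),      # assumed decimal
        "function": int(fn),
        "vendor_id": int(vendor, 16),
        "device_id": int(device, 16),
        "subsystem": subsys.upper(),    # kept as the raw 8-digit field
        "revision": int(rev, 16),
        "description": rest.strip(),
    }


sample = ("[0 - 24 - 2] 1022-1462-00000000-00 Host Bridge "
          "AMD Summit Ridge (K17) Processor DRAM Controll")
info = parse_pci_line(sample)
print(info)
# Under the decimal assumption, device number 24 is 18h.
print(f"device {info['device_number']} = {info['device_number']:02X}h")
```

If the decimal reading is right, the eight "Summit Ridge (K17) Processor" functions sit at device 18h, which is where earlier AMD families also placed their processor configuration functions.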
One half of the die would look like this after some perspective correction:
"@Dresdenboy Okay, that would make sense. I attempted to make a better quality version of the wafer shot. pic.twitter.com/nLVPMZGTcY" (Thomas Ryan, @UncheckedError, 22 May 2016)
+/*
+ * Enumerating new IP types and HWID values
+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum ip_types {
+ F17H_CORE = 0, /* Core errors */
+ DF, /* Data Fabric */
+ UMC, /* Unified Memory Controller */
+ FUSE, /* FUSE subsystem */
+ PSP, /* Platform Security Processor */
+ SMU, /* System Management Unit */
+ N_IP_TYPES
+};
+enum core_mcatypes {
+ LS = 0, /* Load Store */
+ IF, /* Instruction Fetch */
+ L2_CACHE, /* L2 cache */
+ DE, /* Decoder unit */
+ RES, /* Reserved */
+ EX, /* Execution unit */
+ FP, /* Floating Point */
+ L3_CACHE /* L3 cache */
+};
+
+enum df_mcatypes {
+ CS = 0, /* Coherent Slave */
+ PIE /* Power management, Interrupts, etc */
+};
+#endif
+ case F17H_CORE:
+ pr_emerg(HW_ERR "%s Error: ",
+ (mca_type == L3_CACHE) ? "L3 Cache" : "F17h Core");
+ decode_f17hcore_errors(xec, mca_type);
+ break;
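Since the enums in the patch only give the first value explicitly, it may help to see the numbering spelled out. This Python sketch mirrors them with IntEnum, relying on the C rule that an enumerator without an initializer is one more than the previous one, and reproduces the label choice from the case block. The class and function names are made up for this illustration; it is not kernel code.

```python
from enum import IntEnum


class IpType(IntEnum):
    F17H_CORE = 0  # Core errors
    DF = 1         # Data Fabric
    UMC = 2        # Unified Memory Controller
    FUSE = 3       # FUSE subsystem
    PSP = 4        # Platform Security Processor
    SMU = 5        # System Management Unit


class CoreMcaType(IntEnum):
    LS = 0         # Load Store
    IF = 1         # Instruction Fetch
    L2_CACHE = 2   # L2 cache
    DE = 3         # Decoder unit
    RES = 4        # Reserved
    EX = 5         # Execution unit
    FP = 6         # Floating Point
    L3_CACHE = 7   # L3 cache


def core_error_label(mca_type):
    """Same choice as the ternary in the F17H_CORE case of the patch."""
    return "L3 Cache" if mca_type == CoreMcaType.L3_CACHE else "F17h Core"


# N_IP_TYPES in the C enum ends up equal to the number of IP types.
print("N_IP_TYPES =", len(IpType))
for t in CoreMcaType:
    print(int(t), t.name, "->", core_error_label(t))
```

Note that the L3 cache is listed among the core MCA types but gets its own label in the log message.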
+/* Scalable MCA error strings */
+
+static const char * const f17h_ls_mce_desc[] = {
+ "Load queue parity",
+ "Store queue parity",
+ "Miss address buffer payload parity",
+ "L1 TLB parity",
+ "", /* reserved */
+ "DC tag error type 6",
+ "DC tag error type 1",
...
+static const char * const f17h_if_mce_desc[] = {
+ "microtag probe port parity error",
+ "IC microtag or full tag multi-hit error",
+ "IC full tag parity",
+ "IC data array parity",
+ "Decoupling queue phys addr parity error",
+ "L0 ITLB parity error",
+ "L1 ITLB parity error",
+ "L2 ITLB parity error",
+ "BPQ snoop parity on Thread 0",
+ "BPQ snoop parity on Thread 1",
+ "L1 BTB multi-match error",
+ "L2 BTB multi-match error",
+};
+static const char * const f17h_de_mce_desc[] = {
+ "uop cache tag parity error",
+ "uop cache data parity error",
+ "Insn buffer parity error",
+ "Insn dispatch queue parity error",
+ "Fetch address FIFO parity",
+ "Patch RAM data parity",
+ "Patch RAM sequencer parity",
+ "uop buffer parity"
+};
+static const char * const f17h_ex_mce_desc[] = {
+ "Watchdog timeout error",
+ "Phy register file parity",
+ "Flag register file parity",
+ "Immediate displacement register file parity",
+ "Address generator payload parity",
+ "EX payload parity",
+ "Checkpoint queue parity",
+ "Retire dispatch queue parity",
+};
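Tables like these are typically indexed by an error code. As a rough Python sketch, assuming that the extended error code (xec in the patch) is used directly as an index into the matching table, which the excerpts here do not show, a lookup for the EX table could look like this. The function name describe_ex_error and the fallback string are inventions for this example.

```python
# Strings copied from f17h_ex_mce_desc in the patch excerpt.
F17H_EX_MCE_DESC = [
    "Watchdog timeout error",
    "Phy register file parity",
    "Flag register file parity",
    "Immediate displacement register file parity",
    "Address generator payload parity",
    "EX payload parity",
    "Checkpoint queue parity",
    "Retire dispatch queue parity",
]


def describe_ex_error(xec):
    """Map an extended error code to a description, guarding the bounds.

    Assumption: xec indexes the table directly."""
    if 0 <= xec < len(F17H_EX_MCE_DESC):
        return F17H_EX_MCE_DESC[xec]
    return "unrecognized error code"


for code in (0, 5, 8):
    print(code, "->", describe_ex_error(code))
```

The bounds check matters because the tables have different lengths, so a code that is valid for one unit may fall outside the table of another.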
+ "L3 victim queue parity",
...
+ "Atomic request parity",
+ "ECC error on probe filter access",
...
+ "Error on GMI link",
"For its part, AMD engineers showed smart ways of squeezing as much as 15% more performance out of its Carrizo PC processor, simply by applying more aggressive power management to the 28nm design. The Bristol Ridge design was a study in using power management to overcome performance limits tied to heat, voltage and current."
Months after the first leaked WEI score, the first true Bristol Ridge benchmarks will show how this improvement translates into real-world performance. Hopefully they will be tested with dual-channel memory, even if AMD or OEMs only provide devices equipped or designed for single channel, as was the case in the recent AnandTech Carrizo review.
AMD Zeppelin (Family 17h, Model 00h) introduces an instructions retired performance counter which indicated by CPUID.8000_0008H:EBX[1]. And dedicated Instructions Retired register (MSR 0xC000_000E9) increments on once for every instruction retired.
There might even be a meaning behind the similarity of parts of the "Zen" and "Zeppelin" codenames.
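Checking the feature bit named in the commit message comes down to testing bit 1 of a register value. The Python sketch below assumes the EBX value of CPUID leaf 8000_0008H has already been obtained by some other means (Python's standard library cannot execute CPUID); the function name and the sample values are made up.

```python
IRPERF_BIT = 1  # CPUID.8000_0008H:EBX[1], per the commit message


def has_irperf(ebx):
    """Return True if bit 1 of the given EBX value is set."""
    return bool((ebx >> IRPERF_BIT) & 1)


# Made-up register values, purely to exercise the bit test.
for ebx in (0x00000000, 0x00000002, 0x00000007):
    state = "present" if has_irperf(ebx) else "absent"
    print(f"EBX={ebx:#010x} -> instructions retired counter {state}")
```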
On AMD Fam17h systems, the last level cache is not resident in Northbridge. Therefore, we cannot assign cpu_llc_id to same value as Node ID (as we have been doing currently)
We should rather look at the ApicID bits of the core to provide us the last level cache ID info. Doing that here.
The most interesting part describes how the last level cache (LLC) ID is calculated for Zen-based MPUs:
+ core_complex_id = (apicid & ((1 << c->x86_coreid_bits) - 1)) >> 3;
+ per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
"Core complex" should be similar to "compute unit" and has already been used in some AMD patents. The ">> 3" at the end of the first line is a right shift by 3, which equals a division by 8. So with two logical cores per physical core due to SMT, a core complex should contain four Zen cores and a shared LLC.
The second line shows the socket ID being shifted left by 3, leaving 3 bits for the core complex ID, which suggests a maximum of eight core complexes per socket, or 32 physical cores. This number should be seen as a placeholder for now, but we've already seen rumours mentioning that many cores.
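The arithmetic can be tried out with a few numbers. This Python sketch mirrors the two patch lines; the value coreid_bits = 4 (16 logical CPUs per socket, i.e. eight SMT cores) is an assumed example, not something stated in the patch, and the function names are mine.

```python
def core_complex_id(apicid, coreid_bits):
    """Keep the low coreid_bits bits of the APIC ID, then drop the lowest 3."""
    return (apicid & ((1 << coreid_bits) - 1)) >> 3


def llc_id(socket_id, apicid, coreid_bits):
    """Socket ID in the upper bits, core complex ID in the low 3 bits."""
    return (socket_id << 3) | core_complex_id(apicid, coreid_bits)


COREID_BITS = 4  # assumed example value

for apicid in range(16):
    print(f"apicid {apicid:2d}: core complex "
          f"{core_complex_id(apicid, COREID_BITS)}, "
          f"llc_id on socket 1 = {llc_id(1, apicid, COREID_BITS)}")

# The numbers discussed in the text:
logical_per_complex = 1 << 3                  # ">> 3" groups 8 logical CPUs
cores_per_complex = logical_per_complex // 2  # two SMT threads per core
max_complexes_per_socket = 1 << 3             # 3 bits below the socket ID
print(cores_per_complex, "cores per complex, up to",
      max_complexes_per_socket * cores_per_complex, "cores per socket")
```

With these assumed values, logical CPUs 0 to 7 land in core complex 0 and 8 to 15 in core complex 1, so an eight-core part would have two LLC IDs per socket.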