The organizations are striving to achieve post-K application execution performance up to 100 times that of the K computer.
A64FX is the world’s first CPU to adopt the Scalable Vector Extension (SVE), an extension of the Armv8-A instruction set architecture for supercomputers. Building on over 60 years of Fujitsu-developed microarchitecture, the chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance.
Fujitsu made the announcement at Hot Chips 30(1), an international symposium on high performance processors and related technologies held in Silicon Valley, California from August 19-21.
Post-K is the successor to the K computer, which in 2011 ranked first on the TOP500 list of the world's supercomputers. Fujitsu and RIKEN are developing post-K with the aim of starting operation around 2021.
A64FX is the high-performance CPU that will be used in post-K. It offers a number of features, including broad utility supporting a wide range of applications, massive parallelization through the Tofu interconnect, low power consumption, and mainframe-class reliability.
A64FX is the world’s first CPU to adopt the SVE of Arm Limited’s Armv8-A instruction set architecture, extended for supercomputers.
Fujitsu collaborated with Arm, contributing to the development of the SVE as a lead partner, and adopted the results in the A64FX. Fujitsu developed the microarchitecture of the A64FX by building on the technology of its previous supercomputers, mainframes, and UNIX servers.
Hardware technology that draws out the high bandwidth of high-performance stacked memory lets the system make efficient use of the CPU’s highly capable computational units, delivering high application execution performance.
The CPUs will be directly connected by the proprietary Tofu interconnect developed for the K computer, improving parallel performance.
The CPU can provide peak double-precision (64-bit) floating-point performance of over 2.7 TFLOPS, with twice that throughput for single precision (32-bit) and four times that throughput for half precision (16-bit).
In other words, by using single precision or half precision operations, applications can get results even faster.
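The scaling described above can be worked through as simple arithmetic. The following is a minimal sketch, not a Fujitsu tool: the function name `peak_tflops` and the constant `PEAK_FP64_TFLOPS` are hypothetical, and 2.7 TFLOPS is only the lower bound quoted here, so the results are lower bounds as well. It assumes throughput scales with the number of elements that fit in the space of one 64-bit value, which matches the 2x and 4x factors stated in the text.

```python
# Lower bound on peak double-precision throughput quoted in the text (TFLOPS).
# The name is hypothetical and chosen for illustration only.
PEAK_FP64_TFLOPS = 2.7


def peak_tflops(element_bits: int, fp64_peak: float = PEAK_FP64_TFLOPS) -> float:
    """Scale the FP64 peak by how many elements fit in one 64-bit slot."""
    if element_bits not in (16, 32, 64):
        raise ValueError("expected an element width of 16, 32, or 64 bits")
    # 64-bit -> factor 1, 32-bit -> factor 2, 16-bit -> factor 4
    return fp64_peak * (64 // element_bits)


if __name__ == "__main__":
    for bits in (64, 32, 16):
        print(f"{bits}-bit: over {peak_tflops(bits):.1f} TFLOPS")
```

Under these assumptions, single precision works out to over 5.4 TFLOPS and half precision to over 10.8 TFLOPS. These are peak figures; the throughput an application actually sees depends on how well it vectorizes and on memory bandwidth.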
Fujitsu has also enhanced computational performance for 16-bit and 8-bit integer operations. This makes the CPU suited to a wide range of fields such as big data and AI, not just the computer simulations at which traditional supercomputers excel.