Benchmarks for [A6000]

  • GPU Server Expansion and A6000 Benchmarks | ALBERT Official Blog
  • BFLOAT16 and other formats are still being looked into.
  • FP32
    • Peak performance 38.7 TFlops

    • A matrix product with M=640, N=480, K=320 reaches about 10 TFLOPS in FP32; the matrices are small, so this is still far from peak performance.

  • FP16
    • cudaTensorCoreGemm (FP16 Tensor)

    • A6000: 77.85 TFLOPS

    • A matrix product with M=4096, N=4096, K=4096 in so-called mixed precision: matrices A and B are half precision (FP16), and the sum of products is accumulated into matrix C as single precision (FP32). It is effective enough to be used not only for inference but also for training.

  • Using half precision could be 2 to 7 times faster.
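The mixed-precision scheme described above can be sketched in plain NumPy (a hedged illustration of the idea only, not the actual cudaTensorCoreGemm kernel; the small sizes and the random seed are arbitrary choices for the example):

```python
import numpy as np

# Mixed-precision GEMM sketch: A and B are stored in FP16,
# but products are accumulated into C in FP32.
M, N, K = 64, 64, 64  # small sizes for illustration (the post uses 4096)

rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)).astype(np.float16)
B = rng.standard_normal((K, N)).astype(np.float16)

# Upcast the FP16 inputs before the matmul so accumulation
# happens in FP32, mirroring what Tensor Cores do in hardware.
C = A.astype(np.float32) @ B.astype(np.float32)

# FLOP count for a GEMM: 2*M*N*K (one multiply + one add per term).
# TFLOPS figures like those above are this count divided by runtime.
flops = 2 * M * N * K
print(C.dtype)  # float32
print(flops)    # 524288
```

Storing the inputs in FP16 halves memory traffic and enables the fast Tensor Core path, while the FP32 accumulator avoids most of the rounding error that pure-FP16 accumulation would introduce.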

History

2016

  • Very slow on the GeForce GTX 1080 Ti.

2017

2019

  • Huawei Sanctions

  • Huawei announced its Ascend 910 AI computing chip in August 2019, claiming it is twice as powerful as rival Nvidia’s Tesla V100. According to the announcement, it delivers 256 teraflops in half-precision (FP16) floating-point arithmetic.

    • https://www.axion.zone/hisilicon/amp/
    • I see: the strong demand for half-precision arithmetic in edge computing is driving a technology competition between the U.S. and China.
    • Image recognition at the edge will affect the quality of operations with the small unmanned aircraft mentioned in The War on Intelligence, and China does not want to be dependent on American companies.

2020: NVIDIA RTX A6000 released

2022

  • Huawei, whose business has declined significantly due to US government sanctions that prevent it from doing business with US companies such as Google and Qualcomm, and from purchasing chips from TSMC, which uses semiconductor equipment made by US companies, will begin producing Kirin chips in Wuhan, China, as early as 2022, according to the Taiwanese media outlet DigiTimes.


This page is auto-translated from /nishio/半精度演算の速度 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.