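We continuously run the tf_cnn_benchmarks benchmark suite for TensorFlow, the deep learning framework developed by Google. The results published here were measured on real hardware in a variety of configurations, so you can compare system performance for each machine learning model.
Higher benchmark values (total images/sec) indicate higher performance.
Benchmark results (excerpt)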
AMD Instinct MI210
System: HPC-ProServer DPeR7625
CPU : (2) AMD EPYC 9354 3.25GHz 32C/64T
Mem : 384GB (24) 16GB RDIMM, 4800MT/s
OS : Rocky Linux 8.10
GPU : (1) AMD Instinct MI210
ROCm : 6.2.0
AMD Infinity Hub : Tensorflow 2.15 (docker)
model | batch size | total images/sec |
inception4 | 32 | 188 |
inception4 | 64 | 211 |
inception3 | 32 | 364 |
inception3 | 64 | 417 |
alexnet | 32 | 3,119 |
alexnet | 64 | 4,259 |
resnet50 | 32 | 544 |
resnet50 | 64 | 603 |
resnet50_v2 | 32 | 557 |
resnet50_v2 | 64 | 615 |
resnet101 | 32 | 317 |
resnet101 | 64 | 347 |
resnet101_v2 | 32 | 323 |
resnet101_v2 | 64 | 353 |
resnet152 | 32 | 220 |
resnet152 | 64 | 239 |
resnet152_v2 | 32 | 224 |
resnet152_v2 | 64 | 243 |
vgg16 | 32 | 276 |
vgg16 | 64 | 278 |
vgg19 | 32 | 227 |
vgg19 | 64 | 230 |
trivial | 32 | 32,996 |
trivial | 64 | 60,147 |
NVIDIA A100 Tensor Core 40GB
System : DGX A100
CPU : (2) Dual AMD Rome 7742 128 cores total, 2.25 GHz(base), 3.4 GHz (max boost)
Mem : 1TB
GPU : (8) NVIDIA A100 Tensor Core 40 GB
OS : DGX OS 5.0
Driver : 450.80.02 / CUDA : 11.2
Anaconda 4.9.2
tensorflow-gpu 2.4.1
Date : 2021-03
model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | images/sec (8 GPUs) | speedup 1gpu/1gpu | speedup 2gpu/1gpu | speedup 4gpu/1gpu | speedup 8gpu/1gpu |
inception4 | 32 | 252 | 447 | 761 | 1224 | 1.0 | 1.8 | 3.0 | 4.9 |
inception4 | 64 | 286 | 539 | 978 | 1625 | 1.0 | 1.9 | 3.4 | 5.7 |
inception3 | 32 | 474 | 786 | 1321 | 2089 | 1.0 | 1.7 | 2.8 | 4.4 |
inception3 | 64 | 535 | 993 | 1826 | 3169 | 1.0 | 1.9 | 3.4 | 5.9 |
alexnet | 32 | 4644 | 8798 | 16578 | 22505 | 1.0 | 1.9 | 3.6 | 4.8 |
alexnet | 64 | 6537 | 12799 | 24830 | 39477 | 1.0 | 2.0 | 3.8 | 6.0 |
resnet50 | 32 | 678 | 1226 | 2121 | 3213 | 1.0 | 1.8 | 3.1 | 4.7 |
resnet50 | 64 | 786 | 1492 | 2726 | 4491 | 1.0 | 1.9 | 3.5 | 5.7 |
resnet50_v2 | 32 | 698 | 1249 | 2219 | 3462 | 1.0 | 1.8 | 3.2 | 5.0 |
resnet50_v2 | 64 | 805 | 1528 | 2819 | 4957 | 1.0 | 1.9 | 3.5 | 6.2 |
resnet101 | 32 | 414 | 724 | 1151 | 1701 | 1.0 | 1.8 | 2.8 | 4.1 |
resnet101 | 64 | 488 | 901 | 1596 | 2660 | 1.0 | 1.8 | 3.3 | 5.4 |
resnet101_v2 | 32 | 423 | 739 | 1200 | 1836 | 1.0 | 1.7 | 2.8 | 4.3 |
resnet101_v2 | 64 | 498 | 917 | 1638 | 2712 | 1.0 | 1.8 | 3.3 | 5.4 |
resnet152 | 32 | 279 | 486 | 781 | 1172 | 1.0 | 1.7 | 2.8 | 4.2 |
resnet152 | 64 | 340 | 627 | 1066 | 1831 | 1.0 | 1.8 | 3.1 | 5.4 |
resnet152_v2 | 32 | 295 | 502 | 820 | 1271 | 1.0 | 1.7 | 2.8 | 4.3 |
resnet152_v2 | 64 | 346 | 633 | 1115 | 1883 | 1.0 | 1.8 | 3.2 | 5.4 |
vgg16 | 32 | 566 | 1094 | 2172 | 3824 | 1.0 | 1.9 | 3.8 | 6.8 |
vgg16 | 64 | 609 | 1196 | 2343 | 4407 | 1.0 | 2.0 | 3.8 | 7.2 |
vgg19 | 32 | 480 | 955 | 1770 | 3055 | 1.0 | 2.0 | 3.7 | 6.4 |
vgg19 | 64 | 515 | 1021 | 1960 | 3640 | 1.0 | 2.0 | 3.8 | 7.1 |
trivial | 32 | 34166 | 50915 | 83643 | 125229 | 1.0 | 1.5 | 2.4 | 3.7 |
trivial | 64 | 57025 | 89881 | 152530 | 236302 | 1.0 | 1.6 | 2.7 | 4.1 |
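The speedup columns in these tables appear to be the multi-GPU throughput divided by the single-GPU throughput, rounded to one decimal place. The following is a minimal sketch of that calculation, using the resnet50 / batch size 64 row from the A100 table above. The function names and the scaling-efficiency figure (speedup divided by GPU count) are illustrative additions and are not part of the published results.

```python
def speedup_ratios(throughputs):
    """Return throughput relative to the 1-GPU result, rounded to 1 decimal.

    throughputs: dict mapping GPU count -> total images/sec.
    """
    base = throughputs[1]
    return {n: round(v / base, 1) for n, v in sorted(throughputs.items())}


def scaling_efficiency(throughputs):
    """Return speedup divided by GPU count (1.0 would be perfect linear scaling)."""
    base = throughputs[1]
    return {n: v / (n * base) for n, v in sorted(throughputs.items())}


# resnet50, batch size 64, DGX A100 (values copied from the table above)
resnet50_bs64 = {1: 786, 2: 1492, 4: 2726, 8: 4491}

ratios = speedup_ratios(resnet50_bs64)
efficiency = scaling_efficiency(resnet50_bs64)

print(ratios)
for n, e in efficiency.items():
    print(f"{n} GPU(s): {e:.0%} of linear scaling")
```

For this row the computed ratios match the table (1.0, 1.9, 3.5, 5.7), and the 8-GPU run reaches roughly 71% of ideal linear scaling.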
NVIDIA A40
System : HPC-ProServer DPeR750XA
CPU : (2) Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz
Mem : Total 512GB (16) 32GB DDR4 3200MHz
GPU : (4) NVIDIA A40
OS : Ubuntu 20.04
Driver : 510.47.03 / CUDA : 11.6
NGC Docker : 22.01-tf1-py3 , Tensorflow 1.15
Date : 2022-02
model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | images/sec (4 GPUs) | speedup 1gpu/1gpu | speedup 2gpu/1gpu | speedup 4gpu/1gpu |
inception4 | 32 | 169 | 329 | 645 | 1.0 | 1.9 | 3.8 |
inception4 | 64 | 176 | 348 | 1161 | 1.0 | 2.0 | 6.6 |
inception3 | 32 | 316 | 594 | 1161 | 1.0 | 1.9 | 3.7 |
inception3 | 64 | 324 | 607 | 1273 | 1.0 | 1.9 | 3.9 |
alexnet | 32 | 2241 | 4288 | 1921 | 1.0 | 1.9 | 0.9 |
alexnet | 64 | 3077 | 6583 | 3972 | 1.0 | 2.1 | 1.3 |
resnet50 | 32 | 397 | 773 | 1476 | 1.0 | 1.9 | 3.7 |
resnet50 | 64 | 439 | 867 | 1663 | 1.0 | 2.0 | 3.8 |
resnet50_v2 | 32 | 404 | 793 | 1489 | 1.0 | 2.0 | 3.7 |
resnet50_v2 | 64 | 447 | 883 | 1715 | 1.0 | 2.0 | 3.8 |
resnet101 | 32 | 240 | 463 | 880 | 1.0 | 1.9 | 3.7 |
resnet101 | 64 | 265 | 519 | 1002 | 1.0 | 2.0 | 3.8 |
resnet101_v2 | 32 | 244 | 472 | 877 | 1.0 | 1.9 | 3.6 |
resnet101_v2 | 64 | 268 | 526 | 1040 | 1.0 | 2.0 | 3.9 |
resnet152 | 32 | 168 | 322 | 602 | 1.0 | 1.9 | 3.6 |
resnet152 | 64 | 184 | 355 | 705 | 1.0 | 1.9 | 3.8 |
resnet152_v2 | 32 | 169 | 325 | 623 | 1.0 | 1.9 | 3.7 |
resnet152_v2 | 64 | 185 | 362 | 716 | 1.0 | 2.0 | 3.9 |
vgg16 | 32 | 254 | 466 | 618 | 1.0 | 1.8 | 2.4 |
vgg16 | 64 | 273 | 517 | 900 | 1.0 | 1.9 | 3.3 |
vgg19 | 32 | 220 | 412 | 588 | 1.0 | 1.9 | 2.7 |
vgg19 | 64 | 234 | 455 | 761 | 1.0 | 1.9 | 3.2 |
trivial | 32 | 28122 | 34912 | 16928 | 1.0 | 1.2 | 0.6 |
trivial | 64 | 51270 | 70641 | 33871 | 1.0 | 1.4 | 0.7 |
NVIDIA RTX A4000
System : HPC-ProServer DPrR7920
CPU : (2) Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
Mem : Total 256GB (16) 16GB DDR4 2933MHz
GPU : (2) NVIDIA RTX A4000
OS : Ubuntu 20.04
Driver : 515.65.01 / CUDA : 11.7
NGC Docker : 22.09-tf1-py3 , Tensorflow 1.15
Date : 2022-10
model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | speedup 1gpu/1gpu | speedup 2gpu/1gpu |
inception4 | 32 | 108 | 209 | 1 | 1.94 |
inception4 | 64 | 111 | 221 | 1 | 1.99 |
inception3 | 32 | 206 | 386 | 1 | 1.88 |
inception3 | 64 | 213 | 408 | 1 | 1.91 |
alexnet | 32 | 1483 | 1383 | 1 | 0.93 |
alexnet | 64 | 1789 | 2530 | 1 | 1.41 |
resnet50 | 32 | 279 | 530 | 1 | 1.90 |
resnet50 | 64 | 297 | 580 | 1 | 1.95 |
resnet50_v2 | 32 | 283 | 535 | 1 | 1.89 |
resnet50_v2 | 64 | 304 | 590 | 1 | 1.94 |
resnet101 | 32 | 167 | 320 | 1 | 1.92 |
resnet101 | 64 | 179 | 351 | 1 | 1.96 |
resnet101_v2 | 32 | 170 | 324 | 1 | 1.91 |
resnet101_v2 | 64 | 182 | 358 | 1 | 1.96 |
resnet152 | 32 | 117 | 223 | 1 | 1.91 |
resnet152 | 64 | 126 | 248 | 1 | 1.96 |
resnet152_v2 | 32 | 118 | 227 | 1 | 1.92 |
resnet152_v2 | 64 | 128 | 250 | 1 | 1.96 |
vgg16 | 32 | 156 | 243 | 1 | 1.56 |
vgg16 | 64 | 164 | 294 | 1 | 1.79 |
vgg19 | 32 | 132 | 219 | 1 | 1.66 |
vgg19 | 64 | 138 | 250 | 1 | 1.81 |
trivial | 32 | 15063 | 9285 | 1 | 0.62 |
trivial | 64 | 26877 | 17236 | 1 | 0.64 |
NVIDIA GeForce RTX 3090
System : HPC-ProServer DPeT640
CPU : (2) Intel Xeon Gold 6226R 2.9GHz 16C/32T, 10.4GT/s, 22M cache, Turbo, HT (150W)
Mem : 384GB (12) 32GB RDIMM, 2933MT/s, 2R
GPU : (2) NVIDIA GeForce RTX 3090
OS : CentOS v7.9
Driver : 460.32.03 / CUDA : 11.2
Anaconda 4.9.2
tensorflow-gpu 2.4.1
Date : 2021-03
model | batch size | images/sec (1 GPU) | images/sec (2 GPUs) | speedup 1gpu/1gpu | speedup 2gpu/1gpu |
inception4 | 32 | 152 | 278 | 1 | 1.8 |
inception4 | 64 | 166 | 321 | 1 | 1.9 |
inception3 | 32 | 300 | 519 | 1 | 1.7 |
inception3 | 64 | 340 | 643 | 1 | 1.9 |
alexnet | 32 | 2650 | 1068 | 1 | 0.4 |
alexnet | 64 | 3705 | 1967 | 1 | 0.5 |
resnet50 | 32 | 440 | 774 | 1 | 1.8 |
resnet50 | 64 | 496 | 927 | 1 | 1.9 |
resnet50_v2 | 32 | 444 | 788 | 1 | 1.8 |
resnet50_v2 | 64 | 503 | 947 | 1 | 1.9 |
resnet101 | 32 | 241 | 380 | 1 | 1.6 |
resnet101 | 64 | 294 | 553 | 1 | 1.9 |
resnet101_v2 | 32 | 257 | 451 | 1 | 1.8 |
resnet101_v2 | 64 | 297 | 550 | 1 | 1.9 |
resnet152 | 32 | 173 | 306 | 1 | 1.8 |
resnet152 | 64 | 205 | 382 | 1 | 1.9 |
resnet152_v2 | 32 | 171 | 318 | 1 | 1.9 |
resnet152_v2 | 64 | 206 | 384 | 1 | 1.9 |
vgg16 | 32 | 294 | 326 | 1 | 1.1 |
vgg16 | 64 | 311 | 451 | 1 | 1.5 |
vgg19 | 32 | 252 | 307 | 1 | 1.2 |
vgg19 | 64 | 262 | 387 | 1 | 1.5 |
trivial | 32 | 25999 | 9339 | 1 | 0.4 |
trivial | 64 | 44249 | 18098 | 1 | 0.4 |
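To compare systems for a given model, the single-GPU throughput figures can be placed side by side. The sketch below does this for resnet50 at batch size 64, using the 1-GPU values from the tables above. The helper names and the choice of the RTX A4000 as the baseline are arbitrary assumptions for illustration. Note that the systems were measured on different dates with different TensorFlow versions and software stacks, so the ratios are only a rough guide.

```python
# Single-GPU resnet50 throughput at batch size 64 (total images/sec),
# copied from the excerpt tables above.
single_gpu_resnet50_bs64 = {
    "AMD Instinct MI210": 603,
    "NVIDIA A100 Tensor Core 40GB": 786,
    "NVIDIA A40": 439,
    "NVIDIA RTX A4000": 297,
    "NVIDIA GeForce RTX 3090": 496,
}


def rank_by_throughput(results):
    """Return system names ordered from highest to lowest images/sec."""
    return sorted(results, key=results.get, reverse=True)


def relative_to(results, baseline):
    """Return each system's throughput as a multiple of the baseline system."""
    ref = results[baseline]
    return {name: round(value / ref, 2) for name, value in results.items()}


ranking = rank_by_throughput(single_gpu_resnet50_bs64)
relative = relative_to(single_gpu_resnet50_bs64, "NVIDIA RTX A4000")

for name in ranking:
    print(f"{name}: {single_gpu_resnet50_bs64[name]} images/sec "
          f"({relative[name]}x RTX A4000)")
```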
Benchmark results (individual)
AMD Instinct MI210 : AMD Infinity Hub Tensorflow 2.15
NVIDIA RTX A4000 Tensorflow 1.15 NGC Docker
NVIDIA A40 Tensorflow 1.15 NGC Docker
NVIDIA GeForce RTX 3090 Tensorflow 2.4.1 Anaconda 4.9.2
NVIDIA A100 Tensor Core 40GB (DGX A100) Tensorflow 2.4.1 Anaconda 4.9.2
NVIDIA TITAN RTX (DPeR740) Tensorflow v1.12.0 Anaconda3 CuDNN v7.5
NVIDIA Tesla V100 PCIe 32G Tensorflow v1.12.0 Nvidia-Docker
NVIDIA GeForce RTX 2080Ti (DPeR740) Tensorflow v1.12.0 pip
NVIDIA GeForce GTX 1080Ti, 4-GPU configuration (DPeT640) Tensorflow v1.12.0 pip
NVIDIA GeForce RTX 2080 (DPeR740) Tensorflow v1.11.0 Anaconda/Nvidia-Docker
NVIDIA TITAN V (DPeR740) Tensorflow v1.10 Source/Anaconda/Nvidia-Docker
NVIDIA GeForce GTX 1080Ti (DPeR730) Tensorflow v1.10 CuDNN v7.1
AMD Radeon Vega Frontier Edition (DPrT7910) Tensorflow v1.3 ROCm v1.8.118