|
|
@@ -507,7 +507,7 @@ The following sections provide details on how to achieve the same performance an
|
|
|
|
|
|
##### Training accuracy: NVIDIA DGX A100 (8x A100 80G)
|
|
|
|
|
|
-Our results were obtained by running the `python scripts/train.py --gpus {1,8} --fold {0,1,2,3,4} --dim {2,3} --batch_size <bsize> [--amp]` training scripts and averaging results in the PyTorch 21.02 NGC container on NVIDIA DGX with (8x A100 80G) GPUs.
|
|
|
+Our results were obtained by running the `python scripts/train.py --gpus {1,8} --fold {0,1,2,3,4} --dim {2,3} [--amp]` training scripts and averaging results in the PyTorch 21.02 NGC container on NVIDIA DGX with (8x A100 80G) GPUs.
|
|
|
|
|
|
| Dimension | GPUs | Batch size / GPU | Accuracy - mixed precision | Accuracy - FP32 | Time to train - mixed precision | Time to train - TF32| Time to train speedup (TF32 to mixed precision)
|
|
|
|:-:|:-:|:--:|:-----:|:-----:|:-----:|:-----:|:----:|
|
|
|
@@ -519,7 +519,7 @@ Our results were obtained by running the `python scripts/train.py --gpus {1,8} -
|
|
|
|
|
|
##### Training accuracy: NVIDIA DGX-1 (8x V100 16G)
|
|
|
|
|
|
-Our results were obtained by running the `python scripts/train.py --gpus {1,8} --fold {0,1,2,3,4} --dim {2,3} --batch_size <bsize> [--amp]` training scripts and averaging results in the PyTorch 21.02 NGC container on NVIDIA DGX-1 with (8x V100 16G) GPUs.
|
|
|
+Our results were obtained by running the `python scripts/train.py --gpus {1,8} --fold {0,1,2,3,4} --dim {2,3} [--amp]` training scripts and averaging results in the PyTorch 21.02 NGC container on NVIDIA DGX-1 with (8x V100 16G) GPUs.
|
|
|
|
|
|
| Dimension | GPUs | Batch size / GPU | Accuracy - mixed precision | Accuracy - FP32 | Time to train - mixed precision | Time to train - FP32 | Time to train speedup (FP32 to mixed precision)
|
|
|
|:-:|:-:|:--:|:-----:|:-----:|:-----:|:-----:|:----:|
|
|
|
@@ -580,7 +580,7 @@ Our results were obtained by running the `python scripts/benchmark.py --mode pre
|
|
|
|
|
|
FP16
|
|
|
|
|
|
-| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
+| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
|:----------:|:---------:|:-------------:|:----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
|
|
|
| 2 | 64 | 4x192x160 | 3198.8 | 20.01 | 24.1 | 30.5 | 33.75 |
|
|
|
| 2 | 128 | 4x192x160 | 3587.89 | 35.68 | 36.0 | 36.08 | 36.16 |
|
|
|
@@ -591,7 +591,7 @@ FP16
|
|
|
|
|
|
TF32
|
|
|
|
|
|
-| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
+| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
|:----------:|:---------:|:-------------:|:----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
|
|
|
| 2 | 64 | 4x192x160 | 2353.27 | 27.2 | 27.43 | 27.53 | 27.7 |
|
|
|
| 2 | 128 | 4x192x160 | 2492.78 | 51.35 | 51.54 | 51.59 | 51.73 |
|
|
|
@@ -610,7 +610,7 @@ Our results were obtained by running the `python scripts/benchmark.py --mode pre
|
|
|
|
|
|
FP16
|
|
|
|
|
|
-| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
+| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
|:----------:|:---------:|:-------------:|:----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
|
|
|
| 2 | 64 | 4x192x160 | 1866.52 | 34.29 | 34.7 | 48.87 | 52.44 |
|
|
|
| 2 | 128 | 4x192x160 | 2032.74 | 62.97 | 63.21 | 63.25 | 63.32 |
|
|
|
@@ -620,7 +620,7 @@ FP16
|
|
|
|
|
|
FP32
|
|
|
|
|
|
-| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
+| Dimension | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
|
|
|
|:----------:|:---------:|:-------------:|:----------------------:|:----------------:|:----------------:|:----------------:|:----------------:|
|
|
|
| 2 | 64 | 4x192x160 | 1051.46 | 60.87 | 61.21 | 61.48 | 62.87 |
|
|
|
| 2 | 128 | 4x192x160 | 1051.68 | 121.71 | 122.29 | 122.44 | 122.6 |
|