Convolutional Network for Image Classification in PyTorch

In this repository you will find implementations of various image classification models.

Detailed information on each model can be found here:

Models
Validation accuracy results
Training performance results
Model comparison
- Accuracy vs FLOPS
- Latency vs Throughput on different batch sizes

Models

The following table provides links to where you can find additional information on each model:

Model	Link
resnet50	README
resnext101-32x4d	README
se-resnext101-32x4d	README
EfficientNet	README

Validation accuracy results

Our results were obtained by running the applicable training scripts in the 20.12 PyTorch NGC container on NVIDIA DGX-1 with (8x V100 16GB) GPUs. The specific training script that was run is documented in the corresponding model's README.

The following table shows the validation accuracy results of all the classification models side-by-side.

Model	Mixed Precision Top1	Mixed Precision Top5	32 bit Top1	32 bit Top5
efficientnet-b0	77.63	93.82	77.31	93.76
efficientnet-b4	82.98	96.44	82.92	96.43
efficientnet-widese-b0	77.89	94.00	77.97	94.05
efficientnet-widese-b4	83.28	96.45	83.30	96.47
resnet50	78.60	94.19	78.69	94.16
resnext101-32x4d	80.43	95.06	80.40	95.04
se-resnext101-32x4d	81.00	95.48	81.09	95.45
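
For readers unfamiliar with the Top1 and Top5 columns, the following is a minimal sketch of how top-k accuracy is commonly computed. It is not taken from this repository's training scripts: the function name `topk_accuracy` and the sample scores are hypothetical and serve only as illustration.

```python
def topk_accuracy(scores, labels, k=1):
    """Return the percentage of samples whose true label is among
    the k highest-scoring classes (hypothetical helper)."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Made-up class scores for 4 samples over 3 classes (illustration only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 2, 2]

print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```

With k=1 only the single best prediction counts; with k=5, as in the Top5 columns, a sample counts as correct if the true class is anywhere among the five highest-scoring classes.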

Training performance results

Training performance: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the applicable training scripts in the 21.03 PyTorch NGC container on NVIDIA DGX A100 with (8x A100 80GB) GPUs. Performance numbers (in images per second) were averaged over an entire training epoch. The specific training script that was run is documented in the corresponding model's README.

The following table shows the training performance results of all the classification models side-by-side.

Model	Mixed Precision	TF32	Mixed Precision Speedup
efficientnet-b0	16652 img/s	8193 img/s	2.03 x
efficientnet-b4	2570 img/s	1223 img/s	2.1 x
efficientnet-widese-b0	16368 img/s	8244 img/s	1.98 x
efficientnet-widese-b4	2585 img/s	1223 img/s	2.11 x
resnet50	16621 img/s	7248 img/s	2.29 x
resnext101-32x4d	7925 img/s	3471 img/s	2.28 x
se-resnext101-32x4d	5779 img/s	2991 img/s	1.93 x
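
The speedup column appears to be the ratio of mixed precision throughput to TF32 throughput. The sketch below recomputes it for two rows of the table above; the helper name `speedup` is hypothetical. The published values may have been derived from unrounded measurements, so the last digit of a recomputed ratio can differ for some rows.

```python
def speedup(mixed_img_s, baseline_img_s):
    """Ratio of mixed precision throughput to baseline throughput."""
    return mixed_img_s / baseline_img_s


# (mixed precision img/s, TF32 img/s), copied from the table above.
rows = {
    "efficientnet-b0": (16652, 8193),
    "resnet50": (16621, 7248),
}

for model, (mixed, tf32) in rows.items():
    print(f"{model}: {speedup(mixed, tf32):.2f} x")
# efficientnet-b0: 2.03 x
# resnet50: 2.29 x
```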

Training performance: NVIDIA DGX-1 16G (8x V100 16GB)

Our results were obtained by running the applicable training scripts in the 21.03 PyTorch NGC container on NVIDIA DGX-1 with (8x V100 16GB) GPUs. Performance numbers (in images per second) were averaged over an entire training epoch. The specific training script that was run is documented in the corresponding model's README.

The following table shows the training performance results of all the classification models side-by-side.

Model	Mixed Precision	FP32	Mixed Precision Speedup
efficientnet-b0	7789 img/s	4672 img/s	1.66 x
efficientnet-b4	1366 img/s	616 img/s	2.21 x
efficientnet-widese-b0	7875 img/s	4592 img/s	1.71 x
efficientnet-widese-b4	1356 img/s	612 img/s	2.21 x
resnet50	8322 img/s	2855 img/s	2.91 x
resnext101-32x4d	4065 img/s	1133 img/s	3.58 x
se-resnext101-32x4d	2971 img/s	1004 img/s	2.95 x

Model comparison

Accuracy vs FLOPS

The plot shows model accuracy against the number of floating-point operations needed to compute a forward pass on a 224 x 224 px image, for each implemented model. Dot size indicates the number of trainable parameters.

Latency vs Throughput on different batch sizes

The plot shows the relationship between inference latency, throughput, and batch size for the implemented models.
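
As a rough illustration of how these three quantities relate, the sketch below derives throughput from batch size and per-batch latency. It assumes batches are processed one after another with no overlap. The function name and the latency figures are hypothetical, not measurements from this repository.

```python
def throughput_img_s(batch_size, latency_ms):
    """Images per second, assuming one batch completes every latency_ms
    milliseconds and batches do not overlap (simplifying assumption)."""
    return batch_size * 1000.0 / latency_ms


# Made-up (batch size, latency in ms) pairs, for illustration only.
for batch_size, latency_ms in [(1, 5.0), (64, 80.0)]:
    rate = throughput_img_s(batch_size, latency_ms)
    print(f"batch {batch_size}: {latency_ms} ms per batch, {rate:.0f} img/s")
# batch 1: 5.0 ms per batch, 200 img/s
# batch 64: 80.0 ms per batch, 800 img/s
```

In this made-up example the larger batch raises throughput while also increasing the latency of each request, which is the trade-off the plot is meant to show.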
