4 years ago · 6194324190
--- a/TensorFlow/Segmentation/UNet_3D_Medical/Dockerfile
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/Dockerfile
@@ -1,8 +1,13 @@
 
				-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.06-tf1-py3
			
 
				+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:21.10-tf1-py3
			
 
				 FROM ${FROM_IMAGE_NAME}
			
 
				 
			
 
				 ADD . /workspace/unet3d
			
 
				 WORKDIR /workspace/unet3d
			
 
				 
			
 
				-RUN pip install git+https://github.com/NVIDIA/dllogger
			
 
				+RUN pip install nvidia-pyindex
			
 
				+RUN pip install nvidia-dllogger==0.1.0
			
 
				 RUN pip install --disable-pip-version-check -r requirements.txt
			
 
				+
			
 
				+ENV TF_GPU_HOST_MEM_LIMIT_IN_MB=120000
			
 
				+ENV XLA_FLAGS="--xla_multiheap_size_constraint_per_heap=2600000000"
			
 
				+ENV OMPI_MCA_coll_hcoll_enable=0
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/README.md
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/README.md
@@ -1,6 +1,7 @@
 
				 # 3D-UNet Medical Image Segmentation for TensorFlow 1.x
			
 
				  
			
 
				-This repository provides a script and recipe to train 3D-UNet to achieve state of the art accuracy, and is tested and maintained by NVIDIA.
			
 
				+This repository provides a script and recipe to train the 3D-UNet model to achieve state-of-the-art accuracy.
			
 
				+The content of this repository is tested and maintained by NVIDIA.
			
 
				  
			
 
				 ## Table of Contents
			
 
				  
			
@@ -30,13 +31,14 @@ This repository provides a script and recipe to train 3D-UNet to achieve state o
 
				      * [Inference performance benchmark](#inference-performance-benchmark)
			
 
				    * [Results](#results)
			
 
				      * [Training accuracy results](#training-accuracy-results) 
			
 
				-       * [Training accuracy: NVIDIA DGX-1 (8x V100 32GB)](#training-accuracy-nvidia-dgx-1-8x-v100-16gb)
			
 
				+       * [Training accuracy: NVIDIA DGX A100 (8x A100 80G)](#training-accuracy-nvidia-dgx-a100-8x-a100-80g)
			
 
				+       * [Training accuracy: NVIDIA DGX-1 (8x V100 16G)](#training-accuracy-nvidia-dgx-1-8x-v100-16g)
			
 
				      * [Training performance results](#training-performance-results)
			
 
				-       * [Training performance: NVIDIA DGX-1 (8x V100 16GB)](#training-performance-nvidia-dgx-1-8x-v100-16gb)
			
 
				-       * [Training performance: NVIDIA DGX-1 (8x V100 32GB)](#training-performance-nvidia-dgx-1-8x-v100-32gb)
			
 
				+       * [Training performance: NVIDIA DGX A100 (8x A100 80G)](#training-performance-nvidia-dgx-a100-8x-a100-80g)
			
 
				+       * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
			
 
				      * [Inference performance results](#inference-performance-results)
			
 
				-        * [Inference performance: NVIDIA DGX-1 (1x V100 16GB)](#inference-performance-nvidia-dgx-1-1x-v100-16gb)
			
 
				-        * [Inference performance: NVIDIA DGX-1 (1x V100 32GB)](#inference-performance-nvidia-dgx-1-1x-v100-32gb)
			
 
				+        * [Inference performance: NVIDIA DGX A100 (1x A100 80G)](#inference-performance-nvidia-dgx-a100-1x-a100-80g)
			
 
				+        * [Inference performance: NVIDIA DGX-1 (1x V100 16G)](#inference-performance-nvidia-dgx-1-1x-v100-16g)
			
 
				 - [Release notes](#release-notes)
			
 
				    * [Changelog](#changelog)
			
 
				    * [Known issues](#known-issues)
			
@@ -120,7 +122,7 @@ if params.amp:
 
				 ```
			
 
				 
			
 
				 
			
 
				- #### Enabling TF32
			
 
				+#### Enabling TF32
			
 
				 
			
 
				 TensorFloat-32 (TF32) is the new math mode in [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) GPUs for handling matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. 
			
 
				 
			
@@ -138,11 +140,11 @@ The following section lists the requirements that you need to meet in order to s
 
				  
			
 
				 This repository contains a Dockerfile that extends the TensorFlow NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
			
 
				 - [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
			
 
				-- TensorFlow 20.06-tf1-py3 [NGC container](https://ngc.nvidia.com/registry/nvidia-tensorflow)
			
 
				+- TensorFlow 21.10-tf1-py3 [NGC container](https://ngc.nvidia.com/registry/nvidia-tensorflow)
			
 
				 -   GPU-based architecture:
			
 
				     - [NVIDIA Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
			
 
				     - [NVIDIA Turing](https://www.nvidia.com/en-us/geforce/turing/)
			
 
				-    - [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
			
 
				+    - [NVIDIA Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
			
 
				 
			
 
				  
			
 
				 For more information about how to get started with NGC containers, see the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
			
@@ -204,19 +206,19 @@ To train your model using mixed or TF32 precision with Tensor Cores or using FP3
 
				     After the Docker container is launched, the training of a single fold (fold 0) with the [default hyperparameters](#default-parameters) (for example 1/8 GPUs TF-AMP/FP32/TF32) can be started with:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_train_single{_TF-AMP}.sh <number/of/gpus> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				+    bash scripts/unet3d_train_single{_TF-AMP}.sh <number/of/gpus> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				     ```
			
 
				     
			
 
				     For example, to run with 32-bit precision (FP32 or TF32) with batch size 2 on 1 GPU, simply use:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_train_single.sh 1 /data/preprocessed /results 2
			
 
				+    bash scripts/unet3d_train_single.sh 1 /data/preprocessed /results 2
			
 
				     ```
			
 
				     
			
 
				     To train a single fold with mixed precision (TF-AMP) on 8 GPUs with batch size 2 per GPU, use:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_train_single_TF-AMP.sh 8 /data/preprocessed /results 2
			
 
				+    bash scripts/unet3d_train_single_TF-AMP.sh 8 /data/preprocessed /results 2
			
 
				     ```
			
 
				     The obtained dice scores will be reported after the training has finished.
			
 
				  
			
@@ -225,19 +227,19 @@ To train your model using mixed or TF32 precision with Tensor Cores or using FP3
 
				     The training performance can be evaluated by using benchmarking scripts, such as:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_{train,infer}_benchmark{_TF-AMP}.sh <number/of/gpus/for/training> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				+    bash scripts/unet3d_{train,infer}_benchmark{_TF-AMP}.sh <number/of/gpus/for/training> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				     ```
			
 
				     
			
 
				     which will make the model run and report the performance. For example, to benchmark training with TF-AMP with batch size 2 on 4 GPUs, use:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_train_benchmark_TF-AMP.sh 4 /data/preprocessed /results 2
			
 
				+    bash scripts/unet3d_train_benchmark_TF-AMP.sh 4 /data/preprocessed /results 2
			
 
				     ```
			
 
				     
			
 
				     To obtain inference performance with 32-bit precision (FP32 or TF32) with batch size 1, use:
			
 
				     
			
 
				     ```bash
			
 
				-    bash examples/unet3d_infer_benchmark.sh /data/preprocessed /results 1
			
 
				+    bash scripts/unet3d_infer_benchmark.sh /data/preprocessed /results 1
			
 
				     ```
			
 
				 
			
 
				 ## Advanced
			
@@ -270,7 +272,7 @@ The `runtime/` folder contains scripts with training and inference logic. Its co
 
				 * `unet3d.py`: Defines the model architecture using the blocks from the `layers.py` file.
			
 
				 
			
 
				 Other folders included in the root directory are:
			
 
				-* `examples/`: Provides examples for training and benchmarking U-Net
			
 
				+* `scripts/`: Provides examples for training and benchmarking U-Net
			
 
				 * `images/`: Contains the model diagram
			
 
				  
			
 
				 ### Parameters
			
@@ -347,6 +349,8 @@ optional arguments:
 
				  --amp                 Train using TF-AMP
			
 
				  --xla                 Train using XLA
			
 
				 ```
			
 
				+
			
 
				+### Getting the data
			
 
				  
			
 
				 The 3D-UNet model was trained on the [Brain Tumor Segmentation 2019 dataset](https://www.med.upenn.edu/cbica/brats-2019/). Test images provided by the organization were used to produce the resulting masks for submission. Upon registration, the challenge's data is made available through the https://ipp.cbica.upenn.edu service.
			
 
				  
			
@@ -432,13 +436,13 @@ The following section shows how to run benchmarks measuring the model performanc
 
				  
			
 
				 #### Training performance benchmark
			
 
				  
			
 
				-To benchmark training, run one of the `train_benchmark` scripts in `./examples/`:
			
 
				+To benchmark training, run one of the `train_benchmark` scripts in `./scripts/`:
			
 
				 ```bash
			
 
				-bash examples/unet3d_train_benchmark{_TF-AMP}.sh <num/of/gpus> <path/to/dataset> <path/to/checkpoints> <batch/size>
			
 
				+bash scripts/unet3d_train_benchmark{_TF-AMP}.sh <num/of/gpus> <path/to/dataset> <path/to/checkpoints> <batch/size>
			
 
				 ```
			
 
				 For example, to benchmark training using mixed precision on 4 GPUs with a batch size of 2, use:
			
 
				 ```bash
			
 
				-bash examples/unet3d_train_benchmark_TF-AMP.sh 4 <path/to/dataset> <path/to/checkpoints> 2
			
 
				+bash scripts/unet3d_train_benchmark_TF-AMP.sh 4 <path/to/dataset> <path/to/checkpoints> 2
			
 
				 ```
			
 
				  
			
 
				 Each of these scripts will by default run 40 warm-up iterations and benchmark the performance during training in the next 40 iterations.
			
@@ -452,14 +456,14 @@ At the end of the script, a line reporting the best train throughput will be pri
 
				  
			
 
				 #### Inference performance benchmark
			
 
				  
			
 
				-To benchmark inference, run one of the scripts in `./examples/`:
			
 
				+To benchmark inference, run one of the scripts in `./scripts/`:
			
 
				 ```bash
			
 
				-bash examples/unet3d_infer_benchmark{_TF-AMP}.sh <path/to/dataset> <path/to/checkpoints> <batch/size>
			
 
				+bash scripts/unet3d_infer_benchmark{_TF-AMP}.sh <path/to/dataset> <path/to/checkpoints> <batch/size>
			
 
				 ```
			
 
				  
			
 
				 For example, to benchmark inference using mixed-precision with batch size 4:
			
 
				 ```bash
			
 
				-bash examples/unet3d_infer_benchmark_TF-AMP.sh <path/to/dataset> <path/to/checkpoints> 4
			
 
				+bash scripts/unet3d_infer_benchmark_TF-AMP.sh <path/to/dataset> <path/to/checkpoints> 4
			
 
				 ```
			
 
				  
			
 
				 Each of these scripts will by default run 20 warm-up iterations and benchmark the performance during inference in the next 20 iterations.
			
@@ -476,22 +480,14 @@ At the end of the script, a line reporting the best inference throughput will be
 
				 The following sections provide details on how we achieved our performance and accuracy of training and inference.
			
 
				  
			
 
				 #### Training accuracy results
			
 
				-
			
 
				-##### Training accuracy: NVIDIA DGX-1 (8x V100 32GB)
			
 
				- 
			
 
				-The following table lists the average DICE score across 5-fold cross-validation. Our results were obtained by running the `examples/unet3d_train_full{_TF-AMP}.sh` training script in the `tensorflow:20.06-tf1-py3` NGC container on NVIDIA DGX-1 (8x V100 32GB) GPUs.
			
 
				- 
			
 
				-| GPUs | Batch size / GPU | DICE - FP32 | DICE - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision) |
			
 
				-|---|---|--------|--------|--------|--------|------|
			
 
				-| 8 | 2 | 0.8818 | 0.8819 | 41 min | 23 min | 1.78 |
			
 
				  
			
 
				 To reproduce this result, start the Docker container interactively and run one of the train scripts:
			
 
				 ```bash
			
 
				-bash examples/unet3d_train_full{_TF-AMP}.sh <num/of/gpus> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				+bash scripts/unet3d_train_full{_TF-AMP}.sh <num/of/gpus> <path/to/dataset> <path/to/checkpoint> <batch/size>
			
 
				 ```
			
 
				 For example, to train using 8 GPUs and a batch size of 2:
			
 
				 ```bash
			
 
				-bash examples/unet3d_train_full_TF-AMP.sh 8 /data/preprocessed /results 2
			
 
				+bash scripts/unet3d_train_full_TF-AMP.sh 8 /data/preprocessed /results 2
			
 
				 ```
			
 
				 
			
 
				 This command will launch a script which will run 5-fold cross-validation training for 16,000 iterations on each fold and print:
			
@@ -501,82 +497,104 @@ This command will launch a script which will run 5-fold cross-validation trainin
 
				  
			
 
				 The time reported is for one fold, which means that the training of 5 folds will take 5 times longer. The default batch size is 2; however, if your GPU has less than 16 GB of memory and you encounter GPU memory issues, you should decrease the batch size. The logs of the runs can be found in the `/results` directory once the script is finished.
			
 
				 
			
 
				+##### Training accuracy: NVIDIA DGX A100 (8x A100 80G)
			
 
				+ 
			
 
				+The following table lists the average DICE score across 5-fold cross-validation. Our results were obtained by running the `scripts/unet3d_train_full{_TF-AMP}.sh` training script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX A100 (8x A100 80G) GPUs.
			
 
				+ 
			
 
				+| GPUs | Batch size / GPU | DICE - TF32 | DICE - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision) |
			
 
				+|---|---|--------|--------|--------|--------|------|
			
 
				+| 8 | 2 | 0.8818 | 0.8819 |  8 min |  7 min | 1.14 |
			
 
				+
			
 
				+##### Training accuracy: NVIDIA DGX-1 (8x V100 16G)
			
 
				+ 
			
 
				+The following table lists the average DICE score across 5-fold cross-validation. Our results were obtained by running the `scripts/unet3d_train_full{_TF-AMP}.sh` training script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX-1 (8x V100 16G) GPUs.
			
 
				+ 
			
 
				+| GPUs | Batch size / GPU | DICE - FP32 | DICE - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision) |
			
 
				+|---|---|--------|--------|--------|--------|------|
			
 
				+| 8 | 2 | 0.8818 | 0.8819 | 33 min | 13 min | 2.54 |
			
 
				+
			
 
				 #### Training performance results
			
 
				 
			
 
				-##### Training performance: NVIDIA DGX-1 (8x V100 16GB)
			
 
				+##### Training performance: NVIDIA DGX A100 (8x A100 80G)
			
 
				  
			
 
				-Our results were obtained by running the `examples/unet3d_train_benchmark{_TF-AMP}.sh` training script in the `tensorflow:20.06-tf1-py3` NGC container on NVIDIA DGX-1 with (8x V100 16GB) GPUs. Performance numbers (in volumes per second) were averaged over 80 iterations, excluding the first 40 warm-up steps.
			
 
				+Our results were obtained by running the `scripts/unet3d_train_benchmark{_TF-AMP}.sh` training script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX A100 with (8x A100 80G) GPUs. Performance numbers (in volumes per second) were averaged over 80 iterations, excluding the first 40 warm-up steps.
			
 
				  
			
 
				-| GPUs | Batch size / GPU | Throughput - FP32 [img/s] | Throughput - mixed precision [img/s] | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |       
			
 
				-|---|---|--------|--------|-------|-------|-------|
			
 
				-| 1 | 2 | 1.987  | 4.381  | 2.205 | N/A   | N/A   |
			
 
				-| 8 | 2 | 14.843 | 28.948 | 1.950 | 7.471 | 6.608 |
			
 
				+| GPUs | Batch size / GPU | Throughput - TF32 [img/s] | Throughput - mixed precision [img/s] | Throughput speedup (TF32 - mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision |
			
 
				+|---|---|--------|--------|------|------|------|
			
 
				+| 1 | 2 | 10.40  |  17.91 | 1.72 | N/A  | N/A  |
			
 
				+| 1 | 4 | 10.66  |  19.88 | 1.86 | N/A  | N/A  |
			
 
				+| 1 | 8 |  3.99  |  20.89 | 5.23 | N/A  | N/A  |
			
 
				+| 8 | 2 | 81.71  | 100.24 | 1.23 | 7.85 | 5.60 |
			
 
				+| 8 | 4 | 80.65  | 140.44 | 1.74 | 7.56 | 7.06 |
			
 
				+| 8 | 8 | 29.79  | 137.61 | 4.62 | 7.47 | 6.59 |
			
 
				 
			
 
				-##### Training performance: NVIDIA DGX-1 (8x V100 32GB)
			
 
				+##### Training performance: NVIDIA DGX-1 (8x V100 16G)
			
 
				  
			
 
				-Our results were obtained by running the `examples/unet3d_train_benchmark{_TF-AMP}.sh` training script in the `tensorflow:20.06-tf1-py3` NGC container on NVIDIA DGX-1 with (8x V100 32GB) GPUs. Performance numbers (in volumes per second) were averaged over 80 iterations, excluding the first 40 warm-up steps.
			
 
				+Our results were obtained by running the `scripts/unet3d_train_benchmark{_TF-AMP}.sh` training script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX-1 with (8x V100 16G) GPUs. Performance numbers (in volumes per second) were averaged over 80 iterations, excluding the first 40 warm-up steps.
			
 
				  
			
 
				 | GPUs | Batch size / GPU | Throughput - FP32 [img/s] | Throughput - mixed precision [img/s] | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |       
			
 
				-|---|---|--------|--------|-------|-------|-------|
			
 
				-| 1 | 2 | 2.002  | 4.360  | 2.177 | N/A   | N/A   |
			
 
				-| 1 | 4 | 2.160  | 4.407  | 2.041 | N/A   | N/A   |
			
 
				-| 8 | 2 | 14.781 | 26.694 | 1.806 | 7.381 | 6.123 |
			
 
				-| 8 | 4 | 16.013 | 28.423 | 1.775 | 7.414 | 6.449 |
			
 
				+|---|---|-------|-------|------|------|------|
			
 
				+| 1 | 1 |  1.87 |  7.45 | 3.98 | N/A  | N/A  |
			
 
				+| 1 | 2 |  2.32 |  8.79 | 3.79 | N/A  | N/A  |
			
 
				+| 8 | 1 | 14.49 | 46.88 | 3.23 | 7.75 | 6.29 |
			
 
				+| 8 | 2 | 18.06 | 58.30 | 3.23 | 7.78 | 6.63 |
			
 
				 
			
 
				- 
			
 
				 To achieve these same results, follow the steps in the [Training performance benchmark](#training-performance-benchmark) section.
			
 
				  
			
 
				 #### Inference performance results
			
 
				 
			
 
				-##### Inference performance: NVIDIA DGX-1 (1x V100 16GB)
			
 
				+##### Inference performance: NVIDIA DGX A100 (1x A100 80G)
			
 
				  
			
 
				-Our results were obtained by running the `examples/unet3d_infer_benchmark{_TF-AMP}.sh` inferencing benchmarking script in the `tensorflow:20.06-tf1-py3` NGC container on NVIDIA DGX-1 with (1x V100 16GB) GPU. Performance numbers (in volumes per second) were averaged over 40 iterations, excluding the first 20 warm-up steps.
			
 
				+Our results were obtained by running the `scripts/unet3d_infer_benchmark{_TF-AMP}.sh` inference benchmarking script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX A100 with (1x A100 80G) GPU. Performance numbers (in volumes per second) were averaged over 40 iterations, excluding the first 20 warm-up steps.
			
 
				  
			
 
				 FP16
			
 
				  
			
 
				 | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
			
 
				-|---|---------------|-------|----------|----------|----------|----------|
			
 
				-| 1 | 224x224x160x4 | 2.546 | 392.803  | 393.031  | 393.075  | 393.160  |
			
 
				-| 2 | 224x224x160x4 | 2.923 | 684.363  | 684.806  | 684.891  | 685.056  |
			
 
				-| 4 | 224x224x160x4 | 3.408 | 1173.739 | 1174.369 | 1174.489 | 1174.725 |
			
 
				- 
			
 
				-FP32
			
 
				+|---|---------------|-------|--------|--------|--------|--------|
			
 
				+| 1 | 224x224x160x4 | 15.58 |  67.32 |  68.63 |  78.00 | 109.42 |
			
 
				+| 2 | 224x224x160x4 | 15.81 | 129.06 | 129.93 | 135.31 | 166.62 |
			
 
				+| 4 | 224x224x160x4 |  8.34 | 479.47 | 482.55 | 487.68 | 494.80 |
			
 
				+
			
 
				+TF32
			
 
				  
			
 
				 | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
			
 
				-|---|---------------|-------|----------|----------|----------|----------|
			
 
				-| 1 | 224x224x160x4 | 1.527 | 654.911  | 655.180  | 655.232  | 655.333  |
			
 
				-| 2 | 224x224x160x4 | 1.554 | 1287.376 | 1287.997 | 1288.116 | 1288.348 |
			
 
				-| 4 | 224x224x160x4 | OOM   |          |          |          |          |
			
 
				- 
			
 
				- 
			
 
				-##### Inference performance: NVIDIA DGX-1 (1x V100 32GB)
			
 
				- 
			
 
				-Our results were obtained by running the `examples/unet3d_infer_benchmark{_TF-AMP}.sh` inferencing benchmarking script in the `tensorflow:20.06-tf1-py3` NGC container on NVIDIA DGX-1 with (1x V100 32GB) GPU. Performance numbers (in volumes per second) were averaged over 40 iterations, excluding the first 20 warm-up steps.
			
 
				+|---|---------------|-------|---------|---------|---------|---------|
			
 
				+| 1 | 224x224x160x4 |  9.42 |  106.22 |  106.68 |  107.67 |  122.73 |
			
 
				+| 2 | 224x224x160x4 |  4.69 |  427.13 |  428.33 |  428.76 |  429.19 |
			
 
				+| 4 | 224x224x160x4 |  2.32 | 1723.79 | 1725.77 | 1726.30 | 1728.23 |
			
 
				+  
			
 
				+To achieve these same results, follow the steps in the [Inference performance benchmark](#inference-performance-benchmark) section.
			
 
				 
			
 
				+##### Inference performance: NVIDIA DGX-1 (1x V100 16G)
			
 
				+ 
			
 
				+Our results were obtained by running the `scripts/unet3d_infer_benchmark{_TF-AMP}.sh` inference benchmarking script in the `tensorflow:21.10-tf1-py3` NGC container on NVIDIA DGX-1 with (1x V100 16G) GPU. Performance numbers (in volumes per second) were averaged over 40 iterations, excluding the first 20 warm-up steps.
			
 
				  
			
 
				 FP16
			
 
				  
			
 
				 | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
			
 
				-|---|---------------|-------|----------|----------|----------|----------|
			
 
				-| 1 | 224x224x160x4 | 2.576 | 388.276  | 388.400  | 388.423  | 388.470  |
			
 
				-| 2 | 224x224x160x4 | 2.861 | 699.078  | 699.567  | 699.660  | 699.843  |
			
 
				-| 4 | 224x224x160x4 | 3.333 | 1200.198 | 1200.631 | 1200.714 | 1200.877 |
			
 
				+|---|---------------|------|--------|--------|--------|--------|
			
 
				+| 1 | 224x224x160x4 | 7.64 | 136.81 | 138.94 | 143.59 | 152.74 |
			
 
				+| 2 | 224x224x160x4 | 7.75 | 260.66 | 267.07 | 270.88 | 274.44 |
			
 
				+| 4 | 224x224x160x4 | 4.78 | 838.52 | 842.88 | 843.30 | 844.62 |
			
 
				  
			
 
				 FP32
			
 
				  
			
 
				 | Batch size | Resolution | Throughput Avg [img/s] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms] |
			
 
				-|---|---------------|-------|----------|----------|----------|----------|
			
 
				-| 1 | 224x224x160x4 | 1.990 | 502.485  | 502.550  | 502.563  | 502.587  |
			
 
				-| 2 | 224x224x160x4 | 2.013 | 993.650  | 993.982  | 994.046  | 994.170  |
			
 
				-| 4 | 224x224x160x4 | 2.435 | 1642.637 | 1643.058 | 1643.139 | 1643.297 |
			
 
				+|---|---------------|------|--------|--------|--------|--------|
			
 
				+| 1 | 224x224x160x4 | 2.30 | 434.95 | 436.82 | 437.40 | 438.48 |
			
 
				+| 2 | 224x224x160x4 | 2.40 | 834.99 | 837.22 | 837.51 | 838.18 |
			
 
				+| 4 | 224x224x160x4 | OOM  |        |        |        |        |
			
 
				  
			
 
				-To achieve these same results, follow the steps in the [Inference performance benchmark](#inference-performance-benchmark) section.
			
 
				  
			
 
				+To achieve these same results, follow the steps in the [Inference performance benchmark](#inference-performance-benchmark) section.
			
 
				 
			
 
				  
			
 
				 ## Release notes
			
 
				  
			
 
				 ### Changelog
			
 
				+
			
 
				+November 2021
			
 
				+* Updated README tables
			
 
				  
			
 
				 June 2020
			
 
				 * Initial release
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/dataset/data_loader.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/dataset/data_loader.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,33 +12,56 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Data loader """
			
 
				 import os
			
 
				 
			
 
				-import horovod.tensorflow as hvd
			
 
				 import numpy as np
			
 
				+import horovod.tensorflow as hvd
			
 
				 import tensorflow as tf
			
 
				 
			
 
				 from dataset.transforms import NormalizeImages, OneHotLabels, apply_transforms, PadXYZ, RandomCrop3D, \
			
 
				-    RandomHorizontalFlip, RandomGammaCorrection, RandomVerticalFlip, RandomBrightnessCorrection, CenterCrop, \
			
 
				+    RandomHorizontalFlip, RandomBrightnessCorrection, CenterCrop, \
			
 
				     apply_test_transforms, Cast
			
 
				 
			
 
				 CLASSES = {0: "TumorCore", 1: "PeritumoralEdema", 2: "EnhancingTumor"}
			
 
				 
			
 
				 
			
 
				-def cross_validation(x: np.ndarray, fold_idx: int, n_folds: int):
			
 
				+def cross_validation(arr: np.ndarray, fold_idx: int, n_folds: int):
			
 
				+    """ Split data into folds for training and evaluation
			
 
				+
			
 
				+    :param arr: Collection items to split
			
 
				+    :param fold_idx: Index of crossvalidation fold
			
 
				+    :param n_folds: Total number of folds
			
 
				+    :return: Train and Evaluation folds
			
 
				+    """
			
 
				     if fold_idx < 0 or fold_idx >= n_folds:
			
 
				         raise ValueError('Fold index has to be [0, n_folds). Received index {} for {} folds'.format(fold_idx, n_folds))
			
 
				 
			
 
				-    _folders = np.array_split(x, n_folds)
			
 
				+    _folders = np.array_split(arr, n_folds)
			
 
				 
			
 
				     return np.concatenate(_folders[:fold_idx] + _folders[fold_idx + 1:]), _folders[fold_idx]
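
Review note: a small standalone check of how `cross_validation` partitions its input may help when reading the hunk above. The function body is copied from the diff; the file names are hypothetical placeholders, not files from the dataset.

```python
import numpy as np


def cross_validation(arr: np.ndarray, fold_idx: int, n_folds: int):
    """Split data into training and evaluation folds (same logic as in the diff)."""
    if fold_idx < 0 or fold_idx >= n_folds:
        raise ValueError('Fold index has to be [0, n_folds). Received index {} for {} folds'
                         .format(fold_idx, n_folds))

    # np.array_split returns a list of n_folds arrays of near-equal length.
    _folders = np.array_split(arr, n_folds)

    # Everything except the selected fold is used for training.
    return np.concatenate(_folders[:fold_idx] + _folders[fold_idx + 1:]), _folders[fold_idx]


# Hypothetical file names, used only to illustrate the split.
files = np.array(["volume_{}.tfrecords".format(i) for i in range(10)])
train, evaluation = cross_validation(files, fold_idx=1, n_folds=5)

print(list(evaluation))
print(len(train))
```

With 10 items and 5 folds, each fold holds 2 items, so fold 1 is the third and fourth file and the remaining 8 go to training.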
			
 
				 
			
 
				 
			
 
				-class Dataset:
			
 
				-    def __init__(self, data_dir, batch_size=2, fold_idx=0, n_folds=5, seed=0, pipeline_factor=1, params=None):
			
 
				-        self._folders = np.array([os.path.join(data_dir, path) for path in os.listdir(data_dir)])
			
 
				+class Dataset: # pylint: disable=R0902
			
 
				+    """ Class responsible for the data loading during training, inference and evaluation """
			
 
				+
			
 
				+    def __init__(self, data_dir, batch_size=2, input_shape=(128, 128, 128), # pylint: disable=R0913
			
 
				+                 fold_idx=0, n_folds=5, seed=0, params=None):
			
 
				+        """ Creates and configures the dataset
			
 
				+
			
 
				+        :param data_dir: Directory where the data is stored
			
 
				+        :param batch_size: Number of pairs to be provided by batch
			
 
				+        :param input_shape: Dimension of the input to the model
			
 
				+        :param fold_idx: Fold index for crossvalidation
			
 
				+        :param n_folds: Total number of folds in crossvalidation
			
 
				+        :param seed: Random seed
			
 
				+        :param params: Dictionary with additional configuration parameters
			
 
				+        """
			
 
				+        self._folders = np.array([os.path.join(data_dir, path) for path in os.listdir(data_dir)
			
 
				+                                  if path.endswith(".tfrecords")])
			
 
				+        assert len(self._folders) > 0, "No matching data found at {}".format(data_dir)
			
 
				         self._train, self._eval = cross_validation(self._folders, fold_idx=fold_idx, n_folds=n_folds)
			
 
				-        self._pipeline_factor = pipeline_factor
			
 
				+        self._input_shape = input_shape
			
 
				         self._data_dir = data_dir
			
 
				         self.params = params
			
 
				 
			
@@ -49,6 +72,11 @@ class Dataset:
 
				         self._yshape = (240, 240, 155)
			
 
				 
			
 
				     def parse(self, serialized):
			
 
				+        """ Parse TFRecord
			
 
				+
			
 
				+        :param serialized: Serialized record for a particular example
			
 
				+        :return: sample, label, mean and std of intensities
			
 
				+        """
			
 
				         features = {
			
 
				             'X': tf.io.FixedLenFeature([], tf.string),
			
 
				             'Y': tf.io.FixedLenFeature([], tf.string),
			
@@ -59,17 +87,22 @@ class Dataset:
 
				         parsed_example = tf.io.parse_single_example(serialized=serialized,
			
 
				                                                     features=features)
			
 
				 
			
 
				-        x = tf.io.decode_raw(parsed_example['X'], tf.uint8)
			
 
				-        x = tf.cast(tf.reshape(x, self._xshape), tf.uint8)
			
 
				-        y = tf.io.decode_raw(parsed_example['Y'], tf.uint8)
			
 
				-        y = tf.cast(tf.reshape(y, self._yshape), tf.uint8)
			
 
				+        sample = tf.io.decode_raw(parsed_example['X'], tf.uint8)
			
 
				+        sample = tf.cast(tf.reshape(sample, self._xshape), tf.uint8)
			
 
				+        label = tf.io.decode_raw(parsed_example['Y'], tf.uint8)
			
 
				+        label = tf.cast(tf.reshape(label, self._yshape), tf.uint8)
			
 
				 
			
 
				         mean = parsed_example['mean']
			
 
				         stdev = parsed_example['stdev']
			
 
				 
			
 
				-        return x, y, mean, stdev
			
 
				+        return sample, label, mean, stdev
			
 
				 
			
 
				     def parse_x(self, serialized):
			
 
				+        """ Parse only the sample in a TFRecord with sample and label
			
 
				+
			
 
				+        :param serialized: Serialized record for a particular example
			
 
				+        :return: sample, mean and std of intensities
			
 
				+        """
			
 
				         features = {'X': tf.io.FixedLenFeature([], tf.string),
			
 
				                     'Y': tf.io.FixedLenFeature([], tf.string),
			
 
				                     'mean': tf.io.FixedLenFeature([4], tf.float32),
			
@@ -78,28 +111,32 @@ class Dataset:
 
				         parsed_example = tf.io.parse_single_example(serialized=serialized,
			
 
				                                                     features=features)
			
 
				 
			
 
				-        x = tf.io.decode_raw(parsed_example['X'], tf.uint8)
			
 
				-        x = tf.cast(tf.reshape(x, self._xshape), tf.uint8)
			
 
				+        sample = tf.io.decode_raw(parsed_example['X'], tf.uint8)
			
 
				+        sample = tf.cast(tf.reshape(sample, self._xshape), tf.uint8)
			
 
				 
			
 
				         mean = parsed_example['mean']
			
 
				         stdev = parsed_example['stdev']
			
 
				 
			
 
				-        return x, mean, stdev
			
 
				+        return sample, mean, stdev
			
 
				 
			
 
				     def train_fn(self):
			
 
				+        """ Create dataset for training """
			
 
				+        if 'debug' in self.params.exec_mode:
			
 
				+            return self.synth_train_fn()
			
 
				+
			
 
				         assert len(self._train) > 0, "Training data not found."
			
 
				 
			
 
				-        ds = tf.data.TFRecordDataset(filenames=self._train)
			
 
				+        dataset = tf.data.TFRecordDataset(filenames=self._train)
			
 
				 
			
 
				-        ds = ds.shard(hvd.size(), hvd.rank())
			
 
				-        ds = ds.cache()
			
 
				-        ds = ds.shuffle(buffer_size=self._batch_size * 8, seed=self._seed)
			
 
				-        ds = ds.repeat()
			
 
				+        dataset = dataset.shard(hvd.size(), hvd.rank())
			
 
				+        dataset = dataset.cache()
			
 
				+        dataset = dataset.shuffle(buffer_size=self._batch_size * 8, seed=self._seed)
			
 
				+        dataset = dataset.repeat()
			
 
				 
			
 
				-        ds = ds.map(self.parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.map(self.parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				         transforms = [
			
 
				-            RandomCrop3D((128, 128, 128)),
			
 
				+            RandomCrop3D(self._input_shape),
			
 
				             RandomHorizontalFlip() if self.params.augment else None,
			
 
				             Cast(dtype=tf.float32),
			
 
				             NormalizeImages(),
			
@@ -107,22 +144,29 @@ class Dataset:
 
				             OneHotLabels(n_classes=4),
			
 
				         ]
			
 
				 
			
 
				-        ds = ds.map(map_func=lambda x, y, mean, stdev: apply_transforms(x, y, mean, stdev, transforms=transforms),
			
 
				-                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.map(
			
 
				+            map_func=lambda x, y, mean, stdev: apply_transforms(x, y, mean, stdev, transforms=transforms),
			
 
				+            num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				-        ds = ds.batch(batch_size=self._batch_size,
			
 
				-                      drop_remainder=True)
			
 
				+        dataset = dataset.batch(batch_size=self._batch_size,
			
 
				+                                drop_remainder=True)
			
 
				 
			
 
				-        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				-        return ds
			
 
				+        if self._batch_size == 1:
			
 
				+            options = dataset.options()
			
 
				+            options.experimental_optimization.map_and_batch_fusion = False
			
 
				+            dataset = dataset.with_options(options)
			
 
				+
			
 
				+        return dataset
			
 
				 
			
 
				     def eval_fn(self):
			
 
				-        ds = tf.data.TFRecordDataset(filenames=self._eval)
			
 
				+        """ Create dataset for evaluation """
			
 
				+        dataset = tf.data.TFRecordDataset(filenames=self._eval)
			
 
				         assert len(self._eval) > 0, "Evaluation data not found. Did you specify --fold flag?"
			
 
				 
			
 
				-        ds = ds.cache()
			
 
				-        ds = ds.map(self.parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.cache()
			
 
				+        dataset = dataset.map(self.parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				         transforms = [
			
 
				             CenterCrop((224, 224, 155)),
			
@@ -132,20 +176,28 @@ class Dataset:
 
				             PadXYZ()
			
 
				         ]
			
 
				 
			
 
				-        ds = ds.map(map_func=lambda x, y, mean, stdev: apply_transforms(x, y, mean, stdev, transforms=transforms),
			
 
				-                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				-        ds = ds.batch(batch_size=self._batch_size,
			
 
				-                      drop_remainder=False)
			
 
				-        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.map(
			
 
				+            map_func=lambda x, y, mean, stdev: apply_transforms(x, y, mean, stdev, transforms=transforms),
			
 
				+            num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.batch(batch_size=self._batch_size,
			
 
				+                                drop_remainder=False)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+
			
 
				+        return dataset
			
 
				 
			
 
				-        return ds
			
 
				+    def test_fn(self):
			
 
				+        """ Create dataset for inference """
			
 
				+        if 'debug' in self.params.exec_mode:
			
 
				+            return self.synth_predict_fn()
			
 
				 
			
 
				-    def test_fn(self, count=1, drop_remainder=False):
			
 
				-        ds = tf.data.TFRecordDataset(filenames=self._eval)
			
 
				+        count = 1 if not self.params.benchmark \
			
 
				+            else 2 * self.params.warmup_steps * self.params.batch_size // self.test_size
			
 
				+
			
 
				+        dataset = tf.data.TFRecordDataset(filenames=self._eval)
			
 
				         assert len(self._eval) > 0, "Evaluation data not found. Did you specify --fold flag?"
			
 
				 
			
 
				-        ds = ds.repeat(count)
			
 
				-        ds = ds.map(self.parse_x, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.repeat(count)
			
 
				+        dataset = dataset.map(self.parse_x, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				         transforms = [
			
 
				             CenterCrop((224, 224, 155)),
			
@@ -154,23 +206,50 @@ class Dataset:
 
				             PadXYZ((224, 224, 160))
			
 
				         ]
			
 
				 
			
 
				-        ds = ds.map(map_func=lambda x, mean, stdev: apply_test_transforms(x, mean, stdev, transforms=transforms),
			
 
				-                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				-        ds = ds.batch(batch_size=self._batch_size,
			
 
				-                      drop_remainder=drop_remainder)
			
 
				-        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.map(
			
 
				+            map_func=lambda x, mean, stdev: apply_test_transforms(x, mean, stdev, transforms=transforms),
			
 
				+            num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.batch(batch_size=self._batch_size,
			
 
				+                                drop_remainder=self.params.benchmark)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+
			
 
				+        return dataset
			
 
				 
			
 
				-        return ds
			
 
				+    def export_fn(self):
			
 
				+        """ Create dataset for calibrating and exporting """
			
 
				+        dataset = tf.data.TFRecordDataset(filenames=self._eval)
			
 
				+        assert len(self._eval) > 0, "Evaluation data not found. Did you specify --fold flag?"
			
 
				+
			
 
				+        dataset = dataset.repeat(1)
			
 
				+        dataset = dataset.map(self.parse_x, num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+
			
 
				+        transforms = [
			
 
				+            CenterCrop((224, 224, 155)),
			
 
				+            Cast(dtype=tf.float32),
			
 
				+            NormalizeImages(),
			
 
				+            PadXYZ((224, 224, 160))
			
 
				+        ]
			
 
				+
			
 
				+        dataset = dataset.map(
			
 
				+            map_func=lambda x, mean, stdev: apply_test_transforms(x, mean, stdev, transforms=transforms),
			
 
				+            num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.batch(batch_size=self._batch_size,
			
 
				+                                drop_remainder=True)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+
			
 
				+        return dataset
			
 
				 
			
 
				     def synth_train_fn(self):
			
 
				-        """Synthetic data function for testing"""
			
 
				-        inputs = tf.random_uniform(self._xshape, dtype=tf.int32, minval=0, maxval=255, seed=self._seed,
			
 
				+        """ Synthetic data function for training """
			
 
				+        inputs = tf.random.uniform(self._xshape, dtype=tf.int32, minval=0, maxval=255, seed=self._seed,
			
 
				                                    name='synth_inputs')
			
 
				-        masks = tf.random_uniform(self._yshape, dtype=tf.int32, minval=0, maxval=4, seed=self._seed,
			
 
				+        masks = tf.random.uniform(self._yshape, dtype=tf.int32, minval=0, maxval=4, seed=self._seed,
			
 
				                                   name='synth_masks')
			
 
				+        mean = tf.random.uniform((4,), dtype=tf.float32, minval=0, maxval=255, seed=self._seed)
			
 
				+        stddev = tf.random.uniform((4,), dtype=tf.float32, minval=0, maxval=1, seed=self._seed)
			
 
				 
			
 
				-        ds = tf.data.Dataset.from_tensors((inputs, masks))
			
 
				-        ds = ds.repeat()
			
 
				+        dataset = tf.data.Dataset.from_tensors((inputs, masks))
			
 
				+        dataset = dataset.repeat()
			
 
				 
			
 
				         transforms = [
			
 
				             Cast(dtype=tf.uint8),
			
@@ -182,73 +261,38 @@ class Dataset:
 
				             OneHotLabels(n_classes=4),
			
 
				         ]
			
 
				 
			
 
				-        ds = ds.map(map_func=lambda x, y: apply_transforms(x, y, transforms),
			
 
				-                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				-        ds = ds.batch(self._batch_size)
			
 
				-        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.map(map_func=lambda x, y: apply_transforms(x, y, mean, stddev, transforms),
			
 
				+                              num_parallel_calls=tf.data.experimental.AUTOTUNE)
			
 
				+        dataset = dataset.batch(self._batch_size)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				 
			
 
				-        return ds
			
 
				+        return dataset
			
 
				 
			
 
				-    def synth_predict_fn(self, count=1):
			
 
				+    def synth_predict_fn(self):
			
 
				         """Synthetic data function for testing"""
			
 
				-        inputs = tf.truncated_normal((64, 64, 64, 4), dtype=tf.float32, mean=0.0, stddev=1.0, seed=self._seed,
			
 
				+        inputs = tf.random.truncated_normal((224, 224, 160, 4), dtype=tf.float32, mean=0.0, stddev=1.0, seed=self._seed,
			
 
				                                      name='synth_inputs')
			
 
				 
			
 
				-        ds = tf.data.Dataset.from_tensors(inputs)
			
 
				-        ds = ds.repeat(count)
			
 
				-        ds = ds.batch(self._batch_size)
			
 
				-        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+        count = 2 * self.params.warmup_steps
			
 
				 
			
 
				-        return ds
			
 
				+        dataset = tf.data.Dataset.from_tensors(inputs)
			
 
				+        dataset = dataset.repeat(count)
			
 
				+        dataset = dataset.batch(self._batch_size)
			
 
				+        dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
			
 
				+
			
 
				+        return dataset
			
 
				 
			
 
				     @property
			
 
				     def train_size(self):
			
 
				+        """ Number of pairs in the training set """
			
 
				         return len(self._train)
			
 
				 
			
 
				     @property
			
 
				     def eval_size(self):
			
 
				+        """ Number of pairs in the validation set """
			
 
				         return len(self._eval)
			
 
				 
			
 
				     @property
			
 
				     def test_size(self):
			
 
				+        """ Number of pairs in the test set """
			
 
				         return len(self._eval)
			
 
				-
			
 
				-
			
 
				-def main():
			
 
				-    from time import time
			
 
				-    hvd.init()
			
 
				-
			
 
				-    dataset = Dataset(data_dir='/data/BraTS19_tfrecord', batch_size=3)
			
 
				-
			
 
				-    it = dataset.test().make_initializable_iterator()
			
 
				-
			
 
				-    sess = tf.Session()
			
 
				-    sess.run(it.initializer)
			
 
				-
			
 
				-    next_element = it.get_next()
			
 
				-
			
 
				-    t0 = time()
			
 
				-    cnt = 0
			
 
				-    # while True:
			
 
				-    import matplotlib.pyplot as plt
			
 
				-    import numpy.ma as ma
			
 
				-    for i in range(200):
			
 
				-        t0 = time()
			
 
				-        # if i == 20:
			
 
				-        #     t0 = time()
			
 
				-
			
 
				-        res = sess.run(next_element)
			
 
				-        a = res[0]
			
 
				-        a = a[0, :, :, 80, 0]
			
 
				-        a = ma.masked_array(a, mask=a == 0)
			
 
				-        # plt.imshow(a.astype(np.uint8))
			
 
				-        plt.imshow(a)
			
 
				-        plt.colorbar()
			
 
				-        plt.savefig("/opt/project/img.png")
			
 
				-
			
 
				-        # print()
			
 
				-        print(time() - t0)
			
 
				-
			
 
				-
			
 
				-if __name__ == '__main__':
			
 
				-    main()
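Note on the shapes used by `eval_fn`, `test_fn` and `export_fn` above: the pipeline applies `CenterCrop((224, 224, 155))` and then `PadXYZ((224, 224, 160))`. The following is a minimal pure-Python sketch of that shape arithmetic, not code from the repository. It assumes the `(240, 240, 155, 4)` volume layout allocated in `preprocess_data.py` and the `[0, 5]` depth padding hard-coded in `PadXYZ`; the helper names `center_crop_offsets` and `padded_shape` are hypothetical.

```python
def center_crop_offsets(volume_shape, crop_shape):
    """Per-axis start offsets, mirroring CenterCrop's `(size - target) // 2`."""
    # zip stops at the shorter tuple, so the channel axis is left untouched.
    return [(size - target) // 2 for size, target in zip(volume_shape, crop_shape)]


def padded_shape(shape, paddings):
    """Shape after padding each axis with (before, after) elements."""
    return tuple(before + size + after for size, (before, after) in zip(shape, paddings))


# Assumed input layout: height, width, depth, modalities.
volume_shape = (240, 240, 155, 4)
crop_shape = (224, 224, 155)

offsets = center_crop_offsets(volume_shape, crop_shape)
cropped = crop_shape + volume_shape[3:]

# Same paddings as the tf.constant in PadXYZ: only depth is padded, at the end.
paddings = [(0, 0), (0, 0), (0, 5), (0, 0)]
final = padded_shape(cropped, paddings)

print(offsets, cropped, final)
```

Under these assumptions the crop removes 8 voxels from each side in height and width, leaves depth unchanged, and the padding brings depth from 155 to 160, which matches the `(224, 224, 160, 4)` synthetic input used by `synth_predict_fn`.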
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/dataset/preprocess_data.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/dataset/preprocess_data.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,6 +12,15 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Preprocess dataset and prepare it for training
			
 
				+
			
 
				+Example usage:
			
 
				+    $ python preprocess_data.py --input_dir ./src --output_dir ./dst
			
 
				+    --vol_per_file 2
			
 
				+
			
 
				+All arguments are listed under `python preprocess_data.py -h`.
			
 
				+
			
 
				+"""
			
 
				 import os
			
 
				 import argparse
			
 
				 from random import shuffle
			
@@ -20,7 +29,6 @@ import numpy as np
 
				 import nibabel as nib
			
 
				 import tensorflow as tf
			
 
				 
			
 
				-
			
 
				 PARSER = argparse.ArgumentParser()
			
 
				 
			
 
				 PARSER.add_argument('--input_dir', '-i',
			
@@ -38,10 +46,15 @@ PARSER.add_argument('--single_data_dir', dest='single_data_dir', action='store_t
 
				 
			
 
				 
			
 
				 def load_features(path):
			
 
				+    """ Load features from Nifti
			
 
				+
			
 
				+    :param path: Path to dataset
			
 
				+    :return: Loaded data
			
 
				+    """
			
 
				     data = np.zeros((240, 240, 155, 4), dtype=np.uint8)
			
 
				     name = os.path.basename(path)
			
 
				     for i, modality in enumerate(["_t1.nii.gz", "_t1ce.nii.gz", "_t2.nii.gz", "_flair.nii.gz"]):
			
 
				-        vol = load_single_nifti(os.path.join(path, name+modality)).astype(np.float32)
			
 
				+        vol = load_single_nifti(os.path.join(path, name + modality)).astype(np.float32)
			
 
				         vol[vol > 0.85 * vol.max()] = 0.85 * vol.max()
			
 
				         vol = 255 * vol / vol.max()
			
 
				         data[..., i] = vol.astype(np.uint8)
			
@@ -50,16 +63,37 @@ def load_features(path):
 
				 
			
 
				 
			
 
				 def load_segmentation(path):
			
 
				+    """ Load segmentations from Nifti
			
 
				+
			
 
				+    :param path: Path to dataset
			
 
				+    :return: Loaded data
			
 
				+    """
			
 
				     path = os.path.join(path, os.path.basename(path)) + "_seg.nii.gz"
			
 
				     return load_single_nifti(path).astype(np.uint8)
			
 
				 
			
 
				 
			
 
				 def load_single_nifti(path):
			
 
				+    """ Load Nifti file as numpy
			
 
				+
			
 
				+    :param path: Path to file
			
 
				+    :return: Loaded data
			
 
				+    """
			
 
				     data = nib.load(path).get_fdata().astype(np.int16)
			
 
				     return np.transpose(data, (1, 0, 2))
			
 
				 
			
 
				 
			
 
				-def write_to_file(features_list, labels_list, foreground_mean_list, foreground_std_list, output_dir, count):
			
 
				+def write_to_file(features_list, labels_list, foreground_mean_list, foreground_std_list, output_dir, # pylint: disable=R0913
			
 
				+                  count):
			
 
				+    """ Dump numpy array to tfrecord
			
 
				+
			
 
				+    :param features_list: List of features
			
 
				+    :param labels_list:  List of labels
			
 
				+    :param foreground_mean_list: List of means for each volume
			
 
				+    :param foreground_std_list:  List of std for each volume
			
 
				+    :param output_dir: Directory where to write
			
 
				+    :param count: Index of the record
			
 
				+    :return:
			
 
				+    """
			
 
				     output_filename = os.path.join(output_dir, "volume-{}.tfrecord".format(count))
			
 
				     filelist = list(zip(np.array(features_list),
			
 
				                         np.array(labels_list),
			
@@ -69,17 +103,22 @@ def write_to_file(features_list, labels_list, foreground_mean_list, foreground_s
 
				 
			
 
				 
			
 
				 def np_to_tfrecords(filelist, output_filename):
			
 
				+    """ Convert numpy array to tfrecord
			
 
				+
			
 
				+    :param filelist: List of (features, labels, mean, stdev) tuples
			
 
				+    :param output_filename: Path of the output TFRecord file
			
 
				+    """
			
 
				     writer = tf.io.TFRecordWriter(output_filename)
			
 
				 
			
 
				-    for idx in range(len(filelist)):
			
 
				-        X = filelist[idx][0].flatten().tostring()
			
 
				-        Y = filelist[idx][1].flatten().tostring()
			
 
				-        mean = filelist[idx][2].astype(np.float32).flatten()
			
 
				-        stdev = filelist[idx][3].astype(np.float32).flatten()
			
 
				+    for file_item in filelist:
			
 
				+        sample = file_item[0].flatten().tostring()
			
 
				+        label = file_item[1].flatten().tostring()
			
 
				+        mean = file_item[2].astype(np.float32).flatten()
			
 
				+        stdev = file_item[3].astype(np.float32).flatten()
			
 
				 
			
 
				         d_feature = {}
			
 
				-        d_feature['X'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[X]))
			
 
				-        d_feature['Y'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[Y]))
			
 
				+        d_feature['X'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[sample]))
			
 
				+        d_feature['Y'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[label]))
			
 
				         d_feature['mean'] = tf.train.Feature(float_list=tf.train.FloatList(value=mean))
			
 
				         d_feature['stdev'] = tf.train.Feature(float_list=tf.train.FloatList(value=stdev))
			
 
				 
			
@@ -90,8 +129,9 @@ def np_to_tfrecords(filelist, output_filename):
 
				     writer.close()
			
 
				 
			
 
				 
			
 
				-def main():
			
 
				-    # parse arguments
			
 
				+def main():  # pylint: disable=R0914
			
 
				+    """ Starting point of the application"""
			
 
				+
			
 
				     params = PARSER.parse_args()
			
 
				     input_dir = params.input_dir
			
 
				     output_dir = params.output_dir
			
@@ -101,7 +141,7 @@ def main():
 
				     if params.single_data_dir:
			
 
				         patient_list.extend([os.path.join(input_dir, folder) for folder in os.listdir(input_dir)])
			
 
				     else:
			
 
				-        assert "HGG" in os.listdir(input_dir) and "LGG" in os.listdir(input_dir),\
			
 
				+        assert "HGG" in os.listdir(input_dir) and "LGG" in os.listdir(input_dir), \
			
 
				             "Data directory has to contain folders named HGG and LGG. " \
			
 
				             "If you have a single folder with patient's data please set --single_data_dir flag"
			
 
				         path_hgg = os.path.join(input_dir, "HGG")
			
@@ -135,7 +175,7 @@ def main():
 
				         foreground_mean_list.append(fg_mean)
			
 
				         foreground_std_list.append(fg_std)
			
 
				 
			
 
				-        if (i+1) % params.vol_per_file == 0:
			
 
				+        if (i + 1) % params.vol_per_file == 0:
			
 
				             write_to_file(features_list, labels_list, foreground_mean_list, foreground_std_list, output_dir, count)
			
 
				 
			
 
				             # Clear lists
			
@@ -158,4 +198,3 @@ def main():
 
				 
			
 
				 if __name__ == '__main__':
			
 
				     main()
			
 
				-
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/dataset/transforms.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/dataset/transforms.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,197 +12,280 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Transforms for 3D data augmentation """
			
 
				 import tensorflow as tf
			
 
				 
			
 
				 
			
 
				-def apply_transforms(x, y, mean, stdev, transforms):
			
 
				+def apply_transforms(samples, labels, mean, stdev, transforms):
			
 
				+    """ Apply a chain of transforms to a pair of samples and labels """
			
 
				     for _t in transforms:
			
 
				         if _t is not None:
			
 
				-            x, y = _t(x, y, mean, stdev)
			
 
				-    return x, y
			
 
				+            samples, labels = _t(samples, labels, mean, stdev)
			
 
				+    return samples, labels
			
 
				 
			
 
				 
			
 
				-def apply_test_transforms(x, mean, stdev, transforms):
			
 
				+def apply_test_transforms(samples, mean, stdev, transforms):
			
 
				+    """ Apply a chain of transforms to samples during testing """
			
 
				     for _t in transforms:
			
 
				         if _t is not None:
			
 
				-            x = _t(x, y=None, mean=mean, stdev=stdev)
			
 
				-    return x
			
 
				+            samples = _t(samples, labels=None, mean=mean, stdev=stdev)
			
 
				+    return samples
			
 
				 
			
 
				 
			
 
				-class PadXYZ:
			
 
				+class PadXYZ: # pylint: disable=R0903
			
 
				+    """ Pad volume in three dimensions """
			
 
				     def __init__(self, shape=None):
			
 
				+        """ Add padding
			
 
				+
			
 
				+        :param shape: Target shape
			
 
				+        """
			
 
				         self.shape = shape
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Padded samples and labels
			
 
				+        """
			
 
				         paddings = tf.constant([[0, 0], [0, 0], [0, 5], [0, 0]])
			
 
				-        x = tf.pad(x, paddings, "CONSTANT")
			
 
				-        if y is None:
			
 
				-            return x
			
 
				-        y = tf.pad(y, paddings, "CONSTANT")
			
 
				-        return x, y
			
 
				+        samples = tf.pad(samples, paddings, "CONSTANT")
			
 
				+        if labels is None:
			
 
				+            return samples
			
 
				+        labels = tf.pad(labels, paddings, "CONSTANT")
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class CenterCrop:
			
 
				+class CenterCrop: # pylint: disable=R0903
			
 
				+    """ Produce a central crop in 3D """
			
 
				     def __init__(self, shape):
			
 
				+        """ Create op
			
 
				+
			
 
				+        :param shape: Target shape for crop
			
 
				+        """
			
 
				         self.shape = shape
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        shape = x.get_shape()
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Cropped samples and labels
			
 
				+        """
			
 
				+        shape = samples.get_shape()
			
 
				         delta = [(shape[i].value - self.shape[i]) // 2 for i in range(len(self.shape))]
			
 
				-        x = x[
			
 
				+        samples = samples[
			
 
				             delta[0]:delta[0] + self.shape[0],
			
 
				             delta[1]:delta[1] + self.shape[1],
			
 
				-            delta[2]:delta[2] + self.shape[2]
			
 
				-            ]
			
 
				-        if y is None:
			
 
				-            return x
			
 
				-        y = y[
			
 
				+            delta[2]:delta[2] + self.shape[2]]
			
 
				+        if labels is None:
			
 
				+            return samples
			
 
				+        labels = labels[
			
 
				             delta[0]:delta[0] + self.shape[0],
			
 
				             delta[1]:delta[1] + self.shape[1],
			
 
				-            delta[2]:delta[2] + self.shape[2]
			
 
				-            ]
			
 
				-        return x, y
			
 
				+            delta[2]:delta[2] + self.shape[2]]
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class RandomCrop3D:
			
 
				+class RandomCrop3D: # pylint: disable=R0903
			
 
				+    """ Produce a random 3D crop """
			
 
				     def __init__(self, shape, margins=(0, 0, 0)):
			
 
				+        """ Create op
			
 
				+
			
 
				+        :param shape: Target shape
			
 
				+        :param margins: Margins within which to perform the crop
			
 
				+        """
			
 
				         self.shape = shape
			
 
				         self.margins = margins
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        shape = x.get_shape()
			
 
				-        min = tf.constant(self.margins, dtype=tf.float32)
			
 
				-        max = tf.constant([shape[0].value - self.shape[0] - self.margins[0],
			
 
				-                           shape[1].value - self.shape[1] - self.margins[1],
			
 
				-                           shape[2].value - self.shape[2] - self.margins[2]], dtype=tf.float32)
			
 
				-        center = tf.random_uniform((len(self.shape),), minval=min, maxval=max)
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Cropped samples and labels
			
 
				+        """
			
 
				+        shape = samples.get_shape()
			
 
				+        min_ = tf.constant(self.margins, dtype=tf.float32)
			
 
				+        max_ = tf.constant([shape[0].value - self.shape[0] - self.margins[0],
			
 
				+                            shape[1].value - self.shape[1] - self.margins[1],
			
 
				+                            shape[2].value - self.shape[2] - self.margins[2]],
			
 
				+                           dtype=tf.float32)
			
 
				+        center = tf.random_uniform((len(self.shape),), minval=min_, maxval=max_)
			
 
				         center = tf.cast(center, dtype=tf.int32)
			
 
				-        x = x[center[0]:center[0] + self.shape[0],
			
 
				-              center[1]:center[1] + self.shape[1],
			
 
				-              center[2]:center[2] + self.shape[2]]
			
 
				-        if y is None:
			
 
				-            return x
			
 
				-        y = y[center[0]:center[0] + self.shape[0],
			
 
				-              center[1]:center[1] + self.shape[1],
			
 
				-              center[2]:center[2] + self.shape[2]]
			
 
				-        return x, y
			
 
				-
			
 
				-
			
 
				-class NormalizeImages:
			
 
				-    def __init__(self):
			
 
				-        pass
			
 
				-
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        mask = tf.math.greater(x, 0)
			
 
				-        x = tf.where(mask, (x - tf.cast(mean, x.dtype)) / (tf.cast(stdev + 1e-8, x.dtype)), x)
			
 
				-
			
 
				-        if y is None:
			
 
				-            return x
			
 
				-        return x, y
			
 
				-
			
 
				-
			
 
				-class Cast:
			
 
				+        samples = samples[center[0]:center[0] + self.shape[0],
			
 
				+                          center[1]:center[1] + self.shape[1],
			
 
				+                          center[2]:center[2] + self.shape[2]]
			
 
				+        if labels is None:
			
 
				+            return samples
			
 
				+        labels = labels[center[0]:center[0] + self.shape[0],
			
 
				+                        center[1]:center[1] + self.shape[1],
			
 
				+                        center[2]:center[2] + self.shape[2]]
			
 
				+        return samples, labels
			
 
				+
			
 
				+
			
 
				+class NormalizeImages: # pylint: disable=R0903
			
 
				+    """ Run zscore normalization """
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean
			
 
				+        :param stdev:  Std
			
 
				+        :return: Normalized samples and labels
			
 
				+        """
			
 
				+        mask = tf.math.greater(samples, 0)
			
 
				+        samples = tf.where(mask, (samples - tf.cast(mean, samples.dtype)) / (tf.cast(stdev + 1e-8, samples.dtype)),
			
 
				+                           samples)
			
 
				+
			
 
				+        if labels is None:
			
 
				+            return samples
			
 
				+        return samples, labels
			
 
				+
			
 
				+
			
 
				+class Cast: # pylint: disable=R0903
			
 
				+    """ Cast samples and labels to different precision """
			
 
				     def __init__(self, dtype=tf.float32):
			
 
				         self._dtype = dtype
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        if y is None:
			
 
				-            return tf.cast(x, dtype=self._dtype)
			
 
				-        return tf.cast(x, dtype=self._dtype), y
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Cast samples and labels
			
 
				+        """
			
 
				+        if labels is None:
			
 
				+            return tf.cast(samples, dtype=self._dtype)
			
 
				+        return tf.cast(samples, dtype=self._dtype), labels
			
 
				 
			
 
				 
			
 
				-class RandomHorizontalFlip:
			
 
				+class RandomHorizontalFlip: # pylint: disable=R0903
			
 
				+    """ Randomly flip a pair of samples and labels horizontally """
			
 
				     def __init__(self, threshold=0.5):
			
 
				         self._threshold = threshold
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Flipped samples and labels
			
 
				+        """
			
 
				         h_flip = tf.random_uniform([]) > self._threshold
			
 
				 
			
 
				-        x = tf.cond(h_flip, lambda: tf.reverse(x, axis=[1]), lambda: x)
			
 
				-        y = tf.cond(h_flip, lambda: tf.reverse(y, axis=[1]), lambda: y)
			
 
				+        samples = tf.cond(h_flip, lambda: tf.reverse(samples, axis=[1]), lambda: samples)
			
 
				+        labels = tf.cond(h_flip, lambda: tf.reverse(labels, axis=[1]), lambda: labels)
			
 
				 
			
 
				-        return x, y
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class RandomVerticalFlip:
			
 
				+class RandomVerticalFlip: # pylint: disable=R0903
			
 
				+    """ Randomly flip a pair of samples and labels vertically """
			
 
				     def __init__(self, threshold=0.5):
			
 
				         self._threshold = threshold
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Flipped samples and labels
			
 
				+        """
			
 
				         h_flip = tf.random_uniform([]) > self._threshold
			
 
				 
			
 
				-        x = tf.cond(h_flip, lambda: tf.reverse(x, axis=[0]), lambda: x)
			
 
				-        y = tf.cond(h_flip, lambda: tf.reverse(y, axis=[0]), lambda: y)
			
 
				+        samples = tf.cond(h_flip, lambda: tf.reverse(samples, axis=[0]), lambda: samples)
			
 
				+        labels = tf.cond(h_flip, lambda: tf.reverse(labels, axis=[0]), lambda: labels)
			
 
				 
			
 
				-        return x, y
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class RandomGammaCorrection:
			
 
				+class RandomGammaCorrection: # pylint: disable=R0903
			
 
				+    """ Random gamma correction over samples """
			
 
				     def __init__(self, gamma_range=(0.8, 1.5), keep_stats=False, threshold=0.5, epsilon=1e-8):
			
 
				         self._gamma_range = gamma_range
			
 
				         self._keep_stats = keep_stats
			
 
				         self._eps = epsilon
			
 
				         self._threshold = threshold
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Gamma corrected samples
			
 
				+        """
			
 
				         augment = tf.random_uniform([]) > self._threshold
			
 
				         gamma = tf.random_uniform([], minval=self._gamma_range[0], maxval=self._gamma_range[1])
			
 
				 
			
 
				-        x_min = tf.math.reduce_min(x)
			
 
				-        x_range = tf.math.reduce_max(x) - x_min
			
 
				+        x_min = tf.math.reduce_min(samples)
			
 
				+        x_range = tf.math.reduce_max(samples) - x_min
			
 
				 
			
 
				-        x = tf.cond(augment,
			
 
				-                    lambda: tf.math.pow(((x - x_min) / float(x_range + self._eps)), gamma) * x_range + x_min,
			
 
				-                    lambda: x)
			
 
				-        return x, y
			
 
				+        samples = tf.cond(augment,
			
 
				+                          lambda: tf.math.pow(((samples - x_min) / float(x_range + self._eps)),
			
 
				+                                              gamma) * x_range + x_min,
			
 
				+                          lambda: samples)
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class RandomBrightnessCorrection:
			
 
				+class RandomBrightnessCorrection: # pylint: disable=R0903
			
 
				+    """ Random brightness correction over samples """
			
 
				     def __init__(self, alpha=0.1, threshold=0.5, per_channel=True):
			
 
				         self._alpha_range = [1.0 - alpha, 1.0 + alpha]
			
 
				         self._threshold = threshold
			
 
				         self._per_channel = per_channel
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        mask = tf.math.greater(x, 0)
			
 
				-        size = x.get_shape()[-1].value if self._per_channel else 1
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				+
			
 
				+        :param samples: Sample arrays
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: Brightness corrected samples
			
 
				+        """
			
 
				+        mask = tf.math.greater(samples, 0)
			
 
				+        size = samples.get_shape()[-1].value if self._per_channel else 1
			
 
				         augment = tf.random_uniform([]) > self._threshold
			
 
				         correction = tf.random_uniform([size],
			
 
				                                        minval=self._alpha_range[0],
			
 
				                                        maxval=self._alpha_range[1],
			
 
				-                                       dtype=x.dtype)
			
 
				+                                       dtype=samples.dtype)
			
 
				 
			
 
				-        x = tf.cond(augment,
			
 
				-                    lambda: tf.where(mask, x + correction, x),
			
 
				-                    lambda: x)
			
 
				+        samples = tf.cond(augment,
			
 
				+                          lambda: tf.where(mask, samples + correction, samples),
			
 
				+                          lambda: samples)
			
 
				 
			
 
				-        return x, y
			
 
				+        return samples, labels
			
 
				 
			
 
				 
			
 
				-class OneHotLabels:
			
 
				+class OneHotLabels: # pylint: disable=R0903
			
 
				+    """ One hot encoding of labels """
			
 
				     def __init__(self, n_classes=1):
			
 
				         self._n_classes = n_classes
			
 
				 
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        return x, tf.one_hot(y, self._n_classes)
			
 
				-
			
 
				-
			
 
				-class PadXY:
			
 
				-    def __init__(self, dst_size=None):
			
 
				-        if not dst_size:
			
 
				-            raise ValueError("Invalid padding size: {}".format(dst_size))
			
 
				-
			
 
				-        self._dst_size = dst_size
			
 
				-
			
 
				-    def __call__(self, x, y, mean, stdev):
			
 
				-        return tf.pad(x, self._build_padding(x)), \
			
 
				-               tf.pad(y, self._build_padding(y))
			
 
				+    def __call__(self, samples, labels, mean, stdev):
			
 
				+        """ Run op
			
 
				 
			
 
				-    def _build_padding(self, _t):
			
 
				-        padding = []
			
 
				-        for i in range(len(_t.shape)):
			
 
				-            if i < len(self._dst_size):
			
 
				-                padding.append((0, self._dst_size[i] - _t.shape[i]))
			
 
				-            else:
			
 
				-                padding.append((0, 0))
			
 
				-        return padding
			
 
				+        :param samples: Sample arrays (unused)
			
 
				+        :param labels: Label arrays
			
 
				+        :param mean: Mean (unused)
			
 
				+        :param stdev:  Std (unused)
			
 
				+        :return: One hot encoded labels
			
 
				+        """
			
 
				+        return samples, tf.one_hot(labels, self._n_classes)
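The gamma-correction op in the hunk above rescales the samples to [0, 1], raises them to a random power, and maps them back to the original range. The following is a minimal pure-Python sketch of that arithmetic, not the repository's TensorFlow implementation; the function names `gamma_correct` and `random_gamma_correction` are hypothetical, and the sketch draws `gamma` only when augmentation is selected, whereas the diff draws it unconditionally.

```python
import random


def gamma_correct(values, gamma, eps=1e-8):
    """Rescale to [0, 1], apply a power curve, then map back to the original range."""
    v_min = min(values)
    v_range = max(values) - v_min
    return [((v - v_min) / (v_range + eps)) ** gamma * v_range + v_min for v in values]


def random_gamma_correction(values, gamma_range=(0.8, 1.5), threshold=0.5, rng=random):
    """Augment when a uniform draw exceeds `threshold`, mirroring `tf.random_uniform([]) > threshold`."""
    if rng.random() > threshold:
        gamma = rng.uniform(gamma_range[0], gamma_range[1])
        return gamma_correct(values, gamma)
    return list(values)


voxels = [0.0, 1.0, 2.0, 4.0]
corrected = gamma_correct(voxels, gamma=2.0)
print(corrected)
```

Because the comparison is `draw > threshold`, a threshold of 0.5 augments roughly half of the samples, and a threshold of 1.0 disables the augmentation entirely. The minimum and maximum of the input are preserved up to the `eps` term.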
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/main.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/main.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,98 +12,85 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Entry point of the application.
			
 
				+
			
 
				+This file serves as the entry point to the implementation of UNet3D for
			
 
				+medical image segmentation.
			
 
				+
			
 
				+Example usage:
			
 
				+    $ python main.py --exec_mode train --data_dir ./data --batch_size 2
			
 
				+    --max_steps 1600 --amp
			
 
				+
			
 
				+All arguments are listed under `python main.py -h`.
			
 
				+Full argument definitions can be found in `runtime/arguments.py`.
			
 
				+
			
 
				+"""
			
 
				 import os
			
 
				-import logging
			
 
				 
			
 
				 import numpy as np
			
 
				-import tensorflow as tf
			
 
				 import horovod.tensorflow as hvd
			
 
				 
			
 
				+from model.model_fn import unet_3d
			
 
				 from dataset.data_loader import Dataset, CLASSES
			
 
				-from runtime.hooks import get_hooks, ProfilingHook, TrainingHook
			
 
				+from runtime.hooks import get_hooks
			
 
				 from runtime.arguments import PARSER
			
 
				-from runtime.setup import prepare_model_dir, build_estimator, set_flags, get_logger
			
 
				+from runtime.setup import build_estimator, set_flags, get_logger
			
 
				+
			
 
				 
			
 
				+def parse_evaluation_results(result, logger, step=()):
			
 
				+    """
			
 
				+    Parse DICE scores from the evaluation results
			
 
				 
			
 
				-def parse_evaluation_results(result):
			
 
				-    data = {CLASSES[i]: result[CLASSES[i]] for i in range(len(CLASSES))}
			
 
				+    :param result: Dictionary with metrics returned by the estimator's evaluation
			
 
				+    :param logger: Logger object
			
 
				+    :return: Dictionary with per-class DICE scores, MeanDice and WholeTumor
			
 
				+    """
			
 
				+    data = {CLASSES[i]: float(result[CLASSES[i]]) for i in range(len(CLASSES))}
			
 
				     data['MeanDice'] = sum([result[CLASSES[i]] for i in range(len(CLASSES))]) / len(CLASSES)
			
 
				-    data['WholeTumor'] = result['WholeTumor']
			
 
				+    data['WholeTumor'] = float(result['WholeTumor'])
			
 
				+
			
 
				+    if hvd.rank() == 0:
			
 
				+        logger.log(step=step, data=data)
			
 
				+
			
 
				     return data
			
 
				 
			
 
				 
			
 
				 def main():
			
 
				-    tf.get_logger().setLevel(logging.ERROR)
			
 
				+    """ Starting point of the application """
			
 
				     hvd.init()
			
 
				+    set_flags()
			
 
				     params = PARSER.parse_args()
			
 
				-    model_dir = prepare_model_dir(params)
			
 
				     logger = get_logger(params)
			
 
				 
			
 
				     dataset = Dataset(data_dir=params.data_dir,
			
 
				                       batch_size=params.batch_size,
			
 
				                       fold_idx=params.fold,
			
 
				                       n_folds=params.num_folds,
			
 
				+                      input_shape=params.input_shape,
			
 
				                       params=params)
			
 
				 
			
 
				-    estimator = build_estimator(params=params, model_dir=model_dir)
			
 
				-
			
 
				-    max_steps = params.max_steps // (1 if params.benchmark else hvd.size())
			
 
				+    estimator = build_estimator(params=params, model_fn=unet_3d)
			
 
				+    hooks = get_hooks(params, logger)
			
 
				 
			
 
				     if 'train' in params.exec_mode:
			
 
				-        training_hooks = get_hooks(params, logger)
			
 
				+        max_steps = params.max_steps // (1 if params.benchmark else hvd.size())
			
 
				         estimator.train(
			
 
				             input_fn=dataset.train_fn,
			
 
				             steps=max_steps,
			
 
				-            hooks=training_hooks)
			
 
				-
			
 
				+            hooks=hooks)
			
 
				     if 'evaluate' in params.exec_mode:
			
 
				         result = estimator.evaluate(input_fn=dataset.eval_fn, steps=dataset.eval_size)
			
 
				-        data = parse_evaluation_results(result)
			
 
				+        _ = parse_evaluation_results(result, logger)
			
 
				+    if params.exec_mode == 'predict':
			
 
				         if hvd.rank() == 0:
			
 
				-            logger.log(step=(), data=data)
			
 
				-
			
 
				-    if 'predict' == params.exec_mode:
			
 
				-        inference_hooks = get_hooks(params, logger)
			
 
				-        if hvd.rank() == 0:
			
 
				-            count = 1 if not params.benchmark else 2 * params.warmup_steps * params.batch_size // dataset.test_size
			
 
				             predictions = estimator.predict(
			
 
				-                input_fn=lambda: dataset.test_fn(count=count,
			
 
				-                                                 drop_remainder=params.benchmark), hooks=inference_hooks)
			
 
				+                input_fn=dataset.test_fn, hooks=hooks)
			
 
				 
			
 
				-            for idx, p in enumerate(predictions):
			
 
				-                volume = p['predictions']
			
 
				+            for idx, pred in enumerate(predictions):
			
 
				+                volume = pred['predictions']
			
 
				                 if not params.benchmark:
			
 
				                     np.save(os.path.join(params.model_dir, "vol_{}.npy".format(idx)), volume)
			
 
				 
			
 
				-    if 'debug_train' == params.exec_mode:
			
 
				-        hooks = [hvd.BroadcastGlobalVariablesHook(0)]
			
 
				-        if hvd.rank() == 0:
			
 
				-            hooks += [TrainingHook(log_every=params.log_every,
			
 
				-                                   logger=logger,
			
 
				-                                   tensor_names=['total_loss_ref:0']),
			
 
				-                      ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				-                                    global_batch_size=hvd.size() * params.batch_size,
			
 
				-                                    logger=logger,
			
 
				-                                    mode='train')]
			
 
				-
			
 
				-        estimator.train(
			
 
				-            input_fn=dataset.synth_train_fn,
			
 
				-            steps=max_steps,
			
 
				-            hooks=hooks)
			
 
				-
			
 
				-    if 'debug_predict' == params.exec_mode:
			
 
				-        if hvd.rank() == 0:
			
 
				-            hooks = [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				-                                   global_batch_size=params.batch_size,
			
 
				-                                   logger=logger,
			
 
				-                                   mode='inference')]
			
 
				-            count = 2 * params.warmup_steps
			
 
				-            predictions = estimator.predict(input_fn=lambda: dataset.synth_predict_fn(count=count),
			
 
				-                                            hooks=hooks)
			
 
				-            for p in predictions:
			
 
				-                _ = p['predictions']
			
 
				-
			
 
				 
			
 
				 if __name__ == '__main__':
			
 
				-    set_flags()
			
 
				     main()
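The two pieces of arithmetic in `main.py` above (averaging the per-class DICE scores and dividing the step budget across Horovod workers) can be sketched without TensorFlow. This is an illustrative sketch only: the `CLASSES` mapping below is a placeholder standing in for the one imported from `dataset/data_loader.py`, and `summarize_dice` and `steps_per_worker` are hypothetical helper names.

```python
# Placeholder mapping (assumption); the real one is imported from dataset/data_loader.py.
CLASSES = {0: "TumorCore", 1: "PeritumoralEdema", 2: "EnhancingTumor"}


def summarize_dice(result):
    """Collect per-class scores as plain floats and add their mean and the whole-tumor score."""
    data = {name: float(result[name]) for name in CLASSES.values()}
    data["MeanDice"] = sum(data[name] for name in CLASSES.values()) / len(CLASSES)
    data["WholeTumor"] = float(result["WholeTumor"])
    return data


def steps_per_worker(max_steps, num_workers, benchmark=False):
    """Split the global step budget across workers unless benchmarking."""
    return max_steps // (1 if benchmark else num_workers)


result = {"TumorCore": 0.9, "PeritumoralEdema": 0.8, "EnhancingTumor": 0.7,
          "WholeTumor": 0.95, "loss": 0.2}
print(summarize_dice(result))
print(steps_per_worker(1600, num_workers=8))
```

Casting to `float` matters in the real code because the estimator returns NumPy scalars, which a JSON-based logger may not be able to serialize.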
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/model/layers.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/model/layers.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,10 +12,18 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" High level definition of layers for model construction """
			
 
				 import tensorflow as tf
			
 
				 
			
 
				 
			
 
				 def _normalization(inputs, name, mode):
			
 
				+    """ Choose a normalization layer
			
 
				+
			
 
				+    :param inputs: Input node from the graph
			
 
				+    :param name: Name of layer
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :return: Normalized output
			
 
				+    """
			
 
				     training = mode == tf.estimator.ModeKeys.TRAIN
			
 
				 
			
 
				     if name == 'instancenorm':
			
@@ -45,28 +53,34 @@ def _normalization(inputs, name, mode):
 
				         return tf.keras.layers.BatchNormalization(axis=-1,
			
 
				                                                   trainable=True,
			
 
				                                                   virtual_batch_size=None)(inputs, training=training)
			
 
				-    elif name == 'none':
			
 
				+    if name == 'none':
			
 
				         return inputs
			
 
				-    else:
			
 
				-        raise ValueError('Invalid normalization layer')
			
 
				+
			
 
				+    raise ValueError('Invalid normalization layer')
			
 
				 
			
 
				 
			
 
				-def _activation(x, activation):
			
 
				+def _activation(out, activation):
			
 
				+    """ Choose an activation layer
			
 
				+
			
 
				+    :param out: Input node from the graph
			
 
				+    :param activation: Name of layer
			
 
				+    :return: Activation output
			
 
				+    """
			
 
				     if activation == 'relu':
			
 
				-        return tf.nn.relu(x)
			
 
				-    elif activation == 'leaky_relu':
			
 
				-        return tf.nn.leaky_relu(x, alpha=0.01)
			
 
				-    elif activation == 'sigmoid':
			
 
				-        return tf.nn.sigmoid(x)
			
 
				-    elif activation == 'softmax':
			
 
				-        return tf.nn.softmax(x, axis=-1)
			
 
				-    elif activation == 'none':
			
 
				-        return x
			
 
				-    else:
			
 
				-        raise ValueError("Unknown activation {}".format(activation))
			
 
				+        return tf.nn.relu(out)
			
 
				+    if activation == 'leaky_relu':
			
 
				+        return tf.nn.leaky_relu(out, alpha=0.01)
			
 
				+    if activation == 'sigmoid':
			
 
				+        return tf.nn.sigmoid(out)
			
 
				+    if activation == 'softmax':
			
 
				+        return tf.nn.softmax(out, axis=-1)
			
 
				+    if activation == 'none':
			
 
				+        return out
			
 
				+
			
 
				+    raise ValueError("Unknown activation {}".format(activation))
			
 
				 
			
 
				 
			
 
				-def convolution(x,
			
 
				+def convolution(inputs,  # pylint: disable=R0913
			
 
				                 out_channels,
			
 
				                 kernel_size=3,
			
 
				                 stride=1,
			
@@ -74,62 +88,94 @@ def convolution(x,
 
				                 normalization='batchnorm',
			
 
				                 activation='leaky_relu',
			
 
				                 transpose=False):
			
 
				-
			
 
				+    """ Create a convolution layer
			
 
				+
			
 
				+    :param inputs: Input node from graph
			
 
				+    :param out_channels: Output number of channels
			
 
				+    :param kernel_size: Size of the kernel
			
 
				+    :param stride: Stride of the kernel
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :param normalization: Name of the normalization layer
			
 
				+    :param activation: Name of the activation layer
			
 
				+    :param transpose: Select between regular and transposed convolution
			
 
				+    :return: Convolution output
			
 
				+    """
			
 
				     if transpose:
			
 
				         conv = tf.keras.layers.Conv3DTranspose
			
 
				     else:
			
 
				         conv = tf.keras.layers.Conv3D
			
 
				-    regularizer = None#tf.keras.regularizers.l2(1e-5)
			
 
				-
			
 
				-    x = conv(filters=out_channels,
			
 
				-             kernel_size=kernel_size,
			
 
				-             strides=stride,
			
 
				-             activation=None,
			
 
				-             padding='same',
			
 
				-             data_format='channels_last',
			
 
				-             kernel_initializer=tf.glorot_uniform_initializer(),
			
 
				-             kernel_regularizer=regularizer,
			
 
				-             bias_initializer=tf.zeros_initializer(),
			
 
				-             bias_regularizer=regularizer)(x)
			
 
				-
			
 
				-    x = _normalization(x, normalization, mode)
			
 
				-
			
 
				-    return _activation(x, activation)
			
 
				-
			
 
				-
			
 
				-def upsample_block(x, skip_connection, out_channels, normalization, mode):
			
 
				-    x = convolution(x, kernel_size=2, out_channels=out_channels, stride=2,
			
 
				-                    normalization='none', activation='none', transpose=True)
			
 
				-    x = tf.keras.layers.Concatenate(axis=-1)([x, skip_connection])
			
 
				-
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-    return x
			
 
				-
			
 
				-
			
 
				-def input_block(x, out_channels, normalization, mode):
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-    return x
			
 
				-
			
 
				-
			
 
				-def downsample_block(x, out_channels, normalization, mode):
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode, stride=2)
			
 
				-    return convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-
			
 
				-
			
 
				-def linear_block(x, out_channels, mode, activation='leaky_relu', normalization='none'):
			
 
				-    x = convolution(x, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				-    return convolution(x, out_channels=out_channels, activation=activation, mode=mode, normalization=normalization)
			
 
				-
			
 
				-
			
 
				-def output_layer(x, out_channels, activation):
			
 
				-    x = tf.keras.layers.Conv3D(out_channels,
			
 
				-                               kernel_size=3,
			
 
				-                               activation=None,
			
 
				-                               padding='same',
			
 
				-                               kernel_regularizer=None,
			
 
				-                               kernel_initializer=tf.glorot_uniform_initializer(),
			
 
				-                               bias_initializer=tf.zeros_initializer(),
			
 
				-                               bias_regularizer=None)(x)
			
 
				-    return _activation(x, activation)
			
 
				+    regularizer = None  # tf.keras.regularizers.l2(1e-5)
			
 
				+
			
 
				+    use_bias = normalization == "none"
			
 
				+    inputs = conv(filters=out_channels,
			
 
				+                  kernel_size=kernel_size,
			
 
				+                  strides=stride,
			
 
				+                  activation=None,
			
 
				+                  padding='same',
			
 
				+                  data_format='channels_last',
			
 
				+                  kernel_initializer=tf.compat.v1.glorot_uniform_initializer(),
			
 
				+                  kernel_regularizer=regularizer,
			
 
				+                  bias_initializer=tf.zeros_initializer(),
			
 
				+                  bias_regularizer=regularizer,
			
 
				+                  use_bias=use_bias)(inputs)
			
 
				+
			
 
				+    inputs = _normalization(inputs, normalization, mode)
			
 
				+
			
 
				+    return _activation(inputs, activation)
			
 
				+
			
 
				+
			
 
				+def upsample_block(inputs, skip_connection, out_channels, normalization, mode):
			
 
				+    """ Create a block for upsampling
			
 
				+
			
 
				+    :param inputs: Input node from the graph
			
 
				+    :param skip_connection: Skip-connection node to concatenate with the upsampled input
			
 
				+    :param out_channels: Number of output channels
			
 
				+    :param normalization: Name of the normalization layer
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :return: Output from the upsample block
			
 
				+    """
			
 
				+    inputs = convolution(inputs, kernel_size=2, out_channels=out_channels, stride=2,
			
 
				+                         normalization='none', activation='none', transpose=True)
			
 
				+    inputs = tf.keras.layers.Concatenate(axis=-1)([inputs, skip_connection])
			
 
				+
			
 
				+    inputs = convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				+    inputs = convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				+    return inputs
			
 
				+
			
 
				+
			
 
				+def input_block(inputs, out_channels, normalization, mode):
			
 
				+    """ Create the input block
			
 
				+
			
 
				+    :param inputs: Input node from the graph
			
 
				+    :param out_channels: Number of output channels
			
 
				+    :param normalization:  Name of the normalization layer
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :return: Output from the input block
			
 
				+    """
			
 
				+    inputs = convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				+    inputs = convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				+    return inputs
			
 
				+
			
 
				+
			
 
				+def downsample_block(inputs, out_channels, normalization, mode):
			
 
				+    """ Create a downsample block
			
 
				+
			
 
				+    :param inputs: Input node from the graph
			
 
				+    :param out_channels: Number of output channels
			
 
				+    :param normalization:  Name of the normalization layer
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :return: Output from the downsample block
			
 
				+    """
			
 
				+    inputs = convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode, stride=2)
			
 
				+    return convolution(inputs, out_channels=out_channels, normalization=normalization, mode=mode)
			
 
				+
			
 
				+
			
 
				+def output_layer(inputs, out_channels, activation):
			
 
				+    """ Create the output layer
			
 
				+
			
 
				+    :param inputs: Input node from the graph
			
 
				+    :param out_channels: Number of output channels
			
 
				+    :param activation:  Name of the activation layer
			
 
				+    :return: Output from the output layer
			
 
				+    """
			
 
				+    return convolution(inputs, out_channels=out_channels, kernel_size=3, normalization='none', activation=activation)
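The diff does not state why `use_bias` is now enabled only when `normalization == "none"`. One common reason, offered here as an assumption, is that a normalization layer subtracts the mean right after the convolution, so any constant bias added beforehand cancels out. The pure-Python sketch below checks that property for a simple mean/variance normalization; `instance_norm` is a hypothetical stand-in, not the layer used in the repository.

```python
import math


def instance_norm(values, eps=1e-5):
    """Normalize a list of activations to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


activations = [0.5, -1.0, 2.0, 0.25]
bias = 3.0

without_bias = instance_norm(activations)
with_bias = instance_norm([a + bias for a in activations])

# The constant offset is removed by the mean subtraction, so both results agree.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(without_bias, with_bias)))
```

Under that assumption, dropping the bias in normalized layers saves parameters without changing what the layer can represent.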
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/model/losses.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/model/losses.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,10 +12,18 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Different losses for UNet3D """
			
 
				 import tensorflow as tf
			
 
				 
			
 
				 
			
 
				 def make_loss(params, y_true, y_pred):
			
 
				+    """ Factory method for loss functions
			
 
				+
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :param y_true: Ground truth labels
			
 
				+    :param y_pred: Predicted labels
			
 
				+    :return: Loss
			
 
				+    """
			
 
				     if params.loss == 'dice':
			
 
				         return _dice(y_true, y_pred)
			
 
				     if params.loss == 'ce':
			
@@ -27,16 +35,34 @@ def make_loss(params, y_true, y_pred):
 
				 
			
 
				 
			
 
				 def _ce(y_true, y_pred):
			
 
				+    """ Crossentropy
			
 
				+
			
 
				+    :param y_true: Ground truth labels
			
 
				+    :param y_pred: Predicted labels
			
 
				+    :return: loss
			
 
				+    """
			
 
				     return tf.reduce_sum(
			
 
				         tf.reduce_mean(tf.keras.backend.binary_crossentropy(tf.cast(y_true, tf.float32), y_pred), axis=[0, 1, 2, 3]),
			
 
				         name='crossentropy_loss_ref')
			
 
				 
			
 
				 
			
 
				 def _dice(y_true, y_pred):
			
 
				+    """ Training dice
			
 
				+
			
 
				+    :param y_true: Ground truth labels
			
 
				+    :param y_pred: Predicted labels
			
 
				+    :return: loss
			
 
				+    """
			
 
				     return tf.reduce_sum(dice_loss(predictions=y_pred, targets=y_true), name='dice_loss_ref')
			
 
				 
			
 
				 
			
 
				 def eval_dice(y_true, y_pred):
			
 
				+    """ Evaluation dice
			
 
				+
			
 
				+    :param y_true: Ground truth labels
			
 
				+    :param y_pred: Predicted labels
			
 
				+    :return: Dice score (1 - dice loss)
			
 
				+    """
			
 
				     return 1 - dice_loss(predictions=y_pred, targets=y_true)
			
 
				 
			
 
				 
			
@@ -45,6 +71,15 @@ def dice_loss(predictions,
 
				               squared_pred=False,
			
 
				               smooth=1e-5,
			
 
				               top_smooth=0.0):
			
 
				+    """ Dice
			
 
				+
			
 
				+    :param predictions: Predicted labels
			
 
				+    :param targets: Ground truth labels
			
 
				+    :param squared_pred: Whether to square the predictions
			
 
				+    :param smooth: Smooth term for denominator
			
 
				+    :param top_smooth: Smooth term for numerator
			
 
				+    :return: loss
			
 
				+    """
			
 
				     is_channels_first = False
			
 
				 
			
 
				     n_len = len(predictions.get_shape())
			
@@ -60,15 +95,23 @@ def dice_loss(predictions,
 
				 
			
 
				     denominator = y_true_o + y_pred_o
			
 
				 
			
 
				-    f = (2.0 * intersection + top_smooth) / (denominator + smooth)
			
 
				+    dice = (2.0 * intersection + top_smooth) / (denominator + smooth)
			
 
				 
			
 
				-    return 1 - tf.reduce_mean(f, axis=0)
			
 
				+    return 1 - tf.reduce_mean(dice, axis=0)
			
 
				 
			
 
				 
			
 
				 def total_dice(predictions,
			
 
				                targets,
			
 
				                smooth=1e-5,
			
 
				                top_smooth=0.0):
			
 
				+    """ Total Dice
			
 
				+
			
 
				+    :param predictions: Predicted labels
			
 
				+    :param targets: Ground truth labels
			
 
				+    :param smooth: Smooth term for denominator
			
 
				+    :param top_smooth: Smooth term for numerator
			
 
				+    :return: loss
			
 
				+    """
			
 
				     n_len = len(predictions.get_shape())
			
 
				     reduce_axis = list(range(1, n_len-1))
			
 
				     targets = tf.reduce_sum(targets, axis=-1)
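The expression `(2.0 * intersection + top_smooth) / (denominator + smooth)` in `dice_loss` above can be illustrated on flat binary masks. This is a minimal pure-Python sketch for a single class and a single sample, assuming the denominator is the plain sum of both masks; `dice_score` and `dice_loss_1d` are hypothetical names, and the repository's version reduces over spatial axes of TensorFlow tensors instead.

```python
def dice_score(pred, target, smooth=1e-5, top_smooth=0.0):
    """Dice coefficient: twice the overlap divided by the total mass of both masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return (2.0 * intersection + top_smooth) / (denominator + smooth)


def dice_loss_1d(pred, target, smooth=1e-5, top_smooth=0.0):
    """Loss form used for training: one minus the Dice coefficient."""
    return 1.0 - dice_score(pred, target, smooth=smooth, top_smooth=top_smooth)


pred = [1.0, 1.0, 0.0, 0.0]
target = [1.0, 0.0, 0.0, 0.0]
print(dice_score(pred, target))    # one overlapping voxel out of three in total
print(dice_loss_1d(pred, target))
```

Note that with `top_smooth=0.0`, two empty masks score 0 rather than 1, because `smooth` appears only in the denominator.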
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/model/model_fn.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/model/model_fn.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,8 +12,7 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				-import os
			
 
				-
			
 
				+""" Model function in charge of collecting metrics and feeding them to the optimizer """
			
 
				 import horovod.tensorflow as hvd
			
 
				 import tensorflow as tf
			
 
				 
			
@@ -23,49 +22,71 @@ from dataset.data_loader import CLASSES
 
				 
			
 
				 
			
 
				 def unet_3d(features, labels, mode, params):
			
 
				+    """ Gather loss and feed it to the optimizer
			
 
				+
			
 
				+    :param features: Input features
			
 
				+    :param labels: Input labels
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Estimator spec
			
 
				+    """
			
 
				+    # TODO: Find a better way to handle the empty params namespace
			
 
				+    try:
			
 
				+        normalization = params.normalization
			
 
				+    except AttributeError:
			
 
				+        normalization = 'instancenorm'
			
 
				+
			
 
				+    input_node = tf.identity(features, name='input_node')
			
 
				+
			
 
				+    logits = Builder(n_classes=4, normalization=normalization, mode=mode)(input_node)
			
 
				 
			
 
				-    logits = Builder(n_classes=4, normalization=params.normalization, mode=mode)(features)
			
 
				+    logits = tf.identity(logits, name='output_node')
			
 
				 
			
 
				     if mode == tf.estimator.ModeKeys.PREDICT:
			
 
				-        prediction = tf.argmax(input=logits, axis=-1, output_type=tf.dtypes.int32)
			
 
				+        prediction = tf.argmax(input=logits, axis=-1, output_type=tf.dtypes.int32, name="predictions")
			
 
				         return tf.estimator.EstimatorSpec(mode=mode,
			
 
				                                           predictions={'predictions': tf.cast(prediction, tf.int8)})
			
 
				 
			
 
				     labels = tf.cast(labels, tf.float32)
			
 
				-    if not params.include_background:
			
 
				-        labels = labels[..., 1:]
			
 
				-        logits = logits[..., 1:]
			
 
				 
			
 
				     if mode == tf.estimator.ModeKeys.EVAL:
			
 
				-        eval_acc = eval_dice(y_true=labels, y_pred=tf.round(logits))
			
 
				-        total_eval_acc = total_dice(tf.round(logits), labels)
			
 
				-        metrics = {CLASSES[i]: tf.metrics.mean(eval_acc[i]) for i in range(eval_acc.shape[-1])}
			
 
				-        metrics['WholeTumor'] = tf.metrics.mean(total_eval_acc)
			
 
				+        prediction = tf.argmax(input=logits, axis=-1, output_type=tf.dtypes.int32)
			
 
				+        prediction = tf.one_hot(prediction, 4)
			
 
				+        if not params.include_background:
			
 
				+            labels = labels[..., 1:]
			
 
				+            prediction = prediction[..., 1:]
			
 
				+        prediction = tf.cast(prediction, tf.float32)
			
 
				+        eval_acc = eval_dice(y_true=labels, y_pred=prediction)
			
 
				+        total_eval_acc = total_dice(prediction, labels)
			
 
				+        metrics = {CLASSES[i]: tf.compat.v1.metrics.mean(eval_acc[i]) for i in range(eval_acc.shape[-1])}
			
 
				+        metrics['WholeTumor'] = tf.compat.v1.metrics.mean(total_eval_acc)
			
 
				         return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(eval_acc),
			
 
				                                           eval_metric_ops=metrics)
			
 
				 
			
 
				+    if not params.include_background:
			
 
				+        labels = labels[..., 1:]
			
 
				+        logits = logits[..., 1:]
			
 
				+
			
 
				     loss = make_loss(params, y_pred=logits, y_true=labels)
			
 
				     loss = tf.identity(loss, name="total_loss_ref")
			
 
				 
			
 
				     global_step = tf.compat.v1.train.get_or_create_global_step()
			
 
				+    boundaries = [params.max_steps // (2 * hvd.size()),
			
 
				+                  params.max_steps // (2 * hvd.size()),
			
 
				+                  3 * params.max_steps // (4 * hvd.size())]
			
 
				 
			
 
				-    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=params.learning_rate)
			
 
				-    optimizer = hvd.DistributedOptimizer(optimizer)
			
 
				+    lr = params.learning_rate
			
 
				+    values = [lr / 4, lr, lr / 5, lr / 20]
			
 
				+    learning_rate = tf.compat.v1.train.piecewise_constant(global_step, boundaries, values)
			
 
				+    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
			
 
				 
			
 
				-    # NGC has TF_ENABLE_AUTO_MIXED_PRECISION enabled by default. We cannot use
			
 
				-    # both graph_rewrite and envar, so if we're not in NGC we do graph_rewrite
			
 
				-    try:
			
 
				-        amp_envar = int(os.environ['TF_ENABLE_AUTO_MIXED_PRECISION']) == 1
			
 
				-    except KeyError:
			
 
				-        amp_envar = False
			
 
				+    if params.use_amp:
			
 
				+        loss_scale = tf.train.experimental.DynamicLossScale()
			
 
				+        optimizer = tf.compat.v1.train.experimental.MixedPrecisionLossScaleOptimizer(optimizer, loss_scale)
			
 
				 
			
 
				-    if params.use_amp and not amp_envar:
			
 
				-        optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(
			
 
				-            optimizer,
			
 
				-            loss_scale='dynamic'
			
 
				-        )
			
 
				+    optimizer = hvd.DistributedOptimizer(optimizer)
			
 
				 
			
 
				-    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
			
 
				+    with tf.control_dependencies(tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)):
			
 
				         train_op = optimizer.minimize(loss, global_step=global_step)
			
 
				 
			
 
				     return tf.estimator.EstimatorSpec(
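
Note on the learning-rate schedule added in this hunk: the sketch below models it in plain Python, assuming the documented interval semantics of `tf.compat.v1.train.piecewise_constant` (value `i` applies while `step <= boundaries[i]`, and the last value applies afterwards). The function name `piecewise_lr` and the `num_workers` argument (standing in for `hvd.size()`) are hypothetical. In this model, because the first two boundaries are identical, the interval for the second value (`lr`) is empty.

```python
from bisect import bisect_left


def piecewise_lr(step, max_steps, base_lr, num_workers):
    """Pure-Python model of the schedule; num_workers stands in for hvd.size()."""
    boundaries = [max_steps // (2 * num_workers),
                  max_steps // (2 * num_workers),
                  3 * max_steps // (4 * num_workers)]
    values = [base_lr / 4, base_lr, base_lr / 5, base_lr / 20]
    # values[i] applies while step <= boundaries[i]; the last value applies afterwards.
    return values[bisect_left(boundaries, step)]
```

For example, with `max_steps=16000` and 8 workers the boundaries are 1000, 1000 and 1500, so the modelled rate is `lr / 4` up to step 1000, `lr / 5` up to step 1500, and `lr / 20` after that.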
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/model/unet3d.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/model/unet3d.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,71 +12,84 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" UNet3D model construction """
			
 
				 from model.layers import downsample_block, upsample_block, output_layer, input_block
			
 
				 
			
 
				 
			
 
				-class Builder:
			
 
				+class Builder: # pylint: disable=R0903
			
 
				+    """ Model builder """
			
 
				     def __init__(self, n_classes, mode, normalization='none'):
			
 
				+        """ Configure the unet3d builder
			
 
				+
			
 
				+        :param n_classes: Number of output channels
			
 
				+        :param mode: Estimator's execution mode
			
 
				+        :param normalization: Name of the normalization layer
			
 
				+        """
			
 
				         self._n_classes = n_classes
			
 
				         self._mode = mode
			
 
				         self._normalization = normalization
			
 
				 
			
 
				     def __call__(self, features):
			
 
				-        skip_128 = input_block(x=features,
			
 
				+        """ Build UNet3D
			
 
				+
			
 
				+        :param features: Input features
			
 
				+        :return: Output of the graph
			
 
				+        """
			
 
				+        skip_128 = input_block(inputs=features,
			
 
				                                out_channels=32,
			
 
				                                normalization=self._normalization,
			
 
				                                mode=self._mode)
			
 
				 
			
 
				-        skip_64 = downsample_block(x=skip_128,
			
 
				+        skip_64 = downsample_block(inputs=skip_128,
			
 
				                                    out_channels=64,
			
 
				                                    normalization=self._normalization,
			
 
				                                    mode=self._mode)
			
 
				 
			
 
				-        skip_32 = downsample_block(x=skip_64,
			
 
				+        skip_32 = downsample_block(inputs=skip_64,
			
 
				                                    out_channels=128,
			
 
				                                    normalization=self._normalization,
			
 
				                                    mode=self._mode)
			
 
				 
			
 
				-        skip_16 = downsample_block(x=skip_32,
			
 
				+        skip_16 = downsample_block(inputs=skip_32,
			
 
				                                    out_channels=256,
			
 
				                                    normalization=self._normalization,
			
 
				                                    mode=self._mode)
			
 
				 
			
 
				-        skip_8 = downsample_block(x=skip_16,
			
 
				+        skip_8 = downsample_block(inputs=skip_16,
			
 
				                                   out_channels=320,
			
 
				                                   normalization=self._normalization,
			
 
				                                   mode=self._mode)
			
 
				 
			
 
				-        x = downsample_block(x=skip_8,
			
 
				+        out = downsample_block(inputs=skip_8,
			
 
				+                               out_channels=320,
			
 
				+                               normalization=self._normalization,
			
 
				+                               mode=self._mode)
			
 
				+
			
 
				+        out = upsample_block(out, skip_8,
			
 
				                              out_channels=320,
			
 
				                              normalization=self._normalization,
			
 
				                              mode=self._mode)
			
 
				 
			
 
				-        x = upsample_block(x, skip_8,
			
 
				-                           out_channels=320,
			
 
				-                           normalization=self._normalization,
			
 
				-                           mode=self._mode)
			
 
				-
			
 
				-        x = upsample_block(x, skip_16,
			
 
				-                           out_channels=256,
			
 
				-                           normalization=self._normalization,
			
 
				-                           mode=self._mode)
			
 
				+        out = upsample_block(out, skip_16,
			
 
				+                             out_channels=256,
			
 
				+                             normalization=self._normalization,
			
 
				+                             mode=self._mode)
			
 
				 
			
 
				-        x = upsample_block(x, skip_32,
			
 
				-                           out_channels=128,
			
 
				-                           normalization=self._normalization,
			
 
				-                           mode=self._mode)
			
 
				+        out = upsample_block(out, skip_32,
			
 
				+                             out_channels=128,
			
 
				+                             normalization=self._normalization,
			
 
				+                             mode=self._mode)
			
 
				 
			
 
				-        x = upsample_block(x, skip_64,
			
 
				-                           out_channels=64,
			
 
				-                           normalization=self._normalization,
			
 
				-                           mode=self._mode)
			
 
				+        out = upsample_block(out, skip_64,
			
 
				+                             out_channels=64,
			
 
				+                             normalization=self._normalization,
			
 
				+                             mode=self._mode)
			
 
				 
			
 
				-        x = upsample_block(x, skip_128,
			
 
				-                           out_channels=32,
			
 
				-                           normalization=self._normalization,
			
 
				-                           mode=self._mode)
			
 
				+        out = upsample_block(out, skip_128,
			
 
				+                             out_channels=32,
			
 
				+                             normalization=self._normalization,
			
 
				+                             mode=self._mode)
			
 
				 
			
 
				-        return output_layer(x=x,
			
 
				+        return output_layer(out,
			
 
				                             out_channels=self._n_classes,
			
 
				                             activation='softmax')
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/runtime/arguments.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/runtime/arguments.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,6 +12,7 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Command line argument parsing """
			
 
				 import argparse
			
 
				 
			
 
				 PARSER = argparse.ArgumentParser(description="UNet-3D")
			
@@ -33,12 +34,14 @@ PARSER.add_argument('--normalization', choices=['instancenorm', 'batchnorm', 'gr
 
				                     default='instancenorm', type=str)
			
 
				 PARSER.add_argument('--include_background', dest='include_background', action='store_true', default=False)
			
 
				 PARSER.add_argument('--resume_training', dest='resume_training', action='store_true', default=False)
			
 
				+PARSER.add_argument('--seed', default=0, type=int)
			
 
				 
			
 
				 # Augmentations
			
 
				 PARSER.add_argument('--augment', dest='augment', action='store_true', default=False)
			
 
				 
			
 
				 # Dataset flags
			
 
				 PARSER.add_argument('--data_dir', required=True, type=str)
			
 
				+PARSER.add_argument('--input_shape', nargs='+', type=int, default=[128, 128, 128])
			
 
				 PARSER.add_argument('--batch_size', default=1, type=int)
			
 
				 PARSER.add_argument('--fold', default=0, type=int)
			
 
				 PARSER.add_argument('--num_folds', default=5, type=int)
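
Note on the two options added in this hunk: a minimal sketch of how `--seed` and `--input_shape` parse, using a hypothetical stand-alone parser that mirrors only these two arguments. Since `nargs='+'` accepts one or more integers, the parser itself does not enforce exactly three dimensions.

```python
import argparse

# Hypothetical stand-alone parser that mirrors only the two options added in this hunk.
parser = argparse.ArgumentParser(description="UNet-3D (sketch)")
parser.add_argument('--seed', default=0, type=int)
parser.add_argument('--input_shape', nargs='+', type=int, default=[128, 128, 128])

# No flags given: both options fall back to their defaults.
defaults = parser.parse_args([])

# Space-separated integers after --input_shape are collected into a list.
custom = parser.parse_args(['--seed', '42', '--input_shape', '64', '64', '32'])
```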
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/runtime/hooks.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/runtime/hooks.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,6 +12,7 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Hooks for metric collection and benchmarking """
			
 
				 import time
			
 
				 
			
 
				 import numpy as np
			
@@ -20,33 +21,119 @@ import horovod.tensorflow as hvd
 
				 
			
 
				 
			
 
				 def get_hooks(params, logger):
			
 
				+    """ Get the appropriate set of hooks given the configuration
			
 
				+
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :param logger: Logger object
			
 
				+    :return: Set of hooks
			
 
				+    """
			
 
				+
			
 
				+    hooks = []
			
 
				+
			
 
				+    if params.exec_mode == 'debug_train':
			
 
				+        return get_debug_training_hooks(logger, params)
			
 
				+
			
 
				+    if params.exec_mode == 'debug_predict':
			
 
				+        return get_debug_predict_hooks(logger, params)
			
 
				+
			
 
				     if 'train' in params.exec_mode:
			
 
				-        hooks = [hvd.BroadcastGlobalVariablesHook(0)]
			
 
				-        if hvd.rank() == 0:
			
 
				-            if params.benchmark:
			
 
				-                hooks += [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				-                                        global_batch_size=hvd.size() * params.batch_size,
			
 
				-                                        logger=logger,
			
 
				-                                        mode='train')]
			
 
				-            else:
			
 
				-                hooks += [TrainingHook(log_every=params.log_every,
			
 
				-                                       logger=logger,
			
 
				-                                       tensor_names=['total_loss_ref:0'])]
			
 
				-        return hooks
			
 
				-
			
 
				-    elif 'predict' == params.exec_mode:
			
 
				-        hooks = []
			
 
				-        if hvd.rank() == 0:
			
 
				-            if params.benchmark:
			
 
				-                hooks += [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				-                                        global_batch_size=params.batch_size,
			
 
				-                                        logger=logger,
			
 
				-                                        mode='test')]
			
 
				-            return hooks
			
 
				+        return get_training_hooks(logger, params)
			
 
				+
			
 
				+    if params.exec_mode == 'predict':
			
 
				+        return get_predict_hooks(logger, params)
			
 
				+
			
 
				+    return hooks
			
 
				+
			
 
				+
			
 
				+def get_debug_predict_hooks(logger, params):
			
 
				+    """ Return hooks for debugging prediction
			
 
				+
			
 
				+    :param logger: Logger object
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Estimator hooks
			
 
				+    """
			
 
				+    hooks = []
			
 
				+    if hvd.rank() == 0:
			
 
				+        hooks += [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				+                                global_batch_size=params.batch_size,
			
 
				+                                logger=logger,
			
 
				+                                mode='inference')]
			
 
				+    return hooks
			
 
				+
			
 
				+
			
 
				+def get_debug_training_hooks(logger, params):
			
 
				+    """ Return hooks for debugging training
			
 
				+
			
 
				+    :param logger: Logger object
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Estimator hooks
			
 
				+    """
			
 
				+    hooks = [hvd.BroadcastGlobalVariablesHook(0)]
			
 
				+    if hvd.rank() == 0:
			
 
				+        hooks += [TrainingHook(log_every=params.log_every,
			
 
				+                               logger=logger,
			
 
				+                               tensor_names=['total_loss_ref:0']),
			
 
				+                  ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				+                                global_batch_size=hvd.size() * params.batch_size,
			
 
				+                                logger=logger,
			
 
				+                                mode='train')]
			
 
				+    return hooks
			
 
				+
			
 
				+
			
 
				+def get_predict_hooks(logger, params):
			
 
				+    """ Return hooks for prediction
			
 
				+
			
 
				+    :param logger: Logger object
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Estimator hooks
			
 
				+    """
			
 
				+    hooks = []
			
 
				+
			
 
				+    if hvd.rank() == 0:
			
 
				+        if params.benchmark:
			
 
				+            hooks = [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				+                                   global_batch_size=params.batch_size,
			
 
				+                                   logger=logger,
			
 
				+                                   mode='test')]
			
 
				+    return hooks
			
 
				+
			
 
				+
			
 
				+def get_training_hooks(logger, params):
			
 
				+    """ Return hooks for training
			
 
				+
			
 
				+    :param logger: Logger object
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Estimator hooks
			
 
				+    """
			
 
				+    hooks = [hvd.BroadcastGlobalVariablesHook(0)]
			
 
				+
			
 
				+    if hvd.rank() == 0:
			
 
				+        hooks += [OomReportingHook()]
			
 
				+
			
 
				+        if params.benchmark:
			
 
				+            hooks += [ProfilingHook(warmup_steps=params.warmup_steps,
			
 
				+                                    global_batch_size=hvd.size() * params.batch_size,
			
 
				+                                    logger=logger,
			
 
				+                                    mode='train')]
			
 
				+        else:
			
 
				+            hooks += [TrainingHook(log_every=params.log_every,
			
 
				+                                   logger=logger,
			
 
				+                                   tensor_names=['total_loss_ref:0'])]
			
 
				+
			
 
				+    return hooks
			
 
				 
			
 
				 
			
 
				 class ProfilingHook(tf.estimator.SessionRunHook):
			
 
				+    """ Hook for profiling metrics """
			
 
				+
			
 
				     def __init__(self, warmup_steps, global_batch_size, logger, mode):
			
 
				+        """ Build hook
			
 
				+
			
 
				+        :param warmup_steps: Number of steps to skip initially
			
 
				+        :param global_batch_size: Number of samples per batch across all GPUs
			
 
				+        :param logger: Logger object
			
 
				+        :param mode: Estimator's execution mode
			
 
				+        """
			
 
				         self._warmup_steps = warmup_steps
			
 
				         self._global_batch_size = global_batch_size
			
 
				         self._step = 0
			
@@ -54,57 +141,86 @@ class ProfilingHook(tf.estimator.SessionRunHook):
 
				         self._logger = logger
			
 
				         self._mode = mode
			
 
				 
			
 
				-    def before_run(self, run_context):
			
 
				+    def before_run(self, _):
			
 
				+        """ Execute before run """
			
 
				         self._step += 1
			
 
				         if self._step >= self._warmup_steps:
			
 
				             self._timestamps.append(time.time())
			
 
				 
			
 
				-    def end(self, session):
			
 
				+    def end(self, _):
			
 
				+        """ Execute on completion """
			
 
				         deltas = np.array([self._timestamps[i + 1] - self._timestamps[i] for i in range(len(self._timestamps) - 1)])
			
 
				         stats = process_performance_stats(np.array(deltas),
			
 
				                                           self._global_batch_size,
			
 
				                                           self._mode)
			
 
				 
			
 
				-        self._logger.log(step=(), data={metric: float(value) for (metric, value) in stats})
			
 
				+        self._logger.log(step=(), data=stats)
			
 
				         self._logger.flush()
			
 
				 
			
 
				 
			
 
				 class TrainingHook(tf.estimator.SessionRunHook):
			
 
				+    """ Hook for training metrics """
			
 
				+
			
 
				     def __init__(self, log_every, logger, tensor_names):
			
 
				+        """ Build hook for training
			
 
				+
			
 
				+        :param log_every: Logging frequency
			
 
				+        :param logger: Logger object
			
 
				+        :param tensor_names: Names of the tensors to log
			
 
				+        """
			
 
				         self._log_every = log_every
			
 
				         self._step = 0
			
 
				         self._logger = logger
			
 
				         self._tensor_names = tensor_names
			
 
				 
			
 
				-    def before_run(self, run_context):
			
 
				-        run_args = tf.train.SessionRunArgs(
			
 
				+    def before_run(self, _):
			
 
				+        """ Execute before run """
			
 
				+        run_args = tf.compat.v1.train.SessionRunArgs(
			
 
				             fetches=self._tensor_names
			
 
				         )
			
 
				 
			
 
				         return run_args
			
 
				 
			
 
				     def after_run(self,
			
 
				-                  run_context,
			
 
				+                  _,
			
 
				                   run_values):
			
 
				+        """ Execute after run
			
 
				+
			
 
				+        :param run_values: Values to capture
			
 
				+        :return:
			
 
				+        """
			
 
				         if self._step % self._log_every == 0:
			
 
				             for i in range(len(self._tensor_names)):
			
 
				                 self._logger.log(step=(self._step,), data={self._tensor_names[i]: str(run_values.results[i])})
			
 
				         self._step += 1
			
 
				 
			
 
				-    def end(self, session):
			
 
				+    def end(self, _):
			
 
				+        """ Execute on completion """
			
 
				         self._logger.flush()
			
 
				 
			
 
				 
			
 
				+class OomReportingHook(tf.estimator.SessionRunHook):  # pylint: disable=R0903
			
 
				+    """ Report tensor allocations on out-of-memory errors """
			
 
				+
			
 
				+    def before_run(self, _):  # pylint: disable=R0201
			
 
				+        """ Execute before run """
			
 
				+        return tf.estimator.SessionRunArgs(fetches=[],  # no extra fetches
			
 
				+                                           options=tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True))
			
 
				+
			
 
				+
			
 
				 def process_performance_stats(timestamps, batch_size, mode):
			
 
				+    """ Compute throughput and latency statistics
			
 
				+
			
 
				+    :param timestamps: Array of per-step durations in seconds
			
 
				+    :param batch_size: Number of samples per batch
			
 
				+    :param mode: Estimator's execution mode
			
 
				+    :return: Stats
			
 
				+    """
			
 
				     timestamps_ms = 1000 * timestamps
			
 
				-    latency_ms = timestamps_ms.mean()
			
 
				-    std = timestamps_ms.std()
			
 
				-    n = np.sqrt(len(timestamps_ms))
			
 
				     throughput_imgps = (1000.0 * batch_size / timestamps_ms).mean()
			
 
				+    stats = {f"throughput_{mode}": throughput_imgps,
			
 
				+             f"latency_{mode}_mean": timestamps_ms.mean()}
			
 
				+    for level in [90, 95, 99]:
			
 
				+        stats.update({f"latency_{mode}_{level}": np.percentile(timestamps_ms, level)})
			
 
				 
			
 
				-    stats = [("throughput_{}".format(mode), str(throughput_imgps)),
			
 
				-             ('latency_{}:'.format(mode), str(latency_ms))]
			
 
				-    for ci, lvl in zip(["90%:", "95%:", "99%:"],
			
 
				-                       [1.645, 1.960, 2.576]):
			
 
				-        stats.append(("Latency_{} ".format(mode) + ci, str(latency_ms + lvl * std / n)))
			
 
				     return stats
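
Note on the rewritten `process_performance_stats`: the removed code reported `mean + z * std / sqrt(n)` for z = 1.645, 1.960 and 2.576, which is an upper confidence bound on the mean latency, whereas the added code reports empirical latency percentiles. The stand-alone sketch below reproduces the new computation; the name `latency_stats` and the sample durations are made up for illustration.

```python
import numpy as np


def latency_stats(step_durations_s, batch_size, mode):
    """Summarise per-step durations (seconds) as throughput and latency in ms."""
    durations_ms = 1000.0 * np.asarray(step_durations_s, dtype=float)
    stats = {f"throughput_{mode}": float((1000.0 * batch_size / durations_ms).mean()),
             f"latency_{mode}_mean": float(durations_ms.mean())}
    # Empirical percentiles of the observed step latencies.
    for level in (90, 95, 99):
        stats[f"latency_{mode}_{level}"] = float(np.percentile(durations_ms, level))
    return stats


# Four illustrative step durations in seconds, batch size 2.
example = latency_stats([0.10, 0.10, 0.20, 0.40], batch_size=2, mode="test")
```

The result is a plain dict of floats, which is why the hook can now pass `stats` directly to `logger.log` without the earlier conversion step.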
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/runtime/parse_results.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/runtime/parse_results.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,11 +12,17 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Parsing of results"""
			
 
				 import os
			
 
				 import argparse
			
 
				 
			
 
				 
			
 
				 def parse_convergence_results(path, environment):
			
 
				+    """ Parse convergence results utility
			
 
				+
			
 
				+    :param path: Path to results
			
 
				+    :param environment: System environment
			
 
				+    """
			
 
				     whole_tumor = []
			
 
				     tumor_core = []
			
 
				     peritumoral_edema = []
			
@@ -26,8 +32,8 @@ def parse_convergence_results(path, environment):
 
				     if not logfiles:
			
 
				         raise FileNotFoundError("No logfile found at {}".format(path))
			
 
				     for logfile in logfiles:
			
 
				-        with open(os.path.join(path, logfile), "r") as f:
			
 
				-            content = f.readlines()
			
 
				+        with open(os.path.join(path, logfile), "r") as file_item:
			
 
				+            content = file_item.readlines()
			
 
				         if "TumorCore" not in content[-1]:
			
 
				             print("Evaluation score not found. The file", logfile, "might be corrupted.")
			
 
				             continue
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/runtime/setup.py
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/runtime/setup.py
@@ -1,4 +1,4 @@
 
				-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
			
 
				+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
			
 
				 #
			
 
				 # Licensed under the Apache License, Version 2.0 (the "License");
			
 
				 # you may not use this file except in compliance with the License.
			
@@ -12,19 +12,22 @@
 
				 # See the License for the specific language governing permissions and
			
 
				 # limitations under the License.
			
 
				 
			
 
				+""" Utils for setting up different parts of the execution """
			
 
				 import os
			
 
				-import pickle
			
 
				-import shutil
			
 
				+import multiprocessing
			
 
				 
			
 
				+import numpy as np
			
 
				 import dllogger as logger
			
 
				-import tensorflow as tf
			
 
				-import horovod.tensorflow as hvd
			
 
				 from dllogger import StdOutBackend, Verbosity, JSONStreamBackend
			
 
				 
			
 
				-from model.model_fn import unet_3d
			
 
				+import tensorflow as tf
			
 
				+import horovod.tensorflow as hvd
			
 
				 
			
 
				 
			
 
				 def set_flags():
			
 
				+    """ Set necessary flags for execution """
			
 
				+    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
			
 
				+
			
 
				     os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
			
 
				     os.environ['CUDA_CACHE_DISABLE'] = '1'
			
 
				     os.environ['HOROVOD_GPU_ALLREDUCE'] = 'NCCL'
			
@@ -34,10 +37,16 @@ def set_flags():
 
				     os.environ['TF_ADJUST_SATURATION_FUSED'] = '1'
			
 
				     os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
			
 
				     os.environ['TF_SYNC_ON_FINISH'] = '0'
			
 
				+    os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '0'
			
 
				 
			
 
				 
			
 
				 def prepare_model_dir(params):
			
 
				-    model_dir = os.path.join(params.model_dir, "model_chckpt")
			
 
				+    """ Prepare the directory where checkpoints are stored
			
 
				+
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: Path to model dir
			
 
				+    """
			
 
				+    model_dir = os.path.join(params.model_dir, "model_checkpoint")
			
 
				     model_dir = model_dir if (hvd.rank() == 0 and not params.benchmark) else None
			
 
				     if model_dir is not None:
			
 
				         os.makedirs(model_dir, exist_ok=True)
			
@@ -47,14 +56,25 @@ def prepare_model_dir(params):
 
				     return model_dir
			
 
				 
			
 
				 
			
 
				-def build_estimator(params, model_dir):
			
 
				+def build_estimator(params, model_fn):
			
 
				+    """ Build estimator
			
 
				+
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :param model_fn: Model graph
			
 
				+    :return: Estimator
			
 
				+    """
			
 
				+    np.random.seed(params.seed)
			
 
				+    tf.compat.v1.random.set_random_seed(params.seed)
			
 
				+    model_dir = prepare_model_dir(params)
			
 
				     config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(), allow_soft_placement=True)
			
 
				 
			
 
				     if params.use_xla:
			
 
				-        config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
			
 
				+        config.graph_options.optimizer_options.global_jit_level = tf.compat.v1.OptimizerOptions.ON_1
			
 
				 
			
 
				     config.gpu_options.allow_growth = True
			
 
				     config.gpu_options.visible_device_list = str(hvd.local_rank())
			
 
				+    config.intra_op_parallelism_threads = 1
			
 
				+    config.inter_op_parallelism_threads = max(2, (multiprocessing.cpu_count() // hvd.size()) - 2)
			
 
				 
			
 
				     if params.use_amp:
			
 
				         config.graph_options.rewrite_options.auto_mixed_precision = 1
			
@@ -63,18 +83,23 @@ def build_estimator(params, model_dir):
 
				     checkpoint_steps = checkpoint_steps if not params.benchmark else None
			
 
				     run_config = tf.estimator.RunConfig(
			
 
				         save_summary_steps=params.max_steps,
			
 
				+        tf_random_seed=params.seed,
			
 
				         session_config=config,
			
 
				         save_checkpoints_steps=checkpoint_steps,
			
 
				         keep_checkpoint_max=1)
			
 
				 
			
 
				-    return tf.estimator.Estimator(
			
 
				-        model_fn=unet_3d,
			
 
				-        model_dir=model_dir,
			
 
				-        config=run_config,
			
 
				-        params=params)
			
 
				+    return tf.estimator.Estimator(model_fn=model_fn,
			
 
				+                                  model_dir=model_dir,
			
 
				+                                  config=run_config,
			
 
				+                                  params=params)
			
 
				 
			
 
				 
			
 
				 def get_logger(params):
			
 
				+    """ Get logger object
			
 
				+
			
 
				+    :param params: Dict with additional parameters
			
 
				+    :return: logger
			
 
				+    """
			
 
				     backends = []
			
 
				     if hvd.rank() == 0:
			
 
				         backends += [StdOutBackend(Verbosity.VERBOSE)]
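
Note on the thread settings added to `build_estimator`: the inter-op thread count is each worker's even share of the CPUs minus two, with a floor of two. A small sketch of that arithmetic, where `inter_op_threads`, `cpu_count` and `num_workers` are hypothetical names standing in for the expression in the diff, `multiprocessing.cpu_count()` and `hvd.size()`:

```python
def inter_op_threads(cpu_count, num_workers):
    """Per-worker inter-op threads: an even CPU share minus two, never below two."""
    return max(2, cpu_count // num_workers - 2)
```

For instance, 64 CPUs shared by 8 workers gives 6 threads per worker, while 8 CPUs shared by 8 workers hits the floor of 2.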
			
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_infer_benchmark.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_infer_benchmark.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_infer_benchmark_TF-AMP.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_infer_benchmark_TF-AMP.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_benchmark.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_benchmark.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_benchmark_TF-AMP.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_benchmark_TF-AMP.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_full.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_full.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_full_TF-AMP.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_full_TF-AMP.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_single.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_single.sh
--- a/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_single_TF-AMP.sh
+++ b/TensorFlow/Segmentation/UNet_3D_Medical/examples/unet3d_train_single_TF-AMP.sh