# Deploying the ResNeXt101-32x4d model using Triton Inference Server

The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server.

This folder contains instructions on how to deploy and run inference on Triton Inference Server as well as gather detailed performance analysis.
The ResNeXt101-32x4d is a model introduced in the [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf) paper. It is based on the regular ResNet model, replacing the 3x3 convolutions inside the bottleneck block with 3x3 grouped convolutions.
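To illustrate why grouped convolutions matter, here is a minimal sketch of the weight count of a dense versus a grouped 3x3 convolution. The channel count of 128 and the 32-group/4-channels-per-group split are illustrative of the "32x4d" naming, not values taken from this repository:

```python
def conv3x3_params(c_in, c_out, groups=1):
    """Weight count of a 3x3 convolution layer (bias omitted)."""
    assert c_in % groups == 0
    # Each output channel only sees c_in // groups input channels.
    return c_out * (c_in // groups) * 3 * 3

# Dense 3x3 convolution over 128 channels (plain ResNet-style bottleneck):
dense = conv3x3_params(128, 128)
# The same layer split into 32 groups of width 4 (the "32x4d" layout):
grouped = conv3x3_params(128, 128, groups=32)

print(dense, grouped, dense // grouped)  # -> 147456 4608 32
```

Splitting into 32 groups cuts the weights of this layer by a factor of 32, which is what lets ResNeXt widen the bottleneck without a matching increase in parameters.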

The ResNeXt101-32x4d model can be deployed for inference on the [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) using TorchScript, ONNX Runtime or TensorRT as an execution backend.

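As a sketch of how an execution backend is selected, the model's `config.pbtxt` in the Triton model repository names a `platform` (`pytorch_libtorch` for TorchScript, `onnxruntime_onnx` for ONNX Runtime, `tensorrt_plan` for TensorRT). The model name, tensor names, shapes, and batch size below are illustrative assumptions, not values from this repository:

```
name: "resnext101-32x4d"
platform: "pytorch_libtorch"
max_batch_size: 32
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Switching backends amounts to exporting the model in the matching format and changing the `platform` field accordingly.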
## Setup