
DEEP LEARNING INFERENCE SOLUTIONS


Exxact Deep Learning Inference Servers


HIGH-THROUGHPUT INFERENCE

Exxact Deep Learning Inference Servers are optimized for use in image and video search, video analytics, object classification and detection, and a host of other usages.


LOW-LATENCY INFERENCE

Exxact Deep Learning Inference Servers cater to real-time use cases involving multiple inferences per query, such as automatic speech recognition, speech-to-text, natural language processing, and more.


MAXIMUM EFFICIENCY

The Turing-based Tesla T4 offers efficiency far exceeding that of either the Tesla P4 or the Tesla V100. With its small form factor and 70-watt (W) power footprint, it's the perfect GPU for inference solutions.


HIGH PERFORMANCE HARDWARE

From NVIDIA T4 inference GPUs to Xilinx FPGA accelerators, Exxact Inference Solutions are built to meet your most demanding deep learning inference tasks.


GET MORE DONE

Have peace of mind and focus on what matters most, knowing your system is backed by a 3-year warranty and support.


PREINSTALLED TOOLS

Every system ships with the NVIDIA TensorRT Inference Server, a production-ready inference server that supports multiple framework backends (TensorFlow, Caffe2, TensorRT, TensorRT/TensorFlow integrated models, ONNX, and custom backends). A sample model configuration is sketched below.

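For reference, the inference server loads models from a model repository on disk, where each model carries a small text configuration. Below is a minimal sketch of a config.pbtxt for a TensorRT model; the model name, tensor names, and shapes are hypothetical, and the exact fields should be checked against the TensorRT Inference Server documentation for your release.

```protobuf
# models/resnet50_trt/config.pbtxt -- hypothetical model, tensor names, and shapes
name: "resnet50_trt"
platform: "tensorrt_plan"  # other platform values select the TensorFlow, Caffe2, ONNX, or custom backends
max_batch_size: 128        # the server batches incoming requests up to this size
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]  # CHW image tensor
  }
]
output [
  {
    name: "prob"
    data_type: TYPE_FP32
    dims: [ 1000 ]         # class probabilities
  }
]
```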

Suggested Exxact Deep Learning Inference Systems


Entry-Level

Tensor Server

2x Intel Xeon Scalable (Silver)
4x NVIDIA Tesla T4
256GB Memory
1x 2TB SSD (OS/Data)

CONTACT SALES FOR PRICING

START YOUR ORDER

Mid-Range

Tensor Server

2x Intel Xeon Scalable (Gold)
8x NVIDIA Tesla T4
512GB Memory
1x 2TB SSD (OS)
up to 5x 2TB SSD (Data)

CONTACT SALES FOR PRICING

START YOUR ORDER

High-End

Tensor Server

2x Intel Xeon Scalable (Gold)
20x NVIDIA Tesla T4
512GB Memory
1x 2TB SSD (OS)
up to 5x 2TB SSD (Data)

CONTACT SALES FOR PRICING

START YOUR ORDER

Not Sure What You Need?


Let us know what kind of project you have planned and we can help you decide.


TELL US WHAT YOU WANT TO DO

NVIDIA TensorRT Hyperscale Inference Platform


The NVIDIA TensorRT™ Inference Platform is designed to make deep learning accessible to every developer and data scientist anywhere in the world. Utilizing the new Turing architecture, the Tesla T4 accelerates all types of neural networks for images, speech, translation, and recommendation systems. The Tesla T4 supports a wide variety of precisions and accelerates all major DL frameworks, including TensorFlow, PyTorch, MXNet, Chainer, and Caffe2.

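As a concrete example of the TensorFlow path, the sketch below uses the TF-TRT integration from the TensorFlow 1.x era (tensorflow.contrib.tensorrt) to rewrite a frozen graph so that its TensorRT-compatible subgraphs run as optimized TensorRT engines. The file and node names are hypothetical assumptions, and this API later moved out of contrib.

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration (TF 1.x)

# Load a frozen TensorFlow graph (hypothetical file name).
graph_def = tf.GraphDef()
with tf.gfile.GFile("resnet50_frozen.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with optimized TensorRT ops.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=["logits"],                # assumed output node name
    max_batch_size=128,                # largest batch served at runtime
    max_workspace_size_bytes=1 << 30,  # 1 GiB scratch for engine building
    precision_mode="FP16")             # "FP32", "FP16", or "INT8"
```
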
The NVIDIA TensorRT optimizer and runtime unlock the power of Turing GPUs across a wide range of precisions, from FP32 down to INT4. In addition, TensorRT integrates with TensorFlow and supports all major frameworks through the ONNX format (see the engine-building sketch below). The NVIDIA TensorRT Inference Server is a production-ready deep learning inference server that reduces costs by maximizing utilization of GPU servers and saves time by integrating seamlessly into production architectures.

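To make the precision options concrete, here is a minimal sketch against the TensorRT 5-era Python API that parses an ONNX model and builds a reduced-precision engine. The model file is a placeholder, the INT8 calibrator is hypothetical, and the builder attributes were renamed in later TensorRT releases, so treat this as illustrative rather than canonical.

```python
import tensorrt as trt  # NVIDIA TensorRT Python bindings

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, precision="fp16"):
    """Parse an ONNX model and build a reduced-precision TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("failed to parse " + onnx_path)

    builder.max_batch_size = 128          # largest batch the engine will serve
    builder.max_workspace_size = 1 << 30  # 1 GiB scratch for tactic selection
    if precision == "fp16":
        builder.fp16_mode = True          # FP16 compute on Turing Tensor Cores
    elif precision == "int8":
        builder.int8_mode = True          # INT8 also requires calibration data
        # builder.int8_calibrator = MyCalibrator(...)  # hypothetical calibrator

    return builder.build_cuda_engine(network)

engine = build_engine("resnet50.onnx")    # hypothetical model file
```
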
For large-scale, multi-node deployments, Kubernetes enables enterprises to scale up training and inference deployment to multi-cloud GPU clusters seamlessly. It allows software developers and DevOps engineers to automate deployment, maintenance, scheduling, and operation of multiple GPU-accelerated application containers across clusters of nodes. With Kubernetes on NVIDIA GPUs, they can build and deploy GPU-accelerated deep learning training or inference applications to heterogeneous GPU clusters and scale seamlessly.

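As a sketch of what such a deployment can look like, the manifest below asks Kubernetes (with the NVIDIA device plugin installed) to run several inference-server replicas, each limited to a single GPU through the nvidia.com/gpu resource. The names and image tag are illustrative assumptions, not a tested configuration.

```yaml
# Hypothetical Deployment: four inference-server replicas, one GPU each.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trt-inference-server
spec:
  replicas: 4
  selector:
    matchLabels:
      app: trt-inference-server
  template:
    metadata:
      labels:
        app: trt-inference-server
    spec:
      containers:
      - name: trtserver
        image: nvcr.io/nvidia/tensorrtserver:19.02-py3  # example NGC image tag
        resources:
          limits:
            nvidia.com/gpu: 1   # one GPU per pod, via the NVIDIA device plugin
        ports:
        - containerPort: 8000   # HTTP inference endpoint
        - containerPort: 8001   # gRPC inference endpoint
```
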
Exxact Deep Learning Inference Servers Maximize Performance Efficiency


Exxact Deep Learning Inference Servers powered by NVIDIA Tesla T4 GPUs bring revolutionary multi-precision inference performance to efficiently accelerate the diverse applications of modern AI. In addition, the Tesla T4 greatly outperforms its predecessor, the Tesla P4.


System configs: Xeon Scalable Processor Gold 6140 @ 3.7 GHz and a single Tesla P4 or V100; Tesla GPUs running TensorRT 4.0.1.6, Tesla T4 (pre-production) running TensorRT 5 RC; CPU running Intel OpenVINO 2018 R2; batch size 128; precision: FP32 for CPU, mixed precision (FP16 compute / FP32 accumulate) for V100, INT8 for P4 and T4.

Deep Learning Training vs Deep Learning Inference:
Which GPU is right for me?


The T4 is truly groundbreaking in performance and efficiency for deep learning inference. But how does it stack up for deep learning training? Just because you can train on a T4 doesn't mean you should. If your goal is training deep neural networks, we recommend NVIDIA Tesla V100 GPUs, and the numbers below (courtesy of NVIDIA) back that up.



RESNET-50 IMAGE TRAINING
NVIDIA TESLA V100 AND NVIDIA TESLA T4

However, if deep learning inference is your primary workload, the NVIDIA T4 is the better choice.

RESNET-50 INFERENCE LATENCY

RESNET-50 POWER EFFICIENCY


DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.0 | Batch Size = 1 | Precision: INT8 | Dataset: Synthetic. Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.0 | Batch Size = 1 | Precision: INT8 | Dataset: Synthetic.

Use Cases for Inference Solutions


DATA CENTER



SELF DRIVING CARS



INTELLIGENT VIDEO ANALYTICS



EMBEDDED DEVICES



Exxact Tensor TS2-673917-DLI 2U 2x Intel Xeon processor server - Deep Learning Inference Solution
MPN: TS2-673917-DLI
  • Rack Height: 2U
  • Processor Supported: 2x Intel Xeon Scalable Family
  • Drive Bays: 8x 3.5" Hot-Swap (2x NVMe)
  • Supports up to 4x Double-Wide cards
Contact sales for pricing
Exxact Tensor TS4-1910483-DLI 4U 2x Intel Xeon processor server - Deep Learning Inference Solution
MPN: TS4-1910483-DLI
  • Rack Height: 4U
  • Processor Supported: 2x Intel Xeon Scalable Family
  • Drive Bays: 24x 3.5" Hot-Swap
  • Supports up to 20x NVIDIA Tesla T4 GPUs
Contact sales for pricing
Exxact Tensor TWS-1686525-DLI 2x Intel Xeon CPU workstation - Deep Learning Inference Solution
MPN: TWS-1686525-DLI
  • Form Factor: 4U Rackmountable / Tower
  • Processor: 2x Intel Xeon Scalable family
  • Drive Bays: 4x 3.5"/2.5" Hot-Swap
  • Up to 5x Double-Wide cards
Contact sales for pricing