NVIDIA RTX 6000 Ada RELION Cryo-EM Benchmarks and Analysis

March 30, 2023

NVIDIA RTX 6000 Ada Benchmarks Overview

As a value-added supplier of scientific workstations and servers, Exxact regularly publishes reference benchmarks for various GPU configurations to guide cryogenic electron microscopy (cryo-EM) scientists looking to procure systems optimized for their research. In this blog post, we benchmark the NVIDIA RTX 6000 Ada in RELION and report both GPU time and total runtime.

Software Summary

RELION (REgularised LIkelihood OptimisatioN) has revolutionized the cryo-EM field since 2012. Developed by the Scheres Lab at the MRC Laboratory of Molecular Biology, this stand-alone program uses a Bayesian approach to refine macromolecular structures by single-particle analysis of electron cryo-microscopy data.

The development of RELION is supported through long-term funding by the UK Medical Research Council, and RELION is distributed under a GPLv2 license. This means that anyone (including commercial users) can download, use, and modify RELION free of charge. The MRC Laboratory simply requests that users who find RELION useful in their work cite its papers.

Exxact Benchmark System Specifications:

Processor / Count	2x AMD EPYC 9654
Physical Cores	192
Memory	768GB DDR5 Memory
Storage	3.84 TB NVMe Drive
OS	Ubuntu 20.04
CUDA Version	11.8
RELION Version	4.0

GPU Benchmarks

The benchmarks below are 3D classifications performed on the Plasmodium ribosome dataset.

When runs use RELION's --scratch_dir option, the time spent copying data to scratch is subtracted from the command's total runtime, so that run comparisons are independent of storage-pool variability.
"GPU Time" is the total time spent in the Expectation step across the 25 iterations of classification. Because this is the primary GPU compute step in RELION, it provides a solid snapshot of GPU speed.
Benchmarks in this chart were run with four threads per process (--j 4) on a system with two GPU cards installed.
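The two timing definitions above amount to a small calculation. The Python sketch below is a minimal illustration, not Exxact's benchmarking script: the RunTimings class, its field names, and the sample numbers are all hypothetical, and it assumes the per-iteration Expectation-step times are summed into a single GPU Time value.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunTimings:
    """Timings for one hypothetical RELION run, in seconds (illustrative field names)."""
    wall_clock_s: float             # total runtime of the RELION command
    scratch_copy_s: float           # time spent copying data to --scratch_dir (0.0 if unused)
    expectation_s: Sequence[float]  # Expectation-step time for each iteration


def total_runtime(t: RunTimings) -> float:
    """Total runtime with the scratch copy removed, so storage speed does not skew comparisons."""
    return t.wall_clock_s - t.scratch_copy_s


def gpu_time(t: RunTimings) -> float:
    """GPU time: Expectation-step time summed over all iterations."""
    return sum(t.expectation_s)


# Made-up numbers, not measurements: 25 iterations of 100 s each.
run = RunTimings(wall_clock_s=4000.0, scratch_copy_s=200.0, expectation_s=[100.0] * 25)
print(total_runtime(run), gpu_time(run))  # 3800.0 2500.0
```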

Performance Overview

The NVIDIA RTX 6000 Ada performs as expected, showing improvements in the dual-GPU configuration.
The optimal settings for the RTX 6000 Ada in RELION were --j 4 (4 threads) for the single-GPU configuration and --j 6 (6 threads) for the dual-GPU configuration. When comparing results, use the best benchmark value for each configuration, since it represents the optimal setting.
2x RTX 6000 Ada (3556.31s) is about 81% faster than 2x RTX A6000 (6435.0s), a 1.81x speedup.
2x RTX 6000 Ada (3556.31s) trails 2x RTX 4090 (3103.0s): the RTX 4090 pair finishes in 12.7% less time (equivalently, it is about 14.6% faster).
- This gap can be attributed to the RTX 6000 Ada's lower clock speeds, which favor stability over peak performance.
- The systems also used different CPU platforms: the RTX 6000 Ada ran on dual EPYC, and the RTX 4090 on AMD Threadripper PRO.
- However, the RTX 6000 Ada supports quad-GPU workstation setups and scales further in servers, whereas the RTX 4090 is not supported in servers.
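The comparisons above use two different metrics: a speedup (ratio of runtimes) and a time reduction (time saved relative to the slower run). The short Python sketch below recomputes both from the reported total runtimes; the function names are our own, and the 81% figure above is the first result rounded.

```python
def speedup(slower_s: float, faster_s: float) -> float:
    """How many times faster the quicker run is (ratio of runtimes)."""
    return slower_s / faster_s


def time_reduction(slower_s: float, faster_s: float) -> float:
    """Fraction of the slower run's time that the faster run saves."""
    return (slower_s - faster_s) / slower_s


# Total runtimes in seconds, as reported above.
RTX_6000_ADA_2X = 3556.31
RTX_A6000_2X = 6435.0
RTX_4090_2X = 3103.0

print(f"{speedup(RTX_A6000_2X, RTX_6000_ADA_2X) - 1:.1%}")    # 80.9% (RTX 6000 Ada vs A6000)
print(f"{time_reduction(RTX_6000_ADA_2X, RTX_4090_2X):.1%}")  # 12.7% (time saved by RTX 4090)
print(f"{speedup(RTX_6000_ADA_2X, RTX_4090_2X) - 1:.1%}")     # 14.6% (RTX 4090 speedup)
```

Quoting both metrics avoids confusion: the same 453 s gap is a 12.7% time reduction relative to the slower run but a 14.6% speedup relative to the faster one.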

Notes on System Memory

A minimum of 64GB of RAM is recommended to run RELION with small image sizes (e.g., 200×200 pixels) on either the original or the accelerated versions, but 360×360 problems run best on systems with more than 128GB of RAM. Systems with 256GB or more are recommended for the CPU-accelerated kernels on larger image sizes. Insufficient memory can cause individual MPI ranks to be killed, leaving zombie RELION jobs behind.
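One reason larger image sizes need more RAM is that the data per object grows with the square of the box size for 2D images and with the cube for 3D maps. The Python sketch below shows only that arithmetic, under the simplifying assumption of one 32-bit float per pixel or voxel; it ignores padding, Fourier-space copies, multiple classes, and per-rank buffers, so it is a lower bound for a single object, not an estimate of RELION's actual memory use.

```python
BYTES_PER_FLOAT32 = 4  # assumption: single-precision storage


def image_bytes(box: int) -> int:
    """One 2D particle image of box x box pixels."""
    return box * box * BYTES_PER_FLOAT32


def volume_bytes(box: int) -> int:
    """One 3D map of box^3 voxels."""
    return box ** 3 * BYTES_PER_FLOAT32


for box in (200, 360):
    print(box, image_bytes(box), volume_bytes(box))
# 200 160000 32000000
# 360 518400 186624000

print(image_bytes(360) / image_bytes(200))    # 3.24
print(volume_bytes(360) / volume_bytes(200))  # 5.832
```

Going from 200×200 to 360×360 thus multiplies the per-image footprint by 3.24 and the per-map footprint by about 5.8, before any of RELION's additional buffers.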

RELION GPU Support Summary

With advances in automation, computing power, and imaging technology, the scope and complexity of cryo-EM datasets have grown substantially. GPU acceleration is essential for flexible resource management, for avoiding memory bottlenecks, and for the most computationally intensive steps of cryo-EM, such as image classification and high-resolution refinement.

MPI Settings

Users who want to run more than one MPI rank per GPU need sufficient GPU memory, because each MPI worker (slave) rank that shares a GPU adds to that GPU's memory use. In general, however, running a single worker rank per GPU is recommended for good performance and stable execution.

Note: Machines with at least two GPU cards are preferable for GPU-accelerated refinement. If you need (or want) to run multiple MPI ranks on each GPU, simply specify more ranks than there are GPUs, and RELION will attempt to distribute them efficiently.

As with previous versions of RELION, you can run multiple threads using the --j <x> option. Each MPI process launches the specified number of threads, which may speed up calculations without costing much extra memory on either the CPU (RAM) or the GPU.

The benchmarks for this configuration used 2, 3, and 5 MPI ranks.
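One way to arrive at rank counts of 2, 3, and 5 is sketched below in Python. It assumes that relion_refine_mpi uses one leader rank that does no GPU work, plus one worker rank per GPU (or two per GPU for the 5-rank case); the source does not state which GPU and rank pairing was actually used. The mpi_ranks and class3d_command helpers are hypothetical, the input, reference, output, and scratch paths are placeholders, and the command is deliberately incomplete (classification parameters such as the number of classes are omitted), so treat it as a template rather than a tested RELION invocation.

```python
import shlex
from typing import List


def mpi_ranks(num_gpus: int, workers_per_gpu: int = 1) -> int:
    """Assumed rank count: one leader rank (no GPU work) plus the GPU worker ranks."""
    return 1 + num_gpus * workers_per_gpu


def class3d_command(num_gpus: int, workers_per_gpu: int, threads: int) -> List[str]:
    """Build an illustrative, incomplete mpirun command line for 3D classification."""
    return [
        "mpirun", "-n", str(mpi_ranks(num_gpus, workers_per_gpu)),
        "relion_refine_mpi",
        "--i", "particles.star",      # hypothetical particle STAR file
        "--ref", "reference.mrc",     # hypothetical reference map
        "--o", "Class3D/job001/run",  # hypothetical output prefix
        "--iter", "25",               # 25 iterations, as in these benchmarks
        "--j", str(threads),          # threads launched by each MPI process
        "--gpu", "",                  # empty value: let RELION assign the available GPUs
        "--scratch_dir", "/scratch",  # hypothetical local NVMe scratch path
        # ...remaining classification parameters omitted...
    ]


for gpus, per_gpu in [(1, 1), (2, 1), (2, 2)]:
    cmd = class3d_command(gpus, per_gpu, threads=4)
    worker_threads = gpus * per_gpu * 4  # worker ranks x threads per rank
    print(f"{cmd[2]} ranks, {worker_threads} worker threads: {shlex.join(cmd)}")
```

Counting worker threads this way also shows how quickly --j multiplies CPU load as ranks are added, which is worth checking against the available physical cores.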

Notes on Scaling

The NVIDIA RTX 6000 Ada is a dual-slot GPU built with scalability in mind. Server configurations can scale GPU compute to 8 or even 10 GPUs, but keep in mind that adding GPUs can yield diminishing returns.
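To illustrate why adding GPUs can yield diminishing returns, the Python sketch below applies Amdahl's law, a generic textbook model that we are substituting here; it is not derived from RELION measurements. The parallel fraction of 0.9 is an arbitrary assumption, and real RELION scaling also depends on CPU threads, storage, PCIe bandwidth, and MPI overhead.

```python
def amdahl_speedup(parallel_fraction: float, n_gpus: int) -> float:
    """Amdahl's law: only the parallel part of the work is split across n_gpus."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_gpus)


ASSUMED_PARALLEL_FRACTION = 0.9  # hypothetical, not measured for RELION

for n in (1, 2, 4, 8, 10):
    s = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, n)
    # Per-GPU efficiency = speedup / GPU count; it falls as more GPUs are added.
    print(f"{n:>2} GPUs: {s:.2f}x speedup, {s / n:.0%} per-GPU efficiency")
```

Under this assumption, going from 8 to 10 GPUs raises the modeled speedup only from about 4.71x to 5.26x.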

Have any questions about RELION or other applications for molecular dynamics?
Contact Exxact Today
