NVIDIA RTX 3090 RELION Cryo-EM Benchmarks and Analysis

November 16, 2020

RTX 3090 HPC Benchmarks for RELION

Overview

As a leading supplier of scientific workstations and servers, Exxact has benchmarked RELION Cryo-EM on the new NVIDIA GeForce RTX 3090 GPUs. The results are quite good: paired with AMD EPYC 7552 processors, the system ran the Plasmodium ribosome benchmark in under an hour.

Interested in getting faster Cryo-EM results?
Learn more about RELION GPU Accelerated Systems for Cryo-EM

RELION GPU Support Summary

With advancements in automation, compute power, and visual technology, the scope and complexity of datasets used in cryo-EM have grown substantially. GPU support and acceleration are essential for flexible resource management, for avoiding memory limitations, and for handling the most computationally intensive steps of cryo-EM, such as image classification and high-resolution refinement.

System Specs

Base System Configuration
Nodes: 1
Processor: AMD EPYC 7552
Processor Count: 2
Total Logical Cores: 48
Memory Type: DDR4
Memory Size: 512 GB
Storage: SSD
OS: CentOS 7
CUDA Version: 10.2
RELION Version: 3

RTX 3090 RELION Benchmarks: 2-GPU and 4-GPU Configurations

RTX 3090 Benchmark

Plasmodium ribosome data set: ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz

Benchmark Parameters 4 GPU

mpirun -n 5 /usr/local/relion-3/bin/relion_refine_mpi --j 6 --gpu --pool 100 --dont_combine_weights_via_disc --keep_scratch --reuse_scratch --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d

Benchmark Parameters 2 GPU

mpirun -n 3 /usr/local/relion-3/bin/relion_refine_mpi --j 6 --gpu --pool 100 --dont_combine_weights_via_disc --keep_scratch --reuse_scratch --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d
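
The two benchmark commands differ only in the rank count passed to mpirun: 5 ranks for 4 GPUs and 3 ranks for 2 GPUs. The Python sketch below rebuilds both from a GPU count, assuming the pattern is one worker rank per GPU plus one coordinating rank. The helper name build_refine_command and the constants are illustrative and not part of RELION; the binary path and options are copied from the commands above.

```python
import shlex
from typing import List

# Binary path and options copied from the benchmark commands in this article.
RELION_BIN = "/usr/local/relion-3/bin/relion_refine_mpi"
COMMON_ARGS = (
    "--j 6 --gpu --pool 100 --dont_combine_weights_via_disc "
    "--keep_scratch --reuse_scratch --i Particles/shiny_2sets.star "
    "--ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf "
    "--ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 "
    "--K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 "
    "--offset_range 5 --offset_step 2 --sym C1 --norm --scale "
    "--random_seed 0 --o class3d"
)


def build_refine_command(num_gpus: int) -> List[str]:
    """Return the mpirun argument list for a given GPU count.

    Assumption: one worker rank per GPU plus one coordinating rank,
    which matches -n 5 for 4 GPUs and -n 3 for 2 GPUs above.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    mpi_ranks = num_gpus + 1
    return ["mpirun", "-n", str(mpi_ranks), RELION_BIN, *shlex.split(COMMON_ARGS)]


if __name__ == "__main__":
    for gpus in (2, 4):
        # shlex.join produces a shell-safe, copy-pasteable command line.
        print(shlex.join(build_refine_command(gpus)))
```

Passing the returned list to subprocess.run avoids shell quoting issues if the benchmark is launched from a script.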

Notes on System Memory

Although a minimum of 64 GB of RAM is recommended to run RELION with small image sizes (e.g. 200×200) on either the original or accelerated versions, 360×360 problems run best on systems with more than 128 GB of RAM. Systems with 256 GB or more are recommended for the CPU-accelerated kernels on larger image sizes. Insufficient memory causes individual MPI ranks to be killed, leaving zombie RELION jobs.
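
As a rough pre-flight check, the guidance above can be turned into a small Python helper that compares installed RAM with the suggested figure. This is a sketch under stated assumptions: the function names are hypothetical, any image size above 200 pixels is treated as "larger", and the 128 GB figure is used as a floor even though the text recommends more than that. The RAM query relies on os.sysconf names available on Linux.

```python
import os


def suggested_ram_gb(box_size: int, cpu_accelerated: bool = False) -> int:
    """Map the memory guidance in the text to a single figure in GB.

    Assumptions (not from RELION itself): sizes up to 200 px count as
    "small"; anything larger uses the 128 GB figure, or 256 GB when the
    CPU-accelerated kernels are used.
    """
    if box_size <= 200:
        return 64
    if cpu_accelerated:
        return 256
    return 128


def installed_ram_gb() -> float:
    """Physical memory in GB, read via POSIX sysconf (works on Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


if __name__ == "__main__":
    needed = suggested_ram_gb(box_size=360)
    have = installed_ram_gb()
    status = "OK" if have >= needed else "below the suggested amount"
    print(f"suggested: {needed} GB, installed: {have:.0f} GB ({status})")
```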

MPI Settings

Some users may want to run more than one MPI rank per GPU, which requires sufficient GPU memory: each MPI slave that shares a GPU increases memory use. In this case, however, we recommend running a single MPI slave per GPU for good performance and stable execution.
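
To see how a rank layout fits the CPU, the Python sketch below counts MPI ranks and threads for a given number of GPUs and compares the total with the 48 logical cores listed in the system specs. The RankPlan type and plan_ranks function are hypothetical names, the extra coordinating rank is an assumption inferred from the benchmark commands, and the thread total is an upper bound that counts every rank at the full --j value.

```python
from dataclasses import dataclass


@dataclass
class RankPlan:
    mpi_ranks: int        # value to pass to mpirun -n
    total_threads: int    # upper bound: every rank at the full --j value
    oversubscribed: bool  # True if threads exceed logical cores


def plan_ranks(num_gpus: int, ranks_per_gpu: int = 1,
               threads_per_rank: int = 6, logical_cores: int = 48) -> RankPlan:
    """Sketch of the rank and thread budget for one node.

    Defaults mirror this benchmark: one worker (slave) rank per GPU,
    --j 6, and 48 logical cores. One extra rank is added as coordinator
    (an assumption based on -n 5 for 4 GPUs and -n 3 for 2 GPUs).
    """
    if num_gpus < 1 or ranks_per_gpu < 1:
        raise ValueError("num_gpus and ranks_per_gpu must be at least 1")
    mpi_ranks = num_gpus * ranks_per_gpu + 1
    total_threads = mpi_ranks * threads_per_rank
    return RankPlan(mpi_ranks, total_threads, total_threads > logical_cores)


if __name__ == "__main__":
    for gpus in (2, 4):
        print(gpus, plan_ranks(gpus))
```

With the defaults, the 4-GPU layout uses 5 ranks and at most 30 threads, which stays within the 48 logical cores; doubling the ranks per GPU would raise that bound to 54.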

Notes on Scaling

The GPUs tested were Turing/Volta-based and performed similarly. As a result, it is more beneficial to scale out than to scale up. Note also that returns diminish once you scale past 4 GPUs.
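
Diminishing returns can be quantified as parallel efficiency: the measured speedup divided by the ideal speedup for the added GPUs. The Python sketch below shows the arithmetic; the function name is hypothetical, and the timings in the usage example are placeholders in arbitrary units, not measurements from this benchmark.

```python
from typing import Tuple


def scaling_efficiency(base_gpus: int, base_time: float,
                       scaled_gpus: int, scaled_time: float) -> Tuple[float, float]:
    """Return (speedup, efficiency) when going from base_gpus to scaled_gpus.

    speedup    = base_time / scaled_time
    efficiency = speedup / (scaled_gpus / base_gpus), where 1.0 is ideal
    """
    if min(base_gpus, scaled_gpus) < 1 or min(base_time, scaled_time) <= 0:
        raise ValueError("GPU counts must be >= 1 and times must be positive")
    speedup = base_time / scaled_time
    ideal_speedup = scaled_gpus / base_gpus
    return speedup, speedup / ideal_speedup


if __name__ == "__main__":
    # Placeholder timings in arbitrary units, NOT results from this article.
    speedup, efficiency = scaling_efficiency(2, 100.0, 4, 60.0)
    print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

An efficiency well below 1.0 when adding GPUs to a single node is the signal that scaling out to more nodes may be the better use of hardware.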

Have any questions about RELION or other applications for molecular dynamics?
Contact Exxact Today

