Molecular Dynamics

AMBER 24 NVIDIA GPU Benchmarks

June 13, 2024
10 min read

AMBER 24 GPU Benchmarks on NVIDIA GeForce, RTX, and Data Center GPUs

*All benchmarks were performed in a single-GPU configuration using Amber 24 & AmberTools 24 on NVIDIA CUDA 12.3, which could explain the slight performance increase over Amber 22.

**NVIDIA GeForce and RTX GPUs were tested in an Exxact workstation, which supports a maximum 2-way GPU configuration. All other NVIDIA professional GPUs (RTX and data center GPUs) were tested in an Exxact server, which supports an 8-way GPU configuration.

***Since AMBER computations are performed entirely on GPUs via CUDA, the CPU differences between the workstation and server systems have little to no effect on benchmark throughput.

Quick AMBER GPU Benchmark Takeaways

  • NVIDIA Ada Lovelace generation GPUs outperform all Ampere generation GPUs. Ada GPUs deliver higher performance and better energy efficiency, which justify the price increase over Ampere.
    • The NVIDIA RTX 4090 offers the best performance, but its large physical card size, and thus its lack of multi-GPU scalability, is a disadvantage.
    • The RTX 6000 Ada offers performance similar to the RTX 4090; its slightly lower speed is attributable to a lower clock speed tuned for reliability. The RTX 6000 Ada also has a larger 48GB memory capacity and is multi-GPU scalable.
    • The RTX 5000 Ada and RTX 4500 Ada perform well above last generation's flagship RTX A6000, making them strong candidates for the best cost-to-performance GPUs for AMBER.
    • Even the mid-range consumer RTX 4070 Ti shows considerable performance gains over the last generation's flagship RTX 3090.
  • The NVIDIA H100 ranks third overall (behind the RTX 4090 and RTX 6000 Ada), winning only a couple of tests. The H100 is geared more toward AI workloads, and its high price tag makes it a poor value for simulation-only workloads.
  • For larger simulations, such as STMV Production NPT 4fs, high-speed memory, memory capacity, and GPU clock speed play a large role in performance. The H100, RTX 6000 Ada, and RTX 4090 dominate here.
  • For smaller simulations, the options are wider. The RTX 4070 Ti shows promising performance, while the RTX 5000 Ada and RTX 4080 deliver exceptional performance, trailing only the larger RTX 6000 Ada and RTX 4090.

We're Here to Deliver the Tools to Power Your Research

At Exxact, with access to the highest-performing hardware, we offer customizable platforms for AMBER optimized for your deployment, budget, and desired performance so you can make an impact with your research!

Configure your Ideal GPU System for AMBER

Benchmark Hardware and Specifications

GPUs Benchmarked

RTX 6000 Ada, RTX 5000 Ada, RTX 4500 Ada, H100 PCIe, RTX 4090, RTX 4080, RTX 4070 Ti, A100 PCIe, RTX A6000, RTX A5500, RTX A5000, RTX A4500, RTX A4000, RTX 3090, RTX 3080

Exxact System Used for Benchmarks

| | Workstation | Server |
| --- | --- | --- |
| System SKU | VWS-148320247 | TS4-173535991 |
| Nodes | 1 | 1 |
| Processor / Count | 1x AMD TR PRO 5995WX | 2x AMD EPYC 7552 |
| Total Logical Cores | 64 | 96 |
| Memory | 256GB DDR4 | 512GB DDR4 ECC |
| Storage | 4TB NVMe SSD | 2.84TB NVMe SSD |
| OS | CentOS 7 | CentOS 7 |
| CUDA Version | 12.0 | 12.0 |
| AMBER Version | 24 | 24 |

All values are simulation throughput in ns/day (higher is better).

| Benchmark | RTX 6000 Ada | RTX 5000 Ada | RTX 4500 Ada | H100 PCIe | RTX 4090 | RTX 4080 | RTX 4070 Ti | A100 PCIe | RTX A6000 | RTX A5500 | RTX A5000 | RTX A4500 | RTX A4000 | RTX 3090 | RTX 3080 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| JAC Production NVE 4fs | 1697.34 | 1562.48 | 1297.88 | 1532.08 | 1706.21 | 1596.79 | 1385.68 | 1226.40 | 1132.86 | 1116.01 | 1029.89 | 963.52 | 841.32 | 1228.41 | 1160.34 |
| JAC Production NPT 4fs | 1666.84 | 1550.32 | 1278.02 | 1500.37 | 1641.18 | 1598.79 | 1293.92 | 1257.77 | 1117.95 | 1126.87 | 1025.84 | 951.60 | 829.49 | 1197.09 | 1158.95 |
| JAC Production NVE 2fs | 917.70 | 843.16 | 698.69 | 806.39 | 934.00 | 868.79 | 740.55 | 642.79 | 615.92 | 596.22 | 559.03 | 518.54 | 448.53 | 655.52 | 608.60 |
| JAC Production NPT 2fs | 906.35 | 835.59 | 693.10 | 752.49 | 915.99 | 843.32 | 722.34 | 654.20 | 601.67 | 586.07 | 544.16 | 521.92 | 443.81 | 643.68 | 599.03 |
| FactorIX Production NVE 2fs | 489.93 | 406.98 | 306.57 | 410.77 | 488.16 | 400.22 | 315.32 | 283.70 | 273.64 | 242.84 | 225.58 | 201.79 | 161.63 | 276.82 | 246.65 |
| FactorIX Production NPT 2fs | 442.91 | 376.67 | 288.13 | 385.12 | 471.74 | 377.42 | 299.88 | 264.03 | 253.98 | 233.43 | 216.11 | 193.83 | 158.24 | 262.36 | 234.06 |
| Cellulose Production NVE 2fs | 123.98 | 95.91 | 67.63 | 125.82 | 136.85 | 96.16 | 72.90 | 90.17 | 63.15 | 55.07 | 49.63 | 42.14 | 33.57 | 67.08 | 57.07 |
| Cellulose Production NPT 2fs | 114.99 | 92.32 | 63.78 | 113.81 | 125.63 | 91.30 | 68.14 | 82.74 | 58.00 | 52.03 | 47.86 | 40.33 | 31.89 | 60.81 | 51.68 |
| STMV Production NPT 4fs | 70.97 | 55.30 | 37.58 | 74.50 | 82.60 | 57.99 | 39.36 | 53.84 | 39.08 | 35.12 | 32.29 | 27.66 | 21.87 | 41.05 | 34.05 |
| TRPCage GB 2fs | 1477.12 | 1448.25 | 1424.88 | 1399.51 | 1491.75 | 1578.44 | 1512.26 | 1027.35 | 1145.56 | 1176.41 | 1209.86 | 1175.60 | 1248.80 | 1231.97 | 1348.60 |
| Myoglobin GB 2fs | 1016.00 | 841.93 | 740.65 | 1094.57 | 888.21 | 843.83 | 772.38 | 656.65 | 648.58 | 592.84 | 580.02 | 536.57 | 491.05 | 614.32 | 624.68 |
| Nucleosome GB 2fs | 31.59 | 26.11 | 18.80 | 37.83 | 35.90 | 27.60 | 20.87 | 29.60 | 19.70 | 15.32 | 15.18 | 11.58 | 10.98 | 21.12 | 17.60 |
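
Because the benchmarks span four orders of magnitude in system size, raw ns/day numbers are hard to compare across workloads. One common approach is to normalize each GPU against a baseline per benchmark and take the geometric mean of the ratios. The Python sketch below illustrates this on a hand-copied subset of the table above; the GPU selection and the RTX A6000 baseline are our choices for illustration, not part of the benchmark suite:

```python
from math import prod

# Throughput in ns/day, copied from the benchmark table above
# (a small subset: three benchmarks, four GPUs).
results = {
    "JAC Production NVE 4fs": {"RTX 6000 Ada": 1697.34, "RTX 4090": 1706.21,
                               "H100 PCIe": 1532.08, "RTX A6000": 1132.86},
    "Cellulose Production NVE 2fs": {"RTX 6000 Ada": 123.98, "RTX 4090": 136.85,
                                     "H100 PCIe": 125.82, "RTX A6000": 63.15},
    "STMV Production NPT 4fs": {"RTX 6000 Ada": 70.97, "RTX 4090": 82.60,
                                "H100 PCIe": 74.50, "RTX A6000": 39.08},
}

BASELINE = "RTX A6000"  # baseline chosen for illustration

def geomean_speedup(gpu: str) -> float:
    """Geometric mean of per-benchmark speedups relative to the baseline GPU."""
    ratios = [bench[gpu] / bench[BASELINE] for bench in results.values()]
    return prod(ratios) ** (1 / len(ratios))

for gpu in ["RTX 4090", "RTX 6000 Ada", "H100 PCIe", "RTX A6000"]:
    print(f"{gpu:>14}: {geomean_speedup(gpu):.2f}x vs {BASELINE}")
```

The geometric mean keeps the small, fast JAC runs from drowning out the large Cellulose and STMV runs, as a simple average of raw ns/day would.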

JAC Production NVE 4fs - 23,558 Atoms

[Chart: AMBER benchmark on GPUs, JAC Production NVE 4fs]

JAC Production NPT 4fs - 23,558 Atoms

[Chart: AMBER benchmark on GPUs, JAC Production NPT 4fs]

JAC Production NVE 2fs - 23,558 Atoms

[Chart: AMBER benchmark on GPUs, JAC Production NVE 2fs]

JAC Production NPT 2fs - 23,558 Atoms

[Chart: AMBER benchmark on GPUs, JAC Production NPT 2fs]

FactorIX Production NVE 2fs - 90,906 Atoms

[Chart: AMBER benchmark on GPUs, FactorIX Production NVE 2fs]

FactorIX Production NPT 2fs - 90,906 Atoms

[Chart: AMBER benchmark on GPUs, FactorIX Production NPT 2fs]

Cellulose Production NVE 2fs - 408,609 Atoms

[Chart: AMBER benchmark on GPUs, Cellulose Production NVE 2fs]

Cellulose Production NPT 2fs - 408,609 Atoms

[Chart: AMBER benchmark on GPUs, Cellulose Production NPT 2fs]

STMV Production NPT 4fs - 1,067,095 Atoms

[Chart: AMBER benchmark on GPUs, STMV Production NPT 4fs]

TRPCage Production GB - 304 Atoms [Implicit]

[Chart: AMBER benchmark on GPUs, TRPCage Production GB]

Myoglobin Production GB - 2,492 Atoms [Implicit]

[Chart: AMBER benchmark on GPUs, Myoglobin Production GB]

Nucleosome Production GB - 25,095 Atoms [Implicit]

[Chart: AMBER benchmark on GPUs, Nucleosome Production GB]

AMBER 24 Background & Hardware Recommendations

AMBER consists of several software packages, with the molecular dynamics engine PMEMD being the most compute-intensive and the one we most want to optimize. PMEMD ships in single-CPU (pmemd), multi-CPU (pmemd.MPI), single-GPU (pmemd.cuda), and multi-GPU (pmemd.cuda.MPI) versions. Traditionally, MD simulations were executed on CPUs, but the growing use of GPUs and AMBER's native support for running MD on CUDA have made GPUs the most logical choice for speed and cost efficiency.

Most AMBER simulations fit on a single GPU and run strictly on CUDA, so the CPU, system memory (RAM), and storage speed have little to no influence on simulation throughput. Because a single calculation already saturates one GPU, spreading it across multiple GPUs yields little additional speedup. The way to fully utilize a multi-GPU or multi-node deployment is instead to run multiple independent AMBER simulations simultaneously, one per GPU, within the same node or across nodes, as in the sketch below.
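
As a concrete sketch of that pattern, the Python script below fans out one independent pmemd.cuda run per GPU by pinning each process to a device through CUDA_VISIBLE_DEVICES. The -O, -i, -p, -c, -o, -r, and -x flags are standard pmemd arguments, but the file names and run_N directories are placeholder assumptions; adapt them to your own inputs:

```python
import os
import subprocess

NUM_GPUS = 4  # e.g., a node with 4x RTX 6000 Ada

def launch_run(gpu_id: int, workdir: str) -> subprocess.Popen:
    """Start one independent pmemd.cuda simulation pinned to a single GPU."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # this process sees only one GPU
    cmd = [
        "pmemd.cuda", "-O",      # -O: overwrite existing output files
        "-i", "prod.in",         # MD control input (placeholder name)
        "-p", "system.prmtop",   # topology (placeholder name)
        "-c", "system.inpcrd",   # starting coordinates (placeholder name)
        "-o", "prod.out",
        "-r", "prod.rst",
        "-x", "prod.nc",
    ]
    return subprocess.Popen(cmd, cwd=workdir, env=env)

# One independent simulation per GPU (e.g., replicas or different systems).
# Assumes run_0 .. run_3 already exist with their input files staged.
procs = [launch_run(i, f"run_{i}") for i in range(NUM_GPUS)]
for p in procs:
    p.wait()  # block until every simulation finishes
```

Each process gets its own working directory so output files never collide, and because the runs are fully independent, aggregate throughput scales nearly linearly with GPU count.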

Hardware Recommendation

Our top three GPU recommendations for running AMBER, and our reasoning:

  • For cost-effective parallel computing, the RTX 5000 Ada and RTX 4500 Ada offer A-tier and B-tier performance, respectively, at a much lower cost than the RTX 6000 Ada. The RTX 6000 Ada's premium buys higher performance and larger memory that most AMBER calculations won't use; that extra cost can instead be allocated to more GPUs, and thus more calculations running in parallel. A deployment with 8x RTX 4500 Ada GPUs is similar in price to one with 4x RTX 6000 Ada GPUs but can run twice as many simulations at once.
  • For peak single-GPU throughput on smaller teams, the NVIDIA RTX 4090 delivers S+ tier performance. If you don't need to run multiple simulations simultaneously, the RTX 4090 delivers the fastest results.
  • For peak throughput and parallel computing, the RTX 6000 Ada delivers S-tier performance akin to the RTX 4090 while allowing deployments to slot 4x GPUs into a 2U node or 8x GPUs into a 4U node.

Our CPU & Memory Recommendation

  • There is no need to overspend on a CPU, since it will not run the calculations. At a bare minimum, allocate one CPU core for every GPU in the system; higher GPU counts may require dual CPUs for the additional PCIe lanes.
  • We recommend 32GB of RAM per GPU, though you can get by with 16GB per GPU.

Conclusion

Not all use cases are the same, and AMBER is most likely not the only application used in your research. At Exxact Corp., we strive to provide the resources to configure the custom system that best fits you.

Since AMBER's performance is largely insensitive to the rest of the system, you may benefit from optimizing your build around other, more demanding applications you also use. Applications like GROMACS or NAMD can benefit from additional cores or higher-end CPUs, a tradeoff that can pay off in those other workflows.

We're Here to Deliver the Tools to Power Your Research

At Exxact, with access to the highest-performing hardware, we can offer a platform optimized for your deployment, budget, and desired performance so you can make an impact with your research!

Configure your Life Science Solution Today
