Amber18 pmemd.cuda patch (August, 2018)

In August 2018, we released update.4, which fixes a number of bugs, rounds out the pmemd.cuda feature list, and further advances the performance of the program. With default settings for the pairwise cutoff and PME accuracy, performance in periodic MD simulations exceeds that of comparable Amber16 runs by as much as 41%, with peak gains on the Pascal architecture (GTX 1080-Ti, GP100, and Titan-X cards). Performance gains on the older Maxwell architecture are also significant, and results on the new Volta architecture (V100, Titan-V) have improved over the first installment of pmemd.cuda in Amber18. The new CUDA 9 compiler adds powerful functionality to the language that the latest cards capitalize on. The pace of turnover is rapid, and we are now investigating new strategies for coding the algorithms that will take advantage of future CUDA compilers as well as GPUs with even more streaming multiprocessors (SMs) than the mammoth Volta.

A bug in the pair list generator, most prevalent in very large systems, caused a cascade of numerical overflows that made simulations blow up within the first few hundred steps. The failure was obvious rather than a subtle corruption of results, but it was a nuisance: about one in forty runs could fail just after launch. The underlying problem, present in Amber16 and earlier versions but perhaps unmasked by more recent changes, has been corrected, and very large systems now run reliably.
Corrected a host-side memory indexing vulnerability during exclusion mapping
Fixed a vulnerability that laid out too little scratch space for bond work unit mapping
Fixed a minor issue with GPU device detection in MPI mode and enforced a requirement for GPUs with compute capability 3.x (SM3X) or higher
[Thermodynamic integration] Kinetic energy of the center of mass is now initialized in NPT simulations with a Berendsen barostat.
[Thermodynamic integration] Corrected the non-bonded force accumulation in softcore regions
[Thermodynamic integration] To fix a bug in global memory addressing, SM3.7 compilation is specified for K80 chips while SM3.5 remains in use for K40
[Thermodynamic integration] A safety guide for multistream programming was added
[Thermodynamic integration] Force downloads that were previously missing have been added
[Thermodynamic integration] The GTI bond work unit calling sequence has been corrected in MPI mode

In addition, update.4 introduces the skin_permit option, which reduces the frequency at which updated pair lists are constructed, at the expense of occasionally ignoring certain nonbonded interactions. See the GPU Logistics page for more information.
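
The trade-off behind an option of this kind can be illustrated with a small sketch. It assumes that the setting acts as the fraction of the pair-list buffer (the "skin") that any one particle may travel before the list is rebuilt; that reading is an assumption for illustration, not something stated above. The function name needs_rebuild and the parameters skin and permit are hypothetical, and the code is not taken from pmemd.cuda. With a fraction of 0.5, two particles approaching each other cannot close more than the full skin between rebuilds, so no pair is missed. A larger fraction delays rebuilds but lets some pairs slip inside the cutoff unnoticed.

```python
def needs_rebuild(ref_positions, positions, skin, permit):
    """Return True if any particle has moved farther than permit * skin
    from where it was when the pair list was last built.

    Hypothetical helper for illustration only; periodic wrapping of
    coordinates is ignored to keep the sketch short.
    """
    limit = permit * skin
    limit_sq = limit * limit
    for (x0, y0, z0), (x1, y1, z1) in zip(ref_positions, positions):
        moved_sq = (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
        if moved_sq > limit_sq:
            return True
    return False


# Two particles and a 2.0 Angstrom skin (illustrative numbers only).
ref = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]   # positions at the last rebuild
now = [(1.2, 0.0, 0.0), (5.0, 0.0, 0.0)]   # first particle moved 1.2 Angstrom

strict = needs_rebuild(ref, now, skin=2.0, permit=0.5)    # travel limit 1.0
relaxed = needs_rebuild(ref, now, skin=2.0, permit=0.75)  # travel limit 1.5
print(strict, relaxed)
```

In this example the stricter fraction triggers a rebuild and the relaxed one does not, which is the source of both the speedup and the occasional missed interaction.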

"How's that for maxed out?"

Last modified: May 3, 2020