Amber18 pmemd.cuda patch (August, 2018)
In August 2018, we released update.4, which fixes a number of bugs in pmemd.cuda, rounds out
its feature list, and further improves its performance. Performance in
periodic MD simulations exceeds that of comparable Amber16 runs by as much as 41% at the
default settings for the pairwise cutoff and PME accuracy, with peak gains on the
Pascal architecture (GTX 1080-Ti, GP100, and Titan-X cards). Performance gains on the older
Maxwell architecture are also significant, and results on the new Volta architecture (V100,
Titan-V) have improved over the first installment of pmemd.cuda in Amber18. The new CUDA 9
compiler adds powerful new functionality to the language that the latest cards capitalize on.
The pace of hardware and compiler turnover is rapid, and we are now investigating new strategies for coding the
algorithms that will take advantage of future CUDA compilers as well as GPUs with even
more streaming multiprocessors than the mammoth Volta.
- A bug in the pair list generator, most prevalent in very large systems, was
causing a cascade of numerical overflows that made simulations blow up within the first
few hundred steps. The failure was obvious rather than a subtle corruption of results, but
it was a nuisance: about one in forty runs could fail just after launch. The underlying
problem, present in Amber16 and earlier versions but perhaps unmasked by more recent
changes, has been corrected, and very large systems now run reliably.
- Corrected a host-side memory indexing vulnerability during exclusion
mapping
- Fixed a vulnerability that laid out too little scratch space for bond work
unit mapping
- Fixed a minor issue with GPU device detection in MPI mode and now require
GPUs with compute capability SM3X or higher
- [Thermodynamic integration] Kinetic energy of the center of mass is now
initialized in NPT simulations with a Berendsen barostat.
- [Thermodynamic integration] Corrected the non-bonded force accumulation in
softcore regions
- [Thermodynamic integration] To fix a bug in global memory addressing, SM3.7
compilation is specified for K80 chips while SM3.5 remains in use for K40
- [Thermodynamic integration] A safety guide for multistream programming was
added
- [Thermodynamic integration] Force downloads that were previously missing have
been added
- [Thermodynamic integration] The GTI bond work unit calling sequence has been
corrected in MPI mode
In addition, update.4 introduces the skin_permit
option, which reduces the frequency at which updated pair lists are
constructed, at the expense of occasionally missing certain nonbonded
interactions. See the GPU Logistics page
for more information.
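
The trade-off behind skin_permit can be illustrated with a small Python sketch. It is not the pmemd.cuda implementation: it assumes that the option acts as the fraction of the pair list margin (the "skin") that any one particle may travel before a rebuild is triggered, and the function names, the 2.0 Å skin, and the toy trajectory are all hypothetical choices made for illustration.

```python
import math

def needs_rebuild(positions, reference, skin, skin_permit=0.5):
    """Return True if any particle has moved farther than skin_permit * skin
    from where it was when the pair list was last built.

    Assumption: skin_permit is a fraction of the pair list margin. At 0.5,
    two particles approaching head-on can close at most one full skin width
    between rebuilds, so no pair inside the cutoff is missed. Larger values
    rebuild less often but can let a pair slip inside the cutoff unlisted.
    """
    limit = skin_permit * skin
    return any(math.dist(p, r) > limit for p, r in zip(positions, reference))

def count_rebuilds(trajectory, skin, skin_permit):
    """Count pair list rebuilds over a sequence of coordinate frames."""
    reference = trajectory[0]          # positions at the last rebuild
    rebuilds = 0
    for frame in trajectory[1:]:
        if needs_rebuild(frame, reference, skin, skin_permit):
            reference = frame          # rebuild and reset the reference
            rebuilds += 1
    return rebuilds

# Toy trajectory (hypothetical): one particle drifting 0.25 Å per step along x.
trajectory = [[(0.25 * step, 0.0, 0.0)] for step in range(21)]

conservative = count_rebuilds(trajectory, skin=2.0, skin_permit=0.5)
relaxed = count_rebuilds(trajectory, skin=2.0, skin_permit=0.75)
print(conservative, relaxed)
```

For this toy trajectory, the conservative setting rebuilds four times in twenty steps and the relaxed setting rebuilds twice. That is the saving the option is after, paid for with a window in which some interactions may be missed.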