This guide is intended for anyone diving into writing scientific software for computational chemistry. While the primary audience is intended to be early-stage graduate students in computational chemistry or biophysics laboratories that are intending to produce software with a life beyond a single use, we believe that these principles will also benefit even those intending to write single-use software. Many research projects tend to grow beyond their original scope, and by adopting good practices from the beginning, it will be easier to deal with this growth without necessarily having to start from scratch.
Many students coming into computational chemistry graduate programs do not have formal software engineering training. Our intent with this guide is not to provide a rigorous software engineering education, but to identify the key time saving concepts and recommendations that we have found to be most useful for developing computational chemistry code.
While a variety of languages are used for computational chemistry codes---especially to achieve high performance---we focus on Python as a high-level language to glue everything together. Python has enormously increased productivity by allowing users to take advantage of all manner of useful libraries, from linear algebra to numerical computing to plotting to GPU-accelerated simulations. Those just starting out with Python will find the Software Carpentry courses and MolSSI Educational Resources to be excellent resources. Programmers with basic Python expertise may find books such as Fluent Python to be very helpful in exploring the more powerful aspects of the Python language.
Without discouraging innovation and exploration of new approaches, we hope to encourage laboratories producing modern computational chemistry software to adhere to some common guidelines for the following reasons:
- Minimize technical debt. Any laboratory where every project has a completely different structure, philosophy, language, approach to documentation, testing scheme, and deployment strategy will accumulate a huge amount of technical debt, leading to high software entropy. This makes it difficult to connect software components together (because so many things differ in their structure). It also makes it difficult for laboratory members to work together, because both must be familiar with a much broader range of tools that encompass both projects. Overall time spent on maintenance does not benefit from economies of scale. And when one member of the laboratory leaves, there is the danger that remaining members may not be sufficiently familiar with the software development strategy to continue the project.
- Facilitate collaboration. If instead, most projects in a laboratory, or in different laboratories, are organized along similar principles, it immediately becomes easier for members to collaborate on shared projects, or to link up projects that are organized similarly. Little time is wasted in becoming familiar with other projects because they follow similar organizing principles.
- Minimize barriers to users. Software that presents a similar interface to users makes it very easy for users to get started with a new piece of software very quickly. For example, the msmbuilder and pyemma projects present similar interfaces to users, both roughly following the scikit-learn philosophy of how models are fit to data.
- Encourage interoperability. To make software interoperable, data must flow from one tool to another. To avoid having to construct bridges between tools from scratch each times, adopting standard ways of exchanging information---especially in computational chemistry codes---can make it easy for multiple codes to interoperate without special glue code.
We intend to cover only the most important aspects of the critical components of software development for computational chemistry, providing pointers for those who wish to read about certain topics in more depth.
The Molecular Sciences Software Institute (MolSSI) provides a Computational Molecular Sciences (CMS) cookiecutter for users who want to have an automated fresh Python package created for them which follows the recommendations presented in this guide. A cookiecutter template provides an automated way to set up files and directories inside a git repository tailored to the application of interest in a manner that already has working functionality, making it easy for the developer to dive in and start coding while adhering to best practices. We highly encourage people to start from this cookiecutter as it automatically creates the structure, files for continuous integration, documentation, distribution, version control, licensing, and others, saving 30-60 minutes of work!
The MolSSI CMS Cookiecutter and its two-line installation instructions can be found at:
Computational Chemistry Cookiecutter
Want to contribute to this repository? Have a suggestion for an improvement? Spot a typo? We're always looking to improve this document for the betterment of all.
- Please feel free to open a new issue with your feedback and suggestions!
- Or make a pull request from your branch or fork!
Previous: | Next: |
---|---|
Table of Contents | Licensing Guidelines |