For virtual chemical databases containing over one billion (1B+) molecules, efficient storage is critical for fast processing and access. MolSoft has developed a highly compressed file format that applies frequency-based adaptive encoding to internal coordinates, substantially reducing storage requirements. On average, the full conformation stack of one molecule takes approximately 400 bytes, roughly 15 times less than storing the same conformations as XYZ coordinates in single precision. A database of 1 billion molecules with conformations can be stored in about 800 GB using this format. The files use the .molt extension and are designed for compatibility with GPU-based algorithms such as RIDE, RIDGE, and GigaScreen. To create these files, use:
- Chemistry/Generate Conformers. With a GINGER GPU license, select the "Compressed Database using GINGER" tab; with a regular CPU license, select the "Compressed Database" tab instead.
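The .molt layout itself is not described here, so the sketch below is not MolSoft's format. It is a minimal, hypothetical Python illustration of the two ideas named above: it converts Cartesian positions into an internal coordinate (a torsion angle), quantizes it into bins, and estimates how many bits an ideal frequency-based coder would need per torsion from the Shannon entropy of the bin frequencies. It also computes the single-precision XYZ baseline. The function names, the 256-bin quantization, and the sample values are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def quantize(angle_deg: float, n_bins: int = 256) -> int:
    """Map an angle to one of n_bins equal-width bins (256 bins fit in 1 byte)."""
    return int((angle_deg + 180.0) / 360.0 * n_bins) % n_bins


def entropy_bits(symbols: Sequence[int]) -> float:
    """Shannon entropy in bits per symbol: the lower bound that an ideal
    frequency-based coder (e.g. Huffman or arithmetic coding) approaches."""
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def xyz_float32_bytes(n_atoms: int, n_conformers: int) -> int:
    """Raw size of Cartesian XYZ coordinates stored as 4-byte floats."""
    return n_atoms * n_conformers * 3 * 4


if __name__ == "__main__":
    # Hypothetical torsion samples clustered near staggered minima.
    samples = [60.2, 59.8, -60.1, 179.5, 60.4, -59.7, 178.9, 60.0]
    bins = [quantize(a) for a in samples]
    print(f"{entropy_bits(bins):.2f} bits/torsion vs 8 bits for a raw 1-byte bin")
    print(xyz_float32_bytes(n_atoms=40, n_conformers=10), "bytes as float32 XYZ")
```

Torsions are the natural target here because bond lengths and bond angles typically change little between conformers of the same molecule, while torsion values tend to cluster, which is exactly what a frequency-based coder exploits. The entropy figure is a theoretical bound, not the size that any particular file format achieves.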