MatCloud: A High-throughput Computational Infrastructure For Integrated Management Of Materials Simulation, Data And Resources
Published 2018 · Materials Science
Abstract In-silico materials design, as promoted by the Materials Genome Initiative, usually involves running large-scale, high-throughput simulation jobs for structure screening and for the simulation of materials properties. This creates a need to manage large numbers of simulation jobs, large quantities of data at each simulation stage, and the availability of computing resources. Once a simulation completes, some core material properties cannot be obtained directly and require manual post-simulation data processing. The simulation data also include input and output files, intermediate result files, log and error files, associated metadata, etc. How can these data be acquired, stored and managed effectively and automatically? Simulating a material property involves a series of connected simulation procedures. How can users flexibly organise these procedures on demand for different material properties? How can experiment-centric materials scientists who are not familiar with Density Functional Theory (DFT) run theoretical simulations easily, without a deep understanding of DFT and without having to build their own computing clusters or buy computing resources? To address these needs, MatCloud, a high-throughput computational infrastructure for the integrated management of materials simulation, data and resources, has been developed. This paper describes the challenges of high-throughput DFT simulations, the development of MatCloud, and how MatCloud supports the integrated management of materials simulation, data and computing resources.