Trust Not Verify? The Critical Need for Data Curation Standards in Materials Informatics

The importance of data curation has been recognized in multiple areas of research; however, discussion of this issue is only beginning to emerge in materials science. In this Perspective, we highlight the benefits of using standardized data curation protocols in materials science and discuss current gaps in accurate and reproducible data reporting, using case studies drawn from high-impact materials science papers and well-known databases such as the Crystallography Open Database (COD) and the Cambridge Structural Database (CSD). We argue that both experimental and computational materials scientists need to embrace a culture of rigorous data curation as part of modern research data management. We propose a sample data curation pipeline for materials chemistry and illustrate its use by creating two new materials chemistry databases. We hope that this Perspective will catalyze further discussion and promote the continuous development of rigorous data curation practices within the materials science research community. We posit that adherence to best practices of data curation will enhance the reliability, reproducibility, and integrity of materials research and enable the development of reliable AI and machine learning models, which depend critically on quality data.

Hart, M.; Idanwekhai, K.; Alves, V. M.; Miller, A. J. M.; Dempsey, J. L.; Cahoon, J. F.; Chen, C.-H.; Winkler, D. A.; Muratov, E. N.; Tropsha, A. Trust Not Verify? The Critical Need for Data Curation Standards in Materials Informatics. Chem. Mater., 2024, In Press. https://doi.org/10.1021/acs.chemmater.4c00981
