World Scientific

Large Language Models, Computational Chemistry, and Digital Reticular Chemistry: A Perspective and Proposed Workflow

    https://doi.org/10.1142/S2529732524500019

    Abstract

    In this article, I explore the synergy between Large Language Models (LLMs) and computational chemistry in the context of digital reticular chemistry and propose a workflow leveraging these technologies to advance research and discovery in the field. I argue that understanding the intricacies of new tools is imperative before integrating them into applications, and that the proposed workflow, though robust, merely offers a glimpse into the expansive potential and applications of this field.

    INTRODUCTION

    Chemistry fundamentally revolves around atoms and molecules — their bonding and dissociation. Since chemical structures exist in three-dimensional space, they adhere to specific physical principles. Historically, theoretical chemists investigated how these principles govern chemical systems. However, with the advent of advanced computing, computational chemists have gained prominence, propelling this field to unprecedented levels. Although computational chemistry has its limitations — primarily the need for approximations when solving Schrödinger’s equation for complex, many-particle systems, a complication not encountered in experimental work — it has demonstrated substantial value and continues to play an integral role in exploring and understanding complex chemical interactions and mechanisms.

    Figure 1. Typical workflow for digital reticular chemistry.

    DIGITAL CHEMISTRY & LLMs EMERGE

    Nevertheless, the traditional approach to chemistry, and computational chemistry in particular, has recently been confronted by two emergent technologies:

    1. Digital Chemistry: This approach integrates digital technology and data science into chemistry, often employing robotics for high-throughput experiments and machine-learning algorithms to identify patterns and predict properties of chemical structures.

    2. Large Language Models (LLMs), including Generative Pre-trained Transformer (GPT) models: These deep learning models, trained on vast amounts of data [1], estimate the probability of the next token (a word, part of a word, or a character) given the preceding sequence of tokens.
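    The idea of estimating the probability of the next token from preceding tokens can be illustrated with a deliberately tiny stand-in. The sketch below is an assumption-laden toy, not how GPT models work internally: it counts which token follows which in a made-up sentence (a bigram model), whereas real LLMs use neural networks conditioned on long contexts. The function names and the example sentence are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the follow-up counts for `prev` into relative frequencies."""
    following = counts.get(prev)
    if not following:
        return {}  # token never seen as a predecessor
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()}


# Made-up training text, for illustration only.
corpus = "the linker binds the metal and the linker bridges the metal".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
print(probs)
```

    The point of the toy is only that the output is a probability distribution over candidate next tokens, derived from frequencies in the training text rather than from any chemical or physical principle.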

    Graph 1. Typical workflow for LLMs.

    RESEARCH IN DIGITAL RETICULAR CHEMISTRY

    In this context, I argue that advances in research and discovery in digital reticular chemistry, which employs robotics and AI to accelerate the development and implementation of reticular structures [2], should not depend exclusively on a single method, be it Large Language Models, Machine Learning, or Digital Chemistry. Instead, a well-balanced combination of methodologies from various disciplines will provide a more robust foundation for research and discovery in this field.

    From my perspective, the formation and crystallization of reticular materials — including Metal–Organic Frameworks (MOFs) and Covalent–Organic Frameworks (COFs) — involve several interconnected scientific disciplines:

    1. Mathematics: This manifests in the form of geometric considerations, such as the favorability of certain topologies and symmetry.

    2. Physics: This is illustrated by the physical characteristics of the chemical reaction system, including but not limited to forces, potentials, dimensions of the system, and thermodynamic and kinetic properties.

    3. Chemistry: This is embodied in the form of metal precursors, organic linkers, choice of solvent, their chemical behaviors with changing temperature and pressure, the acid–base properties, and various aspects of bonding, from the molecular level to the macro-architectural scale.

    Thus, predicting and understanding the formation and crystallization of reticular materials necessitates careful consideration of these scientific elements, given the inherently interdisciplinary nature of this field.

    Graph 2. Factors from three disciplines that influence MOF formation and crystallization.

    PROPOSED WORKFLOW

    I propose the following approach to research and discovery in digital reticular chemistry:

    1. Employ LLMs solely for language-oriented tasks, such as the recognition and comprehension of literature and the parsing and analysis of variables, with the capability to convert them into datasets. More precisely, leverage LLMs for text-to-text, text-to-parameter, and parameter-to-text transformations, with the option to integrate various tools, including graph reasoning ability and graph-structured data.

    2. Integrate high-throughput synthesis with ML to accelerate discovery and optimization.

    3. Couple the data-driven approach of high-throughput synthesis with computational chemistry to gain a more nuanced understanding of the simulations, thereby improving ML predictions.
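    To make the text-to-parameter and parameter-to-text transformations in step 1 concrete, here is a minimal sketch of the interface such a step might expose. Everything in it is an assumption for illustration: the record fields, the solvent list, and the example sentence are hypothetical, and simple regular expressions stand in for the LLM, which in practice would handle far more varied phrasing.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SynthesisRecord:
    """Hypothetical dataset row extracted from one sentence of a procedure."""
    temperature_c: Optional[float]
    time_h: Optional[float]
    solvent: Optional[str]


# Illustrative solvent vocabulary, not an exhaustive list.
KNOWN_SOLVENTS = ("DMF", "DEF", "ethanol", "methanol", "water")


def text_to_parameters(sentence: str) -> SynthesisRecord:
    """Regex stand-in for an LLM's text-to-parameter step."""
    temp = re.search(r"(\d+(?:\.\d+)?)\s*°\s*C", sentence)
    time = re.search(r"(\d+(?:\.\d+)?)\s*h\b", sentence)
    solvent = next(
        (s for s in KNOWN_SOLVENTS
         if re.search(rf"\b{re.escape(s)}\b", sentence, re.IGNORECASE)),
        None,
    )
    return SynthesisRecord(
        temperature_c=float(temp.group(1)) if temp else None,
        time_h=float(time.group(1)) if time else None,
        solvent=solvent,
    )


def parameters_to_text(record: SynthesisRecord) -> str:
    """Parameter-to-text step: render a record as a short instruction."""
    return (f"Heat in {record.solvent} at {record.temperature_c:g} °C "
            f"for {record.time_h:g} h.")


record = text_to_parameters("The mixture was heated in DMF at 120 °C for 24 h.")
print(record)
print(parameters_to_text(record))
```

    Keeping the structured record as the exchange format means the downstream ML and simulation steps never have to consume free text directly.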

    Figure 2. The proposed experiment-driven, computationally guided, and ML-assisted approach to research and discovery in reticular chemistry.

    This way, the experiments would tell us what happened, the computations would explain to us why things happened the way they did, and the ML software would combine the experimental and computational data to provide predictive capabilities and critical scientific insight.
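    One way to picture this division of labor is as a data-joining problem: experimental outcomes and computed descriptors are merged per sample, and a model is fitted on the combined rows. The sketch below is only an illustration under stated assumptions. The sample identifiers, values, and descriptor are invented placeholders (not real measurements or calculations), and a one-nearest-neighbor rule stands in for whatever ML model would actually be used.

```python
import math

# Hypothetical, made-up records purely for illustration (not real data).
# sample id -> (synthesis temperature in °C, crystalline product observed?)
experiments = {
    "s1": (80.0, False),
    "s2": (120.0, True),
    "s3": (150.0, True),
}
# sample id -> a computed descriptor in arbitrary units (placeholder values)
computed = {
    "s1": -0.2,
    "s2": -1.1,
    "s3": -1.4,
}


def build_dataset(experiments, computed):
    """Join experimental and computed data on the shared sample id."""
    rows = []
    for sample_id, (temperature, outcome) in experiments.items():
        if sample_id in computed:
            # Crude rescaling so both features have comparable magnitude.
            features = (temperature / 100.0, computed[sample_id])
            rows.append((features, outcome))
    return rows


def predict(rows, features):
    """One-nearest-neighbor prediction: copy the outcome of the closest row."""
    nearest = min(rows, key=lambda row: math.dist(row[0], features))
    return nearest[1]


dataset = build_dataset(experiments, computed)
prediction = predict(dataset, (1.3, -1.2))
print(prediction)
```

    The join on sample identifier is the essential part: it is what lets a model see the experimental observation and the computed explanation for the same system side by side.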

    From my perspective, it is imperative to integrate digital reticular chemistry with computational chemistry to gain insights into how physical principles govern the processes of crystallization and formation of reticular structures. This integration is crucial since the formation of reticular materials encompasses both chemical reactions and crystal growth.

    It is also important to highlight the value of Natural Language Processing (NLP) tools, such as LLMs and GPT models, which manipulate linguistic entities rather than physical or chemical ones. These tools can streamline the use and deployment of chemical programming languages (e.g., XDL [3,4]) and graph-structured data [5] or, intriguingly, could even present an opportunity to replace them [6]. They can also be designed to standardize and systematize reticular chemistry [7,8], paving the way for a universal programming language in reticular chemistry that can interface with computer programs to conduct varied simulations and experiments.

    However, my principal concern regarding LLMs is their dependence on “linguistic units” and “probabilistic frequencies,” whereas chemical reactions predominantly rely on “chemical constituents” and related “parameters.” Unless our linguistic constructs evolve to reflect the inherent complexity of chemistry, which is often manifested through emergent properties [9], they may fall short of truly “echoing the voice of nature.”

    Although transforming empirical data and insights into linguistic form is feasible, the reverse process — translating the inherent ambiguity of language into scientifically valid insights — poses a challenge. This can be limiting in the domain of chemistry, a field primarily driven by the revelation of insights from empirical (experiment-based) or theoretical (computation-based) examinations of reality.

    CONCLUSION

    The proposed workflow highlights the value of incorporating various tools into digital reticular chemistry, thereby establishing a more dynamic platform to accelerate research and propel advancements in reticular chemistry. However, it merely offers a glimpse into the potential capabilities within this field. The swift progression of technology and scientific advancement may unveil new avenues for enhanced precision and nuanced understanding. Here, we can foster the development of scientific knowledge while rendering its methodologies and learning more accessible to the general public.

    CONFLICT OF INTEREST

    Abdullah AlGhamdi declares that he has no conflict of interest relating to the content of this article.