A goal of chemical space exploration is the discovery (synthesis and characterization) of novel compositions of matter.[1] Knowledge grounded in explanations of physical causes is most desirable,[2] but any method of obtaining the correct answer (e.g., actually synthesizing a compound) is valuable in practice. For this purpose, any process that consistently produces true beliefs over false ones counts as knowledge,[3] so even a process that merely exploits statistical relationships in text can be admissible. Recently, pre-trained and fine-tuned large language models (LLMs) have been demonstrated to be a useful strategy for organic molecule property regression and classification,[4,5] even though their representations of chemical space remain unclear. In this talk, I will describe new results on predicting the synthesizability of inorganic compounds (can it be made?) and selecting precursors (how can it be made?), tasks that correspond to positive/unlabeled and multiclass (set) learning problems, respectively. We benchmarked pre-trained and fine-tuned LLMs against recent (traditional) machine-learning approaches.[6] Surprisingly, the LLMs solve these problems at a level comparable to the best traditional approaches. The relative ease, speed, and quality of the LLM-based approach suggest both its broader adoption in chemical discovery and the use of such methods as a general baseline when reporting the performance of more traditional chemical space prediction methods.
 Joshua Schrier
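
To make the task framing concrete, the sketch below casts synthesizability prediction as text classification with a fine-tuned transformer, treating unlabeled compositions as provisional negatives (the simplest positive/unlabeled baseline). This is an illustrative assumption, not the method from the talk: the model choice (distilbert-base-uncased), the toy formulas, and the labels are all hypothetical.

```python
# Minimal sketch (not the speaker's code): synthesizability prediction as
# positive/unlabeled (PU) text classification with a fine-tuned transformer.
# All formulas, labels, and hyperparameters below are illustrative.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Toy data: known-synthesizable formulas are positives (label 1);
# hypothetical compositions are unlabeled and treated here as provisional
# negatives (label 0), the simplest PU-learning baseline.
positives = ["BaTiO3", "LiFePO4", "NaCl", "MgAl2O4"]
unlabeled = ["Ba2TiO5", "Li3Fe2PO9", "Na4Cl3", "Mg3AlO5"]
texts = positives + unlabeled
labels = [1] * len(positives) + [0] * len(unlabeled)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

class FormulaDataset(Dataset):
    """Wraps chemical formulas, given as plain text, for classification."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pu_synth",
                           num_train_epochs=3,
                           per_device_train_batch_size=4,
                           logging_steps=1),
    train_dataset=FormulaDataset(texts, labels),
)
trainer.train()

# Score a new candidate composition: P(synthesizable) from the softmax.
model.eval()
inputs = tokenizer("CaTiO3", return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(f"P(synthesizable) for CaTiO3: {probs[0, 1]:.3f}")
```

Treating unlabeled examples as negatives biases the classifier toward under-predicting synthesizability; more careful PU methods reweight or bootstrap the unlabeled set, and the precursor-selection task would instead require a multiclass (set-valued) output head.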