Contrary to the "the more data, the better" dogma of deep learning, publicly available experimental chemical and pharmaceutical datasets are few in number and small. To an extent, model transferability can be boosted by building in more physical bias. [1] Fundamentally, however, widely applicable machine learning models need high-quality, domain-specific data. Pharmaceutical companies have accumulated substantial amounts of curated chemical data, yet it is unrealistic to expect them to disclose it, since doing so would compromise proprietary information. Even with legal safeguards in place, companies are reluctant to share their most valuable data because leaks can occur accidentally. Fully homomorphic encryption makes it possible to process confidential chemical data while it stays encrypted (see Fig. 1). [2] This technology may pave the way for synergistic, data-driven collaborations between companies in molecular discovery. We have published a software-as-a-service demo that predicts pharmacokinetic properties of confidential molecules; try it at vaultchem.com/demo!
Figure 1. Software-as-a-service applications face a confidentiality problem: data encrypted on a local device normally has to be decrypted before the service can process it. Our demo enables ML predictions while the molecular input data stays encrypted throughout processing.
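To make the idea in Fig. 1 concrete, here is a minimal sketch of a server scoring encrypted inputs without ever decrypting them. It is not the vaultchem implementation, and it does not use fully homomorphic encryption. Instead it swaps in the Paillier cryptosystem, which is only additively homomorphic but is enough to evaluate a linear model. The function names, the integer-encoded molecular descriptors, the integer weights and the tiny primes are all hypothetical; a real deployment would need moduli of 2048 bits or more and a fixed-point encoding for real-valued features.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic, NOT fully homomorphic).
# Assumption: tiny primes for illustration only; they provide no security.

def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                      # standard simple choice of generator
    n2 = n * n
    # mu = L(g^lam mod n^2)^-1 mod n, with L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    # Random blinding factor r coprime to n makes encryption probabilistic
    while True:
        r = secrets.randbelow(n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    m = ((pow(c, lam, n2) - 1) // n) * mu % n
    # Map residues above n/2 back to negative integers
    return m - n if m > n // 2 else m

def encrypted_linear_score(pub, enc_features, weights, bias):
    # Hypothetical server-side step: plaintext integer model, encrypted inputs.
    n, _ = pub
    n2 = n * n
    acc = encrypt(pub, bias)       # anyone holding the public key can encrypt
    for c, w in zip(enc_features, weights):
        # Enc(x)^w = Enc(w*x); multiplying ciphertexts adds the plaintexts
        acc = acc * pow(c, w % n, n2) % n2
    return acc

if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)                    # client
    enc = [encrypt(pub, x) for x in [3, 1, 4]]        # client encrypts descriptors
    score = encrypted_linear_score(pub, enc, [2, -5, 7], 10)  # server
    print(decrypt(pub, priv, score))                  # client; prints 39
```

Only the client holds the private key `(lam, mu)`, so the server sees ciphertexts and returns a ciphertext. Fully homomorphic schemes additionally support multiplying two ciphertexts, which nonlinear models require.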
 Jan Weinreich