DNA-Encoded Libraries for novel hit discovery

Selection of active compounds from the binders and searching for the best analogues in commercial spaces

A protein target is selected, and an inseparable DNA-Encoded Library (DEL) is created, containing approximately 50 billion compounds, each assembled from chemical building blocks and marked with a DNA tag. Binding is then assessed with methods such as affinity selection mass spectrometry (AS-MS) and surface plasmon resonance (SPR) to identify compounds that bind to the target protein. The top 100,000 DEL binders are selected, and their tags are sequenced to filter out inactive binders. From the remaining binders, active compounds are identified using the Receptor.ai fit-for-target workflow. To reduce the cost and time of synthesis, the platform is also used to search for the best analogues among over 30 billion commercially available compounds. Around 100 active compounds are identified for further study.
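The analogue search described above can be illustrated with a minimal sketch. It assumes each compound is already represented as a set of "on" fingerprint bits and ranks a catalogue by Tanimoto similarity to one active compound. The function names, compound IDs, bit sets and the 0.5 threshold are hypothetical and chosen only for illustration; the source does not state which similarity measure or fingerprints the Receptor.ai platform uses.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled as the set of "on" bit indices.
# How the bits are computed is outside the scope of this sketch.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_analogues(
    query: Fingerprint,
    catalogue: Dict[str, Fingerprint],
    top_k: int = 3,
    min_similarity: float = 0.5,  # assumed cut-off, not from the source
) -> List[Tuple[str, float]]:
    """Return up to top_k catalogue entries most similar to the query."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in catalogue.items()]
    kept = [item for item in scored if item[1] >= min_similarity]
    # Sort by descending similarity, then by ID for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))[:top_k]


# Toy data: hypothetical compound IDs and made-up bit sets.
active = frozenset({1, 4, 7, 9})
catalogue = {
    "CAT-001": frozenset({1, 4, 7, 9}),   # identical bits
    "CAT-002": frozenset({1, 4, 7, 12}),  # 3 shared bits out of 5 distinct
    "CAT-003": frozenset({2, 3, 5}),      # no shared bits
}
hits = nearest_analogues(active, catalogue)
print(hits)  # CAT-001 first, then CAT-002; CAT-003 falls below the cut-off
```

An exhaustive loop like this is only practical for small catalogues. At the scale of billions of compounds, an indexed or approximate nearest-neighbour search would be needed instead.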

Advantages of our DEL pipeline:

Efficient separation of active compounds from the noise of multiple binders.
Advanced similarity search around active compounds to reduce synthesis cost and time.
Compatible with all major DEL technologies.
Significant increase in success rate.
Multiparametric optimisation and annotation of each compound.

Workflow for selecting a target-specific DEL to reduce the noise from “inactive binders”

First, a commercial space of approximately 20 billion compounds is searched with the Receptor.ai fit-for-target workflow to design initial hits, yielding a query set of over 1,000 hit candidates. A set of DNA-encoded libraries covering more than 100 billion compounds is then screened against a similarity query built from diverse, active fragments to select the most appropriate target-specific DEL. Binding is assessed and tags are sequenced on this reduced DEL of approximately 1 billion compounds, which decreases noise. Active compounds are then selected from the binders, and the Receptor.ai platform is used to search for analogues in commercial spaces to reduce the cost and time of synthesis. Approximately 100 active compounds are selected for additional investigation.
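The step from sequence reads to binders can be sketched as an enrichment calculation. This is one common way to analyse DEL selections, not a description of the Receptor.ai method: it assumes read counts per DNA tag are available for a selection with the target and for a no-target control, which the text above does not mention. All names, counts and the threshold of 2.0 are hypothetical.

```python
from typing import Dict, List, Tuple


def enrichment(
    selected: Dict[str, int],
    control: Dict[str, int],
    pseudocount: float = 1.0,  # avoids division by zero for unseen tags
) -> Dict[str, float]:
    """Ratio of each tag's read fraction with target to its fraction without target."""
    tags = set(selected) | set(control)
    sel_total = sum(selected.values()) + pseudocount * len(tags)
    ctl_total = sum(control.values()) + pseudocount * len(tags)
    scores = {}
    for tag in tags:
        sel_frac = (selected.get(tag, 0) + pseudocount) / sel_total
        ctl_frac = (control.get(tag, 0) + pseudocount) / ctl_total
        scores[tag] = sel_frac / ctl_frac
    return scores


def top_binders(
    scores: Dict[str, float],
    min_enrichment: float = 2.0,  # assumed threshold, not from the source
) -> List[Tuple[str, float]]:
    """Keep tags enriched above the threshold, strongest first."""
    kept = [(tag, s) for tag, s in scores.items() if s >= min_enrichment]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy read counts for three hypothetical tags.
selected_reads = {"tag_A": 90, "tag_B": 5, "tag_C": 5}
control_reads = {"tag_A": 30, "tag_B": 35, "tag_C": 35}

scores = enrichment(selected_reads, control_reads)
binders = top_binders(scores)
print(binders)  # only tag_A passes the threshold
```

Screening a smaller, target-focused library gives each compound more reads at the same sequencing depth, which is one reason a reduced DEL can yield cleaner enrichment values.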

Advantages of our DEL pipeline:

Reduced noise from false-positive binders.
Efficient separation of active compounds from the noise of multiple binders.
Advanced similarity search around active compounds to reduce synthesis cost and time.
Compatible with all major DEL technologies.
Significant increase in success rate.
Multiparametric optimisation and annotation of each compound.

Architecture of fragment-based drug-target interaction (DTI) prediction for DEL activity

The DTI prediction process begins with a search of a fragment library for the best molecular fragments for each protein subpocket. The fragments are represented as graphs and processed through graph neural network blocks, followed by dense neural network layers that score each fragment against each subpocket.
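The fragment-scoring step above can be pictured with a toy sketch in plain Python. It assumes mean-aggregation message passing, sum pooling, and one single-layer scoring head per subpocket with hand-picked, untrained weights. None of these details come from the source, and every name and number below is hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# A fragment graph: per-atom feature vectors plus an adjacency list.
Graph = Tuple[List[Vector], Dict[int, List[int]]]


def message_pass(node_feats: List[Vector], adjacency: Dict[int, List[int]]) -> List[Vector]:
    """One round: each atom becomes the mean of itself and its neighbours."""
    updated = []
    for i, feats in enumerate(node_feats):
        group = [feats] + [node_feats[j] for j in adjacency.get(i, [])]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def readout(node_feats: List[Vector]) -> Vector:
    """Sum-pool atom features into one fragment-level embedding."""
    return [sum(col) for col in zip(*node_feats)]


def dense_score(embedding: Vector, weights: Vector, bias: float) -> float:
    """A single dense unit with ReLU activation."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return max(0.0, z)


def best_fragment_per_subpocket(
    fragments: Dict[str, Graph],
    heads: Dict[str, Tuple[Vector, float]],
    rounds: int = 2,
) -> Dict[str, str]:
    """Embed every fragment, then pick the top-scoring one for each subpocket head."""
    embeddings = {}
    for name, (feats, adj) in fragments.items():
        for _ in range(rounds):
            feats = message_pass(feats, adj)
        embeddings[name] = readout(feats)
    best = {}
    for pocket, (weights, bias) in heads.items():
        best[pocket] = max(
            embeddings, key=lambda n: dense_score(embeddings[n], weights, bias)
        )
    return best


# Toy atom features: [is_polar, is_aromatic]. All values are made up.
fragments = {
    "frag_polar": ([[1.0, 0.0], [0.0, 0.0]], {0: [1], 1: [0]}),
    "frag_aromatic": ([[0.0, 1.0]] * 3, {0: [1], 1: [0, 2], 2: [1]}),
}
heads = {
    "S1_polar": ([1.0, -0.5], 0.0),
    "S2_hydrophobic": ([-0.5, 1.0], 0.0),
}
best = best_fragment_per_subpocket(fragments, heads)
print(best)
```

A trained model would learn the message-passing and dense-layer weights from binding data and would typically use a deep-learning framework rather than hand-written loops.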

Next, the compatibility of these fragments is checked using reaction-specific dense neural network layers. Some coupling reactions are predicted to fail and others to succeed, which identifies the most compatible fragments.

Finally, the compatible fragments are combined into a complete molecule. The fragments are encoded using a SMILES tokenizer, followed by attention-based neural machine translation. The translated sequences are then decoded into a whole molecule using a detokenizer. The best ligand is ultimately constructed from the selected fragments within the protein binding pocket.
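The tokenizer and detokenizer ends of this step can be sketched with a regular expression. The pattern below is a simplified assumption covering common SMILES syntax, not the tokenizer the platform actually uses, and the attention-based translation model that sits between the two functions is not shown.

```python
import re
from typing import List

# Simplified SMILES token pattern (an assumption for illustration).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"             # bracket atoms such as [NH3+] or [C@@H]
    r"|Br|Cl"                 # two-letter organic-subset atoms
    r"|%\d{2}"                # two-digit ring closures such as %12
    r"|[BCNOSPFI]"            # one-letter aliphatic atoms
    r"|[bcnosp]"              # aromatic atoms
    r"|[=#\-+\\/:~@?>*$.()]"  # bonds, branches and other symbols
    r"|\d"                    # single-digit ring closures
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def detokenize(tokens: List[str]) -> str:
    """Join tokens back into a SMILES string."""
    return "".join(tokens)


smiles = "CC(=O)Nc1ccc(Cl)cc1"
tokens = tokenize(smiles)
print(tokens)
print(detokenize(tokens) == smiles)  # True: tokenizing is lossless here
```

Treating "Cl" and bracket atoms as single tokens keeps the sequence model from having to learn that "C" followed by "l" is one atom, which is the usual motivation for a chemistry-aware tokenizer.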