Skip to content

Substructure Screenouts

Run an optimised Tversky (a=1, b=0, threshold=1.0) substructure screenout. Note that this is not a full substructure (i.e., with subgraph isomorphism) search.

Tip

It's recommended to use RDKitPattern fingerprint type with this kind of searches.

Use the FPSim2Engine.substructure function to run a substructure screenout.

from FPSim2 import FPSim2Engine

fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename)

query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.substructure(query, n_workers=1)

Use the FPSim2Engine.on_disk_substructure function to run an on disk substructure screenout (much slower but doesn't require loading the entire fingerprint file into memory).

from FPSim2 import FPSim2Engine

fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)

query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_substructure(query, n_workers=1)

Parallel Processing

The n_workers parameter can be used to split a single query into multiple threads to speed up the search. This is especially useful when searching large datasets.