Substructure Screenouts

A substructure screenout is an optimised Tversky search (a=1, b=0, threshold=1.0). Note that this is not a full substructure search: no subgraph isomorphism check is performed, so the results are candidates rather than confirmed matches.
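To see why these Tversky parameters behave like a substructure screen, here is a minimal pure-Python sketch. It assumes the usual Tversky definition, in which a weights the bits found only in the query and b weights the bits found only in the target. Fingerprints are modelled as plain integers used as bitsets, and the function names `tversky` and `passes_screen` are hypothetical, not part of FPSim2. With a=1 and b=0 the score reduces to the fraction of query bits present in the target, so a threshold of 1.0 keeps only targets that contain every query bit.

```python
def tversky(query_fp: int, target_fp: int, a: float, b: float) -> float:
    """Tversky index between two fingerprints stored as integer bitsets."""
    common = bin(query_fp & target_fp).count("1")
    only_query = bin(query_fp & ~target_fp).count("1")
    only_target = bin(target_fp & ~query_fp).count("1")
    denom = common + a * only_query + b * only_target
    return common / denom if denom else 0.0


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """True when every bit set in the query is also set in the target."""
    return query_fp & target_fp == query_fp


# Toy 4-bit fingerprints (illustrative values, not real molecules).
query = 0b0110
targets = [0b1110, 0b0100, 0b0110]

for target in targets:
    score = tversky(query, target, a=1.0, b=0.0)
    # A score of 1.0 coincides with the bitwise subset test.
    print(f"{target:04b} score={score:.2f} passes={passes_screen(query, target)}")
```

A target can pass this test without containing the query as a subgraph, which is why the screenout is not a full substructure search.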

Tip

The RDKitPattern fingerprint type is recommended for this kind of search.

Use the FPSim2Engine.substructure function to run a substructure screenout.

from FPSim2 import FPSim2Engine

# Load the fingerprints file into memory
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename)

# Query molecule as a SMILES string (aspirin)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.substructure(query, n_workers=1)

Use the FPSim2Engine.on_disk_substructure function to run substructure screenouts on disk. This method is much slower than the in-memory search, so use it only when the dataset does not fit in available RAM.

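from FPSim2 import FPSim2Engine

# Open the fingerprints file without loading the fingerprints into memory
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)

# Query molecule as a SMILES string (aspirin)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_substructure(query, n_workers=1)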

Parallel Processing

The n_workers parameter splits the work of a single query across multiple threads to speed up the search. This is especially useful when searching large datasets.
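The idea of splitting one query across workers can be sketched in plain Python. This is only an illustration of the partitioning pattern, not FPSim2's implementation: the helper names `screen_chunk` and `parallel_screen` are hypothetical, fingerprints are modelled as integers used as bitsets, and pure-Python threads will not give a real speed-up here because of the interpreter lock. The sketch divides the fingerprint list into one chunk per worker, screens each chunk independently, and concatenates the hits in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def screen_chunk(query_fp, chunk):
    """Return ids of molecules whose fingerprint contains every query bit."""
    return [mol_id for mol_id, fp in chunk if query_fp & fp == query_fp]


def parallel_screen(query_fp, fps, n_workers=1):
    """Screen (mol_id, fingerprint) pairs using n_workers threads."""
    n_workers = max(1, n_workers)
    # Ceiling division so that every fingerprint lands in some chunk.
    size = max(1, -(-len(fps) // n_workers))
    chunks = [fps[i:i + size] for i in range(0, len(fps), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so hit order is preserved.
        parts = pool.map(lambda chunk: screen_chunk(query_fp, chunk), chunks)
        return [mol_id for part in parts for mol_id in part]


# Toy data: (molecule id, 4-bit fingerprint). Values are illustrative only.
fps = [(1, 0b1110), (2, 0b0100), (3, 0b0110), (4, 0b0111), (5, 0b1000)]
hits = parallel_screen(0b0110, fps, n_workers=2)
print(hits)
```

Because each chunk is screened independently, the result does not depend on the number of workers; only the elapsed time should change.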