Substructure Screenouts
A substructure screenout is an optimised Tversky search (a=1, b=0, threshold=1.0). Note that this is a fingerprint-based screen, not a full substructure search (i.e., one that uses subgraph isomorphism).
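To make the Tversky parameters concrete, here is a minimal sketch in plain Python. It is not FPSim2's implementation: fingerprints are modelled as sets of "on" bit positions, and the function names `tversky` and `passes_screen` are hypothetical. With a=1 and b=0 the score reduces to the fraction of query bits that are also set in the target, so a threshold of 1.0 keeps only targets that contain every query bit.

```python
def tversky(query_bits: set, target_bits: set, a: float, b: float) -> float:
    """Tversky index for fingerprints modelled as sets of 'on' bit positions."""
    common = len(query_bits & target_bits)
    only_query = len(query_bits - target_bits)
    only_target = len(target_bits - query_bits)
    denom = common + a * only_query + b * only_target
    # An empty query has no bits to match; treat it as a non-match here.
    return common / denom if denom else 0.0


def passes_screen(query_bits: set, target_bits: set) -> bool:
    """Screenout condition: Tversky with a=1, b=0 and threshold 1.0."""
    return tversky(query_bits, target_bits, a=1.0, b=0.0) >= 1.0


# Illustrative bit positions only, not real fingerprints.
query = {1, 4, 9}
superset_target = {1, 2, 4, 9, 12}  # contains every query bit
partial_target = {1, 4, 12}         # query bit 9 is missing

print(passes_screen(query, superset_target))  # True
print(passes_screen(query, partial_target))   # False
```

Because the test only compares fingerprint bits, a target that passes is a candidate and not a confirmed substructure match.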
Tip
The RDKitPattern fingerprint type is recommended for this kind of search.
Use the FPSim2Engine.substructure function to run a substructure screenout.
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.substructure(query, n_workers=1)
Use the FPSim2Engine.on_disk_substructure function to run substructure screenouts on disk. This method is much slower than the in-memory one, so use it only when the dataset doesn't fit in available RAM.
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_substructure(query, n_workers=1)
Parallel Processing
The n_workers parameter can be used to split a single query across multiple threads to speed up the search. This is especially useful when searching large datasets.
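The idea of splitting one query across workers can be sketched as follows. This is an illustration under stated assumptions, not how FPSim2 implements n_workers: the database is a hypothetical list of (id, bit set) pairs, and `screen_chunk` and `parallel_screen` are made-up names. Each thread screens one contiguous slice of the database and the partial hit lists are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def screen_chunk(query_bits, chunk):
    """Return ids in this chunk whose fingerprint contains every query bit."""
    return [mol_id for mol_id, bits in chunk if query_bits <= bits]


def parallel_screen(query_bits, database, n_workers=1):
    """Split the database into up to n_workers slices and screen each in a thread."""
    size = max(1, -(-len(database) // n_workers))  # ceiling division
    chunks = [database[i:i + size] for i in range(0, len(database), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: screen_chunk(query_bits, c), chunks))
    # Chunks are processed in order, so hits keep the database order.
    return [mol_id for part in parts for mol_id in part]


# Hypothetical toy database: (id, set of 'on' bit positions).
database = [
    (1, {1, 4, 9}),
    (2, {1, 4}),
    (3, {1, 2, 4, 9, 12}),
    (4, {9}),
    (5, {1, 4, 9, 30}),
]
hits = parallel_screen({1, 4, 9}, database, n_workers=3)
print(hits)  # [1, 3, 5]
```

In pure Python the global interpreter lock limits any speed-up from threads, so the sketch only shows the partitioning; the number of workers does not change which hits are returned.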