Run Tversky searches
In memory
Use the tversky() function to run Tversky searches. The Tversky index is a generalisation of the Tanimoto and Sørensen–Dice coefficients.
Tip
By setting the a, b and threshold parameters:
- a=b=1 returns the same results as a Tanimoto search, but is slower than using similarity().
- a=b=0.5 runs a Sørensen–Dice search.
- a=1, b=0, threshold=1.0 returns the same results as the substructure() function, but is slower.
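The special cases listed in the tip follow directly from the definition of the Tversky index. The snippet below is a minimal pure-Python sketch of that definition, not FPSim2's implementation: fingerprints are modelled as sets of "on" bit positions, and the helper name tversky_index and the example bit sets are hypothetical. It also assumes that a weights the bits present only in the query and b the bits present only in the target.

```python
def tversky_index(query_bits, target_bits, a, b):
    """Tversky index between two fingerprints given as sets of 'on' bit positions.

    Assumption: `a` weights bits found only in the query,
    `b` weights bits found only in the target.
    """
    query_bits, target_bits = set(query_bits), set(target_bits)
    common = len(query_bits & target_bits)
    only_query = len(query_bits - target_bits)
    only_target = len(target_bits - query_bits)
    denominator = common + a * only_query + b * only_target
    # Two empty fingerprints share nothing; report 0.0 instead of dividing by zero.
    return common / denominator if denominator else 0.0


# Hypothetical fingerprints, for illustration only.
query = {1, 4, 7, 9}
superset = {1, 4, 7, 9, 12, 15}   # contains every query bit
partial = {1, 4, 20}              # misses two query bits

# a = b = 1 reduces to Tanimoto: common / (|query| + |target| - common)
tanimoto = tversky_index(query, superset, 1, 1)

# a = b = 0.5 reduces to Sørensen–Dice: 2 * common / (|query| + |target|)
dice = tversky_index(query, superset, 0.5, 0.5)

# a = 1, b = 0 gives common / |query|, which reaches 1.0 only when
# every query bit is also set in the target (a substructure-style screen).
screen_hit = tversky_index(query, superset, 1, 0)
screen_miss = tversky_index(query, partial, 1, 0)
```

With a=1 and b=0 the score depends only on how many query bits the target is missing, which is why a threshold of 1.0 keeps exactly the targets that contain all of the query's bits.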
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_27.h5'
fpe = FPSim2Engine(fp_filename)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.tversky(query, 0.7, 0.5, 0.5, n_workers=1)
Tip
The n_workers parameter can be used to split a single query across multiple threads to speed up the search. This is especially useful on big datasets.
On disk
It is also possible to run Tversky searches on disk (i.e. without loading the whole fingerprints file into memory) with the on_disk_tversky() function. This allows searching databases bigger than the available system memory:
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_27.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_tversky(query, 0.7, 0.5, 0.5, n_workers=1)