Skip to content

Tversky Searches

Run a Tversky search. The Tversky similarity coefficient is a generalization of the Tanimoto coefficient that allows asymmetric weighting between query and reference fingerprints.

Using a and b Parameters

The a and b parameters in the tversky function control the weighting of the query and reference fingerprints, respectively. Adjusting these values allows you to fine-tune the similarity measure to emphasize different aspects of the fingerprints. For example, setting a to a higher value than b will give more weight to the query fingerprint.

Use the FPSim2Engine.tversky function to run a Tversky search.

from FPSim2 import FPSim2Engine

fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename)

query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.tversky(query, threshold=0.7, a=0.7, b=0.3, n_workers=1)

Use the FPSim2Engine.on_disk_tversky function to run an on disk Tversky search (much slower but doesn't require loading the entire fingerprint file into memory).

from FPSim2 import FPSim2Engine

fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)

query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_tversky(query, threshold=0.7, a=0.7, b=0.3, n_workers=1)

Parallel Processing

The n_workers parameter can be used to split a single query into multiple threads to speed up the search. This is especially useful when searching large datasets.