Tversky Searches
Run a Tversky search. The Tversky similarity coefficient is a generalization of the Tanimoto coefficient that allows asymmetric weighting between query and reference fingerprints.
Using a and b Parameters
The a and b parameters of the tversky function control the weighting of the query and reference fingerprints, respectively. Adjusting them lets you tune the similarity measure toward one side of the comparison. For example, setting a higher than b gives more weight to the query fingerprint.
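To make the effect of the two weights concrete, here is a minimal sketch of the standard Tversky index computed on plain Python sets of "on" bit positions. It is an illustration only, not FPSim2's internal implementation; the helper name `tversky_index` and the toy fingerprints are assumptions made for this example. With a = b = 1 the formula reduces to the Tanimoto coefficient.

```python
def tversky_index(query_bits, ref_bits, a, b):
    """Standard Tversky index on sets of on-bit positions (illustrative helper)."""
    common = len(query_bits & ref_bits)       # bits set in both fingerprints
    only_query = len(query_bits - ref_bits)   # bits set only in the query
    only_ref = len(ref_bits - query_bits)     # bits set only in the reference
    denom = common + a * only_query + b * only_ref
    return common / denom if denom else 0.0


# Toy fingerprints (hypothetical bit positions, not real molecule data)
query = {1, 2, 3, 4}
ref = {3, 4, 5, 6, 7, 8}

# a = b = 1 is the Tanimoto coefficient: 2 / (2 + 2 + 4) = 0.25
tanimoto = tversky_index(query, ref, a=1.0, b=1.0)

# Swapping the weights changes the score because the measure is asymmetric
query_weighted = tversky_index(query, ref, a=0.7, b=0.3)
ref_weighted = tversky_index(query, ref, a=0.3, b=0.7)

print(tanimoto, query_weighted, ref_weighted)
```

In this toy case the reference has more unmatched bits than the query, so putting the larger weight on a (and the smaller on b) penalizes the mismatch less and yields the higher score.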
Use the FPSim2Engine.tversky function to run a Tversky search.
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.tversky(query, threshold=0.7, a=0.7, b=0.3, n_workers=1)
Use the FPSim2Engine.on_disk_tversky function to run an on-disk Tversky search. It is much slower, but it does not require loading the entire fingerprint file into memory.
from FPSim2 import FPSim2Engine
fp_filename = 'chembl_35_v0.6.0.h5'
fpe = FPSim2Engine(fp_filename, in_memory_fps=False)
query = 'CC(=O)Oc1ccccc1C(=O)O'
results = fpe.on_disk_tversky(query, threshold=0.7, a=0.7, b=0.3, n_workers=1)
Parallel Processing
The n_workers parameter splits a single query across multiple threads to speed up the search. This is especially useful when searching large datasets.
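The idea of splitting one query across workers can be sketched as follows: the fingerprint collection is cut into chunks, each thread scans one chunk, and the hits are merged and sorted. This is a conceptual sketch under stated assumptions, not how FPSim2 implements n_workers; `parallel_search`, `search_chunk`, `tversky_index`, and the toy database are hypothetical names and data for this example. In pure Python the GIL prevents a real speedup, so the code only shows the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def tversky_index(query_bits, ref_bits, a, b):
    """Standard Tversky index on sets of on-bit positions (illustrative helper)."""
    common = len(query_bits & ref_bits)
    denom = (common
             + a * len(query_bits - ref_bits)
             + b * len(ref_bits - query_bits))
    return common / denom if denom else 0.0


def search_chunk(query, chunk, threshold, a, b):
    """Scan one chunk of (mol_id, bits) pairs and keep hits above the threshold."""
    hits = []
    for mol_id, bits in chunk:
        score = tversky_index(query, bits, a, b)
        if score >= threshold:
            hits.append((mol_id, score))
    return hits


def parallel_search(query, database, threshold, a, b, n_workers=1):
    """Split the database into about n_workers chunks and search them in threads."""
    size = max(1, -(-len(database) // n_workers))  # ceiling division
    chunks = [database[i:i + size] for i in range(0, len(database), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(
            lambda chunk: search_chunk(query, chunk, threshold, a, b), chunks
        )
        hits = [hit for part in parts for hit in part]
    # Merge step: best scores first, as a similarity search would report them
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database of (mol_id, on-bit set) pairs; hypothetical data
database = [(1, {1, 2, 3}), (2, {1, 2, 3, 4}), (3, {7, 8, 9}), (4, {1, 2})]
query = {1, 2, 3}

serial = parallel_search(query, database, 0.5, a=0.7, b=0.3, n_workers=1)
threaded = parallel_search(query, database, 0.5, a=0.7, b=0.3, n_workers=2)
print(serial == threaded)
```

The number of workers changes only how the work is divided, so both calls return the same hits in the same order.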