FPSim2 package¶

Subpackages¶

FPSim2.io package

Submodules¶

FPSim2.FPSim2 module¶

class FPSim2.FPSim2.FPSim2Engine(fp_filename: str, in_memory_fps: bool = True, fps_sort: bool = False, storage_backend: str = 'pytables')¶

Bases: FPSim2.base.BaseEngine

FPSim2 class to run fast CPU searches.

Parameters

fp_filename (str) – Fingerprints database file path.
in_memory_fps (bool) – Whether the FPs should be loaded into memory.
fps_sort (bool) – Whether the FPs should be sorted by popcount (popcnt) after being loaded into memory.
storage_backend (str) – Storage backend to use (only pytables available at the moment).

fps¶

Fingerprints.

Type: Numpy array

popcnt_bins¶

List with the popcount bins ranges.

Type: list
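
Popcount bins make it possible to skip fingerprints whose popcount alone rules out a match. The following is a minimal sketch of that idea, not the FPSim2 implementation: it assumes the fingerprints are sorted by popcount and that each bin is a (popcount, (start, end)) pair. That layout and the helper names build_popcnt_bins and tanimoto_row_range are hypothetical. For a Tanimoto threshold t and a query with q bits set, only fingerprints whose popcount lies in [q*t, q/t] can reach the threshold.

```python
import math


def build_popcnt_bins(popcounts):
    """Given popcounts sorted ascending, return [(popcount, (start, end)), ...].

    Rows start..end-1 all share the same popcount (hypothetical bin layout).
    """
    bins = []
    start = 0
    for i in range(1, len(popcounts) + 1):
        if i == len(popcounts) or popcounts[i] != popcounts[start]:
            bins.append((popcounts[start], (start, i)))
            start = i
    return bins


def tanimoto_row_range(bins, query_popcount, threshold):
    """Return the (start, end) row range that can still reach `threshold`.

    `threshold` must be greater than 0.
    """
    lo = math.ceil(query_popcount * threshold)
    hi = math.floor(query_popcount / threshold)
    rows = [rng for cnt, rng in bins if lo <= cnt <= hi]
    if not rows:
        return (0, 0)
    return (rows[0][0], rows[-1][1])


popcounts = [3, 3, 5, 8, 8, 8, 12, 20]
bins = build_popcnt_bins(popcounts)
search_range = tanimoto_row_range(bins, query_popcount=8, threshold=0.7)
```

With these toy values only the rows with popcount 8 remain candidates, so the search can skip the other five rows without computing any coefficient.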

fp_type¶

Fingerprint type used to create the fingerprints.

Type: str

fp_params¶

Parameters used to create the fingerprints.

Type: dict

rdkit_ver¶

RDKit version used to create the fingerprints.

Type: str

Examples
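
A minimal usage sketch based only on the signatures documented on this page. The file name and query in the comment are hypothetical, and the layout of the returned array is not described here, so the sketch simply returns it unchanged.

```python
def run_similarity_search(fp_filename, query, threshold=0.7, n_workers=1):
    """Open a fingerprint database and run an in-memory Tanimoto search."""
    # Imported lazily so this snippet can be loaded without FPSim2 installed.
    from FPSim2.FPSim2 import FPSim2Engine

    engine = FPSim2Engine(fp_filename, in_memory_fps=True, fps_sort=False)
    return engine.similarity(query, threshold, n_workers=n_workers)


# Example call (hypothetical file name and query):
# results = run_similarity_search("fingerprints.h5", "CC(=O)Oc1ccccc1C(=O)O", 0.7)
```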

on_disk_similarity(query_string: str, threshold: float, n_workers: int = 1, chunk_size: int = 0) → numpy.ndarray¶

Runs an on-disk Tanimoto search.

Parameters

query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size.

Returns

results – Similarity results.

Return type

numpy array
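
An on-disk search reads the database piece by piece instead of loading it whole. The sketch below shows one way to split a row range into (start, end) tuples of the kind that on_disk_search takes as its chunk argument. The helper name make_chunks and the splitting policy are assumptions for illustration. How the engine interprets the default chunk_size of 0 is not stated here, so the sketch requires a positive value.

```python
def make_chunks(n_rows, chunk_size):
    """Split row indices 0..n_rows-1 into consecutive (start, end) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]


chunks = make_chunks(n_rows=10, chunk_size=4)
```

Each tuple can then be handed to a separate worker, which reads and searches only its own slice of the file.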

on_disk_substructure(query_string: str, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray¶

Runs an on-disk substructure screenout.

Parameters

query_string (str) – SMILES, InChI or molblock.
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size.

Returns

results – Substructure results.

Return type

numpy array

on_disk_tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray¶

Runs an on-disk Tversky search.

Parameters

query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
a (float) – Alpha weight of the Tversky index.
b (float) – Beta weight of the Tversky index.
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size.

Returns

results – Similarity results.

Return type

numpy array

similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray¶

Runs a Tanimoto search.

Parameters

query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
n_workers (int) – Number of threads used for the search.

Returns

results – Similarity results.

Return type

numpy array
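
For readers unfamiliar with the coefficient, here is a minimal pure-Python sketch of a thresholded Tanimoto search. It stores fingerprints as Python integers rather than NumPy arrays and is not the FPSim2 implementation. The function names and the best-first ordering of the hits are choices made for this illustration.

```python
def popcount(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(query, target):
    """Tanimoto coefficient of two fingerprints stored as Python ints."""
    common = popcount(query & target)
    union = popcount(query) + popcount(target) - common
    return common / union if union else 0.0


def similarity_search(query, database, threshold):
    """Return (index, coefficient) pairs at or above threshold, best first."""
    hits = []
    for idx, fp in enumerate(database):
        coeff = tanimoto(query, fp)
        if coeff >= threshold:
            hits.append((idx, coeff))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


database = [0b101100, 0b101101, 0b010010, 0b111111]
hits = similarity_search(0b101101, database, threshold=0.6)
```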

substructure(query_string: str, n_workers: int = 1) → numpy.ndarray¶

Runs a substructure screenout using an optimised calculation of Tversky with a=1, b=0.

Parameters

query_string (str) – SMILES, InChI or molblock.
n_workers (int) – Number of threads used for the search.

Returns

results – Substructure results.

Return type

numpy array
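
With a=1 and b=0 the Tversky index reduces to the fraction of query bits that are also set in the target, so a value of 1 means the query fingerprint is a bit-wise subset of the target. The sketch below illustrates that reduction on fingerprints stored as Python integers. It is an illustration under those assumptions, not the optimised FPSim2 routine, and the helper names are hypothetical.

```python
def popcount(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def query_coverage(query, target):
    """Tversky index for a=1, b=0: shared bits divided by query bits."""
    q_bits = popcount(query)
    return popcount(query & target) / q_bits if q_bits else 0.0


def screenout(query, database):
    """Indices of fingerprints that contain every bit set in the query."""
    return [idx for idx, fp in enumerate(database) if (query & fp) == query]


database = [0b1111, 0b0110, 0b1010, 0b0111]
query = 0b0110
candidates = screenout(query, database)
```

A screenout of this kind only narrows down candidates. Fingerprint bits can match without the substructure being present, so hits normally still need a real substructure match to be confirmed.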

symmetric_distance_matrix(threshold: float, search_type: str = 'tanimoto', a: float = 0, b: float = 0, n_workers: int = 4) → scipy.sparse.csr.csr_matrix¶

Computes the symmetric similarity matrix of the set, using Tanimoto by default.

Parameters

threshold (float) – Similarity threshold.
search_type (str) – Type of search (default 'tanimoto').
a (float) – alpha in Tversky search.
b (float) – beta in Tversky search.
n_workers (int) – Number of threads to use.

Returns

results – Similarity results.

Return type

scipy.sparse.csr.csr_matrix
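
The signature above returns a SciPy CSR sparse matrix. As a dependency-free stand-in, the sketch below collects the same kind of information in a dictionary keyed by (row, column): each pair is compared once and the value is mirrored, so the result is symmetric. Fingerprints are Python integers here, and the function names are hypothetical.

```python
from itertools import combinations


def popcount(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as Python ints."""
    common = popcount(a & b)
    union = popcount(a) + popcount(b) - common
    return common / union if union else 0.0


def symmetric_similarity_entries(fps, threshold):
    """Return {(i, j): coeff} for all i != j pairs at or above threshold."""
    entries = {}
    for i, j in combinations(range(len(fps)), 2):
        coeff = tanimoto(fps[i], fps[j])
        if coeff >= threshold:
            # Store both orientations so the structure is symmetric.
            entries[(i, j)] = coeff
            entries[(j, i)] = coeff
    return entries


fps = [0b1100, 0b1110, 0b0011]
entries = symmetric_similarity_entries(fps, threshold=0.5)
```

Only pairs above the threshold are stored, which is what keeps the matrix sparse for large sets.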

tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1) → numpy.ndarray¶

Runs a Tversky search.

Parameters

query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
a (float) – Alpha weight of the Tversky index.
b (float) – Beta weight of the Tversky index.
n_workers (int) – Number of threads used for the search.

Returns

results – Similarity results.

Return type

numpy array
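
The role of a and b is easiest to see in code. This is a minimal sketch of the Tversky index on fingerprints stored as Python integers, assuming a weights the bits found only in the query and b the bits found only in the target. That reading is consistent with the substructure screenout using a=1, b=0, but the function itself is illustrative and not taken from FPSim2.

```python
def popcount(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tversky(query, target, a, b):
    """Tversky index with weight a on query-only bits and b on target-only bits."""
    common = popcount(query & target)
    query_only = popcount(query) - common
    target_only = popcount(target) - common
    denom = common + a * query_only + b * target_only
    return common / denom if denom else 0.0


query, target = 0b101101, 0b111111
as_tanimoto = tversky(query, target, a=1.0, b=1.0)  # a = b = 1 gives Tanimoto
as_dice = tversky(query, target, a=0.5, b=0.5)      # a = b = 0.5 gives Dice
```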

FPSim2.FPSim2.on_disk_search(search_func: str, query: numpy.array, storage: Any, args, chunk: Tuple[int, int]) → numpy.ndarray¶

FPSim2.FPSim2Cuda module¶

class FPSim2.FPSim2Cuda.FPSim2CudaEngine(fp_filename: str, fps_sort: bool = False, storage_backend: str = 'pytables', kernel: str = 'raw')¶

Bases: FPSim2.base.BaseEngine

FPSim2 class to run fast CUDA (GPU) searches.

Parameters

fp_filename (str) – Fingerprints database file path.
fps_sort (bool) – Whether the FPs should be sorted after being loaded into memory.
storage_backend (str) – Which storage backend to use (only pytables available).
kernel (str) – Which CUDA kernel to use (raw or element_wise).

fps¶

Fingerprints.

Type: Numpy array

popcnt_bins¶

List with the popcount bins ranges.

Type: list

fp_type¶

Fingerprint type used to create the fingerprints.

Type: str

fp_params¶

Parameters used to create the fingerprints.

Type: dict

rdkit_ver¶

RDKit version used to create the fingerprints.

Type: str

ew_kernel = '\n int comm_sum = 0;\n for(int j = 1; j < in_width - 1; ++j){\n int pos = i * in_width + j;\n comm_sum += __popcll(db[pos] & query[j]);\n }\n float coeff = 0.0;\n coeff = query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[i] = coeff >= threshold ? coeff : 0.0;\n '¶
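
To make the element-wise kernel body easier to follow, here is a sequential Python mirror of it. The loop bounds and indexing follow the kernel string above: each row has in_width values, column 0 is skipped, and the last column holds the popcount. What column 0 contains is not stated here (a molecule identifier is a plausible guess). The function name ew_tanimoto and the sample data are hypothetical.

```python
def popcount(x):
    """Number of bits set in a non-negative integer (stand-in for __popcll)."""
    return bin(x).count("1")


def ew_tanimoto(db, query, in_width, threshold):
    """Sequential mirror of the element-wise kernel: one output per db row.

    `db` is a flat list of rows with `in_width` values each.
    """
    n_rows = len(db) // in_width
    out = [0.0] * n_rows
    for i in range(n_rows):
        comm_sum = 0
        # Columns 1..in_width-2 hold the fingerprint words.
        for j in range(1, in_width - 1):
            comm_sum += popcount(db[i * in_width + j] & query[j])
        # Last column of each row holds the precomputed popcount.
        coeff = query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum
        if coeff != 0:
            coeff = comm_sum / coeff
        out[i] = coeff if coeff >= threshold else 0.0
    return out


in_width = 4
query = [0, 0b1011, 0b0001, 4]
db = [
    1, 0b1011, 0b0001, 4,
    2, 0b0011, 0b0000, 2,
    3, 0b0100, 0b0010, 2,
]
out = ew_tanimoto(db, query, in_width, threshold=0.6)
```

On the GPU each iteration of the outer loop runs as its own thread, with i supplied by the element-wise launch.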

raw_kernel = '\n extern "C" __global__\n void taniRAW(const unsigned long long int* query,\n const unsigned long long int* qcount,\n const unsigned long long int* db,\n const unsigned long long int* popcnts,\n const float* threshold,\n float* out) {{\n\n // Shared block array. Only visible for threads in same block\n __shared__ int common[{block}];\n\n int tid = blockDim.x * blockIdx.x + threadIdx.x;\n common[threadIdx.x] = __popcll(query[threadIdx.x] & db[tid]);\n\n // threads need to wait until all threads finish\n __syncthreads();\n\n // thread 0 in each block sums the common bits\n // and calcs the final coeff\n if(0 == threadIdx.x)\n {{\n int comm_sum = 0;\n for(int i=0; i<{block}; i++)\n comm_sum += common[i];\n\n float coeff = 0.0;\n coeff = *qcount + popcnts[blockIdx.x] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[blockIdx.x] = coeff >= *threshold ? coeff : 0.0;\n }}\n }}\n '¶
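
The raw kernel assigns one CUDA block to each database fingerprint and one thread to each 64-bit word, then lets thread 0 reduce the shared array. The sketch below emulates that grid sequentially in Python so the indexing can be checked by hand. It follows the kernel string above, but the function name tani_raw and the sample data are hypothetical, and nothing here runs on a GPU.

```python
def popcount(x):
    """Number of bits set in a non-negative integer (stand-in for __popcll)."""
    return bin(x).count("1")


def tani_raw(query, qcount, db, popcnts, threshold, block):
    """Sequential emulation of the taniRAW grid.

    One block per fingerprint, `block` threads per block, one word per thread.
    `db` is a flat list holding `block` words for each fingerprint.
    """
    n_blocks = len(popcnts)
    out = [0.0] * n_blocks
    for block_idx in range(n_blocks):
        # Each "thread" t fills one slot of the shared array `common`.
        common = [
            popcount(query[t] & db[block * block_idx + t])
            for t in range(block)
        ]
        # After __syncthreads(), thread 0 sums the shared array.
        comm_sum = sum(common)
        denom = qcount + popcnts[block_idx] - comm_sum
        coeff = comm_sum / denom if denom != 0 else 0.0
        out[block_idx] = coeff if coeff >= threshold else 0.0
    return out


query = [0b1011, 0b0001]
db = [0b1011, 0b0001, 0b0011, 0b0000]
out = tani_raw(query, qcount=4, db=db, popcnts=[4, 2], threshold=0.4, block=2)
```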

similarity(query_string: str, threshold: str) → numpy.ndarray¶

Runs a CUDA Tanimoto search.

Parameters

query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.

Returns

results – Similarity results.

Return type

numpy array

FPSim2.base module¶

class FPSim2.base.BaseEngine(fp_filename: str, storage_backend: str, in_memory_fps: bool, fps_sort: bool)¶

Bases: abc.ABC

fp_filename = None¶

property fp_params¶

property fp_type¶

property fps¶

load_query(query_string: str) → numpy.ndarray¶

Loads the query molecule from SMILES, molblock or InChI.

Parameters: query_string (str) – SMILES, InChI or molblock.
Returns: query – Numpy array query molecule.
Return type: numpy array
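
Because load_query accepts three input formats, the query string has to be classified before it is parsed. The sketch below shows one plausible heuristic for that step. It is an assumption for illustration, not the logic FPSim2 uses, and it leaves out the conversion of the parsed molecule into a fingerprint array.

```python
def guess_query_format(query_string):
    """Classify a query string as 'inchi', 'molblock' or 'smiles'."""
    text = query_string.strip()
    if text.startswith("InChI="):
        return "inchi"
    # Molfile blocks end with an "M  END" line (two spaces).
    if "M  END" in text:
        return "molblock"
    # Anything else is treated as SMILES; a real parser may still reject it.
    return "smiles"


formats = [
    guess_query_format("InChI=1S/CH4/h1H4"),
    guess_query_format("CCO"),
    guess_query_format("\n  example\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"),
]
```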

property popcnt_bins¶

property rdkit_ver¶

abstract similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray¶: Tanimoto similarity search

storage = None¶
