FPSim2 package

Submodules

FPSim2.FPSim2 module

class FPSim2.FPSim2.FPSim2Engine(fp_filename: str, in_memory_fps: bool = True, fps_sort: bool = False, storage_backend: str = 'pytables')

Bases: FPSim2.base.BaseEngine

FPSim2 class to run fast CPU searches.

Parameters
  • fp_filename (str) – Fingerprints database file path.

  • in_memory_fps (bool) – Whether the FPs should be loaded into memory.

  • fps_sort (bool) – Whether the FPs should be sorted by popcount after being loaded into memory.

  • storage_backend (str) – Storage backend to use (only pytables available at the moment).

fps

Fingerprints.

Type

Numpy array

popcnt_bins

List with the popcount bins ranges.

Type

list
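The popcount bins are useful because a Tanimoto threshold limits which popcounts can possibly match. The following is a minimal sketch of that bound, not FPSim2's own code: the function name popcount_bounds and its fp_length argument are hypothetical. For a query with q bits set and a threshold t, only fingerprints whose popcount lies between ceil(q * t) and floor(q / t) can reach the threshold.

```python
import math
from typing import Tuple


def popcount_bounds(query_popcount: int, threshold: float, fp_length: int) -> Tuple[int, int]:
    """Return the inclusive popcount range that can reach `threshold`.

    Hypothetical helper for illustration; it is not part of FPSim2.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    # Tanimoto <= min(q, p) / max(q, p), so p must lie in [q * t, q / t].
    lower = math.ceil(query_popcount * threshold)
    upper = min(fp_length, math.floor(query_popcount / threshold))
    return lower, upper


lower, upper = popcount_bounds(query_popcount=50, threshold=0.5, fp_length=2048)
print(lower, upper)
```

Fingerprints sorted by popcount can then be searched only within the bins that overlap this range.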

fp_type

Fingerprint type used to create the fingerprints.

Type

str

fp_params

Parameters used to create the fingerprints.

Type

dict

rdkit_ver

RDKit version used to create the fingerprints.

Type

str

Examples
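A minimal usage sketch, based only on the signatures documented in this section. The helper name run_searches, the default threshold and the Tversky weights are assumptions made for illustration, and the fingerprint file must already exist. The import is placed inside the function so that the sketch can be defined even where FPSim2 is not installed.

```python
def run_searches(fp_filename: str, query: str, threshold: float = 0.7):
    """Run the three in-memory searches documented below (illustrative only)."""
    # Lazy import: FPSim2 is only needed when the function is actually called.
    from FPSim2.FPSim2 import FPSim2Engine

    # Load the fingerprints into memory using the documented defaults.
    engine = FPSim2Engine(fp_filename, in_memory_fps=True, fps_sort=False)

    tanimoto_hits = engine.similarity(query, threshold, n_workers=1)
    # a and b are example weights, not recommended values.
    tversky_hits = engine.tversky(query, threshold, a=0.5, b=0.5, n_workers=1)
    substructure_hits = engine.substructure(query, n_workers=1)
    return tanimoto_hits, tversky_hits, substructure_hits
```

A call would look like run_searches("fingerprints.h5", "CC(=O)Oc1ccccc1C(=O)O"), where the file name is a placeholder.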

on_disk_similarity(query_string: str, threshold: float, n_workers: int = 1, chunk_size: int = 0) → numpy.ndarray

Runs an on-disk Tanimoto search.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • threshold (float) – Similarity threshold.

  • n_workers (int) – Number of processes used for the search.

  • chunk_size (int) – Chunk size.

Returns

results – Similarity results.

Return type

numpy array

on_disk_substructure(query_string: str, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray

Runs an on-disk substructure screenout.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • n_workers (int) – Number of processes used for the search.

  • chunk_size (int) – Chunk size.

Returns

results – Substructure results.

Return type

numpy array

on_disk_tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray

Runs an on-disk Tversky search.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • threshold (float) – Similarity threshold.

  • a (float) – alpha in Tversky search.

  • b (float) – beta in Tversky search.

  • n_workers (int) – Number of processes used for the search.

  • chunk_size (int) – Chunk size.

Returns

results – Similarity results.

Return type

numpy array

similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray

Runs a Tanimoto search.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • threshold (float) – Similarity threshold.

  • n_workers (int) – Number of threads used for the search.

Returns

results – Similarity results.

Return type

numpy array
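As a reference for what a Tanimoto search computes, here is a pure-Python sketch that treats each fingerprint as an integer bitset. It is not the engine's implementation: the function names are hypothetical, and sorting the hits by descending coefficient is a choice made in this sketch.

```python
from typing import List, Tuple


def popcount(bits: int) -> int:
    """Number of bits set in an integer bitset."""
    return bin(bits).count("1")


def tanimoto(query_fp: int, db_fp: int) -> float:
    """Tanimoto coefficient: |A & B| / (|A| + |B| - |A & B|)."""
    common = popcount(query_fp & db_fp)
    union = popcount(query_fp) + popcount(db_fp) - common
    return common / union if union else 0.0


def similarity_search(query_fp: int, db_fps: List[int], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, coefficient) pairs with coefficient >= threshold."""
    hits = []
    for idx, db_fp in enumerate(db_fps):
        coeff = tanimoto(query_fp, db_fp)
        if coeff >= threshold:
            hits.append((idx, coeff))
    # Highest similarity first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(similarity_search(0b1111, [0b0011, 0b1111, 0b110000], threshold=0.5))
```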

substructure(query_string: str, n_workers: int = 1) → numpy.ndarray

Runs a substructure screenout using an optimised calculation of Tversky with a=1, b=0.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • n_workers (int) – Number of threads used for the search.

Returns

results – Substructure results.

Return type

numpy array
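The link between the screenout and Tversky with a=1, b=0 can be sketched as follows, assuming a weights the bits found only in the query. Under that assumption the coefficient reduces to |A & B| / |A|, which equals 1 exactly when every query bit is also set in the database fingerprint. The function names are hypothetical and fingerprints are modelled as integer bitsets.

```python
def substructure_coeff(query_fp: int, db_fp: int) -> float:
    """Tversky with a=1, b=0: common bits divided by query bits."""
    query_bits = bin(query_fp).count("1")
    common = bin(query_fp & db_fp).count("1")
    return common / query_bits if query_bits else 0.0


def passes_screen(query_fp: int, db_fp: int) -> bool:
    """Equivalent bitwise check: all query bits are present in db_fp."""
    return (query_fp & db_fp) == query_fp


query = 0b0110
candidates = [0b1110, 0b0100, 0b0110]
print([passes_screen(query, fp) for fp in candidates])
```

A fingerprint screen only rules molecules out; molecules that pass are candidates and may still fail an exact substructure match.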

symmetric_distance_matrix(threshold: float, search_type: str = 'tanimoto', a: float = 0, b: float = 0, n_workers: int = 4) → scipy.sparse.csr.csr_matrix

Computes the similarity matrix of the set (Tanimoto by default; Tversky when selected through search_type).

Parameters
  • threshold (float) – Similarity threshold.

  • search_type (str) – Type of search.

  • a (float) – alpha in Tversky search.

  • b (float) – beta in Tversky search.

  • n_workers (int) – Number of threads to use.

Returns

results – Similarity results.

Return type

scipy.sparse.csr.csr_matrix
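To illustrate why the result is sparse and symmetric, the sketch below compares every pair once, keeps only coefficients at or above the threshold, and mirrors each kept value. It is a pure-Python stand-in under stated assumptions: a dictionary keyed by (row, column) replaces the scipy sparse matrix, fingerprints are integer bitsets, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two integer bitsets."""
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return common / union if union else 0.0


def sparse_symmetric_similarity(fps: List[int], threshold: float) -> Dict[Tuple[int, int], float]:
    """Return {(i, j): coeff} for all i != j with coeff >= threshold."""
    entries: Dict[Tuple[int, int], float] = {}
    # Each unordered pair is evaluated once, then stored in both triangles.
    for (i, fp_i), (j, fp_j) in combinations(enumerate(fps), 2):
        coeff = tanimoto(fp_i, fp_j)
        if coeff >= threshold:
            entries[(i, j)] = coeff
            entries[(j, i)] = coeff
    return entries


print(sparse_symmetric_similarity([0b1111, 0b0111, 0b110000], threshold=0.5))
```

Pairs below the threshold are simply absent, which is what keeps the matrix sparse for large sets.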

tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1) → numpy.ndarray

Runs a Tversky search.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • threshold (float) – Similarity threshold.

  • a (float) – alpha in Tversky search.

  • b (float) – beta in Tversky search.

  • n_workers (int) – Number of threads used for the search.

Returns

results – Similarity results.

Return type

numpy array
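For reference, the Tversky index can be sketched in pure Python as below. This assumes a weights the bits set only in the query and b weights the bits set only in the database fingerprint; the function name is hypothetical and fingerprints are integer bitsets.

```python
def tversky(query_fp: int, db_fp: int, a: float, b: float) -> float:
    """Tversky index: c / (c + a * query_only + b * db_only)."""
    common = bin(query_fp & db_fp).count("1")
    query_only = bin(query_fp).count("1") - common
    db_only = bin(db_fp).count("1") - common
    denominator = common + a * query_only + b * db_only
    return common / denominator if denominator else 0.0


query, target = 0b1111, 0b0011
print(tversky(query, target, a=1.0, b=1.0))  # same value as Tanimoto
print(tversky(query, target, a=0.5, b=0.5))  # same value as Dice
```

With a = b = 1 the index reduces to Tanimoto, and with a = b = 0.5 it reduces to the Dice coefficient.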

FPSim2.FPSim2Cuda module

class FPSim2.FPSim2Cuda.FPSim2CudaEngine(fp_filename: str, fps_sort: bool = False, storage_backend: str = 'pytables', kernel: str = 'raw')

Bases: FPSim2.base.BaseEngine

FPSim2 class to run fast CUDA (GPU) searches.

Parameters
  • fp_filename (str) – Fingerprints database file path.

  • fps_sort (bool) – Whether the FPs should be sorted after being loaded into memory.

  • storage_backend (str) – Which storage backend to use (only pytables available).

  • kernel (str) – Which CUDA kernel to use (raw or element_wise).

fps

Fingerprints.

Type

Numpy array

popcnt_bins

List with the popcount bins ranges.

Type

list

fp_type

Fingerprint type used to create the fingerprints.

Type

str

fp_params

Parameters used to create the fingerprints.

Type

dict

rdkit_ver

RDKit version used to create the fingerprints.

Type

str

ew_kernel = '\n int comm_sum = 0;\n for(int j = 1; j < in_width - 1; ++j){\n int pos = i * in_width + j;\n comm_sum += __popcll(db[pos] & query[j]);\n }\n float coeff = 0.0;\n coeff = query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[i] = coeff >= threshold ? coeff : 0.0;\n '
raw_kernel = '\n extern "C" __global__\n void taniRAW(const unsigned long long int* query,\n const unsigned long long int* qcount,\n const unsigned long long int* db,\n const unsigned long long int* popcnts,\n const float* threshold,\n float* out) {{\n\n // Shared block array. Only visible for threads in same block\n __shared__ int common[{block}];\n\n int tid = blockDim.x * blockIdx.x + threadIdx.x;\n common[threadIdx.x] = __popcll(query[threadIdx.x] & db[tid]);\n\n // threads need to wait until all threads finish\n __syncthreads();\n\n // thread 0 in each block sums the common bits\n // and calcs the final coeff\n if(0 == threadIdx.x)\n {{\n int comm_sum = 0;\n for(int i=0; i<{block}; i++)\n comm_sum += common[i];\n\n float coeff = 0.0;\n coeff = *qcount + popcnts[blockIdx.x] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[blockIdx.x] = coeff >= *threshold ? coeff : 0.0;\n }}\n }}\n '
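The arithmetic in ew_kernel can be followed more easily in a CPU reference version. The sketch below mirrors the kernel body line by line in pure Python; it is not part of FPSim2 and the function name is hypothetical. It reads the row layout from the kernel itself: each row has in_width values, the last one is the popcount, and index 0 is skipped by the loop (assumed here to hold a molecule identifier).

```python
from typing import List


def ew_kernel_reference(db: List[int], query: List[int], in_width: int, threshold: float) -> List[float]:
    """CPU re-implementation of the element-wise Tanimoto kernel body."""
    n_rows = len(db) // in_width
    out = [0.0] * n_rows
    for i in range(n_rows):
        # Count bits shared by the query and row i, skipping the first
        # and last columns exactly as the kernel loop does.
        comm_sum = 0
        for j in range(1, in_width - 1):
            comm_sum += bin(db[i * in_width + j] & query[j]).count("1")
        # Union size = query popcount + row popcount - common bits.
        coeff = float(query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum)
        if coeff != 0.0:
            coeff = comm_sum / coeff
        # Results below the threshold are zeroed, not removed.
        out[i] = coeff if coeff >= threshold else 0.0
    return out


query = [0, 0b1111, 0b0, 4]
db = [1, 0b1111, 0b0, 4,
      2, 0b0011, 0b0, 2,
      3, 0b0000, 0b1, 1]
print(ew_kernel_reference(db, query, in_width=4, threshold=0.5))
```

The raw kernel computes the same coefficient but splits the popcounts of one fingerprint across the threads of a block and lets thread 0 sum them.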
similarity(query_string: str, threshold: str) → numpy.ndarray

Runs a CUDA Tanimoto search.

Parameters
  • query_string (str) – SMILES, InChI or molblock.

  • threshold (float) – Similarity threshold.

Returns

results – Similarity results.

Return type

numpy array

FPSim2.base module

class FPSim2.base.BaseEngine(fp_filename: str, storage_backend: str, in_memory_fps: bool, fps_sort: bool)

Bases: abc.ABC

fp_filename = None
property fp_params
property fp_type
property fps
load_query(query_string: str) → numpy.ndarray

Loads the query molecule from SMILES, molblock or InChI.

Parameters

query_string (str) – SMILES, InChI or molblock.

Returns

query – Query molecule as a numpy array.

Return type

numpy array

property popcnt_bins
property rdkit_ver
abstract similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray

Tanimoto similarity search.

storage = None

Module contents