FPSim2 package
Subpackages
Submodules
FPSim2.FPSim2 module
- class FPSim2.FPSim2.FPSim2Engine(fp_filename: str, in_memory_fps: bool = True, fps_sort: bool = False, storage_backend: str = 'pytables')
Bases: FPSim2.base.BaseEngine
FPSim2 class to run fast CPU searches.
- Parameters
fp_filename (str) – Fingerprints database file path.
in_memory_fps (bool) – Whether the FPs should be loaded into memory.
fps_sort (bool) – Whether the FPs should be sorted by popcount after being loaded into memory.
storage_backend (str) – Storage backend to use (only pytables available at the moment).
- fps
Fingerprints.
- Type
Numpy array
- popcnt_bins
List of popcount bin ranges.
- Type
list
- fp_type
Fingerprint type used to create the fingerprints.
- Type
str
- fp_params
Parameters used to create the fingerprints.
- Type
dict
- rdkit_ver
RDKit version used to create the fingerprints.
- Type
str
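Sorting by popcount and keeping popcnt_bins pays off because of a simple bound. For a Tanimoto threshold t and a query with q bits set, a fingerprint with p bits set can score at most min(p, q) / max(p, q). Only bins with t·q ≤ p ≤ q/t therefore need to be scanned. The minimal stdlib sketch below illustrates this. It is not FPSim2's code. It assumes each bin is a (popcount, (start, end)) pair over the popcount-sorted rows with an exclusive end. The names calc_popcnt_bins and candidate_rows are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical bin layout: (popcount, (start_row, end_row)), end exclusive
Bin = Tuple[int, Tuple[int, int]]


def calc_popcnt_bins(sorted_popcounts: Sequence[int]) -> List[Bin]:
    """Group consecutive rows that share a popcount (input must be sorted ascending)."""
    bins: List[Bin] = []
    start = 0
    for popcnt, group in groupby(sorted_popcounts):
        size = sum(1 for _ in group)
        bins.append((popcnt, (start, start + size)))
        start += size
    return bins


def candidate_rows(bins: List[Bin], query_popcnt: int, threshold: float) -> Tuple[int, int]:
    """Row range whose popcounts p satisfy t*q <= p <= q/t (Tanimoto bound)."""
    low = threshold * query_popcnt
    high = query_popcnt / threshold if threshold > 0 else float("inf")
    kept = [rows for popcnt, rows in bins if low <= popcnt <= high]
    if not kept:
        return (0, 0)  # empty range: no fingerprint can reach the threshold
    # bins are sorted by popcount, so the kept bins are contiguous
    return (kept[0][0], kept[-1][1])


if __name__ == "__main__":
    bins = calc_popcnt_bins([3, 3, 5, 8, 8, 8, 12])
    print(bins)  # [(3, (0, 2)), (5, (2, 3)), (8, (3, 6)), (12, (6, 7))]
    print(candidate_rows(bins, query_popcnt=8, threshold=0.7))  # (3, 6)
```

With a threshold of 0.7, a query with 8 bits set only needs rows whose popcount lies between 5.6 and about 11.4. Here that is the single bin of popcount 8.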
Examples
- on_disk_similarity(query_string: str, threshold: float, n_workers: int = 1, chunk_size: int = 0) → numpy.ndarray
Runs an on-disk Tanimoto search.
- Parameters
query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size (number of fingerprints per chunk).
- Returns
results – Similarity results.
- Return type
numpy array
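An on-disk search needs only one chunk of fingerprints in memory at a time. The stdlib sketch below illustrates the idea. FPSim2 itself reads a PyTables file. The flat binary layout used here (one little-endian 64-bit word per toy fingerprint) and the helpers write_fps, read_chunk and on_disk_tanimoto are hypothetical. Chunks are half-open (start, end) row ranges, mirroring the chunk: Tuple[int, int] argument of on_disk_search.

```python
import os
import struct
import tempfile
from typing import List, Tuple

# Hypothetical file layout: one little-endian unsigned 64-bit word per fingerprint
RECORD = struct.Struct("<Q")


def write_fps(path: str, fps: List[int]) -> None:
    """Write toy 64-bit fingerprints to a flat binary file."""
    with open(path, "wb") as fh:
        for fp in fps:
            fh.write(RECORD.pack(fp))


def read_chunk(path: str, start: int, end: int) -> List[int]:
    """Read rows [start, end) without loading the rest of the file."""
    with open(path, "rb") as fh:
        fh.seek(start * RECORD.size)
        data = fh.read((end - start) * RECORD.size)
    return [value for (value,) in RECORD.iter_unpack(data)]


def on_disk_tanimoto(path: str, n_rows: int, query: int,
                     threshold: float, chunk_size: int) -> List[Tuple[int, float]]:
    """Scan the file chunk by chunk and keep (row, coefficient) hits >= threshold."""
    q_count = bin(query).count("1")
    hits = []
    for start in range(0, n_rows, chunk_size):
        end = min(start + chunk_size, n_rows)
        # only this chunk is held in memory at any time
        for offset, fp in enumerate(read_chunk(path, start, end)):
            common = bin(query & fp).count("1")
            union = q_count + bin(fp).count("1") - common
            coeff = common / union if union else 0.0
            if coeff >= threshold:
                hits.append((start + offset, coeff))
    hits.sort(key=lambda hit: hit[1], reverse=True)  # most similar first
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fps.bin")
        fps = [0b1111, 0b0111, 0b1000, 0b1110]
        write_fps(path, fps)
        print(on_disk_tanimoto(path, len(fps), 0b1111, 0.5, chunk_size=3))
        # prints [(0, 1.0), (1, 0.75), (3, 0.75)]
```

Each chunk is independent of the others, which is what makes it possible to hand chunks to separate worker processes.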
- on_disk_substructure(query_string: str, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray
Runs an on-disk substructure screenout.
- Parameters
query_string (str) – SMILES, InChI or molblock.
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size (number of fingerprints per chunk).
- Returns
results – Substructure results.
- Return type
numpy array
- on_disk_tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1, chunk_size: int = None) → numpy.ndarray
Runs an on-disk Tversky search.
- Parameters
query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
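a (float) – Tversky alpha: weight of the bits set only in the query.
b (float) – Tversky beta: weight of the bits set only in the database fingerprint.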
n_workers (int) – Number of processes used for the search.
chunk_size (int) – Chunk size (number of fingerprints per chunk).
- Returns
results – Similarity results.
- Return type
numpy array
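The Tversky coefficient generalises Tanimoto with two weights. Let c be the number of bits set in both the query Q and a database fingerprint D. The coefficient is then c / (c + a·(|Q| - c) + b·(|D| - c)). The sketch assumes that a weights bits set only in the query and b weights bits set only in D. This convention is consistent with the substructure screenout's a = 1, b = 0 setting. Fingerprints are toy Python ints.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tversky(query: int, fp: int, a: float, b: float) -> float:
    """Tversky similarity: a weights query-only bits, b weights fp-only bits."""
    common = popcount(query & fp)
    denom = common + a * (popcount(query) - common) + b * (popcount(fp) - common)
    return common / denom if denom else 0.0


if __name__ == "__main__":
    q, d = 0b1111, 0b0111           # |Q| = 4, |D| = 3, c = 3
    print(tversky(q, d, 1.0, 1.0))  # a = b = 1 gives Tanimoto: 3 / 4 = 0.75
    print(tversky(q, d, 0.5, 0.5))  # a = b = 0.5 gives Dice: 6 / 7
```

Because a and b weight different sides of the comparison, the coefficient is asymmetric unless a = b.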
- similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray
Runs a Tanimoto search.
- Parameters
query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
n_workers (int) – Number of threads used for the search.
- Returns
results – Similarity results.
- Return type
numpy array
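For the in-memory search, n_workers is a number of threads. The minimal sketch below shows one way to split rows across a concurrent.futures thread pool. The helper names and toy int fingerprints are hypothetical. In pure Python the GIL keeps CPU-bound threads from running in parallel. The sketch therefore only illustrates the work split, not a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Hit = Tuple[int, float]  # (row index, Tanimoto coefficient)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as Python ints."""
    common = bin(a & b).count("1")
    union = bin(a).count("1") + bin(b).count("1") - common
    return common / union if union else 0.0


def search_rows(query: int, fps: Sequence[int], start: int, end: int,
                threshold: float) -> List[Hit]:
    """Score rows [start, end) and keep those at or above the threshold."""
    hits = []
    for row in range(start, end):
        coeff = tanimoto(query, fps[row])
        if coeff >= threshold:
            hits.append((row, coeff))
    return hits


def threaded_similarity(query: int, fps: Sequence[int], threshold: float,
                        n_workers: int = 1) -> List[Hit]:
    """Split rows into n_workers contiguous ranges and search them in a thread pool."""
    if not fps:
        return []
    step = -(-len(fps) // n_workers)  # ceiling division so every row is covered
    ranges = [(s, min(s + step, len(fps))) for s in range(0, len(fps), step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda r: search_rows(query, fps, r[0], r[1], threshold), ranges)
        hits = [hit for part in parts for hit in part]  # map() keeps input order
    hits.sort(key=lambda hit: hit[1], reverse=True)  # most similar first
    return hits
```

The result does not depend on n_workers. Only the number of ranges scored concurrently changes.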
- substructure(query_string: str, n_workers: int = 1) → numpy.ndarray
Runs a substructure screenout using an optimised calculation of Tversky with a=1, b=0.
- Parameters
query_string (str) – SMILES, InChI or molblock.
n_workers (int) – Number of processes used for the search.
- Returns
results – Substructure results.
- Return type
numpy array
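With a = 1 and b = 0 the Tversky denominator becomes c + (|Q| - c) = |Q|. The coefficient is then c / |Q|, which reaches 1 exactly when every query bit is also set in the database fingerprint. A screenout therefore reduces to a bitwise subset test. The sketch below shows that equivalence on toy int fingerprints. It is an illustration, not FPSim2's implementation. Passing a fingerprint screen only means a molecule may contain the substructure. An exact subgraph match is still needed to confirm it.

```python
from typing import List, Sequence


def tversky_a1_b0(query: int, fp: int) -> float:
    """Tversky with a = 1, b = 0, which simplifies to |Q & D| / |Q|."""
    q_count = bin(query).count("1")
    return bin(query & fp).count("1") / q_count if q_count else 0.0


def substructure_screen(query: int, fps: Sequence[int]) -> List[int]:
    """Rows whose fingerprint has every query bit set."""
    # for a non-empty query, (fp & query) == query is the same test as
    # tversky_a1_b0(query, fp) == 1.0, without popcounts or division
    return [row for row, fp in enumerate(fps) if (fp & query) == query]
```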
- symmetric_distance_matrix(threshold: float, search_type: str = 'tanimoto', a: float = 0, b: float = 0, n_workers: int = 4) → scipy.sparse.csr.csr_matrix
Computes the pairwise similarity matrix of the whole fingerprint set (Tanimoto by default, or Tversky when selected with search_type).
- Parameters
threshold (float) – Similarity threshold.
search_type (str) – Type of search ('tanimoto' or 'tversky').
a (float) – alpha in Tversky search.
b (float) – beta in Tversky search.
n_workers (int) – Number of threads to use.
- Returns
results – Similarity results.
- Return type
scipy.sparse.csr_matrix
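A symmetric similarity matrix only needs the upper triangle (i < j). Each pair at or above the threshold is stored at (i, j) and mirrored to (j, i). All other entries stay implicit, as in a sparse matrix. The stdlib sketch below does this with a dict. It then builds the three CSR arrays (indptr, indices, data) that back a scipy.sparse.csr_matrix. The sketch stores Tanimoto similarities and leaves the diagonal empty. Those choices and the helper names are assumptions, not documented FPSim2 behaviour. Mirroring is only valid for symmetric measures such as Tanimoto, or Tversky with a = b.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[Tuple[int, int], float]


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as Python ints."""
    common = bin(a & b).count("1")
    union = bin(a).count("1") + bin(b).count("1") - common
    return common / union if union else 0.0


def symmetric_matrix(fps: Sequence[int], threshold: float) -> Matrix:
    """Thresholded pairwise similarities, computed once per pair and mirrored."""
    matrix: Matrix = {}
    for i, j in combinations(range(len(fps)), 2):  # upper triangle only, i < j
        coeff = tanimoto(fps[i], fps[j])
        if coeff >= threshold:
            matrix[(i, j)] = coeff
            matrix[(j, i)] = coeff  # valid because Tanimoto is symmetric
    return matrix


def to_csr(matrix: Matrix, n_rows: int) -> Tuple[List[int], List[int], List[float]]:
    """CSR arrays: row r's entries are indices/data[indptr[r]:indptr[r + 1]]."""
    items = sorted(matrix.items())  # sort by (row, col)
    indptr = [0] * (n_rows + 1)
    for (row, _col), _value in items:
        indptr[row + 1] += 1  # count entries per row
    for row in range(n_rows):
        indptr[row + 1] += indptr[row]  # running total turns counts into offsets
    indices = [col for (_row, col), _value in items]
    data = [value for _key, value in items]
    return indptr, indices, data
```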
- tversky(query_string: str, threshold: float, a: float, b: float, n_workers: int = 1) → numpy.ndarray
Runs a Tversky search.
- Parameters
query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
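a (float) – Tversky alpha: weight of the bits set only in the query.
b (float) – Tversky beta: weight of the bits set only in the database fingerprint.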
n_workers (int) – Number of threads used for the search.
- Returns
results – Similarity results.
- Return type
numpy array
- FPSim2.FPSim2.on_disk_search(search_func: str, query: numpy.array, storage: Any, args, chunk: Tuple[int, int]) → numpy.ndarray
FPSim2.FPSim2Cuda module
- class FPSim2.FPSim2Cuda.FPSim2CudaEngine(fp_filename: str, fps_sort: bool = False, storage_backend: str = 'pytables', kernel: str = 'raw')
Bases: FPSim2.base.BaseEngine
FPSim2 class to run fast GPU searches using CUDA.
- Parameters
fp_filename (str) – Fingerprints database file path.
fps_sort (bool) – Whether the FPs should be sorted after being loaded into memory.
storage_backend (str) – Which storage backend to use (only pytables available).
kernel (str) – Which CUDA kernel to use (raw or element_wise).
- fps
Fingerprints.
- Type
Numpy array
- popcnt_bins
List of popcount bin ranges.
- Type
list
- fp_type
Fingerprint type used to create the fingerprints.
- Type
str
- fp_params
Parameters used to create the fingerprints.
- Type
dict
- rdkit_ver
RDKit version used to create the fingerprints.
- Type
str
- ew_kernel = '\n int comm_sum = 0;\n for(int j = 1; j < in_width - 1; ++j){\n int pos = i * in_width + j;\n comm_sum += __popcll(db[pos] & query[j]);\n }\n float coeff = 0.0;\n coeff = query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[i] = coeff >= threshold ? coeff : 0.0;\n '
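The element-wise kernel body runs once per database row i. Read literally, its indexing implies that each row, and the query, is laid out as [id, word_1, ..., word_k, popcount] with in_width = k + 2. The loop over j = 1 ... in_width - 2 therefore skips the id column and the stored popcount. Below is a line-by-line Python transcription of that body. The layout is inferred from the code rather than documented, popcll is a stand-in for CUDA's __popcll, and the sketch ignores how the kernel is launched.

```python
from typing import List, Sequence


def popcll(x: int) -> int:
    """Stand-in for CUDA's __popcll: set bits in a 64-bit word."""
    return bin(x & 0xFFFFFFFFFFFFFFFF).count("1")


def ew_kernel_emulated(query: Sequence[int], db: Sequence[int],
                       in_width: int, threshold: float) -> List[float]:
    """Apply the element-wise kernel body to every row i of a flattened db."""
    n_rows = len(db) // in_width
    out = [0.0] * n_rows
    for i in range(n_rows):  # on the GPU, each i is a separate element
        comm_sum = 0
        for j in range(1, in_width - 1):  # skip column 0 (id) and the last column (popcount)
            pos = i * in_width + j
            comm_sum += popcll(db[pos] & query[j])
        coeff = float(query[in_width - 1] + db[i * in_width + in_width - 1] - comm_sum)
        if coeff != 0.0:
            coeff = comm_sum / coeff  # Tanimoto: common / (|Q| + |D| - common)
        out[i] = coeff if coeff >= threshold else 0.0
    return out


if __name__ == "__main__":
    # in_width = 4: [id, word_1, word_2, popcount]
    query = [0, 0b1111, 0b1, 5]
    db = [10, 0b1111, 0b1, 5,
          11, 0b0011, 0, 2,
          12, 0b110000, 0b10, 3]
    print(ew_kernel_emulated(query, db, 4, 0.3))  # [1.0, 0.4, 0.0]
```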
- raw_kernel = '\n extern "C" __global__\n void taniRAW(const unsigned long long int* query,\n const unsigned long long int* qcount,\n const unsigned long long int* db,\n const unsigned long long int* popcnts,\n const float* threshold,\n float* out) {{\n\n // Shared block array. Only visible for threads in same block\n __shared__ int common[{block}];\n\n int tid = blockDim.x * blockIdx.x + threadIdx.x;\n common[threadIdx.x] = __popcll(query[threadIdx.x] & db[tid]);\n\n // threads need to wait until all threads finish\n __syncthreads();\n\n // thread 0 in each block sums the common bits\n // and calcs the final coeff\n if(0 == threadIdx.x)\n {{\n int comm_sum = 0;\n for(int i=0; i<{block}; i++)\n comm_sum += common[i];\n\n float coeff = 0.0;\n coeff = *qcount + popcnts[blockIdx.x] - comm_sum;\n if (coeff != 0.0)\n coeff = comm_sum / coeff;\n out[blockIdx.x] = coeff >= *threshold ? coeff : 0.0;\n }}\n }}\n '
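raw_kernel is a Python str.format template. The {block} placeholder is filled in with the block size, and the doubled braces {{ }} become the literal braces of the CUDA code. Each CUDA block then scores one database fingerprint. Thread t popcounts word t into shared memory, and after __syncthreads() thread 0 sums the partial counts. The Python emulation below relies on assumptions implied by the tid indexing:

- The block size equals the number of 64-bit words per fingerprint.
- db is a flat array of fingerprint words with no id or popcount columns.
- Popcounts are passed separately.

The function names are hypothetical.

```python
from typing import List, Sequence

# str.format fills {block} and turns doubled braces into single ones
TEMPLATE = "__shared__ int common[{block}]; if (0 == threadIdx.x) {{ /* ... */ }}"
RENDERED = TEMPLATE.format(block=4)
# RENDERED == "__shared__ int common[4]; if (0 == threadIdx.x) { /* ... */ }"


def popcll(x: int) -> int:
    """Stand-in for CUDA's __popcll: set bits in a 64-bit word."""
    return bin(x & 0xFFFFFFFFFFFFFFFF).count("1")


def raw_kernel_emulated(query: Sequence[int], qcount: int, db: Sequence[int],
                        popcnts: Sequence[int], threshold: float, block: int) -> List[float]:
    """Emulate taniRAW: one block of `block` threads per database fingerprint."""
    n_blocks = len(db) // block
    out = [0.0] * n_blocks
    for block_idx in range(n_blocks):
        # thread t writes common[t] = popcll(query[t] & db[tid]), tid = block * block_idx + t
        common = [popcll(query[t] & db[block * block_idx + t]) for t in range(block)]
        # __syncthreads(): all partial counts exist before thread 0 sums them
        comm_sum = sum(common)
        coeff = float(qcount + popcnts[block_idx] - comm_sum)
        if coeff != 0.0:
            coeff = comm_sum / coeff
        out[block_idx] = coeff if coeff >= threshold else 0.0
    return out
```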
- similarity(query_string: str, threshold: float) → numpy.ndarray
Runs a CUDA Tanimoto search.
- Parameters
query_string (str) – SMILES, InChI or molblock.
threshold (float) – Similarity threshold.
- Returns
results – Similarity results.
- Return type
numpy array
FPSim2.base module
- class FPSim2.base.BaseEngine(fp_filename: str, storage_backend: str, in_memory_fps: bool, fps_sort: bool)
Bases: abc.ABC
- fp_filename = None
- property fp_params
- property fp_type
- property fps
- load_query(query_string: str) → numpy.ndarray
Loads the query molecule from SMILES, molblock or InChI.
- Parameters
query_string (str) – SMILES, InChI or molblock.
- Returns
query – Query molecule fingerprint as a NumPy array.
- Return type
numpy array
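The element-wise CUDA kernel reads the query as [id, word_1, ..., word_k, popcount]. Assuming load_query's array follows that layout, the sketch below packs a collection of on-bit indices into it. Several details are assumptions:

- The leading 0 placeholder.
- The bit order: bit i goes to word i // 64, at position i % 64.
- The name pack_query.

The list of on bits stands in for the RDKit fingerprint that load_query would compute from the SMILES, InChI or molblock.

```python
from typing import Iterable, List


def pack_query(on_bits: Iterable[int], n_bits: int) -> List[int]:
    """Pack on-bit indices into [0, word_1, ..., word_k, popcount] (hypothetical layout)."""
    if n_bits <= 0 or n_bits % 64:
        raise ValueError("n_bits must be a positive multiple of 64")
    words = [0] * (n_bits // 64)
    for bit in on_bits:
        if not 0 <= bit < n_bits:
            raise ValueError(f"bit {bit} is outside 0..{n_bits - 1}")
        words[bit // 64] |= 1 << (bit % 64)  # OR-ing makes repeated indices harmless
    popcount = sum(bin(word).count("1") for word in words)
    return [0, *words, popcount]  # leading 0: placeholder for the id column


if __name__ == "__main__":
    print(pack_query([0, 1, 64, 127], 128))  # [0, 3, 9223372036854775809, 4]
```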
- property popcnt_bins
- property rdkit_ver
- abstract similarity(query_string: str, threshold: float, n_workers=1) → numpy.ndarray
Runs a Tanimoto similarity search.
- storage = None