add_rdkit_compound_descriptors module

Add RDKit-based compound properties to the dataset.

add_rdkit_compound_descriptors.add_aromaticity_descriptors(dataset: Dataset)[source]

Add the number of aromatic atoms in a compound, specifically:

  • total # aromatic atoms (aromatic_atoms)

  • # aromatic carbon atoms (aromatic_c)

  • # aromatic nitrogen atoms (aromatic_n)

  • # aromatic hetero atoms (aromatic_hetero)

Parameters:

dataset (Dataset) – Dataset with compound-target pairs. Will be updated to only include counts of aromatic atoms.

add_rdkit_compound_descriptors.add_built_in_descriptors(dataset: Dataset)[source]

Add RDKit built-in compound descriptors.

Parameters:
  • dataset (Dataset) – Dataset with compound-target pairs. Will be updated to only include built-in RDKit compound descriptors.

  • df_combined (pd.DataFrame) – Pandas DataFrame with compound-target pairs
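To illustrate what "built-in descriptors" refers to, the sketch below computes a handful of descriptors from RDKit's `rdkit.Chem.Descriptors` module for a single SMILES string. The helper name `built_in_descriptors`, the dictionary keys, and the choice of descriptors are assumptions made for this example; the module itself may compute a different or larger set and stores its results on the `Dataset`.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem import Descriptors


def built_in_descriptors(smiles: str) -> Optional[dict]:
    """Hypothetical helper: a few RDKit built-in descriptors for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None for SMILES strings it cannot parse.
        return None
    return {
        # Key names are illustrative, not the column names used by the module.
        "mol_weight": Descriptors.MolWt(mol),
        "num_h_donors": Descriptors.NumHDonors(mol),
        "num_h_acceptors": Descriptors.NumHAcceptors(mol),
        "ring_count": Descriptors.RingCount(mol),
    }


if __name__ == "__main__":
    print(built_in_descriptors("CCO"))  # ethanol
```

Returning `None` for unparsable SMILES is a choice made for this sketch; how the module handles such inputs is not stated here.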

add_rdkit_compound_descriptors.add_rdkit_compound_descriptors(dataset: Dataset)[source]

Add RDKit-based compound descriptors (built-in and numbers of aromatic atoms).

Parameters:

dataset (Dataset) – Dataset with compound-target pairs. Will be updated to only include built-in RDKit compound descriptors and numbers of aromatic atoms.

add_rdkit_compound_descriptors.calculate_aromatic_atoms(smiles_set: set[str]) → tuple[dict[str, int], dict[str, int], dict[str, int], dict[str, int]][source]

Get dictionaries with the number of aromatic atoms for each SMILES.

Parameters:

smiles_set (set[str]) – Set of SMILES strings to calculate the number of aromatic atoms for

Returns:

Dictionaries with:

  • SMILES -> # aromatic atoms

  • SMILES -> # aromatic carbon atoms

  • SMILES -> # aromatic nitrogen atoms

  • SMILES -> # aromatic hetero atoms

Return type:

(dict[str, int], dict[str, int], dict[str, int], dict[str, int])
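The return structure described above can be sketched with RDKit as follows. This is a minimal illustration, not the module's implementation: the function name `count_aromatic_atoms` is hypothetical, "hetero" is assumed to mean any aromatic atom that is not carbon (so aromatic nitrogens are counted there too), and SMILES strings that RDKit cannot parse are assumed to be skipped.

```python
from rdkit import Chem


def count_aromatic_atoms(smiles_set):
    """Hypothetical sketch: four SMILES -> count dictionaries."""
    aromatic_atoms, aromatic_c, aromatic_n, aromatic_hetero = {}, {}, {}, {}
    for smiles in smiles_set:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Assumption: unparsable SMILES are left out of all dictionaries.
            continue
        # Atomic numbers of all atoms RDKit flags as aromatic.
        atomic_nums = [
            atom.GetAtomicNum() for atom in mol.GetAtoms() if atom.GetIsAromatic()
        ]
        aromatic_atoms[smiles] = len(atomic_nums)
        aromatic_c[smiles] = atomic_nums.count(6)
        aromatic_n[smiles] = atomic_nums.count(7)
        # Assumption: hetero atoms are all aromatic non-carbon atoms.
        aromatic_hetero[smiles] = sum(1 for z in atomic_nums if z != 6)
    return aromatic_atoms, aromatic_c, aromatic_n, aromatic_hetero


if __name__ == "__main__":
    total, carbon, nitrogen, hetero = count_aromatic_atoms(
        {"c1ccncc1", "c1ccoc1", "CCO"}  # pyridine, furan, ethanol
    )
    print(total["c1ccncc1"], carbon["c1ccncc1"], nitrogen["c1ccncc1"], hetero["c1ccncc1"])
```

Keying the dictionaries by SMILES means each unique structure is parsed once, and the counts can then be mapped back onto every compound-target pair that shares that SMILES.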