get_activity_ct_pairs module
Get initial set of compound-target pairs with an associated activity for the dataset.
- get_activity_ct_pairs.get_aggregated_activity_ct_pairs(chembl_con: Connection, limit_to_literature: bool) → Dataset
Wrapper for get_aggregated_compound_target_pairs_with_pchembl, initialising a dataset.
- Parameters:
chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.
- Returns:
Dataset holding a pandas DataFrame of compound-target pairs, based on ChEMBL activity data aggregated into one entry per compound-target pair.
- Return type:
Dataset
- get_activity_ct_pairs.get_aggregated_compound_target_pairs_with_pchembl(chembl_con: Connection, limit_to_literature: bool) → DataFrame
Get a dataset of compound-target pairs with an associated pchembl value, with pchembl values and publication dates aggregated into one entry per pair.
Values are aggregated for
a subset of the initial dataset based on binding and functional assays (suffix ‘_BF’) and
a subset of the initial dataset based on only binding assays (suffix ‘_B’).
Therefore, there are two columns each for pchembl_value_mean, pchembl_value_max, pchembl_value_median, first_publication_cpd_target_pair and first_publication_cpd_target_pair_w_pchembl: one with the suffix ‘_BF’ based on binding + functional data and one with the suffix ‘_B’ based on only binding data.
- Parameters:
chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.
- Returns:
Pandas DataFrame with compound-target pairs based on ChEMBL activity data aggregated into one entry per compound-target pair.
- Return type:
pd.DataFrame
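The two-subset aggregation described for this function can be illustrated with a minimal pandas sketch. It is not the module's implementation: the helper name `aggregate_subset`, the toy data, and the use of an `assay_type` column with the codes `B` (binding) and `F` (functional) are assumptions made for illustration, and only the mean and max columns are computed to keep it short.

```python
import pandas as pd

KEYS = ["parent_molregno", "tid_mutation"]


def aggregate_subset(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Hypothetical helper: one row per pair with suffixed mean/max columns."""
    return df.groupby(KEYS, as_index=False).agg(
        **{
            f"pchembl_value_mean{suffix}": ("pchembl_value", "mean"),
            f"pchembl_value_max{suffix}": ("pchembl_value", "max"),
        }
    )


# Toy activity rows (made-up values, not ChEMBL data).
activities = pd.DataFrame(
    {
        "parent_molregno": [1, 1, 2],
        "tid_mutation": ["T1", "T1", "T2"],
        "assay_type": ["B", "F", "F"],
        "pchembl_value": [6.0, 8.0, 5.0],
    }
)

# Subset 1: binding + functional assays; subset 2: binding assays only.
bf_subset = activities[activities["assay_type"].isin(["B", "F"])]
b_subset = activities[activities["assay_type"] == "B"]

# Left-merge so that pairs without binding data keep their _BF values
# and get missing values in the _B columns.
merged = aggregate_subset(bf_subset, "_BF").merge(
    aggregate_subset(b_subset, "_B"), on=KEYS, how="left"
)
print(merged)
```

In this sketch, a pair that only has functional data ends up with missing values in the ‘_B’ columns, which is one plausible way to read the two-column layout described above.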
- get_activity_ct_pairs.get_average_info(df: DataFrame, suffix: str) → DataFrame
Aggregate the information about compound-target pairs for which there is more than one entry into one entry. Compound-target pairs are considered equal if parent_molregno (internal compound ID) and tid_mutation (target ID + mutation annotations) are equal.
The following values are aggregated:
pchembl_value_mean
mean pchembl value for a compound-target pair
pchembl_value_max
maximum pchembl value for a compound-target pair
pchembl_value_median
median pchembl value for a compound-target pair
first_publication_cpd_target_pair
first publication in ChEMBL with this compound-target pair
first_publication_cpd_target_pair_w_pchembl
first publication in ChEMBL with this compound-target pair and an associated pchembl value
- Parameters:
df (pd.DataFrame) – Pandas DataFrame with compound-target pairs for which the information should be aggregated.
suffix (str) – Suffix indicating the type of the given DataFrame, e.g., _B for binding assays, _BF for binding+functional assays.
- Returns:
Pandas DataFrame with ‘parent_molregno’, ‘tid_mutation’, and the aggregated columns.
- Return type:
pd.DataFrame
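As a rough illustration of the aggregation listed above, the following sketch groups a toy DataFrame by parent_molregno and tid_mutation. It is a sketch under stated assumptions, not the module's code: the function name `average_info_sketch`, the input column names `pchembl_value` and `year`, and the toy values are all hypothetical.

```python
import pandas as pd

KEYS = ["parent_molregno", "tid_mutation"]


def average_info_sketch(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Hypothetical sketch: aggregate repeated pairs into one row per pair."""
    # Mean, max and median pchembl value and earliest year per pair.
    # Missing pchembl values are skipped by these reductions.
    aggregated = df.groupby(KEYS, as_index=False).agg(
        **{
            f"pchembl_value_mean{suffix}": ("pchembl_value", "mean"),
            f"pchembl_value_max{suffix}": ("pchembl_value", "max"),
            f"pchembl_value_median{suffix}": ("pchembl_value", "median"),
            f"first_publication_cpd_target_pair{suffix}": ("year", "min"),
        }
    )
    # Earliest year among the rows that actually carry a pchembl value.
    first_with_pchembl = (
        df.dropna(subset=["pchembl_value"])
        .groupby(KEYS, as_index=False)["year"]
        .min()
        .rename(
            columns={"year": f"first_publication_cpd_target_pair_w_pchembl{suffix}"}
        )
    )
    return aggregated.merge(first_with_pchembl, on=KEYS, how="left")


# Toy input (made-up values): pair (1, "T1") has four entries, one without pchembl.
toy = pd.DataFrame(
    {
        "parent_molregno": [1, 1, 1, 1, 2],
        "tid_mutation": ["T1", "T1", "T1", "T1", "T2"],
        "pchembl_value": [5.0, 7.0, 6.0, None, 8.0],
        "year": [2010, 2005, 2015, 2003, 2001],
    }
)
result = average_info_sketch(toy, "_BF")
print(result)
```

The toy row without a pchembl value is included only to show how the two first-publication columns could differ; whether such rows reach the real function depends on the upstream query.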
- get_activity_ct_pairs.get_compound_target_pairs_with_pchembl(chembl_con: Connection, limit_to_literature: bool) → DataFrame
Query ChEMBL activities and related assays for compound-target pairs with an associated pchembl value. Only pairs with a pchembl value are kept. Salt forms of compounds are mapped to their parent form. If limit_to_literature is True, only literature sources are considered; otherwise, all sources are included. The result includes information about targets, mutations and year of publication (based on docs).
- Parameters:
chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.
- Returns:
Pandas DataFrame with compound-target pairs with a pchembl value.
- Return type:
pd.DataFrame
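The kind of query described for this function can be sketched against a tiny in-memory SQLite database. Everything here is an assumption for illustration: the table and column layout is a heavily simplified stand-in loosely modelled on ChEMBL, the function name `toy_pairs_with_pchembl` is hypothetical, treating `src_id = 1` as the literature source is assumed, and mutation handling is left out.

```python
import sqlite3

import pandas as pd

# Simplified toy schema and data (not the real ChEMBL schema).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE activities (activity_id INTEGER, assay_id INTEGER,
                             molregno INTEGER, doc_id INTEGER, pchembl_value REAL);
    CREATE TABLE assays (assay_id INTEGER, tid INTEGER, assay_type TEXT);
    CREATE TABLE molecule_hierarchy (molregno INTEGER, parent_molregno INTEGER);
    CREATE TABLE docs (doc_id INTEGER, year INTEGER, src_id INTEGER);

    INSERT INTO activities VALUES (1, 100, 11, 1000, 6.5);
    INSERT INTO activities VALUES (2, 100, 12, 1001, 7.0);   -- salt form of 11
    INSERT INTO activities VALUES (3, 101, 11, 1000, NULL);  -- no pchembl value
    INSERT INTO activities VALUES (4, 100, 11, 1002, 5.5);   -- non-literature doc
    INSERT INTO assays VALUES (100, 500, 'B'), (101, 501, 'F');
    INSERT INTO molecule_hierarchy VALUES (11, 11), (12, 11);
    INSERT INTO docs VALUES (1000, 2010, 1), (1001, 2012, 1), (1002, 2015, 2);
    """
)


def toy_pairs_with_pchembl(
    con: sqlite3.Connection, limit_to_literature: bool
) -> pd.DataFrame:
    """Hypothetical sketch of the query logic on the toy schema above."""
    sql = """
        SELECT mh.parent_molregno, a.tid, a.assay_type, act.pchembl_value, d.year
        FROM activities act
        JOIN assays a ON act.assay_id = a.assay_id
        JOIN molecule_hierarchy mh ON act.molregno = mh.molregno
        JOIN docs d ON act.doc_id = d.doc_id
        WHERE act.pchembl_value IS NOT NULL
    """
    if limit_to_literature:
        # Assumption: src_id 1 marks literature sources in this toy data.
        sql += " AND d.src_id = 1"
    return pd.read_sql_query(sql, con)


all_sources = toy_pairs_with_pchembl(con, limit_to_literature=False)
literature_only = toy_pairs_with_pchembl(con, limit_to_literature=True)
print(all_sources)
print(literature_only)
```

Joining through the hierarchy table is what maps the salt form (molregno 12 in the toy data) onto its parent compound, so all returned rows share one parent_molregno.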