get_activity_ct_pairs module

Get the initial set of compound-target pairs with an associated activity for the dataset.

get_activity_ct_pairs.get_aggregated_activity_ct_pairs(chembl_con: Connection, limit_to_literature: bool) → Dataset[source]

Wrapper for get_aggregated_compound_target_pairs_with_pchembl, initialising a dataset.

Parameters:

chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.

Returns:

Dataset containing a pandas DataFrame of compound-target pairs based on ChEMBL activity data, aggregated into one entry per compound-target pair.

Return type:

Dataset

get_activity_ct_pairs.get_aggregated_compound_target_pairs_with_pchembl(chembl_con: Connection, limit_to_literature: bool) → DataFrame[source]

Get a dataset of compound-target pairs with an associated pchembl value, with pchembl values and publication dates aggregated into one entry per pair.

Values are aggregated for

a subset of the initial dataset based on binding and functional assays (suffix ‘_BF’) and
a subset of the initial dataset based on only binding assays (suffix ‘_B’).

Therefore, there are two columns each for pchembl_value_mean, pchembl_value_max, pchembl_value_median, first_publication_cpd_target_pair and first_publication_cpd_target_pair_w_pchembl: one with the suffix ‘_BF’ based on binding + functional data and one with the suffix ‘_B’ based on only binding data.

Parameters:

chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.

Returns:

Pandas DataFrame with compound-target pairs based on ChEMBL activity data, aggregated into one entry per compound-target pair.

Return type:

pd.DataFrame
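
The two-suffix layout described for this function can be illustrated with a minimal sketch. It is not the module's implementation: the input columns assay_type and pchembl_value, the helper names summarise and aggregate_both, and the use of a left merge are all assumptions made for illustration. The assay type codes 'B' (binding) and 'F' (functional) follow ChEMBL's convention.

```python
import pandas as pd

# Key columns that identify a compound-target pair (names taken from the docs).
KEYS = ["parent_molregno", "tid_mutation"]


def summarise(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Hypothetical helper: aggregate pchembl values per pair and tag columns with a suffix."""
    out = df.groupby(KEYS)["pchembl_value"].agg(["mean", "max", "median"]).reset_index()
    return out.rename(
        columns={
            "mean": f"pchembl_value_mean{suffix}",
            "max": f"pchembl_value_max{suffix}",
            "median": f"pchembl_value_median{suffix}",
        }
    )


def aggregate_both(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per pair with '_BF' and '_B' columns side by side."""
    # Assumption: the input has an 'assay_type' column with 'B' and 'F' codes.
    bf = summarise(df[df["assay_type"].isin(["B", "F"])], "_BF")
    b = summarise(df[df["assay_type"] == "B"], "_B")
    # Pairs without binding data keep their '_BF' values and get NaN in '_B' columns.
    return bf.merge(b, on=KEYS, how="left")


# Toy data: pair (1, "10") has binding and functional data, pair (2, "20") only functional.
toy = pd.DataFrame(
    {
        "parent_molregno": [1, 1, 2],
        "tid_mutation": ["10", "10", "20"],
        "assay_type": ["B", "F", "F"],
        "pchembl_value": [6.0, 8.0, 5.0],
    }
)
combined = aggregate_both(toy)
```

In this sketch, a pair measured only in functional assays still appears in the result, with its ‘_B’ columns left empty.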

get_activity_ct_pairs.get_average_info(df: DataFrame, suffix: str) → DataFrame[source]

Aggregate the information about compound-target pairs for which there is more than one entry into one entry. Compound-target pairs are considered equal if parent_molregno (internal compound ID) and tid_mutation (target ID + mutation annotations) are equal.

The following values are aggregated:

pchembl_value_mean	mean pchembl value for a compound-target pair
pchembl_value_max	maximum pchembl value for a compound-target pair
pchembl_value_median	median pchembl value for a compound-target pair
first_publication_cpd_target_pair	first publication in ChEMBL with this compound-target pair
first_publication_cpd_target_pair_w_pchembl	first publication in ChEMBL with this compound-target pair and an associated pchembl value

Parameters:

df (pd.DataFrame) – Pandas DataFrame with compound-target pairs for which the information should be aggregated.
suffix (str) – Suffix indicating the type of the given DataFrame, e.g., _B for binding assays, _BF for binding+functional assays.

Returns:

Pandas DataFrame with ‘parent_molregno’, ‘tid_mutation’, and the aggregated columns.

Return type:

pd.DataFrame
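
As a rough illustration of this kind of per-pair aggregation, the sketch below uses pandas groupby with named aggregation. It is a sketch under stated assumptions, not the module's code: the input column names pchembl_value and year and the function name aggregate_pairs are hypothetical.

```python
import pandas as pd

# Key columns that identify a compound-target pair (names taken from the docs).
KEYS = ["parent_molregno", "tid_mutation"]


def aggregate_pairs(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Hypothetical sketch: collapse repeated compound-target pairs into one row each."""
    # Assumption: 'pchembl_value' and 'year' are the input column names.
    agg = (
        df.groupby(KEYS)
        .agg(
            pchembl_value_mean=("pchembl_value", "mean"),
            pchembl_value_max=("pchembl_value", "max"),
            pchembl_value_median=("pchembl_value", "median"),
            first_publication_cpd_target_pair=("year", "min"),
        )
        .reset_index()
    )
    # Earliest publication year among rows that actually carry a pchembl value.
    with_pchembl = (
        df.dropna(subset=["pchembl_value"])
        .groupby(KEYS)["year"]
        .min()
        .rename("first_publication_cpd_target_pair_w_pchembl")
        .reset_index()
    )
    agg = agg.merge(with_pchembl, on=KEYS, how="left")
    # Append the suffix to every aggregated column, leaving the key columns untouched.
    return agg.rename(columns={c: f"{c}{suffix}" for c in agg.columns if c not in KEYS})


# Toy data: pair (1, "10_None") has two measurements, pair (2, "20_None") has one.
example = pd.DataFrame(
    {
        "parent_molregno": [1, 1, 2],
        "tid_mutation": ["10_None", "10_None", "20_None"],
        "pchembl_value": [5.0, 7.0, 6.0],
        "year": [2001, 1999, 2010],
    }
)
result = aggregate_pairs(example, "_BF")
```

The returned frame has one row per (parent_molregno, tid_mutation) combination, matching the return value described above.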

get_activity_ct_pairs.get_compound_target_pairs_with_pchembl(chembl_con: Connection, limit_to_literature: bool) → DataFrame[source]

Query ChEMBL activities and related assays for compound-target pairs. Only pairs with an associated pchembl value are returned. Salt forms of compounds are mapped to their parent form. If limit_to_literature is True, only literature sources are considered; otherwise, all sources are included. The result includes information about targets, mutations and year of publication (based on docs).

Parameters:

chembl_con (sqlite3.Connection) – Sqlite3 connection to ChEMBL database.
limit_to_literature (bool) – Include only literature sources if True. Include all available sources otherwise.

Returns:

Pandas DataFrame with compound-target pairs with a pchembl value.

Return type:

pd.DataFrame
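
The shape of such a query can be sketched against a toy in-memory database. The SQL below is an assumption, not the query the module runs: it uses the ChEMBL table names activities, assays, molecule_hierarchy and docs, assumes that src_id = 1 on docs marks literature sources, and omits mutation annotations entirely. The function name query_pairs is hypothetical.

```python
import sqlite3

import pandas as pd


def query_pairs(con: sqlite3.Connection, limit_to_literature: bool) -> pd.DataFrame:
    """Hypothetical sketch: compound-target pairs that have a pchembl value."""
    sql = """
        SELECT mh.parent_molregno, a.tid, act.pchembl_value, d.year
        FROM activities act
        JOIN assays a ON act.assay_id = a.assay_id
        JOIN molecule_hierarchy mh ON act.molregno = mh.molregno
        JOIN docs d ON act.doc_id = d.doc_id
        WHERE act.pchembl_value IS NOT NULL
    """
    if limit_to_literature:
        # Assumption: src_id = 1 identifies scientific literature.
        sql += " AND d.src_id = 1"
    return pd.read_sql_query(sql, con)


# Toy stand-in for a ChEMBL SQLite file, reduced to the columns the sketch needs.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE molecule_hierarchy (molregno INTEGER, parent_molregno INTEGER);
    CREATE TABLE assays (assay_id INTEGER, tid INTEGER, assay_type TEXT);
    CREATE TABLE docs (doc_id INTEGER, year INTEGER, src_id INTEGER);
    CREATE TABLE activities (
        activity_id INTEGER, assay_id INTEGER, molregno INTEGER,
        pchembl_value REAL, doc_id INTEGER
    );
    -- Compound 2 is a salt form of parent compound 1.
    INSERT INTO molecule_hierarchy VALUES (1, 1), (2, 1), (3, 3);
    INSERT INTO assays VALUES (100, 10, 'B'), (101, 20, 'F');
    INSERT INTO docs VALUES (1000, 2005, 1), (1001, 2010, 2);
    INSERT INTO activities VALUES
        (1, 100, 2, 6.5, 1000),
        (2, 101, 3, NULL, 1000),
        (3, 101, 3, 7.0, 1001);
    """
)

all_sources = query_pairs(con, limit_to_literature=False)
literature_only = query_pairs(con, limit_to_literature=True)
```

In the toy data, the activity measured on salt form 2 is reported under parent compound 1, and the activity without a pchembl value is dropped by the WHERE clause.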