add_filtering_columns module

Add filtering columns to the dataset for obtaining the different subsets.

add_filtering_columns.add_filtering_columns(dataset: Dataset, args: CalculationArgs, out: OutputArgs)

Add filtering columns to the main dataset and save the subsets if required.

Parameters:
  • dataset (Dataset) – Dataset with compound-target pairs. Will be updated to only include filtering columns.

  • args (CalculationArgs) – Arguments related to how to calculate the dataset.

  • out (OutputArgs) – Arguments related to how to output the dataset.
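One plausible reading of "filtering column" is a boolean flag on every compound-target pair that marks whether the pair belongs to a given subset. The sketch below illustrates that idea with plain Python rows instead of a pandas DataFrame. It is not the module's implementation: the row keys `parent_molregno` and `tid` and the column name `in_subset_B_enough_cpds` are hypothetical.

```python
# Hypothetical sketch: each row is a dict describing one compound-target pair.
Row = dict[str, object]


def add_filter_column(rows: list[Row], subset: list[Row], column: str) -> None:
    """Add a boolean column flagging whether each compound-target pair is in `subset`."""
    # Identify pairs by (compound ID, target ID); these key names are assumptions.
    subset_keys = {(r["parent_molregno"], r["tid"]) for r in subset}
    for row in rows:
        row[column] = (row["parent_molregno"], row["tid"]) in subset_keys


rows = [
    {"parent_molregno": 1, "tid": "T1"},
    {"parent_molregno": 2, "tid": "T1"},
    {"parent_molregno": 1, "tid": "T2"},
]
subset = [rows[0], rows[2]]
add_filter_column(rows, subset, "in_subset_B_enough_cpds")
print([row["in_subset_B_enough_cpds"] for row in rows])  # [True, False, True]
```

Reading a subset back is then a plain filter such as `[r for r in rows if r["in_subset_B_enough_cpds"]]`, which keeps a single main dataset instead of a separate copy per subset.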

add_filtering_columns.add_subset_filtering_columns(df_combined_subset: DataFrame, dataset: Dataset, desc: str, args: CalculationArgs, out: OutputArgs)

Add filtering columns for either the binding+functional (BF) or the binding-only (B) subset.

Parameters:
  • df_combined_subset (pd.DataFrame) – Subset with binding+functional (BF) or binding (B) assay-based data in df_combined

  • dataset (Dataset) – Dataset with compound-target pairs. Will be updated to only include filtering columns.

  • desc (str) – Assay description, either “BF” (binding+functional) or “B” (binding)

  • args (CalculationArgs) – Arguments related to how to calculate the dataset.

  • out (OutputArgs) – Arguments related to how to output the dataset.
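The dataset carries separate annotations per assay type (for example, an average pchembl value based on binding+functional data and one based on binding data only), and each subset is built from the annotations for its own desc only. A minimal sketch of that column selection, assuming a hypothetical naming convention in which assay-type-specific columns end in `_BF` or `_B`; the function name `drop_opposite_desc` and the column names are illustrative, not part of the module:

```python
def drop_opposite_desc(row: dict[str, object], desc: str) -> dict[str, object]:
    """Return a copy of `row` without the columns belonging to the other assay type.

    Assumes (hypothetically) that assay-type-specific columns end in "_BF" or "_B".
    """
    if desc not in ("BF", "B"):
        raise ValueError(f"desc must be 'BF' or 'B', got {desc!r}")
    opposite = "B" if desc == "BF" else "BF"
    # "..._BF" does not end in "_B", so the two suffixes never collide.
    return {k: v for k, v in row.items() if not k.endswith(f"_{opposite}")}


row = {"tid": "T1", "pchembl_value_mean_BF": 6.5, "pchembl_value_mean_B": 7.1}
print(drop_opposite_desc(row, "BF"))
# {'tid': 'T1', 'pchembl_value_mean_BF': 6.5}
```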

add_filtering_columns.get_data_subsets(data: DataFrame, min_nof_cpds: int, desc: str) → tuple[tuple[DataFrame, str], tuple[DataFrame, str], tuple[DataFrame, str], tuple[DataFrame, str]]

Calculate and return the different subsets of interest. The four returned subsets are:

  • data: Pandas DataFrame with compound-target pairs, without filtering columns and without the annotations for the opposite desc (e.g., if desc = “BF”, the average pchembl value based on binding data only is dropped).

  • df_enough_cpds: Pandas DataFrame with compound-target pairs for targets with at least <min_nof_cpds> compounds with a pchembl value.

  • df_c_dt_d_dt: As df_enough_cpds, but restricted to targets with at least one compound-target pair labelled as ‘D_DT’, ‘C3_DT’, ‘C2_DT’, ‘C1_DT’ or ‘C0_DT’ (i.e., a known interaction).

  • df_d_dt: As df_enough_cpds, but restricted to targets with at least one compound-target pair labelled as ‘D_DT’ (i.e., a known drug-target interaction).

Parameters:
  • data (pd.DataFrame) – Pandas DataFrame with compound-target pairs

  • min_nof_cpds (int) – Minimum number of compounds per target

  • desc (str) – Type of assays that data contains information about. Options: “BF” (binding+functional), “B” (binding)

Returns:

Tuple of the four dataset subsets, each paired with a string describing it.

Return type:

tuple[tuple[pd.DataFrame, str], tuple[pd.DataFrame, str], tuple[pd.DataFrame, str], tuple[pd.DataFrame, str]]
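The subset definitions above can be sketched without pandas as follows. This is a minimal illustration under several assumptions, not the module's implementation: rows are plain dicts rather than DataFrame rows, the keys `tid`, `pchembl_value_mean_<desc>` and `DTI` are hypothetical column names, the returned description strings are invented, and the known-interaction filters are read as keeping every target that has at least one matching pair.

```python
from collections import defaultdict
from typing import Callable

Row = dict[str, object]
Subset = tuple[list[Row], str]

# Labels counted as a known interaction, as listed in the docstring.
KNOWN_INTERACTION = {"D_DT", "C3_DT", "C2_DT", "C1_DT", "C0_DT"}


def targets_where(rows: list[Row], predicate: Callable[[Row], bool]) -> set[object]:
    """Return the IDs of targets with at least one row satisfying `predicate`."""
    return {row["tid"] for row in rows if predicate(row)}


def get_data_subsets_sketch(
    data: list[Row], min_nof_cpds: int, desc: str
) -> tuple[Subset, Subset, Subset, Subset]:
    # Hypothetical column names: "tid", f"pchembl_value_mean_{desc}", "DTI".
    pchembl_col = f"pchembl_value_mean_{desc}"

    # Count compounds with a pchembl value per target (one row per pair).
    counts: defaultdict[object, int] = defaultdict(int)
    for row in data:
        if row.get(pchembl_col) is not None:
            counts[row["tid"]] += 1
    enough = {tid for tid, n in counts.items() if n >= min_nof_cpds}
    df_enough_cpds = [row for row in data if row["tid"] in enough]

    # Keep targets with at least one known interaction.
    known = targets_where(df_enough_cpds, lambda r: r["DTI"] in KNOWN_INTERACTION)
    df_c_dt_d_dt = [row for row in df_enough_cpds if row["tid"] in known]

    # Keep targets with at least one known drug-target interaction.
    drug = targets_where(df_enough_cpds, lambda r: r["DTI"] == "D_DT")
    df_d_dt = [row for row in df_enough_cpds if row["tid"] in drug]

    # The description strings are hypothetical labels for this sketch.
    return (
        (data, f"{desc}_all"),
        (df_enough_cpds, f"{desc}_{min_nof_cpds}"),
        (df_c_dt_d_dt, f"{desc}_{min_nof_cpds}_c_dt_d_dt"),
        (df_d_dt, f"{desc}_{min_nof_cpds}_d_dt"),
    )


# Toy input; "other" is a placeholder label for pairs without a known interaction.
data = [
    {"tid": "T1", "pchembl_value_mean_BF": 6.0, "DTI": "D_DT"},
    {"tid": "T1", "pchembl_value_mean_BF": 7.0, "DTI": "other"},
    {"tid": "T2", "pchembl_value_mean_BF": 5.5, "DTI": "C1_DT"},
    {"tid": "T2", "pchembl_value_mean_BF": 8.0, "DTI": "other"},
    {"tid": "T3", "pchembl_value_mean_BF": 6.2, "DTI": "D_DT"},
]
subsets = get_data_subsets_sketch(data, min_nof_cpds=2, desc="BF")
for rows, name in subsets:
    print(name, len(rows))
# BF_all 5
# BF_2 4
# BF_2_c_dt_d_dt 4
# BF_2_d_dt 2
```

Each later subset is filtered from df_enough_cpds, so the subsets are nested. With pandas, the same per-target filters could be written with `groupby` and `transform` instead of the explicit target sets used here.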