add_filtering_columns module
Add filtering columns to the dataset that can be used to obtain the different subsets.
- add_filtering_columns.add_filtering_columns(dataset: Dataset, args: CalculationArgs, out: OutputArgs)
Add filtering columns to the main dataset and save the subsets if required.
- Parameters:
dataset (Dataset) – Dataset with compound-target pairs. Will be updated; the only change made to it is the addition of the filtering columns.
args (CalculationArgs) – Arguments related to how to calculate the dataset
out (OutputArgs) – Arguments related to how to output the dataset
- add_filtering_columns.add_subset_filtering_columns(df_combined_subset: DataFrame, dataset: Dataset, desc: str, args: CalculationArgs, out: OutputArgs)
Add the filtering columns for either the binding+functional (BF) or the binding (B) subset.
- Parameters:
df_combined_subset (pd.DataFrame) – Subset of df_combined with binding+functional (BF) or binding (B) assay-based data
dataset (Dataset) – Dataset with compound-target pairs. Will be updated; the only change made to it is the addition of the filtering columns.
desc (str) – Assay description, either “BF” (binding+functional) or “B” (binding)
args (CalculationArgs) – Arguments related to how to calculate the dataset
out (OutputArgs) – Arguments related to how to output the dataset
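A filtering column can be read as a boolean membership flag: for every compound-target pair in the main dataset, it records whether that pair belongs to a given subset. The sketch below shows this idea with pandas under loudly stated assumptions: the key columns `parent_molregno` and `tid`, the `filtering_<name>` naming scheme and the function name are hypothetical, and the module's actual column names and logic may differ.

```python
import pandas as pd

# Hypothetical key columns identifying a compound-target pair.
KEY_COLS = ["parent_molregno", "tid"]


def add_subset_filtering_columns_sketch(df_main, subsets):
    """Return a copy of df_main with one boolean column per (subset, name) pair.

    The hypothetical column filtering_<name> is True for compound-target
    pairs of df_main that also occur in the corresponding subset.
    """
    result = df_main.copy()
    for subset, name in subsets:
        # Deduplicate the keys so the left merge cannot add rows.
        keys = subset[KEY_COLS].drop_duplicates()
        # A left merge keeps the row order of the main dataset;
        # indicator=True marks matched rows with "both".
        merged = result[KEY_COLS].merge(keys, on=KEY_COLS, how="left", indicator=True)
        result[f"filtering_{name}"] = (merged["_merge"] == "both").to_numpy()
    return result


# Usage: flag rows 0 and 2 as members of a hypothetical "BF_enough_cpds" subset.
df_main = pd.DataFrame({"parent_molregno": [10, 11, 12, 10], "tid": [1, 1, 2, 2]})
subset = df_main.iloc[[0, 2]]
result = add_subset_filtering_columns_sketch(df_main, [(subset, "BF_enough_cpds")])
```

Matching is done on the full (compound, target) pair, so compound 10 tested against target 2 is not flagged even though compound 10 appears in the subset for target 1.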
- add_filtering_columns.get_data_subsets(data: DataFrame, min_nof_cpds: int, desc: str) → tuple[tuple[DataFrame, str], tuple[DataFrame, str], tuple[DataFrame, str], tuple[DataFrame, str]]
Calculate and return the different subsets of interest:
data: Pandas DataFrame with compound-target pairs, without filtering columns and without the annotations for the opposite desc (e.g., if desc = “BF”, the average pchembl value based on binding data only is dropped);
df_enough_cpds: As data, but restricted to targets with at least <min_nof_cpds> compounds with a pchembl value;
df_c_dt_d_dt: As df_enough_cpds, but restricted to targets with at least one compound-target pair labelled as ‘D_DT’, ‘C3_DT’, ‘C2_DT’, ‘C1_DT’ or ‘C0_DT’ (i.e., a known interaction);
df_d_dt: As df_enough_cpds, but restricted to targets with at least one compound-target pair labelled as ‘D_DT’ (i.e., a known drug-target interaction).
- Parameters:
data (pd.DataFrame) – Pandas DataFrame with compound-target pairs
min_nof_cpds (int) – Minimum number of compounds per target
desc (str) – Type of assays that data contains information about. Options: “BF” (binding+functional), “B” (binding)
- Returns:
Tuple of the four dataset subsets, each paired with a string describing it.
- Return type:
tuple[tuple[pd.DataFrame, str], tuple[pd.DataFrame, str], tuple[pd.DataFrame, str], tuple[pd.DataFrame, str]]
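The subset rules described for get_data_subsets can be sketched with pandas as follows. This is a minimal illustration, not the module's implementation: the column names (`tid`, `parent_molregno`, `DTI`, `pchembl_value_mean_<desc>`), the "_B"/"_BF" suffix convention for desc-specific annotations, the subset names and the function name are all assumptions. The sketch also assumes the input carries no filtering columns yet.

```python
import pandas as pd

# Hypothetical column names; the real dataset may name these differently.
TARGET_COL = "tid"
COMPOUND_COL = "parent_molregno"
DTI_COL = "DTI"
KNOWN_INTERACTION = ["D_DT", "C3_DT", "C2_DT", "C1_DT", "C0_DT"]


def get_data_subsets_sketch(data: pd.DataFrame, min_nof_cpds: int, desc: str):
    """Return four (subset, name) pairs following the documented filtering rules."""
    # Drop annotations for the opposite desc, assuming they end in "_B" or "_BF"
    # (e.g. pchembl_value_mean_B is dropped when desc == "BF").
    opposite = "B" if desc == "BF" else "BF"
    data = data.drop(columns=[c for c in data.columns if c.endswith(f"_{opposite}")])

    # Keep pairs with a pchembl value, then targets with >= min_nof_cpds compounds.
    with_pchembl = data[data[f"pchembl_value_mean_{desc}"].notna()]
    df_enough_cpds = with_pchembl.groupby(TARGET_COL).filter(
        lambda g: g[COMPOUND_COL].nunique() >= min_nof_cpds
    )

    def targets_with_label(labels):
        # Restrict df_enough_cpds to targets with at least one pair carrying a label.
        hit = df_enough_cpds[DTI_COL].isin(labels)
        tids = df_enough_cpds.loc[hit, TARGET_COL].unique()
        return df_enough_cpds[df_enough_cpds[TARGET_COL].isin(tids)]

    df_c_dt_d_dt = targets_with_label(KNOWN_INTERACTION)
    df_d_dt = targets_with_label(["D_DT"])

    # Hypothetical subset names.
    return (
        (data, f"{desc}_all"),
        (df_enough_cpds, f"{desc}_{min_nof_cpds}"),
        (df_c_dt_d_dt, f"{desc}_{min_nof_cpds}_c_dt_d_dt"),
        (df_d_dt, f"{desc}_{min_nof_cpds}_d_dt"),
    )


# Usage with toy data: target 2 has a known interaction but too few compounds.
example = pd.DataFrame({
    "tid": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
    "parent_molregno": [10, 11, 12, 10, 13, 14, 15, 16, 17, 18, 19],
    "pchembl_value_mean_BF": [6.0, 7.0, 5.5, 6.1, None, 8.0, 7.2, 6.6, 5.0, 5.1, 5.2],
    "pchembl_value_mean_B": [6.0, None, 5.5, 6.1, None, 8.0, 7.2, 6.6, 5.0, None, 5.2],
    "DTI": ["D_DT", "none", "none", "C3_DT", "none", "C1_DT",
            "none", "none", "none", "none", "none"],
})
subsets = get_data_subsets_sketch(example, min_nof_cpds=3, desc="BF")
```

With desc = "BF" and min_nof_cpds = 3, targets 1, 3 and 4 pass the compound threshold, targets 1 and 3 also have a known interaction, and only target 1 has a ‘D_DT’ pair.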