User Guide
The default version of the dataset (the full dataset as a CSV file based on the newest ChEMBL version) can be generated by calling
python main.py -o <output_path>
with further options explained in Arguments.
An overview of the arguments can also be displayed by calling
python main.py --help
The output always contains the full dataset as a CSV file. The arguments only add further output files or change how the full dataset is extracted.
Arguments
| Parameter | Required | Flag | Default | Explanation |
|---|---|---|---|---|
| --chembl, -c | No | No | None | ChEMBL version. The latest available ChEMBL version is used if this is not set. |
| --sqlite, -s | No | No | None | Path to SQLite database. If this is not set, ChEMBL is downloaded as an SQLite database and handled using the chembl_downloader package. |
| --output, -o | Yes | No | None | Path to write the output file(s) to. |
| --delimiter, -d | No | No | ; | Delimiter in output CSV files. |
| --all_sources | No | Yes | n/a | Include all sources if this is set. By default, this is not set, and the dataset is calculated based on only literature sources. |
| --rdkit | No | Yes | n/a | Calculate RDKit-based compound properties if this is set. |
| --excel | No | Yes | n/a | Write the results to Excel. Note: this may fail if the output is too large. The results will always be written to CSV. |
| --BF | No | Yes | n/a | Write the subsets based on binding and functional assays. |
| --B | No | Yes | n/a | Write the subsets based on binding assays. |
| --debug | No | Yes | n/a | Log additional debugging information. |
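
To make the difference between value arguments and flags concrete, the table can be mirrored with Python's standard `argparse` module. This is only an illustrative sketch: the function name `build_parser` and the help texts are assumptions, and the actual parser in `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (not the project's actual code)."""
    # allow_abbrev=False avoids any prefix matching between --B and --BF.
    parser = argparse.ArgumentParser(prog="main.py", allow_abbrev=False)

    # Value arguments: each takes a value; only --output is required.
    parser.add_argument("--chembl", "-c", default=None,
                        help="ChEMBL version; latest available if not set")
    parser.add_argument("--sqlite", "-s", default=None,
                        help="Path to SQLite database")
    parser.add_argument("--output", "-o", required=True,
                        help="Path to write the output file(s) to")
    parser.add_argument("--delimiter", "-d", default=";",
                        help="Delimiter in output CSV files")

    # Flags: boolean switches that are False unless given on the command line.
    flags = {
        "--all_sources": "Include all sources, not only literature sources",
        "--rdkit": "Calculate RDKit-based compound properties",
        "--excel": "Additionally write the results to Excel",
        "--BF": "Write the subsets based on binding and functional assays",
        "--B": "Write the subsets based on binding assays",
        "--debug": "Log additional debugging information",
    }
    for name, help_text in flags.items():
        parser.add_argument(name, action="store_true", help=help_text)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-o", "results", "--rdkit", "--B"])
print(args.output, args.delimiter, args.rdkit, args.B, args.BF)
# prints: results ; True True False
```

In this sketch, omitting `-o` makes `parse_args` exit with a usage error, which matches the "Required" column of the table.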
Accessing ChEMBL
ChEMBL is accessed either through a user-supplied path to a downloaded SQLite database or through the chembl_downloader package. In both cases, SQLite is used to query ChEMBL. Some of the earlier ChEMBL versions are missing tables or fields required to calculate the dataset. Therefore, the earliest ChEMBL version for which the dataset can be calculated is ChEMBL 26.
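
The two access routes and the version limit can be sketched as follows. The helper names `check_version` and `resolve_sqlite_path` and the constant `MIN_CHEMBL_VERSION` are hypothetical and not taken from the project's code. The sketch also assumes that `chembl_downloader.download_extract_sqlite` is used for the download route; the project may call the package differently.

```python
import sqlite3
from typing import Optional

# Earliest ChEMBL version for which the dataset can be calculated (see text above).
MIN_CHEMBL_VERSION = 26


def check_version(version: Optional[str]) -> None:
    """Raise ValueError for versions older than ChEMBL 26; None means 'latest'."""
    if version is None:
        return
    # Versions such as "24_1" exist, so compare only the major part.
    # (int("24_1") would silently evaluate to 241 in Python.)
    major = int(str(version).split("_")[0])
    if major < MIN_CHEMBL_VERSION:
        raise ValueError(
            f"ChEMBL {version} is not supported; "
            f"use version {MIN_CHEMBL_VERSION} or later."
        )


def resolve_sqlite_path(chembl_version: Optional[str] = None,
                        sqlite_path: Optional[str] = None) -> str:
    """Return the path of the SQLite database to query."""
    if sqlite_path is not None:
        # Route 1: the user supplied an already downloaded database.
        return sqlite_path
    # Route 2: download via chembl_downloader (third-party, imported lazily).
    check_version(chembl_version)
    import chembl_downloader
    return str(chembl_downloader.download_extract_sqlite(version=chembl_version))


def count_rows(db_path: str, table: str) -> int:
    """Either way, the database is queried through SQLite."""
    with sqlite3.connect(db_path) as conn:
        # Table names cannot be bound as parameters; pass only trusted names.
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

A call such as `count_rows(resolve_sqlite_path(sqlite_path="chembl.db"), "molecule_dictionary")` would then work the same way regardless of how the database file was obtained; the file name `chembl.db` is a placeholder.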