Compare performance of different deep learning models
This explains the use of fit_powerlaw.py. It only works on deep learning models trained through the genomicBERT pipeline. For more information on the method, including how to interpret the results, please refer to the publication (https://arxiv.org/pdf/2202.02842.pdf).
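The core operation is fitting a power law to a set of positive values. The sketch below rests on two assumptions. First, the values are something like the eigenvalues of a layer's weight correlation matrix; check the publication for exactly what fit_powerlaw.py fits. Second, the lower cutoff x_min has already been chosen. Under those assumptions, the standard continuous maximum-likelihood estimate of the exponent is alpha = 1 + n / sum_i ln(x_i / x_min). The names `fit_alpha` and `sample_power_law` are hypothetical and are not part of genomicBERT.

```python
import math
import random


def fit_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Hypothetical helper, not the genomicBERT implementation. Only values
    >= x_min are used; choosing x_min (commonly by minimising the
    Kolmogorov-Smirnov distance) is assumed to happen elsewhere.
    """
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, n, seed=0):
    """Draw n samples from p(x) ~ x**-alpha (x >= x_min) by inverse transform."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so every sample is >= x_min.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    data = sample_power_law(alpha=3.0, x_min=1.0, n=20000)
    print(round(fit_alpha(data, x_min=1.0), 2))  # should be close to 3.0
```

Synthetic data with a known exponent is a quick way to sanity-check the estimator before applying it to real model weights.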
Source data
Directories containing trained models from a standard Hugging Face or PyTorch workflow can be passed in as input.
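Before passing directories in, it can help to check that each one actually contains a saved model. Hugging Face `save_pretrained` typically writes `config.json` alongside `pytorch_model.bin` or `model.safetensors`, and plain PyTorch checkpoints are often `.pt` or `.pth` files. The sketch below is a minimal pre-flight check built on that assumption. `looks_like_model_dir` and `filter_model_dirs` are hypothetical helpers, and fit_powerlaw.py may accept or reject directories differently.

```python
from pathlib import Path

# Weight-file suffixes assumed to indicate a saved checkpoint (an assumption,
# not a list taken from fit_powerlaw.py).
WEIGHT_SUFFIXES = {".bin", ".safetensors", ".pt", ".pth"}


def looks_like_model_dir(path):
    """Return True if `path` is a directory holding at least one weight file.

    Hypothetical check: Hugging Face directories usually also contain
    config.json, but plain PyTorch checkpoints may not, so it is not required.
    """
    directory = Path(path)
    if not directory.is_dir():
        return False
    return any(
        f.is_file() and f.suffix in WEIGHT_SUFFIXES for f in directory.iterdir()
    )


def filter_model_dirs(paths):
    """Split candidate paths into (usable, skipped) lists, preserving order."""
    usable, skipped = [], []
    for p in paths:
        (usable if looks_like_model_dir(p) else skipped).append(p)
    return usable, skipped
```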
Results
Note
Entry points are available if this is installed using the automated conda method. You can then call the command directly, for example: create_dataset_bio. If not, you will need to run the script directly, which follows the same naming pattern, for example: python create_dataset_bio.py.
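The choice between the entry point and the script can be automated. The sketch below makes two assumptions. If no entry point is on PATH, the script is taken to be in the current working directory. The fallback runs it with the current interpreter (`sys.executable`) rather than a bare `python`. `resolve_command` is a hypothetical helper.

```python
import shutil
import sys


def resolve_command(name):
    """Return the argv prefix for a genomicBERT tool called `name`.

    Uses the installed entry point (e.g. create_dataset_bio) if it is on PATH;
    otherwise falls back to running `<name>.py` with the current interpreter.
    The fallback assumes the script is in the working directory.
    """
    if shutil.which(name):
        return [name]
    return [sys.executable, f"{name}.py"]
```

For example, `resolve_command("fit_powerlaw") + ["-h"]` yields a command list suitable for `subprocess.run` either way.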
Run the code as below:
python fit_powerlaw.py -m MODEL_PATH [MODEL_PATH ...] -o OUTPUT_DIR -a ALPHA_MAX
Plots are written to the output directory: one combined plot with all model performances overlaid, plus an individual plot for each model.
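When comparing many models, it can be convenient to build the command programmatically. The sketch below uses only the -m, -o and -a flags listed in the script's usage output, and it assumes `python` is on PATH. The helper names and example directory names are hypothetical.

```python
import subprocess


def build_fit_powerlaw_cmd(model_dirs, output_dir=None, alpha_max=None,
                           script="fit_powerlaw.py"):
    """Assemble the command line for fit_powerlaw.py.

    Optional flags are omitted when not given, so the script's own
    defaults (output next to the models, alpha_max of 8) apply.
    """
    if not model_dirs:
        raise ValueError("at least one model directory is required")
    cmd = ["python", script, "-m", *model_dirs]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if alpha_max is not None:
        cmd += ["-a", str(alpha_max)]
    return cmd


def run_fit_powerlaw(model_dirs, **kwargs):
    """Run the script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_fit_powerlaw_cmd(model_dirs, **kwargs), check=True)
```

For example, `run_fit_powerlaw(["model_a", "model_b"], output_dir="plots")` would produce the overlaid plot for both models plus one plot per model in `plots`.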
Notes
Interpreting the plots may not be straightforward. Please refer to the publication for more information (https://arxiv.org/pdf/2202.02842.pdf).
Usage
python fit_powerlaw.py -h
usage: fit_powerlaw.py [-h] [-m MODEL_PATH [MODEL_PATH ...]] [-o OUTPUT_DIR]
                       [-a ALPHA_MAX]

Take trained model dataset and apply power law fit. Acts as a performance
metric which is independent of data. For more information refer here:
https://arxiv.org/pdf/2202.02842.pdf

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH [MODEL_PATH ...], --model_path MODEL_PATH [MODEL_PATH ...]
                        path to trained model directory
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        path to output metrics directory (DEFAULT: same as
                        model_path)
  -a ALPHA_MAX, --alpha_max ALPHA_MAX
                        max alpha value to plot (DEFAULT: 8)
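If you wrap or test this tool, the documented interface can be mirrored with `argparse`. This is a hypothetical re-creation based only on the help text above. Two details are assumptions, flagged again in the comments: ALPHA_MAX is parsed as a float, and when -o is omitted each model's metrics are assumed to go into that model's own directory. The help text only says "same as model_path".

```python
import argparse


def make_parser():
    """Hypothetical re-creation of the documented fit_powerlaw.py interface."""
    parser = argparse.ArgumentParser(
        description="Take trained model dataset and apply power law fit."
    )
    parser.add_argument("-m", "--model_path", nargs="+",
                        help="path to trained model directory")
    parser.add_argument("-o", "--output_dir", default=None,
                        help="path to output metrics directory "
                             "(DEFAULT: same as model_path)")
    # type=float is an assumption; the help text only gives the default of 8.
    parser.add_argument("-a", "--alpha_max", type=float, default=8,
                        help="max alpha value to plot (DEFAULT: 8)")
    return parser


def output_dirs(args):
    """Map each model path to the directory its metrics would be written to.

    Assumption: without -o, each model's output goes into its own directory.
    """
    return {p: args.output_dir or p for p in args.model_path}
```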
Note
If you intend to download a model and a directory with the same path already exists on your disk, rename or remove that directory first, because local files take priority over the download.
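Hugging Face `from_pretrained` treats its argument as a local path when a directory of that name exists, which is why a same-named local directory shadows the download. The sketch below is a minimal pre-download check. It assumes model identifiers are resolved relative to a base directory (the current working directory by default). `local_shadow` is a hypothetical helper.

```python
import os
import warnings


def local_shadow(model_id, base_dir="."):
    """Return the local path that would shadow `model_id`, or None.

    A directory whose relative path equals the model identifier (for example
    a folder literally named "org/model") is loaded instead of the remote model.
    """
    candidate = os.path.join(base_dir, model_id)
    if os.path.isdir(candidate):
        warnings.warn(
            f"{candidate!r} exists locally and will be used instead of "
            f"downloading {model_id!r}; rename or remove it to download."
        )
        return candidate
    return None
```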