Compare performance of different deep learning models
This page explains how to use fit_powerlaw.py, which only works with deep learning models trained through the genomicBERT pipeline. For more information on the method, including how to interpret the results, please refer to the publication (https://arxiv.org/pdf/2202.02842.pdf).
Source data
Directories containing trained models from a standard Hugging Face or PyTorch workflow can be passed as input.
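As a rough pre-flight check before running the script, the sketch below tests whether a directory looks like a saved Hugging Face model. The file names it looks for (`config.json` plus a weights file such as `pytorch_model.bin` or `model.safetensors`) are the conventional outputs of `save_pretrained`; whether fit_powerlaw.py requires exactly these files is an assumption, and `looks_like_model_dir` is a hypothetical helper, not part of genomicBERT.

```python
from pathlib import Path

# Conventional weight file names written by Hugging Face `save_pretrained`.
# Assumption: fit_powerlaw.py needs at least one of these to be present.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "model.safetensors",
    "pytorch_model.bin.index.json",  # sharded PyTorch checkpoint
    "model.safetensors.index.json",  # sharded safetensors checkpoint
)


def looks_like_model_dir(path: str) -> bool:
    """Return True if `path` holds a config.json and at least one weight file."""
    d = Path(path)
    if not (d / "config.json").is_file():
        return False
    return any((d / name).is_file() for name in WEIGHT_FILES)
```

For example, `[p for p in candidate_dirs if looks_like_model_dir(p)]` keeps only directories that appear to contain a saved model.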
Results
Note
Entry points are available if the package is installed using the automated conda method. You can then call the command directly, for example: create_dataset_bio. If not, you will need to run the script directly, which follows the same naming pattern, for example: python create_dataset_bio.py.
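If you are unsure whether the entry point was installed, a small Python helper can pick whichever form is available. This is a sketch that assumes the entry point for this script is named `fit_powerlaw` (following the naming pattern in the note) and that `fit_powerlaw.py` sits in the current working directory; `resolve_command` is a hypothetical helper.

```python
import shutil
import sys


def resolve_command(name: str) -> list[str]:
    """Prefer an installed entry point called `name`; else fall back to `python name.py`.

    Assumption: the entry point shares the script's base name and the script
    is in the current working directory when no entry point is found.
    """
    if shutil.which(name) is not None:
        return [name]  # entry point found on PATH (conda install)
    return [sys.executable, f"{name}.py"]  # run the script with the current interpreter
```

For example, `resolve_command("fit_powerlaw")` returns either `["fit_powerlaw"]` or `[<python>, "fit_powerlaw.py"]`, which can be prefixed to the script's arguments.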
Run the code as follows:
python fit_powerlaw.py -m MODEL_PATH [MODEL_PATH ...] -o OUTPUT_DIR -a ALPHA_MAX
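When comparing many models from a Python workflow, it can help to build the argument list programmatically. The sketch below uses only the flags listed in the script's help output (`-m`, `-o`, `-a`) and leaves optional flags out so the script's own defaults apply; `build_fit_powerlaw_args` is a hypothetical helper.

```python
from typing import Optional, Sequence


def build_fit_powerlaw_args(
    model_paths: Sequence[str],
    output_dir: Optional[str] = None,
    alpha_max: Optional[float] = None,
) -> list[str]:
    """Assemble command-line arguments for fit_powerlaw.py from its documented flags."""
    if not model_paths:
        raise ValueError("at least one model path is required")
    args = ["-m", *model_paths]  # -m/--model_path accepts one or more directories
    if output_dir is not None:
        args += ["-o", output_dir]  # omitted: the script defaults to the model path
    if alpha_max is not None:
        args += ["-a", str(alpha_max)]  # omitted: the script defaults to 8
    return args
```

For example, `subprocess.run(["python", "fit_powerlaw.py", *build_fit_powerlaw_args(["models/run_a", "models/run_b"], "plots")], check=True)` would compare two models; the directory names here are placeholders.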
Plots are written to the output directory. The script generates a combined plot with the performance of all models overlaid, as well as an individual plot for each model.
Notes
Interpreting the plots may not be straightforward. Please refer to the publication for more information (https://arxiv.org/pdf/2202.02842.pdf).
Usage
python fit_powerlaw.py -h
usage: fit_powerlaw.py [-h] [-m MODEL_PATH [MODEL_PATH ...]] [-o OUTPUT_DIR]
                       [-a ALPHA_MAX]

Take trained model dataset and apply power law fit. Acts as a performance
metric which is independent of data. For more information refer here:
https://arxiv.org/pdf/2202.02842.pdf

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH [MODEL_PATH ...], --model_path MODEL_PATH [MODEL_PATH ...]
                        path to trained model directory
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        path to output metrics directory (DEFAULT: same as
                        model_path)
  -a ALPHA_MAX, --alpha_max ALPHA_MAX
                        max alpha value to plot (DEFAULT: 8)
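The help text describes the core step as a power law fit. Purely to illustrate that general idea, the sketch below estimates a power-law exponent alpha from a list of positive values (for example, eigenvalues of a layer's weight correlation matrix) using the standard continuous maximum-likelihood estimator, alpha = 1 + n / Σ ln(x_i / x_min), and then drops fits above `alpha_max` before plotting. Treating `x_min` as a fixed input and assuming that values above `alpha_max` are simply excluded are simplifications; the script's actual fitting procedure is described in the publication and may differ. All function names here are hypothetical.

```python
import math
import random
from typing import Iterable, List


def powerlaw_alpha(values: Iterable[float], x_min: float) -> float:
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if len(tail) < 2:
        raise ValueError("need at least two values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def keep_plottable(alphas: Iterable[float], alpha_max: float = 8.0) -> List[float]:
    """Drop fitted alphas above alpha_max (default mirrors the CLI default of 8)."""
    return [a for a in alphas if a <= alpha_max]


def sample_powerlaw(alpha: float, x_min: float, n: int, seed: int = 0) -> List[float]:
    """Draw n samples from p(x) ~ x^-alpha (x >= x_min) by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

As a sanity check, `powerlaw_alpha(sample_powerlaw(3.0, 1.0, 20_000), x_min=1.0)` should return a value close to 3, since the samples are drawn from a power law with that exponent.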
Note
If you intend to download a model and its name matches a directory path on your disk, you will need to rename or remove that directory, since local files take priority over downloads.
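To catch this situation before downloading, you could check for a clashing local directory. The sketch below assumes the model is loaded with a Hugging Face-style `from_pretrained(model_id)` call, which loads from a local directory of the same name when one exists; `local_dir_shadows` is a hypothetical helper that only reports the clash and leaves renaming or removal to you.

```python
from pathlib import Path


def local_dir_shadows(model_id: str, base_dir: str = ".") -> bool:
    """Return True if a directory named like `model_id` exists under base_dir.

    Hugging Face-style loaders use a matching local directory instead of
    downloading, so such a directory would shadow the remote model.
    """
    return (Path(base_dir) / model_id).is_dir()
```

For example, with a placeholder identifier: `if local_dir_shadows("org/model-name"): print("rename or remove ./org/model-name before downloading")`.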