Compare performance of different deep learning models

This page explains how to use fit_powerlaw.py. It only works on deep learning models trained through the genomicBERT pipeline. For more information on the method, including how to interpret the results, please refer to the publication (https://arxiv.org/pdf/2202.02842.pdf).

Source data

Directories containing trained models from a standard Hugging Face or PyTorch workflow can be passed in as input.

Results

Note

Entry points are available if the package is installed using the automated conda method. You can then call the command directly, for example: create_dataset_bio. Otherwise, you will need to run the script itself, which follows the same naming pattern, for example: python create_dataset_bio.py.

Run the code as below:

python fit_powerlaw.py -m [ MODEL_PATH ... ] -o OUTPUT_DIR -a N

Plots are written to the output directory (by default, the same directory as the model path). A combined plot with the performance of all models overlaid is generated, along with an individual plot for each model.
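When comparing several models, it can help to assemble the call programmatically. The sketch below builds the argument list from the -m, -o and -a flags shown in the script's help text. The helper name build_fit_powerlaw_command and the directory names are hypothetical and only serve as an illustration; the sketch assumes fit_powerlaw.py is in the current working directory.

```python
import shlex
import sys


def build_fit_powerlaw_command(model_dirs, output_dir, alpha_max=8):
    """Assemble the argument list for fit_powerlaw.py (hypothetical helper)."""
    if not model_dirs:
        raise ValueError("at least one model directory is required")
    return [
        sys.executable, "fit_powerlaw.py",
        "-m", *model_dirs,       # one or more trained model directories
        "-o", output_dir,        # where the plots are written
        "-a", str(alpha_max),    # max alpha value to plot (documented default: 8)
    ]


# Hypothetical directory names, replace with your own trained models.
cmd = build_fit_powerlaw_command(["models/run_a", "models/run_b"], "powerlaw_plots")
print(shlex.join(cmd[1:]))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when a model path contains spaces.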

Notes

Interpreting the plots may not be straightforward. Please refer to the publication for more information (https://arxiv.org/pdf/2202.02842.pdf).
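As general orientation on what an alpha value is: a power-law fit assumes a density of the form p(x) ∝ x^(-alpha) above some threshold x_min, and alpha is the fitted exponent. The sketch below estimates alpha on synthetic samples with the standard continuous maximum-likelihood estimator. It is only an illustration of the concept under these assumptions; it does not reproduce what fit_powerlaw.py computes internally, and the function names are hypothetical.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from p(x) ~ x**(-alpha), x >= x_min, by inverse transform."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(samples, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_power_law(alpha=3.0, x_min=1.0, n=20000, rng=rng)
alpha_hat = estimate_alpha(samples, x_min=1.0)
print(f"estimated alpha: {alpha_hat:.2f}")
```

With enough samples the estimate lands close to the exponent used to generate the data. How alpha values relate to model quality is covered in the publication.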

Usage

python fit_powerlaw.py -h
usage: fit_powerlaw.py [-h] [-m MODEL_PATH [MODEL_PATH ...]] [-o OUTPUT_DIR]
                       [-a ALPHA_MAX]

Take trained model dataset and apply power law fit. Acts as a performance
metric which is independent of data. For more information refer here:
https://arxiv.org/pdf/2202.02842.pdf

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH [MODEL_PATH ...], --model_path MODEL_PATH [MODEL_PATH ...]
                        path to trained model directory
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        path to output metrics directory (DEFAULT: same as
                        model_path)
  -a ALPHA_MAX, --alpha_max ALPHA_MAX
                        max alpha value to plot (DEFAULT: 8)

Note

If you intend to download a model and its directory path matches one that already exists on your disk, you will need to rename or remove the local directory first, because local files take priority over the download.
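A quick check before running the script can catch this case. The sketch below only tests whether a directory with the same name as the model exists relative to a base directory; the function name and the model identifier are hypothetical, and it assumes the name is resolved relative to the current working directory.

```python
from pathlib import Path


def local_copy_shadows_download(model_name, base_dir="."):
    """Return True if a local directory with the model's name already exists."""
    return (Path(base_dir) / model_name).is_dir()


# Hypothetical model identifier, replace with the model you intend to download.
model_name = "my_org/my_model"
if local_copy_shadows_download(model_name):
    print(f"'{model_name}' exists locally; rename or remove it to force a download")
else:
    print(f"no local directory named '{model_name}' was found")
```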