Compare performance of different deep learning models#

This explains the use of fit_powerlaw.py. Only works on deep learning models through the genomicBERT pipeline. For more information on the method, including interpretation, please refer to the publication (https://arxiv.org/pdf/2202.02842.pdf).

Source data#

Directories containing trained models from a standard huggingface or pytorch workflow can be passed in as input.

Results#

Note

Entry points are available if this is installed using the automated conda method. You can then use the command line argument directly, for example: create_dataset_bio. If not, you will need to use the script directly, which follows the same naming pattern, for example: python create_dataset_bio.py.

Running the code as below:

python fit_powerlaw.py -i [ INFILE_PATH ... ] -t OUTPUT_DIR -a N

Plots will be output to the directory. A combined plot with all performance overlaid and individual performances will be generated.

Notes#

Interpreting the plots may not be straightforward. Please refer to the publication for more information (https://arxiv.org/pdf/2202.02842.pdf).

Usage#

python fit_powerlaw.py -h
usage: fit_powerlaw.py [-h] [-m MODEL_PATH [MODEL_PATH ...]] [-o OUTPUT_DIR]
                       [-a ALPHA_MAX]

Take trained model dataset and apply power law fit. Acts as a performance
metric which is independent of data. For more information refer here:
https://arxiv.org/pdf/2202.02842.pdf

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_PATH [MODEL_PATH ...], --model_path MODEL_PATH [MODEL_PATH ...]
                        path to trained model directory
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        path to output metrics directory (DEFAULT: same as
                        model_path)
  -a ALPHA_MAX, --alpha_max ALPHA_MAX
                        max alpha value to plot (DEFAULT: 8)

Note

If you are intending to download a model and the directory path matches the one on your disk, you will need to rename or remove it since it will first use local files as a priority!