How to use Profiler

Profiler is a simulator for profiling the performance of Machine Learning (ML) model scripts. Given compute and memory resource constraints for a CPU-based Edge device, Profiler provides estimates of compute and memory usage for model scripts on that device. These estimates can be used to choose the best-performing models or, in certain cases, to predict how much compute and memory models will use on the target device. Because Profiler mimics the target device environment on the user’s development machine, the user can gain insights about the performance and resource needs of a model script without having to deploy it on the target device.

Currently, Profiler can be used to:

  1. Select the most efficient model for your target deployment. With Profiler, you can compare how different models will perform under specific compute and memory constraints. Our studies show that the ranking of models based on runtime or memory use under Profiler mirrors the ranking on a device with the same constraints.

  2. Make model script performance and resource requirements at the Edge more transparent. Use Profiler to estimate a model script’s runtime or memory usage on a device. For similar classes of models (such as different versions of MobileNet or ShuffleNet), there is a straight-line fit between model performance under Profiler and on the target device. Once you run two or three models on the device, you can use the results to find that straight line and predict a new model’s on-device performance from its Profiler result.

  3. Foster lean ML model deployment at the Edge. By using Profiler, you can assess model-device compatibility and select the most suitable model for your needs without the hassle of going through multiple physical deployment cycles.
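
As a minimal sketch of the straight-line approach described in item 2, the Python snippet below fits a least-squares line to paired measurements and uses it to estimate on-device runtime for a new model. The runtimes are hypothetical placeholders, not measured results, and the helper names `fit_line` and `predict` are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("Profiler measurements must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, profiler_value):
    """Estimate the on-device value from a Profiler measurement."""
    return slope * profiler_value + intercept


# Hypothetical runtimes in milliseconds for three models of the same family.
profiler_ms = [100.0, 200.0, 300.0]  # measured under Profiler
device_ms = [250.0, 450.0, 650.0]    # measured on the target device

slope, intercept = fit_line(profiler_ms, device_ms)

# Estimate the device runtime of a fourth model profiled at 400 ms.
estimate_ms = predict(slope, intercept, 400.0)
print(f"estimated device runtime: {estimate_ms:.1f} ms")
```

The fit is only meaningful within one class of similar models, as noted above, and it should be refreshed whenever the device or its constraints change.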

How Profiler works

  1. Simulates Device Constraints. Profiler allows developers to simulate different compute and memory constraints for the execution of the application. This is especially useful for ML model deployment, where testing on different edge devices can be tedious and require actual deployment to individual devices to ensure resource constraints are satisfied. Profiler can help easily approximate these constraints on a single host device.

  2. Provides Container Support. Profiler encapsulates the application, its requirements, and corresponding data into a Docker container. It uses user inputs to build a corresponding Docker Image so the application can run independently and without external dependencies. It can then easily be scaled and ported to ease future development and deployment. Profiler also removes the need for a developer to acquaint themselves with internal workings of Docker.

  3. Logs Resource Utilization. Profiler also tracks and records various resource utilization statistics of the application for debugging purposes. It currently tracks Average CPU Utilization, Memory Usage, and Block I/O. The logger also supports setting the Sample Time to control how frequently Profiler samples utilization statistics from the Docker container.
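
For illustration, compute and memory limits of this kind can be expressed with the `--cpus` and `--memory` options of `docker run`. The sketch below only assembles those options. That Profiler passes exactly these options internally is an assumption, and the function name `docker_constraint_flags` is hypothetical.

```python
def docker_constraint_flags(cpus="", memory=""):
    """Build docker run options for the given limits.

    Empty strings mean "no constraint", so no option is emitted for them.
    """
    flags = []
    if cpus:
        flags += ["--cpus", cpus]      # e.g. "2.5" CPUs
    if memory:
        flags += ["--memory", memory]  # e.g. "156m" for 156 megabytes
    return flags


constrained = docker_constraint_flags(cpus="2.5", memory="156m")
unconstrained = docker_constraint_flags()
print(constrained)
```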

We have conducted over 300 experiments across multiple models, devices, and compute settings. Full results are available here.


Installation and requirements

Profiler is installed automatically as part of Auptimizer; the only additional requirement is Docker. Please refer to Docker installation for instructions on installing Docker on your system.

Using Profiler

Using Profiler is simple and requires only a few steps. Once Docker and Auptimizer are installed, all you need to do is:

  1. Ensure that the prerequisites below are met

  2. Set up the Profiler user variables in env.template

  3. Have a script that will train or perform inference on your model

  4. Run python -m aup.profiler on your model file(s). Multiple models can be provided as a comma-separated list using the -m or --modellist flag, or in a text file using the -f or --modelfile flag.

Profiler flags:

  1. -e or --environment : path to the environment file.

  2. -f or --modelfile : path to the text file containing different model names on new lines.

  3. -m or --modellist : list of model names as comma(‘,’) separated string (no spaces).
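
The flags above can also be assembled programmatically. The following sketch builds the command line from an environment file and either a list of model names or a model file. The helper name `build_profiler_command` is hypothetical, and treating -m and -f as alternatives is a choice of this sketch rather than documented behavior.

```python
def build_profiler_command(env_file, models=None, model_file=None):
    """Return the argument list for one Profiler invocation."""
    if models and model_file:
        # Sketch-level choice: pass the models one way or the other.
        raise ValueError("pass either models or model_file, not both")
    cmd = ["python", "-m", "aup.profiler", "-e", env_file]
    if models:
        # -m expects a comma-separated string without spaces.
        cmd += ["-m", ",".join(name.strip() for name in models)]
    elif model_file:
        # -f expects a text file with one model name per line.
        cmd += ["-f", model_file]
    return cmd


cmd = build_profiler_command(
    "env_benchmark.template",
    models=["mobilenet_v1_0.75_224.tflite", "mobilenet_v1_1.0_224.tflite"],
)
print(" ".join(cmd))

# To launch it, pass the list to subprocess:
# import subprocess
# subprocess.run(cmd, check=True)
```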


The following prerequisites help to simplify the profiling procedure. Experienced users should feel free to modify them as needed.

  1. Consolidate your project into a single directory, such that the primary application can run without any internal dependencies (the data itself can be in a separate location).

  2. Consolidate your application into a single entry point for execution. Use a wrapper file if needed. This single point of entry is needed because Profiler will execute one command to run a single application file. The application can accept different models as input.

Set up Profiler user variables

Profiler accepts two inputs: the environment file (required) and a model name list or file (optional). Refer to env_mnist.template and env_benchmark.template in Profiler Examples for examples.

Create env.template, and add the following variables as needed:

  1. IMAGEREPO - REQUIRED Enter the name of the base Docker repository to use. Refer to Docker Hub for public repositories. Your base image could be anything from tensorflow:1.3.0, python3, ubuntu, etc.

  2. APTREQUIREMENTS - OPTIONAL Enter all Linux packages required to run the application as a space-separated string, for example “curl vim”. These packages will be installed using the command apt-get install, so ensure the packages are supported. This variable can also be left empty (using “”).

  3. PIPREQUIREMENTS - OPTIONAL Enter all Python libraries required to run the application as a space-separated string, for example “ipython numpy”. These packages will be installed using the command pip install, so ensure the libraries are supported. This variable can also be left empty (using “”).

  4. PRERUN - OPTIONAL Enter commands to execute before running the application. PRERUN can be used to install any libraries that cannot be installed through APTREQUIREMENTS or PIPREQUIREMENTS. For example, if you need a different version of a library than what is available through pip, you can use PRERUN to install it. See env_benchmark.template for an example.

  5. DIR - REQUIRED Enter the local path to the user’s consolidated directory containing the application. This directory will be copied over to the Docker container.

  6. SCRIPT - REQUIRED The name of the primary application file, along with the path relative to the aforementioned DIR. This allows the container to find and execute the application file.

  7. COMMAND - REQUIRED The command used to execute the aforementioned script. For example python.

  8. SAMPLETIME - REQUIRED The interval, in seconds, at which Profiler queries Docker for resource utilization. Avoid intervals shorter than 3 seconds, since Profiler internally uses the docker stats command, which takes approximately 3 seconds to finish. Decimal values are allowed.

  9. OUTPUTFILE - REQUIRED The name of the file which will contain all the resource utilization logs with timestamps.

  10. DOCFILE - REQUIRED The name of a user-defined Dockerfile, with its path relative to the Profiler directory. This variable supersedes all previous variables and builds the Docker image from the DOCFILE. Use it only if you have already tested your Dockerfile with the application to make sure they are compatible.

  11. DOCKCPUS - OPTIONAL The amount of CPU compute power allowed to the application. Must be a real number and can be a floating-point decimal, for example “2.5”. Can be left empty to apply no CPU constraint.

  12. DOCKMEMORY - OPTIONAL The amount of memory allowed to the application. Must be a positive integer followed by a suffix of b, k, m, or g to indicate bytes, kilobytes, megabytes, or gigabytes, for example “156m”. Can be left empty to apply no memory constraint.

  13. DOCK_ARGS - OPTIONAL Additional Docker-related arguments. For instance, to run the container with the privileged flag, use --privileged. To mount an additional folder (e.g. a data folder) as a volume, use -v /path/in/source:/path/in/destination.
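
The value rules stated above for SAMPLETIME, DOCKCPUS, and DOCKMEMORY can be checked before running Profiler. The sketch below does that and then renders the variables as text. It assumes env.template uses shell-style NAME="value" lines; consult the example templates for the actual layout. The DIR and SCRIPT values and the helper names `check_settings` and `render_env` are hypothetical.

```python
import re

# Positive integer followed by one of the suffixes b, k, m, g.
MEMORY_PATTERN = re.compile(r"^[1-9][0-9]*[bkmg]$")


def check_settings(settings):
    """Return a list of problems found in the constraint-related values."""
    problems = []
    try:
        if float(settings.get("SAMPLETIME", "")) < 3:
            problems.append("SAMPLETIME is below 3 seconds")
    except ValueError:
        problems.append("SAMPLETIME must be a number of seconds")
    cpus = settings.get("DOCKCPUS", "")
    if cpus:  # empty means no CPU constraint
        try:
            float(cpus)
        except ValueError:
            problems.append("DOCKCPUS must be a real number")
    memory = settings.get("DOCKMEMORY", "")
    if memory and not MEMORY_PATTERN.match(memory):  # empty means no limit
        problems.append("DOCKMEMORY must look like 156m")
    return problems


def render_env(settings):
    """Render settings as NAME="value" lines (assumed file layout)."""
    return "".join(f'{name}="{value}"\n' for name, value in settings.items())


example = {
    "IMAGEREPO": "ubuntu",
    "APTREQUIREMENTS": "curl vim",
    "PIPREQUIREMENTS": "ipython numpy",
    "DIR": "./my_project",  # hypothetical path
    "SCRIPT": "main.py",    # hypothetical file name
    "COMMAND": "python",
    "SAMPLETIME": "5",
    "OUTPUTFILE": "out.txt",
    "DOCKCPUS": "2.5",
    "DOCKMEMORY": "156m",
}

problems = check_settings(example)
rendered = render_env(example)
print(problems or rendered)
```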

If your primary application needs external model weight files as arguments, you can further provide a list of the names of model weight files. This list can be provided as a list of comma(‘,’) separated strings of the model names or a text file with strings of the model names, each on a new line.

Interpreting results

A summary of each Profiler run can be found in out.txt (the filename can be user-specified using the OUTPUTFILE argument in the environment file).

The individual model OUTPUTFILEs contain the raw values of different metrics profiled at distinct SAMPLETIME intervals using docker stats as a subroutine.

Each row contains the following values:

  1. Name - name of the Docker container.

  2. CPU % - the instantaneous CPU utilization.

  3. MEM USAGE / LIMIT - the instantaneous memory utilization and corresponding limit.

  4. NET I/O - network input/output, the total amount of data the container has sent and received.

  5. BLOCK I/O - the amount of data the container has read from and written to block devices on the host (this could be memory external to the container or actual HDD use).

  6. TIME - the current timestamp of the measurement.

The Usage Stats table shows the average utilization over the container’s lifetime for the aforementioned CPU % and MEM USAGE / LIMIT. For NET I/O and BLOCK I/O the total input/output data metrics are returned, instead of the average statistics.
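
To make the averaging concrete, the sketch below computes average and peak values from a handful of CPU % and MEM USAGE / LIMIT samples. The sample strings are hypothetical and only imitate the docker stats value style; the exact layout of the per-model log file is not assumed here, and the helper names are illustrative.

```python
UNIT_BYTES = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}


def parse_percent(text):
    """Convert a string such as '225.09%' to a float."""
    return float(text.strip().rstrip("%"))


def parse_size(text):
    """Convert a size such as '117.9 MiB' to a number of bytes."""
    text = text.strip()
    # Check longer unit names first so 'MiB' is not mistaken for 'B'.
    for unit in sorted(UNIT_BYTES, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_BYTES[unit]
    raise ValueError(f"unrecognized size: {text!r}")


def summarize(cpu_samples, mem_samples):
    """Average and peak CPU % and memory usage over all samples."""
    cpu = [parse_percent(s) for s in cpu_samples]
    # Keep only the usage part, which comes before the '/' and the limit.
    mem = [parse_size(s.split("/")[0]) for s in mem_samples]
    return {
        "avg_cpu": sum(cpu) / len(cpu),
        "peak_cpu": max(cpu),
        "avg_mem_bytes": sum(mem) / len(mem),
        "peak_mem_bytes": max(mem),
    }


# Hypothetical samples taken at two SAMPLETIME intervals.
stats = summarize(
    cpu_samples=["200%", "250%"],
    mem_samples=["100 MiB / 1.9 GiB", "120 MiB / 1.9 GiB"],
)
print(stats)
```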

The final usage stats from each run of Profiler are appended to OUTPUTFILE, providing a quick overview of the results of running Profiler multiple times.


Examples of how to use Profiler are provided in the Profiler Examples folder.

TensorFlow Lite Inference Benchmarking

To use Profiler on the TensorFlow Lite inference benchmarking classification example in the benchmark folder:

  1. [Optional] Use the bench/ script (wget must be installed on your system) to download mobilenet_v1_0.75_224 and mobilenet_v1_1.0_224. Alternatively, you can download a different set of TensorFlow Lite models and save them in the benchmark folder.

  2. If needed, change arguments in env_benchmark.template.

  3. Run python -m aup.profiler -e env_benchmark.template -m mobilenet_v1_0.75_224.tflite,mobilenet_v1_1.0_224.tflite.

This will create Docker images mobilenet_v1_0.75_224_img and mobilenet_v1_1.0_224_img and corresponding Docker containers mobilenet_v1_0.75_224_con and mobilenet_v1_1.0_224_con. It will execute within these containers using the Docker Volume command to run inference on the specified models. Once execution finishes, Profiler will output the following statistics:

Final Usage Stats
NAME                   AVG CPU %      PEAK CPU  AVG MEM USAGE / LIMIT    PEAK MEM    NET I/O          BLOCK I/O        TOTAL TIME (ms)
---------------------  -----------  ----------  -----------------------  ----------  ---------------  -------------  -----------------
mobilenet_v1_0.75_224  225.09%          226.68  117.9 MiB / 1.9 GiB      117.9 MiB   742.0 B / 0.0 B  0.0 B / 0.0 B               6164
mobilenet_v1_1.0_224   244.258%         250.83  122.4 MiB / 1.9 GiB      126.9 MiB   766.0 B / 0.0 B  0.0 B / 0.0 B              12354

The results from each timestamp and each individual model are saved in model_name+out.txt (can be user-defined via OUTPUTFILE in env_benchmark.template). Additionally, a general summary is provided in out.txt containing the final stats for all the tested models.
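
As a small illustration of model selection, the sketch below copies three columns from the Final Usage Stats table above and ranks the two models by total time and by peak memory. The `RunSummary` type and the `rank_by` helper are hypothetical names used only for this example.

```python
from typing import NamedTuple


class RunSummary(NamedTuple):
    name: str
    avg_cpu_percent: float
    peak_mem_mib: float
    total_time_ms: int


# Values taken from the Final Usage Stats table above.
runs = [
    RunSummary("mobilenet_v1_0.75_224", 225.09, 117.9, 6164),
    RunSummary("mobilenet_v1_1.0_224", 244.258, 126.9, 12354),
]


def rank_by(runs, field):
    """Return runs sorted from lowest to highest value of the given field."""
    return sorted(runs, key=lambda run: getattr(run, field))


fastest = rank_by(runs, "total_time_ms")[0]
leanest = rank_by(runs, "peak_mem_mib")[0]
print(f"fastest: {fastest.name}, lowest peak memory: {leanest.name}")
```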

MNIST Training Benchmarking

You can also use Profiler to profile training. An MNIST classification example can be found in the mnist folder.

  1. [Optional] Download the MNIST dataset and add the .gz files to the data folder. Then open the env_mnist.template file and edit the DOCKER_ARGS option with the absolute path to the data folder as -v /data/:/mnist_data.

  2. If needed, change other arguments in env_mnist.template.

  3. Run python -m aup.profiler -e env_mnist.template.

This will create a Docker image named test_image and a corresponding Docker container named test_container. It will execute within the container using the Docker Volume command to load the data. Once execution finishes, Profiler will output the following statistics:

Final Usage Stats
NAME            AVG CPU %      PEAK CPU  AVG MEM USAGE / LIMIT    PEAK MEM    NET I/O              BLOCK I/O        TOTAL TIME (ms)
--------------  -----------  ----------  -----------------------  ----------  -------------------  -------------  -----------------
test_container  316.532%         337.98  502.3 MiB / 1.9 GiB      537.0 MiB   12.0 MiB / 151.4 kB  0.0 B / 0.0 B             220842

The results from each timestamp are saved in out.txt (set via OUTPUTFILE in env_mnist.template).