How to use Profiler
===================

**Profiler is a simulator for profiling the performance of Machine Learning (ML) model scripts.**

Given compute and memory resource constraints for a CPU-based Edge device, Profiler can provide estimates of the compute and memory usage of model scripts on the device. These estimates can be used to choose the best-performing models or, in certain cases, to predict how much compute and memory the models will use on the target device. Because Profiler mimics the target device environment on the user's development machine, the user can gain insights about the performance and resource needs of a model script without having to deploy it on the target device.

Currently, Profiler can be used to:

1. **Select the most efficient model for your target deployment.** With Profiler, you can compare how different models will perform under specific compute and memory constraints. Our studies show that the ranking of models based on runtime or memory use under Profiler mirrors the ranking on a device with the same constraints.
2. **Make model script performance and resource requirements at the Edge more transparent.** Use Profiler to estimate a model script's runtime or memory usage on a device. For similar classes of models (such as different versions of MobileNet or ShuffleNet), there is a straight-line fit between model performance under Profiler and on the target device. Once you run two or three models on the device, you can use the results to find that straight line and predict a new model's performance with Profiler.
3. **Foster lean ML model deployment at the Edge.** By using Profiler, you can assess model-device compatibility and select the most suitable model for your needs without the hassle of going through multiple physical deployment cycles.

How Profiler works
------------------

1. **Simulates Device Constraints.** Profiler allows developers to simulate different **compute** and **memory** constraints for the execution of the application. This is especially useful for ML model deployment, where testing on different edge devices can be tedious and requires actual deployment to individual devices to ensure resource constraints are satisfied. Profiler can help easily *approximate* these constraints on a single host device.
2. **Provides Container Support.** Profiler encapsulates the application, its requirements, and the corresponding data into a Docker container. It uses the user's inputs to build a corresponding Docker image so the application can run independently and without external dependencies. It can then easily be scaled and ported to ease future development and deployment. Profiler also removes the need for developers to acquaint themselves with the internal workings of Docker.
3. **Logs Resource Utilization.** Profiler also tracks and records various resource utilization statistics of the application for debugging purposes. It currently tracks average CPU utilization, memory usage, and block I/O. The logger also supports setting the ``SAMPLETIME`` to control how frequently Profiler samples utilization statistics from the Docker container.

We have conducted over 300 experiments across multiple models, devices, and compute settings. Full results are available `here. `__

Usage
-----

Installation and requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Profiler is installed automatically as part of Auptimizer; the only additional requirement is Docker. Please refer to `Docker installation `__ for instructions on installing Docker on your system.
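As a quick sanity check of this setup (a sketch, assuming Auptimizer is installed from PyPI under the package name ``auptimizer`` and that Docker is set up per the guide above), you can verify both pieces from a shell:

::

   # Install Auptimizer, which includes Profiler (assumed PyPI package name)
   pip install auptimizer

   # Confirm Docker is installed and the daemon is reachable
   docker --version
   docker info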
Using Profiler
~~~~~~~~~~~~~~

Using Profiler is simple and requires only a few steps. Once Docker and Auptimizer are installed, all you need to do is:

1. Ensure that the prerequisites below are met.
2. Set up the Profiler user variables in ``env.template``.
3. Have a script that will train or perform inference on your model.
4. Run ``python -m aup.profiler`` on your model file(s). Multiple models can be provided either as a comma-separated list using the ``-m`` or ``--modellist`` flag, or in a text file using the ``-f`` or ``--modelfile`` flag.

**Profiler flags:**

1. ``-e`` or ``--environment``: path to the environment file.
2. ``-f`` or ``--modelfile``: path to a text file containing model names, one per line.
3. ``-m`` or ``--modellist``: list of model names as a comma-separated string (no spaces).

Prerequisites
~~~~~~~~~~~~~

The following prerequisites help to simplify the profiling procedure. Experienced users should feel free to modify them as needed.

1. Consolidate your project into a single directory, such that the primary application can run without any internal dependencies (the data itself can be in a separate location).
2. Consolidate your application into a single entry point for execution. Use a wrapper file if needed; a minimal wrapper sketch is shown after this list. This single point of entry is needed because Profiler will execute one command to run a single application file. The application can accept different models as input.
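The sketch below illustrates one way to satisfy prerequisite 2, assuming a Python application. The file name ``wrapper.py``, the positional ``model`` argument, and the ``load_and_run`` helper are illustrative placeholders rather than part of Profiler; see ``test_perf.py`` and ``mnist.py`` in `Profiler Examples `__ for the actual example entry points.

::

   # wrapper.py - hypothetical single entry point for Profiler to execute,
   # e.g. run as: python wrapper.py mymodel.tflite
   # How the model name reaches your script may differ; adjust to your application.
   import argparse


   def load_and_run(model_path):
       """Placeholder: load the model at model_path and run training or inference."""
       print("Profiling run for model:", model_path)
       # ... load the model and run the actual workload here ...


   if __name__ == "__main__":
       parser = argparse.ArgumentParser(description="Single entry point for Profiler")
       parser.add_argument("model", help="model file name passed in by Profiler")
       args = parser.parse_args()
       load_and_run(args.model)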
Set up Profiler user variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Profiler accepts two inputs - the environment file (required) and a model name list or file (optional). Refer to ``env_mnist.template`` and ``env_benchmark.template`` in `Profiler Examples `__ for examples. Create ``env.template`` and add the following variables as needed:

1. ``IMAGEREPO`` - **REQUIRED** The name of the base Docker repository to use. Refer to https://hub.docker.com/ for public repositories. Your base image could be anything from ``tensorflow:1.3.0`` or ``python3`` to ``ubuntu``.
2. ``APTREQUIREMENTS`` - **OPTIONAL** All Linux packages required to run the application, as a space-separated string, for example "curl vim". These packages will be installed using ``apt-get install``, so ensure the packages are supported. This variable can also be left empty (using "").
3. ``PIPREQUIREMENTS`` - **OPTIONAL** All Python libraries required to run the application, as a space-separated string, for example "ipython numpy". These packages will be installed using ``pip install``, so ensure the libraries are supported. This variable can also be left empty (using "").
4. ``PRERUN`` - **OPTIONAL** Commands to execute before running the application. ``PRERUN`` can be used to install any libraries that cannot be installed through ``APTREQUIREMENTS`` or ``PIPREQUIREMENTS``. For example, if you need a different version of a library than what is available through pip, you can use ``PRERUN`` to install it. See ``env_benchmark.template`` for an example.
5. ``DIR`` - **REQUIRED** The local path to the user's consolidated directory containing the application. This directory will be copied over to the Docker container.
6. ``SCRIPT`` - **REQUIRED** The name of the primary application file, with its path relative to the aforementioned ``DIR``. This allows the container to find and execute the application file.
7. ``COMMAND`` - **REQUIRED** The command used to execute the aforementioned script, for example ``python``.
8. ``SAMPLETIME`` - **REQUIRED** The wait period, in seconds, between Profiler's queries to Docker for resource utilization. Avoid periods smaller than 3 seconds, since Profiler internally uses the ``docker stats`` command, which takes approximately 3 seconds to finish. Decimal values are accepted.
9. ``OUTPUTFILE`` - **REQUIRED** The name of the file which will contain all the resource utilization logs with timestamps.
10. ``DOCFILE`` - **REQUIRED** The name of a user-defined Dockerfile, with its path relative to the Profiler directory. This variable supersedes all previous variables, and the Docker image is built from the ``DOCFILE``. Only use this variable if you have already tested your Dockerfile with the application to make sure they are compatible.
11. ``DOCKCPUS`` - **OPTIONAL** The amount of CPU processing power allowed to the application. Must be a real number and can be a floating-point decimal, for example "2.5". Refer to https://docs.docker.com/config/containers/resource_constraints/. Can be left empty for no CPU constraint.
12. ``DOCKMEMORY`` - **OPTIONAL** The amount of memory allowed to the application. Must be a positive integer followed by a suffix of b, k, m, or g to indicate bytes, kilobytes, megabytes, or gigabytes, for example "156m". Refer to https://docs.docker.com/config/containers/resource_constraints/. Can be left empty for no memory constraint.
13. ``DOCK_ARGS`` - **OPTIONAL** Additional Docker-related arguments. For instance, to allow Docker to run the container with the privileged tag, use ``--privileged``. Refer to https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities. To mount an additional folder as a volume (e.g. a data folder), use ``-v /path/in/source:/path/in/destination``.

If your primary application needs external model weight files as arguments, you can additionally provide a list of the model weight file names, either as a comma-separated string (the ``-m`` flag) or as a text file with one model name per line (the ``-f`` flag).
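To make the setup concrete, here is a minimal sketch of what an ``env.template`` might look like. It assumes a simple ``KEY=value`` layout, and every value shown (``python:3.7``, ``my_project``, ``run.py``, the paths and limits) is an illustrative placeholder; the shipped ``env_mnist.template`` and ``env_benchmark.template`` remain the authoritative reference for the exact syntax.

::

   # Hypothetical env.template sketch - placeholder values, assumed KEY=value syntax;
   # check env_mnist.template / env_benchmark.template for the exact format.
   IMAGEREPO=python:3.7                 # base Docker image
   APTREQUIREMENTS=""                   # no extra Linux packages
   PIPREQUIREMENTS="numpy"              # Python libraries installed via pip
   PRERUN=""                            # commands to run before the application
   DIR=/home/user/my_project            # consolidated project directory
   SCRIPT=run.py                        # entry-point script, relative to DIR
   COMMAND=python                       # command used to execute SCRIPT
   SAMPLETIME=5                         # seconds between docker stats samples (>= 3)
   OUTPUTFILE=out.txt                   # where the utilization logs are written
   DOCKCPUS=1.5                         # CPU limit; leave empty for no constraint
   DOCKMEMORY=512m                      # memory limit; leave empty for no constraint
   DOCK_ARGS=-v /home/user/data:/data   # extra docker run arguments, e.g. a volume mount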
Interpreting results
--------------------

A summary of each Profiler run can be found in ``out.txt`` (the filename can be user-specified using the ``OUTPUTFILE`` argument in the environment file). The individual model ``OUTPUTFILE``\ s contain the raw values of the different metrics profiled at distinct ``SAMPLETIME`` intervals, using ``docker stats`` as a subroutine (https://docs.docker.com/engine/reference/commandline/stats/).

Each row contains the following values:

1. Name - the name of the Docker container.
2. CPU % - the instantaneous CPU utilization (https://docs.docker.com/config/containers/resource_constraints/).
3. MEM USAGE / LIMIT - the instantaneous memory utilization and the corresponding limit (https://docs.docker.com/config/containers/resource_constraints/).
4. NET I/O - network input/output, the total amount of data the container has sent and received (https://docs.docker.com/engine/reference/commandline/stats/).
5. BLOCK I/O - the amount of data the container has read from and written to block devices on the host (this could be memory external to the container or actual HDD use) (https://docs.docker.com/engine/reference/commandline/stats/).
6. TIME - the timestamp of the measurement.

The Usage Stats table shows the average utilization over the container's lifetime for the aforementioned CPU % and MEM USAGE / LIMIT. For NET I/O and BLOCK I/O, the total input/output data is reported instead of an average. The final usage stats from each run of Profiler are appended to ``OUTPUTFILE``, providing a quick overview of the results of running Profiler multiple times.

Examples
--------

We present some examples of how to use Profiler in the `Profiler Examples `__ folder.

TensorFlow Lite Inference Benchmarking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use Profiler with the TensorFlow Lite inference benchmarking classification example in the `benchmark `__ folder:

1. [Optional] Use the ``bench/download.sh`` script (wget must be installed on your system) to download mobilenet_v1_0.75_224 and mobilenet_v1_1.0_224. (Alternatively, you can download a different set of TensorFlow Lite models from https://www.tensorflow.org/lite/guide/hosted_models and save them in the `benchmark `__ folder.)
2. If needed, change the arguments in ``env_benchmark.template``.
3. Run ``python -m aup.profiler -e env_benchmark.template -m mobilenet_v1_0.75_224.tflite,mobilenet_v1_1.0_224.tflite``.

This will create the Docker images ``mobilenet_v1_0.75_224_img`` and ``mobilenet_v1_1.0_224_img`` and the corresponding Docker containers ``mobilenet_v1_0.75_224_con`` and ``mobilenet_v1_1.0_224_con``. It will execute ``test_perf.py`` within these containers, using a Docker volume mount, to run inference on the specified models. Once execution finishes, Profiler will output the following statistics:

::

   Final Usage Stats
   NAME                    AVG CPU %    PEAK CPU    AVG MEM USAGE / LIMIT    PEAK MEM    NET I/O            BLOCK I/O       TOTAL TIME (ms)
   ---------------------   ----------   ---------   ----------------------   ---------   ----------------   -------------   ----------------
   mobilenet_v1_0.75_224   225.09%      226.68      117.9 MiB / 1.9 GiB      117.9 MiB   742.0 B / 0.0 B    0.0 B / 0.0 B   6164
   mobilenet_v1_1.0_224    244.258%     250.83      122.4 MiB / 1.9 GiB      126.9 MiB   766.0 B / 0.0 B    0.0 B / 0.0 B   12354

The results from each timestamp and each individual model are saved in ``model_name``\ +\ ``out.txt`` (this can be user-defined via ``OUTPUTFILE`` in ``env_benchmark.template``). Additionally, a general summary is provided in ``out.txt`` containing the final stats for all the tested models.

MNIST Training Benchmarking
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also use Profiler to profile training. An MNIST classification example can be found in the `mnist `__ folder.

1. [Optional] Download the MNIST dataset from http://yann.lecun.com/exdb/mnist/ and add the ``.gz`` files to the data folder. Then open the ``env_mnist.template`` file and edit the ``DOCKER_ARGS`` option with the absolute path to the ``data`` folder, as in ``-v /data/:/mnist_data``.
2. Change other arguments in ``env_mnist.template`` if needed.
3. Run ``python -m aup.profiler -e env_mnist.template``.

This will create a Docker image named ``test_image`` and a corresponding Docker container ``test_container``. It will execute ``mnist.py`` within the container, using a Docker volume mount to load the data. Once the execution finishes, Profiler will output the following statistics:

::

   Final Usage Stats
   NAME             AVG CPU %    PEAK CPU    AVG MEM USAGE / LIMIT    PEAK MEM    NET I/O               BLOCK I/O       TOTAL TIME (ms)
   --------------   ----------   ---------   ----------------------   ---------   -------------------   -------------   ----------------
   test_container   316.532%     337.98      502.3 MiB / 1.9 GiB      537.0 MiB   12.0 MiB / 151.4 kB   0.0 B / 0.0 B   220842

The results from each timestamp are saved in ``out.txt`` (set via ``OUTPUTFILE`` in ``env_mnist.template``).
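As a final usage note, the model list in the benchmarking example above can also be supplied through a model file rather than on the command line, using the ``-f``/``--modelfile`` flag described earlier. A small sketch, assuming the models were downloaded as in step 1 (the ``models.txt`` name is arbitrary):

::

   # Write one model file name per line, then point Profiler at the file
   printf "mobilenet_v1_0.75_224.tflite\nmobilenet_v1_1.0_224.tflite\n" > models.txt
   python -m aup.profiler -e env_benchmark.template -f models.txt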