2.1 Dockerfile

FILENAME: Dockerfile. This file is used by the docker build process, so keep the default naming convention.
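By default, `docker build` looks for a file named `Dockerfile` in the build context, which is why keeping the standard name avoids extra flags. A quick sketch (the image tag `mymodule` is just a placeholder):

```shell
# `docker build` picks up ./Dockerfile automatically
docker build -t mymodule .

# A non-default filename would require the -f flag
docker build -f Dockerfile.custom -t mymodule .
```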

Defining a Base Image

Typically, a Dockerfile will include a base image such as FROM ubuntu:xenial at the top of the file. We highly recommend using the Ubuntu Xenial 16.04 image for your module as it has been tested rigorously and is the base image we use for BisQue.

NOTE. If you would like to use a different Ubuntu flavor like 18.04 Bionic, we encourage you to do so and let us know of any problems you encounter.

We typically start off our module Dockerfiles with the following lines:

FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update   && \
    apt-get -y upgrade  && \
    apt-get -y install python

APT-GET Installs

After these lines, we need to add the dependencies for our module, such as pip, headers, and any compilers. The next few lines are where we put any apt-get installs. This is good practice because some pip packages require a compiler to build; if one is not present in the container, the resulting failure can be hard to debug.

RUN apt-get update   # <--- Refresh package lists before installing
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran

PIP Installs

Now we add any pip installs that our module may need, such as numpy, pandas, TensorFlow, and others. If you used a virtual environment to develop your module locally, and hopefully you did, simply run pip freeze > requirements.txt. This produces a text file listing every package installed in your module's virtual environment. If you are diligent and keep a separate virtual environment per project, rather than one for all of your development in, say, Python 3.6, the requirements.txt file will contain only the packages your module actually needs. From this file, fill in the necessary pip installs within the Dockerfile.

RUN pip install numpy pandas tensorflow
RUN pip install scikit-learn==0.19.1   # <--- For specific versions
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
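The pinned lines produced by pip freeze map directly onto RUN pip install instructions like those above. As an illustrative sketch (the helper name `requirements_to_run` is ours, not part of BisQue or pip), here is that mapping in code:

```python
# Hypothetical helper: turn pip-freeze output into a single Dockerfile
# RUN instruction, preserving the exact pinned versions.
def requirements_to_run(lines):
    # Drop blank lines and comments; keep "name==version" specifiers as-is.
    pkgs = [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]
    return "RUN pip install " + " ".join(pkgs)

freeze = ["numpy==1.16.6", "pandas==0.24.2", "scikit-learn==0.19.1"]
print(requirements_to_run(freeze))
# → RUN pip install numpy==1.16.6 pandas==0.24.2 scikit-learn==0.19.1
```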

Working Directory, Source Code

We typically define the working directory as follows:

WORKDIR /module

After this, you can put all of your source code, along with the PythonScriptWrapper, inside the /module directory.

COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY YOUR_MODULE_SCRIPT.py /module/  #  <--- Source folders welcome too
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The last line is the ENV instruction, which sets the environment variable <key> to the value <value>. This value will be in the environment of all subsequent instructions in the build stage and can also be substituted inline in many of them.
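For example, an ENV value can be reused inline by later instructions (MODULE_DIR here is purely illustrative, not a variable the wrapper requires):

```dockerfile
ENV MODULE_DIR /module
WORKDIR $MODULE_DIR                    # value substituted inline
COPY PythonScriptWrapper $MODULE_DIR/
RUN echo "installed into $MODULE_DIR"  # also visible to RUN commands
```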

What Should My Last Line Be?

Your last line in the Dockerfile should be the CMD command. There can only be one CMD instruction in a Dockerfile. If you list more than one CMD then only the last CMD will take effect.

The main purpose of a CMD is to provide defaults for an executing container.
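"Defaults" means the CMD runs only when the container is started without an explicit command; anything passed to docker run replaces it (the image name `mymodule` is a placeholder):

```shell
docker run mymodule              # runs the CMD, i.e. PythonScriptWrapper
docker run mymodule /bin/bash    # the trailing argument overrides CMD
```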

For us, we will use the PythonScriptWrapper as our CMD command as follows:

CMD ["PythonScriptWrapper"]

Simple, am I right? Let’s put all this together in a concrete example.


Example: Composite Strength

The Composite Strength module requires a variety of special packages, including a very specific version, scikit-learn==0.19.1. Be aware that if you pickle objects with scikit-learn, you may have to use the exact same version to unpickle them. Some of us found that out the hard way, hence the word of caution.

This simple mistake can bite in any number of machine learning tasks. For maximum reproducibility, we recommend pinning every pip package to the exact version you used in your local development environment.
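A minimal sketch of guarding against this, assuming you control both the training and serving code (the version-tagging scheme below is our own suggestion, not a BisQue convention): store the library version next to the pickled model and refuse to load on a mismatch.

```python
import pickle

def save_model(model, path, version):
    # Store the training-time library version (e.g. sklearn.__version__)
    # alongside the pickled object.
    with open(path, "wb") as f:
        pickle.dump({"version": version, "model": model}, f)

def load_model(path, current_version):
    # Refuse to unpickle if the runtime library version differs from
    # the one the model was trained with.
    with open(path, "rb") as f:
        payload = pickle.load(f)
    if payload["version"] != current_version:
        raise RuntimeError(
            "model pickled with %s, running %s"
            % (payload["version"], current_version)
        )
    return payload["model"]
```

Inside the container you would pass sklearn.__version__ as current_version, which is exactly why the Dockerfile pins scikit-learn==0.19.1.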

Step 1. Define Base Image, Run apt-gets, Install Python

As before, we define our base image of Ubuntu Xenial, run the necessary apt-get commands, and install Python.

FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update                                            && \
    apt-get -y upgrade                                           && \
    apt-get -y install                                              \
      python

Step 2. Install Needed apt-get Packages and Dependencies

RUN apt-get update
RUN apt-get -y install python-lxml python-numpy
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran
RUN apt-get -y install python-scipy python-configparser python-h5py

Step 3. Install Needed PIP Packages and Dependencies

Best practice is always to have your pip installs defined in the Dockerfile instead of in a requirements.txt. The more self-contained, the better.

Additionally, do not forget to install the BQAPI inside the container.

RUN pip install pymks tables scipy
RUN pip install --user --install-option="--prefix=" -U scikit-learn==0.19.1
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
RUN pip install requests==2.10.0

Step 4. Set Working Directory and COPY Source files

The working directory should be defined as /module, and all of your source code goes in there. If you want to keep things clean, use a /source folder inside the /module directory. This is your module's container, and you are free to structure it however you see fit. In this example, a single script performs the entire analysis pipeline. Some might say too simple; others, efficient.

WORKDIR /module
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY predict_strength.py /module/
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Step 5. Lastly, the Final Command

Your final command should come as no surprise. We use the PythonScriptWrapper because it makes life easier for you: your focus should be on developing breakthrough methods, not on handshaking between a cloud platform and a Docker container.

CMD ["PythonScriptWrapper"]