2.1 Dockerfile
FILENAME: Dockerfile
This file is used by the docker build process, so keep the default naming convention.
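As a quick sketch of why the default name matters: docker build looks for a file literally named Dockerfile in the build context unless told otherwise (the image tag below is illustrative, not part of the module):

```
# Default: picks up ./Dockerfile automatically
docker build -t my-module .

# A non-default file name forces the extra -f flag
docker build -f Dockerfile.dev -t my-module .
```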
Defining a Base Image
Typically, a Dockerfile will include a base image such as FROM ubuntu:xenial at the top of the file. We highly recommend using the Ubuntu Xenial 16.04 image for your module as it has been tested rigorously and is the base image we use for BisQue.
NOTE: If you would like to use a different Ubuntu flavor like 18.04 Bionic, we encourage you to do so and let us know of any problems you encounter.
We typically start off our module Dockerfiles with the following lines:
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install python
APT-GET Installs
After these lines, we need to add dependencies for our module such as pip, headers, and any compilers. The next few lines are where we put any apt-get installs. This is usually good practice since some pip packages require a compiler at install time; if one is not present in the container, the resulting failure can be hard to debug.
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran
RUN apt-get update # <--- Run before pip installs
PIP Installs
Now we add any pip installs that our module may need, such as numpy, pandas, TensorFlow, and any others. If you used a virtual environment to develop the module locally, and hopefully you did, then simply run pip freeze > requirements.txt. This gives you a text file listing all the packages used in your virtual environment. If you are a diligent person, you do not use one virtual environment for all your development in, say, Python 3.6, so the requirements.txt file will contain only the packages your module actually needs. From this file, fill in the necessary pip installs within the Dockerfile.
RUN pip install numpy pandas tensorflow
RUN pip install scikit-learn==0.19.1 # <--- For specific versions
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
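If you already have a requirements.txt from pip freeze, a small shell step can collapse it into a single pinned install line ready to paste into the Dockerfile. This is a minimal sketch; the package versions written below are illustrative stand-ins, not the module's actual dependencies:

```shell
# Write an example pip freeze output (illustrative pinned versions).
printf 'numpy==1.16.6\npandas==0.24.2\nscikit-learn==0.19.1\n' > requirements.txt

# Collapse the pinned entries onto one line, ready to paste into a RUN instruction.
echo "RUN pip install $(tr '\n' ' ' < requirements.txt)"
```

Pinning versions this way keeps the container build reproducible even when newer releases of a package appear.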
Working Directory, Source Code
We typically define the working directory as follows:
WORKDIR /module
After this, you can put all of your source code, along with the PythonScriptWrapper, inside the /module directory.
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY YOUR_MODULE_SCRIPT.py /module/ # <--- Source folders welcome too
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
The last line is the ENV instruction, which sets the environment variable <key> to the value <value>. This value is available to all subsequent instructions in the build stage and can also be substituted inline in many of them.
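As a sketch of that substitution, a variable set with ENV can be reused by any later instruction (the MODULE_DIR name here is illustrative, not part of the actual module):

```
ENV MODULE_DIR /module
WORKDIR $MODULE_DIR
RUN echo "working directory is $MODULE_DIR"
```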
What Should My Last Line Be?
Your last line in the Dockerfile should be the CMD instruction. There can only be one CMD instruction in a Dockerfile; if you list more than one CMD, only the last one takes effect. The main purpose of a CMD is to provide defaults for an executing container.
For us, we will use the PythonScriptWrapper as our CMD as follows:
CMD [ "PythonScriptWrapper" ]
Note the double quotes: the bracketed exec form is parsed as JSON, so single quotes will not work.
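For reference, CMD comes in two forms; a short sketch of both, using the module's wrapper as the command:

```
# Exec (JSON) form: runs the executable directly, without a shell.
# The array is parsed as JSON, so double quotes are required.
CMD ["PythonScriptWrapper"]

# Shell form: runs the command under /bin/sh -c instead.
# CMD PythonScriptWrapper
```

The exec form is generally preferred because signals from the platform reach the wrapper process directly rather than an intermediate shell.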
Simple, am I right? Let’s put all this together in a concrete example.
Example: Composite Strength
The Composite Strength module requires a variety of special packages, including a very specific version of scikit-learn (scikit-learn==0.19.1). Be aware that if you pickle files using scikit-learn, you may have to use the same exact version to unpickle them. Some of us found that out the hard way, hence the word of caution.
This mistake can occur in any number of machine learning tasks. For maximum reproducibility, we recommend pinning every pip package to the exact version you used in your local development environment.
Step 1. Define Base Image, Run apt-gets, Install Python
As before, we will define our base image of Ubuntu Xenial, run the necessary apt-get commands, which is crucial, and install Python.
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install \
    python
Step 2. Install Needed apt-get Packages and Dependencies
RUN apt-get -y install python-lxml python-numpy
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran
RUN apt-get -y install python-scipy python-configparser python-h5py
RUN apt-get update
Step 3. Install Needed PIP Packages and Dependencies
Best practice is to define your pip installs directly in the Dockerfile rather than copying in a requirements.txt; the more self-contained, the better.
Additionally, do not forget to install the BQAPI inside the container.
RUN pip install pymks tables scipy
RUN pip install --user --install-option="--prefix=" -U scikit-learn==0.19.1
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
RUN pip install requests==2.10.0
Step 4. Set Working Directory and COPY Source files
The working directory should be defined as /module, and all your source code goes in there. If you want to keep things clean, use a /source folder inside the /module directory. This is your module container, and you have the control to customize the structure in any way that works for you. In this example, a single script performs the entire analysis pipeline. Some might say too simple; others say efficient.
WORKDIR /module
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY predict_strength.py /module/
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
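If you prefer the /source layout mentioned above, the COPY lines might look like this sketch (the source/ folder name follows that suggestion; adjust it to your repository layout):

```
WORKDIR /module
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY source/ /module/source/
```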
Step 5. Lastly, the Final Command.
Your final command should come as no surprise. The reason we use the PythonScriptWrapper is because it makes life easier for you. Your focus should be on developing breakthrough methods, not how to handshake between a cloud platform and a Docker container.
CMD [ "PythonScriptWrapper" ]