Dockerfile
This file is used by the docker build process, so keep the default naming convention: a file named Dockerfile.
Defining a Base Image
A Dockerfile starts with a base image declaration such as
FROM ubuntu:xenial at the top of the file. We highly recommend using the Ubuntu Xenial 16.04 image for your module, as it has been tested rigorously and is the base image we use for BisQue.
NOTE. If you would like to use a different Ubuntu flavor like 18.04 Bionic, we encourage you to do so and let us know of any problems you encounter.
We typically start off our module Dockerfiles with the following lines:
```dockerfile
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install python
```
After these lines, we add the dependencies for our module, such as pip, headers, and any compilers. The next few lines are where we put the
apt-get installs. Installing these first is good practice: some pip packages require a compiler at install time, and if one is not present in the container, the resulting failure can be hard to debug.
```dockerfile
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran
RUN apt-get update   # <--- Run before pip installs
```
Now we add any pip installs that our module may need, such as
Tensorflow. If you used a virtual environment to develop your module locally, and hopefully you did, then simply run
pip freeze > requirements.txt. This gives you a text file listing every package in your virtual environment. If you are diligent, you probably do not use one virtual environment for all your development in, say, Python 3.6; hence, the
requirements.txt file will contain only the packages your module actually needs. From this file, fill in the necessary pip installs within the Dockerfile.
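If you prefer not to transcribe the freeze output by hand, the conversion is mechanical. Below is a minimal sketch of a helper that turns a pip freeze listing into pinned Dockerfile RUN lines; the function name `requirements_to_run_lines` is made up for this example and is not part of BisQue.

```python
# Hypothetical helper: convert `pip freeze` output into Dockerfile RUN lines.
def requirements_to_run_lines(requirements_text):
    """Emit one pinned `RUN pip install` line per requirement."""
    lines = []
    for raw in requirements_text.splitlines():
        pkg = raw.strip()
        if not pkg or pkg.startswith("#"):
            continue  # skip blank lines and comments
        lines.append("RUN pip install %s" % pkg)
    return "\n".join(lines)

freeze_output = """\
numpy==1.16.6
scikit-learn==0.19.1
"""
print(requirements_to_run_lines(freeze_output))
```

This keeps the exact versions from your virtual environment, which matters for reproducibility as discussed below.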
```dockerfile
RUN pip install numpy pandas tensorflow
RUN pip install scikit-learn==0.19.1   # <--- For specific versions
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
```
Working Directory, Source Code
We typically define the working directory as
/module. After this, copy all of your source code, along with the
PythonScriptWrapper, into it:

```dockerfile
WORKDIR /module
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY YOUR_MODULE_SCRIPT.py /module/   # <--- Source folders welcome too
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```
The last line is the
ENV instruction, which sets the environment variable
<key> to the value
<value>. This value is available to all subsequent instructions in the build stage and can also be substituted inline in many of them. Here it prepends /module to PATH so the PythonScriptWrapper executable can be found.
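To make the inline substitution concrete, here is a minimal illustration; the variable name APP_HOME is invented for this example and is not used by BisQue:

```dockerfile
ENV APP_HOME /module
WORKDIR $APP_HOME                      # <--- inline substitution of the ENV value
RUN echo "working directory is $APP_HOME"
```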
What Should My Last Line Be?
Your last line in the Dockerfile should be the
CMD instruction. There can only be one
CMD instruction in a Dockerfile; if you list more than one, only the last
CMD will take effect.
The main purpose of a
CMD is to provide defaults for an executing container.
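To illustrate the last-one-wins behavior, a minimal sketch:

```dockerfile
FROM ubuntu:xenial
CMD ["echo", "first"]    # ignored: a later CMD overrides it
CMD ["echo", "second"]   # only this CMD runs when the container starts
```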
For us, we will use the
PythonScriptWrapper as our
CMD, as follows:
```dockerfile
CMD ["PythonScriptWrapper"]
```
Simple, am I right? Let’s put all this together in a concrete example.
Example: Composite Strength
The Composite Strength module requires a variety of special packages, including a very specific version of
scikit-learn==0.19.1. Be aware that if you create pickle files using
scikit-learn, you may have to use the exact same version to unpickle them. Some of us found that out the hard way, hence the word of caution.
This mistake can bite in any number of machine learning tasks. For maximum reproducibility, pin every pip package to the exact version you used in your local development environment.
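One way to fail loudly instead of obscurely is to record the library version alongside the pickled object and check it at load time. The sketch below uses plain `pickle` and invented helper names (`save_with_version`, `load_with_version`); it is an illustrative pattern, not a BisQue or scikit-learn API.

```python
import pickle

# Hypothetical pattern: store the producing library's version next to the
# pickled payload so a version mismatch raises a clear error at load time.

def save_with_version(obj, version):
    """Pickle `obj` together with the version string it was produced under."""
    return pickle.dumps({"version": version, "payload": obj})

def load_with_version(blob, expected_version):
    """Unpickle, refusing to proceed if the recorded version differs."""
    record = pickle.loads(blob)
    if record["version"] != expected_version:
        raise RuntimeError(
            "pickled under %s but running %s"
            % (record["version"], expected_version)
        )
    return record["payload"]

blob = save_with_version({"coef": [1.0, 2.0]}, "0.19.1")
model = load_with_version(blob, "0.19.1")   # same version: loads cleanly
```

A mismatched version (say, loading under "0.20.0") raises a RuntimeError naming both versions, which is far easier to debug than a cryptic unpickling failure.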
Step 1. Define Base Image, Run apt-gets, Install Python
Similar to before, we define our base image of Ubuntu Xenial, run the necessary
apt-get commands, and install Python:

```dockerfile
FROM ubuntu:xenial
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install \
        python
```
Step 2. Install Needed apt-get Packages and Dependencies
```dockerfile
RUN apt-get -y install python-lxml python-numpy
RUN apt-get -y install python-pip liblapack3 libblas-dev liblapack-dev gfortran
RUN apt-get -y install python-scipy python-configparser python-h5py
RUN apt-get update
```
Step 3. Install Needed PIP Packages and Dependencies
Best practice is to define your pip installs in the Dockerfile rather than in a
requirements.txt; the more self-contained the image, the better.
Additionally, do not forget to install the
BQAPI inside the container.
```dockerfile
RUN pip install pymks tables scipy
RUN pip install --user --install-option="--prefix=" -U scikit-learn==0.19.1
RUN pip install -i https://biodev.ece.ucsb.edu/py/bisque/prod/+simple bisque-api==0.5.9
RUN pip install requests==2.10.0
```
Step 4. Set Working Directory and COPY Source files
The working directory should be defined as
/module, and all your source code goes in there. If you want to keep things tidy, use a
/source folder inside the
/module directory. This is your module container, and you are free to structure it in any way that works for you. In this example, a single script performs the entire analysis pipeline. Some might say too simple; others say efficient.
```dockerfile
WORKDIR /module
COPY PythonScriptWrapper /module/
COPY PythonScriptWrapper.py /module/
COPY predict_strength.py /module/
COPY pydist /module/pydist/
ENV PATH /module:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```
Step 5. The Final Command
Your final command should come as no surprise. The reason we use the PythonScriptWrapper is because it makes life easier for you. Your focus should be on developing breakthrough methods, not how to handshake between a cloud platform and a Docker container.
```dockerfile
CMD ["PythonScriptWrapper"]
```