In this post, we will go over how Docker manages data on your host and how you can leverage it in your day-to-day work.
After reading this post, you should be able to:
- Distinguish between two types of data: volatile and persistent
- Decide when to choose one over the other
- Recognize real-world examples of when to choose which
This post is a follow-up to our Docker introduction post; we highly recommend going over that one first before proceeding.
During development, it is common to experiment in an isolated environment to test your code.
By default, unless you instruct Docker otherwise, any data written inside a container will be deleted once that container is removed (not stopped).
Let's confirm that this is indeed the case by starting a container and creating a file in it:
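The original commands are not reproduced here, so the following is only a minimal sketch. The container name volatile matches the name used later in this post; the file path /hello.txt and the tail -f /dev/null keep-alive command are assumptions made for illustration.

```shell
# "volatile" is the container name used later in the post;
# the file path is a hypothetical example.
container=volatile
file=/hello.txt

# Start a detached Alpine container that stays alive in the background.
docker container run --name "$container" -d alpine:latest tail -f /dev/null

# Write a file into the container's filesystem, then read it back.
docker container exec "$container" sh -c "echo 'hello from the container' > $file"
docker container exec "$container" cat "$file"
```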
Every new container gets a writable layer on the host that holds the difference between the original image (in our example, alpine:latest) and the running container. If we have not defined a volume explicitly (using the -v flag), everything the container writes ends up in that layer.
To see this in action, we will use Docker's inspect command, which provides low-level information on Docker objects.
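A minimal sketch of such a call, assuming the container is named volatile and the host uses the overlay2 storage driver (other drivers expose different fields, so the template below may print nothing or an error there):

```shell
container=volatile

# Print only the UpperDir path of the container's writable layer
# instead of the full JSON document that inspect returns by default.
docker container inspect --format '{{ .GraphDriver.Data.UpperDir }}' "$container"
```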
Now let’s look inside the UpperDir path:
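One way to do that, sketched under the assumption of a Linux host with the overlay2 storage driver (on Docker Desktop the path lives inside a virtual machine and is not directly visible from the host):

```shell
container=volatile

# Capture the UpperDir path reported by inspect.
upper_dir=$(docker container inspect --format '{{ .GraphDriver.Data.UpperDir }}' "$container")

# The directory is owned by root, so listing it usually needs sudo.
# Skip the listing when inspect did not return a path.
if [ -n "$upper_dir" ]; then
  sudo ls -la "$upper_dir"
fi
```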
As we can see, Docker keeps any data written inside a container in a special location on the host machine.
Now that we know where it is kept on the host, let's stop our container and see if the data is still there.
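A sketch of that check, again assuming a container named volatile on a Linux host with the overlay2 storage driver:

```shell
container=volatile

# Remember where the writable layer lives before stopping the container.
upper_dir=$(docker container inspect --format '{{ .GraphDriver.Data.UpperDir }}' "$container")

docker container stop "$container"

# A stopped container still owns its writable layer, so the files should remain.
if [ -n "$upper_dir" ]; then
  sudo ls -la "$upper_dir"
fi
```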
As we can see, the data is still there! That is because the container has not been removed yet, and you may want to start it again using docker container start volatile.
Now, let us remove it and see if the data still persists.
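The removal step could look like the sketch below. It assumes the container named volatile has already been stopped, and that the host is Linux with the overlay2 storage driver.

```shell
container=volatile

# inspect also works on stopped containers, so capture the path first.
upper_dir=$(docker container inspect --format '{{ .GraphDriver.Data.UpperDir }}' "$container")

docker container rm "$container"

# Once the container is removed, its layer directory should be gone too,
# so the listing is expected to fail.
if [ -n "$upper_dir" ]; then
  sudo ls -la "$upper_dir" || echo "could not list $upper_dir (expected after removal)"
fi
```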
What have we learned?
Docker manages any container data at a special location (which can be found using the docker container inspect command).
As long as we do not remove the container, the data will persist.
When to use volatile type data?
Development / Experimental stage.
Although we saw that container data survives when we stop the container, this kind of storage is meant to be used during the development stage.
A good example is the Postgres DB from our previous post: we can experiment freely with a newly created Postgres container, or reuse its data, until we are happy with the result and ready to commit our code.
Persistent data – volumes
There are two types of volumes: Docker-managed volumes and bind mount volumes.
Both are similar in being a mount point between the host directory tree and a container directory tree, but differ in where that location is on the host.
Each has its pros and cons and we will discuss each in depth so you will know when to choose each.
Bind mount volumes
A bind mount volume maps a user-specified directory or file on the host machine to a corresponding location in the container.
When should we use bind mount volumes?
The rule of thumb is: if you need to share data between the host machine and the container, you will want to use a bind mount volume.
Example: Testing your code in a container
When developing a new application, you often need to test it out.
Let us develop a small Flask application to demonstrate the bind mount option.
First, let's set up our environment: create a new folder called “app” and add two files to it:
# app/requirements.txt
Flask

# app/app.py
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)
Now that we have finished writing our code, let us spin up a new container and mount the folder inside it so we can run the application:
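The original command is not preserved here, so what follows is only a sketch of how such a container might be started. The container name flask-dev, the python:3-alpine image, the host port 8080 and the /app mount point are all assumptions made for illustration.

```shell
# Hypothetical names: container "flask-dev", image python:3-alpine, host port 8080.
# Bind mounts need an absolute host path, so build it from the current directory.
app_dir="$(pwd)/app"

# Mount the host folder at /app, install the dependencies and start the app.
docker container run -d --name flask-dev \
  -p 8080:80 \
  -v "$app_dir":/app \
  -w /app \
  python:3-alpine \
  sh -c "pip install -r requirements.txt && python app.py"
```

With these assumed values, the application should answer on http://localhost:8080/ once the dependencies have finished installing.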
Bind mount syntax looks as follows:
-v <absolute path on host>:<absolute path on container>
So you will notice that in the example above, I used:
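The exact value from the original command is not preserved here. Assuming the app folder sits in the current working directory and is mounted at /app (a hypothetical container path), the argument would take roughly this shape:

```shell
# Both sides of the mapping must be absolute paths.
host_path="$(pwd)/app"   # app folder in the current directory
container_path=/app      # hypothetical location inside the container

# Print the resulting flag as it would be passed to docker container run.
printf '%s\n' "-v ${host_path}:${container_path}"
```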
Now that the container is up and running, you will notice that any change made on the host side directly changes the files in the container. Feel free to experiment using your IDE of choice, or even add new files to the host app folder and watch them appear in the container.
Another use case for bind mounts is when you want to share data between your container and other running processes on the host machine.
For example, if you have just started your journey with containers, a good practice is to start containerising each service independently. Let us assume your log aggregation is being done by one of the many tools out there (e.g. Filebeat, Logstash, Graylog or Splunk) running on the host machine.
As we just learned, we can bind mount a folder on our host machine like this:
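A sketch of such a mount, using the two paths this post refers to. The container name hello-world and the image name hello-world-app are hypothetical placeholders; substitute your own.

```shell
host_logs=/path/where/we/manage/logs       # host folder watched by the log agent
container_logs=/var/log/hello-world/logs   # where the app writes its logs
image=hello-world-app                      # hypothetical image name

# Everything the app writes to $container_logs shows up in $host_logs.
docker container run -d --name hello-world \
  -v "$host_logs":"$container_logs" \
  "$image"
```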
This way, if our app is writing its logs into /var/log/hello-world/logs, we can point any agent or service on the host machine to listen on /path/where/we/manage/logs and offload the log management to some other process.
There are many more use cases for bind mounts but we will leave it to you to explore further.
Docker-managed volumes
Managed volumes use locations that are created by the Docker daemon in a Docker-managed space (similar to what we saw with volatile data).
It is the recommended way to work since it is not dependent on the host directory structure and has several advantages over bind mount volumes:
- Easier to back up and/or migrate than bind mounts.
- Can be managed by Docker CLI commands or the Docker API.
- Can be shared safely among multiple containers.
- Volume drivers let you store volumes on remote hosts or cloud providers, or encrypt the contents of volumes.
When should we use it? Whenever possible 🙂
Example: sharing data between containers
In this example, we are going to create a volume and show how two different containers use it, one as a producer and the other as a consumer.
Before we start, we will introduce a new command:
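The command itself is missing from this copy of the post; judging by the volume name mentioned further down, it is most likely docker volume create. A minimal sketch under that assumption:

```shell
volume=shared-vol

# Create a Docker-managed volume with an explicit name.
docker volume create "$volume"
```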
To see our new volume, we will use a second command, docker volume ls, which lists all volumes that Docker is currently managing.
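As a small sketch, assuming the volume is named shared-vol, the listing can be combined with docker volume inspect to print just the driver of our volume:

```shell
volume=shared-vol

# List every volume Docker currently manages.
docker volume ls

# Show only the driver of our volume.
docker volume inspect --format '{{ .Driver }}' "$volume"
```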
We can see Docker has created the volume shared-vol using the driver local (drivers are a topic for a future post; for now, local means a managed volume similar to what we saw earlier).
Let us spin up a new container called test1 and attach our volume to it.
Note the subtle difference in how we use the flag this time:
-v <volume name>:<path on system>
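The mount path used in the original command is not preserved, so the sketch below assumes a hypothetical /data directory, an alpine:latest image and a tail -f /dev/null keep-alive command.

```shell
volume=shared-vol
mount_point=/data   # hypothetical path; the original one is not preserved

# The left side of -v is now a volume name rather than an absolute host path.
docker container run -d --name test1 \
  -v "$volume":"$mount_point" \
  alpine:latest tail -f /dev/null
```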
Let us create a second container, test2, and mount the same volume, but this time to a different path:
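Again the original path is missing, so this sketch assumes a hypothetical /shared directory inside test2, with the same image and keep-alive command assumed elsewhere in this post's sketches.

```shell
volume=shared-vol
mount_point=/shared   # hypothetical path; the original one is not preserved

# Same volume, different location inside the second container.
docker container run -d --name test2 \
  -v "$volume":"$mount_point" \
  alpine:latest tail -f /dev/null
```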
Note that both containers have been created with
Now that everything is ready, from container test1 let us create a new file under our volume mount point:
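A sketch of that step, assuming the volume is mounted at /data inside test1 (a hypothetical path) and using a made-up file name:

```shell
mount_point=/data   # assumed mount point of shared-vol inside test1

# Create a file inside the volume from the first container.
docker container exec test1 sh -c "echo 'hello from test1' > $mount_point/hello.txt"
```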
And now, from our second container test2, let us check the (previously empty) folder.
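This could be done as follows, assuming the volume is mounted at /shared inside test2 (a hypothetical path):

```shell
mount_point=/shared   # assumed mount point of shared-vol inside test2

# List the contents of the volume as seen from the second container.
docker container exec test2 ls -l "$mount_point"
```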
As we can see, our test2 container can see a new file created by our test1 container.
This type of functionality allows us, for example, to take our log aggregation example and implement it between two different containers, one that produces logs into a volume and a second one that consumes it.
This post was an introduction to volumes and how one can leverage the different types based on specific needs.
We went over the differences between volatile data, bind mount volumes and Docker-managed volumes, and showed examples of how and when you should use each.
It is important to note that for managed volumes, one should use the --mount flag with its explicit parameters over the -v flag we demonstrated, but for the sake of simplicity we chose not to.
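For comparison, here is a minimal sketch of the same kind of mount written with --mount. The container name test3 and the /data target are hypothetical; type, source and target are the flag's own key names.

```shell
volume=shared-vol
target=/data   # hypothetical mount point inside the container

# Equivalent of "-v shared-vol:/data", with every part spelled out.
docker container run -d --name test3 \
  --mount type=volume,source="$volume",target="$target" \
  alpine:latest tail -f /dev/null
```

The key=value form is longer, but it names each part of the mapping rather than relying on its position between colons.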
There are many more things we can do with Docker volumes, especially managed volumes, but we will save those for a future post on advanced volume patterns.