While playing around with Docker on small projects or dev environments is surely entertaining, bringing a container-based application to production comes with critical decisions and challenges.

One of the most important decisions you’ll face is how to manage logging. Poor, or even worse, missing logs can turn a simple bug from a common hiccup into nights of debugging.

Visibility and tracking are key to successful deployments, because issues will arise, and you need that precious bit of information to quickly get them out of your way.

We can distinguish between performance logging, which relates to the containers themselves, and application logging, which covers what actually happens at the application level, inside your container.

We will focus on the latter, assuming the application already produces relevant logs. The challenge is to make them available outside of the Docker container for consumption. This seems simple enough, but there are multiple ways to achieve it, from the simplest docker logs to more complex setups involving third-party log management tools.

Docker Logging Options

Every solution has different pros and cons, and not all can be readily applied to one’s own applications. In fact, inside the same organization, different applications might use different ways of logging.

Let’s go through the available options to get a good overview and a solid foundation upon which you can build a customized system that fits your needs.

Interactive vs background run

As a newcomer, one may be happy to run containers in the foreground, directly from an interactive shell:

$ docker run image:latest

This runs the image as expected, and prints output to the stdout of the console.

Consider the following very basic Docker file (the image is available as nfrankel/simplelog:1 if you want to skip creating it yourself), and the referenced log.sh:

FROM busybox:latest
ADD log.sh /
CMD /log.sh
#!/bin/sh

# log.sh - run an infinite loop,
# and print the container hostname, a counter, and the time every second.
i=0; while true; do echo "[$(uname -n)] $i: $(date)"; i=$((i+1)); sleep 1; done
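
If you’d rather build the image yourself than pull the pre-built one, a plain docker build is enough (the simplelog:1 tag below is just a local name of our choosing):

$ docker build -t simplelog:1 .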

Running the container:

$ docker run --name logger1 -h logger1 nfrankel/simplelog:1

Will output something like this:

[logger1] 0: Mon Oct  1 11:23:04 UTC 2018
[logger1] 1: Mon Oct  1 11:23:05 UTC 2018
[logger1] 2: Mon Oct  1 11:23:06 UTC 2018
[logger1] 3: Mon Oct  1 11:23:07 UTC 2018
[logger1] 4: Mon Oct  1 11:23:08 UTC 2018
[logger1] 5: Mon Oct  1 11:23:09 UTC 2018
[logger1] 6: Mon Oct  1 11:23:10 UTC 2018
[logger1] 7: Mon Oct  1 11:23:11 UTC 2018
[logger1] 8: Mon Oct  1 11:23:12 UTC 2018
[logger1] 9: Mon Oct  1 11:23:13 UTC 2018

This is all good, but Docker containers hardly ever run in the foreground in production: they mostly run in detached mode, i.e. with the -d flag of docker run. In this mode:

  • The Docker command-line interface sends the command to the Docker daemon
  • The daemon then starts the container
  • It returns the container’s ID, without further piping the container’s stdout to the client console’s stdout: the container is completely detached from the client.
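
For instance, running the same image in detached mode only prints the generated container ID (the ID below is illustrative) and nothing else:

$ docker run -d nfrankel/simplelog:1
852fced2b360d0265af93757b81d3e5da5d...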

How can we get our logs back in the client console?

The direct approach

Imagine a Docker container running in detached mode. It’s possible to pipe its stdout to the client console’s stdout using the docker attach command:

$ docker start logger1

$ docker attach logger1

docker attach reattaches the console’s stdout to the container’s stdout. As soon as the container writes to stdout, the output is displayed in the client, with a small delay depending on the network quality between them.

Of course, this works only if the container is running; otherwise, the following message is returned:

You cannot attach to a stopped container, start it first

That makes sense, but doesn’t solve our problem: logs are especially useful in post mortem scenarios where they help understand what happened to our application. We need a way to check the logs of a stopped container.

The canonical approach

Docker’s developers are of course well aware of the issue, and provide the docker logs command for that purpose.

The command takes a “snapshot” of sorts and works regardless of the run status of the container: that means logs are persisted, which is exactly what we want:

  • If the container is stopped, it will output the logs produced by the application during its lifetime.
  • If it’s running, it will output the logs that were produced up until the command was issued.
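
For example, against the logger1 container created earlier:

$ docker logs logger1
[logger1] 0: Mon Oct  1 11:23:04 UTC 2018
[logger1] 1: Mon Oct  1 11:23:05 UTC 2018
...

The command also accepts a few handy flags:

$ docker logs --tail 5 --timestamps logger1   # only the last 5 lines, each prefixed with its timestamp
$ docker logs -f logger1                      # follow new output, like tail -f, on a running container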

Note that by default, Docker logs messages in a blocking way: the log write blocks the execution of the application, impacting performance depending on the application’s write frequency.

To circumvent the issue and decouple logging from execution, Docker also allows logging in an asynchronous way: log messages are stored in an intermediate in-memory ring buffer and written out asynchronously, so the application never waits on the logging driver. To activate asynchronous logging, start the Docker container with the --log-opt mode=non-blocking option:

docker run -it --log-opt mode=non-blocking my-awesome-image
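
The ring buffer defaults to 1 MB; if your application logs in bursts, you may want to size it explicitly with the max-buffer-size option (the 4m value below is just an example). Keep in mind that messages are dropped whenever the buffer fills up faster than it can be flushed:

docker run -it --log-opt mode=non-blocking --log-opt max-buffer-size=4m my-awesome-image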

The main disadvantage of the docker logs approach is that it requires knowing which container to attach to. Depending on your architecture, there might be dozens or even hundreds of containers running your app. If an error occurs, the core issue would be to first locate the culprit. Hence, the pull approach – reading the logs of container X – is in general not the most useful.

A push approach is more relevant.

Leveraging a log management tool

In distributed production environments, Ops expect a centralized logging dashboard, something usually better handled by a dedicated third-party component. Among the many tools available, we’ll use the Elastic Stack, which is close to what we use internally at Exoscale to keep our infrastructure under control. Regardless of the tech stack you choose, the process will be almost exactly the same.

The Elastic Stack is made of several key components:

  • Beats is the platform for single-purpose data shippers. They install as lightweight agents and send data from hundreds or thousands of machines to Logstash or Elasticsearch.
  • Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash”.
  • Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
  • Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you’re getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers.

Regarding logging, our basic setup will consist of a Filebeat agent scraping the Docker container’s logs and an Elasticsearch instance to store them. Any more sophisticated setup can be built upon this basic layout.

If you want to try the steps below yourself and don’t have an Elasticsearch instance available, Elastic provides Elasticsearch system packages.

Just check the instructions provided by Elastic on their respective Elasticsearch and Filebeat product pages.

To continue we will assume you have access to an Elasticsearch instance. To visualize the data in a user-friendly format, it’s also advised to have Kibana.
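
If you’d rather avoid system packages altogether, Elastic also publishes official Docker images; a throwaway single-node instance suitable for testing (not production) can be started like this, with the version matching the Filebeat version used later in this article:

docker run -d --name elasticsearch -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:6.4.2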

Making use of Docker logs via Filebeat

Elastic Beats are data shippers, available in different flavors depending on the exact kind of data:

  • Filebeat: helps keep the simple things simple by offering a lightweight way to forward and centralize logs and files
  • Metricbeat: collects metrics from systems and services (CPU, memory, Redis, Nginx, you name it)
  • Packetbeat: taps into data traveling over the wire
  • Winlogbeat: keeps a pulse on what’s happening across Windows-based infrastructure
  • Auditbeat: collects Linux audit framework data and monitors the integrity of files
  • Heartbeat: monitors services for their availability with active probing via their URL

There’s no obvious out-of-the-box Beat to send stdout to Elasticsearch, but there’s a nifty trick: we saw earlier that it’s possible to display the logs of a stopped container via the docker logs command. This is possible because those logs are in fact stored on the filesystem of the host machine!

To check the location of those logs, you can use the docker inspect command:

$ docker inspect logger1

This yields everything Docker knows about the logger1 container, which may actually be a little too much, considering we are only interested in the logs path. Let’s use the proper option to get a more focused result:

$ docker inspect --format='{{.LogPath}}' logger1

/var/lib/docker/containers/852fced2b360d0265af93757b81d3e5da5d... ...542bcd5444-json.log
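
To confirm the container’s output indeed ends up there, you can tail that file directly (root privileges are usually required, since /var/lib/docker is only readable by root); each line is a small JSON document produced by the default json-file logging driver:

$ sudo tail -f $(docker inspect --format='{{.LogPath}}' logger1)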

Note that if you are using a Windows or macOS system, the file above won’t be found: worse, the /var/lib/docker directory doesn’t even exist on such systems. The reason is that, since those systems are not Linux, the technologies Docker is based on (namespaces and cgroups) are not available natively, so there’s an additional VM sitting between the host filesystem and Docker.

Accessing this VM depends on your OS and setup, but once you’re in it, the log file should be accessible again. If you are on one of those systems, you can take advantage of an Exoscale virtual machine to follow along using a Linux distribution of your choice.

With this in mind, we can now proceed to update the Filebeat configuration with the path to the Docker container’s logs.

filebeat.inputs:
- type: log
  paths:
    - /var/lib/docker/containers/852fced2b360d0265af93757b81d3e5da5d... ...542bcd5444-json.log
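
For the logs to actually reach Elasticsearch, the configuration also needs an output section; a minimal one, assuming Elasticsearch runs locally on the default port, looks like this:

output.elasticsearch:
  hosts: ["localhost:9200"]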

With Filebeat now in place, the container’s logs are shipped to Elasticsearch, and we can query Elasticsearch through its REST API to make sure of that:

$ curl -i http://localhost:9200/*/_search

Be careful with that * part in the URL: it means the search will be executed on all indices, which ensures you’ll get results when following this article. However, if you are using an Elasticsearch instance with existing data, the query might return a whole lot of it.

By default, Beats create one index per day, e.g. filebeat-6.4.2-2018.12.01, and you should limit your query to such an index or, alternatively, use Kibana.
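
For example, to restrict the search to a single daily index (the index name below follows the default filebeat-<version>-<date> pattern; adjust it to your own):

$ curl -i http://localhost:9200/filebeat-6.4.2-2018.12.01/_search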

The query should return something like the following (abridged):

{  
   "took":4,
   "timed_out":false,
   "_shards":{  
      "total":3,
      "successful":3,
      "skipped":0,
      "failed":0
   },
   "hits":{  
      "total":7124,
      "max_score":1.0,
            "_index":"filebeat-6.4.2-2018.12.01",
            "_type":"doc",
            "_id":"zoMjYmYBDAuB-C7WiXob",
            "_score":1.0,
            "_source":{  
               "@timestamp":"2018-12-01T07:59:47.767Z",
               "prospector":{  
                  "type":"log"
               },
               "input":{  
                  "type":"log"
               },
               "beat":{  
                  "name":"VM-7afb0a53-06c1-453b-b23e-e8382b094adf",
                  "hostname":"VM-7afb0a53-06c1-453b-b23e-e8382b094adf",
                  "version":"6.4.2"
               },
               "host":{  
                  "name":"VM-7afb0a53-06c1-453b-b23e-e8382b094adf"
               },
               "message":"{\"log\":\"[logger1] Tue Dec  1 08:55:45 UTC 2018\\n\",\"stream\":\"stdout\",\"time\":\"2018-12-01T08:55:45.088500761Z\"}",
               "source":"/var/lib/docker/containers/852fced2b360d0265af93757b81d3e5da5d... ...542bcd5444-json.log",
               "offset":4030
            }
         }
      ]
   }
}

We can clearly see that, while the solution works, our logs carry extra metadata added by the Docker daemon.

This is the original log pattern:

[logger1] 0: Mon Oct  1 11:23:04 UTC 2018

This is what gets shipped to Elasticsearch:

{\"log\":\"[logger1] Tue Oct  9 08:55:45 UTC 2018\\n\",\"stream\":\"stdout\",\"time\":\"2018-10-09T08:55:45.088500761Z\"}

If we are interested in the original log only, as is the case most of the time, we need to put some cleanup in place, in order to get a cleaner output and save some space.

While it’s possible to achieve that using Logstash, it makes the setup more complicated and more resource-hungry. Fortunately, the Filebeat configuration offers a solution, albeit an experimental one, in the form of a dedicated docker input.

Just update the configuration to:

filebeat.inputs:
- type: docker
  containers.ids: "bdd402a5ab64b20e957b21ef2f330123f57851db6dd306e86e8c77dd6657266e"

The docker input correctly parses the logs, keeping only the original message. The only required configuration is the container ID, i.e. the name of the log folder returned by the docker inspect command.

It’s also possible to use the * catch-all character to scrape logs from all containers.

filebeat.inputs:
- type: docker
  containers.ids: "*"

With this configuration, logs from all Docker containers running on the host are safely shipped to the Elasticsearch instance.

Improving log scraping via the sidecar pattern

We’ve covered a lot of ground, and our log stack works well. Yet, we are left with some issues:

  • Installing Filebeat on the host requires administration rights
  • To configure it, it’s more than advisable to have a configuration management tool in order to automate the configuration process
  • It completely defeats the Docker abstraction layer, as it needs access to the host log file location

Moreover, we have described how to handle Dockerized applications that log to stdout but, beyond the trivial image we built on purpose, this is not the case for many real-world applications.

In the Java realm, for example, some applications make use of an application server. In that case, both the application server and the applications log to different, dedicated log files.

Let’s take the case of a WAR application deployed into a Tomcat 8 Servlet container. The setup will produce multiple log files under /var/log/tomcat8/:

  • Tomcat’s standard out, catalina.out
  • Tomcat’s standard error, catalina.err
  • Tomcat’s own log, rolled by date catalina.YYYY-MM-DD.log
  • The servlet engine’s logs, rolled by date localhost.YYYY-MM-DD.log
  • Access logs, also rolled by date localhost_access_log.YYYY-MM-DD.txt

Moreover, there might be multiple log files for each application served by Tomcat e.g. /var/log/myapp/myapp.log.

Redirecting all those log files to stdout is hardly feasible. But the solution is quite straightforward: it’s possible to connect one container’s volume to another container’s volume.

To simulate file writing, we’ll change the original image to:

FROM busybox:latest
ADD log.sh /
RUN mkdir /var/log/
VOLUME /var/log
CMD /log.sh > /var/log/dummy.log

And then run the image (in this case, taking advantage of the pre-built nfrankel/simplelog:2):

$ docker run -d --name logger2 -h logger2 nfrankel/simplelog:2
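
The container now writes to a file instead of stdout (so docker logs no longer shows anything useful), which is easy to verify with docker exec:

$ docker exec logger2 tail -n 3 /var/log/dummy.log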

How do we access the log file location from Filebeat? It turns out that Filebeat itself is available as a Docker image, and we can connect container volumes together using the --volumes-from option of the docker run command. The VOLUME directive in the above Dockerfile serves exactly that purpose.

This architecture is called the “Sidecar pattern”: support functionalities are not baked into the main “business” container, but into a separate container that runs alongside it.

Docker Logging Sidecar Pattern

To run the Filebeat container so that it can access the defined volume, use the following command:

docker run -d \                                                                            # 1
  --name filebeat \                                                                        # 2
  --mount type=bind,source="$(pwd)"/filebeat.yml,target=/usr/share/filebeat/filebeat.yml \ # 3
  --volumes-from logger2:ro \                                                              # 4
  docker.elastic.co/beats/filebeat:6.4.2

  1. Run the container as a daemon
  2. Name the container for easier reference
  3. Make the host’s Filebeat configuration file available to the container at runtime
  4. Connect the Filebeat container to the logger2 container’s VOLUME, so the former can read the latter. Filebeat is then able to access the /var/log directory of logger2.

Finally let’s update the Filebeat configuration to watch the exposed log file:

filebeat.inputs:
- type: log
  paths:
    - /var/log/dummy.log

Considering our previously mentioned Tomcat scenario, the difference lies in more volumes to mount when running Docker and more files to configure in the Beat configuration, but overall the process would be fairly similar.
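
As a sketch, assuming the Tomcat container exposes /var/log/tomcat8 and /var/log/myapp as volumes and the Filebeat sidecar is started with --volumes-from as above, the inputs section could look like this:

filebeat.inputs:
- type: log
  paths:
    - /var/log/tomcat8/catalina.out
    - /var/log/tomcat8/catalina.err
    - /var/log/tomcat8/*.log
    - /var/log/tomcat8/localhost_access_log.*.txt
    - /var/log/myapp/*.log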

Event-based logging with GELF and Logstash

Event-based logging is an alternative to log scraping, available if the container writes to the standard output. Via logging drivers, Docker is able to send event messages instead of writing them to log files. There are multiple drivers available; let’s see some of them:

  • json-file: This is the default logging driver, and it is the one we used in our previous setup. Logs are written to a file and formatted as JSON.
  • none: No logs are available for the container; in this case, docker logs does not return any output.
  • syslog: Sends logging messages to the syslog process of the host. The syslog daemon has to be running on the host.
  • journald: Sends log messages to the journald process of the host. The journald daemon must be running on the host.
  • fluentd: Sends log messages to the fluentd process of the host. As usual, the fluentd daemon must be running on the host.
  • gelf: Sends log messages to Graylog Extended Log Format (GELF) endpoints. Compatible endpoints include Graylog and Logstash.

Using the gelf driver, it’s possible to send log messages to Logstash, which is part of the Elastic stack.

Running a container with another driver requires either a change to the Docker daemon configuration or additional parameters when using docker run. The latter option is the wisest choice, as it doesn’t impact other containers. The following command uses the gelf driver and sends log messages to the default GELF port of a Logstash instance installed on the same host:

docker run -d --name logger3 -h logger3 \
  --log-driver gelf \
  --log-opt gelf-address=udp://localhost:12201 nfrankel/simplelog:1
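
For comparison, switching the default driver for every container on the host would mean editing the Docker daemon configuration (usually /etc/docker/daemon.json) and restarting the daemon, which is exactly the kind of global impact we want to avoid here:

{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://localhost:12201"
  }
}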

Compared to Filebeat, Logstash can be installed on any single machine (or a cluster of them), as long as it’s reachable from the Docker hosts. Even better, there’s no need for a custom configuration that depends on the container’s log location.

The only requirement is to configure Logstash to accept GELF inputs, giving it the location of the Elasticsearch instance to write to, and the format of the index to use:

input {
  gelf {}
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "gelf-%{+YYYY.MM.dd}"
  }
}
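
Once the logger3 container produces some output, the new daily index can be queried just like the Filebeat one, following the gelf-YYYY.MM.dd index pattern defined above:

$ curl -i http://localhost:9200/gelf-*/_search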

At this point, Logstash forwards log events to Elasticsearch, and it also allows processing messages before they are sent, if needed. On the flip side, a very large volume of events might turn Logstash itself into the bottleneck of the log processing pipeline.

Conclusion

We’ve seen that there are several ways to get logs out of a Docker container.

While Docker captures stdout and makes it available via the docker logs command by default, this approach is not always suitable, as not all applications write to stdout. In this case, a straightforward strategy is to expose the logging location(s) through volumes, as in the sidecar pattern described above.

Once logs become available through this method, they can be scraped by a dedicated log management tool such as Filebeat and then sent to Elasticsearch, or alternatively Docker can be configured to send log events directly to the logging pipeline via a logging driver.

Among all options described above, at least one will surely fit your own requirements. Happy logging!