Log monitoring means using technology so that you don't have to read every single log message, with the benefit that you catch more of the interesting events than you would by reading every line yourself!

Elasticsearch is a database API built on Apache Lucene’s indexing system; JSON-formatted queries are submitted to Elasticsearch to retrieve specific documents, ranges of documents, or aggregations over documents.
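To make those three kinds of request concrete, here is a minimal Python sketch that only builds the JSON bodies and does not talk to a cluster. The field names (`source_ip`, `@timestamp`, `transport`) and the aggregation name `top_values` are hypothetical. Each body would be sent as the payload of a search request such as `GET /<index>/_search`.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Match documents whose field equals `value` exactly."""
    return {"query": {"term": {field: value}}}


def range_query(field: str, gte: str, lt: str) -> dict:
    """Match documents whose `field` falls in the half-open range [gte, lt)."""
    return {"query": {"range": {field: {"gte": gte, "lt": lt}}}}


def terms_aggregation(field: str, size: int = 10) -> dict:
    """Return no hits, only the `size` most common values of `field` with counts."""
    return {"size": 0, "aggs": {"top_values": {"terms": {"field": field, "size": size}}}}


if __name__ == "__main__":
    bodies = [
        term_query("source_ip", "10.0.0.1"),           # a specific document
        range_query("@timestamp", "now-1h", "now"),    # a range of documents
        terms_aggregation("transport"),                # an aggregation
    ]
    for body in bodies:
        print(json.dumps(body))
```

Keeping the bodies as plain dictionaries means they can be inspected or unit-tested before being handed to whichever HTTP client or Elasticsearch client you use.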

Visualisation of Elasticsearch’s data is handled by its sibling Kibana, which lets you filter down to the subset of documents you are interested in and then either show those documents in full or plot them in graphs and charts. Kibana can handle basic visualisation, but it can feel a bit clunky.

Grafana provides time-series visualisation not only for Elasticsearch but for many other popular time-series databases. Its focus on time-series visualisation means the tool is typically easier to work with than Kibana and offers more options for tuning your graphs.

{% include toc.html %}

Elasticsearch

It is simpler to think of Elasticsearch as a database, because the API and the underlying Lucene indices are part of the same system. Elasticsearch stores documents in a hierarchy of files, split in part by the index and shard each document belongs to. SSDs on your Elasticsearch servers are a must, and fast ones at that! Just as important as disk speed is having a large number of fast CPUs and at least 64GiB of RAM for a fully-scaled node.
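As a rough illustration of how a document ends up in a particular shard, the sketch below hashes a routing value (by default the document's `_id`) and takes it modulo the number of primary shards. This is a simplification: real Elasticsearch uses a Murmur3 hash plus an extra routing factor in recent versions, whereas `zlib.crc32` is used here purely as a stand-in, and `pick_shard` is a hypothetical helper.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the shard a document would land in under a simplified model.

    Assumption: CRC32 stands in for the Murmur3 hash Elasticsearch actually uses.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Spread 1,000 hypothetical document IDs across 5 primary shards
    spread = Counter(pick_shard(f"doc-{i}", 5) for i in range(1000))
    print(sorted(spread.items()))
```

Because the shard is derived from the shard count, changing the number of primary shards would move existing documents, which is why that number is fixed when an index is created (short of reindexing or the split/shrink APIs).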

Ingest with Logstash

Ingesting data into Elasticsearch means either that your application sends Elasticsearch-compatible JSON, or that you parse logs and convert them into Elasticsearch documents. Logstash is the CPU-heavy server that can receive arbitrary log data and apply regexp-powered captures and transformations to dissect information from whatever format it arrives in. As its name implies, it can listen for syslog messages and apply a series of filters and captures to extract the useful information and assign each piece to a field, whether that field is an integer, a fully searchable string or a whole-string keyword.
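Logstash's grok filter expresses these captures as named patterns (for example `%{IP:source_ip}`). The Python sketch below mimics the idea with a plain regular expression and named groups, converting some captures to integers. The log format and field names are hypothetical, and whether a string is indexed as full text or as a keyword is ultimately decided by the Elasticsearch index mapping rather than by the parser.

```python
import re

# Hypothetical firewall-style log line format
PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<action>ALLOW|DENY) (?P<transport>\w+) "
    r"(?P<source_ip>[\d.]+):(?P<source_port>\d+) -> (?P<dest_ip>[\d.]+):(?P<dest_port>\d+) "
    r"bytes=(?P<bytes>\d+)$"
)
# Captures that should become integer fields rather than strings
INTEGER_FIELDS = {"source_port", "dest_port", "bytes"}


def parse(line: str) -> dict:
    """Turn one raw log line into a flat document of typed fields."""
    match = PATTERN.match(line)
    if match is None:
        # Logstash tags events grok cannot match with _grokparsefailure; mirror that
        return {"message": line, "tags": ["_grokparsefailure"]}
    doc = {"message": line}
    for name, value in match.groupdict().items():
        doc[name] = int(value) if name in INTEGER_FIELDS else value
    return doc


if __name__ == "__main__":
    print(parse("2019-03-01T12:00:00Z fw01 ALLOW tcp 10.0.0.5:51514 -> 192.168.1.10:443 bytes=1532"))
```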

Logstash executes a combination of Perl-compatible regular expressions and Ruby plugins, and it scales vertically (a bigger box) very effectively; I have ingested half a terabyte of logs in around a day using this setup with approx. 300 lines of configuration. I certainly recommend Logstash over building a custom parser: I have done that too, and it took much longer to build and ran significantly slower.

Ruby is used sparingly, for operations more complicated than a regular expression can handle. My current systems don’t use custom Ruby files, but they do use several Ruby one-liners; these live in the same config file as the regular expressions, which keeps things tidy. Most of them apply some simple math or do some conditional counting.
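For a feel of what such one-liners do, here is a Python analogue of two hypothetical ones: a bit of math and a conditional count. In Logstash they would sit in a ruby filter's `code` option and use `event.get` / `event.set`; the Ruby shown in the comment is a rough equivalent rather than a line from a real config, and the field names are assumptions.

```python
def enrich(event: dict) -> dict:
    """Add derived fields to a parsed event (Python stand-in for Ruby one-liners)."""
    # Simple math, roughly: event.set('kilobytes', (event.get('bytes') / 1024.0).round(2))
    if isinstance(event.get("bytes"), int):
        event["kilobytes"] = round(event["bytes"] / 1024, 2)

    # Conditional counting: how many of the event's ports are privileged (< 1024)
    ports = (event.get("source_port"), event.get("dest_port"))
    event["privileged_port_count"] = sum(
        1 for port in ports if isinstance(port, int) and port < 1024
    )
    return event


if __name__ == "__main__":
    print(enrich({"bytes": 1532, "source_port": 51514, "dest_port": 443}))
```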

Kibana

If Elasticsearch is the API, then Kibana is the front-end: it lets you view documents as they arrive in the database and filter them using either Lucene queries or graphical drop-downs, which are dynamically populated with the fields from your documents.
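The sketch below shows one way a typed Lucene query and drop-down style selections could be combined into a single Elasticsearch request body. This is an assumption about the general shape, not Kibana's exact output: the Lucene text goes into a `query_string` query and each field/value selection becomes a `match_phrase` clause in the bool query's filter list. The field names are hypothetical.

```python
import json


def build_search(lucene: str, filters: dict) -> dict:
    """Combine free-text Lucene syntax with exact field=value selections."""
    return {
        "query": {
            "bool": {
                # Free-text Lucene syntax, e.g. "transport:tcp AND NOT dest_port:443"
                "must": [{"query_string": {"query": lucene}}],
                # One clause per drop-down selection; filter context does not affect scoring
                "filter": [{"match_phrase": {field: value}} for field, value in filters.items()],
            }
        }
    }


if __name__ == "__main__":
    body = build_search("transport:tcp AND NOT dest_port:443", {"host": "fw01"})
    print(json.dumps(body, indent=2))
```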

The Discover page shows a graph of document count over time; below it is a list of the most recent documents, each represented as a list of fields, with the option to view the raw JSON document. This page is useful for understanding the details of a logged event: the @timestamp tells you when it happened and the message gives you the whole message received. If you knew what this log looked like and applied some Logstash parsing to it, then a source_ip field could tell you where it originated, dest_ip where it is going, and transport the protocol.
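The count-over-time graph is a date histogram. The sketch below shows an aggregation body that asks Elasticsearch for one (`fixed_interval` is the parameter name in recent versions; older releases used `interval`), plus a local Python equivalent that buckets ISO-8601 timestamps so the idea can be checked without a cluster. The aggregation name `per_interval` and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def date_histogram_body(interval: str = "1m") -> dict:
    """Request body for per-interval document counts on @timestamp (no hits returned)."""
    return {
        "size": 0,
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval}
            }
        },
    }


def count_per_bucket(timestamps: list, bucket_seconds: int) -> dict:
    """Count documents per fixed-width UTC time bucket, keyed by bucket start."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        epoch = int(moment.timestamp())
        bucket_start = epoch - epoch % bucket_seconds
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = ["2019-03-01T12:00:05Z", "2019-03-01T12:00:55Z", "2019-03-01T12:01:10Z"]
    for start, count in count_per_bucket(sample, 60).items():
        print(start.isoformat(), count)
```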

The Visualise tool lets you map any of your documents' fields onto a graph, bar chart, heatmap, etc.; if you just need to impress, it can also generate a word cloud.
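Under the hood, a bar chart of field values is typically fed by a terms aggregation. The sketch below takes a response in the shape Elasticsearch returns for one (an `aggregations` object keyed by aggregation name, holding `buckets` with a `key` and `doc_count` each) and renders it as a plain-text bar chart. The aggregation name and the bucket values are invented for illustration.

```python
def buckets_to_series(response: dict, agg_name: str) -> list:
    """Extract (key, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(str(bucket["key"]), bucket["doc_count"]) for bucket in buckets]


def text_bar_chart(series: list, width: int = 40) -> str:
    """Render (label, count) pairs as horizontal bars scaled to the largest count."""
    if not series:
        return ""
    peak = max(count for _, count in series)
    lines = []
    for label, count in series:
        # Every non-zero chart gets at least one '#' per row so small values stay visible
        bar = "#" * max(1, round(width * count / peak)) if peak else ""
        lines.append(f"{label:>8} {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented response, shaped like a terms aggregation named "by_transport"
    response = {"aggregations": {"by_transport": {"buckets": [
        {"key": "tcp", "doc_count": 120},
        {"key": "udp", "doc_count": 30},
    ]}}}
    print(text_bar_chart(buckets_to_series(response, "by_transport")))
```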

While Kibana has visualisation capabilities, it is better suited to searching and filtering to understand an event than to visualisation. This may change over time, as Elastic has started work on a couple of experimental features to improve its visualisation in both the time-series and anomaly-detection domains.

Grafana

Grafana is the go-to time-series visualisation tool: it integrates with any supported database and makes it easy to build graphs using the filtering capabilities of the target database. Grafana's Elasticsearch support matches Kibana's, in that the user can select fields and values from drop-down menus or enter a Lucene query.

As with Kibana, you can set an exact timespan for your graphs, with automatic refresh if you are watching live metrics. Alerting can be configured on any graph, and Grafana can then send notifications through many channels, such as email or Slack, including a snapshot of the graph and details of the threshold being exceeded.
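Here is a simplified model of a threshold alert, in the spirit of Grafana's classic "WHEN avg() OF query(...) IS ABOVE x" conditions: reduce the recent points of a series to one number, compare it to the threshold, and report a state that a notification could include. The function name, the state strings and the sample series are assumptions for illustration; this is not Grafana's implementation.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


def evaluate_alert(
    points: Sequence[float],
    threshold: float,
    reducer: Callable[[Sequence[float]], float] = mean,
) -> Tuple[str, Optional[float]]:
    """Return (state, reduced_value); state is 'alerting', 'ok' or 'no_data'."""
    if not points:
        return "no_data", None
    value = reducer(points)
    return ("alerting" if value > threshold else "ok"), value


if __name__ == "__main__":
    # Hypothetical recent points of a requests-per-second series
    state, value = evaluate_alert([90, 110, 130], threshold=100)
    print(f"state={state} value={value} threshold=100")
```

Swapping the reducer (for example `max` or `min`) changes how sensitive the alert is to short spikes without touching the comparison logic.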

Graphs are created in dashboards, which allow related metrics to be grouped logically. To reduce duplicated effort, Grafana provides templating, which lets graphs be configured from automatically generated lists of keys pulled from your datasource. A common example is a dashboard containing all the relevant graphs for a type of server, with a template variable used as the filter value in the underlying queries.
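The sketch below imitates a template variable with Python's `string.Template`, whose `$name` placeholders happen to look like Grafana's: one panel query is expanded once per server name. In Grafana the list of values would come from the datasource (for Elasticsearch, a variable query along the lines of `{"find": "terms", "field": "host"}`); here the `host` field, the panel query, the `render_queries` helper and the server names are all hypothetical.

```python
from string import Template

# Hypothetical panel query; $server is the template variable
PANEL_QUERY = "host:$server AND transport:tcp"


def render_queries(query_template: str, servers: list) -> dict:
    """Return one concrete Lucene query per value of the `server` variable."""
    template = Template(query_template)
    return {server: template.substitute(server=server) for server in servers}


if __name__ == "__main__":
    for server, query in render_queries(PANEL_QUERY, ["web01", "web02"]).items():
        print(f"{server}: {query}")
```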

Should you deploy Grafana in a large enterprise, you will find the user-management capabilities in Grafana v5+ very useful: a full Admin/Edit/View ACL system exists for teams and individual users.