Elasticsearch

Introduction

  • Supported ES versions: 7.x, 8.x

Jaeger automatically retrieves the Elasticsearch version from the root (ping) endpoint. Based on this version, it uses compatible index mappings and Elasticsearch REST API calls. The version can also be set explicitly with the version: config property.
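To check which version Jaeger will detect, you can query the root endpoint yourself. The sketch below assumes a POSIX shell and curl. es_major_version is a hypothetical helper that extracts version.number with sed rather than a JSON parser, so treat it as a quick check, not a robust tool.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON body returned by the Elasticsearch root
# endpoint (GET /) on stdin and print the major version, e.g. 8 for "8.13.4".
# It matches the "number" field with sed instead of parsing JSON, so it only
# handles the usual shape of that response.
es_major_version() (
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p' | head -n 1
)

# Usage (requires a running cluster):
#   curl -s http://localhost:9200 | es_major_version
```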

Elasticsearch does not require initialization other than installing and running Elasticsearch. Once it is running, pass the correct configuration values to Jaeger.

Elasticsearch also has the following officially supported resources available from the community and Elastic:

Configuration

A sample configuration for Jaeger with Elasticsearch backend is available in the Jaeger repository: config-elasticsearch.yaml. In the future the configuration documentation will be auto-generated from the schema. Meanwhile, please refer to config.go as the authoritative source.

Shards and Replicas

Shards and replicas deserve special attention because they are applied when each index is created. In particular, the number of primary shards of an existing index cannot be changed in place. This article goes into more detail on choosing the number of shards for optimal performance.
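As a rough starting point, Elastic's shard sizing guidance commonly suggests keeping shards between about 10 GB and 50 GB. The sketch below turns that into arithmetic: divide the expected size of one index (for example, one day of spans when using daily indices) by a target shard size and round up. suggest_shards and its 50 GB default are assumptions for illustration, not Jaeger settings.

```shell
#!/bin/sh
# Hypothetical helper: suggest a primary shard count for one index.
#   $1 = expected index size in GB (integer)
#   $2 = target shard size in GB (integer, defaults to an assumed 50)
# Rounds up so no shard exceeds the target, with a minimum of one shard.
suggest_shards() (
  index_gb=$1
  target_gb=${2:-50}
  shards=$(( (index_gb + target_gb - 1) / target_gb ))   # ceiling division
  [ "$shards" -lt 1 ] && shards=1
  echo "$shards"
)
```

For example, 120 GB per index with the 50 GB default suggests 3 primary shards. Replicas come on top of this. Each replica is a full copy of the primaries, so the total shard count is primaries × (1 + replicas).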

Index Management Strategies

Jaeger supports three index management strategies:

| | Time-based indices (default) | Manual rollover | Rollover with ILM (recommended) |
|---|---|---|---|
| How indices are created | Jaeger creates daily or hourly indices (e.g., jaeger-span-2024-06-18) | Operator runs jaeger-es-rollover init to create the first numbered index (e.g., jaeger-span-000001); cron job creates subsequent ones | Operator runs jaeger-es-rollover init to create the first index; Elasticsearch creates subsequent ones |
| Rollover trigger | Automatic (new time period) | jaeger-es-rollover rollover cron job | Elasticsearch ILM policy |
| Retention cleanup | jaeger-es-index-cleaner cron job | jaeger-es-rollover lookback (optional) + jaeger-es-index-cleaner cron jobs | Elasticsearch ILM policy |
| External tooling required | None | jaeger-es-rollover init (one-time) | jaeger-es-rollover init (one-time) + ILM policy |
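The time-based index names follow the date_layout property listed in the table below. That property uses Go's reference-time layout: 2006 is the year, 01 the month, 02 the day and 15 the hour. The sketch below translates only those four tokens into a date(1) format string so you can preview an index name. The helper names are hypothetical, and the sketch assumes dates are rendered in UTC.

```shell
#!/bin/sh
# Hypothetical helper: translate the Go layout tokens used by date_layout
# (2006=year, 01=month, 02=day, 15=hour) into a date(1) format string.
# Only these four tokens are handled; 2006 is replaced first.
go_layout_to_date_fmt() (
  printf '%s\n' "$1" | sed -e 's/2006/%Y/' -e 's/01/%m/' -e 's/02/%d/' -e 's/15/%H/'
)

# Hypothetical helper: preview the current time-based index name (UTC),
#   $1 = index prefix, $2 = Go date layout
time_based_index() (
  printf '%s-%s\n' "$1" "$(date -u +"$(go_layout_to_date_fmt "$2")")"
)

# Usage:
#   time_based_index jaeger-span 2006-01-02      # daily, e.g. jaeger-span-2024-06-18
#   time_based_index jaeger-span 2006-01-02-15   # hourly
```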

The relevant configuration options are:

| Config property | Default | Relevant strategy | Description |
|---|---|---|---|
| date_layout | 2006-01-02 | Time-based | Date format for index names (e.g., 2006-01-02-15 for hourly indices) |
| use_aliases | false | Manual rollover, ILM | Use read/write aliases instead of time-based indices (enables rollover mode) |
| use_ilm | false | ILM | Delegate rollover and retention to Elasticsearch ILM (requires use_aliases: true) |
| create_mappings | true | All | Create index templates at Jaeger startup. Must be false when use_ilm: true |
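This table contains two constraints: use_ilm requires use_aliases: true, and create_mappings must be false when use_ilm: true. Both can be checked before deploying. The following is a minimal sketch, assuming the three values are passed as the strings true or false. jaeger_es_strategy is a hypothetical helper, not part of Jaeger.

```shell
#!/bin/sh
# Hypothetical helper: map use_aliases, use_ilm and create_mappings
# ("true"/"false") to the strategy names used in the tables above, or fail
# with a message for a combination the table rules out.
jaeger_es_strategy() (
  use_aliases=$1 use_ilm=$2 create_mappings=$3
  if [ "$use_ilm" = true ]; then
    if [ "$use_aliases" != true ]; then
      echo "error: use_ilm requires use_aliases: true" >&2
      exit 1
    fi
    if [ "$create_mappings" = true ]; then
      echo "error: create_mappings must be false when use_ilm: true" >&2
      exit 1
    fi
    echo "rollover with ILM"
  elif [ "$use_aliases" = true ]; then
    echo "manual rollover"
  else
    echo "time-based indices"
  fi
)

# Example: jaeger_es_strategy true true false   prints "rollover with ILM"
```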

Index Rollover

Elasticsearch rollover is an index management strategy that optimizes the use of resources allocated to indices. For example, indices that do not contain any data still allocate shards, and conversely, a single index might contain significantly more data than the others. By default, Jaeger stores data in daily indices, which might not use resources optimally. The rollover feature can be enabled with the use_aliases: true config property.

Rollover lets you configure when to roll over to a new index based on one or more of the following criteria:

  • max_age - the maximum age of the index. It uses time units: d (days), h (hours), m (minutes).
  • max_docs - the maximum number of documents in the index.
  • max_size - the maximum estimated size of primary shards (since Elasticsearch 6.x). It uses byte size units: tb, gb, mb.
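Elasticsearch rolls over when any one of the configured max_* conditions is met. The sketch below models that check for max_age and max_docs only, leaving out max_size for brevity. age_to_seconds and rollover_due are hypothetical helpers; the real evaluation happens inside the Elasticsearch rollover API.

```shell
#!/bin/sh
# Hypothetical helper: convert an Elasticsearch time value with a d, h or m
# suffix (days, hours, minutes) into seconds, e.g. 2d -> 172800.
age_to_seconds() (
  value=${1%?}   # drop the one-character unit
  case $1 in
    *d) echo $(( value * 86400 )) ;;
    *h) echo $(( value * 3600 )) ;;
    *m) echo $(( value * 60 )) ;;
    *) echo "unsupported time unit: $1" >&2; exit 1 ;;
  esac
)

# Hypothetical helper: print "yes" if an index meets max_age OR max_docs.
#   $1 = index age in seconds, $2 = document count,
#   $3 = max_age (e.g. 2d),    $4 = max_docs
rollover_due() (
  max_age_s=$(age_to_seconds "$3") || exit 1
  if [ "$1" -ge "$max_age_s" ] || [ "$2" -ge "$4" ]; then
    echo yes
  else
    echo no
  fi
)
```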

The rollover index management strategy is more complex than the default daily indices. It requires an initialization job to prepare the storage and cron jobs to manage the indices.

To learn more about rollover index management in Jaeger, refer to this article.

For automated rollover, refer to the ILM support section.

Initialize

The following command prepares Elasticsearch for rollover deployment:

docker run -it --rm --net=host \
  jaegertracing/jaeger-es-rollover:latest \
  init http://localhost:9200 # <1>

<1> If you need to initialize archive storage, add -e ARCHIVE=true.

The initializer performs the following steps for each index type (spans, services, dependencies):

  1. Creates index templates that define field mappings, shard/replica settings, and index patterns (e.g., jaeger-span-*). All future rollover indices inherit their schema from these templates.
  2. Creates the first rollover index (e.g., jaeger-span-000001). Subsequent rollovers increment this number.
  3. Creates read and write aliases (e.g., jaeger-span-read and jaeger-span-write) pointing to the initial index. Jaeger queries via the read alias and writes via the write alias.

After the initialization, Jaeger can be deployed with use_aliases: true.

Roll over

The next step is to periodically call the rollover API, which rolls the write alias over to a new index when the supplied conditions are met. The command also adds the new index to the read alias so that new data is available for search.

docker run -it --rm --net=host \
  -e CONDITIONS='{"max_age": "2d"}' \
  jaegertracing/jaeger-es-rollover:latest \
  rollover http://localhost:9200 # <1>

<1> The command rolls the alias over to a new index if the current write index is older than 2 days. For more conditions, see the Elasticsearch docs.
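CONDITIONS holds a JSON object with the rollover conditions, which is easy to mistype when you combine several of them. The hypothetical helper below assembles it from key=value pairs. It emits max_docs as a number and every other value (such as 2d or 50gb) as a string.

```shell
#!/bin/sh
# Hypothetical helper: build the CONDITIONS JSON from key=value arguments,
# e.g. max_age=2d max_docs=1000000 -> {"max_age": "2d", "max_docs": 1000000}
rollover_conditions() (
  json='{'
  sep=''
  for pair in "$@"; do
    key=${pair%%=*}
    value=${pair#*=}
    case $key in
      max_docs) json="$json$sep\"$key\": $value" ;;     # numeric value
      *)        json="$json$sep\"$key\": \"$value\"" ;; # time/size as string
    esac
    sep=', '
  done
  printf '%s}\n' "$json"
)

# Usage:
#   docker run -it --rm --net=host \
#     -e CONDITIONS="$(rollover_conditions max_age=2d max_size=50gb)" \
#     jaegertracing/jaeger-es-rollover:latest \
#     rollover http://localhost:9200
```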

The next step is to remove old indices from the read alias, which means that old data is no longer available for search. This imitates the behavior of the max_span_age: config property used in the default index-per-day deployment. This step is optional. Old indices can instead be removed by the index cleaner in the next step.

docker run -it --rm --net=host \
  -e UNIT=days -e UNIT_COUNT=7 \
  jaegertracing/jaeger-es-rollover:latest \
  lookback http://localhost:9200 # <1>

<1> Removes indices older than 7 days from the read alias.
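At its core, the lookback step is an age comparison: indices older than UNIT_COUNT units are dropped from the read alias. The sketch below shows only that comparison, applied to a list of index names with creation timestamps. older_than_days and its input format are hypothetical. The sketch does not reproduce the tool's exact selection rules, such as how it treats the current write index.

```shell
#!/bin/sh
# Hypothetical helper: read "index_name creation_epoch_seconds" lines on stdin
# and print the names older than the given number of days.
#   $1 = now (epoch seconds), $2 = number of days (like UNIT_COUNT with UNIT=days)
older_than_days() (
  cutoff=$(( $1 - $2 * 86400 ))
  while read -r name created; do
    [ -n "$name" ] || continue
    if [ "$created" -lt "$cutoff" ]; then
      echo "$name"
    fi
  done
)

# Usage: printf 'jaeger-span-000001 1717804800\n' | older_than_days "$(date +%s)" 7
```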

Remove old data

Historical data can be removed with jaeger-es-index-cleaner, the same tool that is used for daily indices.

docker run -it --rm --net=host \
  -e ROLLOVER=true \
  jaegertracing/jaeger-es-index-cleaner:latest \
  14 http://localhost:9200 # <1>

<1> Remove indices older than 14 days.

ILM support

Elasticsearch ILM automatically manages indices according to performance, resiliency, and retention requirements.

ILM support is an alternative to the manual rollover + lookback + index-cleaner workflow described above. When ILM is enabled, Elasticsearch manages rollover and retention automatically according to the configured policy.

For example:

  • Rollover to a new index by size (bytes or number of documents) or age, archiving previous indices
  • Delete stale indices to enforce data retention standards

To enable ILM support:

  • Create an ILM policy in Elasticsearch named jaeger-ilm-policy.

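    For example, the following policy rolls over the “active” index when it is older than 1 minute and deletes indices that are older than 2 minutes. For an index that has been rolled over, Elasticsearch measures the delete-phase age from the time of rollover.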

    curl -X PUT \
    http://localhost:9200/_ilm/policy/jaeger-ilm-policy \
    -H 'Content-Type: application/json; charset=utf-8' \
    --data-binary @- << EOF
    {
      "policy": {
        "phases": {
          "hot": {
            "min_age": "0ms",
            "actions": {
              "rollover": {
                "max_age": "1m"
              },
              "set_priority": {
                "priority": 100
              }
            }
          },
          "delete": {
            "min_age": "2m",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
    EOF
    
  • Run the rollover initializer with ES_USE_ILM=true:

    docker run -it --rm --net=host \
      -e ES_USE_ILM=true \
      jaegertracing/jaeger-es-rollover:latest \
      init http://localhost:9200 # <1>
    

    <1> If you need to initialize archive storage, add -e ARCHIVE=true.

    The initializer performs the same steps as described above (creates index templates, seed indices, and aliases), with the following ILM-specific additions:

    • Validates that the ILM policy (jaeger-ilm-policy) exists in Elasticsearch.
    • Embeds index.lifecycle.name and index.lifecycle.rollover_alias in the index templates, so Elasticsearch automatically applies the ILM policy to every new rollover index.
    • Sets is_write_index: true on the write aliases, which is required for Elasticsearch to perform ILM-triggered rollovers.

    With ILM enabled, Elasticsearch manages rollovers and retention automatically, so you no longer need the rollover, lookback, or index-cleaner cron jobs described above.

    After the initialization, deploy Jaeger with use_ilm: true and use_aliases: true.
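The example policy above uses very short ages, measured in minutes. The hypothetical helper below prints the same policy structure with ages of your choice. A second helper checks for the policy that the initializer looks for. GET _ilm/policy/<name> is the standard Elasticsearch endpoint for reading a policy and returns 404 when the policy does not exist. The values 1d and 7d in the usage comment are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print an ILM policy body with the same structure as the
# example above. $1 = hot-phase rollover max_age, $2 = delete-phase min_age.
jaeger_ilm_policy() (
  cat <<EOF
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_age": "$1" },
          "set_priority": { "priority": 100 }
        }
      },
      "delete": {
        "min_age": "$2",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
)

# Hypothetical helper: print the HTTP status for jaeger-ilm-policy
# (200 if it exists, 404 if not). Requires curl and a reachable cluster.
ilm_policy_status() (
  curl -s -o /dev/null -w '%{http_code}' "$1/_ilm/policy/jaeger-ilm-policy"
)

# Usage:
#   jaeger_ilm_policy 1d 7d | curl -X PUT http://localhost:9200/_ilm/policy/jaeger-ilm-policy \
#     -H 'Content-Type: application/json' --data-binary @-
#   ilm_policy_status http://localhost:9200
```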

Upgrading

Elasticsearch defines wire and index compatibility versions. The index compatibility version is the oldest version whose indices a node can read. For example, Elasticsearch 8 can read indices created by Elasticsearch 7, but it cannot read indices created by Elasticsearch 6, even though they use the same index mappings. Therefore, an upgrade from Elasticsearch 7 to 8 does not require any data migration. However, an upgrade from Elasticsearch 6 to 8 has to go through Elasticsearch 7. Before moving to 8, you must wait until the indices created by Elasticsearch 6.x are removed or explicitly reindexed.

Refer to the Elasticsearch documentation for wire and index compatibility versions. This information can generally be retrieved from the root (ping) REST endpoint.
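The compatibility rule above can be expressed as a small check: a node reads indices created by its own major version and the previous one. The following sketch uses hypothetical helper names. It compares major versions only, while the root endpoint reports the exact bound for a given node as minimum_index_compatibility_version.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if a node of major version $2 can read an
# index created by major version $1 (same major or the previous one).
index_readable() (
  if [ "$1" -le "$2" ] && [ "$1" -ge $(( $2 - 1 )) ]; then
    echo yes
  else
    echo no
  fi
)

# Hypothetical helper: print the major versions to pass through when
# upgrading one major at a time, e.g. 6 -> 8 gives "6 7 8".
upgrade_path() (
  v=$1
  path=$v
  while [ "$v" -lt "$2" ]; do
    v=$(( v + 1 ))
    path="$path $v"
  done
  echo "$path"
)
```

Here index_readable 6 8 prints no. This is why indices created by Elasticsearch 6.x must be removed or reindexed while the cluster still runs Elasticsearch 7.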

Reindex

Manual reindexing can be used when upgrading from Elasticsearch 6 to 8 (through Elasticsearch 7) without waiting until indices created by Elasticsearch 6 are removed.

  1. Reindex all span indices to new indices with the suffix -1:

    curl -ivX POST -H "Content-Type: application/json" \
      http://localhost:9200/_reindex -d @reindex.json

    where reindex.json contains:

    {
      "source": {
        "index": "jaeger-span-*"
      },
      "dest": {
        "index": "jaeger-span"
      },
      "script": {
        "lang": "painless",
        "source": "ctx._index = 'jaeger-span-' + (ctx._index.substring('jaeger-span-'.length(), ctx._index.length())) + '-1'"
      }
    }

  2. Delete the indices with the old mapping (all jaeger-span-* indices except the -1 copies):

    curl -ivX DELETE -H "Content-Type: application/json" \
      http://localhost:9200/jaeger-span-\*,-\*-1

  3. Recreate the indices without the -1 suffix:

    curl -ivX POST -H "Content-Type: application/json" \
      http://localhost:9200/_reindex -d @reindex.json

    where reindex.json now contains:

    {
      "source": {
        "index": "jaeger-span-*"
      },
      "dest": {
        "index": "jaeger-span"
      },
      "script": {
        "lang": "painless",
        "source": "ctx._index = 'jaeger-span-' + (ctx._index.substring('jaeger-span-'.length(), ctx._index.length() - 2))"
      }
    }

  4. Remove the suffixed indices:

    curl -ivX DELETE -H "Content-Type: application/json" \
      http://localhost:9200/jaeger-span-\*-1
    

Run the same commands for the other Jaeger indices, adjusting the index names accordingly.

There may be a more efficient migration procedure. Please share any findings with the community.
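Before running the reindex steps for other index types, it can help to preview the names the two Painless scripts produce. The hypothetical helpers below repeat the same string operations in shell for an arbitrary prefix, such as jaeger-service-. They only print names and do not contact Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper mirroring the first Painless script: keep the part
# after the prefix and append "-1".
#   $1 = prefix (e.g. jaeger-span-), $2 = index name
with_suffix() (
  prefix=$1 index=$2
  printf '%s%s-1\n' "$prefix" "${index#"$prefix"}"
)

# Hypothetical helper mirroring the second script: drop a trailing "-1".
# (The Painless script removes the last two characters unconditionally; at
# that step every index ends in -1, so the results match.)
without_suffix() (
  prefix=$1 index=$2
  rest=${index#"$prefix"}
  printf '%s%s\n' "$prefix" "${rest%-1}"
)
```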