
Architecture

Overview

The collection deploys four Elastic Stack services — Elasticsearch, Logstash, Kibana, and Beats — each managed by its own Ansible role. A fifth shared role (elasticstack) provides common defaults that all roles inherit. A sixth role (repos) manages Elastic APT/YUM package repositories.

When elasticstack_full_stack: true (the default), roles auto-discover hosts and connections through Ansible inventory groups. Each role looks up the other services' hosts using configurable group names (elasticstack_elasticsearch_group_name, elasticstack_logstash_group_name, etc.), so you don't need to hard-code addresses between services.
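With the default group names, a hypothetical inventory like the following is all the wiring the roles need (hostnames are placeholders):

```yaml
# Hypothetical inventory -- hostnames are placeholders.
all:
  children:
    elasticsearch:
      hosts:
        es1:
        es2:
        es3:
    logstash:
      hosts:
        logstash1:
    kibana:
      hosts:
        kibana1:
```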

Data flow

graph LR
    subgraph Beats Hosts
        FB[Filebeat]
        MB[Metricbeat]
        AB[Auditbeat]
    end

    subgraph Pipeline
        LS[Logstash :5044]
    end

    subgraph Storage & UI
        ES[Elasticsearch :9200/:9300]
        KB[Kibana :5601]
    end

    FB -- "Beats protocol<br/>TLS" --> LS
    MB -- "Beats protocol<br/>TLS" --> LS
    AB -- "Beats protocol<br/>TLS" --> LS
    LS -- "HTTPS" --> ES
    KB -- "HTTPS" --> ES

    FB -. "direct output<br/>(optional)" .-> ES
    MB -. "direct output<br/>(optional)" .-> ES
    AB -. "direct output<br/>(optional)" .-> ES

Beats collect logs, metrics, and audit data from hosts and forward them to Logstash over port 5044. Logstash processes and enriches events through its pipeline (input → filter → output) and writes them to Elasticsearch. Kibana reads from Elasticsearch to provide the web UI. All connections use TLS when security is enabled.

Beats can also output directly to Elasticsearch (bypassing Logstash) by setting beats_filebeat_output: elasticsearch.
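For example, a minimal variables snippet routing Filebeat straight to Elasticsearch:

```yaml
# Send Filebeat events directly to Elasticsearch instead of Logstash.
beats_filebeat_output: elasticsearch
```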

Role execution order

Roles should be applied in this order, because later roles depend on artifacts produced by earlier ones:

graph TD
    R[repos] -->|"APT/YUM repos ready"| E[elasticsearch]
    E -->|"CA + passwords available"| K[kibana]
    E -->|"CA + passwords available"| L[logstash]
    L -->|"Logstash listening :5044"| B[beats]
    E -->|"CA available"| B

    style R fill:#f5f5f5,stroke:#333
    style E fill:#005571,stroke:#333,color:#fff
    style K fill:#e8478b,stroke:#333,color:#fff
    style L fill:#00bfb3,stroke:#333,color:#fff
    style B fill:#f04e98,stroke:#333,color:#fff

  1. repos — Adds Elastic package repositories (APT/YUM). Must run first so packages are available.
  2. elasticsearch — Installs ES, forms the cluster, initializes security (generates passwords, CA, certificates). Other roles need the CA and passwords.
  3. kibana — Connects to Elasticsearch using the kibana_system password, gets its TLS certificate from the ES CA.
  4. logstash — Creates its logstash_writer user and role in Elasticsearch, fetches TLS certificates from the ES CA, configures the pipeline.
  5. beats — Installs Filebeat/Metricbeat/Auditbeat, fetches TLS certificates, configures output to Logstash or Elasticsearch.

In a full-stack playbook, all roles run on all relevant hosts. Each role internally checks group_names or uses delegate_to so that it acts only on the correct hosts.
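A minimal full-stack play reflecting that order might look like this (a sketch; role FQCN prefixes depend on the collection namespace and are omitted here):

```yaml
# Sketch of a full-stack play. Each role checks group_names internally,
# so listing all of them against every host is safe.
- hosts: all
  become: true
  roles:
    - repos
    - elasticsearch
    - kibana
    - logstash
    - beats
```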

TLS certificate chain

The collection manages a complete PKI rooted in a CA generated by the Elasticsearch certutil tool on the first ES host:

graph TD
    CA["Elasticsearch CA<br/>/opt/es-ca/<br/>ca.crt + ca.key"]

    CA --> ES_CERT["ES node certs<br/>(one per host)<br/>transport :9300 + HTTP :9200"]
    CA --> KB_CERT["Kibana cert<br/>ES connection<br/>+ optional HTTPS frontend"]
    CA --> LS_CERT["Logstash cert<br/>Beats input TLS<br/>+ ES output TLS"]
    CA --> BT_CERT["Beats certs<br/>(one per host)<br/>Logstash output TLS"]

    style CA fill:#ffd700,stroke:#333,color:#000
    style ES_CERT fill:#005571,stroke:#333,color:#fff
    style KB_CERT fill:#e8478b,stroke:#333,color:#fff
    style LS_CERT fill:#00bfb3,stroke:#333,color:#fff
    style BT_CERT fill:#f04e98,stroke:#333,color:#fff

Certificate renewal is handled automatically: each role checks certificate expiry against a configurable buffer (default 30 days) and regenerates when needed. Tags like renew_es_cert, renew_logstash_cert, etc. allow targeted renewal runs.
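A targeted renewal run then only needs the tag, e.g. (the playbook name is a placeholder):

```console
ansible-playbook site.yml --tags renew_logstash_cert
```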

External certificates

All roles support *_cert_source: external to use certificates from any CA (corporate, ACME, Vault PKI). External certs can be provided as file paths or as inline PEM content in variables. The format (PEM vs PKCS12) is auto-detected for file paths; content mode is always PEM.

Elasticsearch supports separate transport and HTTP layer certificates — useful when transport uses an internal CA while HTTP uses a public ACME cert. HTTP falls back to transport if not specified.
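A hedged sketch of that split (the variable names below are illustrative assumptions; check the elasticsearch role's variable reference for the exact names):

```yaml
# Variable names here are illustrative -- see the elasticsearch role docs
# for the real ones.
elasticsearch_cert_source: external
elasticsearch_transport_cert: /etc/pki/corp/es-transport.crt   # internal CA
elasticsearch_http_cert: /etc/ssl/acme/es-http.crt             # public ACME cert
```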

See each role's documentation for the full variable reference.

Security initialization

When Elasticsearch starts for the first time with security enabled:

sequenceDiagram
    participant Role as ES Role
    participant CA as CA Host (es1)
    participant Node as ES Node
    participant Disk as Filesystem

    Role->>CA: Generate CA cert + key
    CA->>Disk: Write /opt/es-ca/ca.crt, ca.key
    Role->>CA: Generate node certificates
    CA->>Node: Distribute certs to each ES host
    Role->>Node: Configure elasticsearch.yml with TLS
    Role->>Node: Start Elasticsearch
    Node-->>Role: Wait for cluster health
    Role->>CA: Run elasticsearch-setup-passwords
    CA->>Disk: Write /usr/share/elasticsearch/initial_passwords
    Role->>Disk: Write cluster_initialized marker
    Note over Role,Disk: Subsequent roles read elastic password<br/>from initial_passwords via delegate_to

The marker file (cluster_initialized) prevents re-initialization on subsequent runs. Other roles (Kibana, Logstash, Beats) delegate to the CA host to read the elastic password before making API calls.
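The delegated read in the final step can be sketched like this (the task shape is illustrative; the file path comes from the diagram above):

```yaml
# Illustrative task: read the elastic password from the CA host.
- name: Read elastic password from the CA host
  ansible.builtin.slurp:
    src: /usr/share/elasticsearch/initial_passwords
  delegate_to: "{{ groups[elasticstack_elasticsearch_group_name] | first }}"
  register: elasticstack_initial_passwords
```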

Rolling upgrades (8.x to 9.x)

The Elasticsearch role supports rolling upgrades when elasticstack_version is set to a 9.x version while 8.x is currently installed.

Before any upgrade work begins, the role validates the upgrade path: Elasticsearch 9.x requires that all nodes are already on 8.19.x. If any node is running an older 8.x version (e.g. 8.17.0), the play fails immediately with an UPGRADE PATH VIOLATION error directing you to upgrade to 8.19.x first. This matches Elastic's official upgrade requirements.
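The guard amounts to something like the following (an illustrative sketch, not the role's actual task; es_node_versions is assumed to be a list of version strings gathered from the cluster):

```yaml
# Illustrative upgrade-path guard. Assumes es_node_versions is a list of
# version strings, e.g. ["8.19.2", "8.19.0"].
- name: Validate upgrade path before moving to 9.x
  ansible.builtin.assert:
    that:
      - item is version('8.19.0', '>=')
    fail_msg: "UPGRADE PATH VIOLATION: {{ item }} is older than 8.19.x; upgrade to 8.19.x first"
  loop: "{{ es_node_versions }}"
```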

Once validated, the upgrade proceeds one node at a time:

graph TD
    START([Start]) --> CHECK{All nodes on<br/>8.19.x?}
    CHECK -->|No| FAIL([FAIL: upgrade path<br/>violation])
    CHECK -->|Yes| WM[Set lenient disk watermarks]
    WM --> LOOP{Next node?}
    LOOP -->|Yes| ALLOC[Disable shard allocation]
    ALLOC --> FLUSH[Synced flush]
    FLUSH --> STOP[Stop ES on node]
    STOP --> UPG[Upgrade package to 9.x]
    UPG --> STARTNODE[Start ES]
    STARTNODE --> REJOIN[Wait for node to rejoin]
    REJOIN --> ENABLE[Re-enable shard allocation]
    ENABLE --> GREEN[Wait for green health]
    GREEN --> LOOP
    LOOP -->|No more nodes| DONE([Upgrade complete])

    style START fill:#4caf50,stroke:#333,color:#fff
    style DONE fill:#4caf50,stroke:#333,color:#fff
    style FAIL fill:#f44336,stroke:#333,color:#fff
    style CHECK fill:#2196f3,stroke:#333,color:#fff
    style STOP fill:#f44336,stroke:#333,color:#fff
    style UPG fill:#ff9800,stroke:#333,color:#fff

The role sets lenient disk watermarks (97/98/99%) during the upgrade so that nearly-full disks in CI and small-disk environments don't block shard allocation.
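In API terms, the lenient watermarks correspond to settings of roughly this shape (whether the role applies them transiently or persistently is not shown here):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "97%",
    "cluster.routing.allocation.disk.watermark.high": "98%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "99%"
  }
}
```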

Post-upgrade: LogsDB

Upgraded clusters have logsdb.prior_logs_usage: true set internally, which causes cluster.logsdb.enabled to default to false. Fresh 9.x installs get LogsDB enabled by default. If you want the same behaviour on an upgraded cluster, enable it manually after the upgrade completes:

PUT _cluster/settings
{ "persistent": { "cluster.logsdb.enabled": true } }

POST logs-*/_rollover

LogsDB uses synthetic _source, which reorders fields, deduplicates arrays, and sorts leaf arrays. Test your dashboards and detection rules before enabling it in production.

Inventory group mapping

| Default group name | Used by | Override variable |
|---|---|---|
| elasticsearch | All roles that need ES hosts | elasticstack_elasticsearch_group_name |
| logstash | Beats (output target), Logstash | elasticstack_logstash_group_name |
| kibana | Kibana role | elasticstack_kibana_group_name |

When elasticstack_full_stack: false, roles use beats_target_hosts, logstash_elasticsearch_hosts, etc. instead of inventory group lookups. This is useful for single-service deployments where the Ansible inventory doesn't contain all stack components.
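A standalone Logstash deployment, for example, replaces the group lookup with explicit endpoints (values are placeholders):

```yaml
# Standalone Logstash example -- endpoint values are placeholders.
elasticstack_full_stack: false
logstash_elasticsearch_hosts:
  - "https://es1.example.com:9200"
```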

Container and CI workarounds

Several tasks detect container environments (virtualization_type in container, docker, lxc) and apply workarounds that are irrelevant for bare-metal or VM deployments:

| Workaround | Roles | Why |
|---|---|---|
| systemd Type=exec override | elasticsearch | ES 8.19+ uses Type=notify + sd_notify, which fails in containers where the notify socket isn't functional. Without the override, systemd waits 900s and then kills ES. |
| Lenient disk watermarks (97/98/99%) | elasticsearch | Containers often have limited disk. The default watermarks (85/90/95%) prevent shard allocation in small environments. |
| Cache cleanup (rm -rf /var/cache/*) | elasticsearch, kibana, beats | Frees disk so ES can allocate replica shards of the .security-7 index. |

These workarounds are safe to leave in place — they only fire when ansible_facts.virtualization_type matches container-like environments.
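The guard amounts to a when: condition of roughly this shape (the exact task wording is an assumption):

```yaml
when: ansible_facts.virtualization_type | default('') in ['container', 'docker', 'lxc']
```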

Retry budgets

The collection uses extensive retry logic to handle timing windows during cluster formation, security initialization, and rolling upgrades. Here are the key retry budgets:

| Operation | Retries | Delay | Total | Role |
|---|---|---|---|---|
| Package install | 3 | 10s | ~30s | all |
| Bootstrap API check | 5 | 10s | ~50s | elasticsearch |
| Elastic password API check | 20 | 10s | ~200s | elasticsearch |
| Cluster health (security init) | 20 | 10s | ~200s | elasticsearch |
| Wait for port (per node) | | | 600s | elasticsearch |
| Kibana readiness | | | 300s | kibana |
| Rolling upgrade: API responsiveness | 30 | 10s | ~300s | elasticsearch |
| Rolling upgrade: pre-upgrade health | 50 | 30s | ~25min | elasticsearch |
| Rolling upgrade: node rejoin | 200 | 3s | ~10min | elasticsearch |
| Rolling upgrade: shard allocation | 5-10 | 30s | ~150-300s | elasticsearch |

All until: conditions use | default() for safe attribute access during mixed-version clusters where API responses may differ.
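A representative retry loop using the cluster-health budget from the table (elastic_password is a placeholder variable; the task itself is illustrative):

```yaml
# Illustrative retry loop: default() keeps the until: condition from failing
# when a node returns a response without the expected key.
- name: Wait for green cluster health
  ansible.builtin.uri:
    url: "https://localhost:9200/_cluster/health"
    url_username: elastic
    url_password: "{{ elastic_password }}"
    validate_certs: false
  register: es_health
  until: (es_health.json.status | default('')) == 'green'
  retries: 20
  delay: 10
```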

Version-specific behavior (8.x vs 9.x)

Note

Templates switch based on elasticstack_release | int >= 9. No user action needed beyond setting elasticstack_release — the correct config is generated automatically.

The collection handles ES 8.x and 9.x with version-conditional templates and guards:

| Area | ES 8.x | ES 9.x |
|---|---|---|
| Filebeat input type | type: log | type: filestream (requires a unique id) |
| Filebeat multiline | Root-level multiline: block | Nested under parsers: |
| Logstash SSL parameters | ssl, keystore, ssl_verify_mode | ssl_enabled, ssl_keystore_path, ssl_client_authentication |
| Logstash root execution | Allowed | Refused (CLI only; systemd service unaffected) |
| ES upgrade path validation | N/A | Requires 8.19.x as stepping stone |
| ES security requirement | Required (ES 8+) | Required |
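In template terms, the Filebeat input switch from the table amounts to something like the following (illustrative Jinja2, not the collection's actual template; the id and paths are placeholders):

```jinja
{% if elasticstack_release | int >= 9 %}
- type: filestream
  id: filebeat-syslog          # filestream inputs need a unique id
  paths: ["/var/log/syslog"]
{% else %}
- type: log
  paths: ["/var/log/syslog"]
{% endif %}
```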

Password and secret defaults

Warning

Change all default passwords before deploying to production. All roles ship with placeholder passwords. Store real values in Ansible Vault or a secrets manager.

| Variable | Default | Role |
|---|---|---|
| elasticstack_ca_pass | PleaseChangeMe | elasticstack |
| elasticsearch_bootstrap_pw | PleaseChangeMe | elasticsearch |
| elasticsearch_tls_key_passphrase | PleaseChangeMeIndividually | elasticsearch |
| kibana_tls_key_passphrase | PleaseChangeMe | kibana |
| logstash_tls_key_passphrase | LogstashChangeMe | logstash |
| logstash_user_password | password | logstash |
| beats_tls_key_passphrase | BeatsChangeMe | beats |

The elastic superuser password is auto-generated during security initialization and stored in /usr/share/elasticsearch/initial_passwords.