# Architecture

## Overview
The collection deploys four Elastic Stack services — Elasticsearch, Logstash, Kibana, and Beats — each managed by its own Ansible role. A fifth shared role (`elasticstack`) provides common defaults that all roles inherit, and a sixth role (`repos`) manages the Elastic APT/YUM package repositories.

When `elasticstack_full_stack: true` (the default), roles auto-discover hosts and connections through Ansible inventory groups. Each role looks up the other services' hosts using configurable group names (`elasticstack_elasticsearch_group_name`, `elasticstack_logstash_group_name`, etc.), so you don't need to hard-code addresses between services.
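As an illustration, an inventory using the default group names could look like this — hostnames and the `clients` group holding the Beats hosts are placeholders, not names the collection requires:

```yaml
# Illustrative inventory — hostnames are placeholders. The
# elasticsearch/logstash/kibana group names are the defaults the
# roles look up; the group holding Beats hosts is your choice.
all:
  children:
    elasticsearch:
      hosts:
        es1:
        es2:
        es3:
    logstash:
      hosts:
        logstash1:
    kibana:
      hosts:
        kibana1:
    clients:
      hosts:
        web1:
        web2:
```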
## Data flow

```mermaid
graph LR
    subgraph BH[Beats Hosts]
        FB[Filebeat]
        MB[Metricbeat]
        AB[Auditbeat]
    end
    subgraph PL[Pipeline]
        LS[Logstash :5044]
    end
    subgraph SU["Storage & UI"]
        ES[Elasticsearch :9200/:9300]
        KB[Kibana :5601]
    end
    FB -- "Beats protocol<br/>TLS" --> LS
    MB -- "Beats protocol<br/>TLS" --> LS
    AB -- "Beats protocol<br/>TLS" --> LS
    LS -- "HTTPS" --> ES
    KB -- "HTTPS" --> ES
    FB -. "direct output<br/>(optional)" .-> ES
    MB -. "direct output<br/>(optional)" .-> ES
    AB -. "direct output<br/>(optional)" .-> ES
```
Beats collect logs, metrics, and audit data from hosts and forward them to Logstash over port 5044. Logstash processes and enriches events through its pipeline (input → filter → output) and writes them to Elasticsearch. Kibana reads from Elasticsearch to provide the web UI. All connections use TLS when security is enabled.
Beats can also output directly to Elasticsearch (bypassing Logstash) by setting `beats_filebeat_output: elasticsearch`.
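For instance, a minimal group_vars sketch for Beats hosts that skips Logstash (the variable and value are taken from the sentence above):

```yaml
# Ship Filebeat events straight to Elasticsearch instead of Logstash.
beats_filebeat_output: elasticsearch
```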
## Role execution order
Roles should be applied in this order because each depends on the previous:
```mermaid
graph TD
    R[repos] -->|"APT/YUM repos ready"| E[elasticsearch]
    E -->|"CA + passwords available"| K[kibana]
    E -->|"CA + passwords available"| L[logstash]
    L -->|"Logstash listening :5044"| B[beats]
    E -->|"CA available"| B
    style R fill:#f5f5f5,stroke:#333
    style E fill:#005571,stroke:#333,color:#fff
    style K fill:#e8478b,stroke:#333,color:#fff
    style L fill:#00bfb3,stroke:#333,color:#fff
    style B fill:#f04e98,stroke:#333,color:#fff
```
- `repos` — Adds Elastic package repositories (APT/YUM). Must run first so packages are available.
- `elasticsearch` — Installs ES, forms the cluster, initializes security (generates passwords, CA, certificates). Other roles need the CA and passwords.
- `kibana` — Connects to Elasticsearch using the `kibana_system` password and gets its TLS certificate from the ES CA.
- `logstash` — Creates its `logstash_writer` user and role in Elasticsearch, fetches TLS certificates from the ES CA, and configures the pipeline.
- `beats` — Installs Filebeat/Metricbeat/Auditbeat, fetches TLS certificates, and configures output to Logstash or Elasticsearch.

In a full-stack playbook, all roles run on all relevant hosts. Each role internally checks `group_names` or uses `delegate_to` to act only on the correct hosts.
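The ordering above can be sketched as a single full-stack play; the play name is illustrative, and you should substitute the fully qualified collection names of your installation for the bare role names:

```yaml
# Illustrative full-stack play — role order matters (see above).
# Each role limits itself to the right hosts via group_names checks.
- name: Deploy Elastic Stack
  hosts: all
  become: true
  roles:
    - repos
    - elasticsearch
    - kibana
    - logstash
    - beats
```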
## TLS certificate chain

The collection manages a complete PKI rooted in a CA generated by the Elasticsearch `certutil` tool on the first ES host:

```mermaid
graph TD
    CA["Elasticsearch CA<br/>/opt/es-ca/<br/>ca.crt + ca.key"]
    CA --> ES_CERT["ES node certs<br/>(one per host)<br/>transport :9300 + HTTP :9200"]
    CA --> KB_CERT["Kibana cert<br/>ES connection<br/>+ optional HTTPS frontend"]
    CA --> LS_CERT["Logstash cert<br/>Beats input TLS<br/>+ ES output TLS"]
    CA --> BT_CERT["Beats certs<br/>(one per host)<br/>Logstash output TLS"]
    style CA fill:#ffd700,stroke:#333,color:#000
    style ES_CERT fill:#005571,stroke:#333,color:#fff
    style KB_CERT fill:#e8478b,stroke:#333,color:#fff
    style LS_CERT fill:#00bfb3,stroke:#333,color:#fff
    style BT_CERT fill:#f04e98,stroke:#333,color:#fff
```
Certificate renewal is handled automatically: each role checks certificate expiry against a configurable buffer (default 30 days) and regenerates when needed. Tags such as `renew_es_cert` and `renew_logstash_cert` allow targeted renewal runs.
### External certificates

All roles support `*_cert_source: external` to use certificates from any CA (corporate, ACME, Vault PKI). External certs can be provided as file paths or as inline PEM content in variables. The format (PEM vs PKCS12) is auto-detected for file paths; content mode is always PEM.

Elasticsearch supports separate transport- and HTTP-layer certificates — useful when transport uses an internal CA while HTTP uses a public ACME cert. HTTP falls back to the transport certificate if not specified.
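A hedged sketch of what an external-certificate setup might look like — only the `*_cert_source: external` pattern is taken from this page; the companion variable names here are hypothetical, so check the role references for the real ones:

```yaml
# Hypothetical variable names (except *_cert_source) — consult each
# role's documentation for the actual certificate/key variables.
elasticsearch_cert_source: external
elasticsearch_http_cert: /etc/pki/elastic/http.pem  # PEM vs PKCS12 auto-detected
elasticsearch_http_key: /etc/pki/elastic/http.key
```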
See each role's documentation for the full variable reference.
## Security initialization
When Elasticsearch starts for the first time with security enabled:
```mermaid
sequenceDiagram
    participant Role as ES Role
    participant CA as CA Host (es1)
    participant Node as ES Node
    participant Disk as Filesystem
    Role->>CA: Generate CA cert + key
    CA->>Disk: Write /opt/es-ca/ca.crt, ca.key
    Role->>CA: Generate node certificates
    CA->>Node: Distribute certs to each ES host
    Role->>Node: Configure elasticsearch.yml with TLS
    Role->>Node: Start Elasticsearch
    Node-->>Role: Wait for cluster health
    Role->>CA: Run elasticsearch-setup-passwords
    CA->>Disk: Write /usr/share/elasticsearch/initial_passwords
    Role->>Disk: Write cluster_initialized marker
    Note over Role,Disk: Subsequent roles read elastic password<br/>from initial_passwords via delegate_to
```
The marker file (`cluster_initialized`) prevents re-initialization on subsequent runs. Other roles (Kibana, Logstash, Beats) delegate to the CA host to read the `elastic` password before making API calls.
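The delegated password read can be sketched roughly like this — the file path comes from the diagram above, while the task wording and the `PASSWORD elastic = ...` line format (which `elasticsearch-setup-passwords` emits) are assumptions to verify against the role source:

```yaml
# Read the auto-generated elastic password from the CA host
# (first host in the Elasticsearch group) without copying it around.
- name: Slurp initial_passwords from the CA host
  ansible.builtin.slurp:
    src: /usr/share/elasticsearch/initial_passwords
  delegate_to: "{{ groups['elasticsearch'][0] }}"
  register: pw_file

- name: Extract the elastic password
  ansible.builtin.set_fact:
    elastic_password: >-
      {{ (pw_file.content | b64decode)
         | regex_search('PASSWORD elastic = (\S+)', '\1')
         | first }}
```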
## Rolling upgrades (8.x to 9.x)

The Elasticsearch role supports rolling upgrades when `elasticstack_version` is set to a 9.x version while 8.x is currently installed.

Before any upgrade work begins, the role validates the upgrade path: Elasticsearch 9.x requires that all nodes are already on 8.19.x. If any node is running an older 8.x version (e.g. 8.17.0), the play fails immediately with an `UPGRADE PATH VIOLATION` error directing you to upgrade to 8.19.x first. This matches Elastic's official upgrade requirements.
Once validated, the upgrade proceeds one node at a time:
```mermaid
graph TD
    START([Start]) --> CHECK{All nodes on<br/>8.19.x?}
    CHECK -->|No| FAIL([FAIL: upgrade path<br/>violation])
    CHECK -->|Yes| WM[Set lenient disk watermarks]
    WM --> LOOP{Next node?}
    LOOP -->|Yes| ALLOC[Disable shard allocation]
    ALLOC --> FLUSH[Synced flush]
    FLUSH --> STOP[Stop ES on node]
    STOP --> UPG[Upgrade package to 9.x]
    UPG --> STARTNODE[Start ES]
    STARTNODE --> REJOIN[Wait for node to rejoin]
    REJOIN --> ENABLE[Re-enable shard allocation]
    ENABLE --> GREEN[Wait for green health]
    GREEN --> LOOP
    LOOP -->|No more nodes| DONE([Upgrade complete])
    style START fill:#4caf50,stroke:#333,color:#fff
    style DONE fill:#4caf50,stroke:#333,color:#fff
    style FAIL fill:#f44336,stroke:#333,color:#fff
    style CHECK fill:#2196f3,stroke:#333,color:#fff
    style STOP fill:#f44336,stroke:#333,color:#fff
    style UPG fill:#ff9800,stroke:#333,color:#fff
```
The role sets lenient disk watermarks (97/98/99%) during the upgrade to prevent CI and small-disk environments from blocking shard allocation.
### Post-upgrade: LogsDB

Upgraded clusters have `logsdb.prior_logs_usage: true` set internally, which causes `cluster.logsdb.enabled` to default to `false`. Fresh 9.x installs get LogsDB enabled by default. If you want the same behaviour on an upgraded cluster, enable it manually after the upgrade completes:
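For example, with a `uri` task against the cluster settings API — the endpoint, credentials, and CA path are placeholders:

```yaml
# Enable LogsDB cluster-wide after a completed 8.x -> 9.x upgrade.
# Connection details below are illustrative.
- name: Enable LogsDB on the upgraded cluster
  ansible.builtin.uri:
    url: https://es1.example.com:9200/_cluster/settings
    method: PUT
    user: elastic
    password: "{{ elastic_password }}"
    force_basic_auth: true
    ca_path: /opt/es-ca/ca.crt
    body_format: json
    body:
      persistent:
        cluster.logsdb.enabled: true
```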
LogsDB uses synthetic `_source`, which reorders fields, deduplicates arrays, and sorts leaf arrays. Test your dashboards and detection rules before enabling it in production.
## Inventory group mapping

| Default group name | Used by | Override variable |
|---|---|---|
| `elasticsearch` | All roles that need ES hosts | `elasticstack_elasticsearch_group_name` |
| `logstash` | Beats (output target), Logstash | `elasticstack_logstash_group_name` |
| `kibana` | Kibana role | `elasticstack_kibana_group_name` |
When `elasticstack_full_stack: false`, roles use `beats_target_hosts`, `logstash_elasticsearch_hosts`, etc. instead of inventory group lookups. This is useful for single-service deployments where the Ansible inventory doesn't contain all stack components.
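A sketch of such a single-service setup — the variable names come from the sentence above, but the exact value format is an assumption to check against the role docs:

```yaml
# Beats-only deployment: no inventory auto-discovery, explicit
# output endpoint. Hostname is a placeholder.
elasticstack_full_stack: false
beats_target_hosts:
  - logstash.example.com
```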
## Container and CI workarounds

Several tasks detect container environments (`virtualization_type` in `container`, `docker`, `lxc`) and apply workarounds that are irrelevant for bare-metal or VM deployments:
| Workaround | Roles | Why |
|---|---|---|
| systemd `Type=exec` override | elasticsearch | ES 8.19+ uses `Type=notify` + `sd_notify`, which fails in containers where the notify socket isn't functional. Without the override, systemd waits 900s then kills ES. |
| Lenient disk watermarks (97/98/99%) | elasticsearch | Containers often have limited disk. Default watermarks (85/90/95%) prevent shard allocation in small environments. |
| Cache cleanup (`rm -rf /var/cache/*`) | elasticsearch, kibana, beats | Frees disk for ES to allocate replica shards of the `.security-7` index. |
These workarounds are safe to leave in place — they only fire when `ansible_facts.virtualization_type` matches container-like environments.
## Retry budgets
The collection uses extensive retry logic to handle timing windows during cluster formation, security initialization, and rolling upgrades. Here are the key retry budgets:
| Operation | Retries | Delay | Total | Role |
|---|---|---|---|---|
| Package install | 3 | 10s | ~30s | all |
| Bootstrap API check | 5 | 10s | ~50s | elasticsearch |
| Elastic password API check | 20 | 10s | ~200s | elasticsearch |
| Cluster health (security init) | 20 | 10s | ~200s | elasticsearch |
| Wait for port (per node) | — | — | 600s | elasticsearch |
| Kibana readiness | — | — | 300s | kibana |
| Rolling upgrade: API responsiveness | 30 | 10s | ~300s | elasticsearch |
| Rolling upgrade: pre-upgrade health | 50 | 30s | ~25min | elasticsearch |
| Rolling upgrade: node rejoin | 200 | 3s | ~10min | elasticsearch |
| Rolling upgrade: shard allocation | 5-10 | 30s | ~150-300s | elasticsearch |
All `until:` conditions use `| default()` for safe attribute access in mixed-version clusters, where API responses may differ between nodes.
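The pattern behind these budgets looks roughly like this — the health-check task itself is illustrative, while the retries/delay values match the cluster-health row in the table:

```yaml
# Poll cluster health until green, guarding attribute access with
# | default() so a missing field never aborts the retry loop.
- name: Wait for green cluster health
  ansible.builtin.uri:
    url: https://localhost:9200/_cluster/health
    user: elastic
    password: "{{ elastic_password }}"
    force_basic_auth: true
    validate_certs: false
  register: es_health
  until: es_health.json.status | default('') == 'green'
  retries: 20
  delay: 10
```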
## Version-specific behavior (8.x vs 9.x)

> **Note:** Templates switch based on `elasticstack_release | int >= 9`. No user action is needed beyond setting `elasticstack_release` — the correct config is generated automatically.
The collection handles ES 8.x and 9.x with version-conditional templates and guards:
| Area | ES 8.x | ES 9.x |
|---|---|---|
| Filebeat input type | `type: log` | `type: filestream` (requires unique `id`) |
| Filebeat multiline | Root-level `multiline:` block | Nested under `parsers:` |
| Logstash SSL parameters | `ssl`, `keystore`, `ssl_verify_mode` | `ssl_enabled`, `ssl_keystore_path`, `ssl_client_authentication` |
| Logstash root execution | Allowed | Refused (CLI only; systemd service unaffected) |
| ES upgrade path validation | N/A | Requires 8.19.x as stepping stone |
| ES security requirement | Required (ES 8+) | Required |
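The Filebeat rows can be illustrated side by side — the log path, `id`, and multiline pattern are placeholders:

```yaml
# ES 8.x: classic log input, multiline at the root of the input.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app.log
    multiline:
      pattern: '^\s'
      match: after
```

```yaml
# ES 9.x: filestream input needs a unique id; multiline moves
# under parsers.
filebeat.inputs:
  - type: filestream
    id: app-log
    paths:
      - /var/log/app.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^\s'
          match: after
```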
## Password and secret defaults

> **Warning:** Change all default passwords before deploying to production. All roles ship with placeholder passwords. Store real values in Ansible Vault or a secrets manager.
| Variable | Default | Role |
|---|---|---|
| `elasticstack_ca_pass` | `PleaseChangeMe` | elasticstack |
| `elasticsearch_bootstrap_pw` | `PleaseChangeMe` | elasticsearch |
| `elasticsearch_tls_key_passphrase` | `PleaseChangeMeIndividually` | elasticsearch |
| `kibana_tls_key_passphrase` | `PleaseChangeMe` | kibana |
| `logstash_tls_key_passphrase` | `LogstashChangeMe` | logstash |
| `logstash_user_password` | `password` | logstash |
| `beats_tls_key_passphrase` | `BeatsChangeMe` | beats |
The `elastic` superuser password is auto-generated during security initialization and stored in `/usr/share/elasticsearch/initial_passwords`.
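One common approach is to keep the real values in a vault-encrypted vars file — the file layout and values here are illustrative:

```yaml
# group_vars/all/secrets.yml — encrypt before committing, e.g.:
#   ansible-vault encrypt group_vars/all/secrets.yml
# Variable names come from the table above; values are placeholders.
elasticstack_ca_pass: "replace-with-a-real-secret"
elasticsearch_tls_key_passphrase: "replace-with-a-real-secret"
logstash_user_password: "replace-with-a-real-secret"
beats_tls_key_passphrase: "replace-with-a-real-secret"
```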