Splunk Architecture and Data Input
Splunk's core architecture has three components. Indexers receive, parse, and store events in buckets that age through hot, warm, cold, and frozen stages (frozen data is archived or deleted and is no longer searchable). Search Heads run searches, generate reports, and host dashboards, distributing search jobs to the indexers. Forwarders collect data at the source and send it to indexers: the Universal Forwarder is lightweight with a minimal footprint, while the Heavy Forwarder can parse and filter events before forwarding.
Data flow: source (log file, syslog, API, database) > forwarder (collect and forward) > indexer (parse, index, store) > search head (query and visualise).
Source types: Splunk uses the source type to decide how raw data is parsed. Predefined source types cover common formats (syslog, access_combined for Apache, WinEventLog for Windows); custom source types handle custom log formats.
Data inputs: file monitoring (Splunk follows files as they grow, much like tail -f), network inputs (listen on a UDP or TCP port, commonly for syslog), scripted inputs (run a script and ingest its output), and the HTTP Event Collector (HEC), a token-authenticated REST endpoint that accepts structured JSON events and is preferred for cloud-native applications and modern integrations.
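As a minimal sketch of what sending a structured event to HEC involves, the Python below builds (but does not send) the HTTP request. The host name, token, and the helper name build_hec_request are hypothetical placeholders; the /services/collector/event path, the "Authorization: Splunk <token>" header, and the event/sourcetype/index/host/time keys follow HEC's documented event format.

```python
import json
import time
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json", index=None, host=None):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    # HEC wraps the event body in an envelope carrying optional metadata.
    payload = {"time": time.time(), "event": event, "sourcetype": sourcetype}
    if index is not None:
        payload["index"] = index
    if host is not None:
        payload["host"] = host
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,  # HEC token authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088",        # hypothetical HEC host and port
        "00000000-0000-0000-0000-000000000000",   # placeholder token
        {"action": "login", "user": "alice", "status": "failure"},
        index="security",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send the event: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch runnable without a Splunk instance; in a real integration the token and URL would come from configuration rather than literals.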
The Search Processing Language (SPL)
SPL is Splunk's query language: every dashboard, report, and alert in Splunk starts with an SPL search.
Basic search: type keywords, field=value pairs, or Boolean combinations (AND, OR, NOT) in the search bar. Splunk returns matching events from the indexes searched by default for your role, within the selected time range.
Time modifiers: use the time range picker in the UI, or SPL modifiers such as earliest=-24h latest=now (relative) and earliest=01/01/2025:00:00:00 latest=01/02/2025:00:00:00 (absolute, in MM/DD/YYYY:HH:MM:SS format).
Key SPL commands: stats (aggregate and summarise: stats count by src_ip, stats avg(bytes) by host), chart (transform data for charting: chart count over _time by status), timechart (time-series statistics: timechart span=1h count by status), table (select fields for tabular display), sort (order results: sort -count for descending), dedup (keep one event per unique value of a field: dedup user), eval (create computed fields: eval MB=bytes/1024/1024), rex (extract fields from an event using a regular expression: rex field=_raw "user=(?<username>\w+)"), lookup (enrich events from an external CSV file or KV Store collection).
Search optimisation: always restrict the time range, name the index explicitly (index=security), and filter with field=value pairs as early in the search as possible so that less data reaches the later commands.
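To make the rex and stats steps concrete, here is a rough Python analogue of the pipeline rex field=_raw "user=(?<username>\w+)" | stats count by username | sort -count. It only illustrates what each command does to the data and says nothing about how Splunk implements them; the function name and the sample events are invented for the illustration.

```python
import re
from collections import Counter

# Python spells named groups (?P<name>...); SPL's rex uses (?<name>...).
USER_RE = re.compile(r"user=(?P<username>\w+)")


def count_by_username(raw_events):
    """Extract a username from each raw event and count events per user."""
    counts = Counter()
    for raw in raw_events:
        match = USER_RE.search(raw)      # rex: pull the field out of _raw
        if match:                        # events lacking the field are left out of the by-groups
            counts[match.group("username")] += 1   # stats count by username
    # most_common() returns pairs ordered by descending count, like sort -count
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "2025-01-01T08:00:01 action=login user=alice status=failure",
        "2025-01-01T08:00:05 action=login user=bob status=failure",
        "2025-01-01T08:00:09 action=login user=alice status=failure",
        "2025-01-01T08:00:12 action=heartbeat",
    ]
    for username, count in count_by_username(sample):
        print(username, count)
```

The missing backslash is the detail to watch: without it, w+ matches literal "w" characters rather than word characters, and the extraction silently returns nothing for most events.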
Reports, Dashboards, and Alerts
Splunk turns searches into operational intelligence through three output types.
Reports: saved searches with configurable scheduling and delivery, for example generating a PDF or emailing results on a schedule. Any SPL search can be saved as a report.
Dashboards: collections of panels, where each panel runs a search and displays the results as a chart, table, map, or single value. Dashboard inputs (dropdowns, time pickers, text boxes) parameterise the panel searches, so users can filter a dashboard without editing SPL. Dashboard Studio (the newer framework) offers a more flexible, pixel-level layout than the Classic Dashboard Builder (Simple XML).
Alerts: saved searches that trigger actions when conditions are met. Trigger conditions include per-result (trigger on each matching result), number of results (trigger when the count crosses a threshold), and custom (a secondary search condition evaluated against the results, for example search count > 10). Alert actions: send email, post to Slack or PagerDuty (typically through an add-on or webhook), run a script, create a notable event (in Splunk Enterprise Security), or trigger a webhook. Throttling: suppress repeat alerts for a set time window to prevent notification storms.
Scheduled report example: a daily count of failed logins by user, emailed to the security team at 08:00.
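The idea behind throttling can be sketched in a few lines of Python: once an alert fires for a given key (for example a username), further triggers for the same key are suppressed until the window has elapsed. This is a conceptual illustration under that assumption, not Splunk's internal implementation; the class name Throttle and its method are hypothetical.

```python
class Throttle:
    """Suppress repeat alerts for the same key within a time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._last_fired = {}  # key -> time (in seconds) the alert last fired

    def should_fire(self, key, now):
        """Return True if an alert for `key` at time `now` should be sent."""
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the suppression window
        # Only alerts that actually fire restart the window.
        self._last_fired[key] = now
        return True


if __name__ == "__main__":
    throttle = Throttle(window_seconds=300)
    # (key, time in seconds) pairs standing in for alert triggers
    for user, t in [("alice", 0), ("alice", 100), ("bob", 100), ("alice", 300)]:
        print(user, t, throttle.should_fire(user, t))
```

Keying the suppression on a field value rather than on the alert as a whole means a storm of triggers for one user does not hide the first alert for another.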