Advanced SPL: Subsearches, Transactions, and Stats
Power User SPL goes deeper than Core User. Subsearches: a search within a search — [search source=auth_log failed | return 20 src_ip] runs first and returns up to 20 src_ip field=value pairs (OR'd together); the outer search then filters its results using those IPs. Subsearches are powerful but slow — limited to 10,000 results by default, time-limited, not recommended for large result sets (use join or lookup as alternatives). Transaction command: groups related events into a single transaction object — transaction session_id maxspan=5m maxpause=30s — adds duration and eventcount fields per transaction. Use for session analysis and multi-event workflows. Join command: like SQL JOIN — join type=inner|left [subsearch] (outer is an alias for left) — expensive on large data, prefer lookup. Advanced eval functions: if(condition, true_val, false_val); case(condition1, val1, condition2, val2, ...); coalesce(field1, field2, ...) — returns the first non-null value; mvindex(multivalue_field, index) — extracts a value from a multi-value field. Statistical commands: eventstats (adds aggregate values as new fields on each event — unlike stats, which replaces events with a summary), streamstats (running/window statistics — streamstats count, sum(bytes) by src_ip), rare (least common values — the opposite of top).
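The commands above combine naturally; a sketch of the subsearch-then-transaction pattern, assuming hypothetical index, source, and field names (web, auth_log, src_ip, session_id):

```
index=web [ search source=auth_log action=failure | return 20 src_ip ]
| transaction session_id maxspan=5m maxpause=30s
| eventstats avg(duration) AS avg_duration BY src_ip
| where duration > avg_duration * 2
| table src_ip, session_id, duration, eventcount
```

The subsearch hands the outer search an OR'd list of src_ip values; transaction supplies duration and eventcount, and eventstats attaches the per-IP average to every event so where can compare each transaction against it.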
Field Extractions and Knowledge Objects
Field extractions define how Splunk parses event data into fields. Automatic extractions: Splunk auto-extracts key=value pairs and JSON fields. Custom extractions: use the Field Extractor UI (event-based — highlight text in an event and Splunk infers a regex) or write regex/delimiter extractions manually. Extraction types: Rex (regex with named capture groups — (?<fieldname>pattern)), Delim (delimiter-based — split on comma, pipe, or tab). Field extraction order: search-time extractions run when you search; index-time extractions run during indexing and are permanent (use sparingly — prefer search-time). Field aliases: create alternative names for existing fields — source_ip = src_ip — allowing normalisation across different source types without changing raw data. Calculated fields: define a field using an eval expression, applied at search time as if it were a real field (MB = bytes/1024/1024 — usable as a filter and in stats without repeating the eval in every search). Event types: classify events that match shared search criteria — effectively a category label for events. Tags: labels applied to event types or specific field=value pairs — tag=attack applied to events matching your detection criteria enables tag-based filtering across all searches.
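A manual rex extraction feeding a calculated-field-style eval, assuming a hypothetical proxy sourcetype whose raw events contain user= and bytes= pairs:

```
index=proxy sourcetype=custom_proxy
| rex field=_raw "user=(?<username>\w+)\s+bytes=(?<bytes>\d+)"
| eval MB = round(bytes/1024/1024, 2)
| stats sum(MB) AS total_MB BY username
```

In production you would save the rex pattern as a search-time extraction and the MB expression as a calculated field, so every search on that sourcetype gets both fields for free.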
Data Models, Pivot, and CIM
Data models are the structured schema layer on top of raw Splunk data. A data model defines dataset hierarchies (a Root Event dataset with child datasets that add constraints), field definitions, and calculated fields — without changing the raw indexed data. Benefits: Pivot tables and charts built on data models reference the model rather than raw fields, so reports keep working when the underlying source data changes; accelerated data models (pre-summarised into TSIDX files) make Pivot queries dramatically faster at the cost of disk space. CIM (Common Information Model): Splunk's standard set of data models for normalising data from different sources to consistent field names. CIM data models include: Authentication (user, src, dest), Network Traffic (src_ip, dest_ip, src_port, dest_port, action), Endpoint (process, registry, filesystem), Web (url, status, bytes, http_method). Use CIM-compliant add-ons (Splunk Add-on for Microsoft Windows, Cisco Security Suite) to map raw source data to CIM field names — then a single Splunk Enterprise Security (ES) correlation search works across all data sources. Pivot UI: drag-and-drop report builder on top of data models — no SPL required for end users, and results use the data model's normalised field names.
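CIM normalisation is what lets one detection span heterogeneous sources; a sketch using Authentication-model field names (src, user, action), with hypothetical sourcetypes standing in for Windows and Linux auth logs:

```
(sourcetype=WinEventLog:Security OR sourcetype=linux_secure) action=failure
| stats count AS failures BY src, user
| where failures > 10
```

Because both add-ons map their native fields to the same CIM names, the search never mentions EventCode or pam-specific fields — the normalisation layer absorbs the source differences.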