Advanced SPL: Subsearches, Transactions, and Stats
Power User SPL goes deeper than Core User. Subsearches: a search within a search — [search source=auth_log failed | return 20 src_ip] runs first and returns up to 20 src_ip field=value pairs (OR'd together); the outer search then filters its results using those IPs. Subsearches are powerful but slow — limited to 10,000 results by default, time-limited, not recommended for large result sets (use join or lookup as alternatives). Transaction command: groups related events into a single transaction object — transaction session_id maxspan=5m maxpause=30s — adds duration and eventcount fields per transaction. Use for session analysis and multi-event workflows. Join command: like SQL JOIN — join type=inner|left [subsearch] (outer is an alias for left) — expensive on large data, prefer lookup. Advanced eval functions: if(condition, true_val, false_val); case(condition1, val1, condition2, val2, ...); coalesce(field1, field2, ...) — returns the first non-null value; mvindex(multivalue_field, index) — extracts a value from a multi-value field. Statistical commands: eventstats (adds aggregate values as new fields on each event — unlike stats, which replaces events with a summary), streamstats (running/window statistics — streamstats count, sum(bytes) by src_ip), rare (least common values — the opposite of top).
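The commands above combine naturally; a sketch of the subsearch-then-transaction pattern, assuming hypothetical index, source, and field names (web, auth_log, src_ip, session_id):

```
index=web [ search source=auth_log action=failure | return 20 src_ip ]
| transaction session_id maxspan=5m maxpause=30s
| eventstats avg(duration) AS avg_duration BY src_ip
| where duration > avg_duration * 2
| table src_ip, session_id, duration, eventcount
```

The subsearch hands the outer search an OR'd list of src_ip values; transaction supplies duration and eventcount, and eventstats attaches the per-IP average to every event so where can compare each transaction against it.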
Field Extractions and Knowledge Objects
Field extractions define how Splunk parses event data into fields. Automatic extractions: Splunk auto-extracts key=value pairs and JSON fields. Custom extractions: use the Field Extractor UI (event-based — highlight text in an event and Splunk infers a regex) or write regex/delimiter extractions manually. Extraction types: Rex (regex with named capture groups — (?<fieldname>pattern)), Delim (delimiter-based — split on comma, pipe, or tab). Field extraction order: search-time extractions run when you search; index-time extractions run during indexing and are permanent (use sparingly — prefer search-time). Field aliases: create alternative names for existing fields — source_ip = src_ip — allowing normalisation across different source types without changing raw data. Calculated fields: define a field using an eval expression, applied at search time as if it were a real field (MB = bytes/1024/1024 — usable as a filter and in stats without repeating the eval in every search). Event types: classify events that match shared search criteria — effectively a category label for events. Tags: labels applied to event types or specific field=value pairs — tag=attack applied to events matching your detection criteria enables tag-based filtering across all searches.
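A manual rex extraction feeding a calculated-field-style eval, assuming a hypothetical proxy sourcetype whose raw events contain user= and bytes= pairs:

```
index=proxy sourcetype=custom_proxy
| rex field=_raw "user=(?<username>\w+)\s+bytes=(?<bytes>\d+)"
| eval MB = round(bytes/1024/1024, 2)
| stats sum(MB) AS total_MB BY username
```

In production you would save the rex pattern as a search-time extraction and the MB expression as a calculated field, so every search on that sourcetype gets both fields for free.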
Data Models, Pivot, and CIM
Data models are the structured schema layer on top of raw Splunk data. A data model defines dataset hierarchies (a Root Event dataset with child datasets that add constraints), field definitions, and calculated fields — without changing the raw indexed data. Benefits: Pivot tables and charts built on data models reference the model rather than raw fields, so reports keep working when the underlying source data changes; accelerated data models (pre-summarised into TSIDX files) make Pivot queries dramatically faster at the cost of disk space. CIM (Common Information Model): Splunk's standard set of data models for normalising data from different sources to consistent field names. CIM data models include: Authentication (user, src, dest), Network Traffic (src_ip, dest_ip, src_port, dest_port, action), Endpoint (process, registry, filesystem), Web (url, status, bytes, http_method). Use CIM-compliant add-ons (Splunk Add-on for Microsoft Windows, Cisco Security Suite) to map raw source data to CIM field names — then a single Splunk Enterprise Security (ES) correlation search works across all data sources. Pivot UI: drag-and-drop report builder on top of data models — no SPL required for end users, and results use the data model's normalised field names.
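CIM normalisation is what lets one detection span heterogeneous sources; a sketch using Authentication-model field names (src, user, action), with hypothetical sourcetypes standing in for Windows and Linux auth logs:

```
(sourcetype=WinEventLog:Security OR sourcetype=linux_secure) action=failure
| stats count AS failures BY src, user
| where failures > 10
```

Because both add-ons map their native fields to the same CIM names, the search never mentions EventCode or pam-specific fields — the normalisation layer absorbs the source differences.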