Splunk Query Language Cheat Sheet

A simple query in JQL (also known as a “clause”) consists of a field, followed by an operator, followed by one or more values or functions. To perform a more complex query, you can link clauses together with keywords. Tips and tricks for using Splunk commands: frequent Splunk users rely on a few tips and tricks to get properly formatted output from Splunk commands. These tricks typically answer user-specific queries and present the output so that it is easier to understand.

The searches that you have run to this point have retrieved events from your Splunk index. You were limited to asking questions that could only be answered by the number of events returned.

For example, you ran the following search to determine how many simulation games were purchased:

sourcetype=access_* status=200 action=purchase categoryId=simulation

To find this number for the days of the previous week, you need to run it against the data for each day of that week. To see which products are more popular than the others, run the search for each of the eight categoryId values and compare the results.

Splunk developed the Search Processing Language (SPL) to use with Splunk software. SPL encompasses all the search commands and their functions, arguments, and clauses. One way to learn the SPL language is by using the Search Assistant.

Learn with the Search Assistant

There are two modes for the Search Assistant: Compact and Full. The default mode is Compact, which you were introduced to in the Basic searches and search results topic in this tutorial.

This section shows you how to change the Search Assistant mode. You will use the Search Assistant to learn about the SPL and to construct searches. If you have a Splunk Free license, you will not be able to change the Search Assistant mode. See Choose a platform to learn about the differences between the Splunk Trial and Splunk Free licenses.

Splunk platform	Step	Example
Splunk Enterprise	Select Administrator > Preferences. Click SPL Editor. On the General tab next to Search assistant, click Full. The default setting is Compact. You can tell which mode is set by the dark gray background on the mode. The Full mode provides more information as you type commands in the Search bar. Click Apply.
Splunk Cloud	Select Your_Name > User Settings. Scroll down to the Search section and change the Search assistant to Full. The Full mode provides more information as you type commands in the Search bar. Click Save.

Let's explore the benefits of the Full mode and creating searches using the SPL commands.

Click Search in the App bar to start a new search.
Change the time range to All time.
Type the letter s in the Search bar.
The Search Assistant shows a list of Matching Searches and Matching Terms. It also explains briefly How To Search.

Select the following search from the Matching Searches list, or type the search into the Search bar.
sourcetype=access_* status=200 action=purchase
After action=purchase, type a pipe character ( | ) into the Search bar.
The pipe character indicates that you are about to use a command. The results of the search to the left of the pipe are used as the input to the command to the right of the pipe. You can pass the results of one command into another command in a series, or pipeline, of search commands.
Notice that the Search Assistant changes to show a list of Common Next Commands.
You want the search to return the most popular items bought at the Buttercup Games online store.
Under Common Next Commands, select top.
The top command is appended to your search string.
Type categoryId into the Search bar.
The following search is the complete search string.
sourcetype=access_* status=200 action=purchase | top categoryId
- The search criteria before the pipe character, sourcetype=access_* status=200 action=purchase, locates events from the access control log files, that were successful (HTTP status is 200), and that were a purchase of a product.
- The search criteria after the pipe character, top categoryId, takes the events located and returns the most common values of the categoryId field.
Run the search.
The results of the top command appear in the Statistics tab.

View results in the Statistics tab

The top command is a transforming command. Transforming commands organize the search results into a table. Use transforming commands to generate results that you can use to create visualizations such as column, line, area, and pie charts. You will learn more about visualizations later in this tutorial.

Because transforming commands return your search results in a table format, the results appear on the Statistics tab.

In this search for successful purchases, seven different category IDs were found. The list shows the category ID values from highest to lowest, based on the frequency of the category ID values in the events.

Many of the transforming commands return additional fields that contain useful statistical information. The top command returns two new fields, count and percent.

The count field specifies the number of times each value of the categoryId field occurs in the search results.
The percent field specifies how large the count is compared to the total count.
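
To make the count and percent fields concrete, here is a minimal Python sketch of what a top-style aggregation computes. It is not Splunk's implementation: the function name, the sample events, and the assumption that percent is taken over events that carry the field are all illustrative.

```python
from collections import Counter

def top(events, field, limit=10):
    """Return one row per common value of `field`, with count and percent."""
    # Only events that actually carry the field take part (an assumption).
    values = [event[field] for event in events if field in event]
    total = len(values)
    rows = []
    for value, count in Counter(values).most_common(limit):
        rows.append({field: value, "count": count, "percent": 100.0 * count / total})
    return rows

# Hypothetical sample events, not the tutorial data set.
events = [
    {"categoryId": "STRATEGY"},
    {"categoryId": "STRATEGY"},
    {"categoryId": "ARCADE"},
    {"categoryId": "STRATEGY"},
    {"status": "200"},  # no categoryId, so it is ignored
]

for row in top(events, "categoryId"):
    print(row)
```

With these four matching events, STRATEGY gets a count of 3 and a percent of 75.0, and ARCADE gets a count of 1 and a percent of 25.0.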

View and format results on the Visualization tab

You can also view the results of transforming searches on the Visualization tab, where you can format the chart type.

Click the Visualization tab.
By default, the Visualization tab opens with a Column chart.
Click Column Chart to open the visualization type selector.
Column, Bar, and Pie charts are listed as the Recommended chart type for this data set.

Select the Pie chart.
Now, your visualization looks like the following pie chart.

Hover over each slice of the pie to see the count and percentage values for each categoryId.

Click on the STRATEGY slice.
categoryId=STRATEGY is added to your search string, replacing the top command. The search runs again.

Next step

Learn about correlating events with subsearches.

Intro. What is Splunk

Splunk turns Machine Data Into Answers

Real-Time – Splunk gives you the real-time answers you need to meet customer expectations and business goals.
Machine Data – Use Splunk to connect your machine data and gain insights into opportunities and risks for your business.
Scale – Splunk scales to meet modern data needs — embrace the complexity, get the answers.
AI and Machine Learning – Leverage artificial intelligence (AI) powered by machine learning for actionable and predictive insights.
Reporting health conditions in real time
Delve deeper into the patient’s health record and analyze patterns
Alarms / Alerts to both the doctor and patient when the patient’s health degrades

Splunk is the engine for machine data

Machine data is more than just logs: it’s configuration data, data from APIs and message queues, change events, the output of diagnostic commands and more
Log types: Application, Web Access and Proxy, Call Detail Records (CDR), Clickstream, Message Queues, Packet, Database audit and tables, File audit, Syslog, WMI, PerfMon

Quick and easy way to…

Easily visualize the data as events rather than lines of text
Quickly get the data properly broken into events
Accurately get the timestamp extracted
All in a wicked cool GUI… – Once everything is good, you take your PROPS settings and deploy

Splunk structure

Test Environment

Every Splunk deployment should have a test environment
It can be a laptop, virtual machine or spare server
It should run the same version of Splunk as production
It should be accessible to other Splunk developers and administrators

CONSIDERATIONS TO KEEP IN MIND when installing Splunk

The following considerations need to be taken into account before installing and configuring:

1. Disk capacity
2. CPU performance
3. SSH as best practice for app configurations
4. SE/CIM setup
5. Universal forwarder config/install

Planning for Splunk setup

Setting up a Splunk AWS instance details: Instance URL: ec2-1-2-3-4.eu-west-1.compute.amazonaws.com

Diagram of systems with a single EC2 instance being the AIO. Only the UF agent (installed manually on clients) and the TA (pushed to clients via the Deployment App on the server, no manual install) are installed on remote clients/hosts.

The All-In-One (AIO) server comprises all of these modules:

Search Indexer
Deployment Apps
SE
CIM
This setup generally keeps 14 days of logs; keeping 6 to 12 months is overkill, because the volume is measured in TB and the storage cost is not justified
Data freezing: there are HOT/WARM buckets, COLD buckets and FROZEN (archive) buckets
Capacity planning is key for a healthy Splunk
The Monitoring Console is the health check area

Apps to Install:

Common Information Model (SE/CIM)
Indexes volume indexer # Always use the local folder; do not edit the default folder. The config file is indexes.conf
Splunk gives the LOCAL folder precedence over the DEFAULT folder.
Install apps via SSH as a best practice, with configs always in the LOCAL folder (create one if it is missing) rather than in the DEFAULT one.
It’s best to test configs and installs on a DEV-SPLUNK box. A Trial license lasts 60 days; after that Splunk is free with up to 500 MB of indexed data per day.
Data is stored in index buckets, not in a SQL database. The raw data is kept in a compressed journal, and .tsidx files index it.

PREPARATIONS

1. Prepare Drives

Live-Splunk-App1 has the following:

system drive – 20GB (system)
primary drive – 300GB (data drive, holding HOT data)
secondary drive – 100GB (holding FROZEN data, past 10/14 days as configured)

List of apps command:

/MNT/DATA is the 300GB DATA drive. A splunkdata folder needs to be created there, and the user SPLUNK needs access to manage the folder

Rebooting to refresh config:

2. Prepare indexerbase configs

Editing indexes and configs usually requires a restart of the Splunk service
Time-based settings in Splunk are expressed in seconds
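
Because retention periods are written in seconds, it helps to do the conversion explicitly rather than by hand. The following is a small sketch that assumes the 14-day retention mentioned earlier and prints a value in the form used by the frozenTimePeriodInSecs setting in indexes.conf. The helper name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_seconds(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY

# Assumed retention of 14 days, as in the planning notes above.
retention_days = 14
print(f"frozenTimePeriodInSecs = {days_to_seconds(retention_days)}")
```

For 14 days this prints frozenTimePeriodInSecs = 1209600.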

3. Prepare SE / CIM

lookup editor
SA CIM
Splunk_TA_nix
Security essentials.zip

We need permissions set up for the TAs (Technology Add-ons), which are actually scripts

Then reboot. The apps are then visible on the left, along with the DATA MODELS

4. Prepare Universal Forwarders

DOMAIN_all_deployments

DOMAIN_all forwarders

PORTS need to be whitelisted – 8089, 8081, 8082, 9997 etc. (see further below for common ports)

AGENT IS INSTALLED with a quiet CMD


5. Prepare SPLUNK APPS

Splunk Server is v7

Agents should ideally match the server version or be older. The latest v7.1 is a bit risky to use; it might work, but keep that in mind
Agents are downloaded and copied to Webservers – Installation is run by a quiet CMD command:

Cluster Classes:

Create an all_windows_server_test class. Then edit the class to include the relevant IP/DNS/hostname (whitelist the IP/hostname/DNS). Then add APPS: edit the app, click to include it, and then SAVE.

Deploying RESTARTS the agent

Forwarding agent installation: once installed, click EDIT to check whether the app is installed

Once installed, internal logs will start being pushed (used for troubleshooting and proof)

6. Prepare TA_AGENTS

TA-agents are important: they define what is collected for the Universal Forwarder agent to push to Splunk

Unzip file in /deployment-apps/

Then the ownership is set:

chown -R splunk:splunk /opt/splunk/etc/splunk/apps/

su splunk

pwd

cd splunk_TA_windows

DEPLOYMENT


Forwarder Management – Edit – Click Move to right – Now we have 3 apps deployed

Then troubleshoot whether the TA works in > Splunk>Volume.Instances, thus confirming that Windows logs are being logged

Changes need to be applied:

OVERALL>SETTINGS>MONITORING CONSOLE> APPLY settings

In case Win Security is not showing – Windows Audit logs need to be enabled in MMC

7. Review by running some Search/Reports

Generic APP installation steps

1. Splunk Admin

Settings>Forwarder Management (top right)

Server classes > Create new class: LIVE (this is a new group for LIVE servers) # This is needed for new GROUPS of servers

Then we have two areas:

ADD APPS – All three apps – selected to be installed

ADD CLASSES – defines which servers to add

(include) – whitelist – preferably allow the whole VPC or the server IP – Adding 10.1.100.* (NOTE: DNS does not work; Splunk cannot ping the hostname, even when it is visible in the GUI)

Note: the AWS GATEWAY must be whitelisted for a server with a private IP and a VPC GW public IP

2. INSTALL THE AGENT

2.1. The agent is downloaded and silently installed via command. Go to the folder and execute the following:

msiexec.exe /i splunkforwarder-7.1.1-8f0ead9ec3db-x64-release.msi DEPLOYMENT_SERVER="1.2.3.4:8089" AGREETOLICENSE=Yes SPLUNKPASSWORD=RELEVANT_CONPL /quiet

2.2. Firewall: whitelist the ProgramFiles > bin/splunkd.exe file

2.3. Enable Windows Security Logs in Local Security Policy!!! (choose the preferred success/failure audits)

2.4. Note: AWS GATEWAY must be whitelisted in SPLUNK ADMIN

2.5. SPL management – Forwarder Management – the new server is now showing as listed

2.6. Then, to push apps to the Agent Servers, a deploy-server command needs to be executed:

su splunk

(sudo -u) splunk /opt/splunk/bin/splunk reload deploy-server

2.7 Troubleshoot if agent is not connecting

Open logs in C:/ProgramFiles/UniversalForwarder/var/logs.. and read logs

The logs listed the pointer of Splunk as an internal IP, which the agent could not resolve. Thus SPLUNK required an additional outputs.conf edit to add the Splunk server identified by its PUBLIC IP as well!!!

3. Once installed, a verification can be done via SEARCH:

index=_internal | stats count by host

Handy Info

Diagrams – Overview of Splunk systems

Optimisation

Whitelist or Blacklist Windows Events
This will selectively include or exclude events from collection on a Windows forwarder
Available feature on 6.x or greater Windows forwarders
All controlled through inputs.conf on the Windows forwarders


Example:

[WinEventLog://Security]
whitelist = 4,5,7,100-200
…
[WinEventLog://Security]
blacklist = EventCode=%^200$% User=%duca%
…
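
As a rough illustration of how a list such as 4,5,7,100-200 can be read, the sketch below expands it into a set of event codes in Python. It only models the simple ID-list form, and it assumes that ranges are inclusive at both ends. The function name is hypothetical and this is not how the forwarder itself parses inputs.conf.

```python
def parse_event_codes(spec):
    """Expand a spec such as '4,5,7,100-200' into a set of integer event codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            # Assumption: both ends of the range are included.
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return codes

allowed = parse_event_codes("4,5,7,100-200")
print(150 in allowed, 6 in allowed)
```

Under that reading, event code 150 would be collected and event code 6 would not.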

Provides reliable and consistent indexing of data with headers
Address issue on forwarder:

INDEX_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}

Supports custom header parsing and easy mode for common formats
Extract IIS fields using props.conf on the Windows forwarder:

[IIS]
INDEX_EXTRACTIONS = w3c

Modular Inputs – Splunk Enterprise app or add-on that extends the Splunk Enterprise framework to define a custom input capability. Examples: (Checkpoint OPSEC, Twitter, Stream, Amazon S3 Online storage)
Scripted Inputs – A scripted input is used to get data from application programming interfaces (APIs) and other remote data interfaces and message queues. Examples (VMStat, Top, iostat)
Scripted Inputs Example – This is a shell script saved in /opt/splunk/bin/scripts/ OR in a specific app; it allows you to execute any program on a Splunk forwarder and index its STDOUT data

Splunk DB Connect is also an option – Allows for indexing data directly from database queries.
DB Connect Best Practice:

— Normalize timestamps natively inside the SQL Query

— Filter results down in SQL Query to reduce garbage in Splunk Index.

— Repeated DBLookups should be converted to static lookup

— Search Head Pooling requires encrypted password replication

— Search Head Clustering Supported

Splunk App For Stream – Provides the ability to capture real-time streaming wire data from anywhere in your datacenter or from any public Cloud infrastructure (Win, Mac, Unix)
Splunk Stream DNS Capture – Full DNS Queries without logging enabled

Ports used by Splunk

Common ports listed below (All ports are TCP)

9997 for forwarders to the Splunk indexer. 9997 is not a default, just a convention; you need to set it explicitly on the receiving instance (indexer). There can also be flows on port 9997 from the search heads, deployment server, license server, and cluster master to the indexers. This is an optional flow used for forwarding Splunk’s internal indexes (a recommended best practice).
8000 for clients to the Splunk Search page
8089 for splunkd (also used by deployment server).

Optional ports for distributed systems:

8080 – Indexer Replication port
514 – Network port
8191 – KV store port (since v6.2)
Search Head Clustering uses a new replication port that you can pick, e.g. 8181. With SHC, the KV store port (by default, 8191) must also be available to all other members. You can use the CLI command splunk show kvstore-port to identify the port number. The replication port must be available to all other members.

Note: There’s confusion about the port required from UFs to an HF, which is 9997 too. Many use the HF and DS as the same server.

UFs —9997—> HF — 9997—> Indexers
UFs, Indexers, SHs —8089 —> DS

Directions of ports. Generally as below. Use tcpdump to verify

8089 for the deployment server is only needed from the client to the deployment server. Client being indexer, UF, etc.
9997 from the forwarder to the indexer. No connection is needed back from the indexers.
8089 is also used from a Search Head to your indexers. Again only single direction.
port 8089 for the license-master (from license-slave to license-master)
port XXXX for the replication cluster master and slaves.

Source: https://answers.splunk.com/answers/58888/what-are-the-ports-that-i-need-to-open.html
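
Alongside tcpdump, a quick way to confirm that a port is reachable in the expected direction is a plain TCP connection attempt from the client side. The Python sketch below is one way to do that. The helper name is made up, and the demonstration probes a throwaway listener on localhost so that it runs anywhere. In practice you would run the probe on a forwarder against your indexer on 9997, or on a client against the deployment server on 8089.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demonstration: open a temporary listener and probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(tcp_port_open("127.0.0.1", demo_port))
listener.close()
```

A successful probe only shows that the port accepts connections from where the script runs. It says nothing about the reverse direction, which matches the single-direction flows listed above.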

Writing Effective Queries for Splunk with SPL

Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason. It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Search Processing Language, which is used to write queries for searching through data within Splunk.
The organization I have in mind when writing this is a SOC or CSIRT, in which large scale hunting via Splunk is likely to be conducted, though it can apply just about anywhere. It is key to have relevant data sets against which to properly vet queries. Fortunately, there are many example data sets available for testing on GitHub, from Splunk, and some mentioned below. There are also “data generators” which can generate noise for testing. Best of all would be to create your own though :).
I was fortunate to have had the enjoyable experience of participating in a Boss of the SOC CTF a few years back, which had some pretty good exemplar security related data. Earlier this year, they released the data set publicly here.
This guide is not meant to be a deep dive into the structuring of a query using the SPL. The best place for that is the Splunk documentation itself, starting with this. This is geared more towards operations in which multiple queries are written, maintained, and used in an operational capacity. Many of these concepts can be generalized and applied to other signatures, rules, code or programmatic functions, such as Snort, YARA, or ELK, in which a large quantity of multi-version discrete units must be maintained.

1. Balance efficiency with enough specificity to minimize false positives

The ultimate goal of any Splunk query is to search and present data in order to answer some question(s). There are many right ways to search in Splunk, but there are often far fewer best ways (yes, multiple bests, see next sentence). Before formulating a search query, a few considerations should be weighed and prioritized, such as accuracy, efficiency, clarity, integrity, and duration. It is easy to get spoiled by simply doing wildcard searches, but also just as easy to unnecessarily bog down a search with superfluous key value mappings. An overreliance on either can lead to problems.
Accuracy – are there multiple sources which can answer the question? If so, which is more reliable and authoritative? More importantly, how important is it to reduce or eliminate false positives from your results? There is a heavy inverse correlation between accuracy and efficiency.
Clarity – filtering down to the most relevant information needed to answer the question is only half of the battle; you still need to interpret it. It may be fine to view the results as raw data if there are only one or two results of non-complex data, but when there are rows of deeply structured data, taking the time to present it in the most appropriate manner will go a long way.
Duration – the length required for the query to complete. Is this a search that will be run often, and so delays are additive and add to total inefficiency; is there an urgent need to answer something ASAP; is a longer duration eating up resources on other running functions on the search head? Sometimes it is necessary to break a search into smaller sub-searches or to target smaller sets of data and then pivot from there.
Efficiency – closely tied to duration, an inefficient query will lead to unnecessary delays, excessive resource consumption, and could even affect the integrity of the data (pay close attention to implicit limitations of results on certain commands!). Paying attention to efficiency is especially important if there are per-user limitations on number of searches, memory usage, or other constraints. Too many explicitly defined wildcard placeholders could become very expensive, and the atomicity of a formulated query should always be considered.
Integrity – will you be manipulating any data as part of your search? If so, understand the risks to compromising the integrity of your results in doing so. The more pivots made on returned data, the more susceptible to loss of integrity the search becomes.

2. Make it readable

Write queries in a consistent and clear manner. Sometimes it is better to have a query take up many additional lines for the sake of better readability. Breaking onto a new line at each pipe is the de facto standard for readability.
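
The one-pipe-per-line convention is easy to apply mechanically. As a minimal sketch, the hypothetical Python helper below reflows a single-line query so that each pipe starts a new line. It assumes that pipes inside double quotes or inside [subsearch] brackets should be left alone, and it does not handle escaped quotes.

```python
def split_on_pipes(query):
    """Split an SPL string on pipes that sit outside double quotes and square brackets."""
    stages, current = [], []
    in_quotes = False
    depth = 0  # nesting level of [subsearch] brackets
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "[":
            depth += 1
        elif not in_quotes and ch == "]":
            depth = max(depth - 1, 0)
        if ch == "|" and not in_quotes and depth == 0:
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    stages.append("".join(current).strip())
    return [stage for stage in stages if stage]

def format_spl(query):
    """Return the query with each top-level pipe starting a new line."""
    stages = split_on_pipes(query)
    if not stages:
        return ""
    first = stages[0]
    if query.lstrip().startswith("|"):
        first = "| " + first  # keep the leading pipe of a generating command
    return "\n".join([first] + ["| " + stage for stage in stages[1:]])

print(format_spl("sourcetype=access_* status=200 action=purchase | top categoryId"))
```

Run on the tutorial search, this prints the base search on the first line and | top categoryId on the second.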

3. Make it extensible

Queries should be written in such a way that other people can modify them for their own adaptations, or update or expand existing ones. Some ways to accomplish this are using obvious variable names, keeping the query readable, or even leaving in inexpensive functionality or variables which can be used for other purposes.

4. Make it modular

Modularity will lead to extensibility, maintainability, and resiliency. This will also increase efficiency as code reuse will be much simpler.

5. Make it feasible

If the query is written for the purpose of manual sifting and analysis, then 50k results is not very reasonable. However, if it is for stateful preservation, alerts, or lookups, then that is more acceptable. Incorporating pivots on the information with subsearches and filtering or even, if necessary, breaking it up into multiple different queries will make managing the results a surmountable task.

6. Make it resilient

The data can change and so can the SPL itself (or even custom commands if used), so writing queries that are less affected by potential changes is important, especially if the effects of the changes are not obvious, which could lead to a loss of integrity in the results. (This is where testing is also important.)

7. Make it consistent

Having a style guide may seem like overkill, but if your operation is highly dependent on maintaining a repository of queries, it can go a long way. Naming conventions, spacing, line breaks, use of quotations, ordering, and style are some of the things to standardize to help with consistency.

8. Make it identifiable

Something as simple as:

This ID can then be printed out with the results if needed or purely used as a means to categorize and quickly identify. Naming conventions should be obvious or recognizable (wxp = Windows XP, query 110), or even mappable to the repository itself.

9. Make it noob friendly

This is obviously highly dependent on your usage and organizational structure, however, it never hurts to keep queries as simple as can be, since there is always the chance that someone else will need to maintain or interpret them. Bonus* less time needing to train people on their purpose!

10. RTFM!

I am a huge proponent of RTFM (F!=field, btw) for both myself and others. Splunk has put a lot of effort into its documentation, and the result is detailed and thorough. With regard to writing SPL queries, the search reference is your absolute best friend!

11. Know your data

The first two things that I tell anyone who is new to Splunk to do are to become familiar with the syntax of SPL (#10) and, just as importantly, to get to know how the data is structured. The simplest way to do this is to do a wildcard search (*) and start reviewing the raw results under the Events tab. The data will usually be structured in XML or JSON. Initially, it will be less important to know which data was structured from indexing, field extractions, or other transforms, but this may become important with more advanced searches.

12. Test it

Do not ever merge a query into production ops, bless off on it, trust it, or whatever it is you do to give it legitimacy without first testing and confirmation of positive results. Regardless of how simple the query is, you can never guarantee that some other confounding issue isn’t occurring. If it is a matter of missing the applicable data, well then, Try Harder! There are many great products out there to help with this at scale, such as Red Canary’s atomic red team or Mitre’s caldera.

13. Build it out piecemeal

It can get stressful spending a lot of time on a query, only for it to not return the correct or any results, regardless of tweaking. The best way to build complex queries is to build them in pieces, testing as you go along. This is especially convenient because you can point to available data for the sake of testing to ensure positive results, and then change it as it is built out.

14. Implement version control

The necessity of this really depends on the number of queries and modifications, though it makes sense even for small quantities. This can be accomplished as simply as baking a version into the query itself, such as from #8 with revisions tacked on with periods (wxp-110.3) or even in its own field:

Even better than that would be to maintain them in a database or repository such as GitHub, which gives the added benefit of stateful change representations. It is also possible to save searches directly in Splunk, though version control is less intuitive that way.
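
If IDs follow a convention like wxp-110.3, tooling around a query repository can split them into parts for sorting or lookups. The Python sketch below assumes a layout of lowercase family, dash, query number, and an optional dot-separated revision. That layout is inferred from the single example above, and the field names are invented for illustration.

```python
import re

# Assumed layout: <family>-<number>[.<revision>], e.g. wxp-110.3
QUERY_ID = re.compile(r"^(?P<family>[a-z]+)-(?P<number>\d+)(?:\.(?P<revision>\d+))?$")

def parse_query_id(query_id):
    """Split an ID such as 'wxp-110.3' into family, number and revision."""
    match = QUERY_ID.match(query_id)
    if match is None:
        raise ValueError(f"unrecognized query ID: {query_id!r}")
    return {
        "family": match["family"],
        "number": int(match["number"]),
        "revision": int(match["revision"] or 0),  # no suffix means revision 0
    }

print(parse_query_id("wxp-110.3"))
```

Parsing the revision as an integer means that wxp-110.10 sorts after wxp-110.9, which a plain string comparison would get wrong.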

15. Maintain multiple versions of the same thing

This doesn’t just apply to older versions of the same query, but queries which may search the same thing but present it in a different manner, search a different data set, or search a different time window.

16. Don’t reinvent the wheel

It is all too easy to blow a full 12 hour shift perfecting a query, which may not even end up working at all. While it is important to have these search queries catered to your specific need, it is not always necessary to MacGyver it alone. There are lots of great resources available to borrow ideas or techniques from, such as the Splunk blogs and forums, or you can even work with a co-worker.

17. Don’t depend on the wheel

Counter to #16, you do not want to become over reliant on searching for help, as this could lead to running queries which may not be working as you think they are. This could also potentially compromise the integrity of the results. Worse yet, it could be an inefficient way of doing something which has caught on and persisted through the forums.

18. Share it

If you have written a gem or come up with a novel approach to something, share it back with the community. Even if the data set is different, there may still be much which can be gleaned from it. It also helps to drive conversations which benefit the community as a whole.

19. Save it

This is such an obvious one, but in spite of that, I still constantly find myself rewriting queries that I had previously written over and over again…

20. REGEX!

I don’t know why I have this all the way down at #20, because this is easily one of the most powerful and important concepts for pivoting on results. There are several commands where regex can be leveraged, but the two most significant are regex and rex.
Regex does exactly what it says: it allows you to filter on respective fields (or _raw) using regex, which in Splunk is a slimmed down version of PCRE. The rex command is much more powerful, in that it allows you to create fields based on the parsed data, which can then be used to pivot your searches on. You can even build it as a multivalued field if more than one match occurs. An example of the rex command (and potentially more than one value) can be seen in the example from #13.
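
The idea behind rex, named capture groups becoming fields with an option to keep every match, can be tried outside Splunk. The Python sketch below imitates that behavior and is not the rex command itself. The function, the sample event and the field names are assumptions. Python writes named groups as (?P<name>...), whereas rex uses the PCRE form (?<name>...).

```python
import re

def rex(raw, pattern, max_match=1):
    """Extract named groups from `raw`; max_match=0 collects every match as a list."""
    regex = re.compile(pattern)
    if max_match == 1:
        match = regex.search(raw)
        return match.groupdict() if match else {}
    fields = {}
    for count, match in enumerate(regex.finditer(raw), start=1):
        for name, value in match.groupdict().items():
            fields.setdefault(name, []).append(value)
        if max_match and count >= max_match:
            break
    return fields

# Hypothetical raw event with two destination addresses.
raw = "src=10.1.100.5 dst=10.1.100.9 dst=10.1.100.12"
print(rex(raw, r"dst=(?P<dst_ip>\d+\.\d+\.\d+\.\d+)", max_match=0))
```

With max_match=0 the dst_ip field holds both addresses, which mirrors the multivalued field described above. With the default of 1, only the first match is kept.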

21. Know when it’s better to go beyond just using a search with SPL

Finally, we made it all the way to #21! Sometimes, depending on circumstance, function, and operational usage, manual searching with SPL queries is just not the best answer. Splunk has a lot of other functionality which can accomplish many of the same things, with less manual requirements. Alerts, scheduled reports, dashboards, and any of a number of apps built within or against the API allow for almost limitless capability. If you are struggling to maintain or achieve some of the topics annotated here, it may mean it is time to explore some of these alternative options.

Overall

This is certainly not an all inclusive list, as there are many more practices which can apply here. Ultimately, it depends on the specific deployment, implementation, and usage of Splunk which should dictate exactly how you create and maintain search queries. This was also not meant to go too deep in the weeds on generating advanced queries (though that may come in the future), but rather a high level approach to maintaining quality and standards. There are many other people who are far more experienced and with much greater Splunk-fu out there, so if you have any input or insight, please feel free to reach out.