In the last blog, we introduced some Logstash concepts: the flow of log data from input to filter to output, buffering and batching, etc.
In this blog, we focus on how to set up Logstash.
Settings Files
After installing Logstash, we can find its settings files under /etc/logstash (on Linux):
- logstash.yml: Logstash parameter config file
- log4j2.properties: Logstash logging config
- jvm.options: Logstash JVM config
- startup.options: used by the system-install script in /usr/share/logstash/bin to build the startup script. It sets options like the user, group, service name, and service description.
logstash.yml
In addition to the usual YAML settings, the logstash.yml file also supports bash-style interpolation of environment variables in setting values:
pipeline:
  batch:
    size: ${BATCH_SIZE}
    delay: ${BATCH_DELAY:5}
node:
  name: "node_${LS_NODE_NAME}"
path:
  queue: "/tmp/${QUEUE_DIR:queue}"
We can also set logging variables in logstash.yml and reference them in log4j2.properties, the Logstash logging config:
# logstash.yml
log.level: debug
log.format: plain

# log4j2.properties
rootLogger.level = ${sys:ls.log.level}
rootLogger.appenderRef.rolling.ref = ${sys:ls.log.format}_rolling
Glob Pattern Paths
Glob patterns are used to specify file paths, either in logstash.yml or in pipe.conf, anywhere we want to refer to a file:
- *: match any file except dot files
- **: match directories recursively
- ?: match any one character
- {p,q}: match either literal p or literal q
E.g. "/path/to/logs/{app1,app2,app3}/data-*.log"
Note that this is not the same as a grok pattern, which is a regex-based pattern for interpreting log messages.
Pipeline Config
Now that Logstash itself is configured, we can configure a pipeline workflow to handle our log data.
Input
The first part of pipeline is input plugin:
- file: read log events from a file
- syslog: listen on port 514 and parse logs according to RFC 3164
- redis: read logs from a Redis server
- beats: receive logs from Elastic Beats, lightweight data shippers that send data to Logstash
If more than one input plugin is configured, Logstash reads from all of them at the same time and combines the events into one stream.
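For instance, here is a minimal sketch of an input block that reads from both a file and Beats; the path and port below are placeholders, not values from this setup:

input {
  file {
    # placeholder path; point this at your own logs
    path => "/var/log/myapp/*.log"
  }
  beats {
    # conventional Beats port
    port => 5044
  }
}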
Filter
The stream of log events then goes through a list of filters (a small combined sketch follows this list):
- grok: turn unstructured text into structured data by breaking it up into discrete fields
  - e.g. match => { "message" => "%{COMBINEDAPACHELOG}" }
  - the grok debugger can be used to test patterns
- mutate: rename/remove/replace/modify fields
- drop: drops everything that gets to this filter
- date: parse a date from a field and store it in a field (@timestamp by default)
  - e.g. match => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
  - this filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you are ingesting the log data)
- mutate: rename, strip
- dissect: extract unstructured event data into fields using delimiters
  - e.g. for the line Apr 26 12:20:02 localhost systemd[1]: Starting system activity accounting tool...
  - mapping => { "message" => "%{ts} %{+ts} %{+ts} %{src} %{prog}[%{pid}]: %{msg}" }
- kv: parse key-value pairs
- geoip, dns, useragent, translate, etc.
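As a rough sketch, several of the filters above can be chained in one filter block. The example below assumes an Apache access log parsed with COMBINEDAPACHELOG, whose timestamp field uses the format shown:

filter {
  grok {
    # break the raw message into Apache access-log fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # use the parsed request time as the event timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    # the raw line is no longer needed once it is parsed
    remove_field => [ "message" ]
  }
}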
Output
Finally, log events will go to output plugins.
- elasticsearch: write to elasticsearch cluster
- file: write to file
- graphite: pull metrics from logs and ship them to Graphite, which is an open source tool for storing and graphing metrics
- statsd, etc.
If more than one output plugin is configured, Logstash writes to all of them: each target gets its own copy of the data.
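Here is a rough sketch with two outputs, so every event goes both to Elasticsearch and to a local file; the host and path are placeholders:

output {
  elasticsearch {
    # placeholder cluster address
    hosts => ["http://localhost:9200"]
  }
  file {
    # placeholder path for a local copy of every event
    path => "/tmp/logstash-events.log"
  }
}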
Codec
Input codecs provide a convenient way to decode your data before it enters the input. Output codecs provide a convenient way to encode your data before it leaves the output. Using an input or output codec eliminates the need for a separate filter in the Logstash pipeline.
Codecs separate the transport of a message from the serialization process. Some common codecs are listed below, followed by a short usage sketch:
- json
- multiline
- msgpack
- plain
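For instance, a minimal sketch that decodes each incoming line as JSON right at the input, so no separate json filter is needed:

input {
  stdin {
    # parse each line of input as a JSON document
    codec => json
  }
}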
Customizing Filters
When handling our log data, we can not only use plugins but also refer to the fields of our log events. Because inputs generate events, there are no fields to evaluate within the input block: they do not exist yet. Because of their dependency on events and fields, the following configuration options will only work within filter and output blocks.
Field Ref
The syntax to access a field is [fieldname]. If we are referring to a top-level field, we can omit the [] and simply use fieldname. To refer to a nested field, we specify the full path to that field: [top-level field][nested field].
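For example, here is a small sketch that renames a nested field to a top-level one; the field names are made up for illustration:

filter {
  mutate {
    # move the nested [response][status] up to a top-level "status" field
    rename => { "[response][status]" => "status" }
  }
}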
sprintf Format
To put the content of a field into a string, we can use %{}:
output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}
Instead of specifying a field name inside the %{}, we can use the +FORMAT syntax to insert the event's timestamp, where FORMAT is a time format.
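For example, a hypothetical file output whose path embeds the event's @timestamp, formatted as a date:

output {
  file {
    # one output file per day, named after the event's timestamp
    path => "/var/log/myapp/%{+yyyy-MM-dd}/output.log"
  }
}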
Conditionals
As mentioned above, field references, sprintf format, and conditionals will not work in an input block. For example:
if [@metadata][test] == "Hello" {
  stdout { codec => rubydebug }
}
- equality: ==, !=, <, >, <=, >=
- regexp: =~, !~ (checks a pattern on the right against a string value on the left)
- inclusion: in, not in
- boolean: and, or, nand, xor; negation: ! (a combined sketch follows this list)
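Here is a hedged sketch combining a few of these operators; the field names and values are made up for illustration:

filter {
  if [loglevel] == "ERROR" and [message] =~ /timeout/ {
    # tag serious errors for downstream alerting
    mutate { add_tag => [ "alert" ] }
  } else if [loglevel] in [ "DEBUG", "TRACE" ] {
    # discard noisy low-level events
    drop { }
  }
}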
The @metadata Field
Make use of the @metadata field any time you need a temporary field but do not want it to end up in the final output. For example, it can be used for timestamp extraction:
date {
  match => [ "[@metadata][timestamp]", "ISO8601" ]
}
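A slightly fuller sketch (the grok pattern and field names here are illustrative) shows how such a temporary field might be populated and then consumed without ever appearing in the output event:

filter {
  grok {
    # capture the raw date into a temporary @metadata field
    match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ]
  }
  date {
    # parse it into @timestamp; [@metadata][timestamp] never reaches the output
    match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}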
Environment Variable
- Give a default value by using the form ${var:default value}. Logstash uses the default value if the environment variable is undefined.
- Environment variables are immutable. If you update the environment variable, you’ll have to restart Logstash to pick up the updated value.
- The replacement is case-sensitive.
Example
Here’s an example that uses an environment variable to set the path to a log file:
filter {
  mutate {
    add_field => {
      "my_path" => "${HOME}/file.log"
    }
  }
}
Suppose we set the value of HOME:
export HOME="/path"
At startup, Logstash uses the following configuration:
filter {
  mutate {
    add_field => {
      "my_path" => "/path/file.log"
    }
  }
}
Multiline Events
A common use of Logstash is to combine a multi-line log into a single log event; here we explore three examples:
- Combining a Java stack trace into a single event
- Combining C-style line continuations into a single event
- Combining multiple lines from time-stamped events
According to the Filebeat documentation, if we use Filebeat to ship the logs, we should combine multi-line events in Filebeat rather than in Logstash; otherwise, we may corrupt the stream of data.
Java Stack Traces
This configuration merges lines that look like part of a Java exception or stack trace into the previous line.
input {
  stdin {
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception): .+)|(^\s+at .+)|(^\s+... \d+ )|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}
Line Continuations
This configuration merges any line that ends with the \ character with the following line.
input {
  stdin {
    codec => multiline {
      pattern => "\\$"
      what => "next"
    }
  }
}
Timestamps
This configuration uses the negate option to specify that any line that does not begin with a timestamp belongs to the previous line.
input {
  file {
    path => "/var/log/someapp.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}