
Logstash Learning (2): Config

In the last blog, we introduced some Logstash concepts: the log data flow from input to filter to output, buffers and batches, etc.

In this blog, we focus on how to set up Logstash.

Settings Files

After installing Logstash, we can find its settings files under /etc/logstash (on Linux):

  • logstash.yml: Logstash parameter config file
  • log4j2.properties: Logstash logging config
  • jvm.options: Logstash JVM config
  • startup.options: used by the system-install script in /usr/share/logstash/bin to build the startup script; it sets options like the user, group, service name, and service description.

logstash.yml

Besides the usual YAML syntax, the logstash.yml file also supports bash-style interpolation of environment variables in setting values.

pipeline:
  batch:
    size: ${BATCH_SIZE}
    delay: ${BATCH_DELAY:5}
node:
  name: "node_${LS_NODE_NAME}"
path:
   queue: "/tmp/${QUEUE_DIR:queue}"

We can also set logging variables in logstash.yml and reference them in log4j2.properties, the Logstash logging config. In logstash.yml:

log.level: debug 
log.format: plain

Then, in log4j2.properties:

rootLogger.level = ${sys:ls.log.level}
rootLogger.appenderRef.rolling.ref = ${sys:ls.log.format}_rolling

Glob Pattern Paths

This is the pattern for specifying file paths. It can be used either in logstash.yml or in the pipeline config (e.g. pipe.conf), anywhere we want to refer to a file:

  • *: match any file except dot files
  • **: match directories recursively
  • ?: match any one character
  • {p,q}: match either literal p or literal q.

E.g. "/path/to/logs/{app1,app2,app3}/data-*.log"

Notice that this is not the same as a grok pattern, which is a regex-based pattern for interpreting log messages.
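
As a minimal sketch, the glob above can be used directly in a file input (the path is just the illustration from above):

input {
  file {
    # pick up data-*.log from any of the three application directories
    path => "/path/to/logs/{app1,app2,app3}/data-*.log"
  }
}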

Pipeline Config

Now that we have finished configuring Logstash itself, we can configure a pipeline workflow to handle our log data.

Input

The first part of a pipeline is the input plugin:

  • file: read logs from a file;
  • syslog: listen on port 514 and parse logs according to RFC 3164;
  • redis: read logs from Redis;
  • beats: read logs from Elastic Beats, lightweight data shippers that send data to Logstash.

If there is more than one input plugin, Logstash reads from all of them at the same time and combines them into one stream.
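
For example, a minimal sketch of an input block that reads from a log file and from Beats at the same time (the path is illustrative; 5044 is the port conventionally used for Beats):

input {
  # tail a local log file, starting from its beginning on first run
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
  # listen for events shipped by Filebeat or other Beats
  beats {
    port => 5044
  }
}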

Filter

Then the log stream goes through a list of filters.

  • grok: turn unstructured text into structured data, breaking it up into many different discrete bits of information
    • e.g. match => { "message" => "%{COMBINEDAPACHELOG}" }
    • grok debugger
  • mutate: rename/remove/replace/modify fields (e.g. rename, strip)
  • drop: drops everything that gets to this filter
  • date: parse a date from a field and store it in a target field (by default @timestamp)
    • match => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
    • This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you’re ingesting the log data)
  • dissect: extract unstructured event data into fields by using delimiters, e.g. for the log line:

Apr 26 12:20:02 localhost systemd[1]: Starting system activity accounting tool...

mapping => { "message" => "%{ts} %{+ts} %{+ts} %{src} %{prog}[%{pid}]: %{msg}" }
  • kv: parse key-value pairs
  • geoip, dns, useragent, translate etc
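
Putting a few of these together, here is a rough sketch of a filter block for Apache access logs: grok breaks the line into fields, date uses the parsed timestamp as the event timestamp, and mutate drops the now-redundant raw line (the field names come from the COMBINEDAPACHELOG pattern):

filter {
  # break the Apache combined log line into discrete fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # use the parsed timestamp as the event timestamp
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  # the raw line is no longer needed once it is parsed
  mutate {
    remove_field => [ "message" ]
  }
}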

Output

Finally, log events will go to output plugins.

  • elasticsearch: write to an Elasticsearch cluster
  • file: write to a file
  • graphite: pull metrics from logs and ship them to Graphite, which is an open source tool for storing and graphing metrics
  • statsd etc

If there is more than one output plugin, Logstash writes to all of them: each target gets its own copy of the data.
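
For instance, a sketch that writes every event both to an Elasticsearch cluster and to a local file (the host, index name and path are illustrative):

output {
  # every event is sent to both outputs
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs"
  }
  file {
    path => "/var/log/logstash/events.log"
  }
}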

Codec

Input codecs provide a convenient way to decode your data before it enters the input. Output codecs provide a convenient way to encode your data before it leaves the output. Using an input or output codec eliminates the need for a separate filter in your Logstash pipeline.

Codecs separate the transport of a message from the serialization process. Some common codecs are listed below:

  • json
  • multiline
  • msgpack
  • plain
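
As a small sketch, a json codec on the input decodes each line into fields without a separate json filter, and the rubydebug codec on stdout pretty-prints events for debugging (the path is illustrative):

input {
  file {
    path => "/var/log/app/events.json"
    # decode each line as JSON as it enters the pipeline
    codec => json
  }
}
output {
  # pretty-print the resulting events for inspection
  stdout { codec => rubydebug }
}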

Customize Filter

When handling our log data, we can not only use plugins but also refer to the fields of our log events. Because inputs generate events, there are no fields to evaluate within the input block—they do not exist yet. Because of their dependency on events and fields, the following configuration options will only work within filter and output blocks.

Field Ref

The syntax to access a field is [fieldname]. If we are referring to a top-level field, we can omit the [] and simply use fieldname. To refer to a nested field, we specify the full path to that field: [top-level field][nested field].
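
For instance, a minimal sketch that renames a nested field to a top-level one (the field names are hypothetical):

filter {
  mutate {
    # move the nested [host][name] field up to a top-level "hostname" field
    rename => { "[host][name]" => "hostname" }
  }
}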

sprintf Format

To embed a field's content in a string, we can use %{}:

output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}

Instead of specifying a field name inside the %{}, we can use the +FORMAT syntax to represent a timestamp, where FORMAT is a time format.
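
For example, a sketch that splits output files by day using the event's timestamp (the path and the type field are illustrative):

output {
  file {
    # %{type} is a field assumed to exist on the event;
    # %{+yyyy-MM-dd} formats the event's @timestamp
    path => "/var/log/app/%{type}-%{+yyyy-MM-dd}.log"
  }
}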

Conditionals

Conditionals, like field references and the sprintf format, will not work in an input block. A conditional looks like this:

if [@metadata][test] == "Hello" {
  stdout { codec => rubydebug }
}
  • equality: ==, !=, <, >, <=, >=
  • regexp: =~, !~ (checks a pattern on the right against a string value on the left)
  • inclusion: in, not in
  • and, or, nand, xor, !
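
As a slightly larger sketch, conditionals can route events: tag suspected timeouts in the filter stage, then send tagged events to a different output (the field names, index and path are illustrative):

filter {
  # tag events whose message mentions a timeout
  if [message] =~ /timeout/ {
    mutate { add_tag => [ "timeout" ] }
  }
}
output {
  if "timeout" in [tags] {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "timeouts"
    }
  } else {
    file { path => "/var/log/logstash/other.log" }
  }
}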

The @metadata

Make use of the @metadata field any time you need a temporary field but do not want it to be in the final output.

For example, it can be used in timestamp extraction:

date {
  match => [ "[@metadata][timestamp]" , "ISO8601" ]
}
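
A fuller sketch of this pattern (the grok pattern here is just an assumption about the log format): grok captures the leading timestamp into @metadata, date uses it to set @timestamp, and the temporary field never appears in the final output:

filter {
  grok {
    # capture the leading ISO8601 timestamp into a temporary @metadata field
    match => { "message" => "^%{TIMESTAMP_ISO8601:[@metadata][timestamp]}" }
  }
  date {
    match => [ "[@metadata][timestamp]", "ISO8601" ]
  }
}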

Environment Variable

  • Give a default value by using the form ${var:default value}. Logstash uses the default value if the environment variable is undefined.
  • Environment variables are immutable. If you update the environment variable, you’ll have to restart Logstash to pick up the updated value.
  • The replacement is case-sensitive.

Example

Here’s an example that uses an environment variable to set the path to a log file:

filter {
  mutate {
    add_field => {
      "my_path" => "${HOME}/file.log"
    }
  }
}

Given the value of HOME:

export HOME="/path"

At startup, Logstash uses the following configuration:

filter {
  mutate {
    add_field => {
      "my_path" => "/path/file.log"
    }
  }
}

Multiline Events

A common use of Logstash is to combine a multi-line log into a single log event. Here we explore three examples:

  • Combining a Java stack trace into a single event
  • Combining C-style line continuations into a single event
  • Combining multiple lines from time-stamped events

According to the Filebeat documentation, if we use Filebeat to ship the logs, we had better combine the lines in Filebeat rather than in Logstash; otherwise, we may corrupt the stream of data.
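
For reference, a rough sketch of the Filebeat-side setting (option names may vary between Filebeat versions, and the timestamp pattern is just an assumption about the log format):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/someapp.log
    # lines that do not start with a date are appended to the previous line
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after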

Java Stack Traces

input {
  stdin {
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception): .+)|(^\s+at .+)|(^\s+... \d+ )|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}

Line Continuations

This configuration merges any line that ends with the \ character with the following line.

input {
  stdin {
    codec => multiline {
      pattern => "\\$"
      what => "next"
    }
  }
}

Timestamps

This configuration uses the negate option to specify that any line that does not begin with a timestamp belongs to the previous line.

input {
  file {
    path => "/var/log/someapp.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}

