Cluster: nodes grouped under a common cluster name
There is always one elected master node
Node: a single Elasticsearch instance
Coordinates access to shards
Index: a logical grouping of shards
Shard: part of an index
Can be distributed over many nodes for failover or performance
Technically a Lucene index
Replica: the number of replicas can be set per index
1 for our example
With only one node, replica shards cannot be allocated and the cluster health stays yellow
You need at least two nodes for the replica shards to initialize properly
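The replica rule above can be sketched as a tiny model. This is only an illustration, not Elasticsearch code: the function name `cluster_health` is hypothetical, and it models a single fact, namely that a replica is never allocated on the same node as its primary.

```python
def cluster_health(node_count: int, replicas_per_shard: int) -> str:
    """Return a simplified health colour for one index.

    Hypothetical helper: only replica placement is modelled.
    """
    if node_count < 1:
        return "red"  # no node can hold even the primary shards
    # A replica never shares a node with its primary,
    # so every shard needs 1 + replicas distinct nodes.
    copies_needed = 1 + replicas_per_shard
    return "green" if node_count >= copies_needed else "yellow"


print(cluster_health(1, 1))  # one node, one replica
print(cluster_health(2, 1))  # two nodes, one replica
```

With one replica per shard, the first call reports yellow and the second green, which is the "at least two nodes" rule from the slide.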
Relational Database | Databases | Tables | Rows | Columns |
---|---|---|---|---|
Elasticsearch | Indices | Types | Documents | Fields |
PUT /megacorp/employee/1
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25
}
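The same call can be prepared from Python. A minimal sketch using only the standard library: the helper name `build_index_request` and the host `http://localhost:9200` are assumptions, and the `/index/type/id` path follows the type-based layout described here (later Elasticsearch versions removed mapping types). The request is only built, not sent.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):  # host is an assumption
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


request = build_index_request(
    "megacorp", "employee", 1,
    {"first_name": "John", "last_name": "Smith", "age": 25},
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running node.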
Looking at our example using REST calls returning JSON
GET expo2009_airline/flight/_search
./bin/logstash -f expo2009_airline.conf
input {
  file {
    path => "ml/raw_data/expo2009_airline/2001.csv"
    type => "flight"
    start_position => "beginning"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "expo2009_airline"
    workers => 1
  }
}
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
filter {
  csv {
    columns => ["Year","Month","DayofMonth","DayOfWeek",
                "DepTime","CRSDepTime","ArrTime","CRSArrTime",
                "UniqueCarrier","FlightNum","TailNum","ActualElapsedTime",
                "CRSElapsedTime","AirTime","ArrDelay","DepDelay","Origin",
                "Dest","Distance","TaxiIn","TaxiOut","Cancelled",
                "CancellationCode","Diverted","CarrierDelay","WeatherDelay",
                "NASDelay","SecurityDelay","LateAircraftDelay"]
    separator => ","
  }
}
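To see what the csv filter produces, here is a rough Python equivalent that maps one line to named fields. It is a sketch, not Logstash's implementation: `parse_flight_line` is a hypothetical name and the sample row is made up for illustration, not taken from 2001.csv.

```python
import csv

# Same column names, in the same order, as the csv filter above.
COLUMNS = [
    "Year", "Month", "DayofMonth", "DayOfWeek",
    "DepTime", "CRSDepTime", "ArrTime", "CRSArrTime",
    "UniqueCarrier", "FlightNum", "TailNum", "ActualElapsedTime",
    "CRSElapsedTime", "AirTime", "ArrDelay", "DepDelay", "Origin",
    "Dest", "Distance", "TaxiIn", "TaxiOut", "Cancelled",
    "CancellationCode", "Diverted", "CarrierDelay", "WeatherDelay",
    "NASDelay", "SecurityDelay", "LateAircraftDelay",
]


def parse_flight_line(line: str) -> dict:
    """Split one CSV line and pair each value with its column name."""
    values = next(csv.reader([line], delimiter=","))
    return dict(zip(COLUMNS, values))


# Invented sample row (29 values), for illustration only.
sample = ("2001,9,10,1,1110,1112,1250,1255,UA,123,N123UA,100,103,80,"
          "-5,-2,LAX,DEN,862,5,15,0,,0,NA,NA,NA,NA,NA")
event = parse_flight_line(sample)
print(event["Origin"], event["Dest"], event["CRSDepTime"])
```

Every value is still a string at this point, which is why the later conversion step matters.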
Having a timestamp (field: @timestamp) makes data especially useful for Elasticsearch
filter {
  mutate { add_field => ["timestamp",
                         "%{Year}-%{Month}-%{DayofMonth};%{CRSDepTime}"] }
  date {
    match => ["timestamp", "YYYY-MM-dd;HHmm"]
    target => "@timestamp"
  }
}
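The two filter steps above (glue the fields together, then parse them as a date) can be illustrated in Python. This is a sketch under assumptions: `scheduled_departure` is a hypothetical helper, and padding `CRSDepTime` to four digits assumes the raw values may lack leading zeros (for example `930` for 09:30), which should be checked against the actual data. No time zone is attached.

```python
from datetime import datetime


def scheduled_departure(event: dict) -> datetime:
    """Combine Year, Month, DayofMonth and CRSDepTime into one datetime."""
    # Assumption: times may come without leading zeros, so pad to HHmm.
    hhmm = event["CRSDepTime"].zfill(4)
    raw = f'{event["Year"]}-{event["Month"]}-{event["DayofMonth"]};{hhmm}'
    return datetime.strptime(raw, "%Y-%m-%d;%H%M")


when = scheduled_departure(
    {"Year": "2001", "Month": "9", "DayofMonth": "10", "CRSDepTime": "930"}
)
print(when.isoformat())
```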
Converting fields to proper data types (integer, boolean) makes querying faster and enables type-aware queries such as numeric ranges
mutate {
  convert => {
    "ActualElapsedTime" => "integer"
    "CRSElapsedTime" => "integer"
    "ArrDelay" => "integer"
    "DepDelay" => "integer"
    "AirTime" => "integer"
    "Distance" => "integer"
    "TaxiIn" => "integer"
    "TaxiOut" => "integer"
    "Cancelled" => "boolean"
    "Diverted" => "boolean"
  }
}
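As an illustration of what typed fields buy you, here is a small Python sketch of the same conversion. It is not Logstash's behaviour: `convert_types` is a hypothetical helper, mapping non-numeric values such as `NA` to `None` is a choice made for this sketch, and treating `"1"` as true assumes the flags are encoded as 0/1.

```python
INTEGER_FIELDS = ("ActualElapsedTime", "CRSElapsedTime", "ArrDelay", "DepDelay",
                  "AirTime", "Distance", "TaxiIn", "TaxiOut")
BOOLEAN_FIELDS = ("Cancelled", "Diverted")


def convert_types(event: dict) -> dict:
    """Return a copy of the event with numeric and flag fields typed."""
    converted = dict(event)
    for name in INTEGER_FIELDS:
        try:
            converted[name] = int(converted.get(name))
        except (TypeError, ValueError):
            converted[name] = None  # sketch's choice for missing or "NA" values
    for name in BOOLEAN_FIELDS:
        converted[name] = converted.get(name) == "1"  # assumes 0/1 encoding
    return converted


typed = convert_types({"ArrDelay": "-5", "Distance": "862", "AirTime": "NA",
                       "Cancelled": "0", "Diverted": "1", "Origin": "LAX"})
print(typed["ArrDelay"] + typed["Distance"], typed["Diverted"])
```

Once delays and distances are numbers, range queries and aggregations such as averages work on them directly.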
Flights from LA or Newark to Denver or Clinton on September 10th, 2001, between 11:12 and 15:38?
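One way such a question might be phrased as a search body is sketched below. Several things here are assumptions: the helper names `any_of` and `flights_query`, the airport codes (in particular `LIT` for Clinton is a guess), and the timestamps being given in whatever time zone the date filter used when indexing. `match` clauses are used so the sketch does not depend on how the string fields were mapped.

```python
import json


def any_of(field, values):
    """Match documents whose field equals at least one of the values."""
    return {"bool": {"should": [{"match": {field: v}} for v in values],
                     "minimum_should_match": 1}}


def flights_query(origins, destinations, start, end):
    """Build a search body: origin AND destination AND time window."""
    return {"query": {"bool": {"filter": [
        any_of("Origin", origins),
        any_of("Dest", destinations),
        {"range": {"@timestamp": {"gte": start, "lte": end}}},
    ]}}}


body = flights_query(["LAX", "EWR"], ["DEN", "LIT"],  # airport codes are assumptions
                     "2001-09-10T11:12:00", "2001-09-10T15:38:00")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of the `expo2009_airline/flight/_search` request.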
Problem: a page in a web browser cannot call Elasticsearch directly, because the Same-Origin Policy (SOP) blocks requests to a different origin
Option 1: Use a (relay) web server to make the calls to Elasticsearch
Option 2: Enable Cross-Origin Resource Sharing (CORS) to allow direct access from the browser
http.cors.enabled: true
http.cors.allow-origin: '*'
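For comparison, Option 1 could look roughly like the following standard-library sketch. Everything specific is an assumption: the Elasticsearch address, the port 8080, the names `upstream_request`, `RelayHandler` and `serve`, and the decision to relay only the one search path. It does not serve the web page itself, which would need to come from the same origin as the relay.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import URLError
from urllib.request import Request, urlopen

ELASTICSEARCH = "http://localhost:9200"            # assumed address
ALLOWED_PATH = "/expo2009_airline/flight/_search"  # only this path is relayed


def upstream_request(path, body):
    """Return the request to forward, or None if the path is not allowed."""
    if path != ALLOWED_PATH:
        return None
    return Request(ELASTICSEARCH + path, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = upstream_request(self.path, self.rfile.read(length))
        if request is None:
            self.send_error(403, "Path not relayed")
            return
        try:
            with urlopen(request) as response:
                payload = response.read()
        except URLError:
            self.send_error(502, "Elasticsearch request failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):  # port is an assumption
    """Run the relay until interrupted."""
    HTTPServer(("localhost", port), RelayHandler).serve_forever()
```

Calling `serve()` starts the relay. Compared with `allow-origin: '*'`, which lets a page from any origin query the node, a relay keeps Elasticsearch private and exposes only the chosen path.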