Recently I had to install & configure Elasticsearch and took some notes about it for future reference. Everything below is accurate at the moment of writing, March 2016, and Elasticsearch version 2.2.0.
Prerequisites
Java Runtime
Verify the installed version of the Java Runtime Environment or install it. The latest JDK 8 is strongly recommended.
- Check the installed JDK from the command line:
java -version
Expected output: java version "1.8.0_73". The last numbers may differ, but 1.8 is expected.
- Check system variables. Required environment variable: JAVA_HOME, pointing to the install location of the JDK. Example:
JAVA_HOME: C:\Program Files\Java\jre1.8.0_73
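The version check above can be automated. A small sketch, assuming the usual `java -version` output format (the helper name and sample strings are mine, not part of any tool):

```python
import re

def is_java8(version_line):
    """Return True if a `java -version` output line reports a 1.8.x runtime.

    The exact build number (e.g. 1.8.0_73) differs between machines;
    only the 1.8 prefix matters for Elasticsearch 2.2.
    """
    match = re.search(r'(\d+\.\d+)', version_line)
    return match is not None and match.group(1) == "1.8"

print(is_java8('java version "1.8.0_73"'))  # True
print(is_java8('java version "1.7.0_80"'))  # False
```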
Installation
- Download zip file from https://www.elastic.co/downloads/elasticsearch.
- Unzip the downloaded file to the location your application will run from. Example:
C:\Program Files\Elasticsearch-2.2.0
Configuration
Location of the configuration file: {install-path}\config\elasticsearch.yml. This is a plain-text file in YAML format and can be edited in any text editor.
Data folders
Uncomment the lines containing the path.data and path.logs keys. Set their values to the locations where Elasticsearch should store indexes, documents, and log files. Example:
path.data: D:\Elasticsearch\Data
path.logs: D:\Elasticsearch\Logs
Exposing the Elasticsearch endpoint over the local network
Uncomment network.host and set its value to _site_:
network.host: _site_
This will make Elasticsearch available under the machine's site name. Example: http://servername:9200.
Set the heap size
By default Elasticsearch reserves 1 GB for its heap. For most installations that is not enough. Set it appropriately for the RAM available on your servers and the server load. To set the heap size, create a system environment variable:
ES_HEAP_SIZE: 4g
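The 4g above is just an example. A common rule of thumb (my assumption here, not part of the original notes) is to give the heap about half of the machine's RAM, capped at roughly 31 GB so the JVM can keep using compressed object pointers. A quick sketch:

```python
def recommended_heap_gb(ram_gb):
    """Rule-of-thumb heap size for an Elasticsearch node.

    Assumed guidance: give the JVM about half the available RAM
    (leaving the rest for the OS file system cache), and stay
    below ~31 GB to preserve compressed object pointers.
    """
    return min(ram_gb // 2, 31)

print(recommended_heap_gb(8))    # 4  -> ES_HEAP_SIZE: 4g
print(recommended_heap_gb(128))  # 31 -> ES_HEAP_SIZE: 31g
```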
Set field data cache size
If you have data that changes often or becomes obsolete (like log messages), it's useful to set the field data cache size in your config file:
indices.fielddata.cache.size: 40%
(optional) Install web interface for monitoring
Recommended tool: kopf. To install, run the following command from {install-path}:
{install-path}\bin\plugin.bat install menezes/elasticsearch-kopf/2.0
If the install fails to download or install, check the plugin's homepage. If nothing works, download the plugin to a temporary location and install it from a file:
bin\plugin install d:\temp\{path downloaded plugin zip file}
If even that does not work, unzip the contents of the downloaded file to {install-path}\plugins\kopf and restart the service.
Plugin should be accessible at: http://servername:9200/_plugin/kopf
Configure to run as a service
- Install the Elasticsearch service. Open a command line, navigate to the installation folder, and execute:
bin\service.bat install
- Open the Services management console (services.msc) and find the Elasticsearch 2.2.0 service. Change Startup Type to Automatic. If you need to run the service under a specific user account, that's the place to set it up.
- Start the service.
Checks (post-configure & post-install)
- After starting the service, check the logs for any errors.
- Open http://machinename:9200/ in a browser. No web page will open, but the request should succeed by returning a JSON response or a file (depending on the browser).
- If any monitoring plugin is installed, check the state of your cluster and nodes there (for example at http://machinename:9200/_plugin/kopf).
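The JSON response to that root request can also be checked programmatically. A sketch below parses a sample response shaped like what Elasticsearch 2.2 returns from GET / (the values are illustrative, not from a real server):

```python
import json

# Sample response in the shape Elasticsearch 2.2 returns from GET / --
# the values below are illustrative, not from a real server.
sample = '''{
  "name": "my-es-node-one",
  "cluster_name": "my-es-cluster",
  "version": {"number": "2.2.0", "lucene_version": "5.4.1"},
  "tagline": "You Know, for Search"
}'''

info = json.loads(sample)
assert info["version"]["number"].startswith("2.2")
print(info["cluster_name"], info["version"]["number"])  # my-es-cluster 2.2.0
```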
Setting up a basic cluster
If you want the simplest configuration, but you want your cluster to have some redundancy and good performance, then Elastic's default settings will work well for you. Say, for redundancy and query distribution, you want to run an ES cluster on 2 (or n) nodes. Then your starting point will be a very basic elasticsearch.yml (example for 2 nodes):
First node:
cluster.name: my-es-cluster
node.name: my-es-node-one
network.host: _site_ # this will make ES available via the host/machine name
http.port: 9200 # listening endpoint
discovery.zen.ping.unicast.hosts: ["es-host-1", "es-host-2"]
Second node:
cluster.name: my-es-cluster
node.name: my-es-node-two
network.host: _site_
http.port: 9200
discovery.zen.ping.unicast.hosts: ["es-host-1", "es-host-2"]
Notice that the only difference is the node name.
Pretty self-explanatory. This should be enough to get you a working cluster of 2 boxes where both nodes are master-eligible data nodes, with only one playing the role of master at any time. You can still send queries to either node and have them processed on any of the nodes. More about network settings
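One setting worth knowing about that is not in the sample configs above (my addition, not part of the original notes): discovery.zen.minimum_master_nodes, which guards against split-brain by requiring a quorum of master-eligible nodes before a master is elected. The usual quorum formula is (master-eligible nodes / 2) + 1:

```python
def minimum_master_nodes(master_eligible):
    """Quorum value for discovery.zen.minimum_master_nodes.

    Assumed guidance: (master-eligible nodes // 2) + 1. Note that for
    a 2-node cluster a quorum of 2 means the cluster cannot elect a
    master if either node goes down, which is why 3 master-eligible
    nodes are often preferred.
    """
    return master_eligible // 2 + 1

print(minimum_master_nodes(2))  # 2
print(minimum_master_nodes(3))  # 2
```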