I use Elasticsearch, Kibana, and Grafana on a single Intel NUC to process and store incoming log events and metrics.

The volume is really low, so this little box has no problem doing the work. But I found some problems when I rebooted the system.

Kibana was dead, shards were unassigned and lost.

After a long, long debugging session I finally understood that no actual *data* was gone. Elasticsearch had lost some copies and run out of places to copy things to.

Long story short: it put a gun to its head and kept pulling the trigger again and again, while displaying alerts about loud noises.

 

Elasticsearch needs special configuration to better handle reboots and restarts in a standalone configuration.

It's intended to run on many nodes, not one, and some assumptions can fail if there's only one node.

As far as I understand it: on a write, it tries to store that write. If a shutdown is pending, it tries to relocate the data elsewhere. It also tries to replicate every write to other nodes. If there are none, the replica copies have nowhere to go, and on a shutdown it tries yet again to replicate elsewhere.

But there is no elsewhere.

 

Instead it needs to be told to not replicate data in the first place, and that's what the config here should do.

This is not well-tested. It is what seems to work for me :-)

 

The Elasticsearch config

elasticsearch.yml
cluster.name: logstash-01
node.local: true
discovery.zen.ping.multicast.enabled: false
index.number_of_shards: 1
index.number_of_replicas: 0
index.routing.allocation.total_shards_per_node: 2
# this might also be needed:
# cluster.routing.allocation.disable_new_allocation: true
# cluster.routing.allocation.disable_allocation: true
cluster.routing.allocation.allow_primary: true
node.name: "logstash-01-node01"
gateway.expected_nodes: 1

 

I also hard-code my node name; auto-assignment has no point in this scenario:

  • node.name

Settings that tell ES that there's only one node and that it doesn't need to wait for others during startup:

  • node.local
  • discovery.zen.ping.multicast (set to false)
  • gateway.expected_nodes

Settings that tell ES to store data in only one place, with no redundancy:

  • index.number_of_shards
  • index.number_of_replicas
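
One thing worth knowing: as far as I know, the index.* lines in elasticsearch.yml only act as defaults for indices created afterwards, so indices that already exist keep whatever replica count they had. Here is a small sketch for spotting those. The helper name indices_with_replicas is made up for this example, and it assumes the input is the two-column output of the _cat/indices API with h=index,rep.

```shell
# Hypothetical helper (not part of Elasticsearch): reads "index rep" lines
# on stdin and prints every index that still asks for at least one replica.
indices_with_replicas() {
  awk '$2 + 0 > 0 { print $1 }'
}
```

Usage, assuming Elasticsearch listens on localhost:9200: curl -s 'localhost:9200/_cat/indices?h=index,rep' | indices_with_replicas. No output means no index is still asking for a replica.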

 

Depending on your luck, this might also be helpful:

Kibana Elasticsearch load failure

 

And to make things complete... 

Band-aid if cluster state is yellow/red already

These are the steps I took when it was already stuck and broken and horrible.

Guess what, I also had to delete the "broken" copies.

The first command I seem to have used for that was the dangerous and unholy re-route.

Here is a slightly edited version of what I found in my history.

Reroute

 

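# list unassigned shards as "index shard" pairs and allocate each one
curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | \
  awk '{print $1" "$2}' | \
  while read idx shard ; do
    # the single quotes are closed around $idx and $shard so the shell expands them
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "'"$idx"'",
                  "shard" : '"$shard"',
                  "node"  : "Ub7nbTVCQuyOVQ3kO7ZEUQ",
                  "allow_primary" : true
              }
            }
        ]
    }'
    sleep 5
  done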

This part is most of why I hard-coded the node name.
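
Since the re-route is the scary part, it can help to look at the request body before sending it. A minimal sketch: reroute_payload is a made-up helper name, and the JSON is the same allocate command as in the loop above. The reroute API also accepts a dry_run query parameter, which should compute the result without applying it.

```shell
# Hypothetical helper: prints the reroute request body for one shard.
# $1 = index name, $2 = shard number, $3 = node (whatever value you pass as "node")
reroute_payload() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}\n' \
    "$1" "$2" "$3"
}
```

Usage inside the loop, assuming the same host and node value as above: curl -XPOST 'localhost:9200/_cluster/reroute?dry_run' -d "$(reroute_payload "$idx" "$shard" Ub7nbTVCQuyOVQ3kO7ZEUQ)". Drop the ?dry_run once the output looks sane.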

 

Dropping replicas / replica requirements

# curl -XPUT localhost:9200/_cluster/settings -d '
{
  "transient":{
     "cluster.routing.allocation.disable_allocation":false
   }
}'
# curl -XPUT http://localhost:9200/_cluster/settings -d '
{
    "transient" : {
        "cluster.routing.allocation.enable": "all"
    }
}'
# curl -XPUT http://localhost:9200/_settings -d '{ "number_of_replicas": 0 }'

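The routing bit seems contradictory, and I can't really remember what I meant by it. Both calls appear to switch shard allocation on, one through the "disable" setting and one through the "enable" setting.

Setting the number of replicas to 0 made life better for me.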
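
To see whether any of this helped, counting the shards that are still unassigned is a quick check. A sketch, assuming the four-column _cat/shards output selected with h=index,shard,prirep,state; count_unassigned is a hypothetical helper name.

```shell
# Hypothetical helper: reads "index shard prirep state" lines on stdin
# and prints how many of them are in state UNASSIGNED.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}
```

Usage, assuming Elasticsearch listens on localhost:9200: curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state' | count_unassigned. Zero is the number you want.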