I run Elasticsearch, Kibana and Grafana on a single Intel NUC to process and store incoming log events and metrics.
The volume is really low, so this little box has no problem doing the work. But I found some problems when I rebooted the system.
Kibana was dead, shards were unassigned and lost.
After a long, long debugging session I finally understood that no actual *data* was gone; the cluster had lost some shard copies and had run out of places to copy things to.
Cut short: it put a gun to its head and kept pulling the trigger again and again, while displaying alerts about loud noises.
Elasticsearch needs special configuration to better handle reboots and restarts in a standalone configuration.
It's intended to run on many nodes, not one, and some assumptions can fail if there's only one node.
As far as I understand it: on a write, it stores that write and also tries to replicate it to other nodes. With only one node there is nowhere to put those copies, and when a shutdown is pending it tries yet again to move things elsewhere.
But there is no elsewhere.
Instead it needs to be told to not replicate data in the first place, and that's what the config here should do.
This is not well-tested. It is what seems to work for me :-)
The Elasticsearch config
I also hard-code my node name; automatic name assignment has no point in this scenario:
These settings tell ES that there's only one node and that it doesn't need to wait for others during startup:
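A minimal sketch of what such settings could look like, rendered by a small helper so the result can be pasted into `elasticsearch.yml`. It assumes a release that supports `discovery.type: single-node`; older releases used different discovery and gateway settings. The helper name `render_single_node_config` and the node name `nuc-es-1` are made up for illustration.

```python
def render_single_node_config(node_name: str) -> str:
    """Render elasticsearch.yml lines for a one-node setup (illustrative only)."""
    settings = {
        # Fixed node name instead of an auto-generated one.
        "node.name": node_name,
        # Assumption: the release knows this discovery type; the node then
        # forms a cluster by itself and does not wait for other nodes.
        "discovery.type": "single-node",
    }
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # "nuc-es-1" is a hypothetical node name.
    print(render_single_node_config("nuc-es-1"), end="")
```

Check the rendered lines against the documentation for your version before using them.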
These settings tell ES to store data in only one place, with no redundancy:
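As far as I know, newer releases no longer accept index-level settings such as the replica count in `elasticsearch.yml`. One option there is an index template. The sketch below builds the body for a legacy template (`PUT /_template/<name>`), assuming a release that uses the `index_patterns` field; the newer composable `_index_template` API nests the settings differently. The function name and the pattern are illustrative.

```python
import json


def no_replica_template(patterns: list) -> dict:
    """Build a legacy index template body that turns replicas off."""
    return {
        # Which new indices the template applies to.
        "index_patterns": patterns,
        "settings": {
            # Primary copy only; on one node there is nowhere to put a replica.
            "number_of_replicas": 0,
        },
    }


if __name__ == "__main__":
    # Assumed target: PUT /_template/no-replicas ("no-replicas" is a made-up name).
    print(json.dumps(no_replica_template(["*"]), indent=2))
```

A template only affects indices created after it is stored; existing indices keep their current replica count.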
Depending on your luck, this might also be helpful:
And to make things complete...
Band-aid if cluster state is yellow/red already
The steps I took when it was already stuck and broken and horrible
Guess what, I also had to delete the "broken" copies.
The first command I seem to have used for that was the dangerous and unholy reroute.
Here is a slightly edited version of what I found in my history.
This part is most of why I had hardcoded the node name.
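For orientation, a sketch of what a reroute request body looks like on releases that have the `allocate_stale_primary` and `allocate_empty_primary` commands (older releases used a differently named command). This is not the command from my history. The index and node names are hypothetical, and both commands require acknowledging possible data loss; `allocate_empty_primary` discards whatever the shard held.

```python
import json

# Reroute commands that force a primary shard onto a named node.
ALLOWED_COMMANDS = ("allocate_stale_primary", "allocate_empty_primary")


def reroute_body(command: str, index: str, shard: int, node: str) -> dict:
    """Build the JSON body for POST /_cluster/reroute (illustrative only)."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported reroute command: {command}")
    return {
        "commands": [
            {
                command: {
                    "index": index,
                    "shard": shard,
                    # The hard-coded node name is what goes here.
                    "node": node,
                    # Explicit acknowledgement demanded by both commands.
                    "accept_data_loss": True,
                }
            }
        ]
    }


if __name__ == "__main__":
    # "logs-example" and "nuc-es-1" are made-up names.
    print(json.dumps(reroute_body("allocate_stale_primary", "logs-example", 0, "nuc-es-1")))
```

The node name in the body has to match the running node exactly, which is where a fixed `node.name` pays off.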
Dropping replicas / replica requirements
The routing bit seems contradictory. I can't really remember what it meant.
Setting the number of replicas to 0 made life better for me.
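A minimal sketch of dropping replicas on all existing indices through the index settings API. It only builds the request and does not send it. The address `http://localhost:9200` is an assumption (the default HTTP port on the same box), and `drop_replicas_request` is an illustrative name.

```python
import json
import urllib.request


def drop_replicas_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a request that sets replicas to 0 on all indices."""
    body = json.dumps({"index": {"number_of_replicas": 0}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_all/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = drop_replicas_request()
    print(request.get_method(), request.full_url)
    # To actually apply it: urllib.request.urlopen(request)
```

With no replicas requested, unassigned replica shards no longer count against cluster health, so a yellow state caused only by them should clear.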