Adventures in stunnel and Elasticsearch

While deploying a three-node, HIPAA-compliant Elasticsearch/Kibana cluster, we ran into a strange issue: the nodes would disconnect and reconnect roughly every 12 hours. This post covers the TCP analysis of the connectivity issue, a glimpse into Elasticsearch node-to-node TCP connectivity and TCP keepalive settings, the discovery of the root cause, and the resolution.

Background

We deployed a “secure three-node cluster”, one that used TLS everywhere: for inbound Elasticsearch client connections, between nodes, and in front of Kibana.