Always set limits on containers running in your cluster

The story of a Prometheus eating up 20GB of RAM

2017-11-30 UPDATE

A few days have passed and I have not had any more problems with the Kubernetes cluster becoming unresponsive. This article therefore concludes a few weeks of investigating why the machine would freeze entirely. Set limits on your pods so that they can't kill your node.

The History

The WeaveWorks Cloud DaemonSet deploys Prometheus to the cluster by default. Prometheus scrapes metrics from your cluster and stores them as time-series data (this might not be an entirely accurate description of what Prometheus does, but it's good enough for what just happened).

Prometheus (version 1.7.1, as used by my WeaveWorks setup) was apparently very eager to scrape data and push it to the WeaveWorks servers, but never actually got rid of it. Over a lifespan of 3 days on a single-node cluster, this added up to more than 20GB of data.

I have since upgraded to Prometheus 1.8.1; we'll see if that helps. I have also put a strict 2GB memory limit on Prometheus so that it can't kill my cluster ever again.
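For illustration, here is a minimal sketch of what such a limit looks like as a patch body for a workload's pod template. It is not my exact configuration: the container name `prometheus`, the `1Gi` request and the choice of `2Gi` (rather than `2G`) are all assumptions made for the example.

```python
import json


def memory_limit_patch(container_name, limit, request=None):
    """Build a patch body that sets a memory limit on one named container.

    The structure mirrors a workload's pod template:
    spec.template.spec.containers[].resources.limits.memory
    """
    resources = {"limits": {"memory": limit}}
    if request is not None:
        resources["requests"] = {"memory": request}
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container_name, "resources": resources}
                    ]
                }
            }
        }
    }


# Hypothetical values: container name, limit and request are assumptions.
patch = memory_limit_patch("prometheus", "2Gi", request="1Gi")
print(json.dumps(patch, indent=2))
```

The printed JSON could be handed to `kubectl patch` with `-p`, or the same `resources` section can simply be written into the manifest by hand. Once the limit is in place, a container that grows past it gets OOM-killed and restarted instead of starving the whole node.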

Lesson to learn

Always, ALWAYS set container limits. Kubernetes was not smart enough to detect that my cluster was running out of memory (24GB total, 400MB free) and let Prometheus grow into a 20GB monster.
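One way to make "always" stick is to check manifests before they reach the cluster. The following is a rough sketch of such a check, assuming the pod spec has already been loaded into a plain Python dict; the function name and the example spec are hypothetical.

```python
def containers_without_memory_limit(pod_spec):
    """Return the names of containers in a pod spec dict that set no memory limit."""
    missing = []
    for container in pod_spec.get("containers", []):
        # "resources" and "limits" may be absent or explicitly null.
        limits = (container.get("resources") or {}).get("limits") or {}
        if "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical pod spec: one container with a limit, one without.
example_spec = {
    "containers": [
        {"name": "prometheus", "resources": {"limits": {"memory": "2Gi"}}},
        {"name": "sidecar"},
    ]
}
print(containers_without_memory_limit(example_spec))  # → ['sidecar']
```

Anything this returns is a container that can grow until the node itself runs out of memory.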
