Monitoring is the bedrock of site reliability, and yet it is the bane of most sysadmins' lives. Why? Monitoring sucks when the cost of maintaining it scales proportionally with the size of the system being monitored. Recently, tools such as Riemann and Prometheus have emerged that address this problem by letting monitoring configuration scale sublinearly with the size of the system.
In this talk, Jamie will cover the theory of alert design and timeseries-based alerting methods, and complement it with practical Prometheus examples that you can deploy in your environment today to reduce alert spam and help operators maintain a healthy level of production hygiene.
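The idea of configuration scaling sublinearly can be illustrated with a small sketch. It assumes a hypothetical `Sample` record holding per-instance request and error counts over one evaluation window, and mimics the shape of a Prometheus expression such as `sum by (job) (rate(errors_total[5m])) / sum by (job) (rate(requests_total[5m])) > 0.05`. The metric names, the `job` label grouping and the 5% threshold are illustrative assumptions, not taken from the talk. Because a single rule aggregates over every instance of a job, adding instances adds data rather than configuration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-instance counts over one evaluation window."""
    labels: dict
    requests: float
    errors: float


def error_ratio_by(samples, label):
    """Sum requests and errors per value of `label`, like PromQL's `sum by (label)`."""
    totals = {}
    for s in samples:
        key = s.labels[label]
        req, err = totals.get(key, (0.0, 0.0))
        totals[key] = (req + s.requests, err + s.errors)
    # Guard against division by zero for groups that served no traffic.
    return {k: (err / req if req else 0.0) for k, (req, err) in totals.items()}


def firing_alerts(samples, threshold=0.05):
    """One fleet-wide rule: alert on any job whose aggregate error ratio exceeds the threshold."""
    ratios = error_ratio_by(samples, "job")
    return sorted(job for job, ratio in ratios.items() if ratio > threshold)


if __name__ == "__main__":
    fleet = [
        Sample({"job": "kv", "instance": "a"}, requests=100, errors=20),
        Sample({"job": "kv", "instance": "b"}, requests=100, errors=0),
        Sample({"job": "web", "instance": "c"}, requests=200, errors=2),
    ]
    print(firing_alerts(fleet))
```

Here instance `a` alone would trip a per-host threshold, but the rule asks whether the `kv` service as a whole is failing its users (10% errors overall), which is the symptom an operator actually needs to be paged for.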
Jamie is a Site Reliability Engineer at Google in Sydney, leading a team that runs one of Google's oldest planet-scale, eventually consistent, replicated key-value stores. His interest in monitoring predates his joining Google many years ago, and he wants to share everything he has learned about making monitoring systems useful for people and businesses.
Geelong is Victoria's second-largest city. Located on Corio Bay, it is a short drive from the popular beachfront communities of the Bellarine Peninsula and is the gateway to the famous Great Ocean Road.
linux.conf.au is widely regarded by delegates as one of the best community-run Linux conferences worldwide, and is the largest Linux and Open Source Software conference in the Asia-Pacific.