Siemens Healthineers engaged Philipp and Vlad because of growing challenges with their platform. As the platform gained more users, its availability requirements increased and the company wanted to reduce downtime. However, their existing operations capabilities could not deliver the required uptime and availability, and recovering from failures took too long. To address these challenges, Siemens Healthineers decided to adopt SRE to improve their operations and increase reliability. SRE adoption was added to the company's list of major initiatives, and Vlad and Philipp worked across the organization to secure buy-in and support for the change.
In this episode, we talk about SRE (Site Reliability Engineering) with Vlad Ukis, who has written a book about his experience on Siemens Healthineers' digital health platform. His book, Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations, walks you through the steps from developing a cloud-based service to operating that service 24/7 using SRE, a method published and promoted by Google.
We've also covered SRE in another episode with János Csorvási and Jeff Campbell, which complements this interview with Vlad Ukis well.