We are looking for an SRE experienced in distributed systems, Kubernetes and microservices to join our Applications team. The team builds tooling that enriches the core Hazelcast Platform, making it easier to use and scale and extending its functionality so that our solutions meet the most demanding customer needs.
Day to day, you’ll be leveraging your solid engineering fundamentals with a focus on performance, consistency, resilience and scale, bringing your passion for solving difficult problems to help realize the product vision.
Your role as an SRE is crucial in ensuring that Hazelcast Platform meets business objectives, is robust and scalable, and can be relied upon by customers for mission-critical implementations.
WHAT YOU’LL DO
Keep Hazelcast's cloud-based production systems running smoothly 24/7/365.
- Design and Development:
- Design, develop, and maintain our cloud infrastructure to support both our end-user Management Center and our microservice-based platform.
- Implement new solutions using AWS and Terraform, improving scalability, throughput, and reliability.
- Support and manage our Keycloak identity provider (IdP), ensuring it provides appropriate security while meeting the needs of the development team.
- Implement security measures to protect data integrity and confidentiality, including encryption, access control, and compliance with relevant regulations.
- Work with our operations team to maintain our SOC 2 and ISO 27001 compliance and keep our environment secure.
- Monitor the system for performance issues, errors, and potential failures, and implement maintenance procedures such as backups, data recovery, and disaster recovery plans.
- Troubleshoot issues related to data storage, including performance bottlenecks, data corruption, or compatibility issues with other software components.
- Collaborate with cross-functional teams, including software developers, architects, and product managers, to ensure the effective integration and operation of the components within the overall software infrastructure.
- Document design decisions, implementation details, and operational procedures to facilitate collaboration among team members and ensure the maintainability of the system.
- Stay up to date with the latest developments in storage technologies, the Java programming language, and software engineering best practices, and apply this knowledge to improve existing storage systems and develop new solutions.
- Take part in our on-call rotation, responding to availability incidents and working with support and engineering on customer incidents.
WHAT YOU HAVE
Experience with distributed systems, Kubernetes and microservices
- Infrastructure as Code (Terraform)
- Modern DevOps stack (K8s, Prometheus, Grafana, OpenTelemetry, Argo CD, Helm)
- Experience with at least one programming language, preferably Golang