onsite

Cloud Reliability Engineer - Infios

Software Engineer

Responsible for operating, maintaining, and optimizing cloud infrastructure across AWS, Azure, and GCP, with a focus on Kubernetes cluster management, ensuring high availability, scalability, and performance of supply chain software solutions.

About the role

If you are looking for a meaningful career where people work with passion, rethink the status quo, and always strive for the best solution, you have come to the right place. We develop future technologies to relentlessly improve supply chains.

We are a leader in supply chain software solutions, helping organizations streamline operations, reduce costs, and improve efficiency.

Key Responsibilities

▶ Cloud Infrastructure Operations

o Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.

o Manage and optimize Kubernetes clusters: deployment, scaling, patching, and upgrades.

o Ensure system availability, scalability, and performance through proactive monitoring and optimization.

o Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.

▶ Automation & Continuous Improvement

o Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).

o Build and maintain automated pipelines for deployments, configuration, and remediation.

o Develop self-healing mechanisms to automatically detect and resolve common service issues.

o Participate in continuous improvement initiatives around reliability, performance, and efficiency.

▶ Reliability Engineering

o Implement SRE principles: define and track SLIs, SLOs, and error budgets.

o Perform incident analysis and postmortems to identify root causes and prevent recurrence.

o Design proactive monitoring, alerting, and observability dashboards (Dynatrace, Datadog).

o Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.

▶ CI/CD and Release Operations

o Manage and optimize CI/CD pipelines to ensure reliable and consistent delivery.

o Support deployment strategies (blue/green, canary, rolling) to reduce downtime risk.

o Collaborate with Product and DevOps teams on release readiness and rollback automation.

▶ Incident Response & Troubleshooting

o Monitor, troubleshoot, and resolve infrastructure and application issues.

o Respond to production incidents and ensure rapid mitigation and resolution.

o Troubleshoot complex cloud, container, and networking issues across distributed systems.

o Drive a culture of proactive monitoring, data-driven analysis, and preventive action.

Required Qualifications

▶ Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

▶ 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.

▶ Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).

▶ Strong knowledge of Kubernetes deployment, management, and troubleshooting.

▶ Solid understanding of observability and monitoring (e.g., Dyn

Skills