About HighRadius
HighRadius is a Fintech enterprise Software-as-a-Service (SaaS) company that leverages Artificial Intelligence (AI) and Machine Learning (ML) to help companies automate their Order-to-Cash, Treasury, and Record-to-Report processes.
About the Role
As a Staff Engineer - Infrastructure at HighRadius, you will design, implement, and maintain our critical infrastructure across multiple cloud providers, and you will help ensure the scalability, reliability, and security of our SaaS platform. Your expertise will guide the team in adopting best practices, solving complex technical challenges, and continuously improving our infrastructure.
Responsibilities
- Design, implement, and manage highly scalable, fault-tolerant, and secure infrastructure on AWS, Azure, and GCP.
- Lead the adoption and management of container orchestration platforms such as Kubernetes and related tooling (e.g., Helm).
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform, Ansible, and CloudFormation.
- Ensure the highest levels of system availability, performance, and security through proactive monitoring, alerting, and incident response.
- Implement and manage robust CI/CD pipelines for infrastructure deployments.
- Collaborate with development teams to optimize application performance and troubleshoot infrastructure-related issues.
- Mentor junior engineers, share knowledge, and foster a culture of technical excellence.
- Stay abreast of industry trends and emerging technologies to continuously improve our infrastructure.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in Infrastructure Engineering, DevOps, or Site Reliability Engineering (SRE) roles.
- Deep expertise in at least one major cloud provider (AWS, Azure, or GCP) and significant experience with the others.
- Proven experience with Kubernetes in a production environment.
- Strong proficiency in IaC tools like Terraform or Ansible.
- Solid understanding of Linux operating systems and networking concepts (DNS, load balancing, firewalls).
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Splunk.
- Proficiency in scripting languages like Python, Go, or Bash.
- Experience with Docker and containerization best practices.
- Excellent problem-solving, communication, and leadership skills.
- Experience with production support and on-call rotations.
- Understanding of disaster recovery and high-availability strategies.