Staff Database Reliability Engineer PostgreSQL Cloud- Remote

About the role

Role Summary

We are seeking an experienced SRE/DBRE to ensure reliability, performance, scalability, and operational excellence of our multi-cloud DBaaS platform across:

Microsoft Azure

Amazon Web Services

Google Cloud Platform

This role combines deep database expertise with SRE principles to build highly available, automated, and resilient database platforms. The DBRE Lead will drive operational standards, automation frameworks, and reliability engineering practices across distributed cloud environments.

🔹 What We’re Looking For

6-8 years in DBA / Platform Engineering

5+ years of relevant experience in PostgreSQL

Must be willing to work in a 24x7 setup

Work from home / remote work

Strong multi-cloud experience (Azure / AWS / GCP – at least two)

Deep HA/DR & performance tuning expertise

Automation-first mindset (Terraform, scripting, CI/CD)

Experience in SaaS/DBaaS environments preferred

For a Site Reliability Engineer (SRE) in a DBaaS (Database-as-a-Service) support role, the following mandatory skills are typically required:

1. Database Administration (DBA) Skills

Primary Database: PostgreSQL

Secondary Databases: MySQL, SQL Server

Database Backup & Recovery: Tools and strategies for database backups and disaster recovery.

Performance Tuning: Query optimization, indexing strategies, and database performance troubleshooting.

Database Security: User management, roles, access control, and auditing.

2. Cloud Infrastructure Knowledge (DBaaS)

Cloud Platforms: AWS (RDS, Aurora), Azure (Cosmos DB, SQL Database), GCP (Cloud SQL, Firestore).

Infrastructure as Code (IaC): Terraform, CloudFormation, Kubernetes.

Kubernetes & Containers: Running databases in containers (for example, on Kubernetes).

Observability Tools: ELK stack (Elasticsearch, Logstash, Kibana).

Database Migration: Migrating databases across different platforms or cloud environments.

Database Scaling: Vertical and horizontal scaling techniques in cloud environments.

3. SRE Principles (Site Reliability Engineering)

Incident Management: Handling database outages, incident response, and on-call rotations.

Monitoring and Alerting: Tools like Prometheus, Grafana, Datadog, CloudWatch.

Service Level Objectives (SLOs) / Service Level Agreements (SLAs): Ensuring uptime and performance targets.

Disaster Recovery Planning: Ensuring high availability (HA) and disaster recovery (DR) solutions.

4. Scripting and Automation

Scripting Languages: Python, Shell scripting, Bash, PowerShell.

Automation Tools: Ansible, Puppet, Chef.

Infrastructure Automation: Automating database deployment, patchi