

Infrastructure Operations Manager
Location
Remote
Level
Senior
Department
Operations
Type
Full-Time
Salary
Job Description
Posted on:
February 24, 2022
We are looking for someone to ensure the reliability of Illuvium's systems and services. You will use your technical nous and hands-on skills, together with your ability to build and manage a small but high-performing team, to ensure Illuvium delivers reliable, responsive and scalable systems and services.
Responsibilities
- Take overall responsibility for the operational aspects of our systems and services, specifically the maintenance of a reliable, scalable infrastructure that meets required SLAs and metrics such as RPO and RTO
- Provide the required governance and oversight for regular operations as well as scheduled activities such as cloud-related patching and vulnerability remediation
- Manage major incident and escalation processes through to resolution
- Work with solution architects, DevOps engineers and backend developers to ensure operational features, like standardised logging, monitoring and alerting, are baked into all designs
- Write code to build and configure operational aspects of our systems (IaC)
- Build and manage a small team to provide round-the-clock operational support for our systems
- Liaise with service providers and relevant vendors as and when required
- Anticipate long-term issues and problems, but also build for the present
- Work independently but also engage with the team
Job Requirements
- 5+ years' experience operating large-scale, highly available production environments, ideally on AWS
- Experience managing a team providing infrastructure monitoring and support services in the cloud, including service integration, SLA management and service desk operations
- Hands-on experience writing code in a DevOps context (IaC)
- AWS experience, preferably with services such as Lambda, DynamoDB, AWS Shield, CloudWatch, RDS, EC2 and ECS
- General infrastructure experience including security and *nix admin skills
- Familiarity with automation and IaC tools (Terraform, CloudFormation, Packer, Gradle, Jenkins, etc.)
- Understanding of one or more scripting languages such as Bash, Python, Ruby, JS, etc.
- A strong desire to learn new technologies and keep up to date with a fast-moving technology landscape