Senior Site Reliability Engineer
AccelByte
POSITION SUMMARY:
AccelByte is building a 24x7 operations team for AAA multiplayer video games. For this position, we need a driven Site Reliability Engineer who can actively participate in day-to-day combat by maintaining high reliability of our services and driving prioritization of fixes for what may be broken today. The engineer should also be able to envision, design, and implement processes and technologies that improve our ability to identify, isolate, correlate, and mitigate service-impacting problems in the system. The Site Reliability Engineer must know enough coding to automate routine tasks in gathering, correlating, organizing, and presenting service metrics, in addition to performing detailed, in-depth root cause analysis.
ESSENTIAL FUNCTIONS/RESPONSIBILITIES:
The Senior Site Reliability Engineer (SRE) is accountable for the following functions and responsibilities:
- Design, implement, and maintain infrastructure for applications
- Architect, implement and maintain a highly scalable deployment framework that improves our products' stability, reliability, and availability.
- Build and run service deployment using K8s and other CNCF projects
- Provide a secure, highly scalable, and cost-effective cloud platform
- Build effective systems to monitor the health of our systems and applications and to handle outages
- Solve problems occurring in all our environments and create solutions to prevent them from happening again
- Produce automation and innovative tools to assist the product development teams and to deliver operational excellence
- Create and maintain infrastructure-related documentation and SRE runbooks
- Collaborate with other stakeholders to provide cost-effective, operationally excellent, and performance-efficient infrastructure solutions that improve our products
- Identify technology and process gaps and opportunities for improvement
- Liaise, communicate, and work directly with our clients
- Perform any other design-related duties as required
- Envision, design, and implement AIOps solutions to enhance operational efficiency and predictive maintenance.
QUALIFICATIONS/EXPERIENCE REQUIRED
- 5+ years of Cloud Engineering or DevOps experience with AWS and 2+ years with Kubernetes; AWS certification preferred
- Degree in Computer Science or equivalent experience
- Deep knowledge of cloud service providers and best practices around implementation and configuration, preferably managing AWS and Kubernetes
- Familiarity with infrastructure management and operations lifecycle concepts and ecosystem, and a deep understanding of IaC and GitOps
- Proven track record of working with infrastructure as code (Terraform is a must), configuration management, and package managers (e.g., Helm)
- Experience in delivering products against a plan in a fast-paced, multi-disciplined, and often ambiguous environment
- Experience working independently to design, plan, and execute technical projects
- Demonstrated deep knowledge of technical program management and engineering best practices
- Innovative thinking balanced with a strong focus on customers, quality, and cost efficiency
- Comfort and experience with cross-organizational communication; excellent written and verbal communication skills
- Working experience with some of the following technologies and tools: Docker, Kubernetes, git, Redis, MongoDB, PostgreSQL, ElasticSearch, GitLab CI, Nexus, SonarQube, Terraform, Helm, Prometheus, ELK/EFK, Grafana, CloudWatch
- Solid grasp of security best practices
- Strong proficiency in Go, including the ability to conduct high-quality code reviews. Experience with Python and Bash is also required.
- Keen problem-solving skills with the ability to work under pressure (during a production event)
- Flexibility in working with people in different time zones
- Experience with AIOps and building/harnessing AI tools to automate and optimize operational tasks.
QUALIFICATIONS/EXPERIENCE PREFERRED
- Previous experience working in the game industry
- Working experience with one or more of the following: Emissary, Linkerd, Istio, Nomad, Kafka, Flux, ArgoCD, GitOps, DevSecOps
- Familiarity with web service patterns/architectures, e.g., REST and SOAP
- Experience working with auto-scaling workloads both in containers and VMs
- Experience with other cloud technologies and infrastructure: GCP, Azure
- Experience with Confluence, Jira, and BitBucket
- Experience with IT standards, methodologies, cryptographic key management regulations, and audits would be an asset