SRE Architect with skills in SRE Engineering and SRE Architecture, for any Infogain base location (Noida, Gurugram, Bangalore, Mumbai, Pune)
ROLES & RESPONSIBILITIES

Core Skills

12 to 14 years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a senior or architect-level role.

  1. Strong expertise in system architecture, distributed systems, cloud computing (e.g., AWS, Azure, GCP), containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible).

  2. Proficiency in one or more programming/scripting languages (e.g., Python, Groovy, Shell, PowerShell, or similar).

  3. Strong background in DevOps practices and cloud technologies, with a focus on ensuring the scalability, reliability, and security of cloud infrastructure.

  4. Experience with monitoring and observability tools (e.g., Dynatrace, Prometheus, Grafana, ELK stack, Datadog).

  5. Experience integrating SRE practices with backend technologies such as databases and messaging systems. Strong understanding of software engineering principles and practices.

  6. Deep understanding of incident management, root cause analysis, and post-incident review processes.

  7. Involvement in setting strategic direction for SRE practices, leading technical initiatives, and promoting a culture of excellence in site reliability engineering.

  8. Excellent problem-solving and communication skills, and the ability to work collaboratively in a fast-paced, dynamic environment.

  9. Proven ability to lead technical projects, influence cross-functional teams, and drive change.

  10. Excellent verbal and written communication skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.

  11. Certifications in relevant technologies, such as Cloud Certified DevOps Architect or Cloud Operations Support Architect.

 

Key Responsibilities:

  • Architecting Systems: Design and architect highly available, scalable, and resilient systems to meet the demands of our growing user base and evolving business needs.

  • Reliability Engineering: Develop and implement strategies to improve system reliability, including incident management, monitoring, and automated remediation.

  • Performance Optimization: Identify and address performance bottlenecks, optimize system performance, and ensure efficient resource utilization.

  • Collaboration: Partner with development teams, product managers, and other stakeholders to integrate SRE practices into the development lifecycle and ensure alignment with business objectives.

  • Automation: Drive automation initiatives to reduce manual intervention, increase efficiency, and improve system reliability.

  • Incident Management: Lead post-incident reviews and root cause analysis, and develop strategies for preventing future incidents.

  • Best Practices: Establish and enforce best practices for system design, monitoring, and incident management.

  • Mentorship: Provide guidance and mentorship to junior SREs and engineering teams on SRE principles and practices.

Qualifications:

  • Experience: 12 to 14 years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a senior or architect-level role.

  • Programming: Proficiency in one or more programming languages (e.g., Python, Go, Java, or similar).

  • Monitoring Tools: Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog).

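  • Incident Response: Deep understanding of incident management, root cause analysis, and post-incident review processes.

  • Leadership: Proven ability to lead technical projects, influence cross-functional teams, and drive change.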

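  • Communication: Excellent verbal and written communication skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.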

Preferred Qualifications:

  • Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Google Professional Cloud Architect) are a plus.

  • Experience: Previous experience in high-growth or high-availability environments.

EXPERIENCE
  • 12-14 Years
SKILLS
  • Primary Skill: SRE Engineering
  • Sub Skill(s): SRE Engineering
  • Additional Skill(s): SRE Architecture, SRE Engineering
ABOUT THE COMPANY

Infogain is a human-centered digital platform and software engineering company based out of Silicon Valley. We engineer business outcomes for Fortune 500 companies and digital natives in the technology, healthcare, insurance, travel, telecom, and retail & CPG industries using technologies such as cloud, microservices, automation, IoT, and artificial intelligence. We accelerate experience-led transformation in the delivery of digital platforms. Infogain is also a Microsoft (NASDAQ: MSFT) Gold Partner and Azure Expert Managed Services Provider (MSP).

Infogain, an Apax Funds portfolio company, has offices in California, Washington, Texas, the UK, the UAE, and Singapore, with delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.
