• Site Reliability Engineer - SSE

    Job Locations IN-Pune
  • Overview

    We at Continuum are looking for an experienced Site Reliability Engineer to join our CloudOps Operations team.

    You will be a member of the Operations team, responsible for the availability, performance, troubleshooting and security of our entire infrastructure, which consists of Microsoft services, databases and web applications.


    • Use open source tools to build scalable logging, monitoring and troubleshooting capabilities for our private and public cloud platforms
    • Continually maintain and improve software build methodology, procedures, and environment
    • Manage and maintain configuration management infrastructure, source code and Docker image repositories
    • Deploy, manage and upgrade systems, services and containers using automated configuration management and service orchestration tools
    • Monitor and alert based on system metrics, analysis of log files and custom alert rules
    • Ensure uptime SLA for the SaaS infrastructure, services and applications as part of the CloudOps team.
    • Produce weekly, monthly and quarterly uptime and status reports for production and critical internal infrastructure


    Mandatory Skills:

    • Experience in the field of data centre infrastructure management (Linux/Docker/Kubernetes/microservices)
    • Experience with Amazon Web Services (AWS).
    • Responsible for overall Docker & Kubernetes setup, configuration and maintenance.
    • Configure, maintain and support a large-scale Docker-based PaaS environment using Kubernetes. Must have proven prior experience managing a large Kubernetes cluster and scaling it significantly.
    • Troubleshoot and resolve issues within the Docker and Kubernetes environment.
    • Ensure sound operational practices, including monitoring, reporting and backup, for both Docker hosts and their associated images and containers
    • Help develop and maintain automated processes, tools, and documentation in support of Docker
    • Work with development teams to understand capacity and performance requirements, and define a scalable, highly available deployment solution using container orchestration tools.
    • Assist development teams in migrating applications to the Docker-based PaaS platform using Kubernetes.
    • Experience defining application deployment solutions on a Docker-based PaaS environment and migrating applications to Kubernetes and Docker platforms.
    • Should be able to design and implement the required failover mechanisms in a Docker/Kubernetes ecosystem.
    • Should be well versed in general administration tasks such as creating Docker images, managing versions, container networking and standard infrastructure maintenance on Docker and Kubernetes platforms.
    • Should have knowledge of Linux kernel features such as cgroups (control groups) and of defining application groups to restrict resources


    Other Relevant Skills:

    • Development, operations and/or DevOps experience deploying and maintaining global multi-tiered infrastructure and web applications
    • BS or MS in Computer Science, Engineering, or a related technical discipline or equivalent experience
    • Hands-on scripting and coding with Python, Shell, Perl, Ruby, PHP
    • Good Linux system administration skills and TCP/IP network fundamentals
    • Strong analytical and problem-solving skills along with good communication and documentation skills
    • Strong command of configuration management and source control tools like Ansible, GitHub, etc. in a large-scale environment.
    • Good knowledge of Kafka
    • Good knowledge of Cassandra
    • Experience with metrics collection and charting using tools like CollectD, StatsD, Graphite, Dynatrace
    • Experience with Docker, microservices and container-based deployment and service orchestration using Docker Swarm, Kubernetes.

