Job Title : Site Reliability Engineer
Job Location : Thane, India
Job Description
Role Purpose
Required Skills:
· 5+ years of experience in system administration, application development, infrastructure development, or related areas
· 5+ years of experience programming in languages like JavaScript, Python, PHP, Go, Java, or Ruby
· 3+ years of experience reading, understanding, and writing code in the same languages
· 3+ years of mastery of infrastructure automation technologies (like Terraform, CodeDeploy, Puppet, Ansible, Chef)
· 3+ years of expertise in container and container-fleet orchestration technologies (like Kubernetes, OpenShift, AKS, EKS, Docker, Vagrant, etcd, ZooKeeper)
· 5+ years of cloud- and container-native Linux administration, build, and management skills
Key Responsibilities:
· Hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems and event-driven, cloud-based services
· Primarily Linux administration, managing a fleet of Linux and Windows VMs as part of the application solutions
· Involved in pull requests for site reliability goals
· Advocate IaC (Infrastructure as Code) and CaC (Configuration as Code) practices within Honeywell HCE
· Ownership of reliability, uptime, system security, cost, operations, capacity, and performance analysis
· Monitor and report on service level objectives for a given application's services. Work with the business, technology teams, and product owners to establish key service level indicators.
· Ensuring the repeatability, traceability, and transparency of our infrastructure automation
· Support on-call rotations for operational duties that have not been addressed with automation
· Support healthy software development practices, including complying with the chosen software development methodology (Agile or alternatives), building standards for code reviews, work packaging, etc.
· Create and maintain monitoring technologies and processes that improve visibility into our applications’ performance and business metrics and keep operational workload in check.
· Partnering with security engineers and developing plans and automation to aggressively and safely respond to new risks and vulnerabilities.
· Develop, communicate, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks.
Windows Server Admin