Jun 11, 2019

Site Reliability Engineer, Platforms, PCF

  • Kessel Run
  • Boston, MA, USA
Full time | Computer Science | Data Science | Information Technology

Job Description

At Kessel Run, we are looking to strengthen and grow our Application Operations team by bringing on talented site reliability engineers to join the platform team that manages the large-scale physical and virtual server environment underpinning our global platform. In this role, you will work with a wide variety of systems and technologies, with a particular focus on Pivotal Cloud Foundry. The Kessel Run team is moving aggressively toward an "Infrastructure as Code" model, and we are looking for someone with an automation mindset.

What Members of This Team Will Do:

Own SLA for Production Systems
Drive faster MTTD/MTTR for critical systems
Independently troubleshoot Sev1/Sev2 incidents
Own, manage, and support baseline operating system images and templates
Support CI/CD
Maintain and review DSC/Puppet modules and support the provisioning infrastructure
Own and operate all package repos (Python/Java/RPM/etc.)
Drive efficiencies across the Kessel Run hybrid cloud
Invent innovative ways to drive production operational efficiency
Drive scalability and operability of supported systems/infrastructure
Own production systems scaling & throughput
Consult with other teams on systems architecture for new and existing production systems
Participate in on-call rotation
Create and maintain detailed documentation

Some of our larger initiatives include:

End-to-end automation of system builds
Ongoing scaling of our platform to support forecasted traffic
Standardization and improved automation of systems, processes, and services
Data center expansion and moving to the cloud


BA or BS degree from a 4-year college or university desired
Minimum three years of systems administration/Site Reliability/Platform/DevOps background
Experience with infrastructure including but not limited to data center operations, server hardware, web servers (IIS, JBoss, etc.), databases (MS SQL, MySQL, MongoDB), virtualization (VMware), networking, storage, monitoring, etc.
Experience with structured programming languages (PowerShell, Python, etc.)
Experience with .NET and REST APIs is a plus
Experience with continuous integration platforms such as Jenkins, Bamboo, GitLab CI, etc.
Understanding of Agile, ITIL, and DevOps practices such as CI/CD, automated testing, etc.
Experience with JVM tuning and optimal configuration