# System Engineer SRE - TSG Group

Hays Poland

Wrocław, dolnośląskie

Ref. no.: 1103608

For our client, one of the biggest financial services companies, we are looking for Site Reliability Engineers.


We are searching for outstanding candidates who are passionate about building and running some of the largest and most complex software artifacts on the planet and who can quickly understand how something works even if they have never seen it before. They strive to automate everything through code, and will help us make the journey to a true no-ops model, identifying and automating end-to-end operational processes to deliver fast, efficient, and consistent results.

We look for the ability to solve problems with software, whether that’s been acquired from a textbook or at the school of hard knocks. Troubleshooting skills and the ability to unpack a problem into smaller pieces, identify possible causes, and triage systematically are essential to this position. These skills could have been acquired through debugging code, operating a network, building hardware, or in other, entirely unrelated domains; the cognitive skills and approaches to problem-solving are subject-matter agnostic and critical to have, regardless of a candidate’s background.

An ideal Site Reliability Engineer will have a broad range of skills across a number of systems. They engineer services, and are adept at making changes to an environment safely. We also expect our SREs to be collaborative and inclusive to produce great results.


Our team is currently involved in the following:

  • Greenfield – designing and building a new platform with no technical debt
  • Multi-datacenter, multi-region application and infrastructure containerization and orchestration
  • OS and Image Build Automation
  • Hosting complex Apps in a containerized environment
  • Comprehensive postmortems
  • Scaling and building fault-tolerant systems
  • Secrets Delivery
  • Orchestrators
  • Chaos Testing
  • No-Ops
  • Registry
  • Automation, automation, automation!


Responsibilities:

  • Develop, engineer, and automate every operational process feasible to limit human interaction
  • Reduce human, manual toil through tooling and automation
  • Engineer products to be fault tolerant, resilient, and scalable
  • Be a part of a weekly, follow-the-sun, on-call rotation for their Site, providing 24x7 coverage
  • Respond quickly and resolve incidents in line with a Site’s SLOs and SLAs
  • Participate actively in Code Reviews
  • Write clear and concise documentation around procedures and processes
  • Actively participate in and deliver postmortems in line with our postmortem culture
  • Drive a blameless, inclusive, and collaborative environment across teams
  • Exhibit ownership of action items resulting from postmortems and follow them through to implementation and release
  • Investigate issues across multiple internal teams as well as external vendors
  • Uphold SDLC standards and release automation
  • Be deliberate and use data to make decisions, not intuition


Technical Qualifications:

  • Demonstrable hands-on experience with Linux, Docker, and SRE or DevOps is required.
  • Experience with automation and configuration management – Ansible, Puppet or Salt.
  • Experience with fault and performance monitoring using tools like Prometheus, Grafana, Moogsoft or InfluxDB.
  • Experience with the architecture and implementation of PaaS software such as Cloud Foundry, Mesos / Marathon, or Kubernetes.
  • Deep understanding of Linux and OS Tuning.
  • Experience with troubleshooting complex problems and finding root cause in Linux systems.
  • Experience with virtualization technologies – KVM, VMware, etc.
  • Deep understanding of how to build fault-tolerant and scalable systems.
  • Experience with Python (preferred) or any other programming language.
  • Good understanding of Software Development Life Cycle, continuous integration and deployment, code reviews, testing, pipelines, git, Jenkins, etc.
  • Good understanding of building, deploying, and maintaining critical applications in a cloud-based environment.
  • Grasp of software engineering skills in modular design, data structures, algorithms, and UNIX systems development.


What we can offer you:

  • Challenging, fun and supportive environment within the Site Reliability Engineering world, a discipline developed by Google, which has become the latest and greatest concept in management and automation of large disparate systems
  • Work with state-of-the-art enterprise and cutting-edge technologies like docker, mesos, marathon, nomad, packer, vault, ansible, salt, consul, terraform, nexus, artifactory, ci/cd, influxdb, grafana, prometheus, gitlab, jenkins, vmware, azure, rhel, illumio, tanium, veritas cluster, solaris, aix, cloudbolt, chaos monkey, openscap, and many more.
  • Highly competitive benefits package including pension and private medical cover