System Engineer - TSG Group

Wrocław

Ref. no.: 1097846

The company has branches in 36 countries and operates in more than 100 financial markets, providing asset management and custodian services to financial institutions, global corporations, banks and individuals.

We are searching for outstanding candidates who are passionate about building and running some of the largest and most complex software artifacts on the planet, and who can quickly understand how something works even if they have never seen it before. They strive to automate everything through code and will help us on the journey to a true no-ops model, identifying and automating end-to-end operational processes to deliver fast, efficient and consistent results.

We look for the ability to solve problems with software, whether that ability was acquired from a textbook or at the school of hard knocks. Troubleshooting skills are essential to this position: the ability to unpack a problem into smaller pieces, identify possible causes, and triage, all done systematically. These skills could have been acquired through debugging code, operating a network, building hardware, or in other, entirely unrelated domains. The cognitive skills and approaches to problem-solving are subject-matter agnostic and critical to have, regardless of a candidate’s background.

An ideal Systems Engineer will have a broad range of skills across a number of systems. They engineer services, and are adept at making changes to an environment safely.

Responsibilities:

• Develop, engineer, and automate every operational process feasible to limit human interaction
• Reduce human, manual toil through tooling and automation
• Engineer products to be fault tolerant, resilient, and scalable
• Be a part of a weekly, follow-the-sun, on-call rotation for their Site, providing 24x7 coverage
• Respond quickly and resolve incidents aligned with a Site’s SLOs and SLA
• Participate actively in Code Reviews
• Write clear and concise documentation around procedures and processes
• Actively participate in and deliver postmortems in line with our postmortem culture
• Drive a blameless, inclusive, and collaborative environment across teams
• Take ownership of action items resulting from postmortems and follow them through to implementation and release
• Investigate issues across multiple internal teams as well as external vendors
• Uphold SDLC standards and release automation
• Be deliberate and use data to make decisions, not intuition
Desired Skills:
• Languages: C, Python, Bash, Go (Rust a plus)
• Operating Systems: RHEL, Solaris, AIX
• Monitoring and Performance: Prometheus, Grafana, Moogsoft, InfluxDB
• Container Technologies: Docker, Nomad, Kubernetes, Mesos, Marathon, Chronos, Nelson
• Registry Technologies: Artifactory
• Configuration Management and Remote Execution: Ansible, Puppet, SaltStack
• Networking: Software-Defined Network technologies, switch technologies, firewalls, iptables, VLANs
• Storage: Pure Storage FlashArray and FlashBlade, NFS, IBM COS, EMC EC2, S3 Object Store, LVM, XFS, ext4, ZFS
• ELK (Elasticsearch, Logstash, Kibana; Filebeat a plus)
• VMware: vSphere, DRS, vCenter
• System Utilities: strace, tcpdump or wireshark, valgrind, sar, htop, iotop, sysctl, logrotate, systemd, journald, rsyslog, auditd, awk, sed, grep, cut, sort, uniq, find
• SDLC: Git, Jenkins (or Travis CI), JIRA
• Infrastructure Services: DNS, DHCP, PXE, TFTP, NTP, etc.
• Resiliency Testing: Chaos Monkey
Experience with and a deep understanding of:
• Data Structures and Algorithms, including Big-O notation
• Networking
  ◦ Transport and routing protocols
  ◦ Network OS stack internals, tuning, and implementation
• Large Complex Distributed Systems
  ◦ Sharding
  ◦ Paxos
  ◦ Chubby
  ◦ Message Queues
  ◦ Fault Tolerance
  ◦ Resiliency
  ◦ Scalability
  ◦ Caching
  ◦ Hinted Handoff
  ◦ Eventual Consistency vs. Strong Consistency
• Operating Systems
  ◦ Filesystems
  ◦ Memory Subsystem including swap and paging
  ◦ CPU Scheduling
  ◦ IO Scheduling
  ◦ NUMA
  ◦ Performance Tuning
  ◦ Kernel Compiling
  ◦ Patching
  ◦ GRUB
  ◦ Sockets
  ◦ Block Devices
• Virtualization
  ◦ Scheduling, co-stop, sizing
  ◦ High availability
  ◦ vCPU
  ◦ vNUMA
  ◦ VMFS
  ◦ Performance monitoring
  ◦ Hypervisors
• Storage
  ◦ Zoning and LUN masking
  ◦ SAN fabrics and protocols
  ◦ NAS and NAS-like storage
  ◦ Replication
• Monitoring
  ◦ Exporters
  ◦ Alerting and de-duplication
  ◦ Linear regression
  ◦ Telemetry
• SDLC
  ◦ Continuous integration
  ◦ Automated build pipeline
  ◦ Release management and versioning
  ◦ Source version control

What we can offer you:

• A challenging, fun and supportive environment within the Site Reliability Engineering world, a discipline developed by Google for the management and automation of large, disparate systems
• Highly competitive benefits package including pension and private medical cover

Please apply using the button on the right-hand side of this posting.

Hays Poland
