Nice to have:
Keep your assigned site or service up and running, or get it back up and running quickly when a failure occurs,
Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch,
Automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more,
Monitor and troubleshoot highly scalable, distributed server clusters that perform various functions, from web servers to machine learning processing,
Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents,
Participate in and help establish best practices in Site Reliability Engineering,
Manage code deployments, fixes, updates, and related processes,
Work with a close-knit team and brainstorm on the best ways to tackle complex problems in infrastructure, security and monitoring,
Provide technical guidance and educate team members and coworkers on monitoring and logging. (Have an interesting idea or solution? Present it!),
Automate any software maintenance processes that previously required a manual procedure.
3+ years of experience in software engineering, software development, or system operations in highly available, high-traffic environments,
Strong experience with Linux-based infrastructures, Linux/Unix administration, and Azure,
Experience with databases such as PostgreSQL,
Experience administering Linux servers as well as Docker-based infrastructure (like Kubernetes, AKS, etc.) in a highly available environment,
Experience with scripting languages such as Python and Bash,
Experience with message broker/queue technologies like RabbitMQ,
Experience with modern monitoring, logging and observability tools for complex distributed systems, such as Application Insights, Grafana, New Relic, Splunk, Elastic Stack, Datadog, Prometheus, etc.,
Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.),
Good understanding of cybersecurity fundamentals and best practices,
Experience with containerization and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.),
Stellar problem-solving and troubleshooting skills with the ability to spot issues before they become problems,
Fluency in English,
Excellent oral and written communication skills,
Process-oriented with great documentation skills,
Solid team player!
Founded in 2015 in Switzerland, Metaco is an enterprise technology company whose mission is to enable financial and non-financial institutions to securely build their digital asset operations. The company’s core product, Harmonize™, is a mission-critical orchestration platform for digital assets. From asset-agnostic custody and trading to tokenization, staking and smart contract management, the platform seamlessly connects institutions to the broad universe of decentralized finance and Web3 decentralized applications.
Metaco has established itself as the institutional standard for digital asset infrastructure, trusted by the world’s largest global custodians, banks, regulated exchanges, and corporates. Its software and technology solutions enable institutions to store, trade, issue and manage any type of digital asset – such as crypto and digital currencies, digital securities, non-fungible tokens (NFTs) – with the highest possible security and agility.
We are looking for a Site Reliability Engineer to join our Engineering team in Lausanne, Switzerland (or remotely). As a member of our team, you will contribute to the development of our platform, working closely with blockchain experts, software engineers and partner organizations to enable our clients to turn their projects into commercial success.
We are a dynamic and fast-growing company, working collectively to tackle the most challenging problems at the intersection of distributed ledgers, blockchain technology, applied cryptography, banking, capital markets and finance. We provide an entrepreneurial culture, where merit, contribution and teamwork are rewarded. Our team is important to us, and we work hard to support both the personal and professional development of team members. We understand that maintaining a good work-life balance is crucial to a healthy and happy workplace, which is why we provide flexible working policies that employees can fit to their individual needs. Join us to make your mark on the transformation of the financial services industry.