Site Reliability Engineer
Software touches every aspect of our lives: from how we bank, socialize, shop and learn, to how we entertain, obtain healthcare services, use transportation and much more. All of the world’s software needs to be developed, updated and maintained seamlessly. This is where JFrog comes in, with software development and DevOps tooling that powers the world’s digital transformation.
JFrog is on a mission to enable continuous software updates through Liquid Software, empowering developers to code high-quality applications that securely flow to end-users without interruption. JFrog is the creator of Artifactory, the heart of the JFrog Platform - a hybrid, universal, end-to-end DevOps solution available as open-source, self-hosted and as a SaaS subscription. More than 5,800 customers, including 75% of the Fortune 100, trust JFrog to manage their software binaries and accelerate their secure software delivery from code-to-production.
The Site Reliability Engineering team is the guardian of JFrog’s production systems. As an SRE at JFrog, you will ensure uptime and stability for our Cloud customers. You will be part of a global team working closely with geographically distributed Support, R&D and Sales teams. You will have the opportunity to work with cutting-edge technologies, including microservices running in containers on multiple cloud vendors. For this role, we are looking for someone who started their career as a System Administrator and moved on to DevOps, embracing the cloud, containers and coding.
If you love working with brilliant people, being part of an energetic team, changing the world of software and you’ve got the technical skills, you might be the perfect Frog to join our Swamp! Come and help us to continue to lead the rapidly evolving space of Continuous Integration and Delivery!
- Manage our cloud-native production systems (SaaS) in AWS, GCP and Azure.
- Troubleshoot Java applications, Linux, networking, logs and containers.
- Identify and automate manual processes and improve existing code base.
- Debug complex problems across our cloud production systems.
- Develop and maintain technical documentation, runbooks and procedures.
- Maintain the uptime of our SaaS infrastructure and produce SLA reports.
- Define, build and maintain monitoring solutions for applications.
- Collaborate in a “DevOps” environment where you will work closely with our Support, Solution Engineering, R&D and DevOps teams worldwide.
- Keep up to date with the latest trends in the SRE space and bring in best practices.
- Ability to join a 24x7 on-call roster
- Great team player with a “can-do” attitude
- Ability to work independently, learn quickly and be proactive
Desired Skills and Experience
- 3+ years of relevant SRE/DevOps work experience on production systems
- Virtualization and containers - Docker, Kubernetes
- Linux - CentOS, Ubuntu, other
- Networking knowledge - Firewalls, VPNs, proxies and load balancers
- Web/Application servers - Apache, Nginx, Tomcat, JVM environments
- Familiarity with monitoring and logging systems - tools such as Graphite, LogicMonitor, Sumo Logic, ELK stack, New Relic
- Automation - Scripting (Shell, Python, or similar)
- Experience with public clouds (AWS, GCP or Azure)
- Relevant hands-on work experience with Linux and Java applications
- Excellent troubleshooting and problem-solving skills with a desire to take on responsibility
- Excellent written and verbal communication skills, with the ability to communicate technical issues to both technical and nontechnical audiences
- Bachelor’s degree or higher in Computer Science or an Engineering major is preferred
- Experience debugging complex problems and operating large-scale production systems
Nice to have:
- Configuration management - Ansible, Terraform
- Experience using CI/CD tools like Jenkins, Shippable, TravisCI or similar
- Databases, SQL or NoSQL - MySQL, MSSQL, MongoDB, Postgres
- Storage, any of the following - NFS, SANs, RAID, LVM, EFS
- Any version control systems (SVN, Git etc.)
- Familiarity with Atlassian Suite (Confluence, JIRA, HipChat etc.)
- Knowledge of the following: Artifactory & Bintray, Build tools, CI servers, RabbitMQ