CyberArk, the global leader in privileged access management, helps organizations transform their business through improved security and reduced risk. As a trusted partner for thousands of companies around the world, CyberArk consistently sets the bar – driving innovation and helping our customers stay one step ahead of attackers.
CyberArk SREs are coders who enjoy a challenge and own the availability of CyberArk SaaS (from infrastructure to application). They measure failures and availability through SLIs and SLOs and take a proactive approach: prevention over mitigation, and mitigation over fixing. SREs collaborate with Dev and work with PM to continuously improve service availability and quality. They share ownership with the Dev team, creating a shared responsibility in which the SRE owns the availability of the service, the proactive prevention of issues, and deliberate, structured troubleshooting to mitigate issues.
CyberArk Cloud Engineering is looking for a Site Reliability Engineer with an "automation first" mindset who is passionate about performance, stability and security, to share ownership of CyberArk SaaS reliability. The Site Reliability Engineer will work closely with the Dev teams and the DevOps Engineers to ensure the security, performance, resiliency and scale of production services.
- Monitor and improve the availability and performance of production services
- Apply prevention steps in order to improve production services reliability
- Mitigate issues on production systems and build automated solutions to prevent them from recurring
- Enhance and feed the monitoring system to improve service reliability and to provide other teams at CyberArk with the dashboards to help deliver an excellent service to our customers
- Automate common, repeatable tasks using Ansible and scripting languages
- Triage and manage escalation of cases
- Perform deliberate and structured troubleshooting
- Share the on-call rotation and act as an escalation contact for incidents
- Lead blameless postmortems
- Influence design / architecture of services to proactively prevent system failures
- 3-5 years of experience focused on site reliability, DevOps Engineering, system administration or application development
- Strong hands-on experience in Linux/Unix and Windows OS
- Hands-on experience with the following scripting technologies: Python, Ruby, Bash, PowerShell
- Automation/configuration management using Ansible, Puppet, Chef or an equivalent
- Ability to keep track of numerous detail-intensive, interdependent tasks and ensure their accurate completion
- Understanding of network architecture and security configurations
- Strong computer literacy and/or the comfort, ability and desire to advance technically
- Bachelor’s Degree in Computer Science or related field
- Excellent communication skills and attention to detail
- Strong hands-on technical abilities
- Competitive compensation depending on experience and skills;
- Flexible work schedule;
- Social package;
- Sick leave and regular vacation;
- Partial coverage of costs for certification and IT conferences;
- English classes with certified English teachers.