System Operations Engineer
Based in Mountain View with offices around the world, we are a privately held company building our next-generation platform, which enables ubiquitous access to the most advanced remoting technology in the world.
We are rapidly expanding our team and are seeking a hands-on technical lead to manage the lifecycle of OnLive’s new platform.
If you are a SysOps/DevOps engineer experienced with high-volume, global cloud infrastructure environments that consist of Linux/open source technologies along with virtualization, storage and load balancers, we would love the opportunity to tell you more about this position.
This position is dedicated to building our core infrastructure and services, and to acting as a point of escalation for our elite team of Service Reliability Engineers.
- Serve as the primary escalation point for resolving issues with all platform and third-party applications and critical services
- Perform root-cause and failure analysis of issues escalated by SREs
- Implement OnLive’s new platform environment
- Manage our public cloud infrastructure
- Monitor the performance of the infrastructure
- Identify and correct bottlenecks in the system, design and implement performance-enhancing technologies
- System capacity (CPU/memory) planning, systems integration, hardware and OS maintenance
- Design and document platform enhancements for the Development, Operations and Engineering teams
- Train and mentor Systems Administrators
- BS or MS in IT or Computer Science and/or 8+ years of hands-on management experience is strongly desired
- Production Linux SysAdmin experience (Red Hat/CentOS preferred)
- Solid understanding of Linux internals and performance tuning, plus strong general systems troubleshooting skills
- Strong scripting skills: Perl, shell scripting and command-line utilities
- Ruby programming experience is highly desirable
- Strong working knowledge of virtualization technologies and products
- Automated system deployment and management of bare metal systems (PXE, Kickstart, etc.)
- Strong knowledge of DNS and network tools (ping, traceroute, etc.)
- Knowledge of SAN/NAS management is a win!
- Experience automating configuration management with tools such as Puppet or Chef
- Working knowledge of standard infrastructure tools (DHCP, DNS, NTP, syslog, SSH, SVN, etc.)
- Experience with open source monitoring and graphing tools (Nagios, Graphite)
- Demonstrated understanding of, and respect for, change-control processes
- Experience with high availability systems and storage solutions
- Must be able to troubleshoot complex issues quickly and effectively