Site Reliability Engineer

Posted: May 3, 2024

Role Number: 200550146

Do you love working on highly scalable and secure distributed applications? Do you want your technical abilities to be challenged every day and your work to make a difference in the lives of millions of people? If so, the Product Engineering Systems group is looking for dedicated, hands-on Site Reliability Engineers (SREs) who are not afraid to share knowledge, think creatively, and question assumptions. Our group is responsible for the Enterprise Product Lifecycle Management (PLM) transformation initiative, which will deliver a next-generation PLM/PIM platform to drive Apple’s product innovation across the hardware, software, and services lines of business. Join us to do the best work of your life with a welcoming, diverse, and hard-working group of engineers. Bring passion and dedication to the job, and there’s no telling what you could accomplish!

Key Qualifications

Excellent knowledge of ITIL terminology for incident and problem management.
Proven experience monitoring distributed application architectures using log monitoring and analysis tools (e.g., Splunk).
Hands-on experience with Java programming and REST APIs for application debugging and root cause analysis.
Proficiency in at least one programming or scripting language, such as Perl, Python, or Ruby, for developing observability, ETL, and similar tools.
Track record of excellent interpersonal, analytical, and communication skills.

Description

You demonstrate a passion for achieving the highest level of uptime, with an emphasis on scalability and high performance. You have the zeal to enhance our system observability, ensuring that we have the insights and tools needed to monitor, troubleshoot, and optimize our applications and infrastructure. You bring expertise in debugging and root-causing issues, along with an instinct to automate repetitive tasks.

• Enhance System Observability: Implement and maintain robust observability solutions that provide real-time insights into the performance and health of our systems, so that potential issues are proactively identified and addressed before they impact users.
• Troubleshooting and Root Cause Analysis: Use your expertise to investigate and resolve incidents quickly during crisis situations, performing root cause analysis to prevent recurrence.
• Automation: Leverage your coding skills to create tools and automate runbooks to improve efficiency.
• Documentation: Document and manage runbooks and best practices to ensure knowledge sharing and team efficiency.
• Communication: Bring strong interpersonal skills and the ability to work effectively across multiple business and technical teams.

Additional Requirements

• Strong understanding of database principles and working knowledge of distributed storage and infrastructure solutions such as Oracle, Cassandra, SOLR, and Kafka.
• Good command of Linux and networking concepts (TLS/SSL, DNS, load balancers, etc.), and strong troubleshooting skills in large-scale environments.
• Experience with container management (e.g., Docker) and microservices architectures in cloud and on-premises infrastructure.

Apple is an Equal Opportunity Employer that is committed to inclusion and diversity. We also take affirmative action to offer employment and advancement opportunities to all applicants, including minorities, women, protected veterans, and individuals with disabilities. Apple will not discriminate or retaliate against applicants who inquire about, disclose, or discuss their compensation or that of other applicants.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.