What you'll do...
As a member of the SRE team, you will work with other DevOps practitioners to produce mission-critical infrastructure, tools, and processes that ensure the highest levels of availability and reliability for all our websites. You will be expected to work with peers and customers to implement the technical vision of the team.
You are right for the job if you are comfortable with deep technical topics in Linux, networking, and distributed architectures. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You will excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work with our Software Engineering teams to build our next-generation “always up” and “highly available” cloud-based e-commerce/Retail and Enterprise platform.
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart’s e-commerce/Stores and Enterprise platform. Our goal is to build, scale, and guard the systems that delight the customers. To do so, you will need strong skills in the following areas:
- Design, write and build tools to improve the reliability, latency, availability and scalability of Walmart e-commerce/Retail and Enterprise products.
- Engender reliability and availability starting with metrics and measurements.
- Enable scaling by providing tools, developing training and/or augmenting processes.
- Build tools and automation to prevent the recurrence of problems in mission-critical products and services.
- Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
- Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
- Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.
- Design new monitoring tools and smart alerts that help discover failures and issues in a timely fashion, and work with engineers to identify root causes and fix issues.
- Influence, design and create new architectures, standards, and methods for large-scale enterprise systems.
- Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
- Participate in the on-call rotation.
- Secure the system from issues, be they real, perceived, or notional.
- Maintain a strong focus on collecting metrics and drawing inferences from them.
- Experience with configuration management tools such as Ansible, SaltStack, Chef, and Puppet.
- Build and drive the automation systems that maintain system health.
Additional responsibilities may include:
- Drives standardization and service-focused instrumentation. Provides subject matter expertise. Resolves break/fix scenarios, engaging broader teams as necessary, and partners with or leads others to achieve continuous improvement. Contributes to command-and-control activities focused on the rapid restoration of complex outages. Participates in the 24/7 on-call rotation. May work independently or as part of a team on more complex projects. Provides mentoring and guidance to more junior team members.
- Creates systems engineering and architectural documentation to be used by others to build and maintain systems.
- Scripting and development responsibilities: Develops software in several modern languages. Develops large, complex database-backed systems and understands DB schema and query performance. Applies professional best practices, such as revision control and unit testing, in day-to-day work. Applies statistical data analysis techniques.
- Networking responsibilities: Understands and performs packet captures with tcpdump, snoop, and other network sniffers. Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.).
- Application technologies: Provides recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, QA, and the Business.
- Analyzes systems and makes recommendations to prevent potential problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems.
- Performs end-to-end audits of monitors and alarms based on subsystem knowledge.
- Utilizes time management and project management skills to lead the resolution of issues in a timely and organized manner, effectively communicating necessary information. May consult directly with developers or third-party vendors; provides subject matter expertise.
- Consistently exercises independent judgment and discretion in matters of significance.
- Other duties and responsibilities as assigned.
- 4+ years in a software development, DevOps, or SRE role.
- Experience in designing, investigating, analyzing, and troubleshooting large-scale enterprise systems.
- Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative, and drive.
- Fluency with running services at scale; understanding of Unix systems internals and networking.
- Networking knowledge and a strong understanding of network concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way. Experience administering Linux systems in a production environment.
- Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell.
- Bachelor's degree in Computer Science or a related field, or relevant work experience.
- Experience with distributed version control systems such as Git.
- Experience with IaaS and PaaS providers such as AWS, Azure, OpenStack, and GCP.
- Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).
- Experience with enterprise monitoring solutions such as AppDynamics, New Relic, Prometheus, Graphite, Nagios, Sensu, and Splunk.
- Familiarity with continuous integration/deployment processes and tools such as Jenkins, Maven, Nexus, etc.
Assists in providing guidance to small groups of two to three engineers, including offshore associates, for assigned Engineering projects by providing pertinent documents, directions, examples, and timeline.
Provides support to the business by responding to user questions, concerns, and issues (for example, technical feasibility, implementation strategies); researching and identifying needed solutions; determining implementation designs; providing guidance regarding implications of new and enhanced systems; identifying short- and long-term solutions; and directing users to appropriate contacts for issues outside of associate's domain.
Manages small to large-sized complex projects by reviewing project requirements; translating requirements into technical solutions; researching and identifying alternative solutions; determining needed solution based on return on investment and value add to the business; gathering requested information (for example, design documents, product requirements, wireframes); writing and developing code; conducting unit testing; communicating status and issues to team members and stakeholders; collaborating with project team and cross-functional teams; identifying areas of opportunity; interpreting information and identifying a solution; ensuring solution is sustainable across implementation and use; troubleshooting open issues and bug-fixes; and ensuring on-time delivery and hand-offs.
Troubleshoots business and production issues by gathering information (for example, issue, impact, criticality, possible root cause); performing root cause analysis to reduce future issues; engaging support teams to assist in the resolution of issues; developing solutions; driving the development of an action plan; performing actions as designated in the plan; interpreting the results to determine further action; and completing online documentation.
Participates in the discovery phase of small to medium-sized projects to produce a high-level design by partnering with the product management, project management, business, and user experience teams.
Demonstrates up-to-date expertise and applies this to the development, execution, and improvement of action plans by providing expert advice and guidance to others in the application of information and best practices; supporting and aligning efforts to meet customer and business needs; and building commitment for perspectives and rationales.
Provides and supports the implementation of business solutions by building relationships and partnerships with key stakeholders; identifying business needs; determining and carrying out necessary processes and practices; monitoring progress and results; recognizing and capitalizing on improvement opportunities; and adapting to competing demands, organizational changes, and new responsibilities.
Models compliance with company policies and procedures and supports company mission, values, and standards of ethics and integrity by incorporating these into the development and implementation of business plans; using the Open Door Policy; and demonstrating and assisting others with how to apply these in executing business processes and practices.
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Bachelor’s degree in Computer Science and 2 years’ experience in software engineering or related field OR 4 years’ experience in software engineering or related field.
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Masters: Computer Science
805 SE MOBERLY LN, BENTONVILLE, AR 72712, United States of America