Senior Network Developer AI2NE

CLBPTS
Full-time
On-site
Seattle, Washington, United States
$87,000 - $178,200 USD yearly
Description

The AI2NE Org strives to be a global leader in the RDMA cluster networking domain and to enable seamless, accelerated advancements in High-Performance Computing (HPC), Artificial Intelligence, and Machine Learning. We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads.


We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.


This position supports the design, deployment, and operations of a large-scale global Oracle cloud computing environment (Oracle Cloud Infrastructure, OCI). The role focuses primarily on the development and support of network fabric and systems, combining a deep understanding of networking at the protocol level with the programming skills needed to support the intensive automation required to operate a production environment. Because OCI is a cloud-based network with a global footprint, this support covers hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure and the Internet.


Ultimately, our vision is to enable a future where AI and ML technologies are seamlessly integrated into everyday life, solving complex problems, enhancing decision-making processes, and creating a positive impact on a global scale. Through our commitment to innovation, excellence, and collaboration, we aim to support the driving force behind this transformative era, revolutionizing the way we perceive and interact with technology.



  1. Research and Development: Conduct cutting-edge research to understand the evolving landscape of AI and ML, and apply the findings to the development of RDMA clusters. Explore new algorithms, hardware architectures, and optimization techniques to maximize performance.

  2. Design and Engineering: Design and engineer RDMA clusters that align with the unique demands of AI/ML, HPC, and Database workloads. Collaborate with hardware and software engineers to optimize cluster architecture, cooling systems, power efficiency, and integration with AI and HPC frameworks.

  3. Capacity Scaling: Deploy RDMA clusters globally to meet the needs of the business and our customers.

  4. Testing and Quality Assurance: Develop rigorous testing procedures to ensure the stability, reliability, and compatibility of RDMA clusters. Conduct comprehensive benchmarking, stress testing, and validation to guarantee optimal performance and adherence to industry standards.

  5. Customer Engagement and Support: Establish strong relationships with customers, understand their requirements, and provide expert guidance on RDMA cluster configuration, deployment, and optimization. Offer ongoing technical support, training, and troubleshooting to ensure customer success.

  6. Documentation and Knowledge Sharing: Create comprehensive documentation, user guides, and best practices to empower users in utilizing RDMA clusters effectively. Facilitate knowledge sharing through internal and external forums, presentations, and workshops to promote learning and collaboration.

Career Level - IC3



Responsibilities

  • Participate in Network lifecycle management through network build and/or upgrade projects.  

  • Collaborate with program/project managers to develop milestones and deliverables.  

  • Primarily use existing procedures and tools to develop and safely execute network changes, and develop new procedures from time to time as needed.

  • Serve as technical lead for team projects and contribute to the development of roadmap issues.

  • Lead development of new runbooks and methods of procedure.

  • Mentor junior engineers.

  • Participate in the network solution and architecture design process.

  • Develop standalone features.

  • Participate in operational rotations as either primary or secondary.  

  • Provide break-fix support for events. 

  • Serve as escalation point for event remediation.  

  • Lead post-event root cause analysis.  

  • Coordinate with networking automation services for the development and integration of support tooling.  

  • Frequently develop scripts to automate routine tasks for the team and business unit.

  • Serve as SME on software development projects for network automation.

  • Support network vendor software bug fixes.

  • Collaborate with the network vendor's technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.



Qualifications
Disclaimer:

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range: from $87,000 to $178,200 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:
1. Medical, dental, and vision insurance, including expert medical opinion
2. Short-term disability and long-term disability
3. Life insurance and AD&D
4. Supplemental life insurance (Employee/Spouse/Child)
5. Health care and dependent care Flexible Spending Accounts
6. Pre-tax commuter and parking benefits
7. 401(k) Savings and Investment Plan with company match
8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
9. 11 paid holidays
10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
11. Paid parental leave
12. Adoption assistance
13. Employee Stock Purchase Plan
14. Financial planning and group legal
15. Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.