i4DM

PySpark & Delta Lake Developer

Reposted 4 Days Ago
Remote
Hiring Remotely in USA
Mid level
The PySpark & Delta Lake Developer is responsible for designing scalable ETL pipelines for healthcare data, ensuring ACID compliance, data quality, and optimal performance within AWS.
The summary above was generated by AI

About Our Team

Our employees thrive in a culture that's fast-paced and ego-free, where innovation and collaboration are encouraged at every turn. We are an organization that provides federal agencies with instant access to experienced and talented professionals who understand their unique challenges and know the most efficient ways to address them. We are continually investing in resources and talent, so we stay prepared with specialized teams in place who are experts in creating tailored technologies. Our solutions empower federal organizations to grow, modernize, and succeed in a rapidly evolving landscape.

We welcome diverse perspectives and seek individuals who are passionate about technology and creative problem-solving. If you enjoy learning, growing, and tackling real-world challenges, you will thrive here. Veterans and military spouses are strongly encouraged to apply and bring their unique experience to our team.

About the Role:

Our core values of People Matter, Integrity, and a Commitment to Excellence drive all that we do. By joining us, you will become a part of a fun and diverse team of talented and creative consultants who share the goal of using the latest technology to solve business challenges. We provide our clients with a dynamic mix of services and deliver focused solutions like no one else.

We are seeking talented and bright team players who are passionate about technology and want to work in a fast-paced, dynamic, and ego-free culture while applying a creative approach to problem-solving. Team members who like to grow their skill sets while solving challenging, real-world business problems thrive here.

We are looking for an experienced PySpark & Delta Lake Developer who will be responsible for designing, building, and maintaining scalable ETL pipelines to process and analyze large-scale healthcare claims data. This role emphasizes building robust Delta Lake tables and ensuring ACID-compliant data lakes. The ideal candidate will focus on developing efficient PySpark scripts and leveraging Delta Lake capabilities to deliver data reliability, high performance, and seamless schema evolution within an AWS environment.

Key Responsibilities:

  • Design, develop, and maintain robust ETL pipelines using PySpark and Delta Lake for large and complex healthcare data workloads.
  • Implement and optimize data lake solutions using Delta Lake table formats, supporting ACID transactions, schema enforcement, and time travel.
  • Write efficient, reusable, and well-documented PySpark scripts for data ingestion, transformation, cleansing, and aggregation.
  • Collaborate with data engineers, architects, and data scientists to understand business and data requirements and translate them into scalable data solutions.
  • Ensure data quality, consistency, lineage, and integrity across all stages of data processing.
  • Troubleshoot, debug, and optimize PySpark applications and Delta Lake workflows for cost, speed, and reliability within AWS.
  • Maintain detailed and up-to-date technical documentation of code, data pipelines, and standard operating procedures.
  • Stay updated with the latest Delta Lake and Spark advancements, advocating for best practices in data management and analytics.

Required Qualifications:

  • Strong proficiency in Python and PySpark, with hands-on experience developing data pipelines.
  • Advanced experience with Delta Lake and its ACID transaction and schema management features.
  • Solid SQL skills for querying, joining, and optimizing data in distributed environments.
  • Hands-on experience with AWS cloud data services (e.g., S3, Glue, EMR, Athena).
  • Familiarity with data lake concepts, partitioning, and performance tuning.
  • Excellent communication skills and a desire to continuously learn and adapt to innovative technologies.
  • Familiarity with CI/CD, version control (e.g., Git), and infrastructure as code.

Preferred Qualifications:

  • Experience with healthcare or claims data.
  • Knowledge of data governance, security, data cataloging (AWS Glue Catalog), and compliance best practices.
  • Strong ability to prioritize and execute tasks independently and within collaborative team environments.
  • Previous experience working in a government or public sector setting.

Top Skills

Athena
AWS
Delta Lake
EMR
Glue
PySpark
Python
S3
SQL
