Role Overview
Join our Data practice as a Data Engineer and help clients build modern data platforms whose pipelines turn raw operational data into reliable, decision-ready datasets. You will work with a modern open-source data stack (dbt, Dagster, Apache Iceberg, and DuckDB) on real production systems. This is a great role for someone who values working across multiple industries and problem domains, wants mentorship from senior data engineers, and is ready to take ownership of full data platform deliverables.
Responsibilities
- Build and maintain ELT pipelines using Airbyte, dbt, and Dagster (see the sketch after this list)
- Design lakehouse architectures on S3/GCS with Apache Iceberg or Delta Lake
- Implement data quality tests and alerting with dbt tests and Elementary
- Set up CDC pipelines with Debezium for real-time replication
- Support data consumers: analysts, ML engineers, and product teams
- Write clear technical documentation for data models and pipeline logic
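To make the first few bullets concrete, here is a minimal, illustrative sketch of a Dagster asset pair: an ingestion step standing in for an Airbyte sync, feeding a staging transformation of the kind dbt would own in production. Every name in it (the DuckDB file, the CSV extract, the table names) is hypothetical, not a snippet from our codebase.

```python
from dagster import asset, materialize
import duckdb

# Hypothetical local warehouse file for illustration only.
DB_PATH = "warehouse.duckdb"

@asset
def raw_orders() -> None:
    """Ingest a raw orders extract (stand-in for an Airbyte sync)."""
    con = duckdb.connect(DB_PATH)
    con.execute("""
        CREATE OR REPLACE TABLE raw_orders AS
        SELECT * FROM read_csv_auto('orders_export.csv')
    """)
    con.close()

@asset(deps=[raw_orders])
def stg_orders() -> None:
    """Staging model: the cleanup a dbt model would own in production."""
    con = duckdb.connect(DB_PATH)
    con.execute("""
        CREATE OR REPLACE TABLE stg_orders AS
        SELECT order_id, customer_id, CAST(ordered_at AS DATE) AS order_date
        FROM raw_orders
        WHERE order_id IS NOT NULL
    """)
    con.close()

if __name__ == "__main__":
    # Materialize both assets in dependency order.
    materialize([raw_orders, stg_orders])
```

In production this wiring sits behind Airbyte connections and dbt models rather than inline SQL, but the dependency-graph idea is the same.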
Requirements
- 2+ years in a data engineering role
- Strong SQL: you can write window functions and CTEs in your sleep (see the example after this list)
- Python for custom pipeline logic and data transformations
- Experience with dbt Core (models, tests, macros)
- Familiarity with at least one cloud data warehouse (Snowflake, BigQuery, or Redshift)
- Comfortable with Git and basic CI/CD
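To calibrate the SQL bar, here is a small, self-contained example: a CTE feeding a window function, run through DuckDB's Python API over made-up order data.

```python
import duckdb

con = duckdb.connect()  # in-memory database

# Toy data standing in for a real orders table.
con.execute("CREATE TABLE orders (customer_id INT, order_date DATE, amount DECIMAL(10, 2))")
con.execute("""
    INSERT INTO orders VALUES
        (1, '2024-01-05', 120.00),
        (1, '2024-02-10',  80.00),
        (2, '2024-01-20', 200.00)
""")

# A CTE feeding a window function: running spend per customer.
rows = con.execute("""
    WITH daily AS (
        SELECT customer_id, order_date, SUM(amount) AS day_total
        FROM orders
        GROUP BY customer_id, order_date
    )
    SELECT
        customer_id,
        order_date,
        SUM(day_total) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS running_total
    FROM daily
    ORDER BY customer_id, order_date
""").fetchall()

for row in rows:
    print(row)
```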
Frequently Asked Questions
Do I need to know Spark?
Helpful but not required. We use DuckDB and Trino for most analytical workloads, reserving Spark for large-scale batch jobs. If you know SQL and Python well, we can get you up to speed on the distributed computing side.
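As a rough illustration of that tooling choice, DuckDB can query Parquet files in object storage directly, which covers many workloads that might otherwise reach for Spark. The bucket path below is a placeholder, and the sketch assumes the httpfs extension and S3 credentials are configured.

```python
import duckdb

con = duckdb.connect()

# httpfs lets DuckDB read from S3/GCS over HTTP(S).
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")

# Placeholder path: point this at a real Parquet dataset.
# Credentials come from the environment or DuckDB's s3_* settings.
result = con.execute("""
    SELECT event_type, COUNT(*) AS n
    FROM read_parquet('s3://example-bucket/events/*.parquet')
    GROUP BY event_type
    ORDER BY n DESC
""").fetchall()

print(result)
```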
Role at a Glance
- Type: Full-time
- Experience: 2–5 years
- Location: Greater Noida / Remote
Apply for this Role
Or email your CV to contact@codexops.com
