JOB DESCRIPTION
We are seeking a skilled Data Engineer to join our dynamic team. You will be responsible for designing, building, and maintaining ETL/ELT data pipelines, working closely with our data lake and data warehouse solutions (e.g., Redshift, BigQuery, Snowflake, Delta Lake) to ensure efficient data integration and transformation.
Design, build, and operate scalable data pipelines using tools such as Apache Airflow, dbt, Luigi, and Apache NiFi (see the Airflow sketch after this list).
Use Python or Scala for data processing tasks and SQL for advanced querying (CTEs, window functions, optimization); a short query sketch follows this list.
Develop basic Bash scripts for automation and integration tasks.
Manage both relational (e.g., PostgreSQL, MySQL, SQL Server) and non-relational databases (e.g., MongoDB, Redis, Cassandra).
Implement and optimize data models (Star Schema, Snowflake Schema) for efficient data querying and analysis.
Work with Big Data frameworks like Apache Spark (Batch & Streaming), Hadoop, and Kafka as per project requirements.
Work with at least one cloud platform: AWS (S3, Glue, EMR), GCP (BigQuery, Dataflow), or Azure (Data Factory), including basic resource management and cost control.
Implement CI/CD for data pipelines using GitHub Actions, GitLab CI, or Jenkins, with a working understanding of Docker and containerization concepts.
Troubleshoot pipeline issues, optimize queries, and handle missing/corrupted data effectively (a pandas-based sketch of one approach follows this list).
Collaborate with analysts, BI developers, data scientists, product teams, and business stakeholders.
Document pipelines, processing flows, and transformation logic clearly and comprehensively.
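To give a flavor of the day-to-day work, here is a minimal sketch of an Airflow ETL pipeline, assuming Airflow 2.4+. The DAG id daily_orders_etl, the task names, and the extract/transform/load callables are hypothetical placeholders, not an existing VTI pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Hypothetical: pull yesterday's records from a source system.
    ...


def transform(**context):
    # Hypothetical: clean and reshape the extracted records.
    ...


def load(**context):
    # Hypothetical: write the transformed rows to the warehouse.
    ...


with DAG(
    dag_id="daily_orders_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract -> transform -> load in order.
    extract_task >> transform_task >> load_task
```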
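The advanced-SQL responsibility (CTEs and window functions) can be illustrated with a small, self-contained sketch. The orders table and the query are hypothetical, and SQLite stands in for the real warehouse purely so the example runs anywhere (it needs an SQLite build with window-function support, 3.25+).

```python
import sqlite3

# Hypothetical orders table, created only to make the example runnable;
# in practice the query would target PostgreSQL, BigQuery, Snowflake, etc.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 120.0),
        (1, '2024-02-10',  80.0),
        (2, '2024-01-20', 200.0);
    """
)

# CTE + window functions: rank each customer's orders by recency and
# compute a running total of spend per customer.
query = """
WITH ranked AS (
    SELECT
        customer_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id ORDER BY order_date DESC
        ) AS recency_rank,
        SUM(amount) OVER (
            PARTITION BY customer_id ORDER BY order_date
        ) AS running_total
    FROM orders
)
SELECT * FROM ranked WHERE recency_rank <= 2;
"""

for row in conn.execute(query):
    print(row)
```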
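For the missing/corrupted-data responsibility, a hedged pandas sketch of one common approach (coerce, fill, and quarantine) is shown below. The column names and the validity rule are illustrative assumptions, not a prescribed standard.

```python
import pandas as pd

# Hypothetical raw extract with duplicates, gaps, and corrupted values.
raw = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "amount": ["120.0", "n/a", "n/a", "80", "-5"],
        "country": ["VN", None, None, "JP", "VN"],
    }
)

clean = (
    raw.drop_duplicates(subset="order_id")  # drop duplicate extracts
    .assign(
        amount=lambda df: pd.to_numeric(df["amount"], errors="coerce"),  # corrupt -> NaN
        country=lambda df: df["country"].fillna("UNKNOWN"),              # fill missing dims
    )
)

# Route suspicious rows (unparseable or negative amounts) to a quarantine
# set for manual review instead of silently dropping them.
bad_rows = clean[clean["amount"].isna() | (clean["amount"] < 0)]
good_rows = clean.drop(bad_rows.index)

print(good_rows)
print(bad_rows)
```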
REQUIREMENTS
Proven experience in Data Pipeline & ETL design and implementation.
Strong programming skills in Python or Scala. Proficiency in SQL for data manipulation and optimization.
Familiarity with Bash scripting, Big Data frameworks (Spark, Hadoop, Kafka), and data modeling concepts (Star Schema, Snowflake Schema).
Hands-on experience with cloud platforms (AWS, GCP, Azure) and their data services (e.g., S3, BigQuery, Data Factory).
Knowledge of DevOps/DataOps practices, including CI/CD pipelines, Docker, and containerization.
Experience with Git, Jira, and Confluence, and proficiency with monitoring tools such as Prometheus and Grafana.
Strong problem-solving skills with the ability to analyze pipeline errors, optimize queries, and ensure data quality.
Systems-thinking mindset with a holistic view of data and system architecture (data lineage, data quality).
Excellent communication and collaboration skills, with the ability to work effectively across teams.
Proven ability to work independently, design and implement pipelines, and mentor junior team members.
Education and Experience:
Bachelor’s degree in Computer Science, Data Engineering, or a related field.
3 years of experience in a similar role, demonstrating progressively greater responsibility in data engineering and pipeline development.
BENEFITS
A minimum of 13 months' salary per year, not including other bonuses such as the KPI bonus for work efficiency, project bonuses, and revenue bonuses. We hold performance reviews twice a year. You will work in a professional, dynamic, and friendly environment.
VTI offers annual health check-ups and fully pays social insurance, health insurance, and unemployment insurance premiums in accordance with the Labor Law.
We offer one vacation/company trip and four team-building trips per year for every employee, along with various entertainment activities including swimming clubs, yoga, Zumba, kendo, and music requests via our internal radio channel.
Two MVPs will be rewarded with a free trip to Japan, Taiwan, Singapore, or another destination.
We offer a variety of promotion opportunities and chances to increase income for people with capability, enthusiasm, and long-term commitment.
We offer free Japanese classes at the company.
We provide training opportunities to help our people improve their skills, and we support our members in studying for and obtaining Cloud, AWS, PMF, and PMP certifications.
Working hours: 08:30 AM to 05:30 PM, Monday to Friday.