PySpark - Data Architect

Dubai, Dubai, United Arab Emirates • Posted June 18, 2026

Job Type: Full-time
Location: Dubai, Dubai
Posted: June 18, 2026
Category: Other-General
Application Deadline: July 28, 2026

Role Description

Responsibilities
  • Data Pipeline Development: Design, develop and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity.
  • Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) to the data lake or data warehouse.
  • Transformation and Processing: Use PySpark to process, cleanse and transform large datasets into meaningful formats that support analytical and business needs.
  • Optimization: Tune the performance of PySpark code and Cloudera components, optimizing resource utilization and reducing ETL runtime.
  • Quality and Validation: Implement data quality checks, monitoring and validation routines to ensure data accuracy and reliability throughout.
  • Orchestration: Automate data workflows using Apache Oozie, Airflow or similar orchestration tools within the Cloudera environment.
  • ...

Interested in this role?

Apply for PySpark - Data Architect at Virtusa.