PySpark - Data Architect
Dubai, Dubai, United Arab Emirates • Posted June 18, 2026
Job Type: Full-time
Location: Dubai, Dubai
Posted: June 18, 2026
Category: Other-General
Application Deadline: July 28, 2026
Role Description
Responsibilities
- Data Pipeline Development: Design, develop and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity.
- Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) into the data lake or data warehouse.
- Transformation and Processing: Use PySpark to process, cleanse and transform large datasets into meaningful formats that support business and analytical needs.
- Optimization: Tune the performance of PySpark code and Cloudera components, optimizing resource utilization and reducing ETL runtimes.
- Quality and Validation: Implement data quality checks, monitoring and validation routines to ensure data accuracy and reliability throughout the pipeline.
- Orchestration: Automate data workflows using Apache Oozie, Apache Airflow or similar orchestration tools within the Cloudera environment. ...
Interested in this role?
Apply for the PySpark - Data Architect position at Virtusa.