SIVASUBRAMANIYAM T

Data Engineer | ETL Specialist | Azure Databricks Expert

About Me

Detail-oriented Data Engineer with 3.7 years of experience specializing in data ingestion, transformation, and processing using Python, PySpark, SQL, and Azure Databricks. Hands-on experience in support roles covering data access management, with recently strengthened skills in Pandas, NumPy, and FastAPI for data manipulation and analysis.

Passionate about building scalable data pipelines and solving complex data challenges in cloud environments.

Technical Skills

Languages

  • Python
  • SQL

Big Data & Tools

  • Apache Spark (PySpark)
  • Databricks
  • ETL Pipelines

Cloud Platforms

  • Microsoft Azure
  • Azure Blob Storage
  • Azure Databricks

Libraries & Frameworks

  • Pandas & NumPy
  • SQLAlchemy
  • FastAPI
  • Pydantic

Professional Experience

Senior Analyst

Capgemini Technology Services India Limited, Bangalore
December 2021 – Present
  • Developed scalable ETL pipelines using PySpark on Azure Databricks to process semi-structured data from Azure Blob Storage
  • Automated the extraction of JSON/CSV files from Blob Storage and stored the output in curated zones
  • Applied transformation logic including column addition, column mapping, derived columns, and type casting
  • Used SQL within Databricks notebooks to apply business logic and filtering conditions to data
  • Implemented asynchronous FastAPI endpoints with Pydantic, ensuring efficient handling of concurrent requests in a microservices architecture
  • Implemented dependency injection in FastAPI to maintain clean, modular code structure
  • Developed a Python tool to create Excel trackers and automate report generation, reducing manual effort
  • Acted as point of contact for business users requesting access to confidential reports and datasets
  • Managed data access permissions and audit trails through approved tools

Featured Projects

Real-Time Data Pipeline

Built an end-to-end real-time data processing pipeline that ingests streaming data from multiple sources, transforms it using PySpark, and loads it into Azure Data Lake for analytics. The pipeline handles millions of records daily with high reliability and performance.

PySpark · Azure Databricks · Blob Storage · SQL

Automated Reporting System

Designed and implemented an automated reporting system using Python and FastAPI that generates customized reports from multiple data sources. The system reduced manual reporting time by 70% and improved data accuracy through automated validation checks.

Python · FastAPI · Pandas · Excel
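A stripped-down version of such a report generator, using pandas with the openpyxl engine; the validation rule, column names, and sheet layout are invented for illustration.

```python
import pandas as pd


def build_report(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Validate the input, then write raw data and a summary sheet to Excel."""
    # Automated validation check (illustrative): no missing amounts allowed.
    if df["amount"].isna().any():
        raise ValueError("validation failed: missing values in 'amount'")
    summary = df.groupby("team", as_index=False)["amount"].sum()
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="raw", index=False)
        summary.to_excel(writer, sheet_name="summary", index=False)
    return summary
```

Failing fast on validation errors (rather than writing a partial report) is what lets automated checks improve data accuracy in a setup like this.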

Data Quality Framework

Developed a comprehensive data quality framework that validates incoming data against business rules and schema definitions. The framework provides automated alerts for data anomalies and maintains detailed audit logs for compliance purposes.

PySpark · SQL · Azure · Python
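The rule-checking core of a framework like this can be sketched in a few lines of Python; the rule names and columns here are hypothetical, and a production version would run the same idea over Spark DataFrames and push the report into alerts and audit logs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import pandas as pd


@dataclass
class Rule:
    name: str
    # Predicate returning a boolean Series: True where a row violates the rule.
    check: Callable[[pd.DataFrame], pd.Series]


@dataclass
class QualityReport:
    violations: Dict[str, int] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        return all(count == 0 for count in self.violations.values())


def run_checks(df: pd.DataFrame, rules: List[Rule]) -> QualityReport:
    """Apply each rule and record how many rows violate it (audit-log style)."""
    report = QualityReport()
    for rule in rules:
        report.violations[rule.name] = int(rule.check(df).sum())
    return report
```

Example rules might be `Rule("qty_non_negative", lambda d: d["qty"] < 0)` or `Rule("sku_not_null", lambda d: d["sku"].isna())`; a non-empty violation count would then trigger an alert.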

Microservices API Gateway

Created a high-performance API gateway using FastAPI with asynchronous endpoints to handle data requests from multiple client applications. Implemented dependency injection patterns and comprehensive error handling for improved maintainability and reliability.

FastAPI · Pydantic · Python · SQLAlchemy