SIVASUBRAMANIYAM T

Data Engineer | ETL Specialist | Azure Databricks Expert

About Me

Detail-oriented Data Engineer with 3.7 years of experience specializing in data ingestion, transformation, and processing using Python, PySpark, SQL, and Azure Databricks. Hands-on experience in support roles covering data access management, with recently strengthened skills in Pandas, NumPy, and FastAPI for data manipulation and analysis.

Passionate about building scalable data pipelines and solving complex data challenges in cloud environments.

Technical Skills

Languages

  • Python
  • SQL

Big Data & Tools

  • Apache Spark (PySpark)
  • Databricks
  • ETL Pipelines

Cloud Platforms

  • Microsoft Azure
  • Azure Blob Storage
  • Azure Databricks

Libraries & Frameworks

  • Pandas & NumPy
  • SQLAlchemy
  • FastAPI
  • Pydantic

Professional Experience

Senior Analyst

Capgemini Technology Services India Limited, Bangalore
December 2021 – Present
  • Developed scalable ETL pipelines using PySpark on Azure Databricks to process semi-structured data from Azure Blob Storage
  • Automated the extraction of JSON/CSV files from Blob Storage and stored the output in curated zones
  • Applied transformation logic including column addition, column mapping, derived columns, and type casting
  • Used SQL within Databricks notebooks to apply business logic and filtering conditions to data
  • Implemented asynchronous FastAPI endpoints with Pydantic, ensuring efficient handling of concurrent requests in a microservices architecture
  • Implemented dependency injection in FastAPI to maintain clean, modular code structure
  • Developed a Python tool to create Excel trackers and automate report generation, reducing manual effort
  • Acted as point of contact for business users requesting access to confidential reports and datasets
  • Managed data access permissions and audit trails through approved tools

Featured Projects

Real-Time Data Pipeline

Built an end-to-end real-time data processing pipeline that ingests streaming data from multiple sources, transforms it using PySpark, and loads it into Azure Data Lake for analytics. The pipeline handles millions of records daily with high reliability and performance.

PySpark · Azure Databricks · Blob Storage · SQL

Automated Reporting System

Designed and implemented an automated reporting system using Python and FastAPI that generates customized reports from multiple data sources. The system reduced manual reporting time by 70% and improved data accuracy through automated validation checks.

Python · FastAPI · Pandas · Excel
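A stripped-down version of such a report generator, using pandas with the openpyxl engine; the validation rule, column names, and sheet layout are invented for illustration.

```python
import pandas as pd


def build_report(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Validate the input, then write raw data and a summary sheet to Excel."""
    # Automated validation check (illustrative): no missing amounts allowed.
    if df["amount"].isna().any():
        raise ValueError("validation failed: missing values in 'amount'")
    summary = df.groupby("team", as_index=False)["amount"].sum()
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="raw", index=False)
        summary.to_excel(writer, sheet_name="summary", index=False)
    return summary
```

Failing fast on validation errors (rather than writing a partial report) is what lets automated checks improve data accuracy in a setup like this.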

Data Quality Framework

Developed a comprehensive data quality framework that validates incoming data against business rules and schema definitions. The framework provides automated alerts for data anomalies and maintains detailed audit logs for compliance purposes.

PySpark · SQL · Azure · Python
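The rule-checking core of a framework like this can be sketched in a few lines of Python; the rule names and columns here are hypothetical, and a production version would run the same idea over Spark DataFrames and push the report into alerts and audit logs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import pandas as pd


@dataclass
class Rule:
    name: str
    # Predicate returning a boolean Series: True where a row violates the rule.
    check: Callable[[pd.DataFrame], pd.Series]


@dataclass
class QualityReport:
    violations: Dict[str, int] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        return all(count == 0 for count in self.violations.values())


def run_checks(df: pd.DataFrame, rules: List[Rule]) -> QualityReport:
    """Apply each rule and record how many rows violate it (audit-log style)."""
    report = QualityReport()
    for rule in rules:
        report.violations[rule.name] = int(rule.check(df).sum())
    return report
```

Example rules might be `Rule("qty_non_negative", lambda d: d["qty"] < 0)` or `Rule("sku_not_null", lambda d: d["sku"].isna())`; a non-empty violation count would then trigger an alert.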

Microservices API Gateway

Created a high-performance API gateway using FastAPI with asynchronous endpoints to handle data requests from multiple client applications. Implemented dependency injection patterns and comprehensive error handling for improved maintainability and reliability.

FastAPI · Pydantic · Python · SQLAlchemy