Welcome to my professional portfolio of Python projects.
This repository highlights my work in data engineering, automation, backend development, and microservices, showcasing end-to-end solutions aligned with real-world business use cases.
- Designed an ETL pipeline to extract job data from 5,000+ portals (APIs, Selenium, PDFs).
- Applied Pandas/NumPy transformations for data cleaning, enrichment, and aggregation.
- Loaded processed datasets into SQL databases for analytics dashboards and reporting.
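
The extract, clean, and load steps of the job-data pipeline above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample records, the `jobs` table, and the function names are all hypothetical, and it uses only the standard library (`sqlite3` and plain Python) in place of Pandas/NumPy so that it runs without extra dependencies.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from APIs, Selenium, or PDFs.
RAW_JOBS = [
    {"title": " Data Engineer ", "company": "Acme", "city": "pune"},
    {"title": "Data Engineer", "company": "Acme", "city": "Pune"},  # duplicate once cleaned
    {"title": "Backend Developer", "company": "Globex", "city": "Mumbai"},
    {"title": "", "company": "Initech", "city": "Delhi"},  # missing title, dropped
]


def transform(records):
    """Trim whitespace, normalise city casing, drop empty titles and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (
            rec["title"].strip(),
            rec["company"].strip(),
            rec["city"].strip().title(),
        )
        if not row[0] or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def load(rows, conn):
    """Write cleaned rows into a SQL table that a dashboard could query."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, city TEXT)")
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", rows)
    conn.commit()


def jobs_per_city(conn):
    """Aggregate in SQL: number of postings per city."""
    return dict(conn.execute("SELECT city, COUNT(*) FROM jobs GROUP BY city"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
    load(transform(RAW_JOBS), conn)
    print(jobs_per_city(conn))
```

With Pandas, the same cleaning step would typically be a few `DataFrame` operations (strip, drop duplicates, group by); the SQL load and aggregation would look much the same.
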
- Built a pipeline to automate PDF-to-Excel data extraction for 4 years of billing records.
- Reduced reporting time from days to minutes using pdfplumber, Pandas, and openpyxl.
- Developed modular microservices for data ingestion, validation, and reporting.
- Exposed REST APIs with FastAPI, containerized with Docker for scalability.
- Implemented an OCR-based captcha solver using Google Vision/Tesseract.
- Integrated the solver into large-scale automation workflows to improve data acquisition.
- Automated scraping and data refresh using Selenium + scheduling scripts.
- Ensured real-time updates and high reliability across distributed environments.
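
One common way to keep a scheduled refresh like the one above reliable is to retry a failed run with a growing delay before giving up. The sketch below assumes that pattern; it is not taken from the project. `refresh_with_retry`, `run_on_schedule`, and the `job` callable (which would wrap the Selenium scrape) are hypothetical names, and the `sleep` parameter exists only so the waiting can be swapped out in tests.

```python
import time


def refresh_with_retry(job, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Run `job` until it succeeds or `attempts` is exhausted; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds * attempt)  # wait longer after each failure


def run_on_schedule(job, interval_seconds, runs, sleep=time.sleep):
    """Call `job` `runs` times, sleeping `interval_seconds` between runs."""
    results = []
    for i in range(runs):
        results.append(refresh_with_retry(job, sleep=sleep))
        if i < runs - 1:
            sleep(interval_seconds)
    return results
```

In practice the loop in `run_on_schedule` is often replaced by cron or another scheduler that invokes the script at fixed times, with the retry wrapper kept around the scrape itself.
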
- Languages: Python, SQL
- Libraries: Pandas, NumPy, Selenium, BeautifulSoup, asyncio, pdfplumber
- Frameworks: FastAPI, Django
- Tools: Git, Virtual Machines
I am a Python Developer with 3+ years of experience in:
- ETL pipelines & data wrangling
- Microservices & backend development
- Automation & process optimization
I enjoy solving complex problems with clean, scalable code and have delivered solutions across domains including job portals, healthcare, and media.
📧 Reach me at pydevrupesh@gmail.com
🔗 LinkedIn | CV