trishala@portfolio:~
user@machine:~$ whoami
> Trishala Thakur
user@machine:~$ cat mission.txt
> Data Scientist | ML Engineer | Problem Solver
> Engineering intelligence from data since 2021
> Building AI systems that make industries & researchers smarter
> Python • ML • CV • LLMs • Cloud

Real-World Impact

3+
Domains Impacted
Energy, Academia, Environmental Science
12K+
Review Hours Saved
Enabled by automated GPT-driven data extraction
6
Industrial Assets Protected
Predictive maintenance preventing costly downtime
30 min
Time Saved per Case
Via ML-powered mobile troubleshooting app

Key Achievements

92% F1 score automating face–name matching across 1TB+ image corpus
GPT-5-nano pipeline saved 12K+ human review hours
Predicted equipment failures 5 days in advance at 87% precision
Forecasted regional droughts (71% recall) via Siamese CNN on satellite CO₂ data
Integrated predictive models into mobile apps used by 10+ engineers
Automated labeling workflows, saving 2 hours/week of manual work

Career Timeline

ML Data Scientist - University of Chicago

Present

> Building AI pipelines that transform massive web-scraped datasets into research-ready insights
> Orchestrated multi-agent system achieving 92% F1 score for document classification
> Developed computer vision models for page layout analysis (71% accuracy)
> Processing 1.2M+ research papers to extract insights with GPT-5-nano
> Enabling 100+ researchers to work faster with automated data preparation

Research Assistant - CIRES

May 2024 - Aug 2024

> Developed Siamese CNN models for drought forecasting from satellite imagery
> Identified drought signals in multi-temporal satellite data
> Built deep learning pipeline for regional climate pattern prediction
> Contributed to climate research used by environmental scientists

Student Assistant - CIRES

Jan 2024 - Apr 2024

> Created XGBoost classifier on DistilBERT embeddings (78% F1)
> Deployed Flask API integrated with Label Studio for production use
> Accelerated text classification tasks for environmental research

Data Scientist - Schneider Electric

Aug 2021 - Jul 2023

> Built LightGBM failure prediction system (87% precision) preventing $2M+ in losses
> Integrated predictive maintenance ML model into production app
> Engineered features from IoT sensor data for 500+ daily users
> Created scalable SQL pipelines processing millions of industrial data points
> Enabled proactive maintenance decisions across global facilities

Data Intern - Schneider Electric

Feb 2021 - Jul 2021

> Designed Tableau dashboards tracking failure metrics for engineering teams
> Built Raspberry Pi sensor system integrated with ML models
> Led tech talks introducing data science to 50+ colleagues
> Started the journey that shaped my career

Featured Projects

lost_and_found_ai

Multimodal Search System

The Problem: Lost items are hard to track using text alone. Solution: Multimodal search built on CLIP ViT-B/32 that understands both images and natural-language descriptions.

Tech Stack: CLIP, FAISS vector search, GCP deployment, Streamlit UI

CLIP • FAISS • GCP • Streamlit
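
A minimal sketch of the retrieval idea behind this project, not the deployed code. It assumes item photos and text queries have already been embedded into the same vector space (the role CLIP plays here) and ranks items by cosine similarity. The tiny 3-dimensional vectors, the item names, and the `search` helper are all hypothetical; a brute-force loop stands in for FAISS.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def search(query_vec, index, top_k=2):
    """Return the top_k (item_id, score) pairs ranked by cosine similarity."""
    q = normalize(query_vec)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical toy embeddings; real CLIP ViT-B/32 embeddings have 512 dimensions.
ITEM_INDEX = {
    "red_backpack": [0.9, 0.1, 0.0],
    "blue_umbrella": [0.0, 0.2, 0.9],
    "black_wallet": [0.1, 0.9, 0.1],
}

# Stand-in for the embedding of a text query such as "red bag".
query_embedding = [0.8, 0.2, 0.1]
results = search(query_embedding, ITEM_INDEX)
print(results)
```

Because images and text share one embedding space, the same `search` call would serve a photo query or a written description. A vector index such as FAISS would likely replace the brute-force loop once the collection grows.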

mock_interview_agent

RAG-Powered Interview Coach

The Problem: Interview prep is expensive and stressful. Solution: RAG agent generating role-specific questions with Chain-of-Thought feedback.

Tech Stack: RAG architecture, LLM prompting, CoT reasoning

RAG • LLM • Chain-of-Thought
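
A rough sketch of the retrieve-then-prompt pattern described above, under simplifying assumptions. Word overlap stands in for a real embedding-based retriever, the two-entry knowledge base is invented for illustration, and no LLM is called; the sketch stops at the assembled prompt. The function names are hypothetical.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def retrieve(role, documents, top_k=1):
    """Rank documents by word overlap with the role description."""
    role_words = tokenize(role)
    ranked = sorted(
        documents,
        key=lambda doc: len(role_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(role, context_docs):
    """Combine retrieved context with a Chain-of-Thought feedback instruction."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        f"You are an interview coach for a {role} role.\n"
        f"Context:\n{context}\n"
        "Ask one role-specific question. After the candidate answers, "
        "reason step by step about strengths and gaps before giving feedback."
    )

# Hypothetical knowledge base; a real system would index job descriptions or guides.
KNOWLEDGE_BASE = [
    "Data scientist interviews cover statistics, SQL, and model evaluation.",
    "Frontend engineer interviews cover JavaScript, CSS, and accessibility.",
]

role = "data scientist"
prompt = build_prompt(role, retrieve(role, KNOWLEDGE_BASE))
print(prompt)
```

Keeping retrieval separate from prompt construction means the retriever can be swapped for a vector store without touching the feedback instructions.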

fact_verification_system

LLM Truth Checker

The Problem: LLMs hallucinate facts. Solution: Automated fact-checking pipeline extracting claims, finding entities, retrieving Wikidata evidence, and verifying consistency.

Tech Stack: NLP, Wikidata API, Entity linking, Fact verification

NLP • Wikidata • Fact-Check • FLAN-T5
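
The four stages above (extract claims, find entities, retrieve evidence, verify consistency) can be sketched in a few lines. This is an illustration under heavy assumptions, not the project's implementation: a small in-memory dictionary stands in for Wikidata lookups, sentence splitting stands in for model-based claim extraction, and only one hypothetical relation ("is the capital of") is handled.

```python
# Hypothetical in-memory stand-in for evidence retrieved from Wikidata.
EVIDENCE = {
    "Paris": {"capital of": "France"},
    "Canberra": {"capital of": "Australia"},
}

def extract_claims(text):
    """Split text into sentence-level claims (naive stand-in for a model)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def parse_claim(claim, relation="is the capital of"):
    """Split 'X is the capital of Y' into (subject, object), else None."""
    if relation not in claim:
        return None
    subject, obj = claim.split(relation, 1)
    return subject.strip(), obj.strip()

def verify(claim):
    """Label a claim as supported, contradicted, or unverifiable."""
    parsed = parse_claim(claim)
    if parsed is None:
        return "unverifiable"
    subject, obj = parsed
    facts = EVIDENCE.get(subject)  # entity lookup + evidence retrieval
    if facts is None:
        return "unverifiable"
    return "supported" if facts.get("capital of") == obj else "contradicted"

text = (
    "Paris is the capital of France. "
    "Canberra is the capital of Austria. "
    "Cats are great."
)
report = {claim: verify(claim) for claim in extract_claims(text)}
for claim, verdict in report.items():
    print(f"{verdict:>12}: {claim}")
```

The three-way verdict matters: a claim with no matching evidence is reported as unverifiable rather than being forced into true or false.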

remote_finetuning_framework

Cloud MLOps Platform

The Problem: Fine-tuning requires expensive GPUs. Solution: Cloud-based framework enabling remote fine-tuning on GPU VMs with API access to custom models.

Tech Stack: Cloud VMs, Model fine-tuning, REST API, MLOps

MLOps • Cloud • API • GPU

Tech Arsenal

Python
PyTorch
TensorFlow
LLMs
RAG
Computer Vision
HuggingFace
GCP
SQL
XGBoost
scikit-learn
OpenCV
APIs
Git
Docker
Linux
NumPy
Pandas
Matplotlib
Seaborn

Knowledge Shares

From Scratch: MLOps Fine-Tuning System

Complete guide to building an MLOps pipeline for model fine-tuning. From infrastructure setup to deployment, making ML accessible without expensive local hardware. Deep dive into cloud architecture, GPU orchestration, and API design.

Read Article

More than a Chatbot

Workshop at 3rd Annual Environmental Data Science Summit 2025. Exploring advanced LLM applications beyond simple conversations - RAG systems, agents, and structured outputs for scientific research.

View Workshop

MY DATA ACTION

Personal reflection on combining data insights with decisive action. How moving from analysis paralysis to implementation creates real-world impact in data science projects.

Read on Medium

Can ML Solve Your Problem?

Practical framework for determining when machine learning is the right solution. Not every problem needs a neural network - sometimes simpler approaches work better and faster.

Read on Medium

Mathematical Problems in Industry 2024

Research on reducing noise and ringing in image processing. Applied mathematics meeting real-world signal processing challenges with novel optimization approaches.

View Paper

Why I Want to Create Digital Twins

My journey at Schneider Electric and the vision behind digital twins for sustainability. How virtual representations transform energy efficiency and industrial operations.

Read Article

Currently Running...

Active Processes

Building interactive CARTO maps for The Nature Conservancy to visualize Natural Climate Solutions and carbon emission reduction hotspots
Developing AI automation pipelines at University of Chicago
Following AI research via uncover.ai (Instagram) and AI Explained (YouTube) for cutting-edge updates
Grinding LeetCode - training problem-solving speed and coding efficiency for real-world applications

Connect With Me

Chicago, IL, USA

720-490-0543