Mimi's Portfolio — Data Automation

A fast, no-drama pipeline that normalizes messy Excel into clean datasets.

Overview

Data Automation is a small, fast tool that turns messy spreadsheets into one clean file. You drag in Excel files; it figures out the layout, lines up the headers, and gives you a tidy download. Built with Python/Pandas + FastAPI and a React front end.

Multi-source ready: handles defect logs and shipment reports out of the box

Timeline: May 2025 – Jul 2025 Role: Solo dev (backend + frontend + QA)

Design System

01 Typography

Style	Font	Size	Weight
H1	Inter	36–60px	Bold
H2	Inter	22–28px	Bold
H3	Inter	18–20px	Semi-bold
Body	Inter	15–17px	Regular / Medium

02 Colors

Brand

Lavender (Primary)
#D1B3F5
Lavender Hover
#C6A8F0
Deep Purple (Dark)
#3C2A60
Deep Purple Hover
#5D3C87
Peach (CTA)
#F7CAA2
Peach Hover
#F2B88A

Grayscale

Gray 900
#111111
Gray 800
#333333
Gray 700
#555555
Gray 300
#BBBBBB
White
#FFFFFF

Problem

Monthly spreadsheets came in with different column orders, renamed headers, and mixed data types. Teams spent hours rearranging and cleaning by hand — a slow, error-prone, and frustrating process.

Same report looked different depending on the source
Columns renamed inconsistently (e.g. “Location” vs “Site”)
Manual cleanup caused frequent typos and mismatches

User Feedback

Before this tool, users shared these frustrations about monthly metrics:

“Every month the metrics come in a different format, so I spend forever fixing columns to match our company’s standard.”
“Even after cleaning, it’s easy to make mistakes.”
“If the tool just gave me the same format every time, it’d save me hours.”

Solution

A lightweight pipeline that automates the messy prep. The flow:

Auto-detect format → recognize Type A vs Type B layouts instantly
Normalize headers → alias maps + fuzzy match to unify naming
Validate schema → enforce types + required fields with clear error output
Export clean files → consistent CSV/XLSX with locked column order

Python Pandas FastAPI React Postgres / Supabase RTL + Jest (100% coverage)

Core Features

Auto format detection: distinguishes Type A / Type B via header signatures
Header normalization: alias maps + fuzzy matching for near-duplicates
Schema validation: type coercion, required fields, row-level error report
Stable exports: CSV/XLSX with locked column order and consistent naming
Tests: 100% coverage to prevent regressions across formats

Implementation

The API accepts uploads, inspects header rows to identify a format, and applies a mapping function to standardize columns. Pandas handles tidy transforms and validations; FastAPI streams the result back. The React UI keeps interaction minimal: drop files → automate → download.

Results

Cut prep time from ~45 minutes to under 5 minutes per batch
Reduced errors with strict validation + clear warnings
Faster releases thanks to 100% automated tests

What’s next

Two upgrades that let you teach the app exactly how the output should look — no presets.

1. Show a “correct” file. Upload a clean Excel that matches your target layout (column names, order, and types). Then upload a messy sample. The app compares the two, learns the mapping, and converts future files to match — automatically.

Step 1: Upload correct Excel Step 2: Upload messy Excel Result: Converted to match

2. Quick AI: rename + edit cells. You tell it exactly what to do in short commands.

How you use it: type a command, the app parses it. Keep it simple and explicit.

rename: location -> site — rename a header everywhere.
edit: set site = "Chicago" where location = "CHI" — bulk edit by rule.
edit: B2 = "Fixed" — change one specific cell.

Actions: rename / edit-cell Input: sample files Output: clean dataset

Flow stays the same: upload → automate → export.

data automation