data automation

A fast, no-drama pipeline that normalizes messy Excel into clean datasets.

Overview

Data Automation is a small, fast tool that turns messy spreadsheets into one clean file. You drag in Excel files; it figures out the layout, lines up the headers, and gives you a tidy download. Built with Python/Pandas + FastAPI and a React front end.

  • Multi-source ready: handles defect logs and shipment reports out of the box
Timeline: May 2025 – Jul 2025 Role: Solo dev (backend + frontend + QA)

Design System

01 Typography

Style Font Size Weight
H1 Inter 36–60px Bold
H2 Inter 22–28px Bold
H3 Inter 18–20px Semi-bold
Body Inter 15–17px Regular / Medium

02 Colors

Brand
  • Lavender (Primary)
    #D1B3F5
  • Lavender Hover
    #C6A8F0
  • Deep Purple (Dark)
    #3C2A60
  • Deep Purple Hover
    #5D3C87
  • Peach (CTA)
    #F7CAA2
  • Peach Hover
    #F2B88A
Grayscale
  • Gray 900
    #111111
  • Gray 800
    #333333
  • Gray 700
    #555555
  • Gray 300
    #BBBBBB
  • White
    #FFFFFF

Problem

Monthly spreadsheets came in with different column orders, renamed headers, and mixed data types. Teams spent hours rearranging and cleaning by hand — a slow, error-prone, and frustrating process.

  • Same report looked different depending on the source
  • Columns renamed inconsistently (e.g. “Location” vs “Site”)
  • Manual cleanup caused frequent typos and mismatches

User Feedback

Before this tool, users shared these frustrations about monthly metrics:

  • “Every month the metrics come in a different format, so I spend forever fixing columns to match our company’s standard.”
  • “Even after cleaning, it’s easy to make mistakes.”
  • “If the tool just gave me the same format every time, it’d save me hours.”

Solution

A lightweight pipeline that automates the messy prep. The flow:

  • Auto-detect format → recognize Type A vs Type B layouts instantly
  • Normalize headers → alias maps + fuzzy match to unify naming
  • Validate schema → enforce types + required fields with clear error output
  • Export clean files → consistent CSV/XLSX with locked column order
Python Pandas FastAPI React Postgres / Supabase RTL + Jest (100% coverage)

Core Features

  • Auto format detection: distinguishes Type A / Type B via header signatures
  • Header normalization: alias maps + fuzzy matching for near-duplicates
  • Schema validation: type coercion, required fields, row-level error report
  • Stable exports: CSV/XLSX with locked column order and consistent naming
  • Tests: 100% coverage to prevent regressions across formats

Implementation

The API accepts uploads, inspects header rows to identify a format, and applies a mapping function to standardize columns. Pandas handles tidy transforms and validations; FastAPI streams the result back. The React UI keeps interaction minimal: drop files → automate → download.

Results

  • Cut prep time from ~45 minutes to under 5 minutes per batch
  • Reduced errors with strict validation + clear warnings
  • Faster releases thanks to 100% automated tests

What’s next

Two upgrades that let you teach the app exactly how the output should look — no presets.

1. Show a “correct” file. Upload a clean Excel that matches your target layout (column names, order, and types). Then upload a messy sample. The app compares the two, learns the mapping, and converts future files to match — automatically.

Step 1: Upload correct Excel Step 2: Upload messy Excel Result: Converted to match

2. Quick AI: rename + edit cells. You tell it exactly what to do in short commands.

How you use it: type a command, the app parses it. Keep it simple and explicit.

  • rename: location -> site — rename a header everywhere.
  • edit: set site = "Chicago" where location = "CHI" — bulk edit by rule.
  • edit: B2 = "Fixed" — change one specific cell.
Actions: rename / edit-cell Input: sample files Output: clean dataset

Flow stays the same: upload → automate → export.