Clean Data
Clean messy data in the active sheet or a specified range.
Environment
-
If running inside Excel (Office Add-in / Office JS): Use Office JS directly (Excel.run(async (context) => {...}) ). Read via range.values , write helper-column formulas via range.formulas = [["=TRIM(A2)"]] . The in-place vs helper-column decision still applies.
-
If operating on a standalone .xlsx file: Use Python/openpyxl.
Workflow
Step 1: Scope
-
If a range is given (e.g. A1:F200 ), use it
-
Otherwise use the full used range of the active sheet
-
Profile each column: detect its dominant type (text / number / date) and identify outliers
Step 2: Detect issues
Issue What to look for
Whitespace leading/trailing spaces, double spaces
Casing inconsistent casing in categorical columns (usa / USA / Usa )
Number-as-text numeric values stored as text; stray $ , , , % in number cells
Dates mixed formats in the same column (3/8/26 , 2026-03-08 , March 8 2026 )
Duplicates exact-duplicate rows and near-duplicates (case/whitespace differences)
Blanks empty cells in otherwise-populated columns
Mixed types a column that's 98% numbers but has 3 text entries
Encoding mojibake (é , ’ ), non-printing characters
Errors #REF! , #N/A , #VALUE! , #DIV/0!
Step 3: Propose fixes
Show a summary table before changing anything:
Column Issue Count Proposed Fix
Step 4: Apply
-
Prefer formulas over hardcoded cleaned values — where the cleaned output can be expressed as a formula (e.g. =TRIM(A2) , =VALUE(SUBSTITUTE(B2,"$","")) , =UPPER(C2) , =DATEVALUE(D2) ), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original. This keeps the transformation transparent and auditable.
-
Only overwrite in place with computed values when the user explicitly asks for it, or when no sensible formula equivalent exists (e.g. encoding/mojibake repair)
-
For destructive operations (removing duplicates, filling blanks, overwriting originals), confirm with the user first
-
After each category of fix (whitespace → casing → number conversion → dates → dedup), show the user a sample of what changed and get confirmation before moving to the next category
-
Report a before/after summary of what changed