# Dataiku Recipe Patterns

Reference patterns for creating different recipe types via the Dataiku Python API.

## Before Writing Code

**MANDATORY:** Read the relevant reference file before writing any recipe code.
- GREL formulas → read `references/grel-functions.md` first
- Prepare steps → read `references/processors.md` first
- Joins → read `references/join-recipe.md` first
- Grouping → read `references/group-recipe.md` first
- Python recipes → read `references/python-recipe.md` first
- Sync recipes → read `references/sync-recipe.md` first
- Date handling → read `references/date-operations.md` first
- Pitfalls index → `references/pitfalls.md` (recipe-type reference files also have a Pitfalls section at the top)
Do NOT rely on general knowledge for GREL functions or API methods. Dataiku GREL differs from OpenRefine GREL and other variants. Always verify function names against the reference.
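To make the verification habit concrete, here is a minimal sketch of a formula-column step for a Prepare recipe. The processor type name `CreateColumnWithGREL`, its parameter keys, and the GREL functions used are assumptions to be checked against `references/processors.md` and `references/grel-functions.md` before use.

```python
# Hypothetical sketch: a formula-column step for a Prepare recipe.
# The processor type and parameter keys are assumptions -- verify them
# against references/processors.md before copying.
step = {
    "type": "CreateColumnWithGREL",
    "params": {
        "column": "price_band",
        # GREL expression; confirm if()/toNumber() exist with these
        # signatures in references/grel-functions.md.
        "expression": 'if(toNumber(price) > 100, "high", "low")',
    },
}

# With a live recipe this would be applied roughly as:
#   settings.add_processor_step(step["type"], step["params"])
```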
## Recipe Type Decision Table

| Recipe Type | Use When | Key Method |
|---|---|---|
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | `project.new_recipe("prepare", ...)` |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | `project.new_recipe("join", ...)` |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | `project.new_recipe("grouping", ...)` |
| Sync | Copying data between connections (e.g., to a data warehouse) | `project.new_recipe("sync", ...)` |
| Python | Custom transformations not possible with visual recipes | `project.new_recipe("python", ...)` |
## Universal Builder Pattern
Every recipe follows the same create-configure-run lifecycle:
### 1. Create via builder

```python
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_new_output("<output_dataset>", "<connection>")  # creates output dataset
recipe = builder.create()
```
### 2. Configure settings

```python
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()
```
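What "recipe-specific configuration" looks like depends on the recipe type; as one illustrative sketch, renaming a column in a Prepare recipe might build a step like the following. The `ColumnRenamer` type name and its parameter shape are assumptions to confirm in `references/processors.md`.

```python
# Hypothetical Prepare-recipe configuration: rename one column.
# Processor type and parameter shape are assumptions -- confirm in
# references/processors.md.
rename_params = {"renamings": [{"from": "cust_id", "to": "customer_id"}]}

# With a live recipe object this would be applied roughly as:
#   settings = recipe.get_settings()
#   settings.add_processor_step("ColumnRenamer", rename_params)
#   settings.save()
```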
### 3. Apply schema updates

```python
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
```
### 4. Run and check

```python
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```
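The success check can be wrapped in a small pure helper so it is easy to reuse and test; this assumes only the status dict shape shown above.

```python
def job_succeeded(status: dict) -> bool:
    """Return True when a job status dict reports the DONE state.

    Assumes the shape shown above: {"baseStatus": {"state": ...}}.
    Missing keys are treated as failure.
    """
    return status.get("baseStatus", {}).get("state") == "DONE"

# With a live job this would be: job_succeeded(job.get_status())
print(job_succeeded({"baseStatus": {"state": "DONE"}}))    # True
print(job_succeeded({"baseStatus": {"state": "FAILED"}}))  # False
```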
## After Running Any Recipe
Always sample the output and verify the result before reporting success. Silent data issues (wrong values, all nulls, unexpected types) are common.
```python
from helpers.export import sample

rows = sample(client, "PROJECT_KEY", "output_dataset", 5)
for r in rows:
    print(r)
```
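One cheap check on the sampled rows is to flag columns that came back entirely null, since an all-null column usually means a broken formula or join key. A minimal sketch, operating on plain row dicts like those printed above:

```python
def all_null_columns(rows: list[dict]) -> set[str]:
    """Return column names that are None/empty in every sampled row.

    A quick sanity check only: an empty result does not prove the data
    is correct, just that no column is entirely blank in the sample.
    """
    if not rows:
        return set()
    columns = set(rows[0])
    return {c for c in columns if all(r.get(c) in (None, "") for r in rows)}

sampled = [
    {"id": 1, "total": None},
    {"id": 2, "total": None},
]
print(all_null_columns(sampled))  # {'total'}
```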
## Always Remember

- Call `settings.save()` after configuration changes
- Call `compute_schema_updates().apply()` for visual recipes
- Call `recipe.run(no_fail=True)` to execute (already waits for completion)
- Check `job.get_status()["baseStatus"]["state"]` for `"DONE"` or `"FAILED"`
- Sample and verify the output data before reporting success
## Tested Patterns

Copy-paste patterns that have been validated against a live Dataiku instance:

- `patterns/bin-numeric-column.py` — Bin a string numeric column into ranges
- `patterns/calculated-columns.py` — Common GREL formula patterns
- `patterns/filter-and-clean.py` — Data cleaning pipeline
## Detailed References

Recipe types:

- `references/prepare-recipe.md` — Prepare recipe builder, `add_processor_step()` API
- `references/join-recipe.md` — Join configuration, multi-table joins, column selection
- `references/group-recipe.md` — Aggregation flags, output naming, type compatibility
- `references/sync-recipe.md` — Sync recipe pattern
- `references/python-recipe.md` — Python recipe with `set_code`

Data preparation:

- `references/processors.md` — All processor types with parameters and complete example
- `references/grel-functions.md` — Full GREL function table and formula syntax
- `references/date-operations.md` — DateParser, DateFormatter, datePart examples

Troubleshooting:

- `references/pitfalls.md` — Index of all pitfalls (details are inline in each reference file)