data-catalog

Data Catalog Patterns

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "data-catalog" with this command: npx skills add jediv/dataiku-chat-control/jediv-dataiku-chat-control-data-catalog

Data Catalog Patterns

Reference patterns for managing the Dataiku data catalog via the Python API.

Key Concepts

Concept What it is Scope

Data Collection A curated group of datasets, visible across projects Instance-level (client )

Metadata Label, description, tags, checklists, custom key-value pairs Per dataset or project

Meaning A semantic type for columns (e.g., "Email", "Country Code") Instance-level (client )

Tags Freeform labels on datasets or projects Per dataset or project

Data Collections

List Collections

collections = client.list_data_collections(as_type="dict") for c in collections: print(f"{c['displayName']} ({c['id']}) — {c['itemCount']} items")

Create a Collection

dc = client.create_data_collection( displayName="Customer Data", description="All customer-related datasets", tags=["customers", "production"] )

Add Datasets to a Collection

dc = client.get_data_collection("collection_id")

Add by dataset handle

ds = project.get_dataset("MY_DATASET") dc.add_object(ds)

Add by dict (for cross-project datasets)

dc.add_object({"type": "DATASET", "projectKey": "PROJECT_A", "id": "DATASET_NAME"})

List and Remove Objects

dc = client.get_data_collection("collection_id") objects = dc.list_objects() for obj in objects: raw = obj.get_raw() print(f" {raw['projectKey']}.{raw['id']}")

# Get as a dataset handle
ds = obj.get_as_dataset()

# Remove from collection
obj.remove()

Update Collection Settings

dc = client.get_data_collection("collection_id") settings = dc.get_settings() settings.display_name = "Renamed Collection" settings.description = "Updated description" settings.tags = ["new-tag", "production"] settings.save()

Dataset Metadata

Get and Set Metadata

ds = project.get_dataset("MY_DATASET") metadata = ds.get_metadata()

Metadata structure:

{

"label": "...",

"description": "...",

"tags": ["tag1", "tag2"],

"checklists": {"checklists": [...]},

"custom": {"kv": {"key1": "value1"}}

}

metadata["tags"] = ["cleaned", "production"] metadata["custom"]["kv"]["owner"] = "data-team" ds.set_metadata(metadata)

AI-Generated Descriptions

Generate descriptions for dataset and columns (requires AI Services enabled)

result = ds.generate_ai_description(language="english", save_description=True)

Rate-limited: 1000 requests/day, then throttled to ~60s per call.

Meanings (Semantic Column Types)

List and Create Meanings

List existing meanings

meanings = client.list_meanings()

Create a values-list meaning

client.create_meaning( id="country_code", label="Country Code", type="VALUES_LIST", values=["US", "UK", "FR", "DE", "JP"], normalizationMode="EXACT", detectable=True )

Create a pattern-based meaning

client.create_meaning( id="email_address", label="Email Address", type="PATTERN", pattern=r"^[\w.-]+@[\w.-]+.\w+$", detectable=True )

Update a Meaning

meaning = client.get_meaning("country_code") definition = meaning.get_definition() definition["entries"].append({"value": "CA"}) meaning.set_definition(definition)

Catalog Indexing

Trigger re-indexing of connections so new tables appear in the catalog:

Index specific connections

client.catalog_index_connections(connection_names=["my_snowflake", "my_postgres"])

Index all connections

client.catalog_index_connections(all_connections=True)

Detailed References

  • references/data-collections.md — Permissions, completeness checks, cross-project patterns

  • references/metadata-and-tags.md — Full metadata structure, project tags, custom metadata

  • references/meanings.md — All meaning types, normalization modes, detectable meanings

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

flow-management

No summary provided by upstream source.

Repository SourceNeeds Review
General

recipe-patterns

No summary provided by upstream source.

Repository SourceNeeds Review
General

dataset-management

No summary provided by upstream source.

Repository SourceNeeds Review