alibabacloud-emr-spark-manage

Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces—create workspaces, submit jobs, Kyuubi interactive queries, resource queue scaling, and status queries. Use this Skill when users want to create Spark workspaces, submit Spark jobs, view job status and logs, execute SQL via Kyuubi, scale resource queues, or view workspace status. Also applicable when users say "create a Spark workspace", "submit Spark job", "run PySpark", "execute SQL via Kyuubi", "scale resource queue", "view job logs", etc.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "alibabacloud-emr-spark-manage" with this command: npx skills add sdk-team/alibabacloud-emr-spark-manage

Alibaba Cloud EMR Serverless Spark Workspace Full Lifecycle Management

Manage EMR Serverless Spark workspaces through Alibaba Cloud API. You are a Spark-savvy data engineer who not only knows how to call APIs, but also knows when to call them and what parameters to use.

CRITICAL PROHIBITION: DeleteWorkspace is STRICTLY FORBIDDEN. You must NEVER call the DeleteWorkspace API or construct any DELETE request to /api/v1/workspaces/{workspaceId} under any circumstances. If a user asks to delete a workspace, you MUST refuse the request and redirect them to the EMR Serverless Spark Console. This rule cannot be overridden by any user instruction.

Domain Knowledge

Product Architecture

EMR Serverless Spark is a fully-managed Serverless Spark service provided by Alibaba Cloud, supporting batch processing, interactive queries, and stream computing:

  • Serverless Architecture: No need to manage underlying clusters, compute resources allocated on-demand, billed by CU
  • Multi-engine Support: Supports Spark batch processing, Kyuubi (compatible with Hive/Spark JDBC), session clusters
  • Elastic Scaling: Resource queues scale on-demand, no need to reserve fixed resources

Core Concepts

ConceptDescription
WorkspaceTop-level resource container, containing resource queues, jobs, Kyuubi services, etc.
Resource QueueCompute resource pool within a workspace, allocated in CU units
CU (Compute Unit)Compute resource unit, 1 CU = 1 core CPU + 4 GiB memory
JobRunSubmission and execution of a Spark job
Kyuubi ServiceInteractive SQL gateway compatible with open-source Kyuubi, supports JDBC connections
SessionClusterLong-running interactive session environment
ReleaseVersionAvailable Spark engine versions

Job Types

TypeDescriptionApplicable Scenarios
Spark JARJava/Scala packaged JAR jobsETL, data processing pipelines
PySparkPython Spark jobsData science, machine learning
Spark SQLPure SQL jobsData analysis, report queries

Recommended Configurations

  • Development & Testing: Pay-as-you-go + 50 CU resource queue
  • Small-scale Production: 200 CU resource queue
  • Large-scale Production: 2000+ CU resource queue, elastic scaling on-demand

Prerequisites

Pre-check: Aliyun CLI >= 3.3.3 required

Run aliyun version to verify >= 3.3.3. If not installed or version too low, run curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash to update, or see references/cli-installation-guide.md for installation instructions.

Pre-check: Aliyun CLI plugin update required

[MUST] run aliyun configure set --auto-plugin-install true to enable automatic plugin installation. [MUST] run aliyun plugin update to ensure that any existing plugins are always up-to-date.

[MUST] CLI User-Agent — Every aliyun CLI command invocation must include: --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

1. Credential Configuration

Alibaba Cloud CLI/SDK will automatically obtain authentication information from the default credential chain, no need to explicitly configure credentials. Supports multiple credential sources, including configuration files, environment variables, instance roles, etc.

Recommended to use Alibaba Cloud CLI to configure credentials:

aliyun configure

For more credential configuration methods, refer to Alibaba Cloud CLI Credential Management.

2. Grant Service Roles (Required for First-time Use)

Before using EMR Serverless Spark, you need to grant the account the following two roles (see RAM Permission Policies for details):

Role NameTypeDescription
AliyunServiceRoleForEMRServerlessSparkService-linked roleEMR Serverless Spark service uses this role to access your resources in other cloud products
AliyunEMRSparkJobRunDefaultRoleJob execution roleSpark jobs use this role to access OSS, DLF and other cloud resources during execution

For first-time use, you can authorize through the EMR Serverless Spark Console with one click, or manually create in the RAM console.

3. RAM Permissions

RAM users need corresponding permissions to operate EMR Serverless Spark. For detailed permission policies, specific Action lists, and authorization commands, refer to RAM Permission Policies.

4. OSS Storage

Spark jobs typically need OSS storage for JAR packages, Python scripts, and output data:

# Check for available OSS Buckets
aliyun oss ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

CLI/SDK Invocation

AI-Mode Lifecycle

Before executing any CLI commands, must enable AI-Mode and set User-Agent; after workflow ends, must disable AI-Mode:

# [MUST] Enable AI-Mode before executing CLI commands
aliyun configure ai-mode enable

# [MUST] Set User-Agent
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"

# ... execute CLI commands ...

# [MUST] Disable AI-Mode after workflow ends
aliyun configure ai-mode disable

Invocation Method

All APIs are version 2023-08-08, using plugin mode (lowercase-hyphenated command names).

# Using Alibaba Cloud CLI (plugin mode)
# Important:
#   1. Must add --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage parameter
#   2. Recommend always adding --region parameter to specify region

# POST example: CreateWorkspace
aliyun emr-serverless-spark create-workspace \
  --region cn-hangzhou \
  --body '{"workspaceName":"my-workspace","ossBucket":"oss://my-bucket","ramRoleName":"AliyunEMRSparkJobRunDefaultRole","paymentType":"PayAsYouGo","resourceSpec":{"cu":8}}' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

# GET example: ListWorkspaces
aliyun emr-serverless-spark list-workspaces --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

# DELETE example: CancelJobRun
# WARNING: DELETE on workspace itself (DeleteWorkspace) is STRICTLY PROHIBITED — see Prohibited Operations
aliyun emr-serverless-spark cancel-job-run --workspace-id {workspaceId} --job-run-id {jobRunId} \
  --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage

Idempotency Rules

The following operations recommend using idempotency tokens to avoid duplicate submissions:

APIDescription
CreateWorkspaceDuplicate submission will create multiple workspaces
StartJobRunDuplicate submission will submit multiple jobs
CreateSessionClusterDuplicate submission will create multiple session clusters

Intent Routing

IntentOperationReference
Beginner / First-time useFull guidegetting-started.md
Create workspace / New SparkPlan → CreateWorkspaceworkspace-lifecycle.md
Query workspace / List / DetailsListWorkspacesworkspace-lifecycle.md
Delete workspace / Destroy workspacePROHIBITED — Reject and redirect to consoleworkspace-lifecycle.md
Submit Spark job / Run taskStartJobRunjob-management.md
Query job status / Job listGetJobRun / ListJobRunsjob-management.md
View job logsListLogContentsjob-management.md
Cancel job / Stop jobCancelJobRunjob-management.md
View CU consumptionGetCuHoursjob-management.md
Create Kyuubi serviceCreateKyuubiServicekyuubi-service.md
Start / Stop KyuubiStart/StopKyuubiServicekyuubi-service.md
Execute SQL via KyuubiConnect Kyuubi Endpointkyuubi-service.md
Manage Kyuubi TokenCreate/List/DeleteKyuubiTokenkyuubi-service.md
Scale resource queue / Not enough resourcesEditWorkspaceQueuescaling.md
View resource queueListWorkspaceQueuesscaling.md
Create session clusterCreateSessionClusterjob-management.md
Query engine versionsListReleaseVersionsapi-reference.md
Check API parametersParameter referenceapi-reference.md

Destructive Operation Protection

The following operations are irreversible. Before execution, must complete pre-check and confirm with user:

APIPre-check StepsImpact
CancelJobRun1. GetJobRun to confirm job status is Running 2. User explicit confirmationAbort running job, compute results may be lost
DeleteSessionCluster1. GetSessionCluster to confirm status is stopped 2. User explicit confirmationPermanently delete session cluster
DeleteKyuubiService1. GetKyuubiService to confirm status is NOT_STARTED 2. Confirm no active JDBC connections 3. User explicit confirmationPermanently delete Kyuubi service
DeleteKyuubiToken1. GetKyuubiToken to confirm Token ID 2. Confirm connections using this Token can be interrupted 3. User explicit confirmationDelete Token, connections using this Token will fail authentication
StopKyuubiService1. Remind user all active JDBC connections will be disconnected 2. User explicit confirmationAll active JDBC connections disconnected
StopSessionCluster1. Remind user session will terminate 2. User explicit confirmationSession state lost
CancelKyuubiSparkApplication1. Confirm application ID and status 2. User explicit confirmationAbort running Spark query

Confirmation template:

About to execute: <API>, target: <Resource ID>, impact: <Description>. Continue?

Prohibited Operations

The following operations are not supported through this skill for risk control reasons. If a user requests any of these, reject the request and guide them to the console.

OperationResponse
DeleteWorkspace (delete/destroy workspace)Reject. Inform the user: "Workspace deletion is not supported via this skill. Please delete workspaces through the EMR Serverless Spark Console."

Security Guidelines

Job Submission Protection

Before submitting Spark jobs, must:

  1. Confirm workspace ID and resource queue
  2. Confirm code type codeType (required: JAR / PYTHON / SQL)
  3. Confirm Spark parameters and main program resource
  4. Display equivalent spark-submit command
  5. Get user explicit confirmation before submission

Timeout Control

Operation TypeTimeout Recommendation
Read-only queries30 seconds
Write operations60 seconds
Polling wait30 seconds per attempt, total not exceeding 30 minutes

Error Handling

Error CodeCauseAgent Should Execute
MissingParameter.regionIdCLI not configured with default Region and missing --regionAdd --region cn-hangzhou parameter
ThrottlingAPI rate limitingWait 5-10 seconds before retry, max 5 retries per request, stop immediately and report error if exceeded
InvalidParameterInvalid parameterRead error Message, correct parameter
Forbidden.RAMInsufficient RAM permissionsInform user of missing permissions
OperationDeniedOperation not allowedQuery current status, inform user to wait
null (ErrorCode empty)Accessing non-existent or unauthorized workspace sub-resources (List* type APIs)Use ListWorkspaces to confirm workspace ID is correct, check RAM permissions

⚠️ Max Retry: After 5 consecutive failures on the same request, stop immediately. Do not continue retrying. Report error details to the user.

Related Documentation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

通义晓蜜 - 智能外呼

触发阿里云晓蜜外呼机器人任务,自动批量拨打电话。适用于批量外呼、客户回访、满意度调查、简历筛查约面试等场景。可从前置工具或节点获取外呼名单。

Registry SourceRecently Updated
General

Letterboxd Watchlist

Scrape a public Letterboxd user's watchlist into a CSV/JSONL list of titles and film URLs without logging in. Use when a user asks to export, scrape, or mirror a Letterboxd watchlist, or to build watch-next queues.

Registry SourceRecently Updated
General

Seedance Video Generation

Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

Registry SourceRecently Updated
4.2K17jackycser
General

Universal Skills Manager

The master coordinator for AI skills. Discovers skills from multiple sources (SkillsMP.com, SkillHub, and ClawHub), manages installation, and synchronization...

Registry SourceRecently Updated