Ansible Error Handling

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "ansible-error-handling" with this command: npx skills add basher83/lunar-claude/basher83-lunar-claude-ansible-error-handling

Patterns for robust error handling in Ansible playbooks and roles.

Block/Rescue/Always Pattern

Handle errors and perform cleanup:

    - name: Deploy application
      block:
        - name: Stop application
          ansible.builtin.systemd:
            name: myapp
            state: stopped

        - name: Deploy new version
          ansible.builtin.copy:
            src: myapp-v2.0
            dest: /usr/bin/myapp

        - name: Start application
          ansible.builtin.systemd:
            name: myapp
            state: started

      rescue:
        - name: Rollback to previous version
          ansible.builtin.copy:
            src: myapp-backup
            dest: /usr/bin/myapp

        - name: Start application (rollback)
          ansible.builtin.systemd:
            name: myapp
            state: started

        - name: Report failure
          ansible.builtin.fail:
            msg: "Deployment failed, rolled back to previous version"

      always:
        # ansible.builtin.file does not expand globs, so find the matches first
        - name: Find temp files
          ansible.builtin.find:
            paths: /tmp
            patterns: "deploy-*"
          register: deploy_temp_files

        - name: Cleanup temp files
          ansible.builtin.file:
            path: "{{ item.path }}"
            state: absent
          loop: "{{ deploy_temp_files.files }}"

Execution Flow

  • block: Main tasks execute sequentially

  • rescue: Runs if ANY task in block fails

  • always: Runs regardless of success/failure
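The flow above can be observed with a task that is guaranteed to fail. This is a minimal sketch, not part of the original skill: the task names and the use of /bin/false are illustrative assumptions. Inside rescue, Ansible exposes ansible_failed_task and ansible_failed_result for the task that triggered it. If every rescue task succeeds, the host is not marked as failed, which is why the deployment example ends its rescue section with an explicit ansible.builtin.fail.

```yaml
# Illustrative only: /bin/false is used to force a failure in the block.
- name: Demonstrate block/rescue/always flow
  block:
    - name: Run a command that always fails
      ansible.builtin.command: /bin/false
      changed_when: false

    - name: Skipped because the previous task failed
      ansible.builtin.debug:
        msg: "This message is never printed"

  rescue:
    - name: Show which task failed and why
      ansible.builtin.debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message') }}

  always:
    - name: Runs whether or not the block failed
      ansible.builtin.debug:
        msg: "Cleanup step"
```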

Retry with Until

Handle transient failures with retries:

    - name: Wait for service to be ready
      ansible.builtin.uri:
        url: http://localhost:8080/health
        status_code: 200
      register: health_check
      until: health_check.status == 200
      retries: 30
      delay: 10
      # Total wait: up to 5 minutes (30 * 10s)

With Command Module

    - name: Wait for cluster to stabilize
      ansible.builtin.command: pvecm status
      register: cluster_status
      until: "'Quorate: Yes' in cluster_status.stdout"
      retries: 12
      delay: 5
      changed_when: false

Retry Parameters

    Parameter   Description
    until       Condition that must be true to stop retrying
    retries     Maximum number of retries before the task fails
    delay       Seconds to wait between attempts
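The three parameters can be combined with a success test instead of a field comparison. The following is a hedged sketch: the URL, destination path, and variable name fetch_result are hypothetical. It relies on the registered result of an until loop carrying an attempts counter.

```yaml
# Hypothetical URL and paths, for illustration only.
- name: Fetch package index
  ansible.builtin.get_url:
    url: https://example.com/index.json
    dest: /tmp/index.json
  register: fetch_result
  until: fetch_result is succeeded   # stop retrying as soon as the task succeeds
  retries: 5                         # give up after 5 retries
  delay: 3                           # wait 3 seconds between attempts

- name: Show how many attempts were needed
  ansible.builtin.debug:
    msg: "Succeeded after {{ fetch_result.attempts }} attempt(s)"
```

Testing with "is succeeded" retries on any failure of the module, while a comparison such as status == 200 retries only until that specific field matches.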

Assert for Validation

Validate inputs with clear error messages:

    - name: Validate required variables
      ansible.builtin.assert:
        that:
          - vm_name is defined
          - vm_name | length > 0
          - vm_memory >= 1024
          - vm_cores >= 1
        fail_msg: |
          Invalid VM configuration:
          - vm_name: {{ vm_name | default('NOT SET') }}
          - vm_memory: {{ vm_memory | default('NOT SET') }} (min: 1024)
          - vm_cores: {{ vm_cores | default('NOT SET') }} (min: 1)
        success_msg: "VM configuration validated"
        quiet: true

Common Assertions

    # Variable defined and non-empty
    - vm_name is defined and vm_name | trim | length > 0

    # Numeric range
    - vm_memory >= 1024 and vm_memory <= 65536

    # Regex match
    - vm_name is match('^[a-z0-9-]+$')

    # List has items
    - vm_networks | length > 0

    # Value in allowed list
    - vm_ostype in ['l26', 'win10', 'win11']

Fail with Context

Provide actionable error messages:

    - name: Check prerequisites
      ansible.builtin.command: which docker
      register: docker_check
      changed_when: false
      failed_when: false

    - name: Fail if Docker not installed
      ansible.builtin.fail:
        msg: |
          Docker is not installed on {{ inventory_hostname }}.

          To install Docker:
            sudo apt update
            sudo apt install docker.io

          Or use the docker role:
            ansible-playbook playbooks/install-docker.yml
      when: docker_check.rc != 0

Graceful Failure Handling

Allow expected "failures":

    - name: Try to stop service
      ansible.builtin.systemd:
        name: myservice
        state: stopped
      register: stop_result
      failed_when:
        - stop_result.failed
        - "'not found' not in stop_result.msg"
      # Only fail if error is NOT "service not found"

Multiple Acceptable Conditions

    - name: Join cluster
      ansible.builtin.command: pvecm add {{ primary_node }}
      register: cluster_join
      # List items are ANDed: fail only when all three conditions hold
      failed_when:
        - cluster_join.rc != 0
        - "'already in a cluster' not in cluster_join.stderr"
        - "'cannot join' not in cluster_join.stderr"
      changed_when: cluster_join.rc == 0

Check Before Fail

Separate checking from failing for better control:

    - name: Check if resource exists
      ansible.builtin.command: check-resource {{ resource_id }}
      register: resource_check
      changed_when: false
      failed_when: false  # Don't fail here

    - name: Fail with context if missing
      ansible.builtin.fail:
        msg: |
          Resource {{ resource_id }} not found.
          Command output: {{ resource_check.stderr }}
          Hint: Ensure resource was created first.
      when: resource_check.rc != 0

Error Recovery Pattern

Attempt operation, handle specific errors:

    - name: Attempt primary approach
      block:
        - name: Connect via primary endpoint
          ansible.builtin.uri:
            url: "https://{{ primary_host }}:8006/api2/json"
            validate_certs: true
          register: primary_result

      rescue:
        - name: Log primary failure
          ansible.builtin.debug:
            msg: "Primary endpoint failed: {{ primary_result.msg | default('unknown error') }}"

        - name: Try fallback endpoint
          ansible.builtin.uri:
            url: "https://{{ fallback_host }}:8006/api2/json"
            validate_certs: false
          register: fallback_result

Delegate Error Handling

Run checks from the controller for better error context:

    - name: Verify API endpoint from controller
      ansible.builtin.uri:
        url: "https://{{ inventory_hostname }}:8006/api2/json/version"
        validate_certs: false
      delegate_to: localhost
      register: api_check
      failed_when: false

    - name: Report API status
      ansible.builtin.fail:
        msg: |
          Cannot reach Proxmox API on {{ inventory_hostname }}
          Status: {{ api_check.status | default('connection failed') }}
          Check: Network connectivity, firewall rules, pveproxy service
      when: api_check.status | default(0) != 200

Ignore Errors (Use Sparingly)

    - name: Remove optional backup
      ansible.builtin.file:
        path: /backup/old-backup.tar.gz
        state: absent
      ignore_errors: true
      register: cleanup_result

    - name: Report cleanup status
      ansible.builtin.debug:
        msg: "Cleanup {{ 'successful' if not cleanup_result.failed else 'skipped' }}"

When ignore_errors is Acceptable

  • Non-critical cleanup tasks

  • Optional operations that shouldn't block the playbook

  • When the result is immediately checked anyway

Prefer failed_when

    # BETTER than ignore_errors
    - name: Remove backup
      ansible.builtin.file:
        path: /backup/old-backup.tar.gz
        state: absent
      register: cleanup_result
      failed_when:
        - cleanup_result.failed
        - "'does not exist' not in cleanup_result.msg | default('')"

Complete Example


    - name: Deploy with comprehensive error handling
      hosts: app_servers
      become: true

      tasks:
        - name: Validate configuration
          ansible.builtin.assert:
            that:
              - app_version is defined
              # Dots are escaped so only literal "." separators match
              - app_version is match('^\d+\.\d+\.\d+$')
            fail_msg: "Invalid app_version: {{ app_version | default('NOT SET') }}"

        - name: Deploy application
          block:
            - name: Download release
              ansible.builtin.get_url:
                url: "https://releases.example.com/{{ app_version }}.tar.gz"
                dest: /tmp/app.tar.gz
              register: download
              until: download is succeeded
              retries: 3
              delay: 5

            - name: Stop current version
              ansible.builtin.systemd:
                name: myapp
                state: stopped

            - name: Extract release
              ansible.builtin.unarchive:
                src: /tmp/app.tar.gz
                dest: /opt/myapp
                remote_src: true

            - name: Start new version
              ansible.builtin.systemd:
                name: myapp
                state: started

            - name: Verify health
              ansible.builtin.uri:
                url: http://localhost:8080/health
              register: health
              until: health.status == 200
              retries: 6
              delay: 10

          rescue:
            - name: Restore previous version
              ansible.builtin.copy:
                src: /opt/myapp-backup/
                dest: /opt/myapp/
                remote_src: true

            - name: Start previous version
              ansible.builtin.systemd:
                name: myapp
                state: started

            - name: Report deployment failure
              ansible.builtin.fail:
                msg: |
                  Deployment of {{ app_version }} failed.
                  Previous version restored.
                  Check logs: journalctl -u myapp

          always:
            - name: Cleanup download
              ansible.builtin.file:
                path: /tmp/app.tar.gz
                state: absent

Additional Resources

For detailed error handling patterns and techniques, consult:

  • references/error-handling.md - Comprehensive error handling patterns, block/rescue/always examples, retry strategies

Related Skills

  • ansible-idempotency - changed_when/failed_when patterns

  • ansible-fundamentals - Core Ansible concepts

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
