ARC Creator
Create FAIR Digital Objects following the nfdi4plants ARC specification v3.0.0.
Prerequisites
- git and git-lfs installed
- ARC Commander CLI at ~/bin/arc (optional but recommended)
- For DataHUB sync: Personal Access Token for git.nfdi4plants.org or datahub.hhu.de
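The prerequisites can be checked before starting. The following is a minimal sketch, not part of the skill's own scripts: the function name check_prereqs is hypothetical, and it only reports what it finds without installing anything.

```shell
# Report which prerequisites are available.
# Assumption: the ARC Commander binary lives at ~/bin/arc, as listed above.
check_prereqs() {
  status=0
  for tool in git git-lfs; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1                      # required tool not found
    fi
  done
  if [ -x "$HOME/bin/arc" ]; then
    echo "ok: arc"
  else
    echo "optional, not found: ~/bin/arc"
  fi
  return "$status"
}
```

Calling check_prereqs prints one line per tool and returns non-zero only when git or git-lfs is missing, since the ARC Commander is optional.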
Interactive ARC Creation Workflow
Guide the user through these phases in order. Ask questions conversationally — don't dump all questions at once. Batch 2-4 related questions per message.
Phase 1: Investigation Setup
Ask the user:
- Investigation identifier (short, lowercase-hyphenated, e.g. cold-stress-arabidopsis)
- Title (concise name for the investigation)
- Description (textual description of the research goals)
- Where to store the ARC locally (suggest /home/uranus/arc-projects/<identifier>/)
Then run scripts/create_arc.sh <path> <identifier> and set investigation metadata via:
arc investigation update -i "<id>" --title "<title>" --description "<desc>"
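Before passing an identifier to any command, it may help to check that it is short and lowercase-hyphenated. The sketch below shows one possible check; the function name is_valid_identifier and the exact rule (lowercase letters, digits, single hyphens between words) are assumptions, and the ARC specification may permit more.

```shell
# Accept only lowercase letters, digits, and single hyphens between words.
# Assumption: this rule is stricter than the specification may require.
is_valid_identifier() {
  case "$1" in
    ""|-*|*-|*--*) return 1 ;;   # empty, leading/trailing hyphen, or doubled hyphen
    # Characters are listed explicitly to avoid locale-dependent ranges.
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `is_valid_identifier cold-stress-arabidopsis` succeeds, while `is_valid_identifier "Cold Stress"` fails and the user can be asked for a different identifier.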
Phase 2: Studies
For each study, ask:
- Study identifier (e.g. plant-growth)
- Title and description
- Organism (for Characteristic [Organism])
- Growth conditions (temperature, light, medium, etc.)
- Source materials (what goes in — seeds, cell lines, etc.)
- Sample materials (what comes out — leaves, roots, extracts, etc.)
- Protocols — does the user have protocol documents to include?
- Factors — what experimental variables are being tested? (e.g., temperature, genotype, treatment)
Create with:
arc study init --studyidentifier "<id>"
arc study update --studyidentifier "<id>" --title "<title>" --description "<desc>"
Copy protocol files to studies/<id>/protocols/.
Copy resource files to studies/<id>/resources/.
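The two copy steps can be wrapped in a small helper. This is a sketch under assumptions: the function name add_study_files is hypothetical, and it creates the target folder with mkdir -p in case it does not exist yet.

```shell
# Copy files into studies/<id>/protocols/ or studies/<id>/resources/.
# Usage: add_study_files <arc-root> <study-id> <protocols|resources> <file>...
add_study_files() {
  arc_root=$1; study_id=$2; kind=$3
  shift 3
  case "$kind" in
    protocols|resources) ;;
    *) echo "kind must be 'protocols' or 'resources'" >&2; return 2 ;;
  esac
  dest="$arc_root/studies/$study_id/$kind"
  mkdir -p "$dest" || return 1      # harmless if the folder already exists
  for f in "$@"; do
    cp -- "$f" "$dest/" || return 1
  done
}
```

A call such as `add_study_files "$ARC" plant-growth protocols growth-protocol.md` would place the file under studies/plant-growth/protocols/; the file name here is only an illustration.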
Phase 3: Assays
For each assay, ask:
- Assay identifier (e.g. proteomics-ms, rnaseq, sugar-measurement)
- Measurement type (e.g., protein expression profiling, transcription profiling, metabolite profiling)
- Technology type (e.g., mass spectrometry, nucleotide sequencing, plate reader)
- Technology platform (e.g., Illumina NovaSeq, Bruker timsTOF)
- Data files: where are the raw data files? (will go into assays/<id>/dataset/)
- Processed data: any processed output files?
- Protocols: assay-specific protocols?
- Performers: who performed this assay? (name, affiliation, role)
Create with:
arc assay init -a "<id>" --measurementtype "<type>" --technologytype "<tech>"
Copy data to assays/<id>/dataset/, protocols to assays/<id>/protocols/.
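Because assay data is treated as immutable once placed, a copy helper can refuse to overwrite anything already in the dataset folder. The following is one possible sketch; the function name add_assay_data is hypothetical, and refusing on name collision is a design assumption rather than something the ARC Commander enforces.

```shell
# Copy raw data into assays/<id>/dataset/, refusing to overwrite existing files.
# Usage: add_assay_data <arc-root> <assay-id> <file>...
add_assay_data() {
  arc_root=$1; assay_id=$2
  shift 2
  dest="$arc_root/assays/$assay_id/dataset"
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    target="$dest/${f##*/}"         # keep only the file name of the source path
    if [ -e "$target" ]; then
      echo "refusing to overwrite $target" >&2
      return 1
    fi
    cp -- "$f" "$target" || return 1
  done
}
```

A second call with the same file name fails and leaves the first copy untouched, which matches the rule that dataset files are never modified after initial placement.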
Phase 4: Workflows (optional)
Ask if there are computational analysis steps. For each:
- Workflow identifier (e.g. deseq2-analysis, heatmap-generation)
- Description of what it does
- Code files (scripts, notebooks)
- Dependencies (Python packages, R libraries, Docker image)
Place code in workflows/<id>/.
Note: workflow.cwl is REQUIRED by the spec but is often created later. Tell the user when it is still missing.
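To know which workflows the user still needs to be told about, the workflow folders can be scanned for a missing workflow.cwl. This is a minimal sketch; the function name missing_cwl is hypothetical.

```shell
# Print every workflow folder that does not yet contain a workflow.cwl file.
# Usage: missing_cwl <arc-root>
missing_cwl() {
  for dir in "$1"/workflows/*/; do
    [ -d "$dir" ] || continue                     # glob matched nothing
    [ -f "${dir}workflow.cwl" ] || echo "${dir%/}"
  done
  return 0
}
```

An empty result means every workflow folder already has its workflow.cwl; otherwise each printed path is a workflow to mention to the user.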
Phase 5: Runs (optional)
Ask if there are computation outputs. For each:
- Run identifier
- Which workflow produced it
- Output files (figures, tables, processed data)
Place outputs in runs/<id>/.
Phase 6: Contacts & Publications
Ask:
- Investigation contacts (name, email, affiliation, role — at minimum the PI)
- Publications (if any — DOI, PubMed ID, title, authors)
Add via:
arc investigation person register --lastname "<last>" --firstname "<first>" --email "<email>" --affiliation "<aff>"
Phase 7: Git Commit & DataHUB Sync
- Configure git user:
git config user.name "<name>"
git config user.email "<email>"
- Commit:
git add -A
git commit -m "Initial ARC: <investigation title>"
- Ask if the user wants to push to a DataHUB. If yes:
  - Ask which host (git.nfdi4plants.org, datahub.hhu.de, etc.)
  - Create remote repo (via browser or API)
  - Set remote and push
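The commit and push steps could be scripted roughly as follows. This is a sketch under assumptions: the function names commit_arc and push_arc are hypothetical, the ARC root is already a git repository, and the remote repository already exists at the URL passed in. Authentication with the Personal Access Token is left to git's usual credential handling.

```shell
# Commit the whole ARC once, using a repository-local git identity.
# Usage: commit_arc <arc-root> <user-name> <user-email> <investigation-title>
commit_arc() {
  (
    cd "$1" || exit 1
    git config user.name "$2" &&
    git config user.email "$3" &&
    git add -A &&
    git commit -q -m "Initial ARC: $4"
  )
}

# Point "origin" at a DataHUB repository and push the current branch.
# Assumption: <remote-url> is the HTTPS URL of an existing, empty repository.
# Usage: push_arc <arc-root> <remote-url>
push_arc() {
  (
    cd "$1" || exit 1
    git remote add origin "$2" &&
    git push -u origin HEAD
  )
}
```

Each function runs in a subshell, so the caller's working directory is unchanged. push_arc should only be called after the user has confirmed the host and created the remote repository.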
ISA Metadata Reference
For detailed ISA-XLSX fields, annotation table columns, and ontology references, read references/arc-spec.md.
Key Reminders
- Assay data is immutable: never modify files in assays/<id>/dataset/ after initial placement
- Studies describe materials, assays describe measurements
- Workflows are code, runs are outputs
- Git LFS for files > 100 MB: git lfs track "*.fastq.gz" "*.bam" "*.raw"
- Don't store ARCs on OneDrive/Dropbox: Git + cloud sync causes conflicts
- ARC Commander CLI reference: arc <subcommand> --help
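To decide which patterns to hand to git lfs track, it can help to list the files that exceed the size threshold first. The sketch below is one way to do that; the function name list_large_files is hypothetical, and it treats 1 MB as 1024 × 1024 bytes, which is an assumption about how the 100 MB limit is meant.

```shell
# List files above a size threshold (default 100 MB) that may need Git LFS.
# find measures -size in 512-byte blocks, so 1 MB corresponds to 2048 blocks.
# Usage: list_large_files <arc-root> [min-megabytes]
list_large_files() {
  min_blocks=$(( ${2:-100} * 2048 ))
  find "$1" -type f -size +"$min_blocks" ! -path '*/.git/*'
}
```

The extensions of the files it prints can then be tracked with git lfs track before the first commit.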