Gong Junmin 6d467e4b50
test(api): update auto-label alias expectation
2026-06-26 09:39:11 +08:00
.claude/skills Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
.githooks chore(workflow): enforce independent PR branches with pre-push guard (#595) 2026-02-16 07:30:19 +08:00
.github feat(docker): add generic Dockerfile and CI auto-build workflow (#1211) 2026-05-18 21:54:32 +08:00
acestep test(api): update auto-label alias expectation 2026-06-26 09:39:11 +08:00
assets docs: add logo 2026-04-02 19:06:18 +08:00
docs docs(inference): clarify turbo guidance_scale/shift handling (#1241) 2026-06-26 09:36:53 +08:00
examples fix: example duration 2026-02-03 17:45:27 +08:00
openrouter feat: add codes for openrouter_adapter 2026-03-01 19:34:55 +09:00
scripts feat(flow-edit): redesign as overlay; supersedes #1165 + #1167 (#1169) 2026-04-30 21:48:22 +08:00
.dockerignore fix(docker): keep uv.lock in build context for uv sync --frozen (#1212) 2026-05-18 21:59:14 +08:00
.editorconfig Add editorconfig and fix training handler encoding 2026-02-07 22:51:33 +08:00
.env.example feat: add ACESTEP_CHECKPOINTS_DIR for shared model storage across installations (#1056) 2026-04-07 17:07:36 +08:00
.gitignore fix(build): track uv.lock in git for reproducible Docker builds (#1213) 2026-05-18 22:03:43 +08:00
AGENTS.md refractor: remove unused imports and variables 2026-02-28 22:14:05 +08:00
check_update.bat fix: check_update.bat 2026-02-10 15:55:23 +08:00
check_update.sh fix: replace Bash 4.0+ uppercase syntax for macOS compatibility 2026-04-07 16:37:49 +00:00
cli.py fix: unify checkpoint resolution to respect ACESTEP_CHECKPOINTS_DIR (#1057) 2026-04-07 17:41:05 +08:00
close_api_server.sh fix cfg kv block allocate 2026-02-02 02:13:31 +00:00
CONTRIBUTING.md Fix spelling errors across codebase (#786) 2026-03-07 15:03:46 +08:00
docker-compose.jetson.yml feat(jetson): build FFmpeg 7 + torchcodec from source, add bitsandbytes 2026-03-07 10:50:13 -06:00
docker-compose.yml feat(docker): add generic Dockerfile and CI auto-build workflow (#1211) 2026-05-18 21:54:32 +08:00
Dockerfile feat(docker): add generic Dockerfile and CI auto-build workflow (#1211) 2026-05-18 21:54:32 +08:00
Dockerfile.jetson feat(jetson): build FFmpeg 7 + torchcodec from source, add bitsandbytes 2026-03-07 10:50:13 -06:00
generate_examples.py add model zoo 2026-01-24 10:01:18 +00:00
install_uv.bat feat: Add the enforced use of lm models to avoid lm loads that are skipped due to gpu memory optimization, but are not recommended due to oom 2026-02-05 16:10:25 +08:00
install_uv.sh feat: update start_script 2026-02-09 19:07:49 +08:00
LICENSE Update LICENSE 2026-01-16 16:14:24 +08:00
merge_config.bat fix: bat error 2026-02-05 12:41:15 +08:00
merge_config.sh feat: update start_script 2026-02-09 19:07:49 +08:00
package.json docs: use ACE-Step logo, add Ecosystem page with auto-synced awesome list 2026-03-06 22:50:46 +08:00
profile_inference.py Merge remote-tracking branch 'origin/main' into feat/MLX-DiT-pre-complie-model 2026-02-13 22:32:18 -05:00
proxy_config.txt.example feat: add windows start bat 2026-02-05 02:40:43 +08:00
pyproject.toml fix: stale DiT instruction after Simple Mode sample creation + add setuptools dep (#1194) 2026-05-18 21:39:40 +08:00
quick_test.bat Fix spelling errors across codebase (#786) 2026-03-07 15:03:46 +08:00
quick_test.sh feat: update start_script 2026-02-09 19:07:49 +08:00
README-XPU.md Add Intel XPU Support (Arc GPU) + Fix Audio Loading (#746) 2026-03-03 09:52:45 +08:00
README.md docs: remove static studio UI (#1178) 2026-05-01 16:10:03 +08:00
requirements-rocm-linux.txt Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
requirements-rocm.txt Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
requirements-sidestep.txt feat: add Side-Step training v2 (corrected LoRA fine-tuning) 2026-02-12 13:00:14 +01:00
requirements-xpu.txt Merge branch 'main' into fix_xpu_req 2026-02-13 22:56:23 +08:00
requirements.txt Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
run_api_server.sh add api 2025-12-24 04:01:34 +00:00
run_generate_test.py fix(mlx): add repaint step-injection and boundary blend to MLX diffusion path (#1197) 2026-05-11 19:31:33 +08:00
run_openrouter_api_server.sh change modelname and modelid 2026-01-31 16:05:29 +00:00
SECURITY.md docs: add responsible disclosure security policy 2026-02-06 13:43:05 +08:00
setup_xpu.bat Add Intel XPU Support (Arc GPU) + Fix Audio Loading (#746) 2026-03-03 09:52:45 +08:00
start_api_server.bat Preserve legacy torch fix in API launcher 2026-03-13 23:49:34 +00:00
start_api_server.sh Tighten launcher and CUDA compat guards 2026-03-13 23:18:28 +00:00
start_api_server_macos.sh Fix spelling errors across codebase (#786) 2026-03-07 15:03:46 +08:00
start_api_server_rocm.bat Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
start_api_server_rocm.sh Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
start_api_server_xpu.bat Add Intel XPU Support (Arc GPU) + Fix Audio Loading (#746) 2026-03-03 09:52:45 +08:00
start_gradio_ui.bat Use target CUDA device for bf16 checks 2026-03-14 00:22:53 +00:00
start_gradio_ui.sh fix: resolve API crash on null duration, generation corruption, and locale parsing 2026-03-20 18:24:48 +08:00
start_gradio_ui_macos.sh [FIX] Start Gradio Script - variable language not bound error (#920) 2026-03-22 00:08:08 +08:00
start_gradio_ui_macos_manual.sh fix: resolve API crash on null duration, generation corruption, and locale parsing 2026-03-20 18:24:48 +08:00
start_gradio_ui_manual.bat Fix spelling errors across codebase (#786) 2026-03-07 15:03:46 +08:00
start_gradio_ui_manual.sh fix: resolve API crash on null duration, generation corruption, and locale parsing 2026-03-20 18:24:48 +08:00
start_gradio_ui_rocm.bat Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
start_gradio_ui_rocm.sh fix: resolve API crash on null duration, generation corruption, and locale parsing 2026-03-20 18:24:48 +08:00
start_gradio_ui_rocm_manual.bat Revert "(feat) Fully customized in house vllm " (#874) 2026-03-19 23:07:51 +08:00
start_gradio_ui_rocm_manual.sh fix: resolve API crash on null duration, generation corruption, and locale parsing 2026-03-20 18:24:48 +08:00
start_gradio_ui_xpu.bat Add Intel XPU Support (Arc GPU) + Fix Audio Loading (#746) 2026-03-03 09:52:45 +08:00
start_gradio_ui_xpu_manual.bat Add Intel XPU Support (Arc GPU) + Fix Audio Loading (#746) 2026-03-03 09:52:45 +08:00
test_env_detection.bat Fix spelling errors across codebase (#786) 2026-03-07 15:03:46 +08:00
test_env_detection.sh feat: update start_script 2026-02-09 19:07:49 +08:00
test_git_update.bat fix: bat error 2026-02-05 12:41:15 +08:00
test_git_update.sh feat: update script 2026-02-09 19:08:05 +08:00
train.py feat: Side-Step -- corrected LoRA/LoKR fine-tuning with interactive wizard (#557) 2026-02-15 07:54:46 +08:00
uv.lock fix(build): track uv.lock in git for reproducible Docker builds (#1213) 2026-05-18 22:03:43 +08:00

ACE-Step 1.5

Pushing the Boundaries of Open-Source Music Generation

ACEMusic | Project | Hugging Face | ModelScope | Space Demo | Discord | Technical Report | Awesome ACE-Step

📰 News

🎵 Want a faster & more stable experience? Try acemusic.ai — 100% free!

  • [2026-04-02] 🎉 ACE-Step 1.5 XL (4B DiT) Released! — We introduce the XL series with a 4B-parameter DiT decoder for higher audio quality. Three variants available: xl-base, xl-sft, xl-turbo. Requires ≥12GB VRAM (with offload), ≥20GB recommended. All LM models fully compatible. See Model Zoo for details.

📝 Abstract

🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast—under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸

Features

ACE-Step Framework

Performance

  • Ultra-Fast Generation — Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)
  • Flexible Duration — Supports 10 seconds to 10 minutes (600s) audio generation
  • Batch Generation — Generate up to 8 songs simultaneously
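
The duration and batch limits listed above can be checked before a request is submitted. The following is a minimal sketch under stated assumptions: `GenerationRequest` and `validate_request` are hypothetical names used for illustration and are not part of the ACE-Step API; only the numeric limits come from the list above.

```python
from dataclasses import dataclass

MIN_DURATION_S = 10   # lower bound from "10 seconds"
MAX_DURATION_S = 600  # upper bound from "10 minutes (600s)"
MAX_BATCH_SIZE = 8    # from "up to 8 songs simultaneously"


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request container (not an ACE-Step class)."""
    duration_s: float
    batch_size: int = 1


def validate_request(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request is in range."""
    problems = []
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {req.duration_s}s is outside "
            f"{MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if not 1 <= req.batch_size <= MAX_BATCH_SIZE:
        problems.append(
            f"batch size {req.batch_size} is outside 1-{MAX_BATCH_SIZE}"
        )
    return problems
```

Calling `validate_request(GenerationRequest(duration_s=30, batch_size=2))` returns an empty list, while a 5-second request with a batch of 9 reports both problems.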

🎵 Generation Quality

  • Commercial-Grade Output — Quality beyond most commercial music models (between Suno v4.5 and Suno v5)
  • Rich Style Support — 1000+ instruments and styles with fine-grained timbre description
  • Multi-Language Lyrics — Supports 50+ languages with lyrics prompt for structure & style control

🎛️ Versatility & Control

| Feature | Description |
|---|---|
| Reference Audio Input | Use reference audio to guide generation style |
| Cover Generation | Create covers from existing audio |
| Repaint & Edit | Selective local audio editing and regeneration |
| Track Separation | Separate audio into individual stems |
| Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| Metadata Control | Control duration, BPM, key/scale, time signature |
| Simple Mode | Generate full songs from simple descriptions |
| Query Rewriting | Auto LM expansion of tags and lyrics |
| Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| LRC Generation | Auto-generate lyric timestamps for generated music |
| LoRA Training | One-click annotation & training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| Quality Scoring | Automatic quality assessment for generated audio |

🔔 Staying ahead

Star ACE-Step on GitHub and be instantly notified of new releases

🤝 Partners

ComfyUI Zilliz Milvus Zeabur Majik's Music Studio

Quick Start

🎵 Don't want to install locally? Try acemusic.ai — 100% free, no GPU required!

Requirements: Python 3.11-3.12, CUDA GPU recommended (also supports MPS / ROCm / Intel XPU / CPU)

Note: ROCm on Windows requires Python 3.12 (AMD officially provides Python 3.12 wheels only)
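
The version requirement above can be verified before installing. This is a small sketch, not part of the repository: `is_supported_python` and its `rocm_on_windows` flag are hypothetical names, and the accepted versions are taken only from the two lines above.

```python
import sys

# Versions named in the requirements line: Python 3.11 and 3.12.
SUPPORTED = {(3, 11), (3, 12)}
# ROCm on Windows is documented as Python 3.12 only.
SUPPORTED_ROCM_WINDOWS = {(3, 12)}


def is_supported_python(version=None, rocm_on_windows=False):
    """Check a (major, minor, ...) tuple; defaults to the running interpreter."""
    major, minor = tuple(version or sys.version_info)[:2]
    allowed = SUPPORTED_ROCM_WINDOWS if rocm_on_windows else SUPPORTED
    return (major, minor) in allowed
```

For example, `is_supported_python((3, 11, 4))` is true, but the same version with `rocm_on_windows=True` is rejected.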

# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh          # macOS / Linux
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows

# 2. Clone & install
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync

# 3. Launch Gradio UI (models auto-download on first run)
uv run acestep

# Or launch REST API server
uv run acestep-api

Open http://localhost:7860 (Gradio) or http://localhost:8001 (API).
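
If a page does not load, it can help to confirm that something is listening on the expected port. The sketch below only tests TCP reachability with the standard library; it makes no assumption about ACE-Step endpoints. The function name `is_listening` is hypothetical, and the port numbers are the defaults quoted above.

```python
import socket

# Default ports quoted above; change them if you set a custom port.
DEFAULT_PORTS = {"gradio": 7860, "api": 8001}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Usage: `is_listening("localhost", DEFAULT_PORTS["gradio"])` should be true once the Gradio UI has finished starting. A false result during the first run may simply mean the models are still downloading.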

📦 Windows users: A portable package with pre-installed dependencies is available. See Installation Guide.

📦 macOS users: A portable package with pre-installed dependencies is available. See Installation Guide.

📖 Full installation guide (AMD/ROCm, Intel GPU, CPU, environment variables, command-line options): English | 中文 | 日本語

💡 Which Model Should I Choose?

| Your GPU VRAM | Recommended DiT | Recommended LM Model | Backend | Notes |
|---|---|---|---|---|
| ≤6GB | 2B turbo | None (DiT only) | | LM disabled by default; INT8 quantization + full CPU offload |
| 6-8GB | 2B turbo | acestep-5Hz-lm-0.6B | pt | Lightweight LM with PyTorch backend |
| 8-16GB | 2B turbo/sft | acestep-5Hz-lm-0.6B / 1.7B | vllm | 0.6B for 8-12GB, 1.7B for 12-16GB |
| 16-20GB | 2B sft or XL turbo | acestep-5Hz-lm-1.7B | vllm | XL requires CPU offload below 20GB |
| 20-24GB | XL turbo/sft | acestep-5Hz-lm-1.7B | vllm | XL fits without offload; 4B LM available |
| ≥24GB | XL sft (or xl-base for extract/lego/complete) | acestep-5Hz-lm-4B | vllm | Best quality, all models fit without offload |

XL (4B) models (acestep-v15-xl-*) offer higher audio quality with ~9GB VRAM for weights (vs ~4.7GB for 2B). They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible with XL.

The UI automatically selects the best configuration for your GPU. All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.
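
To make the tier table concrete, here is an illustrative lookup that mirrors its rows. It is a sketch only and is not the selection logic the UI actually runs. `Recommendation` and `recommend` are hypothetical names, and the handling of exact boundary values (for example, whether 8GB falls in the 6-8GB or the 8-16GB row) is an assumption, since the table's ranges overlap at their edges.

```python
from typing import NamedTuple, Optional


class Recommendation(NamedTuple):
    dit: str
    lm: Optional[str]       # None means "DiT only"
    backend: Optional[str]  # None where the table lists no backend


def recommend(vram_gb: float) -> Recommendation:
    """Map available VRAM (GB) to the matching row of the table above.

    Assumption: each range includes its lower bound, except the first
    row, which includes 6GB exactly.
    """
    if vram_gb <= 6:
        return Recommendation("2B turbo", None, None)
    if vram_gb < 8:
        return Recommendation("2B turbo", "acestep-5Hz-lm-0.6B", "pt")
    if vram_gb < 12:
        return Recommendation("2B turbo/sft", "acestep-5Hz-lm-0.6B", "vllm")
    if vram_gb < 16:
        return Recommendation("2B turbo/sft", "acestep-5Hz-lm-1.7B", "vllm")
    if vram_gb < 20:
        return Recommendation("2B sft or XL turbo", "acestep-5Hz-lm-1.7B", "vllm")
    if vram_gb < 24:
        return Recommendation("XL turbo/sft", "acestep-5Hz-lm-1.7B", "vllm")
    return Recommendation("XL sft", "acestep-5Hz-lm-4B", "vllm")
```

For a 10GB card, `recommend(10)` yields the 2B turbo/sft DiT with the 0.6B LM on the vllm backend, matching the note "0.6B for 8-12GB".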

📖 GPU compatibility details: English | 中文 | 日本語 | 한국어

🚀 Launch Scripts

Ready-to-use launch scripts for all platforms with auto environment detection, update checking, and dependency installation.

| Platform | Scripts | Backend |
|---|---|---|
| Windows | start_gradio_ui.bat, start_api_server.bat | CUDA |
| Windows (ROCm) | start_gradio_ui_rocm.bat, start_api_server_rocm.bat | AMD ROCm |
| Linux | start_gradio_ui.sh, start_api_server.sh | CUDA |
| macOS | start_gradio_ui_macos.sh, start_api_server_macos.sh | MLX (Apple Silicon) |

# Windows
start_gradio_ui.bat

# Linux
chmod +x start_gradio_ui.sh && ./start_gradio_ui.sh

# macOS (Apple Silicon)
chmod +x start_gradio_ui_macos.sh && ./start_gradio_ui_macos.sh
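
The platform-to-script mapping above can also be resolved programmatically, for example in a wrapper that launches the right script. This is a hedged sketch: `launch_script` is a hypothetical helper, it covers only the default CUDA and MLX rows of the table, and it does not attempt to detect the ROCm or XPU variants.

```python
import platform

# Keys are the values returned by platform.system().
GRADIO_SCRIPTS = {
    "Windows": "start_gradio_ui.bat",
    "Linux": "start_gradio_ui.sh",
    "Darwin": "start_gradio_ui_macos.sh",
}
API_SCRIPTS = {
    "Windows": "start_api_server.bat",
    "Linux": "start_api_server.sh",
    "Darwin": "start_api_server_macos.sh",
}


def launch_script(kind="gradio", system=None):
    """Return the script name for `kind` ("gradio" or "api") on `system`."""
    system = system or platform.system()
    tables = {"gradio": GRADIO_SCRIPTS, "api": API_SCRIPTS}
    if kind not in tables:
        raise ValueError(f"unknown kind: {kind!r}")
    if system not in tables[kind]:
        raise ValueError(f"no default launch script for platform: {system!r}")
    return tables[kind][system]
```

On macOS, `launch_script()` returns `start_gradio_ui_macos.sh`; on other Unix-like systems not listed in the table it raises `ValueError` rather than guessing.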

⚙️ Customizing Launch Settings

Recommended: Create a .env file to customize models, ports, and other settings. Your .env configuration will survive repository updates.

# Copy the example file
cp .env.example .env

# Edit with your preferred settings
# Examples in .env:
ACESTEP_CONFIG_PATH=acestep-v15-turbo
ACESTEP_LM_MODEL_PATH=acestep-5Hz-lm-1.7B
PORT=7860
LANGUAGE=en

📖 Script configuration & customization: English | 中文 | 日本語
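
For tooling that needs to read the same settings, a `.env` file in the simple `KEY=value` form shown above can be parsed with a few lines of standard-library Python. This is a simplified sketch and not how the launch scripts themselves load the file: `parse_env_text` and `load_env_file` are hypothetical helpers, and they ignore `export` prefixes, inline comments, and multi-line values.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_env_file(path=".env"):
    """Read and parse a .env file; return an empty dict if it does not exist."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))
```

With the example file above, `load_env_file()["ACESTEP_LM_MODEL_PATH"]` would be `acestep-5Hz-lm-1.7B`. All values are returned as strings, so a port such as `PORT=7860` needs an explicit `int(...)` conversion.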

📚 Documentation

Usage Guides

| Method | Description | Documentation |
|---|---|---|
| 🖥️ Gradio Web UI | Interactive web interface for music generation | Guide |
| 🧭 UI Support Baseline | Supported UI boundary and future UI parity checklist | Guide |
| 🎛️ VST3 Plugin | Standalone VST3 plugin (C++/GGML) for DAW integration | acestep.vst3 |
| 🐍 Python API | Programmatic access for integration | Guide |
| 🌐 REST API | HTTP-based async API for services | Guide |
| ⌨️ CLI | Interactive wizard and configuration | Guide |

Setup & Configuration

Topic Documentation
📦 Installation (all platforms) English | 中文 | 日本語
🎮 GPU Compatibility English | 中文 | 日本語
🔧 GPU Troubleshooting English
🔬 Benchmark & Profiling English | 中文

Multi-Language Docs

Language API Gradio Inference Tutorial LoRA Training Install Benchmark
🇺🇸 English Link Link Link Link Link Link Link
🇨🇳 中文 Link Link Link Link Link Link Link
🇯🇵 日本語 Link Link Link Link Link Link
🇰🇷 한국어 Link Link Link Link Link

📖 Tutorial

🎯 Must Read: Comprehensive guide to ACE-Step 1.5's design philosophy and usage methods.

Language Link
🇺🇸 English English Tutorial
🇨🇳 中文 中文教程
🇯🇵 日本語 日本語チュートリアル

This tutorial covers: mental models and design philosophy, model architecture and selection, input control (text and audio), inference hyperparameters, random factors and optimization strategies.

🔨 Train

📖 LoRA Training Tutorial — step-by-step guide covering data preparation, annotation, preprocessing, and training:

Language Link
🇺🇸 English LoRA Training Tutorial
🇨🇳 中文 LoRA 训练教程
🇯🇵 日本語 LoRA トレーニングチュートリアル
🇰🇷 한국어 LoRA 학습 튜토리얼

See also the LoRA Training tab in Gradio UI for one-click training, or Gradio Guide - LoRA Training for UI reference.

🔧 Advanced Training with Side-Step — CLI-based training toolkit with corrected timestep sampling, LoKR adapters, VRAM optimization, gradient sensitivity analysis, and more. See the Side-Step documentation.

🏗️ Architecture

ACE-Step Framework

🦁 Model Zoo

DiT Models

DiT Model Pre-Training SFT RL CFG Step Refer audio Text2Music Cover Repaint Extract Lego Complete Quality Diversity Fine-Tunability Hugging Face
acestep-v15-base 50 Medium High Easy Link
acestep-v15-sft 50 High Medium Easy Link
acestep-v15-turbo 8 Very High Medium Medium Link

XL (4B) DiT Models

XL models use a larger 4B-parameter DiT decoder (~9GB bf16) for higher audio quality. They require ≥12GB VRAM (with offload + quantization) or ≥20GB (without offload). All LM models are fully compatible.

DiT Model Pre-Training SFT RL CFG Step Refer audio Text2Music Cover Repaint Extract Lego Complete Quality Diversity Fine-Tunability Hugging Face
acestep-v15-xl-base 50 High High Easy Link
acestep-v15-xl-sft 50 Very High Medium Easy Link
acestep-v15-xl-turbo 8 Very High Medium Medium Link

LM Models

LM Model Pretrain from Pre-Training SFT RL CoT metas Query rewrite Audio Understanding Composition Capability Copy Melody Hugging Face
acestep-5Hz-lm-0.6B Qwen3-0.6B Medium Medium Weak
acestep-5Hz-lm-1.7B Qwen3-1.7B Medium Medium Medium
acestep-5Hz-lm-4B Qwen3-4B Strong Strong Strong

🔬 Benchmark

ACE-Step 1.5 includes profile_inference.py, a profiling & benchmarking tool that measures LLM, DiT, and VAE timing across devices and configurations.

python profile_inference.py                        # Single-run profile
python profile_inference.py --mode benchmark       # Configuration matrix

📖 Full guide (all modes, CLI options, output interpretation): English | 中文

📜 License & Disclaimer

This project is licensed under the MIT License.

ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.

🔔 Important Notice
The only official website for the ACE-Step project is our GitHub Pages site.
We do not operate any other websites.
🚫 Fake domains include but are not limited to: ac**p.com, a**p.org, a***c.org
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.

🌐 Community & Ecosystem

Check out Awesome ACE-Step — a curated list of community projects, alternative UIs, ComfyUI nodes, cloud deployments, training tools, and more built around ACE-Step.

🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.

📖 Citation

If you find this project useful for your research, please consider citing:

@misc{gong2026acestep,
	title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
	author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
	howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
	year={2026},
	note={GitHub repository}
}