ACE-Step-1.5/scripts
Gong Junmin e34ab47eed
feat(flow-edit): redesign as overlay; supersedes #1165 + #1167 (#1169)
* refactor(flow-edit): redesign as cover-overlay sampler, drop "edit" task (#1156)

The previous design (PRs #1162–#1167) shipped a standalone ``task_type="edit"``
that fed the user's source audio into ``prepare_condition`` for both
branches.  In ACE-Step 1.5 ``prepare_condition`` builds ``context_latents``
from ``src_latents`` (line 1701 of the base modeling), which becomes the
dominant audio-self-conditioning at the decoder.  Both branches receiving
the same source latents meant V_src ≈ V_tar regardless of how we tweaked
the text/lyric encoder inputs — the overlay had no signal to integrate
and produced near-identical output to the baseline at every (n_min, n_max,
n_avg) we tried.

Architecturally correct shape (from user feedback): flow-edit is a
sampler-level technique that layers on top of an existing task, not a
new task.  The cover/cover-nofsq dispatch already pairs ref-audio
context with new caption/lyrics; the overlay simply adds a paired
*source* branch (encoded from new ``flow_edit_source_caption`` /
``flow_edit_source_lyrics``) so V_delta = V_tar - V_src has meaning.
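
A minimal sketch of why identical branches give the overlay nothing to
integrate, assuming the update rule quoted later in this message
(``z_edit_{t-Δt} = z_edit_t + (V_tar − V_src)·Δt``).  The function name
``integrate_delta`` and the toy velocity callables are hypothetical; the
real sampler works on tensors and its step-sign convention may differ.

```python
def integrate_delta(z_edit, v_src_fn, v_tar_fn, timesteps):
    """Euler-integrate z_edit using only the difference V_tar - V_src."""
    z = list(z_edit)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        dt = t_next - t_cur  # signed step (assumption: t runs from 1.0 to 0.0)
        v_src = v_src_fn(z, t_cur)
        v_tar = v_tar_fn(z, t_cur)
        z = [zi + (vt - vs) * dt for zi, vt, vs in zip(z, v_tar, v_src)]
    return z


def toy_velocity(z, t):
    """Stand-in for a model forward pass; not the real DiT."""
    return [0.3 * zi + t for zi in z]


def toy_velocity_shifted(z, t):
    """Same toy model with a constant offset, standing in for a different prompt."""
    return [v + 1.0 for v in toy_velocity(z, t)]


steps = [1.0, 0.75, 0.5, 0.25, 0.0]
unchanged = integrate_delta([1.0, -2.0], toy_velocity, toy_velocity, steps)
moved = integrate_delta([1.0, -2.0], toy_velocity, toy_velocity_shifted, steps)
```

With both branches returning the same velocity, ``unchanged`` equals the
starting latent; only a real difference between the branches moves it.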

Backend changes
---------------
* Drop ``task_type == "edit"`` everywhere (constants, TASK_INSTRUCTIONS,
  inference skip_lm_tasks, generate_music_request _src_audio_required_tasks,
  generate_music guard rails, task_utils generate_instruction).
* Replace ``edit_target_*`` GenerationParams with ``flow_edit_morph`` +
  ``flow_edit_source_caption/lyrics`` + ``flow_edit_n_min/max/avg`` so
  the user's ``caption``/``lyrics`` keep cover's existing target semantics.
* ``service_generate`` builds ``flow_edit_ctx`` only when
  ``flow_edit_morph=True and task_type in (cover, cover-nofsq)``.
* ``_execute_service_generate_diffusion`` dispatches to the new
  ``dispatch_flow_edit_overlay`` (renamed from ``dispatch_flow_edit``)
  when the overlay is active; otherwise the regular cover dispatch runs
  unchanged.
* ``service_generate_flow_edit_target.py`` renamed to
  ``service_generate_flow_edit_source.py`` — same helpers, but now they
  encode the *source* side (the existing payload already carries the
  user's caption/lyrics as the target).
* ``service_generate_flow_edit.py`` rewritten to drive
  ``flow_edit_pipeline.flowedit_generate_audio`` with the freshly
  encoded source side + payload's target side.

UI changes
----------
* ``build_flow_edit_morph_controls()`` adds a Smooth-morph checkbox +
  source caption/lyrics + n_min/n_max/n_avg sliders inside a group that
  ``mode_ui`` toggles visible only on Remix (cover) mode.  v1 leaves the
  controls visible for inspection but does NOT thread them through
  ``generation_run_wiring`` / ``batch_management_*`` yet — the
  end-to-end UI run path lands in the follow-up PR.  Smoke testing
  goes through the Python API for now (see
  ``scripts/flow_edit_overlay_smoke.py``).

Tests
-----
* Rewrite ``service_generate_flow_edit_test.py`` for the overlay shape:
  9 tests now exercise dispatch source-side tokenization, target-side
  pass-through, missing-method guard, device alignment, default-window
  fallbacks, dict-meta parsing, and retake_seed forwarding.
* Update ``_flow_edit_dispatch_test_support.py`` fakes accordingly.
* All 36 flow-edit / pipeline / helper tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(flow-edit): re-target overlay onto text2music + 1.0 default hyperparams (#1156)

User feedback after listening to the cover-overlay v1 outputs:
* drop cover task as the underlying carrier — use text2music instead so
  ``prepare_condition`` produces silence-derived context for both src and
  tar branches (the cleanest condition shape, identical between branches,
  so V_delta is purely text-driven);
* fall back to ACE-Step 1.0's flow-edit defaults: ``n_min=0, n_max=1.0,
  n_avg=1, infer_steps=60``.

Implementation
--------------
* ``flow_edit_pipeline.flowedit_generate_audio`` gains a
  ``ctx_src_latents`` param.  When the caller wants text2music-style
  context but still needs the real ``src_latents`` for ``zt_src``/
  ``zt_tar`` formation in the sampling loop, it passes
  ``ctx_src_latents=silence``.  Both ``prepare_condition`` calls then
  feed silence as both ``hidden_states`` and ``src_latents`` while the
  loop continues to use the real audio for trajectory formation.
* ``service_generate_flow_edit.dispatch_flow_edit_overlay`` rewritten:
  builds the silence tile, passes ``is_covers=zeros`` (text2music),
  and forwards real ``src_latents`` for the loop.
* ``service_generate.flow_edit_ctx`` now activates only on
  ``task_type == "text2music"`` (was cover/cover-nofsq).
* ``generate_music_request._prepare_reference_and_source_audio`` accepts
  ``flow_edit_morph`` so text2music + morph encodes ``src_audio`` instead
  of silently ignoring it.
* ``generate_music`` locks ``audio_duration`` to source audio for the
  morph case and warns if morph is enabled on a non-text2music task.
* ``inference.generate_music`` no longer forces ``src_audio=None`` for
  text2music when ``flow_edit_morph=True``.
* ``flow_edit_overlay_smoke.py`` now drives a single 1.0-default run
  (``n_min=0, n_max=1.0, n_avg=1, infer_steps=60``) on the SFT model.

Tests
-----
* Update fakes/fixtures so ``task_type='text2music'`` and
  ``is_covers=0``; all 36 flow-edit / pipeline / helper tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): slice silence_latent properly when tiling for prepare_condition

The previous expand() call assumed silence_latent was (1, 1, C) but it's
actually (1, available, C) — for example (1, 15000, C) — so expanding
to (bsz, 4000, C) blew up with a shape mismatch.  Mirror
``conditioning_target._get_silence_latent_slice``: slice the first
``seq`` frames if available, otherwise tile and slice.
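
The slice-or-tile rule above can be sketched as follows.  This is an
illustration on plain Python lists, not the repo's tensor code: the
helper names ``silence_slice`` and ``batched_silence`` are hypothetical,
and each frame is modelled as a list of ``C`` floats.

```python
import math


def silence_slice(silence_frames, seq):
    """Return exactly `seq` frames: slice if enough exist, else tile then slice."""
    available = len(silence_frames)
    if available == 0:
        raise ValueError("silence latent has no frames")
    if available >= seq:
        return silence_frames[:seq]
    repeats = math.ceil(seq / available)
    return (silence_frames * repeats)[:seq]


def batched_silence(silence_frames, bsz, seq):
    """Build a (bsz, seq, C)-shaped nested list from a (available, C) source."""
    return [list(silence_slice(silence_frames, seq)) for _ in range(bsz)]


long_source = [[float(i)] for i in range(15000)]  # (15000, C=1)
batch = batched_silence(long_source, bsz=2, seq=4000)

short_source = [[0.0], [1.0], [2.0]]
tiled = silence_slice(short_source, 7)
```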

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(flow-edit): add shift=3.0 vs shift=1.0 A/B for overlay smoke

ACE-Step 1.0's ``FlowMatchEulerDiscreteScheduler`` defaults to shift=3.0,
which skews the schedule toward the high-noise end (denser steps near
t=1, coarser steps near the clean end at t=0).  Our smoke wasn't
setting shift explicitly so it ran at shift=1.0
(uniform).  Add a paired run so we can A/B the two on the same source
and confirm whether the shift mismatch explains the v1 distortion the
user reported.
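
A small sketch for inspecting what a given shift does to the schedule.
It assumes the time-shift mapping commonly used by flow-matching Euler
schedulers, ``sigma' = shift * sigma / (1 + (shift - 1) * sigma)``, on
a uniform grid from 1.0 down to 0.0; ``shifted_sigmas`` is a
hypothetical helper, not a function from the repo or the scheduler.

```python
def shifted_sigmas(num_steps, shift):
    """Uniform sigmas from 1.0 to 0.0, passed through the time-shift mapping."""
    base = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]


def step_sizes(sigmas):
    """Positive gaps between consecutive sigmas, in sampling order."""
    return [a - b for a, b in zip(sigmas[:-1], sigmas[1:])]


uniform = shifted_sigmas(4, 1.0)
shifted = shifted_sigmas(4, 3.0)
uniform_gaps = step_sizes(uniform)
shifted_gaps = step_sizes(shifted)
```

Comparing ``uniform_gaps`` with ``shifted_gaps`` shows where each
setting spends its steps on the same step budget.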

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): wire morph UI to text2music + run-handler chain (#1156)

User found that the Gradio demo didn't surface a working morph control
because the UI pieces weren't actually connected end-to-end:

* ``mode_ui`` made ``flow_edit_morph_group`` visible only on Remix
  mode, but the backend dispatches morph for ``task_type == text2music``
  (Custom mode).  Net: morph never fired.  Fix: gate visibility on
  ``is_custom`` and add Custom to ``show_src_audio`` so users can
  upload the source audio for the overlay.
* ``generation_run_wiring`` didn't pass the 6 ``flow_edit_*`` UI
  components to the handler.  Add them to the click-event ``inputs``.
* ``generate_with_batch_management`` and ``generate_with_progress``
  signatures gain the same 6 params and forward them into the
  ``GenerationParams`` constructor.
* ``generate_music_request`` now hard-errors when ``flow_edit_morph``
  is True without a ``src_audio`` (was a silent no-op).
* ``flow_edit_pipeline._warn_about_disabled_v1_tricks`` log no longer
  references the removed ``task_type='edit'`` — overlay is the right
  noun now.
* Drop the "API preview" caveat from the morph checkbox info text.

Hygiene:
* Add ``*.pkf`` and ``flow_edit_test_outputs/`` / ``retake_test_outputs/``
  to ``.gitignore``; remove the 28 .pkf binaries that a previous
  ``git add -A`` accidentally pushed.

Limitations (follow-up PRs):
* save/restore for the 6 morph params isn't threaded through batch_queue
  / metadata yet — restoring a session resets morph to defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): combine retake + smooth-morph into one accordion (#1155, #1156)

Per user feedback after first manual demo round:

* The two variation knobs (Retake = #1155 noise blend, Smooth morph =
  #1156 V_delta overlay) are conceptually paired — both produce a
  controlled deviation from the seeded baseline.  Stack them in one
  ``gr.Accordion("Variation & Smooth Morph")`` with two top-line
  checkboxes; checking a box reveals only that subsystem's inputs.
* Compact retake — variance + seed sit on one row inside their panel.
* Morph controls (source caption / source lyrics / n_min / n_max /
  n_avg) get their own subpanel with explanatory help text below.
* Both subsystems are now visible in Custom / Remix / Repaint modes;
  ``mode_ui`` gates the outer accordion via ``is_custom or is_cover or
  is_repaint``.  Backend dispatch still honours morph only in Custom
  (text2music) for v1; the morph panel info text flags this.
* Send-to-Remix / Send-to-Repaint pre-fills ``flow_edit_source_caption``
  + ``flow_edit_source_lyrics`` with the previous run's prompt so the
  user can flip on morph and edit the top-level caption / lyrics as
  the target without re-typing the source description.

Implementation
--------------
* New ``generation_tab_variation_morph_controls.py`` (renamed from
  ``generation_tab_retake_controls.py``) houses the combined builder.
* ``build_flow_edit_morph_controls`` removed from
  ``generation_tab_secondary_controls.py`` (back under the 200 LOC cap).
* ``_MODE_UI_OUTPUT_KEYS`` swaps ``flow_edit_morph_group`` →
  ``variation_accordion``; ``mode_ui`` outputs gr.update for the new
  key.  Existing UI tests (``context_test.py``) updated.
* ``send_audio_to_remix`` / ``send_audio_to_repaint`` return shape
  grows by 2 (caption + lyrics for ``flow_edit_source_*``);
  ``results_aux_wiring`` adds them to the click outputs list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): retake/edit panels side-by-side; rename to plain Retake / Edit (#1155, #1156)

Three issues reported after the manual demo round:

1. Panels were stacked vertically — when both Retake and Edit were
   checked the layout collapsed top-to-bottom.  Fix: two ``gr.Column``s
   side-by-side inside a single ``gr.Row``; each checkbox sits on top
   of its own panel, expanded independently.
2. Verbose checkbox labels.  Drop the parentheticals: "Retake (variation)"
   → "Retake"; "Smooth morph (flow-edit)" → "Edit".  Accordion title
   becomes "Retake & Edit".
3. The trailing ``gr.Markdown`` with retake explainer was wider than its
   column and got clipped at the right edge.  Replace with a slim
   ``gr.HTML`` line under the Edit panel only (the morph-only caveat),
   and shorten the per-input ``info=`` blurbs so they fit inside the
   slider/textbox without overlap.

Also: place the accordion right under "LM codes Hints" in the layout so
the variation knobs are next to the source-audio inputs they apply to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): help modal + Copy-from-current button for Retake & Edit (#1155, #1156)

User reported two issues from manual demo:

1. The inline ``gr.HTML`` help block under the Edit panel rendered as a
   black band (z-index / overflow conflict with the surrounding column /
   accordion).  Replace with the project-standard ``create_help_button``
   pattern — a (?) button next to each checkbox that opens a modal with
   full markdown help.  Both Retake and Edit now have their own modals.
2. There was no quick way to bootstrap the Edit ``source caption / lyrics``
   from the user-level fields.  Add a ``📋 Copy from current`` button
   inside the Edit panel that copies ``captions`` → ``flow_edit_source_caption``
   and ``lyrics`` → ``flow_edit_source_lyrics``.  Wired in
   ``generation_text_format_wiring`` next to the existing Format buttons.

Help modal content (en.json):
* ``help.generation_retake`` — explains the variance-preserving sin/cos
  blend (``mixed = cos(v·π/2)·base + sin(v·π/2)·retake``), variance-band
  recommendations, retake_seed semantics, and links issue #1155.
* ``help.generation_edit`` — explains the V_delta integration math
  (``z_edit_{t-Δt} = z_edit_t + (V_tar − V_src)·Δt``), the paired-CFG
  method, recommended hyperparams (Custom mode, shift=3, n_min=0,
  n_max=1, n_avg=1), the Copy-from-current workflow, and cites the
  FlowEdit paper (Kulikov et al., CVPR 2025, arXiv:2412.08629) plus
  issue #1156.
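
The retake blend quoted above can be sketched on plain lists.  The
helper names are hypothetical; the "variance-preserving" property shown
here relies on the assumption that ``base`` and ``retake`` are
independent noise samples with equal variance, in which case the squared
weights summing to 1 keeps the variance unchanged.

```python
import math


def retake_blend(base, retake, variance):
    """mixed = cos(v*pi/2) * base + sin(v*pi/2) * retake, element-wise."""
    w_base = math.cos(variance * math.pi / 2)
    w_retake = math.sin(variance * math.pi / 2)
    return [w_base * b + w_retake * r for b, r in zip(base, retake)]


def blend_weight_energy(variance):
    """Sum of squared blend weights; 1.0 for every variance value."""
    return (math.cos(variance * math.pi / 2) ** 2
            + math.sin(variance * math.pi / 2) ** 2)


same_as_base = retake_blend([1.0, 2.0], [5.0, 6.0], 0.0)
almost_retake = retake_blend([1.0, 2.0], [5.0, 6.0], 1.0)
```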

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): compact Copy button + side-by-side source caption/lyrics in Edit panel

After the previous round the Copy-from-current button was rendering as a
full-width dark grey bar (Gradio expanded the button to fill the column,
so the label was pushed to one side and the empty fill looked like a
broken help block).  Wrap the button in a ``gr.Row`` with ``scale=0`` +
``min_width=180`` so it claims only the space its label needs.

Per follow-up feedback, ``source caption`` and ``source lyrics`` now sit
side-by-side inside the Edit panel — the previous top-bottom stack
wasted vertical space and made the panel scroll for no reason.  Both
textboxes now share a 4-line minimum on a single row and split the
width 50/50 via ``scale=1`` each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ui): drop checkbox ``info`` blurbs; help (?) button is enough

The Gradio-auto ``ⓘ`` icon next to the checkbox label and the (?) help
button were both rendered next to each other, looking like two redundant
info indicators (and one of them was a no-op tooltip).  Drop the short
``info=`` strings on the Retake / Edit checkboxes — the (?) modal carries
the full explanation already.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(ui): make issue references clickable in Retake/Edit help modals

The ``_md_to_html`` renderer already converts ``[text](url)`` markdown
links to anchor tags, but the issue references in the help text were
plain ``Issue #1156`` strings.  Wrap them as proper markdown links
pointing at the GitHub issues so the modal renders them as
``<a href=...>Issue #1156</a>`` (matching the existing FlowEdit paper
arXiv link).
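
For illustration only, a link conversion of this kind can be written
with a single regular expression.  This is not the repo's
``_md_to_html``; ``md_links_to_html`` is a hypothetical stand-in, the
URL is a placeholder, and the sketch does no HTML escaping.

```python
import re

# [text](url): text has no "]", url has no ")" or whitespace.
_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def md_links_to_html(text):
    """Replace markdown links with anchor tags; leave other text untouched."""
    return _LINK_RE.sub(r'<a href="\2">\1</a>', text)


plain = md_links_to_html("See Issue #1156.")
linked = md_links_to_html("See [Issue #1156](https://example.com/issues/1156).")
```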

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ui): drop the outer "Retake & Edit" accordion

The accordion added an unnecessary collapse layer — its title is
self-evident from the two checkboxes underneath, and one extra click
to expand was friction.  Replace with a plain ``gr.Group()`` so the
two checkboxes sit directly in the layout.  Mode-UI visibility still
hides/shows the whole block via the renamed ``variation_group`` key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): warn user when Retake + Think collide (#1155)

Retake's noise-blend variation is only meaningful when every other
condition matches the baseline run.  In particular, with Think on the
LM regenerates audio codes per call, so Retake's variance gets layered
on top of an already-different starting point and the result mixes
"LM drift" with "noise drift" — confusing and rarely what the user
wants.

Two surfaces, one message:

* ``help.generation_retake`` modal gains a "⚠️ Consistency requirement"
  section explaining the seed / Think / knob-locking caveats and the
  recommended A/B-comparison workflow.
* A live ``gr.Markdown`` warning sits inside the Retake panel and
  becomes visible only when both ``retake_enabled`` and ``think_checkbox``
  are simultaneously on.  Wired in
  ``generation_text_format_wiring.py`` next to the existing
  Copy-from-current handler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(ui): document Think→Retake workflow via LM-codes pinning (#1155)

User feedback: when the baseline you want to retake was generated with
Think on, the cleanest way to lock the LM-side starting point is to
copy the result's LM Codes and paste them into the LM Codes Hints
field, then uncheck Think.  Add this workflow:

* ``help.generation_retake`` modal gains a "Recommended workflow for
  retaking a Think-mode result" section walking through the 5 steps
  (open Score & LRC & LM Codes accordion → copy LM Codes → paste into
  LM Codes Hints → uncheck Think → enable Retake / lock seed / set
  retake_seed / adjust variance).
* The inline Retake×Think warning becomes actionable: it now points
  the user at the exact panels and fields they need to use, with the
  full step-by-step left to the (?) modal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(flow-edit): extend overlay to cover / cover-nofsq tasks (#1156)

User reported Edit didn't fire when generating with LM Codes (cover
scenario) on Gradio.  Root cause: the v1 backend gate restricted morph
to ``task_type == "text2music"`` and silently ignored everything else.

Extend the dispatch:

* ``service_generate.flow_edit_ctx`` now activates on
  ``task_type in (text2music, cover, cover-nofsq)``.
* ``dispatch_flow_edit_overlay`` branches on task:
  - ``text2music`` keeps the silence-context behaviour (the verified
    clean text-driven V_delta path).
  - ``cover`` / ``cover-nofsq`` pass the payload's real ``src_latents``
    and ``is_covers`` straight through, so cover's natural LM-codes
    context flows into both branches via ``prepare_condition``'s
    ``is_covers > 0`` lm_hints branch.  Both branches share the same
    codes, so V_delta is still text-driven.
* ``generate_music`` warning relaxed to flag only repaint / extract /
  lego (which need paired-CFG derivation per task shape).
* ``mode_ui`` row 35 comment updated to note Edit's mode coverage.

Help text refreshed:
* ``help.generation_edit`` now describes both text2music and Remix
  paths, and the "ignored on Repaint/Extract/Lego" caveat replaces the
  old "Custom-only" note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(flow-edit): add flowedit_generate_audio delegate to turbo / xl-turbo (#1156)

User hit ``RuntimeError: Flow-edit overlay requires a base DiT variant``
when toggling Edit on a turbo model.  Add the same thin delegate the
4 base variants carry, with one turbo-specific guard:

* Force ``diffusion_guidance_scale=1.0`` because turbo / xl-turbo are
  CFG-distilled — paired-CFG over a delta that the model wasn't trained
  to produce just amplifies noise.  Log an INFO when overriding so the
  user sees why their guidance_scale slider didn't take effect.

Both variants share the exact same delegate body; resisted the urge to
factor it out because the 4 base variants don't have the gs override
and a single shared helper would muddle that.
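
A rough sketch of the guard described above, under stated assumptions:
``base_flowedit`` is a stand-in that just echoes its arguments, and
``turbo_flowedit_delegate`` is a hypothetical name.  Only the
``diffusion_guidance_scale`` override and the INFO log come from the
description; everything else is illustrative.

```python
import logging

logger = logging.getLogger("flow_edit_sketch")


def base_flowedit(**kwargs):
    """Stand-in for the shared flow-edit pipeline call; returns what it received."""
    return kwargs


def turbo_flowedit_delegate(**kwargs):
    """Force guidance scale 1.0 for CFG-distilled variants, then delegate."""
    requested = kwargs.get("diffusion_guidance_scale", 1.0)
    if requested != 1.0:
        logger.info(
            "turbo is CFG-distilled: overriding diffusion_guidance_scale %s -> 1.0",
            requested,
        )
    kwargs["diffusion_guidance_scale"] = 1.0
    return base_flowedit(**kwargs)


forwarded = turbo_flowedit_delegate(diffusion_guidance_scale=7.0, infer_steps=8)
```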

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): drop silence-context override; use task-natural context (#1156)

User reported turbo + Custom + Edit produced pure noise on Gradio.
Root cause: ``dispatch_flow_edit_overlay`` forced ``is_covers=zeros``
and a silence-tiled context for text2music, but turbo's velocity
head was trained on real audio / LM-codes context.  8-step turbo
on silence is OOD — V predictions are unstable and the V_delta
integration over a short schedule accumulates into garbage latents.

Fix: stop overriding the audio context.  Pass the payload's real
``src_latents`` and ``is_covers`` straight through to
``prepare_condition`` for every task type.  Both branches still
share the SAME context (whatever the task naturally built — LM-
codes hints when Think is on / cover task, src-latents auto-
tokenization otherwise), so V_delta is still purely text-driven,
but the velocity head stays in distribution.

Verified by inspection of the user's run log:
* ``Using precomputed LM hints`` printed 3× (LM Phase 1 ran with
  Think on; both flow-edit prepare_condition calls + the downstream
  one for auto-LRC saw the precomputed tensor)
* ``infer_steps=8, guidance_scale=1.00`` → turbo dispatch path
* The forced ``is_covers=zeros`` in our dispatch was discarding the
  precomputed LM-codes hints and falling back to silence-context.

Help text updated to drop the silence-context language and add a
note that turbo's 8-step budget is sufficient for flow-edit (the
V_delta integration uses the same paired-forward count regardless
of variant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): restore silence-context for text2music; keep cover's natural

Empirical sweep (sft60 / tb60 / tb8w05 / tb8s1, all on the user's
provided audio + LM codes) showed every variant collapsed to peak
≈ 0.007 the moment the dispatch passed real ``src_latents`` /
LM-codes context to ``prepare_condition``.  The previous
"transparent payload" fix was wrong: text2music's training
distribution is (clean target, silence audio context), so feeding it
either real source latents or one-sided LM-codes hints pushes the
velocity head OOD — V_delta accumulates noise, the latent collapses,
and VAE decode + auto-normalisation amplifies the residual to full
scale.  That's the "pure noise" the user reports.

Branch dispatch:
* text2music — force silence-context (proven working: peak=1.0 on
  the earlier jieyue sft+60 run that used silence-context).
* cover / cover-nofsq — keep the payload's real ``src_latents`` and
  ``is_covers``.  The cover task IS trained on LM-codes-derived
  audio context shared by both branches, so V_delta stays clean
  while staying in distribution.
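
The branch dispatch above can be sketched as a small selection
function.  The function name, the dict layout, and the
``ctx_src_latents`` key placement are assumptions for illustration; the
real dispatch passes tensors into ``prepare_condition`` and keeps the
real ``src_latents`` for the sampling loop.

```python
def select_overlay_context(task_type, payload, silence_latents):
    """Pick the audio context shared by the source and target branches."""
    real_src = payload["src_latents"]
    if task_type == "text2music":
        # Silence context, is_covers forced to zero for every batch item.
        ctx = {
            "ctx_src_latents": silence_latents,
            "is_covers": [0] * len(payload["is_covers"]),
        }
    elif task_type in ("cover", "cover-nofsq"):
        # Cover keeps its natural context from the payload.
        ctx = {"ctx_src_latents": real_src, "is_covers": payload["is_covers"]}
    else:
        raise ValueError(f"flow-edit overlay not supported for {task_type!r}")
    ctx["src_latents"] = real_src  # the loop still starts from the real audio
    return ctx


payload = {"src_latents": "REAL", "is_covers": [1, 1]}
t2m_ctx = select_overlay_context("text2music", payload, "SILENCE")
cover_ctx = select_overlay_context("cover", payload, "SILENCE")
```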

Documented the diagnostic and the design choice inline so the next
edit doesn't re-do the same loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): drop precomputed LM hints in text2music branch

The empirical sweep showed silence-context worked when ``audio_codes``
was absent (peak=0.92) but collapsed when codes were present
(peak=0.007), even with ``is_covers=0`` forced.  Even though
``prepare_condition``'s ``where(is_covers > 0, lm_hints, src_latents)``
keeps the silence src_latents unmodified, the codes-derived
``lm_hints_25Hz`` tensor itself lingers in downstream tensor paths
and empirically collapses the latent.

Force ``precomputed_lm_hints_25Hz=None`` for the text2music silence-
context branch so ``prepare_condition`` tokenizes silence afresh,
matching the working no-codes path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): strip audio_codes for text2music morph (#1156)

Root cause of the user's "pure noise" repro: when ``audio_codes`` was
present, ``conditioning_target._prepare_target_latents_and_wavs``
replaced ``target_wavs`` with zeros and put
``_decode_audio_codes_to_latents(codes)`` into ``target_latents``.
That decoded-from-codes tensor sits at a different distribution than
VAE-encoded audio, so flow-edit's ``zt_edit = src_latents.clone()``
started OOD and the V_delta integration collapsed to a near-silent
latent (peak ≈ 0.007), which auto-normalisation then amplified to full
scale as audible noise.

Verified by toggling ``USE_CODES`` in the repro script:
  * with codes  → peak 0.0076 (broken)
  * no codes    → peak 0.9258 (clean output)

Strip ``audio_code_string`` for ``text2music`` + ``flow_edit_morph``
specifically.  The downstream pipeline then VAE-encodes the uploaded
mp3 normally, and zt_edit starts at a real audio latent — flow-edit's
math behaves as intended.

Cover / cover-nofsq + morph keeps codes intact (cover IS trained on
codes-derived context, both branches share it, no OOD).
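
To illustrate how a near-silent output becomes audible noise, here is a
sketch of peak normalisation.  It assumes the auto-normalisation step
scales the waveform so its peak reaches 1.0; the actual normalisation in
the pipeline may differ, and both helper names are hypothetical.

```python
def peak(samples):
    """Largest absolute sample value."""
    return max(abs(v) for v in samples)


def peak_normalize(samples, target=1.0):
    """Scale samples so the peak equals `target`; silence is returned unchanged."""
    p = peak(samples)
    if p == 0:
        return list(samples)
    gain = target / p
    return [v * gain for v in samples]


collapsed = [0.0076, -0.0038, 0.0019]  # toy values near the reported broken peak
gain_applied = 1.0 / peak(collapsed)
normalized = peak_normalize(collapsed)
```

With a peak near 0.0076 the gain is above 100x, so whatever residual is
left in the collapsed output is scaled up to full level.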

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(flow-edit): skip LM Phase 1 for text2music + morph (#1156)

Earlier fix only zeroed ``audio_code_string_to_use`` at the top of
``generate_music``.  But when Think is on (UI default), LM Phase 1
runs anyway and overwrites ``audio_code_string_to_use`` with
freshly-generated codes — the same codes path then bites
``conditioning_target`` and zt_edit starts OOD again.

Add ``text2music + flow_edit_morph`` to the LM-skip path alongside
cover / cover-nofsq / repaint / extract.  Think / CoT both silently
no-op for the morph case so the downstream pipeline VAE-encodes the
src_audio cleanly regardless of whether Think was checked in the UI.
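
A minimal sketch of the LM-skip decision described above.  The set
contents follow the task names listed in this message; the helper name
``should_skip_lm`` and the set name are hypothetical, and the real
check may be structured differently.

```python
# Tasks that skip the LM phase regardless of morph (per the description above).
_LM_SKIP_TASKS = {"cover", "cover-nofsq", "repaint", "extract"}


def should_skip_lm(task_type, flow_edit_morph):
    """True when LM Phase 1 should not run for this request."""
    if task_type in _LM_SKIP_TASKS:
        return True
    return task_type == "text2music" and bool(flow_edit_morph)
```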

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(ui): rewrite Retake/Edit help to match current behaviour + i18n (#1155, #1156)

The previous help text for Edit was written when the dispatch still
used silence-context unconditionally for text2music.  After the
codes-context fix (commits 4028a00 + 80f8c9c) the Custom path now
silently drops Think / LM Codes Hints / LM Phase 1 and VAE-encodes
the user's Source Audio directly.  The old help didn't mention this.

Update help.generation_edit (en):
* Add an explicit "Workflow" section with concrete UI steps
  (which mode, where to upload, which fields to fill, recommended
  shift=3.0, what button does what).
* Add a "Mode behaviour" section that documents the silent-drop
  semantics for Custom (drops Think + LM codes), the cover-natural
  context for Remix, and the unsupported / fall-through behaviour
  for Repaint / Extract / Lego.
* Trim the recommended-settings stanza so the per-variant step
  guidance is one line instead of three.

Trim help.generation_retake (en):
* Drop redundant tip lines that duplicated the consistency
  requirement; keep the workflow + reference.
* Tighten the Think-collision warning so it points straight at the
  full workflow.

Translate both entries into zh / ja / pt / he so the (?) modal
shows native-language help everywhere instead of falling back to en.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* debug(flow-edit): log src_audio shape at the morph guard

User reports the "Flow-edit morph requires a source audio" guard fires
even when they uploaded an mp3 to the Source Audio component.  Need to
see what gradio is actually handing the backend (None / empty str /
list / dict?) before adding the right normalisation.

Also widen the missing-check to cover empty strings and empty
list/tuple in case gradio hands ``""`` instead of ``None`` for
cleared components.
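
The widened missing-check can be sketched as below.  The helper name
``src_audio_missing`` is hypothetical, and which value types gradio
actually sends is exactly what the debug log is meant to find out, so
the cases here only mirror the ones named above.

```python
def src_audio_missing(src_audio):
    """Treat None, empty strings, and empty lists/tuples as 'no source audio'."""
    if src_audio is None:
        return True
    if isinstance(src_audio, str):
        return src_audio == ""
    if isinstance(src_audio, (list, tuple)):
        return len(src_audio) == 0
    return False
```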

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): rename LM-Codes-Hints' inner audio upload label to disambiguate

User report: there were two ``Source Audio`` upload boxes on the
generation page — the real ``src_audio`` (in ``src_audio_row``, used
by the Generate Music handler) and the small one inside the LM Codes
Hints accordion (used only by the ``Convert to Codes`` utility
button).  Both shared the same ``generation.source_audio`` i18n key,
so users dropped their mp3 into the wrong one and the Generate
handler saw ``src_audio=None`` → "Flow-edit morph requires a source
audio" hard-error.

Give the inner upload its own label key
``generation.lm_codes_audio_upload_label`` ("Audio → Codes (utility)"
in en, equivalent in zh / ja / pt / he).  No layout change — only the
label disambiguates intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): stop validate_uploaded_audio_file from silently clearing valid mp3

User repro: they uploaded an mp3 to ``Source Audio`` and saw the 3:18
waveform render, but the Generate handler got ``src_audio=None``.
Root cause: ``validate_uploaded_audio_file`` runs ``soundfile.info()``
on every upload and returns ``gr.update(value=None)`` (silently
clearing the component) on ``OSError / RuntimeError / ValueError``.
``libsndfile``'s mp3 support is spotty across platforms — on jieyue
it refuses files that torchaudio / ``process_src_audio`` decode
cleanly.  The cleared value left the waveform visible (Gradio's
player keeps its display cache) so the user had no signal that the
component state had been zeroed.

Drop the auto-clear path.  The validator now returns ``gr.skip()``
on soundfile errors too.  If the file is genuinely unreadable, the
backend's own decode path raises a clearer error from the dispatch
layer (the morph guard already covers the empty case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* debug(ui): log key inputs at the wrapper-call level

User keeps hitting ``src_audio=None`` despite the waveform rendering in
the UI.  Static analysis says the inputs list and the wrapper signature
are aligned (78=78, src_audio at position 14 in both).  Need to see
what gradio actually hands the wrapper at click time to differentiate
between (a) wrong slot = some other component bleeding into src_audio
or (b) gradio component state genuinely None despite the visible
waveform.

Log only at the wrapper entry — backend already logs at the morph
guard, so we get a comparison from both sides of the chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): preserve src_audio for text2music + flow_edit_morph (#1156)

Root cause of the user's "Flow-edit morph requires a source audio"
error chain.  In ``generation_progress.generate_with_progress`` (the
layer right above ``GenerationParams``):

    if task_type == "text2music":
        src_audio = None

This pre-overlay defensive guard zeroed ``src_audio`` for every
Custom-mode run.  When flow_edit_morph was enabled the backend then
saw ``src_audio=None`` and bailed via the morph guard ("Flow-edit
morph requires a source audio").  Earlier debug rounds — soundfile /
torchaudio / ffmpeg checks, label disambiguation, validation
non-clearing, inference's own ``src_audio`` ternary — were all
chasing downstream symptoms.  This wrapper-level zeroing was the
real source.

Gate the zeroing on ``not flow_edit_morph`` so the morph path keeps
``src_audio`` and Custom-mode without morph still drops it (no
behaviour change for that case).
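
The gated version of the guard can be sketched as a pure function.
``resolve_src_audio`` is a hypothetical name used for illustration; in
the wrapper the same condition is applied inline.

```python
def resolve_src_audio(task_type, src_audio, flow_edit_morph):
    """Drop src_audio for plain text2music; keep it when morph is enabled."""
    if task_type == "text2music" and not flow_edit_morph:
        return None
    return src_audio
```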

Also remove the now-stale wrapper-level debug log added while
hunting; the src_audio guard log in the request layer is enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(ui): correct Edit workflow ordering (original first, then target)

The previous workflow had the user fill top-level caption/lyrics with the
target before clicking Copy current → source — which would just snapshot
the target into the source fields, defeating the whole point of the
button.  Reorder so the user:

1. Fills top-level fields with the **original** description (V_src).
2. Clicks Copy current → source to snapshot it.
3. Then edits top-level fields to define the **target** (V_tar).

Updated en + zh + ja + pt + he in lockstep so every language describes
the same correct sequence.  Also added a tip noting that Send-to-Remix
/ Send-to-Repaint pre-fills source automatically, so steps 3 and 5 are
skipped in that path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove debug logs added during the src_audio=None investigation

The wrapper-level "[wrapper] inputs at click time" log and the
morph-guard "[generate_music] flow_edit_morph guard:" log were both
added while hunting commit 1b0a95c.  Now that the root cause
(``generation_progress`` zeroing src_audio for text2music
unconditionally) is fixed, drop the verbose logging.  The empty-string
+ empty-list normalization in the guard stays — that's a real
defensive check, not debug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove .claude worktrees + scheduled_tasks.lock from tracking

These got into the branch via an earlier ``git add -A`` and lingered
through history.  They're Claude Code's local dev artefacts (per-agent
git worktrees + scheduling lock) that should never be in repo history.

Untrack them via ``git rm --cached`` and add explicit .gitignore lines
to prevent re-introduction.  After the squash-merge to main these
files won't be in main's tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:48:22 +08:00
lora_data_prepare feat: add lora training tutorial (#593) 2026-02-16 07:32:24 +08:00
check_gpu.py Add diagnostic messaging for AMD GPU detection failures (#302) 2026-02-08 04:55:35 +08:00
fetch-awesome.mjs docs: use ACE-Step logo, add Ecosystem page with auto-synced awesome list 2026-03-06 22:50:46 +08:00
flow_edit_overlay_smoke.py feat(flow-edit): redesign as overlay; supersedes #1165 + #1167 (#1169) 2026-04-30 21:48:22 +08:00
new_pr_branch.ps1 chore(workflow): enforce independent PR branches with pre-push guard (#595) 2026-02-16 07:30:19 +08:00
prepare_vae_calibration_data.py dynamic quantization for dit model & data prepare for further static quantized vae 2025-12-22 14:47:52 +00:00
profile_vram.py feat: GPU compatibility tier system with boundary testing 2026-02-10 07:39:11 +00:00