
Documentation

How do multi-speaker video clips work?

When a podcast clip includes more than one speaker, Unspool Studio automatically renders video with speaker attribution — speaker name labels, alternating left/right caption alignment, and a persistent footer showing who's currently speaking. Single-speaker clips render unchanged with centered captions.


How are speakers identified?

Speaker identification happens automatically during transcription. Unspool Studio passes candidate speaker names from podcast metadata — host names, guest names from RSS <podcast:person> tags, and episode descriptions — to the transcription engine, which returns named speakers directly in the transcript.

  • Named speakers appear when the AI can match a voice to podcast metadata (e.g., "Joe Rogan", "Lex Fridman")
  • Generic labels like "Speaker A" or "Speaker B" appear when the AI can't determine identity — you can rename these manually
  • Ad segments are automatically detected and labeled with dimmed styling in both the transcript and video output

No manual tagging is needed. Speakers are labeled before you ever see the transcript.
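As a concrete illustration, the candidate names might come from feed markup like the following. This is a hypothetical excerpt using the Podcasting 2.0 `<podcast:person>` element (the feed title, episode title, and names are made up for the example):

```xml
<channel>
  <title>Example Podcast</title>
  <!-- Channel-level person: applies to the whole show -->
  <podcast:person role="host">Lex Fridman</podcast:person>
  <item>
    <title>Episode 42</title>
    <!-- Item-level person: applies only to this episode -->
    <podcast:person role="guest">Joe Rogan</podcast:person>
  </item>
</channel>
```

Names found in elements like these, along with names parsed from the episode description, are what Unspool Studio passes to the transcription engine as candidates for voice matching.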

How do multi-speaker videos look?

When you generate a video from a multi-speaker clip:

  • Alternating alignment — Each speaker's captions are aligned left or right so you can visually track the conversation flow
  • Speaker footer — A persistent bar at the bottom of the video shows the active speaker's name with an animated waveform icon
  • Speaker transitions — When the speaker changes, the footer flips with a smooth departure-board animation
  • Karaoke highlighting — Word-by-word highlighting works the same as single-speaker clips, synced to the audio

Single-speaker clips skip all of this and render with standard centered captions.

How do I edit speaker names and assignments?

Click any speaker label in the transcript to open a dropdown with three options:

  • Rename — Change the speaker's display name. This updates all segments attributed to that speaker across the entire transcript.
  • Reassign — Move a segment to a different existing speaker
  • Add new speaker — Create a new speaker and assign the segment to them

You can also select text in the transcript and use the Change Speaker action in the selection popover. This splits the segment at word boundaries so you can reassign just part of a speaker turn — useful when the AI merged two speakers into one segment.

Speaker edits are saved separately from the transcript so they survive canvas save/load cycles. Videos automatically regenerate when you change speaker assignments.

What about ad segments?

Segments the AI identifies as ad reads appear with dimmed styling in both the transcript and video output. They're still included if they fall within the clip's boundaries, but they're visually distinct so viewers can tell them apart from the main conversation.


Frequently Asked Questions

Do I need to do anything to enable multi-speaker captions?

No. If a clip contains multiple speakers, the video automatically renders with speaker labels and alternating alignment. There's nothing to enable.

Can I rename "Speaker A" to a real name?

Yes. Click the speaker label in the transcript and choose Rename. The new name applies to every segment attributed to that speaker and updates the video automatically.

What if the AI attributes a segment to the wrong speaker?

Click the speaker label on that segment and choose Reassign to move it to the correct speaker. Or select specific words and use Change Speaker to split the segment at word boundaries.

Do speaker edits affect the audio?

No. Speaker assignments only affect the visual labels and caption alignment in the video. The audio is unchanged.

Do multi-speaker clips work with all export formats?

Yes. Multi-speaker captions render in all aspect ratios (16:9, 1:1, 9:16) and with all color palettes and fonts.


Related: AI Transcription | Editing Clips | Clip Suggestions