Overview

What are we building? An API that lets you lip-sync a video to any audio, in any language, in HD.

**Why is this useful?**

[1] Video localization: convert a video from one language to another.

[2] Talking head generation: generate AI characters / agents who interact like a human would face-to-face.

[3] And more. What can you think of?

⚠️ Please understand this is the absolute barebones MVP version of our API. Be mindful of this while you play + test. We’re trusting the community to make good decisions – we’ve heard from so many of you and wanted to get something into your hands sooner rather than later.

plz be gentle.

link to discord here. link to feedback form here.

<aside> 💡 Update: this will turn off Friday 09/01 at midnight. DM @therealprady on Twitter to request access to our limited private beta.

</aside>

How does it work?

https://www.loom.com/share/c7e9388071454c83abc6dffa9393900b?sid=4e07575d-ab4a-4106-b1ca-77f52a605474

Overview

  1. Make a POST request w/ two links to a hosted [1] audio + [2] video file
  2. We lip-sync the video / image to the audio automatically, no training required
  3. You receive a link to the synchronized video → [future] stream the video back as it processes

Request Details

Endpoint: https://rogue-yogi--wav2lip-2-v0-1-02-generate-sync.modal.run

Type: POST request

Parameters: [1] audio_uri = {url to hosted audio file}

[2] video_uri = {url to hosted video file}
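
A minimal sketch of the request in Python, assuming the two parameters are sent as a JSON body (the docs above don't say whether they go in the body or the query string) and using placeholder URLs for the hosted files:

```python
# Hedged sketch: parameter names come from the docs above; sending them as a
# JSON body is an assumption, as is the response format.
import requests

ENDPOINT = "https://rogue-yogi--wav2lip-2-v0-1-02-generate-sync.modal.run"

payload = {
    "audio_uri": "https://example.com/speech.wav",  # placeholder: URL to your hosted audio file
    "video_uri": "https://example.com/talk.mp4",    # placeholder: URL to your hosted video file
}

response = requests.post(ENDPOINT, json=payload, timeout=600)
response.raise_for_status()

# The API returns a link to the synchronized video; the exact response schema
# isn't documented here, so just print the raw body and inspect it.
print(response.text)
```

If the endpoint expects query parameters instead, swap `json=payload` for `params=payload`.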

Here are two examples you can test w/:

Send the sample files listed below to see how you can make David Sacks speak fluent Hindi seamlessly.

Example 1 [~10s short]: video | audio