Seedance 2.0 is officially launched!
Since the days when we could only tell stories with text and first/last frames, we've dreamed of building a video model that truly understands your expression. Today, it's finally here!
Seedance 2.0 now supports four input modalities: image, video, audio, and text, offering richer expression and more controllable generation.
You can set the visual style with a single image, specify character movements and camera changes with a video, and set the rhythm and atmosphere with a few seconds of audio... Combined with prompts, the creation process becomes more natural, efficient, and truly like being a "director".
Precise Image Reference Reproduction
Accurate reproduction of composition, character details
Reference Video Replication
Supports replication of camera language, complex action rhythms, and creative effects
Smooth Extension & Continuity
Generate continuous shots from prompts: not just generating, but "keep filming"
Enhanced Editing
Supports character replacement, removal, and addition in existing videos
Video creation has never been just about "generation"; it's about controlling expression. 2.0 is not just multimodal; it's a truly controllable way to create.
Seedance 2.0: multimodal creation starts here. Dare to imagine; leave the rest to it.
1. Parameter Overview
| Core Dimension | Seedance 2.0 |
|---|---|
| Image Input | Up to 9 images |
| Video Input | Up to 3 videos, total duration no more than 15s (reference videos cost a bit more) |
| Audio Input | Supports MP3 upload, up to 3 files, total duration no more than 15s |
| Text Input | Natural language |
| Generation Duration | Up to 15s, freely choose between 4-15s |
| Audio Output | Built-in sound effects/background music |
<strong>Interaction Limit:</strong> The current maximum for mixed inputs is <strong>12 files</strong>. We recommend prioritizing materials that have the greatest impact on visuals or rhythm, and allocating file counts wisely across modalities.
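The limits in the table above combine in a non-obvious way: each modality has its own cap, but there is also a shared 12-file ceiling across all inputs. The sketch below is purely illustrative (`check_inputs` is a hypothetical helper, not part of any official Seedance API); it just restates the documented limits as one validation pass.

```python
# Illustrative check of Seedance 2.0's documented input limits (hypothetical
# helper, not an official API): up to 9 images, up to 3 videos totaling <=15s,
# up to 3 MP3 audio files totaling <=15s, and at most 12 files in total.

def check_inputs(image_count, video_durations, audio_durations):
    """video_durations/audio_durations are lists of clip lengths in seconds.
    Returns a list of violated limits; an empty list means the mix is valid."""
    problems = []
    if image_count > 9:
        problems.append("more than 9 images")
    if len(video_durations) > 3:
        problems.append("more than 3 videos")
    if sum(video_durations) > 15:
        problems.append("videos exceed 15s total")
    if len(audio_durations) > 3:
        problems.append("more than 3 audio files")
    if sum(audio_durations) > 15:
        problems.append("audio exceeds 15s total")
    if image_count + len(video_durations) + len(audio_durations) > 12:
        problems.append("more than 12 files combined")
    return problems

# 9 images + 2 videos + 2 audio clips is 13 files: each modality is within
# its own cap, but the shared 12-file ceiling is exceeded.
print(check_inputs(9, [7, 6], [5, 5]))
```

Note how a mix can pass every per-modality cap yet still exceed the combined limit, which is why the manual recommends allocating file counts across modalities deliberately.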
2. Interaction Methods
Note: Seedance 2.0 supports "First/Last Frame" and "Universal Reference" entry points. Smart multi-frame and subject reference are not selectable. If you only upload a first frame image + prompt, use the First/Last Frame entry; for multimodal (image, video, audio, text) combined input, enter through the Universal Reference entry.
The current interaction method uses <code>@material_name</code> to specify the purpose of each image, video, and audio, for example:
- @Image1 as first frame
- @Video1 reference camera language
- @Audio1 for background music
Main Interface

Entry: Seedance 2.0 - Universal Reference / First-Last Frame

Open local file dialog

Select files, add to input box
Universal Reference Mode - Method 1: Type "@" to invoke reference

Type "@"

Select reference, drops into input box

Enter prompt
Universal Reference Mode - Method 2: Click the "@" tool to invoke reference

Click "@"

Select reference, drops into input box

Enter prompt
After uploading materials, images, videos, and audio all support hover preview:



Below are some usage examples and creative approaches for different scenarios to help you better understand Seedance 2.0's improvements in generation quality, control capability, and creative expression. If you don't know where to start, check out these examples for inspiration!
Seedance 2.0 Capabilities / Improvement Preview
1. Significantly Enhanced Basic Capabilities: More Stable, Smoother, More Realistic!
Beyond multimodality, Seedance 2.0 is significantly enhanced at the foundational level: <strong>more realistic physics</strong>, <strong>more natural and fluid motion</strong>, <strong>more precise instruction understanding</strong>, and <strong>more stable style consistency</strong>. It can reliably handle complex actions, continuous motion, and other challenging generation tasks, making overall video output more realistic and smooth. This is a comprehensive evolution of core capabilities!
A girl elegantly hanging clothes to dry, after finishing she picks up another piece from the bucket and vigorously shakes it out.
First frame
The character in the painting has a guilty expression, eyes darting left and right peeking out of the frame, quickly reaches out to grab a cola and takes a sip, showing a satisfied expression. Then footsteps are heard, the character hurriedly puts the cola back. A cowboy picks up the cola and walks away. Finally the camera pushes forward as the screen fades to black with only a top-lit cola can, with artistic subtitles at the bottom: "YiKou Cola, a must-try!"
First frame
Camera pulls back slightly (revealing the full street view) and follows the heroine as she walks. Wind blows her skirt hem as she walks through a 19th-century London street. A steam-powered car drives by quickly from the right side, its wind blowing up her skirt as she frantically presses it down with both hands in shock. Background sounds include footsteps, crowd noise, and vehicle sounds.
First frame
Camera follows a man in black fleeing rapidly with a crowd chasing behind. Camera switches to a side tracking shot as the panicked character knocks over a fruit stand, gets up and continues running, with sounds of the chaotic crowd.
First frame
2. Comprehensive Multimodal Upgrade: Video Creation Enters the "Free Combination" Era!
2.1 Multimodal Introduction
Supports uploading text, images, videos, and audio, all of which can be used as source or reference materials. You can reference any content's actions, effects, style, camera movement, characters, scenes, and sounds. As long as your prompt is clear, the model can understand it.
Seedance 2.0 = Multimodal Reference (reference anything) + Strong Creative Generation + Precise Instruction Response (excellent comprehension)
Just describe the visuals and actions you want in natural language, and make clear whether each material is a reference or an edit target. When using multiple materials, double-check that each @reference is clearly labeled so you don't mix up images, videos, and characters!
2.2 Special Usage Methods (No Limits, Just Suggestions)
Have first/last frame images? Also want to reference video actions?
Write clearly in the prompt, e.g.: "@Image1 as first frame, reference @Video1's fighting actions"
Want to extend an existing video?
Specify the extension duration, e.g. "Extend @Video1 by 5s". Note: The selected generation duration should be the duration of the "new portion" (e.g., extend 5s, also select 5s generation length)
Want to merge multiple videos?
Explain the composition logic in the prompt, e.g.: "I want to add a scene between @Video1 and @Video2, content is xxx"
No audio materials?
You can directly reference audio from a video
Want to generate continuous actions?
Add continuity descriptions in the prompt, e.g.: "Character transitions directly from jumping to rolling, maintaining fluid and coherent motion" @Image1@Image2@Image3...
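All of the tips above reduce to the same @material_name convention: name a material, then state its purpose. A tiny string-building sketch may make that pattern concrete. This is purely illustrative (`build_prompt` is a made-up helper; in practice prompts are typed directly into the input box, not built in code):

```python
# Hypothetical helper that assembles a prompt from the @-reference convention
# described above; each material name is paired with its stated purpose.

def build_prompt(instruction, roles):
    """roles maps material names to purposes, e.g. {"Image1": "as first frame"}."""
    clauses = ", ".join(f"@{name} {purpose}" for name, purpose in roles.items())
    return f"{instruction}. {clauses}"

print(build_prompt(
    "Generate a 10s fight scene",
    {"Image1": "as first frame",
     "Video1": "reference the fighting actions",
     "Audio1": "for background music"},
))
# -> Generate a 10s fight scene. @Image1 as first frame, @Video1 reference the fighting actions, @Audio1 for background music
```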
2.3 Those Long-Standing Video Challenges Can Now Actually Be Solved!
Video creation always has its pain points: faces changing between shots, actions not matching, unnatural video extensions, rhythm going off during edits... This multimodal upgrade tackles all these "persistent headaches" at once. Below are specific use cases.
2.3.1 Comprehensive Consistency Improvement
You may have encountered these frustrations: characters looking different from shot to shot, product details getting lost, small text becoming blurry, scene jumps, inconsistent camera styles... These common consistency issues in creation can now all be resolved in 2.0. From faces to clothing to font details, overall consistency is more stable and accurate.
Man @Image1 walking tiredly through the corridor after work, his pace slowing, finally stopping at the front door. Close-up on face, the man takes a deep breath, adjusts his emotions, puts away the negativity, becomes relaxed. Then close-up of finding keys, inserting into the lock. After entering the home, his little daughter and a pet dog joyfully run over to greet and hug him. The interior is very warm and cozy, with natural dialogue throughout.
Character reference
Replace the girl in @Video1 with a Chinese opera actress (Hua Dan), on an exquisite stage. Reference @Video1's camera movements and transition effects, using camera angles to match the character's actions, achieving ultimate stage aesthetics and enhanced visual impact.
Using the reference image character's appearance, generate a teaser trailer for a period time-travel drama. 0-3 seconds: The male lead with the appearance from reference image 1 holds up a basketball, looking up at the camera. Saying "I just wanted a drink, am I really about to time travel?..." ...
Character reference
Reference all transitions and camera movements from @Video1, one continuous shot. Starting with a chess board, camera pans left to reveal yellow sand on the floor, camera moves up to a beach...
0-2 seconds: Quick four-panel flash cuts, red, pink, purple, and leopard print bow ties shown in sequence, close-up on satin sheen and "chΓ©ri" brand text... (Korean voiceover ad)
Product image
Create a commercial-style showcase of the bag in @Image2, with the side view referencing @Image1, the surface texture referencing @Image3. All details of the bag should be showcased, with grand and majestic background music.
Side reference
Bag main body
Texture reference
Use @Image1 as the first frame, first-person perspective, reference @Video1's camera effects. Upper scene references @Image2, left scene references @Image3, right scene references @Image4.
First frame
Upper scene
Left scene
Right scene
2.3.2 Precise Replication of Advanced, Controllable Camera Movement and Actions
Previously, to make a model mimic movie-style blocking, camera work, or complex actions, you either had to write tons of detailed prompts or simply couldn't do it. Now, just upload a reference video and you're good to go.
Reference @Image1's male character, he's in the elevator from @Image2. Fully reference @Video1's camera effects and the protagonist's facial expressions. Hitchcock zoom during the panic, then several orbiting shots showing the elevator interior...
Character
Elevator scene
Scene reference
Reference @Image1's male character, he's in the corridor from @Image2. Fully reference @Video1's camera effects and the protagonist's facial expressions. Camera follows the protagonist running around corners in @Image2...
Character
Corridor
Long hallway
Fork in the road
Scene
@Image1's tablet as the main subject, camera referencing @Video1, pushing into a screen close-up, camera rotates as the tablet flips to show its full appearance. Data streams on screen keep changing, surroundings gradually transform into a sci-fi data space.
Tablet
@Image1's actress as the main subject, reference @Video1's camera techniques for rhythmic push-pull-pan movements. The actress's movements also reference the dance moves of the woman in @Video1, performing energetically on stage.
Actress
Reference @Image1@Image2 spear-wielding character, @Image3@Image4 dual-blade character, imitate @Video1's actions, fighting in the maple leaf forest from @Image5.
Spear character 1
Spear character 2
Dual-blade character 1
Dual-blade character 2
Maple leaf forest
Reference Video1's character actions, reference Video2's orbiting camera language. Generate a fight scene between Character 1 and Character 2. The fight takes place under a starry night, with white dust rising during the battle. The fight scene is spectacular and the atmosphere is very tense.
Character 1
Character 2
Reference Video1's camera work and shot transition rhythm, replicate using Image1's red supercar.
Red supercar
2.3.3 Precise Replication of Creative Templates and Complex Effects
More than just generating images and writing stories, Seedance 2.0 also supports "copying from reference": creative transitions, finished ads, movie clips, complex edits. As long as you have reference images or videos, the model can identify action rhythms, camera language, and visual structure, and precisely replicate them. Don't worry if you don't know professional terminology; just describe what you want to reference, and the model will generate a high-quality version for you. Be bold! It can really do it.
Replace the character in @Video1 with @Image1, @Image1 as first frame. Character puts on virtual sci-fi glasses, reference @Video1's camera work, close orbiting shots, transitioning from third-person to the character's subjective perspective, traveling through the AI virtual glasses...




Reference the model's facial features from the first image. The model wears outfits from reference images 2-6 and approaches the camera, striking playful, cool, cute, surprised, and stylish poses...
Model
Outfit 1
Outfit 2
Outfit 3
Outfit 4
Outfit 5
Reference the video's advertising concept, use the provided down jacket images, with the following ad copy: "This is goose down, this is the warm swan, this is the wearable polar swan-down jacket. Stay warm for the new year, live warm." Generate a new down jacket ad video.
Down jacket
Goose down
Swan
Black and white ink wash style, @Image1's character references @Video1's effects and actions, performing a segment of ink-wash style Tai Chi kung fu.
Character
Replace @Video1's opening character with @Image1, fully reference Video1's effects and actions. Rose petals grow from the flower stamen in hand, cracks extend upward on the face...
Character 1
Character 2
Starting from @Image1's ceiling, reference @Video1's puzzle-shatter effect for transition. Replace "BELIEVE" text with "Seedance", reference @Image2's font.
Ceiling
Font reference
Opening with a black screen, reference Video1's particle effects and texture. Golden gilded sand drifts from the left side of the frame and covers to the right, reference @Video1's particle scatter effect. @Image1's text gradually appears in the center of the frame.
Text
@Image1's character references the actions and expression changes from @Video1, showcasing the abstract behavior of eating instant noodles.
Character
2.3.4 Model Creativity & Storyline Completion
Animate @Image1 in left-to-right, top-to-bottom order as a comic performance, keeping character dialogue consistent with the image. Add special sound effects for panel transitions and key plot moments. Overall style should be humorous and witty; performance style references @Video1.
Comic image
Reference @Image1's documentary-style storyboard, referencing @Image1's shot divisions, framing, camera movements, visuals, and copy. Create a 15s healing-style opening about "The Four Seasons of Childhood".
Storyboard
Reference Video1's audio, using Images 1-5 as inspiration, create an emotion-driven video. Background music references @Video1.





2.3.5 Video Extension
Extend 15s video, reference @Image1, @Image2's donkey-riding-motorcycle character. Add a creative ad segment: Scene 1: Fixed side camera, donkey rides motorcycle out of the barn... Scene 3: ...ad slogan "Inspire Creativity, Enrich Life"
Donkey look 1
Donkey look 2
Extend video by 6s, electric guitar music kicks in, "JUST DO IT" ad text appears mid-screen then gradually fades, camera moves up to the ceiling...
Athletic wear
Logo
Extend @Video1 by 15 seconds. 1-5s: Light and shadow slowly slide through blinds across the wooden table and cup... 11-15s: Text gradually appears: "Lucky Coffee", "Breakfast", "AM 7:00-10:00".
Extend forward by 10s. In warm afternoon light, the camera starts from a row of awnings fluttering in the breeze at the street corner, slowly panning down to a few small daisies peeking out at the base of the wall...
2.3.6 More Accurate Audio, More Realistic Sound
Fixed camera, central fisheye lens looking down through a circular opening. Reference Video1's fisheye lens, have the horse from @Video2 look at the fisheye lens, reference @Video1's speaking actions, background BGM references audio from @Video3.
Based on the provided office building promotional photos, generate a 15-second cinematic realistic-style real estate documentary in 2.35:1 widescreen, 24fps. The narrator's voice tone references @Video1...



A roasting dialogue in a "Cat & Dog Roast Room", with rich emotions fitting a stand-up performance: Meow-chan (cat host): "Who understands this, family?...", Wangzai (dog host): "You have the nerve to talk about me?..."
Scene reference
The opening music of the classic Yu Opera segment "The Case of Chen Shimei" begins. The black-robed Judge Bao on the left points at the red-robed Chen Shimei on the right, singing Yu Opera through gritted teeth...
Scene reference
Generate a 15-second music video. Keywords: steady composition / gentle push-pull / low-angle heroic feel / documentary but premium... Sunset side-backlight volumetric rays through dust particles, cinematic composition, real film grain, gentle breeze moving coat hems.
Scene reference
The girl in the center wearing a hat gently sings "I'm so proud of my family!"... Latin music starts in the background... The whole family forms a circle, dancing to lively music, skirts swirling.
Scene reference
Fixed camera. The standing muscular man (captain) clenches his fist and says in Spanish: "Raid in three minutes!"... Everyone stands at attention, completing tactical hand signals amid the sound of equipment clashing.
Scene reference
0-3s: Opening alarm clock rings... 3-10s: Quick pan shot, cutting to the opposite side with a close-up of the man's face. The man reluctantly wakes the girl, voice tone and timbre reference @Video1... 12-15s: Cut to full body of the male lead, he sighs: "I really can't do anything about you!"
Girl
Man
@Image1's monkey walks to the bubble tea shop counter... The monkey orders from the server in a Sichuan accent: "Hey sis, do you have 'Farewell My Concubine'?"
Monkey
Bichon server
Bubble tea shop
In a popular science style and voice, narrate the content from Image 1, which includes the story of Sun Wukong borrowing the Banana Fan from Princess Iron Fan to cross the Flaming Mountains...
Journey to the West illustration
2.3.7 Stronger Shot Continuity (One-Take)
@Image1@Image2@Image3@Image4@Image5, one-take tracking shot, following a runner from the street up stairs, through a corridor, onto a rooftop, and finally overlooking the city.





Starting with @Image1 as the first frame, the view zooms out to an airplane window. Clouds slowly drift into frame, one of them adorned with colorful candy beans... gradually transforming into @Image2's ice cream...
Window
Ice cream
Character
Spy thriller style, @Image1 as the opening frame. Camera follows a female agent in a red coat walking forward... No cuts throughout, one continuous take.
First frame
Corner building
Masked girl
Mansion
From @Image1's exterior shot, first-person perspective quick push into the cabin interior close-up. A little deer @Image2 and a sheep @Image3 are drinking tea and chatting by the fireplace. Camera pushes in for a close-up of the teacup, style referencing @Image4.
Exterior
Deer
Sheep
Teacup
@Image1@Image2@Image3@Image4@Image5, first-person one-take thrilling roller coaster shot, with the coaster going faster and faster.





2.3.8 Highly Usable Video Editing
Sometimes you already have a video and don't want to find new images or redo everything from scratch; you just want to adjust a small segment of action, extend a few seconds, or make a character's performance closer to your vision. Now you can use an existing video as input and make targeted modifications to specific segments, actions, or rhythms without changing anything else.
Subvert @Video1's storyline. The man's eyes shift instantly from tender to cold and ruthless. In a moment when the heroine is completely off guard, he forcefully pushes her off the bridge...
Subvert @Video1's entire storyline. 0-3s: Man in suit sitting at a bar... 6-9s: Suddenly the suited man pulls out an absurdly large snack gift package from under the table...
Replace the female lead singer in Video1 with Image1's male lead singer. Actions completely imitate the original video, no cuts, band performance music.
Male lead singer
Change the woman's hairstyle in Video1 to long red hair. Image1's great white shark slowly surfaces halfway, behind her.
Great white shark
Video1 camera pans right, the fried chicken shop owner busily hands fried chicken to customers in line... Close-up of the owner holding a paper bag printed with Image1's logo...
Paper bag logo
2.3.9 Music Beat Sync
The girl in the poster keeps changing outfits, clothing style referencing @Image1@Image2, holding @Image3's bag, video rhythm references @Video.




Images @Image1-7 sync to @Video's keyframe positions and overall rhythm for beat matching. Characters in the frames are more dynamic...






@Image1-6 landscape scenes, reference @Video's visual rhythm, transitions match scene style and music rhythm for beat sync.






2.3.10 Better Emotional Performance
@Image1's woman walks to the mirror, looks at herself. Pose references @Image2. After a moment of contemplation, she suddenly starts screaming in breakdown. The grabbing motion and breakdown screaming emotions and expressions fully reference @Video1.
Woman
Pose reference
This is a range hood ad. @Image1 as the opening frame, woman elegantly cooking with no smoke. Camera quickly pans right to @Image2 man sweating profusely, face red, cooking...
Woman cooking
Man cooking
Range hood
@Image1 as the first frame, camera rotates and pushes closer. Character suddenly looks up, facial appearance references @Image2. Starts roaring loudly, excited with some comedic flair, referencing @Image3's expression. Then the character transforms into a bear, referencing @Image4.
First frame
Face reference
Expression reference
Bear reference
A Final Word
Seedance 2.0's multimodal capabilities are constantly evolving. We will continue to update features and support more input combinations. We hope this user manual helps you unleash your creativity more freely!
If you encounter bugs, have usage suggestions, or need specific scenarios, feel free to leave a message or DM us! We'll keep optimizing to make Jimeng a truly enjoyable and convenient productivity tool for you.
Frequently Asked Questions (FAQ)
What input modalities does Seedance 2.0 support?
Seedance 2.0 supports four input modalities: images (up to 9), videos (up to 3, total duration ≤15s), audio (MP3, up to 3, total duration ≤15s), and text (natural language). The combined input limit is 12 files.
How long of a video can Seedance 2.0 generate?
It can generate videos up to 15 seconds, with free selection between 4-15 seconds. It also supports video extension, allowing you to continue generating from an existing video.
How do I use the multimodal reference feature?
In Universal Reference mode, use "@material_name" to specify the purpose of each image, video, and audio. For example: @Image1 as first frame, @Video1 for camera reference, @Audio1 for background music. You can type "@" directly in the input box or click the "@" button in the toolbar.
What are Seedance 2.0's core capability improvements?
Core capabilities include: multimodal reference (reference anything), precise camera and action replication, creative effect replication, video extension and continuity, video editing (character replacement/removal/addition), music beat sync, one-take continuity, emotional performance, and voice generation. Physics are more realistic, motion is more natural and fluid, instruction understanding is more precise, and style consistency is more stable.
How do I extend an existing video?
After uploading a video, specify the extension duration in the prompt, e.g. "Extend @Video1 by 5s". Note: The generation duration should be set to the duration of the "new portion": e.g., if extending by 5s, also select 5s generation length. Both forward and backward extension are supported.
What's the difference between First/Last Frame and Universal Reference?
If you only upload a first frame image + prompt, you can use the First/Last Frame entry for a simpler workflow. For multimodal (image, video, audio, text) combined input, you need to use the Universal Reference entry. Universal Reference mode is more powerful and supports more complex creative needs.