The Video AI Showdown: How Do Gen-2, Pika and Moonvalley Stack Up?
A thorough test reveals very different results
Text-to-video AI is an emerging technology that converts text prompts into short video clips. The field has progressed quickly in recent times, and results have gone from surreal-looking abstractions to coherent clips of recognisable scenes.
In this article, I test and compare three leading text-to-video models (RunwayML's Gen-2, Pika Labs, and Moonvalley) using the same prompts, evaluating each on video quality, understanding of the prompt, and rendering of realistic motion.
The owl
Prompt 1: an ultra slo-mo recording with a high speed camera of an owl flying on a black background
When it comes to understanding the prompt, Pika Labs falls short on several counts. The owl is not on a black background, we don’t even see its wings, and there is hardly any movement. Moonvalley does better and generates some quite convincing wing movements, though it could be argued that the clip is not ultra slow motion. Gen-2 does a decent job overall of understanding the prompt and creating some slo-mo movement. Image quality is more a matter of taste. Gen-2 looks more photorealistic, while Moonvalley is very sharp but a little artificial. Pika Labs has some very unconvincing lighting, but it’s hard to judge when the image is so tightly cropped.
Dancers
Prompt 2: A man and a woman dancing quick rock'n'roll with a lot of movement
Pika Labs and Gen-2 definitely win on the overall look of these clips. Moonvalley has a very artificial look, especially in the skin tones.
I admit I created a challenging prompt. Rapid movement is not easy for these models, and there is some way to go before we’ll see fully convincing dance videos generated with AI. With that in mind, Moonvalley does create decent movement that actually looks like rock’n’roll dancing. Pika Labs is decent too but suffers from a lot of blur on the moving limbs. Gen-2 fails this prompt completely, producing a very slo-mo clip that is far from what was asked for.
Skipping rope
Prompt 3: a girl jumping on a skipping rope
Here is another very challenging prompt, one that requires an understanding of a very specific type of movement. First off, Gen-2 fails on all parameters: the skipping rope looks crazy, and again the movement is slo-mo even though we didn’t ask for that. If you look closely, the girl’s face is also disfigured. Moonvalley suffers from the artificial look, but the anatomy and body movement look good. The only problem is that the rope doesn’t actually move. Pika Labs produces movement that seems OK at first, but you’ll notice some weird morphing of the body.
Make your own clips
RunwayML’s Gen-2 is a paid service with a free trial that you can sign up for. Pika Labs is still in beta, but you can apply for access via Discord. Moonvalley is also an early version, with access through Discord.
Final thoughts
Although text-to-video tools have come a long way, there is still some way to go before we have fully production-ready video generation, and I think this test shows some of the strengths and weaknesses of the current technology.
Overall, Gen-2 is favoured by many artists for its convincing image quality, but as this test shows, it lags behind in reproducing the movement stated in the prompt. Everything seems to become slow motion in Gen-2.
Pika Labs is a good alternative for some types of clips, as it seems to handle movement better than Gen-2.
Moonvalley is a serious contender, though. It seems to be consistently good at creating the correct movement for a scene, but it suffers from artificial-looking image quality. Maybe that can be overcome with some good prompt engineering.