

How does AI video generator Sora work?

OpenAI has announced Sora, a new model that turns text into video. The model could set a new standard for what generative AI can do.

By Ground Report

Sora is similar to Google’s text-to-video tool Lumiere, but it has some advantages. For example, Sora can make videos that are up to 1 minute long, while Lumiere can only make 10-second clips.

Generative AI is a field that aims to create new content from data, such as text, images, and video. Many big companies, such as OpenAI, Google, Microsoft, and others, are competing to develop the best generative AI models. They want to dominate a market that could be worth $1.3 trillion by 2032, and to attract customers who are curious about generative AI since ChatGPT, a chatbot model, was launched more than a year ago.

OpenAI, the creator of ChatGPT and Dall-E, a model that can generate images from text, announced that experts in topics such as misinformation, hate speech, and bias, often referred to as "red teamers", will test Sora. They aim to identify its weaknesses and limitations, as well as assess the risks of creating realistic deepfakes, videos that are fabricated to appear genuine.

OpenAI also announced that visual artists, designers, and filmmakers will use Sora and provide feedback on the model's quality and creativity. The aim is to give the public an early look at what AI will soon be capable of.

What is Sora?

According to OpenAI, Sora is a text-to-video model that generates one-minute-long videos while “maintaining the visual quality and adherence to the user’s prompt.” OpenAI claims that Sora is capable of generating complex scenes with numerous characters with specific types of motion and accurate details of the subject and background.

For example, Sora can create a video of a stylish woman walking down a Tokyo street filled with neon lights, a movie trailer featuring the adventures of a spaceman wearing a red wool knitted motorcycle helmet, or an animated scene of a fluffy monster kneeling beside a melting candle. Sora can also extend existing videos forwards or backwards in time, such as adding more frames to a video of a bird flying or reversing a video of a car driving.

Sora is not the first text-to-video model in the world. Google has also developed a similar tool called Lumiere, which can generate 10-second-long videos from text prompts. However, Sora has some advantages over Lumiere, such as the ability to create longer videos, to handle multiple objects and motions, and to produce more diverse and creative outputs.

How can you try it?

Sora is currently not available to the general public, as OpenAI is still testing and improving the model. However, OpenAI has invited some selected groups of people to try Sora and give their feedback. These groups include:

  • Red teamers: These are experts in areas like misinformation, hateful content, and bias, who will be “adversarially testing the model” to find its weaknesses and potential harms. They will also help OpenAI to develop safety and ethical guidelines for using Sora.
  • Visual artists, designers, and filmmakers: These are creative professionals who will use Sora to generate videos for their projects and experiments. They will also provide feedback on the quality and diversity of Sora’s outputs, and suggest ways to improve the model’s capabilities and user interface.

If you are interested in trying Sora, you can fill out a form on OpenAI's website to apply to join one of these groups. OpenAI will review your application and contact you if you are selected. Acceptance is not guaranteed, however: the number of participants is limited, and selection follows specific criteria.

How does it work?

Sora is built on the transformer, a deep learning architecture (a type of neural network) that can learn from large amounts of data and generate new content. Broadly, the approach pairs two components: one that encodes the text prompt into a vector representation, and another that decodes that representation into a video.

The encoding transformer analyzes the text prompt and extracts the relevant information, such as the objects, actions, and attributes. The decoding transformer then uses this information to generate a video frame by frame, while ensuring that the video is consistent with the text prompt and the previous frames.
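OpenAI has not published Sora's code or full architecture, so the encode-then-decode scheme described above can only be illustrated schematically. The toy sketch below mirrors that two-stage flow in miniature: a made-up "encoder" turns a prompt into a vector, and a made-up "decoder" produces frames one at a time, each conditioned on the prompt vector and the previous frame. Every function name and weight here is a hypothetical stand-in, not Sora's real machinery.

```python
import numpy as np

rng = np.random.default_rng(42)

def encode_prompt(prompt, d_model=16):
    """Toy text encoder: hash each word to a fixed embedding and mean-pool.
    Real systems use a learned transformer encoder instead."""
    vecs = []
    for token in prompt.lower().split():
        tok_rng = np.random.default_rng(abs(hash(token)) % (2**32))
        vecs.append(tok_rng.normal(size=d_model))
    return np.mean(vecs, axis=0)

def decode_frames(prompt_vec, n_frames=4, frame_dim=8):
    """Toy frame-by-frame decoder: each frame depends on the prompt vector
    and the previous frame, echoing the scheme described in the text.
    Weights are random, not learned -- this only shows the data flow."""
    W_p = rng.normal(size=(prompt_vec.size, frame_dim))  # prompt -> frame
    W_f = rng.normal(size=(frame_dim, frame_dim))        # frame -> next frame
    frames = [np.tanh(prompt_vec @ W_p)]   # first frame from the prompt alone
    for _ in range(n_frames - 1):
        frames.append(np.tanh(prompt_vec @ W_p + frames[-1] @ W_f))
    return np.stack(frames)

vec = encode_prompt("a woman walking down a neon-lit street")
video = decode_frames(vec)
print(video.shape)  # (n_frames, frame_dim): 4 "frames" of 8 numbers each
```

The point of the sketch is the dependency structure, not the output: each new frame sees both the prompt and the prior frame, which is what keeps a generated video consistent with the text and with itself.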

Sora also uses a technique called self-attention, which allows the model to focus on the most important parts of the data and ignore the irrelevant ones. For example, when generating a video of a woman walking down a street, Sora can pay more attention to the woman and the street, and less attention to the background and the noise. Self-attention also helps Sora to capture the long-term dependencies and relationships between the text prompt and the video, such as the movement and direction of the objects and the camera.
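Self-attention itself is well documented even though Sora's use of it is not. The snippet below is a minimal numpy implementation of scaled dot-product self-attention: each position in a sequence scores every other position, the scores are turned into weights with a softmax, and the output is a weighted mix of the values. The weight matrix is exactly the "pay more attention here, less there" mechanism described above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every position to every other
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights          # output mixes values by those weights

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))              # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape)                         # (5, 8): one output vector per token
print(np.allclose(w.sum(axis=1), 1.0))   # True: attention weights sum to 1
```

Because every position attends to every other in a single step, attention captures long-range dependencies (e.g. keeping an object's direction of motion consistent across many frames) that frame-local processing would miss.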

Sora is trained on a large dataset of videos paired with captions, reportedly collected from sources such as YouTube, Vimeo, and Flickr. From these examples, Sora learns how to match text to video and how to generate realistic, diverse footage.
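Training on caption-video pairs boils down to the same loop used across machine learning: predict a video from a caption, measure the error against the real video, and nudge the model's weights to shrink that error. The toy below does this with a single linear map and synthetic data, a deliberately tiny stand-in for the vastly larger transformer that would be optimised the same basic way. All the data and dimensions here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 32 caption embeddings (6 numbers each) paired with
# target "videos" flattened to 12 numbers. Real pairs are captioned clips.
X = rng.normal(size=(32, 6))
W_true = rng.normal(size=(6, 12))
Y = X @ W_true + 0.01 * rng.normal(size=(32, 12))  # targets with a little noise

# Fit a linear caption -> video map by gradient descent on mean squared error.
W = np.zeros((6, 12))
lr = 0.05
for step in range(500):
    pred = X @ W
    grad = 2 * X.T @ (pred - Y) / len(X)  # gradient of MSE w.r.t. W
    W -= lr * grad                        # step downhill

mse = np.mean((X @ W - Y) ** 2)
print(mse)  # small: the map has learned to reproduce the targets
```

The linear model converges to near the noise floor of the data; a real text-to-video model differs in scale and architecture, not in this basic fit-by-gradient-descent recipe.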

Sora can also learn from its own outputs through a technique called self-training, in which the model improves itself by generating new data and labels.

