In Hollywood, the promise of artificial intelligence is all the rage: who wouldn’t want a technology that adds the magic of AI to smarter computers for an instant solution to tedious, time-intensive problems? With artificial intelligence, anyone with abundant rich media assets can easily churn out more revenue or cut costs, while simplifying operations … or so we’re told. If you’ve been to NAB or CES or any number of conferences, you’ve heard the pitch: it’s an “easy” button that’s simple to add to the workflow and foolproof to operate, turning your massive amounts of uncategorized footage into metadata.
But should you take the leap? Before you sign on the dotted line, let’s take a closer look at the technology behind AI and what it can – and can’t – do for you.
First, it’s important to understand the bigger picture of artificial intelligence in today’s marketplace. Taking unstructured data and generating relevant metadata from it is something that other industries have been doing for some time. In fact, many of the tools we embrace today started off in other industries. But unlike banking, finance or healthcare, our industry prioritizes creativity, which is why we have always shied away from tools that automate. The idea that we can rely on the same technology as a hedge fund manager just doesn’t sit well with many people in our industry, and for good reason.
In the media and entertainment industry, we’re looking for various types of metadata that could include a transcript of spoken word, important events within a period of time, or information about the production (e.g., people, location, props), and there’s no single machine-learning algorithm that will solve for all these types of metadata parameters. For that reason, the best starting point is to define your problems and identify which machine-learning tools may be able to solve them. Expecting to parse reams of untagged, uncategorized, and unstructured media data is unrealistic until you know what you’re looking for.
AI has become pretty good at solving some specific problems for our industry. Speech-to-text is one of them. Extracting data from a generally accurate, automatically generated transcript saves time. However, it’s important to note that AI tools still have limitations. A tool known as “sentiment analysis” could theoretically look for the emotional undertones expressed in spoken word, but it first requires another tool to generate a transcript for analysis. And no matter how good the algorithms are, they won’t give you the qualitative data that a human observer would provide, such as the emotions expressed through body language. They won’t tell you the facial expressions of the people being spoken to, or the tone of voice, pacing, and volume level of the speaker, or what is conveyed by a sarcastic tone or a wry expression. There are sentiment analysis engines that try to do this, but breaking the problem down into its components helps ensure that the parameters you need are addressed.
Another task at which machine learning has progressed significantly is logo recognition. Certain engines are good at finding, for example, all the images with a Coke logo in 10,000 hours of video. That’s impressive and can be quite useful. But it’s another story if you want to find footage that shows two people drinking from what are clearly Coke-shaped bottles with the logo obscured.
That’s because machine-learning engines tend to have a narrow focus, which goes back to the need to define very specifically what you hope to get from them. There is a bevy of algorithms and engines out there. If you license a service that will find a specific logo, you still haven’t solved the problem of finding objects that represent the product as well. Even with the right engine, you’ve got to think about how this information fits in your pipeline, and there are a lot of workflow questions to be explored.
Let’s say you’ve generated speech-to-text transcripts from your audio media. But have you figured out how someone can search the results? There are several options. Some vendors have their own front end for searching. Others may offer an export option from one engine into a MAM that you either already have on premises or plan to purchase. There are also vendors that don’t provide machine learning themselves but act as a third-party service organizing the engines.
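To make the search question concrete, here is a minimal sketch in Python of what searching transcript output could look like. It assumes an engine returns timestamped transcript segments; the `Segment` fields, the clip names, and the `search` helper are all hypothetical and do not reflect any particular vendor’s export format or MAM.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of a transcript (hypothetical export shape)."""
    clip_id: str
    start_sec: float
    text: str


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search(segments, query):
    """Return the segments that contain every word in the query."""
    terms = tokenize(query)
    return [seg for seg in segments if terms <= tokenize(seg.text)]


# Illustrative data only, not real engine output.
segments = [
    Segment("interview_01", 12.5, "We shot the opening scene in Baltimore."),
    Segment("interview_01", 48.0, "The director wanted a wider lens."),
    Segment("broll_07", 3.2, "Opening day at the stadium."),
]

for hit in search(segments, "opening scene"):
    print(hit.clip_id, hit.start_sec, hit.text)
```

Even a toy like this surfaces the real workflow questions: where the segments live, who runs the query, and whether a hit should jump an editor to the right timecode in the clip.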
It’s important to remember that none of these AI solutions are accurate all the time. You might get a nudity detection filter, for example, but these engines return probabilistic results, not guarantees. If having one nude image slip through is a huge problem for your company, then machine learning alone isn’t the right solution for you. It’s important to understand whether occasional inaccuracies will be acceptable or deal breakers for your company. Testing samples of your core content in the different scenarios you need to solve for is another crucial step. And many vendors are happy to test footage in their systems.
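One common way to live with probabilistic results is to act automatically only on high-confidence detections and send the uncertain middle to a person. The sketch below assumes a detection engine returns a confidence score between 0 and 1 for each clip; the `route` function, the threshold values, and the clip names are illustrative assumptions, not any vendor’s API or recommended settings.

```python
def route(detections, flag_threshold=0.9, review_threshold=0.3):
    """Sort clips into three buckets based on detection confidence.

    detections: list of (clip_id, score) pairs, score between 0 and 1.
    Thresholds are placeholders to tune against your own test footage.
    """
    buckets = {"flagged": [], "human_review": [], "passed": []}
    for clip_id, score in detections:
        if score >= flag_threshold:
            buckets["flagged"].append(clip_id)        # confident detection
        elif score >= review_threshold:
            buckets["human_review"].append(clip_id)   # uncertain: ask a person
        else:
            buckets["passed"].append(clip_id)         # low score, not a guarantee
    return buckets


# Illustrative scores only, not real engine output.
results = route([("clip_a", 0.97), ("clip_b", 0.55), ("clip_c", 0.04)])
print(results)
```

Note that a clip in the "passed" bucket can still be a miss. Lowering the review threshold catches more of those at the cost of more human review time, which is exactly the trade-off to measure when you test samples of your own content.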
Although machine learning is still in its nascent stages, I’m encouraged that clients are interested in using it. At Chesapeake Systems, we have been involved in AI for a long time and have partnerships with many of the companies pushing the technology forward. We have the expertise to help you define your needs, sift through the thousands of solution vendors to find the ones who match those needs, and integrate those solutions into your pipeline so that they are fully usable.
Machine learning/artificial intelligence isn’t (yet, anyway) a magic “easy” button. But it can still do some magical things, and we’re here to help you break down your requirements and create an effective custom solution to suit your specific needs.
To learn more about what AI can do for you, contact Chesapeake at firstname.lastname@example.org