Scaling in ML/AI media workflows

  ·   Apr 20, 2021

Some weeks back, I was invited to a DPP panel discussion where one of the topics was how to scale your machine-learning (ML) workflows. I’m a woman of many hats, and in addition to my work at Umeå University (find Stockholm on a map, and let your eyes follow the coast upwards - we’re just below the Arctic Circle), I’m also acting CTO at Adlede AB and Codemill AB, two media tech companies with a strong emphasis on AI/ML in their product offerings. This means that when I was later asked to write a short blog on the same topic, I had many good people to call on, and there are a few points we wanted to share.



Codemill, the sister company of Adlede, helps major Media & Entertainment customers remaster their video workflows, moving hardware-demanding on-premises workflows to the cloud. This also gives them access to built-out ML/AI machinery that helps increase the speed and quality of content production by improving searchability, automating editing tasks, and adding additional layers of security in compliance checking. A simple example is using ML to locate the start and end of an intro and adding the metadata needed for viewers to skip past it. This tagging would otherwise be manual, with a human annotator stepping back and forth through the video frames to find the exact time points.
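As an illustration only, here is a minimal sketch of one way intro detection could work: compare coarse per-second frame fingerprints between two episodes of the same series and treat the longest matching run as the shared intro. The file names, thumbnail size, and threshold are placeholders, and this is not a description of Codemill's actual pipeline.

```python
# Hypothetical sketch: find the shared intro between two episodes of the same
# series by comparing per-second frame fingerprints. Threshold and file names
# are illustrative, not production values.
import cv2
import numpy as np

def fingerprints(path, max_seconds=300):
    """Return one coarse grayscale thumbnail per sampled second of video."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = int(round(fps))
    prints, idx = [], 0
    while len(prints) < max_seconds:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            prints.append(cv2.resize(gray, (16, 16)).astype(np.float32))
        idx += 1
    cap.release()
    return prints

def shared_intro(ep_a, ep_b, threshold=15.0):
    """Return (start_s, end_s) of the longest run of matching seconds."""
    a, b = fingerprints(ep_a), fingerprints(ep_b)
    match = [np.abs(x - y).mean() < threshold for x, y in zip(a, b)]
    best, run_start = (0, 0), None
    for t, m in enumerate(match + [False]):  # trailing False closes a final run
        if m and run_start is None:
            run_start = t
        elif not m and run_start is not None:
            if t - run_start > best[1] - best[0]:
                best = (run_start, t)
            run_start = None
    return best  # second offsets, e.g. written out as skip-intro metadata

print(shared_intro("episode_01.mp4", "episode_02.mp4"))
```

The assumption baked in here is that the intro is visually identical across episodes; in practice the detected window would still be confirmed by an annotator before the skip markers go live.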

Another typical use case for ML is detecting explicit content, that is, violence, guns, and rock 'n' roll. This is difficult to solve exactly, but a common way forward is to use over-sensitive classifiers that flag everything that seems remotely problematic, and then have a human annotator check the flagged parts and identify the true occurrences. This hybrid solution is still not perfect, but it may be the only option when the data stream moves so rapidly that a completely manual inspection is not possible. To improve performance, we can recognise that when it comes to training data, a little gold is better than a lot of garbage - correcting a small percentage of misclassifications in the data can have the same effect on classifier accuracy as doubling the size of the dataset. Many of the solutions that we build on top of AWS Rekognition include a feedback loop, allowing the user to correct erroneous metadata so that the system improves with time.
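To make the flag-first, review-later pattern concrete, the sketch below asks the Rekognition video content-moderation API to scan an asset with a deliberately low confidence threshold and collects the flagged timestamps for a human reviewer. The bucket, key, and downstream review step are placeholders, not part of a real system.

```python
# Minimal sketch, assuming the asset already sits in S3: over-sensitive
# moderation pass followed by human review of the flagged segments.
import time
import boto3

rekognition = boto3.client("rekognition")

def flag_explicit_segments(bucket, key, min_confidence=50):
    """Start a content-moderation job and return flagged timestamps."""
    job = rekognition.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # low on purpose: over-flag, then review
    )
    job_id = job["JobId"]
    while True:
        result = rekognition.get_content_moderation(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(10)
    # Each label carries a timestamp in milliseconds; hand these to a reviewer.
    return [
        {"ts_ms": m["Timestamp"],
         "label": m["ModerationLabel"]["Name"],
         "confidence": m["ModerationLabel"]["Confidence"]}
        for m in result.get("ModerationLabels", [])
    ]

# Placeholder bucket and key for illustration.
for flag in flag_explicit_segments("my-media-bucket", "episode_01.mp4"):
    print(flag)
```

The reviewer's confirm/reject decisions on each flag are what close the feedback loop: those corrections become high-quality labels that can be fed back to improve the classifier over time.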

Finally, we can scale not only the technology, but also the team. We think that distributed production is the future, and have put together Accurate Video, a suite of tools for solving everyday use cases with video. Tasks like preparing content, tagging, viewing and even editing can be done in the cloud through a standard web browser, instead of with dedicated equipment, installed software and large storage servers. This makes scaling teams easy, and enables remote working that otherwise would not have been possible. Like operating out of Umeå.

By Johanna Björklund

Co-Founder & CTO

This story first appeared on thedpp.com.
