The future of Computer Vision: Q&A with Trax experts

Trax Retail

An interdisciplinary field of science that enables computers to see, identify and process images much as the human eye does, Computer Vision (CV) technology is rapidly transforming the retail landscape. Three experts at Trax walk us through the future of CV-based applications in retail.

Our Trax experts

We know that CV is inspired by the human visual cortex. Are we at a stage where machines are on par with, or even better than, human vision at object detection or classification?

Ziv: Definitely. In fact, in some tasks, we have achieved superhuman levels of vision with computers. For example, in the famous ImageNet challenge, you give a system 1,000 classes of objects like “container ship”, “mite”, “mushroom” or “cherry”, and the computer has to classify images into these classes. What we have seen is that the top-5 accuracy of the best entries in the competition has improved drastically – from around 84 percent in 2012 to nearly 98 percent in 2017.

In simple terms, this means that computers are now better than humans at classifying objects correctly in tasks like this.
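ImageNet challenge results are usually reported as top-5 accuracy: a prediction counts as correct if the true class is among the system's five highest-scoring guesses. Here is a minimal sketch of that metric. The function name and the toy scores are hypothetical and only illustrate the arithmetic; they are not the challenge's official evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5
) -> float:
    """Fraction of images whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one image is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, cut to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 4 images scored over 6 classes (not real ImageNet output).
toy_scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.50, 0.10],
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],
]
toy_labels = [1, 0, 5, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Allowing more guesses per image can only keep the score the same or raise it, which is why top-5 figures are higher than top-1 figures for the same system.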

The human eye suffers from certain innate biases, often as a result of centuries of pattern-seeking tendencies by our ancestors. Can machines be trained to be foolproof?

Dolev: CV systems are definitely not foolproof. Much like the optical illusions that befuddle the human brain, CV systems can also be tricked using “adversarial images”. These are patterns and pictures that exploit weaknesses in CV algorithms to fool them into mistaking a panda for a gibbon or a cat for guacamole. In fact, a team of students at MIT published a study in 2017 showing that they could fool a system into classifying a 3D-printed turtle as a rifle!

Malicious actors could use this to cause harm, for instance by manipulating face recognition tools into recognizing the wrong people or by attacking the CV systems that enable self-driving cars. For example, a small patch on the side of the motorway could make a self-driving car think that it is looking at a stop sign.
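One well-known way to build adversarial inputs is the fast gradient sign method (FGSM): nudge every input value by a small, fixed amount in the direction that increases the model's error. The sketch below applies it to a tiny logistic-regression "classifier" rather than a real image model. The weights, the input and the step size `eps` are all hypothetical values chosen for illustration, and the interview does not say which attack technique the speakers have in mind.

```python
import math
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Shift each input value by eps in the direction that raises the loss."""
    p = predict(w, b, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input (stand-ins for a trained network and an image).
w = [2.0, -3.0, 1.0, 0.5]
b = 0.0
x = [0.6, 0.2, 0.5, 0.4]
y = 1  # true class of x

x_adv = fgsm(w, b, x, y, eps=0.25)

p_clean = predict(w, b, x)    # above 0.5: classified as class 1
p_adv = predict(w, b, x_adv)  # below 0.5: now classified as class 0
```

No input value moves by more than `eps`, yet the predicted class flips. On real images with many thousands of pixels, the per-pixel change needed can be small enough that a person would not notice it.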

What are some real-life applications of Computer Vision that you’re excited about?

Ziv: Many applications were stuck before deep learning, with only very small improvements in accuracy – 0.3 percent or so every year. But with the advances in deep learning, CV experienced a very big leap forward, resulting in many cross-industry applications.

The autonomous vehicles industry is buzzing with activity, with a number of major manufacturers and tech giants entering the market. Based on the level of autonomy offered, self-driving vehicles fall into five levels, ranging from Level 1 vehicles, which require significant involvement from the human driver, to fully autonomous Level 5 vehicles. Most of today’s self-driving vehicles fall into Level 4, where self-driving is possible but only within pre-mapped routes.

Yair: The defense industry continues to be arguably the most dominant user of such technologies. It’s pretty common to see countries using sensors and camera-equipped drones in a battlefield environment to develop safer combat strategies and protect soldiers.

A lesser-known but massively impactful use of CV is in analyzing and monitoring crops in agriculture. Using camera-mounted drones, farmers can capture images of the field to assess the health of crops and detect pest infestations and other deficiencies that could impact harvest yield.

Dolev: But it’s retail that we obsess about! We use CV to capture shelf images and analyze individual products. Trax helps digitize the shelf to reduce audit times for sales reps, and translates the images into data that category management, shopper marketing and space planning teams can use to reduce out-of-stocks, improve distribution and gain market share over competitors.

Will CV become a commodity?

Yair: Deep learning has commoditized some applications of computer vision. Recognizing an object on your mobile phone is no longer something that only the big companies can do. Anyone can take open-source code and public data sets and train a system quite easily. These can provide you with a very reasonable level of accuracy in object recognition.

Big players like Google, Facebook, Microsoft and Amazon will soon be able to offer out-of-the-box CV solutions for mainstream applications. But if you want to develop something new or niche, or take an application’s capabilities to the next level, you need specialized capabilities.

Ziv: Let’s take the retail sector as an example. While today’s advanced image recognition algorithms are capable of recognizing objects within an image with great accuracy, the process becomes much more complex in a retail setting.

Here, you have properties that are not common elsewhere – crowded environments, ever-changing SKUs, and near-identical or similar products. So an automated image recognition platform must meet certain key criteria to ensure a high level of accuracy: it must distinguish multiple products that are nearly identical in appearance, cope with obscured and reflective packaging under poor lighting conditions, and detect changes in the product life cycle, such as new design versions.

Example of differences in packaging on a Classic Coca-Cola 1L bottle

What can we look forward to from Trax? What cool applications are you working on?

Dolev: One application we are quite excited about is the Store Mapper. It uses image recognition to map physical retail stores and digitize them into a 2D map. Shoppers can use an app on their phones to be directed to the right aisles using AR-based location guides, receive location-based promotions and be alerted to any items that are running out. Scanning a product on the phone will open up its information and nutritional value, while a virtual assistant helps shoppers add to and track their lists.

To find out more about how Computer Vision has evolved from student research labs to useful applications across industries, download our eBook, The Past, Present and Future of Computer Vision.
