We are approaching the age of artificial intelligence at a rapid pace. Whether this is a good or bad thing remains to be seen – and is the topic of a contentious debate among some of the smartest people on the planet.
Today’s announcement by Microsoft has me quite impressed. They’ve developed an incredibly accurate way for computers to ‘see’ – that is, to recognize what is in front of their cameras and identify it faster than a person can. In effect, they’re powerful computer eyes.
Read Microsoft’s announcement of “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” and download the PDF here.
The benchmark for this kind of image recognition has typically been the ImageNet dataset, a large collection of labeled images that both humans and computers can attempt to classify. The typical error rate for a human is about 5.1 percent. Microsoft’s new deep-learning system has achieved a rate of just 4.94 percent. While that may not seem like a significant improvement, it’s only the beginning for the computer-based approach: recognition systems are steadily improving toward the point where computers will classify the ImageNet dataset instantly and reliably, with an even smaller error rate.
Prior to Microsoft’s breakthrough, the best results had come from Google (6.66 percent error rate) and Baidu (5.98 percent error rate).
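The error rates quoted above are “top-5” error rates: a model’s answer counts as correct if the true label appears among its five highest-scoring guesses. As a rough illustration (not Microsoft’s actual code; the function name and toy scores here are made up), the metric can be sketched in a few lines of Python:

```python
# Illustrative sketch of a "top-5" error rate, the metric behind the
# ImageNet numbers quoted above. Not Microsoft's actual evaluation code.

def top5_error(score_lists, true_labels):
    """score_lists: one list of per-class scores per image.
    true_labels: the correct class index for each image."""
    mistakes = 0
    for scores, truth in zip(score_lists, true_labels):
        # The five highest-scoring classes are the model's guesses.
        top5 = sorted(range(len(scores)), key=lambda c: scores[c],
                      reverse=True)[:5]
        if truth not in top5:
            mistakes += 1
    return mistakes / len(true_labels)

# Toy example: 10 classes, 2 images. The first image's true class (2)
# ranks third, so it counts as correct; the second's (7) ranks eighth,
# which counts as a mistake.
scores = [list(range(10, 0, -1))] * 2
print(top5_error(scores, [2, 7]))  # -> 0.5
```

On the real benchmark, the same calculation runs over tens of thousands of validation images and 1,000 classes, which is why fractions of a percentage point matter.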
What The ImageNet Test Looks Like
In short, the computer or a human subject identifies pictures. In the images below, the term ‘GT’ means ‘ground truth’, which is essentially the correct answer:
What you see below are additional images from the test. How did you do? Think you can identify all the images without looking at the ‘GT’ answer?
What Microsoft Is Saying
“To our knowledge, our result is the first published instance of surpassing humans on this visual recognition challenge,” the paper states. “On the negative side, our algorithm still makes mistakes in cases that are not difficult for humans, especially for those requiring context understanding or high-level knowledge…
“While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general . . . Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks.”
Jian Sun of Microsoft Research sheds some light on why computers aren’t perfect in this Microsoft post: “Humans have no trouble distinguishing between a sheep and a cow. But computers are not perfect with these simple tasks. However, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans. The computer can be trained to look at the detail, texture, shape and context of the image and see distinctions that can’t be observed by humans.”