NEW YORK (Reuters) - In the escalating battle of big data vs. human experts, score another win for numbers.
The most accurate predictions of which movies the U.S. Library of Congress will deem “culturally, historically, or aesthetically significant” are not the views of critics or fans but a simple algorithm applied to a database, according to a study published on Monday.
The crucial data, scientists reported in Proceedings of the National Academy of Sciences, are what the Internet Movie Database (IMDb.com) calls “Connections” - films, television episodes and other works that allude to an earlier movie.
For the 15,425 films on IMDb.com examined in the study, the measure most predictive of which made it into the Library of Congress’s National Film Registry, which honors “significant” movies, was the number of references to a film by other films released many years later.
The 1972 classic “The Godfather,” for instance, is referred to by 1,323 films and television episodes, which as recently as 2014 quoted the “offer he can’t refuse” line, referred to the famous horse-head scene, or played the theme music. “Godfather” made the registry in 1990.
The number of references to a film more than 25 years after its release was a nearly infallible predictor of whether it would make the registry, topping 91 percent accuracy, said applied mathematician and study author Max Wasserman of Northwestern University.
Critics’ judgments, Oscar wins, and box-office numbers did not come close.
Films are nominated for the registry by the public and chosen by the Librarian of Congress in consultation with a board of experts including critics, academics, directors, screenwriters and other industry insiders.
By the 25-year-lag rule, the 1971 box-office disappointment “Willy Wonka & the Chocolate Factory” should be in the registry: IMDb lists 52 long-lag citations to it, the 37th most in the Northwestern analysis.
In December, six months after the scientists submitted their paper, the Library added “Willy Wonka” to the list of 650 cinematic immortals, just as the research predicted.
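The long-lag citation count behind that prediction can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name, the data layout and the film titles are hypothetical, and it assumes that "more than 25 years" means the citing film's release year exceeds the cited film's by more than 25.

```python
from collections import Counter

LAG_YEARS = 25  # lag threshold reported in the article


def long_lag_citation_counts(release_years, citations, lag=LAG_YEARS):
    """Count, per film, the citations made more than `lag` years after its release.

    release_years: dict mapping film title -> release year (hypothetical layout)
    citations: iterable of (citing_title, cited_title) pairs
    """
    counts = Counter({title: 0 for title in release_years})
    for citing, cited in citations:
        # Skip pairs that mention a film with no known release year.
        if citing not in release_years or cited not in release_years:
            continue
        # Assumption: the lag is measured in whole release years.
        if release_years[citing] - release_years[cited] > lag:
            counts[cited] += 1
    return counts


if __name__ == "__main__":
    # Invented placeholder films, not data from the study.
    years = {"Old Classic": 1950, "Mid Film": 1970,
             "New Film A": 1980, "New Film B": 2000}
    refs = [("New Film A", "Old Classic"), ("New Film B", "Old Classic"),
            ("New Film A", "Mid Film"), ("New Film B", "Mid Film")]
    # Films with the most long-lag citations come first.
    for title, n in long_lag_citation_counts(years, refs).most_common():
        print(title, n)
```

Ranking films by this count, as the loop at the end does, mirrors the kind of ordering in which “Willy Wonka” placed 37th.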
“Experts have biases that can affect how they evaluate things,” said physicist and co-author Luis A.N. Amaral of Northwestern. “Automated, objective methods don’t suffer from that. It may hurt our pride, but they can perform as well as or better than experts.”
Other movies identified by the Northwestern algorithm as likely to make the registry include “Dumbo,” “Spartacus” and “The Shining.”
Of course, humans are not entirely superfluous: flesh-and-blood creators must decide to refer to an earlier gem in order to establish the crucial IMDb “connections.”
Reporting by Sharon Begley; Editing by Nick Zieminski