Big data tops humans at picking 'significant' films: study
By Sharon Begley
NEW YORK (Reuters) - In the escalating battle of big data vs. human experts, score another win for numbers.
The most accurate predictions of which movies the U.S. Library of Congress will deem "culturally, historically, or aesthetically significant" come not from critics or fans but from a simple algorithm applied to a database, according to a study published on Monday.
The crucial data, scientists reported in Proceedings of the National Academy of Sciences, are what the Internet Movie Database (IMDb.com) calls "Connections" - films, television episodes and other works that allude to an earlier movie.
Among the 15,425 films in IMDb.com examined in the study, the measure that best predicted which ones made it into the Library of Congress's National Film Registry, which honors "significant" movies, was the number of references to a film by other films released many years later.
The 1972 classic "The Godfather," for instance, is referred to by 1,323 films and television episodes, which as recently as 2014 quoted the "offer he can't refuse" line, referred to the famous horse-head scene, or played the theme music. "Godfather" made the registry in 1990.
The number of references to a film more than 25 years after its release was a nearly infallible predictor of whether it would make the registry, topping 91 percent accuracy, said applied mathematician and study author Max Wasserman of Northwestern University.
Critics' judgments, Oscar wins, and box-office numbers did not come close.
Films are nominated for the registry by the public and chosen by the Librarian of Congress in consultation with a board of experts including critics, academics, directors, screenwriters and other industry insiders.