Dev Notes 13: Progress in IMDb ratings and Wikidata Studies
By David Gros. Version 0.1.0

Dev Notes (DN) document progress towards a larger article; discussion is preliminary.

Today's notes discuss two projects. The main thing I'm trying to finish is the Wikidata LLM interpretability study that has been the focus of Dev Notes over the last week and that I started drafting yesterday. As a completely separate topic, I started an analysis of IMDb movie ratings.

IMDb rating reviews

One of the most popular movies of this year is KPop Demon Hunters. A surprise hit, it was a Netflix original musical that did a brief weekend showing in theatres. It is hard to quantify how it compares to other films of the year, since it doesn't cleanly appear in traditional rankings like box office numbers, but it has had much more cultural staying power than, say, the number-two box office film of 2025: Lilo & Stitch.

I enjoyed it. Fun, artistic, and some of the songs are bangers. My first reaction was that it was maybe a 7/10 film, though later, after reflecting on the cultural phenomenon, how much I kept thinking about it, and how many times it came up in conversation, I updated to 8/10, and could reasonably go higher.

Like about 100k other people, I made my way to IMDb to play film critic. When I got there, I noticed something a bit weird. A solid 7.6 average. The most common rating is an 8, and a fairly substantial number of people rate it a 10. However, very few rate it a 9. There's a "9 gap".

Is this normal? Let's look at another animated movie with a similar 7.6 rating: Toy Story 4. It has no apparent 9 gap, so the pattern doesn't apply to all films. On something like Google Maps or Amazon reviews, there's a pretty clear bimodal distribution where people rate things either 1 star or 5 stars, but this isn't quite that same either-1-or-10 phenomenon.
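To make the "9 gap" concrete, here is a rough sketch of one way to quantify it: compare the number of 9-star votes to the average of the neighboring 8- and 10-star counts. The vote counts below are made up for illustration; they only mimic the shapes of the two distributions described above, not the real IMDb numbers.

```python
def nine_gap(hist):
    """Ratio of 9-star votes to the mean of the neighboring 8- and 10-star
    counts. Values well below 1.0 indicate a dip at 9, i.e. a "9 gap".
    hist maps each rating (1-10) to a vote count."""
    neighbors = (hist[8] + hist[10]) / 2
    return hist[9] / neighbors

# Illustrative (NOT real) vote counts, shaped like the two distributions
# discussed above: a dip at 9 for one film, a smooth peak for the other.
kpop = {1: 2000, 2: 500, 3: 700, 4: 1200, 5: 3000, 6: 7000,
        7: 18000, 8: 25000, 9: 6000, 10: 20000}
toy_story_4 = {1: 1500, 2: 800, 3: 1200, 4: 2500, 5: 6000, 6: 14000,
               7: 30000, 8: 35000, 9: 18000, 10: 12000}

print(f"KPop Demon Hunters 9-gap ratio: {nine_gap(kpop):.2f}")         # well below 1
print(f"Toy Story 4 9-gap ratio:        {nine_gap(toy_story_4):.2f}")  # closer to 1
```

A ratio near 1 means 9 sits smoothly between its neighbors; the gap shows up as a ratio far below 1.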
Getting data to investigate this

This seems interesting to explore: which movies have this kind of distribution, and what kinds of distributions show up in general. Basically, what I need is a list of movies that includes KPop Demon Hunters, along with the rating distribution graphs IMDb shows for each.

Unfortunately, despite the film having 96k ratings and seemingly being a big cultural hit, IMDb doesn't really have any public API for getting this data. They will sell access to a dataset for $150,000 a year, and even then charge about a cent per MB you download. Wow... that's not going to work. There's a one-month free trial, which strikes me as deeply weird. Even Netflix won't offer a free trial, but you can just sign up for a $12k/month IMDb data trial? There's not even a metered cost on the trial version...? So weird, but maybe an option.

Then there's just using the client-side API. Looking at the browser console, they anticipate this and send you a scolding with every request.

So how to do this analysis? The data is right there, on the website. I need maybe a megabyte of data, so I'm not going to bring down a site that gets over 3 billion page views per month. I'm pretty sure there will be a way to do this without being a jerk (or breaking the law). I'll try not to directly touch their GraphQL endpoint; they clearly don't want that. A truly manual approach, clicking through 1,000 pages and writing the numbers down in a spreadsheet, seems tractable in a day or two. Some compromise of workflow tooling could get that down to a few hours. Otherwise, I might sign up for a very weird online free trial while trying not to accidentally buy a small house's worth of data. We'll see how it goes.

Progress in Wikidata Study

I made decent progress on the Wikidata and LLM interpretability study. I wrote some background and tried to work out the article structure more. I also ran 100 prompts on a few different models. Basic results show that full matching is under around 20% for the models I tested.
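For the "full matching" number, the scoring I have in mind is plain normalized exact match: a model answer counts only if it equals the gold Wikidata value after lowercasing and whitespace cleanup. A minimal sketch; the example (prediction, gold) pairs below are invented, not real run output.

```python
def normalize(s):
    """Lowercase and collapse whitespace so trivial formatting
    differences don't count against the model."""
    return " ".join(s.lower().split())

def exact_match_rate(pairs):
    """Fraction of (prediction, gold) pairs that match exactly
    after normalization."""
    hits = sum(normalize(pred) == normalize(gold) for pred, gold in pairs)
    return hits / len(pairs)

# Invented examples -- stand-ins for (model answer, Wikidata gold value).
pairs = [
    ("Douglas Adams", "Douglas Adams"),   # match
    ("douglas  adams", "Douglas Adams"),  # match after normalization
    ("D. Adams", "Douglas Adams"),        # miss
    ("London", "Cambridge"),              # miss
    ("1952", "1952"),                     # match
]
print(f"exact match: {exact_match_rate(pairs):.0%}")  # prints "exact match: 60%"
```

A stricter or looser matcher (e.g. alias-aware matching against all Wikidata labels) would move the number, so the ~20% figure is tied to whatever normalization the evaluation actually uses.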
I started downloading a somewhat less-toy sample of page view data and entity data. I will get that done, then try to merge the results table into the article draft.

Conclusions

Today's notes diverged in topic. I did not want to spend all month getting sucked into the Wikidata project, so this movie exploration has been a good counterbalance. Tomorrow I expect I'll have more to share on both these topics.



