Dev Notes 10: Progress In Wikidata Processing to Study LLMs' Ability To Articulate Connections

By . . Version 0.1.0

Dev Notes (DN) discuss incremental progress towards a larger article. Discussion is preliminary.

Today's notes document progress towards a study that uses Wikidata to explore how well LLMs can articulate connections between texts. This builds on DN-08, which gave a rough initial outline of some of the topics, and on DN-09, which described some progress in processing Wikidata.

Some findings for today:

  • We try to condense some of the main research questions of the study
  • We demonstrate building a "mini-wikidata" to pull out good entity relations to form our questions to LLMs

Consolidating Study Scope

Motivation and Background

There is great interest in interpreting and explaining how the internals of LLMs work. Understanding some of this process might be useful for reasoning about aspects like LLM deception, confidence, or preferences. At a high level, an LLM takes in a token, which becomes a dense vector, which passes through several layers, each producing another opaque dense vector, before finally being projected to predict the next token.

The individual elements of these dense vectors do not always relate to concepts or algorithms the model has learned, since several features can be represented at the same time (Elhage et al., 2022).

To address this, several techniques have been proposed. For example, sparse autoencoders (Bricken et al., 2023; Cunningham et al., 2023) and crosscoders (Lindsey et al., 2024) can help convert dense representations inside the network into sparser activations which might relate to individual concepts in the model.

However, even once one arrives at a sparse feature, one is still at an impasse: one knows only that there is some text where a given feature activates more and some text where it activates less, but not necessarily what concept the feature corresponds to, or even whether it corresponds to an explainable feature at all.

One approach to address this is to show LLMs examples of the text for a given (sparse) feature and have them explain the pattern. Bills et al. (2023) showed GPT-4's ability to explain certain neurons (eg, a neuron that appears to fire on movies), validated by having another instance of GPT-4 simulate activations from the explanation to measure whether the explanation worked. However, there was no known ground truth for the underlying neuron. Sherburn et al. (2024) created 20 hand-crafted categories of rules and measured how well LLMs could articulate the rules from classification examples. The manual process meant that exploration was limited to this relatively small set of rules.

In this study we want to understand the potential of Wikidata as a rich source of complicated patterns. Wikidata is a dataset of triples of the form (entity, property, target). We can identify sets of entities that share a property and a target, paired with sets that do not, and then see how well LLMs can identify this known ground-truth relationship. This covers only a small subset of the kinds of features we might expect to see within a neural network, i.e., those that relate to knowledge rather than more complex algorithmic features (eg, the 3rd item in a list). However, by adding some understanding of how LLMs perform on this task, we broaden the set of datasets available for studying automated interpretability.
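To make the construction concrete, here is a minimal Python sketch using placeholder triples; the entity IDs and the comments labelling P106/Q33999 and so on are illustrative annotations, not output from the pipeline.

```python
# Minimal sketch of the core construction: given (entity, property, target)
# triples, collect entities that share a chosen property-target pair
# (positives) and contrast them with entities that do not (negatives).
# The triples and IDs below are illustrative placeholders.
triples = [
    ("Q1000", "P106", "Q33999"),   # occupation: actor
    ("Q1001", "P106", "Q33999"),   # occupation: actor
    ("Q1002", "P106", "Q36180"),   # occupation: writer
    ("Q1003", "P27", "Q30"),       # country of citizenship: United States
]

rule = ("P106", "Q33999")  # the hidden ground-truth relation

positives = {e for (e, p, t) in triples if (p, t) == rule}
negatives = {e for (e, p, t) in triples} - positives

print(sorted(positives))  # ['Q1000', 'Q1001']
print(sorted(negatives))  # ['Q1002', 'Q1003']
```

The task for the LLM is then to recover the hidden rule from the two sets alone.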

Research Questions

  • How well can language models find connections in sets of Wikidata entities, and how do choices like the example selection strategy influence performance?
  • How well can language models articulate the connections under different prompts?
    • We hope to separate out cases where the model articulates a pattern that merely fits the given data vs one that generalizes to the true relationship.

Additionally, I think by the end of this we'll have a "mini-wikidata" artifact that can be used by others to study this area.

Building A Set of Interesting Properties

The pipeline is coming together. While there might be a few changes for the final version, here is the current approach.

Identify Important Entities Wikidata has a massive number of entities. My initial attempts at finding the most common properties via Wikidata's traditional query interface returned a lot of "junk" entities and properties. For example, one of the top properties found was the "Elo score" property; presumably someone uploaded a massive dataset of chess matches to Wikidata. These are not very interesting for our main research questions around LLMs connecting knowledge.

Under the current pipeline, we first try to find important entities. Wikidata does not have any clear notion of importance; however, most entities correspond to Wikipedia articles, so we use Wikipedia page views as a proxy for importance. The data from Wikipedia comes as hourly page-view aggregations. We randomly sample 20 hours in 2024, download the data for these hours, and sum the page views. Table 1 shows the top pages. This represents a noisy sample of the most important entities of the year. While we see some reasonable entities, others are more mysterious (eg, "Cleopatra"), suggesting we might need to scale up the number of hours we sample.
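A minimal sketch of this sampling step follows, assuming the standard Wikimedia hourly pageview dumps (dumps.wikimedia.org/other/pageviews) and their space-separated line format; the "en" domain-code filter and the helper names are my assumptions rather than the exact pipeline code.

```python
# Sketch of the page-view sampling step. Assumes the Wikimedia hourly
# pageview dumps at https://dumps.wikimedia.org/other/pageviews/ with
# space-separated lines of: domain_code page_title count_views bytes.
# The "en" domain-code filter (desktop English Wikipedia) is an assumption.
import gzip
import io
import random
from collections import Counter
from datetime import datetime, timedelta

import requests

def random_hours_2024(n=20, seed=0):
    """Pick n random hours in 2024 (a leap year: 366 days)."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    return [start + timedelta(hours=rng.randrange(366 * 24)) for _ in range(n)]

def hour_url(ts):
    return (
        "https://dumps.wikimedia.org/other/pageviews/"
        f"{ts:%Y}/{ts:%Y-%m}/pageviews-{ts:%Y%m%d}-{ts:%H}0000.gz"
    )

views = Counter()
for ts in random_hours_2024():
    resp = requests.get(hour_url(ts), timeout=120)
    resp.raise_for_status()
    with gzip.open(io.BytesIO(resp.content), "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.rstrip("\n").split(" ")
            if len(parts) >= 3 and parts[0] == "en":
                views[parts[1]] += int(parts[2])

print(views.most_common(20))  # rough analogue of Table 1, before joining to Wikidata
```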

Rank | Page | Views | Wikidata ID
1 | Deaths in 2024 | 121,859 | Q123489953
2 | Cleopatra | 105,883 | Q635
3 | J. D. Vance | 97,342 |
4 | XXXTentacion | 82,487 | Q28561969
5 | Weightlifting at the 2024 Summer Olympics – Women's 49 kg | 75,740 | Q116495986
6 | Bob Menendez | 75,108 | Q888132
7 | YouTube | 73,930 | Q866
8 | Pornhub | 72,666 | Q936394
9 | Lyle and Erik Menéndez | 71,410 |
10 | Biggest ball of twine | 67,400 | Q4906916
11 | Kepler's Supernova | 66,182 | Q320670
12 | .xxx | 65,772 | Q481
13 | Tim Walz | 65,425 | Q2434360
14 | 2024 Summer Olympics | 62,453 | Q995653
15 | Matthew Hudson-Smith | 61,339 | Q16575549
16 | Deadpool & Wolverine | 61,319 | Q102180106
17 | Pushpa 2: The Rule | 57,510 | Q112083510
18 | 2024 Indian general election | 55,112 | Q65042773
19 | XXX (2002 film) | 52,061 | Q283799
20 | Portal:Current events | 51,853 | Q4597488
Table 1. Top 20 Wikipedia pages by views (sampled from 20 random hours in 2024), with their Wikidata IDs. Not all pages were successfully joined to Wikidata entities. Table entries link to the corresponding Wikipedia page and Wikidata page.
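The page-to-Wikidata join mentioned in the caption could look roughly like the sketch below, which uses the MediaWiki API's pageprops/wikibase_item lookup; the batching and helper names are illustrative, and pages without a linked item simply drop out, which is one way rows end up with a missing ID.

```python
# Sketch of joining Wikipedia page titles to Wikidata IDs via the MediaWiki
# API's pageprops / wikibase_item lookup. Titles with no linked Wikidata item
# are omitted from the result.
import requests

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"

def titles_to_qids(titles, batch_size=50):
    qids = {}
    for i in range(0, len(titles), batch_size):
        resp = requests.get(WIKIPEDIA_API, params={
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "titles": "|".join(titles[i:i + batch_size]),
            "redirects": 1,
            "format": "json",
        }, timeout=60)
        resp.raise_for_status()
        for page in resp.json().get("query", {}).get("pages", {}).values():
            qid = page.get("pageprops", {}).get("wikibase_item")
            if qid:
                qids[page["title"]] = qid
    return qids

print(titles_to_qids(["Cleopatra", "YouTube", "Tim Walz"]))
```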

Scraping Entities After identifying important entities, we download a sample of 10,000 entities from among the top 100,000 articles with the most page views. We flatten their properties into a parquet dataframe and pull the natural-language descriptions of the property relations. This serves as the basis for a "mini-wikidata" that we can use for initial analysis. Table 2 shows the most common property-value pairs in our sample.
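The flattening step might look roughly like this sketch, which assumes the Wikidata wbgetentities API and keeps only item-valued statements; the function names are mine, and label/description resolution is left to a separate pass.

```python
# Sketch of flattening entity claims into a parquet "mini-wikidata".
# Assumes the Wikidata wbgetentities API and the usual claims/mainsnak JSON
# layout; only item-valued statements are kept here, and natural-language
# labels/descriptions would be resolved separately.
import pandas as pd
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def flatten_entities(qids, batch_size=50):  # wbgetentities accepts up to 50 ids per call
    rows = []
    for i in range(0, len(qids), batch_size):
        resp = requests.get(WIKIDATA_API, params={
            "action": "wbgetentities",
            "ids": "|".join(qids[i:i + batch_size]),
            "props": "claims",
            "format": "json",
        }, timeout=60)
        resp.raise_for_status()
        for qid, entity in resp.json().get("entities", {}).items():
            for prop, statements in entity.get("claims", {}).items():
                for st in statements:
                    value = st.get("mainsnak", {}).get("datavalue", {}).get("value")
                    if isinstance(value, dict) and "id" in value:  # item-valued only
                        rows.append({"entity": qid, "property": prop, "value": value["id"]})
    return pd.DataFrame(rows)

df = flatten_entities(["Q635", "Q866", "Q2434360"])
df.to_parquet("mini_wikidata.parquet", index=False)
print(df.groupby(["property", "value"]).size().nlargest(10))  # analogue of Table 2
```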

Property | Value | Entities
instance of | Q5 | 3,603
sex or gender | Q6581097 | 2,644
languages spoken, written or signed | Q1860 | 2,080
country of citizenship | Q30 | 1,813
occupation | Q33999 | 1,278
country of origin | Q30 | 1,145
sex or gender | Q6581072 | 1,132
occupation | Q10800557 | 990
original language of film or TV show | Q1860 | 920
occupation | Q10798782 | 893
Table 2. Most common property-value pairs in the sample.

Additional effort is needed to convert these into prompts, but one can start to see how this comes together. We can take all the entities that share, say, the same occupation (P106), pair them with entities of other occupations, and see whether the LLM can categorize them. This would presumably be somewhat easy; however, as we go deeper into the ranked list of relationships, it gets more challenging. Additionally, one can construct intersections of rules (eg, occupation=Q10798782 and country of citizenship!=Q30), which encodes "television actors whose country of citizenship is not the United States". Paired with a set of other entities, this challenges the LLM's pattern-articulation abilities, as sketched below.
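Here is a rough sketch of how such an intersection rule could be turned into a prompt from the flattened dataframe; the "label" column and the prompt wording are assumptions for illustration, not the final task format.

```python
# Sketch of turning the flattened table into a pattern-articulation prompt.
# The rule mirrors the example in the text: occupation=Q10798782 (television
# actor) with country of citizenship P27 != Q30 (United States). The "label"
# column is an assumption about the mini-wikidata frame; labels could equally
# be fetched in a separate pass.
import pandas as pd

df = pd.read_parquet("mini_wikidata.parquet")  # assumed columns: entity, property, value, label

def entities_matching(df, must=(), must_not=()):
    """Return entities holding every (property, value) in `must` and none in `must_not`."""
    pairs_by_entity = {}
    for e, p, v in zip(df["entity"], df["property"], df["value"]):
        pairs_by_entity.setdefault(e, set()).add((p, v))
    return {e for e, pairs in pairs_by_entity.items()
            if all(r in pairs for r in must) and not any(r in pairs for r in must_not)}

positives = entities_matching(df, must=[("P106", "Q10798782")], must_not=[("P27", "Q30")])
negatives = set(df["entity"]) - positives

labels = df.drop_duplicates("entity").set_index("entity")["label"]
prompt = (
    "These entities follow a hidden rule:\n"
    + "\n".join(f"- {labels[e]}" for e in sorted(positives)[:10])
    + "\n\nThese entities do not:\n"
    + "\n".join(f"- {labels[e]}" for e in sorted(negatives)[:10])
    + "\n\nDescribe the rule."
)
print(prompt)
```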

Conclusions

Here I outline some of the progress towards this exploration. Tomorrow I hope to demonstrate an LLM prompt version of this task.