Dev Notes 10: Progress in Wikidata Processing to Study LLMs' Ability to Articulate Connections
By David Gros. Version 0.1.0

Dev Notes (DN) discuss incremental progress towards a larger article. Discussion is preliminary.

Consolidating Study Scope

Today's notes document progress towards a study of using Wikidata to explore how well LLMs can articulate connections between text. This builds on DN-08, which gave a rough initial outline of some of the topics, and on DN-09, which described some initial progress in processing Wikidata. Some findings for today are laid out in the sections below.

Motivation and Background

There is great interest in interpreting and explaining how the internals of LLMs work. Understanding some of these processes might be useful for reasoning about aspects like LLM deception, confidence, or preferences.

At a simple level, an LLM takes in a token, which becomes a dense vector, which passes through several layers where each layer's output is an opaque dense vector, before finally being projected to predict the next token. Individual elements of these dense vectors do not always correspond to concepts or algorithms the model has learned, since several features can be represented in the same dimensions at once (Elhage et al., 2022). To address this, several techniques have been proposed. For example, sparse autoencoders (Bricken et al., 2023; Cunningham et al., 2023) and crosscoders (Lindsey et al., 2024) can help convert dense representations inside the network into sparser activations which might relate to individual concepts in the model.
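To make the sparse autoencoder idea a bit more concrete, below is a minimal sketch of the basic recipe as I understand it from the cited work: learn a wider ReLU encoding of a captured activation vector, with an L1 penalty that pushes most feature activations toward zero. The dimensions, penalty weight, and lack of a real training loop are placeholder simplifications, not the setups used in those papers.

```python
# Minimal sparse autoencoder sketch (PyTorch); shapes and hyperparameters are placeholders.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder maps a dense activation vector to a wider, hopefully sparse feature vector.
        self.encoder = nn.Linear(d_model, d_features)
        # Decoder reconstructs the dense vector from the sparse features.
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # non-negative activations
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(x, features, reconstruction, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty encouraging most features to be zero.
    return ((reconstruction - x) ** 2).mean() + l1_coeff * features.abs().mean()

# Toy usage: 512-dim activations expanded into 2048 candidate features.
sae = SparseAutoencoder(d_model=512, d_features=2048)
x = torch.randn(8, 512)  # stand-in for activations captured from an LLM
features, reconstruction = sae(x)
loss = sae_loss(x, features, reconstruction)
loss.backward()
```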
However, even once someone arrives at a sparse feature, one is still at an impasse. One only knows that there is some text where a given feature activates more and some text where it activates less, but one does not necessarily know what concept the feature corresponds to, or even whether it corresponds to an explainable concept at all. One approach is to show an LLM examples of the text for a given (sparse) feature and have it explain the pattern. Bills et al. (2023) showed GPT-4's ability to explain certain neurons (e.g., that a neuron appears to fire on movie-related text), validated by having another instance of GPT-4 simulate activations from the explanation to measure whether the explanation worked. However, there was no known ground truth for the underlying neuron. Sherburn et al. (2024) instead created 20 hand-crafted categories of rules and measured how well LLMs could articulate the rules from classification examples. The manual process meant that exploration was limited to this relatively small set of rules.

Research Questions

In this study we want to understand the potential of Wikidata as a rich source of complicated patterns. Wikidata is a dataset of triples of the form (entity, property, target). We can identify sets of entities that share a property and a target, paired with entities that do not, and then see how well LLMs can articulate this known ground-truth relationship (a small sketch of this construction appears at the end of this section).

This covers only a small subset of the kinds of features we might expect to see within a neural network, i.e., those that relate to knowledge rather than more complex algorithmic features (e.g., "the 3rd item in a list"). However, by adding some understanding of how LLMs perform on this task, we broaden the set of datasets available for studying automated interpretability. Additionally, I think by the end of this we'll have a "miniwikidata" artifact that others can use to study this area.
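As a small illustration of the intended construction, the sketch below groups toy triples by their (property, target) pair and splits entities into those that satisfy a candidate rule and those that do not. The entity IDs are made-up placeholders; only the property and target QIDs noted in the comments are real Wikidata identifiers.

```python
# Sketch: deriving a ground-truth pattern from (entity, property, target) triples.
from collections import defaultdict

triples = [
    # (entity, property, target); entity IDs are placeholders for illustration
    ("Q_alice", "P106", "Q82955"),  # occupation = politician
    ("Q_bob",   "P106", "Q33999"),  # occupation = actor
    ("Q_carol", "P106", "Q82955"),  # occupation = politician
    ("Q_dave",  "P106", "Q36180"),  # occupation = writer
]

# Index entities by their (property, target) pair.
by_rule = defaultdict(set)
for entity, prop, target in triples:
    by_rule[(prop, target)].add(entity)

# A candidate rule is any (property, target) pair; positives satisfy it, negatives do not.
rule = ("P106", "Q82955")
all_entities = {e for e, _, _ in triples}
positives = by_rule[rule]
negatives = all_entities - positives

print("rule:", rule)
print("positives:", sorted(positives))
print("negatives:", sorted(negatives))
```

In the real pipeline the triples would come from the scraped "miniwikidata" described below rather than a hand-written list.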
Building A Set of Interesting Properties

The pipeline is seemingly coming together. While there might be a few changes for the final version, here is the current approach.

Identify Important Entities

Wikidata has a massive number of entities. My initial attempts at finding the most common properties via Wikidata's traditional query interface returned a lot of "junk" entities and properties. For example, a top property found was the "Elo score" property; presumably someone uploaded a massive dataset of chess matches to Wikidata. These are not very interesting for our main research questions around LLMs connecting knowledge.

Under the current pipeline, we first try to find important entities. Wikidata does not have any clear notion of importance, but most entities relate to Wikipedia articles, so we use Wikipedia page views as a proxy for importance. The data from Wikipedia comes as hourly page view aggregations. We randomly sample 20 hours in 2024, download the data for those hours, and sum the page views (a small sketch of this step is included after Table 1). Table 1 shows the top pages. This represents a noisy sample of the most important entities of the year. While we see some reasonable entities, others are more mysterious (e.g., "Cleopatra"), suggesting we might need to scale up the number of hours we sample.
Table 1: Top Wikipedia pages by page views summed over 20 randomly sampled hours of 2024.

| Rank | Page | Views | Wikidata ID |
|---|---|---|---|
| 1 | Deaths in 2024 | 121,859 | Q123489953 |
| 2 | Cleopatra | 105,883 | Q635 |
| 3 | J. D. Vance | 97,342 | — |
| 4 | XXXTentacion | 82,487 | Q28561969 |
| 5 | Weightlifting at the 2024 Summer Olympics – Women's 49 kg | 75,740 | Q116495986 |
| 6 | Bob Menendez | 75,108 | Q888132 |
| 7 | YouTube | 73,930 | Q866 |
| 8 | Pornhub | 72,666 | Q936394 |
| 9 | Lyle and Erik Menéndez | 71,410 | — |
| 10 | Biggest ball of twine | 67,400 | Q4906916 |
| 11 | Kepler's Supernova | 66,182 | Q320670 |
| 12 | .xxx | 65,772 | Q481 |
| 13 | Tim Walz | 65,425 | Q2434360 |
| 14 | 2024 Summer Olympics | 62,453 | Q995653 |
| 15 | Matthew Hudson-Smith | 61,339 | Q16575549 |
| 16 | Deadpool & Wolverine | 61,319 | Q102180106 |
| 17 | Pushpa 2: The Rule | 57,510 | Q112083510 |
| 18 | 2024 Indian general election | 55,112 | Q65042773 |
| 19 | XXX (2002 film) | 52,061 | Q283799 |
| 20 | Portal:Current events | 51,853 | Q4597488 |
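For concreteness, here is a rough sketch of that sampling step under my reading of the public Wikimedia pageview dumps (hourly gzipped files under dumps.wikimedia.org/other/pageviews/ with space-separated lines of project code, page title, and view count). The URL pattern, the "en" project code, and the line layout are assumptions to verify against the dumps documentation; the actual pipeline code may differ.

```python
# Sketch: sample random hours of 2024, download the hourly pageview dumps, sum views per title.
import gzip
import io
import random
from collections import Counter
from datetime import datetime, timedelta

import requests

def random_hours_2024(n: int, seed: int = 0):
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    total_hours = 366 * 24  # 2024 is a leap year
    return [start + timedelta(hours=rng.randrange(total_hours)) for _ in range(n)]

def pageview_url(ts: datetime) -> str:
    # Assumed layout of the dumps site; check the index pages before relying on it.
    return (
        "https://dumps.wikimedia.org/other/pageviews/"
        f"{ts.year}/{ts.year}-{ts.month:02d}/"
        f"pageviews-{ts:%Y%m%d}-{ts:%H}0000.gz"
    )

def add_counts(ts: datetime, totals: Counter) -> None:
    resp = requests.get(pageview_url(ts), timeout=120)
    resp.raise_for_status()
    with gzip.open(io.BytesIO(resp.content), "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) < 3:
                continue
            domain, title, views = parts[0], parts[1], parts[2]
            if domain == "en":  # assumed to mean English Wikipedia desktop traffic
                totals[title] += int(views)

totals = Counter()
for ts in random_hours_2024(20):
    add_counts(ts, totals)
print(totals.most_common(20))
```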
Scraping Entities

After identifying important entities, we download a sample of 10,000 entities from the top 100,000 articles with the most page views. We flatten the properties into a flat parquet dataframe and pull the natural-language descriptions of the property relations. This serves as the basis for a "miniwikidata" that we can use for initial analysis (a rough sketch of this step appears after Table 2). Table 2 shows the most common property-value pairs in our sample.

Table 2: Most common (property, value) pairs among the scraped entities.

| Property | Value | Entities |
|---|---|---|
| instance of | Q5 | 3,603 |
| sex or gender | Q6581097 | 2,644 |
| languages spoken, written or signed | Q1860 | 2,080 |
| country of citizenship | Q30 | 1,813 |
| occupation | Q33999 | 1,278 |
| country of origin | Q30 | 1,145 |
| sex or gender | Q6581072 | 1,132 |
| occupation | Q10800557 | 990 |
| original language of film or TV show | Q1860 | 920 |
| occupation | Q10798782 | 893 |
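One possible shape for this step, using the public Wikipedia and Wikidata endpoints, is sketched below: look up each article's linked Wikidata item, pull its claims, and flatten the item-valued claims into an (entity, property, value) table written to parquet. The endpoints are real public APIs, but the field handling, the example titles, and the output schema here are my assumptions rather than the pipeline's actual code.

```python
# Sketch: map article titles to Wikidata IDs, fetch entity claims, flatten to a parquet table.
import requests
import pandas as pd

WIKI_API = "https://en.wikipedia.org/w/api.php"
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def title_to_qid(title: str):
    """Look up the Wikidata item linked to an English Wikipedia article."""
    params = {
        "action": "query", "prop": "pageprops", "ppprop": "wikibase_item",
        "titles": title, "format": "json",
    }
    pages = requests.get(WIKI_API, params=params, timeout=30).json()["query"]["pages"]
    for page in pages.values():
        return page.get("pageprops", {}).get("wikibase_item")
    return None

def flatten_entity(qid: str):
    """Return one row per item-valued claim: (entity, property, value)."""
    data = requests.get(ENTITY_DATA.format(qid=qid), timeout=30).json()
    claims = data["entities"][qid]["claims"]
    rows = []
    for prop, statements in claims.items():
        for st in statements:
            snak = st.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, dict) and "id" in value:  # keep item-valued claims only
                rows.append({"entity": qid, "property": prop, "value": value["id"]})
    return rows

rows = []
for title in ["Cleopatra", "YouTube"]:  # stand-ins for the sampled article titles
    qid = title_to_qid(title)
    if qid:
        rows.extend(flatten_entity(qid))

df = pd.DataFrame(rows)
df.to_parquet("miniwikidata_sample.parquet")  # requires pyarrow or fastparquet
```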
Additional effort is needed to convert these property-value pairs into prompts. However, one can start to see how this comes together. We can take all the entities that share, say, the same occupation (P106), pair them with entities of other occupations, and see if the LLM can categorize them. This would presumably be somewhat easy; however, as we get deeper into the ranked list of relationships it gets more challenging. Additionally, one can construct intersections of rules (e.g., occupation=Q10798782 and country of citizenship!=Q30), which encodes "television actors whose country of citizenship is not American". Paired with a set of other entities, this challenges the LLM's pattern articulation abilities (a rough sketch of this construction is given below).
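Here is one rough way such a rule, including the negated intersection above, could be turned into positive and negative entity sets and then into a pattern-articulation prompt. The column names follow the earlier flattening sketch, P27 (country of citizenship) is the one property identifier added beyond those mentioned in the text, and the prompt wording is only a placeholder; in practice one would likely show entity labels rather than raw QIDs.

```python
# Sketch: turning a (property, value) rule with a negated condition into a prompt.
import pandas as pd

df = pd.read_parquet("miniwikidata_sample.parquet")  # columns: entity, property, value

def entities_matching(df, prop, value):
    return set(df[(df["property"] == prop) & (df["value"] == value)]["entity"])

# Rule: occupation = Q10798782 (television actor) AND country of citizenship != Q30 (USA)
tv_actors = entities_matching(df, "P106", "Q10798782")
us_citizens = entities_matching(df, "P27", "Q30")  # P27 = country of citizenship
positives = tv_actors - us_citizens
negatives = set(df["entity"]) - positives

def make_prompt(positives, negatives, n=10):
    # Placeholder wording; real prompts would substitute labels/descriptions for QIDs.
    pos = sorted(positives)[:n]
    neg = sorted(negatives)[:n]
    return (
        "Group A entities: " + ", ".join(pos) + "\n"
        "Group B entities: " + ", ".join(neg) + "\n"
        "What rule distinguishes Group A from Group B?"
    )

print(make_prompt(positives, negatives))
```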
Conclusions

Here I have outlined some of the progress towards this exploration. Tomorrow I hope to demonstrate an LLM prompt version of this task.
