A Draft Guide to Vibecoding Data Visualizations and Analysis

Version 0.1.0

This post is currently a draft with only "part 1" written up.

A few years ago, doing data analysis and visualization meant either settling for the constraints of tools like Excel or learning complex programming tools, where even experts frequently Googled the details of how to use them.

Now AI has made it dramatically easier to use the same tools as experts. For small to medium datasets or problems, AI programming ("vibecoding") is likely one of the best ways to do analysis.

This is a rough draft guide for exploring vibecoding for data visualization (data viz). I started collecting notes after I helped lead a small workshop on this topic. Currently this post is a v0.1 version where I have written up some of the points about choosing and installing a vibecoding tool, as well as a sketch of tips for using the tool. However, it is missing the key parts on actually using said tool. The talk I gave included working through a dataset. This is a critical part of a final guide, but it still needs to be integrated.

What is Vibecoding?

Vibecoding is an increasingly popular term for programming with AI.

We forget the code exists and just ✨vibe✨.

"""

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good ... I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. """

Andrej Karpathy on X. Feb 2, 2025.

Setup

In this guide we are going to be using Cursor, a popular tool for vibecoding.

Why Cursor?

There are a lot of tools for vibecoding. It can be helpful to organize them in a few categories:

(1) General Chat Tools: Using ChatGPT (or Claude etc) you can upload your data and ask it to analyze or visualize it.

This gets clunky when doing something complex multiple times. You have to keep uploading your data to multiple chats, re-explaining your problem, and copying files. Features like Projects or Memories can help, but fall short of the alternatives here. Still, it can be worth starting here when you know the task is straightforward and you won't be returning to the problem.

(2) Integrated Development Environments (IDEs): AI built into the editors professional software engineers use. This is where Cursor sits, along with tools like GitHub Copilot in VS Code or JetBrains Junie.

(3) Command Line Interfaces (CLI): Tools that you use through a command line terminal (eg, Claude Code, Codex CLI, etc). These focus on chatting about what you want, with deep integration into a code editor coming second. They edit and run code on your computer.

These work well. If you already pay for something like Claude, using Claude Code can be good. If you already pay for ChatGPT, using Codex CLI can be good.

(4) Cloud Agents: Tools that plug into the services software developers already use, but run in the cloud rather than on your local computer (eg, Claude Code Web, Codex Web, etc).

These are increasingly good for lots of kinds of software development. However, for data visualization or analysis, a quicker loop with files on your computer can be much easier.

(5) Vibe-first Web Tools: Tools like Lovable or Replit are designed around making websites or apps. However, for the kinds of data visualization this guide focuses on, making a website or full app is not usually needed.

This guide chooses to focus on Cursor because:

  • It's popular, and was one of the first tools associated with "vibecoding" (see the quote above, which mentions Cursor). It also continues to add very functional, user-friendly features.
  • It is a professional tool with a high ceiling and a lot of flexibility. However, as with someone learning Photoshop as their first photo editor, that high ceiling can make it somewhat overwhelming at first.
  • Learnings on Cursor can transfer to other tools. In particular, Cursor is built on top of Visual Studio Code (VS Code). Many of the other tools share a lineage with VS Code, so it is fairly easy to switch.
  • It's what I've personally used the most, so I have the best sense of it.
  • Not crazy expensive. There's a free plan, a one-week free trial, and a year free for students. The cheapest paid plan is $20 per month, and it is possible to pay for just one month for a single project and then stop.

However, realistically most of the tools will work, and I wouldn't stress over this decision. Tools in categories #2 and #3 have a low cost to switch. The cloud-based tools in categories #4 and #5 can take more steps to move from one provider's cloud to another platform than when the files live on your own computer.

Installing

Installing Cursor (https://cursor.com/download) is pretty straightforward. There are extra steps for installing Python and Git.
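Once Python itself runs, a quick way to confirm the extra pieces are in place is a tiny check script. This is only a sketch of the idea: the helper name check_tools is made up for illustration, and it assumes the commands are called python3 and git (on Windows the Python command is often just python).

```python
import shutil
import sys


def check_tools(tools=("python3", "git")):
    """Return a dict mapping each tool name to its path, or None if not found."""
    # shutil.which looks the command up on PATH, like typing it in a terminal.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
    for tool, path in check_tools().items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{tool}: {status}")
```

If something shows up as not found, that is a good error message to paste straight into the chat.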

In the workshop it took around 45 minutes to get everyone set up, but it was very doable.

Getting to A First Analysis

I put together a starter template found online for the talk. In a future iteration I want to build this out more and explain how to use this.

Touring Cursor Interface

This section needs to be added to the guide. In the talk it was done interactively, but I can pull out nice screenshots.

Exploring a Sample Dataset

In this part of the talk we worked through analyzing this dataset of US baby names from 1890-2024. It's a very cool dataset that I think people at the workshop had fun with. It's also an example of data that would be a real pain to work with in something like Excel, as every year is its own file. However, point Cursor at the folder, and it has no problem figuring out how to process it. A future iteration of this guide needs to go through this.
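To give a feel for the kind of code the AI tends to write for a one-file-per-year folder, here is a minimal sketch using only the Python standard library. The details are my assumptions for illustration, not necessarily the layout of the workshop data: files named like yob1990.txt, with no header row and lines of the form name,sex,count. The function names are hypothetical too.

```python
import csv
import re
from pathlib import Path


def load_all_years(folder):
    """Read every per-year file in `folder` into one list of row dicts."""
    rows = []
    for path in sorted(Path(folder).glob("yob*.txt")):
        match = re.search(r"(\d{4})", path.stem)
        if match is None:
            continue  # file name carries no four-digit year
        year = int(match.group(1))
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.reader(handle):
                if not record:
                    continue  # blank line
                # A malformed line raises here instead of being silently skipped.
                name, sex, count = record
                rows.append(
                    {"year": year, "name": name, "sex": sex, "count": int(count)}
                )
    return rows


def count_by_year(rows, name):
    """Total babies given `name` in each year, summed across sexes."""
    totals = {}
    for row in rows:
        if row["name"] == name:
            totals[row["year"]] = totals.get(row["year"], 0) + row["count"]
    return dict(sorted(totals.items()))
```

Calling count_by_year(load_all_years("names"), "Mary") would then give a year-to-count mapping that is ready to plot, where "names" stands in for wherever the files were unzipped.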

I also curated a few other datasets that participants could play with. These still need to be added to this guide.

Keeping the Good Vibes

Here is a sketch of some tips for vibecoding data viz.

Will I destroy my computer?

Probably not.

  • The most popular AIs are pretty smart and "aligned" (they don't want to do bad things).
  • If you stick to asking analysis questions about relatively straightforward data, you are very unlikely to break something. The biggest risk is likely that you make your computer freeze for a bit because you tried too big of a problem. But then you can just stop whatever you are running.
  • That said, if you are doing something like integrating important data through something like a database connection (in contrast to a file you download on your computer) or applying it to high stakes decisions, caution is needed.
  • Standard no warranty is implied here etc...

You can ask when you are confused

The AIs are smart. If you don't know what is going on, ask for an explanation by saying "I'm completely unfamiliar with X. Please explain."

Context is good!

You can keep track of what you are writing up or your research questions in a file, and then just give it to the model as context. For example: "I'm working on @report.md and now trying to..."

There are a few things to know about:

  • Markdown files:
    • Text files, but with section headers, links, and bolding
    • LLMs really like these things. It's how they output text themselves in chat windows.
    • https://www.markdownguide.org/cheat-sheet/
  • Easy-to-work-with file formats
    • .py
      • Python files. Where the code goes.
    • .csv / .tsv / text files
      • Comma-separated (or tab-separated) files. Like rows in a spreadsheet.
      • These are usually easy for the AI to work with.
    • .xlsx
      • Excel files. Can be more annoying, but LLMs can eventually figure them out. You might need to look at the data yourself.
    • .parquet
      • If you have to save intermediate results, this is a good format.
      • You can't open it in a spreadsheet, but you can ask the LLM for the data format ("schema") of one of them if you need to check.
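If you want to peek at a file's columns yourself rather than asking the LLM, something like the following works for CSV and TSV files. It is a rough sketch that assumes the first row is a header, and describe_csv is a name I made up. For .parquet you would need a third-party library such as pandas or pyarrow, which is not shown here.

```python
import csv


def describe_csv(path, sample_rows=3, delimiter=","):
    """Return the column names and the first few rows of a delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader treats the first row as the header (an assumption here).
        reader = csv.DictReader(handle, delimiter=delimiter)
        # zip stops after `sample_rows` rows, so large files are not fully read.
        sample = [row for _, row in zip(range(sample_rows), reader)]
        return {"columns": reader.fieldnames, "sample": sample}
```

For a tab-separated file, pass delimiter="\t".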

Look for opportunities to start new chats

  • The LLMs can still sometimes get a bit confused on very long chats
  • At the end you can say something like "ok, things are looking good. Let's clean up our code, clean extra files".
  • Then press the "Keep all" button in Cursor and make a new chat.
  • Starting new chats keeps cost down.

Use the reset function

  • Cursor lets you go back to how things were at the start of the chat. This is handy when a change goes badly.

Ask the model not to write any code

  • Often you just have questions or are trying to plan. In those cases, ask the model not to write any code.

Know the models like to make things seem good

The LLMs are trained to do hard coding tasks, and penalized when there are error messages or crashes. Thus, they tend to just make the errors silently go away.

For example, you might ask for a scatter plot of some data, but half the data points have missing values. The LLM might just ignore these without you realizing it.
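One way to guard against this is to ask the model to count what it drops before plotting. Below is a minimal sketch of that idea, assuming the data is a list of (x, y) pairs where a missing value shows up as None or NaN. The function name split_plottable is hypothetical.

```python
import math


def split_plottable(points):
    """Separate (x, y) pairs into plottable points and points with a missing value."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = [p for p in points if not any(is_missing(v) for v in p)]
    dropped = [p for p in points if any(is_missing(v) for v in p)]
    return kept, dropped


if __name__ == "__main__":
    data = [(1, 2.0), (2, None), (3, float("nan")), (4, 8.0)]
    kept, dropped = split_plottable(data)
    # Say out loud how much data is being left out instead of hiding it.
    print(f"Plotting {len(kept)} of {len(data)} points; {len(dropped)} have missing values")
```

A prompt like "tell me how many rows you dropped and why" gets you the same information without writing this yourself.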

If things don't seem right, they might not be

You might need to dig deeper and explore any inconsistencies. Without prompting, the models will not necessarily cross-check results against each other.

Conclusion

This draft guide starts with setting up a vibecoding tool for data visualization. Further iteration is needed.

It is an interesting topic, and I hope there is further work in helping people understand possibilities here.