Dev Notes 01: Exploration and Setup of GPT-OSS-Safeguard
By David Gros. Version 1.0.0

This is the first of possibly several "Dev Notes" started as part of daily posts in November. These document some of the process of working towards a larger article.

A few days ago OpenAI released an interesting model, "GPT-OSS-Safeguard" (OpenAI, 2025b). It is a follow-up to GPT-OSS (OpenAI et al., 2025a), a version of their headline product with openly downloadable weights. They took that base model and tuned it to classify whether text is safe or not under a given policy. It's intended for use cases like content moderation, with a "bring your own policy" approach; they contrast this with prior moderation models where the policy is baked in.

Some Questions For Exploration

After reading the paper, I was left wondering several things.

Attempts At Running The Model

I don't have access to a lab machine that can run the model right now. I attempted to coax my desktop with 32GB of RAM into running it on CPU, but it did not cooperate. Instead, I turned to cheap cloud providers.

Some providers stretch rules to operate more cheaply than established clouds such as AWS, Azure, and GCP. Since 2017 NVIDIA has fought its way to the top of the AI totem pole (recently becoming the first company with a $5T market cap), in one small part by restricting how its GPUs are used, which helps it maintain high margins across different product segments. According to their license agreements, their consumer cards (such as the RTX 4090), originally aimed at gaming, cannot be used in datacenters. However, some providers still offer these cards. Some brand themselves as a "marketplace" connecting customers with small independent hosts, as a possible workaround. But they still basically try to operate like a normal cloud, and at least one (Runpod) offers "secure cloud" instances, which certainly seems like operating a datacenter. Overall it is unclear how this works, but these cards rent for about a tenth the price of a true datacenter card.

Runpod. After brief research, Runpod seemed like the best initial choice.
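An aside on the failed desktop attempt: a quick back-of-envelope suggests 32GB of RAM was always going to be tight for an unquantized 20B-parameter model. A rough sketch (weights only, ignoring activations and KV cache):

```python
# Rough weight-memory estimate for a 20B-parameter model at common
# precisions. Weights only; activations and KV cache would add more.
PARAMS = 20e9

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes needed to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 1e9

for name, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_gb(nbytes):.0f} GB")
```

At bf16 the weights alone come to about 40GB, already over the 32GB of system RAM; only the quantized formats would plausibly fit.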
After selecting some options, handing over $10, and uploading an SSH key, I had a machine up and running. However, I ran into a series of issues connecting to it. The first seemed to be an undocumented lack of support for RSA SSH keys; giving it an ed25519 key made it happy. Beyond that, the connection was a weird (proxied?) SSH connection that would not cooperate with scp, rsync, or VS Code remote development. The docs say the latter should work, but in brief tries it would not. Bummer. I moved on, and will hopefully find another use for the credits when I don't need interactive use.

Vast AI. This worked. It was also cheaper than Runpod for a more powerful machine with lower latency. Interestingly, the machine I used booted straight into a tmux session. I actually like this. However, it did break VS Code remote development. Manually configuring ~/.ssh/config with a RemoteCommand allowed it to connect:

Host vast-gpu
    HostName <FILL IP>
    Port <FILL PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
    RemoteCommand bash -l
    RequestTTY no

With a bit of reallocating the machine (loading the two 20B models plus the environment used over 150GB of disk, with more expected), I was able to locally run both GPT-OSS and its safeguarding sibling.

Huggingface Inference. My initial curiosity was around model diffing for the safeguard variant, which requires local parameter access. However, some of my questions need only blackbox access. While the various HF blackbox APIs don't always support the latest models, it seems like this might work.

Replicating the results

An initial step in exploring the model is replicating the paper's results on the datasets it uses (ToxicChat (Lin et al., 2023) and a moderation dataset (Markov et al., 2023)). Unfortunately, they do not seem to give the prompts used for their evals. I need to experiment more and see whether I can replicate their numbers with a mix of policy prompts. Please share if you have found some sample prompts for the model.

Conclusion

There are a lot of interesting things to explore with GPT-OSS-Safeguard. After today I have a better understanding of the model (and knowledge that it exists; I just saw it this morning), a better understanding of prior work on model diffing, and the infrastructure to run it. Hopefully I'll have some results starting tomorrow.
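One detail worth noting alongside the RemoteCommand workaround: as I understand it, the VS Code Remote-SSH extension refuses hosts whose ssh config contains a RemoteCommand unless its `remote.SSH.enableRemoteCommand` setting is turned on, roughly:

```
// VS Code settings.json (user settings)
"remote.SSH.enableRemoteCommand": true
```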
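As a concrete starting point for the replication attempts, the following is my working guess at the interaction shape: the policy goes in the system turn and the content to classify in the user turn. The policy wording and the "violation: 0/1" output convention here are placeholders I chose, not the prompts OpenAI actually used (which, as noted above, do not seem to be published).

```python
# Sketch of a policy-based classification call for GPT-OSS-Safeguard.
# The policy text and the "violation: 0/1" reply format are my own
# placeholders, not taken from the paper.

def build_safeguard_messages(policy: str, content: str) -> list[dict]:
    """Package a moderation policy and a piece of text as chat messages."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]

def parse_verdict(model_output: str) -> bool:
    """Interpret the reply; assumes its last line ends 'violation: 0' or '1'."""
    last_line = model_output.strip().splitlines()[-1].lower()
    return last_line.endswith("1")

policy = (
    "Classify whether the user text is harassment.\n"
    "Answer 'violation: 1' if it violates the policy, else 'violation: 0'."
)
messages = build_safeguard_messages(policy, "You are a wonderful person.")
# messages would then be sent to the locally served model (e.g. through an
# OpenAI-compatible chat endpoint), and parse_verdict() reads the reply.
```

The same scaffold should make it easy to swap in different policy prompts when sweeping for numbers that match the paper's evals.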
