The AI Prepper Disk | Definitely not now, but hopefully not never ...
A short tale of our attempt to incorporate an AI chatbot in Prepper Disk
When we started Prepper Disk, we had no idea just how big an idea this would be. We expected there might be some interest in the core community of prepping enthusiasts, but we never expected the product would resonate with homesteaders, ham radio enthusiasts, campers, sailors, and everyday folks who are thinking more about emergency preparedness these days.
Founded in November 2024, we spent the first 9 months of existence working out supply chain issues, updating the software, striking content deals, commissioning custom eBooks, and listening closely to customers to make the device better.
The Concept
While listening, one of the things we heard every week or two was how interesting an AI chatbot would be. Imagine taking all the Prepper Disk content from Wikipedia, WikiHow, US Military Manuals, the Post Disaster Library, StackExchange, Ready.gov and more, and putting it all into a single LLM (Large Language Model / AI Chatbot). As employee 1 of 1.5 engineers, I got busy working on the concept, using a number of open source software tools.
Within a month, we had a pretty responsive chatbot working on a Raspberry Pi 5 (an upgrade from the current Prepper Disk Premium’s Pi4B). It was certainly not anything compared to the power of the big cloud LLMs (like ChatGPT or Gemini), but it was satisfyingly quick and could speak convincingly on any number of topics.
The Promise and the Reality
At this stage, we were feeling confident enough that we started to test customer interest by taking pre-orders. We also sourced the necessary components to assemble and ship the first wave of 150 devices. Pre-orders sold out; this was clearly something folks were interested in. We got busy finalizing the device and preparing for delivery. Then things went wrong.
First, to make the Prepper Disk AI possible, we needed faster storage than what the Premium uses. This was necessary to keep the chatbot from coming across as painfully sluggish. The problem was that higher-end storage (NVMe, for my fellow techies) drew so much power that the device often saw power dips or hiccups that scrambled the drive. In other words, running an AI was too demanding for our hardware and was actively corrupting (breaking) the device every couple of boots or so.
The hardware issues postponed the final quality control testing, where we had planned to assess safety by asking the device many more high-stakes questions than we had tested initially.
With the hardware issues, we had to cancel the pre-orders, painfully disappointing many customers. We were able to offer folks a special edition Raspberry Pi5 version of the Prepper Disk and refunds as an apology, and many took advantage, but it was a big setback and a low point for the year. We shelved the device, but remained committed to coming back to the problem as soon as possible. That moment came in December ‘25 and has concluded this week.
Round Two
After playing with a number of configurations on the Raspberry Pi5, it became clear that we’d need a higher-end machine to escape the power and storage problems we faced in the first attempt.
We moved to so-called “mini PCs”, which can offer faster processors, better power management, and built-in fast (NVMe) storage. In initial testing, these devices handled the work the Raspberry Pi 5 device could not, and with none of the problems we had seen before.
The second phase of testing was now unlocked. We could begin to really stress test how accurate the chatbot could be across a broad range of topics.
Quality is Paramount
The verdict: not good enough. LLMs can make things up when they respond, a failure known as “hallucination”. Even first-class LLMs like ChatGPT and Gemini hallucinate, but the tiny ones that can run on a Raspberry Pi or mini PC are even worse.
While there are tunings and techniques that can help minimize hallucinations, none of them proved effective enough for our purposes (tech folks who are interested can read on to the appendix for the details).
Ultimately, we could not find a reliable combination of settings and controls that would give adequate responses. With settings at their most conservative, the chatbot wouldn’t answer any question that carried any level of risk at all. Loosen them a little and you’d get just enough randomness that you couldn’t count on the chatbot to describe a process without occasionally changing “north” to “south”, or “red” to “white” when describing poisonous mushrooms. Not good.
We tried some additional technologies that are precursors to AI but allow more control, with similarly unsatisfying results.
The Conclusion
The stakes really couldn’t be higher when you have a device that could, just could, be the difference between life and death for someone. As the technology stands today, we don’t feel it can meet the standards we and our customers expect. We will keep this feature on our roadmap to revisit down the road, but for now, we’re shifting our focus to some promising features for later this year.
The Tech Stuff (Feel free to read on if you’re a ‘techie’ and interested in a little more detail)
If you’re a dev yourself and curious what we tried, happy to share here.
First Attempt
Here we focused on Ollama / OpenWebUI. It was a nice simple install on Linux, and the models that were 3b or less (particularly llama3.2:3b) were quite performant on a Pi5/8GB. Smaller models (1.5b) were noticeably faster but not as rich as the 3b model (though surprisingly good). We even had a model fine-tuned with the PDFs from the Post Disaster Resource Library.
The Argon5 NVME was our case with this model, and we used first-rate 1TB NVMe storage from folks like Crucial and Western Digital. When we saw the NVMe corruption issues, we worked with Argon40 themselves but weren’t able to find settings that alleviated them. Ultimately the failure rate was too high. (Incidentally, the case also interfered with the 2.4GHz channel of the device pretty mightily.)
Second Attempt
We moved to an N95 mini PC with 512GB of NVMe storage and 16GB of RAM. The 3b model we had run on the Pi5 was quite snappy here, enough so that we started checking out 7b models.
With the hardware working reliably, we began the broader testing and found (as previously noted) dangerous omissions, hallucinations, and word choices. We played with all the usual settings (temperature being the most effective), with unsatisfying results.
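For devs who want to try the temperature experiment themselves, here is a minimal sketch of setting it per request through Ollama's local REST API. It is an illustration under assumptions, not the harness used in our testing: it assumes Ollama is running on its default port with llama3.2:3b already pulled, and the helper names `build_payload` and `ask`, plus the fixed seed, are hypothetical choices made for the example.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a stock install on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, temperature: float = 0.0,
                  model: str = "llama3.2:3b") -> dict:
    """Build the JSON body for a single, non-streaming generation request.

    Hypothetical helper for illustration. Lower temperature means less
    randomness in word choice; the seed is fixed so runs are repeatable.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "seed": 42},
    }


def ask(prompt: str, temperature: float = 0.0) -> str:
    """Send the prompt to the local Ollama server and return its answer text."""
    body = json.dumps(build_payload(prompt, temperature)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server):
# for t in (0.0, 0.4, 0.8):
#     print(t, ask("how do i find north in the woods", temperature=t))
```

Asking the same question at a few temperatures and comparing the answers side by side is a quick way to see the drift in wording described above.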
We moved to RAG (retrieval-augmented generation). RAG in OpenWebUI wasn’t great. If you’ve done RAG, you know you experiment with settings like chunk size, overlap, which transformer you use, PDF structure, pipelines, etc. It wasn’t great no matter what we did.
So we moved to AnythingLLM, which is well regarded for being quite good at RAG. And it was better (right out of the box) than anything we could do with OpenWebUI (at least for our corpus).
Pulling content was a mixed bag. Chunking and overlap tuning felt pretty useful, but with diminishing returns over time. Hallucinations still happened, interwoven into the RAG citations. Even with the most conservative settings in AnythingLLM (“Optimize Accuracy”, matching settings) there were still unacceptable hallucinations in simple questions like “how do i find north in the woods”.
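If chunk size and overlap are new to you, the idea can be sketched in a few lines. This is a generic illustration, not what OpenWebUI does internally: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions made for the example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Bigger chunks give the model more context per hit but dilute the match; more overlap reduces the chance of cutting a step-by-step instruction in half, at the cost of a larger index.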
As a final attempt, we moved to using Haystack for a simple word vector search of the PDF chunks. We then fed the matching chunks into an LLM that was strictly instructed to summarize them only and to introduce no new elements.
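The shape of that pipeline can be sketched without any libraries. To be clear, this is not Haystack code and not our production setup: the bag-of-words cosine scoring is a stand-in for the real vector search, and the stopword list, the 0.2 score threshold, the function names, and the prompt wording are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list so common words do not create false matches.
STOPWORDS = {"a", "an", "the", "is", "in", "to", "of", "or", "do", "i",
             "how", "for", "it", "not", "at", "what"}


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector, ignoring stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2,
             min_score: float = 0.2) -> list[str]:
    """Return up to `top_k` chunks whose similarity clears `min_score`."""
    q = vectorize(question)
    scored = sorted(((cosine(q, vectorize(c)), c) for c in chunks), reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score >= min_score]


def build_prompt(question: str, chunks: list[str]):
    """Build a summarize-only prompt, or return None to decline to answer."""
    hits = retrieve(question, chunks)
    if not hits:
        return None  # nothing relevant enough: better silent than wrong
    sources = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(hits))
    return (
        "Answer the question using ONLY the sources below. Summarize them; "
        "do not add facts, steps, or warnings that are not in the sources.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
```

The `None` branch is the important design choice: when no chunk scores well enough, the pipeline declines rather than letting the LLM improvise.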
This search-then-summarize approach was by far the most accurate when it answered, but it answered much less frequently. It also gave the impression of being able to hold a conversation when it really couldn’t: it was a step above search but a step below an LLM, which left it in a place where users couldn’t tell how to communicate with it.
This was ultimately the final “nail” in the coffin for an AI Chatbot for Prepper Disk in 2026.