57. Building

Jun 21, 2026

This week I bought a smart ring on AliExpress: a COLMI R02 (£17). I wanted to see whether I could connect it directly to PeakForm, my training and health app, without using the COLMI or QRing app. At first, I thought the question was whether this small, cheap ring could replace my WHOOP. My subscription has about a month left, so the initial idea was to run both devices side by side and compare them.

But the real story quickly became something else. It became a story about how I work with Codex: not by handing over a perfect specification, but by exploring, testing, correcting, and iterating together when official documentation is poor or non-existent.

We started with an assumption. If the ring could collect heart rate, sleep, HRV, steps, and perhaps some recovery-like signals, then PeakForm might be able to compare those fields against WHOOP. Codex turned that into an implementation plan: a native Ring tab, a 30-day overlap period, raw packet capture, Supabase tables, and a comparison view.

Then I started using the actual ring, and reality pushed back.

The ring does not behave like WHOOP. Some metrics were unclear. Some data was firmware-dependent. Some values appeared to exist, but we did not know what they really meant. I told Codex that I did not want the app pretending this was a WHOOP replacement if the data did not support that conclusion. That changed the direction of the project. We removed the comparison framing. We removed recovery, strain, HRV, and sleep from the Ring tab. The page became about what the ring could actually do: live heart rate, SpO2, battery, steps, and raw Bluetooth packets.

That was the first important collaboration moment. Codex did not simply continue building the original idea. I tested the assumption, provided feedback, and the product changed.

I also had Peter Steinberger’s Microsoft Build session, “Build the thing that builds the thing”, in mind while working through the project. The point of that talk, at least as I interpreted it, was not simply that agents can write code. It was that the interesting work lies in learning how to close the loop around them. When something is painful, repetitive, or ambiguous, you do not merely endure it. You build the thing that helps you and the agent work better the next time.

A must-watch: https://build.microsoft.com/en-US/sessions/BRK245

That was exactly what happened with the ring. I did not start with a perfect protocol document. I started with a device, a rough goal, and a lot of friction. The Bluetooth list was noisy, so we improved discovery. Pairing and reconnecting were annoying, so we improved restore. The sync button became stuck, so we added stale-sync recovery. The steps tile showed zero while the status line said steps existed, so we changed the parser. The app told me I had walked 15 million steps, so we added plausibility checks. Each annoyance became a signal. Each signal became a small tool, a small fix, or a better loop.

What I liked about this process was that Codex was not simply helping me build the feature. Codex was helping me build the process by which the feature became understandable. The raw packet capture, the sync diagnostics, the Supabase persistence, the parsing, the screenshots, and the repeated TestFlight builds all became part of the system. We were not only building the Ring tab. We were building the machinery that allowed us to keep learning from the ring.

The pattern became very simple. Codex built, I tested, I sent screenshots, Codex interpreted the contradiction, and we adjusted. I said I did not want to use the COLMI app, so Codex built direct Bluetooth support. I said the Ring tab should show live heart rate, so Codex added live heart-rate streaming. I said the nearby Bluetooth list was cluttered, so Codex collapsed and filtered it. I said the ring was paired in iOS settings but the app did not reconnect properly, so Codex improved known-device restore. I said the sync button stopped responding after the app had been in the background, so Codex added stale-sync recovery.
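To make the stale-sync idea concrete, here is a minimal sketch in Python. It is not the PeakForm code (that lives in an iOS app), and the class name `SyncState`, the 30-second threshold, and the injectable `clock` are all assumptions made for illustration. The idea is only that a sync marked "in progress" for too long is treated as abandoned, so the button can start a fresh one.

```python
import time


class SyncState:
    """Tracks whether a sync is running and recovers from stale ones.

    Hypothetical sketch: the name and the 30 s default are illustrative,
    not taken from the real app.
    """

    def __init__(self, stale_after_seconds=30.0, clock=time.monotonic):
        self._stale_after = stale_after_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._started_at = None      # None means no sync is in progress

    def try_begin(self):
        """Return True if a new sync may start now, False if one is live."""
        now = self._clock()
        if self._started_at is not None:
            if now - self._started_at < self._stale_after:
                return False         # a recent sync is still running
            # Otherwise the old sync is stale: fall through and restart.
        self._started_at = now
        return True

    def finish(self):
        """Mark the current sync as done."""
        self._started_at = None
```

With a fake clock, a second tap is rejected while the first sync is fresh, and accepted again once the first one has been stuck past the threshold.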

None of these were grand architectural instructions. They were small observations from actually using the app. But each one made the app better.

The most useful feedback was often not a full explanation. It was a screenshot showing something that did not make sense. At one point, the app said that sync had completed and that it had received steps, but the visible steps tile still showed zero. Codex inferred that the ring had sent step data, but the UI was not counting it. It traced the issue to timestamps: the app was filtering samples by “today”, but the ring’s packet date was unreliable. We changed the parser so that synced activity was anchored to the day PeakForm had requested from the ring.
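The fix can be sketched roughly like this, in Python rather than the app's own code. The `StepSample` type, its fields, and the function names are hypothetical; the only thing taken from the story above is the idea of ignoring the packet's own date and stamping each sample with the day the app asked for.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class StepSample:
    day: date            # date decoded from the packet (may be unreliable)
    minute_of_day: int   # hypothetical time slot within the day
    steps: int


def anchor_to_requested_day(samples, requested_day):
    """Overwrite each sample's day with the day the app requested."""
    return [replace(s, day=requested_day) for s in samples]


def steps_for_day(samples, day):
    """Sum steps for one day, the way a 'today' tile might filter them."""
    return sum(s.steps for s in samples if s.day == day)
```

Before anchoring, a "today" filter can drop every sample and show zero even though data arrived. After anchoring, the same filter counts them all.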

Later, after I had worn the ring for longer, I noticed that walking around the house appeared, but my run did not. I told Codex that I had gone for a 10 km run and the steps were missing. That changed the investigation again. Codex checked the public COLMI protocol notes and found another command, “today’s sports”, which appeared to include running steps separately from normal step samples. We added support for it.

Then the app told me I had completed 15 million steps.

That was obviously nonsense, but it was not useless nonsense. It suggested that we had probably found the correct packet but were interpreting the scale incorrectly. I knew I had not walked 15 million steps. I also knew that around 15,000 steps was plausible because I had run for an hour, moved around during the day, and spent time running in the garden with Coach Matthew. Codex added plausibility checks and normalization. The number became approximately 15,865, which fit the day much better.
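A plausibility check of this kind might look something like the Python sketch below. I am not reproducing the actual normalization here: the ceiling of 100,000 steps a day, the candidate scale factors, and the raw value in the usage note are all assumptions chosen for illustration.

```python
# Assumed ceiling for one day of steps; purely illustrative.
MAX_PLAUSIBLE_DAILY_STEPS = 100_000

# Hypothetical scale factors to try, smallest first.
CANDIDATE_SCALES = (1, 10, 100, 1000)


def normalize_steps(raw):
    """Return a plausible step count, or None if no candidate scale fits.

    Tries each scale in order and keeps the first result that falls at or
    under the plausibility ceiling.
    """
    if raw < 0:
        return None
    for scale in CANDIDATE_SCALES:
        value = raw // scale
        if value <= MAX_PLAUSIBLE_DAILY_STEPS:
            return value
    return None
```

Under these assumptions, a made-up raw reading of 15,865,000 comes out as 15,865, an ordinary day of 8,000 steps passes through unchanged, and a value that stays implausible at every scale is rejected instead of being shown.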

This was where the collaboration felt most effective. I brought the lived reality. Codex brought the protocol work, code changes, and rapid iteration. The truth emerged somewhere between those two things.

There is also an older idea underneath all of this, one that predates software entirely. Louis Pasteur is often paraphrased as saying that chance favors the prepared mind. I think that is a useful way to understand this kind of work with Codex.

The ring did not reveal itself cleanly. It produced strange clues: a step count that disappeared after reload, a run that was missing from the daily total, a packet that decoded into 15 million steps. Those moments could easily have looked like bugs. But a prepared mind treats them as signals. The point is not that I already know the answer. The point is that I have enough context, curiosity, and instrumentation to notice when something interesting is happening.

That is where human and AI collaboration becomes powerful. My role is to recognize the anomaly: “this cannot be right, but it is probably telling us something.” Codex’s role is to turn that anomaly into an experiment: inspect the protocol, adjust the parser, save the raw packets, add a plausibility check, and build again. The prepared mind is not just in me, and it is not just in the model. It exists in the loop between us.

In that sense, “build the thing that builds the thing” and “chance favors the prepared mind” are closely related. The more we improve the loop, the more prepared we become. Raw packet capture prepares us to debug unknown commands. Supabase persistence prepares us to compare what the app saw before and after reload. Screenshots prepare Codex to reason from contradictions. TestFlight prepares me to test the real device quickly. Each small tool makes the next accident more legible.
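As a small illustration of what raw packet capture can mean in practice, here is a hedged Python sketch. The `CapturedPacket` record and the grouping by first byte are assumptions for the example (I am treating the first byte as a command identifier only to show the idea), and none of this is the real PeakForm storage format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CapturedPacket:
    captured_at: datetime
    direction: str       # "tx" = sent to the ring, "rx" = received from it
    payload_hex: str     # raw bytes kept verbatim as hex, never pre-parsed


def capture(direction, payload, now=None):
    """Record one raw packet exactly as seen on the wire."""
    if direction not in ("tx", "rx"):
        raise ValueError("direction must be 'tx' or 'rx'")
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return CapturedPacket(timestamp, direction, payload.hex())


def group_by_first_byte(packets):
    """Group packets by their first byte (assumed here to name the command)."""
    groups = {}
    for packet in packets:
        groups.setdefault(packet.payload_hex[:2], []).append(packet)
    return groups
```

Keeping the bytes untouched is the point: when a parser turns out to be wrong, the original packets can be decoded again under a new interpretation.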

The COLMI ring integration worked because we did not wait for perfect knowledge. We prepared the system to learn. Then, when the ring behaved strangely, we were ready to notice, ready to ask better questions, and ready to turn surprise into software.

One thing this project makes clear is that AI is not a substitute for judgment. Codex can generate code, inspect the repository, search public references, reason through packet formats, and run the iOS build. But it cannot wear the ring. It cannot know whether a step count feels plausible for my day. It cannot decide, on its own, whether a metric belongs in the product.

That is my role. I decided that WHOOP comparison was misleading. I decided that HRV should be removed because we did not trust it. I decided that 15 million steps was impossible but 15,000 was plausible. I decided that a UI felt cluttered, that a reconnect flow was annoying, and that a red error message needed to be cleared.

Codex responded to those judgments and turned them into working changes. The collaboration is not “human asks, AI answers.” It is closer to a feedback system. I observe the real world. Codex changes the software. The changed software produces new observations. We repeat.

The final Ring tab is simpler than the original plan. It does not try to be a WHOOP replacement dashboard. It does not show scores we do not understand. It does not compare proprietary recovery metrics. It does not pretend certainty where we only have guesses. Instead, it shows the things we can see and test: live heart rate, SpO2, battery, steps, today’s saved ring data, and raw packet diagnostics.

That simplicity is not where we start. It is where we arrive after testing. Codex is capable of building a great deal quickly, but the work only becomes good when I keep pushing it back toward reality. Less can be better, provided that the remaining pieces are trustworthy.

A hardware integration is not something you get right in a single pass. The public documentation is incomplete or pretty much non-existent. Firmware behaviour varies. Bluetooth reconnects are messy. Supabase persistence has edge cases. The UI can show a value correctly once, then lose it after reopening. A packet can be real but scaled incorrectly. If I had written a perfect-looking specification at the start, it still would have been wrong.

The useful specification appears through use. I test the app on my phone. I wear the ring. I go for a run. I reopen the app. I notice what breaks. I send the screenshot. Codex updates the code. The app improves because the loop is short.

PeakForm now connects directly to my COLMI R02. It reads live heart rate, probes SpO2, syncs steps, handles reconnects, saves data to Supabase, and retains raw packets so we can continue learning from the device. It is not perfect.

More importantly, it is the product of a working rhythm between me and Codex. I bring intent, taste, context, and reality checks. Codex brings speed, codebase awareness, protocol research, and implementation. Neither side is enough on its own.

The result is not just a Ring tab. It is a glimpse of a new way of building: conversational, empirical, and iterative. I do not need to know the final answer at the beginning. I need to keep noticing what is true, and Codex needs to keep helping me turn those observations into better tools, better tests, and better software.

It turns out you can learn a lot with patience and £17 (shipping included!).

Training-wise, I hope normal business will resume next week. Tiredness and a heatwave are not a good combination, so I scaled back this week!

We Run Ultras by Francesco 🏃‍♂️🏔️
