Okay I did this
Although actually I did HSL instead of HSV; I was getting the two mixed up before.
The Lightness parameter only comes into play in this video to facilitate the playback cursor though, so mostly the L vs. V thing isn’t significant here—mainly this is all about Hue as I said.
![]()
As you can see here, matching colors in this video are spaced an octave apart.
![]()
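For anyone curious how the "same color every octave" thing can fall out of a hue mapping, here is a minimal sketch. It is not the program's actual code: the function name `pitch_to_rgb`, the 440 Hz reference, and the choice of taking hue as the fractional part of the octave count are all assumptions made for illustration.

```python
import colorsys
import math


def pitch_to_rgb(freq_hz, ref_hz=440.0, lightness=0.5, saturation=1.0):
    """Map a frequency to an RGB triple whose hue depends only on pitch class.

    Assumption: hue is the fractional part of log2(freq / ref), so any two
    frequencies a whole number of octaves apart land on the same hue.
    """
    octaves_from_ref = math.log2(freq_hz / ref_hz)
    hue = octaves_from_ref % 1.0  # position within the octave, in [0, 1)
    # colorsys takes arguments in H, L, S order (not H, S, L).
    return colorsys.hls_to_rgb(hue, lightness, saturation)


# 220 Hz, 440 Hz and 880 Hz are octaves of each other, so they share a color.
same = pitch_to_rgb(220.0) == pitch_to_rgb(440.0) == pitch_to_rgb(880.0)

# A fifth above (660 Hz) sits elsewhere in the octave and gets another hue.
different = pitch_to_rgb(660.0) != pitch_to_rgb(440.0)
```

Lightness is left as a plain parameter here, which would be enough to brighten a column for something like a playback cursor without touching the hue.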
I just realized something. This program would make a very nice interface for converting raw audio to MIDI or similar (i.e. auto-transcription). Almost everything needed is already there: it has a brightness parameter that would give you a very natural and easy-to-visually-parse way to set a threshold for what to call a “note”, and the program already knows how to break the sound down in terms of an arbitrary EDO (e.g. if you set the edo parameter to 12 it will spit out a piano-note-spaced representation of the sound; here the edo is set to 220 because that yields a resolution close to 2160 px/column, an entirely visual rationale in other words). I could even code a “piano roll view” that would show you how the MIDI would come out in a given DAW/sequencer. I’d get part of the UI code for a piano roll out of that too, which I’m sure would come in handy later. ![]()
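A rough sketch of the threshold-then-quantize idea, under stated assumptions: the input is taken to be a list of `(frequency, brightness)` pairs with brightness in 0..1, steps are counted from A4 = 440 Hz, and the names `edo_step`, `step_to_midi` and `notes_above_threshold` are made up for this example. The real program's data layout may look nothing like this.

```python
import math

A4_HZ = 440.0   # assumed reference pitch
A4_MIDI = 69    # MIDI note number of A4


def edo_step(freq_hz, edo):
    """Nearest step index in the given EDO, counted up or down from A4."""
    return round(edo * math.log2(freq_hz / A4_HZ))


def step_to_midi(step, edo=12):
    """Convert an EDO step to a MIDI note number (only meaningful for 12-EDO)."""
    if edo != 12:
        raise ValueError("plain MIDI note numbers assume 12 steps per octave")
    return A4_MIDI + step


def notes_above_threshold(peaks, threshold, edo=12):
    """Return sorted MIDI notes for every peak at least as bright as threshold.

    peaks: iterable of (freq_hz, brightness) pairs, brightness in [0, 1].
    """
    notes = set()
    for freq_hz, brightness in peaks:
        if brightness >= threshold:
            notes.add(step_to_midi(edo_step(freq_hz, edo), edo))
    return sorted(notes)


# A loud A4, a fairly loud middle C, and a faint E4 that falls under the cut.
peaks = [(440.0, 0.9), (261.63, 0.7), (329.63, 0.1)]
notes = notes_above_threshold(peaks, threshold=0.5)
```

With the threshold at 0.5 the faint E4 is dropped and `notes` holds the MIDI numbers for middle C and A4. For a non-12 EDO such as 220, `edo_step` still works, but turning the steps into MIDI would need pitch bend or some tuning scheme, which this sketch deliberately leaves out.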
Honestly, the main things the program still needs for this are just to run in realtime and interactively (right now it just generates video frames) and some representation of MIDI data. Both can be had easily: this presents a great opportunity to break the Vulkan-and-SDL-oriented code out of the video-generating project I was showing a few weeks ago into its own library (a library that would represent a nascent game engine, I might note), and I’m pretty sure I have some code lying around somewhere for MIDI output I can reuse too (that could also go in the engine, why not). I’ve been wanting Vulkan in this program anyway because sampling and rasterizing the FFT data on the CPU is rather time-consuming and much better suited to the GPU in performance terms.
(Although the real bottleneck right now is encoding and writing the 6000 10M PNG images, which takes hours on my system and occupies an awkward amount of hard disk space. I think the only way to ameliorate that slowness directly at this point is probably just to multithread the PNG writing code, which I should do anyway, not only for this but also for the video-generating stuff from earlier; that would also be a natural thing to put in the engine. Anyway, if I had a realtime mode for this program I could also limit the render-frames-to-disk stuff to the times when I really want it.)
Anyway, I’ve wanted a program like this to do some form of semi-auto-transcription or guided auto-transcription or whatever for years; I often want transcriptions of my guitar recordings, but transcribing them by hand can be very time-consuming. So, I’ve imagined a program like this many times. In the past, I didn’t know enough mathematics to see clearly how to implement it, but I guess I do now. In the end I kind of just walked into it accidentally, I guess. Linear algebra is amazing. (This is the textbook I’ve been using if anyone else is interested; it’s really incredible stuff.)

























