Achieving generic bliss with reflection in modern C++
Reflection is often presented as a feature that makes software harder to understand. In this article, I will present ways to approximate some level of static reflection in pure C++, thanks to C++17 and C++20 features, show how these techniques can considerably simplify a class of programs and libraries, and more generally enable ontologies to be specified and implemented in code.
What we aim to solve
Imagine that you are writing a neat algorithm. You’ve worked on it for a few years; it produces great results and is now ready to be shared with the world. Let’s say that this algorithm is a noise generator.
You’d like that noise generator to easily work in a breadth of environments: in 2D bitmap manipulation programs (Krita, GIMP, …), in audio and multimedia software (PureData, Audacity, ossia score, Mixxx…), 3D voxel editors, etc.
Your algorithm’s implementation more-or-less looks like this:
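The exact implementation does not matter for the rest of this post; the body below is a purely illustrative placeholder, but it has the two kinds of inputs the article relies on: a continuous alpha and an integer beta.

```
#include <cmath>

// alpha: continuous amount in [0; 1], beta: integer number of octaves in [1; 10]
float noise(float x, float y, float alpha, int beta)
{
  float result = 0.f;
  for (int octave = 0; octave < beta; ++octave)
    result += alpha * std::sin(x * (1 << octave)) * std::cos(y * (1 << octave));
  return result;
}
```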
You are proud of your neat results, you prepare conference papers, etc. But now is the time to implement your noise algorithm in a set of software in order to have it widely used and become the next industry standard in procedural noise generation.
If you are used to working in C#, Java, Python, or any other language more recent than 1983, the solution may at this point seem trivial. Sadly, in C++, this has been inordinately hard to implement until now, especially when one aims for something as close as possible to a zero-cost abstraction at run-time.
On the other hand, if you implement your algorithm in C#, Java, or Python, having it usable from any other runtime environment is a massive challenge, as two VMs, often with their own garbage collection mechanism, now have to cooperate. Thus, for something really universal, a language that can compile to native binaries, with minimal dependencies, is the easiest way to have a wide reach. In particular, most multimedia host environments are written in a native language and expect plug-ins conforming to operating system DLL, executable and dynamic linker APIs: ELF, PE, Mach-O; dlopen and friends. There aren’t that many suitable candidates with a high enough capacity for abstraction: C++, Rust, D without GC, Zig. Since most media hosts provide C or C++ APIs, and C does not have any interesting form of reflection, C++ is the natural, minimal-friction choice.
This post is a hint of how much better and easier life is with true reflection as available in other languages, and in particular how much easier attribute reflection and user-defined attributes would make one’s life. And most importantly, it is about what kind of abstracting power “reflective programming” holds over existing generic programming techniques in C++: macro-based metaprogramming, and template-based metaprogramming (with e.g. CRTP being commonly used for that).
The problem domain
The software in which we want to embed our algorithm should be able to display UI widgets adapted for the control of alpha and beta, whose bounds you have so painstakingly and thoroughly defined. Likewise, the UI widgets should adapt to the type of the parameter; a spinbox may make more sense for beta, and a slider, knob, or any kind of continuous control for alpha.
Maybe you’d also like to serve your algorithm over the network, or through an IPC protocol like D-Bus. Again, you’d have to specify the data format being used.
If, for instance, you were using the OSC protocol to make your algorithm controllable over the network, messages may look like:
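The addresses and values below are purely hypothetical, just to give the flavour of it.

```
/noise/alpha 0.5
/noise/beta 3
```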
Maybe you’d also like to serialize your algorithm’s inputs, in order to have a preset system, or just to exchange with another runtime system expecting a serialized version of your data. In JSON? YAML? Binary? Network-byte-order binary? GLSL std140? So many possibilities!
Hell on earth
For every protocol, host environment, plug-in system, … that you want to provide your algorithm to, you will have to write some amount of binding code, also often called glue code.
How might that binding code look, you ask?
Let’s look at some examples from around the world:
- Making a fade algorithm in PureData: a class is constructed at run-time, with custom t_object, t_float, t_inlet, etc. types, some of which require calls to various runtime allocation functions. Lots of not-very-safe-looking casts (but it’s C, there’s not a lot of choice).
- Noise generator for Max/MSP’s Jitter, using OpenCV. Same as PureData, with macros sprinkled on top. Wanna get a floating-point value input by the user? Lo and behold. What happens if argv + j isn’t a float but a string? Let’s leave that for future generations to discover!
- Audio filter suitable for use as a VST. Notice how the parameters to the algorithm are handled in switch/case 0, 1, 2...; thankfully this is all generated code from the Faust programming language. What happens if at some point a parameter is removed? Better have good unit tests to catch all the implicit uses of each parameter…
- OpenFX image filter: here’s how one says that the algorithm has a bounded input widget (e.g. a slider going from 0 to 10 with a default value of 1). Hopefully you don’t forget to update all the incantations when you decide that this control would indeed be better as an integer!
- A very good example is the 1€ filter input filtering algorithm. The actual algorithm can fit in a few dozen lines. The Unreal Engine binding is almost 600 lines!
- Things like iPlug are a bit more sane, but we still have to triplicate our parameter creation / access: in an enum in the hpp, in the constructor, and finally in ProcessBlock where we get the actual value. This is still a whole lot of work versus JUST ACCESSING A FLOAT IN A STRUCT !!11!1!!
- A Krita plug-in for noise generation: here Qt’s QObject run-time property system is used to declare and use the algorithm controls. That also means inheriting from Qt’s QObject, which has a non-negligible memory cost.
- Wanna receive messages through OSC? Make the exceptions rain!
- Wanna expose your algorithm to another language, such as Python? Get ready for some py::<>’y boilerplate.
As such, one can see that:
- There is no current generic way to write an audio processor in PureData and have it work in, say, Audacity, Ardour or LMMS as a VST plug-in, expose it through the network, etc. Writing a PureData external ties you to PureData, and so does writing a Krita plug-in. It’s the well-known “quadratic glue” problem: there are N algorithms and M “host systems”, thus N×M pieces of glue code to write.
- All the approaches are riddled with unsafety, since the run-time environments force the inputs & outputs of the algorithm to be declared dynamically; thus, if you make an error in your call sequence, you rely on the runtime system you are using to notice this error and notify you (if you are lucky you’ll get an error message on stdout; most likely, a crash).
- All the approaches require duplicating the actual parameters of your algorithm, e.g. our alpha and beta: once as actual C++ variables, once as facades to the runtime object system you are interacting with.
Of course, the above list is not an indictment of the code quality of those various projects: they simply all do as well as they can considering the limitations of the language at the time they were originally written, in some cases multiple decades ago.
We will show how reflection allows us to improve on that, and in particular get down to N+M pieces of code to write instead of N×M.
Problem statement
Basically: there’s a ton of environments (also called “hosts”, “host APIs” in the remainder of this article) which define ad-hoc protocols or object systems. Can we find a way to make a C++ definition (“algorithm”, “plug-in” in the article) which:
- Does not depend on any pre-existing code: doesn’t inherit from a class, doesn’t call arbitrary run-time functions, etc. The definition of the algorithm shall be writable without having to include anything, even standard headers (discounting of course whatever third-party library is required for the algorithm implementation).
- Does not use anything other than structures of trivial, standard-layout types. No tuples, no templates, no magic, just structs containing float, int and not much more. This is because we want to be able to give the simplest possible expression of a problem. C++ is often sold as a language which aims to leave no room for a lower-level language; the technique in this post is about leaving no room for a simpler and more readable implementation of an algorithm, while maintaining the ability to control its inputs and outputs at run-time. Ideally, that would lead to a collection of such algorithms not depending on any framework, except optional concept definitions for a given problem domain. Of course, once this works, a specific community could choose to define its core concepts and ontologies through a set of standard-library-like types, e.g. string_view, array or span-like types.
- Does not duplicate parameter creation: defining a parameter should be as simple as adding a member to a structure. The parameter’s value should not be of a complicated, custom library type; just using int or float should work. At no point should one have to write the name of a variable twice, e.g. with a macro system such as Boost.Fusion with BOOST_FUSION_ADAPT_STRUCT, with Qt / copperspice / verdigris property systems, or with pybind-like templates: remember, we do not want our algorithm code to have any dependency!
- Allows specifying metadata on parameters: as one could see, it is necessary to be able to define bounds, textual descriptions, etc. for the inputs to the algorithm. For instance, the algorithm author may want to define a help text for each of the parameters, describing how each control will affect the result.
- Allows that definition to be automatically used to generate binding code to any of the environments, protocols and runtime systems mentioned above, with a C++20 compiler as the only tool.
Massaging the problem
Sadly, C++ does not offer true reflection on any entity: from the generic function noise defined above, it would be fairly hard to extract its parameter list, and reconstruct what we need to perform the above. Likewise, due to the lack of user-defined attributes, one wouldn’t be able to tag the input / output parameters, to give them a name, bounds, etc.
We will however show that with very simple transformations, we can reach our goals!
First transformation: function to class
This transformation is commonplace in C++: classes / structs are in general more convenient to use than straight function pointers. They are easier to pass as arguments, work better with the type system as template arguments, etc.
Let’s apply it:
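The following is a sketch based on the placeholder implementation from earlier; only the shape of the type matters.

```
#include <cmath>

struct noise
{
  float alpha{0.5f}; // continuous amount in [0; 1]
  int beta{3};       // integer number of octaves in [1; 10]

  float operator()(float x, float y) const
  {
    float result = 0.f;
    for (int octave = 0; octave < beta; ++octave)
      result += alpha * std::sin(x * (1 << octave)) * std::cos(y * (1 << octave));
    return result;
  }
};
```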
Thankfully, the actual implementation does not change; we merely put some arguments as struct members instead. If the algorithm is complex with many settings and toggles, it is likely that this was already the case in your implementation.
What if, dear reader, I told you that, as of C++17 (and actually even C++14 if using a non-legal hack), this is pretty much enough for achieving three of our four goals (with some limitations, mainly on the number of members)?
Mapping our class to a run-time API automagically
Assume the following imaginary run-time API for doing some level of processing, in cross-platform C89:
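A sketch of what such an API could look like, as seen from C++. Only lib_type_t, lib_add_float and lib_add_int are named later in this post; the other names and the exact signatures are assumptions.

```
// lib.h -- hypothetical host API (names and signatures are assumptions)
typedef struct lib_impl* lib_type_t;

extern "C" {
  // register a parameter: display name, bounds, and a pointer to the value
  void lib_add_float(lib_type_t handle, const char* name, float min, float max, float* value);
  void lib_add_int(lib_type_t handle, const char* name, int min, int max, int* value);

  // register the processing callback itself
  void lib_set_process(lib_type_t handle, void (*process)(void* context), void* context);
}
```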
To register our process with that imaginary API, one may write the following, which would then be compiled as a .dll / .so / .dylib and be loaded by our runtime system through dlopen and friends:
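A hand-written registration sketch, assuming the host looks up an entry point called plugin_main_function (the names and bounds below are duplicated by hand, which is exactly the pain point).

```
static noise algorithm;

extern "C" void plugin_main_function(lib_type_t handle)
{
  // one registration call per parameter, with the metadata duplicated by hand
  lib_add_float(handle, "Alpha", 0.f, 1.f, &algorithm.alpha);
  lib_add_int(handle, "Beta", 1, 10, &algorithm.beta);

  // ... plus the registration of the processing callback itself, elided here
}
```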
What we want is simply to use C++ to generate all the code above automatically.
That means, most importantly, calling the relevant lib_add_* function for each parameter with the correct arguments.
Enumerating members
This is trivial, thanks to a library, Boost.PFR, which technically works from C++14 and up. Note that the library is under the Boost umbrella but does not have any dependencies and can be used stand-alone. The technique is basically a band-aid until we get true reflection: it counts the fields by checking whether the type T is constructible by N arguments of a magic type convertible to anything, and then uses destructuring to generate tuples of the matching count.
In a nutshell:
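A minimal sketch of what the library gives us.

```
#include <boost/pfr.hpp>
#include <iostream>

// our struct from the previous section (processing function omitted for brevity)
struct noise { float alpha{0.5f}; int beta{3}; };

int main()
{
  noise n;

  // the number of members is known at compile time...
  static_assert(boost::pfr::tuple_size_v<noise> == 2);

  // ...and each member can be visited without ever naming it
  boost::pfr::for_each_field(n, [](const auto& field) {
    std::cout << field << "\n";
  });
}
```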
It opens a wealth of possibilities: iterating on every member, performing operations on them, etc.; the only restriction being: the type must be an aggregate. Thankfully, that is not a very hard restriction to follow, especially if we want to write declarative code, which lends itself pretty well to using aggregates.
Let’s for instance write a function that takes our struct and generates the lib_add_float / lib_add_int calls:
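A sketch of such a function, using the hypothetical C API from above. At this point we have no metadata, so placeholder names and arbitrary bounds are passed, which is precisely the problem addressed next.

```
#include <boost/pfr.hpp>

struct binder
{
  lib_type_t handle;

  // arbitrary placeholder metadata: we don't have any yet
  void operator()(float& field) { lib_add_float(handle, "", 0.f, 1.f, &field); }
  void operator()(int& field) { lib_add_int(handle, "", 0, 127, &field); }
};

void bind_parameters(lib_type_t handle, auto& algorithm)
{
  boost::pfr::for_each_field(algorithm, binder{handle});
}
```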
This gets us 90% there: if our C API was just lib_add_float(lib_type_t, float*); this blog post would stop right there!
But, as it stands, our API also expects some additional metadata: a pretty name to show to the user, mins and maxs…
Second transformation: ad-hoc types for parameters
This transformation is mechanical, but complicates our code a little bit. We will change each of our parameters into an anonymous structure containing the parameter:
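For a single parameter, the change is as follows (a minimal sketch; value is the member name that the rest of this post’s snippets rely on):

```
float alpha;
```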
becomes
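```
struct
{
  float value;
} alpha;
```

Reading or writing the parameter is now done through alpha.value instead of alpha.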
And at this point, it becomes easy to add metadata that will not have a per-instance cost, unlike a lot of runtime systems (for instance QObject properties used in Krita plug-ins). The code sadly uglifies a little bit:
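A sketch of what this could look like, with the metadata exposed through static constexpr member functions; the exact set of functions (name(), range()) is an assumption that the following snippets keep.

```
#include <cmath>
#include <utility>

struct noise
{
  struct
  {
    static constexpr auto name() { return "Alpha"; }
    static constexpr auto range() { return std::pair{0.f, 1.f}; } // min, max
    float value{0.5f};
  } alpha;

  struct
  {
    static constexpr auto name() { return "Beta"; }
    static constexpr auto range() { return std::pair{1, 10}; }
    int value{3};
  } beta;

  float operator()(float x, float y) const
  {
    // same implementation as before, now reading alpha.value and beta.value
    float result = 0.f;
    for (int octave = 0; octave < beta.value; ++octave)
      result += alpha.value * std::sin(x * (1 << octave)) * std::cos(y * (1 << octave));
    return result;
  }
};
```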
There isn’t a lot of wiggle room to improve. It is not possible to have static member variables in anonymous structs; if one is willing to duplicate the name of the struct, it’s possible to get things down to:
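For instance, a sketch of one such parameter; here the metadata can be plain static constexpr variables, at the cost of writing the name twice.

```
struct alpha_t
{
  static constexpr auto name = "Alpha";
  static constexpr float min = 0.f;
  static constexpr float max = 1.f;
  float value{0.5f};
} alpha;
```

The rest of this post sticks with the member-function form shown above.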
How user-defined attributes and attribute reflection would help
Now, if we were in, say, C#, what we’d most likely write instead would just be a plain class with the float and int members annotated with attributes (a range, a description, and so on), read back by the host through reflection.
Simpler, isn’t it? How neat would it be if we had the same thing in C++! There is some work towards that in Clang and the lock3/meta metaclasses clang fork.
We could even try (okay, that’s a little bit far-fetched) to read the pre/post-conditions from the C++ contracts specification when it finally lands, in order to only have to specify them once!
Although in practice some methods may be needed: for instance, multiple APIs require the user to provide a method which will, from an input value, render a string to show to the user.
Updating our binding code
We now have to go back and work on the binding function implementation: the main issue is that where we were using the actual type of the values, boost::pfr::for_each_field will give us references to anonymous types (or, even if not anonymous, types that we shouldn’t have knowledge of in our binding code).
In our case, we assume (as part of our ontology) that parameters have a value. This is a compile-time protocol.
Thankfully, a C++20 feature, concepts, makes encoding compile-time protocols in code fairly easy. Consider a member of our earlier visitor:
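Concretely, the member that previously took float& or int& now has to accept any type, since it will receive our wrapped parameters (a sketch):

```
void operator()(auto& field); // now called with references to the wrapped parameter types
```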
We can for instance fill it that way:
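A sketch, assuming every parameter provides name(), range() and a value member as above, and dispatching on the value’s type (std::is_same_v needs <type_traits>):

```
void operator()(auto& field)
{
  auto [min, max] = field.range();
  if constexpr (std::is_same_v<decltype(field.value), float>)
    lib_add_float(handle, field.name(), min, max, &field.value);
  else
    lib_add_int(handle, field.name(), min, max, &field.value);
}
```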
And that would work with our current noise implementation.
But what if the program author forgets to implement the name() method? Mostly, a not-so-terrible compile error complaining that the member does not exist.
If our API absolutely requires a name() and a value, concepts are very helpful:
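A sketch of such a concept; the exact requirements and the name are choices made by the binding author.

```
#include <concepts>
#include <string_view>

template <typename T>
concept parameter = requires(T t)
{
  { t.name() } -> std::convertible_to<std::string_view>;
  t.value;
};
```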
Our code becomes:
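Still a sketch; only the constraint on the argument changes.

```
void operator()(parameter auto& field)
{
  auto [min, max] = field.range();
  if constexpr (std::is_same_v<decltype(field.value), float>)
    lib_add_float(handle, field.name(), min, max, &field.value);
  else
    lib_add_int(handle, field.name(), min, max, &field.value);
}
```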
Forgetting to implement name() now results in an error telling us which constraint of the concept is not satisfied.
Whether that constitutes an improvement in readability of errors in our specific case is left as an exercise to the reader.
But, what if our algorithm doesn’t actually need bounds? We’d still want it to work in a bounded host system, right? The host system would just choose arbitrary bounds that make sense for e.g. an input widget.
In this case, we’d get a combinatorial explosion of concepts: we’d need an overload for a parameter with a name and no range, an overload for a parameter with a range and no name, etc.
Handling optionality
As an algorithm author, you cannot specify every possible metadata known to man. We want our algorithm to be future-proof: even if refinements can be added, we want the code we write today to still be able to integrate into tomorrow’s host.
Thankfully, the age-old notion of condition can help here; in particular compile-time conditions depending on the existence of a member.
C++20 makes that trivial:
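A sketch of an overload where both the name and the bounds are optional; the fallback values are arbitrary choices made by the binding code, and only the float-valued case is shown for brevity (std::tie needs <tuple>).

```
void operator()(auto& field)
{
  const char* name = "";
  if constexpr (requires { field.name(); })
    name = field.name();

  float min = 0.f, max = 1.f; // fallback bounds chosen by the host binding
  if constexpr (requires { field.range(); })
    std::tie(min, max) = field.range();

  // the int case is handled similarly and omitted for brevity
  lib_add_float(handle, name, min, max, &field.value);
}
```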
This way, the algorithm has maximal flexibility: it can provide the bare minimal metadata for a proof-of-concept, or give as much information as possible.
This last part works in Clang and GCC, but MSVC’s concepts implementation does not support it yet.
Calling our code
There’s not much difference with the previous technique when we want to call our process (operator()) function.
What we cannot do without reflection & code generation (metaclasses) is an entirely generic transformation from one of our algorithm’s processing method, which, depending on the problem domain, could have any number of inputs / outputs of various types, to arbitrary run-time data. For instance, audio processors generally have inputs and outputs in the form of an array of channels of float / double values, plus the amount of data to be processed:
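The following is only a sketch; the parameter names are illustrative.

```
void operator()(const float** inputs, float** outputs, int channels, int frames);
```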
While image processors would instead look like:
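Again, just a sketch with illustrative names.

```
void operator()(const unsigned char* input_pixels, unsigned char* output_pixels,
                int width, int height);
```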
There’s no practical way to enumerate all the possible sets of arguments.
Thus, the author of the binding code has the responsibility of adapting the expected ontology for algorithms to the API we are binding to.
Nothing prevents multiple cases from being handled: for instance, some plug-ins may have a more efficient, array-based implementation of their process; some hosts may be able to use that if available:
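A sketch of what the host-side binding could do, probing for the array-based overload with a requires-expression; the fallback assumes a per-sample float to float overload.

```
template <typename Algorithm>
void run_block(Algorithm& algo, const float** inputs, float** outputs,
               int channels, int frames)
{
  if constexpr (requires { algo(inputs, outputs, channels, frames); })
  {
    // the algorithm provides the efficient array-based overload: use it directly
    algo(inputs, outputs, channels, frames);
  }
  else
  {
    // fallback: call the per-sample overload once per sample
    for (int c = 0; c < channels; ++c)
      for (int i = 0; i < frames; ++i)
        outputs[c][i] = algo(inputs[c][i]);
  }
}
```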
A host which only supports array-based computations would instead write:
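As a sketch, such a host could simply require the array-based overload outright.

```
template <typename Algorithm>
void run_block(Algorithm& algo, const float** inputs, float** outputs,
               int channels, int frames)
{
  static_assert(requires { algo(inputs, outputs, channels, frames); },
                "this host only supports array-based processing");
  algo(inputs, outputs, channels, frames);
}
```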
Going thread-safe
Suppose that our C host API specifies that the process method is run in a separate thread, for efficiency concerns.
Such an API’s lib_add_float function could look like this:
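For instance (a sketch; the names and exact signatures are assumptions):

```
typedef float (*lib_float_getter)(void* context);
typedef void (*lib_float_setter)(void* context, float value);

void lib_add_float(lib_type_t handle, const char* name, float min, float max,
                   lib_float_getter getter, lib_float_setter setter, void* context);
```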
where context would be an object passed to getter and setter so that the actual float could be found. getter and setter could be called from any of the host’s threads, e.g. the main or UI thread, while process would be called from a separate thread specifically.
Thus, our actual float needs some protection. Now, our program has the added requirement of not using locks: the algorithm could be used from a real-time system.
What we can do is transform our list of parameters into atomic<T> types, at compile-time.
A simple way for this is through any of the common C++ type-based metaprogramming libraries, which are able to transform tuples: in our case we’ll use Boost.MP11; other alternatives are Brigand, Metal, etc.
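A minimal sketch: build a tuple type mirroring the members of a struct, with every type wrapped in std::atomic. It is applied here to a struct of plain values for clarity; with the wrapped parameters of our noise struct, one would transform the types of the value members instead, which is what the next snippet assumes.

```
#include <boost/mp11.hpp>
#include <boost/pfr.hpp>
#include <atomic>
#include <tuple>
#include <type_traits>
#include <utility>

template <typename T>
using atomic_mirror = boost::mp11::mp_transform<
    std::atomic,
    decltype(boost::pfr::structure_to_tuple(std::declval<T&>()))>;

struct parameters { float alpha; int beta; };
static_assert(std::is_same_v<
    atomic_mirror<parameters>,
    std::tuple<std::atomic<float>, std::atomic<int>>>);
```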
From this, our binding methods would be changed to copy the freshest parameter values from the atomics into the plain struct before running the algorithm. The function doing that copy, load_all_atomics, is a bit dense to read in its terse form; here’s a spaced-out and simplified version:
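Here is what that simplified version could look like; a sketch, assuming a tuple of atomics holding the raw value types and parameters exposing a value member.

```
#include <boost/pfr.hpp>
#include <atomic>
#include <cstddef>
#include <tuple>
#include <utility>

template <typename Atomics, typename Struct>
void load_all_atomics(const Atomics& source, Struct& destination)
{
  [&]<std::size_t... Index>(std::index_sequence<Index...>)
  {
    // one relaxed load per parameter, expanded at compile time
    ((boost::pfr::get<Index>(destination).value
          = std::get<Index>(source).load(std::memory_order_relaxed)),
     ...);
  }(std::make_index_sequence<std::tuple_size_v<Atomics>>{});
}
```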
Note that in the end, compilers will happily inline all that into a couple of mov instructions :-)
Its conjugate function, store_all_atomics, is left as an exercise to the reader.
Another interesting function one can write is one that performs an operation on the nth parameter, n being known only at run-time, as some APIs are index-based instead of pointer-based: parameters are identified through an index.
Here’s a solution I found, which Clang is able to optimize pretty well through what looks like a loop-recombination optimization, but which other compilers sadly don’t manage to:
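A sketch of one way to write such a function; the original solution may differ in its details.

```
#include <boost/pfr.hpp>
#include <cstddef>
#include <utility>

template <typename Struct, typename Function>
void for_nth_field(Struct& object, std::size_t n, Function&& function)
{
  [&]<std::size_t... Index>(std::index_sequence<Index...>)
  {
    // expands to: if (n == 0) function(member 0); if (n == 1) function(member 1); ...
    ((void)(n == Index && (function(boost::pfr::get<Index>(object)), true)), ...);
  }(std::make_index_sequence<boost::pfr::tuple_size_v<Struct>>{});
}

// usage sketch: touch the value of the parameter selected by the host through its index
// for_nth_field(algorithm, index, [](auto& param) { /* read or write param.value */ });
```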
Conclusion
From this, it is obvious that writing, for instance, a generic serializer, hash function, etc. that will work on any such types is trivial; Boost.PFR already provides some amount of it. A fun exercise left to the reader would be memoization of plug-in state, for the sake of undo-redo.
Note that the algorithm could also easily be generic; for instance, some audio plug-in APIs support working with either single- or double-precision floating-point; one could just provide a noise<std::floating_point> algorithm if it fits the algorithm’s spec. Otherwise, the binding library would simply perform the conversion from / to the correct floating-point type if that is a meaningful thing to do.
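As a sketch, the parameterization could be as simple as:

```
#include <concepts>

template <std::floating_point T>
struct noise
{
  struct { T value{0.5}; } alpha;
  struct { int value{3}; } beta;

  T operator()(T x, T y) const; // implementation unchanged, elided here
};
```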
This concept has been prototyped first in an API for writing plug-ins for ossia score, and then in vintage, a tentative library for writing audio plug-ins.
There is one last drop of manual binding code to write: the code that ties the algorithm to the API.
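As a sketch, reusing the names from the earlier snippets (the hypothetical C API and the generic bind_parameters function), that last drop can be as small as:

```
static noise algorithm;

extern "C" void plugin_main_function(lib_type_t handle)
{
  bind_parameters(handle, algorithm);
  // ... plus registering the processing callback, as before
}
```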
There is no easy way without full static reflection to bypass that drop of code: we have to reference the name of our algorithm on the same line as our binding code at least once; full reflection would allow enumerating the available types instead and skipping that part. There are two band-aid solutions:
- State that the class containing the algorithm must have a specific name, e.g. Plugin. This does not really scale if, for instance, a piece of software would like to build and bundle multiple such plug-ins statically, due to ODR; it can be made to work with shared objects if one takes care of hiding all symbols except plugin_main_function.
- Generate that code in the build system: one could easily provide e.g. a bind_algorithm(<name> <main_source_file>) CMake function which would generate the appropriate .cpp; the act of porting the algorithm to a new host platform would then simply be, for instance, forking a template repo on GitHub and replacing the content of a src/algorithm.cpp file.
So, in the end, what we have, roughly, is:
- Algorithms without dependencies on host APIs for exposing themselves to GUI software, etc.
- Independent introspection of these algorithms.
What remains is, as a community, to specify the ontologies / concepts that a given algorithm can be made to fit in: for instance, for audio plug-ins, the LV2 specification has done a great deal of work towards that; similar work could be done for graphics algorithms, serialization systems, etc.
This work could be encoded in C++ concepts, maybe with inspiration from the various Haskell typeclasses or Rust traits libraries: then, if as an algorithm author I want to make sure that my algorithm will be able to be used by audio, video, … software, I’d just clone a concept-checking library and see which concepts my code does (and does not) support; an algorithm which takes a float and outputs a float would likely have a very wide applicability.
A further blog article will present how one can leverage this to build data-flow graphs either at compile-time or at run-time.