```python
# ruff: noqa: N802, N803, N806, N815, N816
import os

import numpy as np
from scipy import signal

# Simple utilities for displaying generated code in the notebook
from utils import cleanup, display_text

import archimedes as arc

THEME = os.environ.get("ARCHIMEDES_THEME", "dark")
arc.theme.set_theme(THEME)
```

# The Codegen API

In the previous part of this tutorial, we learned how to automatically translate Python code to highly efficient C implementations. However, the C code generated by CasADi uses a generic interface that requires all arrays to be initialized externally. If we want to call the generated function from our own C code or use it in an embedded setting, this means manually determining and maintaining array sizes, names, initializations, etc.

In embedded systems, memory management is critical. The "kernel" implementation of the algorithm is extremely efficient, but it expects:

1. Pre-allocated memory buffers for all inputs and outputs
2. Properly configured pointers to these buffers
3. Initialization of working memory needed by the algorithm

Manually handling these details would be tedious and error-prone, especially when algorithms change during development. Archimedes handles this with an "interface layer", which creates a predictable and easy-to-use API for working with the low-level kernel code. On this page we will go through some of the details of this interface layer.
First, let's recall the basic IIR filter implementation we were working with:

```python
# Note: give descriptive names for return values (these don't need
# to match the variable names)
@arc.compile(return_names=["u_hist", "y_hist"])
def iir_filter(u, b, a, u_prev, y_prev):
    # Update input history
    u_prev[1:] = u_prev[:-1]
    u_prev[0] = u

    # Compute output using the direct form I difference equation
    # (separate input and output histories)
    y = (np.dot(b, u_prev) - np.dot(a[1:], y_prev[: len(a) - 1])) / a[0]

    # Update output history
    y_prev[1:] = y_prev[:-1]
    y_prev[0] = y

    return u_prev, y_prev
```

We will generate the code just as before:

```python
cleanup()  # Clean up any previously generated code

# Design a simple IIR filter with SciPy
dt = 0.01  # Sampling time [seconds]
Wn = 10  # Cutoff frequency [Hz]
order = 4
b, a = signal.butter(order, Wn, "low", analog=False, fs=1 / dt)

# Create "template" arguments for type inference
u = 1.0
u_prev = np.zeros(len(b))
y_prev = np.zeros(len(a) - 1)
args = (u, b, a, u_prev, y_prev)

arc.codegen(iir_filter, args)
```

## The interface code

Last time we looked at the kernel code, which used flattened arrays of pointers to access inputs, outputs, and working memory. While highly efficient, this code on its own is difficult to work with directly, particularly for more complex functions with multiple arguments and return values.
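As a rough illustration of why the raw convention is awkward, the following sketch calls a kernel-style function by hand. The `toy_kernel` function and the `TOY_SZ_*` macros are hypothetical stand-ins written for this example (they are not generated by Archimedes or CasADi); only the `(arg, res, iw, w, mem)` argument layout is assumed to match the generated kernel.

```c
#include <stddef.h>

/* Hypothetical buffer counts; a real kernel header defines its own. */
#define TOY_SZ_ARG 2
#define TOY_SZ_RES 1
#define TOY_SZ_IW 1
#define TOY_SZ_W 1

/* Stand-in kernel: computes y[i] = gain * x[i] for a length-3 vector.
 * arg[0] points to the scalar gain, arg[1] to x, res[0] to y. */
int toy_kernel(const float** arg, float** res, long int* iw, float* w, int mem) {
    (void)iw;  /* work arrays and memory index are unused in this toy */
    (void)w;
    (void)mem;
    const float gain = arg[0][0];
    const float* x = arg[1];
    float* y = res[0];
    for (size_t i = 0; i < 3; i++) {
        y[i] = gain * x[i];
    }
    return 0;
}

/* Manual call: the caller must know the position and length of every
 * buffer, and nothing checks that the pointers are wired up correctly. */
int call_toy_kernel(float gain, const float x[3], float y[3]) {
    const float* arg[TOY_SZ_ARG];
    float* res[TOY_SZ_RES];
    long int iw[TOY_SZ_IW];
    float w[TOY_SZ_W];

    arg[0] = &gain;  /* scalar inputs are passed by address */
    arg[1] = x;
    res[0] = y;
    return toy_kernel(arg, res, iw, w, 0);
}
```

Even in this tiny case, swapping `arg[0]` and `arg[1]` would compile without complaint, which is the kind of mistake that named struct fields are meant to rule out.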
Here's what the "interface" code looks like:

```python
with open("iir_filter.h", "r") as f:
    c_code = f.read()

display_text(c_code)
```

```c
#ifndef IIR_FILTER_H
#define IIR_FILTER_H

#include "iir_filter_kernel.h"

#ifdef __cplusplus
extern "C" {
#endif

// Input arguments struct
typedef struct {
    float u;
    float b[5];
    float a[5];
    float u_prev[5];
    float y_prev[4];
} iir_filter_arg_t;

// Output results struct
typedef struct {
    float u_hist[5];
    float y_hist[4];
} iir_filter_res_t;

// Workspace struct
typedef struct {
    long int iw[iir_filter_SZ_IW];
    float w[iir_filter_SZ_W];
} iir_filter_work_t;

// Runtime API
int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);
int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);

#ifdef __cplusplus
}
#endif

#endif // IIR_FILTER_H
```

Now, instead of manually wrangling pointer arrays, we have function-specific input and output structs with meaningful names. The header is enough for us to work with this in our own code, but for completeness here's the corresponding C code:

```python
with open("iir_filter.c", "r") as f:
    c_code = f.read()

display_text(c_code)
```

```c
#include <string.h>
#include "iir_filter.h"

int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work) {
    if (!arg || !res || !work) {
        return -1; // Invalid pointers
    }

    // Initialize inputs
    arg->u = 1.0;
    arg->b[0] = 0.004824343357716228;
    arg->b[1] = 0.019297373430864913;
    arg->b[2] = 0.02894606014629737;
    arg->b[3] = 0.019297373430864913;
    arg->b[4] = 0.004824343357716228;
    arg->a[0] = 1.0;
    arg->a[1] = -2.369513007182038;
    arg->a[2] = 2.313988414415881;
    arg->a[3] = -1.054665405878568;
    arg->a[4] = 0.18737949236818502;
    arg->u_prev[0] = 0.0;
    arg->u_prev[1] = 0.0;
    arg->u_prev[2] = 0.0;
    arg->u_prev[3] = 0.0;
    arg->u_prev[4] = 0.0;
    arg->y_prev[0] = 0.0;
    arg->y_prev[1] = 0.0;
    arg->y_prev[2] = 0.0;
    arg->y_prev[3] = 0.0;

    // Initialize outputs
    memset(res, 0, sizeof(*res));

    return 0;
}

int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work) {
    if (!arg || !res || !work) {
        return -1; // Invalid pointers
    }

    // Marshal inputs to CasADi format
    const float* kernel_arg[iir_filter_SZ_ARG];
    kernel_arg[0] = &arg->u;
    kernel_arg[1] = arg->b;
    kernel_arg[2] = arg->a;
    kernel_arg[3] = arg->u_prev;
    kernel_arg[4] = arg->y_prev;

    // Marshal outputs to CasADi format
    float* kernel_res[iir_filter_SZ_RES];
    kernel_res[0] = res->u_hist;
    kernel_res[1] = res->y_hist;

    // Call kernel function
    return iir_filter(kernel_arg, kernel_res, work->iw, work->w, 0);
}
```

If you are familiar with C, this code will be largely self-explanatory. It takes care of all array and pointer initialization, specifically initializing all arrays to the values of the "template arguments" we passed to `codegen`.

One additional piece that's worth noting: the kernel code had some nice properties including deterministic and static memory allocation, and portable, self-contained code. The interface layer preserves these features, so it is also suitable for either standalone C/C++ applications or deployment to embedded platforms.

## Working with the generated API

In order to use the generated code we have to do three things:

```c
#include "iir_filter.h"

// 1. Declare the input, output, and workspace structures
iir_filter_arg_t arg;
iir_filter_res_t res;
iir_filter_work_t work;

// 2. Initialize the structs
iir_filter_init(&arg, &res, &work);

// 3. Call the function
iir_filter_step(&arg, &res, &work);
```

Note that, as in the "quick start" example earlier in this series, for recursive filters like the IIR, the `iir_filter_step` function does not place the outputs back in the input arrays. For example, a more useful iteration as part of a timed interrupt on an STM32 platform might look like the following:

```c
int n = sizeof(arg.b) / sizeof(arg.b[0]) - 1;  // Filter order
float y;  // Output, to be fed to actuator, control algorithm, etc.

void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim)
{
    if (htim->Instance == TIM3) {
        arg.u = dma_buffer[0];  // Read from sensor, other algorithm output, etc.
        iir_filter_step(&arg, &res, &work);
        y = res.y_hist[0];  // Output, to be fed to actuator, control algorithm, etc.

        // Copy output arrays back to inputs
        for (int j = 0; j < n; j++) {
            arg.u_prev[j] = res.u_hist[j];
            arg.y_prev[j] = res.y_hist[j];
        }
        arg.u_prev[n] = res.u_hist[n];
    }
}
```

As a side note, you may well notice here that we're using roughly twice as much memory as we strictly need, since in this case we know that the outputs will be fed back to the inputs. This will generally be the case for recursive structures (IIR filters, sensor fusion, and similar algorithms) and highlights a limitation of the "functionally pure" assumption underlying code generation in Archimedes (and CasADi).

While the generated code is efficient, it is not necessarily optimal; if you are working with tightly resource-constrained platforms or otherwise need extreme performance, you may still need hand-coding or other specialized tools. On the other hand, if you can afford to keep a few extra floats in memory and perform the associated copy operations, this workflow provides a clean way to abstract the algorithm implementation from the rest of the embedded application.
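One way to keep the copy-back step out of the application code is to wrap it in a small helper. The sketch below is only an illustration: `iir_filter_update` is a hypothetical name that is not part of the generated API, and the struct definitions and the plain-C `iir_filter_step` are stand-ins included so the example compiles on its own. In a real project that whole section would be replaced by `#include "iir_filter.h"`.

```c
#include <string.h>

/* --- Stand-ins so the sketch is self-contained (fourth-order case).
 *     In a real project, use #include "iir_filter.h" instead. --- */
typedef struct {
    float u;
    float b[5];
    float a[5];
    float u_prev[5];
    float y_prev[4];
} iir_filter_arg_t;

typedef struct {
    float u_hist[5];
    float y_hist[4];
} iir_filter_res_t;

typedef struct {
    long int iw[1];  /* placeholder sizes, unused by the stand-in */
    float w[1];
} iir_filter_work_t;

/* Plain-C stand-in with the same signature as the generated function */
int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work) {
    if (!arg || !res || !work) {
        return -1;
    }
    /* Shift the input history and insert the new sample */
    res->u_hist[0] = arg->u;
    for (int i = 1; i < 5; i++) {
        res->u_hist[i] = arg->u_prev[i - 1];
    }
    /* Evaluate the difference equation */
    float y = 0.0f;
    for (int i = 0; i < 5; i++) {
        y += arg->b[i] * res->u_hist[i];
    }
    for (int i = 0; i < 4; i++) {
        y -= arg->a[i + 1] * arg->y_prev[i];
    }
    y /= arg->a[0];
    /* Shift the output history and insert the new output */
    res->y_hist[0] = y;
    for (int i = 1; i < 4; i++) {
        res->y_hist[i] = arg->y_prev[i - 1];
    }
    return 0;
}
/* --- End of stand-ins --- */

/* Hypothetical helper: run one step with input u, feed the histories
 * back into the inputs, and return the newest output through *y.
 * Returns 0 on success or the nonzero status of the step call. */
int iir_filter_update(iir_filter_arg_t* arg, iir_filter_res_t* res,
                      iir_filter_work_t* work, float u, float* y) {
    if (!arg || !res || !work || !y) {
        return -1;
    }
    arg->u = u;
    int status = iir_filter_step(arg, res, work);
    if (status != 0) {
        return status;  /* histories are left untouched on failure */
    }
    /* The members are arrays, so sizeof gives their full byte counts and
     * the copies remain correct if the filter order changes. */
    memcpy(arg->u_prev, res->u_hist, sizeof(arg->u_prev));
    memcpy(arg->y_prev, res->y_hist, sizeof(arg->y_prev));
    *y = res->y_hist[0];
    return 0;
}
```

With a helper like this, the body of the timer callback above would reduce to a single call such as `iir_filter_update(&arg, &res, &work, dma_buffer[0], &y);`. The memory cost is unchanged: for the fourth-order filter the duplicated history is 9 floats (36 bytes, assuming 4-byte floats).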
For example, if we want a filter with a different order and cutoff frequency, we can regenerate the code with only a couple of parameters changed:

```python
cleanup()  # Clean up any previous runs

# Design a simple IIR filter with SciPy
dt = 0.01  # Sampling time [seconds]
Wn = 20  # Cutoff frequency [Hz]
order = 2
b, a = signal.butter(order, Wn, "low", analog=False, fs=1 / dt)

# Create "template" arguments for type inference
u = 1.0
u_prev = np.zeros(len(b))
y_prev = np.zeros(len(a) - 1)
args = (u, b, a, u_prev, y_prev)

arc.codegen(iir_filter, args)

with open("iir_filter.h", "r") as f:
    c_code = f.read()

display_text(c_code)
```

```c
#ifndef IIR_FILTER_H
#define IIR_FILTER_H

#include "iir_filter_kernel.h"

#ifdef __cplusplus
extern "C" {
#endif

// Input arguments struct
typedef struct {
    float u;
    float b[3];
    float a[3];
    float u_prev[3];
    float y_prev[2];
} iir_filter_arg_t;

// Output results struct
typedef struct {
    float u_hist[3];
    float y_hist[2];
} iir_filter_res_t;

// Workspace struct
typedef struct {
    long int iw[iir_filter_SZ_IW];
    float w[iir_filter_SZ_W];
} iir_filter_work_t;

// Runtime API
int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);
int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);

#ifdef __cplusplus
}
#endif

#endif // IIR_FILTER_H
```

This exposes the same type names and function signatures as the first version (only the array lengths differ) and will automatically initialize the correct filter coefficients. Because the application code above derives the filter order with `sizeof`, we can tune the filter and re-deploy without changing any of the application code!

### Platform Portability

Similar usage patterns can be applied in a range of contexts; since the generated code is self-contained it is highly portable (provided the compiler supports ANSI C). For example, here's a "cookbook recipe" for running a generated function in a timed loop on an Arduino.
```python
with open("iir.ino", "r") as f:
    c_code = f.read()

display_text(c_code)
```

```c
#include <Arduino.h>
#include <TimerOne.h>
#include "iir_filter.h"

// Sampling rate: 100 Hz
const unsigned long SAMPLE_RATE_US = 10000;

// Declare the input, output, and workspace structures
iir_filter_arg_t arg;
iir_filter_res_t res;
iir_filter_work_t work;

int n = sizeof(arg.b) / sizeof(arg.b[0]) - 1;  // Filter order
float y;  // Output, to be fed to actuator, control algorithm, etc.

// Timer interrupt handler
void timerInterrupt() {
    arg.u = 1.0;  // Or read from sensor, other algorithm output, etc.
    iir_filter_step(&arg, &res, &work);  // Call IIR filter
    y = res.y_hist[0];  // Output, to be fed to actuator, control algorithm, etc.

    // Copy output arrays back to inputs
    for (int j = 0; j < n; j++) {
        arg.u_prev[j] = res.u_hist[j];
        arg.y_prev[j] = res.y_hist[j];
    }
    arg.u_prev[n] = res.u_hist[n];
}

void setup() {
    // Initialize the structs
    iir_filter_init(&arg, &res, &work);

    Serial.begin(9600);

    // Initialize Timer1 for interrupts at 100 Hz
    Timer1.initialize(SAMPLE_RATE_US);
    Timer1.attachInterrupt(timerInterrupt);
}

void loop() {
    // ...non-time-critical tasks
    delay(10);
}
```

Note that the filtering code is identical to the STM32 snippet; the only differences here are in peripheral configuration and other platform-specific HAL code. The same auto-generated code can target everything from rapid prototyping platforms to production-grade embedded controllers to Unix clusters.

## Summary

In this final part of the hardware deployment tutorial, we covered the basic structure of the "interface" layer, or API, generated by Archimedes as a way to abstract application code from the low-level kernel implementation. This Python-to-C code generation provides a convenient workflow for writing high-level logic in Python and then rapidly deploying to a range of C environments.
### Where to Go From Here

As mentioned, this code generation and hardware deployment workflow is an early proof of concept, but it is central to the Archimedes roadmap.

**If you're using Archimedes for hardware control and have questions, comments, or requests, please don't hesitate to reach out!**

```python
cleanup()
```