```python
# ruff: noqa: N802, N803, N806, N815, N816
import os

import numpy as np
from scipy import signal

# Simple utilities for displaying generated code in the notebook
from utils import cleanup, display_text

import archimedes as arc

THEME = os.environ.get("ARCHIMEDES_THEME", "dark")
arc.theme.set_theme(THEME)
```

# The Codegen API

In the previous part of this tutorial, we learned how to automatically translate Python code to highly efficient C implementations.
However, the C code generated by CasADi uses a generic interface that requires that all arrays be initialized externally.
If we want to call the generated function from our own C code or use it in an embedded setting, this creates overhead of having to manually determine and adjust array sizes, names, initializations, etc.

In embedded systems, memory management is critical. The "kernel" implementation of the algorithm is extremely efficient, but it expects:

1. Pre-allocated memory buffers for all inputs and outputs
2. Properly configured pointers to these buffers
3. Initialization of working memory needed by the algorithm

Manually handling these details would be tedious and error-prone - especially when algorithms change during development.

Archimedes handles this with an "interface layer", which creates a predictable and easy-to-use API for working with the low-level kernel code.  On this page we will go through some of the details of this interface layer.

First, let's recall the basic IIR filter implementation we were working with:


```python
# Note: give descriptive names for return values (these don't need
# to match the variable names)
@arc.compile(return_names=["u_hist", "y_hist"])
def iir_filter(u, b, a, u_prev, y_prev):
    # Update input history
    u_prev[1:] = u_prev[:-1]
    u_prev[0] = u

    # Compute output using the direct II transposed structure
    y = (np.dot(b, u_prev) - np.dot(a[1:], y_prev[: len(a) - 1])) / a[0]

    # Update output history
    y_prev[1:] = y_prev[:-1]
    y_prev[0] = y

    return u_prev, y_prev
```

We will generate the code just as before:


```python
cleanup()  # Clean up any previous generated code

# Design a simple IIR filter with SciPy
dt = 0.01  # Sampling time [seconds]
Wn = 10  # Cutoff frequency [Hz]
order = 4
b, a = signal.butter(order, Wn, "low", analog=False, fs=1 / dt)

# Create "template" arguments for type inference
u = 1.0
u_prev = np.zeros(len(b))
y_prev = np.zeros(len(a) - 1)
args = (u, b, a, u_prev, y_prev)

arc.codegen(iir_filter, args)
```

## The interface code

Last time we looked at the kernel code, which used flattened arrays of pointers to access inputs, outputs, and working memory.
While highly efficient, this code on its own is difficult to work with directly, particularly for more complex functions with multiple arguments and return values.

Here's what the "interface" code looks like:


```python
with open("iir_filter.h", "r") as f:
    c_code = f.read()

display_text(c_code)
```


```c

#ifndef IIR_FILTER_H
#define IIR_FILTER_H

#include "iir_filter_kernel.h"

#ifdef __cplusplus
extern "C" {
#endif

// Input arguments struct
typedef struct {
    float u;    
    float b[5];    
    float a[5];    
    float u_prev[5];    
    float y_prev[4];    
} iir_filter_arg_t;

// Output results struct
typedef struct {
    float u_hist[5];
    float y_hist[4];
} iir_filter_res_t;

// Workspace struct
typedef struct {
    long int iw[iir_filter_SZ_IW];
    float w[iir_filter_SZ_W];
} iir_filter_work_t;

// Runtime API
int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);
int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);


#ifdef __cplusplus
}
#endif

#endif // IIR_FILTER_H
```


Now, instead of manually wrangling pointer arrays, we have function-specific input and output structs with meaningful names.

The header is enough for us to work with this in our own code, but for completeness here's the corresponding C code:


```python
with open("iir_filter.c", "r") as f:
    c_code = f.read()

display_text(c_code)
```


```c

#include <string.h>
#include "iir_filter.h"

int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work) {
    if (!arg || !res || !work) {
        return -1; // Invalid pointers
    }

    // Initialize inputs
    arg->u = 1.0;

    arg->b[0] = 0.004824343357716228;
    arg->b[1] = 0.019297373430864913;
    arg->b[2] = 0.02894606014629737;
    arg->b[3] = 0.019297373430864913;
    arg->b[4] = 0.004824343357716228;

    arg->a[0] = 1.0;
    arg->a[1] = -2.369513007182038;
    arg->a[2] = 2.313988414415881;
    arg->a[3] = -1.054665405878568;
    arg->a[4] = 0.18737949236818502;

    arg->u_prev[0] = 0.0;
    arg->u_prev[1] = 0.0;
    arg->u_prev[2] = 0.0;
    arg->u_prev[3] = 0.0;
    arg->u_prev[4] = 0.0;

    arg->y_prev[0] = 0.0;
    arg->y_prev[1] = 0.0;
    arg->y_prev[2] = 0.0;
    arg->y_prev[3] = 0.0;

    
    // Initialize outputs
    memset(res, 0, sizeof(*res));

    return 0;
}

int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work) {
    if (!arg || !res || !work) {
        return -1; // Invalid pointers
    }
    
    // Marshal inputs to CasADi format
    const float* kernel_arg[iir_filter_SZ_ARG];
    kernel_arg[0] = &arg->u;
    kernel_arg[1] = arg->b;
    kernel_arg[2] = arg->a;
    kernel_arg[3] = arg->u_prev;
    kernel_arg[4] = arg->y_prev;
    
    // Marshal outputs to CasADi format
    float* kernel_res[iir_filter_SZ_RES];
    kernel_res[0] = res->u_hist;
    kernel_res[1] = res->y_hist;
    
    // Call kernel function
    return iir_filter(kernel_arg, kernel_res, work->iw, work->w, 0);
}
```


If you are familiar with C, this code will be largely self-explanatory.
It takes care of all array and pointer initialization, specifically initializing all arrays to the values of the "template arguments" we passed to `codegen`.

One additional piece that's worth noting: the kernel code had some nice properties including deterministic and static memory allocation, and portable, self-contained code.  The interface layer preserves these features, so it is also suitable for either standalone C/C++ applications or deployment to embedded platforms.

## Working with the generated API

In order to use the generated code we have to do three things:

```c
#include "iir_filter.h"

// 1. Declare the input, output, and workspace structures
iir_filter_arg_t arg;
iir_filter_res_t res;
iir_filter_work_t work;

// 2. Initialize the structs
iir_filter_init(&arg, &res, &work);

// 3. Call the function
iir_filter_step(&arg, &res, &work);
```

Note that, as in the "quick start" example earlier in this series, for recursive filters like the IIR, the `iir_filter_step` function does not place the outputs back in the input arrays.
For example, a more useful iteration as part of a timed interrupt on an STM32 platform might look like the following:

```c
int n = sizeof(arg.b) / sizeof(arg.b[0]) - 1;  // Filter order
float y;  // Output, to be fed to actuator, control algorithm, etc.

void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim) {
    if (htim->Instance == TIM3) {
        arg.u = dma_buffer[0];  // Read from sensor, other algorithm output, etc.
        iir_filter_step(&arg, &res, &work);
        y = res.y_hist[0];  // Output, to be fed to actuator, control algorithm, etc.

        // Copy output arrays back to inputs
        for (int j = 0; j < n; j++) {
            arg.u_prev[j] = res.u_hist[j];
            arg.y_prev[j] = res.y_hist[j];
        }
        arg.u_prev[n] = res.u_hist[n];
    }
}
```

As a side note, you may well notice here that we're using roughly twice as much memory as we strictly need, since in this case we know that the outputs will be fed back to the inputs.
This will generally be the case for recursive structures (IIR filters, sensor fusion, and similar algorithms) and highlights a limitation of the "functionally pure" assumption underlying code generation in Archimedes (and CasADi).
While the generated code is efficient, it is not necessarily optimal; if you are working with tightly resource-constrained platforms or otherwise need extreme performance, you may still need hand-coding or other specialized tools.

On the other hand, if you can afford to keep a few extra floats in memory and perform the associated copy operations, this workflow provides a clean way to abstract the algorithm implementation from the rest of the embedded application.

For example, if we want to choose a different order and cutoff filter, we can do so easily:


```python
cleanup()  # Cleanup any previous runs

# Design a simple IIR filter with SciPy
dt = 0.01  # Sampling time [seconds]
Wn = 20  # Cutoff frequency [Hz]
order = 2
b, a = signal.butter(order, Wn, "low", analog=False, fs=1 / dt)

# Create "template" arguments for type inference
u = 1.0
u_prev = np.zeros(len(b))
y_prev = np.zeros(len(a) - 1)
args = (u, b, a, u_prev, y_prev)

arc.codegen(iir_filter, args)

with open("iir_filter.h", "r") as f:
    c_code = f.read()

display_text(c_code)
```


```c

#ifndef IIR_FILTER_H
#define IIR_FILTER_H

#include "iir_filter_kernel.h"

#ifdef __cplusplus
extern "C" {
#endif

// Input arguments struct
typedef struct {
    float u;    
    float b[3];    
    float a[3];    
    float u_prev[3];    
    float y_prev[2];    
} iir_filter_arg_t;

// Output results struct
typedef struct {
    float u_hist[3];
    float y_hist[2];
} iir_filter_res_t;

// Workspace struct
typedef struct {
    long int iw[iir_filter_SZ_IW];
    float w[iir_filter_SZ_W];
} iir_filter_work_t;

// Runtime API
int iir_filter_init(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);
int iir_filter_step(iir_filter_arg_t* arg, iir_filter_res_t* res, iir_filter_work_t* work);


#ifdef __cplusplus
}
#endif

#endif // IIR_FILTER_H
```


This has an identical API to the first version and will automatically initialize the correct filter coefficients, meaning we can tune the filter and re-deploy without changing any of the application code!

### Platform Portability

Similar usage patterns can be applied in a range of contexts; since the generated code is self-contained it is highly portable (provided the compiler supports ANSI C).

For example, here's a "cookbook recipe" for running a generated function in a timed loop on an Arduino.


```python
with open("iir.ino", "r") as f:
    c_code = f.read()

display_text(c_code)
```


```c
#include <Arduino.h>
#include <TimerOne.h>
#include "iir_filter.h"

// Sampling rate: 100 Hz
const unsigned long SAMPLE_RATE_US = 10000;

// Declare the input, output, and workspace structures
iir_filter_arg_t arg;
iir_filter_res_t res;
iir_filter_work_t work;

int n = sizeof(arg.b) / sizeof(arg.b[0]) - 1;  // Filter order
float y;  // Output, to be fed to actuator, control algorithm, etc.

// Timer interrupt handler
void timerInterrupt() {
    arg.u = 1.0;  // Or read from sensor, other algorithm output, etc.
    iir_filter_step(&arg, &res, &work);  // Call IIR filter
    y = res.y_hist[0];  // Output, to be fed to actuator, control algorithm, etc.

    // Copy output arrays back to inputs
    for (int j = 0; j < n; j++) {
        arg.u_prev[j] = res.u_hist[j];
        arg.y_prev[j] = res.y_hist[j];
    }
    arg.u_prev[n] = res.u_hist[n];
}

void setup(){
    // Initialize the structs
    iir_filter_init(&arg, &res, &work);

    Serial.begin(9600);

    // Initialize Timer1 for interrupts at 100 Hz
    Timer1.initialize(SAMPLE_RATE_US);
    Timer1.attachInterrupt(timerInterrupt);
}

void loop() {
    // ...non-time-critical tasks
    delay(10);
}
```


Note that the filtering code is identical to the STM32 snippet; the only differences here are in peripheral configuration and other platform-specific HAL code.
The same auto-generated code can target everything from rapid prototyping platforms to production-grade embedded controllers to Unix clusters.

## Summary

In this final part of the hardware deployment tutorial, we covered the basic structure of the "interface" layer, or API, generated by Archimedes as a way to abstract application code from the low-level kernel implementation.

This Python-to-C code generation provides a convenient workflow for writing high-level logic in Python and then rapidly deploying to a range of C environments.

### Where to Go From Here

As mentioned, this code generation and hardware deployment workflow is an early proof of concept, but it is central to the Archimedes roadmap.

**If you're using Archimedes for hardware control and have questions, comments, or requests, please don't hesitate to reach out!**


```python
cleanup()
```