April 5, 2025

Using Lua in Embedded: Part 2 – Running on Real Hardware

In this follow-up to the original “Using Lua in Embedded,” the article shifts from theory to practice by running a Lua script on actual hardware — the STM32F746 Discovery board.

Lua, VFSes—what’s not to like?

By way of introduction

After a long wait, we’re finally back with the second part of the article. From now on, all articles will be hosted on my website instead of Medium—so stay tuned for new material!

If you haven’t had a chance to read the previous one, I strongly advise you to get familiar with it. It contains baseline information about what we’re trying to achieve. You can read it here.

Where did we stop last time?

In the last article, we managed to scratch the surface of topics like:

High-level concepts of embedding a Lua interpreter into an existing codebase
The benefits such an approach brings
How to structure the code around this idea—for example, dividing it into business logic and low-level, kernel-like modules

Now it’s time to get our hands dirty and start writing some code.

In this episode we will...

Prepare a template project based on the STM32F746 Discovery board to execute a simple Lua script directly from an SD card. To achieve that, we’ll cover topics such as:

POSIX compatibility and why it’s so useful in embedded projects
Incorporating Newlib's syscall layer
Virtual File System (VFS)
Using Meson as the build system

Let's get it started

Quick reminder: you can always find all the source code on the mp-coding GitHub.

Why does POSIX matter?

POSIX is a set of standardized APIs that ensures compatibility between systems. In other words, a system that implements POSIX allows programs to compile and run without changes to the source code—or at least with minimal modifications. For instance, most programs written with Linux in mind can be easily compiled natively on macOS and vice versa.

Embedded systems usually don’t provide POSIX compatibility. There are some exceptions, like the great NuttX, which I highly recommend, or the extra FreeRTOS compatibility layer called FreeRTOS+POSIX. Generally, though, we are on our own: support is nonexistent or very basic.

I’m sure you can see where I’m going with this. Even a subset of POSIX can bring a lot to the table. It simplifies many tasks, such as running unit/integration tests on the host and speeds up the development process (since we’re using a common, well-known API). Of course, the world of programming isn’t black and white; sometimes it’s not that easy. But at least we’re heading in the right direction, aren’t we?
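To make the host-testing point a bit more concrete, here is a minimal sketch of two helpers written purely against POSIX file calls (open, read, write, close). The names posix_write_file and posix_read_file are made up for this illustration and are not part of the project. The idea is that code like this can be exercised on a Linux or macOS host first and, assuming the target provides the same calls, reused there unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>   // open, O_* flags
#include <unistd.h>  // read, write, close, unlink

// Hypothetical helper: write `text` to `path` using only POSIX calls.
// Returns true on success.
bool posix_write_file(const char* path, const std::string& text)
{
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { return false; }

    std::size_t done = 0;
    while (done < text.size()) {
        // write() may accept fewer bytes than requested, so loop until done.
        const ssize_t n = write(fd, text.data() + done, text.size() - done);
        if (n <= 0) {
            close(fd);
            return false;
        }
        done += static_cast<std::size_t>(n);
    }
    return close(fd) == 0;
}

// Hypothetical helper: read the whole file at `path` using only POSIX calls.
// Returns an empty string if the file cannot be opened.
std::string posix_read_file(const char* path)
{
    std::string out;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) { return out; }

    char buf[64];
    ssize_t n = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fd);
    return out;
}
```

On the host, a unit test can simply write a file, read it back, and compare the result; nothing in the helpers knows or cares which filesystem sits underneath.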

Although we won’t be using pthread calls or process management calls (at least for now), what we do need is basic support for file-system-related calls. This is because the Lua interpreter assumes that the standard library file-related API is available on the system it will run on. We could modify the interpreter’s source code and adapt it to our embedded platform, but generally, that’s not a good idea. Instead, we’re going to provide the required API.

I’m talking about the standard library APIs (like fopen, fwrite, printf, and so on), which internally use a subset of system calls, such as open, close, and several additional ones. I can already hear the question forming: "But hey, how are we going to make this possible?"

The answer is...

Newlib syscalls layer

Newlib is the de facto standard C library used when cross-compiling with the ARM-GCC toolchain. Of course, there are other options like Redlib, but here, we’re going to focus solely on Newlib.

If you’ve been using the STM32 or NXP SDK, there’s a good chance you’ve already worked with syscalls. I bet you’ve seen code like this somewhere:

#include <errno.h>

/* There are no processes on bare metal, so signalling always fails. */
int _kill(int pid, int sig)
{
    errno = EINVAL;
    return -1;
}

void _exit(int status)
{
    _kill(status, -1);
    while (1) { } /* Make sure we hang here */
}

/* No child processes to wait for. */
int _wait(int* status)
{
    errno = ECHILD;
    return -1;
}

/* No filesystem yet, so there is nothing to delete. */
int _unlink(char* name)
{
    errno = ENOENT;
    return -1;
}

/* and so on... */

Some syscalls require the user to implement them, while others are defined by default and marked as weak, allowing them to be redefined.

That said, the minimal list of syscalls we need to implement in order to make Lua work is as follows:

_open – Opens a file
_close – Closes a file
_read – Reads data from a file
_write – Writes data to a file
_lseek – Moves the file read/write position
_fstat – Gets file status (size, type, etc.)
_stat – Similar to _fstat, but for file paths
_isatty – Checks if a file descriptor refers to a terminal
_unlink – Deletes a file
_link – Creates a new link to a file
_rename – Renames a file
_fsync – Flushes file changes to disk (if supported)
_mkdir – Creates a directory
_rmdir – Removes a directory
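To give a feel for what sits behind a few of the calls listed above, here is a rough sketch that backs open, close, read, write, and lseek with files kept in RAM. Everything in it is an assumption made for illustration: the sketch namespace, the sys_ prefix, and the tiny descriptor table are invented, and this is not how eVFS does it. On a real target the functions would carry the Newlib names (_open, _read, and so on) with C linkage; the parameter lists below mirror the stub signatures from the Newlib documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <string>
#include <vector>
#include <fcntl.h>  // O_CREAT, O_RDWR

namespace sketch {

struct RamFile { std::string path; std::vector<char> data; };
struct OpenFile { int file_index = -1; std::size_t pos = 0; }; // -1 marks a free slot

constexpr int kFirstFd = 3; // 0..2 stay reserved for stdin, stdout, stderr
constexpr int kMaxOpen = 4;

std::vector<RamFile> g_files; // the whole "disk" lives in RAM
OpenFile g_open[kMaxOpen];    // descriptor table

// Map a descriptor to its table entry, or set EBADF and return nullptr.
OpenFile* lookup(int fd)
{
    const int slot = fd - kFirstFd;
    if (slot < 0 || slot >= kMaxOpen || g_open[slot].file_index < 0) {
        errno = EBADF;
        return nullptr;
    }
    return &g_open[slot];
}

std::vector<char>& data_of(const OpenFile& of)
{
    return g_files[static_cast<std::size_t>(of.file_index)].data;
}

int sys_open(const char* path, int flags, int /*mode*/)
{
    int index = -1;
    for (std::size_t i = 0; i < g_files.size(); ++i) {
        if (g_files[i].path == path) { index = static_cast<int>(i); }
    }
    if (index < 0) {
        if ((flags & O_CREAT) == 0) { errno = ENOENT; return -1; }
        g_files.push_back(RamFile{path, {}});
        index = static_cast<int>(g_files.size()) - 1;
    }
    for (int slot = 0; slot < kMaxOpen; ++slot) {
        if (g_open[slot].file_index < 0) {
            g_open[slot].file_index = index;
            g_open[slot].pos = 0;
            return slot + kFirstFd;
        }
    }
    errno = EMFILE; // descriptor table is full
    return -1;
}

int sys_close(int fd)
{
    OpenFile* of = lookup(fd);
    if (of == nullptr) { return -1; }
    of->file_index = -1;
    return 0;
}

int sys_write(int fd, const char* buf, int len)
{
    OpenFile* of = lookup(fd);
    if (of == nullptr) { return -1; }
    if (len < 0) { errno = EINVAL; return -1; }
    std::vector<char>& data = data_of(*of);
    const std::size_t end = of->pos + static_cast<std::size_t>(len);
    if (data.size() < end) { data.resize(end); } // grow (zero-filled) as needed
    std::copy(buf, buf + len, data.begin() + static_cast<std::ptrdiff_t>(of->pos));
    of->pos = end;
    return len;
}

int sys_read(int fd, char* buf, int len)
{
    OpenFile* of = lookup(fd);
    if (of == nullptr) { return -1; }
    if (len < 0) { errno = EINVAL; return -1; }
    const std::vector<char>& data = data_of(*of);
    const std::size_t left = of->pos < data.size() ? data.size() - of->pos : 0;
    const std::size_t n = std::min(left, static_cast<std::size_t>(len));
    if (n == 0) { return 0; } // end of file
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(of->pos);
    std::copy(first, first + static_cast<std::ptrdiff_t>(n), buf);
    of->pos += n;
    return static_cast<int>(n);
}

int sys_lseek(int fd, int offset, int whence)
{
    OpenFile* of = lookup(fd);
    if (of == nullptr) { return -1; }
    long base = 0;
    if (whence == SEEK_CUR) { base = static_cast<long>(of->pos); }
    else if (whence == SEEK_END) { base = static_cast<long>(data_of(*of).size()); }
    else if (whence != SEEK_SET) { errno = EINVAL; return -1; }
    const long target = base + offset;
    if (target < 0) { errno = EINVAL; return -1; }
    of->pos = static_cast<std::size_t>(target);
    return static_cast<int>(target);
}

} // namespace sketch
```

Once functions like these exist under the names Newlib expects, fopen, fread, and friends start working on top of them without any change to the calling code, which is what the Lua interpreter relies on.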


We could take the easy route by choosing our filesystem library of choice (such as FatFS or any other) and using it to directly implement the syscalls. However, we’re going to take a slightly more advanced approach, which will pay off later.

Virtual File System - What's it all about?

A Virtual File System (VFS) is an abstraction layer over various file systems. It provides users with uniform access (via an API) to any type of filesystem without the need to worry about unnecessary technical nuances. On Unix-based systems, it often also includes pseudo-filesystems. This is quite an interesting topic on its own, but for now, we won’t dive into it.

As a reminder: our goal is to have the flexibility to support FAT, ext4, or any other filesystem that might be present on the SD card, so the Lua interpreter can easily fetch instructions from script files.
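The core idea of a VFS can be sketched in a few dozen lines: a table of mount points, each owning a filesystem driver, plus a lookup that routes a path to the driver with the longest matching mount point. The FileSystem, FakeFs, and MiniVfs types below are hypothetical and only illustrate the concept; they are not the eVFS API, and the fake driver just echoes what it was asked for instead of touching any storage.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Interface every filesystem driver would implement (hypothetical).
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual std::string type() const = 0;
    // `relative_path` is the path with the mount point stripped off.
    virtual std::string read_all(const std::string& relative_path) = 0;
};

// Stand-in driver: returns "<type>:<path>" instead of real file content.
class FakeFs : public FileSystem {
public:
    explicit FakeFs(std::string type) : type_(std::move(type)) {}
    std::string type() const override { return type_; }
    std::string read_all(const std::string& relative_path) override
    {
        return type_ + ":" + relative_path;
    }
private:
    std::string type_;
};

class MiniVfs {
public:
    // Mount points are expected without a trailing '/'; a root mount ("/")
    // is not handled in this sketch.
    void mount(const std::string& mount_point, std::unique_ptr<FileSystem> fs)
    {
        mounts_[mount_point] = std::move(fs);
    }

    // Longest-prefix match on whole path components.
    // On success, `relative` receives the path inside the filesystem.
    FileSystem* resolve(const std::string& path, std::string& relative) const
    {
        FileSystem* best = nullptr;
        std::size_t best_len = 0;
        for (const auto& entry : mounts_) {
            const std::string& mp = entry.first;
            if (path.compare(0, mp.size(), mp) != 0) { continue; }
            // "/mnt/vol0" must not match "/mnt/vol00/...".
            if (path.size() != mp.size() && path[mp.size()] != '/') { continue; }
            if (best == nullptr || mp.size() > best_len) {
                best = entry.second.get();
                best_len = mp.size();
            }
        }
        if (best != nullptr) {
            relative = path.substr(best_len);
            if (relative.empty()) { relative = "/"; }
        }
        return best;
    }

    // Uniform entry point: the caller never sees which driver is used.
    std::string read_all(const std::string& path) const
    {
        std::string relative;
        FileSystem* fs = resolve(path, relative);
        return fs != nullptr ? fs->read_all(relative) : std::string();
    }

private:
    std::map<std::string, std::unique_ptr<FileSystem>> mounts_;
};
```

With an "ext4" driver mounted at /mnt/vol0 and a "fat" driver at /mnt/vol1, a request for /mnt/vol0/main.lua ends up at the first driver as /main.lua, and the caller uses the same read_all call for both volumes.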

Here, you can find the eVFS library that we’ll use to achieve our goals. In short, the library:


Provides uniform access to EXT3/4 filesystems
Can be easily extended to support other filesystems like FAT, LittleFS, etc.
Already implements all required filesystem-related Newlib syscalls
Exports a block device interface so users can easily integrate it with custom drivers
Enables stdout, stdin, and stderr, so users can simply use printf, fgetc, etc.
Supports some POSIX-related calls like mount, umount
Supports the C++ filesystem API, such as std::filesystem::directory_iterator, std::filesystem::remove, and so on
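To illustrate what a block device interface usually boils down to, here is a hedged sketch: a small abstract class with sector-based read and write, and a RAM-backed implementation that could stand in for a real SD card driver in host tests. BlockDevice and RamBlockDevice are names invented for this example; the actual interface exported by eVFS may look different, so treat this as the general shape rather than its API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical block device interface: storage addressed in whole sectors.
class BlockDevice {
public:
    virtual ~BlockDevice() = default;
    virtual std::size_t sector_size() const = 0;
    virtual std::size_t sector_count() const = 0;
    // Both calls return 0 on success and -1 when the range is out of bounds.
    virtual int read(std::uint64_t lba, std::size_t count, std::uint8_t* out) = 0;
    virtual int write(std::uint64_t lba, std::size_t count, const std::uint8_t* in) = 0;
};

// RAM-backed implementation, handy as a test double for an SD card driver.
class RamBlockDevice : public BlockDevice {
public:
    RamBlockDevice(std::size_t sector_size, std::size_t sector_count)
        : sector_size_(sector_size), storage_(sector_size * sector_count, 0)
    {
    }

    std::size_t sector_size() const override { return sector_size_; }
    std::size_t sector_count() const override { return storage_.size() / sector_size_; }

    int read(std::uint64_t lba, std::size_t count, std::uint8_t* out) override
    {
        if (!in_range(lba, count)) { return -1; }
        std::memcpy(out, storage_.data() + lba * sector_size_, count * sector_size_);
        return 0;
    }

    int write(std::uint64_t lba, std::size_t count, const std::uint8_t* in) override
    {
        if (!in_range(lba, count)) { return -1; }
        std::memcpy(storage_.data() + lba * sector_size_, in, count * sector_size_);
        return 0;
    }

private:
    // True when sectors [lba, lba + count) all exist on the device.
    bool in_range(std::uint64_t lba, std::size_t count) const
    {
        return lba <= sector_count() && count <= sector_count() - lba;
    }

    std::size_t sector_size_;
    std::vector<std::uint8_t> storage_;
};
```

A filesystem layer written against such an interface does not need to know whether the sectors come from an SD card, external flash, or a plain buffer in RAM.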


Enough Talk, Let’s See Some Code!

The EmbedLab example project is based on the STM32F746 Discovery board, which had been gathering dust in my closet for many years. I chose it because it has all the required peripherals and connectors and offers a substantial amount of RAM, SDRAM, and flash memory, all in a relatively compact form.

While there is no support for other boards just yet, we might explore this in future episodes. For now, we’ll stick with what we’ve got. 🙌

You can find the EmbedLab source code here. This project will serve as the foundation for this episode and future ones. For this episode, we'll be using branch lua_e01.

All the required packages, software, and tools are listed in the README. It also contains instructions on how to compile and flash the software onto the board. To successfully run the examples, you’ll need to provide an SD card that’s formatted correctly and contains the required data. The README also provides detailed steps on how to do this.

If you’re using macOS like me, you’ll need extFS. Unfortunately, it’s paid software, but I highly recommend it for seamless integration with EXT filesystems. In future episodes, I might also provide support for FAT.

For the build system, I chose Meson over the popular CMake. I find Meson a bit less convoluted, especially when it comes to handling variables. Both systems offer a similar approach to structuring projects, assuming you’re using modern CMake.

As you probably know, embedded systems require extra code to configure necessary peripherals and memory to boot into the main application. Here, we’ve taken extra steps to bring up additional features and quality-of-life improvements.

EmbedLab Makes Use Of:

FreeRTOS
Dynamic memory allocation
Exceptions enabled (experimental)
STM32Cube package to ease and speed up working with peripherals

There’s a common belief that embedded systems shouldn’t use dynamic memory allocation, or at least that it should only happen once at the beginning of the program’s lifecycle. There’s some truth to this, especially in safety-critical systems such as avionics or automotive (engine controllers, ABS, etc.).

However, many systems don’t require such restrictions. Luckily, our case falls into the latter category. 🙂

I chose TLSF as the memory allocator backbone, but the heap4/5 allocators provided by FreeRTOS out-of-the-box are also highly reliable. We initialize the memory heap very early, before calling static constructors, to ensure we have access to it right from the start.

/* Call the clock system initialization function.*/
  bl  SystemInit
/* Initialize dynamic memory */
  bl  sysheap_init
/* Call static constructors */
  bl __libc_init_array
/* Call the application's entry point.*/
  bl  main
  bx  lr
.size  Reset_Handler, .-Reset_Handler
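Why the order in the startup code matters can be shown with a toy model. The snippet below is a simplified sketch, not the project's TLSF setup: heap_init is a stand-in for sysheap_init, heap_alloc is a trivial bump allocator over a static pool, and Logger is a made-up type whose constructor allocates, just like many global C++ objects do. All of these names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_heap {

alignas(std::max_align_t) unsigned char g_pool[1024]; // memory handed to the allocator
std::size_t g_used = 0;
bool g_ready = false;

// Stand-in for sysheap_init: must run before anything allocates.
void heap_init()
{
    g_used = 0;
    g_ready = true;
}

// Trivial bump allocator: returns nullptr before heap_init() was called
// or when the pool is exhausted. There is no free() in this sketch.
void* heap_alloc(std::size_t size)
{
    constexpr std::size_t align = alignof(std::max_align_t);
    if (!g_ready || size > sizeof g_pool) { return nullptr; }
    const std::size_t rounded = (size + align - 1) / align * align;
    if (rounded > sizeof g_pool - g_used) { return nullptr; }
    void* p = g_pool + g_used;
    g_used += rounded;
    return p;
}

// A type whose constructor needs the heap. If an object like this were a
// global, its constructor would run from __libc_init_array, before main().
struct Logger {
    void* buffer;
    Logger() : buffer(heap_alloc(64)) {}
};

} // namespace sketch_heap
```

Constructing a Logger before heap_init() leaves it with a null buffer, while constructing it afterwards succeeds. That is the situation the startup sequence avoids by calling sysheap_init ahead of __libc_init_array.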


A lot has already been said about FreeRTOS — after all, it’s a very popular, minimalistic real-time kernel for embedded systems. There’s no need to add anything extra here. :)

We’re enabling exception support just to check how they fit into embedded development, i.e., what impact they have on the binary size. Additionally, exceptions are the only way to properly "return" errors from constructors. They also greatly simplify error handling when dealing with heavily nested function calls. I’m sure we’ll dive deeper into this in future episodes.
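Both points can be seen in a small sketch. Sensor is a hypothetical class invented for this example: its constructor either fully succeeds or throws, so a half-initialized object can never exist. The read_level functions mimic nested calls that don't need any error-handling code of their own, because the exception travels straight to the single catch site in try_read.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical peripheral wrapper: construction either succeeds or throws.
class Sensor {
public:
    explicit Sensor(int bus_id)
    {
        if (bus_id < 0) { throw std::invalid_argument("Sensor: invalid bus id"); }
        bus_id_ = bus_id;
    }
    int bus_id() const { return bus_id_; }

private:
    int bus_id_ = 0;
};

// Nested calls with no error plumbing: failures simply propagate upwards.
int read_level3(int bus_id)
{
    Sensor sensor(bus_id); // may throw
    return sensor.bus_id() * 10;
}
int read_level2(int bus_id) { return read_level3(bus_id) + 1; }
int read_level1(int bus_id) { return read_level2(bus_id) + 1; }

// Single catch site at the top of the call chain.
std::string try_read(int bus_id)
{
    try {
        return "ok:" + std::to_string(read_level1(bus_id));
    } catch (const std::exception& e) {
        return std::string("error:") + e.what();
    }
}
```

Without exceptions, the same design would typically need a two-phase init (an empty constructor plus an init() returning an error code) and a status check at every level of the call chain.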

I’m not a big fan of the STM32Cube package. Its quality is, to put it mildly, debatable. However, in this case, we don’t want to waste time writing proper peripheral code. We want to bring up the board as quickly as possible and focus on other important elements. The STM32Cube package is ideal for this purpose.

Main Application Entry Point

Finally, let’s walk through the contents of the main.cpp file. We begin by defining the implementation of the standard stream interface. This will enable us to use the stdin, stdout, and stderr streams:

namespace {
    std::unique_ptr<vfs::VirtualFS> m_vfs;

    class DefaultStdStream : public vfs::StdStream {
    public:
        ~DefaultStdStream() override { syscalls::stdout_deinit(); }
        vfs::result<std::size_t> in(std::span<char>) override
        {
            /// TODO: Not currently supported
            return vfs::error(ENOTSUP);
        }
        vfs::result<std::size_t> out(std::span<const char> data) override
        {
            if (not syscalls::stdout_write(data)) { return data.size(); }
            return vfs::error(EIO);
        }
        vfs::result<std::size_t> err(std::span<const char> data) override
        {
            if (not syscalls::stdout_write(data)) { return data.size(); }
            return vfs::error(EIO);
        }
    };

} // namespace


Followed by the main application block:

1[[noreturn]] void main_task(void*)
2{
3    vfs::logger::register_output_callback([](const auto lvl, const auto data) { printf("<%s> %s\n", vfs::logger::internal::level2str(lvl), data); });
4
5    auto blockdev  = SDCardBlockdev();
6    auto disk_mngr = vfs::DiskManager();
7
8    const auto disk = disk_mngr.register_device(blockdev);
9    assert(disk);
10
11    const auto partition_name = (*disk)->borrow_partition(0)->get_name().c_str();
12
13    m_vfs = std::make_unique<vfs::VirtualFS>(disk_mngr, std::make_unique<DefaultStdStream>());
14    assert(m_vfs->register_filesystem(vfs::fstype::ext4).value() == 0);
15
16    assert(mount(partition_name, "/mnt/vol0", "ext4", 0, nullptr) == 0);
17
18    const auto lua_state = luaL_newstate();
19    assert(lua_state != nullptr);
20
21    luaopen_base(lua_state);
22
23    const auto lua_entry_point = "/mnt/vol0/main.lua";
24    if (luaL_dofile(lua_state, lua_entry_point) != LUA_OK) {
25        printf("Error running Lua main entry point: %s\n", lua_tostring(lua_state, -1));
26        lua_pop(lua_state, 1);
27    }
28
29    lua_close(lua_state);
30
31    assert(umount("/mnt/vol0") == 0);
32
33    std::uint32_t counter {};
34    while (true) {
35        board::user_led_toggle();
36        printf("Led blink: %" PRIu32 "\n", counter++);
37        vTaskDelay(1000);
38    }
39}


The good thing about this code is that it could be compiled and run on a host with minimal changes. The only things preventing this are FreeRTOS-related calls like xTaskCreate, vTaskStartScheduler, and vTaskDelay. For now, we’ll leave it as is, but we will definitely revisit this code later and improve it with the help of an OSAL.
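As a taste of what an OSAL (operating system abstraction layer) could look like, here is a minimal sketch of a delay function that maps to vTaskDelay on the target and to std::this_thread::sleep_for on the host. The osal namespace, the OSAL_USE_FREERTOS macro, and blink_n_times are all assumptions made for this example; the project may end up with a different design.

```cpp
#include <cassert>
#include <cstdint>

#if defined(OSAL_USE_FREERTOS) // hypothetical macro, defined only for target builds
#include "FreeRTOS.h"
#include "task.h"
#else
#include <chrono>
#include <thread>
#endif

namespace osal {

// Block the calling task/thread for at least `ms` milliseconds.
inline void delay_ms(std::uint32_t ms)
{
#if defined(OSAL_USE_FREERTOS)
    vTaskDelay(pdMS_TO_TICKS(ms)); // convert milliseconds to RTOS ticks
#else
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
#endif
}

} // namespace osal

// Platform-independent logic: toggles `n` times with `period_ms` in between
// and returns how many toggles were performed. The `toggle` callback hides
// the board-specific LED access.
std::uint32_t blink_n_times(std::uint32_t n, std::uint32_t period_ms, void (*toggle)())
{
    std::uint32_t counter = 0;
    for (; counter < n; ++counter) {
        toggle();
        osal::delay_ms(period_ms);
    }
    return counter;
}
```

With the OS-specific call hidden behind osal::delay_ms, the blinking logic itself compiles on both the host and the target, and on the host it can be driven by a fake toggle function in a unit test.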

Lines 3–16 focus on initializing the filesystem on the SD card. Please note the use of the POSIX-style mount API.

Starting from line 18, we create a new Lua context, initialize it, and then, by calling luaL_dofile, we execute a simple "Hello World" Lua script.

Finally, we unmount the filesystem and enter an infinite loop where we toggle the user LED once per second and print some debug data to the serial port. (The STM32F746 Discovery board has UART1 exposed via the debugger USB.)

You should see the following output in your local serial terminal:

<INFO> Disk 'sdcard0p0' of type 'ext4' mounted successfully to '/mnt/vol0'
Hello World from Lua!
Led blink: 0
Led blink: 1
Led blink: 2
Led blink: 3
...


Size does matter, doesn't it?

Building with --optimization=g resulted in the following memory usage:

Memory region    Used Size    Region Size  %age Used     
RAM:             144816 B     320 KB       44.19%     
FLASH:           267268 B       1 MB       25.49%
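The percentage column is simply used bytes divided by region size. A quick sketch to double-check the numbers, assuming the linker's "KB" and "MB" mean 1024 and 1024 × 1024 bytes (percent_used and the constants are names made up for this example):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Percentage of a memory region in use, like the "%age Used" column.
constexpr double percent_used(std::uint64_t used_bytes, std::uint64_t region_bytes)
{
    return 100.0 * static_cast<double>(used_bytes) / static_cast<double>(region_bytes);
}

constexpr std::uint64_t kKiB         = 1024;
constexpr std::uint64_t kRamRegion   = 320 * kKiB;  // "320 KB"
constexpr std::uint64_t kFlashRegion = 1024 * kKiB; // "1 MB"

// Values taken from the table above.
constexpr double kRamPercent   = percent_used(144816, kRamRegion);
constexpr double kFlashPercent = percent_used(267268, kFlashRegion);
```

Rounded to two decimal places, these come out at 44.19 and 25.49, matching the linker output.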


Calling meson compile -C build size prints a nicely formatted extended size report. From it, we can see that Lua (from lua-5.4.6/src) took about 72.2 KiB, or 29.96% of the FLASH usage in that report.

Not bad, I'd say.

Afterword

We successfully ran a simple Lua script on real hardware. Thanks to the eVFS and syscalls integration, we were able to load it directly from a file stored on the SD card. That wasn’t too difficult, was it?

This experiment demonstrates that Lua is suitable for embedded systems, offering additional features and flexibility while maintaining a reasonable compiled binary size. I wouldn’t recommend it for extremely small microcontrollers, but if you have some memory to spare, it’s definitely worth considering.

In the next episode, we’ll explore ways to write custom Lua drivers (or even more than one). With these, we’ll be able to control some of the board's peripherals directly from a Lua script. Stay tuned!
