Maybe function calling using JSON blobs isn't even the optimal approach... I've seen some work recently on having LLMs write Python code that does what they want, instead of emitting JSON, and LLMs tend to be a lot better at Python without any additional function-calling training. Some of the functions exposed to the LLM can be calls into your own logic.
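To make the difference concrete, here's a rough sketch of both styles side by side. Everything in it is made up for illustration: `get_weather`, the `TOOLS` table, and the two "model outputs" are hard-coded stand-ins, not anything from a real library or a real model.

```python
import json

# Hypothetical tool backed by "your own logic" (canned data for the example).
def get_weather(city: str) -> str:
    fake_data = {"Paris": "rainy", "Cairo": "sunny"}
    return fake_data.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

# JSON-style calling: the model emits one blob per call, and the host
# parses it and dispatches to the matching function.
json_call = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

def run_json_call(blob: str) -> str:
    call = json.loads(blob)
    return TOOLS[call["name"]](**call["arguments"])

# Code-style calling: the model emits a Python snippet that can loop over
# inputs and combine several tool calls in a single turn.
generated_code = """
reports = {city: get_weather(city) for city in ["Paris", "Cairo"]}
result = ", ".join(f"{c}: {w}" for c, w in reports.items())
"""

def run_generated_code(code: str):
    # Expose only the tools and strip the builtins. This is NOT a real
    # sandbox; untrusted code needs proper isolation (separate process,
    # container, or a remote executor).
    namespace = {"__builtins__": {}, **TOOLS}
    exec(code, namespace)
    return namespace.get("result")

print(run_json_call(json_call))
print(run_generated_code(generated_code))
```

The JSON version would need two round trips (or a parallel-call convention) to cover both cities, while the code version does it in one snippet. The trade-off is that you're now executing model-written code, so where it runs matters a lot.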
Hugging Face has its own "smolagents" library that includes "CodeAgent", which operates on the same principle of generating and executing Python code for the purposes of function calling: https://huggingface.co/docs/smolagents/en/guided_tour
smolagents can use either a local LLM or a remote one, and it can run the generated code either locally or in a remote code execution environment, so it seems fairly flexible.
Some relevant links:
This shows how Python-calling performance is supposedly better than JSON-calling performance for a range of existing models: https://huggingface.co/blog/andthattoo/dpab-a#initial-result...
A little post about the concept: https://huggingface.co/blog/andthattoo/dria-agent-a