If you haven't tried the research -> plan -> implementation approach here, you are missing out on how good LLMs are. It completely changed my perspective.
the key part was really just explicitly thinking about different levels of abstraction at different stages of vibecoding. I was doing it before, but not explicitly in discrete steps, and that was where I got into messes. The prior approach made checkpointing / reverting very difficult.
When I think of everything in phases, I do similar stuff w/ my git commits at "phase" level, which makes design decisions easier to make.
I also spend ~4-5 hours cleaning up the code at the very end once everything works. But it's still way faster than writing hard features myself.
tbh I think the thing that's making this new approach so hard to adopt for many people is the word "vibecoding"
Like yes, vibecoding in the Lovable-esque "give me an app that does XYZ" manner is obviously ridiculous and wrong, and will result in slop. Building any serious app based on "vibes" is stupid.
But if you're doing this right, you are not "coding" in any traditional sense of the word, and you are *definitely* not relying on vibes
I'm sticking to the original definition of "vibe coding", which is AI-generated code that you don't review.
If you're properly reviewing the code, you're programming.
The challenge is finding a good term for code that's responsibly written with AI assistance. I've been calling it "AI-assisted programming" but that's WAY too long.
> but not explicitly in discrete steps and that was where i got into messes.
I've said this repeatedly: I mostly use it for boilerplate code, or when I'm having a brain fart of sorts. I still love to solve things for myself, but AI can take me from "I know I want x, y, z" to "oh look, I got to x, y, z" in under 30 minutes, when it could have taken hours. For side projects this is fine.
I think if you do it piecemeal it should almost always be fine. When you try to tell it to do too much, you and the model both don't consider edge cases (ask it for those too!) and are more prone to a rude awakening eventually.
Have you tried techniques that don't require modifying the LLM or the sampling strategy for structured outputs? For example, schema-aligned parsing, where you build error tolerance into the parser instead of coercing to a grammar.
It looks really slick. For us, the reason we haven't adopted it yet is that it brings more tooling and configuration that overlaps with our existing system for prompt templates, schema definitions, etc. In the component where we couldn't rely on OpenAI structured outputs, we experimented with TOML-formatted output, and that ended up being reliable enough to solve the problem across many models without any new dependencies. I do think we'll revisit at some point, as Boundary also provides incremental parsing of streaming outputs and may allow some cost optimization that isn't easy right now.
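In case it's useful, here's a minimal sketch of the TOML fallback idea (simplified and hypothetical: the prompt just asks for TOML instead of JSON, and the parser tolerates an optional markdown fence):

```python
import re
import tomllib  # Python 3.11+; use the third-party "tomli" package on older versions

def parse_model_output(raw: str) -> dict:
    """Parse a TOML answer, tolerating an optional markdown fence around it."""
    match = re.search(r"```(?:toml)?\s*(.*?)```", raw, re.DOTALL)
    body = match.group(1) if match else raw
    return tomllib.loads(body)

raw = '```toml\nname = "Grave Digger"\n\n[[ingredients]]\ntext = "coffee liqueur"\n```'
print(parse_model_output(raw))
# {'name': 'Grave Digger', 'ingredients': [{'text': 'coffee liqueur'}]}
```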
It's a bit more nuanced than applicative lifting. Part of SAP is that, but there's also supporting strings that don't have quotation marks, supporting recursive types, supporting unescaped quotes like `"hi i wanted to say "hi""`, supporting markdown blocks inside of things that look like "json", etc.
but applicative lifting is a big part of it as well!
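As a toy illustration of the parser-side error tolerance (not our actual implementation, just the shape of the idea in Python):

```python
import json
import re

def tolerant_json(raw: str) -> dict:
    """Toy error-tolerant parse: strip a markdown fence and trailing commas,
    then load. (Illustrative only; SAP handles far more cases than this.)"""
    body = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    body = re.sub(r",\s*([}\]])", r"\1", body)  # drop trailing commas
    return json.loads(body)

print(tolerant_json('```json\n{"Name": "Grave Digger", "Garnishes": ["whipped cream",],}\n```'))
# {'Name': 'Grave Digger', 'Garnishes': ['whipped cream']}
```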
Client: Ollama (phi4) - 90164ms. StopReason: stop. Tokens(in/out): 365/396
---PROMPT---
user: Extract from this content:
Grave Digger:
Ingredients
- 1 1/2 ounces vanilla-infused brandy*
- 3/4 ounce coffee liqueur
- 1/2 ounce Grand Marnier
- 1 ounce espresso, freshly brewed
- Garnish: whipped cream
- Garnish: oreo cookies, crushed
Steps
1. Add all ingredients into a shaker with ice and shake until
well-chilled.
2. Strain into a coupe.
3. Top with whipped cream and crushed Oreo cookies (discarding cream in
center).
*Vanilla-infused brandy: Cut 2 fresh vanilla pods lengthwise and place
into a 750 mL bottle of brandy. Let sit for 3 to 5 days, shaking
occasionally. Strain out pods and store.
IngredientType
----
- Alcohol
- Sweetener
- Sour
- Aromatic
- Bittering_agent
- Food
- Dilution
LiquorType
----
- Gin
- Vodka
- Rum
- Whiskey
- Tequila
- Mezcal
- Brandy
- Cognac
- Liqueur
- Wine
- Absinthe
Answer in JSON using this schema:
{
  Name: string,
  Ingredients: [
    {
      Text: string,
      Type: IngredientType,
      Liquor_type: LiquorType or null,
      Name_brand: string or null,
      Unit_of_measure: string,
      Measurement_or_unit_count: string,
    }
  ],
  Steps: [
    {
      Number: int,
      Instruction: string,
    }
  ],
  Garnishes: string[],
}
---LLM REPLY---
```json
{
  "Name": "Grave Digger",
  "Ingredients": [
    {
      "Text": "vanilla-infused brandy*",
      "Type": "Alcohol",
      "Liquor_type": "Brandy",
      "Name_brand": null,
      "Unit_of_measure": "ounces",
      "Measurement_or_unit_count": "1 1/2"
    },
    {
      "Text": "coffee liqueur",
      "Type": "Liqueur",
      "Liquor_type": "Liqueur",
      "Name_brand": null,
      "Unit_of_measure": "ounce",
      "Measurement_or_unit_count": "3/4"
    },
    {
      "Text": "Grand Marnier",
      "Type": "Liqueur",
      "Liquor_type": "Liqueur",
      "Name_brand": "Grand Marnier",
      "Unit_of_measure": "ounce",
      "Measurement_or_unit_count": "1/2"
    },
    {
      "Text": "espresso, freshly brewed",
      "Type": "Bittering_agent",
      "Liquor_type": null,
      "Name_brand": null,
      "Unit_of_measure": "ounces",
      "Measurement_or_unit_count": "1"
    }
  ],
  "Steps": [
    {
      "Number": 1,
      "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
    },
    {
      "Number": 2,
      "Instruction": "Strain into a coupe."
    },
    {
      "Number": 3,
      "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
    }
  ],
  "Garnishes": [
    "whipped cream",
    "oreo cookies, crushed"
  ]
}
```
---Parsed Response (class Recipe)---
{
  "Name": "Grave Digger",
  "Ingredients": [
    {
      "Text": "vanilla-infused brandy*",
      "Type": "Alcohol",
      "Liquor_type": "Brandy",
      "Name_brand": null,
      "Unit_of_measure": "ounces",
      "Measurement_or_unit_count": "1 1/2"
    },
    {
      "Text": "espresso, freshly brewed",
      "Type": "Bittering_agent",
      "Liquor_type": null,
      "Name_brand": null,
      "Unit_of_measure": "ounces",
      "Measurement_or_unit_count": "1"
    }
  ],
  "Steps": [
    {
      "Number": 1,
      "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
    },
    {
      "Number": 2,
      "Instruction": "Strain into a coupe."
    },
    {
      "Number": 3,
      "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
    }
  ],
  "Garnishes": [
    "whipped cream",
    "oreo cookies, crushed"
  ]
}
Processed Recipe: {
  Name: 'Grave Digger',
  Ingredients: [
    {
      Text: 'vanilla-infused brandy*',
      Type: 'Alcohol',
      Liquor_type: 'Brandy',
      Name_brand: null,
      Unit_of_measure: 'ounces',
      Measurement_or_unit_count: '1 1/2'
    },
    {
      Text: 'espresso, freshly brewed',
      Type: 'Bittering_agent',
      Liquor_type: null,
      Name_brand: null,
      Unit_of_measure: 'ounces',
      Measurement_or_unit_count: '1'
    }
  ],
  Steps: [
    {
      Number: 1,
      Instruction: 'Add all ingredients into a shaker with ice and shake until well-chilled.'
    },
    { Number: 2, Instruction: 'Strain into a coupe.' },
    {
      Number: 3,
      Instruction: 'Top with whipped cream and crushed Oreo cookies (discarding cream in center).'
    }
  ],
  Garnishes: [ 'whipped cream', 'oreo cookies, crushed' ]
}
So, yeah, the main issue is that it dropped some ingredients that were present in the original LLM reply. Separately, the original LLM reply misclassified the `Type` field of `coffee liqueur`, which should have been `Alcohol`.
but you expected a
{
  Text: string,
  Type: IngredientType,
  Liquor_type: LiquorType or null,
  Name_brand: string or null,
  Unit_of_measure: string,
  Measurement_or_unit_count: string,
}
there's no way to cast `Liqueur` -> `IngredientType`. But since the data model is an `Ingredient[]`, we attempted to give you as many ingredients as possible.
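To make the mismatch concrete with plain Python enums (illustration only, not our internals):

```python
from enum import Enum

class IngredientType(Enum):
    ALCOHOL = "Alcohol"
    SWEETENER = "Sweetener"
    SOUR = "Sour"
    AROMATIC = "Aromatic"
    BITTERING_AGENT = "Bittering_agent"
    FOOD = "Food"
    DILUTION = "Dilution"

# "Liqueur" is a LiquorType value, not an IngredientType value, so coercion fails:
IngredientType("Liqueur")  # raises ValueError
# Rather than failing the whole Ingredient[] list, the parser keeps the items
# that do validate, which is why those two ingredients were dropped.
```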
The model itself being wrong isn't something we can do much about. That depends on two things: the capabilities of the model, and the prompt you pass in.
If you wanted to capture all of the items with more rigor, you could write it this way:
class Recipe {
  name string
  ingredients Ingredient[]
  num_ingredients int
  ...

  // add a constraint on the type
  @@assert(counts_match, {{ this.ingredients|length == this.num_ingredients }})
}
And then if you want to be very wild, put this in your prompt:
Then in your code you can easily pass that to the "calculator" function and get the result, then hand the result back to the model, making it feel like the model can "call" an external function.
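A rough sketch of that loop in Python (hypothetical names: `call_llm`, the "calculate" action, and the prompt wording are all made up for illustration):

```python
import ast
import operator

def calculator(expression: str) -> float:
    """Tiny, safe arithmetic evaluator that the model's request gets routed to."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)

# Hypothetical loop: if the parsed reply is a "calculate" request rather than a
# final answer, run the tool and hand the result back to the model.
def run_with_calculator(call_llm, user_message: str) -> str:
    reply = call_llm(user_message)  # assumed to return a parsed dict
    while reply.get("action") == "calculate":
        result = calculator(reply["expression"])
        reply = call_llm(f"The calculator returned {result}. Continue.")
    return reply["answer"]
```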
I 100% agree. People get so caught up on trying to do everything 90% right with AI, but they forget there's a reason most websites offer at least 2 9's of uptime.
I'm not really sure what your stance is here, because you say you agree with the GP but then throw out some figures that clearly disagree with the author's point (99% uptime is vastly greater than 90% accuracy).
We have some preliminary data with llama3.1 and we find that the smaller model gets to around 70% with BAML (+20% from base), but we'll update this dashboard with llama3.1 by end of week!
# Assumed imports for the generated BAML Python client; exact module paths may
# differ depending on your generator configuration.
from baml_client.async_client import b
from baml_client.type_builder import TypeBuilder

tb = TypeBuilder()
tb.Person.add_property("last_name", tb.string().list())
tb.Person.add_property("height", tb.float().optional()).description(
    "Height in meters"
)
tb.Hobby.add_value("chess")
for name, val in tb.Hobby.list_values():
    val.alias(name.lower())
tb.Person.add_property("hobbies", tb.Hobby.type().list()).description(
    "Some suggested hobbies they might be good at"
)

# (inside an async function)
# no_tb_res = await b.ExtractPeople("My name is Harrison. My hair is black and I'm 6 feet tall.")
tb_res = await b.ExtractPeople(
    "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop.",
    {"tb": tb},
)

assert len(tb_res) > 0, "Expected non-empty result but got empty."

for r in tb_res:
    print(r.model_dump())
Neat, thanks! I'm still pondering whether I should be using this, since most of the retries I have to do are because of the LLM itself not understanding the schema asked for (e.g. output with missing fields, or using a value not present in `Literal[]`), with certain models being especially bad with deeply nested schemas and outputting gibberish. Anything on your end that can help with that?
Or if you're open to sharing your prompt / data model, I can send over my best guess at a good prompt! We've found these models work decently well even with 50+ fields, nesting, and whatnot.
I might share it with you later on your discord server.
> I can send over my best guess of a good prompt!
Now, if you could automate the above process by "fitting" a first draft prompt to a wanted schema, i.e. where your library makes a few adjustments, if some assertions do not pass, by having a chat of its own with the LLM, that would be super useful! Heck, I might just implement it myself.
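Something like this, maybe (all names hypothetical, with `call_llm` standing in for whatever client you use):

```python
def fit_prompt(call_llm, draft_prompt: str, assertions, max_rounds: int = 3) -> str:
    """Iteratively revise a prompt until its output passes all assertions.

    `assertions` is a list of (check_fn, failure_message) pairs.
    """
    prompt = draft_prompt
    for _ in range(max_rounds):
        output = call_llm(prompt)
        failures = [msg for check, msg in assertions if not check(output)]
        if not failures:
            return prompt
        # Let the LLM revise its own prompt based on the failed assertions.
        prompt = call_llm(
            "Rewrite the prompt below so the model's output fixes these problems:\n"
            + "\n".join(failures)
            + "\n\nPrompt:\n" + prompt
        )
    return prompt
```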
[Another BAML creator here] I agree this is an interesting direction! We have a "chat" feature on our roadmap to do this right in the VSCode playground, where an AI agent will have context on your prompt, schema, BAML test results, etc., and help you iterate on the prompt automatically. We've done this before and have been surprised by how good the LLM feedback can be.
We just need a bit better testing flow within BAML since we do not support adding assertions just yet.