
Really interesting direction. The node-based canvas feels like a more scalable abstraction for video automation than the usual chat-only interface. I’m curious how you’re handling long-form content where temporal context matters (e.g., emotional shifts, pacing, narrative cues).

Multimodal models are good at frame-level recognition, but editing requires understanding relationships between scenes. Have you found any methods that work reliably there?



Side note, for context, since the responses to the OP seem to be primarily from video hobbyists:

Node-based workflows are typical in NLE software; see the Fusion (compositing) and Color (grading) pages in DaVinci Resolve. Industry folks will take to this node-based canvas with ease.

Great question @danishSuri1994


hey, thanks for the comment!

we've actually found that multimodal models are surprisingly good at maintaining temporal context as well

that being said, we also do a bunch of additional processing with more traditional CV / audio analysis to extract this information (both frame-level and temporal) as part of our video understanding

for example, mean-motion analysis shows how subjects move over time, which helps determine where the important moments are happening in the video and ultimately leads to better placement of edits.
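
to make that concrete, here's a minimal sketch of what a frame-differencing flavor of mean-motion analysis could look like with OpenCV. this is my own illustration, not their actual pipeline: the function names, smoothing window, and threshold are all assumptions on my part.

    import cv2
    import numpy as np

    def mean_motion_curve(video_path):
        """per-frame mean absolute pixel difference: a crude proxy for motion"""
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        if not ok:
            return []
        prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        curve = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # mean absolute difference between consecutive frames
            # rises when subjects (or the camera) move
            curve.append(float(np.mean(cv2.absdiff(gray, prev))))
            prev = gray
        cap.release()
        return curve

    def motion_peaks(curve, window=30):
        """frames where smoothed motion exceeds mean + 1 std:
        candidate moments for placing edits (window and threshold
        are illustrative, not tuned values)"""
        smoothed = np.convolve(curve, np.ones(window) / window, mode="same")
        threshold = smoothed.mean() + smoothed.std()
        return [i for i, v in enumerate(smoothed) if v > threshold]

peaks in that curve are a rough "something is happening here" signal; a real pipeline would presumably combine it with audio energy, scene-cut detection, and the model's own understanding before committing to an edit point.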



