I was wondering if at some point an evolution of "cloud kitchens" could be "edge kitchens":
Imagine a high-end apartment or condo building. In the basement of the building, there is a fully equipped professional kitchen - however, it's not meant to be used by humans. Instead, the kitchen is fully automated with smart/wired-up devices wherever possible. The actual chopping, cooking, etc. is performed by robotic arms like the ones in the OP. The kitchen has a human-accessible storeroom on one end (which must be regularly restocked by the condo administration) and is connected to the apartments by a system of service lifts.
Residents can order food through an appliance or an app or something, and choose from some menu of predefined meals (or possibly even create their own recipes through some sort of building-block system). Once they've submitted the order (and all ingredients are available), the kitchen springs into action, prepares the dish and places it in a service elevator. In the apartment, the dish appears, fresh and hot, like right out of a Star Trek replicator.
I guess we've just gotten one step closer to that vision :)
The coolest thing to me was watching it do laundry: in the video, it takes a jacket, puts it on a hanger, and then zips the jacket up. Watching it get the zipper was pretty satisfying.
It’s also amazing that there’s just instructions and it’s open to anyone. Kudos to the builders. They sauté shrimp, do laundry, and wash dishes with this robot you can build at home.
With machine vision, it might be feasible to change the learning step so that, instead of using joysticks to manipulate the robot through the steps, it could watch you "do it once" and then extrapolate the kinematics for its own body just based on the end states of your hands or something.
Jokes aside -- something like this could be a fantastic breakthrough for people who've lost the use of their limbs. A wearable / chair-attached exo, driven to do complex tasks through simple commands.
It would be fantastic for my wife, who often cannot stand for longer than a minute or two.
If I understand it correctly, it learns from "watching" you during telepresence sessions? If so, and if it learns fast enough, it really would be a massive boon to some disabled people.
Although I want to see it peel potatoes and dice onions :)
Even if it can't do tasks that require more dexterity, it would still be a great help.
The "rinse pan" demo did not impress - but it doesn't need to. You don't have to teach it to "scrub bowls and plates" when you can teach it to "fill and empty the dishwasher"
Can someone with expertise explain why this is so difficult?
As an outsider with no knowledge of robotics, I've always been surprised that manipulation tasks (or just smooth robotic movement) are so challenging and seem to progress so slowly, especially when compared to IT.
I'm about a decade out, so I'm sure the SOTA has moved the goalposts, but the work I did in grad(ish) school revolved around making end-effectors (fancy word for robot hands) that could grab both a styrofoam cup and a wet glass (cylindrical, no pint-cheaters). Plenty of cups were crushed. Plenty of glasses were dropped. Nothing that could do both existed by the time I graduated.
The amount of feedback built into your end-effectors (pedantic word for human hands) is insane. If you're not familiar, proprioception is a good google/wiki hole. Most of the signals that allow you to move your hands don't even hit the brain stem, let alone the boss upstairs.
The challenge mostly lies in how we've instrumented these things. Precision requires low tolerances. Low tolerances + unexpected environment == you've just driven your robot through the countertop/pan/coworker or broken a very nicely geared servo.
Adding to that, even if we had perfect end-effectors with a good sense of touch, understanding the real world enough to manipulate it is hard.
These days we have 3D cameras, but they still only see part of the objects we want to manipulate. The back side is hidden. So you need to either specify and model all objects to interact with, or have some sort of world model that can predict what the full object is like: its weight, center of gravity, surface texture, etc.
And before we even decide to manipulate it, we have to detect it, categorize it and segment it (where does the pan stop and the stove begin?). We have to plan out a manipulation task, including finding grasp points, finding movement patterns that do not interfere with the rest of the environment, etc.
It's a whole bunch of separate problems that need solving all at once. There's motor control, building the right manipulators with the right sensors, bringing all the sensor data into something where we can make a single decision, understanding of the world and what happens during manipulation, and higher level planning.
I realize these are difficult problems, but couldn't we simulate how the human brain approaches these situations? That is, we don't model the entire 3D world in our head, but make decisions in real-time mostly by intuition and previous knowledge. We perceive depth of objects visually, and loosely map out their position and dimensions that way. We don't need to know the center of mass of every object, but have general intuition for where to grab it (if it has a handle, etc.). We have touch sensors to determine if something is hot or cold, and thus safe to handle, but a robot could have actual temperature sensors, making this easier.
I'm far removed from this field, and speaking as a layperson, so pardon my ignorance.
The thing is that you take intuition for granted, but machine parts just have none. Programming intuition is exceedingly hard, though we are getting closer with neural networks. I'd say it's easier to program a machine to calculate the predicted centre of mass of an object than to write an algorithmic sense of intuition that outputs a suitable spot to grab the item effectively.
I get that, but yeah, with ML it would be a matter of training it on raw data: objects, materials, physical properties and behaviors, etc. And then "intuition" would arise from this knowledge and its own experience from reinforcement learning. It's the same problem as implementing self-driving in vehicles, just applied to a different domain. I'm not downplaying the difficulty, of course, but pointing out that this type of automation wouldn't be feasible if we had to classically program every scenario the robot is likely to encounter.
I don't think you're downplaying the difficulty but just completely unaware of the depth of it.
We don't even know if "intuition" would arise from the knowledge you claim, we don't know how that model would work, and even before that, collecting all the data (not to speak of the availability of all the sensors) is vastly more complex than data collection for ChatGPT or any LLM would ever be.
> its own experience from reinforcement learning
This is a common mistake often heard from CS -> ML(RL) -> robotics transition folks. The reward function is given for free in RL, but in the real world, estimating the reward is a complex problem in itself. That's why RL in robotics has mostly seen success in quadrupedal locomotion: the reward function is simple (forward velocity, calculated from the IMU). But how would you calculate a reward function at 30Hz+ for a simple task such as "chop onion and put it in the pan"? If you can construct the reward function for that task, most likely you already have all the world-states available, and might as well skip RL and do something else with them, such as model-predictive control.
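To make that asymmetry concrete, here's a toy sketch. The locomotion reward really is one line over a sensor reading; the manipulation reward needs a world state whose every field (hypothetical names below) would require its own hard perception pipeline before you could score anything:

```python
from dataclasses import dataclass

def locomotion_reward(imu_forward_velocity: float, target: float = 1.0) -> float:
    """Quadruped locomotion: cheap to compute at high rates, since
    forward velocity comes straight off the IMU."""
    return -abs(imu_forward_velocity - target)

@dataclass
class WorldState:
    # Each field here is itself an unsolved estimation problem:
    fraction_of_onion_diced: float   # deformable-object perception
    pieces_in_pan_ratio: float       # segmentation + tracking
    onion_pieces_on_floor: int       # full-scene monitoring

def chop_onion_reward(s: WorldState) -> float:
    """If you can actually fill in WorldState at 30 Hz, you already have
    a model good enough for model-predictive control instead of RL."""
    return s.fraction_of_onion_diced + s.pieces_in_pan_ratio - s.onion_pieces_on_floor
```

The point isn't the arithmetic; it's that populating `WorldState` is the hard part, and RL gives you no help with it.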
That's insightful, thanks. I'm indeed not aware of the complexities here. It's not my domain at all.
I love the quote at the end of that article you linked:
> As the new generation of intelligent devices appears, it will be the stock analysts and petrochemical engineers and parole board members who are in danger of being replaced by machines. The gardeners, receptionists, and cooks are secure in their jobs for decades to come.
> We should expect the difficulty of reverse-engineering any human skill to be roughly proportional to the amount of time that skill has been evolving in animals.
This other quote really jumps out at me as well, because it's extremely true.
Even older than walking and manual dexterity are really basic abilities like eating. We're nowhere close on that; we're so far off that it's not even on anyone's radar. Robots will run on batteries or some other form of power. There is no way anyone is close to building robots that can eat: break down food and use it for energy and repair. It's one of the oldest evolutionary traits.
The other, of course, is procreation. Will a robot be able to assemble a new one from pre-made parts? Likely not too far off. But could a robot build or grow one from scratch? That's so far off in the sci-fi future that it's silly.
> Robots will run on batteries or some other form of power - there is no way anyone is close to building robots that can eat break down food and use it for energy and repair.
Couldn't we sidestep the complexity of digestion and just get energy from the Sun? With improvements in solar cells and battery technology, we wouldn't need to engineer something as complex as extracting nutrients from food.
I don't think we'd want to replicate biological systems in robots. Digestion and procreation happen at the cellular level, and achieving that with technology is indeed hard sci-fi. Autonomous humanoid robots can exist and be useful for us without this level of sophistication. Though once this happens AI itself will be capable of self-improvement, so we can leave it up to them how they want to improve. I, for one, welcome our new robot overlords. :)
I have not been close to this field in over a decade, but this is the internet, so I will comment anyway!
I think one of the issues is that in some parts of academia, progress is made one PhD at a time, and a PhD is almost always too narrow to bring all of these fields together. I'm sure they are solvable problems, and I'm sure they will be solved. But maybe it will take some other research structure? Private? Guaranteed long-term funding for academic teams?
Muscles are pretty amazing. They have a higher strength-to-weight ratio than pretty much any small actuator we have. That strength is essential for smooth dynamic movement (the forces you encounter trying to pick up a gallon of milk, open a jar, or get yourself out of bed are surprisingly large). In addition, we don't just have muscles that go forward and back, or up and down, like many actuators; we have dozens of muscles engaged during pretty much any task, which allows for flexible 3D force application.

This is then coupled with reflexes and the brain's ability to accurately predict the body's motion, and that of things you interact with. Robot actuators are almost all reactive, making their processing speed a limitation for control.* Humans use long-range predictive control to apply forces preemptively and smooth out motions. Finally, we're highly optimized for effort minimization: we don't just choose motions that work, we choose motions that are efficient. That objective and ability goes all the way back to evolutionary influences.
So yeah, smooth motion feels easy, but is a gd miracle of biology :)
* Robots can make up for a lack of prediction through really really fast control. This is how Boston Dynamics robots operate at a basic level.
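The reactive-vs-predictive distinction above can be sketched as a toy single-joint controller. The gains and inertia here are made up; the point is only that the feedforward term applies the torque the planned motion *will* need, so the feedback part merely corrects small errors instead of doing all the work:

```python
def reactive_torque(pos: float, vel: float, target: float,
                    kp: float = 50.0, kd: float = 5.0) -> float:
    """Purely reactive PD control: torque responds only to error
    already present, so fast motion demands a very fast loop."""
    return kp * (target - pos) - kd * vel

def predictive_torque(pos: float, vel: float, target: float,
                      planned_accel: float, inertia: float = 0.1,
                      kp: float = 50.0, kd: float = 5.0) -> float:
    """Feedforward + feedback: the inertia * planned-acceleration term
    preemptively supplies the torque the trajectory requires -- a crude
    analogue of biological long-range prediction."""
    feedforward = inertia * planned_accel
    feedback = kp * (target - pos) - kd * vel
    return feedforward + feedback
```

With a good plan, the feedback gains (and hence the loop rate) can be much lower for the same tracking quality, which is exactly the trade the footnote describes.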
Agree with all of that, and will add one more: the power-to-weight ratio of muscles is truly remarkable. It really comes into focus when you compare them to even current SOTA actuators.
Could you elaborate on the mechanical/physical limitations that cause SOTA actuators to lag behind muscles, and whether there's an equivalent "Moore's law" that might predict when this gap closes appreciably, if ever?
Just some reasons as a robotics enthusiast:
Hardware:
- lack of off-the-shelf parts suitable for such projects. For example, pressure and force sensors that are both sensitive enough and compact. Also actuators: most of what we have off the shelf is too slow, too bulky, too weak, or too strong/heavy to the point of being unsafe in direct contact with humans.
Software:
- one requires very low-latency control loops and communications. If this is to go beyond local-network use, it will also need software that maintains the "feel" while the connection drops briefly, packets are delayed, etc.
- whenever force feedback is used, one requires robust safety protocols. If the manipulator is strong enough to simulate the force of lifting a heavy object, it might be capable of breaking your arm or a finger. Software has to prevent this even when it "glitches" or there is a hardware failure.
No doubt there are many more advanced reasons. These are just off the top of my head.
Also, the hardware is too damn expensive. They will have spent thousands of dollars just on actuators for those rigs. That makes it very hard for anyone outside a research or industrial setting to sketch out ideas.
For reference: a hobbyist building a quadruped with Dynamixels, the most accessible robot servos, is going to spend 12 x $60 = $720 on motors alone, for a measly torque of 1.5 Nm.
Basically, such a motor can lift two chocolate bars (2 x 100 g) at arm's length.
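The arithmetic behind that claim, for anyone checking (assuming the 1.5 Nm figure above is the servo's holding torque):

```python
g = 9.81             # m/s^2
mass = 0.2           # kg: two 100 g chocolate bars
torque_limit = 1.5   # Nm, the Dynamixel figure quoted above

# torque = mass * g * lever_arm, so the longest arm the servo can
# hold the load at is:
max_arm = torque_limit / (mass * g)
print(f"{max_arm:.2f} m")  # prints "0.76 m" -- roughly arm's length
```

So 200 g at ~0.76 m is right at the torque limit, which is indeed about "two chocolate bars at arm's length".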
I think these days we can quite reliably train robots to do single tasks from a few demonstrations. The problem is that "quite reliably" simply is not good enough for the real world. Also, a "low-cost" system as presented in this work is still $32k (ignoring hardware and software integration costs).
In another tweet [1] the authors give a count of the successful (but not the failed) attempts at various tasks:
Our robot can consistently handle these tasks, succeeding:
- 9 times in a row for Wipe Wine
- 5 times for Call Elevator
- robust against distractors for Use Cabinet
- extrapolate to chairs unseen during training
Baby steps. I wouldn't give a human-operated robot control of dangerous tasks now, but some less dangerous tasks would be okay. Eventually, through iteration, the interfaces, robots, and operators will improve to the point where full-body teleoperated, AI-assisted, and AI-operated robots are achievable.
That's all kind of pathetically underwhelming. The video is at 6x speed and the robot is shown pushing some chairs forward, placing a pot in an empty cupboard and ... frying one prawn? One prawn?
All those are toy tasks that have no application in a real environment, even the constrained and safe environment of a home. Most homes don't have such large empty spaces. Most of the time when you need to tidy up a bunch of chairs, they're not put in a neat straight line by an RA; they're left in a jumbled mess by a stampede of students, and you have to do a lot more pushing and pulling and turning around (and there are tables and possibly empty cups and stuff). Most of the time you need to cook a lot more than one measly prawn. And let's not talk about the primordial chaos of kitchen cupboards.
What's worse with all those demonstrations: the robots can only do exactly what you see in the video. Change the parameters even slightly (a different shape of pan, a different height of cupboard, a different room configuration) and the magic -poof- vanishes into thin air.
That stuff doesn't work. We aren't even close to solving autonomous robotic behaviour. RL doesn't work and the older techniques aren't working either (planning). All that stuff ever does is get published into papers, advertised with fanfare and then forgotten because it never makes it to the real world, because it's all unreliable and unpredictable and costs too much if you want to do anything real, and that's always something trivial. So the state-of-the-art in robotics is in hand-coded industrial robots that do one thing and do it over and over again and nobody asks them to generalise, or to come into your house and cook you a prawn.
Not general at all. In all the object manipulation tasks the objects to be manipulated have been placed just so. If you change the placement (or the objects), you need to retrain. And it would be extremely computationally expensive to get the robot to learn to carry out the whole task on its own, from finding and retrieving, e.g. the shrimp, the oil, the plate, the slotted spoon and the pan, one by one, before having to use them all in the right order.
That (hierarchical task planning) is a problem that is not yet solved (because it's combinatorially hard) and eliminates any idea of "general" ability: you have to set things up perfectly before the robot can do anything. Exactly like for industrial robots. It just looks more general because it's inside a house rather than a factory.
We aren't even close? What? The robot is doing its tasks properly. What you are complaining about is actually a minor problem. Ok the robot is 6x slower than it should be, that can be fixed by building faster robots. One day it will be 3x slower, then 2x, then 1x, then, oops.
What's wrong with only being trained to use a specific tool? The robot is designed to be easy to train via teleoperation. If you told me that one day it will be cheaper to program a robot to press a button than to remodel the switch to be robot-friendly, I would take these demonstrations as evidence.
The things you are complaining about are quite trivial. For example, they could have added a second prawn...
Edit: I just realized that only the first video was sped up by 6x. So basically, the robot is already fast enough and can only get faster.
>> What's wrong with only being trained to use a specific tool? The robot is designed to be easy to train via teleoperation.
What's wrong is the fact that it's not just one tool, but one particular tool (specific size, shape, colour, etc.), placed in one specific way on one specific countertop, in one specific kitchen, etc. Try to calculate how many different configurations there can be of this combination and you'll convince yourself that there is no way to make any real progress with methods that rely on such specific conditions and have to be retrained every time any detail changes.
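A quick back-of-envelope on that combinatorics, with made-up (and conservative) per-factor counts just to show how fast the product grows:

```python
# Hypothetical variation counts per factor of the frying task:
factors = {
    "pan size/shape": 10,
    "pan colour/material": 5,
    "pan position on countertop": 20,
    "countertop height/layout": 10,
    "kitchen lighting": 5,
}

total = 1
for count in factors.values():
    total *= count

print(total)  # prints 50000 -- distinct configurations from just 5 factors
```

If each configuration needs its own demonstrations, per-configuration training clearly can't cover the space; that's the generalization gap being argued here.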
For instance, the robot trained to fry one measly little prawn would have to be trained again to fry two, or to fry them in a bigger or smaller pan, etc.
Btw, the robot didn't fry the prawn. It pretended to. Because it has no way to tell when the prawn is fried. Good luck training it to do that.
>> Edit: I just realized that it is only the first video was sped up by 6x. So basically, the robot is already fast enough and can only get faster.
I was talking about the initial video, not the others. And you're wrong: that's the limit of the hardware. To make it faster, the researchers will have to build a faster robot, and that still doesn't change what it can and cannot do, at any speed, fast or slow. The speed doesn't have anything to do with generality.
>> One day it will be 3x slower, then 2x, then 1x, then, oops.
One day we will all live on Mars and consort with aliens. I like science fiction too, but this is proposed as real-world progress, right now. Well, it ain't.