Andrej Karpathy (Tesla): CVPR 2021 Workshop on Autonomous Vehicles [video] (youtube.com)
189 points by vpj on June 21, 2021 | 199 comments


It's interesting that an academic conference now feels like a marketing op for industrial research labs more than anything. His claims about how accurate their vision system is and how it exceeds other sensors are not verifiable in any way by the public. Given how well qualified he is, I am sure he is not wrong! Andrej is brilliant. But this is an academic conference, right? This isn't open science; it's a discussion about an engineered system. I'm afraid this is the future of ML research (on which CV is so heavily dependent now). Long gone are the days of reading a paper and understanding the approach. Now you need the data and model, which may not even be computationally feasible without millions of dollars in hardware. This isn't Tesla's fault or anything, it just makes me sad.


In the talk, he gave clear examples with detailed position + velocity graphs where the vision system detected obstacles sooner and with less jitter than the radar system. Specifically the overpass where radar triggers erroneous braking, and the pulled over truck where radar detects the obstacle significantly slower.


That's a strawman. No one is looking to build FSD with the radar sensors that have shipped on cars for 20 years now for things like adaptive cruise control. LIDAR is what vision-only should be compared to.


Radar is the technology that is actually shipping in millions of cars. Lidar is future tech that is not yet practical to deploy in millions of cars.

There is absolutely zero chance they could just 'put in lidar' from now on, so for what he has to do it's not relevant.


When was the last time you looked at LIDAR tech? It's already cheap enough that it can easily be deployed in millions of cars.


Hi-def 360° LIDAR of the kind companies like Waymo or Aurora use is absolutely not ready to go into millions of vehicles at anything close to a reasonable price.

Multiple companies that announced lidar have later removed it from their cars.

The lidar that is actually deployed today, the kind you can 'just' get from suppliers, is not of the same quality, and we have yet to see a true mass-market, high-volume car with 360° lidar.


Elon Musk is a lot of things, but stupid isn't one of them. I'd be willing to bet that Tesla continuously looks at the state of LIDAR tech and evaluates whether it's at the point where it will be feasible to put it in a mass-market car in two years.

People keep getting hung up on their vision strategy because it sharply diverges from that of all other players, but it's really just tackling the "no one knows how to do this and no one really tries" part of the problem first.

No level 5 autonomous driving system will ever ship without really good computer vision.


Tesla cannot switch to LIDAR without huge destruction of the company's valuation and without pissing off a large base of customers. They've been selling a car as "ready for FSD" for almost 5 years now, at a huge extra margin. A LIDAR retrofit is impossible, and admitting that they need it means a huge number of customers paid for a product that will never exist. Refunds would not only put a huge dent in their finances, they would further damage Tesla's brand as a player in that space, and Elon's image as a god-like visionary.

Is LIDAR required for self-driving? Probably, but it's not certain. One thing is certain, though: Tesla is backed into a corner, and they have to deliver a solution without it, using inference hardware that is a few years old and cameras that are 5+ years old. Selling hardware for a problem that doesn't even have a theoretical solution in software, so you don't even know what hardware you need, is a bold, bold move.


It is a bold move, but perhaps we need more of those. How much of the talent in SV is focused on optimizing ad placement or building messaging systems?


1. Radar isn't the gold standard of object detection; LIDAR is. 2. From reports by owners of the latest Teslas, which don't even have radar installed, they still have erroneous braking. Driving is hard.


And yet cars with radar-based collision avoidance don't end up decapitating people, or plowing into parked cop cars.


I believe the Teslas involved in those collisions were versions that were using radar.


The talk wasn't part of the main conference; it was part of a workshop, just one of tons of workshops.

http://cvpr2021.thecvf.com/workshops-schedule


Thanks for pointing this out, the distinction wasn't obvious from the linked video - that makes a lot more sense. I haven't been to CVPR since 2014 so it has been a while.


From what I have been able to tell recently, CVPR in particular is a venue where "engineered systems" get a lot of focus. I don't think this is necessarily a bad thing, nor do I think it is representative of the big ML conferences in general.


He made a very good argument for vision-only, but it seems like training actually uses radar data to help calibrate vision measurements, so it seems to me there’s value in making some vehicles still contain radar (say, one out of 10) even if it’s not used for controlling the vehicle directly at drive time.

Also, the sensor resolution issue he mentioned could be addressed by using a higher resolution radar sensor.

I find the list of 221 triggers to be interesting. In principle, the NHTSA or NTSB could help contribute lists of triggers to companies to validate their training sets on.

Every time there is a fatal airliner accident, the NTSB does a safety investigation and airliners get a little bit safer each time. In the same way, each fatal accident in a vehicle with this kind of autonomy could end up being captured by these triggers, improving safety over time in a sort of mixture between expert human analysis and ML.

(Nobody does this for all regular car crashes because fatal car crashes happen every day! And you’re not going to retrain human drivers about some new edge case every day, although you can for vehicles like this.)


Most fatal car crashes are investigated, but by the police, not engineering experts. The investigative motivation is legal and liability focused, not improvement focused.

You sparked a happy delusion in my mind...training drivers to the same level we train pilots. Can you imagine drivers having regular check rides?


I would love to see re-certification of "professional" drivers. Almost daily I encounter taxi or semi-truck operators driving at the limit of what is acceptable.


> but it seems like training actually uses radar data to help calibrate vision

They seem to use radar solely to automatically label data for training.

In the given example, though, where smog interrupted the vision system's persistence of the label for the leading car, I wonder if using radar data to persist the label is strictly necessary.

A car disappeared then reappeared in the data, so why not just tween the bounding box over time and assume the car had always been there, say, if it looks the same when it reappears? An extra sensor just for labelling data seems silly.
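
For what it's worth, the "tween" idea is simple enough to sketch. Below is a minimal, hypothetical version (box format and function names are my own, not anything from Tesla's pipeline): linearly interpolate between the last box before the occlusion and the first box after it to synthesize labels for the missing frames.

```python
def tween_bbox(before, after, t):
    """Linearly interpolate two (x, y, w, h) boxes at fraction t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(before, after))

def fill_occlusion_gap(before, after, n_missing):
    """Synthesize labels for n_missing frames where the car dropped out,
    assuming it moved at roughly constant velocity while occluded."""
    return [tween_bbox(before, after, (i + 1) / (n_missing + 1))
            for i in range(n_missing)]
```

Of course this only works offline, since you need the "after" box, but auto-labelling is exactly an offline job.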


I don't think it's silly at all, although there are probably other ways to do it (license plates are a known size, so they could be used for calibration... to some extent). In fact, I think going to a sort of mapping solution where some portion of the Teslas on the road provide mapping data (using just calibrated vision) may be appropriate. If the reason you aren't using maps as an extra layer of capability is just because of the cost of mapping, then using Teslas to provide mapping capability "for free" is a good solution.

The argument may be that humans do fine without mapping, but that's not strictly true. Humans remember and anticipate the road based on past memory. When there's a change in the road due to construction, there's often a sign placed to warn drivers of the change in traffic pattern, showing that humans definitely at least partially rely on "mapping," so it could help even a pure vision system.


They use radar for way more than that, given that Autopilot has its capabilities limited in the new cars without radar vs. the old ones with radar.


Tesla's decision not to use LIDAR as a safety feature (i.e. having reliable high-resolution data about things the car can collide with) is so incredibly indefensible, since solving the last 1% of this using only vision likely requires a general artificial intelligence.

Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy. The time interval between when Tesla L5 FSD is finally available and when humanity is destroyed by the general AI it runs on will be very awesome and also very short.


You get a sparse point cloud from LIDAR sensors, not accurate 3D maps. This is the main reason some people think LIDAR may not work well (mostly just comma.ai and Tesla folks).

Vision can also get you 3D maps, either in an active manner (IR floodlight or structured lighting) or not. I will reserve my judgement until I see more from either side.


This framing is a common error in the debate. It's not cameras or lidar, it's cameras or cameras + lidar + radar. Nobody is driving on lidar alone. Many others actually have more cameras and are doing substantially more vision than Tesla is, they're just fusing lidar and radar perception with their vision pipeline. It gives you a more robust view of the world than using a single sensor modality.


If you have one piece of rotten meat in a perfect stew, you still have a disgusting dish. A good sensor fused with a garbage one still gives you garbage. That was one of the major points of the talk: the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

If anything, knowing when to reliably ignore a sensor modality is the kind of intuition more associated with general AI.

A similar paradox occurs when trying to fuse multispectral imagery. You'd think early fusion of RGB and IR would be better, since it gives the higher-resolution filters access to more data, but it does worse than late fusion. My understanding is that late fusion forces the network to "work harder" to solve object detection using IR only, and once you've wrung out what you can, you then fuse with the RGB detections.

Since radar is "one pixel", there's essentially only one object detector possible: object or nothing. If yes-object, fusion tries really hard to make sense of the RGB filters to figure out which partial detection looks like an object, and that is almost always a false positive.
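
A toy illustration of that failure mode (numbers, names, and thresholds are mine, purely hypothetical, not any real fusion stack): if the radar return is a single object/no-object bit, a positive return ends up boosting every marginal vision candidate, which is exactly how a weak partial detection crosses the decision threshold and becomes a false positive.

```python
def late_fuse(vision_scores, radar_says_object, boost=0.2, threshold=0.5):
    """vision_scores: per-candidate confidences in [0, 1].
    A positive radar bit boosts all candidates uniformly, because the
    'one pixel' radar cannot say WHICH candidate it corresponds to."""
    if radar_says_object:
        vision_scores = [min(1.0, s + boost) for s in vision_scores]
    return [s for s in vision_scores if s >= threshold]
```

With radar off, a 0.35-confidence smudge is correctly rejected; with radar on, the same smudge gets promoted past the threshold.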


You have fallen victim to the trap that this video so perfectly laid out for you. Tesla used the mmWave radar that has been in cars forever since it's a good way to do emergency braking and things like adaptive cruise control, particularly when you are a new company and you need these capabilities on your luxury sedan from day one. Now that they are much further along in their FSD efforts, they realize this mmWave radar isn't very helpful anymore. Cool, but nobody else was using it to begin with. LIDAR is totally and utterly different sensor technology.


Didn't Andrej show examples of emergency braking and how poorly radar performed v. vision?


He showed how poorly their outdated radar performed. The rest of the industry uses newer, vastly superior radars.


What radar did Tesla use? What does the rest of the industry use?


They are using Continental 2D radars from 2014. The rest of the SDC industry uses what is called 4D high resolution imaging radars. That's why Teslas have poor performance like phantom braking under bridges.


What models on the market use 4D imaging radars? All info I can find shows products releases in the past 1-2 years for future vehicles.


https://arberobotics.com/product/ is one. It was rumored Tesla themselves wanted to use Arbe radars, but evidently they have ditched the plan and gone radar-less. I think AutoX from China is planning to use Arbe as well.

Waymo has its own custom developed 4D imaging radar and I imagine most of the other SDC players have their own versions.


Sounds like “none”.


None what? I just gave you an example of Waymo using 4D imaging radar [1]. Aurora also uses imaging radar [2], Mobileye as well [3]. AutoX is going to start using Arbe radars. All of them are leading players in the SDC industry.

[1] https://blog.waymo.com/2020/03/introducing-5th-generation-wa...

[2] https://aurora.tech/technology/driver

[3] https://www.mobileye.com/blog/radar-lidar-next-generation-ac...


You said “everyone else” besides Tesla uses 4D radar, and “that's why Teslas have poor performance like phantom braking”. But there are zero production vehicles on the road using this tech right now.


I said “The rest of the SDC industry uses 4D radars” in a follow up comment, which clarifies I don’t mean car manufacturers.

Given this post is about radar usage in self driving and how Tesla/Karpathy touts their capabilities, I compare them to other SDC players, not other car manufacturers who are not in the game. And their SDC competitors are running away because they have the newer 4D radars.


Well, until these are actually adopted by a car company and out on the road, any claims about performance are meaningless. Your original comment implied Tesla chose to use inferior tech compared to the market, which is not true in the slightest.


> Your original comment implied Tesla chose to use inferior tech compared to the market, which is not true in the slightest.

Only if you don’t consider Waymo, Cruise, Mobileye and others as Tesla competitors.


Technical yet casual readers on HN will compare Tesla to other car manufacturers before CV programs in self-driving startups.

+1 to the other commenter (grandparent comment).


As a radar engineer, I am not happy to see hearsay discussions being repeated here.

Radar has a range resolution and an angular resolution. Andrej completely omitted this fundamental aspect in his talk.

In my humble opinion, Andrej needs a real radar engineer in his group before making such statements in a talk.

Nobody can be an expert in every topic, which should keep us all humble.


> That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

This makes some important assumptions, namely that Tesla built lidar and radar perception pipelines and sensor fusion of quality equivalent to their competitors', and then decided they were unnecessary.

Given that their competitors have shown substantially better perception than Tesla, and that Tesla has a significant economic incentive to deliver autonomous driving on a sensor suite that already shipped years ago, I find that difficult to believe. Did Tesla build good enough perception to dismiss lidar and radar purely on their merits? Unlikely, I think. Or did an intern build a student-quality lidar pipeline that "proved" Elon's camera-first approach is the right one? More likely.


Karpathy joined Tesla way before they ditched radar. You’re saying it’s more likely that he based his work on a “student-quality” prototype?


>We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

That sounds like a problem with the network's architecture and not the data itself.


Sensor fusion is hard, especially with data of varying quality (different generations of LIDAR, radar; images are probably the most portable, actually).

I am actually on the sensor-fusion side, and I think a transformer can merge everything and generate a coherent world-view. But this is a hard problem, and people who side-step it after evaluation shouldn't be blatantly dismissed.

For one:

> It gives you a more robust view of the world than using a single sensor modality.

How can we evaluate that correctly?


One of the arguments made in this talk is which sensor to trust when there is a disagreement. I'd love to see the scenarios where this happens with three types of sensors and the logic they use to choose the "right" one.


You can't avoid this problem by only having a single sensor modality, because you still have to decide if you trust your sensor data or not. In effect, you actually made the problem harder because you have no other information from any other sensor modalities to help understand the world around you.

In other words, they solved the disagreement problem by sticking their head in the sand and pretending their sensors are never wrong.


You’re intent on oversimplifying the issue to support your view, but this comment makes no sense. Of course you can avoid sensor fusion issues by not doing sensor fusion. Regardless of how many sources you have, you still have to decide if you trust the data; having fewer sensors doesn’t change that. And having more sensors means you have to decide which input to trust.


That's a problem for deciding which system drives the thing (eg. do you use cameras for detecting oncoming cars and objects or lidar/radar), but if there are major disagreements between them it would make most sense to disengage the autonomous part and ask the human driver to take over, much like current TSLA cars do. Obviously not possible if you want level 4 self driving, but it can be good enough for level 3.


> Many others actually have more cameras and are doing substantially more vision than Tesla

This is actually my main criticism of Tesla's approach. They don't seem to have enough cameras to do the job well, and it's showing in a lot of actual system limitations.


> You get sparse point cloud from LIDAR sensors

Take a look at how dense the point cloud from Waymo's 5th gen LIDAR is: https://www.youtube.com/watch?v=COgEQuqTAug&t=11599s. They just talked about this a few days ago.


Any idea why this video went private?


No idea, but here's another video of a different Waymo presentation showing the same point cloud: https://www.youtube.com/watch?v=uOLLrZzljs8&t=5420s


They're cutting up the original 3 hour long Livestream into separate videos for each presentation.


Point clouds from the next generation of LIDAR are looking less and less sparse.

https://youtu.be/COgEQuqTAug?t=11601


This video is private; I cannot access it.


LIDAR is valuable as a safety feature, i.e. it can (unlike radar or cameras) reliably see, at least in clear weather, if there's anything in the car's path warranting evasion/braking maneuvers. In particular, it's important that the LIDAR is dumb, i.e. its failure modes are predictable.


LIDAR operates by timing how a photon reflects off a surface; it isn't guaranteed to see everything. As you stated already, it cannot see well in snow or rain.

I am actually on the sensor-fusion side, but I don't think we should jump to the conclusion that LIDAR is the best 3D mapping method.

For one example, a truck has a braking distance of 600 ft; both the Velodyne and the OS1 LIDARs have ranges of less than that.


I agree about LIDARs not being the best general 3D mapping method; my point was mostly about using them as a dumb, physics-based safety system. For autonomous trucks, limited LIDAR range could be mitigated by reducing speed accordingly and/or employing more powerful (or SWIR) LIDARs, since trucks are more expensive and there could be a bigger budget for sensors.


They’re betting that they can use a massive feedback loop to train a set of neural networks to the point where they are as accurate as LiDAR without actually firing any lasers.

Even if you believe this goal is possible to achieve at some point in the future, I think the argument falls apart when you consider that it will take years, probably decades, for a pure vision approach to catch up to where Waymo is today in terms of safety. (They have cameras too.)

That Tesla can’t afford to fit expensive LiDAR sensors to all of the cars it sells is Tesla’s problem. Regulators won’t give a shit that pure vision is “better” in theory. They will simply compare Tesla’s crash rate in autonomous mode with that of Waymo and other AV operators, and act accordingly.


I understand why they made the "no-LIDAR" bet early, when LIDARs were completely impractical for a production consumer car.

However, nowadays it starts to look like 100% reliable depth estimation from cameras might actually require a human-level AI to work, while solid-state LIDAR technology is becoming cheap enough and easy enough to integrate into normal cars. But Tesla can't really change their stance on this without admitting that the FSD options they already sold would never actually become FSD within the lifetimes of those vehicles. I suspect this might also be the reason Karpathy looks more and more nervous with each new talk.


> FSD options they already sold would not actually become FSD within the lifetimes of these vehicles

That's pretty much a given at this point but they will not admit it until a class-action lawsuit forces them to.


But we might actually get a functioning Mars base out of this since that's where Musk will be hiding when the FTC finally wakes up /s


>100% reliable depth estimation from cameras might actually require a human-level AI to work

You don't need 100%, and even humans are far from 100% (the ~500 Mpx effective resolution of our eyes lets us basically brute-force through it in many cases). A stereo setup provides fast, good depth estimation at several megapixels of resolution with good fps (way better than lidar) for the majority of situations. For some share of scenes, and you really know it right then and there, you need AI and/or very sophisticated, compute-heavy algorithms. So instead of throwing AI and compute power at those parts, you just pull the points from the lidar (and even radar, if things are that bad) covering that segment. That way, given a couple more iterations of sensors (from the current 20 Mpx+ toward hundreds of Mpx) and compute, it will do even better than humans. Anybody not doing sensor fusion will be a loser, though; it's like going into a fist fight with one hand intentionally disabled.
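
For reference, the stereo estimate leaned on above is just the pinhole relation Z = f·B/d, which also shows why stereo degrades at range: for a fixed disparity error, depth error grows roughly with Z². A minimal sketch (the units and example numbers are my choice, not from the talk):

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair: Z = f * B / d.
    focal_px: focal length in pixels; baseline_m: camera separation in meters;
    disparity_px: horizontal pixel shift of the object between the two views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (object at finite depth)")
    return focal_px * baseline_m / disparity_px
```

With a 1000 px focal length and a 0.5 m baseline, a 10 px disparity puts the object at 50 m, and a single-pixel disparity error at that range moves the estimate by roughly 5 m — which is exactly where the lidar fallback described above would earn its keep.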


Is it something that can be retrofitted on existing models?


For LIDARs, unlikely, since they require a line of sight forwards, backwards, and possibly on the sides of the vehicle.


One of the gains from using lidar is also that it's a different sensor altogether from cameras, with different failure modes.

For example, cameras are sensitive to glare from reflections (the sun near sunset, or reflecting off metallic objects) and oncoming traffic at night. Lidars operate in a different, narrower wavelength band and are unlikely to be affected by that, although they might struggle with low-reflectivity objects at long distance.

The fact that these sensors are different means that the intersection where a dangerous situation would not be detected by either sensor is much smaller than any sensor individually.
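
The "smaller intersection" point is just probability under an (optimistic) independence assumption: a hazard slips through only if every modality misses it. A sketch with purely illustrative numbers, not measured rates:

```python
def fused_miss_rate(*per_sensor_miss_rates):
    """Probability that ALL sensors miss a hazard, assuming their failure
    modes are independent (e.g. glare vs. low reflectivity vs. weather)."""
    p = 1.0
    for rate in per_sensor_miss_rates:
        p *= rate
    return p
```

For instance, a 1% camera miss rate and a 1% lidar miss rate combine to 0.01%: two orders of magnitude better, if (and only if) the failures really are uncorrelated.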

In any case, once AVs are deployed at scale, if it becomes apparent that some sensors can be removed or replaced by something else, then they will be if there's a case for it.


> I think the argument falls apart when you consider that it will take years, probably decades, for a pure vision approach to catch up to where Waymo is today in terms of safety. (They have cameras too.)

On what set of metrics do you think Waymo is safer? IMO it's too early to compare and cherry-picked proofs both from Waymo and Tesla are not really representative.


For starters, Waymo reported 1 disengagement per 29,944 miles driven in 2020 to the California DMV [1], while in the talk, Karpathy implies that a Tesla being able to drive around the SF area for 2 hours without a disengagement is unusual. Note that Tesla didn't file a disengagement report because they didn't do any autonomous testing on public roads in California in 2020.

There are issues with reading too much into disengagements, but there certainly seems to be a large difference here.

[1] https://thelastdriverlicenseholder.com/2021/02/09/2020-disen...


Waymo's number for 2019 was around 11,000 miles driven per disengagement. It's been improving steadily, at a reasonably good rate.

Tesla wimps out and won't test in California, because they'd have to report.


Isn’t Waymo only driving in small HD-mapped areas? Previously they were only driving within a 50-mile area of Arizona where it’s clear weather all the time.


Waymo has published detailed safety performance data of their Arizona operations: https://waymo.com/safety/performance-data

You can read their other safety whitepapers in https://waymo.com/safety


Arizona roads are also mapped to extreme precision, have very wide lanes, and are optimized for cars. Waymo has prioritized low intervention by being overly cautious and avoiding hard maneuvers (like many left turns).

That doesn't work when they scale up to any other set of normal roads, especially as density and complexity increases.


They don't avoid left turns. There are plenty of videos from Chandler, AZ of Waymo performing unprotected left turns perfectly fine.

They will always map roads to precision, whether it's Arizona or San Francisco. Why is that a problem? You should either look at their CA disengagement reports over the years or wait until they roll out a service in SF (where they've been testing heavily). That will show how safe they are in dense environments.


From what I gather, they manually mark sections as hard when the cars get stuck there, e.g. due to road work, and then their routing system chooses another route, e.g. one that avoids the left turn.

The video with the Waymo car getting stuck and taking off from the rescue team had an example of this.

I guess it makes perfect sense from an engineering perspective.


Definitely. Especially with car companies like NIO strapping LIDAR onto their upcoming models.


I predict the opposite. Tesla sold half a million cars last year and will sell nearly one million this year. The data they have access to is increasing by orders of magnitude. I bet there is a point, let's say 20 million cars total, where they can pull so much high quality data that they will be able to surpass lidar capabilities for the purposes of self driving.

The lidar/no lidar discussion is a fun one because people have different ideas about how the world works. Personally I think LiDAR is the modern version of expert systems. It appeals to a logical/geometric intuition but the approach is brittle to real world contact, especially when paired with HD maps which are a great way to drive yourself into a local maximum.


The fallacy here is that the scale of the neural network used by Tesla is sufficient to capture the problem of driving given enough training. There is no guarantee that a reasonably priced neural network can encompass the task of driving.

Having training data beyond a certain point is overrated, and Tesla's advantage in gathering it is overstated. Other companies are capturing this data as well. Is there any indication that the data Tesla is collecting is of a higher value, or is it just more bytes?


It seems as if the people gobbling up the "Tesla has the data! Autopilot will keep getting better!" line have never trained a neural network in their life. Models converge. Loss stops decreasing, regardless of more incoming data. Extreme manual data cleaning effort becomes required to prevent overfitting. Model architecture has to change and hyper parameters have to be tweaked. Then you're back at square one as far as testing goes if you change any of those things.

The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data is laughably false. And, in fact, quite insulting to the intelligence of even the most casual ML engineers.


> And, in fact, quite insulting to the intelligence of even the most casual ML engineers.

Exactly, casual ML engineers. The issue of plateauing tends to occur because there is no more novelty to be had in the data. What mega-experiments like GPT and similar have shown us is that actually you can keep adding novel data and keep improving the model. Kinda inelegant, yet effective. The problem is, most institutions can't add more novelty beyond a certain scale, since that usually means shoveling more money at data storage and compute, on top of the novelty collection.

Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.


> Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.

And if you watch the other parts of the presentation, you'll see the bits about them buying clusters with 5k+ A100 GPUs. Presumably they intend to do something with those. Probably not streaming Fortnite concerts.


I would agree if their increase in data was linear, but it is increasing by orders of magnitude, which should have qualitative consequences for what they're able to accomplish as they claw their way through 9s. I don't see how it's possible to get progressively more 9s without scaling in both data and compute.

The point of the higher scale isn't just more data, it also makes it easier to solve the unbalanced data problem, because rarer and rarer scenarios will appear in large enough numbers to work with.


> The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data

That's the exact opposite of what they are doing.

It seems like you didn't watch the talk at all.


>to pile on more (unlabeled!) data

given the nature of that data you can get a lot of unsupervised mileage, so to speak, out of it.


You make it sound extremely manual and sequential when reality is anything but.

A team with funds like Tesla, Google, FAIR is going to be using NAS and have a continuous testing pipeline. Tesla has arguably the best environment for continuous testing which is the most difficult part of improving a model. Andrej even said in his talk that their supercomputer is in the top 5 for FLOPs.

SOTA on ImageNet for the past few years has been driven by pre-training on massive datasets. Vision transformers are increasingly more common and are extremely data-hungry.


I'd say the data that Tesla collects is of lower value, because it doesn't have sensor info from a different modality. Other companies are getting a good reference to ground truth both camera-to-lidar and lidar-to-camera. I don't know how much more valuable accurate distance sensing over a 3D field is compared to not having it, but I do know it's more valuable.

It may be valuable enough to require a few petaflops less computing power.


> The data they have access to is increasing by orders of magnitude.

You can only go so far by dumping more data into it. Diminishing returns.


We are chasing the 9s... diminishing returns are still returns. If 10x the data improves from 99.999 to 99.9995, that is still progress. Maybe 100x data gets you to 99.9999.
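
Put in absolute terms (illustrative arithmetic only, using the reliability figures from this thread):

```python
def expected_failures(reliability_pct, events):
    """Expected failures over `events` trials at a given per-event
    reliability. Each extra 'nine' cuts the absolute count by 10x,
    so diminishing percentage gains are still large absolute gains."""
    return events * (1.0 - reliability_pct / 100.0)
```

Over a million miles, 99.999% means roughly 10 failures and 99.9995% means roughly 5: the headline number barely moves, but half the incidents disappear.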


Are we even at 80% yet, compared to non-drunk/drugged/sleep-deprived regular human drivers? All of the videos I've seen are well below the driving capacity of a human taking their first driving lesson.


> Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy.

Tesla is also the only company to claim to target L5 autonomy. Everyone else, including Waymo, is strictly targeting L4 and say L5 autonomy is not possible or realistic. L5 is a pipe dream.


My prediction is that Tesla will eventually use LIDAR despite whatever they are saying now.

Right now their profit model is selling cars, not autonomy, so everything is optimized for that, including the decision to not use LIDAR.


lol what? did you watch the video at all


Well yes, but I think they'll be singing a different tune in 5 years, and even more so when more Tesla cars actually start driving in weather conditions less favorable than Palo Alto and Austin.

At some point when LIDAR is cheap enough there is no reason for Elon Musk to not give in and use them. Right now he's constraining the problem to the cost of the car.


LiDAR is not better in worse weather conditions. LiDAR performance degrades in rain and snow. That’s where something like radar is better. The Tesla bet is that humans drive with vision, so cars should be able to as well. Also, every other self-driving solution must solve the vision problem in order to be successful. LiDAR doesn’t tell you that it’s a bag on the road vs a raccoon. So the question is: once you solve the vision problem, do you still need LiDAR for any meaningful impact?


> LiDAR doesn’t tell you that it’s a bag on the road vs a raccoon

I think that's largely an issue with the early LIDAR devices today, but not necessarily what may be to come.

There's something to be said about measuring actual data with solid physics vs. inferring distances with billions of operations on RGB data. If you were landing a commercial aircraft in fog, you most certainly don't rely on your eyes to do most of it, but it is in fact possible to do safely precisely because we do have good sensors on them.

I fully agree with leveraging the scale and maturity of RGB sensors for cars today, the talk is spot on about that, but that's (a) circling back to the fact that Tesla needs to sell cars now not next year and (b) not a good case against use of LIDAR in the future.


It’s easy to add a few hundred thousand dollars of sensors to a $100 million plane. But $20k on a $40k car is too much and makes it cost prohibitive for most people. If vision alone can get you to say 1 crash in 10 million miles, that’s more than good enough to replace human drivers. If someone decides they want to go the extra mile and will pay for something rated for 1 in 20 million miles for 2x the cost, then they are free to do so.

As mentioned in this talk Tesla is taking an iterative approach and trying to make things safer for people today, not in 5 yrs from now. Maybe in a few years Tesla will see that to go from 10 million miles to 20 million miles they need 4k cameras at 60 fps, but the work they had already done would still have had a big impact. You don’t need to do things in one shot and get to the finish line.
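For what it's worth, the trade-off in the parent comment can be put in back-of-the-envelope numbers. All figures here are hypothetical: the crash rates are the ones quoted above, and the 200,000-mile vehicle life is my own assumption.

```python
# Back-of-the-envelope cost per crash avoided, using the hypothetical
# crash rates from the comment above and an assumed 200,000-mile
# vehicle lifetime. Illustrative only.
vehicle_life_miles = 200_000
extra_cost = 40_000  # doubling the price of a $40k car for extra sensors

crashes_base = vehicle_life_miles / 10_000_000     # 1 crash per 10M miles
crashes_premium = vehicle_life_miles / 20_000_000  # 1 crash per 20M miles

cost_per_crash_avoided = extra_cost / (crashes_base - crashes_premium)
print(f"~${cost_per_crash_avoided:,.0f} per expected crash avoided")
```

Under those assumptions, halving an already-rare crash rate costs millions of dollars per expected crash avoided, which is the economic intuition behind "if someone decides they want to go the extra mile... they are free to do so."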


> As mentioned in this talk Tesla is taking an iterative approach

> You don’t need to do things in one shot and get to the finish line.

I think we agree on these things. I wasn't trying to dispute their approach, rather explain their rationale for their current method, which is focused at actually selling product now.

> But $20k on a $40k car is too much

Those sensors don't need to be 20K-40K, they just aren't mass produced yet. That's why I think Elon Musk will change his tune in some years when those sensors are 1/100 the cost and 10X more accurate than vision.


I strongly disagree. By all measures I've seen (including a couple of slides in the OP's video), Tesla's self-driving is far safer than human driving: the number of accidents and deaths per mile driven are something like an order of magnitude lower (i.e., around 10x safer). I mean, the machine never gets distracted, tired, sleepy, emotional, drunk, etc., so it is a LOT LESS likely to crash on boring, monotonous road segments than most people -- who do get distracted, tired, sleepy, etc. Not only that, but people make really scary mistakes in routine circumstances. The video shows several examples of human drivers hitting the accelerator when they actually meant to hit the brake!

The criticism of autopilot is really about it getting tripped up in response to statistically rare, unusual circumstances, i.e., edge cases. Karpathy et al are working on getting better at those, bringing the rate of situations that surprise autopilot closer and closer to 0%, even if that can never be fully achieved -- there will always be surprises. Personally, I would rather take a tiny risk of crash on rare, once-in-a-million-miles events with autopilot driving than a ~1% risk of crash per 1000 to 2000 miles with everyday human driving.

Prediction: Tesla will be the first of all major automakers to get to level 4 and 5 autonomy.


>Tesla's self-driving is far safer than human driving: the number of accidents and deaths per mile driven are something like an order of magnitude lower (i.e., around 10x safer).

Lies, damn lies, and statistics.

Tesla here, is, again, being funny with the numbers. They LOVE to cite autopilot ON death statistics as being "10x safer than normal driving". What they fail to note is that Autopilot can ONLY be on while driving on a limited access highway. Highways are much safer to drive on than a mix of ALL ROADS, which is where the baseline figure comes from.

Another confounding factor is the price of the vehicle. The average CONFIGURED Tesla with the FSD package today costs what? $65k? More? Those X's and S's are $100k+. Nobody is buying that base Model 3. The point is that Tesla drivers are 1) older and 2) wealthy. Wealthy, older people get in far fewer car crashes than the average driver. In fact, car crash fatalities are really driven by two groups: drunks (or pill addicts) and young (teenage) men. Not saying it's IMPOSSIBLE to have a substance abuse problem and own a Tesla, but the average Tesla owner is less likely to have these issues. Young drivers are also less likely to own a Tesla.

So, Tesla autopilot stats should be compared to other comparably priced vehicles while driving on the highway ONLY. That would actually be a fair, honest comparison. I believe a recent outgoing BMW 5 series chassis finished its entire life without a single fatality in the US. That's right -- 4-5 years of service in the US without a single death. Turns out, wealthy people who drive expensive family sedans don't get in a lot of fatal highway crashes.

Here's a Forbes article (sorry) doing some of the back-of-the-napkin math. They estimated that in Q3 2019, autopilot really wasn't any safer than manual driving.

https://www.forbes.com/sites/bradtempleton/2020/10/28/new-te...


> So, Tesla autopilot stats should be compared to other comparably priced vehicles while driving on the highway ONLY.

I disagree. Autopilot driving on the highway should be compared to all human drivers driving on the highway. Otherwise you wouldn't be comparing against human performance per mile driven apples-to-apples.


Modern vehicles have much safer crash characteristics than older cars. The average vehicle on the road in the US is 11 years old. Do you know how much crash characteristics of cars have improved in the last decade? The comparison needs to stay in the modern, $65k+ vehicle realm for it to be apples-to-apples. Otherwise, you're comparing a bunch of decade old rust buckets with heat-cycled rubber, no blind spot monitoring, and Takata airbags to modern vehicles and claiming victory. Come on.

The BMW F10 535i had ZERO FATALITIES over its entire life in the US. Zero. It had no "self-driving" capabilities. Just lane-departure warning, BLIS, and ACC.


> The BMW F10 535i had ZERO FATALITIES over its entire life in the US. Zero. It had no "self-driving" capabilities. Just lane-departure warning, BLIS, and ACC.

Yes, all evidence I've seen indicates that cars partially driven by computers (adaptive cruise control, lane-departure warning, blind spot information, etc.) are safer than cars entirely driven by human beings. The BMW you mention is safer precisely because it is partially driven by machines. The more we automate driving, as machines get better and better at it, the safer we will all be on the road.


> Yes, all evidence I've seen indicates that cars partially driven by computers

Emphasis on PARTIALLY. Anyone who has read recent takeover scenario studies is rightfully horrified at the notion of a completely “hands off” driving experience, where the driver is expected to remain alert and vigilant but they’re not inputting any steering, throttle, or braking. Unsurprisingly, it takes people about 2 seconds to re-engage as active drivers. 2 seconds is way too long, which is why it may be safer to NOT use a system that does steering input in addition to throttle and brake. You need to keep the drivers actively engaged. And no, “touch the steering wheel every 30 seconds” is not active engagement. And if a car has “self driving” but also active driver monitoring, what’s the point? The driver doesn’t get to relax at all. The stress of driving doesn’t come from the input unless you’re racing. The stress of driving comes from having to stay alert. If I have to stay alert, I’d rather just drive myself instead of trusting an experimental system that drives like an indecisive, half-blind grandmother.

The BMW was safe PRIMARILY because it’s a well-designed, modern car, driven by an older and wealthy (safe) demographic. The assistance systems are probably secondary. They weren’t even standard on all vehicles and they were very primitive in that first generation.

If you actually read the Forbes article above, the back of the napkin math actually DOESN’T indicate that Teslas in autopilot are safer than normal driving. That’s the entire contention. I do not think these full-takeover systems are safer at the present time than active human drivers in comparable vehicles with safety assist systems. Tesla is very clearly fudging the numbers to make it appear as if autopilot is safer, but the claim doesn’t stand up to some really basic analysis.


> What they fail to note is that Autopilot can ONLY be on while driving on a limited access highway.

This isn’t true anymore


Tesla does not have self driving. Production Autopilot is not self driving. It's advanced cruise control with lane keeping, nothing that any other manufacturer doesn't offer on their cars. Full self driving doesn't even work, and is acknowledged by Tesla themselves, calling FSD a beta. And FSD can barely even do basic tasks like making an unprotected left turn.


>Tesla's self-driving is far safer than human driving: the number of accidents and deaths per mile driven are something like an order of magnitude lower

Still not close to good enough for people to accept:

"Participants from both countries required Self Driving Vehicles to be 4-5 times as safe as Human Driven Vehicles" [0]

0: https://pubmed.ncbi.nlm.nih.gov/32202821/


It’s already about 10x. But it is a bit tricky since they don’t break the numbers out by highway and local driving. Also their active safety features outside of AP also improve safety. So does it need to be 4-5x better than an already improved system that has AEB and lane departure avoidance and other safety features?

https://www.tesla.com/VehicleSafetyReport

> In the 1st quarter, we registered one accident for every 4.19 million miles driven in which drivers had Autopilot engaged. For those driving without Autopilot but with our active safety features, we registered one accident for every 2.05 million miles driven. For those driving without Autopilot and without our active safety features, we registered one accident for every 978 thousand miles driven. By comparison, NHTSA’s most recent data shows that in the United States there is an automobile crash every 484,000 miles.
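Taken at face value, the quoted miles-per-accident figures work out as follows. This is just a quick sketch of the ratios in the quote above, and it inherits every confounder raised elsewhere in this thread (road mix, driver demographics, vehicle age).

```python
# Ratios from the quoted Q1 safety report figures, taken at face value.
# These comparisons are NOT apples-to-apples, per the surrounding thread.
miles_per_accident = {
    "Autopilot engaged": 4_190_000,
    "Active safety features only": 2_050_000,
    "No Autopilot, no active safety": 978_000,
    "NHTSA US average": 484_000,
}

baseline = miles_per_accident["NHTSA US average"]
for label, miles in miles_per_accident.items():
    # Higher is better: more miles driven per registered accident.
    print(f"{label}: {miles / baseline:.1f}x the US-average miles per accident")
```

So the headline "about 10x" is really closer to 8.7x against the NHTSA all-roads average, and only about 2x against Teslas driven with no driver-assist features at all.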


There are way too many confounding variables in those numbers for you to directly compare them like you are doing. The biggest being that Autopilot is predominately engaged in situations that are already safer than average driving.


> It’s already about 10x.

It is not, from your data alone. It may be, but the data is not from a controlled experiment.

Because:

1. Autopilot disengages in dangerous/ambiguous situations.

2. The set of users with autopilot is different from those without.


Of course autopilot and FSD are safer than unaided human driving. That's because a human is still required to be in ultimate control. They are aids to the human, not replacements for the human.


Illusory superiority [1] will make us think the bad drivers are only those below average drivers. It will take a while for people to truly trust FSD just by accident stats.

[1] https://en.wikipedia.org/wiki/Illusory_superiority


So, all of that 'research' and several FSD crashes later, they have added a new driver monitoring tool in the refreshed line of Tesla Model S Plaid vehicles, because even with FSD turned on you must have your eyes on the road at ALL times whilst driving. [0] Looks like someone made a correct prediction on this technology, and it was none other than Comma.ai [1]

I have to give it to Elon that he was able to keep his fans believing the big lie that their FSD system would achieve Level 4 / 5 autonomy last year, when this year it is admittedly Level 2 [2].

The fans will continue to believe his lies and keep on saying "It's coming soon, you'll see..." when they have just been sold a Fools Self Driving System™.

[0] https://twitter.com/Model3Owners/status/1406002366923612163

[1] https://twitter.com/comma_ai/status/1406304017400012800

[2] https://www.news18.com/news/auto/teslas-full-self-driving-cl...


No one is going to get to a true L5 for a long time. That is totally irrelevant. It's a war of attrition. Whoever can monetize L3/L4 and can scale without any vehicle upgrade cost is going to win. It's pretty obvious lidar is very very silly since it doesn't scale.

It is also pretty easy to see that Tesla doesn't have to hit L5 to have won autonomy. It just has to successfully monetize L3/L4.


The vision-only strategy only made a shred of sense before high performance / low cost LIDARs like the Ouster OS1 became available.

Now it’s an indefensible position on safety or economic grounds.


LIDAR in a drizzly city like Seattle is of minimal value so the system has to fallback to vision only anyways


It's an anti-fragile approach, a term you'll recognize if you know of Nassim Taleb and his work. I think it will win in the long run, because not requiring HD Maps or specialized sensors is an advantage, and even if it takes more resources to make it work initially, it will save billions in the future, assuming, of course, it ever ships.

Obviously, pure vision is a viable system, as it's what we as humans use. The question remains as to whether or not it will be comparable to more precise LiDAR based systems in the near future.


This is not a good argument overall for several reasons. First, we should aim to greatly exceed human performance on safety. Second, whatever work goes into making the algorithms work well using vision can just as easily still inform a system that fuses that data with LIDAR for enhanced situational awareness and safety.


Waymo etc do not use LIDAR for object sensing, only for positioning. LIDAR sucks for object sensing because it gives you no information about whether it's a plastic bag or a person — you still need vision for that. Even if you just err on the safe side and brake, that itself can cause an accident unnecessarily.


You're wrong on that. LIDAR is used for object detection in every self driving car company that uses LIDAR. There's tons of research on it.

Previous generations of LIDAR were not great at classifying small objects and road debris but it works great for detecting cars, pedestrians, etc.

https://paperswithcode.com/task/3d-object-detection

Next-gen LIDAR has great density, and I bet it would do a decent job differentiating between a plastic bag or a rock in the middle of the road. In addition to depth LIDAR also returns intensity and several other metrics, which can be used as input to an ML model. It's why you can read the lettering on the side of the semi truck in this video of Waymo's next-gen LIDAR.

https://youtu.be/COgEQuqTAug?t=11601


> Prediction: Tesla will be the last of all major auto manufacturers to get to L5 autonomy

That doesn't mean much considering no company is getting to L5 autonomy likely in our lifetime and possibly beyond.


> is so incredibly indefensible

How can you make such a strong statement when you simply don't know how to achieve full autonomy? This reminds me of the teapot orbiting the Sun argument [0]. You and the people defending Lidar tooth and nail don't sound too different from religious zealots who've "seen the light".

[0] https://en.wikipedia.org/wiki/Russell%27s_teapot


Ultimately the proof is in the pudding. With Tesla FSD you can drive the highways from New York to Boston without any issues. I am sure there are many more routes across the country that can also be driven like that. Definitely not L5, but it works. Yet to see that from any other automaker, LIDAR or not. As far as I am concerned, great job Tesla! Keep it up, I am sure you will work through more tougher problems.


Innovation is a gamble. You're not wrong to point out they might fail. The likelihood of failure is what makes it worth the gamble of trying.


Generally this is true, but in the safety critical domain you don't gamble. You do your homework and make sure you aren't exposing your users to unnecessary levels of risk.

If Tesla was developing their system with trained safety drivers or on closed courses, I think they would have higher moral ground to gamble here. But placing the untrained public behind the wheel of alpha-quality software is unethical IMHO. There are ways to develop autonomous software that are significantly less risky, and the only reason Tesla is doing it this way is for marketing/PR purposes as far as I can tell.


> solving the last 1% of this using only vision likely requires a general artificial intelligence

That is likely close to true with lidar as well. See also some of Waymo's recent struggles in unexpected construction zones.

Maybe lidar helps in getting there, but I'm afraid they all hit a pretty tough ceiling without this.


> That is likely close to true with lidar as well. See also some of Waymo's recent struggles in unexpected construction zones.

That example has nothing to do with lidar, but with planning. The car was able to detect the construction zone just fine.


That's what I said.


Why do you think the last 1% is dependent on LIDAR versus any of the other multitude of gaps between today’s autonomy and L5?

If the only way it becomes practical to achieve L5 is to use LIDAR, Tesla can obviously add it. But if they waited until LIDAR was cheap and practical, they still wouldn’t be shipping any hardware doing autonomy today, and not collecting the data needed to train their models and delivering value today.

Also, a vision-based system operates in a somewhat intuitive fashion, given that we have eyes too.


I see a lot of people here are stuck on the perception side of things. There's a lot more to self driving than just the sensor suite and perception. There's a lot of work that needs to be done in the planning and controls department prior to the time we get full vehicle autonomy. Andrej's work is impressive, but I wish we'd see more research into the latter. Then again this is CVPR so...


I think the more interesting question is how much human context is necessary in decreasing accident rates. The signal question and tunnel answer hinted at that. Some context is very local and some context is general at the level of humans.

Examples: human eyes will have trouble adjusting to the sudden darkness of tunnels so some people will tend to brake suddenly; that person looks old and will probably have slower reaction times so watch out for the upcoming sharp turn; that person looks like they are on their phone and may cross the lane suddenly; watch out for this intersection because young humans cross it after school without looking so slow down below the speed limit.

This human understanding doesn’t seem to be directly represented by the system without explicit architecting on their part. A more general intelligence would begin to automatically learn these. A human intelligence would automatically model these or learn from experience or read about it.

As mentioned, the current system has some advantages over humans: more eyes, doesn’t get tired or distracted, faster reaction time. I guess we shall see when these advantages cover up the disadvantages.


I agree, though I think there's an obvious path towards that higher order understanding of the road that humans have.

Suppose they eventually have this current system dialed in and they get really good, accurate bounding boxes around all interesting objects on the road.

So now, in addition to their 10 second samples of video data they're collecting, they start collecting 10 second samples of scene representations.

These samples of scene representations are time series of how various objects in the scene are moving and behaving over time. Many examples of just what you describe: cars with older drivers having slower reaction times; cars with distracted drivers driving recklessly; etc.

Now you train a model on all that data, asking it to make predictions through time. It's going to quickly pick up on the same or similar cues that humans do. It sees an older person in a car and says that slower, cautious paths through the scene are more likely for that vehicle. It sees a large, lifted truck and assumes a 90% probability of a "cut off every car possible" path through the scene. Etc.

So I see what Karpathy is building now as a foundation upon which they can build the higher order stuff.


Yes, building the next level of abstractions and inferences.


tldr: Tesla uses vision alone, and has dropped radar and other sensors. He makes a very decent argument why.

(Surprisingly, he basically ignores night driving.)


Ironically this is what Tesla criticized Mobileye for.

I still think that this is far the best demonstration of autonomous driving to date https://youtu.be/A1qNdHPyHu4


The 11 minute mark is terrifying:

1) The car fails to accelerate to beat the truck as it merges (the safest option given the scenario, even with the yield sign)

2) The car almost collides with the trailer of the truck and it stops with its nose sticking out into the merge lane

3) The car sits there with its nose in the merge lane as other cars go by

4) When the car finally has an opportunity to merge after having done everything wrong up to this point, it sheepishly merges and takes forever to get up to speed. This is basically the least safe thing it can do when yield-merging into a high speed lane.

LOL. This is the best the autonomous driving world has to offer? 80% of this driving is an immediate failure for a teenage driver taking their road test. I think I can take pretty much any random 2 minute sample from that video and find more than a few inputs/maneuvers that would lead to failure in a driver's license test. Why would ANYONE with a brain put their life into the hands of this system?

OH, and do you want to know a little industry secret about these "unedited" long videos? Yes, the video is uncut. But guess what? They drove that route hundreds of times and only showed the best run. And they got laser-HD maps of the route that they won't have globally. This is what Zoox did for their infamous demo that got them bought out by Amazon. And it's what Tesla did for their "Full Self-Driving" video on YouTube that shows driving ability that their cars cannot match even today. Shhh, you didn't hear it from me.


Sorry, at (1) yield means yield. Maybe it shouldn't have stuck its nose so far out, but yeah, you have to wait for a proper gap in the traffic. The failure to accelerate to highway speed is the real bug, though.


There WAS a proper gap in traffic if the car used its accelerator on the ramp. The car would not have impeded the truck at all if it drove the ramp faster. This is an incredibly common issue with autonomous systems: anticipating other vehicles at an awkward angle of attack, which is common while merging or while other cars are merging. Autonomous systems also tend to be way too cautious, which is incredibly dangerous in merging scenarios.

A good driver accelerates on that ramp and beats the truck by 2-3 car lengths and merges at speed. Yield does not mean stop. A cautious driver slows before the merge point, maintains a roll, then mashes the accelerator to merge safely at the next opportunity. A terrible driver (Mobileye) stops abruptly at the end of the merge junction, almost hits a truck, leaves its front end sticking out into the oncoming traffic, then sheepishly merges without accelerating fast enough. You can even see how much this freaks out the human passenger. He thought he was going to get hit!


> A good driver accelerates on that ramp and beats the truck by 2-3 car lengths and merges at speed

No. A good driver stops at the yield sign if an approaching vehicle is 2-3 car lengths away. It is dangerous to play guessing games with your car's engine and another driver's attention. If anything goes wrong (tire puncture, engine dies, slippery road, truck driver slams the gas) you have successfully managed to put yourself into the direct path of a speeding brick wall 5x as heavy as you. There is never a safe merging scenario where the right choice is to slam the gas pedal to beat another driver to the punch, especially not when merging onto an active freeway.

If there is a yield sign at the end of the onramp, that means prepare to yield. If there is not a yield sign, then the expectation is to keep merging. There wouldn't be a point to the yield sign if everyone treated it as if it didn't exist.

> leaves its front end sticking out into the oncoming traffic,

I also agree with you but this may be a byproduct of the camera's perspective and maybe it would look normal in the driver's seat. We don't really have a great view but the truck could also have been hugging the curb which would have brought it closer to the car than it should be.

> dangerously merges sheepishly without accelerating fast enough

Fully agree here. If you merge you have to commit to matching the speed of nearby vehicles or else you're creating a dangerous situation. Mobileye should have sped up much faster than it did.

I'm not a neural network, fwiw.


I think you should watch the scenario again. The car is only beaten to the junction by the truck towing the trailer because the Mobileye car takes the wide-radius ramp at sub-22 km/h (13 mph). A normal human driver takes that ramp at 40-50 km/h, follows the hatchback out of the junction at flow-of-traffic speed, with 2-3 seconds of distance behind them for the truck to have a safe braking distance. The truck would have no need to modify speed, and the Mobileye car would satisfy the yield sign (not forcing another car to hit the brakes).

https://i.imgur.com/irkHBxc.png

I think people lower their expectations for autonomous systems. Driving the ramp at a speed a human would and taking that obvious gap to zipper-merge is the correct maneuver for a human. Perhaps you want the autonomous system to err on the side of caution, but I'm actually trying to hold it to a human standard.

But yeah, at the very least, don't get confused and sit there with your nose out in traffic. What's really scary is that this is the take Mobileye chose to go with. They probably had dozens more, with even worse blunders.


Wow that demo has it all.

- car stalled in its lane
- complicated intersections
- people exiting cars in its lane
- car going over into its lane

If I had an hour of driving like that, I'd be stressed.


Wait until you see the Jerusalem 40 minute video. Munich traffic is tame in comparison: https://youtu.be/kJD5R_yQ9aw I really don’t understand why mobileye gets so little recognition. They might be quietly winning the self driving race.


The list of triggers contains things like 'motorcycles at night', so it seems it's all in that dataset.


Not a fan of TSLA, but isn't night driving just a special case of daytime driving if you use IR and/or hyperspectral cameras?


I imagine it's far more tricky due to the lower image quality and increased noise on the sensors (less light/energy hitting them).

Loss of colour in your visual input is a bit of an issue as well...



Thanks! Maybe it's best if we change the URL to that from https://twitter.com/vpj/status/1407000737423368197.


That video is a screen capture from another video (which was screen capped from a livestream), but the original has much better audio quality.

Here's a direct link:

https://www.youtube.com/watch?v=eOL_rCK59ZI&t=28293s


Ok, I've switched to that link above (from https://www.youtube.com/watch?v=NSDTZQdo6H8). Thanks!


I recently had an argument on here where somebody insisted that braking because of overpasses was an issue with the vision system. It seems pretty clear that it is the resolution of the radar, not the shadow of the bridge, that causes the issue. Good to get some more insight into this.

This is the right thing to focus on, as it is by far the largest issue with Autopilot on the highway. Multiple people who test these systems say that false positives on some highway overpasses are the biggest usability issue.


The video directly addresses this…


Per "it's unscalable to get HD 3D maps of all the roads on earth", it's interesting to consider that Google/Waymo has been growing this for years with street view and the sensors on each car. Curious to see how that plays out


HD 3D maps need to be way more accurate and be enriched with massive amounts of detail. Like where lights are, what lights are relevant for what lanes and so on.

You can't just pull out your street view footage from 3 years ago.


Good point. Street view cars since 2017 have had high def Lidar sensors on them[0], and they have some prowess in extracting features [1], but I'm not sure if that's enough detail.

[0] https://www.geoweeknews.com/blogs/google-putting-lidar-new-s...

[1] https://www.androidpolice.com/2021/01/16/google-maps-is-roll...


Even if FSD takes longer, I sure am glad about the active safety features that prevent dumb incidents. Hope that trickles down to every other manufacturer, petrol or electric.


There's clip after clip of AP yanking people out of situations where they had no idea they were in danger. Wife and I are planning on kids soon, and we won't consider anything except a Tesla due to those safety features.


What an annoying charlatan. Karpathy is a brilliant computer vision engineer, but he has let his expertise in that subfield cloud his judgement on achieving the overall goal of autonomous driving.

Musk and Karpathy have been dead wrong about LIDAR for years. Remember Musk making the absurd claim of a million Tesla robotaxis by 2020? I think most hilarious is that both Karpathy and Musk claim the LIDAR systems are too expensive. Yet, in the same 2019 Autonomy Day they simultaneously claimed that Teslas would be able to drive themselves and operate as robotaxis, earning their owners passive income and therefore justifying significantly increased MSRPs. So, the $7k LIDAR system (that accelerates safe autonomous driving) is not worth the cost, yet stumbling towards autonomy on vision only is? If the car becomes a money-earner, you should use all of the systems available. The 2019 Autonomy Day was an utter embarrassment. I'm sure 2021 will be more of the same.

So now it seems that they've realized their folly in logic. So what's the solution? Well, you can't just complain about COST of non-vision perception systems. Because, as noted above, that doesn't make sense if you're going to simultaneously claim that your car will be able to earn you money (augmenting any extra hardware cost that gets you to that point faster). No, now you have to smear all non-vision perception systems. You have to say that their data is worthless and detrimental to the overall effort.

The entire claim from the 2019 Autonomy Day that "vision is what humans use to drive" is also completely bogus. Humans use many senses to drive. They feel the pedals and steering wheel. They use their sense of equilibrium to sense motion. And they use their hearing to hear other vehicles, sirens, and issues with their own car (driving with headphones in is illegal for a reason). Any modern car, even a Tesla, is also using far more than just vision when attempting autonomy. Forget about radar and LIDAR for a moment. There are endless sensors in the drivetrain. Steering angle sensors and multiple IMUs for the electronic stability control. Brake and wheel sensors for the ABS. Temperature sensors everywhere. And countless other ECUs. The notion that vision alone is getting you there is nonsense. There's no good argument against LIDAR today other than perpetuating a lie to sell cars that are cheaper to produce. And Karpathy has a massive professional conflict of interest in making CV the main player -- he's a CV expert. He was never a sensor-fusion expert before his hire. If CV is the pathway forward, he gets to remain "the guy". It certainly behooves HIM to make that claim.

Autonomous driving will not be achieved in this decade. Perhaps ever. Ask yourself honestly: if you were tasked with building an autonomous commercial aircraft OR an autonomous car, which would you choose? Most would say aircraft -- nothing to really hit in the air, fully mapped airport and runway systems, and far fewer variables. Yet autonomous aircraft still do not exist. Perhaps the edge cases always rule the roost. Ask yourself why driving would be any different...


Re: senses - have you ever played a driving sim? You can drive just fine with vision without tactile.

Sirens are primarily a means to get you to look in a direction. 360° cameras can notice the emergency vehicle as soon as it's visually relevant. And if they decide they need an audio siren detector, that's like, practically intern level signals detection at this point. Hardly a dealbreaker.
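For what it's worth, that "intern level" detector really is small. Here's a purely illustrative toy sketch (my own, nothing to do with any shipping system): track the dominant FFT peak per audio frame, and flag clips whose peak sits in a typical siren band but sweeps around within it the way a wail does.

```python
import numpy as np

def detect_siren(samples, sr=8000, band=(500.0, 1500.0), frame=1024):
    """Crude heuristic: find the dominant FFT frequency per frame and flag
    audio whose peak stays in the siren band but sweeps around within it."""
    peaks = []
    for start in range(0, len(samples) - frame, frame):
        spectrum = np.abs(np.fft.rfft(samples[start:start + frame]))
        freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
        peaks.append(freqs[np.argmax(spectrum)])
    peaks = np.array(peaks)
    in_band = (peaks > band[0]) & (peaks < band[1])
    if in_band.mean() < 0.8:          # mostly out of band: not a siren
        return False
    return bool(np.std(peaks) > 50)   # a wail sweeps; a steady tone doesn't

# Synthetic 2-second "wail" whose instantaneous frequency swings 700-1300 Hz.
sr = 8000
t = np.arange(2 * sr) / sr
f_inst = 1000 + 300 * np.sin(2 * np.pi * 0.5 * t)
siren = np.sin(2 * np.pi * np.cumsum(f_inst) / sr)
print(detect_siren(siren, sr))
```

A production version would obviously need to cope with road noise, Doppler shift, and regional siren patterns, but the point stands: this isn't the hard part of the problem.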

100% hands-down would pick autonomous car vs airplane. Flying a plane isn't just moving the aluminum bird through 3 space and periodically taking off or landing.

Autonomous aircraft don't exist because a huge amount of the ritual of flight is before and after the captain is even on the plane, let alone flying. There is a tremendous amount that the pilot and copilot go through, on the ground, before taxi, after liftoff. It's way, way more involved and way more generally intelligent. We can design AI to take off, path to a destination, and land. Those 3 things are the easiest parts of flying, yet do not comprise the act of flying a 2-seater, let alone an airliner.


Mentioning the senses wasn't meant to be an itemization of "senses" that a car or car operator needs. Of course you can still drive decently while deaf and without feeling the g-forces. The point is that vision is NOT the only input. In semi modern (non-self-driving) cars, the driver is still assisted by a flurry of additional vehicle sensors in addition to vision and the other human senses. So dismissing LIDAR as "not needed" is foolish.

I understand the complexities of modern commercial flight. I still consider it far less complex (computationally) than driving on today's public road. The fact that you have V2V communication out of the box (via transponders) is probably the biggest factor. I would not rule out autonomous driving WITH a standardized V2V / V2E system. Without one? I'm bearish.


My understanding is that the case against LIDAR isn't just that it's "not needed," but that it potentially exacerbates local minima under good (e.g. fair weather) conditions.

Having LIDAR means you can rely on LIDAR, which means you need the LIDAR to be 100% reliable. I've read some companies claiming their lidar works through rain and snow but I haven't seen actual renders of it yet to judge for myself.

No lidar acts as a forcing function for Tesla to get their vision system that much more reliable. I think this will ultimately result in a better vision system than the competitors, and I believe that (somewhat paradoxically) this will result in an overall better, safer, and more reliable system, when it comes to the 20% long tail.

Lidar+vision will probably dominate in fair weather driving. I'm skeptical it's of much benefit in bad weather, and that long tail is the hard part.


> No lidar acts as a forcing function for Tesla to get their vision system that much more reliable.

Alternatively, no LIDAR means you aren't gathering corresponding LIDAR and vision data that you can use to improve detection of the vision system.


Tesla does use LiDAR for calibration and testing of their vision systems. But they also have a million cars on the road with radar which also gives fairly accurate depth information. They use the depth information provided by the radar to automatically label their data and then they can check their vision based estimates against that data. Karpathy has mentioned this approach in a few different talks.
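For concreteness, that auto-labeling loop can be caricatured in a few lines. This is a toy sketch with made-up numbers and a hypothetical function name, not Tesla's actual pipeline: the radar track's metric depth serves as a free label, and frames where the vision estimate disagrees badly get mined for retraining.

```python
import numpy as np

def radar_supervised_errors(vision_depth_m, radar_depth_m, tolerance_m=2.0):
    """Compare per-frame monocular depth estimates against radar depth.
    Radar acts as a cheap 'ground truth' label; frames that disagree by
    more than tolerance_m become candidates for the training set."""
    vision = np.asarray(vision_depth_m, dtype=float)
    radar = np.asarray(radar_depth_m, dtype=float)
    error = np.abs(vision - radar)
    return error, error > tolerance_m

# Toy track of a lead vehicle over four frames (metres).
vision = [42.1, 40.0, 37.8, 31.0]
radar = [41.9, 39.8, 37.9, 35.6]
err, mine_me = radar_supervised_errors(vision, radar)
print(err, mine_me)  # the last frame disagrees by several metres -> flagged
```

The real system presumably does this over millions of tracked objects with far more careful association and filtering, but the supervision signal is the same idea.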


> Musk and Karpathy have been dead wrong about LIDAR for years.

Right, because all those companies that use LIDAR are making billions driving people around. Oh, wait, actually they are burning hundreds of millions every year.

> I think most hilarious is that both Karpathy and Musk claim the LIDAR systems are too expensive.

That's literally the opposite of what Musk says about Lidar. He LITERALLY said he wouldn't use them if they were free.

> yet stumbling towards autonomy on vision only is

Look up the Marginal Revolution from 1870.

> The entire ...

... The entire paragraph is an exercise in missing the point.

It seems what you are really saying is not that Tesla is a charlatan but that the whole industry is.


>It seems really what you are saying is not that Tesla are charlatan but the whole industry is.

The entire industry has an air of charlatanism. Tesla is the biggest charlatan of the bunch. The LIDAR-using ventures at least have a snowball's chance in hell. Tesla has no chance with vision alone.


I would love to know what your credentials are that you can say this with such certainty


Just the fact that this person created a throwaway account just to trash talk speaks volumes.


> Yet autonomous aircraft still do not exist.

Well, that's a truth with modifications [1]

[1] https://www.youtube.com/watch?v=B2uc98EEPqE


I like Andrej from his PhD research days and awesome blog posts but this is a series of disasters in the making, that is until FTC steps in after more people die from “self-driving” accidents under interesting and unexpected circumstances.

The whole vision vs. LIDAR stuff is a distraction as long as Tesla “AI” doesn’t have common sense.

It literally doesn’t know what it’s doing, and the tail of edge cases to "fit" the models is infinitely long. ANNs are fundamentally backwards looking and cannot adapt to unforeseen or even slightly unusual combinations of circumstances. It will go fine for n miles and will dramatically fail at mile n+1, where a new situation requires understanding of one's surroundings, and n is an arbitrary number.

It would be more honest to show the cases where it missed, thankfully there is no lack of them in “FSD beta“ videos on YouTube.


> ANNs are fundamentally backwards looking and cannot adapt to unforeseen or even slightly unusual combinations of circumstances. It will go fine for n miles and will dramatically fail at mile n+1, where a new situation requires understanding of one's surroundings, and n is an arbitrary number.

And humans also fail after some arbitrary number m+1 of miles. And in many cases far more catastrophically than AI.


>common sense

What a ridiculously useless term.


Try using common sense to figure out what he meant by that.


a better phrase might be “ability to do higher order reasoning to decide what to do in a novel situation”


exactly, who needs common sense when you can sell the bright vision of a self-driving future.


There are two fundamental reasons why, in principle, vision alone can do it: 1) humans do it with vision alone; 2) you can actually predict lidar's output with vision alone. Many systems out there actually use lidar to generate more labeled data precisely in order to make lidar unnecessary.


In principle. But we have billions of neurons for that, with a ridiculous number of interconnections; even if an order of magnitude fewer connections would be enough, we are still far away from that amount of computation. And that is actually the easy part of the problem: mapping an image to a depth map. The hard part is interpreting it, for which we have a complete inner universe built up, with context-dependent logic. FSD is simply a really hard problem that we are not even getting close to. We are at fancy-robot-vacuum level.


> Humans do it with vision alone

And despite humans having legs and birds having articulated wings, ground vehicles use wheels and airplanes use fixed wings with propellers or jets. Why?

Just because biology solves a problem a certain way doesn't make that same approach optimal for machines. Typically with machines you need to work around other systemic deficiencies.

Humans may have only two cameras, but those cameras are connected to a vision, spatial understanding, and reasoning system that is light years beyond what current day AI is capable of. But the good news is that we can make up for AI deficiencies by giving vehicles super-human sensing ability which makes those problems easier.

Tesla's approach seems to be one in which they think the sensors are the expensive/hard part and solving general AI is the cheap/easy part. I think plenty of people would not agree.


Human higher-order reasoning is crucial to our ability to use vision to interpret the world. We have a model of the world and we use our vision to adjust it. We don't just draw bounding boxes, we label objects as cars or trees or shadows, and use our understanding of basic physics to predict their possible movement. We further label other data as humans or driving cars or animals, and use our understanding of other agents to predict their plausible movements.

Vision is only used to periodically adjust this model of the world - most of the time, we are living in our heads. This is so true that we normally don't even notice that we entirely lose our vision every time we blink or move our eyes, and we don't notice the extreme blurriness that we have in areas of our 'vision' where we are not currently focusing our eyes.

It's very educational to visit spaces deliberately built to defy common sense (optical-illusion rooms, for instance) to understand just how limited human vision is once you remove our understanding of the world from it.


There are many things humans do with vision alone that machines can't. For instance, seeing whether a person standing 50 ft away is looking at you or at the house behind you.

The human vision system operates at a bitrate equivalent of well beyond 500 gigabits per second. Resorting to "humans can do it with vision alone" is only a sufficient argument if your computer vision system can match that.


> The human vision system operates at a bitrate equivalent of well beyond 500 gigabits per second

Do you have a source for this claim? Academic sources suggest numbers 4-5 orders of magnitude smaller: 6Mbps [1], 10Mbps [2].

[1] The Oxford Companion to the Mind (1987)

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1564115/


Alright, I should clarify, as my first post was a bit sloppy.

We know the eye has highly nonuniform resolution, with the high resolution region moving around all the time to cover areas of interest. I agree this system has something like 10 Mbps actual bitrate.

But we don't know how to make such a system for computer vision. Our CV systems all have uniform resolution. And thus a CV system needs to match the peak resolution of the human vision system. That requires a ~500 megapixel resolution at 24 bits per pixel and ~50 frames per second.
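For anyone checking the arithmetic, matching that peak spec everywhere in the frame works out to roughly 600 Gbps:

```python
# Back-of-envelope check on the peak-resolution bandwidth claim.
pixels = 500e6          # ~500 megapixels, matching peak foveal acuity everywhere
bits_per_pixel = 24     # 8 bits per RGB channel
fps = 50                # roughly the flicker-fusion rate
bits_per_second = pixels * bits_per_pixel * fps
print(bits_per_second / 1e9)  # 600.0 Gbps -- "well beyond 500 gigabits"
```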


Thanks for the clarification. I still don't fully buy the argument.

An AI driver doesn't necessarily need to match or exceed a human driver's sensors in order to drive as well or better. The human eye does indeed have very impressive resolution and tracking, but the ability to attend to a certain area of interest also means that you miss things in other areas.

Instead of using the human as the reference point, we can turn the question around and ask instead: given a video feed from a current generation Tesla, would a human be able to drive safely? I would suggest yes, though it would take some adjustment from existing driving habits. If a human could do it with Tesla's current sensors then an AI almost certainly can eventually do better with the same sensors.

I actually think the bigger issue with Tesla's approach (and I say this as a generally happy Tesla owner... though the timing is terrible since it had to be towed to the service center today) is their insistence on minimal maps/local data for driving. I understand why they want to do that for scaling purposes, but there are lots of intersections that are confusing for your average human the first time they come to it. I expect figuring out the traffic pattern in a confusing intersection is the harder AI problem.


With what kind of reliability can LIDAR data be predicted with vision nowadays?


besides vision, humans also have this thing called brain, and reasoning, and instincts, and being able to tell if the object in front of them is e.g. a roof of an overturned truck vs. empty space, etc, etc.

the key word is "etc", which expands into an infinite tail that no amount of big-data training on farms of GPUs will ever cover.

Humans and other animals have the ability to understand a scene and generalize from prior experience to an infinite set of new and unexpected circumstances: the "common sense" these dumb curve-fitting models are fundamentally lacking.


> humans also have this thing called brain, and reasoning, and instincts, and being able to tell if the object in front of them is e.g. a roof of an overturned truck vs. empty space, etc, etc.

Assuming the car is reasonably good at this, it has the advantage that it can see in every direction at once.

I don't think self-driving cars will ever be perfect, but I think they will quickly become less-lethal than the average human.


For the average case, perhaps. But we are not doing too badly for the average case either.

Let’s see how well a Tesla reacts to exceptional cases where actual decision making is required, where few data samples are available, etc. Statistics can be misleading with rare events.


I'd hope they have a "panic" mode that at least "safes" the car.

It's also worth pointing out that humans are notoriously bad at "exceptional cases" too. People sometimes full-stop on the freeway to let small animals cross, or panic while their eyes are still adjusting coming out of a tunnel.

I do think it would be interesting to compare the two (computers vs humans) and look at the venn diagram of crash cases.


I've read claims that they are desperately trying to hire.

https://mobile.twitter.com/TaylorOgan/status/140705191831739...


For the record these are some blatant & false FUD attempts.


Good to hear it from the source! Great talk by the way. As someone in the CS field but with no knowledge of CV/NN I still find these presentations very insightful.


There is a lot of that on HN whenever Tesla is the subject.


If Elon was any good at keeping his word it wouldn’t be an issue.


Please don't perpetuate tedious flamewars.


Anecdata, but both of the people I know who worked on Autopilot quit within 18 months of starting, citing extreme overwork and Musk micromanaging things. This lines up with that.


People have been claiming the same thing about SpaceX and Tesla for 20+ years now; it doesn't seem to stop them. A small sample of people you personally know isn't really representative.


Maybe it’s an unfortunate side effect of their success. Almost all early members of the team are now multi millionaires if they held their stock.



This appears to be the source given[1] for the claims: https://www.snowbullcapital.com/tracking-tesla-job-postings. No idea what to make of this methodology, which appears to be based on tracking re-use of req IDs. Someone must be scraping LinkedIn data, which would surely be more reliable.

[1]: https://mobile.twitter.com/TaylorOgan/status/140709356765125...


We detached this subthread from https://news.ycombinator.com/item?id=27584719.


[flagged]


I didn't even see this comment before you edited it. Please don't pull tricks like this.


Tricks? You detached a perfectly valid thread. Karpathy only responded to this thread with a shallow dismissal and addressed none of the other critiques. That should tell you something.


By "trick" I mean posting whatever you posted and then deleting it and claiming that it was "censored by dang" when you had zero basis for saying that. That's obviously abusive.

If you have a problem with, or a question about, why I moderated the thread a certain way, that's of course fine, but a different issue.



