Hacker News

> The RAD750 has 150nm to 250nm minimum feature sizes, a Ryzen Zen 3 chip has a feature size of 7nm,

Yup. And while the transistor sizes haven't really shrunk by the 35x that this suggests, they've still shrunk by a lot. And, of course, the area has shrunk by this amount squared.

> causing all kinds of interesting and exciting problems

Sometimes super exciting. Latchup can turn the whole package into a white-hot parasitic transistor connecting Vcore to ground at low impedance. Rad-hard variants can largely eliminate this possibility.



This does make one wonder about an architecture that has 4 or more identical processors hooked up to the same inputs and two watchdogs watching them (and each other). If any processor starts disagreeing with the rest it is instantly reset. If any processor starts to draw too much or too little current it is reset. If either watchdog stops responding to the other it is reset.

Keeping the working data synchronized would be the real trick. One could imagine all of these CPUs are hooked to a single bank of redundant and ECC stabilized memory, and all access goes through the watchdog processes which will only let through the ones that are in agreement.

The end result could be a system that is lighter and faster than the traditional RAD hardened system simply because it's built on a smaller process. The downside is the enormous complexity of the watchdog systems, which would be very expensive to get right. Also, synchronization is one of the hardest problems in computer science. It's basically cache invalidation on steroids.
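
A rough sketch of one watchdog cycle for the scheme above, assuming each CPU hands the watchdog one output value and one supply-current reading per cycle. The function names and the current window are made up for illustration, and the two watchdogs checking each other are not modeled.

```python
from collections import Counter

# Hypothetical supply-current window in amps; real limits depend on the part.
I_MIN, I_MAX = 0.05, 2.0


def vote(outputs):
    """Return (majority value, indices of dissenting CPUs).

    The value is None when no strict majority exists, in which case
    every CPU is listed as a dissenter.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    return value, [i for i, out in enumerate(outputs) if out != value]


def supervise(outputs, currents):
    """One watchdog cycle: the value allowed through to memory, plus CPUs to reset."""
    value, dissenters = vote(outputs)
    out_of_window = [
        i for i, amps in enumerate(currents) if not I_MIN <= amps <= I_MAX
    ]
    return value, sorted(set(dissenters) | set(out_of_window))
```

With four CPUs, `supervise([7, 7, 9, 7], [0.5, 0.5, 0.5, 3.0])` lets 7 through and flags CPU 2 (disagreement) and CPU 3 (over-current) for reset. A 2-2 split lets nothing through, which is one reason an odd number of voters is attractive.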


> This does make one wonder about an architecture that has 4 or more identical processors hooked up to the same inputs and two watchdogs watching them (and each other). If any processor starts disagreeing with the rest it is instantly reset. If any processor starts to draw too much or too little current it is reset. If either watchdog stops responding to the other it is reset.

This is a common set of techniques in critical systems. The smallsat built by students that I mentored didn't have voting, but it had processors with moderately sophisticated watchdogs performing mutual-power-monitoring and simple hardware failsafes. Aerospace control systems often run in lockstep and have voting, etc.

> Keeping the working data synchronized would be the real trick. One could imagine all of these CPUs are hooked to a single bank of redundant and ECC stabilized memory, and all access goes through the watchdog processes which will only let through the ones that are in agreement.

Typically you make the memory redundant too, and just ensure that input and outputs are common and all code is deterministic. e.g. There's Tandem / HP NonStop which use these techniques.

> The end result could be a system that is lighter and faster than the traditional RAD hardened system simply because it's built on a smaller process.

Handling upsets by voting makes a lot of sense. But radiation can cause permanent damage to small-geometry circuits. And even with fast mechanisms to crowbar power in the event of a latchup, you're not really sure that you'll save the day.

It's a great way to handle things on the cheap for cutting edge payloads and research projects, though.


> Typically you make the memory redundant too, and just ensure that input and outputs are common and all code is deterministic. e.g. There's Tandem / HP NonStop which use these techniques.

Tandem was software fault tolerant - the Stratus machines were fully hardware fault tolerant and ran code in parallel on separate logical CPUs to test for faults. Truly neat machines. Also, it was a PL/1 based machine - even the OS.


Yes, you're right. Turns out this doesn't apply to all of NonStop. I was thinking of the Tandem NonStop CLX, which my brother worked on and which was lockstepped.


This is rather like how the old Stratus superminis worked.

12(?) Motorola 68K chips were wired into multiple 'logical' CPUs, and programs executed in parallel on each. If a discrepancy arose, the majority won and the 'wrong' logical CPUs were taken out of service. It would also automatically call Stratus's remote service, report the failure, and order a replacement board :-)


> a white-hot parasitic transistor connecting Vcore to ground at low impedance

In plain English: an internal short-circuit.



If we compare the RAD750 against the 5nm Apple M1, the newer chip has about 1660 times as many transistors per mm^2. Minimum feature size (i.e. line widths) may not have kept pace with the nanometer-based fab node names, but transistor sizes definitely have.


Eh. Don't compare the RAD750. It's better to compare something that's not domain-specific and rad-hard.

Compare the PPC 740, which was ~300,000 transistors/mm^2 at 260nm. Apple M1 is ~130,000,000 transistors/mm^2 at 5nm. (260/5)^2 =~ 2700x, but the actual difference is ~430x.

So it's not as bad as line widths, but it still isn't quite keeping pace.
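
The arithmetic above as a quick check, using the approximate figures from this comment rather than datasheet values. The function names are just for illustration.

```python
def ideal_density_gain(old_nm, new_nm):
    """Density gain if transistor area shrank with the square of the node name."""
    return (old_nm / new_nm) ** 2


def actual_density_gain(old_per_mm2, new_per_mm2):
    """Density gain computed from reported transistors per mm^2."""
    return new_per_mm2 / old_per_mm2


ideal = ideal_density_gain(260, 5)                   # PPC 740 node vs. M1 node
actual = actual_density_gain(300_000, 130_000_000)   # approximate densities
shortfall = ideal / actual

print(f"ideal {ideal:.0f}x, actual {actual:.0f}x, shortfall {shortfall:.1f}x")
# prints "ideal 2704x, actual 433x, shortfall 6.2x"
```

So on these numbers, density lags the node-name scaling by roughly a factor of six.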



