
> there’s not really a difference between a database and a file system. Fundamentally they do the same thing, and are sort of just optimizations for particular problem-sets.

Conceptually that is quite true, though the domain dependencies make a lot of the code end up looking quite different.

But the first true database (pre-relational!) was developed for SABRE, American Airlines' computerized reservation system, in the early 1960s. Before that tickets were issued manually and the physical structure of the desks and filing systems used to make reservations reflected the need!

Unfortunately I can't find the paper I read (back in the mid 80s) on the SABRE database, but I remember that the record size (which is still used today!) was chosen based on the rotational speed of the disk and seek latency. Certainly there was no filesystem; the concept of a filesystem barely existed, though Multics developed a hierarchical filesystem (intended to be quite database-like, as it happens) around the same time. The database directly manipulated the disk. I don't know when that changed -- perhaps in the 1970s?

Like I said I can't quickly find the paper on the topic, but here's a nontechnical discussion with some cool pictures: https://www.sabre.com/files/Sabre-History.pdf. A search for "American Airlines SABRE database history" finds some interesting articles and a couple of good Wikipedia pages.



I think direct manipulation never went away, but the abstractions that were provided for general use were too useful to pass up for most workloads.

Some kinds of storage, like cloud-scale object storage, use custom HDD firmware and custom on-disk formats instead of filesystems (roughly 2005-era tech). We also have much newer solutions that work directly on disks, like HMR (not to be confused with HAMR or HAMMER2), where the host manages the recording of data on the disk. There are some generally available systems for that, but we also have articles like this: https://blog.westerndigital.com/host-managed-smr-dropbox/ (which mostly focuses on SMR, but this works on CMR too).

As for the record size in the DB vs. disk attributes, that's probably not used like that anymore, but I do know that filesystem chunks/extents/blocks are calculated and grouped to profit from optimal LBA access. If you run ZFS, you can have it auto-detect, or manually set, the ashift size so that it matches the actual on-disk sector size. This was especially relevant when 512e and 4Kn (and the various manufacturers' 'real' and 'soft' implementations) weren't reliable indicators of the best sector access size strategies.
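
To make the ashift point concrete: ashift is the base-2 logarithm of the sector size ZFS assumes, so 512-byte sectors correspond to 9 and 4096-byte sectors to 12. A minimal Python sketch of that arithmetic follows; the helper names are made up for illustration and are not part of any ZFS tooling.

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return log2(sector_bytes); hypothetical helper, not a ZFS API."""
    # Sector sizes must be positive powers of two for ashift to be defined.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def is_aligned(offset_bytes: int, ashift: int) -> bool:
    """True if a byte offset falls on a sector boundary for this ashift."""
    return offset_bytes % (1 << ashift) == 0
```

With a value that is too small (say 9 on a drive with 4096-byte physical sectors), a write at offset 512 is "aligned" as far as the filesystem is concerned but still lands in the middle of a physical sector, which is the case the auto-detection is trying to avoid.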


I could be wrong, but I sort of think that when I learned Oracle back in school (mid-2000s), it supported dropping a database on a raw block device. So it's been around a long time, but would be uncommon in some tech circles.


Yeah, until the mid '00s you would run your db directly to raw disk devices, both in order to optimize the use of larger contiguous disk regions (disk drives were slow in those days!) and, crucially, because if/when your server went down hard any pending OS-buffered writes would result in a corrupted database, lost data, and lengthy rebuilds from logs (generally after having to do a long fsck recovery just to get back into the OS). It wasn't until journaled filesystems became common and battle-tested that you saw databases living in the filesystem proper.


I believe the "least proprietary" interface to this, that looks like it'll cope with both SMR rotating disks and flash, is Zoned Namespaces.

With ZNS, you have a fixed number of fixed-size append-only zones, each of which can only be erased as a whole. It starts to look a lot like a typical LSM tree.

https://zonedstorage.io/docs/introduction/zns
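
A toy model of that zone behaviour in Python, purely as a sketch: the `Zone` class and its method names are hypothetical, not the ZNS command set or any real library. It only illustrates the three rules described above: writes land at the write pointer, nothing is overwritten in place, and space comes back only by resetting the whole zone.

```python
class Zone:
    """Hypothetical in-memory stand-in for one append-only zone."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # fixed size, chosen at creation
        self._data = bytearray()      # everything written so far

    @property
    def write_pointer(self) -> int:
        # The only position where the next write may land.
        return len(self._data)

    def append(self, payload: bytes) -> int:
        """Append payload and return the offset it was written at."""
        if self.write_pointer + len(payload) > self.capacity:
            raise OSError("zone full")
        offset = self.write_pointer
        self._data += payload
        return offset

    def read(self, offset: int, length: int) -> bytes:
        # Reads are random-access; only writes are sequential.
        return bytes(self._data[offset:offset + length])

    def reset(self) -> None:
        # The whole zone is erased at once; there is no partial erase.
        self._data.clear()
```

The LSM resemblance shows up as soon as you want to update something: you append a new version elsewhere, remember the newer offset, and eventually copy the live data into a fresh zone so the old one can be reset.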


Good old ISAM (Indexed Sequential Access Method) before DASD (Direct Access Storage Device) took over. (Aren't you glad IBM didn't win the "name the things" contest? :-))

I'm going to guess that by "domain dependency" you're talking about how

   handle = open("foo.txt");
looks semantically different from

   err = db->exec("SELECT * from DIRECTORY where NAME = 'foo.txt';", &result);
So yes, in that regard they certainly "feel" different, although at some point I needed a file system for an application and built a wrapper layer for sqlite that basically gave you open/read/write/delete calls; it just filled in all the other stuff to convert specialized filesystem calls into general purpose database calls.[1]

The best thing you can say about the way UNIX decided to handle files was that it forced people to either use them as is or make up their own scheme within a file itself (and don't get me started on the hell that is 'holey' files).

[1] In my case the underlying data storage was a NAND flash chip, so the result you got back, which was nominally a FILE* like stdio, had the direct address on flash of where the bits were. Read-modify-write operations were slow since each one effectively copied the file (preserving flash sector write lifetimes).
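
For the curious, a wrapper of that general kind can be sketched in a few lines of Python with the standard `sqlite3` module. This is not the original code (which sat on NAND flash and returned something FILE*-like); the class name, the table layout, and the whole-file read/write semantics are all assumptions made for illustration.

```python
import sqlite3


class SqliteFileStore:
    """Hypothetical file-like API layered over a single SQLite table."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        # One row per "file": the name is the key, the contents are a blob.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS directory ("
            "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def write(self, name: str, data: bytes) -> None:
        # Whole-file replace; the connection context manager commits.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO directory (name, data) VALUES (?, ?)",
                (name, data),
            )

    def read(self, name: str) -> bytes:
        row = self.db.execute(
            "SELECT data FROM directory WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return bytes(row[0])

    def delete(self, name: str) -> None:
        with self.db:
            cur = self.db.execute(
                "DELETE FROM directory WHERE name = ?", (name,)
            )
        if cur.rowcount == 0:
            raise FileNotFoundError(name)
```

Usage is about what you'd expect: `store.write("foo.txt", b"hello")` followed by `store.read("foo.txt")` gives the bytes back, and the caller never sees any SQL.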


Funny enough, DASD is now, for the first time, more accurate than "disk".

But yes. Talking to mainframe people is a bit like talking to astronauts, in that their jargon is completely impenetrable to the uninitiated.


In addition to disks, IBM direct-access storage options available in the middle sixties included a variety of magnetic drum devices and the short-lived, tape-based Data Cell Drive[1].

[1] https://en.wikipedia.org/wiki/IBM_2321_Data_Cell


I probably should have added “in a very long time” :-)

I love the exotic tech in those. Mainframes represent a different branch in the evolution of computers, very much unlike the machines on our hands and desks.


I love that Amdahl mainframe (page 6) with that humongous 20" CRT console.

Most likely showing a 24x80 3270 console session, with 8x16 character cells (if that much), but, still, quite awesome.

I'm not aware of any that ended up in a museum, sadly.

For those with sufficiently cool IEEE memberships, there is quite a lot about Sabre in the Annals of the History of Computing magazine archives.

https://ieeexplore.ieee.org/document/397059

https://ieeexplore.ieee.org/document/1114868

https://ieeexplore.ieee.org/document/279229, which is not about Sabre, but Air Canada's system.

If you think about it, modern IBM mainframes have a lot of weirdness about their filesystems and the concept of a file. Those machines are very alien for people who grew up on Unix.


seems like a good time to remind people that using sci-hub might be unlawful and/or blocked in your country


Looks like some people failed to understand your comment.


Yep and this is why you still get a six character Passenger Name Record (PNR) for your flight booking.


> Certainly there was no filesystem [...] I remember that record size

Sounds like a record-oriented filesystem to me.

Which comes as no surprise as there is no difference between a database and a filesystem.


Not disk drives but tape drives. Most likely these:

https://en.m.wikipedia.org/wiki/IBM_729


SABRE was specifically disk drives, though given the capacity of drives in those days I'm sure tapes were very important (and you see a lot of them in the photos from the link I included)


This is somewhat mysterious. Sabre launched in 1960 on a pair of IBM 7090s (a system itself launched only a year prior), but the first HDD compatible with the 7090 was, as far as I can tell, the well-known IBM 1301, which didn’t ship until 1962. Perhaps the record size was designed based on the anticipated specs of the 1301? Or perhaps they wrote a new storage layer when they got disks. Either way, I’m sure, as you say, they optimized their record size for those very expensive disks.


And I thought SABRE sells printers and acquired Dunder Mifflin



