
I've been going through the first edition of Art of Intel x86 Assembly and there was a mention that DEC used octal numbers (PDP-11 uses octal) and a cross reference of the tar wiki page indicates that the first tar was written for Version 7 Unix, which was made for the PDP-11.

Gnu tar most likely inherited the octal system in order to retain compatibility with the original tar utility.



Fun fact[1]: the x86 instruction encoding (pre-Pentium or so) is itself octal in organization. This is immediately apparent in the 2+3+3-bit structure of the ModR/M and SIB bytes, but looking at the opcode table that way is also helpful. The manuals stubbornly describe it in hex, though; you have to go way, way back to the Datapoint 2200 “programmable terminal” to find a manual that actually talks in octal, and that wasn’t even made by Intel! (The 8008 was commissioned from Intel by the original manufacturer as a “2200 on a chip”.)
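
To make the 2+3+3 split concrete, here is a minimal C sketch that pulls a ModR/M byte apart into its three fields. The struct and function names are made up for illustration and don't come from any Intel manual or assembler.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 ModR/M byte: 2 + 3 + 3 bits, one octal digit each.
   (Illustrative struct, not taken from any real header.) */
struct modrm {
    unsigned mod; /* bits 7-6 */
    unsigned reg; /* bits 5-3 */
    unsigned rm;  /* bits 2-0 */
};

/* Split a ModR/M byte into its three fields using octal masks. */
struct modrm split_modrm(uint8_t byte)
{
    struct modrm f;
    f.mod = (byte >> 6) & 03;
    f.reg = (byte >> 3) & 07;
    f.rm  = byte & 07;
    return f;
}

/* Print the byte in hex and in octal next to the decoded fields. */
void show_modrm(uint8_t byte)
{
    struct modrm f = split_modrm(byte);
    printf("hex %02X = octal %03o -> mod=%o reg=%o rm=%o\n",
           (unsigned)byte, (unsigned)byte, f.mod, f.reg, f.rm);
}
```

Calling show_modrm(0xD8) shows the point: in hex the byte is D8, which tells you little, while in octal it is 330, and the three digits are exactly mod=3, reg=3, rm=0.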

Weird Intel history aside, everybody used octal in the old days. I don’t actually know why, but I suspect this goes back to how early binary computers used 36-bit words (later 18- or 12-bit ones) because that’s how many bits you need to represent 10 decimal digits (and a sign), the standard for (electro)mechanical arithmometers which (among other devices) they were trying to displace. So three-bit groupings made more sense than four-bit ones. Besides, as far as instruction encodings go, an 8-way mux seems like a reasonable size if a 16-way one is too expensive.

(Octal on the 16-bit PDP-11 is probably a holdover from the 12-bit PDP-8? Looking at the encoding table, it does seem to be using three- and six-bit groups a lot.)

[1] https://news.ycombinator.com/item?id=30409889


> but I suspect this goes back to how early binary computers used 36-bit words (later 18- or 12-bit ones)

Actually the PDP-1 was an 18-bit machine; its successor the PDP-6 had 36-bit words (and 18-bit addresses -- yes it was explicitly designed to be a Lisp machine in 1963). Other DEC machines like the PDP-7 (on which Unix was developed) were 18-bit machines (Multics used an 18-bit non-DEC architecture).

Probably the most popular minicomputer ever, the PDP-8, used 12 bits, as did a bunch of DEC industrial control machines.


There's a story about Grace Hopper, who learned octal for the BINAC. She later had problems balancing her checkbook before she realized she was doing so in octal.

That led her to conclude it would be better to teach the computer to handle decimal than to force everyone to use octal.

A lot of the old systems used 6-bit character codes (https://en.wikipedia.org/wiki/Six-bit_character_code ), including BCD six-bit codes, making 6*n word sizes more appropriate.


My dad used to give me maths problems when I was little but tell me to solve them in specific bases (typically decimal, octal or hex, but not always). Doing long division by hand in hex gives you a feel for how the numbers relate.

The anecdote about decimal makes sense — she was the key to the design of COBOL.

I used quite a few machines with six bit characters into the mid 1980s.


While GNU tar inherits this from the original tar, it doesn't seem to be something intrinsic to the PDP-11 or DEC given that the earlier Unix 'tap' archival system uses 'plain binary numbers ("base 256")', not octals.

Here's the V7 tar.c: https://github.com/dspinellis/unix-history-repo/blob/Researc...

and its documentation: https://github.com/dspinellis/unix-history-repo/blob/Researc...

The C code clearly shows the octal (I've never used "%o" myself!):

  sscanf(dblock.dbuf.mode, "%o", &i);
  sp->st_mode = i;
  sscanf(dblock.dbuf.uid, "%o", &i);
  sp->st_uid = i;
  sscanf(dblock.dbuf.gid, "%o", &i);
  sp->st_gid = i;
  sscanf(dblock.dbuf.size, "%lo", &sp->st_size);
  sscanf(dblock.dbuf.mtime, "%lo", &sp->st_mtime);
  sscanf(dblock.dbuf.chksum, "%o", &chksum);
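
For anyone who hasn't met "%o" either, here is a rough modern equivalent of what those sscanf calls do, as a sketch rather than anything taken from tar itself. The helper name parse_octal_field is hypothetical, and the field widths in the usage note (8 bytes for mode, 12 for size) are my assumption about the header layout.

```c
#include <stdlib.h>
#include <string.h>

/* Parse an octal ASCII header field of at most `len` bytes.
   Hypothetical helper, not part of any tar implementation. */
unsigned long parse_octal_field(const char *field, size_t len)
{
    char buf[32];

    if (len >= sizeof buf)
        len = sizeof buf - 1;
    memcpy(buf, field, len);
    buf[len] = '\0';              /* the field may lack a terminating NUL */

    /* Base 8: parsing stops at the first non-octal character,
       such as a trailing space. */
    return strtoul(buf, NULL, 8);
}
```

With this, parse_octal_field("0000644 ", 8) gives 420 (the familiar rw-r--r-- mode) and parse_octal_field("00000001750 ", 12) gives a size of 1000 bytes.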

Now, older versions of Unix supported DECTape, going back to the first manual from 1971 (see the "tap" command at http://www.bitsavers.org/pdf/bellLabs/unix/UNIX_ProgrammersM... , page "5_06", which is 94 of 211 - the command-line interface is clearly related to tar).

Here's the tp.5 format description from V6 at https://github.com/dspinellis/unix-history-repo/blob/Researc...

  DEC/mag tape formats
   ...
  Each entry has the following format:
    path name 32 bytes
    mode 2 bytes
    uid 1 byte
    gid 1 byte
    unused 1 byte
    size 3 bytes
    time modified 4 bytes
    tape address 2 bytes
    unused 16 bytes
    check sum 2 bytes

These are in essentially the same order as the "struct file_header" for tar shown in the linked-to essay.

But you can see at https://github.com/dspinellis/unix-history-repo/blob/Researc... that mtime is an int[2], so 4 bytes (yes, 16-bit integers; confirm with https://github.com/dspinellis/unix-history-repo/blob/Researc... showing i_mode is an integer).

Which means this older "tap" DECTape support uses 'plain binary numbers ("base 256")', not octal.
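
To spell out what "base 256" means next to tar's octal text, here is a small C sketch that reads a few raw bytes as one number. The function name is invented, and the most-significant-byte-first order is an assumption for illustration only; I have not checked how the PDP-11 code actually lays out the 3-byte size.

```c
#include <stddef.h>

/* Interpret `len` raw bytes as a single base-256 number.
   Assumption: most significant byte first, chosen purely for
   illustration and not verified against the tp format. */
unsigned long decode_base256(const unsigned char *bytes, size_t len)
{
    unsigned long value = 0;

    for (size_t i = 0; i < len; i++)
        value = value * 256 + bytes[i];   /* each byte is one base-256 digit */
    return value;
}
```

Under that assumption a size of 1000 fits in the three bytes 00 03 E8, where tar would instead store the twelve ASCII characters "00000001750 ".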



