A monochrome CRT tube and a black-and-white TV tube are the same physical device, just packaged and sold to different customers. The bandwidth of the electronics driving the electron beam sets a maximum modulation frequency and hence a minimum line-pair spacing. That minimum line-pair spacing sets the effective pixel separation just as directly as the shadow-mask pitch does on a color CRT. The bandwidth was set by the TV standards, which were in turn set by what was economically feasible to manufacture in those days.
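As a rough sanity check on the bandwidth argument: one full cycle of the video signal can render at most one light/dark pair, i.e. two "pixels", so twice the bandwidth times the active line time bounds the horizontal pixel count. A minimal sketch, using the standard NTSC figures (~4.2 MHz luminance bandwidth, ~52.6 µs of visible line time); the helper name is mine, not from any standard:

```python
# Rough upper bound on horizontal resolution from video bandwidth.
# One full cycle of the video signal can render at most one
# light/dark transition pair, i.e. two "pixels".

def max_pixels_per_line(bandwidth_hz, active_line_s):
    return 2 * bandwidth_hz * active_line_s

# Standard NTSC figures: ~4.2 MHz luminance bandwidth,
# ~52.6 us of active (visible) line time.
pixels = max_pixels_per_line(4.2e6, 52.6e-6)
print(round(pixels))  # roughly 440 pixels across the visible line
```

That ~440-pixel bound is why broadcast-bandwidth electronics cap the effective dot pitch regardless of how fine the phosphor itself is.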
To be precise, the flyback transformer produces a sawtooth waveform, which generates the raster scan (lines are actually slightly slanted, dropping from left to right). While there's a preset frequency, a TV set will synchronize with the input signal, and there's some robustness in this: most 50 Hz CRT TVs will happily display a 60 Hz signal, as seen in the PAL/60 pseudo-standard.

Horizontally, screen activation is directly bound to signal intensity (hence "analog TV"), so resolution is limited only by the response of the phosphor used and by the amplification circuitry (for a character display, you'd want your signal ramps not to become too shallow). There's also some flexibility in the overall number of lines displayed: while there are nominally 262 scan lines in an NTSC field, Atari VCS games historically varied between 238 and 290.

There are no hard specs for a monochrome tube, especially when driving it directly, without going through modulation/demodulation stages. However, if you're using only odd fields, as in non-interlaced video, there will be a visible vertical separation between lines (due to the sawtooth waveform generating the raster scan), resulting in visible scan lines. Notably, most of the constraints resulting from this can be addressed in the type design of the display characters.
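The flexibility in line count can be quantified: if the horizontal rate is held at the NTSC standard ~15.734 kHz, changing the number of lines per non-interlaced field simply changes the field rate the TV must lock to. A small sketch (the 238 and 290 figures are the Atari VCS extremes mentioned above):

```python
# Field rate as a function of lines per progressive (non-interlaced)
# field, at a fixed NTSC horizontal scan rate.
H_RATE = 15_734.26  # Hz, NTSC horizontal line frequency

def field_rate(lines_per_field):
    return H_RATE / lines_per_field

for lines in (238, 262, 290):
    print(lines, round(field_rate(lines), 2))
```

The spread runs from roughly 66 Hz down to roughly 54 Hz, which is exactly the kind of vertical-sync variance the sets of the era would tolerate.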
No, terminals rarely if ever used TV-type tubes (before the home computer era had people using actual TVs), usually not even the same phosphors, nor the same signal rates, because TV rates would make no sense for a terminal. 'Manufacture economically' was not the same thing for a business product as for a home appliance: an IBM 3278 cost $13,000 in today's dollars.
This isn't lost-clay-tablet stuff. There are manuals and schematics online.
The VT220 manual I found suggests the same CRT scan rates: 15.75 kHz horizontal, for example, with the reference frequency coming from a very TV-specific Motorola IC.
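For what it's worth, the 15.75 kHz rate does the arithmetic work for a text terminal too: at a 60 Hz refresh it yields 262.5 lines per field, and a 24-row screen needs only 240 active lines if each character cell is 10 scan lines tall, leaving room for vertical blanking. (The 10-scanline cell is an assumption for illustration here, not a figure from the manual.)

```python
# Why a TV-derived 15.75 kHz horizontal rate suits a 24-row terminal.
H_RATE = 15_750      # Hz, nominal TV horizontal rate
REFRESH = 60         # Hz, vertical refresh

lines_per_field = H_RATE / REFRESH          # 262.5
rows, scanlines_per_row = 24, 10            # assumed character cell height
active_lines = rows * scanlines_per_row     # 240

print(lines_per_field, active_lines)
# 240 active lines fit within 262.5, with ~22 left for blanking/retrace
```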
Different phosphors, for sure. My terminals (I've had three or four, DEC and others) had a very visible afterglow: very stable and flicker-free compared to my CRT TVs.
The VT220 was a very late terminal, though, and I was trying to focus on early models, where the line-length decision was not just inherited from the previous CRT model (i.e., the VT220 had 80 columns because the VT100 did, because the VT50 did).