While I share your skepticism about voice specifically, I still think documents are becoming increasingly multimedia, with video, interactive elements, graphs, tables, etc. Traditional word processors, which are the go-to authoring tool for many people, are woefully inadequate for handling such rich documents.
Most of the time, I feel like it would be easier to write HTML markup to style documents than it is to poke and prod at the Word formatting options. In college, when I was routinely having to write papers that needed footnotes and specific formatting styles, I got so frustrated that I nearly broke down and learned LaTeX. And, with HTML, it would be trivial to add in the types of multimedia elements you mention.
If only I could convince the non-technical people and management on my team to ditch Word docs... If nothing else, it would make version-control merge conflicts so much less painful to deal with.
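To make the HTML point concrete, here is a rough sketch of a paper-style page with a footnote, a table, and an embedded video. The file name demo.mp4, the text, and the numbers in the table are made-up placeholders, and this is only one of several reasonable ways to mark up footnotes.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Sample paper (illustrative only)</title>
</head>
<body>
  <article>
    <h1>Sample paper</h1>

    <!-- Footnote reference: a plain link to the note at the bottom -->
    <p>
      A claim that needs a source.<sup id="ref1"><a href="#fn1">1</a></sup>
    </p>

    <!-- Multimedia element; demo.mp4 is a hypothetical file -->
    <figure>
      <video controls src="demo.mp4" width="480"></video>
      <figcaption>Figure 1. A short demo clip.</figcaption>
    </figure>

    <!-- Placeholder data, not real results -->
    <table>
      <caption>Table 1. Placeholder values.</caption>
      <thead>
        <tr><th>Item</th><th>Count</th></tr>
      </thead>
      <tbody>
        <tr><td>Alpha</td><td>3</td></tr>
        <tr><td>Beta</td><td>5</td></tr>
      </tbody>
    </table>

    <!-- Footnotes live in an ordinary ordered list -->
    <footer>
      <ol>
        <li id="fn1">Placeholder source. <a href="#ref1">Back to text</a></li>
      </ol>
    </footer>
  </article>
</body>
</html>
```

Because everything here is plain text, a change to one footnote shows up as a one-line diff in version control.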
The funny thing is, what you're dreaming of is basically WordPerfect, the word processor that ruled the world for many years until Word came along and killed it.
WordPerfect didn't use HTML for formatting, because HTML hadn't been invented yet. But it did use its own set of formatting tags, which are more or less analogous. And it had a "Reveal Codes" key you could press at any time to show all the formatting tags that were active in the current document, letting you tweak their placement, add new ones by hand, etc. You could always get directly at the logic that was being used to format your document.
Word's big selling point was that, being Windows-based, it could abstract all that complexity away and let you format documents entirely visually. And it worked! Or at least, it worked well enough for most people most of the time. But the problem with Word's approach has always been that, when it doesn't work, it can be incredibly frustrating to figure out exactly why. Which was never a problem with WordPerfect, since there was no layer of hidden formatting magic sitting between you and your document.
I find that to work effectively on large Word documents you really need to understand the underlying object model. Everything is an object, and the objects fit together in reasonably consistent ways. But when you don't understand the model then it seems like things happen randomly to the document format for no good reason.
It also helps to turn on the display option "Show all formatting marks". That way you can kind of see the objects and boundaries. Even though you can't directly manipulate those markers like tags in TeX or something, they still give helpful clues as to what's happening.
(I'm not trying to defend the usability problems in Word, just pointing out some ways to live with them.)
Do you know any visual (dare I say WYSIWYG) HTML authoring tools that adhere to and emphasize semantically sensible markup and working efficiently with styles, especially in the context of using some predefined corporate templates? Something that captures the few good bits from Word and applies them to an HTML-based document model?
SharePoint was a great asset for my proposal team over in Finance for version control issues and Word compatibility. Not a cure-all, but it worked for the non-technical folks well enough (and saved a lot of heartburn).
If only we had a decent way to print those HTML pages to PDF. But it seems like print CSS is kind of neglected. The stuff barely does what it should (page-break and friends).
Nor can you really use min-height: 20mm. A pity. And the nicer PDF creators (WeasyPrint) don't understand the newer CSS features like flexbox or grid.
So we are still stuck. No easy way to force a footer to the end of a printed page via css.
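For reference, this is roughly what the print rules under discussion look like when written out. It is only a sketch: the @bottom-center margin box and the page counters come from the CSS Paged Media specifications, and whether a given browser or PDF tool honors them varies, which is exactly the complaint above. The page size, margins, and footer text are arbitrary choices for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Print footer sketch (illustrative only)</title>
  <style>
    @page {
      size: A4;
      /* Leave room at the bottom of each page for the footer */
      margin: 20mm 15mm 25mm 15mm;

      /* Page-margin box: renderer support varies */
      @bottom-center {
        content: "Page " counter(page) " of " counter(pages);
        font-size: 9pt;
      }
    }

    /* Start each top-level section on a new page.
       The page-break-* properties are the older spelling of break-*. */
    h1 {
      page-break-before: always;
      break-before: page;
    }
    h1:first-of-type {
      page-break-before: auto;
      break-before: auto;
    }

    /* Try to keep tables and figures from splitting across pages */
    table, figure {
      page-break-inside: avoid;
      break-inside: avoid;
    }
  </style>
</head>
<body>
  <h1>Section one</h1>
  <p>Body text for the first section.</p>

  <h1>Section two</h1>
  <p>Body text for the second section.</p>
</body>
</html>
```

Where margin boxes are not supported, a commonly used fallback is a footer element with position: fixed and bottom: 0 inside an @media print block, though how it repeats across pages also differs between renderers.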
Actually I will do my best to agree with you outright on the multimedia front! I'm a huge fan of Medium (70+ pieces on there) because of the freedom to intermingle material with the text. It's quite powerful.
However, I use those elements as supplements to the written word, not vice versa (e.g., blog spam or slide shows with large images, little text, and big ads).