Well, here’s my personal infodump / rant:
Plain text and markup languages are seriously the most versatile, powerful way of storing documents electronically and I wish more people understood that outside of software development. Heck, I wish within software development there was more focus on this. I guarantee you that a significant chunk of what you do with your phone or PC or whatever would be better if it was delivered to you as some text files in a markup language. I want to actually kill paper off for good.
I buy a lot of TTRPG rulebooks, usually as PDFs because I’d rather not lug books around. But the thing is… PDFs SUCK outside of their real original use case. Reading them rather than printing them? Awful. They don’t adjust to fit your screen, and the text within them is obfuscated to other software. What’s frustrating is that of course, whoever published that PDF has a plain-text version of that document somewhere, because someone wrote it. Gimme. Please.
So I’m gonna illustrate why plain text and markup languages are awesome now.
I already outlined the biggest reason. You can’t really extract text from a PDF sanely. Regardless of how you feel about totally fixed layouts, the fact that copied or extracted text comes out mangled so often? That’s because your PC doesn’t see it as text, really. A PDF mostly stores glyphs placed at coordinates on a page, so software has to piece it all back together into words and lines in the right order. When PCs can actually recognize text, it becomes searchable. Users can display it as they please. Accessibility tools can mess with it. Power users can sort and store things in pretty much infinite ways, and command-line tools can interact with them…
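To make the “searchable” bit concrete, here’s a minimal sketch in Python, assuming a rulebook has been split into Markdown files in one folder. The function name search_notes, the .md extension, and the folder layout are all made up for illustration; nothing here is tied to any particular tool.

```python
from pathlib import Path


def search_notes(root, term):
    """Return (file name, line number, line) for every line containing term.

    Hypothetical helper: assumes the notes are UTF-8 Markdown files
    somewhere under the folder `root`.
    """
    hits = []
    needle = term.lower()
    # Walk every .md file under root, in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive substring match, nothing fancier.
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling search_notes("rulebook", "initiative") would list every line mentioning initiative across the whole folder, with file names and line numbers. Getting the same answer out of a PDF means getting the text back out first.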
And that’s without mentioning the world’s greatest note-taking / knowledge compilation tools like Obsidian and Emacs Org mode. When documents are using actual text, oh my god. I can turn whole books into searchable, annotatable notes and have bits of the text link to other stuff as I like. (Speaking of, what the hell happened to hypertext fiction? It’s still awesome. For those who don’t know, people were exploring narrative ideas, or at least formats, that make heavy use of hypertext features. While it only halfway counts, this story is rad as hell.)
Check this out. This is an HTML5 version of a legendary programming textbook, and it is beautiful. No matter what size screen you’re on, it’ll look pretty and remain completely readable. And despite that, it also handles all the formatting that used to be exclusive to PDFs.
…But there’s a problem! HTML5 documents like this can’t really be packaged into one file and read later, as-is. That’s what EPUB is for! EPUB3, as far as I can tell, pretty much “wraps around” the markup / formatting capabilities of HTML5 so that you can zip it all up and have it easily usable offline. This is awesome. This is why EPUB is the standard format for ebooks. And yet EPUB files have a reputation for sucking.
Y’know why? There’s literally no one teaching anyone how the hell to make EPUBs properly. There are so few places I can find on the web with any info on this. W3C themselves, this amateur tutorial which is the closest thing to a real introduction I’ve found, and this one book O’Reilly published. So I’m pretty sure half of the time publishers aren’t making proper EPUBs themselves, just auto-converting things. Probably. I don’t know.
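For what it’s worth, the “zip it all up” part is pretty literal: an EPUB is a ZIP archive with a few required files inside. Here’s a rough sketch of the skeleton in Python, using only the standard zipfile module. The OEBPS folder name, the file names, the title, and the all-zero identifier are placeholder assumptions, and this is an illustration of the container layout, not a file that has been run through a validator like EPUBCheck.

```python
import zipfile

# Tells the reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, list of files (manifest), reading order (spine).
# Title, identifier, and date are placeholders.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Placeholder Rulebook</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# Table of contents, itself just XHTML.
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol>
    </nav>
  </body>
</html>
"""

# The actual content: ordinary XHTML.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body>
    <h1>Chapter 1</h1>
    <p>Rules text goes here.</p>
  </body>
</html>
"""


def write_epub(path):
    """Write a bare-bones EPUB 3 skeleton to `path` (hypothetical helper)."""
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry goes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, body in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/nav.xhtml", NAV_XHTML),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
        ]:
            zf.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
```

Everything a reader actually sees lives in chapter1.xhtml, which is plain XHTML you could open in a browser. The rest is packaging.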
You know what’s funny? I would hate to be a programmer as a job. I tried it. Hated it.
Soooo much this!!!
I forgot to mention Vimwiki, thanks for bringing it up.