Anjou
anjou.wtf
@anjou.wtf
Openly a professional programmer. Less open personally. Queer and tired.
@lookitup.baby I need you to look at this and feel the same thing I do.
May 14, 2025 at 5:00 AM
I want a pathlib.Path for URLs, so I'm writing one. I wonder whether it's the sort of thing I can push to have added to the standard library.
March 13, 2025 at 2:52 PM
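For anyone wondering what a `pathlib.Path` for URLs might look like, here's a minimal sketch. The `URLPath` class and its behavior are my own assumptions, not the library being written here. It wraps `urllib.parse.urlsplit`, uses `posixpath` to do path math on the URL's path component, and mirrors a few `pathlib` members (`/`, `name`, `suffix`, `parent`).

```python
from __future__ import annotations

import posixpath
from urllib.parse import SplitResult, urlsplit, urlunsplit


class URLPath:
    """Hypothetical pathlib-style wrapper: path operations act on the URL's path component."""

    def __init__(self, url: str) -> None:
        self._parts: SplitResult = urlsplit(url)

    def _with_path(self, path: str) -> URLPath:
        # Build a URL for a different resource: swap the path and drop the
        # query and fragment, which don't carry over meaningfully.
        return URLPath(urlunsplit(self._parts._replace(path=path, query="", fragment="")))

    def __str__(self) -> str:
        return urlunsplit(self._parts)

    def __repr__(self) -> str:
        return f"URLPath({str(self)!r})"

    def __truediv__(self, segment: str) -> URLPath:
        # pathlib-style join: "/docs" / "guide.html" -> "/docs/guide.html"
        # whether or not "/docs" has a trailing slash (unlike urljoin)
        return self._with_path(posixpath.join(self._parts.path or "/", segment))

    @property
    def name(self) -> str:
        # Final path segment, ignoring a trailing slash like pathlib does
        return posixpath.basename(self._parts.path.rstrip("/"))

    @property
    def suffix(self) -> str:
        return posixpath.splitext(self.name)[1]

    @property
    def parent(self) -> URLPath:
        # Drop the last segment; the root path "/" is its own parent
        parent_path = posixpath.dirname(self._parts.path.rstrip("/")) or "/"
        return self._with_path(parent_path)
```

One design choice worth calling out: joining with `/` follows `pathlib` rules, so `"/docs"` and `"/docs/"` both join to `"/docs/guide.html"`. `urllib.parse.urljoin` instead treats a path without a trailing slash as a file and replaces its last segment, which is right for resolving links but surprising if you expect `pathlib` behavior.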
Alright, I guess I'll add a post-processor that just looks for the wrong quotes and fixes them. It's not elegant, but whatever.
March 10, 2025 at 3:48 PM
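A post-processor like this can be sketched in a few lines. This is an assumption about the approach, not the code from this thread. `fix_double_quotes` ignores whatever direction a quote mark already has and re-decides it from the preceding character: start of text, whitespace, an opening bracket, or a dash means opening; anything else means closing. Because it never tries to pair quotes up, an unmatched opening quote at the start of a continued paragraph comes out right.

```python
import re

# Any double quote mark: straight, left curly, or right curly
DOUBLE_QUOTE = re.compile('["\u201c\u201d]')
# Characters after which a quote mark should open (assumed heuristic);
# start of text and whitespace are also treated as opening context below
OPENING_CONTEXT = "([{-\u2013\u2014"


def fix_double_quotes(text: str) -> str:
    """Re-decide the direction of every double quote from the character before it."""

    def choose(match: re.Match) -> str:
        i = match.start()
        prev = text[i - 1] if i > 0 else ""
        if not prev or prev.isspace() or prev in OPENING_CONTEXT:
            return "\u201c"  # left (opening) double quote
        return "\u201d"  # right (closing) double quote

    return DOUBLE_QUOTE.sub(choose, text)
```

It only handles double quotes and assumes plain text. On rendered HTML you would want to apply it to text nodes only, so quote marks inside tag attributes are left alone.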
Well, except for one small detail about how the parser works: it replaces parts of the text with a bunch of magic placeholder strings as it goes, so some regexes simply fail to match when they need context that is no longer available.

It took ages to figure out why a regex wasn't matching like it should.
March 10, 2025 at 3:48 PM
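Here's a toy model of that failure mode. It is not python-markdown's code: the placeholder format, the three rules, and `run_rules` are all hypothetical. Each rule's output is stashed behind a placeholder as soon as it matches, so when the dash rule runs first, the opening-quote rule can no longer see the `--` it relies on as context.

```python
from __future__ import annotations

import re

# Toy placeholder delimiters (hypothetical, not python-markdown's actual format)
STX, ETX = "\x02", "\x03"
PLACEHOLDER = re.compile(f"{STX}(\\d+){ETX}")

DASH = (r"--", "\u2014")                            # "--" becomes an em dash
OPEN_QUOTE = (r'(?:^|(?<=\s)|(?<=--))"', "\u201c")  # opens after start, space, or "--"
CLOSE_QUOTE = (r'"', "\u201d")                      # any quote left over closes


def run_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply regex rules in order, stashing each match's output behind a placeholder."""
    stash: list[str] = []
    for pattern, output in rules:
        def to_placeholder(match: re.Match, output: str = output) -> str:
            stash.append(output)
            return f"{STX}{len(stash) - 1}{ETX}"
        text = re.sub(pattern, to_placeholder, text)
    # Swap every placeholder back for the output it stands for
    return PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], text)
```

With `[DASH, OPEN_QUOTE, CLOSE_QUOTE]`, the quote after `--` in `He paused--"Wait."` comes out as a closing quote, because by then the `--` is a placeholder. Reordering the rules fixes this particular case, but only because the opening-quote rule then runs while the `--` is still plain text.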
The smart quotes implementation looks like just a pile of regexes registered to perform their transforms during the parsing. Seems easy enough, I'm good at regex.
March 10, 2025 at 3:48 PM
Now we're back to smart-quote issues. This implementation is wrong far more often than kramdown's, putting open quotes where there should be close quotes and vice versa. Time to fork that code too, I guess!
March 10, 2025 at 3:41 PM
So what if you want two blockquotes, one after another, with different attributes? Well, I found a comment on a GitHub issue that includes the source code for an extension to break those up. Alright, I'll use that too, I guess.
March 10, 2025 at 3:38 PM
Except that now I'm running into the fact that python-markdown is "proudly" an implementation of John Gruber's original markdown, warts and all, and it has other limitations. Like, for example, there is no syntax for separating two blockquotes adjacent to one another.
March 10, 2025 at 3:38 PM
Okay, alright, I can make that work. I just need to add a preprocessing step on the raw, unparsed Markdown to distinguish the cases. There, working!
March 10, 2025 at 3:35 PM
Fine...except that kramdown has two different versions of the syntax to allow applying the attributes to either the blockquote as a whole or to any given paragraph of it... and python-markdown's parser doesn't expose the distinction in the syntax tree visible to the extension.
March 10, 2025 at 3:35 PM
Okay, well, this time I found an extension that tweaks the behavior of block attribute lists so that they can apply to almost all of those things...except blockquotes.

No problem, I can just fork it and extend the logic to support blockquotes.
March 10, 2025 at 3:35 PM
I'm really trying to avoid rewriting all my Markdown to accommodate an even-more-special parser. So I switched to python-markdown.

Good news: their attribute list syntax is practically identical! Bad news: unlike other implementations, it can't apply to blockquotes, ul, ol, and some other tags.
March 10, 2025 at 3:35 PM
Well, that's...fixable. Annoying, but fixable. The bigger problem is that the official attribute definition plugin uses syntax that is irreconcilably different from every other Markdown implementation of the feature I've run into. Block attributes in this plugin go *before* the block, not after.
March 10, 2025 at 3:35 PM
For the unaware: in English it is standard typography to have unmatched open quotation marks for quotes that include paragraph breaks. An implementation of smart quotes for English text should not treat this as a syntax error and fail.
March 10, 2025 at 3:14 PM
I first tried markdown-it because it advertises itself as a CommonMark implementation. The first issue I noticed was that the smart quotes feature is broken. In another example of programmer-brain, it doesn't even try to handle unmatched double quotes.
March 10, 2025 at 3:14 PM
I found two currently maintained Python libraries for handling Markdown which purportedly included support for the extensions I'm interested in. They are python-markdown and markdown-it-py (a straight reimplementation of the JS markdown-it library).

Neither matched kramdown's output.
March 10, 2025 at 3:14 PM
I should have known better. John Gruber's original Markdown spec from 20 years ago had a lot of ambiguous edge cases. I thought the world had largely converged on a common spec (CommonMark), and even a widely shared set of common extensions (e.g., SmartyPants, attribute lists, tables).
March 10, 2025 at 2:53 PM
The Jinja2 part of this project was the part I thought would be hard. No. I'm sorry. This was the easy part.

The hard part was, shockingly, the Markdown parsing... the part I *thought* would be easy! It's just markdown, right? There are a million parsers out there for it, right?
March 10, 2025 at 2:48 PM
That's the character class for "word," right? Good enough, right? I mean, I guess, as long as you're okay with saying "Bob's Burgers" is three words. This algorithm way overshoots the word count of typical English writing.
March 10, 2025 at 2:48 PM
Hilariously, I had to write my own word count filter because Jinja2 doesn't use the stupid-simple "count the number of things separated by whitespace" algorithm. Instead it uses the very programmer-brained "count the occurrences of the regex `\w+`".
March 10, 2025 at 2:48 PM
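The difference between the two counting approaches is easy to see side by side. Both functions are illustrative sketches; `regex_wordcount` mimics the `\w+` approach described above rather than calling Jinja2 itself.

```python
import re


def regex_wordcount(text: str) -> int:
    # Counts runs of \w characters, so "Bob's" is two words and "well-known" is two
    return len(re.findall(r"\w+", text))


def whitespace_wordcount(text: str) -> int:
    # Counts whitespace-separated chunks, closer to how people count words in prose
    return len(text.split())
```

For `"Bob's Burgers is a well-known show."` the regex version counts 8 words and the whitespace version counts 6. To swap the latter in, register it on your environment, e.g. `env.filters["wordcount"] = whitespace_wordcount` for a `jinja2.Environment` named `env`.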
A bigger issue is the include syntax. Jekyll passes arguments through to the included template (which reads them as `include.param`), while Jinja just has the included template inherit the current context. Still, not a big deal to work around.
March 10, 2025 at 2:48 PM
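One possible workaround, sketched under assumptions since the thread doesn't say how it was actually done: rewrite each Jekyll include tag into a Jinja `{% include %}` wrapped in a `{% with %}` block that binds the parameters to a dict named `include`. Jinja includes see the surrounding context, and Jinja attribute lookup falls back to dict keys, so a Jekyll-style `{{ include.content }}` in the included file should keep working. The regexes and function names here are hypothetical and only handle literal paths and `key=value` parameters.

```python
import re

# A Jekyll include tag with optional parameters: {% include note.html key=value ... %}
INCLUDE_TAG = re.compile(r"\{%\s*include\s+(?P<path>[^\s%]+)(?P<args>.*?)\s*%\}")
# key=value parameters: a quoted string literal or a bare expression like page.title
INCLUDE_ARG = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|[^\s"']+)""")


def _rewrite_include(tag: re.Match) -> str:
    path = tag.group("path")
    params = INCLUDE_ARG.findall(tag.group("args"))
    jinja_include = '{% include "' + path + '" %}'
    if not params:
        return jinja_include
    # Expose the parameters under the name `include`, as Jekyll does, using a
    # scoped {% with %} block so the included template sees them in its context
    mapping = ", ".join(f'"{key}": {value}' for key, value in params)
    return "{% with include = {" + mapping + "} %}" + jinja_include + "{% endwith %}"


def jekyll_includes_to_jinja(source: str) -> str:
    return INCLUDE_TAG.sub(_rewrite_include, source)
```

For example, `{% include note.html content="Heads up" level=page.level %}` becomes `{% with include = {"content": "Heads up", "level": page.level} %}{% include "note.html" %}{% endwith %}`.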
Add some extra functions for the filters that have no direct equivalent in Jinja, intercept the token stream to swap out the filters that do, and also use the token stream to rewrite Liquid-style filter arguments (`filter: arg`) into Jinja-style ones (`filter(arg)`).
March 10, 2025 at 2:48 PM
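The thread does this on Jinja's token stream; the sketch below is a cruder, regex-on-source version of the same steps, shown only to make the syntax mapping concrete. `FILTER_RENAMES` is a small assumed sample, and `liquid_append` is a hypothetical stand-in for one Liquid filter with no built-in Jinja equivalent. Because it works on raw text, it would break on a `|`, `}}`, or `%}` inside a string literal, which is a good argument for the token-stream approach.

```python
import re

# Liquid filters that exist in Jinja under another name (assumed, partial mapping)
FILTER_RENAMES = {"upcase": "upper", "downcase": "lower", "size": "length"}

# "| name" anywhere in an expression
FILTER_NAME = re.compile(r"\|\s*(\w+)")
# "| name: a, b" up to the next "|" or the end of the tag ("}}" or "%}")
FILTER_ARGS = re.compile(r"\|\s*(\w+)\s*:\s*(.+?)(?=\s*(?:\||\}\}|%\}))")


def liquid_append(value: object, suffix: object) -> str:
    """Stand-in for Liquid's `append`, which has no built-in Jinja equivalent."""
    return f"{value}{suffix}"


def liquid_filters_to_jinja(source: str) -> str:
    # 1. Liquid argument syntax "| name: a, b" becomes Jinja call syntax "| name(a, b)"
    source = FILTER_ARGS.sub(lambda m: f"| {m.group(1)}({m.group(2)})", source)
    # 2. Swap Liquid filter names for their Jinja equivalents where they differ
    return FILTER_NAME.sub(lambda m: "| " + FILTER_RENAMES.get(m.group(1), m.group(1)), source)
```

Functions like `liquid_append` still need registering, e.g. `env.filters["append"] = liquid_append`. Renaming also only works where the arguments line up: Liquid's `replace` and Jinja's `replace` both take the old and new strings in that order, but that isn't true of every filter pair.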