micromark-abbr extension #181
Hello! First of all, thank you for micromark: lovely code, and a really elegant approach to a really inelegant problem space 😅 I've been working on an extension to support the

Context

A bit of extra context so I don't do an XY problem. In my day job, I work on the GOV.UK Publishing team. We have our own extension of markdown called Govspeak, which has numerous syntax extensions. Govspeak is written in Ruby (based on Kramdown), which means it's not easy to run it in the browser. The implementation is also quite fragile and buggy, relying heavily on pre-processing the markdown with regex replacements.

I'm interested in re-implementing the Govspeak syntax in a more considered way in JavaScript, so that we can have a less buggy and fragile implementation that can be run in web browsers. micromark / remark is my current preferred option, but I haven't dismissed using markdown-it or another framework yet. I also haven't dismissed giving up completely, but hopefully we won't get there 😅

Questions

I've got a minimally working example of an abbr extension now, which is exciting! Mostly I followed the tutorial in the README and heavily cribbed off the micromark-gfm-footnotes extension. There are two areas where I had to do things which I wasn't very happy about...

Thing 1: Starting characters for abbr calls

The abbr syntax is horrible, in that there actually is no syntax for abbr calls. If you have:

The HTML specification is maintained by the W3C.
*[HTML]: Hyper Text Markup Language
*[W3C]: World Wide Web Consortium

Then

The difficulty here is that you don't know what characters might begin an

I worked around this by starting on all uppercase ASCII characters, but I think this is overly restrictive: a fully compliant implementation would work for lowercase letters, and also unicode labels. I did look at having the parser modify itself, so the

Do you have any suggestions on a nicer way to support that?

Thing 2: Hoisting events

As discussed in https://github.com/orgs/micromark/discussions/78 ,

We need similar functionality for abbr definitions, but the syntax isn't quite the same as the syntax for link reference definitions:

<!-- A link reference definition -->
[micromark]: github.com/micromark/micromark "The micromark markdown parser / compiler"
<!-- An abbr definition -->
*[HTML]: Hyper Text Markup Language

The built-in

The way I've worked around this is by using definition and definitionLabelString states, even though technically these are a slightly different kind of thing. This means they get hoisted up to the start of the events list by the compiler, and I can use the data in abbr calls.

How ugly is this workaround in your view? Is there already a better way to do it? I guess I could not provide an HTML compiler in the micromark extension, and do the transformation on the AST in a later step in the mdast / remark / rehype chain. Alternatively, would you consider some change to micromark to allow extension-defined events to be hoisted to the start of the events list?

End note

Just to be completely open: I'm also happy to hear answers of the form "we don't think it's sensible to build a micromark extension for abbr", or "even if abbr is welcome, some of those other things in your govspeak parser look like they would be too horrible to implement". I can always look at markdown-it and other parsers.

I've really enjoyed playing around writing this extension. It's the first time I've done something with parsers and compilers, and it's been fun. Thanks!
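For readers following along, the Thing 1 workaround (starting abbr calls on every uppercase ASCII letter) might look roughly like the sketch below. It is not the code from the actual extension. It assumes the syntax extension shape described in the micromark README, where the `text` field maps character codes to constructs and a construct carries a `name` and a `tokenize` function. The names `abbrCall`, `tokenizeAbbrCall`, and `uppercaseAsciiTextHooks` are hypothetical, and the tokenizer is a stub that always rejects.

```javascript
// Hypothetical stub. A real tokenizer would consume the label and check it
// against the known abbr definitions before accepting.
function tokenizeAbbrCall(effects, ok, nok) {
  return function start(code) {
    // Always reject: this sketch only shows how the construct is registered.
    return nok(code)
  }
}

// Hypothetical construct for abbr calls.
const abbrCall = {name: 'abbrCall', tokenize: tokenizeAbbrCall}

// Register one construct on every uppercase ASCII letter (codes 65 to 90).
function uppercaseAsciiTextHooks(construct) {
  const text = {}
  for (let code = 65; code <= 90; code++) {
    text[code] = construct
  }
  return text
}

// Assumed extension shape: constructs in `text` are keyed by character code.
const abbrSyntax = {text: uppercaseAsciiTextHooks(abbrCall)}
```

This makes the restriction visible: lowercase or non-ASCII labels never reach the tokenizer, because no hook is registered for their first character.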
Replies: 1 comment 9 replies
Hi Richard! How interesting, Govspeak.
What is the reason JS is needed for this? Could not a /govspeak endpoint that turns Govspeak into HTML be created?
Have you seen https://github.com/micromark/micromark#markdown-it? (deep link).
We do often recommend folks not do syntax extensions; have you seen https://github.com/micromark/micromark#extending-markdown (deep link)? Would it somehow be possible to switch to, say, a govspeak 2, which uses directives (a singular extension syntax for different extensions)?
What do you mean by “unicode labels”? Do you mean that any punctuation, whitespace, symbol, letter, number, anything, can be used as a label?
You might be able to ignore this: parse the definitions only. Then, when you have an AST, look for them. GFM email autolinks (

What to go with, depends. Perhaps on the grammar too. Which of these works?

# W3C?
W3C? w3c?
`W3C`? *W3C*? **W3C**?
[W3C?](W3C? "W3C?") ![W3C?](W3C? "W3C?") <https://W3C?>
```W3C?
W3C?
```
*[HTML]: Do other abbreviations work? W3C?
*[W3C]: World Wide Web Consortium (and how about recursion? W3C?)
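The "parse the definitions only, then look for them in the AST" idea can be sketched without micromark at all. The following is a minimal illustration, assuming the definitions have already been collected into a plain object mapping label to expansion, and that abbreviations are only looked up in the values of text nodes. The `abbr` node type, the function names, and the rule that a label must not touch a neighbouring letter or digit are assumptions made for this sketch; they are not part of mdast or of any existing plugin.

```javascript
// Escape regex metacharacters so labels are matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Split one text value into plain text nodes and hypothetical `abbr` nodes.
// `definitions` maps a label to its expansion, e.g. {W3C: 'World Wide Web Consortium'}.
function splitAbbreviations(value, definitions) {
  const labels = Object.keys(definitions)
  if (labels.length === 0) return [{type: 'text', value}]

  // Try longer labels first so that overlapping labels prefer the longest match.
  labels.sort((a, b) => b.length - a.length)

  // Assumption: a label only counts when it is not glued to a letter or digit.
  const pattern = new RegExp(
    '(?<![\\p{L}\\p{N}])(?:' +
      labels.map(escapeRegExp).join('|') +
      ')(?![\\p{L}\\p{N}])',
    'gu'
  )

  const nodes = []
  let last = 0
  for (const match of value.matchAll(pattern)) {
    if (match.index > last) {
      nodes.push({type: 'text', value: value.slice(last, match.index)})
    }
    nodes.push({type: 'abbr', label: match[0], title: definitions[match[0]]})
    last = match.index + match[0].length
  }
  if (last < value.length) {
    nodes.push({type: 'text', value: value.slice(last)})
  }
  return nodes
}

const definitions = {
  HTML: 'Hyper Text Markup Language',
  W3C: 'World Wide Web Consortium'
}
const nodes = splitAbbreviations(
  'The HTML specification is maintained by the W3C.',
  definitions
)
```

Because only text node values are inspected, inline code, fenced code, and URLs would be left alone by construction, which decides several of the cases in the list above. Whether that matches what Govspeak does today would still need checking.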
Pretty ugly! Likely to break at some point. I’d really recommend unique names for your things.
Indeed, that could work. Still, for me, if there’s a
Right. If that’s needed, that must happen. This reordering is not needed for footnote definitions though: https://github.com/micromark/micromark-extension-gfm-footnote/blob/0a62fad40470f2447707020c52d38d1494199ee1/dev/lib/html.js#L145 🤔 I do wonder though.
No, whether it’s recommended is one thing. And there might be really weird syntaxes one comes up with for which that is the answer. But abbreviations, stuff like that, should be possible.
Ah, right. Ehh. Well, then I need to review them 😅 Steps seems weird.
Glad to hear! ASTs are already tough for most people. Integrating into compilers even more so. Cool to hear that you enjoy it!