Jeremy W. Sherman

stay a while, and listen

Pasting HTML into Markdown

So I was writing a comment on Reddit today, and the easiest way to answer was to quote a list of search results, links and all. One problem: Reddit’s post interface uses Markdown, not HTML. That’s swell when you’re writing your comment fresh, but if you want to paste something in from a webpage, it’s no good.

I’d run into this once or twice before, but I always took the simple way out: just rewrite the one or two links in the text by hand. No big deal.

But these search results were just a list of links. And as a programmer, I am vocationally virtuously lazy.

That’s when I remembered Pandoc. Pandoc is a tool for converting between markup languages. I grabbed it as a Swiss Army knife alternative to the more questionable Markdown formatters out there. (Markdown’s reference implementation is in Perl. I have trouble regarding any Perl as anything but a fragile hack.) I actually used it the first time as part of avoiding word processors: instead of emailing a PDF/Word doc/Pages doc (in order of increasing uckiness), I just write up a Markdown doc, format it into a standalone webpage, slap in some CSS, and email it off.
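For the curious, that Markdown-to-webpage routine might look something like the sketch below. The function name md2page and the file names are made up for illustration, and I am assuming the CSS lives in a small file that wraps it in a <style> element, so that pandoc's -H (--include-in-header) option can drop it straight into the page's head and the result stays a single file.

```shell
#!/bin/sh
# md2page: hypothetical helper, not part of pandoc.
# Usage: md2page notes.md style.html notes.html
#   $1 = Markdown source
#   $2 = file holding a <style>...</style> block (assumed layout)
#   $3 = standalone HTML page to write
md2page() {
    if [ "$#" -ne 3 ]; then
        echo "usage: md2page SOURCE.md STYLE.html OUTPUT.html" >&2
        return 2
    fi
    # -s asks for a complete page (head and body) instead of a fragment;
    # -H copies the style file verbatim into the page's <head>.
    pandoc -s -f markdown -t html -H "$2" -o "$3" "$1"
}
```

Something like md2page notes.md style.html notes.html then leaves you with one HTML file to attach to the email.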

Veering back on course, I recalled it could be used not only from Markdown to HTML but from HTML to Markdown. And how to get pasted text from the browser into HTML? I didn’t want to muck with View Source, so I just opened up TextEdit and let its otherwise annoying habit of preserving pasted formatting work to my benefit. Copy from Aurora, paste into TextEdit, save as HTML, then pandoc -f html -t markdown foo.html | pbcopy, back to Aurora, and paste, and beautiful Markdown appears.
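If you end up doing this more than once, the pandoc step can be wrapped in a tiny shell function. This is only a sketch: the name html2md is my own invention, and it assumes pandoc is already installed and on your PATH.

```shell
#!/bin/sh
# html2md: hypothetical helper, not part of pandoc.
# Converts an HTML file to Markdown and prints the result to stdout,
# so it can be piped wherever you like.
html2md() {
    if [ "$#" -ne 1 ]; then
        echo "usage: html2md FILE.html" >&2
        return 2
    fi
    if ! command -v pandoc >/dev/null 2>&1; then
        echo "html2md: pandoc not found on PATH" >&2
        return 127
    fi
    # Same flags as the one-liner: read HTML, write Markdown.
    pandoc -f html -t markdown "$1"
}
```

On a Mac, html2md foo.html | pbcopy puts the Markdown on the clipboard, ready to paste into the comment box.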

Long story short: Use pandoc to convert HTML into Markdown for your Reddit or Stack Overflow or blogging needs.