BLU Discuss list archive
[Discuss] My first contribution to MediaWiki
- Subject: [Discuss] My first contribution to MediaWiki
- From: tmetro+blu at gmail.com (Tom Metro)
- Date: Sat, 17 Jan 2015 22:10:28 -0500
- In-reply-to: <CANaytccQo5Fb-BeTje9vr7iN0yHHX0EQe6Xp_Rxqwyb2WUy_jA@mail.gmail.com>
- References: <CANaytccQo5Fb-BeTje9vr7iN0yHHX0EQe6Xp_Rxqwyb2WUy_jA@mail.gmail.com>
Greg Rundlett (freephile) wrote:
> The project page: http://www.mediawiki.org/wiki/Extension:Html2Wiki
>
> It's an extension to MediaWiki that lets you "import a website or web page
> into your wiki".

"It does this by first "normalizing" the content with HTMLTidy, and then
"sanitizing" it with Purify and Regular Expressions. Then the content is
"converted" from HTML to WikiText using Regular Expressions and a Parsoid
service."

Amazing that such a conversion is even possible, given how problematic
most HTML is. In some ways this job is harder than what browsers do when
parsing HTML, as you aren't just rendering the result, but trying to
extract structure - or semantic meaning - from it.

Does HTMLTidy do a lot of the heavy lifting for you?

Do you still end up with a lot of situations where you have multiple HTML
constructs that map to a single wiki markup construct?

How does it handle HTML generated or loaded by JS, as is quite common now?
(You might be able to work around that with one of the projects that use
an embedded and programmatically controlled web rendering engine, like
webkit.)

What are the advantages to implementing this as a plugin rather than a
separate command line tool (which would then support other markup
formats, like Markdown)?

If you couldn't find an existing HTML to wiki markup converter, did you
look for something similar, like a converter to markdown? A search for
this turns up hits, such as:

http://johnmacfarlane.net/pandoc/README.html

with an example:

  pandoc -f html -t markdown http://www.fsf.org

which presumably retrieves content from http://www.fsf.org, specified to
be in HTML format, and outputs Markdown. (It also supports MediaWiki
format.)

If using a tool that doesn't support MediaWiki directly, once in
Markdown, I imagine the conversion to MediaWiki is relatively easy.

 -Tom

-- 
Tom Metro
The Perl Shop, Newton, MA, USA
"Predictable On-demand Perl Consulting."
http://www.theperlshop.com/
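
To make the "multiple HTML constructs that map to a single wiki markup construct" question concrete, here is a minimal sketch in Python. It is not how Html2Wiki works (that extension is described above as using HTMLTidy, regular expressions, and Parsoid); it only uses the standard library's html.parser to show <b> and <strong> collapsing onto the same ''' markup. The class name, function name, and mapping tables are all hypothetical, and only a handful of tags are handled.

```python
from html.parser import HTMLParser

# Hypothetical mapping tables, for illustration only: several HTML tags
# collapse onto the same MediaWiki markup.
INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
HEADINGS = {"h1": "=", "h2": "==", "h3": "==="}


class TinyHtmlToWiki(HTMLParser):
    """Convert a very small subset of HTML to MediaWiki markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            self.out.append(HEADINGS[tag] + " ")
        elif tag == "a":
            # External link syntax: [url label]
            href = dict(attrs).get("href") or ""
            self.out.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            self.out.append(" " + HEADINGS[tag] + "\n")
        elif tag == "a":
            self.out.append("]")

    def handle_data(self, data):
        # Text between tags passes through unchanged.
        self.out.append(data)


def html_to_wiki(html):
    parser = TinyHtmlToWiki()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


if __name__ == "__main__":
    print(html_to_wiki('<h2>Links</h2><strong>See</strong> '
                       '<a href="http://www.fsf.org">the <em>FSF</em></a>'))
```

Unknown tags are silently dropped while their text is kept, which is one reason a real converter needs the normalizing and sanitizing passes first.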