A Hands-On Workshop on Parsing Wikitext
Experience Level: intermediate
Language: English
In this hands-on workshop, we will learn to parse wikitext from beginning to end using Python.
We will extract information from the German Wiktionary and retrieve linguistic information about German words, such as parts of speech, meanings, inflections, and more.
We will cover the following points:
- Fetching the Data: We will look at two ways to retrieve wiki data: using the wiki's Special:Export tool or downloading wiki dump files.
- Parsing the XML Files: Once the data is retrieved in XML format, we will parse the files to extract the wikitext.
- Parsing the Wikitext: In the final part, we will parse the wikitext and extract elements such as headings, sections, word forms, meanings, inflections, and more.
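
As a rough preview of the last two steps, here is a minimal sketch using only the Python standard library. It is not the workshop's actual code. The sample XML is an invented, heavily shortened stand-in for an export file, the namespace version in it is an assumption, and the helper names (`local_name`, `extract_pages`, `extract_headings`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Invented, shortened stand-in for an XML export of one Wiktionary page.
# Real exports contain many more elements; the namespace version is an assumption.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Haus</title>
    <revision>
      <text>== Haus ({{Sprache|Deutsch}}) ==
=== {{Wortart|Substantiv|Deutsch}}, {{n}} ===
{{Bedeutungen}}
:[1] Gebäude, das zum Wohnen dient
</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(xml_string):
    """Return a dict mapping each page title to its raw wikitext."""
    root = ET.fromstring(xml_string)
    pages = {}
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for element in page.iter():
            name = local_name(element.tag)
            if name == "title":
                title = element.text
            elif name == "text":
                text = element.text or ""
        if title is not None:
            pages[title] = text
    return pages


# A heading is a line wrapped in the same number of '=' signs on both sides.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def extract_headings(wikitext):
    """Return (level, title) pairs for every heading line in the wikitext."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]


if __name__ == "__main__":
    for title, wikitext in extract_pages(SAMPLE_EXPORT).items():
        print(title, extract_headings(wikitext))
```

A regular expression is enough for headings, but nested templates such as inflection tables usually call for a dedicated wikitext parser.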