Extractors
This module provides a set of articlequality.Extractor objects, each of which
implements a strategy for identifying historical article quality labeling
events. These labelings are used as training data to build prediction
models.
Supported wikis
Base classes
class articlequality.Extractor(name, doc, namespaces)
Implements a labeling event extraction strategy.
Parameters:
- name : str
A name for the extraction strategy
- doc : str
Documentation describing the extraction strategy
- namespaces : iterable(int)
A set of namespaces that will be considered when performing an extraction
extract(page, verbose=False)
Processes an mwxml.Page and returns a generator of first observations of each project/label pair.
Parameters:
- page : mwxml.Page
Page to process
- verbose : bool
If True, print dots to stderr
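The "first observation" idea can be sketched without the library. The following is a minimal illustration, not the articlequality implementation: the function name first_observations, the (timestamp, labels) input shape, and the output dictionary keys are all assumptions made for this example.

```python
def first_observations(revisions):
    """Yield each (project, label) pair the first time it is seen.

    `revisions` is assumed to be an iterable of (timestamp, labels) tuples
    in chronological order, where `labels` is an iterable of
    (project, label) pairs found in that revision.
    """
    seen = set()
    for timestamp, labels in revisions:
        for project, label in labels:
            if (project, label) not in seen:
                seen.add((project, label))
                yield {"timestamp": timestamp,
                       "project": project,
                       "label": label}


# Hypothetical revision history of one talk page.
history = [
    ("2010-01-01", [("wikiproject a", "stub")]),
    ("2010-06-01", [("wikiproject a", "stub"), ("wikiproject b", "start")]),
    ("2011-01-01", [("wikiproject a", "c"), ("wikiproject b", "start")]),
]
observations = list(first_observations(history))
```

Repeated labels in later revisions are skipped, so only the three revisions where a pair first appears produce an observation.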
class articlequality.TemplateExtractor(*args, from_template, **kwargs)
Implements a template-based extraction strategy driven by a from_template function, which takes a template and returns a (project, label) pair.
Parameters:
- from_template : func
A function that takes a template and returns a (project, label) pair
extract(page, verbose=False)
Processes an mwxml.Page and returns a generator of first observations of each project/label pair.
Parameters:
- page : mwxml.Page
Page to process
- verbose : bool
If True, print dots to stderr
extract_labels(text)
Extracts a set of labels for a version of text by parsing templates.
Parameters:
- text : str
Wikitext markup to extract labels from
Returns: An iterator over (project, label) pairs
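To make the role of from_template concrete, here is a self-contained sketch of template-based label extraction. It is an illustration under stated assumptions and does not use the library: templates are found with a simple regular expression that ignores nested templates, each template is represented as a (name, params) tuple rather than a real parser object, and the rule "WikiProject templates carry the label in a class parameter" is a hypothetical convention chosen for the example.

```python
import re

# Matches {{...}} templates that contain no nested braces (a simplification).
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def parse_templates(text):
    """Yield a (name, params) tuple for each non-nested template in text."""
    for match in TEMPLATE_RE.finditer(text):
        parts = [part.strip() for part in match.group(1).split("|")]
        name = parts[0].lower()
        params = {}
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        yield name, params


def from_template(template):
    """Hypothetical from_template: return (project, label) or None."""
    name, params = template
    label = params.get("class", "").lower()
    if name.startswith("wikiproject") and label:
        return name, label
    return None


def extract_labels(text):
    """Yield (project, label) pairs found in the templates of text."""
    for template in parse_templates(text):
        pair = from_template(template)
        if pair is not None:
            yield pair


talk_text = "{{WikiProject Biology|class=B|importance=high}} {{Talk header}}"
labels = list(extract_labels(talk_text))
```

Keeping the wiki-specific logic inside from_template means the parsing and iteration code can stay the same across wikis, which appears to be the motivation for passing it in as a parameter.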
invert_reverted_status(reverteds, revisions)
This method recursively checks the reverted status of revisions and inverts the status when a revert is itself reverted.
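The inversion rule can be illustrated with a small recursive sketch. The data shapes of reverteds and revisions are not documented here, so this example does not reproduce the method's signature: it assumes a hypothetical reverted_by dictionary mapping a revision ID to the ID of the later revision that reverted it, and it assumes reverts always come from later revisions, so the chain cannot loop.

```python
def effectively_reverted(rev_id, reverted_by, cache=None):
    """Return True if rev_id is still reverted once revert chains are resolved.

    `reverted_by` (an assumed structure) maps a revision ID to the ID of the
    revision that reverted it. A revision counts as reverted only if the
    revision that reverted it was not itself effectively reverted.
    """
    if cache is None:
        cache = {}
    if rev_id in cache:
        return cache[rev_id]
    reverter = reverted_by.get(rev_id)
    if reverter is None:
        status = False  # nothing reverted this revision
    else:
        # Invert: a reverted revert restores the original revision.
        status = not effectively_reverted(reverter, reverted_by, cache)
    cache[rev_id] = status
    return status


# Revision 2 was reverted by 3, and revision 3 was then reverted by 4.
reverted_by = {2: 3, 3: 4}
statuses = {rev: effectively_reverted(rev, reverted_by) for rev in (1, 2, 3, 4)}
```

In this example revision 3 ends up reverted, while revision 2 is restored because the revert that removed it was itself undone.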