Extractors¶

This module provides a set of articlequality.Extractor objects, each of which implements a strategy for identifying historical article quality labeling events. These labelings are used as training data to build prediction models.

Supported wikis¶

Base classes¶

class articlequality.Extractor(name, doc, namespaces)¶

Implements a labeling event extraction strategy.

Parameters:
    name : str
        A name for the extraction strategy
    doc : str
        Documentation describing the extraction strategy
    namespaces : iterable(int)
        A set of namespaces that will be considered when performing an extraction

extract(page, verbose=False)¶

Processes an mwxml.Page and returns a generator of the first observations of each project/label pair.

Parameters:
    page : mwxml.Page
        Page to process
    verbose : bool
        Print dots to stderr
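The "first observation" idea can be illustrated with a small, self-contained sketch. It does not use articlequality or mwxml; the `Observation` record, the `first_observations` function, and the input shape (a revision ID paired with the set of labels visible in that revision) are all assumptions made for illustration, and the real extractor's output may carry different fields.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Observation(NamedTuple):
    """Hypothetical record; the real extractor's output fields may differ."""
    rev_id: int
    project: str
    label: str


def first_observations(
    revisions: Iterable[Tuple[int, Set[Tuple[str, str]]]]
) -> Iterator[Observation]:
    """Yield each (project, label) pair the first time it appears."""
    seen = set()
    for rev_id, pairs in revisions:
        for project, label in sorted(pairs):  # sorted for a stable order
            if (project, label) not in seen:
                seen.add((project, label))
                yield Observation(rev_id, project, label)


# Revisions in chronological order (made-up IDs and labels).
history = [
    (101, {("mathematics", "stub")}),
    (102, {("mathematics", "stub")}),  # unchanged, so nothing new is emitted
    (103, {("mathematics", "start"), ("physics", "start")}),
]
observations = list(first_observations(history))
```

Because the function is a generator, a caller can stop early without processing the rest of a page's history.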

invert_reverted_status(reverteds, revisions)¶

Recursively searches the reverted status of revisions and inverts the status when reverts are themselves reverted.
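One way to picture "reverts that are themselves reverted" is the sketch below. It is not the library's implementation: the function name `effective_reverted_status` and the input shape (a dict from a reverting revision ID to the IDs it reverted) are assumptions chosen for illustration. It also assumes a reverting revision always comes later than the revisions it reverts, so the recursion terminates.

```python
from typing import Dict, List


def effective_reverted_status(reverts: Dict[int, List[int]]) -> Dict[int, bool]:
    """Map each revision ID to True if it ends up reverted.

    `reverts` maps a reverting revision ID to the IDs it reverted
    (a hypothetical input shape chosen for this sketch).
    """
    # Index: revision ID -> revisions that reverted it.
    reverted_by: Dict[int, List[int]] = {}
    for reverter, targets in reverts.items():
        reverted_by.setdefault(reverter, [])
        for target in targets:
            reverted_by.setdefault(target, []).append(reverter)

    status: Dict[int, bool] = {}

    def is_reverted(rev_id: int) -> bool:
        if rev_id not in status:
            # A revert only counts if the reverting revision itself survives.
            status[rev_id] = any(
                not is_reverted(r) for r in reverted_by[rev_id]
            )
        return status[rev_id]

    return {rev_id: is_reverted(rev_id) for rev_id in sorted(reverted_by)}


# Revision 3 reverts revision 2, then revision 4 reverts revision 3.
result = effective_reverted_status({3: [2], 4: [3]})
```

In this example revision 3 ends up reverted, which flips revision 2 back to "not reverted", since the revert that removed it was undone.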

class articlequality.TemplateExtractor(*args, from_template, **kwargs)¶

Implements a template-based extraction strategy that relies on a from_template function, which takes a template and returns a (project, label) pair.

Parameters:
    from_template : func
        A function that takes a template and returns a (project, label) pair
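A from_template callback might look roughly like the sketch below. The `Template` dataclass is a hypothetical stand-in for whatever parsed-template object the extractor passes in, and the "WikiProject ..." naming rule and the `class` parameter are illustrative assumptions, not conventions confirmed by this documentation. Returning None for unrelated templates is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Template:
    """Hypothetical stand-in for a parsed template; not the library's type."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)


def from_template(template: Template) -> Optional[Tuple[str, str]]:
    """Return a (project, label) pair, or None if the template is unrelated."""
    name = template.name.strip().lower()
    prefix = "wikiproject "
    if not name.startswith(prefix) or "class" not in template.params:
        return None
    project = name[len(prefix):]
    label = template.params["class"].strip().lower()
    return project, label


pair = from_template(Template("WikiProject Mathematics", {"class": "B"}))
other = from_template(Template("Infobox person"))
```

Keeping the wiki-specific rules inside this one function lets the surrounding extraction logic stay the same across wikis.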

extract(page, verbose=False)¶

Processes an mwxml.Page and returns a generator of the first observations of each project/label pair.

Parameters:
    page : mwxml.Page
        Page to process
    verbose : bool
        Print dots to stderr

extract_labels(text)¶

Extracts a set of labels for a version of text by parsing templates.

Parameters:
    text : str
        Wikitext markup to extract labels from

Returns:
    An iterator over (project, label) pairs
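The flow from wikitext to (project, label) pairs can be sketched without any wikitext library. The version below uses a deliberately naive regular expression that only handles non-nested templates, so it is a stand-in for real template parsing rather than a description of how the library does it. The two-argument callback signature and the `wikiproject_class` mapping are assumptions for illustration.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

Pair = Tuple[str, str]

# Naive matcher for non-nested templates such as {{Name|key=value}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def extract_labels(
    text: str,
    from_template: Callable[[str, dict], Optional[Pair]],
) -> Iterator[Pair]:
    """Yield a (project, label) pair for each template that maps to one."""
    for match in TEMPLATE_RE.finditer(text):
        name, *args = match.group(1).split("|")
        params = dict(arg.split("=", 1) for arg in args if "=" in arg)
        params = {k.strip(): v.strip() for k, v in params.items()}
        pair = from_template(name.strip(), params)
        if pair is not None:
            yield pair


def wikiproject_class(name: str, params: dict) -> Optional[Pair]:
    # Hypothetical mapping: {{WikiProject X|class=Y}} -> ("x", "y").
    if name.lower().startswith("wikiproject ") and "class" in params:
        return name[len("wikiproject "):].lower(), params["class"].lower()
    return None


text = "{{WikiProject Physics|class=Start|importance=low}} {{Talk header}}"
labels = list(extract_labels(text, wikiproject_class))
```

Templates that the callback does not recognize, such as the talk header here, are skipped.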

invert_reverted_status(reverteds, revisions)¶

Recursively searches the reverted status of revisions and inverts the status when reverts are themselves reverted.
