Extractors

This module provides a set of articlequality.Extractor implementations, each of which implements a strategy for identifying historical article quality labeling events. These labelings are used as training data for building prediction models.

Supported wikis

Base classes

class articlequality.Extractor(name, doc, namespaces)

Implements a labeling event extraction strategy.

Parameters:
name : str

A name for the extraction strategy

doc : str

Documentation describing the extraction strategy

namespaces : iterable(int)

A set of namespaces that will be considered when performing an extraction
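A minimal sketch of how the constructor parameters might be used, assuming that namespaces simply filters which pages an extractor considers. SketchExtractor, its considers() method, and the Page stand-in (which mimics only the namespace attribute of mwxml.Page) are hypothetical illustrations, not part of articlequality.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class Page:
    """Hypothetical stand-in for mwxml.Page: only the fields used here."""
    title: str
    namespace: int


class SketchExtractor:
    """Hypothetical sketch mirroring Extractor(name, doc, namespaces)."""

    def __init__(self, name: str, doc: str, namespaces: Iterable[int]):
        self.name = name
        self.doc = doc
        # Store namespaces as a frozenset so membership checks are O(1).
        self.namespaces: FrozenSet[int] = frozenset(namespaces)

    def considers(self, page: Page) -> bool:
        """Return True if the page's namespace is one this strategy handles."""
        return page.namespace in self.namespaces


# Namespace 1 is "Talk" on MediaWiki wikis; the choice here is illustrative.
extractor = SketchExtractor("example", "Illustrative strategy", [1])
print(extractor.considers(Page("Talk:Alan Turing", 1)))  # True
print(extractor.considers(Page("Alan Turing", 0)))       # False
```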

extract(page, verbose=False)

Processes an mwxml.Page and returns a generator of the first observations of each (project, label) pair.

Parameters:
page : mwxml.Page

Page to process

verbose : bool

If True, print progress dots to stderr
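The idea of yielding first observations can be illustrated with a standalone sketch. It assumes a hypothetical input of chronologically ordered (rev_id, timestamp, text) tuples instead of an mwxml.Page and a hypothetical toy_labels label format, and it ignores reverted revisions entirely. first_observations is an illustration of the concept, not the library's extract().

```python
import sys
from typing import Callable, Iterable, Iterator, Set, Tuple

Label = Tuple[str, str]  # (project, label)


def first_observations(
    revisions: Iterable[Tuple[int, str, str]],
    extract_labels: Callable[[str], Iterable[Label]],
    verbose: bool = False,
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (rev_id, timestamp, project, label) the first time each
    (project, label) pair appears in the (hypothetical) revision sequence."""
    seen: Set[Label] = set()
    for rev_id, timestamp, text in revisions:
        if verbose:
            sys.stderr.write(".")  # one progress dot per revision
            sys.stderr.flush()
        for project, label in extract_labels(text):
            if (project, label) not in seen:
                seen.add((project, label))
                yield rev_id, timestamp, project, label


def toy_labels(text: str) -> Iterator[Label]:
    # Hypothetical label format: whitespace-separated "project=label" tokens.
    for token in text.split():
        project, _, label = token.partition("=")
        yield project, label


history = [
    (101, "2010-01-01T00:00:00Z", "medicine=stub"),
    (102, "2011-06-01T00:00:00Z", "medicine=stub history=start"),
    (103, "2012-03-01T00:00:00Z", "medicine=b history=start"),
]
for observation in first_observations(history, toy_labels):
    print(observation)
```

Revision 102 repeats medicine=stub, so only the new history=start pair is reported for it.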

invert_reverted_status(reverteds, revisions)

Recursively traverses the reverted status of revisions and inverts that status when a revert is itself reverted.
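A sketch of the inversion logic, assuming a hypothetical reverted_by mapping from each revision ID to the IDs of the revisions that reverted it. The real method's reverteds and revisions arguments may be structured differently; effective_reverted is an illustration only, and it assumes reverts always point backward in time so the graph has no cycles.

```python
from functools import lru_cache
from typing import Dict, List, Set


def effective_reverted(reverted_by: Dict[int, List[int]]) -> Set[int]:
    """Return the rev_ids that remain reverted once reverts that were
    themselves reverted are cancelled (hypothetical input format)."""

    @lru_cache(maxsize=None)
    def is_reverted(rev_id: int) -> bool:
        # A revision stays reverted if at least one of its reverters is
        # itself *not* reverted; a reverted revert loses its effect.
        return any(not is_reverted(r) for r in reverted_by.get(rev_id, []))

    return {rev_id for rev_id in reverted_by if is_reverted(rev_id)}


print(sorted(effective_reverted({1: [2], 2: [3]})))  # [2]
```

Here revision 3 reverts revision 2, which had reverted revision 1, so revision 2 remains reverted while revision 1's reverted status is inverted back to not reverted. Each additional reverted revert flips the status again.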

class articlequality.TemplateExtractor(*args, from_template, **kwargs)

Implements a template-based extraction strategy that uses a from_template function, which takes a template and returns a (project, label) pair.

Parameters:
from_template : func

A function that takes a template and returns a (project, label) pair
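A hypothetical from_template callback, modeled loosely on English Wikipedia's WikiProject banners such as {{WikiProject Medicine|class=B}}. FakeTemplate is a stand-in exposing a plain name string and a params dict; the template objects articlequality actually passes may have a different interface. Returning None for unrecognized templates is also an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple


class FakeTemplate:
    """Hypothetical stand-in exposing a template's name and named params."""

    def __init__(self, name: str, params: Dict[str, str]):
        self.name = name
        self.params = params


def from_template(template: FakeTemplate) -> Optional[Tuple[str, str]]:
    """Map a WikiProject-style banner to a (project, label) pair.

    Assumed convention: templates named "WikiProject <Name>" carry the
    assessment in a "class" parameter. Other templates yield None.
    """
    prefix = "wikiproject "
    name = template.name.strip().lower()
    if not name.startswith(prefix):
        return None
    label = template.params.get("class", "").strip().lower()
    if not label:
        return None
    return name[len(prefix):], label


print(from_template(FakeTemplate("WikiProject Medicine", {"class": "B"})))
# ('medicine', 'b')
```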

extract(page, verbose=False)

Processes an mwxml.Page and returns a generator of the first observations of each (project, label) pair.

Parameters:
page : mwxml.Page

Page to process

verbose : bool

If True, print progress dots to stderr

extract_labels(text)

Extracts a set of labels from one version of a page's text by parsing its templates.

Parameters:
text : str

Wikitext markup to extract labels from

Returns:

An iterator over (project, label) pairs
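The following sketch shows one way extract_labels could work. It swaps in a simple regular expression that only handles non-nested templates instead of a full wikitext parser, and it takes the from_template callback as an argument rather than from the constructor. parse_templates, banner, and the callback's (name, params) signature are hypothetical.

```python
import re
from typing import Callable, Dict, Iterator, Optional, Tuple

# Matches {{...}} with no braces inside: a deliberate simplification.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def parse_templates(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (name, named_params) for each non-nested template in text."""
    for match in TEMPLATE_RE.finditer(text):
        name, *parts = match.group(1).split("|")
        params = {}
        for part in parts:
            key, sep, value = part.partition("=")
            if sep:  # keep only named parameters
                params[key.strip().lower()] = value.strip()
        yield name.strip(), params


def extract_labels(
    text: str,
    from_template: Callable[[str, Dict[str, str]], Optional[Tuple[str, str]]],
) -> Iterator[Tuple[str, str]]:
    """Yield (project, label) pairs for templates the callback recognizes."""
    for name, params in parse_templates(text):
        pair = from_template(name, params)
        if pair is not None:
            yield pair


def banner(name: str, params: Dict[str, str]) -> Optional[Tuple[str, str]]:
    # Hypothetical mapping: "WikiProject X|class=Y" -> ("x", "y").
    prefix = "wikiproject "
    if name.lower().startswith(prefix) and params.get("class"):
        return name[len(prefix):].lower(), params["class"].lower()
    return None


text = ("{{WikiProject Medicine|class=B}}\n"
        "{{WikiProject History|class=start}}\n"
        "{{Talk header}}")
print(list(extract_labels(text, banner)))
# [('medicine', 'b'), ('history', 'start')]
```

A real implementation would need a proper wikitext parser to handle nested templates, which this regex skips.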

invert_reverted_status(reverteds, revisions)

Recursively traverses the reverted status of revisions and inverts that status when a revert is itself reverted.