attelo package¶

Attelo is a statistical discourse parser. The API provides:

decoders which you should be able to call in a standalone way

machine learning infrastructure wrapping around a library like scikit-learn

support for building experimental harnesses around the parser

Subpackages¶

Submodules¶

attelo.args module¶

Managing command line arguments

attelo.args.add_common_args(psr)¶: add usual attelo args to subcommand parser

attelo.args.add_fold_choice_args(psr)¶: ability to select a subset of the data according to a fold

attelo.args.add_model_read_args(psr, help_)¶

model files we can read in

Parameters:	help_ (string) – Python format string for the help text; {} will have a word (eg. ‘attachment’) plugged in

attelo.args.add_report_args(psr)¶: add args to scoring/evaluation

attelo.args.validate_fold_choice_args(wrapped)¶

Given a function that accepts an argparsed object, check the fold arguments before carrying on.

The idea here is that --fold and --fold-file are meant to be used together (xnor): either both are given or neither is.

This is meant to be used as a decorator, eg.:

@validate_fold_choice_args
def main(args):
    blah
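
As a rough illustration of the xnor check, here is a minimal sketch of such a decorator. It assumes the parsed arguments expose `fold` and `fold_file` attributes (a guess based on the flag names) and raises `ValueError` on a mismatch; attelo's own implementation may behave differently.

```python
import argparse
import functools


def validate_fold_choice_args(wrapped):
    """Check that --fold and --fold-file are given together or not at all.

    Sketch only: assumes args.fold and args.fold_file exist.
    """
    @functools.wraps(wrapped)
    def inner(args):
        has_fold = args.fold is not None
        has_file = args.fold_file is not None
        if has_fold != has_file:  # exactly one of the two was given
            raise ValueError("--fold and --fold-file must be used together")
        return wrapped(args)
    return inner


@validate_fold_choice_args
def main(args):
    return "fold {} from {}".format(args.fold, args.fold_file)


# Both arguments present: the wrapped function runs as usual
print(main(argparse.Namespace(fold=1, fold_file="folds.json")))
```

The check runs before the wrapped function, so `main` itself never has to deal with a half-specified fold.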

attelo.edu module¶

Uniquely identifying information for an EDU

class attelo.edu.EDU¶

Bases: attelo.edu.EDU

a class representing the EDU (id, text, span start and end, grouping, subgrouping)

span()¶: Starting and ending position of the EDU as an integer pair

attelo.edu.FAKE_ROOT = EDU(id='ROOT', text='', start=0, end=0, grouping=None, subgrouping=None)¶: a distinguished fake root EDU which simultaneously appears in all groupings
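
The "Bases: attelo.edu.EDU" line suggests a named tuple that is subclassed to add methods. The sketch below shows that pattern; the field names are taken from the FAKE_ROOT value above, while the class body and the sample EDU are illustrative assumptions, not attelo's source.

```python
from collections import namedtuple


class EDU(namedtuple("EDU", "id text start end grouping subgrouping")):
    """Immutable record identifying an EDU (illustrative stand-in)."""

    def span(self):
        """Starting and ending position of the EDU as an integer pair"""
        return (self.start, self.end)


FAKE_ROOT = EDU("ROOT", "", 0, 0, None, None)

# A made-up EDU covering characters 10 to 21 of document "d1"
edu = EDU("d1_e3", "hello there", 10, 21, "d1", "d1_turn2")
print(edu.span())
```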

attelo.fold module¶

Group-aware n-fold evaluation.

Attelo uses a variant of n-fold evaluation, where we (still) randomly partition the dataset into a set of folds of roughly even size, but respecting the additional constraint that any two data entries belonging in the same “group” (determined by a single distinguished feature, eg. the document id, the dialogue id, etc) are always in the same fold. Note that this makes it a bit harder to have perfectly evenly sized folds

Created on Jun 20, 2012

@author: stergos

contribs: phil

attelo.fold.fold_groupings(fold_dict, fold)¶

Return the set of groupings that belong in a fold. Raise an exception if the fold is not in the fold dictionary

Return type:	frozenset(int)

attelo.fold.make_n_fold(groupings, folds, rng)¶

Given a set of groupings and a desired number of folds, return a fold selection dictionary assigning a fold number to each grouping (see attelo.edu.EDU).

Parameters:	rng (random.Random) – random number generator (hint: the random module will be just fine if you don’t mind shared state)

Return type:	dict(string, int)
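
One way to picture the fold dictionary is the sketch below, which shuffles the groupings and deals them out round-robin. This is an assumed strategy for illustration (attelo may assign folds differently), and the `ValueError` stands in for whatever exception the library actually raises.

```python
import random


def make_n_fold(groupings, folds, rng):
    """Assign a fold number to each grouping (sketch, not attelo's code)."""
    shuffled = sorted(groupings)  # fixed starting order, so only rng matters
    rng.shuffle(shuffled)
    # round-robin: the number of groupings per fold differs by at most one
    return {grouping: i % folds for i, grouping in enumerate(shuffled)}


def fold_groupings(fold_dict, fold):
    """Return the groupings assigned to a fold, or raise if there are none."""
    groups = frozenset(g for g, f in fold_dict.items() if f == fold)
    if not groups:
        raise ValueError("no such fold: {}".format(fold))
    return groups


docs = {"doc{}".format(i) for i in range(10)}
fold_dict = make_n_fold(docs, 3, random.Random(1))
print(sorted(fold_groupings(fold_dict, 0)))
```

Because whole groupings are assigned, folds are balanced by number of groupings, not by number of data entries.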

attelo.fold.select_testing(mpack, fold_dict, fold)¶

Given a division into folds and a fold number, return only the test items for that fold

Return type:	`Multipack`

attelo.fold.select_training(mpack, fold_dict, fold)¶

Given a division into folds and a fold number, return only the training items for that fold

Return type:	`Multipack`
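
Since a multipack is a mapping from groupings to datapacks, the train/test split can be sketched as two dictionary filters. The strings below are placeholders for real datapacks, and the function bodies are assumptions for illustration.

```python
def select_testing(mpack, fold_dict, fold):
    """Keep only the groupings assigned to the given fold."""
    return {g: dpack for g, dpack in mpack.items() if fold_dict[g] == fold}


def select_training(mpack, fold_dict, fold):
    """Keep every grouping except those assigned to the given fold."""
    return {g: dpack for g, dpack in mpack.items() if fold_dict[g] != fold}


# Placeholder datapacks: any object would do for this illustration
mpack = {"d1": "pack for d1", "d2": "pack for d2", "d3": "pack for d3"}
fold_dict = {"d1": 0, "d2": 1, "d3": 0}

test_pack = select_testing(mpack, fold_dict, 0)
train_pack = select_training(mpack, fold_dict, 0)
```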

attelo.graph module¶

graph visualisation

exception attelo.graph.Alarm¶

Bases: exceptions.Exception

Exception to raise on signal timeout

class attelo.graph.GraphSettings¶

Bases: attelo.graph.GraphSettings

Parameters:

hide (string or None) – ‘intra’ to hide links between EDUs in the same subgrouping; ‘inter’ to hide links across subgroupings; None to show all links
select ([string] or None) – EDU groupings to graph (if None, all groupings will be graphed unless)
unrelated (bool) – show unrelated links
timeout (int) – number of seconds to allow graphviz to run before it times out
quiet (bool) – suppress informational messages

attelo.graph.alarm_handler(_, frame)¶: Raise Alarm on signal
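
The Alarm exception and its handler follow a common Unix timeout idiom. The sketch below shows how the two could fit together; `run_with_timeout` is a hypothetical helper that is not part of attelo, and it relies on `signal.SIGALRM`, which is only available on Unix and only usable from the main thread.

```python
import signal


class Alarm(Exception):
    """Raised when the timeout signal fires."""


def alarm_handler(_, frame):
    """Signal handler: turn SIGALRM into a Python exception."""
    raise Alarm


def run_with_timeout(func, timeout):
    """Run func(), returning None if it takes longer than `timeout` seconds.

    Hypothetical helper for illustration only.
    """
    old_handler = signal.signal(signal.SIGALRM, alarm_handler)
    signal.alarm(timeout)  # ask the OS to deliver SIGALRM after `timeout` seconds
    try:
        return func()
    except Alarm:
        return None  # timed out
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler
```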

attelo.graph.diff_all(edus, src_predictions, tgt_predictions, settings, output_dir)¶: Generate graphs for all the given predictions. Each grouping will have its own graph, saved in the output directory

attelo.graph.graph_all(edus, predictions, settings, output_dir)¶: Generate graphs for all the given predictions. Each grouping will have its own graph, saved in the output directory

attelo.graph.mk_diff_graph(title, edus, src_links, tgt_links, settings)¶

Convert attelo predictions to a graphviz graph displaying differences between two predictions

Predictions here consist of an EDU followed by a list of (parent name, relation label) tuples

Parameters:	tgt_links – if present, we generate a graph that represents a difference between the links and tgt_links (by highlighting links that only occur in one or the other)

attelo.graph.mk_single_graph(title, edus, links, settings)¶: Convert single set of attelo predictions to a graphviz graph

attelo.graph.select_links(edus, links, settings)¶

Given a set of edus and of edu id pairs, return only the pairs whose ids appear in the edu list

Parameters:

intra – if True, in addition to the constraints above, only return links that are in the same subgrouping
inter – if True, only return links between subgroupings

attelo.graph.write_dot_graph(filename, dot_graph, run_graphviz=True, quiet=False, timeout=30)¶: Write a dot graph and possibly run graphviz on it

attelo.io module¶

attelo.report module¶

attelo.score module¶

attelo.table module¶

Manipulating data tables (taking slices, etc)

class attelo.table.DataPack¶

Bases: attelo.table.DataPack

A set of data that can be said to belong together.

A typical use of the datapack would be to group together data for a single document/grouping. But in cases where this distinction does not matter, it can also be convenient to combine data from multiple documents into a single pack.

Notes

A datapack is said to be

single document (the usual case) if it corresponds to a single document, or “stacked” if it is made by joining multiple datapacks together. Some functions may only behave correctly on single-document datapacks
weighted if the graph tuple is set. You should never see weighted datapacks outside of a learner or decoder

Parameters:

edus (EDU) – effectively a set of edus
pairings ([(EDU, EDU)]) – edu pairs
data (2D array(float)) – sparse matrix of features, each row corresponding to a pairing
target (1D array (should be int, really)) – array of predictions for each pairing
ctarget (dict from string to objects) – Mapping from grouping name to structured target
labels ([string]) – list of relation labels (NB: by convention label zero is always the unknown label)
vocab ([string]) – feature names (corresponds to the feature indices) in data
graph (None or Graph) – if set, arrays representing the probabilities (or confidence scores) of attachment and labelling

get_label(i)¶

Return the class label for the given target value.

Parameters:	i (int, less than len(self.labels)) – a target value

See also

get_label

classmethod load(edus, pairings, data, target, ctarget, labels, vocab)¶

Build a data pack and run some sanity checks (see sanity_check()) (recommended if reading from disk)

Return type:	`DataPack`

sanity_check()¶: Raise DataPackException if anything about this datapack seems wrong, for example if the number of rows in one table is not the same as in another
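
To make the load/sanity_check relationship concrete, here is a cut-down sketch. It keeps only five of the fields and uses plain lists instead of arrays; the specific checks are assumptions based on the "number of rows" example above, not attelo's actual list of checks.

```python
from collections import namedtuple


class DataPackException(Exception):
    """Raised when a datapack looks inconsistent."""


class DataPack(namedtuple("DataPack", "edus pairings data target labels")):
    """Cut-down stand-in for attelo.table.DataPack using plain lists."""

    @classmethod
    def load(cls, edus, pairings, data, target, labels):
        """Build a datapack and check it before handing it back."""
        pack = cls(edus, pairings, data, target, labels)
        pack.sanity_check()
        return pack

    def sanity_check(self):
        """Every pairing needs exactly one feature row and one target value."""
        n_pairings = len(self.pairings)
        if len(self.data) != n_pairings:
            raise DataPackException("feature rows and pairings differ in number")
        if len(self.target) != n_pairings:
            raise DataPackException("targets and pairings differ in number")

    def get_label(self, i):
        """Return the class label for the given target value."""
        return self.labels[i]


pack = DataPack.load(
    edus=["e1", "e2"],
    pairings=[("e1", "e2"), ("e2", "e1")],
    data=[[1.0, 0.0], [0.0, 1.0]],
    target=[2, 1],
    labels=["__UNK__", "UNRELATED", "elaboration"],
)
print(pack.get_label(pack.target[0]))
```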

selected(indices)¶: Return only the items in the specified rows

set_graph(graph)¶: Return a copy of the datapack with weights set

classmethod vstack(dpacks)¶

Combine several datapacks into one.

The labels and vocabulary for all packs must be the same

exception attelo.table.DataPackException(msg)¶

Bases: exceptions.Exception

An exception which arises when working with an attelo data pack

class attelo.table.Graph¶

Bases: attelo.table.Graph

A graph can only be interpreted in light of a datapack.

It has predictions and attach/label weights. Predictions work like DataPack.target. The weights are useful within parsing pipelines, where it is sometimes useful for an intermediary parser to manipulate the weight vectors that a parser may calculate downstream.

See the parser interface for more details.

Parameters:

prediction (array(int)) – label for each edge (each cell corresponds to edge)
attach (array(float)) – attachment weights (each cell corresponds to an edge)
label (2D array(float)) – label attachment weights (edge by label)

Notes

Predictions are always labels; however, datapack targets may also be -1/0/1 when adapted to binary attachment task

selected(indices)¶: Return a subset of the links indicated by the list/array of indices

tweak(prediction=None, attach=None, label=None)¶

Return a variant of the current graph with some values changed.

Parameters:

prediction (1D array of int16) – Predicted label for each pair of EDUs
attach (1D array of float) – Attachment scores for each pair of EDUs
label (2D array of float) – Score of each label for each pair of EDUs

Returns:	g_copy – Copy of self with prediction, attach or label overridden with the values passed as arguments.
Return type:	Graph

Notes

This returns a copy of self with graph changed, because “[EYK] superstitiously believes that datapacks and graphs should be immutable as much as possible, and that mutability in the parsing pipeline would lead to confusion; hence this and namedtuples instead of simple getting and setting”.
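
The copy-instead-of-mutate behaviour falls out naturally from named tuples. The following sketch shows one way tweak could be written on top of `namedtuple._replace`; the field names come from the parameter list above, plain lists stand in for arrays, and the body is an assumption.

```python
from collections import namedtuple


class Graph(namedtuple("Graph", "prediction attach label")):
    """Illustrative stand-in for attelo.table.Graph."""

    def tweak(self, prediction=None, attach=None, label=None):
        """Return a copy of this graph with the given fields overridden."""
        updates = {}
        if prediction is not None:
            updates["prediction"] = prediction
        if attach is not None:
            updates["attach"] = attach
        if label is not None:
            updates["label"] = label
        # _replace builds a new tuple; fields that are not overridden
        # are shared with the original, not deep-copied
        return self._replace(**updates)


g = Graph(prediction=[2, 1],
          attach=[0.9, 0.1],
          label=[[0.0, 0.2, 0.8], [0.1, 0.7, 0.2]])
g2 = g.tweak(attach=[0.5, 0.5])
```

After the call, `g` still holds its original attachment weights, and `g2` differs from it only in the `attach` field.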

classmethod vstack(graphs)¶: Combine several graphs into one.

class attelo.table.Multipack¶

Bases: dict

A multipack is a mapping from groupings to datapacks

This class exists purely for documentation purposes; in practice, a dictionary of string to DataPack will do just fine

attelo.table.UNKNOWN = '__UNK__'¶: distinguished internal value for post-labelling mode

attelo.table.UNRELATED = 'UNRELATED'¶: distinguished value for unrelated relation labels

attelo.table.attached_only(dpack, target)¶

Return only the instances which are labelled as attached (ie. this would presumably return an empty pack on completely unseen data)

Parameters:

dpack (DataPack) – Original datapack
target (array(int)) – Original targets

Returns:

dpack (DataPack) – Transformed datapack, with binary labels
target (array(int)) – Transformed targets, with binary labels

attelo.table.for_attachment(dpack, target)¶

Adapt a datapack to the attachment task.

This could involve:

selecting some of the features (all for now, but may change in the future)
modifying the features/labels in some way: we currently binarise labels to {-1 ; 1} for UNRELATED and not-UNRELATED respectively.

Parameters:

dpack (DataPack) – Original datapack
target (array(int)) – Original targets

Returns:

dpack (DataPack) – Transformed datapack, with binary labels
target (array(int)) – Transformed targets, with binary labels
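
The binarisation step can be sketched on plain lists as follows. `binarise_target` is a hypothetical helper name, and the label list is made up; only the {-1 ; 1} convention and the UNRELATED value come from this page.

```python
UNRELATED = "UNRELATED"


def binarise_target(labels, target):
    """Map each target value to -1 (UNRELATED) or 1 (any other label)."""
    unrelated = labels.index(UNRELATED)
    return [-1 if t == unrelated else 1 for t in target]


# Made-up label set; by convention label zero is the unknown label
labels = ["__UNK__", "UNRELATED", "elaboration", "narration"]
target = [1, 2, 1, 3]
print(binarise_target(labels, target))  # [-1, 1, -1, 1]
```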

attelo.table.for_labelling(dpack, target)¶

Adapt a datapack to the relation labelling task (currently a no-op).

This could involve:

selecting some of the features (all for now, but may change in the future)
modifying the features/labels in some way (in practice no change)

Parameters:

dpack (DataPack) – Original datapack
target (array(int)) – Original targets

Returns:

dpack (DataPack) – Transformed datapack, with binary labels
target (array(int)) – Transformed targets, with binary labels

attelo.table.get_label_string(labels, i)¶: Return the class label for the given target value.

attelo.table.grouped_intra_pairings(dpack, include_fake_root=False)¶

Retrieve intra pairings from a datapack, grouped by subgrouping.

Parameters:

dpack (DataPack) – The datapack under scrutiny.
include_fake_root (boolean, optional) – If True, (FAKE_ROOT_ID, x) pairings are included in the group defined by (grouping(x), subgrouping(x)).

Returns:	groups – Map each (grouping, subgrouping) to the list of pairing indices within the same subgrouping.
Return type:	dict from (string, string) to list of integers

Notes

The result roughly corresponds to a hypothetical dpack.pairings['intra'].groupby(['grouping', 'subgrouping']).groups.

attelo.table.groupings(pairings)¶

Given a list of EDU pairings, return a dictionary mapping grouping names to list of rows within the pairings.

Return type:	dict(string, [int])
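
A minimal sketch of this grouping step is shown below. It assumes that a pairing takes its grouping from whichever EDU has one, since the fake root has grouping None; the EDU values are made up.

```python
from collections import defaultdict, namedtuple

EDU = namedtuple("EDU", "id text start end grouping subgrouping")
FAKE_ROOT = EDU("ROOT", "", 0, 0, None, None)


def groupings(pairings):
    """Map each grouping name to the row indices of its pairings (sketch)."""
    res = defaultdict(list)
    for i, (edu1, edu2) in enumerate(pairings):
        # the fake root belongs to no grouping, so fall back on the other EDU
        grp = edu1.grouping if edu1.grouping is not None else edu2.grouping
        res[grp].append(i)
    return dict(res)


a = EDU("a", "", 0, 5, "d1", "t1")
b = EDU("b", "", 6, 9, "d1", "t1")
x = EDU("x", "", 0, 4, "d2", "t1")
pairings = [(FAKE_ROOT, a), (a, b), (FAKE_ROOT, x), (b, a)]
print(groupings(pairings))
```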

attelo.table.idxes_attached(dpack, target)¶

Indices of attached pairings from dpack, according to target.

Parameters:

dpack (DataPack) – Datapack
target (list of integers) – Label for each pairing of dpack

Returns:

indices (array of integers) – Indices of attached pairings.

TODO

Try and apply widely, especially for parser.intra; search for e.g. “target != unrelated” and “target[i] != unrelated”.

attelo.table.idxes_fakeroot(dpack)¶: Return the datapack indices of only those pairings which involve the fake root node

attelo.table.idxes_inter(dpack, include_fake_root=False)¶

Return indices of pairings from different subgroupings.

Parameters:

dpack (DataPack) – Datapack under scrutiny
include_fake_root (boolean, optional) – If True, pairings of the form (FAKE_ROOT_ID, x) are included.

Returns:	idxes – Indices of the inter pairings.
Return type:	list of int

attelo.table.idxes_intra(dpack, include_fake_root=False)¶

Return indices of pairings from same subgrouping, inside a datapack.

Parameters:

dpack (DataPack) – Datapack under scrutiny
include_fake_root (boolean, optional) – If True, pairings of the form (FAKE_ROOT_ID, x) are included.

Returns:	idxes – Indices of the intra pairings.
Return type:	list of int
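
The intra/inter distinction can be sketched as below. For brevity these versions take a list of pairings, not a DataPack, and the comparison on (grouping, subgrouping) is an assumption about what "same subgrouping" means.

```python
from collections import namedtuple

EDU = namedtuple("EDU", "id text start end grouping subgrouping")
FAKE_ROOT = EDU("ROOT", "", 0, 0, None, None)


def same_subgrouping(edu1, edu2):
    """True if both EDUs sit in the same grouping and subgrouping."""
    return (edu1.grouping, edu1.subgrouping) == (edu2.grouping, edu2.subgrouping)


def idxes_intra(pairings, include_fake_root=False):
    """Indices of pairings whose EDUs share a subgrouping (sketch)."""
    idxes = []
    for i, (edu1, edu2) in enumerate(pairings):
        if edu1.id == FAKE_ROOT.id:
            if include_fake_root:
                idxes.append(i)
        elif same_subgrouping(edu1, edu2):
            idxes.append(i)
    return idxes


def idxes_inter(pairings, include_fake_root=False):
    """Indices of pairings whose EDUs come from different subgroupings (sketch)."""
    idxes = []
    for i, (edu1, edu2) in enumerate(pairings):
        if edu1.id == FAKE_ROOT.id:
            if include_fake_root:
                idxes.append(i)
        elif not same_subgrouping(edu1, edu2):
            idxes.append(i)
    return idxes


a = EDU("a", "", 0, 5, "d1", "t1")
b = EDU("b", "", 6, 9, "d1", "t1")
c = EDU("c", "", 10, 15, "d1", "t2")
pairings = [(FAKE_ROOT, a), (a, b), (a, c), (b, c)]
```

With this sample, only the (a, b) pairing is intra; the fake root pairing is added to either result only on request.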

attelo.table.locate_in_subpacks(dpack, subpacks)¶

Given a datapack and some of its subpacks, return a list of tuples identifying for each pair, its subpack and index in that subpack.

If a pair is not found in the list of subpacks, we return None instead of tuple

Return type:	[None or (DataPack, float)]

attelo.table.mpack_pairing_distances(mpack)¶

Return, for each target value (label) in the multipack, the left and right maximum distances of EDU pairings. See pairing_distances() for details

Return type:	dict(int, (int, int))

attelo.table.pairing_distances(dpack)¶

Return for each target value (label) in the datapack, the left and right maximum distances of edu pairings (in number of EDUs, so adjacent EDUs have distance of 0)

Note that we assume a single-document datapack. If you give this a stacked datapack, you may get very large distances to the fake root

Return type:	dict(int, (int, int))

attelo.table.select_window(dpack, window)¶

Select only EDU pairs that are at most window EDUs apart from each other (adjacent EDUs would be considered 0 apart)

Note that if the window is None, we simply return the original datapack

Note that this will only work correctly on single-document datapacks
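
The distance convention (adjacent EDUs are 0 apart) can be illustrated with the sketch below. It works on a list of EDUs and pairings from a single document, ranks EDUs by their start offset, and ignores the fake root entirely; all of these are simplifying assumptions, not a description of attelo's code.

```python
from collections import namedtuple

EDU = namedtuple("EDU", "id text start end grouping subgrouping")


def select_window(edus, pairings, window):
    """Keep pairings whose EDUs are at most `window` EDUs apart (sketch)."""
    if window is None:
        return list(pairings)
    # rank of each EDU in document order
    ordered = sorted(edus, key=lambda e: e.start)
    position = {edu.id: i for i, edu in enumerate(ordered)}
    # neighbours have ranks differing by 1, which counts as distance 0
    return [(e1, e2) for e1, e2 in pairings
            if abs(position[e1.id] - position[e2.id]) - 1 <= window]


edus = [EDU(name, "", 10 * i, 10 * i + 9, "d1", "t1")
        for i, name in enumerate("abcd")]
pairings = [(e1, e2) for e1 in edus for e2 in edus if e1 != e2]
adjacent = select_window(edus, pairings, 0)
```

On a stacked datapack the ranks would mix EDUs from several documents, which is why the single-document restriction matters.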

attelo.util module¶

General-purpose classes and functions

class attelo.util.ArgparserEnum¶

Bases: enum.Enum

An enumeration whose values we spit out as choices to argparser

classmethod choices_str()¶: available choices in this enumeration

classmethod from_string(string)¶: from command line arg

classmethod help_suffix(default)¶: help text suffix showing choices and default
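
The three class methods are designed to plug into argparse. Below is a sketch of how such an enumeration could be written and used; the method bodies, the exact help text format, and the `Mode` enumeration are all assumptions for illustration.

```python
import argparse
from enum import Enum


class ArgparserEnum(Enum):
    """An enumeration whose member names double as command line choices."""

    @classmethod
    def choices_str(cls):
        """Available choices, as a comma-separated string."""
        return ",".join(member.name for member in cls)

    @classmethod
    def from_string(cls, string):
        """Look a member up by name (argparse reports ValueError as a usage error)."""
        try:
            return cls[string]
        except KeyError:
            raise ValueError("unknown choice: " + string)

    @classmethod
    def help_suffix(cls, default):
        """Help text suffix showing the choices and the default."""
        suffix = " (choices: {" + cls.choices_str() + "}"
        if default is not None:
            suffix += ", default: " + default.name
        return suffix + ")"


class Mode(ArgparserEnum):  # hypothetical enumeration, for illustration only
    local = 1
    cluster = 2


psr = argparse.ArgumentParser()
psr.add_argument("--mode", type=Mode.from_string, default=Mode.local,
                 help="where to run" + Mode.help_suffix(Mode.local))
args = psr.parse_args(["--mode", "cluster"])
```

Passing `from_string` as the argparse `type` means the rest of the program receives an enumeration member, not a raw string.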

class attelo.util.Team¶

Bases: attelo.util.Team

Any collection where we have the same thing but duplicated for each attelo subtask (eg. models, learners)

fmap(func)¶: Apply a function to each member of the collection

attelo.util.concat_i(iters)¶: Merge an iterable of iterables into a single iterable

attelo.util.concat_l(iters)¶: Merge an iterable of iterables into a list
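
Both helpers can be expressed with `itertools.chain.from_iterable` from the standard library. The sketch below is an assumed implementation that matches the descriptions above.

```python
import itertools


def concat_i(iters):
    """Merge an iterable of iterables into a single (lazy) iterable."""
    return itertools.chain.from_iterable(iters)


def concat_l(iters):
    """Merge an iterable of iterables into a list."""
    return list(concat_i(iters))


print(concat_l([[1, 2], (3,), "ab"]))  # [1, 2, 3, 'a', 'b']
```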

attelo.util.mk_rng(shuffle=False, default_seed=None)¶

Return a random number generator instance, hard-seeded unless we ask for shuffling to be enabled

(note: if shuffle mode is enabled, the rng in question will just be the system generator)
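
A sketch of this behaviour is given below. It reads "the system generator" as the shared `random` module and falls back on a fixed seed of 0 when no default seed is given; both points are assumptions.

```python
import random


def mk_rng(shuffle=False, default_seed=None):
    """Return a seeded generator, or the shared one if shuffling is on (sketch)."""
    if shuffle:
        return random  # module-level generator, shared state, not reproducible
    # assumed fallback seed; any fixed value gives reproducible runs
    seed = default_seed if default_seed is not None else 0
    return random.Random(seed)


# Two hard-seeded generators produce the same sequence
first = mk_rng(default_seed=42).random()
second = mk_rng(default_seed=42).random()
```

Hard-seeding by default keeps fold assignment reproducible between runs unless shuffling is requested explicitly.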

attelo.util.truncate(text, width)¶: Truncate a string and append an ellipsis if truncated
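
For illustration, a truncation helper along these lines might look like the sketch below. Whether the ellipsis counts towards the width is not stated above, so this version simply appends it after `width` characters.

```python
def truncate(text, width):
    """Cut text down to width characters, marking the cut with an ellipsis."""
    return text if len(text) <= width else text[:width] + "..."


print(truncate("discourse parser", 9))  # discourse...
```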