pychronicles package

Submodules

pychronicles.abstraction module

The abstraction module offers a class for the generalisation of a collection of sequences as a chronicle.

The problem of generalisation is the construction of a unique chronicle that occurs in every sequence to generalize, such that it represents the “common” part of them.

Note that the empty chronicle is a possible solution, but a useless one. To be meaningful, the algorithm extracts the “largest” chronicle, defined as the chronicle with the largest multiset and the narrowest temporal constraints. Such a chronicle exists in the specific case of a finite collection of timed sequences (for more formal details, see “Chronicles: Formalization of a Temporal Model”, Besnard and Guyet, 2023).
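The "largest multiset" part of this definition can be illustrated with the standard library alone. The sketch below is not the algorithm used by Abstracter: it only computes the multiset of labels shared by all sequences and ignores temporal constraints entirely. The helper name common_multiset is hypothetical.

```python
from collections import Counter
from functools import reduce


def common_multiset(sequences):
    """Largest multiset of labels contained in every sequence (hypothetical helper)."""
    counters = [Counter(label for label, _ in seq) for seq in sequences]
    # Counter & Counter keeps, for each label, the minimum of the two counts.
    return reduce(lambda a, b: a & b, counters)


# Same four sequences as in the example of this module, as (label, date) pairs.
sequences = [
    [("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12)],
    [("a", 1), ("b", 12), ("c", 23), ("b", 30)],
    [("a", 25), ("b", 26), ("c", 28), ("b", 30)],
    [("b", 20), ("c", 23), ("a", 25), ("b", 30)],
]

common = common_multiset(sequences)
print(dict(common))  # one "a", one "c" and two "b" are shared by all sequences
```

Any chronicle occurring in all four sequences can use at most these events; finding the temporal constraints that hold between them is the harder part handled by the library.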

Example

The following example illustrates the generalization of 4 sequences as a chronicle.

seq = [("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12)]

dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts1 = TimedSequence(dates, data)

seq = [("a", 1), ("b", 12), ("c", 23), ("b", 30)]

dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts2 = TimedSequence(dates, data)

seq = [("a", 25), ("b", 26), ("c", 28), ("b", 30)]

dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts3 = TimedSequence(dates, data)

seq = [("b", 20), ("c", 23), ("a", 25), ("b", 30)]

dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts4 = TimedSequence(dates, data)

#########################
abstracter = Abstracter()  # named so as not to shadow the built-in abs()
c = abstracter.abstract([ts1, ts2, ts3, ts4])

print(c)
Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.abstraction.Abstracter

Bases: object

abstract(sequences: Sequence[TimedSequence]) Chronicle
Parameters:

sequences ([TimedSequence]) – List of timed sequences to generalize as a chronicle

Returns:

The largest chronicle that occurs in all the timed sequences.

Return type:

Chronicle

static first_occurrence(sequence: TimedSequence, ms: Multiset) Sequence[int] | None

pychronicles.chronicle module

This module mainly defines the Chronicle class. This class provides the basic functionalities to specify a chronicle and to recognize or enumerate chronicle occurrences in a timed sequence (see the TimedSequence class).

Warning

There are two possible models of time constraints: float or numpy.timedelta64. The user has to be consistent: all temporal constraints must use the same type of duration, and this type must match the one used by the timed sequences (if you use chronicles on timed sequences).

Example

The following example illustrates the main functionalities of the Chronicle class.

# Example of sequence
seq = [ ("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12), ("a", 15), ("c", 17), ("b", 20), ("c", 23), ("c", 25), ("b", 26), ("c", 28), ("b", 30) ]

dates = np.array(
    [np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
    dtype="datetime64",
)
data = np.array([e[0] for e in seq])

ts = TimedSequence(dates, data)

c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))

try:
    import matplotlib.pyplot as plt
    c.draw()
    plt.show()
except ImportError:
    # Drawing is optional: matplotlib (and networkx) may not be installed
    pass

print(c)
c.minimize()
print(c)

reco = c.match(ts)
print(f"Chronicle match: [{reco}]!")

reco = c.recognize(ts)
print(f"Chronicle occurrences: [{reco}]!")

print(c)
c2 = c.copy()
c2.add_constraint(0, 2, (np.timedelta64(4, "D"), np.timedelta64(4, "D")))
print(c)

#################################
# Sequence with floats

dates = np.array([float(e[1]) for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts = TimedSequence(dates, data)

c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (4.0, 10.0))
c.add_constraint(0, 2, (2.0, 8.0))
c.add_constraint(1, 2, (3.0, 13.0))

print(c)
c.minimize()
print(c)

reco = c.match(ts)
print(f"Chronicle match: [{reco}]!")

reco = c.recognize(ts)
print(f"Chronicle occurrences: [{reco}]!")

print(c)
c2 = c.copy()
c2.add_constraint(0, 2, (4.0, 4.0))  # float bounds, consistent with the float chronicle
print(c)
Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.chronicle.Chronicle

Bases: object

Class for modeling a chronicle pattern. It allows partially defined chronicles.

sequence

a list of events representing a multiset. It may work (without guarantee) with any event type equipped with an __eq__() operator.

Type:

[int|str]

tconst

a map assigning a temporal constraint (lower and upper bounds) on the delay between the two events in the key. The delay is expressed as a numpy.timedelta64 (or as a float, see the module warning). It is possible to define infinite intervals using None for one bound (e.g. (None, numpy.timedelta64(56, 'D'))). We discourage the use of (None, None), which may reduce the efficiency of the algorithm compared to having no constraint at all.

Type:

{(int,int):(numpy.timedelta64, numpy.timedelta64)}

pid

chronicle identifier

Type:

int

inconsistent

True if the chronicle is inconsistent and had a consistency check (through minimization)

Type:

bool

constr_type

Specifies the type of constraint used

Type:

int
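The tconst convention described above (a pair of bounds, with None standing for an unbounded side) can be sketched as follows. This is an illustration in plain Python with float delays, not the class's internal code; the helper name satisfies is hypothetical and inclusive bounds are an assumption.

```python
def satisfies(delay, interval):
    """Return True if delay lies in interval = (lower, upper).

    None means that the corresponding side is unbounded.
    Bounds are assumed to be inclusive.
    """
    lower, upper = interval
    if lower is not None and delay < lower:
        return False
    if upper is not None and delay > upper:
        return False
    return True


# Hypothetical constraint map shaped like tconst: {(i, j): (lower, upper)}
tconst = {(0, 1): (4.0, 10.0), (0, 2): (None, 8.0)}
# Candidate timestamps for events 0, 1 and 2
timestamps = {0: 1.0, 1: 8.0, 2: 2.0}

ok = all(
    satisfies(timestamps[j] - timestamps[i], interval)
    for (i, j), interval in tconst.items()
)
print(ok)
```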

add_constraint(ei: int, ej: int, constr: Tuple[timedelta64, timedelta64] | Tuple[float, float]) None

Add a temporal constraint (couple $[m,M]$) from event ei to ej

Parameters:
  • ei (int) – index of an event in the multiset

  • ej (int) – index of an event in the multiset

  • constr ((np.timedelta64, np.timedelta64) or (float, float)) – A couple representing the temporal constraint to add between ei and ej. The temporal constraint gives the minimum and the maximum delay between the occurrences of the events.

ei and ej are internally ordered (ei<ej). If they are not ordered, the constraint is automatically reversed (with a reversed temporal constraint). If there is already an existing constraint between the two events, it is overridden.
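The ordering rule for ei and ej can be pictured with a small sketch. It assumes that "reversed temporal constraint" means the usual identity: a delay from ei to ej in [m, M] is the same as a delay from ej to ei in [-M, -m]. The function normalize_constraint is a hypothetical helper, not part of the Chronicle API.

```python
def normalize_constraint(ei, ej, constr):
    """Return ((i, j), (lower, upper)) with i < j (hypothetical helper)."""
    lower, upper = constr
    if ei < ej:
        return (ei, ej), (lower, upper)
    # t_ej - t_ei in [lower, upper]  <=>  t_ei - t_ej in [-upper, -lower]
    return (ej, ei), (-upper, -lower)


constraints = {}

# Constraint given "backwards": stored under the ordered key (0, 2)
key, value = normalize_constraint(2, 0, (2.0, 8.0))
constraints[key] = value

# A second constraint on the same pair of events overrides the first one
key, value = normalize_constraint(0, 2, (1.0, 3.0))
constraints[key] = value

print(constraints)
```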

add_event(pos: int, label: str) None

Add an event to the chronicle multiset at a given position.

An event is usually an item belonging to the vocabulary of the timed sequences, or it can be a query when used in combination with a dataframe. For more details about labels as a query in the context of dataframe, see

Parameters:
  • pos (int) – identifier of the event in the chronicle multiset

  • label (str) – The label of the event defines the nature of the event. Only strings are possible.

clean() None

Destroy useless items and constraints (but does not remove all)

copy()
delete(itempos: int) None

Remove all events at position itempos. The placeholder at position itempos will still exist after deletion, but the event is None.

Parameters:

itempos (int) – Position at which the event must be removed

Warning

This function does not remove any temporal constraint.

delete_constr(ei: int, ej: int) None

Destroy the constraint from ei to ej (if any). The user can ignore the order of the event indices.

Parameters:
  • ei (int) – Indices of the events in the chronicle.

  • ej (int) – Indices of the events in the chronicle.

draw(ax=None) None

Function to draw a chronicle in a matplotlib figure. This function is based on the graph drawing capabilities of the NetworkX library.

Parameters:

ax (Matplotlib Axes object, optional.) – When specified, the function draws the graph in the specified Matplotlib axes.

Warning

This function requires networkx and matplotlib. These two libraries are not in the requirements and must be installed manually by the user for this function to work.

Example

import matplotlib.pyplot as plt

c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))

c.draw()

plt.show()
match(df_seq: TimedSequence) bool

This function verifies whether there is at least one occurrence of the chronicle in the sequence df_seq.

This function is on average faster than the recognize function because it stops as soon as a valid occurrence is found.

Parameters:

df_seq (TimedSequence) – Timed sequence representing a sequence of events

Returns:

True if the chronicle occurs in the sequence and False otherwise

Return type:

bool
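The difference between stopping at the first occurrence and enumerating all of them can be sketched with a lazy generator. The code below is a brute-force illustration in plain Python, not the recognition algorithm of the library; the function names and the (label, timestamp) representation are assumptions made for the example.

```python
from itertools import product


def occurrences(events, constraints, sequence):
    """Lazily yield the timestamps of each occurrence (brute force)."""
    # Positions in the sequence that carry the label of each chronicle event
    candidates = [
        [pos for pos, (label, _) in enumerate(sequence) if label == ev]
        for ev in events
    ]
    for positions in product(*candidates):
        if len(set(positions)) < len(positions):
            continue  # one event of the sequence cannot be used twice
        times = [sequence[p][1] for p in positions]
        if all(
            lo <= times[j] - times[i] <= hi
            for (i, j), (lo, hi) in constraints.items()
        ):
            yield times


def recognize(events, constraints, sequence):
    """Enumerate every occurrence."""
    return list(occurrences(events, constraints, sequence))


def match(events, constraints, sequence):
    """Stop as soon as one occurrence is found."""
    return any(True for _ in occurrences(events, constraints, sequence))


seq = [("a", 1.0), ("c", 2.0), ("b", 3.0), ("a", 8.0), ("a", 10.0), ("b", 12.0)]
events = ["a", "b", "c"]
constraints = {(0, 1): (1.0, 4.0), (0, 2): (1.0, 2.0)}

print(match(events, constraints, seq))
print(recognize(events, constraints, seq))
```

Because match consumes the generator only until the first result, it never does more work than recognize and usually does less.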

minimize() None

Minimization of the temporal constraints. It transforms the set of temporal constraints into the maximal _equivalent_ set of constraints. The recognition of minimized chronicles is often more efficient and equivalent to the recognition of the initial chronicle.

If the set of temporal constraints is inconsistent, the inconsistent flag is set to True and the function issues a warning. In this case, the temporal constraints are not modified. Note that inconsistent chronicles will not have any occurrences. It is the user's responsibility to avoid attempting the recognition of such patterns.

If the temporal constraints are expressed with np.timedelta64, they are first converted to a number of days. This may change the values of all the temporal constraints. The minimization is based on the Floyd-Warshall algorithm.

Example

>>> c = Chronicle()
>>> c.add_event(0, "a")
>>> c.add_event(1, "b")
>>> c.add_event(2, "c")
>>> c.add_constraint(0, 1, (4.0, 10.0) )
>>> c.add_constraint(1, 2, (3.0,13.0) )
>>> print(c)
C0       {{[a],[b],[c]}}
0,1: (4.0, 10.0)
1,2: (3.0, 13.0)
>>> c.minimize()
>>> print(c)
C0       {{[a],[b],[c]}}
0,1: (4.0, 10.0)
1,2: (3.0, 13.0)
0,2: (7.0, 23.0)
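The inferred constraint 0,2: (7.0, 23.0) can be reproduced with a textbook Floyd-Warshall pass over the distance graph of the constraints. The sketch below is a generic version written for float bounds, not the code of Chronicle.minimize; the function name minimize_constraints and the choice to return None on inconsistency are assumptions of this example.

```python
import math


def minimize_constraints(n, constraints):
    """Tighten {(i, j): (lower, upper)} over n events; None if inconsistent."""
    # d[i][j] is the largest allowed value of t_j - t_i
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), (lower, upper) in constraints.items():
        d[i][j] = min(d[i][j], upper)   # t_j - t_i <= upper
        d[j][i] = min(d[j][i], -lower)  # t_i - t_j <= -lower
    # All-pairs shortest paths
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle means that the constraints cannot all hold
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return {
        (i, j): (-d[j][i], d[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    }


tight = minimize_constraints(3, {(0, 1): (4.0, 10.0), (1, 2): (3.0, 13.0)})
print(tight[(0, 2)])  # (7.0, 23.0)

# Adding 0,2: (1.0, 5.0) contradicts the inferred lower bound of 7.0
broken = minimize_constraints(
    3, {(0, 1): (4.0, 10.0), (1, 2): (3.0, 13.0), (0, 2): (1.0, 5.0)}
)
print(broken)  # None
```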
npat: int = 0
recognize(df_seq: TimedSequence) Sequence[Sequence[datetime64]] | Sequence[Sequence[float]]

Enumerates the chronicle occurrences in a sequence.

Parameters:

df_seq (TimedSequence) – Timed sequence of events in which the chronicle occurrences are searched. The timestamps of the sequence are numpy.datetime64 values or floats.

Returns:

A list of occurrences of the chronicle in the sequence.

Return type:

[ [ p_1, p_2 …], [ p_1, p_2 …], …] (list of lists of positions/datetimes; each list is an occurrence and contains a list of n couples, where n is the chronicle size)

pychronicles.fchronicle module

Module that implements the notion of fuzzy chronicle. A fuzzy chronicle is an extension of a chronicle that allows approximate matching. Instead of having strict temporal constraints (satisfied or not), the satisfaction of temporal constraints is fuzzified.

The set of events is not fuzzified: the recognition of a chronicle in a sequence still requires the occurrence of all its events.

The recognition function (denoted cmp) has two parameters:

  • the lambda parameter specifies the level of fuzziness of the temporal constraints. This level of fuzziness is used to evaluate the similarity measure between a subsequence and the chronicle.

  • the threshold parameter specifies the similarity threshold used to decide whether the chronicle is considered to occur or not.
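To make the roles of the two parameters concrete, here is a small sketch of one possible similarity measure. It is an assumption made for illustration only: the measure actually used by cmp is not specified in this page. In this sketch, a delay inside its interval has similarity 1, the similarity decays exponentially (at rate lbda) with the distance to the interval, and an occurrence is scored by its worst constraint.

```python
import math


def interval_similarity(delay, interval, lbda):
    """1.0 inside the interval, exponential decay outside (assumed measure)."""
    lower, upper = interval
    if delay < lower:
        gap = lower - delay
    elif delay > upper:
        gap = delay - upper
    else:
        gap = 0.0
    return math.exp(-lbda * gap)


def occurrence_similarity(times, constraints, lbda):
    """Score an occurrence by its least satisfied constraint (assumption)."""
    return min(
        interval_similarity(times[j] - times[i], interval, lbda)
        for (i, j), interval in constraints.items()
    )


def is_recognized(times, constraints, threshold, lbda):
    """Apply the similarity threshold to decide whether the chronicle occurs."""
    return occurrence_similarity(times, constraints, lbda) >= threshold


constraints = {(0, 1): (13.0, 17.0)}
exact = [0.0, 15.0]   # delay of 15, inside [13, 17]
close = [0.0, 18.0]   # delay of 18, one unit above the upper bound

print(occurrence_similarity(exact, constraints, lbda=0.3))
print(occurrence_similarity(close, constraints, lbda=0.3))
print(is_recognized(close, constraints, threshold=0.95, lbda=0.3))
print(is_recognized(close, constraints, threshold=0.70, lbda=0.3))
```

With this assumed measure, a larger lbda penalizes deviations more strongly, and a threshold of 1 falls back to exact matching.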

More details about the recognition of fuzzy chronicles can be found in the following article

Example

The following example illustrates the main functionalities of the FuzzyChronicle class.

ts = TimedSequence(dates, data)

c = FuzzyChronicle()
c.add_event(0, "b")
c.add_event(1, "a")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(13, "D"), np.timedelta64(17, "D")))
c.add_constraint(0, 2, (np.timedelta64(1, "D"), np.timedelta64(30, "D")))
c.minimize()
c.tunit = "D" # this line is required to specify the default temporal unit
print(c)

occ, sim = c.cmp(ts, 0.95, 0.3)
print("similarity:" + str(sim))
print("occurrence:" + str(occ))
Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.fchronicle.FuzzyChronicle

Bases: Chronicle

Class for modeling a fuzzy chronicle pattern. It enables efficient recognition of chronicles with approximate intervals.

The match and recognize functions still work with the semantics of classical chronicles.

cmp(df_seq: TimedSequence, threshold: float, lbda: float = 0.01) Tuple[Sequence[Sequence[timedelta64]], Sequence[float]]

Method that checks whether the chronicle occurs in the sequence and evaluates the similarity for each of the occurrences.

Parameters:
  • df_seq (TimedSequence) – timed sequence in which the chronicle is searched

  • threshold (float in [0,1]) – minimal similarity measure to recognize a chronicle

  • lbda (float >0, optional) – parameter of the similarity measure

Returns:

Return a pair. The first element is the list of occurrences of the chronicle in the sequence (a list of lists of positions; each list is an occurrence and contains a list of n couples, where n is the chronicle size). The second element is the list of similarities between the occurrences and the chronicle. A similarity of 1 means an exact match; a lower similarity means that the events have been found but not within the exact temporal bounds.

Return type:

([ [ p_1, p_2 …], [ p_1, p_2 …], …], [float, …] )

npat: int = 0

pychronicles.mtlformula module

MTL formula

Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.mtlformula.ExtractMTL

Bases: Transformer

Class that transforms an MTL formula based on query expressions into a set of expressions for pandas datasets and an MTL formula that can be used to query a dataframe.

atom(tree)
expr(tree)
formula(tree)
interval(tree)
parse(formula)
class pychronicles.mtlformula.MTLAccessor(df: DataFrame)

Bases: object

match(formula: str)

formula is an MTL formula

pychronicles.pandas_tpatterns module

This module is dedicated to pandas accessors. Thanks to these accessors, you can use chronicle recognition and abstraction on datasets represented as pandas dataframes.

The principle is that a dataframe represents a timed sequence. Each row of the dataframe represents an event, and the columns are the features describing the nature of the event. Contrary to the internal representation of timed sequences, the events can be described with diverse and multiple features (int, str, float, …).

There are some (natural) requirements on the dataframe: first, the dataframe must at least be indexed by dates (dates or floats) and have a column describing the event features. In addition, if your dataset is made of several sequences, you may have a column to identify the individuals. This column can either be a classical column or a second index.

Once your dataset is represented with such a dataframe, all the functionalities of chronicles can be used directly with the dataframe (through the tpattern accessor).

Note

When using chronicles with dataframes, the events of a chronicle are defined through the values of the features in the dataframe. The typical specification of an event is a column/value pair, for instance label=="a", meaning that the label must be “a”. It is possible to express much more complex conditions evaluated on a row of the dataframe (for instance by combining conditions with and/or operators).

For more details about the acceptable syntax for chronicle events, we recommend reading the pandas documentation: [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html]

Example

The following example illustrates the use of chronicle with pandas dataframes.

import numpy as np
import pandas as pd

from pychronicles import Chronicle, TPatternAccessor

df = pd.DataFrame(
    {
        "label": [e[0] for e in seq],
        "str_val": [
            e[0] * 2 for e in seq
        ],  # illustration of another columns than "label"
        "num_val": np.random.randint(
            10, size=len(seq)
        ),  # illustration of another columns than "label"
    },
    index=[np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
)
print("----------------")

c = Chronicle()
c.add_event(0, 'label=="a"')
c.add_event(1, 'label=="b" & num_val>5')
c.add_event(2, 'label=="c"')
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))

reco = df.tpattern.match(c)
print(f"Chronicle match on the dataframe: [{reco}]!")

reco = df.tpattern.recognize(c)
print(f"Chronicle occurrences in the dataframe: [{reco}]!")

##########################################################################
# Use with a dataframe representing a collection of sequences

# Create a dataframe representing several sequences with complex events, each sequence having its own id
grpdf = pd.DataFrame(
    {
        "label": [e[0] for e in seq] * 3,
        "str_val": [e[0] * 2 for e in seq]
        * 3,  # illustration of another columns than "label"
        "num_val": np.random.randint(
            10, size=3 * len(seq)
        ),  # illustration of another columns than "label"
        "id": [1] * len(seq) + [2] * len(seq) + [3] * len(seq),
    },
    index=[np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq]
    * 3,
)

# the match function is applied to each sequence (one group per id) and
# returns one answer per sequence
print("Does the chronicle occur in each sequence of the dataset?")
reco = grpdf.groupby("id").apply(lambda d: d.tpattern.match(c))
print(reco)

print("What are the occurrences of the chronicle in each sequence of the dataset?")
reco = grpdf.groupby("id").apply(lambda d: d.tpattern.recognize(c))
print(reco)

##########################################
# Abstraction example

grpdf = pd.DataFrame(
    {
        "label": [np.random.choice(["a", "b", "c"]) for _ in range(20)],
        "id": [int(np.floor(i / 4)) for i in range(20)],
    }
)

chro = grpdf.tpattern.abstract("label", "id")
print(chro)
Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.pandas_tpatterns.TPatternAccessor(df: DataFrame)

Bases: object

abstract(event: str, groupby: str = None)

Abstract a dataframe into a chronicle

Parameters:
  • event (str) – name of the dataframe column to use as event (must contain integers or str)

  • groupby (str, optional) – name of the column that identifies groups of events. In this case the abstraction method outputs one chronicle that appears in each sequence identified by the groupby column.

Returns:

a chronicle that abstracts the collection of sequences represented in the dataset

Return type:

Chronicle

match(c: Chronicle)
Parameters:

c (Chronicle) – Chronicle to recognize

Returns:

True if the chronicle is recognized in the sequence

Return type:

bool

match_mtl(formula: str, dt: float = 1)

formula is an MTL formula.

Only for dataframes whose index contains floats or integers (not dates).

Parameters:
  • formula (str) – MTL formula to recognize

  • dt (float, default 1) – Time delta used in the MTL recognition

Returns:

True if the MTL formula is recognized in the sequence

Return type:

bool

recognize(c: Chronicle)
Parameters:

c (Chronicle) – Chronicle to recognize

Returns:

Occurrences of the chronicle in the sequence

Return type:

[[int]]

pychronicles.timedsequence module

This module implements the class to model timed sequences. A timed sequence is a sequence of events, each represented by a label belonging to a vocabulary and carrying a timestamp.

For compatibility with chronicle recognition, we recommend representing labels by strings (it should work with any object equipped with an __eq__ operator, but for now this is not a safe usage).

The timestamp of an event is a time instant (not an interval). This time instant can be modeled in two different manners:

  • float: basic representation of a metric quantity, but one that has no meaning in itself,

  • numpy.datetime64: standard representation of dates in NumPy. This allows describing events in real datasets in a natural way (without having to convert them to floats).

The class is equipped with functions to ease its intuitive usage, for instance selecting subsequences by date, event type, etc. These functionalities are illustrated in the example below.
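The kind of selection described above can be pictured with a toy stand-in written in plain Python. The class below is not TimedSequence and does not reproduce its interface (in particular, it uses explicit methods instead of the ts[ts < date] syntax); all names are hypothetical.

```python
from bisect import bisect_left


class MiniTimedSequence:
    """Toy timed sequence: parallel lists of sorted timestamps and labels."""

    def __init__(self, dates, data):
        pairs = sorted(zip(dates, data), key=lambda p: p[0])
        self.dates = [d for d, _ in pairs]
        self.data = [x for _, x in pairs]

    def before(self, limit):
        """Events with a timestamp strictly lower than limit."""
        cut = bisect_left(self.dates, limit)
        return MiniTimedSequence(self.dates[:cut], self.data[:cut])

    def with_label(self, label):
        """Events carrying the given label."""
        keep = [(d, x) for d, x in zip(self.dates, self.data) if x == label]
        return MiniTimedSequence([d for d, _ in keep], [x for _, x in keep])

    def start(self):
        """Timestamp of the first event."""
        return self.dates[0]

    def end(self):
        """Timestamp of the last event."""
        return self.dates[-1]


seq = [("a", 1.0), ("c", 2.0), ("b", 3.0), ("a", 8.0), ("a", 10.0), ("b", 12.0)]
ts = MiniTimedSequence([t for _, t in seq], [label for label, _ in seq])

print(ts.before(6.0).data)
print(ts.with_label("a").dates)
print(ts.with_label("b").start(), ts.end())
```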

Warning

Be careful to use numpy.datetime64 dates and not datetime objects (from the datetime package), which do not provide the same interface and are not compatible with TimedSequence.

Example

The following example illustrates the main functionalities of the TimedSequence class.

# Example of sequence
seq = [ ("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12), ("a", 15), ("c", 17), ("b", 20), ("c", 23), ("c", 25), ("b", 26), ("c", 28), ("b", 30) ]

dates = np.array(
    [np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
    dtype="datetime64",
)
data = np.array([e[0] for e in seq])

ts = TimedSequence(dates, data)
print(ts)
print("---- time based selection ------")
tssel = ts[ts < np.datetime64("1970-01-07")]
print(tssel)

print("----- item based selection ------")
tssel = ts[ts == "a"]
print(tssel)

print("----- start -----")
print(tssel.start())

print("----- at ------")
print(ts.at(np.datetime64("1970-01-02")))
print(ts.at(np.datetime64("1970-01-08")))

######################

dates = np.array([float(e[1]) for e in seq], dtype="float")
data = np.array([e[0] for e in seq])

ts = TimedSequence(dates, data)
print(ts)
print("---- time based selection ------")
tssel = ts[ts < 6.0]
print(tssel)

try:
    tssel = ts[ts < 6]
except ValueError:
    print("Floats are mandatory")

print("----- item based selection ------")
tssel = ts[ts == "a"]
print(tssel)

print("----- start -----")
print(tssel.start())

print("----- at ------")
print(ts.at(2))
print(ts.at(7.0))
Authors:

Thomas Guyet, Inria

Date:

08/2023

class pychronicles.timedsequence.TimedSequence(dates: Sequence[datetime64] | Sequence[float], data: Sequence[str] | Sequence[int])

Bases: object

at(dt: datetime64 | float) int | str
end() datetime64 | float
len() int
start() datetime64 | float

pychronicles.utils module

This module contains functions to import and export Chronicle objects in a text format. We use the format proposed by Dousson et al. in their CRS (Chronicle Recognition System).

Authors:

Thomas Guyet, Inria

Date:

08/2023

pychronicles.utils.load(crs: str) Chronicle

Load a chronicle from a string in the CRS format. Note that all brackets (“[]” in chronicle or event names, and “()”) are assumed to be empty in this function.

This is a class-function.

Parameters:
  • crs (str) – String describing a chronicle in the CRS format

  • emapper (event mapper object, optional) – An external event mapper

Returns:

The newly instantiated chronicle

Return type:

Chronicle

pychronicles.utils.to_crs(c: Chronicle) str

Generate a string representing the chronicle in the CRS format.

Unnamed events (which must be numbers) are called “E”+str(X) in the event description to avoid event names starting with digits (CNAME conventions). Infinite intervals are not printed out, but semi-infinite intervals generate a description like ‘[-inf,23]’ or ‘[34,inf]’; whether this is valid CRS syntax has not been verified.
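The two naming rules described above can be sketched with small helpers. They only illustrate the conventions stated in this paragraph and do not reproduce the CRS syntax or the code of to_crs; the function names event_name and interval_text are hypothetical.

```python
def event_name(label):
    """Prefix purely numeric labels with "E" so that names do not start with a digit."""
    text = str(label)
    return "E" + text if text.isdigit() else text


def interval_text(lower, upper):
    """Text of an interval, or None when both bounds are infinite (nothing printed)."""
    if lower is None and upper is None:
        return None
    lo = "-inf" if lower is None else str(lower)
    hi = "inf" if upper is None else str(upper)
    return f"[{lo},{hi}]"


print(event_name(3), event_name("alarm"))
print(interval_text(None, 23))
print(interval_text(34, None))
print(interval_text(None, None))
```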

Parameters:

c (Chronicle) – A chronicle to export in CRS string.

Returns:

The CRS description of a chronicle

Return type:

str

Module contents

Chronicles package

@author: Thomas Guyet @date: 10/2022 @institution: Inria