pychronicles package
Submodules
pychronicles.abstraction module
The abstraction module offers a class for the generalisation of a collection of sequences as a chronicle.
The problem of generalisation is the construction of a unique chronicle that occurs in every sequence to generalize, such that it represents the “common” part of them.
It is worth noting that the empty chronicle is a possible solution, but a useless one. To be meaningful, the algorithm extracts the “largest” chronicle. The “largest” chronicle is defined as the one with the largest multiset and the narrowest temporal constraints. This minimum exists in the specific case of a finite collection of timed sequences (for more formal details, see “Chronicles: Formalization of a Temporal Model”, Besnard and Guyet, 2023).
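The multiset part of this definition can be illustrated without the library. The sketch below is not the Abstracter algorithm and ignores temporal constraints entirely; it only computes the largest multiset of labels shared by every sequence, using the standard collections.Counter. The helper name largest_common_multiset is hypothetical.

```python
from collections import Counter
from functools import reduce


def largest_common_multiset(sequences):
    """Return the largest multiset of labels contained in every sequence.

    Each sequence is a list of (label, timestamp) pairs.
    Illustrative helper only, not part of pychronicles.
    """
    counters = [Counter(label for label, _ in seq) for seq in sequences]
    # Counter intersection keeps, for each label, the minimum count.
    return reduce(lambda a, b: a & b, counters)


sequences = [
    [("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12)],
    [("a", 1), ("b", 12), ("c", 23), ("b", 30)],
    [("a", 25), ("b", 26), ("c", 28), ("b", 30)],
    [("b", 20), ("c", 23), ("a", 25), ("b", 30)],
]
common = largest_common_multiset(sequences)
print(sorted(common.elements()))  # ['a', 'b', 'b', 'c']
```

The empty multiset is always a valid answer; the intersection simply picks the largest one. Finding the narrowest temporal constraints on top of this multiset is the harder part of the problem and is not attempted here.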
Example
The following example illustrates the generalization of 4 sequences as a chronicle.
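import numpy as np
from pychronicles.abstraction import Abstracter
from pychronicles.timedsequence import TimedSequence
seq = [("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12)]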
dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts1 = TimedSequence(dates, data)
seq = [("a", 1), ("b", 12), ("c", 23), ("b", 30)]
dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts2 = TimedSequence(dates, data)
seq = [("a", 25), ("b", 26), ("c", 28), ("b", 30)]
dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts3 = TimedSequence(dates, data)
seq = [("b", 20), ("c", 23), ("a", 25), ("b", 30)]
dates = np.array([e[1] for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts4 = TimedSequence(dates, data)
#########################
abstracter = Abstracter()  # avoid shadowing the built-in abs()
c = abstracter.abstract([ts1, ts2, ts3, ts4])
print(c)
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- class pychronicles.abstraction.Abstracter
Bases:
object
- abstract(sequences: Sequence[TimedSequence]) Chronicle
- Parameters:
sequences ([TimedSequence]) – List of timed sequences to generalize as a chronicle
- Returns:
The largest chronicle that occurs in all the timed sequences.
- Return type:
Chronicle
- static first_occurrence(sequence: TimedSequence, ms: Multiset) Sequence[int] | None
pychronicles.chronicle module
This module mainly defines the Chronicle class. This class provides the basic functionalities to specify a chronicle and to recognize or enumerate chronicle occurrences in a timed sequence (see the TimedSequence class).
Warning
There are two possible models of time constraints: float or numpy.timedelta64. The user has to be consistent: all temporal constraints must use the same type of duration, and this type must be consistent with that of the timed sequences (if you use chronicles on timed sequences).
Example
The following example illustrates the main functionalities of the Chronicle class.
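import numpy as np
from pychronicles.chronicle import Chronicle
from pychronicles.timedsequence import TimedSequence
# Example of sequence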
seq = [ ("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12), ("a", 15), ("c", 17), ("b", 20), ("c", 23), ("c", 25), ("b", 26), ("c", 28), ("b", 30) ]
dates = np.array(
    [np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
    dtype="datetime64",
)
data = np.array([e[0] for e in seq])
ts = TimedSequence(dates, data)
c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))
try:
    import matplotlib.pyplot as plt

    c.draw()
    plt.show()
except ImportError:
    # draw() needs matplotlib and networkx, which are optional dependencies
    pass
print(c)
c.minimize()
print(c)
reco = c.match(ts)
print(f"Chronicle recognition: [{reco}]!")
reco = c.recognize(ts)
print(f"Chronicle recognition: [{reco}]!")
print(c)
c2 = c.copy()
c2.add_constraint(0, 2, (np.timedelta64(4, "D"), np.timedelta64(4, "D")))
print(c)
#################################
# Sequence with floats
dates = np.array([float(e[1]) for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts = TimedSequence(dates, data)
c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (4.0, 10.0))
c.add_constraint(0, 2, (2.0, 8.0))
c.add_constraint(1, 2, (3.0, 13.0))
print(c)
c.minimize()
print(c)
reco = c.match(ts)
print(f"Chronicle recognition: [{reco}]!")
reco = c.recognize(ts)
print(f"Chronicle recognition: [{reco}]!")
print(c)
c2 = c.copy()
c2.add_constraint(0, 2, (4.0, 4.0))  # floats, consistent with the float-based sequence
print(c)
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- class pychronicles.chronicle.Chronicle
Bases:
object
Class modeling a chronicle pattern. It allows partially defined chronicles.
- sequence
a list of events representing a multiset. It may work (without guarantee) with any event type equipped with an __eq__() operator.
- tconst
a map assigning a temporal constraint (lower and upper bounds) to the delay between the two events in the key. The delay is expressed as a timedelta. It is possible to define infinite intervals using None for one bound (e.g. (None, numpy.timedelta64(56, 'D'))). We discourage the use of (None, None), which may reduce the algorithm's efficiency compared to having no constraint at all.
- Type:
{(int,int):(numpy.timedelta64, numpy.timedelta64)}
- inconsistent
True if the chronicle is inconsistent and has undergone a consistency check (through minimization)
- Type:
bool
- add_constraint(ei: int, ej: int, constr: Tuple[timedelta64, timedelta64] | Tuple[float, float]) None
Add a temporal constraint (couple $[m,M]$) from event ei to ej
- Parameters:
ei (int) – index of the events in the multiset
ej (int) – index of the events in the multiset
constr ((np.timedelta64, np.timedelta64) or (float, float)) – A couple representing the temporal constraint to add between ei and ej. The temporal constraint gives the minimum and the maximum delay between the occurrences of the events.
ei and ej are internally ordered (ei<ej). If they are not, the constraint is automatically reversed (with a reversed temporal constraint). If there is already an existing constraint between the two events, it is overridden.
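The reversal of a constraint can be sketched as follows. This is an illustration under the assumption that a constraint [m, M] from ei to ej bounds the delay t_ej - t_ei; the helper normalize_constraint is hypothetical and is not the library's internal code. Reversing the direction negates and swaps the bounds, and an unbounded side (None) stays unbounded.

```python
def normalize_constraint(ei, ej, constr):
    """Return (i, j, (lower, upper)) with i < j.

    Hypothetical helper: a constraint [m, M] from ei to ej is equivalent
    to the constraint [-M, -m] from ej to ei. None (unbounded) stays None.
    """
    if ei == ej:
        raise ValueError("a constraint needs two distinct events")
    lower, upper = constr
    if ei < ej:
        return ei, ej, (lower, upper)
    new_lower = None if upper is None else -upper
    new_upper = None if lower is None else -lower
    return ej, ei, (new_lower, new_upper)


print(normalize_constraint(0, 1, (4.0, 10.0)))  # (0, 1, (4.0, 10.0))
print(normalize_constraint(2, 0, (2.0, 8.0)))   # (0, 2, (-8.0, -2.0))
print(normalize_constraint(2, 1, (None, 5.0)))  # (1, 2, (-5.0, None))
```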
- add_event(pos: int, label: str) None
Add an event to the chronicle multiset at a given position.
An event is usually an item belonging to the vocabulary of the timed sequences, or it can be a query when used in combination with a dataframe. For more details about labels as a query in the context of dataframes, see
- copy()
- delete(itempos: int) None
Remove the event at position itempos. The placeholder at this position still exists after deletion, but its event is None.
- Parameters:
itempos (int) – Position at which the event must be removed
Warning
This function does not remove any temporal constraint.
- delete_constr(ei: int, ej: int) None
Delete the constraint from ei to ej (if any). The user can ignore the order of the event indices.
- draw(ax=None) None
Function to draw a chronicle in a matplotlib figure. This function is based on the graph drawing capabilities of the NetworkX library.
- Parameters:
ax (Matplotlib Axes object, optional.) – When specified, the function draws the graph in the specified Matplotlib axes.
Warning
This function requires networkx and matplotlib. These two libraries are not in the requirements and must be installed manually by the user for this function to work.
Example
import matplotlib.pyplot as plt
c = Chronicle()
c.add_event(0, "a")
c.add_event(1, "b")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))
c.draw()
plt.show()
- match(df_seq: TimedSequence) bool
This function verifies whether there is at least one occurrence of the chronicle in the sequence df_seq.
This function is on average faster than the recognize function, because it stops as soon as a valid occurrence is found.
- Parameters:
df_seq (TimedSequence) – Timed sequence representing a sequence of events
- Returns:
True if the chronicle occurs in the sequence and False otherwise
- Return type:
bool
- minimize() None
Minimization of the temporal constraints. It transforms the set of temporal constraints into the maximal _equivalent_ set of constraints. The recognition of minimized chronicles is often more efficient and equivalent to the recognition of the initial chronicle.
If the set of temporal constraints is inconsistent, the inconsistent flag is set to True and the function emits a warning. In this case, the temporal constraints are not modified. Note that inconsistent chronicles do not have any occurrences. It is the user's responsibility to avoid attempting the recognition of such patterns.
If the temporal constraints are expressed with np.timedelta64, they are first converted to a number of days. This may change the values of all the temporal constraints. The transformation is based on the Floyd-Warshall algorithm.
Example
>>> c = Chronicle()
>>> c.add_event(0, "a")
>>> c.add_event(1, "b")
>>> c.add_event(2, "c")
>>> c.add_constraint(0, 1, (4.0, 10.0))
>>> c.add_constraint(1, 2, (3.0, 13.0))
>>> print(c)
C0 {{[a],[b],[c]}}
0,1: (4.0, 10.0)
1,2: (3.0, 13.0)
>>> c.minimize()
>>> print(c)
C0 {{[a],[b],[c]}}
0,1: (4.0, 10.0)
1,2: (3.0, 13.0)
0,2: (7.0, 23.0)
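The deduced constraint between events 0 and 2 in this example can be reproduced with a few lines of plain Python. The sketch below is a textbook Floyd-Warshall pass over a distance matrix, written for float constraints only; it is an assumption about how such a minimization can be done, not a copy of the library's code, and the function name minimize_constraints is hypothetical.

```python
import math


def minimize_constraints(n, constraints):
    """Tighten temporal constraints with Floyd-Warshall.

    n: number of events.
    constraints: {(i, j): (lower, upper)}, meaning lower <= t_j - t_i <= upper.
    Returns the tightened constraint for every pair i < j, or None when
    the constraints are inconsistent.
    """
    # d[i][j] is the best known upper bound on t_j - t_i.
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), (lower, upper) in constraints.items():
        d[i][j] = min(d[i][j], upper)
        d[j][i] = min(d[j][i], -lower)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle means that no assignment of dates satisfies everything.
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return {
        (i, j): (-d[j][i], d[i][j]) for i in range(n) for j in range(i + 1, n)
    }


result = minimize_constraints(3, {(0, 1): (4.0, 10.0), (1, 2): (3.0, 13.0)})
print(result[(0, 2)])  # (7.0, 23.0)

# Requiring event 2 at most 2.0 after event 0 contradicts the bound of 7.0.
bad = {(0, 1): (4.0, 10.0), (1, 2): (3.0, 13.0), (0, 2): (1.0, 2.0)}
print(minimize_constraints(3, bad))  # None
```

The lower bound 7.0 is the sum of the two lower bounds (4.0 + 3.0) and the upper bound 23.0 is the sum of the two upper bounds (10.0 + 13.0), which matches the printed chronicle after minimization.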
- recognize(df_seq: TimedSequence) Sequence[Sequence[datetime64]] | Sequence[Sequence[float]]
Enumerates the chronicle occurrences in a sequence.
- Parameters:
df_seq (TimedSequence) –
Description of a temporal sequence of events.
In a sequence, the timestamps are datetimes (or floats).
- Returns:
A list of occurrences of the chronicle in the sequence. Each occurrence is a list of n positions/datetimes, where n is the chronicle size.
- Return type:
[ [ p_1, p_2 …], [ p_1, p_2 …], …]
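The difference between enumerating occurrences (recognize) and stopping at the first one (match) can be sketched with a small backtracking search. This is a simplified stand-in working on plain lists of (label, time) pairs with float times and inclusive bounds; the functions occurrences and has_match are hypothetical and do not reflect the library's actual implementation.

```python
def occurrences(events, labels, constraints):
    """Yield every occurrence of a chronicle in a timed sequence.

    events: list of (label, time) pairs.
    labels: labels[i] is the label of chronicle event i.
    constraints: {(i, j): (lower, upper)} with i < j, meaning
    lower <= t_j - t_i <= upper (bounds assumed inclusive).
    An occurrence is a list of times, one per chronicle event.
    """

    def extend(partial, used):
        i = len(partial)
        if i == len(labels):
            yield list(partial)
            return
        for pos, (label, t) in enumerate(events):
            if label != labels[i] or pos in used:
                continue
            # Check the constraints that end at event i against earlier choices.
            ok = all(
                lo <= t - partial[a] <= hi
                for (a, b), (lo, hi) in constraints.items()
                if b == i
            )
            if ok:
                partial.append(t)
                used.add(pos)
                yield from extend(partial, used)
                used.discard(pos)
                partial.pop()

    yield from extend([], set())


def has_match(events, labels, constraints):
    # next() consumes the generator only up to the first occurrence.
    return next(occurrences(events, labels, constraints), None) is not None


events = [("a", 1.0), ("b", 3.0), ("a", 4.0), ("b", 6.0), ("c", 9.0)]
labels = ["a", "b", "c"]
constraints = {(0, 1): (1.0, 3.0), (1, 2): (2.0, 6.0)}

found = list(occurrences(events, labels, constraints))
print(found)  # [[1.0, 3.0, 9.0], [4.0, 6.0, 9.0]]
print(has_match(events, labels, constraints))  # True
```

Because occurrences is a generator, has_match does no more work than needed to find one valid assignment, which is the reason an early-stopping match is cheaper on average than a full enumeration.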
pychronicles.fchronicle module
Module that implements the notion of fuzzy chronicle. A fuzzy chronicle is an extension of a chronicle that allows approximate matching. Instead of having strict temporal constraints (satisfied or not), the satisfaction of the temporal constraints is fuzzified.
The set of events is not fuzzified: the recognition of a chronicle in a sequence still requires the occurrence of all its events.
The recognition function (denoted cmp) has two parameters:
* the lambda parameter specifies the level of fuzziness of the temporal constraints. This level of fuzziness is used to evaluate the similarity measure between a subsequence and the chronicle.
* the threshold parameter specifies the similarity threshold used to decide whether the chronicle is considered to occur or not.
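To make the roles of the two parameters concrete, here is a toy similarity measure. The functional form is an assumption chosen for illustration (1.0 inside the interval, exponential decay outside, worst constraint wins); the formula actually used by cmp is not given in this documentation and may differ. The parameter decay only plays a role analogous to lambda, and both function names are hypothetical.

```python
import math


def interval_similarity(delay, lower, upper, decay):
    """1.0 inside [lower, upper], exponentially decreasing outside (assumed form)."""
    if delay < lower:
        gap = lower - delay
    elif delay > upper:
        gap = delay - upper
    else:
        return 1.0
    return math.exp(-decay * gap)


def occurrence_similarity(times, constraints, decay):
    """Similarity of one candidate occurrence: the worst constraint decides."""
    return min(
        interval_similarity(times[j] - times[i], lo, hi, decay)
        for (i, j), (lo, hi) in constraints.items()
    )


constraints = {(0, 1): (13.0, 17.0), (0, 2): (1.0, 30.0)}
threshold = 0.95

exact = occurrence_similarity([0.0, 15.0, 20.0], constraints, decay=0.3)
close = occurrence_similarity([0.0, 17.1, 20.0], constraints, decay=0.3)
far = occurrence_similarity([0.0, 25.0, 20.0], constraints, decay=0.3)

print(exact >= threshold)  # True: every delay is inside its interval
print(close >= threshold)  # True: slightly outside, still above the threshold
print(far >= threshold)    # False: too far from the interval
```

In this toy version all events must still be present; only the temporal bounds are softened, which mirrors the statement that the set of events is not fuzzified.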
More details about the recognition of fuzzy chronicles can be found in the following article
Example
The following example illustrates the main functionalities of the FuzzyChronicle class.
ts = TimedSequence(dates, data)
c = FuzzyChronicle()
c.add_event(0, "b")
c.add_event(1, "a")
c.add_event(2, "c")
c.add_constraint(0, 1, (np.timedelta64(13, "D"), np.timedelta64(17, "D")))
c.add_constraint(0, 2, (np.timedelta64(1, "D"), np.timedelta64(30, "D")))
c.minimize()
c.tunit = "D" # this line is required to specify the default temporal unit
print(c)
occ, sim = c.cmp(ts, 0.95, 0.3)
print("similarity:" + str(sim))
print("occurrence:" + str(occ))
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- class pychronicles.fchronicle.FuzzyChronicle
Bases:
Chronicle
Class modeling a fuzzy chronicle pattern. It enables efficient recognition of chronicles with approximated intervals.
The match and recognize functions still work with the semantics of classical chronicles.
- cmp(df_seq: TimedSequence, threshold: float, lbda: float = 0.01) Tuple[Sequence[Sequence[timedelta64]], Sequence[float]]
Method that checks whether the chronicle occurs in the sequence and evaluates the similarity for each of the occurrences.
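- Parameters:
df_seq (TimedSequence) – Timed sequence in which the chronicle is searched
threshold (float) – Similarity threshold used to decide whether the chronicle is considered to occur
lbda (float) – Level of fuzziness of the temporal constraints (default: 0.01)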
- Returns:
Return a pair. The first element is the list of occurrences of the chronicle in the sequence (a list of lists of positions; each list is an occurrence and contains n positions, where n is the chronicle size). The second element is the list of similarities between the occurrences and the chronicle. A similarity of 1 means an exact match; a lower similarity means that the events have been found, but not within the exact temporal bounds.
- Return type:
([ [ p_1, p_2 …], [ p_1, p_2 …], …], [float, …] )
pychronicles.mtlformula module
MTL formula
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- class pychronicles.mtlformula.ExtractMTL
Bases:
Transformer
Class that transforms an MTL formula based on query expressions into a set of expressions for pandas datasets and an MTL formula that can be used to query a dataframe.
- atom(tree)
- expr(tree)
- formula(tree)
- interval(tree)
- parse(formula)
pychronicles.pandas_tpatterns module
This module is dedicated to pandas accessors. Thanks to these accessors, you can use chronicle recognition and abstraction on datasets represented as pandas dataframes.
The principle is that a dataframe represents a timed sequence. Each row of the dataframe represents an event, and the columns are the features describing the nature of an event. Contrary to the internal representation of timed sequences, the events can be described with diverse and multiple features (int, str, float, …).
There are some (natural) requirements on the dataframe: first, it must at least be indexed by dates (dates or floats) and have at least one column describing the event features. In addition, if your dataset is made of several sequences, you may have a column to identify the individuals. This column can either be a classical column or a second index.
Once your dataset is represented with such a dataframe, all the functionalities of chronicles can be used directly on the dataframe (through the tpattern accessor).
Note
When using chronicles with a dataframe, the events of a chronicle are defined through the values of the features in the dataframe. The typical specification of an event is a column/value pair, for instance label=="a", meaning that the label must be "a". It is possible to express more complex conditions, evaluated on each row of the dataframe (for instance by combining conditions with and/or operators).
For more details about the acceptable syntax for the chronicle event, we recommend the user to read the pandas documentation: [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html]
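Since event specifications follow the pandas query syntax, they can be tried directly with DataFrame.query before being used in a chronicle. The sketch below uses only pandas itself on a small made-up dataframe; it shows which rows would be candidates for a given event expression, and makes no claim about how the tpattern accessor evaluates them internally.

```python
import pandas as pd

# Toy dataframe indexed by dates, with two feature columns (illustrative data).
df = pd.DataFrame(
    {"label": ["a", "b", "b", "c"], "num_val": [1, 3, 7, 9]},
    index=pd.to_datetime(
        ["1970-01-02", "1970-01-04", "1970-01-09", "1970-01-12"]
    ),
)

# Rows satisfying the event expression 'label=="b" & num_val>5'.
candidates = df.query('label=="b" & num_val>5')
print(candidates)

# An "or" combination of two column/value pairs.
a_or_c = df.query('label=="a" | label=="c"')
print(len(a_or_c))  # 2
```

In a query string, & and | behave like the boolean and/or operators, so the first expression selects rows whose label is "b" and whose num_val is greater than 5.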
Example
The following example illustrates the use of chronicle with pandas dataframes.
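import numpy as np
import pandas as pd
from pychronicles import Chronicle, TPatternAccessor
# seq is a list of (label, day) pairs, as in the Chronicle example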
df = pd.DataFrame(
    {
        "label": [e[0] for e in seq],
        # illustration of columns other than "label"
        "str_val": [e[0] * 2 for e in seq],
        "num_val": np.random.randint(10, size=len(seq)),
    },
    index=[np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
)
print("----------------")
c = Chronicle()
c.add_event(0, 'label=="a"')
c.add_event(1, 'label=="b" & num_val>5')
c.add_event(2, 'label=="c"')
c.add_constraint(0, 1, (np.timedelta64(4, "D"), np.timedelta64(10, "D")))
c.add_constraint(0, 2, (np.timedelta64(2, "D"), np.timedelta64(8, "D")))
c.add_constraint(1, 2, (np.timedelta64(3, "D"), np.timedelta64(13, "D")))
reco = df.tpattern.match(c)
print(f"Numpy recognition of the chronicle: [{reco}]!")
reco = df.tpattern.recognize(c)
print(f"Numpy recognition of the chronicle: [{reco}]!")
##########################################################################
# Use with a dataframe representing a collection of sequences
# Create a dataframe representing several sequences with complex events, each sequence having its own id
grpdf = pd.DataFrame(
    {
        "label": [e[0] for e in seq] * 3,
        # illustration of columns other than "label"
        "str_val": [e[0] * 2 for e in seq] * 3,
        "num_val": np.random.randint(10, size=3 * len(seq)),
        "id": [1] * len(seq) + [2] * len(seq) + [3] * len(seq),
    },
    index=[np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq]
    * 3,
)
# the match function is applied to all the sequences at once and
# returns one answer per sequence
print("Does the chronicle occur in each sequence of the dataset?")
reco = grpdf.groupby("id").apply(lambda d: d.tpattern.match(c))
print(reco)
print("What are the occurrences of the chronicle in each sequence of the dataset?")
reco = grpdf.groupby("id").apply(lambda d: d.tpattern.recognize(c))
print(reco)
##########################################
# Abstraction example
grpdf = pd.DataFrame(
{
"label": [np.random.choice(["a", "b", "c"]) for _ in range(20)],
"id": [int(np.floor(i / 4)) for i in range(20)],
}
)
chro = grpdf.tpattern.abstract("label", "id")
print(chro)
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- class pychronicles.pandas_tpatterns.TPatternAccessor(df: DataFrame)
Bases:
object
- abstract(event: str, groupby: str = None)
Abstract a dataframe into a chronicle.
- Parameters:
- Returns:
a chronicle that abstracts the collection of sequences represented in the dataset
- Return type:
pychronicles.timedsequence module
This module implements the class to model timed sequences. A timed sequence is a sequence of events, each represented by a label belonging to a vocabulary and carrying a timestamp.
For compatibility with chronicle recognition, we recommend representing labels as strings (it should work with any object equipped with an __eq__ operator, but for now this is not a safe usage).
The timestamp of an event is a time instant (not an interval). This time instant can be modeled in two different manners:
- float: basic representation of a metric quantity, but one that has no intrinsic meaning,
- numpy.datetime64: standard representation of dates in NumPy. This allows events in real datasets to be described in a natural way (without having to convert them to floats).
The class is equipped with functions to ease its intuitive usage, for instance selecting subsequences by date, event type, etc. These functionalities are illustrated in the example below.
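The date-based and label-based selections rest on ordinary NumPy comparisons. The sketch below reproduces the idea on two parallel arrays (dates and labels) without using the TimedSequence class, so that the underlying mechanism is visible; it is not the class implementation.

```python
import numpy as np

seq = [("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12)]
dates = np.array(
    [np.datetime64("1970-01-01") + np.timedelta64(t, "D") for _, t in seq],
    dtype="datetime64[D]",
)
data = np.array([label for label, _ in seq])

# Time-based selection: a boolean mask computed on the dates.
before = dates < np.datetime64("1970-01-07")
print(data[before].tolist())  # ['a', 'c', 'b']

# Item-based selection: a boolean mask computed on the labels.
is_a = data == "a"
print(dates[is_a])
```

The same masks work with float timestamps: replace the datetime64 values by floats and compare against a float bound.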
Warning
Be careful to use numpy.datetime64 dates and not datetime objects (from the datetime package), which do not provide the same interface and are not compatible with TimedSequence.
Example
The following example illustrates the main functionalities of the TimedSequence class.
import numpy as np
from pychronicles.timedsequence import TimedSequence
# Example of sequence
seq = [ ("a", 1), ("c", 2), ("b", 3), ("a", 8), ("a", 10), ("b", 12), ("a", 15), ("c", 17), ("b", 20), ("c", 23), ("c", 25), ("b", 26), ("c", 28), ("b", 30) ]
dates = np.array(
    [np.datetime64("1970-01-01") + np.timedelta64(e[1], "D") for e in seq],
    dtype="datetime64",
)
data = np.array([e[0] for e in seq])
ts = TimedSequence(dates, data)
print(ts)
print("---- time based selection ------")
tssel = ts[ts < np.datetime64("1970-01-07")]
print(tssel)
print("----- item based selection ------")
tssel = ts[ts == "a"]
print(tssel)
print("----- start -----")
print(tssel.start())
print("----- at ------")
print(ts.at(np.datetime64("1970-01-02")))
print(ts.at(np.datetime64("1970-01-08")))
######################
dates = np.array([float(e[1]) for e in seq], dtype="float")
data = np.array([e[0] for e in seq])
ts = TimedSequence(dates, data)
print(ts)
print("---- time based selection ------")
tssel = ts[ts < 6.0]
print(tssel)
try:
    tssel = ts[ts < 6]
except ValueError:
    print("Floats are mandatory")
print("----- item based selection ------")
tssel = ts[ts == "a"]
print(tssel)
print("----- start -----")
print(tssel.start())
print("----- at ------")
print(ts.at(2))
print(ts.at(7.0))
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
pychronicles.utils module
This module contains functions to import and export Chronicle objects in a text format. We use the format proposed by Dousson et al. in their CRS (Chronicle Recognition System).
- Authors:
Thomas Guyet, Inria
- Date:
08/2023
- pychronicles.utils.load(crs: str) Chronicle
Load a chronicle from a string in the CRS format. Note that all brackets (“[]” in chronicle or event names, and “()”) are assumed to be empty in this function!
This is a class-function.
- pychronicles.utils.to_crs(c: Chronicle) str
Generate a string representing the chronicle in the CRS format.
Unnamed events (which must be numbers) are called “E”+str(X) in the event description, to avoid event names starting with digits (CNAME conventions). Infinite intervals are not printed out, but semi-infinite intervals generate a description like ‘[-inf,23]’ or ‘[34,inf]’; it is not known whether this is sound.
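The two naming rules described above can be sketched as small helpers. Both functions are hypothetical and written only from the description; they are not the to_crs implementation, and the exact text produced by the library may differ.

```python
def format_interval(lower, upper):
    """Format a temporal interval; None stands for an unbounded side.

    Returns None for a fully infinite interval, which is not printed.
    Hypothetical helper, not the to_crs implementation.
    """
    if lower is None and upper is None:
        return None
    lo = "-inf" if lower is None else str(lower)
    hi = "inf" if upper is None else str(upper)
    return f"[{lo},{hi}]"


def event_name(label):
    """Prefix labels starting with a digit with "E" (assumed rule)."""
    text = str(label)
    return "E" + text if text[:1].isdigit() else text


print(format_interval(None, 23))    # [-inf,23]
print(format_interval(34, None))    # [34,inf]
print(format_interval(None, None))  # None
print(event_name(12), event_name("alarm"))  # E12 alarm
```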
Module contents
Chronicles package
@author: Thomas Guyet @date: 10/2022 @institution: Inria