library(processpredictR)
library(bupaR)
library(ggplot2)
library(dplyr)
library(keras)
library(purrr)
The goal of processpredictR is to perform prediction tasks on processes using event logs and Transformer models. The 5 process monitoring tasks are defined as follows:
The overall approach using processpredictR is shown in the figure below. prepare_examples() transforms logs into a dataset that can be used for training and prediction, which is thereafter split into a train and a test set. Subsequently, a model is made, compiled and fit. Finally, the model can be used to predict and can be evaluated.
Different levels of customization are offered. Using create_model(), a standard off-the-shelf model can be created for each of the supported tasks, including standard features.
A first customization is to include additional features, such as case or event attributes. These can be configured in the prepare_examples() step, and they will be processed automatically (normalized for numerical features, or one-hot encoded for categorical features).
A further way to customize your model is to generate only the input layer of the model with create_model(), and define the remainder of the model yourself by adding keras layers using the provided stack_layers() function.
Going beyond that, you can also create the model entirely yourself using keras, including the preprocessing of the data.
Auxiliary functions are provided to help you with, e.g., tokenizing
activity sequences.
In the remainder of this tutorial, each of the steps and possible avenues for customization will be described in more detail.
As a first step in the process prediction workflow we use prepare_examples() to obtain a dataset, where:
The returned object is of class ppred_examples_df, which inherits from tbl_df.
In this tutorial we will use the traffic_fines event log from eventdataR. Note that both eventlog and activitylog objects, as defined by bupaR, are supported.
df <- prepare_examples(traffic_fines, task = "outcome")
df
#> # A tibble: 34,724 × 11
#> ith_case case_id prefix prefix_list outcome k activity resource
#> <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
#> 1 1 A2127 Create Fine <chr [1]> Payment 0 Create … 537
#> 2 1 A2127 Create Fine - P… <chr [2]> Payment 1 Payment <NA>
#> 3 2 A15 Create Fine <chr [1]> Send f… 0 Create … 561
#> 4 2 A15 Create Fine - S… <chr [2]> Send f… 1 Send Fi… <NA>
#> 5 2 A15 Create Fine - S… <chr [3]> Send f… 2 Insert … <NA>
#> 6 2 A15 Create Fine - S… <chr [4]> Send f… 3 Add pen… <NA>
#> 7 2 A15 Create Fine - S… <chr [5]> Send f… 4 Send fo… <NA>
#> 8 3 A1820 Create Fine <chr [1]> Payment 0 Create … 563
#> 9 3 A1820 Create Fine - P… <chr [2]> Payment 1 Payment <NA>
#> 10 4 A22 Create Fine <chr [1]> Payment 0 Create … 561
#> # ℹ 34,714 more rows
#> # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
#> # remaining_trace_list <list>
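To make the structure of these examples more concrete, the following base R sketch expands a single trace into its prefixes. It is only an illustration under stated assumptions, not the package's implementation: make_prefixes() is a hypothetical helper, and taking the outcome to be the last activity of the case is inferred from the rows shown above.

```r
# Hypothetical helper (not part of processpredictR): expand one trace into
# its prefixes, mirroring the prefix, k and outcome columns shown above.
make_prefixes <- function(case_id, activities) {
  n <- length(activities)
  data.frame(
    case_id = case_id,
    prefix  = vapply(seq_len(n),
                     function(i) paste(activities[seq_len(i)], collapse = " - "),
                     character(1)),
    k       = seq_len(n) - 1,   # 0-based position of the latest event in the prefix
    outcome = activities[n],    # assumption: the outcome is the last activity of the case
    stringsAsFactors = FALSE
  )
}

examples <- make_prefixes("A2127", c("Create Fine", "Payment"))
examples
```

Each event in a case thus yields one training example, which is why a log with many long cases produces far more examples than cases.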
We split the transformed dataset df into train and test sets for later use in fit() and predict(), respectively. The proportion of the train set is configured with the split argument.
set.seed(123)
split <- df %>% split_train_test(split = 0.8)
split$train_df %>% head(5)
#> # A tibble: 5 × 11
#> ith_case case_id prefix prefix_list outcome k activity resource
#> <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
#> 1 1 A2127 Create Fine <chr [1]> Payment 0 Create … 537
#> 2 1 A2127 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
#> 3 2 A15 Create Fine <chr [1]> Send f… 0 Create … 561
#> 4 2 A15 Create Fine - Se… <chr [2]> Send f… 1 Send Fi… <NA>
#> 5 2 A15 Create Fine - Se… <chr [3]> Send f… 2 Insert … <NA>
#> # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
#> # remaining_trace_list <list>
split$test_df %>% head(5)
#> # A tibble: 5 × 11
#> ith_case case_id prefix prefix_list outcome k activity resource
#> <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
#> 1 8001 A24869 Create Fine <chr [1]> Payment 0 Create … 559
#> 2 8001 A24869 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
#> 3 8002 A24871 Create Fine <chr [1]> Payment 0 Create … 559
#> 4 8002 A24871 Create Fine - Pa… <chr [2]> Payment 1 Payment <NA>
#> 5 8003 A24872 Create Fine <chr [1]> Send f… 0 Create … 559
#> # ℹ 3 more variables: start_time <dttm>, end_time <dttm>,
#> # remaining_trace_list <list>
It is important to note that the split is done at case level (a case is fully part of either the train data or the test data). Furthermore, the split is done chronologically, meaning that the train set contains the first split × 100% of the cases (80% for split = 0.8), and the test set contains the remaining, most recent cases.
Note that because the split is done at case level, the percentage of all examples in the train set can be slightly different, as cases differ with respect to their length.
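The following base R sketch illustrates a chronological, case-level split on toy data. It is not how split_train_test() is implemented; the toy data, the rounding rule and the assumption that ith_case reflects chronological order are all illustrative.

```r
# Toy data: 5 cases in chronological order with differing numbers of prefixes.
toy <- data.frame(
  ith_case = rep(1:5, times = c(2, 5, 2, 2, 4)),
  k        = c(0:1, 0:4, 0:1, 0:1, 0:3)
)

split_prop <- 0.8
case_ids   <- sort(unique(toy$ith_case))            # assumption: ith_case is ordered by time
n_train    <- floor(split_prop * length(case_ids))  # assumption: round down
train_ids  <- case_ids[seq_len(n_train)]

# Every case ends up entirely in one of the two sets.
train_df <- toy[toy$ith_case %in% train_ids, ]
test_df  <- toy[!toy$ith_case %in% train_ids, ]

# Share of cases versus share of examples in the train set
length(train_ids) / length(case_ids)
nrow(train_df) / nrow(toy)
```

Here 4 of the 5 cases go to the train set, but only 11 of the 15 examples, because the last case is longer than average.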
The next step in the workflow is to build a model. processpredictR provides a default set of functions that are wrappers of generics provided by keras. For ease of use, the preprocessing steps, such as tokenizing of sequences, normalizing numerical features, etc., happen within the create_model() function and are abstracted from the user.
Based on the train set we define the default transformer model, using create_model().
model <- split$train_df %>% create_model(name = "my_model")
# pass arguments as ... that are applicable to keras::keras_model()
model # is a list
#> Model: "my_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> input_1 (InputLayer) [(None, 9)] 0
#> token_and_position_embedding (Toke (None, 9, 36) 792
#> nAndPositionEmbedding)
#> transformer_block (TransformerBloc (None, 9, 36) 26056
#> k)
#> global_average_pooling1d (GlobalAv (None, 36) 0
#> eragePooling1D)
#> dropout_3 (Dropout) (None, 36) 0
#> dense_3 (Dense) (None, 64) 2368
#> dropout_2 (Dropout) (None, 64) 0
#> dense_2 (Dense) (None, 6) 390
#> ================================================================================
#> Total params: 29,606
#> Trainable params: 29,606
#> Non-trainable params: 0
#> ________________________________________________________________________________
Some useful information and metrics are stored for traceability and easy extraction if needed.
#> $names
#> [1] "model" "max_case_length" "number_features" "task"
#> [5] "num_outputs" "vocabulary"
Note that create_model() returns a list, in which the actual keras model is stored under the element name model. Thus, we can use functions from the keras package as follows:
#> [1] "my_model"
#> list()
The result of create_model() is assigned its own class (ppred_model), for which processpredictR provides the methods compile(), fit(), predict() and evaluate().
The following step is to compile the model. By default, the loss function is log-cosh for regression tasks (next time and remaining time) and categorical cross-entropy for classification tasks. These defaults can be overridden.
#> Compilation complete!
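To give some intuition for the two default losses, the base R sketch below computes them by hand on toy values. The function names log_cosh() and sparse_cce() are hypothetical, and the cross-entropy variant with 0-based integer labels is an assumption based on the sparse_categorical_accuracy metric reported during training; keras computes these losses internally.

```r
# Log-cosh loss for regression targets: mean(log(cosh(y_pred - y_true))).
log_cosh <- function(y_true, y_pred) mean(log(cosh(y_pred - y_true)))

# Cross-entropy with integer class labels (0-based): the mean negative
# log-probability that the model assigns to the true class.
sparse_cce <- function(y_true, probs) {
  picked <- probs[cbind(seq_along(y_true), y_true + 1)]
  -mean(log(picked))
}

# Perfect regression predictions give a loss of 0.
log_cosh(c(1, 2, 3), c(1, 2, 3))

# Two examples, three classes; the true classes are 0 and 1.
probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.8, 0.1))
sparse_cce(c(0, 1), probs)   # equals -mean(log(c(0.7, 0.8)))
```

Log-cosh behaves like a squared error for small residuals and like an absolute error for large ones, which makes it less sensitive to outliers in time predictions.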
Training of the model is done with the fit() function. During training, a visualization window will open in the Viewer pane to show the progress in terms of loss. Optionally, the result of fit() can be assigned to an object to access the training metrics specified in compile().
#> $verbose
#> [1] 1
#>
#> $epochs
#> [1] 5
#>
#> $steps
#> [1] 2227
#> $loss
#> [1] 0.7875332 0.7410239 0.7388409 0.7385073 0.7363014
#>
#> $sparse_categorical_accuracy
#> [1] 0.6539739 0.6713067 0.6730579 0.6735967 0.6747193
#>
#> $val_loss
#> [1] 0.7307042 0.7261314 0.7407018 0.7326428 0.7317348
#>
#> $val_sparse_categorical_accuracy
#> [1] 0.6725934 0.6727730 0.6725934 0.6725934 0.6722342
The method predict() can return 3 types of output, by setting the argument output to "append", "y_pred" or "raw".
Test dataset with appended predicted values (output = "append")
predictions <- model %>% predict(test_data = split$test_df,
output = "append") # default
predictions %>% head(5)
#> # A tibble: 5 × 13
#> ith_case case_id prefix prefix_…¹ outcome k activ…² resou…³
#> <int> <chr> <chr> <list> <fct> <dbl> <chr> <fct>
#> 1 8001 A24869 Create Fine <chr [1]> Payment 0 Create… 559
#> 2 8001 A24869 Create Fine - Payment <chr [2]> Payment 1 Payment <NA>
#> 3 8002 A24871 Create Fine <chr [1]> Payment 0 Create… 559
#> 4 8002 A24871 Create Fine - Payment <chr [2]> Payment 1 Payment <NA>
#> 5 8003 A24872 Create Fine <chr [1]> Send f… 0 Create… 559
#> # … with 5 more variables: start_time <dttm>, end_time <dttm>,
#> # remaining_trace_list <list>, y_pred <dbl>, pred_outcome <chr>, and
#> # abbreviated variable names ¹prefix_list, ²activity, ³resource
#> Payment Send for Credit Collection Send Fine
#> [1,] 4.966056e-01 0.344094276 1.423686e-01
#> [2,] 9.984029e-01 0.001501600 8.890528e-05
#> [3,] 4.966056e-01 0.344094276 1.423686e-01
#> [4,] 9.984029e-01 0.001501600 8.890528e-05
#> [5,] 4.966056e-01 0.344094276 1.423686e-01
#> [6,] 1.556145e-01 0.518976271 2.884890e-01
#> [7,] 2.345311e-01 0.715000629 5.147375e-06
#> [8,] 2.627363e-01 0.726804197 5.480492e-06
#> [9,] 3.347774e-05 0.999961376 2.501280e-08
#> [10,] 4.966056e-01 0.344094276 1.423686e-01
#> [1] "Payment" "Payment"
#> [3] "Payment" "Payment"
#> [5] "Payment" "Send for Credit Collection"
#> [7] "Send for Credit Collection" "Send for Credit Collection"
#> [9] "Send for Credit Collection" "Payment"
#> [11] "Send for Credit Collection" "Payment"
#> [13] "Send for Credit Collection" "Payment"
#> [15] "Send for Credit Collection" "Send for Credit Collection"
#> [17] "Send for Credit Collection" "Send for Credit Collection"
#> [19] "Payment" "Send for Credit Collection"
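The relation between the raw probabilities and the predicted labels can be sketched in base R as follows. The matrix below copies three rows from the raw output above, restricted to the three columns that are displayed; treating the predicted label as the class with the highest probability is an assumption, although it is consistent with the labels shown.

```r
# Rows 1, 2 and 6 of the raw output above, restricted to the three columns
# shown (the remaining class columns are omitted, so rows do not sum to 1).
raw <- rbind(
  c(4.966056e-01, 0.344094276, 1.423686e-01),
  c(9.984029e-01, 0.001501600, 8.890528e-05),
  c(1.556145e-01, 0.518976271, 2.884890e-01)
)
colnames(raw) <- c("Payment", "Send for Credit Collection", "Send Fine")

# Assumption: the predicted label is the class with the highest probability.
y_pred <- colnames(raw)[apply(raw, 1, which.max)]
y_pred
```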
For the classification tasks outcome and next activity, a confusion_matrix() function is provided.
#> [1] "ppred_predictions" "ppred_examples_df" "ppred_examples_df"
#> [4] "ppred_examples_df" "tbl_df" "tbl"
#> [7] "data.frame"
#>
#> Payment Send Appeal to Prefecture
#> Appeal to Judge 2 6
#> Notify Result Appeal to Offender 0 0
#> Payment 1903 7
#> Send Appeal to Prefecture 34 90
#> Send Fine 387 0
#> Send for Credit Collection 688 22
#>
#> Send for Credit Collection
#> Appeal to Judge 10
#> Notify Result Appeal to Offender 0
#> Payment 617
#> Send Appeal to Prefecture 89
#> Send Fine 387
#> Send for Credit Collection 2644
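As a rough illustration of what such a matrix contains, the base R sketch below cross-tabulates actual and predicted outcomes with table(). The values are made up, and the orientation (actual classes in rows, predicted classes in columns) is an assumption about the output above.

```r
# Toy actual and predicted outcomes (illustrative values, not package output).
lvls      <- c("Payment", "Send Fine", "Send for Credit Collection")
actual    <- factor(c("Payment", "Payment", "Send Fine",
                      "Send for Credit Collection", "Send for Credit Collection"),
                    levels = lvls)
predicted <- factor(c("Payment", "Send for Credit Collection", "Payment",
                      "Send for Credit Collection", "Send for Credit Collection"),
                    levels = lvls)

# Rows are actual classes, columns are predicted classes.
cm <- table(actual = actual, predicted = predicted)
cm

# Accuracy is the share of examples on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

A class that the model never predicts shows up as a column of zeros, which is easy to miss when looking at accuracy alone.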
A plot method is provided as well: it shows the confusion matrix for classification tasks, or a scatter plot for regression tasks.
Next to the activity prefixes in the data, and standard features defined for each task, additional features can be defined when using prepare_examples(). The example below shows how the month in which a case is started can be added as a feature.
# preprocessed dataset with one-hot encoded categorical features
df_next_time <- traffic_fines %>%
  group_by_case() %>%
  mutate(month = lubridate::month(min(timestamp), label = TRUE)) %>%
  ungroup_eventlog() %>%
  prepare_examples(task = "next_time", features = "month") %>%
  split_train_test()
#> [1] "latest_duration" "throughput_time" "processing_time"
#> [4] "time_before_activity" "month_jan" "month_feb"
#> [7] "month_mrt" "month_apr" "month_mei"
#> [10] "month_jun" "month_jul" "month_aug"
#> [13] "month_sep" "month_okt" "month_nov"
#> [16] "month_dec"
#> [1] "month_jan" "month_feb" "month_mrt" "month_apr" "month_mei" "month_jun"
#> [7] "month_jul" "month_aug" "month_sep" "month_okt" "month_nov" "month_dec"
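The month_* columns listed above are indicator (one-hot) columns derived from the month factor. The base R sketch below shows the general idea of both preprocessing steps on toy data. It is not the package's code: the amount column is a made-up numerical attribute, and standardizing to mean 0 and standard deviation 1 is only one possible normalization, assumed here for illustration.

```r
# Toy case attributes (illustrative values only).
cases <- data.frame(
  amount = c(35, 80, 20, 65),
  month  = factor(c("jan", "feb", "jan", "mrt"), levels = c("jan", "feb", "mrt"))
)

# Numerical feature: standardize to mean 0 and standard deviation 1
# (assumption: the package may use a different normalization).
amount_scaled <- (cases$amount - mean(cases$amount)) / sd(cases$amount)

# Factor: one indicator column per level, named month_<level>.
month_onehot <- model.matrix(~ month - 1, data = cases)
colnames(month_onehot) <- paste0("month_", levels(cases$month))

cbind(amount_scaled, month_onehot)
```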
Additional features can be either numerical variables or factors. Numerical variables will be automatically normalized. Factors will automatically be converted to one-hot encoded variables. A few important notes:
prepare_examples().
Instead of using the standard off-the-shelf transformer model that comes with processpredictR, you can customize the model. One way to do this is by using the custom argument of the create_model() function. The resulting model will then only contain the input layers of the model, as shown below.
df <- prepare_examples(traffic_fines, task = "next_activity") %>% split_train_test()
custom_model <- df$train_df %>% create_model(custom = TRUE, name = "my_custom_model")
custom_model
#> Model: "my_custom_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> input_2 (InputLayer) [(None, 9)] 0
#> token_and_position_embedding_1 (To (None, 9, 36) 828
#> kenAndPositionEmbedding)
#> transformer_block_1 (TransformerBl (None, 9, 36) 26056
#> ock)
#> global_average_pooling1d_1 (Global (None, 36) 0
#> AveragePooling1D)
#> ================================================================================
#> Total params: 26,884
#> Trainable params: 26,884
#> Non-trainable params: 0
#> ________________________________________________________________________________
You can then stack layers on top of your custom model as you prefer, using the stack_layers() function. This function saves you some of the code that would be needed when using keras directly (see later).
custom_model <- custom_model %>%
  stack_layers(layer_dropout(rate = 0.1)) %>%
  stack_layers(layer_dense(units = 64, activation = 'relu'))
custom_model
#> Model: "my_custom_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> input_2 (InputLayer) [(None, 9)] 0
#> token_and_position_embedding_1 (To (None, 9, 36) 828
#> kenAndPositionEmbedding)
#> transformer_block_1 (TransformerBl (None, 9, 36) 26056
#> ock)
#> global_average_pooling1d_1 (Global (None, 36) 0
#> AveragePooling1D)
#> dropout_6 (Dropout) (None, 36) 0
#> dense_6 (Dense) (None, 64) 2368
#> ================================================================================
#> Total params: 29,252
#> Trainable params: 29,252
#> Non-trainable params: 0
#> ________________________________________________________________________________
# this works too
custom_model %>%
stack_layers(layer_dropout(rate = 0.1), layer_dense(units = 64, activation = 'relu'))
Once you have finalized your model with an appropriate output layer (which should have the correct number of outputs, as recorded in custom_model$num_outputs, and an appropriate activation function), you can use the compile(), fit(), predict() and evaluate() functions as before.
We can also opt for setting up and training our model manually, instead of using the provided methods. Note that after defining a model with keras::keras_model(), the model is no longer of class ppred_model.
# custom_model$model accesses the keras model, and $output accesses the outputs of that model
new_outputs <- custom_model$model$output %>%
  keras::layer_dropout(rate = 0.1) %>%
  keras::layer_dense(units = custom_model$num_outputs, activation = 'softmax')
custom_model <- keras::keras_model(inputs = custom_model$model$input, outputs = new_outputs, name = "new_custom_model")
custom_model
#> Model: "new_custom_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> input_2 (InputLayer) [(None, 9)] 0
#> token_and_position_embedding_1 (To (None, 9, 36) 828
#> kenAndPositionEmbedding)
#> transformer_block_1 (TransformerBl (None, 9, 36) 26056
#> ock)
#> global_average_pooling1d_1 (Global (None, 36) 0
#> AveragePooling1D)
#> dropout_6 (Dropout) (None, 36) 0
#> dense_6 (Dense) (None, 64) 2368
#> dropout_8 (Dropout) (None, 64) 0
#> dense_8 (Dense) (None, 11) 715
#> ================================================================================
#> Total params: 29,967
#> Trainable params: 29,967
#> Non-trainable params: 0
#> ________________________________________________________________________________
#> [1] "keras.engine.functional.Functional"
#> [2] "keras.engine.training.Model"
#> [3] "keras.engine.base_layer.Layer"
#> [4] "tensorflow.python.module.module.Module"
#> [5] "tensorflow.python.trackable.autotrackable.AutoTrackable"
#> [6] "tensorflow.python.trackable.base.Trackable"
#> [7] "keras.utils.version_utils.LayerVersionSelector"
#> [8] "keras.utils.version_utils.ModelVersionSelector"
#> [9] "python.builtin.object"
# compile
compile(object = custom_model, optimizer = "adam",
        loss = loss_sparse_categorical_crossentropy(),
        metrics = metric_sparse_categorical_crossentropy())
Before training the model, we first must prepare the data using the tokenize() function.
# the trace of activities must be tokenized
tokens_train <- df$train_df %>% tokenize()
map(tokens_train, head) # the output of tokens is a list
#> $token_x
#> $token_x[[1]]
#> [1] 2
#>
#> $token_x[[2]]
#> [1] 2 3
#>
#> $token_x[[3]]
#> [1] 2
#>
#> $token_x[[4]]
#> [1] 2 4
#>
#> $token_x[[5]]
#> [1] 2 4 5
#>
#> $token_x[[6]]
#> [1] 2 4 5 6
#>
#>
#> $numeric_features
#> NULL
#>
#> $categorical_features
#> NULL
#>
#> $token_y
#> [1] 0 1 2 3 4 5
# make sequences of equal length
x <- tokens_train$token_x %>% pad_sequences(maxlen = max_case_length(df$train_df), value = 0)
y <- tokens_train$token_y
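What tokenizing and padding amount to can be sketched in base R as follows. This is an illustration rather than the package's implementation: the vocabulary, including the reserved entries named PAD and UNK, is an assumption chosen so that activities start at id 2 as in the token_x output above, and pad_left() is a hypothetical helper that mimics the "pre" padding that keras pad_sequences() applies by default.

```r
# Assumption: ids 0 and 1 are reserved (padding / unknown), so activities
# start at 2, which matches the token_x output above.
vocabulary <- c("PAD" = 0, "UNK" = 1, "Create Fine" = 2, "Payment" = 3, "Send Fine" = 4)

prefixes <- list(
  c("Create Fine"),
  c("Create Fine", "Payment"),
  c("Create Fine", "Send Fine")
)

# Map each activity to its id.
tokens <- lapply(prefixes, function(p) unname(vocabulary[p]))

# Left-pad with zeros to a common length; assumes maxlen >= length(x).
pad_left <- function(x, maxlen, value = 0) {
  c(rep(value, maxlen - length(x)), x)
}

# One row per prefix, one column per position.
padded <- t(vapply(tokens, pad_left, numeric(4), maxlen = 4))
padded
```

Using the maximum case length of the train set as the common length, as in the code above, keeps the input shape identical for training and prediction.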
We are now ready to train our custom model (the code below is not being evaluated).
# train
fit(object = custom_model, x, y, epochs = 10, batch_size = 10) # see also ?keras::fit.keras.engine.training.Model
# predict
tokens_test <- df$test_df %>% tokenize()
x <- tokens_test$token_x %>% pad_sequences(maxlen = max_case_length(df$train_df), value = 0)
predict(custom_model, x)
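# evaluate
tokens_test <- df$test_df %>% tokenize()
# pad the test sequences in the same way as the training sequences
x <- tokens_test$token_x %>% pad_sequences(maxlen = max_case_length(df$train_df), value = 0)
# next_activity is a classification task, so the tokenized labels are used as-is;
# for regression tasks, y_test would instead be divided by the standard deviation of y_train
y <- tokens_test$token_y
evaluate(custom_model, x, y)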
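For regression tasks such as next time or remaining time, the targets can be normalized by dividing them by the standard deviation of the training targets. The base R sketch below uses made-up values and only illustrates that scaling step; it says nothing about how the package handles this internally.

```r
# Toy remaining times in days (illustrative values only).
y_train <- c(2, 10, 4, 8)
y_test  <- c(3, 9)

# Scale both sets with the standard deviation of the training targets only,
# so that no information from the test set leaks into the preprocessing.
scale_factor <- sd(y_train)
y_train_norm <- y_train / scale_factor
y_test_norm  <- y_test / scale_factor

# Values on the normalized scale can be mapped back by multiplying.
y_test_norm * scale_factor
```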