# Chapter 8  Data Collection

The purpose of most simulation models is to collect data that can be analyzed to gain insight into the system being simulated. In the PLT Scheme Simulation Collection, (numeric) data subject to automatic data collection is stored in variable structures (i.e., variables).

Data about a variable may be collected either in a time dependent manner, specified using the `accumulate` macro, or in a time independent manner, specified using the `tally` macro.

Currently, either statistics data or history data may be automatically collected for a variable. (Both may in turn be either time dependent or time independent.) History data allows more sophisticated analysis to be performed on the data using other analysis tools. Also, a function to plot history data is provided.

## 8.1  Variables

A variable represents a numeric variable in the model for which data can automatically be collected, as specified by the model builder.

### 8.1.1  The variable Structure

variable

 Structure: `variable` Contract: ```(struct variable ((initial-value (union/c (symbols uninitialized) real?)) (value (union/c (symbols uninitialized) real?)) (time-last-synchronized real?) (statistics (union/c statistics? #f)) (history (union/c history? #f)) (continuous? boolean?) (state-index (integer-in -1 +inf.0)) (get-monitors list?) (set-monitors list?)))```
Instances of the `variable` structure represent variables in the simulation model. The `variable` structure has the following fields.

- `initial-value` - The initial value of the variable. (Not currently used.)
- `value` - The current value of the variable.
- `time-last-synchronized` - The time the variable was last synchronized. (Used to implement time dependent data collectors.)
- `statistics` - The statistics data collector for the variable or #f.
- `history` - The history data collector for the variable or #f.
- `continuous?` - #t if the variable is a continuous variable or #f otherwise. (See Continuous Simulation Models.)
- `state-index` - The index of the variable in the state vector or -1. (See Continuous Simulation Models.)
- `get-monitors` - A list of get monitors for the variable.
- `set-monitors` - A list of set monitors for the variable.

make-variable

 Function: ```(make-variable initial-value) (make-variable)``` Contract: ```(case-> (-> real? variable?) (-> variable?))```
This function returns a newly created variable with the specified initial-value. If initial-value is not provided, `'uninitialized` is used.

By default, all variables accumulate statistics on their values. To turn this off, set the `statistics` field to #f.

To create continuous variables, see Chapter 10 Continuous Simulation Models.

## 8.2  Tally and Accumulate

The `tally` and `accumulate` macros specify data collection for variables.

### 8.2.1  Tally

tally

 Macro: ```(tally (variable-statistics variable)) (tally (variable-history variable))```
This macro specifies time independent data collection for the specified variable. `variable-statistics` specifies that statistics are to be tallied for variable. `variable-history` specifies that a history is to be tallied for variable.

Each time a variable value is changed, any tallied data collectors are updated with the new value.
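As a rough illustration of this rule, the following Python sketch models a tallied statistics collector. It is not the collection's implementation: the class `TalliedStatistics` and its `record` method are hypothetical names, and only the quantities `n`, `sum`, `sum-of-squares`, `minimum`, and `maximum` are taken from the `statistics` structure described in Section 8.3.1.

```python
import math


class TalliedStatistics:
    """Hypothetical model of a time independent (tallied) collector."""

    def __init__(self):
        self.n = 0
        self.sum = 0.0
        self.sum_of_squares = 0.0
        self.minimum = math.inf    # mirrors the documented initial value +inf.0
        self.maximum = -math.inf   # mirrors the documented initial value -inf.0

    def record(self, value):
        """Update the collector with a newly assigned variable value."""
        self.n += 1
        self.sum += value
        self.sum_of_squares += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.sum / self.n


stats = TalliedStatistics()
for value in [3, 5, 10]:
    stats.record(value)
print(stats.n, stats.sum, stats.mean)  # 3 18.0 6.0
```

Each sample counts once regardless of how long the variable held the value, which is what distinguishes a tally from an accumulation.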

### 8.2.2  Accumulate

accumulate

 Macro: ```(accumulate (variable-statistics variable)) (accumulate (variable-history variable))```
This macro specifies time dependent data collection for the specified variable. `variable-statistics` specifies that statistics are to be accumulated for variable. `variable-history` specifies that a history is to be accumulated for variable.

Each time a variable data collector is accessed or before a variable value is changed, any accumulated data collectors are synchronized with the current value over the time since it was last synchronized.
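The synchronization rule can be sketched in Python as follows. This is a model written for illustration, not the collection's code: the class `AccumulatedStatistics` and its methods are hypothetical, and skipping the interval before the first assignment is an assumption about how an uninitialized variable is treated.

```python
class AccumulatedStatistics:
    """Hypothetical model of a time dependent (accumulated) collector."""

    def __init__(self, now=0.0):
        self.value = None                    # None stands in for 'uninitialized
        self.time_last_synchronized = now
        self.n = 0.0                         # total time observed
        self.sum = 0.0                       # time-weighted sum of values
        self.sum_of_squares = 0.0            # time-weighted sum of squared values

    def synchronize(self, now):
        """Credit the current value with the time elapsed since the last sync."""
        elapsed = now - self.time_last_synchronized
        if self.value is not None:           # assumption: ignore uninitialized time
            self.n += elapsed
            self.sum += self.value * elapsed
            self.sum_of_squares += self.value ** 2 * elapsed
        self.time_last_synchronized = now

    def set_value(self, value, now):
        self.synchronize(now)                # synchronize before the value changes
        self.value = value

    def mean(self, now):
        self.synchronize(now)                # synchronize when the collector is read
        return self.sum / self.n


acc = AccumulatedStatistics(now=0.0)
acc.set_value(10, now=0.0)   # value 10 holds from time 0 to time 1
acc.set_value(20, now=1.0)   # value 20 holds from time 1 onward
print(acc.mean(now=4.0), acc.n, acc.sum)  # 17.5 4.0 70.0
```

The value 20 is weighted three times as heavily as the value 10 because it was held for three time units rather than one.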

## 8.3  Statistics and Histories

### 8.3.1  Statistics

statistics

 Structure: `statistics` Contract: ```(struct statistics ((time-dependent? boolean?) (minimum real?) (maximum real?) (n real?) (sum real?) (sum-of-squares real?)))```
The `statistics` structure maintains statistics for a variable. Table 1 shows the statistics that are gathered and how they are computed for both `tally` and `accumulate`.

- `time-dependent?` - #t if the statistics are being accumulated (i.e. are time dependent) or #f if the statistics are being tallied (i.e. are time independent).
- `minimum` - The minimum value the variable has had. (Initial value is +inf.0.)
- `maximum` - The maximum value the variable has had. (Initial value is -inf.0.)
- `n` - See Table 1.
- `sum` - See Table 1.
- `sum-of-squares` - See Table 1.

Table 1 shows the statistics collected and how they are computed for both tallied and accumulated data collectors.

| statistic | accumulate | tally |
| --- | --- | --- |
| `n` | timeC - time0 | number of samples of X |
| `sum` | sum(X * (timeC - timeL)) | sum(X) |
| `mean` | `sum` / `n` | `sum` / `n` |
| `sum-of-squares` | sum(X^2 * (timeC - timeL)) | sum(X^2) |
| `mean-square` | `sum-of-squares` / `n` | `sum-of-squares` / `n` |
| `variance` | `mean-square` - `mean`^2 | `mean-square` - `mean`^2 |
| `standard-deviation` | sqrt(`variance`) | sqrt(`variance`) |
| `maximum` | maximum X for all X | maximum X for all X |
| `minimum` | minimum X for all X | minimum X for all X |

Table 1:  Statistics Computations

- timeC = current simulation time
- timeL = simulation time variable was set to its current value
- time0 = simulation time the variable was created
- X = variable value before change occurs
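The derived rows of Table 1 (`mean`, `mean-square`, `variance`, and `standard-deviation`) depend only on `n`, `sum`, and `sum-of-squares`, so the same arithmetic serves both tallied and accumulated collectors. A minimal Python sketch follows; the function name `derived_statistics` and its parameter names are hypothetical and do not correspond to anything in the collection.

```python
import math


def derived_statistics(n, total, sum_of_squares):
    """Compute the derived rows of Table 1 from the three stored quantities."""
    mean = total / n
    mean_square = sum_of_squares / n
    variance = mean_square - mean ** 2
    return {
        "mean": mean,
        "mean-square": mean_square,
        "variance": variance,
        # Rounding can leave a tiny negative variance, so clamp before the root.
        "standard-deviation": math.sqrt(max(variance, 0.0)),
    }


# Tallied samples 2, 4, 4, 6: n = 4, sum = 16, sum of squares = 72.
stats = derived_statistics(n=4, total=16, sum_of_squares=72)
print(stats["mean"], stats["variance"])  # 4.0 2.0
```

For an accumulated collector, `n` would be the elapsed time and the two sums would be time-weighted, but the function is called the same way.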

### 8.3.2  History

history

 Structure: `history` Contract: ```(struct history ((time-dependent? boolean?) (initial-time real?) (n real?) (values list?) (last-value-cell (union/c pair? #f)) (durations list?) (last-duration-cell (union/c pair? #f))))```
The `history` structure maintains a history of the values of a variable. For accumulated histories (i.e. those specified using the `accumulate` macro), the durations for each value are also computed.

- `time-dependent?` - #t if the history is being accumulated (i.e. is time dependent) or #f if the history is being tallied (i.e. is time independent).
- `initial-time` - For time dependent history, the (simulated) time the history was created.
- `n` - The number of entries in the history.
- `values` - The list of values for the history.
- `last-value-cell` - The last cell in the `values` list or #f if the `values` list is empty. (Used to efficiently append to the `values` list.)
- `durations` - The list of durations for the history. (Not used for tallied histories.)
- `last-duration-cell` - The last cell in the `durations` list or #f if the `durations` list is empty. (Used to efficiently append to the `durations` list. Not used for tallied histories.)
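The role of `last-value-cell` can be illustrated with a small Python model of a singly linked list that keeps a reference to its last cell, so that appending does not require walking the list. The classes `Cell` and `TalliedHistory` are hypothetical stand-ins for Scheme pairs and the `history` structure; this is a sketch of the technique, not the collection's code.

```python
class Cell:
    """One cell of a singly linked list (a stand-in for a Scheme pair)."""

    def __init__(self, value):
        self.value = value
        self.next = None


class TalliedHistory:
    """Hypothetical model of a tallied history with a last-cell reference."""

    def __init__(self):
        self.n = 0
        self.values = None            # first cell, or None when empty
        self.last_value_cell = None   # last cell, or None when empty

    def append(self, value):
        """Append in constant time by linking after the last cell."""
        cell = Cell(value)
        if self.last_value_cell is None:
            self.values = cell
        else:
            self.last_value_cell.next = cell
        self.last_value_cell = cell
        self.n += 1

    def to_list(self):
        result, cell = [], self.values
        while cell is not None:
            result.append(cell.value)
            cell = cell.next
        return result


history = TalliedHistory()
for value in [1, 2, 3]:
    history.append(value)
print(history.to_list())  # [1, 2, 3]
```

An accumulated history would keep a second list of durations with its own last-cell reference, appended to in the same way.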

#### 8.3.2.1  History Graphics

history-plot

 Function: ```(history-plot history title) (history-plot history)``` Contract: ```(case-> (-> history? string? any) (-> history? any))```
This function plots history using the specified title. The string `"History"` is used if title is not specified.

### 8.3.3  Example - Tally and Accumulate Example

This example shows how the `tally` and `accumulate` macros work. Two variables are created, `tallied` and `accumulated`. Statistics and history data are collected for each, using `tally` for the variable `tallied` and `accumulate` for the variable `accumulated`. The process `test-process` iterates through a list of values and durations, setting each of the variables to the specified value for the specified duration of time. Representative statistics (`n`, `sum`, and `mean`) are printed and the histories plotted for each of the variables.

```
;; Test Tally and Accumulate
(require (planet "simulation-with-graphics.ss"
                 ("williams" "simulation.plt")))

(define tallied #f)
(define accumulated #f)

;; Set both variables to each value and hold it for the paired duration.
(define-process (test-process value-duration-list)
  (let loop ((vdl value-duration-list))
    (when (not (null? vdl))
      (let ((value (caar vdl))
            (duration (cadar vdl)))
        (set-variable-value! tallied value)
        (set-variable-value! accumulated value)
        (wait duration)
        (loop (cdr vdl))))))

(define (main value-duration-list)
  (with-new-simulation-environment
   (set! tallied (make-variable))
   (tally (variable-statistics tallied))
   (tally (variable-history tallied))
   (set! accumulated (make-variable))
   (accumulate (variable-statistics accumulated))
   (accumulate (variable-history accumulated))
   (schedule (at 0.0) (test-process value-duration-list))
   (start-simulation)
   (printf "--- Test Tally and Accumulate ---~n")
   (printf "~n--- Tally ---~n")
   (printf "N    = ~a~n" (variable-n tallied))
   (printf "Sum  = ~a~n" (variable-sum tallied))
   (printf "Mean = ~a~n" (variable-mean tallied))
   (printf "~a~n" (history-plot (variable-history tallied)))
   (printf "~n--- Accumulate ---~n")
   (printf "N    = ~a~n" (variable-n accumulated))
   (printf "Sum  = ~a~n" (variable-sum accumulated))
   (printf "Mean = ~a~n" (variable-mean accumulated))
   (printf "~a~n" (history-plot
                   (variable-history accumulated)))))
```

Here are the results of running the program for the following (value duration) pairs: `((1 2)(2 1)(3 2)(4 3))`. That is, each variable will have a value of 1 for 2 units of time (from time 0 to time 2), a value of 2 for 1 unit of time (from time 2 to time 3), a value of 3 for 2 units of time (from time 3 to time 5), and a value of 4 for 3 units of time (from time 5 to time 8). The simulation ends at time 8.

```
>(main '((1 2)(2 1)(3 2)(4 3)))
--- Test Tally and Accumulate ---

--- Tally ---
N    = 4
Sum  = 10.0
Mean = 2.5
```

```
--- Accumulate ---
N    = 8.0
Sum  = 22.0
Mean = 2.75
```

```
>
```
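The printed values can be checked by hand against Table 1, using the pairs (1 2), (2 1), (3 2), (4 3) and a run that starts at time 0 and ends at time 8. The tally treats each of the four values as one sample, while the accumulation weights each value by its duration:

$$
\begin{aligned}
\text{tally:} \quad & n = 4, \quad \mathit{sum} = 1 + 2 + 3 + 4 = 10, \quad \mathit{mean} = 10 / 4 = 2.5 \\
\text{accumulate:} \quad & n = 8 - 0 = 8, \quad \mathit{sum} = 1 \cdot 2 + 2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 = 22, \quad \mathit{mean} = 22 / 8 = 2.75
\end{aligned}
$$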

### 8.3.4  Variable Monitors

Variable monitors are discussed in Chapter ?? Monitors.

## 8.4  Example - Data Collection

The previous examples (Examples 0, 1, and 2) relied on `printf` statements to print the output of the simulation model. This was sufficient to show how the models worked, but would be impractical for large models. This example is the same simulation model as Example 2 (using `with-resource` instead of the individual calls to `resource-request` and `resource-relinquish`), but with the `printf` statements removed.

No explicit variables are needed for this example because resources already provide variables for their `satisfied` and `queue` fields, which are in turn implemented using sets.

Note that the statement:

```
(accumulate (variable-statistics
             (resource-queue-variable-n attendant)))
```

isn't actually needed since statistics are accumulated for any variable by default. It is included as an example. Note that the corresponding `accumulate` is not included for the `satisfied` field and the statistics are still available.

```
; Example 3 - Data Collection

(require (planet "simulation-with-graphics.ss"
                 ("williams" "simulation.plt")))
(require (planet "random-distributions.ss"
                 ("williams" "science.plt")))

(define n-attendants 2)
(define attendant #f)

(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (work (random-flat 2.0 10.0))))

(define (run-simulation n)
  (with-new-simulation-environment
   (set! attendant (make-resource n-attendants))
   (schedule (at 0.0) (generator n))
   (accumulate (variable-statistics
                (resource-queue-variable-n attendant)))
   (accumulate (variable-history
                (resource-queue-variable-n attendant)))
   (start-simulation)
   (printf "--- Example 3 - Data Collection ---~n")
   (printf "Maximum queue length = ~a~n"
           (variable-maximum
            (resource-queue-variable-n attendant)))
   (printf "Average queue length = ~a~n"
           (variable-mean
            (resource-queue-variable-n attendant)))
   (printf "Variance             = ~a~n"
           (variable-variance
            (resource-queue-variable-n attendant)))
   (printf "Utilization          = ~a~n"
           (variable-mean
            (resource-satisfied-variable-n attendant)))
   (printf "Variance             = ~a~n"
           (variable-variance
            (resource-satisfied-variable-n attendant)))
   (print (history-plot
           (variable-history
            (resource-queue-variable-n attendant))))))
```

Here is the output for the example when run for 1000 customers.

```
>(run-simulation 1000)
--- Example 3 - Data Collection ---
Maximum queue length = 8
Average queue length = 0.9120534884951139
Variance             = 2.2420855874934826
Utilization          = 1.4320511974417858
Variance             = 0.5885107114317054
```

```
>
```

## 8.5  Data Collection Across Multiple Simulation Runs

Simplistic as our example has been, it is still useful for illustrating some advanced data collection techniques. In particular, we will show how to collect statistics across multiple runs.

### 8.5.1  Open Loop Processing

Open Loop processing is a technique where a resource is considered to have an infinite number of units. That is, no process will ever block waiting for such a resource. Statistics on the demand for such resources can be collected by looking at the `resource-satisfied-variable-n` variable. Typically, this is done across multiple simulation runs.

In the simulation collection we denote an open-loop resource by specifying an infinite number of units when it is created. In PLT Scheme, `+inf.0` denotes positive infinity.

#### 8.5.1.1  Example - Open Loop Processing

This example collects statistics on the maximum number of attendants required in the system (i.e. a measure of demand) when there is no blocking.

There is an outer simulation environment that exists solely for data collection and a variable `max-attendants` to gather statistics on the maximum number of attendants required. Note that these statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.

The inner loop creates a new simulation environment for each simulation run. This ensures each run is properly initialized. It is in this inner loop that the attendant resource is created with an infinite number of units - `(make-resource +inf.0)`. When the simulation in the inner loop terminates, the `max-attendants` variable is updated with the maximum number of attendants from the simulation. This is done with:

```
(set-variable-value! max-attendants
                     (variable-maximum
                      (resource-satisfied-variable-n attendant)))
```

Finally, the statistics and histogram of the maximum attendants across all of the simulation runs are printed.

```
; Open Loop Example

(require (planet "simulation-with-graphics.ss"
                 ("williams" "simulation.plt")))
(require (planet "random-distributions.ss"
                 ("williams" "science.plt")))

(define attendant #f)

; The generator calls wait and is scheduled, so it is defined as a process.
(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (wait/work (random-flat 2.0 10.0))))

(define (run-simulation n1 n2)
  (with-new-simulation-environment
   (let ((max-attendants (make-variable)))
     (tally (variable-statistics max-attendants))
     (tally (variable-history max-attendants))
     (do ((i 1 (+ i 1)))
         ((> i n1) (void))
       (with-new-simulation-environment
        (set! attendant (make-resource +inf.0))
        (schedule (at 0.0) (generator n2))
        (start-simulation)
        (set-variable-value! max-attendants
                             (variable-maximum
                              (resource-satisfied-variable-n attendant)))))
     (printf "--- Open Loop Example ---~n")
     (printf "Number of experiments      = ~a~n"
             (variable-n max-attendants))
     (printf "Minimum maximum attendants = ~a~n"
             (variable-minimum max-attendants))
     (printf "Maximum maximum attendants = ~a~n"
             (variable-maximum max-attendants))
     (printf "Mean maximum attendants    = ~a~n"
             (variable-mean max-attendants))
     (printf "Variance                   = ~a~n"
             (variable-variance max-attendants))
     (print (history-plot (variable-history max-attendants)
                          "Maximum Attendants"))
     (newline))))
```

The following shows the output of the simulation for 1000 runs of 1000 customers each.

```
>(run-simulation 1000 1000)
--- Open Loop Example ---
Number of experiments      = 1000
Minimum maximum attendants = 6
Maximum maximum attendants = 11
Mean maximum attendants    = 7.525
Variance                   = 0.6653749999999903
```

```
>
```
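The open loop idea can also be modeled outside the simulation collection. The Python sketch below is an independent illustration, not a port of the example: it assumes that `(random-exponential 4.0)` draws interarrival times with mean 4.0 and that `(random-flat 2.0 10.0)` draws service times uniformly between 2.0 and 10.0. The function names are hypothetical. Because no customer ever waits, the demand in one run is simply the largest number of overlapping service intervals, and one sample per run is tallied.

```python
import random


def max_concurrent_customers(n_customers, rng):
    """One open loop run: return the peak number of customers in service."""
    events = []  # (time, +1 for an arrival or -1 for a departure)
    arrival = 0.0
    for _ in range(n_customers):
        arrival += rng.expovariate(1.0 / 4.0)   # assumed mean interarrival 4.0
        service = rng.uniform(2.0, 10.0)
        events.append((arrival, 1))
        events.append((arrival + service, -1))
    events.sort()  # at equal times, departures (-1) sort before arrivals (+1)
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


def run_experiments(n_runs, n_customers, seed=0):
    """Tally the per-run maximum across independent runs."""
    rng = random.Random(seed)
    maxima = [max_concurrent_customers(n_customers, rng) for _ in range(n_runs)]
    n = len(maxima)
    mean = sum(maxima) / n
    variance = sum(m * m for m in maxima) / n - mean ** 2
    return {"n": n, "minimum": min(maxima), "maximum": max(maxima),
            "mean": mean, "variance": variance}


result = run_experiments(n_runs=20, n_customers=200)
print(result)
```

The per-run maxima are tallied rather than accumulated, for the reason given in the text: there is no simulated time spanning the runs to weight them by.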

### 8.5.2  Closed Loop Processing

Closed Loop processing is the "normal" processing where the number of units of a resource is specified and processes are queued (i.e. blocked) when there are not sufficient units of the resource to satisfy a request. Statistics on the utilization for such resources can be collected by looking at the `resource-queue-variable-n` variable. Typically, this is done across multiple simulation runs.

#### 8.5.2.1  Example - Closed Loop Processing

This example collects statistics on the average attendant queue length in the system (i.e. a measure of utilization) when there is a specified number of attendants.

There is an outer simulation environment that exists solely for data collection and a variable `avg-queue-length` to gather statistics on the average attendant queue length. Note that these statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.

The inner loop creates a new simulation environment for each simulation run. This ensures each run is properly initialized. It is in this inner loop that the attendant resource is created with the specified number of units - `(make-resource n-attendants)`. When the simulation in the inner loop terminates, the `avg-queue-length` variable is updated with the average attendant queue length from the simulation. This is done with:

```
(set-variable-value! avg-queue-length
                     (variable-mean (resource-queue-variable-n attendant)))
```

Finally, the statistics and histogram of the average attendant queue length across all of the simulation runs are printed.

```
; Closed Loop Example

(require (planet "simulation-with-graphics.ss"
                 ("williams" "simulation.plt")))
(require (planet "random-distributions.ss"
                 ("williams" "science.plt")))

(define n-attendants 2)
(define attendant #f)

(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (work (random-flat 2.0 10.0))))

(define (run-simulation n1 n2)
  (let ((avg-queue-length (make-variable)))
    (tally (variable-statistics avg-queue-length))
    (tally (variable-history avg-queue-length))
    (do ((i 1 (+ i 1)))
        ((> i n1) (void))
      (with-new-simulation-environment
       (set! attendant (make-resource n-attendants))
       (schedule (at 0.0) (generator n2))
       (start-simulation)
       (set-variable-value! avg-queue-length
                            (variable-mean (resource-queue-variable-n attendant)))))
    (printf "--- Closed Loop Example ---~n")
    (printf "Number of attendants         = ~a~n" n-attendants)
    (printf "Number of experiments        = ~a~n"
            (variable-n avg-queue-length))
    (printf "Minimum average queue length = ~a~n"
            (variable-minimum avg-queue-length))
    (printf "Maximum average queue length = ~a~n"
            (variable-maximum avg-queue-length))
    (printf "Mean average queue length    = ~a~n"
            (variable-mean avg-queue-length))
    (printf "Variance                     = ~a~n"
            (variable-variance avg-queue-length))
    (print (history-plot (variable-history avg-queue-length)
                         "Average Queue Length"))
    (newline)))
```

The following shows the output of the simulation for 1000 runs of 1000 customers each.

```
>(run-simulation 1000 1000)
--- Closed Loop Example ---
Number of attendants         = 2
Number of experiments        = 1000
Minimum average queue length = 0.5792057912006373
Maximum average queue length = 3.182757214703683
Mean average queue length    = 1.1123279920475524
Variance                     = 0.08869696318792064
```

```
>
```
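For comparison, the closed loop experiment can be approximated with a short standalone Python model. It is a sketch under stated assumptions rather than a port of the example: customers are served first come, first served, each request takes one attendant, interarrival times are exponential with an assumed mean of 4.0, and service times are uniform on 2.0 to 10.0. The function names are hypothetical. The time-averaged queue length of a run is computed as total waiting time divided by the run length, since the area under the queue-length curve equals the total time customers spend waiting.

```python
import heapq
import random


def average_queue_length(n_customers, n_attendants, rng):
    """Time-averaged queue length for one first come, first served run."""
    free_at = [0.0] * n_attendants    # min-heap of times each attendant is free
    arrival = 0.0
    total_wait = 0.0
    end_time = 0.0
    for _ in range(n_customers):
        arrival += rng.expovariate(1.0 / 4.0)          # assumed mean 4.0
        start = max(arrival, heapq.heappop(free_at))   # wait for earliest attendant
        finish = start + rng.uniform(2.0, 10.0)
        heapq.heappush(free_at, finish)
        total_wait += start - arrival
        end_time = max(end_time, finish)
    # Area under the queue-length curve equals the total waiting time.
    return total_wait / end_time


def tally_runs(n_runs, n_customers, n_attendants, seed=0):
    """Tally the per-run average queue length across independent runs."""
    rng = random.Random(seed)
    samples = [average_queue_length(n_customers, n_attendants, rng)
               for _ in range(n_runs)]
    n = len(samples)
    mean = sum(samples) / n
    mean_square = sum(x * x for x in samples) / n
    return {"n": n, "minimum": min(samples), "maximum": max(samples),
            "mean": mean, "variance": mean_square - mean ** 2}


result = tally_runs(n_runs=10, n_customers=200, n_attendants=2)
print(result)
```

With enough attendants that nobody waits, the model reports an average queue length of zero, which corresponds to the open loop case.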