Archive for the ‘Modeling Tips’ Category

Generating Random Numbers from Custom Probability Distributions

May 29th, 2014

STELLA® and iThink® provide many useful probability distribution functions (listed here).  However, sometimes you need to draw random numbers from a different probability distribution, perhaps one you have developed yourself.  In these cases, it is possible to invert the cumulative probability distribution and use a uniformly distributed random number between zero and one (using the RANDOM built-in) to draw a number from the intended distribution.  With a lot of math, this can be done analytically (briefly described here).  With no math at all, it can be closely approximated using the graphical function.

Find the Cumulative Distribution Function

Every probability distribution has a probability density function (PDF) that relates each value to its probability of occurring (or, for continuous distributions, its probability density).  The most famous continuous PDF is the bell curve for the normal distribution:

[Figure: bell-curve PDF of a normal distribution centered at 100]

From the PDF, we can see that the probability density at 100 is just under 0.09, while the density at 88 or 112 is close to zero.  Note that applying the techniques described in this article to a continuous probability distribution will only approximate that distribution.  The accuracy of the approximation is determined by the number of data points included in the graphical function.

For discrete probability functions, the PDF resembles a histogram:

[Figure: histogram-style PDF of a discrete probability distribution]

From this PDF, we can see that the probability of randomly drawing 1 is 0.4, while the probability of drawing 3 is 0.15.  As discrete probability distributions can be represented exactly within graphical functions, the remainder of this article will focus on them.
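To see the mechanics outside of STELLA, here is a minimal Python sketch of the same inverse-CDF trick.  The lookup table plays the role of the graphical function; only the probabilities for 1 and 3 come from the PDF above, and the rest are assumed for illustration.

import random

# Illustrative discrete PDF: value -> probability (must sum to 1).
# Only the entries for 1 and 3 come from the histogram above.
pdf = {1: 0.40, 2: 0.25, 3: 0.15, 4: 0.12, 5: 0.08}

# Build the cumulative distribution as (cumulative probability, value) pairs
cdf = []
running = 0.0
for value, prob in sorted(pdf.items()):
    running += prob
    cdf.append((running, value))

def draw():
    # Invert the CDF: map a uniform draw on [0, 1) back to a value,
    # just as the RANDOM built-in feeds the graphical function in the model
    u = random.random()
    for cum_prob, value in cdf:
        if u < cum_prob:
            return value
    return cdf[-1][1]  # guard against floating-point round-off

# Draw many samples and check the frequencies roughly match the PDF
samples = [draw() for _ in range(100_000)]
print({v: samples.count(v) / len(samples) for v in sorted(pdf)})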

Read more…

Generating Custom Reports Using XMILE

September 4th, 2013

XMILE is an open standard for describing system dynamics models in XML.  Version 10 of iThink and STELLA outputs models in the XMILE format.  One of the advantages of XML is that it is a text-based format that can be easily queried and manipulated.  This post will show you how to use XMLStarlet, a free XML command-line management tool available for Windows, Macintosh, and Linux, to easily extract information from an XMILE model.  It will also demonstrate how to modify the XML style sheet (XSLT) generated by XMLStarlet to create custom HTML reports.

Our goal is to create a report that lists the stocks, flows, and converters in the susceptible-infected-recovered (SIR) model of infection shown below (available by clicking here).  Each model variable will be listed with its own equation and sorted by name.

[Figure: stock-and-flow diagram of the SIR model]

XMLStarlet uses the select command (sel) for making queries to an XML file and formatting the results.  We will use all of the following select command options:

-t (template): define a set of rules (below) to be applied to the XML file
-m "XPath query" (match): find and select a set of nodes in the XML file
-s <options> "XPath expression" (sort): sort selected nodes by XPath expression
-v "XPath expression" (value): output value of XPath expression
-o "text" (output): output the quoted text
-n (newline): start a new line in the output

Reporting Stock Names

Let’s start by outputting the names of the stocks in the model.  In an XMILE file, stocks are identified by the <stock> tag, which is nested inside the <xmile> and <model> tags:

<xmile …>
   <model>
      <stock name="Infected">
         <eqn>1</eqn>
      </stock>
   </model>
</xmile>

There is one <stock> tag for every stock in the model and each stock has, at a minimum, both a name (in the “name” attribute) and an initialization equation (in the <eqn> tag).  To get the names of all stocks in the model, we can build a template using these XMLStarlet command options:

sel -t -m "_:xmile/_:model/_:stock" -v "@name" -n

The "sel" chooses the select command and the -t begins the template (the set of rules used to extract and format information from the XML file).  The -n at the end puts each stock name on its own line.

The -m option defines the XML path to any stock from the root.  In this case, the -m option is selecting all the XML nodes named stock (i.e., <stock> tags) that are under any <model> tags in the <xmile> tag.  From the XMILE file, one might expect the XML path to be "xmile/model/stock", but the tags in the XMILE file are in the XMILE namespace, and XPath, which is being used for this query, requires namespaces to be explicitly specified.  Luckily, XMLStarlet, starting in version 1.5.0, allows us to use "_" for the name of the namespace used by the XML file, in this case the XMILE namespace.  Thus, every XMILE name in a query must be preceded by "_:".

Finally, the -v option allows us to output the name of each node selected with -m (stocks, in this case).  The "@" tells XPath that "name" is an attribute, not a tag, i.e., it is of the form name="…" rather than <name>…</name>.

To build a full command, we need to add the path to XMLStarlet to the beginning and the name of the XML file being queried to the end:

XMLStarlet_path/xml <options above> SIR.stmx

The entire command without the path to XMLStarlet is:

xml sel -t -m "_:xmile/_:model/_:stock" -v "@name" -n SIR.stmx

This command produces the following output:

Infected
Susceptible
Recovered
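If you prefer a scripting language to the command line, the same query is easy to reproduce.  Here is a sketch using Python’s standard ElementTree module; the namespace URI must match the xmlns declaration at the top of your .stmx file, and the OASIS XMILE 1.0 URI below is an assumption.

import xml.etree.ElementTree as ET

# The URI must match the xmlns attribute of the <xmile> tag in your
# .stmx file; the OASIS XMILE 1.0 namespace is assumed here.
NS = {"x": "http://docs.oasis-open.org/xmile/ns/XMILE/v1.0"}

tree = ET.parse("SIR.stmx")
# Equivalent of -m "_:xmile/_:model/_:stock" -v "@name": the root
# element is <xmile> itself, so we match model/stock relative to it.
for stock in tree.getroot().findall("x:model/x:stock", NS):
    print(stock.get("name"))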

Read more…

Working with Array Equations in Version 10

December 17th, 2012

STELLA/iThink version 10 introduces several new array features, including simplified and more powerful Apply-To-All equations that are designed to reduce the need to specify equations for every individual element.

Dimension names are optional

When an equation is written using other array names, the dimension names are not normally needed.  For example, given arrays A, B, and C, each with the dimensions Dim1 and Dim2, A can be set to the sum of B and C with this equation:

B + C

Dimension names are still needed when the dimensions do not match.  For example, to also add in the first 2-dimensional slice of the 3-dimensional array D[Dim1, Dim2, Dim3], the equation becomes:

B + C + D[Dim1, Dim2, 1]

The wildcard * is optional

When an array builtin is used, the * is normally not needed.  For example, finding the sum of the elements of a 2-dimensional array A[Dim1, Dim2] requires only this equation:

SUM(A)

If, however, the sum of only the first column of A is desired, the * is still needed:

SUM(A[*, 1])

Simplified array builtins

There are five array builtins:  SIZE, SUM, MEAN, STDDEV, and RANK.  In addition, the MIN and MAX functions have been extended to take either one or two array arguments.  All but RANK can also be applied to queues and conveyors.

SUM, MEAN, and STDDEV all work in a similar way (see examples of SUM above).

Using the MAX function, it is possible to find the maximum value in array A,

MAX(A)

the maximum value in array A, or zero if everything is negative,

MAX(A, 0)

or the maximum across two arrays A and B,

MAX(A, B)

MIN works the same way, but finds the minimum.

The SIZE function requires an array parameter, but within an array, the special name SELF can be used to refer to the array whose equation is being set.  In addition, wildcards can be used to determine the size of any array slice.  In the equation for array A[Dim1, Dim2],

SIZE(SELF)

gives the total number of elements in array A while

SIZE(SELF[*, 1])

gives the size of the first dimension of A, i.e., the number of elements – or rows – in the first column.  Likewise,

SIZE(SELF[1, *])

gives the size of the second dimension of A, i.e., the number of elements – or columns – in the first row.
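For readers more comfortable with NumPy, here is a rough analogy to these builtins (not the STELLA implementation; note also that STELLA indices are 1-based while NumPy’s are 0-based):

import numpy as np

# A 2 x 3 array standing in for A[Dim1, Dim2]
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.sum())          # SUM(A): sum over every element
print(A[:, 0].sum())    # SUM(A[*, 1]): sum of the first column only
print(A.size)           # SIZE(SELF): total number of elements (6)
print(A[:, 0].size)     # SIZE(SELF[*, 1]): rows in the first column (2)
print(A[0, :].size)     # SIZE(SELF[1, *]): columns in the first row (3)
print(A.max())          # MAX(A): the largest element
print(max(A.max(), 0))  # MAX(A, 0): the largest element, or 0 if all negative
B = A - 3.0
print(max(A.max(), B.max()))  # MAX(A, B): the maximum across both arrays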

Read more…

Using PEST to Calibrate Models

January 14th, 2011

There are times when it is helpful to calibrate, or fit, your model to historical data. This capability is not built into the iThink/STELLA program, but it is possible to interface to external programs to accomplish this task. One generally available program to calibrate models is PEST, available freely from www.pesthomepage.org. In this blog post, I will demonstrate how to calibrate a simple STELLA model using PEST on Windows. Note that this method relies on the Windows command line interface added in version 9.1.2 and will not work on the Macintosh. The export to comma-separated value (CSV) file feature, added in version 9.1.2, is also used.

The model and all files associated with its calibration are available by clicking here.

The Model

The model being used is the simple SIR model first presented in my blog post Limits to Growth. The model is shown again below. There are two parameters: infection rate and recovery rate. Technically, the initial value for the Susceptible stock is also a parameter. However, since this is a conserved system, we can make an excellent guess as to its value and do not need to calibrate it.

[Figure: stock-and-flow diagram of the SIR model with its infection rate and recovery rate parameters]

The Data Set

We will calibrate this model to two data sets. The first is the number of weekly deaths caused by the Hong Kong flu in New York City over the winter of 1968-1969 (below).

[Figure: weekly deaths from the Hong Kong flu in New York City over the winter of 1968-1969]

The second is the number of weekly deaths per thousand people in the UK due to the Spanish flu (H1N1) in the winter of 1918-1919 (shown later).

In both cases, I am using the number of deaths as a proxy for the number of people infected, which we do not know. This is reasonable because the number of deaths is directly proportional to the number of infected individuals. If we knew the constant of proportionality, we could multiply the deaths by this constant to get the number of people infected.

Read more…

What is Delta Time (DT)?

August 3rd, 2010

After reading Karim Chichakly’s recent post on Integration Methods and DT, I was reminded that delta time (DT) has always been a tricky modeling concept for me to grasp.   Beginning modelers don’t usually need to think about changing DT since STELLA and iThink set it to a useful default value of 0.25.   But once you progress with your modeling skills, you might consider the advantages and risks of playing with DT.

The DT setting is found in the Run Specs menu.

By definition, system dynamics models run over time, and DT controls how frequently calculations are applied each unit of time.  Think of it this way: if your model were a movie, then DT would be the time interval between still frames in the strip of movie film.  For a simulation over a period of 12 hours, a DT of 1/4 (0.25) would give you a single frame every 15 minutes.  Lowering the DT to 1/60 would give a frame every minute.  The smaller the DT, the higher the calculation frequency (1/DT).

Beware of the Extremes

A common tendency for modelers is to set the calculation frequency too high.  Without thinking too hard about it, it is easy to assume that more data implies a higher-quality model – just as more frames in movie film make for smoother motion.  If your model calculates more data for every time unit, its behavior will begin to resemble the behavior of a smoothly continuous system.  But a higher frequency of calculations can greatly slow down your model’s run performance, and more data does not directly translate to a better simulation.

Beware of Discrete Event Models

Another situation where DT can often lead to unexpected behavior is with models that depend on discrete events.   My eyes were opened to this when I attended one of isee’s workshops taught by Corey Peck and Steve Peterson of Lexidyne LLC.

One of the workshop exercises involved a simple model where the DT is set to the default 0.25, the inflow is set to a constant 10, and the outflow is set to flush out the stock’s contents as soon as it reaches 50.   This is how the model’s structure and equations looked:

[Figure: structure of the discrete model]

Stock = 0

inflow = 10

outflow = IF Stock >= 50 THEN 50 ELSE 0

I would have expected the value of the stock to plunge to zero after it reached or exceeded 50, but this graph shows the resulting odd saw-tooth pattern.

[Figure: saw-tooth pattern in the stock’s value over time]

The model ends up behaving like a skipping scratched record, in a perpetual state of never progressing far enough to reach the goal of zero.  (Click here to download the model.)

What is happening in the model?  In the first DT after the stock’s value reaches exactly 50, the outflow sets itself to 50 in order to remove the contents from the stock.  So far so good, but now the DT gotcha begins to occur.  Since the outflow works over time, its value is always per unit of time.  To get the quantity of material that actually flowed, you must multiply the outflow value (or rate) by how long the material was flowing.  When DT is set to 0.25, the material flows for 0.25 time units each DT.  Hence, the quantity of material removed from the stock is 50*0.25 = 12.50.

Suddenly we are in a situation where only 12.50 has been removed from the stock but the stock’s value is now less than 50.  Since the stock is no longer greater than or equal to 50, the outflow sets itself back to 0 and never actually flushes out the full contents of the stock. 

So what do we do?  One solution to this problem would be to use the PULSE built-in to remove the full value from the stock.   Here’s what the equation for the outflow would look like:

outflow = IF Stock >= 50 THEN PULSE(Stock) ELSE 0

(Note: This option will only work using Euler’s integration method.)
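To make the arithmetic concrete, here is a small Python re-creation of the Euler update loop – a sketch of what the engine does each DT, not the engine itself:

DT = 0.25
INFLOW = 10.0

def simulate(outflow_rule, steps=48):
    stock, history = 0.0, []
    for _ in range(steps):
        # Euler update: each flow rate acts for DT time units
        stock += (INFLOW - outflow_rule(stock)) * DT
        history.append(round(stock, 2))
    return history

# outflow = IF Stock >= 50 THEN 50 ELSE 0: only 50*DT = 12.5 is removed,
# so the stock saw-tooths between 40 and 50 and never empties
print(simulate(lambda s: 50.0 if s >= 50.0 else 0.0))

# outflow = IF Stock >= 50 THEN PULSE(Stock) ELSE 0: PULSE outputs
# Stock/DT for one DT, so the full contents drain in a single step
# (the concurrent inflow still adds 10*DT during that step)
print(simulate(lambda s: s / DT if s >= 50.0 else 0.0))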

Further Reading

STELLA and iThink have great help documentation on DT.  The general introduction provides a good explanation of how DT works. The more advanced DT Situations Requiring Special Care section focuses more on artifactual delays and the discrete model issues mentioned in this post.  Delta time and resulting model behaviors are reminders that system dynamics models run over time, but they achieve this by applying numerous discrete calculations in order to simulate the smooth behavior of actual systems.


Integration Methods and DT

July 14th, 2010

The simulation engine underlying STELLA® and iThink® uses numerical integration.  Numerical integration differs from the integration you may have learned in Calculus in that it uses algorithms that approximate the solution to the integration.  The two approximations currently available are known as Euler’s method and the Runge-Kutta method.  All algorithms require a finite value for DT, the integration step-size, rather than the infinitesimally small value used in Calculus.  On the surface, it may seem that the smaller DT is, the more accurate the results, but this turns out not to be true.

Compound Interest:  Euler’s Method over Runge-Kutta

To introduce Euler’s method, let’s take a look at the simple problem of compound interest.  If we have $1000 that we invest at 10% (or 0.1) compounded annually, we can calculate the balance after N years by adding in the interest each year and recalculating:

1st year:  interest = $1000 × 0.1 = $100; Balance = 1000 + 100 = $1100
2nd year: interest = $1100 × 0.1 = $110; Balance = 1100 + 110 = $1210
3rd year:  interest = $1210 × 0.1 = $121; Balance = 1210 + 121 = $1331

And so on up to year N.  We have just seen the essence of how Euler’s method works.  It calculates the new change in the stock for this DT (in this case, interest) and then adds that to the previous value of the stock (Balance) to get the new value of the stock.  In this example, DT = 1 year.

By noticing we always add the existing balance in, we can instead just multiply the previous year’s balance by 1 + rate = 1 + 0.1 = 1.1:

1st year:  Balance = $1000 × 1.1 = $1100
2nd year: Balance = $1100 × 1.1 = $1210
3rd year:  Balance = $1210 × 1.1 = $1331

And so on up to year N. We can further generalize by noticing we are multiplying by 1.1 N times and thus arrive at the compound interest formula:

Balance = Initial_Balance*(1 + rate)^N

Checking this, we find our Balance at the end of year 3 is 1000*1.1^3 = $1331.  In the general case of the formula, rate is the fractional interest rate per compounding period and N is the number of compounding periods (an integer).  In our example, the compounding period is one year, so rate is the annual fractional interest rate and N is the number of years.  However, if interest is compounded quarterly (four times a year), the interest rate has to be adjusted to a per quarter rate by dividing by 4 (so rate = 0.1/4 = 0.025) and N must be expressed as the number of quarters (N = number of years*4 = 3*4 = 12 for the end of year 3).  We can use this formula in our model to test the accuracy of Euler’s method.  Note that for quarterly compounding, we would set DT = 1/4 = 0.25 years.
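As a quick numeric check of the formula and the quarterly adjustment, here is a short Python sketch using the values from the example above:

# Check Balance = Initial_Balance*(1 + rate)^N for the $1000 example
initial, annual_rate, years = 1000.0, 0.10, 3

annual = initial * (1 + annual_rate) ** years               # DT = 1
quarterly = initial * (1 + annual_rate / 4) ** (years * 4)  # DT = 0.25

print(round(annual, 2))     # 1331.0
print(round(quarterly, 2))  # 1344.89: compounding more often earns slightly more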

To explore the differences between Euler’s and Runge-Kutta, the following structure will be used for all of the examples in this post.  This structure models the compound interest problem outlined above.

[Figure: model structure comparing the actual balance with the Euler-approximated balance]

The equations change for each example and can be seen in the individual model files (accessed by clicking here).  For this example, the actual value is calculated using the compound interest formula, Initial_Balance*(1 + rate)^TIME.  The approximated value is calculated by integrating rate*Approx_Balance (into Approx_Balance).

In addition to the actual and approximate values, three errors are also calculated across the model run:  the maximum absolute error, the maximum relative error, and the root-mean-squared error (RMSE).  The absolute error is:

ABS(Actual_Balance – Approx_Balance)

The relative error is:

absolute_error/ABS(Actual_Balance)

and is usually expressed as a percentage.  The RMSE is found by averaging the values of the absolute error squared, and then taking the square root of that average.
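Here is a minimal Python version of that error calculation, comparing the Euler approximation against the compound interest formula; the run length and DT below are illustrative.

import math

DT, rate, initial, horizon = 0.25, 0.10, 1000.0, 10.0

approx, abs_errs, rel_errs = initial, [], []
for step in range(1, int(horizon / DT) + 1):
    approx += rate * approx * DT                  # Euler: integrate rate*Approx_Balance
    actual = initial * (1 + rate) ** (step * DT)  # Initial_Balance*(1 + rate)^TIME
    abs_errs.append(abs(actual - approx))
    rel_errs.append(abs_errs[-1] / abs(actual))

print(max(abs_errs))  # maximum absolute error
print(max(rel_errs))  # maximum relative error
print(math.sqrt(sum(e * e for e in abs_errs) / len(abs_errs)))  # RMSE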

Read more…

Steady-State Initialization of Conveyors

May 25th, 2010

Conveyors are useful model elements for representing pipelines or processes that take a certain amount of time to complete.  However, adding a leakage flow to a conveyor can make it difficult to initialize a model in steady-state.  The following discussion will explain how to initialize conveyors with leakage in steady-state.  Please refer to the model structure below while reading this discussion.

[Figure: conveyor with an inflow, an outflow, and a leakage flow]

These additional variables will also be used:

transit_time = TRANSTIME(conveyor)
conveyor_length = transit_time/DT
leakage_fraction = the user-specified leakage fraction

Linear Leakage

The default leakage is linear in behavior.  The total amount that leaks across the length of the conveyor is directly proportional to the inflowing amount.  The leakage fraction is the constant of proportionality.  Thus, the fraction of inflowing material that makes it to the conveyor’s outflow is exactly

1 – leakage_fraction

Given the sample model structure above, to achieve equilibrium, conveyor_outflow must equal outflow.  For this to happen, we need to set the inflow as follows:

inflow = outflow/(1 – leakage_fraction)

The conveyor’s steady-state value is then:

conveyor = transit_time*inflow – (conveyor_length – 1)*leakage*DT/2

where the initial value of leakage is:

leakage = leakage_fraction*inflow

This must be calculated outside the program and entered as a constant into the conveyor, as conveyors cannot be given equations (they can, however, be set to the value of a single converter, but you must be careful how you calculate this to avoid circularity).

Exponential Leakage

Optionally, leakage can be made exponential.  The amount that leaks each DT is proportional to the amount remaining in the conveyor.  In this case, the leakage fraction is the fraction that leaks each unit of time so, for long conveyors, a lot of material can leak away.  Given the transit time, the fraction of inflowing material that makes it to the conveyor’s outflow is approximately

1 – (1 – leakage_fraction)^transit_time

Given the sample model structure above, to achieve equilibrium, conveyor_outflow must equal outflow.  For this to happen, we need to set the inflow as follows:

per_dt_no_leak = 1 – leakage_fraction*DT
inflow = outflow/(per_dt_no_leak^conveyor_length)

For steady-state, the conveyor itself must then be set as follows:

conveyor = (inflow*DT)*(1 – per_dt_no_leak^conveyor_length)/(1 – per_dt_no_leak)
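Since these initial values must be calculated outside the program, a short script can do the arithmetic.  The Python sketch below evaluates both sets of formulas; the parameter values are illustrative.

DT = 0.25
transit_time = 5.0
leakage_fraction = 0.1
outflow = 10.0  # desired steady-state conveyor_outflow

conveyor_length = transit_time / DT

# Linear leakage: inflow, initial leakage, and conveyor initial value
inflow_lin = outflow / (1 - leakage_fraction)
leakage0 = leakage_fraction * inflow_lin
conveyor_lin = transit_time * inflow_lin - (conveyor_length - 1) * leakage0 * DT / 2
print("linear:", inflow_lin, conveyor_lin)

# Exponential leakage: inflow and conveyor initial value
per_dt_no_leak = 1 - leakage_fraction * DT
inflow_exp = outflow / per_dt_no_leak ** conveyor_length
conveyor_exp = (inflow_exp * DT) * (1 - per_dt_no_leak ** conveyor_length) / (1 - per_dt_no_leak)
print("exponential:", inflow_exp, conveyor_exp)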

Converting a Sector-based Model to Modules

March 17th, 2010

I generally do not use modules to build very small models (only a couple of stocks and flows).  As a model grows, this can lead me to use sectors because they are very convenient.  By the time I have three sectors, though, it starts to become clear that I should have used modules.  I will then need to convert my sector-based model into a module-based model.  I also have a number of older sector-based models that are crying out to be module-based.

Converting from sectors to modules is not very difficult:

  1. Make sure there are no connections or flows between sectors.  Replace any of these with ghosts in the target sector.
  2. In a new model, create one module for every sector.
  3. Copy and paste the structure from each sector into its corresponding module.
  4. Connect the modules:  At this point, the model structure has been rearranged into modules, but none of the modules are connected.  The ghosts that were in the sectors became real entities when they were pasted into the modules.  Go back to identify all of these connections and reconnect them in the module-based model.

Stepping Through a Sample Model

Let’s walk through an example.  A small sector-based model is shown below (and is available by clicking here).

[Figure: small sector-based model with connectors running between sectors]

This model violates what I would call good sector etiquette:  there are connectors that run between the sectors.  This is often useful in a small model such as this because it makes the feedback loops visible.  However, in a larger model, this can lead to problems such as crossed connections and difficulty in maintaining the model because sectors cannot be easily moved.

Read more…