«smooth» package for R. Intermittent state-space model. Part I. Introducing the model

One of the features of functions of smooth package is the ability to work with intermittent data and the data with periodically occurring zeroes.

Intermittent time series is a series that has non-zero values occurring at irregular frequency (Svetuknov and Boylan, 2017). Imagine retailer who sells green lipsticks. The demand on such a product will not be easy to predict, because green colour is not a popular colour in this case, and thus the data of sales will contain a lot of zeroes with seldom non-zero values. Such demand is called “intermittent”. In fact, many products exhibit intermittent patterns in sales, especially when we increase the frequency of measurement (how many tomatoes and how often does a store sell per day? What about per hour? Per minute?).

The other case is when watermelons are sold in high quantities over summer and are either not sold at all or sold very seldom in winter. In this case the demand might be intermittent or even absent in winter and have a nature of continuous demand during summer.

smooth functions can work with both of these types of data, building upon mixture distributions.

In this post we will discuss the basic intermittent demand statistical models implemented in the package.

The model

First, it is worth pointing out that the approach that is used in the statistical methods and models discussed in this post assumes that the final demand on the product can be split into two parts (Croston, 1972):

  1. Demand occurrence part, which is represented by a binary variable, which is equal to one when there is a non-zero demand in period \(t\) and zero otherwise;
  2. Demand sizes part, which reflects the amount of sold product when demand occurrence part is equal to one.

This can be represented mathematically by the following equation:
\begin{equation} \label{eq:iSS}
y_t = o_t z_t ,
\end{equation}
where \(o_t\) is the binary demand occurrence variable, \(z_t\) is the demand sizes variable and \(y_t\) is the final demand. This equation was originally proposed by Croston, (1972), although he has never considered the complete statistical model and only proposed a forecasting method.

There are several intermittent demand methods that are usually discussed in forecasting literature: Croston (Croston, 1972), SBA (Syntetos & Boylan, 2000) and TSB (Teunter et al., 2011). These are good methods that work well in intermittent demand context (see, for example, Kourentzes, 2014). The only limitation is that they are the “methods” and not “models”. Having models becomes important when you want to include additional components and produce proper prediction intervals, need the ability to select the appropriate components and do proper statistical inference. So, John Boylan and I developed the model underlying these methods (Svetunkov & Boylan, 2017), based on \eqref{eq:iSS}. It is built upon ETS framework, so we call it “iETS”. Given that all the intermittent demand forecasting methods rely on simple exponential smoothing (SES), we suggested to use ETS(M,N,N) model for both demand sizes and demand occurrence parts, because it underlies SES (Hyndman et al., 2008). One of the key assumptions in our model is that demand sizes and demand occurrences are independent of each other. Although, this is an obvious simplification, it is inherited from Croston and TSB, and seems to work well in many contexts.

The iETS(M,N,N) model, discussed in our paper is formulated the following way:
\begin{equation}
\begin{matrix}
y_t = o_t z_t \\
z_t = l_{z,t-1} \left(1 + \epsilon_t \right) \\
l_{z,t} = l_{z,t-1}( 1 + \alpha_z \epsilon_t) \\
o_t \sim \text{Bernoulli}(p_t)
\end{matrix} ,
\end{equation}
where \(z_t\) is represented by the ETS(M,N,N) model, \(l_{z,t}\) is the level of demand sizes, \(\alpha_z\) is the smoothing parameter and \(\epsilon_t\) is the error term. The important assumption in our implementation of the model is that \(\left(1 + \epsilon_t \right) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \) – something that we discussed in one of the previous posts. This means that the demand will always be positive. However if you deal with some other type of data, where negative values are natural, then you might want to stick with pure additive model.

As a side note, we have advanced notations for the iETS models, which will be discuss in the next post. For now we will stick with the level models and use the shorter names.

Having this statistical model, makes it extendable, so that one can add trend, seasonal component, or exogenous variables. We don’t discuss these elements in our paper, but it is briefly mentioned in the conclusions. And we don’t discuss these features just yet, we will cover them in the next post.

Now the main question that stays unanswered is how to model the probability \(p_t\). And there are several approaches to that:

  1. iETS\(_F\) – assume that the demand occurs at random with a fixed probability (so \(p_t = p\)).
  2. iETS\(_I\) – the interval-based model, with a principle suggested by Croston (1972). In this case we assume that the probability is constant between the demand occurrences and that the intervals between the occurrences are inverse proportional to the respective probabilities in time. In this case we model and forecast the number of zeroes between the non-zero demands and then invert the forecasted value. This is one of the oldest forecasting approaches for intermittent demand, but it is not always the most accurate one. Still it seems that this model works well, when demand is building up.
  3. iETS\(_P\) – the probability-based model, the principle suggested by Teunter et al., (2011). In this case the probability is updated directly based on the values of occurrence variable, using SES method. This model works well in any case when probability changes over time.

In case (1) there is no model underlying the probability, we just estimate the value and produce forecasts. In cases of (2) and (3), we suggest using another ETS(M,N,N) model as underlying each of these processes. So when it comes to producing forecasts, in both cases we assume that future level of probability will be the same as the last obtained one (level forecast from the local-level model). After that the final forecast is generated using:
\begin{equation} \label{eq:iSSForecast}
\hat{y}_t = \hat{p}_t \hat{z}_t ,
\end{equation}
where \(\hat{p}_t\) is the forecast of the probability, \(\hat{z}_t\) is the forecast of demand sizes and \(\hat{y}_t\) is the final forecast of the intermittent demand.
Summarising advantages of our framework:

  1. Our model is extendable: you can use any ETS model and even introduce exogenous variables. In fact, you can use any model you want for demand sizes and a wide variety of models for demand occurrence variable;
  2. The model allows selecting between the aforementioned types of intermittent models (“fixed” / “probability” / “intervals”) using information criteria. This mechanism works fine on large samples, but, unfortunately, does not seem to work well in cases of small samples (which is typical for intermittent demand) with the existing information criteria. Potentially, some modifications of criteria need to be done in order to make the mechanism work better. For example, AICc and BICc need to be adjusted in order to take the demand occurrence part into account;
  3. The model allows producing prediction intervals for several steps ahead and cumulative (over a lead time) upper bound of the intervals. The latter arises naturally from the model and can be used for safety stock calculation;
  4. The estimation of models is done using likelihood function and not some ad-hoc estimators. This means that the estimates of parameters become efficient and consistent;
  5. Although the proposed model is continuous, we show in our paper that it is more accurate than several other integer-valued models. Still, if you want to have integer numbers as your final forecasts, you can round up or round down either the point or prediction intervals, ending up with meaningful values. This can be done due to a connection between the quantiles of rounded values and the rounding of quantiles of continuous variable (discussed in Appendix of our paper).

If you need more details, have a look at our working paper (have I already advertise it enough in this post?).

Implementation. Demand occurrence

The aforementioned model with different occurrence types is available in smooth package. There is a special function for demand occurrence part, called iss() (Intermittent State-Space model) and there is a parameter in every smooth forecasting function called intermittent, which can be one of: “none”, “fixed”, “interval”, “probability”, “sba”, “auto” or “logistic”. We do not cover “logistic” yet, this will be discussed in the next post. We also don’t discuss “sba” (it is based on Syntetos and Boylan, 2001) and “auto” options – we might cover them later at some point.

So, let’s consider an example with artificial data. We create the following time series:

This way we have an artificial data, where both demand sizes and demand occurrence probability decrease over time stepwise, each 25 observations. The generated data resembles something called “demand obsolescence” or “dying out demand”. Let’s fit the three different models to this time series:

By looking at the outputs, we can already say that iETS\(_P\) model performs better than the others in terms of information criteria. This is due to the flexibility of the model mentioned before – it is able to adapt to the changes in probability faster than the other models. Both iETS\(_P\) and iETS\(_I\) have smoothing parameter close to 0.1 and both start from the high probability (in case of iETS\(_I\) initial level is 1.008, which transforms to the probability of \(\frac{1}{1.008} \approx 0.99 \).

We can also plot the actual occurrence variable, fitted and forecasted probabilities using plot function:

Note that the different models capture the probability differently. While iETS\(_F\) averages out the probability, both iETS\(_I\) and iETS\(_P\) models react to the changes in the data, but differently: interval-based model does that only when demand occurs, while probability-based one does that on every observation. Given that in this example, the demand becomes obsolete, neither the iETS\(_F\), nor iETS\(_I\) produce accurate forecasts for the occurrence part.

In addition, in my example the last observation in x is non-zero demand, so both iETS\(_I\) and iETS\(_P\) react to that, each of them slightly differently: if there was a zero, iETS\(_I\) would predict a higher level of occurrence, while iETS\(_P\) would predict the lower one. This is due to the differences in the mechanisms of the probability update in the two models.

We cannot do much with the occurrence part of the model at the moment, because of the limitations that will be discussed in the next post. For now we are stuck with ETS(M,N,N) model there, so, let’s move to the demand sizes part of the model.

Implementation. The whole demand

In order to deal with the intermittent data and produce the forecasts for the whole time series, we can use either es(), or ssarima(), or ces(), or gum() – all of them have the parameter intermittent, which is equal to “none” by default. We will use an example of ETS models. In order to simplify things we will use iETS\(_P\) model:

The forecast of this model is a straight line, close to zero due to the decrease in both demand sizes and demand occurrence parts. However, knowing that the demand decreases, we can use trend model in this case. And the flexibility of the approach allows us doing that, so we fit ETS(M,M,N) to demand sizes:

The forecast in this case is even closer to zero and reaches it asymptotically, which means that we foresee that the demand on our product will on average die out.

We can also produce prediction intervals and use model selection for demand sizes. If you know that the data cannot be negative (e.g. selling tomatoes in kilograms), then I would recommend using the pure multiplicative model:

Compare this graph with the one when the pure additive model is selected:

In the latter case the prediction intervals cover the negative part of the plane which does not make sense in our context. Note also that the information criteria are lower for the multiplicative model, which is due to the changing variance in sample for each 25 observations.

The important thing to note is that although multiplicative trend model sounds solid from the theoretical point of view (it cannot produce negative values), it might be dangerous in cases of small samples and positive trends. In this situation the model can produce exploding trajectory, because the forecast corresponds to the exponent. I don’t have any universal solution for this problem for the moment, but I would recommend using ETS(M,Md,N) (damped multiplicative trend) model instead of ETS(M,M,N). The reason why I don’t recommend ETS(M,A,N), is because in cases of negative trend, with the typically low level of intermittent demand, the updated level might become negative, thus making model inapplicable to the data.

Finally, the experiments that I have conducted so far show that iETS\(_P\) is the most robust intermittent demand model in many cases. This is due to the flexibility of the probability update mechanism, proposed by Teunter et al. (2011).

Still, there is a question: how to make the model even more flexible, so that the occurrence part would be as versatile as the demand sizes part of the iETS model? We will have an answer to this question in the next post…

Комментарии к ««smooth» package for R. Intermittent state-space model. Part I. Introducing the model» (6 шт.):

  1. Interesting post! Can you elaborate on how one could tackle the problem of seasonal intermittency/absence, as with the watermelons only being sold for part of the year? I’m currently facing this issue with some products being sold only for 4-5 months per year,, where they have continuous demand, and then no demand at all for the remaining months.
    I have thought about using a different frequency for the time series and removing the periods with no sales, but sadly the starting and ending period are not always in the same week each year.

    • Hi Casper,

      I don’t have a specific model for that yet, and I have not covered some specifics of the functions of smooth in order to give the detailed answer. But if you know when the sales start and when they end (so you know specific values for o_t), then you can provide those values as a vector of dummies in the intermittent parameter. The other option would be to model time varying probability based on ETS(M,N,M) model for demand occurrence (meaning that there are periods, when probability is very high and the others, when it’s very low), but this relies on logistic model that has not been covered yet, and has its own limitations. I will discuss it in the next post.

      Cheers,
      Ivan

      • Hi Ivan,

        Thanks for your answer. I do happen to know when the sales start and end, so I think the first approach would make most sense.

        What do you mean exactly when you say to provide the dummy vector to the intermittent parameter? I tried fitting a model like this:
        es(data, “MNN”, intermittent = dummy_vector, h = h), and got an error (i also tried fitting other types than MNN).

          • I restarted my R session and now it works – I’m not sure what happened.

            The results are, however, not as great as I had hoped. The main issue is that the data is seasonal, but I can only estimate a non-seasonal model due to too few non-zero values. I guess this is due to the fact that the underlying model tries to estimate seasonality for all of the year, and I only have data for around 14 weeks during the winter?

            Do you have any ideas for how to proceed? I haven’t come across anything solving this problem yet, although I feel it should be quite a common problem in retailing.

            • This happens sometimes, I’m not sure why…

              The problem with seasonal models on intermittent data is exactly what you say – very low number of observations. So the default univariate models will struggle with fitting anything meaningful with seasonality. John Boylan, Stephan Kolassa and I are currently working on one of the potential solutions for this problem (using vector models instead of univariate ones), but this is an ongoing research, in an alpha stage…

              One of the potential solutions for now is to use dummy variables for seasons instead (for the cases, where intermittent!=0) – this might work slightly better. You can use xreg parameter in es() for that. Please, let me know if this works.

Leave a Reply

Your email address will not be published. Required fields are marked *