# STLAlgorithmExtensions/StatisticsAlgorithms

This could include mean, median, and mode, for sequences of numeric values.
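A minimal sketch of median and mode for a `std::vector` of numbers, assuming C++11. The names `median` and `mode` are hypothetical, not an existing Boost API. `median` takes its argument by value because `std::nth_element` reorders the elements. Breaking ties in `mode` toward the smallest value is an arbitrary choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical median: copies the data because std::nth_element reorders it.
// For an even count, returns the mean of the two middle values.
// Precondition: values is not empty.
template<typename T>
double median(std::vector<T> values)
{
    const std::size_t n = values.size();
    const std::size_t mid = n / 2;
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (n % 2 != 0)
        return upper;
    // After nth_element every element before mid is <= values[mid],
    // so the lower middle value is the largest of them.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return (lower + upper) / 2.0;
}

// Hypothetical mode: the most frequent value. Ties go to the smallest
// such value because std::map iterates in ascending key order.
// Precondition: values is not empty.
template<typename T>
T mode(const std::vector<T>& values)
{
    std::map<T, std::size_t> counts;
    for (const T& v : values)
        ++counts[v];
    auto best = counts.cbegin();
    for (auto it = counts.cbegin(); it != counts.cend(); ++it)
        if (it->second > best->second)
            best = it;
    return best->first;
}
```

Using `std::nth_element` keeps the median at average linear time instead of sorting the whole copy.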

Here is some simple code I had lying around for mean, variance, and standard_deviation. Hope this helps. --People/Jeff Garland

I changed the code to parameterize the number type. I also introduced the zero parameter to duck the problem of coming up with the appropriate zero object for an arbitrary type T. -People/JeremySiek

Jeremy -- I like the additional parameter to genericize the number type, but I think the zero parameter is a bad idea. It is ugly to make the user know that he/she has to pass zero to these functions, and if something else is passed the answer will be incorrect. To me, it would be much more reasonable to assume that anything that is a valid floating point number can be constructed from a plain 0. What do you think? -- Jeff

That is a good point. However, I'd still like to leave the door open for people who aren't using floating point numbers. I think anything that satisfies the mathematical requirements of a field should work. How about we go with a default argument of 0? --People/JeremySiek

I believe to make this generic to both scalar and complex, you must use norm to compute variance. This means you need to define norm for scalars. (Neal Becker)
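A sketch of Neal's point, assuming the samples are `double` or `std::complex<double>`: the variance is the mean of |x - mean|^2, so each deviation goes through a squared-norm function instead of being squared directly. The helper `squared_norm` and the function `variance_by_norm` are hypothetical names. For complex values the helper forwards to `std::norm`, which returns the squared magnitude. Squaring complex deviations directly gives the wrong answer: for the samples i and -i it yields a "variance" of -1, whereas the norm-based version gives 1.

```cpp
#include <complex>
#include <vector>

// Hypothetical helper: squared magnitude |x|^2 of a real scalar.
inline double squared_norm(double x) { return x * x; }

// For complex values std::norm already returns |z|^2 = re^2 + im^2.
inline double squared_norm(const std::complex<double>& z) { return std::norm(z); }

// Population variance: the mean of |x - mean|^2, which is real even for
// complex samples. Sample is assumed to be double or std::complex<double>.
// Precondition: values is not empty.
template<typename Sample>
double variance_by_norm(const std::vector<Sample>& values)
{
    const double n = static_cast<double>(values.size());
    Sample sum = Sample();
    for (const Sample& v : values)
        sum += v;
    const Sample mean = sum / n;

    double acc = 0.0;
    for (const Sample& v : values)
        acc += squared_norm(v - mean);
    return acc / n;
}
```

Overload resolution picks the `double` helper for real samples (exact match) and the complex helper for complex samples, so the same template serves both.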

Looks great! -- Jeff

Isn't value_type() a better default? int(), float(), and complex<float>() are all proper zero values, and other types (e.g. mathematical vectors) have a useful mean and a useful default constructor but might not have a conversion from 0. -- People/MichielSalters
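A sketch of the value_type() default, assuming C++11. `mean` here is a stand-in for the page's version, and `Vec2` is a hypothetical two-dimensional vector that default-constructs to (0, 0) but has no single-argument constructor, so a default of 0 would not compile for it. Dividing by `static_cast<double>(count)` rather than by the integer count also lets `std::complex<double>` work, since it can be divided by a `double` but not by an unsigned integer.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical 2-D vector: default-constructs to (0, 0), but has no
// single-argument constructor, so "Vec2 zero = 0" would not compile.
struct Vec2 {
    double x, y;
    Vec2() : x(0.0), y(0.0) {}
    Vec2(double x_, double y_) : x(x_), y(y_) {}
    Vec2& operator+=(const Vec2& o) { x += o.x; y += o.y; return *this; }
};

inline Vec2 operator/(Vec2 v, double d) { v.x /= d; v.y /= d; return v; }

// The starting sum defaults to a value-initialized value_type:
// int(), double() and std::complex<double>() are all zero.
// Precondition: [first, last) is not empty.
template<class InputIterator>
typename std::iterator_traits<InputIterator>::value_type
mean(InputIterator first, InputIterator last,
     typename std::iterator_traits<InputIterator>::value_type zero =
         typename std::iterator_traits<InputIterator>::value_type())
{
    typename std::iterator_traits<InputIterator>::value_type sum = zero;
    std::size_t count = 0;
    for (; first != last; ++first, ++count)
        sum += *first;
    return sum / static_cast<double>(count);
}
```

Callers can still pass an explicit `zero` when value initialization is not the right starting point.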

I've modified the code below to add a few more statistics functions. For most of them to compile, AccumulatorType needs to be usable with sqrt(). I noticed a few other things:

• No checks are made for divide by zero. An implementation I made would return 0 in these cases instead of throwing an exception.
• I've seen variance computed as (Sum2 - Sum * Sum / Count) / (Count - 1), i.e., dividing by Count - 1 instead of just Count.
--People/ScottKirkwood
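A small sketch of both points, computed from a running count, sum, and sum of squares. The function names are hypothetical. When there are too few samples, both return 0 instead of dividing by zero; `sample_variance` divides by Count - 1 and `population_variance` by Count.

```cpp
#include <cstddef>

// Population variance from running sums: divides by count.
// Returns 0 for an empty set instead of dividing by zero.
inline double population_variance(std::size_t count, double sum, double sum2)
{
    if (count == 0)
        return 0.0;
    const double n = static_cast<double>(count);
    return (sum2 - sum * sum / n) / n;
}

// Sample (unbiased) variance: (Sum2 - Sum * Sum / Count) / (Count - 1).
// Returns 0 when there are fewer than two samples.
inline double sample_variance(std::size_t count, double sum, double sum2)
{
    if (count < 2)
        return 0.0;
    const double n = static_cast<double>(count);
    return (sum2 - sum * sum / n) / (n - 1.0);
}
```

Both use the one-pass formula on the sums, which can lose precision when the mean is large relative to the spread.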

Some suggestions if someone is going to propose some statistics algorithms for a boost header:

• Often a user will want both the mean and the variance at once, and walking a container twice is somewhat inefficient. Perhaps a std::pair return would be useful here.
• More useful to me than supplying the automatic call to sqrt to transform variance to std deviation would be the error on the mean, which is sqrt(variance/N).
• Shouldn't these algorithms use std::accumulate?
• Probably equally useful would be templates for means and variances for weighted sums.
• What about simple linear regressions?
• We may be getting into the realm where a statistics object (holding N, S(w), S(w^2)) becomes important. These sums are quite generally useful. Actually, accumulating statistics up to the nth order might be a useful generic concept.
• Eventually many users using these algorithms will be interested in a fully-featured histogramming package (which would be great in boost, but shouldn't (IMHO) be a candidate for standardization). I would love to have the functionality of, say, classes like TH1F and TF1, from the ROOT package, in a mode that is closer to the spirit of the STL (and a less intrusive framework). Algorithms used here should be sure not to close off any doors for such a later package, and there are plenty of examples around of similar packages, so common functionality should be easy to find.
--GeorgeHeintzelman
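A sketch combining the first three suggestions, assuming C++11: a hypothetical `moments` struct holds N, S(x), and S(x^2); `std::accumulate` fills it in one pass; the result comes back as a `std::pair` of (mean, variance); and the error on the mean is sqrt(variance/N). All names are hypothetical, and the variance is the population form (divided by N).

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical running-sums object: N, S(x) and S(x^2).
struct moments {
    std::size_t n;
    double s1;  // sum of x
    double s2;  // sum of x^2
};

// Fold one sample into the sums; used as the binary op for std::accumulate.
inline moments add_sample(moments m, double x)
{
    ++m.n;
    m.s1 += x;
    m.s2 += x * x;
    return m;
}

// One pass over the data yields both the mean and the population variance.
// Precondition: the range is not empty.
template<class InputIterator>
std::pair<double, double> mean_and_variance(InputIterator first, InputIterator last)
{
    const moments m = std::accumulate(first, last, moments{0, 0.0, 0.0}, add_sample);
    const double n = static_cast<double>(m.n);
    const double mean = m.s1 / n;
    return std::make_pair(mean, m.s2 / n - mean * mean);
}

// Error on the mean: sqrt(variance / N).
inline double error_on_mean(double variance, std::size_t n)
{
    return std::sqrt(variance / static_cast<double>(n));
}
```

Keeping the sums in a separate object means further statistics (weighted sums, higher orders) could be layered on without another pass over the data.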

I agree with George; he beat me to the edit, and I went down the same route with:

```
#include <cmath>

template<typename AccumulatorType>
class order_2_accumulator
{
public:
    typedef AccumulatorType value_type;

    order_2_accumulator():
        Count(),
        Sum(),
        Sum2()
    {}
    order_2_accumulator(unsigned int count_,
                        const value_type& sum_,
                        const value_type& sum2_):
        Count(count_),
        Sum(sum_),
        Sum2(sum2_)
    {}

    unsigned int count() const     { return Count; }
    value_type sum() const         { return Sum; }
    value_type sum_squares() const { return Sum2; }
    value_type mean() const        { return Sum/Count; }

    // Population variance (divides by Count). Using mean()*mean() avoids
    // the unsigned overflow of Count*Count for large sample counts.
    value_type variance() const
    {
        const value_type m = mean();
        return Sum2/Count - m*m;
    }

    // Scott Kirkwood: added these - would require sqrt() though.
    // Named std_dev rather than std to avoid confusion with namespace std.
    value_type std_dev() const           { return std::sqrt(variance()); }
    value_type std_error_of_mean() const { return std_dev() / std::sqrt(static_cast<double>(Count)); }
    value_type root_mean_square() const  { return std::sqrt(Sum2 / Count); }
    value_type coefficient_of_variation() const { return 100 * std_dev() / mean(); }

    // Add one sample. Converting to value_type first keeps value_*value_
    // from overflowing when T is a narrow integer type.
    template<typename T>
    order_2_accumulator& bump(const T& value_)
    {
        const value_type v(value_);
        ++Count;
        Sum  += v;
        Sum2 += v*v;
        return *this;
    }
private:
    unsigned int Count;
    value_type   Sum;
    value_type   Sum2;
};

// Operator used by std::accumulate (found by argument-dependent lookup).
// Taking order_2_accumulator rather than any type keeps this operator+
// from matching unrelated types.
template<typename AccumulatorType, typename T>
order_2_accumulator<AccumulatorType>
operator+(order_2_accumulator<AccumulatorType> accum, const T& value_)
{
    return accum.bump(value_);
}
```
It is then used like this:
```
#include <iostream>
#include <numeric>

int main()
{
    int seq[] = {1, 2, 3, 4};
    order_2_accumulator<double> sum = std::accumulate(seq, seq+4, order_2_accumulator<double>());
    std::cout << sum.count() << ' ' << sum.sum() << ' ' << sum.sum_squares() << std::endl;
    std::cout << sum.mean() << ' ' << sum.variance() << std::endl;

    // Keep accumulating into the existing result, this time from doubles.
    double seq2[] = {1., 2., 3., 4.};
    sum = std::accumulate(seq2, seq2+4, sum);
    std::cout << sum.count() << ' ' << sum.sum() << ' ' << sum.sum_squares() << std::endl;
    std::cout << sum.mean() << ' ' << sum.variance() << std::endl;
}
```
--Ian Mitchell

```
//stats.hpp
#include <cmath>
#include <iterator>

// Arithmetic mean of [begin, end). Precondition: the range is not empty.
template<class InputIterator>
inline
typename std::iterator_traits<InputIterator>::value_type
mean(InputIterator begin,
     InputIterator end,
     typename std::iterator_traits<InputIterator>::value_type zero = 0)
{
    unsigned int count = 0;
    typename std::iterator_traits<InputIterator>::value_type sum = zero;
    // != (not <) so that any input iterator works, not just random access.
    for(InputIterator i=begin; i != end; ++i) {
        sum += *i;
        ++count;
    }
    return sum/count;
}

// Sample variance (divides by count-1). Walks the range twice, so it needs
// forward iterators. Precondition: at least two elements.
template<class ForwardIterator>
inline
typename std::iterator_traits<ForwardIterator>::value_type
variance(ForwardIterator begin,
         ForwardIterator end,
         typename std::iterator_traits<ForwardIterator>::value_type zero = 0)
{
    typedef typename std::iterator_traits<ForwardIterator>::value_type value_type;
    const value_type mn = mean(begin, end, zero);
    unsigned int count = 0;
    value_type sum = zero;
    for(ForwardIterator i=begin; i != end; ++i) {
        const value_type d = *i - mn;
        sum += d * d;
        ++count;
    }
    return sum/(count-1);
}

template<class ForwardIterator>
inline
typename std::iterator_traits<ForwardIterator>::value_type
standard_deviation(ForwardIterator begin,
                   ForwardIterator end,
                   typename std::iterator_traits<ForwardIterator>::value_type zero = 0)
{
    return std::sqrt(variance(begin, end, zero));
}
```
