Using data sets for mean and variance
-------------------------------------

Mathematical background
^^^^^^^^^^^^^^^^^^^^^^^

Formulae for a mean value and its associated variance presented here are
derived by Bayes statistics. Their derivation was described by Weise et
al. (2013) in their section 5.8 and their appendix C. Two cases a) and
b) are considered (see also Table 2 in :ref:`gross quantity: variance interpolation for a mean`):

**Unknown random influences:**

a) *Mean type 1*. For any input quantity :math:`x`\ *,* **which does not
   represent a number of counts**, the variance of *m* individual values
   is derived from the experimental variation:

.. math:: u^{2}\left( \overline{x} \right) = \frac{1}{m}\frac{(m - 1)}{(m - 3)}s_{x}^{2}
   :label: data_sets_eq1

Hierein are:

.. math:: \overline{x} = \sum_{i = 1}^{m}x_{i}
.. math:: s_{x}^{2} = \frac{1}{m - 1}\sum_{i = 1}^{m}{(x_{i} - \overline{x})^{2}}
   :label: data_sets_eq2

b) *Mean type 2*. An input quantity *n* represents **a number of
   counts** and is influenced by an additional variation, e.g., due to
   repeated sampling and/or chemical analysis, which enlarges the
   Poisson-derived variance. A normal distribution with parameters
   :math:`\mu` and :math:`\sigma^{2}` is assumed for this influence. The
   variance of the mean is then given by:

   .. math:: u^{2}\left( \overline{n} \right) = \frac{1}{m}\left( \overline{n} + \frac{(m - 1)}{(m - 3)}{(\overline{n} + s}_{n}^{2}) \right) = \frac{1}{m}(\overline{n} + E(S^{2},\mathbf{n}))
      :label: data_sets_eq3

   :math:`\overline{n}\ ` and :math:`s_{n}^{2}` are calculated analogue
   to :math:`\overline{x}\ ` und :math:`s_{x}^{2}`. The variance
   component

   .. math:: E\left( S^{2},\mathbf{n} \right) = \frac{(m - 1)}{(m - 3)}{(\overline{n} + s}_{n}^{2})
      :label: data_sets_eq4

   is considered as the best estimate of the parameter
   :math:`\sigma^{2}` of the involved normal distribution. The first
   term in the bracket of Eq. :eq:`data_sets_eq2` , :math:`\overline{n}` , represents the
   Poisson-related part of the variance.

Applying these formulae leads to surprising result that a variance can
be calculated only if there are more than three individual values.

*Mean type 3*. With version 2.3.01 the **classical** formula for the
standard uncertainty of the mean can be applied

.. math:: u\left( \overline{x} \right) = \frac{s_{x}}{\sqrt{m}}
   :label: data_sets_eq5

if the type of mean “classical“ is selected.

**Known random influences:**

If the fraction of :eq:`data_sets_eq4` within :eq:`data_sets_eq3` is small, a parameter
:math:`\vartheta` can be defined as:

:math:`\vartheta^{2} = E\left( S^{2},\mathbf{n} \right)/{\overline{n}}^{2}`

by which Eq. Gl. :eq:`data_sets_eq4` turns into:

.. math:: u^{2}\left( \overline{n} \right) = \frac{1}{m}(\overline{n} + \vartheta^{2}{\overline{n}}^{2})
   :label: data_sets_eq6

By solving Eq. :eq:`data_sets_eq4` for :math:`\vartheta^{2}`, an equation is obtained,
by which :math:`\vartheta^{2}` can be determined from the data set of
measurements of a reference sample :math:`r`:

.. math:: \vartheta^{2} = \left( {m_{r}\ u}^{2}\left( {\overline{n}}_{r} \right) - {\overline{n}}_{r} \right)/{\overline{n}}_{r}^{2}
   :label: data_sets_eq7

The parameter value :math:`\vartheta` should be less than about 0.2.

Applying means in UncertRadio
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If in the symbol list under the :ref:`tab “equations”` a symbol type is changed
into „m“, the program assumes that value and uncertainty of this
quantity are to be derived from a data set. The following input dialog
allows the input of the data set, it is invoked by the icon |format-justify-fill|
from the toolbar (it requires first selecting the row of this “m”
variable in the :ref:`tab “values, uncertainties”`):

.. |format-justify-fill| image:: /icons/format-justify-fill.png
   :height: 2ex
   :align: middle
   :class: no-scaled-link

.. figure:: /images/dataset_mean.png
    :align: center
    :alt: Dialog for variable average
    :scale: 85

    Dialog for variable average

The id values for the data sets are already known here. In the dialog
shown, the id ref_data (belonging to the input quantity ref) is selected
for data input. Besides, the type of mean and variance can be selected
from equations :eq:`data_sets_eq1` and :eq:`data_sets_eq3`. For the extreme case that there are not more
than only 3 single values, or the data shall be evaluated in a classical
sense, the variance according to Eq. :eq:`data_sets_eq5` can be chosen as third option.
The latter can also be used for more than 3 single values. In the dialog
shown, the standard deviations sx and s0x correspond to equations :eq:`data_sets_eq1`
und :eq:`data_sets_eq2` in :numref:`mathematical background`.

The combobox indicated in the dialog by the label “sel. data record used
as reference“ allows to select one of the mean datasets, which is
intended to be used as a reference in the case of “\ *known* random
influences”. An example project is ISO-Example-2b_V2_EN.txp. If no
reference data set is selected, the evaluation follows that of the
option “\ *unknown* random influences“. The details for these options
are outlined in :numref:`gross quantity: variance interpolation for a mean`.

Values of mean and uncertainty of such a data set are transferred by the
program to the uncertainty table under the TAB “Values, uncertainties“
by the button “Calculating uncertainties”.

The individual values of this quantity with a name symbol are saved in
the project file (\*.txp) as a single line record identified by the
associated identification (symbol_data).

For **organizing the data input** it is recommended to begin with data
input into the :ref:`tab “values, uncertainties”`. For mean variables
characterized by „m“ as type the „t distribution“ is to be selected as
distribution type which enables a correct statistical treatment of the
mean within the mean dialog. Then, the mean dialog can be opened in
which the desired mean variable is selected; after input of associated
singe values the type of mean is selected which then can be calculated.
After leaving the dialog the calculation of uncertainties needs to be
updated/repeated.

The input of single values in this dialog was modified such, that after
input of a value the next cell is already opened for input. It happens
that the activated cell appears to be moved a bit away from the grid
cell, however, the value entered (finalized with Enter or cursor-down)
is transferred into the original grid cell. The input of values is then
finalized with typing Enter into the activated cell, which must be empty
for this purpose.