Standardized *z*-scores

## Syntax

`Z = zscore(X)`

`Z = zscore(X,flag)`

`Z = zscore(X,flag,'all')`

`Z = zscore(X,flag,dim)`

`Z = zscore(X,flag,vecdim)`

`[Z,mu,sigma] = zscore(___)`

## Description

`Z = zscore(X)` returns the *z*-score for each element of `X` such that columns of `X` are centered to have mean 0 and scaled to have standard deviation 1. `Z` is the same size as `X`.

- If `X` is a vector, then `Z` is a vector of *z*-scores.
- If `X` is a matrix, then `Z` is a matrix of the same size as `X`, and each column of `Z` has mean 0 and standard deviation 1.
- For multidimensional arrays, *z*-scores in `Z` are computed along the first nonsingleton dimension of `X`.

`Z = zscore(X,flag)` scales `X` using the standard deviation indicated by `flag`.

- If `flag` is 0 (default), then `zscore` scales `X` using the sample standard deviation, with *n* - 1 in the denominator of the standard deviation formula. `zscore(X,0)` is the same as `zscore(X)`.
- If `flag` is 1, then `zscore` scales `X` using the population standard deviation, with *n* in the denominator of the standard deviation formula.
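The two `flag` settings differ only by a constant scale factor. As a minimal sketch, assuming a small made-up vector `x` (not taken from the examples on this page), the following code checks that the population-based *z*-scores equal the sample-based ones multiplied by sqrt(*n*/(*n* - 1)).

```matlab
% Illustrative data (hypothetical, chosen only for this sketch)
x = [2; 4; 4; 4; 5; 5; 7; 9];
n = numel(x);

Zs = zscore(x,0);   % sample standard deviation (n-1 in the denominator)
Zp = zscore(x,1);   % population standard deviation (n in the denominator)

% Population-based z-scores are larger in magnitude by sqrt(n/(n-1))
ratioOK = max(abs(Zp - Zs*sqrt(n/(n-1)))) < 1e-12
```

For large *n* the factor approaches 1, so the choice of `flag` matters mostly for small samples.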

`Z = zscore(X,flag,'all')` standardizes `X` by using the mean and standard deviation of all the values in `X`.

`Z = zscore(X,flag,dim)` standardizes `X` along the operating dimension `dim`. For example, for a matrix `X`, if `dim` = 1, then `zscore` uses the means and standard deviations along the columns of `X`, and if `dim` = 2, then `zscore` uses the means and standard deviations along the rows of `X`.
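As a rough illustration of the `dim` argument, the following sketch standardizes a small made-up matrix along its columns and then along its rows. The matrix `X` and the variable names are chosen only for this example.

```matlab
% Hypothetical 3-by-3 data, used only for illustration
X = [1 2 3; 4 5 6; 7 8 10];

Zc = zscore(X,0,1);   % standardize each column (operate along dimension 1)
Zr = zscore(X,0,2);   % standardize each row (operate along dimension 2)

% Column means of Zc and row means of Zr are zero up to round-off
colMeans = mean(Zc,1)
rowMeans = mean(Zr,2)
```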

`Z = zscore(X,flag,vecdim)` standardizes `X` over the dimensions specified by the vector `vecdim`. For example, if `X` is a matrix, then `zscore(X,0,[1 2])` is equivalent to `zscore(X,0,'all')` because every element of a matrix is contained in the array slice defined by dimensions 1 and 2.
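The equivalence stated above can be checked numerically. This is only a sketch: the matrix comes from `magic(4)` as an arbitrary choice, and the comparison uses a small tolerance rather than exact equality.

```matlab
X = magic(4);                  % arbitrary 4-by-4 numeric matrix

Zall = zscore(X,0,'all');      % standardize over every element
Zvec = zscore(X,0,[1 2]);      % standardize over dimensions 1 and 2

% For a matrix, both calls should agree up to round-off
sameResult = max(abs(Zall(:) - Zvec(:))) < 1e-12
```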

`[Z,mu,sigma] = zscore(___)` also returns the means and standard deviations used for centering and scaling, `mu` and `sigma`, respectively. You can use any of the input arguments in the previous syntaxes.
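One common use of the extra outputs is to apply the same centering and scaling to other data, or to undo the standardization. The following is a minimal sketch under stated assumptions: `Xtrain` and `Xnew` are hypothetical matrices, and the arithmetic relies on implicit expansion of `mu` and `sigma` across rows.

```matlab
Xtrain = [1 10; 2 20; 3 30; 4 40];   % hypothetical reference data
Xnew   = [2.5 25; 5 50];             % hypothetical new observations

[Ztrain,mu,sigma] = zscore(Xtrain);  % mu and sigma are 1-by-2 row vectors

% Standardize the new rows with the reference statistics
Znew = (Xnew - mu) ./ sigma;

% Invert the transformation to recover the reference data
Xback = Ztrain .* sigma + mu;
recovered = max(abs(Xback(:) - Xtrain(:))) < 1e-12
```

This manual form is only valid when no column of the reference data is constant, because `sigma` would then contain a zero.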

## Examples

### Z-Scores of Two Data Vectors

Compute and plot the $$z$$-scores of two data vectors, and then compare the results.

Load the sample data.

`load lawdata`

Two variables load into the workspace: `gpa` and `lsat`.

Plot both variables on the same axes.

`plot([gpa,lsat])`

`legend('gpa','lsat','Location','East')`

It is difficult to compare these two measures because they are on very different scales.

Plot the $$z$$-scores of `gpa` and `lsat` on the same axes.

`Zgpa = zscore(gpa);`

`Zlsat = zscore(lsat);`

`plot([Zgpa, Zlsat])`

`legend('gpa z-scores','lsat z-scores','Location','Northeast')`

Now you can see the relative performance of individuals with respect to both their `gpa` and `lsat` results. For example, the third individual’s `gpa` and `lsat` results are both about one standard deviation below the sample mean. The eleventh individual has a `gpa` around the sample mean but an `lsat` score almost 1.25 standard deviations above the sample average.

Check the mean and standard deviation of the $$z$$-scores you created.

`mean([Zgpa,Zlsat])`

`ans = `*1×2* $$10^{-14}\times$$ -0.1088 0.0357

`std([Zgpa,Zlsat])`

`ans = `*1×2* 1 1

By definition, $$z$$-scores of `gpa` and `lsat` have mean 0 and standard deviation 1.

### Z-Scores for a Population vs. Sample

Load the sample data.

`load lawdata`

Two variables load into the workspace: `gpa` and `lsat`.

Compute the $$z$$-scores of `gpa` using the population formula for standard deviation, and compare them with the $$z$$-scores computed using the sample formula.

`Z1 = zscore(gpa,1); % population formula`

`Z0 = zscore(gpa,0); % sample formula`

`disp([Z1 Z0])`

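| `Z1` | `Z0` |
|---|---|
| 1.2554 | 1.2128 |
| 0.8728 | 0.8432 |
| -1.2100 | -1.1690 |
| -0.2749 | -0.2656 |
| 1.4679 | 1.4181 |
| -0.1049 | -0.1013 |
| -0.4024 | -0.3888 |
| 1.4254 | 1.3771 |
| 1.1279 | 1.0896 |
| 0.1502 | 0.1451 |
| 0.1077 | 0.1040 |
| -1.5076 | -1.4565 |
| -1.4226 | -1.3743 |
| -0.9125 | -0.8815 |
| -0.5724 | -0.5530 |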

For a sample from a population, the population standard deviation formula with $$n$$ in the denominator corresponds to the maximum likelihood estimate of the population standard deviation, and is biased. The sample standard deviation formula, on the other hand, is the square root of the unbiased estimator of the population variance.

### Z-Scores of a Data Matrix

Compute $$z$$-scores using the mean and standard deviation computed along the columns or rows of a data matrix.

Load the sample data.

`load flu`

The dataset array `flu` is loaded in the workspace. `flu` has 52 observations on 11 variables. The first variable contains dates (in weeks). The other variables contain the flu estimates for different regions in the U.S.

Convert the dataset array to a data matrix.

`flu2 = double(flu(:,2:end));`

The new data matrix, `flu2`, is a 52-by-10 double data matrix. The rows correspond to the weeks and the columns correspond to the U.S. regions in the dataset array `flu`.

Standardize the flu estimate for each region (the *columns* of `flu2`).

`Z1 = zscore(flu2,[],1);`

You can see the $$z$$-scores in the variable editor by double-clicking the matrix `Z1` created in the workspace.

Standardize the flu estimate for each week (the *rows* of `flu2`).

`Z2 = zscore(flu2,[],2);`

### Z-Scores of Multidimensional Array

Find the z-scores of a multidimensional array by standardizing the data along different dimensions. Compare the results when using the `'all'`, `dim`, and `vecdim` input arguments.

Create a 3-by-4-by-2 array.

`X = reshape(1:24,[3 4 2])`

`X(:,:,1) =`

`1 4 7 10`

`2 5 8 11`

`3 6 9 12`

`X(:,:,2) =`

`13 16 19 22`

`14 17 20 23`

`15 18 21 24`

Standardize `X` by using the mean and standard deviation of all the values in `X`.

`Zall = zscore(X,0,'all')`

`Zall(:,:,1) =`

`-1.6263 -1.2021 -0.7778 -0.3536`

`-1.4849 -1.0607 -0.6364 -0.2121`

`-1.3435 -0.9192 -0.4950 -0.0707`

`Zall(:,:,2) =`

`0.0707 0.4950 0.9192 1.3435`

`0.2121 0.6364 1.0607 1.4849`

`0.3536 0.7778 1.2021 1.6263`

The resulting multidimensional array of z-scores has mean 0 and standard deviation 1. For example, compute the mean and standard deviation of `Zall`.

`mZall = mean(Zall(:,:,:),'all')`

mZall = -9.2519e-18

`sZall = std(Zall(:,:,:),0,'all')`

sZall = 1.0000

Now standardize `X` along the second dimension.

`Zdim = zscore(X,0,2)`

`Zdim(:,:,1) =`

`-1.1619 -0.3873 0.3873 1.1619`

`-1.1619 -0.3873 0.3873 1.1619`

`-1.1619 -0.3873 0.3873 1.1619`

`Zdim(:,:,2) =`

`-1.1619 -0.3873 0.3873 1.1619`

`-1.1619 -0.3873 0.3873 1.1619`

`-1.1619 -0.3873 0.3873 1.1619`

The elements in each row of each page of `Zdim` have mean 0 and standard deviation 1. For example, compute the mean and standard deviation of the first row of the second page of `Zdim`.

`mZdim = mean(Zdim(1,:,2),'all')`

mZdim = 0

`sZdim = std(Zdim(1,:,2),0,'all')`

sZdim = 1

Finally, standardize `X` based on the second and third dimensions.

`Zvecdim = zscore(X,0,[2 3])`

`Zvecdim(:,:,1) =`

`-1.4289 -1.0206 -0.6124 -0.2041`

`-1.4289 -1.0206 -0.6124 -0.2041`

`-1.4289 -1.0206 -0.6124 -0.2041`

`Zvecdim(:,:,2) =`

`0.2041 0.6124 1.0206 1.4289`

`0.2041 0.6124 1.0206 1.4289`

`0.2041 0.6124 1.0206 1.4289`

The elements in each `Zvecdim(i,:,:)` slice have mean 0 and standard deviation 1. For example, compute the mean and standard deviation of the elements in `Zvecdim(1,:,:)`.

`mZvecdim = mean(Zvecdim(1,:,:),'all')`

mZvecdim = 2.7756e-17

`sZvecdim = std(Zvecdim(1,:,:),0,'all')`

sZvecdim = 1

### Z-Scores, Mean, and Standard Deviation

Return the mean and standard deviation used to compute the $$z$$-scores.

Load the sample data.

`load lawdata`

Two variables load into the workspace: `gpa` and `lsat`.

Return the $$z$$-scores, mean, and standard deviation of `gpa`.

`[Z,gpamean,gpastdev] = zscore(gpa)`

`Z = `*15×1* 1.2128 0.8432 -1.1690 -0.2656 1.4181 -0.1013 -0.3888 1.3771 1.0896 0.1451 ⋮

gpamean = 3.0947

gpastdev = 0.2435

## Input Arguments

`X`

— Input data

vector | matrix | multidimensional array

Input data, specified as a vector, matrix, or multidimensional array.

**Data Types:** `double` | `single`

`flag`

— Indicator for the standard deviation

0 (default) | 1

Indicator for the standard deviation used to compute the *z*-scores, specified as 0 or 1.

- If `flag` is 0 (default), then `zscore` scales `X` using the sample standard deviation. `zscore(X,0)` is the same as `zscore(X)`.
- If `flag` is 1, then `zscore` scales `X` using the population standard deviation.

`dim`

— Dimension

positive integer scalar

Dimension along which to calculate the *z*-scores of `X`, specified as a positive integer scalar. If you do not specify a value, then the default value is the first array dimension whose size does not equal 1.

For example, for a matrix `X`, if `dim` = 1, then `zscore` uses the means and standard deviations along the columns of `X`, and if `dim` = 2, then `zscore` uses the means and standard deviations along the rows of `X`.

`vecdim`

— Vector of dimensions

positive integer vector

Vector of dimensions along which to calculate the *z*-scores of `X`, specified as a positive integer vector. Each element of `vecdim` represents a dimension of the input array `X`. The output `Z` has the same dimensions as `X`, but the mean `mu` and standard deviation `sigma` each have length 1 in the operating dimensions. The other dimension lengths are the same for `X`, `mu`, and `sigma`.

For example, if `X` is a 2-by-3-by-3 array, then `zscore(X,0,[1 2])` uses the mean and standard deviation of each page of `X` to standardize the values of `X`.

**Data Types:** `single` | `double`
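The size relationships described for `vecdim` can be inspected directly. The sketch below uses random data purely as a placeholder; only the array sizes matter here.

```matlab
X = rand(2,3,3);                      % arbitrary 2-by-3-by-3 data

[Z,mu,sigma] = zscore(X,0,[1 2]);     % operate over dimensions 1 and 2

sizeZ     = size(Z)       % same size as X
sizeMu    = size(mu)      % length 1 in dimensions 1 and 2
sizeSigma = size(sigma)   % length 1 in dimensions 1 and 2
```

Each of the three pages of `X` is standardized with its own scalar mean and standard deviation.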

## Output Arguments

`Z`

— *z-*scores

vector | matrix | multidimensional array

*z*-scores, returned as a vector, matrix, or multidimensional array. `Z` has the same dimensions as `X`.

The values of `Z` depend on whether you specify `'all'`, `dim`, or `vecdim`. If you do not specify any of these input arguments, then the following conditions apply:

- If `X` is a vector, then `Z` is a vector of *z*-scores with mean 0 and variance 1.
- If `X` is an array, then `zscore` standardizes along the first nonsingleton dimension of `X`.

For an example that demonstrates the differences in `Z` when you use `'all'`, `dim`, and `vecdim`, see Z-Scores of Multidimensional Array.

`mu`

— Mean

scalar | vector | matrix | multidimensional array

Mean of `X` used to compute the *z*-scores, returned as a scalar, vector, matrix, or multidimensional array. `mu` has length 1 in the specified operating dimensions. The other dimension lengths are the same for `X` and `mu`.

For example, if `X` is a 2-by-3-by-3 array and `vecdim` is `[1 2]`, then `mu` is a 1-by-1-by-3 array of means. Each value in `mu` corresponds to the mean of a page in `X`.

`sigma`

— Standard deviation

scalar | vector | matrix | multidimensional array

Standard deviation of `X` used to compute the *z*-scores, returned as a scalar, vector, matrix, or multidimensional array. `sigma` has length 1 in the specified operating dimensions. The other dimension lengths are the same for `X` and `sigma`.

For example, if `X` is a 2-by-3-by-3 array and `vecdim` is `[1 2]`, then `sigma` is a 1-by-1-by-3 array of standard deviations. Each value in `sigma` corresponds to the standard deviation of a page in `X`.

## More About

### Z-Score

For a random variable *X* with mean μ and standard deviation σ, the *z*-score of a value *x* is

$$z=\frac{\left(x-\mu \right)}{\sigma}.$$

For sample data with mean $$\overline{X}$$ and standard deviation *S*, the *z*-score of a data point *x* is

$$z=\frac{\left(x-\overline{X}\right)}{S}.$$

*z*-scores measure the distance of a data point from the mean in terms of the standard deviation. This is also called *standardization* of data. The standardized data set has mean 0 and standard deviation 1, and retains the shape properties of the original data set (same skewness and kurtosis).

You can use *z*-scores to put data on the same scale before further analysis. This lets you compare two or more data sets with different units.
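The sample formula above can be written out by hand and compared with `zscore`. This is a minimal sketch assuming a short made-up vector `x`; it is meant to illustrate the definition, not to describe how `zscore` is implemented internally.

```matlab
x = [3; 5; 7; 9; 11];                 % hypothetical sample

zManual  = (x - mean(x)) ./ std(x);   % (x - Xbar) / S, written out directly
zBuiltin = zscore(x);                 % default: sample standard deviation

agree = max(abs(zManual - zBuiltin)) < 1e-12
```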

### Multidimensional Array

A *multidimensional array* is an array with more than two dimensions. For example, if `X` is a 1-by-3-by-4 array, then `X` is a three-dimensional array.

### First Nonsingleton Dimension

A *first nonsingleton dimension* is the first dimension of an array whose size is not equal to 1. For example, if `X` is a 1-by-2-by-3-by-4 array, then the second dimension is the first nonsingleton dimension of `X`.
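A row vector is a simple case where the first nonsingleton dimension is not dimension 1. The sketch below, with an arbitrary row vector `r`, contrasts the default behavior with an explicit `dim` of 1, where each column is a one-element (constant) sample.

```matlab
r = [1 2 3];            % 1-by-3 row vector: dimension 2 is the first nonsingleton dimension

Zdefault = zscore(r)    % standardizes along dimension 2
Zcols    = zscore(r,0,1) % standardizes each one-element column separately
```

With the default, the three values are standardized together. With `dim` set to 1, every column is a constant sample, so the result follows the constant-sample rule described under Algorithms.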

### Sample Standard Deviation

The *sample standard deviation* *S* is given by

$$S=\sqrt{\frac{{\displaystyle {\sum}_{i=1}^{n}{\left({x}_{i}-\overline{X}\right)}^{2}}}{n-1}}.$$

*S* is the square root of an unbiased estimator of the variance of the population from which `X` is drawn, as long as `X` consists of independent, identically distributed samples. $$\overline{X}$$ is the sample mean.

Notice that the denominator in this variance formula is *n* – 1.

### Population Standard Deviation

If the data is the entire population of values, then you can use the *population standard deviation*,

$$\sigma =\sqrt{\frac{{\displaystyle {\sum}_{i=1}^{n}{\left({x}_{i}-\mu \right)}^{2}}}{n}}.$$

If `X` is a random sample from a population, then the mean *μ* is estimated by the sample mean, and *σ* is the biased maximum likelihood estimator of the population standard deviation.

Notice that the denominator in this variance formula is *n*.
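Both standard deviation formulas can be evaluated term by term and compared with `std`. The following is a sketch on a made-up vector `x`; the variable names are illustrative only.

```matlab
x = [3; 5; 7; 9; 11];            % hypothetical sample
n = numel(x);
dev2 = (x - mean(x)).^2;         % squared deviations from the sample mean

S     = sqrt(sum(dev2) / (n - 1));   % sample formula (n-1 in the denominator)
sigma = sqrt(sum(dev2) / n);         % population formula (n in the denominator)

matchesSample     = abs(S - std(x,0)) < 1e-12
matchesPopulation = abs(sigma - std(x,1)) < 1e-12
```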

## Algorithms

`zscore` returns `NaN`s for any sample containing `NaN`s.

`zscore` returns `0`s for any sample that is constant (all values are the same). For example, if `X` is a vector of the same numeric value, then `Z` is a vector of `0`s.
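The two special cases above can be reproduced with short vectors. The last lines show one possible workaround for missing values, which is an assumption of this sketch and not a `zscore` option: compute the mean and standard deviation with the `'omitnan'` flag of `mean` and `std`, and standardize manually.

```matlab
zNaN   = zscore([1 NaN 3])    % every element is NaN because the sample contains a NaN
zConst = zscore([5 5 5])      % every element is 0 because the sample is constant

% Manual standardization that ignores NaNs when estimating the statistics
x = [1 NaN 3];
zOmit = (x - mean(x,'omitnan')) ./ std(x,0,'omitnan')
```

In `zOmit`, the missing entry stays `NaN` while the remaining entries are standardized using only the observed values.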

## Extended Capabilities

### Tall Arrays

Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. For more information, see Tall Arrays.

### C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- The `'all'` and `vecdim` input arguments are not supported.
- The `dim` input argument must be a compile-time constant.
- If you do not specify the `dim` input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see Automatic dimension restriction (MATLAB Coder).

For more information on code generation, see Introduction to Code Generation and General Code Generation Workflow.

### Thread-Based Environment

Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

- The `'all'` and `vecdim` input arguments are not supported.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced before R2006a**

## See Also

mean | std | normalize | rescale
