SIG Significance tests on distributions

See format SHG for details about column identifiers.

SIG is a numeric format. The standard setting is SIG0 (false).

SIG0

This means no testing.

SIG1

This uses the standard Z test formula using combined variance (pooled).

SIG2

This uses a non-standard Z test formula using separate variances (un-pooled).

SIG3 (recommended)

This uses the standard t test formula using combined variance.

SIG4

This uses the t test formula with continuity correction.

SIG5

This uses the overlap t test formula using combined variance. See overlap note below.

SIG6

This uses the overlap t test formula with continuity correction. See overlap note below.
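The page does not spell out the formulas behind SIG1 to SIG4. For rough orientation only, the sketch below shows textbook versions of these tests for one pair of column proportions. It assumes each percentage is treated as the mean of a 0/1 variable. The function names and the form of the continuity correction (reducing the absolute difference by 0.5 * (1/n1 + 1/n2)) are assumptions, not the program's documented internals.

```python
import math

# Textbook two-sample tests on column proportions (a sketch, not the program's code).
# x1, x2 are cell counts; n1, n2 are column bases. All names are hypothetical.

def pooled_z(x1, n1, x2, n2):
    """SIG1-style: Z test using one combined (pooled) proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def unpooled_z(x1, n1, x2, n2):
    """SIG2-style: Z test using a separate variance for each column."""
    p1, p2 = x1 / n1, x2 / n2
    return (p1 - p2) / math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def pooled_t(x1, n1, x2, n2, continuity=False):
    """SIG3/SIG4-style: Student t on 0/1 data with a pooled variance.

    Returns (t, degrees_of_freedom). With continuity=True (assumed form of
    the correction) the absolute difference is reduced by 0.5 * (1/n1 + 1/n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    df = n1 + n2 - 2
    # Sum of squared deviations of 0/1 data in a column is n*p*(1-p) = x*(1-p).
    s2 = (x1 * (1 - p1) + x2 * (1 - p2)) / df
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    diff = p1 - p2
    if continuity:
        adj = max(abs(diff) - 0.5 * (1 / n1 + 1 / n2), 0.0)
        diff = math.copysign(adj, diff)
    return diff / se, df

def two_sided_p_normal(z):
    """Two-sided p-value of a Z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, with 60 of 100 in one column and 45 of 100 in another, `pooled_z` gives about 2.12 and a two-sided p-value of roughly 0.034, which would pass a 95% level. The t versions return the degrees of freedom so the statistic can be compared with a critical t value.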

IMPORTANT: format MCT sets the minimum bases for testing. For weighted data, the base compared with this minimum is:

With ESS (the default), the effective base is used; see format ESR.

With NESS the unweighted base is used.
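A minimal sketch of the minimum-base check, assuming the effective base is Kish's effective sample size, (sum of weights)^2 / (sum of squared weights). The page does not give the formula, and the function and parameter names are hypothetical. Treating a base equal to the minimum as sufficient is also an assumption.

```python
def effective_base(weights):
    """Kish effective sample size (assumed definition of the effective base)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def meets_minimum(weights, minimum, use_effective=True):
    """Hypothetical MCT-style check.

    use_effective=True mimics ESS (effective base);
    use_effective=False mimics NESS (unweighted count of records).
    """
    base = effective_base(weights) if use_effective else len(weights)
    return base >= minimum  # assumption: a base equal to the minimum passes
```

With weights 1, 1, 2 and 2 the effective base is 3.6, so a minimum of 4 fails under the ESS behaviour but passes when the unweighted base of 4 records is used.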

SIG1-6 carry out significance tests on a table produced by CL. Each cell of the table is inspected in turn; CL looks for a significant difference between the cell and other cells in the same row.

Overlap (SIG5 and SIG6)

This is used when the columns being tested against each other overlap, meaning that some records are in both. Use it when a breakdown contains multi-coded data under a header.
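One common way to allow for records that fall in both columns is to subtract their covariance from the variance of the difference between the two proportions. The sketch below illustrates that idea using separate variances. It is an assumption about the general approach, not the program's actual SIG5/SIG6 formula (which the page describes as using combined variance). The overlap parameters `x12` and `n12` are hypothetical names.

```python
import math

def overlap_stat(x1, n1, x2, n2, x12, n12):
    """Difference-of-proportions statistic corrected for overlapping records.

    x1/n1, x2/n2 : cell count and base for each column.
    n12          : records that are in both columns (hypothetical name).
    x12          : of those, how many have the row attribute (hypothetical name).
    """
    p1, p2 = x1 / n1, x2 / n2
    p12 = x12 / n12 if n12 else 0.0
    # Shared records make p1 and p2 positively correlated:
    # Cov(p1, p2) is estimated as n12 * p12 * (1 - p12) / (n1 * n2).
    cov = n12 * p12 * (1 - p12) / (n1 * n2)
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 - 2 * cov
    return (p1 - p2) / math.sqrt(var)
```

With no overlap the result equals the separate-variance Z statistic. As the overlap grows, the variance shrinks and the same difference becomes more significant, which is why ignoring overlap tends to understate significance.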

Having more than about 20 columns will slow down the running of your tables. With long breaks, you are advised to move the overlapping breaks to a separate table and use the overlap formula only for that table.

IMPORTANT: You can only have up to 90 columns when using the overlap formula.

IMPORTANT: The overlap formula only works when tables are defined in one go (or overlaid by using the same table name). It is not used for grid tables where columns are incremented separately.

IMPORTANT: The overlap formula is not used when comparing with the total column:

When testing against the total column with SIG5, the SIG3 formula is used.

When testing against the total column with SIG6, the SIG4 formula is used.
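The total-column fallback can be summarised as a small lookup. The function name is hypothetical; the sketch only restates the two rules above.

```python
def formula_used(sig, against_total):
    """Return the SIG setting actually applied to one comparison."""
    if against_total:
        # Overlap formulas are not used against the total column.
        return {5: 3, 6: 4}.get(sig, sig)
    return sig
```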

Manip with overlap

Be careful when using manip with the overlap formulas. You can manipulate rows, but not individual columns other than the total column.

When using part-ids, a column range such as 1-$ will not include the overlap counts and should be avoided. Using the whole table, or a part-id that refers only to rows, will include the overlap counts.

Overlay with overlap

Overlapping columns are only counted if all the relevant columns are incremented at the same time. If you build a grid table one column at a time, the overlaps are not counted and cannot be used in the calculations.

SHG0

This is the standard value and it causes significance markers to be generated for each cell independently. Markers are specified by formats SMA and SMB (standard values = *).

Each column is tested against an imaginary column that is obtained by subtracting the column being tested from the total column.

IMPORTANT: the table must have a total row and a total column.

Cells are marked as significantly different if the population in the cell differs significantly from the rest of the population in the same row. The rest of the population is taken to be the column in question subtracted from the total column.
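A minimal sketch of the SHG0 comparison: the "rest" column is built by subtracting the tested column from the total column, and the two are then compared. The pooled Z test is used here purely for illustration (the actual test depends on the SIG setting), and the function name is hypothetical.

```python
import math

def shg0_z(cell, col_base, row_total, total_base):
    """Compare one cell with the rest of its row (total column minus this column)."""
    rest_count = row_total - cell
    rest_base = total_base - col_base
    p1 = cell / col_base
    p2 = rest_count / rest_base
    # Pooled proportion; equals row_total / total_base.
    p = (cell + rest_count) / (col_base + rest_base)
    se = math.sqrt(p * (1 - p) * (1 / col_base + 1 / rest_base))
    return (p1 - p2) / se
```

For a column with 60 of 100 in a row whose total is 300 of 600, the rest is 240 of 500 (48%), giving a statistic of about 2.19. This is also why the table needs both a total row and a total column: without them the rest cannot be derived.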

Formats SLA, SLB, SLC and SLD set the percentage confidence levels to be tested.

When a cell is marked as significant, the marker(s) are placed to the right of the cell value. If there is not enough blank space (the marker would be overwritten by the next cell, or would fall beyond the page boundary), an additional line is generated and the marker(s) are placed right-justified under the value. To avoid this additional line when double markers are used (when SMA and SMB are the same), we recommend using formats CLG3 and CHG3 or higher.
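A minimal sketch of the placement rule, assuming each cell has a value field of `width` characters followed by `gap` blank characters in which markers may be printed. Both names are hypothetical and only loosely correspond to the spacing controlled by CLG and CHG.

```python
def place_markers(value, markers, width, gap):
    """Return the printed line(s) for one marked cell."""
    if len(markers) <= gap:
        # Markers fit in the blank space to the right of the value.
        return [value.rjust(width) + markers.ljust(gap)]
    # Not enough room: add a line with the markers right-justified under the value.
    blank = " " * gap
    return [value.rjust(width) + blank, markers.rjust(width) + blank]
```

A double marker such as `**` needs at least two blank characters after the value; with only one, the sketch falls back to a second line, which is the situation the CLG3/CHG3 recommendation is meant to avoid.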

If SMA is set to + when using SIG, cells are marked with + or - depending on whether the proportion is significantly higher (+) or lower (-).

The value calculated is tested against the relevant significance level set by formats SLA to SLD, and the cell is marked if appropriate.
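A minimal sketch of the marking decision for one cell, assuming the test yields a two-sided p-value and the level is given as a percentage (for example 95). How the program pairs each of SLA to SLD with a particular marker is not described here, so the sketch handles a single level and the SMA setting only; all names are hypothetical.

```python
def is_significant(p_value, level_pct):
    """True when a two-sided p-value passes a confidence level such as 95."""
    return p_value < 1 - level_pct / 100

def cell_marker(p_cell, p_other, p_value, level_pct, sma="*"):
    """Marker for one cell; '+' or '-' when SMA is '+', otherwise the SMA character."""
    if not is_significant(p_value, level_pct):
        return ""
    if sma == "+":
        return "+" if p_cell > p_other else "-"
    return sma
```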

SHG1, SHG2 and SHG3

These settings cause paired comparisons between all columns within each header. The value of SHG specifies the header level to be used, usually SHG1 for the lowest header level. Each column is tested against the other columns under the same header at the levels set by formats SLA to SLD, and one or more character markers identify the columns from which it differs significantly.

The marker that identifies each column is provided in H (horizontal) text. The identifier must be surrounded by parentheses and must appear in the last line of text for the column. Almost any character may be used as the marker, although we recommend using only lower case letters.

Columns can be tested against columns outside their own group by including the relevant markers between plus (+) signs, also in the last line of text for the column.
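A minimal sketch of reading the identifier and the extra plus-delimited markers from the last line of a column's H text. It assumes single-character markers written as one group such as `+cd+`; the exact syntax the program accepts may differ, and the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

IDENT = re.compile(r"\(([^()]+)\)")  # column identifier, e.g. "(a)"
EXTRA = re.compile(r"\+([^+]+)\+")   # extra markers, e.g. "+cd+" (assumed form)

def parse_header_line(line: str) -> Tuple[Optional[str], List[str]]:
    """Return (identifier, extra markers) from the last header line of a column."""
    m = IDENT.search(line)
    ident = m.group(1).strip() if m else None
    # Assumption: each non-space character between the plus signs is one marker.
    extras = [ch for group in EXTRA.findall(line) for ch in group if not ch.isspace()]
    return ident, extras
```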

See format SHG for further details.

IMPORTANT: only the higher of the two percentages is marked.
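A minimal sketch of paired comparisons within one header for a single row. Every pair of columns is tested, and when a pair differs significantly only the column with the higher percentage is marked, using the identifier of the lower column. A pooled Z test with a fixed critical value of 1.96 (roughly 95%, two-sided) stands in for whichever SIG test and SLA to SLD levels are actually configured; all names are hypothetical.

```python
import math
from itertools import combinations

def pooled_z(x1, n1, x2, n2):
    """Illustrative pooled Z statistic for two column proportions."""
    p = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def pairwise_markers(cells, critical=1.96):
    """cells maps a column identifier to (count, base) for one row."""
    marks = {ident: "" for ident in cells}
    for a, b in combinations(cells, 2):
        (x1, n1), (x2, n2) = cells[a], cells[b]
        z = pooled_z(x1, n1, x2, n2)
        if abs(z) < critical:
            continue
        # Only the higher percentage is marked, with the lower column's identifier.
        if z > 0:
            marks[a] += b
        else:
            marks[b] += a
    return marks
```

With columns a = 60/100, b = 45/100 and c = 62/100, both a and c differ significantly from b but not from each other, so a and c each carry the marker b and column b carries none.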

You may need to increase format CHG to make room for the markers. If there is insufficient room the program generates a new line for the marker and places it underneath the cell.

SHG11, SHG12 and SHG13

These combine the markers from SHG0 and SHG1-3 so that both sets of markers are used.

See also formats:

CHG Column Header Gap

CLG Column Label Gap

ESS Effective sample sizes in statistics calculations

MCT Minimum Column for T tests

SHG Statistics Header Group

SLA Significance Level A

SLB Significance Level B

SLC Significance Level C

SLD Significance Level D

SMA Significance Marker A

SMB Significance Marker B