SHG Statistics Header Group

SHG is a numeric format. The standard setting is SHG11.

This format should be set to -1, 0, 1, 2, 3, 11, 12 or 13 to indicate the groupings to use for statistics. It controls which columns are tested against each other.

See the individual statistics formats for details.

SHG0

Markers are specified by formats SMA and SMB.

The default settings are SMA'+' and SMB'+', so that plus (+) or minus (-) signs are used. Earlier versions used an asterisk (*).

Each column is tested against an imaginary column that is obtained by subtracting the column being tested from the total column.

The table must have a printed total column.
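As a rough illustration of the SHG0 test, the Python sketch below forms "the rest" by subtracting the tested column from the total and marks the cell with the default + and - markers. It assumes a pooled, two-sided two-proportion z-test at SLA95/SLB99; the function name rest_marker is hypothetical, and the test CL actually applies depends on the statistics formats in use.

```python
import math


def rest_marker(col_count: int, col_base: int, total_count: int, total_base: int,
                sla: float = 95.0, slb: float = 99.0) -> str:
    """Return '++', '+', '-', '--' or '' for one cell tested against "the rest".

    "The rest" is the total column minus the column being tested.
    Assumption: a pooled two-proportion z-test (two-sided); CL's own formula
    depends on the statistics formats in use and may differ.
    """
    rest_count = total_count - col_count
    rest_base = total_base - col_base
    if col_base <= 0 or rest_base <= 0:
        return ""  # no base on one side, so no test
    pooled = total_count / total_base  # same as (col + rest) / (col base + rest base)
    se = math.sqrt(pooled * (1 - pooled) * (1 / col_base + 1 / rest_base))
    if se == 0:
        return ""  # 0% or 100% everywhere: nothing to test
    z = (col_count / col_base - rest_count / rest_base) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    if p <= 1 - slb / 100:
        return "++" if z > 0 else "--"  # significant at the SLB level
    if p <= 1 - sla / 100:
        return "+" if z > 0 else "-"    # significant at the SLA level only
    return ""
```

For example, a column with 60 out of a base of 100, against a total of 200 out of 500, is compared with a rest of 140 out of 400 and is marked "++".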

SHG1 - SHG3

This sets the level of column header that controls the statistics.

If column identifiers are present in the breakdown then the appropriate letters will be used as markers.

SHG1 causes statistics to be calculated within each header (*** or \).

SHG2 causes statistics to be calculated within over-headers (**** or ^).

SHG3 causes statistics to be calculated within super-headers (*****).

If there are no super-headers, SHG3 will compare all columns in the break with each other. If you want the total column to be included, you will need to create a "true" column in the break and suppress the real total (format NPTC). See also settings SHG11-13.
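The grouping can be sketched in Python as follows. The Column class and its fields are hypothetical stand-ins for a breakdown column and its header levels; the sketch only shows which pairs of identified columns end up in the same group under each setting, not how the tests themselves are run.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical model of one breakdown column."""
    label: str
    identifier: Optional[str]  # e.g. "a"; None if the column has no identifier
    header: str = ""           # *** or \ level
    over_header: str = ""      # **** or ^ level
    super_header: str = ""     # ***** level


# SHG11-13 group columns like SHG1-3; the deprecated SHG-1 is treated as SHG3.
# SHG0 has no grouping (it only tests against the rest), so it is not listed.
LEVEL = {1: 1, 2: 2, 3: 3, 11: 1, 12: 2, 13: 3, -1: 3}


def group_key(col: Column, shg: int) -> tuple:
    """SHG1 keeps the full header path, SHG2 drops the header, SHG3 keeps only the super-header."""
    path = (col.super_header, col.over_header, col.header)
    return path[: 4 - LEVEL[shg]]


def comparison_pairs(columns: list[Column], shg: int) -> list[tuple[str, str]]:
    """Pairs of column labels that are tested against each other."""
    groups: dict = {}
    for col in columns:
        if col.identifier is None:
            continue  # columns without identifiers are not tested against each other
        groups.setdefault(group_key(col, shg), []).append(col)
    pairs = []
    for members in groups.values():
        pairs.extend((a.label, b.label) for a, b in combinations(members, 2))
    return pairs
```

With the Area/Sex breakdown shown under Column identifiers and no over- or super-headers, SHG1 gives four pairs (North/Mid, North/South, Mid/South and Male/Female), while SHG3 puts all five identified columns in one group and gives ten pairs.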

SHG11 - SHG13

These settings tell CL to mark comparisons against the "rest" as for SHG0, and also to mark comparisons between columns as for SHG1, SHG2 and SHG3 respectively.

SHG-1

This setting is deprecated and will be treated as SHG3.

Column identifiers

The program will output significance markers when significance tests are performed.

The columns of the table can have characters (usually letters) associated with them. These are used as column identifiers to show which columns a cell is "significantly different and higher" than.

For example:

Area\North (a)
Mid (b)
South (c)
Not stated
Sex\Male (m)
Female (f)

Column identifier rules

The maximum number of characters allowed in a column identifier is set with RCP MARKERSEP.

For an identifier to be used it must obey the following rules:

If the column text has more than one line then it must be on the last line, preferably at the end.

It must not have more characters between the open and close parentheses than the maximum set in the project globals (RCP MARKERSEP).

It must not contain any numbers (digits 0-9).

It must not contain any of the following characters: plus (+), minus (-), dot (.), or space ( ).

Any identifier which does not comply with the above will be treated as part of the text and will not be used as a column identifier.
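The rules above can be expressed as a small Python check. This is a sketch only: the function name is hypothetical, max_chars stands in for the RCP MARKERSEP setting, and taking the last parenthesised text on the last line as the candidate is an assumption, since the rules do not say what happens when a line has more than one.

```python
import re
from typing import Optional

_PARENS = re.compile(r"\(([^()]*)\)")  # text between an open and a close parenthesis
_FORBIDDEN = set("+-. ")               # plus, minus, dot and space


def column_identifier(column_text: str, max_chars: int = 1) -> Optional[str]:
    """Return the column identifier, or None if the text has no valid identifier."""
    lines = column_text.splitlines()
    if not lines:
        return None
    candidates = _PARENS.findall(lines[-1])  # the identifier must be on the last line
    if not candidates:
        return None
    ident = candidates[-1]                   # assumption: the last (...) on that line
    if not ident or len(ident) > max_chars:
        return None                          # too long for the MARKERSEP maximum
    if any(ch in "0123456789" for ch in ident):
        return None                          # no digits allowed
    if any(ch in _FORBIDDEN for ch in ident):
        return None                          # no +, -, . or space allowed
    return ident
```

Text that fails any rule simply returns None, matching the rule that such text is treated as part of the label.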

Provided that no upper case letters are used as identifiers, the program will use lower case markers for format SLA level and upper case markers for format SLB level when comparing. Level SLC will add a + sign to the marker.

If there are any upper case identifiers in the breakdown then SLA will show the letters as they are.

For languages without upper and lower case, or when upper case identifiers are used, SLB cannot be shown as a separate level, so it should not be used and should be set to the same value as SLA.

Markers

When columns are tested against each other using identifiers, only the "higher" column gets a marker, showing which column it is higher than. "Higher" means a larger percentage or a larger mean score.

Note that for some tables the higher figure may mean "worse", so don't always assume that marked columns are "better" without checking.

There is an important difference between marking from tests against other columns and marking from tests against the total. When testing against columns, only the "higher" column gets the mark. When testing against the total, each column gets marked with + or ++ for "higher" and - or -- for "lower".
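The one-sided nature of column-against-column marking can be sketched in Python: for each pair of identified columns in a group, only the higher one records the other's identifier. The pooled two-proportion z-test and the names pairwise_marks and _p_two_sided are assumptions for illustration, not CL's own formula.

```python
import math
from itertools import combinations


def _p_two_sided(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (an assumed test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both 0% or both 100%: no difference to test
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def pairwise_marks(cells: dict[str, tuple[int, int]],
                   sla: float = 95.0, slb: float = 99.0) -> dict[str, list[tuple[str, str]]]:
    """cells maps identifier -> (count, base) for the identified columns in one group.

    Returns, for each identifier, the (other identifier, level) pairs it is higher
    than. Only the higher column of a significant pair gets an entry.
    """
    marks = {ident: [] for ident in cells}
    for a, b in combinations(cells, 2):
        (xa, na), (xb, nb) = cells[a], cells[b]
        if na == 0 or nb == 0:
            continue  # no base, no test
        p = _p_two_sided(xa, na, xb, nb)
        if p <= 1 - slb / 100:
            level = "B"
        elif p <= 1 - sla / 100:
            level = "A"
        else:
            continue  # not significant: no marker for either column
        higher, lower = (a, b) if xa / na > xb / nb else (b, a)
        marks[higher].append((lower, level))
    return marks
```

With Mid (b) at 60%, North (a) at 30% and South (c) at 35%, all on bases of 100, only Mid receives marks (against a and c, both at the SLB level with SLA95/SLB99); North and South receive none.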

Where more than one character is allowed for identifiers, the markers will be separated by a dot (.).

Example marking under format SHG11 with all lower case identifiers and the default settings of SLA95/SLB99:

A cell marked as "+a" means that this cell is higher than the total (the "rest") and is also higher than the column identified with (a) under the same header, both at the 95% level.

A cell marked as "++A" means that this cell is higher than the total (the "rest") and is also higher than the column identified with (a) under the same header, both at the 99% level.

A cell marked as "-AC" means that this cell is 95% lower than the total (the "rest") but 99% higher than the columns identified with (a) and (c) under the same header.

A cell marked as "++Cef" means that this cell is 99% higher than the total (the "rest"), and 99% higher than the column (c) and 95% higher than columns (e) and (f) under the same header.
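Putting the case rules and the dot separator together, the Python sketch below rebuilds the example markers. The function names are hypothetical, the level is passed in as "A", "B" or "C" for the highest of SLA, SLB and SLC reached, and using upper case at the SLC level with lower case identifiers is an assumption (the text only says that SLC adds a + sign).

```python
def comparison_marker(identifier: str, level: str, upper_ids_used: bool) -> str:
    """Marker for one column this cell is higher than.

    level is "A", "B" or "C": the highest of SLA, SLB and SLC that was reached.
    """
    if upper_ids_used:
        text = identifier          # upper case identifiers in the breakdown: shown as they are
    elif level == "A":
        text = identifier.lower()  # SLA level: lower case
    else:
        text = identifier.upper()  # SLB level (and, by assumption, SLC): upper case
    if level == "C":
        text += "+"                # SLC adds a + sign
    return text


def cell_marker(rest: str, comparisons: list[tuple[str, str]],
                upper_ids_used: bool = False, max_chars: int = 1) -> str:
    """Full cell marker: the result against the rest, then the column markers.

    rest is "", "+", "++", "-" or "--"; comparisons lists (identifier, level).
    Markers are separated by a dot only when MARKERSEP allows more than one character.
    """
    parts = [comparison_marker(ident, level, upper_ids_used) for ident, level in comparisons]
    separator = "." if max_chars > 1 else ""
    return rest + separator.join(parts)
```

For instance, cell_marker("++", [("c", "B"), ("e", "A"), ("f", "A")]) gives "++Cef", the last example above.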

Format SHG effect

You will notice that headers (Area and Sex) were used in the Area/Sex example above. If format SHG1 or SHG11 (test within headers) is used, each area is compared with the other two, and the appropriate letter markers are placed against the cell if significant differences are found.

For example, if "Mid" is found to be different from, and higher than, both of the other areas at a significance of 90% (as set by format option SLA90), it will be marked with the lower case letters "ac". If it is found to be different and higher at a significance of 95% (as set by format option SLB95), it will be marked with the upper case letters "AC". The "Not stated" column will not be tested because it does not have an identifier. Males and Females will also be compared, so that if Females are different and higher, the letter "m" or "M" (depending on the significance level) will be placed in the Female cell.

If format SHG11 is used, every column will also be compared with the total column and marked with + or - (SLA level), or ++ or -- (SLB level). The test is actually against the total column minus the column being tested, often referred to as "the rest": the records that are not in the column being tested.

Identifiers can be reused under different headers so there is no limit on the number of columns using identifiers.

Duplicate identifiers should never be used in the same group (under the same header).  If there are too many columns in a group (usually 27+) then two character identifiers will be needed.

Using more than one character for identifiers

Care is needed when setting the maximum number of characters in MARKERSEP to more than one. Text that you do not want to be used as an identifier, such as "Heathrow airport (LHR)", may need to be changed to "Heathrow airport {LHR}". On the other hand, you may want it to be used as an identifier:

Airport\Heathrow (lhr)
Glasgow (gla)
Edinburgh (edi)
Not stated

If figures under Glasgow are significantly higher than those for the other two airports, the Glasgow figure will be marked with "lhr.edi" (both at SLA level), "LHR.EDI" (both at SLB level) or "LHR.edi" (one at SLA level and the other at SLB level).

If this looks untidy you could use:

Airport\Heathrow (LHR)
Glasgow (GLA)
Edinburgh (EDI)
Not stated

Used with SLA95/SLB95/SLC99, this would produce the markers "LHR.EDI" (both at SLA level), "LHR+.EDI+" (both at SLC level) or "LHR+.EDI" (one at SLC level and the other at SLA level).

Additional column tests

Columns can be tested against columns that are not in the same group by adding the identifiers of those columns, between plus (+) signs, after the column's own identifier. For example:

Males Area\North (a) +d+
Mid (b) +e+
South (c) +f+
Not stated;
Females Area\North (d)
Mid (e)
South (f)
Not stated;

IMPORTANT: you can only refer to single character identifiers for additional column tests.

In this case, using SHG1 or SHG11, column (b) is tested against columns (a) and (c) under the same header, and also against column (e).

Note that any forward marking implies the reverse marking so the above is equivalent to:

Males Area\North (a) +d+
Mid (b) +e+
South (c) +f+
Not stated;
Females Area\North (d) +a+
Mid (e) +b+
South (f) +c+
Not stated;

Writing it out in full like this is acceptable, because it shows clearly all the columns being tested against, but the second set of references (+a+, +b+, +c+) is not needed because it is set automatically.

IMPORTANT: backward marking will not be used unless the equivalent forward marking is also present. For example, +a+ on column (d) has no effect on its own; column (a) must also carry +d+.
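The forward and backward rule can be sketched in Python. Columns are referred to by their position in the breakdown; the function name is hypothetical, and the sketch assumes that each single-character identifier used in an additional test is unique in the breakdown (if it is not, the first column with that identifier is used).

```python
from typing import Optional


def additional_pairs(identifiers: list[Optional[str]],
                     refs: list[list[str]]) -> set[tuple[int, int]]:
    """Extra column pairs created by +x+ references.

    identifiers[i] is the identifier of column i (None if it has none);
    refs[i] lists the single-character identifiers written as +x+ on column i.
    Returns (i, j) index pairs with i < j.
    """
    position = {}
    for i, ident in enumerate(identifiers):
        if ident is not None:
            position.setdefault(ident, i)  # assumption: first column with this identifier
    pairs = set()
    for i, column_refs in enumerate(refs):
        for ref in column_refs:
            j = position.get(ref)
            if j is not None and j > i:
                pairs.add((i, j))  # forward reference: also implies the reverse marking
            # a backward reference (j < i) adds nothing on its own
    return pairs
```

Both versions of the Males/Females example give the same three extra pairs, (a, d), (b, e) and (c, f), while a breakdown with only the backward references gives none.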

There is a special setting, +?+, which means "test this column against all following columns", or against the following columns up to the next column with +?+. This is useful when total columns are included in the breakdown. This setting is not shown in the tables. The +?+ must be the last text in the label, and all the identifiers following +?+ should be unique, so you cannot reuse letters.

For example:

 \Total (t)+?+
Area\North (a)
Mid (b)
South (c)
Not stated
Sex\Male (m)
Female (f)
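The range of columns covered by +?+ can be sketched in Python. The function name is hypothetical, and the sketch makes two assumptions: it stops before the next column that has +?+ (the text does not say whether that column is itself included), and, as with the other tests, it skips columns without an identifier.

```python
from typing import Optional


def question_mark_pairs(columns: list[tuple[Optional[str], bool]]) -> list[tuple[int, int]]:
    """columns lists (identifier, has_question_mark) for each column, in order.

    Returns (i, j) index pairs: each column i carrying +?+ is paired with every
    following identified column j, stopping before the next column carrying +?+.
    """
    pairs = []
    for i, (ident, has_qmark) in enumerate(columns):
        if not has_qmark or ident is None:
            continue
        for j in range(i + 1, len(columns)):
            other_ident, other_qmark = columns[j]
            if other_qmark:
                break  # assumption: the next +?+ column starts a new range
            if other_ident is not None:  # columns without identifiers are not tested
                pairs.append((i, j))
    return pairs
```

For the breakdown above, Total (t) is paired with North, Mid, South, Male and Female, but not with "Not stated".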

IMPORTANT: the overlapping statistics formulae should normally be used with +?+.

See also formats:

AVG Averages or mean scores

CHI Chi-square test

KST Kolmogorov-Smirnov Tests

MCT Minimum Column for T tests

MWW Mann-Whitney-Wilcoxon

SIG Significance tests on distributions

SLA Significance Level A

SLB Significance Level B

SLC Significance Level C

SLD Significance Level D

SMA Significance Marker A

SMB Significance Marker B

SMS Significance Marker Space

TTF T Tests or F tests

TTT T Tests on Tables