Column identifiers

Companion and Reflect

The program outputs significance markers when significance tests are performed; see Significance formats.

Each column of the table can have characters (usually letters) associated with it. These characters act as column identifiers and are used to show which columns a given column is "significantly different from and higher than".

To add these column identifiers, use the Main window menu item [Edit] [Add column identifiers to selected entries]. Alternatively, add them manually by putting the relevant letter in parentheses at the end of each column label.

For example:

Area\North (a)

Mid (b)

South (c)

Not stated

Sex\Male (m)

Female (f)

Column identifier rules

The maximum number of characters allowed in an identifier is set on the General tab of Project global settings.

For an identifier to be used it must obey the following rules:

If the column text has more than one line, the identifier must be on the last line, preferably at the end.

It must not have more characters between the opening and closing parentheses than the maximum set in Project global settings.

It must not contain any numbers (digits 0-9).

It must not contain any of the following characters: plus (+), minus (-), dot (.), or space ( ).

Any identifier that does not comply with these rules is treated as part of the label text and is not used as a column identifier.
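A minimal Python sketch of these rules, useful for checking labels outside the program. It assumes that the candidate identifier is the last parenthesised group on the last line of the label, and that empty parentheses do not count; `extract_identifier`, `is_valid_identifier` and `max_chars` are hypothetical names, not part of the program.

```python
import re

DIGITS = set("0123456789")
FORBIDDEN = set("+-. ")  # plus, minus, dot and space are not allowed


def is_valid_identifier(text: str, max_chars: int) -> bool:
    """Apply the identifier rules to the text between the parentheses."""
    if not text:  # assumption: "()" is not treated as an identifier
        return False
    if len(text) > max_chars:  # longer than the Project global maximum
        return False
    return not any(ch in DIGITS or ch in FORBIDDEN for ch in text)


def extract_identifier(label: str, max_chars: int = 1):
    """Return the identifier from a column label, or None if there is none.

    Assumption: only the last parenthesised group on the last line is used.
    """
    lines = label.splitlines()
    last_line = lines[-1] if lines else ""
    groups = re.findall(r"\(([^()]*)\)", last_line)
    if not groups:
        return None
    candidate = groups[-1]
    return candidate if is_valid_identifier(candidate, max_chars) else None


print(extract_identifier("Area\\North (a)"))                   # a
print(extract_identifier("Not stated"))                        # None
print(extract_identifier("Heathrow airport (LHR)"))            # None
print(extract_identifier("Heathrow airport (LHR)", max_chars=3))  # LHR
```

Passing a `max_chars` that matches the Max characters setting lets multi-character identifiers such as (lhr) through, while labels like "Group (1)" or "Group (a b)" are still rejected.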

Provided that no uppercase letters are used as identifiers, the program uses lowercase markers for differences at the format SLA level and uppercase markers for differences at the format SLB level. Level SLC adds a + sign to the marker; see Significance formats.

If any identifiers in the breakdown are uppercase, SLA shows the letters exactly as they are written.

For languages without upper and lower case, or when uppercase identifiers are used, SLB should not be used; set it to the same value as SLA.
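The SLB condition above can be checked before running tables. This is a hypothetical Python helper, not a program feature; it treats an identifier that contains any uppercase letter, or no lowercase letter at all (for example one written in a script without case), as one whose markers cannot show the SLA/SLB difference. The SLA and SLB values are passed as plain integers such as 95 and 99.

```python
def slb_allowed(identifiers) -> bool:
    """True if lowercase/uppercase markers can distinguish SLA from SLB."""
    for ident in identifiers:
        if any(ch.isupper() for ch in ident):
            return False  # uppercase identifiers are shown as written
        if not any(ch.islower() for ch in ident):
            return False  # no lowercase letter to switch to uppercase
    return True


def check_levels(identifiers, sla: int, slb: int) -> list[str]:
    """Return warnings if SLB differs from SLA when it should not."""
    warnings = []
    if not slb_allowed(identifiers) and slb != sla:
        warnings.append(f"set SLB to the SLA value ({sla}); markers cannot show case")
    return warnings


print(check_levels(["a", "b", "c"], 95, 99))    # []
print(check_levels(["LHR", "GLA", "EDI"], 95, 99))  # one warning
```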

Markers

When testing against other columns using identifiers, only the column that is "higher" gets a marker, showing which column it is higher than. "Higher" means a larger percentage or a larger mean score.

Note that in some tables a higher figure may mean "worse", so check the table before assuming that marked columns are "better".

The two types of testing are marked differently. When testing against other columns, only the "higher" column gets a mark. When testing against the total, a column can be marked either way: + or ++ if it is higher, and - or -- if it is lower.

Where more than one character is allowed for identifiers, the markers will be separated by a dot (.).

Example marking under format SHG11 with all lowercase identifiers and the default settings of SLA95/SLB99:

A cell marked "+a" is higher than the total (the "rest") and also higher than the column identified by (a) under the same header, both at the 95% level.

A cell marked "++A" is higher than the total (the "rest") and also higher than the column identified by (a) under the same header, both at the 99% level.

A cell marked "-AC" is lower than the total (the "rest") at the 95% level, but higher than the columns identified by (a) and (c) under the same header at the 99% level.

A cell marked "++Cef" is higher than the total (the "rest") at the 99% level, higher than column (c) at the 99% level, and higher than columns (e) and (f) under the same header at the 95% level.
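With all-lowercase identifiers and only SLA and SLB in use, these markers can be assembled mechanically. The Python sketch below assumes single-character identifiers and that the letters follow the order of the columns in the breakdown; `cell_marker` and the shape of its arguments are hypothetical, chosen only to reproduce the four examples above.

```python
def cell_marker(total_result, column_results, id_order):
    """Build an SHG11 cell marker from significance test results.

    total_result: None, or (direction, level) for the test against "the rest",
        where direction is "higher" or "lower" and level is "SLA" or "SLB".
    column_results: dict mapping identifier -> (direction, level) for columns
        under the same header that differed significantly from this cell.
    id_order: identifiers in breakdown order, used to order the letters.
    """
    marker = ""
    if total_result is not None:
        direction, level = total_result
        sign = "+" if direction == "higher" else "-"
        marker += sign * (2 if level == "SLB" else 1)  # + / ++ or - / --
    for ident in id_order:
        if ident not in column_results:
            continue
        direction, level = column_results[ident]
        if direction != "higher":
            continue  # only the "higher" column gets a letter
        marker += ident.upper() if level == "SLB" else ident.lower()
    return marker


order = list("abcdef")
print(cell_marker(("higher", "SLB"),
                  {"c": ("higher", "SLB"), "e": ("higher", "SLA"),
                   "f": ("higher", "SLA")}, order))  # ++Cef
```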

Format SHG

SHG11 is the default; see also Significance formats.

The example above uses headers (Area and Sex). If format SHG1 or SHG11 (test within headers) is used, each area is compared with the other two, and if significant differences are found the appropriate letter markers are placed in the cell. For example, if "Mid" is found to be different from and higher than both of the other areas at the 90% level (as set by format option SLA90), it is marked with the lowercase letters "ac". If "Mid" is found to be different from and higher than both of the other areas at the 95% level (as set by format option SLB95), it is marked with the uppercase letters "AC". The "Not stated" column is not tested because it does not have an identifier. Male and Female are also compared, so if Female is different and higher, its cell is marked with "m" or "M", depending on the significance level.
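The pairs tested within headers can be listed mechanically. This Python sketch assumes labels written as in the example, where a backslash marks the first column of a new header group and the identifier sits in parentheses at the end of the label; the simple pattern used here does not apply the full identifier rules, and `within_header_pairs` is a hypothetical name.

```python
import re
from itertools import combinations


def within_header_pairs(labels):
    """Return the identifier pairs tested within each header group.

    A label containing a backslash starts a new group. Columns without an
    identifier at the end of the label (such as "Not stated") are skipped.
    """
    groups = []
    for label in labels:
        if "\\" in label:  # e.g. "Area\North (a)" starts the Area group
            groups.append([])
        match = re.search(r"\(([^()]+)\)\s*$", label)
        if match and groups:
            groups[-1].append(match.group(1))
    return [pair for ids in groups for pair in combinations(ids, 2)]


labels = ["Area\\North (a)", "Mid (b)", "South (c)", "Not stated",
          "Sex\\Male (m)", "Female (f)"]
print(within_header_pairs(labels))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('m', 'f')]
```

Each pair is tested once, and whichever column is higher receives the other column's letter.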

If format SHG11 is used, every column is also compared with the total column and marked with + or - at the SLA level, or ++ or -- at the SLB level. The test is actually against the total column minus the column being tested, often referred to as "the rest": the records that are not in the column being tested.
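As a worked illustration of "the rest", with invented figures: if the total column has a base of 1000 of whom 400 give an answer, and the column being tested has a base of 300 of whom 150 give it, the rest is (400 - 150) / (1000 - 300) = 250 / 700, about 35.7%. The column's 50% is then tested against 35.7% rather than against the total's 40%. The Python sketch below computes only the percentages; the significance test itself is not shown, and `rest_percentage` is a hypothetical name.

```python
def rest_percentage(total_count, total_base, col_count, col_base):
    """Percentage for "the rest": the total column minus the tested column."""
    rest_base = total_base - col_base
    if rest_base <= 0:
        raise ValueError("the rest is empty: the column covers the whole total")
    return 100.0 * (total_count - col_count) / rest_base


col_pct = 100.0 * 150 / 300
rest_pct = rest_percentage(400, 1000, 150, 300)
print(round(col_pct, 1), round(rest_pct, 1))  # 50.0 35.7
```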

Identifiers can be reused under different headers, so there is no limit on the number of columns that use identifiers.

Never use duplicate identifiers in the same group (under the same header). If a group has too many columns for single letters (usually 27 or more), two-character identifiers are needed.
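A hypothetical way to generate unique identifiers for a large group is sketched below in Python. It is not necessarily the scheme used by [Edit] [Add column identifiers to selected entries]; it simply uses single letters for up to 26 columns and otherwise gives every column a two-letter identifier (aa, ab, ...), which requires Max characters in column identifiers to be at least 2.

```python
from itertools import product
from string import ascii_lowercase


def make_identifiers(n_columns: int) -> list[str]:
    """Hypothetical scheme: a-z for up to 26 columns, else aa, ab, ... for all."""
    if n_columns <= len(ascii_lowercase):
        return list(ascii_lowercase[:n_columns])
    pairs = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
    if n_columns > len(pairs):
        raise ValueError("too many columns for two-character identifiers")
    return pairs[:n_columns]


print(make_identifiers(3))        # ['a', 'b', 'c']
print(make_identifiers(27)[-2:])  # ['az', 'ba']
```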

Using more than one character for identifiers

Take care when setting Max characters in column identifiers in Project global settings to more than one character. Text that you do not want to be used as an identifier, such as "Heathrow airport (LHR)", may need to be changed to "Heathrow airport {LHR}". On the other hand, you may want it to be used as an identifier:

Airport\Heathrow (lhr)

Glasgow (gla)

Edinburgh (edi)

Not stated

If the Glasgow figures are significantly higher than those for the other two airports, the Glasgow cell is marked with "lhr.edi" (both at SLA), "LHR.EDI" (both at SLB) or "LHR.edi" (LHR at SLB and EDI at SLA).

If this looks untidy, you could use uppercase identifiers:

Airport\Heathrow (LHR)

Glasgow (GLA)

Edinburgh (EDI)

Not stated

Combined with SLA95/SLB95/SLC99, this produces the markers "LHR.EDI" (both at SLA), "LHR+.EDI+" (both at SLC) or "LHR+.EDI" (LHR at SLC and EDI at SLA).
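The dot-separated markers in both airport examples can be reproduced with the Python sketch below. Each comparison is given as (identifier, level) in breakdown order. For lowercase identifiers it assumes that an SLC marker takes the SLB (uppercase) form plus a trailing +, which the text does not state explicitly; `join_markers` is a hypothetical name.

```python
def join_markers(results, uppercase_ids: bool) -> str:
    """Join column markers for multi-character identifiers with dots.

    results: (identifier, level) pairs in breakdown order, for the columns this
    cell is significantly higher than; level is "SLA", "SLB" or "SLC".
    """
    markers = []
    for ident, level in results:
        if uppercase_ids:
            text = ident  # uppercase identifiers are shown as written
        else:
            # assumption: SLC uses the uppercase (SLB) form for lowercase ids
            text = ident.upper() if level in ("SLB", "SLC") else ident.lower()
        if level == "SLC":
            text += "+"  # SLC adds a + sign to the marker
        markers.append(text)
    return ".".join(markers)


print(join_markers([("lhr", "SLB"), ("edi", "SLA")], uppercase_ids=False))  # LHR.edi
print(join_markers([("LHR", "SLC"), ("EDI", "SLA")], uppercase_ids=True))   # LHR+.EDI
```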

Additional column tests

Columns can be tested against columns outside their own group by placing the relevant identifier between plus (+) signs after the column's own identifier. For example:

Males Area\North (a) +d+

Mid (b) +e+

South (c) +f+

Not stated;

Females Area\North (d)

Mid (e)

South (f)

Not stated;

IMPORTANT: additional column tests can only refer to single-character identifiers.

In this case, using SHG1 or SHG11, column (b) is tested against columns (a) and (c) under the same header, and also against column (e).

Note that any forward marking implies the reverse marking, so the above is equivalent to:

Males Area\North (a) +d+

Mid (b) +e+

South (c) +f+

Not stated;

Females Area\North (d) +a+

Mid (e) +b+

South (f) +c+

Not stated;

This form is acceptable for clarity, because it shows all the columns being tested against, although the second set of markings (+a+, +b+ and +c+) is not needed because it is set automatically.

IMPORTANT: backward marking is ignored unless the equivalent forward marking is also present.
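The forward/backward rule can be sketched in Python as follows. A reference to a later column (forward marking) creates a pair that is tested in both directions; a reference to an earlier column (backward marking) never creates a pair on its own, so it is either redundant or ignored. The sketch assumes that identifiers referred to by +x+ are unique across the breakdown and, for brevity, omits the header prefixes from the labels; `parse_refs` and `extra_test_pairs` are hypothetical names.

```python
import re


def parse_refs(label: str) -> list[str]:
    """Single-character identifiers written between + signs, such as "+e+"."""
    return re.findall(r"\+([^+?\s])\+", label)


def extra_test_pairs(columns):
    """Return the cross-group pairs created by forward +x+ references.

    columns: (identifier, refs) in breakdown order, e.g. ("b", ["e"]).
    """
    position = {ident: i for i, (ident, _) in enumerate(columns)}
    pairs = set()
    for i, (ident, refs) in enumerate(columns):
        for ref in refs:
            j = position.get(ref)
            if j is not None and j > i:  # forward marking only
                pairs.add((ident, ref))
    return sorted(pairs)


labels = ["North (a) +d+", "Mid (b) +e+", "South (c) +f+",
          "North (d) +a+", "Mid (e) +b+", "South (f) +c+"]
columns = [(re.search(r"\((\w)\)", lab).group(1), parse_refs(lab)) for lab in labels]
print(extra_test_pairs(columns))  # [('a', 'd'), ('b', 'e'), ('c', 'f')]
```

Removing +a+, +b+ and +c+ from the second group gives the same result, and a lone +a+ on column (d) produces no pair.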

The special setting +?+ means "test this column against all following columns, or up to the next column with +?+". This is useful when total columns are included in the breakdown. This setting does not appear in the tables. The +?+ must be the last text in the label, and all identifiers following +?+ should be unique, so letters cannot be reused.

For example:

\Total (t)+?+

Area\North (a)

Mid (b)

South (c)

Not stated

Sex\Male (m)

Female (f)
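The +?+ rule can be sketched in Python as below. Columns are given as (identifier, has_query) in breakdown order, leaving out columns without identifiers such as "Not stated". It assumes that "up to the next column with +?+" excludes that next column, which the text leaves open; `query_pairs` is a hypothetical name.

```python
def query_pairs(columns):
    """Pairs created by +?+, for columns given as (identifier, has_query)."""
    first = next((i for i, (_, q) in enumerate(columns) if q), None)
    if first is not None:
        following = [ident for ident, _ in columns[first + 1:]]
        if len(following) != len(set(following)):
            raise ValueError("identifiers following +?+ must be unique")
    pairs = []
    for i, (ident, has_query) in enumerate(columns):
        if not has_query:
            continue
        for other, other_query in columns[i + 1:]:
            if other_query:  # stop at the next column carrying +?+
                break
            pairs.append((ident, other))
    return pairs


cols = [("t", True), ("a", False), ("b", False), ("c", False),
        ("m", False), ("f", False)]
print(query_pairs(cols))
# [('t', 'a'), ('t', 'b'), ('t', 'c'), ('t', 'm'), ('t', 'f')]
```

The uniqueness check reflects why the example uses m and f for Sex rather than reusing a, b or c.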

IMPORTANT: the overlapping statistics formulae should normally be used with +?+.
