Checking data

From the Main window, use [Data] [Check data file].  You will be asked for a data file to check and will then be placed at Data check.

The checking is done in three or four phases.  Phases 1 and 2 are automatic and are done when the file is opened; in phase 3 you choose which error checks to perform and can change them and check again; the fourth phase is optional and "cleans" errors as they are found

Automatic cleaning does not deal with missing data; it only removes faulty or unwanted data

The data file itself is not changed until it is saved; it is only the data view that has been changed

Phase 1 (check structure)

This phase is automatic and is done when the file is opened.

The whole data file is read and collated, and each serial is checked. This first phase only deals with structural problems.

This first phase lists the number of times the following errors occurred:

Raw data structure errors - this refers to faulty serial and card numbers; some data lines may have been ignored

View structure log

Any errors encountered during phase 1 are kept in a Data error log.

When viewed, this shows an Error log with a list of the errors found, each with the position and fault found.

The log can be sorted, printed and copied to the clipboard for pasting into a spreadsheet program, for example Microsoft Excel.

Phase 2 (check entry contents)

This phase is automatic and is done when the file is opened.

Each serial is checked and cleaned. This second phase will only clean data that is not recognisable for the type of question.

This second phase lists the number of times the following errors occurred:

Faulty data (not a number) - for questions that should have a number stored in the data, the contents are not a valid number; these questions will be made empty

More than 1 response in single-coded - there is more than one response to a single-coded question; only the lowest numbered response will be kept and used

Data not referenced by any question - this is an important check and will tell you if any questions or responses are missing from the project
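
The first two cleaning rules in the list above can be pictured with a short Python sketch. It is illustrative only: the function names and the data representation (a text string for a numeric entry, a list of response numbers for a coded entry) are assumptions made for the example, not how Companion stores or processes data. What counts as "a valid number" here is whatever Python accepts as a finite number, which may differ from Companion's rules.

```python
import math
from typing import List, Optional


def clean_numeric(raw: str) -> Optional[float]:
    """Return the entry as a number, or None (empty) if it is not a valid number."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None  # "Faulty data (not a number)": the entry is made empty
    # Treat NaN and infinity as faulty too (an assumption for this sketch).
    return value if math.isfinite(value) else None


def clean_single_coded(responses: List[int]) -> List[int]:
    """Keep only the lowest numbered response of a single-coded entry."""
    if len(responses) > 1:
        return [min(responses)]  # "More than 1 response in single-coded"
    return list(responses)
```

For example, a numeric entry containing "1x" would become empty, and a single-coded entry holding responses 3, 1 and 2 would keep only response 1.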

IMPORTANT: when a multi-coded question with individual response locations is checked in a Character data file, any 0 codes in the same data locations as the 1 codes used by the question will be ignored and not listed as not referenced data.  This is done because some programs put a 0 or 1 in every data location for such questions.  Companion only uses the 1 code and will only store the 1 code if the data is saved.

IMPORTANT: when a multi-coded question or single-coded question without individual response locations is checked in a data file, any 0 values will be ignored.  This is done because some programs use a zero to mean empty.  Companion only uses the valid response numbers and will not store a zero for empty.
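
The two notes above on 0 codes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the first function assumes one character per response location in a Character data file, and the second assumes the responses arrive as a list of numbers. Neither reflects Companion's actual file handling.

```python
from typing import List


def responses_from_locations(columns: str) -> List[int]:
    """Multi-coded question with individual response locations.

    Each character is one response location. Only a '1' marks the
    response as present; a '0' in the same location is simply ignored.
    """
    return [position for position, code in enumerate(columns, start=1) if code == "1"]


def responses_from_values(values: List[int]) -> List[int]:
    """Question without individual response locations.

    A zero is taken to mean "empty" and is dropped rather than stored.
    """
    return [value for value in values if value != 0]
```

With these assumptions, the locations "01001" give responses 2 and 5, and the values 0, 3, 0 give response 3 only.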

If there are any errors in this phase you should use the Raw data view window to look at the raw data file and correct any problems with the data or the question definitions before using this facility again.

This phase only looks at the question data; it does not check filters and variables.

View open error log

Any errors encountered during phase 2 are kept in a Data error log.

When viewed, this shows an Error log with a list of the errors found, each with the serial number, entry name, and fault found.

The log can be sorted, printed and copied to the clipboard for pasting into a spreadsheet program, for example Microsoft Excel.

Phase 3 (check data)

This is used to check the raw data file against the questions in the project and report any errors. This is sometimes called a "report edit".

Check data can be repeated as often as required (with different checks) on the opened raw data file.

Before running each time you can:

Select the entries to be checked from the Main window and only report faults in those selected questions

Choose whether to include ZZZ entries in the checking

For every serial it goes through all the entries and reports any errors.  The entire data file is always checked.

You can choose which types of fault are reported.  When checking is completed, you can view the error log and view the data with the errors marked.

The errors that can be reported are:

Hold filters true - these are used to check for inconsistencies in the data

Empty data - there is nothing in the entry and it is not marked as blank allowed; for single-coded and multi-coded questions it also does not have a supplemental reject response

Not empty - this question is filtered but does have an answer

For integer, float, date and time questions:

Out of range - the answer is not within the range set for the question

For single-coded and multi-coded questions:

Out of range - the response number is not in the response list for the question

Invalid response - the response is marked as a supplemental reject or as a print only

Response filtered - response restrictors are applied to the question and the response is not valid in one of them

Single code response with others - a multi-coded question has more than one response and one of them is marked as single code only if present

Refused input response - the response is marked as not available for input

Unallocated response - the response has no text allocated to it

You can change the types of fault to report from the above list and check the data again, only reporting the types of fault requested.
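
As a rough picture of how a "report edit" reports faults without changing the data, the following Python sketch checks one single-coded or multi-coded entry for three of the faults listed above. The CodedQuestion class, its fields and the check_coded function are hypothetical names invented for this example; filters, response restrictors and reject responses are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CodedQuestion:
    """Hypothetical, simplified description of a coded question."""
    name: str
    valid_responses: Set[int]
    single_only: Set[int] = field(default_factory=set)  # "single code only if present"
    blank_allowed: bool = False


def check_coded(question: CodedQuestion, responses: List[int]) -> List[str]:
    """Return the fault names found in one entry; the data is not changed."""
    faults = []
    if not responses and not question.blank_allowed:
        faults.append("Empty data")
    if any(r not in question.valid_responses for r in responses):
        faults.append("Out of range")
    if len(responses) > 1 and any(r in question.single_only for r in responses):
        faults.append("Single code response with others")
    return faults
```

A usage example, assuming a question with responses 1, 2, 3 and 9 where 9 is marked single code only:

```python
q1 = CodedQuestion("Q1", valid_responses={1, 2, 3, 9}, single_only={9})
print(check_coded(q1, [1, 9]))  # ['Single code response with others']
print(check_coded(q1, [1, 7]))  # ['Out of range']
```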

View check log

When the data file has been checked a count of the total number of errors of each type of fault is shown and a list of errors encountered is kept in a Data error log.

When viewed, this shows an Error log with a list of the errors found, each with the serial number, entry name, and fault found.

The log can be sorted, printed and copied to the clipboard for pasting into a spreadsheet program, for example Microsoft Excel.

The log will only show the types of fault requested for the selected questions.

View data

Once the file has been checked, you can view the checked data, see Viewing data.

Any faults reported when the check was done will be highlighted in the Data view window.

When viewing the data you can choose to see all the serial numbers or only those with faults.  You can also choose to only see the selected questions.

The data can be altered and saved from the Data view window.

Optional phase 4 (clean and check data)

This does the same checks as phase 3 above but the data is cleaned as it is checked.  This is sometimes called a "force edit".  For a description of the errors, see phase 3 above.

For all questions:

Hold filters true - not cleaned

Empty data - not cleaned

Not empty - they will be made empty

For integer, float, date and time questions:

Out of range - they will be made empty

For single-coded and multi-coded questions:

Out of range - the responses will be deleted

Invalid response - the responses will be deleted

Response filtered - the responses will be deleted

Single code response with others - the responses marked single code only will be deleted

Refused input response - you can choose to leave these responses in the data or to delete them

Unallocated response - you can choose to leave these responses in the data or to delete them
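
A few of the cleaning actions above can be sketched in Python to show the difference between reporting a fault and cleaning it. The function names, arguments and the order in which the rules are applied are assumptions made for this example only; they are not taken from Companion.

```python
from typing import List, Optional, Set


def clean_number(value: Optional[float], low: float, high: float) -> Optional[float]:
    """Out of range (numeric questions): the entry is made empty (None)."""
    if value is None:
        return None  # Empty data is not cleaned
    return value if low <= value <= high else None


def clean_coded(responses: List[int], valid_responses: Set[int],
                single_only: Set[int]) -> List[int]:
    """Apply two of the coded-question cleaning rules to one entry."""
    # Out of range: responses not in the response list are deleted.
    kept = [r for r in responses if r in valid_responses]
    # Single code response with others: the responses marked
    # single code only are deleted when other responses remain.
    if len(kept) > 1:
        kept = [r for r in kept if r not in single_only]
    return kept
```

For instance, with responses 1, 2, 3 and 9 defined and 9 marked single code only, an entry holding 1, 7 and 9 would be cleaned to response 1 alone: 7 is out of range, and 9 may not appear with other responses.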

IMPORTANT: filters and variables are calculated using the cleaned questions (not the original contents if they were in error).

IMPORTANT: the data file itself is not cleaned until it is saved; it is only the data view that is cleaned.

You can view the error log and view the cleaned data as in phase 3 above.

Supplemental rejects

Single-coded and multi-coded questions can have the last response marked as a supplemental (user defined) reject, see Entry details window.  During cleaning of a question with a supplemental reject response:

If it is filtered and should, therefore, be empty and it has the supplemental reject response set, then this will be removed and no error will be reported

If the question is empty (or is made empty), then the supplemental reject response will be set and not be reported as empty
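
The two supplemental reject rules above might be pictured as follows. This is a hedged sketch: the function name and arguments are hypothetical, it only deals with the reject response itself (other cleaning is assumed to have been done already), and it assumes that the "filtered" rule takes precedence, so a filtered question is left empty rather than given the reject response.

```python
from typing import List


def apply_supplemental_reject(responses: List[int], reject: int,
                              filtered: bool) -> List[int]:
    """Apply the supplemental reject rules to one already-cleaned entry.

    reject   -- response number of the supplemental reject (the last response)
    filtered -- True if the question is filtered and should be empty
    """
    if filtered:
        # The reject response is removed and no error is reported.
        return [r for r in responses if r != reject]
    if not responses:
        # An empty question gets the reject response instead of an
        # "Empty data" report.
        return [reject]
    return list(responses)
```

With response 99 as the supplemental reject, an empty unfiltered entry becomes [99], while a filtered entry that holds only 99 becomes empty.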

Embedded rejects

Single-coded and multi-coded questions can have a response marked as an embedded reject, see Entry details window.  During cleaning of a question with an embedded reject response:

If the question is empty (or is made empty) then the embedded reject response will be set and it will not be reported as empty