Data check

The Main window menu option [Data] [Check data file] is used to check the contents of a data file.

This enables you to inspect data for specific errors.  You should check a data file before running any analysis.

When the dialog is first opened you are asked for the name of the data file to check. If the file does not have a valid file name extension, the software tries to determine the file type from its contents.

The checking is done in three or four phases (Phase 4 is optional), after which the data can be saved in an optional Phase 5:

Phase 1 (Check structure)

This phase is done when a data file is opened.

The whole data file is read and collated.

This phase only deals with structural problems (card and serial numbers).

It lists the number of times the following types of error occurred:

Raw data structure errors - faulty serial and card numbers; some data lines may have been ignored

When Phase 1 is complete the program goes straight on to Phase 2.
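The structural check can be pictured with a short Python sketch. The raw line layout used here (serial number, card number, then data, separated by spaces), the card limit `max_card` and the treatment of duplicate cards as faults are all assumptions for illustration, not the product's actual file format or rules.

```python
# Hypothetical raw layout (an assumption, not the product's format):
# "<serial> <card> <data...>", with serial and card as whole numbers.
def check_structure(lines, max_card=9):
    """Collate lines by serial number and count raw data structure errors.

    Lines with a faulty serial or card number are ignored.
    Returns (records, errors) where records maps serial -> {card: data}.
    """
    records = {}
    errors = 0
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2 or not (parts[0].isdecimal() and parts[1].isdecimal()):
            errors += 1  # serial or card number missing or not numeric
            continue
        serial, card = int(parts[0]), int(parts[1])
        if serial == 0 or not 1 <= card <= max_card:
            errors += 1  # number outside the assumed valid range
            continue
        cards = records.setdefault(serial, {})
        if card in cards:
            errors += 1  # duplicate card for this serial (assumed a fault); first kept
            continue
        cards[card] = parts[2] if len(parts) == 3 else ""
    return records, errors
```

Collating by serial number first means later phases can treat each respondent's cards as one record.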

Open data file

Use this button to open another data file for checking.

View structure log

If any errors were encountered during Phase 1 you can view the Data log, which lists the errors found, each with its position and the fault found.

Phase 2 (Open data)

This phase is done when a data file is opened after the structure has been checked.

This phase only cleans data that is not recognisable for the type of question. It looks only at the question data and does not check filters or variables.

It lists the number of times the following types of error occurred:

Faulty data (not a number) - for questions that should have a number stored in the data, the contents are not a valid number; these questions will be made empty

More than 1 response in single-coded - there is more than one response to a single-coded question; only the lowest-numbered response is kept and used

Data not referenced by any question - the data file contains data that no question in the project refers to, which tells you that questions or responses may be missing from the project
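A minimal Python sketch of the Phase 2 rules, assuming a hypothetical record held as a dict of entry name to value, with question kinds `number`, `single` and `multi`. The function name `open_data` and the decision to leave unreferenced data out of the cleaned record are illustrative assumptions, not the product's behaviour.

```python
# Hypothetical model: `questions` maps entry name -> "number", "single" or "multi";
# coded answers are lists of response numbers. Not the product's own API.
def open_data(record, questions):
    """Apply the Phase 2 rules to one record; return (cleaned, counts)."""
    counts = {"not_a_number": 0, "multi_in_single": 0, "unreferenced": 0}
    cleaned = {}
    for name, value in record.items():
        kind = questions.get(name)
        if kind is None:
            # Data not referenced by any question: counted and (an assumption
            # of this sketch) not copied into the cleaned record.
            counts["unreferenced"] += 1
            continue
        if value is None or value == "" or value == []:
            cleaned[name] = None  # already empty, nothing to clean
        elif kind == "number":
            try:
                cleaned[name] = float(value)
            except (TypeError, ValueError):
                counts["not_a_number"] += 1
                cleaned[name] = None  # faulty number: the answer is made empty
        elif kind == "single" and len(value) > 1:
            counts["multi_in_single"] += 1
            cleaned[name] = [min(value)]  # keep only the lowest-numbered response
        else:
            cleaned[name] = value
    return cleaned, counts
```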

View open error log

If any errors were encountered during Phase 2 you can view the Data log with a list of the errors found, each with the serial number, entry name, and fault found.

Phase 3 (Check data)

This phase is used to check the data file against the questions and filters in the project and report any errors. This is sometimes called a "report edit".

The errors that can be reported are:

Hold filters true - a hold filter was true; hold filters are used to check for inconsistencies in the data

Empty data - there is nothing in the entry and it is not marked as blank allowed; optionally for single-coded and multi-coded questions, it also does not have a supplemental reject response

Not empty - the question is filtered out but has an answer

For integer, float, date and time questions:

Out of range - the answer is not within the range set for the question

For single-coded questions and multi-coded questions:

Out of range - the response number is not in the response list for the question

Invalid response - the response is marked as a supplemental reject or as print only

Response restrictor - the question has response restrictors applied and the answer is not valid for one of them

Single-coded response with others - a multi-coded question has more than one response and one of them is marked as single-coded only (it may only appear on its own)

Refused input response - the answer is marked as not available for input

Unallocated response - the answer has no text allocated to it

You can choose which types of fault are reported using the check boxes alongside the counts.
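The per-answer Phase 3 checks can be sketched in Python as follows. The `Question` class, its fields and `check_answer` are hypothetical; only a subset of the faults (empty data, not empty, out of range, single-coded response with others) is shown, and `asked` stands for the result of the question's filter.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Question:
    """Hypothetical question model used only for this sketch."""
    kind: str                          # "integer", "float", "single" or "multi"
    low: Optional[float] = None        # range limits for numeric questions
    high: Optional[float] = None
    responses: Set[int] = field(default_factory=set)    # the response list
    single_only: Set[int] = field(default_factory=set)  # "single-coded only" responses
    blank_allowed: bool = False


def check_answer(q: Question, answer, asked: bool) -> List[str]:
    """Return the faults for one answer; `asked` is the question's filter result."""
    empty = answer is None or answer == []
    if not asked:
        return [] if empty else ["Not empty"]  # filtered out but answered
    if empty:
        return [] if q.blank_allowed else ["Empty data"]
    faults = []
    if q.kind in ("integer", "float"):
        if (q.low is not None and answer < q.low) or (q.high is not None and answer > q.high):
            faults.append("Out of range")
    else:
        if any(r not in q.responses for r in answer):
            faults.append("Out of range")  # response not in the response list
        if q.kind == "multi" and len(answer) > 1 and q.single_only & set(answer):
            faults.append("Single-coded response with others")
    return faults
```

Returning a list of fault names lets one answer report several faults, and the caller can ignore any types whose check boxes are cleared.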

Check data

This button can be used as often as required to run Phase 3 again on the opened data file.

Before running each time you can:

Check all questions or select entries from the Main window and only report faults in those selected questions

Choose whether to include ZZZ question entries in the checking

The entire data file is always checked.

View check log

If any errors were encountered during Phase 3 or 4 you can view the Data log with a list of the errors found, each with the serial number, entry name, and fault found.

View data

Once the file has been checked you can open a data view of it, in which any faults reported by the check are highlighted; see the Data view window.

If used after Phase 4 checking, then the cleaned data will be shown.

Optional Phase 4 (Clean and check data)

This is identical to Phase 3 except that the answers are cleaned.  This is sometimes called a "force edit".

The data file itself is not cleaned until it is saved; it is only the data view that is cleaned.

As faulty answers are found in a questionnaire they are cleaned (usually made empty) and filters and variables are calculated using the cleaned answers, not the original contents.

Faulty integer, float, date and time questions will be made empty.

For faulty single-coded and multi-coded questions, the cleaning of responses is:

Out of range - it will be deleted

Invalid response - it will be deleted

Response filtered - it will be deleted

Single-coded response with others - the response marked single-coded only will be deleted

Refused response - you can choose to leave this in the data or to delete it

Unallocated response - you can choose to leave this in the data or to delete it

Empty single-coded and multi-coded questions that have a supplemental (user-defined) reject response will have that response set, and no error is reported.
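A sketch of the Phase 4 cleaning rules for one coded answer, written in Python. The `rules` dictionary and its keys are hypothetical, and setting the supplemental reject when the answer is empty after cleaning is an assumption about when that rule applies.

```python
# Hypothetical rules dict (not the product's API): sets of response numbers under
# "valid", "invalid", "filtered", "single_only", "refused", "unallocated",
# plus an optional "reject" response number.
def clean_coded(answer, rules, delete_refused=False, delete_unallocated=False):
    """Clean one single- or multi-coded answer (a list of response numbers)."""
    kept = []
    for r in answer:
        if r not in rules["valid"] or r in rules["invalid"] or r in rules["filtered"]:
            continue  # out of range, invalid or filtered: deleted
        if r in rules["refused"] and delete_refused:
            continue  # refused response, deleted only if chosen
        if r in rules["unallocated"] and delete_unallocated:
            continue  # unallocated response, deleted only if chosen
        kept.append(r)
    if len(kept) > 1:
        # A single-coded-only response cannot appear with others: delete it.
        kept = [r for r in kept if r not in rules["single_only"]]
    if not kept and rules.get("reject") is not None:
        kept = [rules["reject"]]  # assumed: set the supplemental reject, no error
    return kept
```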

Optional Phase 5 (Save data)

This button is used to save a new copy of the data file with the same name.

The original file will be saved in the Archive sub-folder.

In the new file, Phase 1 and 2 errors are fixed and, if Phase 4 has been run, any faulty data has been removed.
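Saving with an archive copy can be sketched with Python's standard `shutil`, `os` and `pathlib` modules. Placing the `Archive` folder next to the data file and the name `save_with_archive` are assumptions, and writing to a temporary file before replacing the original is a design choice of this sketch.

```python
import os
import shutil
from pathlib import Path


def save_with_archive(data_path, new_lines):
    """Copy the original data file into an Archive sub-folder, then save the
    cleaned data under the original name. Folder placement is an assumption."""
    data_path = Path(data_path)
    archive_dir = data_path.parent / "Archive"  # assumed to sit beside the data file
    archive_dir.mkdir(exist_ok=True)
    archived = archive_dir / data_path.name
    shutil.copy2(data_path, archived)  # keeps the original contents and timestamps
    # Write to a temporary file first, then swap it in, so an interrupted
    # save never leaves a half-written data file under the original name.
    tmp = data_path.with_name(data_path.name + ".tmp")
    tmp.write_text("\n".join(new_lines) + "\n")
    os.replace(tmp, data_path)
    return archived
```

In this sketch an earlier archive copy with the same name is overwritten; a real save might add a date or sequence number to the archived name instead.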