Data check
In the Main window menu, [Data] [Check data file] is used to check the contents of a data file.
This enables you to inspect data for specific errors. You should check a data file before running any analysis.
When the Data check window is first opened, you are asked for the name of the data file to check. If the file does not have a valid file name extension, the software will try to determine the type from the contents.
The checking is done in three or four phases:
Phase 1 is done when a data file is opened.
The whole data file is read and collated.
This phase only deals with structural problems (card and serial numbers).
It lists the number of times the following types of error occurred:
•Raw data structure errors - this refers to faulty serial and card numbers; some data lines may have been ignored
When Phase 1 is complete the program will go straight on to Phase 2.
Use this button to open another data file for checking.
If any errors were encountered during Phase 1 you can view the Data log with a list of the errors found, each with position and fault found.
Phase 2 is done when a data file is opened, after the structure has been checked.
This phase will only clean data that is not recognisable for the type of question. It only looks at the question data and does not check filters and variables.
It lists the number of times the following types of error occurred:
•Faulty data (not a number) - for questions that should have a number stored in the data, the contents are not a valid number; these questions will be made empty
•More than 1 response in single-coded - there is more than one response to a single-coded question; only the lowest numbered response will be kept and used
•Data not referenced by any question - this will tell you if any questions or responses are missing from the project
If any errors were encountered during Phase 2 you can view the Data log with a list of the errors found, each with the serial number, entry name, and fault found.
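The two cleaning rules described for Phase 2 can be illustrated with a minimal Python sketch. This is not the software's own code: the function names, the use of `None` for an empty entry, and the list-of-integers representation of responses are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def clean_numeric(raw: str) -> Tuple[Optional[float], bool]:
    """Return (value, faulty) for a question that should hold a number.

    Hypothetical helper: content that is not a valid number is made
    empty (None) and flagged as faulty.
    """
    text = raw.strip()
    if text == "":
        return None, False  # already empty, nothing to clean
    try:
        return float(text), False
    except ValueError:
        return None, True  # faulty data (not a number): made empty


def clean_single_coded(responses: List[int]) -> Tuple[List[int], bool]:
    """Return (responses, faulty) for a single-coded question.

    Hypothetical helper: when more than one response is present, only
    the lowest numbered response is kept.
    """
    if len(responses) > 1:
        return [min(responses)], True
    return list(responses), False
```

For example, `clean_single_coded([3, 1, 2])` keeps only response 1 and flags the entry as faulty, while `clean_numeric("abc")` returns an empty value.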
Phase 3 checks the data file against the questions and filters in the project and reports any errors. This is sometimes called a "report edit".
The errors that can be reported are:
•Hold filters true - these are used to check for inconsistencies in the data
•Empty data - there is nothing in the entry and it is not marked as blank allowed; optionally for single-coded and multi-coded questions, it also does not have a supplemental reject response
•Not empty - this question is filtered but does have an answer
For integer, float, date and time questions:
•Out of range - the answer is not within the range set for the question
For single-coded questions and multi-coded questions:
•Out of range - the response number is not in the response list for the question
•Invalid response - the response is marked as a supplemental reject or as print only
•Response restrictor - response restrictors are applied to the question and the answer is not valid in one of them
•Single-coded response with others - a multi-coded question has more than one response and one of them is marked as single-coded only if present
•Refused input response - the answer is marked as not available for input
•Unallocated response - the answer has no text allocated to it
You can choose which types of fault are reported using the check boxes alongside the counts.
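As a rough illustration of how some of the checks for single-coded and multi-coded questions fit together, here is a minimal Python sketch. The `CodedQuestion` class, its field names and the `check_coded` function are hypothetical; they are not part of the software, and only a subset of the faults listed above is covered. Nothing is changed in the answer, which mirrors the reporting-only nature of this phase.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CodedQuestion:
    """Hypothetical model of a single-coded or multi-coded question."""
    valid_responses: Set[int]                                  # response list
    invalid_responses: Set[int] = field(default_factory=set)   # supplemental reject / print only
    single_only: Set[int] = field(default_factory=set)         # single-coded only if present
    blank_allowed: bool = False


def check_coded(question: CodedQuestion, answer: List[int]) -> List[str]:
    """Report faults for one answer without modifying it."""
    faults: List[str] = []
    if not answer:
        if not question.blank_allowed:
            faults.append("Empty data")
        return faults
    for response in answer:
        if response not in question.valid_responses:
            faults.append("Out of range")
        elif response in question.invalid_responses:
            faults.append("Invalid response")
    if len(answer) > 1 and any(r in question.single_only for r in answer):
        faults.append("Single-coded response with others")
    return faults
```

With a question defined as `CodedQuestion(valid_responses={1, 2, 3, 9}, single_only={9})`, the answer `[1, 9]` would be reported as "Single-coded response with others".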
This button can be used as often as required to run Phase 3 again on the opened data file.
Before running each time you can:
•Check all questions or select entries from the Main window and only report faults in those selected questions
•Choose whether to include ZZZ question entries in the checking
The entire data file is always checked.
If any errors were encountered during Phase 3 or 4 you can view the Data log with a list of the errors found, each with the serial number, entry name, and fault found.
Once the file has been checked you can request a data view of the data. Any faults reported when the check was done will be highlighted (see the Data view window).
If used after Phase 4 checking, then the cleaned data will be shown.
Phase 4 is identical to Phase 3 except that the answers are cleaned. This is sometimes called a "force edit".
The data file itself is not cleaned until it is saved; it is only the data view that is cleaned.
As faulty answers are found in a questionnaire they are cleaned (usually made empty) and filters and variables are calculated using the cleaned answers, not the original contents.
Faulty integer, float, date and time questions will be made empty.
For faulty single-coded and multi-coded questions, the cleaning of responses is:
•Out of range - it will be deleted
•Invalid response - it will be deleted
•Response filtered - it will be deleted
•Single-coded response with others - the response marked single-coded only will be deleted
•Refused response - you can choose to leave this in the data or to delete it
•Unallocated response - you can choose to leave this in the data or to delete it
Empty single-coded questions and multi-coded questions with a supplemental (user defined) reject will have that response set and no error is reported.
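The cleaning rules for coded questions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the order in which the rules are applied is assumed, and the sketch assumes that a question left empty by cleaning is treated the same way as one that was empty to begin with. The optional handling of refused and unallocated responses is left out.

```python
from typing import List, Optional, Set


def force_edit_coded(
    answer: List[int],
    valid: Set[int],
    invalid: Set[int],
    single_only: Set[int],
    reject: Optional[int] = None,
) -> List[int]:
    """Return the cleaned responses for one coded question (hypothetical)."""
    # Out of range and invalid responses are deleted.
    kept = [r for r in answer if r in valid and r not in invalid]
    # A response marked single-coded only is deleted when others are present.
    if len(kept) > 1:
        kept = [r for r in kept if r not in single_only]
    # An empty question with a supplemental reject has that response set.
    if not kept and reject is not None:
        kept = [reject]
    return kept
```

For example, with `valid={1, 2, 3, 9}` and `single_only={9}`, the answer `[1, 5, 9]` is cleaned to `[1]`: response 5 is out of range, and response 9 cannot remain alongside another response.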
This button is used to save a new copy of the data file with the same name.
The original file will be saved in the Archive sub-folder.
In the new file, Phase 1 and 2 errors will be fixed, and if Phase 4 has been run then any faulty data will have been removed.
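The save behaviour described above (original moved to an Archive sub-folder, cleaned copy written under the same name) can be pictured with a short Python sketch. The function name, the text-file representation of the data, and the handling of an existing archived file are assumptions; the software's actual file handling may differ.

```python
import shutil
from pathlib import Path


def save_with_archive(data_file: Path, cleaned_text: str) -> Path:
    """Archive the original file, then write the cleaned data in its place.

    Hypothetical helper: returns the path of the archived original.
    """
    archive_dir = data_file.parent / "Archive"
    archive_dir.mkdir(exist_ok=True)
    archived = archive_dir / data_file.name
    # Move the original into the Archive sub-folder.
    shutil.move(str(data_file), str(archived))
    # Write the cleaned data under the original file name.
    data_file.write_text(cleaned_text)
    return archived
```

After the call, the file at the original path holds the cleaned data and the untouched original can be found in the Archive sub-folder beside it.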