Cleaning Data Collected with Survey Software II

Posted on: January 31, 2018

In the previous post we discussed how your survey software should help you get clean data by preventing the various kinds of errors that people answering your questions can make. Your tool should also provide ways to remove bad data from your data file.

One form of bad data arises when people should have skipped questions but didn’t. This kind of mistake is common on paper surveys, where people see an instruction telling them to skip ahead to Question X and have to follow it on their own. It is less common on telephone surveys, where an interviewer asks the questions and records the answers. It can sometimes happen in web surveys when a person gives an answer that does not trigger a skip, answers one or more of the questions that follow, then backs up to the question that controls the skip and gives a different answer that this time does trigger it. The answers given before backing up might still be in the data.

Good survey software should let you find cases in which questions that should have been skipped were answered anyway. It should then offer you a choice: automatically remove the answers to those questions, or present you with a list of the data records that violate one or more of these skip (or branching) instructions.
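To make the idea concrete, here is a minimal sketch in Python of the two options just described: listing the records that violate a skip instruction, and blanking the answers that should have been skipped. It is an illustration only, not how any particular survey package works. The question names (Q1, Q2, Q3), the rule format, and the function names are all assumptions made up for this example.

```python
# Hypothetical data: Q1 is a filter question.
# Assumed skip rule: if Q1 is "No", Q2 and Q3 should be blank (None).
records = [
    {"id": 101, "Q1": "Yes", "Q2": "Daily", "Q3": "Satisfied"},
    {"id": 102, "Q1": "No", "Q2": None, "Q3": None},
    {"id": 103, "Q1": "No", "Q2": "Weekly", "Q3": None},  # violates the rule
]

# Each rule: when `trigger` has `value`, the `skipped` questions must be blank.
SKIP_RULES = [
    {"trigger": "Q1", "value": "No", "skipped": ["Q2", "Q3"]},
]


def find_skip_violations(records, rules):
    """List (record id, trigger question, wrongly answered questions)."""
    violations = []
    for rec in records:
        for rule in rules:
            if rec.get(rule["trigger"]) == rule["value"]:
                answered = [q for q in rule["skipped"] if rec.get(q) is not None]
                if answered:
                    violations.append((rec["id"], rule["trigger"], answered))
    return violations


def clear_skipped_answers(records, rules):
    """Return copies of the records with skipped questions blanked out."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy, so the original data file is left untouched
        for rule in rules:
            if rec.get(rule["trigger"]) == rule["value"]:
                for q in rule["skipped"]:
                    rec[q] = None
        cleaned.append(rec)
    return cleaned


# Option 1: review a list of the offending records.
violations = find_skip_violations(records, SKIP_RULES)
for rec_id, trigger, answered in violations:
    print(f"Record {rec_id}: {trigger} triggered a skip, "
          f"but {', '.join(answered)} was answered")

# Option 2: remove the answers automatically.
cleaned = clear_skipped_answers(records, SKIP_RULES)
```

Note that the automatic option assumes the answer that triggered the skip is the correct one. Reviewing the list first lets you check that assumption before any answers are removed.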

The ability to view a list of the data records that violate skip instructions is especially helpful when working from paper surveys. Your program should let you view each such record on-screen to see the answer pattern. If you had your data entry people write down the ID numbers of the data records as they entered them, you could also go back to the paper questionnaire. Doing so may help you identify which mistake the person filling out the survey made. Did they fail to skip, which is most likely, or do their answers to the questions that should have been skipped suggest that the answer that was supposed to trigger the skip is the one that was wrong?

Another feature your tool should provide is the ability to identify individual cases (data records) that gave a particular combination of answers. This capability is useful both for cleaning out inconsistent answers and for flagging particular cases that you may want to analyze further.
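As a rough illustration of selecting records by a combination of answers, the Python sketch below returns the ID numbers of records that match every answer in a set of criteria. The field names, the sample answers, and the function `find_matching` are hypothetical and chosen only for this example. A real survey package would offer its own way to build such a selection.

```python
# Hypothetical responses; the field names and answers are made up.
responses = [
    {"id": 201, "owns_car": "No", "commute": "Drive own car", "age_group": "18-24"},
    {"id": 202, "owns_car": "Yes", "commute": "Drive own car", "age_group": "25-34"},
    {"id": 203, "owns_car": "No", "commute": "Bus", "age_group": "18-24"},
]


def find_matching(responses, criteria):
    """Return the ids of records whose answers match every item in criteria."""
    return [
        rec["id"]
        for rec in responses
        if all(rec.get(question) == answer for question, answer in criteria.items())
    ]


# An inconsistent combination: no car, yet commutes by driving their own car.
inconsistent = find_matching(
    responses, {"owns_car": "No", "commute": "Drive own car"}
)
print(inconsistent)  # [201]
```

The same function works for flagging cases of interest rather than errors, for example `find_matching(responses, {"age_group": "18-24"})` to pull out one age group for a closer look.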