Sociology - Data Preparation, Entry and Exploration
Data Preparation, Entry and Exploration
Samuel K. Frimpong (PhD)

Outline of Presentation
- Preparing Data
- Inputting and Checking Data
- Exploring Data
- Presenting Data
- Introduction to Microsoft Excel

Data Preparation for Analysis
As part of preparing your data for analysis, consider the following, if you did not already do so before obtaining the data:
- Type of data (level of numerical measurement)
- Format in which the data will be input to the data analysis software
- Impact of data coding on subsequent analyses (for different data types)
- Need to weight cases
- Methods you intend to use to check data for errors

Data Entry
Invariably, analysis software accepts data in a table format called a "data matrix".
- Some primary data collection methods, such as CAPI, CATI and online questionnaires, can automatically enter and save data in this format.
- Some secondary data accessed from CD-ROMs can also be saved in this format without the need to re-type.
- For other data collection methods, you have to enter the data manually in data matrix form.

Data Matrix
Typically, a data matrix has rows and columns. Each column represents a separate variable and each row contains the values of those variables for an individual case.

        id  Variable 1 (age)  Variable 2 (gender)  Variable 3 (level of education)  Variable 4 (marital status)
Case 1  1   27                1                    2                                1
Case 2  2   19                2                    1                                2
Case 3  3   24                2                    3                                1

Coding
Invariably, all data should be recorded using codes. This enables quick data entry. It also makes subsequent analyses easier, in particular those that require re-coding of data to create new variables. The scheme for coding is called a codebook.
Coding Quantifiable Data: The actual numbers are often used. Some numbers can be grouped or combined to form new variables through a process called re-coding.

Coding Cont'd
Coding Categorical Data: Existing codes may be used or a new coding scheme designed. Coding of categorical data can be done during data collection or after data collection.
- Coding at data collection: occurs when there is a limited range of well-established categories into which the data can be placed.
- Coding after data collection: is necessary when you are unclear about the likely responses or there are a large number of possible responses in the coding scheme.

Coding Missing Data
Each variable for each case in your data set should have a code, even if no data have been collected. A missing data code is used to indicate why data are missing. The four main reasons for missing data are:
- The data were not required from the respondent, probably as a result of a skip or filter question.
- Non-response.
- The respondent did not know the answer or did not have an opinion.
- The respondent may have missed a question by mistake, or the respondent's answer may be unclear.

Data Entry
Once your data have been coded you can begin to enter them into the computer. Remember to give each data collection form an identifier, which should tally with what is recorded on the PC.
It is essential that you take considerable care to ensure that your data are entered correctly.
Remember: "Garbage in, garbage out".

Data Cleaning
No matter how carefully you code and subsequently enter the data, there will always be room for some errors. It is therefore extremely important that you check your data for errors. Do so by:
- Looking for illegitimate codes
- Looking for illogical relationships
- Checking that the rules in filter questions are followed
- Looking out for outliers: a respondent (observation) that has one or more values that are distinctly different from the values of the other respondents

Introduction to Microsoft Excel
User Interface
Description of the Interface
Navigation Keyboard Shortcuts

Inputting Formulas in Excel
A formula is a sequence of values and operations. A formula in Excel begins with the equals sign (=).
E.g. =B1+B2-B3 is a formula that adds the contents of cells B1 and B2 and subtracts the content of cell B3.

Exploring and Presenting Data
Once your data have been entered and checked for errors, you are ready to start your analysis.
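Before moving on, the checks listed under Data Cleaning can be sketched in code. The following is a minimal Python illustration, not part of the lecture: the codebook, the missing-data code 99, the age limits and case 4 are all assumptions invented for the example, and only cases 1 to 3 come from the data matrix shown earlier.

```python
# Hypothetical codebook: the legal codes for each categorical variable.
CODEBOOK = {
    "gender": {1, 2},
    "education": {1, 2, 3},
    "marital_status": {1, 2},
}
MISSING = 99  # assumed missing-data code, not taken from the lecture

# Cases 1-3 are the data matrix from the lecture; case 4 is an invented faulty row.
CASES = [
    {"id": 1, "age": 27, "gender": 1, "education": 2, "marital_status": 1},
    {"id": 2, "age": 19, "gender": 2, "education": 1, "marital_status": 2},
    {"id": 3, "age": 24, "gender": 2, "education": 3, "marital_status": 1},
    {"id": 4, "age": 240, "gender": 5, "education": 99, "marital_status": 1},
]


def illegitimate_codes(cases, codebook, missing=MISSING):
    """Return (case id, variable, value) for every code absent from the codebook."""
    problems = []
    for case in cases:
        for variable, legal in codebook.items():
            value = case[variable]
            if value != missing and value not in legal:
                problems.append((case["id"], variable, value))
    return problems


def out_of_range(cases, variable, low, high):
    """Return the ids of cases whose value lies outside [low, high] (a crude outlier screen)."""
    return [case["id"] for case in cases if not low <= case[variable] <= high]


print(illegitimate_codes(CASES, CODEBOOK))  # [(4, 'gender', 5)]
print(out_of_range(CASES, "age", 0, 120))   # [4]
```

A fixed range is only one possible outlier screen; flagged cases should be checked against the original data collection form via the identifier rather than deleted automatically.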
At this stage exploratory data analysis is useful. This approach emphasises the use of diagrams to explore and understand your data, and the importance of using your data to guide your choice of analysis techniques.

Exploring and Presenting Data Cont'd
It is best to begin exploratory analysis by looking at individual variables and their components. The aspects you need to consider should be guided by your research questions and objectives and are likely to include:
- Specific values
- Highest and lowest values
- Trends over time
- Proportions
- Distributions
In addition to the above, you could also begin to look at:
- Conjunctions (the point where values for two or more variables intersect)
- Totals
- Interdependence and relationships

Practice Data

ID   Age  Household size  Monthly income
A01  55   4               1000
A02  28   2               900
A03  44   5               5200
B01  31   1               400
B02  70   4               1700
B03  25   2               2800
C01  91   3               600
C02  62   6               3300
C03  38   3               2300

Practice Tasks
Using the data in the previous slide, we will perform the following tasks:
- Sums of age, household size and monthly income for: A01-A03; B01-B03; C01-C03; and A01-C03
- Averages of age, household size and monthly income for: A01-A03; B01-B03; C01-C03; and A01-C03
- Bar graph of the averages computed
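The lecture carries out these tasks in Microsoft Excel. As a cross-check, the same sums and averages can be sketched in Python. This is an illustration under stated assumptions: the values are the practice data as read from the slide, the names ROWS, FIELDS, summarise, by_group and text_bar are hypothetical, and a plain-text bar stands in for the Excel bar graph.

```python
# Practice data as read from the slide: (ID, age, household size, monthly income).
ROWS = [
    ("A01", 55, 4, 1000), ("A02", 28, 2, 900), ("A03", 44, 5, 5200),
    ("B01", 31, 1, 400), ("B02", 70, 4, 1700), ("B03", 25, 2, 2800),
    ("C01", 91, 3, 600), ("C02", 62, 6, 3300), ("C03", 38, 3, 2300),
]
FIELDS = ("age", "household_size", "monthly_income")


def summarise(rows):
    """Return (sums, averages) as dicts keyed by field name."""
    columns = list(zip(*[row[1:] for row in rows]))
    sums = {field: sum(col) for field, col in zip(FIELDS, columns)}
    averages = {field: sum(col) / len(col) for field, col in zip(FIELDS, columns)}
    return sums, averages


def by_group(rows):
    """Group rows by the first letter of the ID and add an 'All' group (A01-C03)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0][0], []).append(row)
    groups["All"] = list(rows)
    return groups


def text_bar(label, value, scale):
    """Render one bar of a plain-text bar chart: one '#' per `scale` units."""
    return f"{label:>4} | {'#' * round(value / scale)} {value:.1f}"


SUMMARY = {name: summarise(rows) for name, rows in by_group(ROWS).items()}

for name, (sums, averages) in SUMMARY.items():
    print(name, sums, {field: round(avg, 2) for field, avg in averages.items()})

# Text bar chart of average age per group.
for name, (_, averages) in SUMMARY.items():
    print(text_bar(name, averages["age"], scale=2))
```

In Excel the equivalent formulas would use SUM and AVERAGE over the matching cell ranges; the exact ranges depend on where the data are entered in the worksheet.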
Files attached to this document:
- lecture_8_data_preparation_entry_and_exploration_0579.ppt