est was relevant to what they had learned. The result is shown in the chart below:

Chart 1: Students’ response on test content

Among the four sections, functional language was perceived as the most relevant, at 65%. The reading section was rated the least relevant (only 31%). Vocabulary was considered slightly more relevant to the syllabus than grammar (59% compared with 52%).

Regarding test length, three quarters of the students (75%) found the total of 150 multiple-choice items reasonable, while 25% thought there were too many.

Asked whether the test as a whole had the power to discriminate among students in the ability of interest, approximately 36% of students judged that the test items did discriminate between student levels of English. The remaining 64% claimed that the level of discrimination was not remarkable. The result can be seen in the following pie chart:

Chart 2: Students’ response on item discrimination value (high: 36%; low: 64%)

In the fourth question, students were asked if they had enough time to complete the tasks given in the achievement test 1. The following chart illustrates the result:

Chart 3: Students’ response on time length (enough: 84%; not enough: 9%; too much: 7%)

Chart 3 shows that roughly 84% of students had no problem with time management. 7% of responses indicated that the time allowance was too much, while 9% said that they needed more time to finish the tasks.

Regarding the clarity of the test instructions, 90% of students stated that the instructions were clear; only 10% found them rather unclear.

Asked about the influence of test supervision on the test result, 98% of students commented that the test supervisors were strict. Only 2% reported that supervision was not very strict.

Students were also asked whether the testing room affected their performance.
40% of them claimed that the testing room did have an impact on their test performance, while 60% stated they were not affected.

Responding to the question of whether they had experienced a computer breakdown during the test and whether their test results were affected, a third of the informants stated that they had and that they had to do the test again. 77% of them found that it had a very negative influence on their test performance. The remaining 23% saw no impact.

When asked if they suffered from physical and emotional pressure while performing the tasks, 45% of students admitted they did, while 55% did not.

With reference to test-taking behavior, 56% of the informants responded that they did select answers arbitrarily, whereas 44% did not. The result is illustrated in the chart below:

Chart 4: Students’ response arbitrariness (yes: 56%; no: 44%)

In answer to the question about prior exposure to the test format and content, 97% of students reported that they were familiar with this type of test; only 3% were not. This can be explained by the fact that they were second-year students and had already done a number of tests.

Concerning students’ computer skills, 61% claimed that they were good at using a computer to do the test, 38% thought their skill was average, and only 1% stated that they were not good at it.

When asked whether there is any difference between doing the test on hard copy and on soft copy, surprisingly 50% of the participants found it different and 50% did not, although they were second-year students who had already taken MCQs English tests on computers four times.

In the last question, students were asked whether the test scores reflected their actual achievement during the 4th semester.
The result is presented in the following pie chart:

Chart 5: Students’ response on relation between test score and their achievement (exactly: 66%; not exactly: 34%)

As can be seen from Chart 5, 66% of students acknowledged that the test score actually reflected their achievement, while 34% did not get the score they expected.

From these results, the following points emerge:

- Factors which do not affect students’ scores include students’ computer skills, students’ familiarity with the test format and content, test supervision, clarity of the test instructions, and time allowance.
- Factors affecting students’ test performance involve test characteristics, testee characteristics and test administration characteristics. Test characteristics include the large number of test items, low content relevance to the course book and low discrimination power. Testee characteristics consist of response arbitrariness, pressure and poor ability to read texts on the screen. The test administration characteristic is computer breakdown.

Clearly, when performing the tasks, students were heavily influenced by both objective and subjective factors, and therefore the results they got did not reflect their true ability, as 34% of them claimed.

In short, the test scores do not seem reliable from the students’ perspective, because their performance on the test was affected by a number of objective and subjective factors.

All of the findings for the three research questions mentioned above lead to the conclusion that the MCQs test 1 does not yield a reliable result. The unreliability of the test resulted from the performance of both test-takers and test designers. As for the test designers, they produced a test of low quality: the allocation of items by difficulty among the four sections was not reasonable, and the items were not really discriminating. As for the test-takers, they did not perform the tasks well.
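The two item statistics referred to above, item difficulty and item discrimination, can be illustrated with a short sketch. It is not the procedure used in this study. It assumes dichotomously scored items (1 = correct, 0 = incorrect), takes the difficulty index as the proportion of correct answers (the p-value), and uses the common upper-lower group index with 27% groups for discrimination. The function names and the sample data are hypothetical.

```python
from typing import List

Responses = List[List[int]]  # one row per student, one 0/1 score per item


def item_difficulty(responses: Responses) -> List[float]:
    """Return the p-value of each item: the proportion of correct answers."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def item_discrimination(responses: Responses, group_fraction: float = 0.27) -> List[float]:
    """Return the upper-lower discrimination index D of each item.

    D = (correct answers in the upper group - correct answers in the
    lower group) / group size, where the groups are the top and bottom
    `group_fraction` of students ranked by total score (27% is an assumed,
    conventional choice).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    group_size = max(1, round(n_students * group_fraction))
    ranked = sorted(responses, key=sum, reverse=True)  # best total score first
    upper, lower = ranked[:group_size], ranked[-group_size:]
    return [
        (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / group_size
        for i in range(n_items)
    ]


# Hypothetical scored responses: 6 students x 4 items (not data from this study).
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for number, (p, d) in enumerate(zip(item_difficulty(sample), item_discrimination(sample)), start=1):
    print(f"item {number}: p = {p:.2f}, D = {d:.2f}")
```

In this reading, an item answered correctly by almost everyone or almost no one has an extreme p-value, and an item that strong and weak students answer equally well has a D close to zero; both kinds of item contribute little to separating students by ability.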
Notably, according to the findings obtained from the comparison and analysis of test item content, there is high relevance between the test and the course book, especially in the reading section. However, the findings from the student questionnaire show that students did not perceive the test content as relevant to what they had been taught, especially the reading part. It is likely that disturbances experienced by students during the test, such as pressure, difficulty in reading texts on computers and response arbitrariness, led them to believe that the content of the test was only about 50% relevant to what they had learnt and that their test scores did not reflect their true ability.

Considering all aspects, the MCQs test 1 has one good point: it is valid in terms of content. Nevertheless, this is not enough to conclude that it is a good test, as it lacks reliability.

5.4. Pedagogical implications and suggestions on improvements of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT

In this section, some suggestions are offered to test designers to improve the quality of the final achievement MCQs test 1.

A good achievement test must be valid and reliable. In order to make the achievement test more valid, test designers should adhere to the course objectives of developing speaking and listening skills when designing achievement tests. According to Table 4 (section 3.2.2), which illustrates time allocation and skill weighting, speaking and listening are the main focuses of the course book Market Leader Pre-intermediate; these skills should therefore be tested with a correspondingly proportionate weight. Furthermore, the functional language section should be removed from the MCQs test, as it is far from real-life situations. In fact, appropriate responses to various stimuli in everyday situations should be produced rather than chosen from a limited set of responses. Instead, functional language should be included in speaking tests.
The scoring format for semester 4 should be as follows:

Semester 4 (12 credits)

| Component | The first score (6 credits) | The second score (6 credits) |
| --- | --- | --- |
| Oral test | 25% | 25% |
| Paper test 1 | 35% | 35% |
| - listening | 25% | 25% |
| - writing | 10% | 10% |
| Computer-based MCQs test 1 | 45% | 45% |
| - reading | 20% | 20% |
| - vocabulary | 10% | 10% |
| - grammar | 15% | 15% |

Table 14: Suggested scoring format

This suggested scoring format is expected to uphold the principle of “test what is taught”.

In order to improve test reliability, it is necessary to establish a testing committee of three to five people who will be responsible for test construction, administration and development, instead of only one person as at present. The testing committee should be made up of members with good knowledge, skills and experience of making MCQs tests. They are recommended to pay attention to the following three issues.

First, the testing committee members should on the one hand design MCQs tests themselves and on the other hand require teachers to make their own tests. Teachers should be trained by the committee members in test design and test development techniques for vocabulary, grammar and reading so that they can construct tests of good quality. This can be done through regularly held workshops. The main reason is that no one understands the students’ strengths and weaknesses better than these teachers; the tests they make can therefore be expected to be reliable and practical for the students. Both committee members and teachers need to establish students’ language levels in order to maximize test efficiency. This information will help them avoid designing items with undesirable difficulty and discrimination values. In addition, the content of the test should, as far as possible, relate to and be familiar from what the students are taught and learn during the course.
The test should also be built systematically on the basis of a carefully constructed test specification.

Second, the test items should be carefully examined for relevance to the course book content, and only acceptable items should then be selected and piloted with students. The trial can be conducted in classrooms under strict supervision, and it is preferable to let students do the test on computers in order to help them become familiar with reading soft-copy texts and to reduce their pressure.

Third, the results obtained from the trials should be carefully analyzed and discussed in terms of test difficulty, test discrimination, instructions, time allowance and distractors in order to decide which items are good enough to be put into an item bank and which need adjusting first. The item bank can thereby guarantee variety of test choices, test quality and test confidentiality. Last but not least, the item bank needs to be updated, supplemented and adapted with items of good quality, especially after the achievement tests are given to students each semester, so that a standardised item bank is consolidated and developed.

* A proposed Specification Grid for the final achievement computer-based test 1 for the 4th semester non-English majors at HUBT

Based on the findings of the study and the course objectives, a test specification for the current 4th-term English achievement MCQs test 1 is proposed as follows, so that more accurate measures of students’ language competence can be achieved.

The objectives of the final achievement objective test 1 for the 4th-term non-English majors include:

Checking what the students have learnt about vocabulary, grammar and reading, and to what degree the objectives of the course have been achieved in the set timeframe.

Assessing students’ achievement at the end of the course, especially evaluating students’ results after learning three units of the Market Leader Pre-intermediate book.
Giving students feedback. The test results will be useful for students to see what they have achieved in their learning process.

Identifying room for improvement in both teaching content and teaching methodology. That is, teachers will refer to their students’ scores and errors to adapt their teaching methods, the syllabus content and materials so as to make them more appropriate to their students’ needs and abilities.

The following is the specification grid for this test.

Achievement test: Paper specifications grid
Time allowance: 150 minutes
Level: Pre-intermediate for non-English majors (Hanoi University of Business and Technology)
Test of Reading, Grammar and Vocabulary

| Section | Main skill focus | Input | Response/item type | Number of marks | Skill weighting |
| --- | --- | --- | --- | --- | --- |
| 1. Reading | Reading for gist/specific information, including topics closely related to marketing, planning and managing | Narrative or factual text, approx. 60-80 words each | × 41, 4-option multiple choice | 4.1 | 41% |
| 2. Grammar | Recognizing grammar items involving wh-questions, future expressions and reported speech | Narrative or factual text, approx. 15-20 words each | × 32, 4-option multiple choice | 3.2 | 32% |
| 3. Vocabulary | Recognizing vocabulary items including noun-noun, verb-noun and verb-preposition collocations | Narrative or factual text, approx. 15-25 words each | × 27, 4-option multiple choice | 2.7 | 27% |

Table 15: Proposed test specifications

5.5. Summary

In this chapter, results and conclusions regarding the three research questions of the study are presented and discussed. The findings show that the final achievement computer-based MCQs test 1 for second-year non-English majors at HUBT is, to a certain extent, not reliable. Some suggestions are therefore given to test designers to make the test more reliable and of higher quality.

Chapter 6: Conclusion

6.1. Summary of the findings

Test reliability is undeniably an important criterion for defining the quality of a test.
The investigation and evaluation of the reliability of the final achievement computer-based MCQs test 1 are therefore useful for judging the quality of the teaching, learning and testing process at HUBT. Using data collected from students’ test scores and item responses, the author sought answers to three research questions, concerning the compatibility of the test objective, test content and test format with the course objectives and the syllabus content, the extent to which the test scores are reliable, and the students’ attitude towards the test, in order to reach a final conclusion about the reliability of the test.

The findings indicate that the MCQs test 1 is not a good test, first of all because the test objective is not compatible with the course objectives. The skill weighting of the test and that of the syllabus are also incompatible. The four sections of the MCQs test 1 cover language items in the course book, but the relevance of the coverage is still problematic.

In addition, the MCQs test 1 fails to meet one of the most important criteria: reliability. The unreliability arises from several problems. First, the test items are of low quality as a result of low item difficulty and item discrimination values. Item analysis and students’ perception of the test discrimination indicate that the test does not have good discrimination value. Students’ perception and the reliability coefficient of the MCQs test 1 both show that the test scores students get are unreliable. Second, several characteristics of the test items, testees and test administration, as perceived by students, reduce the reliability of the test scores: the large number of test items, low content relevance to the course book, response arbitrariness, pressure, the ability to read text on the screen, and computer breakdown.

On the basis of these results, the author provides some suggestions for improving the quality of the test.
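As an illustration of what a reliability coefficient for internal consistency measures, the sketch below computes the Kuder-Richardson formula 20 (KR-20) for dichotomously scored items:

KR-20 = (k / (k - 1)) × (1 - Σ p_i q_i / σ²)

where k is the number of items, p_i is the proportion of correct answers on item i, q_i = 1 - p_i, and σ² is the variance of students’ total scores. This summary does not name the formula behind the coefficient reported in the study, so the choice of KR-20 is an assumption, as are the use of the population variance, the function name and the sample data.

```python
from statistics import pvariance
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """Return the KR-20 internal consistency coefficient.

    `responses` holds one row per student and one 0/1 score per item.
    The population variance of total scores is used (an assumed convention).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")

    # Sum of item variances p * q over all items.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Hypothetical scored responses: 6 students x 4 items (not data from this study).
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(sample):.3f}")
```

Values closer to 1 indicate that the items rank students consistently. Arbitrary responses, as reported by a majority of the students in the questionnaire, add noise to the item scores and would be expected to lower such a coefficient.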
The reliability of the final achievement MCQs test 1 for second-year non-English majors may be increased if it is constructed to be more relevant to the course objectives and syllabus, and if test items are designed by an efficient testing committee and drawn from an item bank of items with good p-values and discrimination values. The author hopes that the study will give a detailed view of the computer-based MCQs tests administered at HUBT and that the suggestions for test improvement will be put into practice in order to properly assess students’ actual language ability during the process of learning Market Leader Pre-intermediate.

6.2. Limitations of the study

The study on the reliability of the final achievement computer-based MCQs test does contain some unavoidable limitations. Firstly, owing to limited time and the scope of a minor MA thesis, this thesis investigated only one of the many facets of test reliability, namely internal consistency reliability. Secondly, the test item analysis does not include a distractor tally, which could have given a much deeper view of the test, because access to these data was impossible. Finally, the author only developed a questionnaire to evaluate test reliability from the students’ perspective. If the attitudes and perceptions of the teachers towards the test had been studied, the results would have been more comprehensive.

6.3. Suggestions for further study

Considering the importance of testing and the fact that the computer-based MCQs test is to a certain degree unreliable, further research is needed to study its effects on language learning and assessment and to identify coping strategies that help students advance their learning of the four English skills while MCQs are still employed as a very useful testing technique.

References

1. Alderson, J.C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge University Press.
2. Bachman, L.F. (1990).
Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
3. Bachman, L.F. and Palmer, A.S. (1996). Language testing in practice: designing and developing useful language tests. Oxford: Oxford University Press.
4. Brown, H.D. (1995). Teaching by principles: An Interactive Approach to Language Pedagogy. London: Longman.
5. Cotton, D., Falvey, D. and Kent, S. (2002). Market Leader Pre-intermediate. Longman.
6. Harrison, A. (1983a). A Language Testing Handbook. London: Macmillan Press.
7. Heaton, J.B. (1988). Writing English Language Tests. Longman Group U.K.
8. Heaton, J.B. (1990). Classroom testing. New York: Longman.
9. Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Cambridge: Newbury House Publishers.
10. Hien, T.T. (2005). The pros and cons of the multiple-choice testing technique with reference to methodological innovation as perceived by secondary English language teachers and students. Unpublished M.A. Thesis, VNU.
11. Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University Press.
12. Kunnan, A.J. and Milanovic, M. (2000). Fairness and validation in language assessment: selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Cambridge: Cambridge University Press.
13. Lynch, B.K. (2003). Language assessment and programme evaluation. Edinburgh: Edinburgh University Press.
14. McCoubrie, P. (2004). Improving the fairness of multiple-choice questions: a literature review. Medical Teacher, Vol. 26, No. 8, pp. 709-712.
15. McNamara, T. (2000). Language Testing. Oxford: Oxford University Press.
16. Milanovic, M. (1999). Issues in computer-adaptive testing of reading proficiency. Cambridge: Cambridge University Press.
17. Milanovic, M. and Saville, N. (1996). Performance testing, cognition and assessment: selected papers from the 15th Language Testing Research Colloquium (LTRC), Cambridge and Arnhem. Cambridge: Cambridge University Press.
18. Spolsky, B. (1995).
Measured words. Oxford: Oxford University Press.
19. Trang, H.V. (2005). Evaluating the reliability of the achievement writing test for the first-year students in the English Department, College of Foreign Languages, Vietnam National University, Hanoi and some suggestions for changes. Unpublished M.A. Thesis, VNU.
20. Weir, C.J. (1990). Communicative Language Testing. Prentice Hall International (UK) Ltd.
21. Weir, C.J. (2005). Language testing and validation: an evidence-based approach. Basingstoke: Palgrave Macmillan.