AN INTERACTIVE STUDENT EVALUATION SYSTEM

Mimi M. Recker
Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Mimi.Recker@vuw.ac.nz

John Greenwood
Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
John.Greenwood@vuw.ac.nz
Abstract:
As we move into an era of "Quality Assurance," universities and schools are increasingly being called upon to assess the quality of their courses and teaching. Often, this assessment takes the form of student evaluations. This paper describes a pilot project at Victoria University of Wellington that uses the Web for conducting and collating results from student evaluations. Using the Web to successfully survey users is not new. However, due to the sensitive nature of these data and their critical role within and outside of the University, the long-term success of our project is intimately tied to the development of humane technologies that seamlessly fit into the work-flow of the institution. We describe the architecture of our system, and how its design and implementation attempted to address its many technical, administrative, and organisational requirements.

Keywords:
new applications, surveys, user studies, education.

INTRODUCTION

As we move into a new era of "Quality Assurance," universities and schools are increasingly being called upon to evaluate the quality of their courses and teaching. One common method is via student evaluations. At Victoria University of Wellington (VUW), student evaluations have been conducted since 1988. Currently, they take the form of paper-based questionnaires in which students answer questions on a 5-point Likert scale and, optionally, write in their comments. In 1994, over 52,000 student evaluations were collected, totalling close to half a million individual responses (Hall and Turner, 1995). The results from such evaluations are important in improving the structure, content, and teaching of courses. They are also important in the staff promotion process. However, the number of evaluations requested per year continues to increase rapidly and, as a result, the overhead of processing such information manually is a growing burden.

This paper describes a pilot project at VUW that uses the World-Wide Web (Berners-Lee et al., 1994) as an architecture for conducting and collating results from student evaluations of courses and teaching. Using the Web to successfully survey users is not new (Pitkow and Recker, 1995). However, due to the sensitive nature of these data and their critical role within and outside of the University, the system must satisfy several requirements if it is to survive. Some of the requirements are technical, for example providing easy access to students. However, many of the requirements are organisational, and tied to the local context of the University. As is the case with most technological innovations, long-term success is intimately tied to the development of "humane" technologies that seamlessly fit into the work-flow of institutions.

In the remainder of this paper we describe the architecture of our system, and how its design and implementation attempted to address the many technical, administrative, and organisational requirements of a successful evaluation system. We describe the results from an empirical evaluation of the system involving a user study with over 80 students enrolled in two courses in the Faculty of Commerce and Business Administration. We conclude with a description of proposed extensions to the system.

EVALUATION SYSTEMS: REQUIREMENTS AND IMPLEMENTATIONS

Given the important role of student evaluations within the University's infrastructure, an evaluation system must be designed to satisfy several key requirements. These requirements derive from user, instructor, administrative, and organisational considerations.

User perspective

The system must be accessible and easy to use by students.

Instructor perspective

The system must support flexible and tailorable design of questionnaires.

Administrator perspective

The system must guarantee several aspects of students' responses. Responses must be anonymous. Prior research reports somewhat lower ratings when student responses are anonymous, especially if evaluations are administered before grade assignments are made (Feldman, 1979). Responses must also be authenticated and confidential, and there must be no more than one response per user. Additionally, the system must be reliable and robust.

Organisational perspective

The system cannot incur substantial overhead, and must provide for reliable and valid data on the "quality" of university courses.

Existing Implementation

The existing, paper-based system, like others of its kind, attempts to address many of the above requirements. Naturally, an in-class evaluation is accessible to students, and paper is, of course, a familiar medium. For staff, the system is flexible and tailorable; however, this flexibility comes at the cost of close interaction between the instructor and the evaluation administrator. There are also staff costs, as instructors are responsible for distributing and collecting evaluations and then returning them to the administrator for collation and analysis.

Anonymity of response is preserved because students do not write their names on the evaluation form, although there is the danger, especially with smaller classes, that handwriting may be recognised. The paper-based system assumes authentication because the evaluations are carried out during normal class times; with larger classes especially, there is no guarantee that all persons present are valid participants in the course. Confidentiality relies on neighbouring respondents not looking at others' evaluation forms. One response per student cannot be guaranteed, especially amongst larger groups. Finally, data collation is expensive, incurring laborious and tedious data-processing costs.

Pilot Implementation

Our pilot system addresses the above requirements as follows:

User perspective

Fortunately, the Web and HTML 2.0 with Forms provide a convenient, point-and-click interface for collecting on-line student responses. This means that the system is intuitive and easy for students to use.

The system is easily accessible by students as it is available via VUW's campus-wide Web server, Panui. As the number of computer labs, campus modems, and classrooms with network drops continues to increase, a cross-platform, networked, client-server evaluation system becomes the most viable solution.

Instructor perspective

The system maintains the flexibility and tailorable design of the current paper-based system.

Administrator perspective

The system was designed to guarantee confidentiality and anonymity of student responses, while allowing no more than one response per user. We were reluctant to use student identification numbers for controlling access to the system, as we felt this could compromise student anonymity. Instead, as we will describe, students were randomly assigned codes, which they used to access the system. Authentication was assumed because of the relatively small sizes of the classes participating in the pilot.

Organisational perspective

Clearly, on-line collection and collation of data improves efficiency, reduces paper-related costs, and eliminates the possibility of data-entry error. The system provides a means to reduce staff workload, while remaining flexible and tailorable.

Student response data are automatically collated, processed, and logged via Common Gateway Interface (CGI) scripts that reside on the campus Web server. This provides a reliable and robust method of data collection.

PILOT SYSTEM: ARCHITECTURE

Interaction scenario

A typical user scenario conveys a sense of the interaction with the system. First, the student is randomly given an access code and directed to a Forms-compatible Web browser pointing to a scheduled evaluation. This entry screen presents a type-in box into which the student enters the code. If the code is both valid and unused, the student is presented with a screen confirming the code's validity. The student then clicks on a button to continue, and the first evaluation is presented. After completing and submitting the first evaluation, the student is either presented with a "thank you" note or asked to click on a button to proceed to the next evaluation. This continues until all required evaluations for the student's course have been completed.

Architecture

In choosing the Web as our evaluation infrastructure, we were presented with several architectural challenges. The first challenge involves allowing access to the correct course or teaching evaluation by authorised students only. Moreover, access must guarantee anonymity of student responses, while allowing no more than one response per user.

Student anonymity is implemented by randomly assigning a code to students from a pre-determined list, which they use to enter the system. Before allowing the student to complete the evaluation, the system checks the validity of the code and ensures that it has not previously been used. This code also corresponds to specific courses, and is thus also used to determine the set of evaluations that is presented to the student. While we are not entirely satisfied with this approach, we have yet to determine a better solution that preserves student anonymity.

The second challenge involves the infusion of state information into the stateless HTTP protocol. State is required to keep track of which evaluation forms should be presented to a student based on his/her anonymous code, and of which evaluations the student has already completed. Following an earlier approach (Pitkow and Recker, 1995), we use input fields of TYPE "hidden" in HTML 2.0 forms. Such fields pass state information from a form to a CGI program in a way that is invisible to users.
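
To make this concrete, the following sketch shows, in the spirit of the pilot's GNU C CGI scripts, how a script might emit a form whose hidden fields carry the student's anonymous code and the number of the next evaluation. The field and script names (eval_code, form_number, next_eval.cgi) are illustrative assumptions rather than the pilot system's actual identifiers.

/* Minimal sketch of passing state via hidden form fields from a C CGI
 * script.  All names are illustrative assumptions. */
#include <stdio.h>

static void emit_evaluation_form(const char *code, int next_form)
{
    /* Every CGI response begins with a content-type header. */
    printf("Content-type: text/html\r\n\r\n");
    printf("<html><body>\n");
    printf("<form method=\"POST\" action=\"/cgi-bin/next_eval.cgi\">\n");

    /* The state that stateless HTTP cannot remember for us: the
       student's anonymous code and which evaluation comes next. */
    printf("<input type=\"hidden\" name=\"eval_code\" value=\"%s\">\n", code);
    printf("<input type=\"hidden\" name=\"form_number\" value=\"%d\">\n", next_form);

    /* ... radio buttons, menu selections, and text boxes for the
       evaluation questions would be emitted here ... */

    printf("<input type=\"submit\" value=\"Submit evaluation\">\n");
    printf("</form></body></html>\n");
}

int main(void)
{
    /* In a real script the code would be read from the previous form's
       POST data; a fixed value keeps the sketch self-contained. */
    emit_evaluation_form("A1234", 1);
    return 0;
}

When the student submits the form, the script receiving the POST data recovers these hidden values and so "remembers" where the student is in the sequence of evaluations.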

Figure 1. Architectural components of the Web evaluation system.

Figure 1 shows a sketch of the major architectural components of the evaluation system. The evaluation system resides on VUW's campus Web server, Panui. The server runs NCSA's HTTPd version 1.3 on an SGI Indigo (MIPS R4400) under IRIX Release 5.3 with 256 megabytes of RAM. All CGI scripts are written in GNU C. All forms use the POST method in order to avoid the hard-coded limits on URL length present in some browsers.

The first script, the Code Checker, checks the code entered by the student against its master list of codes. If the code is valid and has not yet been used, the student is allowed to proceed. Otherwise, an error message is returned to the user via an HTML file.
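
The central test performed by such a script might resemble the following sketch, which assumes a master file containing one record per line of the form "<code> <used-flag> <course-id>"; the file format, file name, and function names are assumptions made for illustration only.

/* Sketch of the Code Checker's central test: look the submitted code up
 * in a master list and reject codes that are unknown or already used. */
#include <stdio.h>
#include <string.h>

enum code_status { CODE_VALID, CODE_USED, CODE_UNKNOWN };

static enum code_status check_code(const char *path, const char *code)
{
    FILE *fp = fopen(path, "r");
    char stored[32], course[32];
    int used;

    if (fp == NULL)
        return CODE_UNKNOWN;

    /* Scan the master list: "<code> <used-flag> <course-id>" per line. */
    while (fscanf(fp, "%31s %d %31s", stored, &used, course) == 3) {
        if (strcmp(stored, code) == 0) {
            fclose(fp);
            return used ? CODE_USED : CODE_VALID;
        }
    }
    fclose(fp);
    return CODE_UNKNOWN;
}

int main(void)
{
    /* A real script would read the code from the POST data and, on
       success, mark it as used before presenting the first evaluation. */
    switch (check_code("codes.master", "A1234")) {
    case CODE_VALID:
        printf("Content-type: text/html\r\n\r\n<p>Code accepted.</p>\n");
        break;
    case CODE_USED:
        printf("Content-type: text/html\r\n\r\n<p>This code has already been used.</p>\n");
        break;
    default:
        printf("Content-type: text/html\r\n\r\n<p>Invalid code.</p>\n");
    }
    return 0;
}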

Once a valid code has been entered, the Evaluation Presenter script is executed. This script presents to the student the first evaluation for that student's course. The evaluation contains radio buttons, menu selections, and text-entry boxes. Once the student submits the responses, the data are processed and stored: button data are stored in matrix format, while text-entry data are stored in a separate file. Students are then presented, in order, with the remaining evaluation forms for their course until all have been completed.
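
The storage step can be kept very simple. The sketch below appends the numeric (button) responses as one row of a matrix file and any free-text comment to a separate file, in line with the two storage formats described above; the file names and fixed question count are assumptions for illustration.

/* Sketch of logging one student's responses: numeric answers go into a
 * matrix file (one row per respondent), comments into a separate file. */
#include <stdio.h>

#define NUM_QUESTIONS 8

static int log_responses(const int answers[NUM_QUESTIONS], const char *comment)
{
    FILE *matrix = fopen("responses.dat", "a");
    FILE *text   = fopen("comments.dat", "a");
    int i;

    if (matrix == NULL || text == NULL) {
        if (matrix) fclose(matrix);
        if (text)   fclose(text);
        return -1;
    }

    /* One row per respondent, one column per question. */
    for (i = 0; i < NUM_QUESTIONS; i++)
        fprintf(matrix, "%d%c", answers[i], i == NUM_QUESTIONS - 1 ? '\n' : ' ');

    /* Comments are kept apart so the matrix file stays purely numeric. */
    if (comment != NULL && comment[0] != '\0')
        fprintf(text, "%s\n", comment);

    fclose(matrix);
    fclose(text);
    return 0;
}

int main(void)
{
    int sample[NUM_QUESTIONS] = {1, 2, 2, 3, 1, 2, 4, 2};
    return log_responses(sample, "The pace of lectures was about right.") != 0;
}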

PILOT SYSTEM: EMPIRICAL EVALUATION

The evaluation system was evaluated via a user study involving over 80 students enrolled in two courses in the Faculty of Commerce and Business Administration. The primary goal of the study was to test the technical reliability and robustness of the system. From an administrative point of view, we also wished to determine how easily the new system would integrate into the existing and ongoing evaluation process. Finally, we wished to compare possible differences between students' ratings in computer-based versus paper-based questionnaires. While we did not expect the evaluation medium to significantly affect students' ratings (Bridgeman and Schaeffer, 1995), it is crucial to empirically examine this possibility in order for the on-line system to gain acceptance by students, staff, and administration.

Method

In the study, students in two courses completed course and teaching evaluations using the Web-based system. In the first course, 13 students completed a lecturer evaluation. The evaluation contained 8 questions, drawn by the course instructor from the University's standard question database. As with all evaluations, questions were answered on a 5-point Likert scale, from "1. strongly agree" to "5. strongly disagree." This was implemented using radio buttons and menu selections. This first evaluative study served primarily as a feasibility study of the technology.

In the second course, 69 students completed course and lecturer evaluations. The evaluations contained 8 questions on a 5-point scale, again selected by the course instructor. In addition, the course evaluation contained two text-entry boxes into which respondents could type comments.

The second course was used as an opportunity to compare possible differences between students' ratings using computer-based versus paper-based questionnaires. Half of the students were randomly assigned to complete the lecturing evaluation using the Web-based system, followed by the paper-based course evaluation. The other half first completed the paper-based lecturing evaluation, followed by the Web-based course evaluation. In this way, all students provided responses using both media.

In addition, after finishing the Web-based evaluations, students were presented with an additional form containing 2 radio-button questions. The first question asked students to rate the ease-of-use of the system; the second asked students to rate how well they felt their anonymity was preserved. As above, questions were answered on a 5-point scale.

On the scheduled day, students were asked to proceed to a reserved computer laboratory, which contained networked PCs running a Windows version of Netscape 1.0. Prior to entering the lab, students were randomly given an index card containing a code. The computers in the lab were pre-loaded with the evaluation entry page; students entered their codes and completed the evaluations. Students were encouraged to ask questions if they were confused or encountered problems, though this did not occur. They then returned to their regularly scheduled course. The paper-based evaluations were simply completed in the students' usual lecture theatre.

Results: The medium is not the message

Since the purpose of the first study was to test the reliability of the technology, only data from the second study are reported. Due to an administrative error when randomly distributing codes, not all students completed all evaluations. In addition, the data from 4 students were excluded due to a system error. Sixty-nine students completed the evaluation of lecturing, with 35 using paper and 34 using the Web-based system, while 59 students completed the course evaluation, with 39 using paper and 20 using the Web-based system.

The primary purpose of the second study was to determine whether the medium used for the evaluation (paper vs. computer) affected student responses. To check the reliability of this comparison, three statistical analyses were performed. In brief, none of the analyses showed significant differences between media in students' responses to the two evaluations.

First, a repeated measures ANOVA with evaluation questions as the repeated measure and medium as the independent variable was performed. For both evaluations, the ANOVA was not significant (both F's < 1).

Second, independent t-tests on responses to each of the questions for the lecturing evaluation showed no significant differences between media (all t's < 1). The course evaluation also showed no significant differences, though the response to one question tended toward significance. For reasons that are unclear, the students using the Web system gave slightly higher responses to this particular question, t(57) = 1.45, p = .15.

Third, because responses were ordinal values on a scale from 1 to 5, we also performed a non-parametric analysis. Again, no significant differences were found between responses to the lecturing evaluation. Responses to the course evaluation showed no significant differences, except that, as above, the response to one particular question tended toward significance, G(1) = 2.02, p = .15.

                  N       Median     Mean      St. Dev.   
Ease-of-use       42      2          2.21      1.07       
Anonymity         37      3          2.81      0.96       

Table 1. Results from system evaluation (on a scale from 1 to 5).

Lastly, only 42 students answered the 2 questions that asked respondents to rate the system. Recall that questions were answered on a 5-point scale, from "1. strongly agree" to "5. strongly disagree." Students appeared to find the system generally easy to use. Surprisingly, they seemed much less convinced of the system's ability to preserve anonymity: the median score indicates that many students felt their anonymity was not preserved (see Table 1). Thus, despite our efforts, the "perceived" anonymity of student responses remains problematic.

LIMITATIONS AND FUTURE WORK

In this paper, we described a prototype Web system for conducting student evaluations of courses and teaching. We also presented results from an empirical evaluation of the system involving over 80 students in 2 courses. The results suggest that, overall, the evaluation medium does not affect the reliability of students' responses. Students found the system easy to use but, despite our efforts, appeared somewhat concerned about the anonymity of their responses.

Our pilot project demonstrates the use of the Web in a real-world, time-critical task. However, several issues and shortcomings need to be addressed if we wish our project to penetrate successfully into the existing administrative and organisational culture of the university. In particular, long-term success depends on demonstrating that the new approach offers significant savings over the existing system. People and organisations are reluctant to adopt new technologies unless expected benefits far exceed perceived costs.

Our approach demonstrates savings in terms of paper-related and data-processing costs. While these are certainly important considerations, we believe that long-term penetration will best be achieved through the potential of offering evaluations that are free from time and place constraints. Beginning in 1996, VUW will allow full network access to all students from any lab on campus, and will offer increased remote access facilities. Thus, Web-based evaluations could be made available during specific time periods for students to complete at their leisure. With easy network access, students could complete the evaluations from a lab or from home.

However, with remote access, steps must be taken to ensure that only enrolled students can complete the appropriate evaluations. Authenticated access requires the use of student identification numbers and PINs. While this method removes anonymity of response, confidentiality is maintained since the evaluation system is administered by a third party and course instructors only see aggregated results. This method would also replace the "anonymous code" approach used in the pilot system, which we felt was clumsy.

Finally, a Web-based evaluation system raises new issues related to the self-selection problem. No survey methodology is immune from biased results, which arise when a skewed sample of the population chooses to respond to a survey. Electronic, "asynchronous" surveying is uncharted territory, and research is needed to identify potential biases that may arise.

It would be desirable to couple our system with a suite of tools that enables instructors to select questions from a questionnaire database in order to generate a personalised HTML/Forms evaluation for use in their courses. The resulting evaluation would use the Web-based infrastructure for reliable, confidential, and automatic collation of student responses.
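
As a rough indication of what such a tool might generate, the sketch below takes a small set of question texts (stand-ins for entries selected from the standard question database) and writes an HTML 2.0 form with 5-point radio-button scales. The question texts, output file name, and CGI target are purely illustrative assumptions, not a specification of the proposed tool.

/* Sketch of generating a personalised HTML/Forms evaluation from a list
 * of selected questions.  All names and texts are illustrative. */
#include <stdio.h>

static void emit_question(FILE *out, int number, const char *text)
{
    int point;

    fprintf(out, "<p>%d. %s</p>\n", number, text);
    /* A 5-point scale rendered as radio buttons. */
    for (point = 1; point <= 5; point++)
        fprintf(out, "<input type=\"radio\" name=\"q%d\" value=\"%d\"> %d\n",
                number, point, point);
}

int main(void)
{
    const char *selected[] = {
        "The course objectives were clearly stated.",
        "The lecturer communicated enthusiasm for the subject."
    };
    int count = (int)(sizeof(selected) / sizeof(selected[0]));
    int i;
    FILE *out = fopen("evaluation.html", "w");

    if (out == NULL)
        return 1;

    fprintf(out, "<html><body>\n"
                 "<form method=\"POST\" action=\"/cgi-bin/collate.cgi\">\n");
    for (i = 0; i < count; i++)
        emit_question(out, i + 1, selected[i]);
    fprintf(out, "<input type=\"submit\" value=\"Submit\">\n"
                 "</form></body></html>\n");
    fclose(out);
    return 0;
}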

REFERENCES

Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H., and Secret, A. (1994). The World-Wide Web. Communications of the ACM, 37(8):76--82.

Bridgeman, B. and Schaeffer, G. (1995). A comparison of gender differences on paper-and-pencil and computer-adaptive versions of the Graduate Record Examination. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, California.

Feldman, K. (1979). The significance of circumstances for college students' ratings of their teachers and courses. Research in Higher Education, 18:3-124.

Hall, C. and Turner, I. (1995). Report on Evaluation of Teaching Procedures-1994: Report to the Assistant Vice-Chancellor. Victoria University of Wellington.

Pitkow, J. and Recker, M. (1995). Using the Web as a Survey Tool: Results from the Second WWW User Survey. Journal of Computer Networks and ISDN Systems, 27(6).

AUTHOR INFORMATION

Mimi Recker received her Ph.D. from the University of California, Berkeley. For two years she was a Research Scientist in the College of Computing at Georgia Tech, and is presently a lecturer at Victoria University of Wellington. Since her first job as a software engineer on the Arpanet project (the early predecessor to the Internet), she has been interested in the technological, social, and educational impacts of global information technologies. She has authored over a dozen technical papers on the subject.

John Greenwood is a senior lecturer in Information Systems and Collaborative Learning Technologies and Director of the Diploma in Information Systems programme at Victoria University. John has researched IS planning and management and artificial intelligence applications in business, and is currently examining issues arising from electronic commerce and the information superhighway.