Learning assessment can affect students in different ways, influencing their self-esteem, their motivation to learn, and their attitudes towards the teacher. In particular, it affects learning, even more than teaching itself might, because it sends powerful messages about the discipline being evaluated: it acts as an intermediary that gives relevance to, and emphasizes, certain knowledge, abilities, and attitudes over others.
In this respect, it is important to point out that different methods of assessment lead students to prepare for their studies in different ways, and thus to engage with knowledge through those means, and they also promote different perceptions of students' own capacities. The strategies students adopt depend on factors such as their interest in the subject, the nature of their academic motivation, and their perceptions of what will be expected of them in evaluations. In addition, each professor's distinctive questioning style calls for different responses, and these responses, like the knowledge itself, depend on how the teacher teaches in class and on his or her style of assessment. Many studies have concluded that, for students, assessment is the clearest and most direct way of knowing the real intentions of their professors; that is, assessment gives meaning to the curriculum (Gulikers, Bastiaens, Kirschner & Kester, 2006; Entwistle, 2000; Goñi, 2000; Scouller, 1998; Thomson & Falchikov, 1998). In other words, any content that is taught but not evaluated develops with difficulty, because students shift their attention and effort towards the contents and abilities that are evaluated. Among the most important findings on the influence of assessment is, first, the value of formative evaluation in improving student learning at different levels of schooling, with special emphasis on feedback, regardless of the form it takes (Black & Wiliam, 1998).
Second, Crooks (1988), in his work on formal and informal evaluation, highlights that, under certain conditions, the influence of assessment can be positive in the short, medium, and long term: it can help students focus their efforts on what is most important, monitor their own progress and develop self-assessment skills, stay motivated to learn, develop study strategies, and even shape their perceptions of their own abilities and of future success or failure. Another important body of research emphasizes the relationship between the characteristics of assessment as perceived by students and their approaches to learning. This research establishes that assessment methods influence how students approach their studies and, therefore, how their learning develops. Specifically, evaluation procedures that students perceive as inappropriate tend to encourage surface approaches to learning (Struyven, Dochy & Janssens, 2005). For example, Scouller (1998), in a quantitative study, found significant differences in students' perceptions of two types of written exam, and established that students might adopt surface approaches when preparing for a multiple-choice test and deeper approaches when preparing for an essay, since the essay provides a more appropriate learning context in which students must demonstrate higher-level communicative skills. Some authors state that, ideally, when faced with an evaluative task or question, students bring a large repertoire of different types of knowledge and ability into play and interact with the form and content of the task or question according to what the response requires (Shavelson, Ruiz-Primo, Li & Cuauhtemoc, 2003; Camilloni, Basabe & Feeney, 2009).
However, well aware of the fundamental importance that grades hold for their present and future lives, students begin early to develop survival strategies around grades and, more specifically, around how to respond correctly to assessment procedures. Tang (1994) calls this adaptive effect "backwash"; many assessment tasks, with their different levels of demand, can produce it in students, and the study concludes that one of its consequences is a search for clues that allow students to obtain a better score.
The influence of assessment procedures on learning
Technically, an assessment procedure is any means by which information about student learning is collected. The subsequent analysis of that information, beginning with its comparison against assessment criteria, allows judgments to be made about the quality of both learning and teaching, and supports well-founded decisions (Himmel, Olivares & Zabalza, 1999). Nonetheless, assessment procedures cannot be discussed from a purely technical point of view, since the stages of design, construction, application, and subsequent correction are loaded with subjectivity. Thus, the way a professor presents tasks or questions to students, the disciplinary content evaluated, the assessment format chosen, and even the weighting and grading scale (if one is used) are not neutral or empty elements; rather, they are closely related to the professor's beliefs about students, the field taught, learning, teaching in general, and assessment in particular. At the same time, students face assessment processes from their own perspectives, which they have constructed over their school careers. Written exams seem to be the most frequently used assessment format in the humanities and sciences in many countries (Barberá, 2002).
In Chile, the written tests used in assessment are generally constructed and applied by the teachers themselves. This could be considered an advantage, since it makes it easier to keep what is taught coherent with what is evaluated (instructional validity). That said, the most widely used test types are objective-style tests or questionnaires with closed or semi-closed questions; they reflect what might be called a more or less traditional, technically oriented view of assessment, which opens a qualitative gap between the general concepts of the curriculum and concrete evaluative practices (Barberá, 2002). In mathematics specifically, written tests tend to be designed with questions that go from "easier" to "more difficult," that is, in ascending order of cognitive demand: the first questions address aspects learned by memorization, then come those that require calculations, and finally those that call for applications, commonly known as "problems" (Yañez, Castro, Castillo, Catalán & González, 2008). At the Latin American level, research by Beatriz Picaroni in K-8 classrooms across different subject areas in eight countries characterizes mathematics assessment as a set of tasks centered on recognizing names and characteristics, especially in geometry, in which any application is a mechanical exercise rather than a meaningful one, little more than an isolated calculation with no real context. This reduces the chances that children can transfer their knowledge to different situations and adapt it to new needs (Picaroni & Loureiro, 2010).
Similarly, in research carried out in Spain with students and teachers in elementary and high school settings, Remesal (2006) concluded that most teachers believe the assessment of mathematics learning should focus on the final result, in line with the prevailing view of mathematics as an exact science. Grade-school teachers held this conception in the largest numbers. In particular, a large percentage of these teachers work with mathematical problems that have a narrative structure of the type "information, then a question about that information," followed by other activities called problems that are in fact direct questions. Notably, complex, ill-defined, open-ended tasks, in which students must formulate the situation themselves and interrelate their mathematical knowledge with that of other disciplines, are entirely absent.
Research problem
The research cited above shows that written tests, as used in practice, tend to favor rote learning. These tests overlook the fact that knowledge must be conceptually interrelated in order to be used, since that is how knowledge is structured, and they do not require students to relate what they know to previous learning or to aspects of real life. Finally, these tests shape students' conceptions of the nature of knowledge in a given discipline and of the way it is constructed. Given the importance and effects of learning assessment, particularly in mathematics (where it is carried out mainly through written tests), a descriptive study was designed to characterize the evaluated curriculum, and the way it is assessed, in sixth grade mathematics in schools in the Valparaíso region, Chile.
The specific objectives were, first, to analyze the contents and abilities assessed and graded through written evaluation; second, to determine and analyze the degree of coherence between the curriculum prescribed by the Ministry of Education and the evaluated curriculum; and third, to determine and analyze the ways in which mathematics learning is evaluated and graded through written assessment procedures. Mathematics was chosen because the abilities it involves, such as problem solving, representation, modeling, argumentation, and communication, play an important role in acquiring new skills, in building learning across disciplines, and in applying knowledge to solve problems (routine or not) in mathematics and other areas (Ministerio de Educación, Marco Curricular 2011). The study was set in the final grade of General Elementary Education because it is the level at which a stage of formal education culminates, so students should by then have achieved all the learning objectives expected of them in basic mathematics education.
Methods
Population and Sampling
The documents under study were formally defined as the written assessment procedures, or written tests, in all their formats and scope, designed and applied by teachers in different schools in the Valparaíso region to collect information about 6th grade students' mathematics learning and used for grading purposes. Although the documents were the unit to be sampled, they could not be sampled directly because the total population of written tests was unknown. For this reason, the 6th grade course was chosen as the sampling unit from which written tests would be selected.
To determine the number of 6th grade courses in the region, the list of schools on the Ministry of Education website[1], which listed a total of 883, was taken as the initial population. The final population was obtained by applying the following criteria. First, the project worked only with regular 6th grade courses; courses in juvenile correction facilities, special education schools, children's homes, remedial programs, or adult education were excluded. Second, courses in rural schools and in island territories (such as Easter Island) were excluded, since reaching them was not logistically feasible with the resources available for this research. Once these groups were excluded, the population comprised 791 6th grade courses. With a confidence level of 95%, a margin of error of 5%, and a proportion of p = 0.1 (and its complement, 1 − p = 0.9), the overall sample was 122 courses. From these 122 courses, we requested the graded written mathematics tests applied during the first semester. Assuming that each teacher gives, on average, four written tests per semester, the definitive sample was set at 488 tests. The selection design was a stratified two-stage cluster sample. The strata correspond to the political-administrative division of the Valparaíso region into the following six provinces: Los Andes, Petorca, Quillota, San Antonio, San Felipe, and Marga Marga. Table 1 shows the number of courses (sampling units) selected per stratum; each stratum is represented proportionally.
Table 1. Distribution of courses per province
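The text reports the inputs of the sampling calculation but not the formula or the selection routine. The sketch below is one reconstruction under stated assumptions, not the authors' procedure. It assumes the common formula for estimating a proportion, n0 = z²p(1 − p)/e², followed by the finite population correction n = n0/(1 + n0/N); with z rounded to 2 for 95% confidence this reproduces the 122 courses, while z = 1.96 gives 118. For the stratified two-stage design it assumes largest-remainder proportional allocation across provinces, schools drawn at random within each province (stage 1), and one course drawn per selected school (stage 2). The school frame, the function names, and the seed are hypothetical.

```python
import math
import random


def sample_size(N, p, e, z):
    """Sample size for a proportion with the finite population correction (assumed formula)."""
    n0 = z ** 2 * p * (1 - p) / e ** 2  # infinite-population size
    return math.ceil(n0 / (1 + n0 / N))


def proportional_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to size (largest-remainder rounding)."""
    N = sum(stratum_sizes.values())
    quotas = {h: n_total * size / N for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(q) for h, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    # Give the units lost to rounding down to the strata with the largest fractional parts.
    for h in sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def two_stage_sample(frame, n_total, seed=2012):
    """frame: {province: {school: number of 6th grade courses}}.

    Stage 1 draws schools at random within each province (stratum);
    stage 2 draws one course within each selected school (an assumption).
    """
    rng = random.Random(seed)
    sizes = {prov: sum(schools.values()) for prov, schools in frame.items()}
    alloc = proportional_allocation(sizes, n_total)
    sample = []
    for prov, n_h in alloc.items():
        for school in rng.sample(sorted(frame[prov]), n_h):    # stage 1: schools
            course = rng.randrange(frame[prov][school]) + 1    # stage 2: one course
            sample.append((prov, school, course))
    return alloc, sample


# Reported inputs: N = 791 courses, p = 0.1, e = 0.05, four tests per course.
for z in (2.0, 1.96):
    n = sample_size(791, p=0.1, e=0.05, z=z)
    print(f"z={z}: {n} courses -> {4 * n} tests")
# z=2.0: 122 courses -> 488 tests
# z=1.96: 118 courses -> 472 tests

# HYPOTHETICAL frame: school and course counts are invented for illustration only.
HYPOTHETICAL_SCHOOLS = {"Los Andes": 60, "Petorca": 40, "Quillota": 80,
                        "San Antonio": 70, "San Felipe": 60, "Marga Marga": 150}
frame = {prov: {f"{prov}-{i:03d}": 1 + i % 2 for i in range(k)}  # 1 or 2 courses each
         for prov, k in HYPOTHETICAL_SCHOOLS.items()}
alloc, sample = two_stage_sample(frame, 122)
print(alloc)
```

Drawing one course per school keeps the sketch simple and spreads the sample over more schools; a real design could instead take all courses in each selected school, which changes the selection probabilities.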
To select the courses, an Excel database was created listing the strata (provinces), the schools, and the number of sixth grade courses in each. The random selection was then carried out with a statistical program.
Procedure
First, we contacted the schools whose courses had been selected for the sample. Authorization was requested from the principals and then from the sixth grade mathematics teachers, and in this way we obtained the tests applied, and the resulting grades, for the first semester of 2012. The teachers were asked to sign a consent form and to provide information about their teaching qualifications and the curricular framework they used (Bases Curriculares 2012, Ajuste Curricular Revisado 2011, or both). Collecting the written tests proved difficult. The principal problem was that many schools did not wish to participate; to a lesser degree, some teachers committed to participating but did not send their tests as promised, while others submitted only some of their tests. In these cases, replacement schools were drawn using the same random selection procedure described above; even so, the difficulties continued. In the end, a total of 103 written tests were obtained from 27 different schools. Although this number was smaller than originally planned, the low variability across the tests suggested that additional tests would not substantially change the findings. To analyze the collected documents, a codebook was constructed, that is, a set of predefined categories, each described by its definition, rules for when to apply it (or not), and examples. The formal codes, applied by students in the Master of Education program, covered:
As for the codes related to the skill, content, and level of demand in mathematics of each question, expert pedagogical judges applied the following codes:
- Skills present in the 2011 sixth grade mathematics program, as prescribed by the Ministry of Education.
- Expected content and learning present in the 2011 sixth grade mathematics program, as prescribed by the Ministry of Education.
- Level of difficulty of each question according to the TIMSS 2011 mathematics skill classification[2].
Results
The Chilean school system commonly classifies schools by their administrative and financial dependence on the state. Traditionally, schools are divided into municipal schools, which are public establishments on state property, financed by the state and administered by their respective municipalities; particular subvencionado ["private-subsidized"] schools, which are privately owned and financed but also receive state funds for each student who is enrolled and attends classes; and private schools, which are privately owned, administered, and financed (Roco, 2010). Currently, municipal establishments account for 45%, private-subsidized for 50%, and private schools for 5%[3]. It should be noted that there is a tendency in the country to associate the quality of learning with a school's administrative dependency. This association rests on a very simplistic analysis of the results of the main standardized test applied annually (Sistema de Medición de la Calidad de la Educación, SIMCE ["Educational Quality Measuring System"]). It leads to the assumption that teaching, learning, and evaluation processes, among others, have distinct characteristics closely tied to each type of dependency, a view that acquires some nuance in the case of this research project. The number of tests per type of dependency is shown in Table 2.
As Table 2 shows, private-subsidized schools were the most willing to provide written tests.
Table 2. Frequency and percentage of tests per dependency
Access to private and municipal schools was more difficult, first because of resistance to providing written tests, and second because of the bureaucratic process required to obtain authorization to request the material. In total, data were obtained from 27 teachers at 27 schools; in addition to providing their tests, they supplied complementary information on their professional profiles. Their credentials show that 15 of these teachers held general teaching degrees with a minor in mathematics, seven held general teaching degrees, and the remaining five held degrees in high-school mathematics. Broken down by dependency, the teachers with a minor in mathematics provided the most instruments for review, and most of them work in private-subsidized schools. This information is shown in Table 3.
Table 3. Frequency and percentage of tests provided, by dependency and degree
Another type of information the teachers provided was the mathematics curriculum they were working with: the adjusted curriculum, the newest version, or both. This information is shown in Table 4.
Table 4. Frequency and percentage of teachers per dependency and mathematics curriculum
As Table 4 shows, one third of the teachers were still working with the adjusted curriculum at least through the first semester of 2012, followed by the group that had adopted both curricula.
a) Type of exams and questions
The collected tests varied little in format and fell into only three types: questionnaire (semi-closed response), objective (closed response), and combined (a mix of the two). There were no other formats, such as essays. Table 5 shows the distribution.
Table 5. Frequency and percentage of test types and number of questions they contained
Table 6 shows the composition of the tests in terms of the frequency and percentage of question types. The clear majority of the questions identified were of the questionnaire format, followed by the closed-response format (multiple choice, true or false, matching, and completion exercises). Despite this distinction, the questionnaire-type tasks were clearly constructed to elicit closed responses, not only because they sought only the correct answer, without regard to the working, but also because of the strategies and mathematical operations they involved.
Table 6. Frequency and percentage of questions by type of test
b) Grading scale
The level of achievement required to obtain the minimum passing grade, that is, the cutoff point, varied from test to test. The results are shown in Table 7.
Table 7. Frequency and percentage of the different requirement levels, by type of test
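A requirement level, or cutoff, can be illustrated with a short conversion from raw score to grade. The sketch assumes the usual Chilean 1.0 to 7.0 grading scale with 4.0 as the minimum passing grade and a piecewise linear conversion on each side of the cutoff; the text does not state the scale or the conversion, so these, the function name, and its parameter names are assumptions for illustration.

```python
def score_to_grade(score, max_score, requirement=0.6,
                   g_min=1.0, g_pass=4.0, g_max=7.0):
    """Convert a raw score to a grade on an ASSUMED 1.0-7.0 scale.

    `requirement` is the share of max_score that earns the passing grade g_pass.
    Below the cutoff the grade rises linearly from g_min to g_pass;
    from the cutoff up, it rises linearly from g_pass to g_max.
    """
    cutoff = requirement * max_score
    if score >= cutoff:
        grade = g_pass + (g_max - g_pass) * (score - cutoff) / (max_score - cutoff)
    else:
        grade = g_min + (g_pass - g_min) * score / cutoff
    return round(grade, 1)


# A hypothetical 40-point test at a 60% requirement: 24 points earns the passing 4.0.
for pts in (0, 12, 24, 32, 40):
    print(pts, score_to_grade(pts, 40))
```

Under these assumptions, a 60% requirement spreads grades 1.0 to 4.0 over the lower 60% of the score range and 4.0 to 7.0 over the upper 40%, so each point above the cutoff is worth more than each point below it.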
The most common requirement in the mathematics tests was 60% correct, with little difference between types of test.
c) Errors committed in the questions
Another interesting category of data concerns certain formal aspects of the questions analyzed that could affect students' responses and lead them into error. These formal aspects include lack of instructions, lack of answer space, unclear images, poor or tricky wording, lack of data, and unrealistic results. Graphic 1 shows the percentage of questions presenting each type of error.
Graphic 1. Percentage of appearance of general errors in the questions
As Graphic 1 shows, the most common error is the lack of instructions (44%), followed by lack of space to write the answer (38%), unclear instructions (35%), and implausible prompts (35%). Below, some sample questions in which errors were identified are presented, with a description of each category.
No instructions. This general category covers any question that does not formally state how the student should answer. This error was found in 44% of the questions reviewed. Example. There is no question as such. The student must assume that what is being asked for is a numerical value, to be determined as an unknown.
Image 1. Example of question error: lack of instructions
Improbable prompt. This category includes any question whose prompt is improbable, overly artificial, impossible, or absurd. This error was found in 35% of the questions reviewed. Example. Students may have encountered this type of exercise in class, where it seems intended to make the work more entertaining. Nevertheless, the context is so artificial that it could cause confusion.
Image 2. Example of question error: implausible prompt
Unclear instructions. The directions for solving the exercise are mathematically unclear because inappropriate language is used, because the prompt announces one thing but the question asks for another, or because key information is omitted. In all cases, students' responses are affected. This error was found in 35% of the questions reviewed. Example. The student is asked to calculate interest, whereas the task actually consists of calculating the sale price.
Image 3. Example of question error: unclear instructions [TRANSLATION: "II – Calculate interest. Price = $70,000, interest 20%"]
Poor wording. The wording of the question makes it hard to understand what the student should do, opening the door to differing interpretations and, therefore, differing answers. This error was found in 19% of the questions reviewed. Example. The question asks for the "difference between the brothers," specifically the difference between their weights. The correct answer is 0.11 kg, but it does not appear among the listed choices; one might suppose that the intended answer is "b," but it lacks the decimal point.
Image 4. Example of question error: poor wording
Lack of data. This category covers questions missing data (e.g., numbers, drawings, or symbols) needed to answer adequately, which therefore affected students' responses. This error was found in 10% of the questions reviewed. Example. A simple multiple-choice question with four alternatives, presumably three distractors and one correct answer. Because neither words nor numbers accompanying the figure indicate whether it is a rectangle or a square, more than one answer is possible, and two of the four alternatives could be correct.
Image 5. Example of question error: lack of data
Unrealistic result. This category groups questions whose answers are unviable, unrealistic, or impossible. This error was found in 6% of the questions reviewed. Example. None of the available alternatives makes sense in the context presented to the student, even though one of them matches the result of the mathematical operation involved. In this type of exercise, the student understandably proceeds mechanically, without considering the context in which the task is set.
Image 6. Example of question error: unrealistic result
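The error percentages reported for these categories (44%, 38%, 35%, 35%, 19%, 10%, and 6%) add up to well over 100%, so they are best read as the share of questions showing each error, with a single question able to carry several error codes. The sketch below shows that kind of multi-label tally; the five coded questions and the code names are hypothetical, not the study's data.

```python
from collections import Counter

# HYPOTHETICAL coding of five questions; each question carries a set of error codes.
coded_questions = [
    {"no_instructions", "no_answer_space"},
    {"no_instructions", "improbable_prompt"},
    {"unclear_instructions"},
    set(),  # a question with no formal error
    {"no_instructions", "no_answer_space", "poor_wording"},
]


def error_percentages(questions):
    """Percentage of questions that carry each error code (multi-label tally)."""
    counts = Counter(code for codes in questions for code in codes)
    return {code: 100 * n / len(questions) for code, n in counts.most_common()}


pcts = error_percentages(coded_questions)
for code, pct in pcts.items():
    print(f"{code}: {pct:.0f}%")
print(f"sum of percentages: {sum(pcts.values()):.0f}%")  # exceeds 100%
```

Because the denominator is the number of questions rather than the number of errors, the percentages do not need to sum to 100%.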
In the tests from municipal schools, the most common error was the lack of instructions, while in private-subsidized schools the most common errors were missing instructions and improbable prompts. In private schools, the most frequent error was unclear instructions, but the small number of tests obtained does not allow a firm conclusion.
b) Mathematical abilities
According to the standards set by MINEDUC (Ministry of Education), there are four central abilities that a sixth grade student should develop in mathematics (numbered 1 to 4, respectively): Problem Solving, Argumentation and Communication, Modeling, and Representation. Each of these abilities is divided into sub-abilities, from which the following codes were adapted. The number of each as classified is detailed below:
Graphic 2 makes one point evident: very few of the questions analyzed test the mathematical abilities corresponding to sixth grade (2011 Program). Fully 94% correspond to sub-abilities from a different grade level, followed by a mere 2.4% that correspond to "Translating natural language expressions into mathematical language, and vice versa."
Graphic 2. Abilities tested
In municipal schools, barely 5% of all questions involve a recognizable ability newly learned in sixth grade. The same percentage appears for private-subsidized schools: only 5% of their questions measure sub-abilities for that level. In tests from private schools, 11% of the questions correspond to sixth grade sub-abilities.
d) Mathematical content
According to the MINEDUC 2011 standards, there are four groups of content that should be covered in sixth grade (numbered 1 to 4, respectively): Unit 1, Semester 1, Numbers and Algebra; Unit 2, Semester 1, Numbers and Algebra II; Unit 3, Semester 2, Geometry; Unit 4, Semester 2, Data and Chance. Although a semester is suggested for each unit, schools may include these contents as they see fit. As with the abilities, codes were assigned to the contents of each of these four groups. The number of each as classified is detailed below:
Graphic 3. Contents evaluated
As with the mathematical abilities, Graphic 3 shows that most of the contents classified, in almost 60% of the questions, did not belong to the sixth grade level. Among the contents that did belong to the level, the most frequent was multiplication and division of positive fractions and decimals (23%), followed by percentages (6%) and, far behind, multiplication and division of powers and the use of powers with natural bases and exponents (2%). Analyzed by school dependency, 30% of the questions from municipal schools reflect sixth grade content, while 70% address content from other levels. In private schools, 22% of the questions cover sixth grade content, and the rest cover other levels. In private-subsidized schools, some 45% reflect sixth grade content, and the rest correspond to other levels. The tests from private-subsidized schools therefore reflect the suggested contents most closely.
e) Expected learning
According to MINEDUC 2011, there are four groups of expected learning, associated with the contents mentioned above, that students should develop over the semesters of sixth grade mathematics (numbered 1 to 4, respectively): Unit 1, Semester 1, Numbers and Algebra; Unit 2, Semester 1, Numbers and Algebra II; Unit 3, Semester 2, Geometry; Unit 4, Semester 2, Data and Chance. The number of each as classified is detailed below:
Graphic 4 continues the trend of this review: most of the expected learning identified lies outside the sixth grade range, with almost 80% corresponding to other grades. Among the expected learning that does correspond to sixth grade, the most frequent items are expressing powers with base 10 and natural exponents, and multiplication and division of fractions.
Graphic 4. Expected learning
By school dependency, about 18% of the questions from municipal schools reflect expected learning for sixth grade. In private schools, the share of questions matching expected learning for the level barely reaches 13%, while in private-subsidized schools it rises to approximately 23%.
f) Cognitive requirements
This dimension seeks to identify the highest level of cognitive demand that each question measures, using the TIMSS 2011 mathematics classification. This classification considers three levels, Knowledge, Application, and Reasoning (numbered 1 to 3, respectively), each of which is further subdivided. The detail is as follows:
Based on this classification, Graphic 5 shows low cognitive demand: 80% of the questions are at the Knowledge level, particularly in the sub-level "Calculate" (almost 50%), followed by "Memory" (names, definitions, measurements, etc.) at 16%. Questions at the Application level barely reach 13%, and Reasoning does not even reach 1%.
Graphic 5. Cognitive requirement
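The study reports a correlation of 0.26 between cognitive level and school dependency but does not say which coefficient was used. Because the TIMSS levels are ordinal, a rank correlation such as Spearman's rho is one plausible choice; the sketch below computes it with tie-averaged ranks, which matters here because every value is heavily tied. Coding dependency as an ordered variable (1 = municipal, 2 = private-subsidized, 3 = private), the function names, and the ten questions are all assumptions for illustration. Under that coding, a positive coefficient would match the reading that municipal schools set lower-level questions.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# HYPOTHETICAL questions: dependency (1-3, assumed ordering) and TIMSS level
# (1 = Knowledge, 2 = Application, 3 = Reasoning).
dependency = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
cognitive = [1, 1, 1, 2, 1, 2, 1, 2, 2, 1]
print(round(spearman(dependency, cognitive), 2))  # 0.23
```

A significance test, which the study also reports, would additionally require the number of questions and a reference distribution; it is omitted here.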
No correlation could be established between the abilities, contents, and learning evaluated and the variables of teacher degree, school dependency, or curriculum used, because most of the questions could not be classified under sixth grade abilities. For example, only 150 of 2,516 questions (about 6%) could be classified under abilities for that level, too small a proportion to establish a trend with respect to these variables. Because most of the questions could be classified at one of the three levels of the TIMSS cognitive scale, correlations were computed between this variable and teacher degree, school dependency, and type of curriculum; a small but significant correlation was found with school dependency. A value of 0.26 suggests that municipal schools tend to set questions at a lower cognitive level than private-subsidized or private schools. Nevertheless, given the small number of tests from private schools, this conclusion should be restricted to municipal and private-subsidized schools. Regarding the abilities, contents, and learning evaluated in the tests analyzed, it should be remembered that the teachers provided tests from the first semester of 2012, so many aspects may have been tested in the second semester of 2012. In addition, some teachers reported using the new curriculum (2012), and others reported being in transition from the 2011 to the 2012 curriculum.
g) Disciplinary management of content
Although this was not a pre-established dimension or category, it is worth highlighting that 215 questions (8.5% of the total) contained an error related to the teacher's mastery of the mathematical content.
These errors are of various types; some examples are discussed below.
Example 1. The question aims to measure understanding of the concept of the area of a polygon; however, three of the alternatives do not refer to any measure of the figure, but to elements that compose it, such as perimeter, surface, height, and width. The only alternative that refers to a "measurement" is (B), but it refers to the measurement of the sides of the figure, not of its surface. The alternatives are therefore completely ambiguous. The question appears to confuse the area of the polygon (the measure of the surface it encloses) with the surface enclosed by the polygon itself.
Image 7. Example of question error: mistaken concepts
Example 2. The area of the triangle cannot be calculated unless certain conditions, not provided in the statement or the figure, are assumed.
Image 8: Example of question error: lack of data
Example 3. In this case, the side lengths given for the triangle in the illustration are impossible.
Image 9: Example of question error: impossible situation
h) Written Presence of the Learning to Be Evaluated
Stating in writing the skills a test aims to measure is desirable, since it guides both teacher and students; ideally, these skills should coincide with the learning objectives set by MINEDUC (Ministry of Education of Chile). When the proposed, taught, learned, and assessed objectives are coherent, some authors speak of “curricular alignment”. This category analyzes the tests that state their learning objectives, in terms of how coherent those objectives are with what the tests actually measure. Of the 103 tests analyzed, 51 explicitly state the skills being measured, although under different labels (objectives, abilities, capacities, criteria, skills). To play an effective part in assessment, these learning objectives should meet certain requirements.
a) They should be written with the student in mind, since they express a set of abilities and contents that the student should develop as a result of teaching and that should therefore be assessed. In 7 of the 51 tests this is not the case: the learning objectives are expressed from the teacher's point of view, that is, the teacher's purpose is announced at the beginning of the test. Example:
Image 10: Wording from the teacher's perspective
b) The learning objectives should specify at least the content and the ability. In 6 of the 51 tests, only one of these two elements is present, or neither. As a result, it is unclear what is expected, that is, what students should demonstrate when they take the test. Example:
Image 11: Wording that does not express content or abilities
c) The learning objectives should be expressed with verbs whose meaning is broadly agreed upon, since this makes clear what the student is expected to develop and demonstrate. In 6 of the 51 tests, the learning objectives are so vague that it is impossible to judge whether students have achieved them. The following example shows only one exercise, but the test contains several of the same kind:
Image 12: Ambiguous wording
d) In 14 of the 51 tests, the questions measure lower-level skills than those stated in the objectives. The following example states three learning objectives and presents one set of exercises. These exercises are unrelated to any of the three stated objectives, since they involve the routine application of an algorithm; moreover, students are asked to write only the answer. The teacher therefore obtains no evidence of analysis, interpretation, or problem-solving. The rest of the test's exercises are similar.
Image 13: Wording that expresses abilities above those being evaluated
e) Finally, only 18 of the 51 tests showed coherence between what was declared to be measured and what was actually measured. This is equivalent to 17% of the total sample and 35% of the tests that stated learning objectives.
Discussion
This section first presents the conclusions for each specific objective, followed by a more general discussion of the quality of the assessment procedures analyzed and its implications.
Regarding the first specific objective (analyzing the contents and abilities that are evaluated and graded through written assessment procedures), the first point to note is that the abilities students must demonstrate in these tests mostly correspond to grade levels below sixth grade, and those that do correspond to sixth grade involve a very low cognitive demand, largely limited to memorization and mechanical procedures. For example, no questions were found that required students to explain their processes and deductions, or to evaluate problem-solving strategies and their suitability for the situation. The analysis of contents and expected learning outcomes likewise shows that most do not correspond to sixth grade, and those that are grade-appropriate focus on basic operations with positive fractions and decimals. In terms of the TIMSS cognitive domains, questions concentrate mostly on memory and rote calculation. In addition, these are tasks in which students apply their knowledge in a fragmented, non-integrated way, following a predetermined linear routine. Many are presented without context, and when a context is provided it is unrelated to real life (or worse, artificial, unrealistic, or non-existent).
Wiggins (1990) argued that the assessment tasks presented to students should ideally be authentic, that is, identical or similar to the way real life or the professional world works, since they involve challenges and roles that help students rehearse the complex ambiguities of the “game” of adult and professional life. Tests should thus consist of tasks that reflect the complexity of the real world, requiring students to design, organize, discuss, apply, justify, and evaluate. In the tests analyzed, by contrast, mathematics appears as an objective body of knowledge, an external reality, given that every task has an answer that is simply right or wrong. These characteristics affect the validity of the tests: first, many fundamental mathematical processes are not being assessed and, second, some tests do not assess what the teacher declares they assess. These are therefore assessment procedures that, for the most part, do not allow valid inferences about what students should be able to do (Popham, 2010). From a current perspective on mathematics assessment, assessment should itself be a learning opportunity for students: the tasks demand a higher level of reflection; they deal with everyday but complex topics; they require the integrated use of different types of knowledge; they have a certain degree of openness and admit different answers (Borko, Mayfield, Scott, Flexer & Cumbo, 1997); and errors are treated as a source of learning (Bainbridge, Ellis & Wolodko, 2003; Gearhart & Saxe, 2004). This study concludes that the types of questions, the scarcity of mathematical problems, and the complete absence of essay-type questions create an unfavorable scenario for both learning and teaching mathematics.
Although these analyses cover only written assessment procedures, and specifically tests, these are presumably what teachers mainly use to assign grades. Therefore, skills, expected learning outcomes, and contents that are not assessed by written tests would probably not be assessed through any other procedure either. Regarding the second specific objective (determining and analyzing the degree of coherence between the curriculum suggested by the Ministry of Education and the curriculum being evaluated), it should be noted that the reference point for comparison was the 2011 sixth grade mathematics program, with codes assigned to the sub-abilities, contents, and expected learning outcomes it contains, and that cognitive demand was classified using the TIMSS 2011 mathematics framework. Overall, 94% of the questions assessed sub-abilities corresponding to levels below sixth grade; 60% of the contents assessed did not correspond to sixth grade, some going as far back as first grade; 76% of the questions measured learning outcomes from lower grades; and, among the TIMSS cognitive domains, Knowledge predominated. It could therefore be said that, as the semester progresses, there is in general some progression in content, very slow progression in expected learning outcomes, and almost none in abilities. In other words, different contents are covered over the semester, but always through the same routine abilities, which indicates a low degree of curricular alignment (Lopez, 2013). Regarding the third specific objective (determining and evaluating the ways in which mathematical learning is assessed and graded through written assessment procedures), it can be concluded, first, that most of the tests were questionnaires, followed by combined tests with both questionnaire-type and closed-question items, and finally objective tests.
The questions are mostly closed-answer: multiple-choice, true/false, matching, and fill-in-the-blank. Although there are many questions, their formats are very similar to one another and, in some cases, identical. In the question stems, the norm is to present only the data strictly necessary and sufficient to solve the problem, with no additional information from which students would have to select the relevant data, as would happen in a real-life situation. Likewise, the tests ask for the answer to be expressed as a number with a unit of measurement. There are no complex, ambiguous, ill-defined, or open tasks in which students must formulate the problem, select data from a large body of information, solve it, and evaluate the answer in light of the context; consequently, no explanations, arguments, or justifications are required. As for communicative support, most questions are posed through a verbal-numerical system, followed by a verbal-graphical system; in the latter case, there are drawings but no graphs. Most instructions refer to how answers should be presented, so it can be assumed that other instructions are given orally, or that students know a routine concerning the time allowed for the tasks, the materials they may use, or the questions they may ask. No instructions ask students to look for information beyond what is given in the test as a whole and in each question, for example, at school, in a nearby shopping mall, or at home.
Students appear to work individually in all cases, and since most tests give no indication of the time to spend on each task (or, by extension, on the whole test), they seem designed to be administered in a single class block of one or two hours. Apart from a few exceptions, there are no tests in which students self-assess or assess their classmates; the three exceptions found involving self-assessment ask students about personal aspects rather than about mathematical learning. This impression is strengthened by the type of questions and expected answers, by the instructions, and by the scarce or absent space for students to write and express their reasoning. Furthermore, teachers do not seem to give students feedback beyond returning the corrected tests. Without knowing the nature of the errors students make, teachers miss an important criterion for making key decisions. As for the grading systems used, most tests assigned weights or points according to groups of question types rather than to the type of learning or level of cognitive demand involved in each task. For example, there are groups of multiple-choice questions that are all worth the same, even though they assess different abilities and contents. When tests contain different types of questions, the most heavily weighted are those in which students must write out their process. The most common grading scale set the passing threshold at 60%. Teachers most likely use a table or program to convert these points into a grade from 1 to 7, the official grading scale of the national school system. It should be noted that the tests set no condition for obtaining the minimum passing grade other than reaching 60% of the total points available.
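The text only says that teachers "most likely" use a table or program for this conversion. As an illustration, the sketch below assumes the two-segment linear scale commonly used in Chilean schools, in which 60% of the points maps to 4.0, zero points to 1.0, and full marks to 7.0. The function `points_to_grade` and its parameter names are hypothetical, and the tables actually used by the teachers may differ in their lower anchor or rounding rule.

```python
def points_to_grade(points: float, max_points: float, threshold: float = 0.6,
                    min_grade: float = 1.0, pass_grade: float = 4.0,
                    max_grade: float = 7.0) -> float:
    """Map a raw test score onto the 1.0-7.0 grading scale.

    Assumes (hypothetically) the common two-segment linear conversion:
    scores below threshold * max_points are spread linearly over
    [min_grade, pass_grade), and scores at or above it over
    [pass_grade, max_grade]. The result is not rounded; schools usually
    report grades to one decimal place.
    """
    if max_points <= 0 or not 0 < threshold < 1:
        raise ValueError("max_points must be positive and threshold in (0, 1)")
    if not 0 <= points <= max_points:
        raise ValueError("points must lie between 0 and max_points")
    cutoff = threshold * max_points  # points needed for the passing grade
    if points < cutoff:
        # Lower segment: 0 points -> min_grade, cutoff -> pass_grade.
        return min_grade + (pass_grade - min_grade) * points / cutoff
    # Upper segment: cutoff -> pass_grade, max_points -> max_grade.
    return pass_grade + (max_grade - pass_grade) * (points - cutoff) / (max_points - cutoff)


if __name__ == "__main__":
    for score in (0, 30, 60, 80, 100):
        print(score, round(points_to_grade(score, 100), 1))
    # Prints one line each: 0 1.0 / 30 2.5 / 60 4.0 / 80 5.5 / 100 7.0
```

Note that the function looks only at the total number of points: which questions produced those points plays no role in the resulting grade.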
Since questions are weighted without regard to their level of cognitive demand, two students may well reach this 60% in entirely different ways, thus demonstrating different learning, yet both obtain the minimum passing grade of 4.0. In that case, the two 4.0s are not comparable. This indicates that, in general, neither the abilities, expected learning outcomes, and contents of the 2011 sixth grade program nor the learning outcomes that teachers explicitly state they will assess are used to guide weighting and grading. Points and grades therefore do not provide information about students' learning, since there appears to be no explicit relationship between the points and grades students obtain and defined levels of attainment of learning objectives that teachers know and understand. Although the grading scale used does in some way imply levels of learning, in practice it seems to take on a life of its own. Finally, the number of tests without adequate space for answers, prompts that are somewhat implausible or artificial (in some cases even unreal for students), unrealistic results, and even missing data or flawed problem wording all indicate that the difficulty of many questions lies outside the realm of mathematical skills and contents, and lead students to guess what type of answer each teacher is looking for without considering whether it suits the context of the question. As Barberá (1997) points out, students end up unable to make decisions or establish meaningful relationships between questions, results obtained, and the information given. That is, this practice encourages the much-criticized mechanical answering, adaptation by students, and the search for clues to the correct answer.
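The point about non-comparable 4.0s can be made concrete with a small, entirely hypothetical test. The section names, item counts, and weights below are invented for illustration and do not come from the study; they only mimic the pattern described in the text, where points are assigned by question type and the sole passing condition is reaching 60% of the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A block of items that all carry the same weight (hypothetical)."""
    name: str         # cognitive level the items are meant to target
    items: int        # number of items in the block
    points_each: int  # points awarded per correct item


# Invented test: routine recall items are cheap, written-reasoning items are
# heavy, but the passing rule never looks at which points were earned.
TEST = [Section("knowing", 30, 2), Section("reasoning", 4, 10)]
MAX_POINTS = sum(s.items * s.points_each for s in TEST)  # 100


def total_points(correct: dict[str, int]) -> int:
    """Sum the points for the number of correct items in each section."""
    total = 0
    for s in TEST:
        n = correct.get(s.name, 0)
        if not 0 <= n <= s.items:
            raise ValueError(f"invalid count for {s.name}: {n}")
        total += n * s.points_each
    return total


def reaches_threshold(points: int, threshold_pct: int = 60) -> bool:
    """True if points >= threshold_pct% of MAX_POINTS (integer arithmetic)."""
    return 100 * points >= threshold_pct * MAX_POINTS


student_a = {"knowing": 30, "reasoning": 0}  # every recall item, no reasoning
student_b = {"knowing": 10, "reasoning": 4}  # every reasoning item, little recall

if __name__ == "__main__":
    for label, answers in (("A", student_a), ("B", student_b)):
        pts = total_points(answers)
        profile = {s.name: f"{answers[s.name]}/{s.items}" for s in TEST}
        print(label, pts, reaches_threshold(pts), profile)
```

Both students score exactly 60 of 100 points and therefore clear the 60% threshold, so under the rule described in the text both would receive 4.0, even though student A answered none of the reasoning items and student B answered all of them.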
In sum, there is a marked tendency toward excessively traditional assessment processes, framed from a technical perspective and therefore outdated. It could be said that assessment is understood as measurement; that it is used mainly for grading rather than for modifying teaching or learning (since the teacher does not learn from the process); that the student's role is passive, so learning depends heavily on the teacher; and that institutional knowledge is promoted, leading to a situation in which students are given a value (in this case, grades) in place of genuine learning. The impression is that assessment is carried out more to fulfill the teacher's administrative paperwork than to benefit the student. The teachers' apparent lack of knowledge of new approaches to assessment, especially those aimed at educating through assessment, prevents them from understanding the importance and potential of assessment for improving learning outcomes, and from being aware of the negative effects that poor assessment processes have on students.
References
Bainbridge, J., Ellis, M. & Wolodko, B. (2003). Writing to Succeed in Elementary School Mathematics. International Electronic Journal for Leadership in Learning, 7(18). Available at http://www.ucalgary.ca/~iejll/volume7/bainbridge2.htm
Barberá, E. (1997). La evaluación escrita en el área matemática: contenido y tendencias. Anuario de Psicología, 72, 21-41. Available at http://www.raco.cat/index.php/AnuarioPsicologia/article/viewFile/61344/96235?origin=publication_detail
Barberá, E. (2002). Evaluación escrita del aprendizaje (I): la evaluación como escenario educativo. Teoría y Didáctica de las Ciencias Sociales, 7, 25-36. Available at http://www.redalyc.org/articulo.oa?id=65200712
Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
Borko, H., Mayfield, V., Scott, M., Flexer, R. & Cumbo, K. (1997). Teachers' Developing Ideas and Practices about Mathematics Performance Assessment: Successes, Stumbling Blocks, and Implications for Professional Development. Teaching & Teacher Education, 13(3), 259-278.
Camilloni, A., Basabe, L. & Feeney, S. (September 2009). Los formatos de evaluación de los aprendizajes y sus relaciones con las modalidades de estudio de los alumnos universitarios. Perspectivas de investigación y marcos de análisis. Paper presented at the Primer Congreso Internacional de Pedagogía Universitaria, Secretaría de Asuntos Académicos de la Universidad de Buenos Aires, Argentina. Available at http://www.ungs.edu.ar/cienciaydiscurso/wpcontent/uploads/2011/11/CamilloniBasabeFeeney20091.pdf
Cortés, J. (2009). Tipos de evaluación e instrumentos de evaluación. Madrid: Cátedra.
Crooks, T. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 428-481.
Entwistle, N. (November 2000). Promoting deep learning through teaching and assessment: conceptual frameworks and educational contexts. Paper presented at the TLRP Conference, Leicester. Available at http://www.etl.tla.ed.ac.uk/docs/entwistle2000.pdf
Gearhart, M. & Saxe, G. (2004). When teachers know what students know: integrating mathematics assessment. Theory into Practice, 43(4), 304-313.
Goñi, J. (2000). Los procedimientos seguidos en la evaluación en matemáticas. Aula de Innovación Educativa, 6(9), 93-94.
Gulikers, J., Bastiaens, T., Kirschner, P. & Kester, L. (2006). Relations between Student Perceptions of Assessment Authenticity, Study Approaches and Learning Outcomes. Studies in Educational Evaluation, 32, 381-400.
Himmel, E., Olivares, M. A. & Zabalza, J. (1999). Hacia una Evaluación Educativa. Aprender para evaluar y Evaluar para Aprender. Santiago: MINEDUC – PUC.
Lopez, A. (2013). Alineación entre las evaluaciones externas y los estándares académicos: El caso de la prueba Saber de Matemáticas en Colombia. RELIEVE, 19(2). DOI: 10.7203/relieve.19.2.3024
MINEDUC (2011). Programa de Matemática sexto básico. Available at http://www.mineduc.cl
Picaroni, B. & Loureiro, G. (2010). Qué matemática se enseña en aulas de sexto año de Primaria en escuelas de Latinoamérica. Páginas de Educación, 3(3), 29-60.
Popham, J. (2010). Everything school leaders need to know about Assessment. California: Corwin.
Roco, R. (2010). Caracterización de los establecimientos educacionales en Chile: la necesidad de nuevas consideraciones. Paper presented at the Primer Congreso Interdisciplinario de Investigación en Educación, CIAE, Universidad de Chile and CEPPE, Pontificia Universidad Católica de Chile.
Scouller, K. (1998). The Influence of Assessment Method on Students' Learning Approaches: Multiple Choice Question Examination versus Assignment Essay. Higher Education, 35(4), 453-472.
Shavelson, R., Ruiz-Primo, M. A., Li, M. & Cuauhtemoc, C. (2003). Evaluating New Approaches to Assessing Learning (CSE Report 604). Los Angeles: CRESST. Available at http://www.cse.ucla.edu/products/reports/r604.pdf
Struyven, K., Dochy, F. & Janssens, S. (2005). Students' perceptions about evaluation and assessment in higher education: a review. Assessment & Evaluation in Higher Education, 30(4), 325-341.
Tang, C. (1994). Assessment and student learning: Effects of modes of assessment on students' preparation strategies. In G. Gibbs (Ed.), Improving student learning: Theory and practice (pp. 151-170). Oxford, UK: Oxford Brookes University, The Oxford Centre for Staff Development. Available at http://teaching.polyu.edu.hk/datafiles/R126.pdf
Thompsom, K. & Falchikov, N. (1998). “Full on until the sun comes out”: The effects of assessment on student’s approaches to studying. Assessment and Evaluation in Higher Education, 23(4), 379-390.
Wiggins, G. (1990). The case for authentic assessment. Practical Assessment, Research & Evaluation, 2(2). Available at http://pareonline.net/getvn.asp?v=2&n=2
Yañez, V., Castro, A., Castillo, R., Catalán, C. & González, M. (2008). Prácticas evaluativas de profesores de matemática de enseñanza media, con énfasis en la resolución de problemas. Investigaciones en Educación, VIII(1), 105-124. Available at http://dungun.ufro.cl/~mageduc/docs/rie_2008vol1.pdf
ACKNOWLEDGEMENTS
This research is the result of FONIDE project F6211152011, funded by the Fund for Research and Development in Education (FONIDE), Ministry of Education of Chile. The author is especially grateful to Patricia C. López, Pablo Cáceres S., Karen V. Nuñez, Rocío S. Poblete, Pamela B. Contreras, D. and Javier Andrés Moya Santis T.


ARTICLE RECORD / FICHA DEL ARTÍCULO
Reference / Referencia 
Contreras, Gloria (2014). Curriculum characterisation assessed in sixth grade in mathematics. A descriptive study in Valparaiso, Chile. RELIEVE, v. 20 (2), art. 4. DOI: 10.7203/relieve.20.2.4295 
Title / Título 
Curriculum characterisation assessed in sixth grade in mathematics. A descriptive study in Valparaiso, Chile. [Caracterización del curriculum evaluado en matemática en sexto año básico. Un estudio descriptivo en Valparaíso, Chile].
Authors / Autores 
Contreras, Gloria 
Review / Revista 
RELIEVE (Revista ELectrónica de Investigación y EValuación Educativa), v. 20 n. 2 
ISSN 
1134-4032
Publication date / Fecha de publicación 
Reception date: 2014 March 20; Approval date: 2014 October 28; Publication date: 2014 October 30
Abstract / Resumen 
This article presents the main results of the research project Curriculum Characterisation Assessed in Sixth Grade Mathematics: Guidelines for the Initial and Continuous Training of Teachers, which aimed to describe and analyse what is evaluated in sixth grade mathematics, and how, in the region of Valparaiso, Chile. A total of 103 written mathematics tests used for grading, from 27 educational institutions, were analysed. A set of codes covering formal aspects as well as mathematical contents and skills was applied to these tests and their 2,516 questions. The researchers conclude that students are mainly required to provide closed, single answers that assess memorization and the mechanical solution of exercises, and that coverage of the curriculum prescribed by the Ministry of Education is low, with most mathematical contents and abilities below the sixth grade level. Este artículo pretende dar cuenta de los principales resultados de la investigación denominada Caracterización del curriculum evaluado en sexto año básico en matemática: orientaciones para la formación inicial y continua de profesores y profesoras, cuyo objetivo principal fue describir y analizar lo que se evalúa y cómo se evalúa en matemática en dicho nivel en la región de Valparaíso, Chile. Se analizaron 103 pruebas escritas de matemática conducentes a calificación, pertenecientes a 27 establecimientos educacionales. A dichas pruebas, y a sus respectivas 2516 preguntas, se les aplicó un conjunto de códigos referido tanto a aspectos formales como de contenidos y habilidades matemáticas. Se concluye que mayoritariamente se demanda del estudiante una respuesta cerrada, única, en que se evalúa la memorización y resolución de ejercicios de forma mecánica, y que el nivel de cobertura del curriculum prescrito por el Ministerio de Educación es bajo, encontrándose muchos contenidos y habilidades matemáticas de niveles inferiores al sexto básico. 
Keywords / Descriptores 
Assessment of learning, mathematics, assessment impact, written tests, grading Evaluación del aprendizaje, matemática, impacto de la evaluación, pruebas escritas, calificación. 
Institution / Institución 
Pontificia Universidad Católica de Valparaíso (Chile) 
Publication site / Dirección 

Language / Idioma 
Español & English version (Title, abstract and keywords in English & Spanish) 
© Copyright, RELIEVE. Reproduction and distribution of this article is authorized if the content is not modified and its origin is indicated (RELIEVE Journal, volume, number and electronic address of the document).
© Copyright, RELIEVE. Se autoriza la reproducción y distribución de este artículo siempre que no se modifique el contenido y se indique su origen (RELIEVE, volumen, número y dirección electrónica del documento).
[ ISSN: 1134-4032 ]
Revista ELectrónica de Investigación y EValuación Educativa / E-Journal of Educational Research, Assessment and Evaluation
