Diez pasos para la construcción de un test

  1. José Muñiz, Universidad de Oviedo (Oviedo, Spain). ROR: https://ror.org/006gksa02
  2. Eduardo Fonseca-Pedrero, Universidad de La Rioja (Logroño, Spain). ROR: https://ror.org/0553yr311

Journal: Psicothema

ISSN: 0214-9915 (print), 1886-144X (online)

Year of publication: 2019

Volume: 31

Issue: 1

Pages: 7-16

Type: Article

DOI: 10.7334/psicothema2018.291

Abstract

Background: Tests are the measurement instruments most used by psychologists to obtain data about people, both in professional and research contexts. The main goal of this paper is to synthesize in ten steps the fundamental aspects that must be taken into account when building a test in a rigorous way. Method: For the elaboration of the ten proposed phases, the specialized psychometric literature was reviewed, and previous works by the authors on the subject were updated. Results: Ten steps are proposed for the objective development of a test: delimitation of the general framework, definition of the variable to be measured, specifications, item development, edition of the test, pilot studies, selection of other measurement instruments, test administration, psychometric properties, and development of the final version. Conclusion: Following the ten proposed steps, objective tests can be developed with adequate psychometric properties based on empirical evidence.
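
The later steps named in the abstract (pilot studies and psychometric properties) are where quantitative evidence about the item pool is gathered. As a minimal sketch of that stage, the Python fragment below computes two classical item-analysis statistics for a pilot data matrix: corrected item-total correlations and Cronbach's alpha. The data, sample size, and function name are hypothetical assumptions added for illustration; the article itself does not prescribe any particular software or code.

```python
import numpy as np

def item_analysis(responses: np.ndarray):
    """Classical item analysis for a persons x items score matrix.

    Returns corrected item-total correlations and Cronbach's alpha.
    Illustrative sketch only; not code from the article.
    """
    n_persons, n_items = responses.shape
    total = responses.sum(axis=1)

    # Corrected item-total correlation: each item against the sum of the
    # remaining items, so an item is not correlated with itself.
    item_total_r = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
    item_vars = responses.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)
    return item_total_r, alpha

# Hypothetical pilot sample: 200 respondents, 10 Likert-type items scored 1-5.
rng = np.random.default_rng(0)
data = rng.integers(1, 6, size=(200, 10))
r_it, alpha = item_analysis(data)
print("Corrected item-total correlations:", np.round(r_it, 2))
print("Cronbach's alpha:", round(alpha, 2))
```

In practice, items with low corrected item-total correlations would be flagged for revision or removal before the final version of the test is assembled.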

Funding information

The authors wish to thank Professors Alicia Pérez de Albéniz and Adriana Diez for their comments on a preliminary version of this work. This research was funded by the Spanish Ministry of Science and Innovation (MICINN) (references: PSI2014-56114-P, PSI2017-85724-P) and by the Instituto Carlos III, Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM).
