Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference

This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring th...

Bibliographic Details
Main Author:	Nana Suzumura-Smith
Format:	Article
Language:	English
Published:	National Council of Less Commonly Taught Languages 2025-07-01
Series:	Journal of the National Council of Less Commonly Taught Languages
Subjects:	Japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; AP Japanese exam
Online Access:	https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf

collection	DOAJ
description	This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context.
id	doaj-art-08c2a729bc5a42ef8e6b83fb9f0c4c81
institution	Matheson Library
issn	1930-9031 2689-2979
spelling	doaj-art-08c2a729bc5a42ef8e6b83fb9f0c4c81 | 2025-07-21T08:15:06Z | eng | National Council of Less Commonly Taught Languages | Journal of the National Council of Less Commonly Taught Languages | 1930-9031 | 2689-2979 | 2025-07-01 | 38151180 | Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference | Nana Suzumura-Smith (California State University, Long Beach) | This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context. | https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf | japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; ap japanese exam

Similar Items