Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | National Council of Less Commonly Taught Languages, 2025-07-01 |
Series: | Journal of the National Council of Less Commonly Taught Languages |
Subjects: | |
Online Access: | https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf |
Summary: | This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context. |
ISSN: | 1930-9031, 2689-2979 |