Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference

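This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context.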

Bibliographic Details
Main Author: Nana Suzumura-Smith
Format: Article
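Language: English
Published: National Council of Less Commonly Taught Languages 2025-07-01
Series: Journal of the National Council of Less Commonly Taught Languages
Subjects: Japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; AP Japanese exam
Online Access: https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf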
author Nana Suzumura-Smith
collection DOAJ
description This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context.
format Article
id doaj-art-08c2a729bc5a42ef8e6b83fb9f0c4c81
institution Matheson Library
issn 1930-9031
2689-2979
language English
publishDate 2025-07-01
publisher National Council of Less Commonly Taught Languages
record_format Article
series Journal of the National Council of Less Commonly Taught Languages
spelling 2025-07-21T08:15:06Z; eng; 38151180; Nana Suzumura-Smith, California State University, Long Beach
title Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference
topic japanese language testing
argument-based approach to validity
speaking assessment
simulated interactive conversation
ap japanese exam
url https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf