Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference
Main Author: | Nana Suzumura-Smith |
---|---|
Format: | Article |
Language: | English |
Published: | National Council of Less Commonly Taught Languages, 2025-07-01 |
Series: | Journal of the National Council of Less Commonly Taught Languages |
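Subjects: | japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; ap japanese exam |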
Online Access: | https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf |
author | Nana Suzumura-Smith |
---|---|
collection | DOAJ |
description | This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context. |
format | Article |
id | doaj-art-08c2a729bc5a42ef8e6b83fb9f0c4c81 |
institution | Matheson Library |
issn | 1930-9031 2689-2979 |
language | English |
publishDate | 2025-07-01 |
publisher | National Council of Less Commonly Taught Languages |
record_format | Article |
series | Journal of the National Council of Less Commonly Taught Languages |
spelling | doaj-art-08c2a729bc5a42ef8e6b83fb9f0c4c81; 2025-07-21T08:15:06Z; eng; National Council of Less Commonly Taught Languages; Journal of the National Council of Less Commonly Taught Languages; 1930-9031; 2689-2979; 2025-07-01; 38151180; Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference; Nana Suzumura-Smith; 0; California State University, Long Beach; This study examined the validity of the scoring procedures for the AP Japanese conversation task using an argument-based approach, with a focus on rater reliability and rating scale functioning. Data were collected from 102 high school students through a test simulation, with three raters scoring the performances using a common 7-point scale. Test scores were analyzed across raters and speech acts using the Partial Credit Rasch model. Results provided support for rater reliability but only limited support for the intended functioning of the rating scale. To enhance task validity, three potential modifications were proposed: controlling speech act types and numbers, reducing the number of score categories, and modifying the scoring procedure. This study sheds light on the validity argument for the AP Japanese conversation task and addresses the scarcity of validity evidence for this exam. The findings underscore the importance of empirically confirming rating scale functioning in any assessment context.; https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf; japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; ap japanese exam |
title | Rater Reliability and Rating Scale Utility for the AP Japanese Computer-Simulated Conversation Task: Evaluation Inference |
topic | japanese language testing; argument-based approach to validity; speaking assessment; simulated interactive conversation; ap japanese exam |
url | https://ncolctl.org/wp-content/uploads/2025/07/vol38-p4.pdf |