The Arizona 4 is an articulation and phonology assessment that measures misarticulation in individuals aged 18 months to 21 years. To assess articulatory deficits correctly, however, the test must be reliable: its scores must be as free from measurement error as possible. Accurate scoring, in turn, gives clinicians confidence when they apply the results.
The reliability of the Arizona 4 was estimated with three approaches: internal consistency, test-retest reliability, and interrater reliability.
Internal consistency measures how well the items on a scale evaluate the same trait. When internal consistency is adequate, performance on one subset of items reliably predicts performance on the remaining items. Low internal consistency indicates that measurement error is influencing the test results.
Like many other psychological and developmental tests, the Arizona 4 uses the split-half method to estimate internal consistency: the items are divided into two halves, and scores on the two halves are correlated. Reliability coefficients were uniformly high across all age ranges, indicating strong internal consistency.
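The split-half idea can be made concrete with a small calculation. The sketch below is illustrative only and rests on assumptions the source does not state: each item is scored 1 (produced correctly) or 0 (error), the halves are formed by an odd-even split, and the half-test correlation is stepped up to full length with the Spearman-Brown formula, which is the usual companion to split-half estimates. The item data are invented for the example, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability (an assumed split; the source does not specify one).

    item_scores: one list per examinee, one entry per item (assumed 1 = correct, 0 = error).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical data: 6 examinees x 8 items.
scores = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 1],
]
print(f"Split-half reliability: {split_half_reliability(scores):.2f}")
```

With real norming samples of hundreds of examinees and many items, coefficients near 1 would correspond to the "uniformly high" values described above. A tiny invented sample like this one mainly shows the mechanics.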
The same methods, applied to clinical samples, also yielded high reliability coefficients, so the Arizona 4 is internally consistent in both typically developing and clinical samples.
Test-retest reliability measures the stability of scores over time: the test is administered to the same subject on two occasions, and the two sets of results are compared. For the Arizona 4, the two administrations were two weeks apart.
Because a subject's underlying abilities should not change much over such a short interval, any difference between the two sets of scores is more likely to reflect day-to-day variation in performance or practice effects from repeated exposure to the test stimuli. Effect sizes were small across all tests, indicating little change in performance between administrations. This shows that the Arizona 4 produces stable scores when the same subject is tested more than once.
Examiners administering the Arizona 4 receive the same training and coding instructions, which they use to judge the quality of a subject’s responses. Because these judgments are subjective, inconsistent judgments between examiners could affect the scores. Interrater reliability quantifies how closely different examiners agree when scoring the same responses.
The interrater reliability of the Arizona 4 was measured with the intraclass correlation coefficient (ICC). For Word Articulation and Sentence Articulation, one examiner administered the test while a second examiner observed and scored the same responses. The two sets of scores showed excellent agreement, indicating high interrater reliability.
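The ICC compares the variance between subjects with the disagreement between raters. The source does not say which ICC form was used. The sketch below assumes the Shrout and Fleiss ICC(2,1) form: two-way random effects, absolute agreement, single rater. It is computed from a two-way ANOVA on a table with one row per subject and one column per rater, here the administrator and the observer. The ratings and the function name are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject, one column per rater (e.g. administrator, observer).
    """
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters, and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: [administrator, observer] for five subjects.
ratings = [[12, 13], [25, 24], [8, 8], [31, 30], [17, 18]]
print(f"ICC(2,1) = {icc_2_1(ratings):.3f}")
```

The absolute-agreement form was chosen for the sketch because it penalizes one rater who scores systematically higher than the other. A consistency-only ICC would ignore such an offset.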
Interrater reliability for the Phonology test was evaluated in three samples, covering scoring by both trained research assistants and field examiners. The first sample compared the coding of the research assistants with one another, and the results revealed excellent measurement properties among all trained research assistants who coded the data.
The second sample, the full Arizona 4 standardization sample, compared the coding of trained research assistants with that of field examiners. The results were consistent with those of the first sample and demonstrated reliable coding even when examiners had only very brief coding instructions.
The third sample was drawn from the Word Articulation and Sentence Articulation group. Its results likewise revealed excellent measurement properties, consistent with the findings of the Word Articulation and Sentence Articulation studies.
Taken together, the strong internal consistency, test-retest reliability, and interrater reliability of the Arizona 4 indicate that it yields consistent results across all of its tests. This supports its use for accurately measuring developmental speech and articulation deficits.