It is imperative that paediatric occupational therapists use standardized assessments that possess sound measurement properties (scalability and validity) to measure visual perceptual deficits. One means of evaluating the scalability and validity of standardized visual perceptual instruments is the Rasch Measurement Model (RMM). The RMM is an Item Response Theory model that makes it possible to determine whether instruments possess interval level scaling (scalability), are unidimensional, possess stable item difficulty across different groups of subjects and have items that are ordered from least to most difficult (hierarchical ordering). It is also possible to determine whether two instruments are measuring the same theoretical construct.
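For orientation, the dichotomous form of the Rasch model can be written as follows. This is a sketch under the assumption that items are scored right/wrong; the abstract does not state which Rasch variant was fitted, and the symbols $\theta_n$ (ability of person $n$) and $b_i$ (difficulty of item $i$) are notation introduced here rather than taken from the study.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln\!\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = \theta_n - b_i
```

Because the log-odds of a correct response depend only on the difference $\theta_n - b_i$, persons and items are located on a common logit scale, which is the basis for the interval-level and hierarchical-ordering claims examined in this study.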
The aim of this study was to examine the scalability and validity of four visual perceptual instruments frequently used by paediatric occupational therapists: (1) the four motor-free subscales of the Developmental Test of Visual Perception-2 (DTVP-2); (2) the Visual Perceptual supplemental subscale of the Developmental Test of Visual Motor Integration (VMI); (3) the Motor Free Visual Perception Test-Revised (MVPT-R); and (4) the Test of Visual Perceptual Skills (Non-Motor)-Revised (TVPS-R). It was hypothesized that: (1) the interval level scaling of the four visual perceptual instruments would be confirmed; (2) the unidimensionality of the four individual visual perceptual instruments and their subscales would be confirmed; (3) the item difficulty calibrations would be reproducible (stable) between boys and girls, that is, free of differential item functioning; (4) the four visual perceptual instruments and their respective subscales would each form hierarchical indexes with adequate item spacing; and (5) the four visual perceptual instruments would measure the same theoretical construct.
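The third hypothesis can be illustrated with a minimal Python sketch of one common way to screen for differential item functioning: the item difficulty is calibrated separately for each group and the difference is standardized by its joint standard error. The criterion of |z| > 1.96, the function names and all numeric values are assumptions made for illustration; the abstract does not report which criterion was used.

```python
import math

def dif_z(b_boys, se_boys, b_girls, se_girls):
    """Standardized difference between two separate calibrations of one item (logits)."""
    return (b_boys - b_girls) / math.sqrt(se_boys ** 2 + se_girls ** 2)

def flags_dif(b_boys, se_boys, b_girls, se_girls, z_crit=1.96):
    """True when the difficulty difference exceeds the assumed critical value."""
    return abs(dif_z(b_boys, se_boys, b_girls, se_girls)) > z_crit

# Hypothetical calibrations, not data from the study.
stable_item = flags_dif(0.40, 0.15, 0.10, 0.15)    # small difference between groups
unstable_item = flags_dif(0.90, 0.15, 0.10, 0.15)  # large difference between groups
print(stable_item, unstable_item)
```

An item flagged in this way has a difficulty that is not stable across boys and girls, so it would be a candidate for review or removal.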
Sample size estimations were based on the available literature and a purposeful sampling strategy to ensure that the relevant age range of children was represented. Hence, a sample of 280 subjects (40 at each year of age, comprising 20 boys and 20 girls) was chosen to allow item difficulty estimates to fall within 0.5 logits 95% of the time and to ensure age representation in the sample. This would also allow evaluation of differential item functioning based on 140 boys and 140 girls. A convenience sample of children aged 5 to 11 years was recruited for this study; this range matched the age range for which the instruments are intended and corresponded with the target subject group. To allow for potential drop-outs or incomplete data, the target sample was set at 356 subjects to ensure 280 analyzable cases.
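The sample-size reasoning above can be checked with a short Python sketch. It reproduces the stratified design (7 age levels, 2 sexes, 20 children per cell) and uses the Rasch item information to approximate the 95% half-width of an item difficulty estimate. The evenly spread ability distribution and the item located at 0 logits are assumptions chosen for illustration, not values from the study.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_se(abilities, b):
    """Model standard error of an item difficulty: 1 / sqrt(sum of p * (1 - p))."""
    info = sum(p * (1.0 - p) for p in (rasch_prob(t, b) for t in abilities))
    return 1.0 / math.sqrt(info)

ages = range(5, 12)                     # 5 to 11 years inclusive
children_per_cell = 20                  # per age level and sex
n_total = len(ages) * 2 * children_per_cell

# Assumed abilities: evenly spread from -2 to +2 logits (hypothetical).
abilities = [-2.0 + 4.0 * i / (n_total - 1) for i in range(n_total)]

half_width = 1.96 * item_se(abilities, b=0.0)
print(n_total, round(half_width, 3))
```

Under these assumptions the half-width falls inside the 0.5 logit target; items located far from the bulk of the sample would be estimated less precisely.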
In all, 356 children were enrolled in the study; of those, 171 (48%) were boys and 185 (52%) were girls. The children ranged from junior kindergarten through grade seven. The percentage of the total sample in each grade level was as follows: junior kindergarten, 3.1%; senior kindergarten, 14.9%; grade one, 16%; grade two, 13.8%; grade three, 16.3%; grade four, 15.7%; grade five, 9.3%; grade six, 8.4%; and grade seven, 2.5%. Half of the subjects were enrolled in the public school system (n=178), 26.7% were enrolled in the Catholic school system (n=95), and the remainder (23.3%) were enrolled in the private school system. The majority of the subjects spoke only English (71.3%), while the rest spoke English and French (25.6%), English and another language (1.7%), or English, French, and another language (1.4%). At each age level the gender distribution was approximately equal. The one exception was the 6-year-old group, in which there were 25 boys (44.6%) and 31 girls (55.4%).
Rasch modeling also allows the determination of the relationships
between sets of instruments completed by the same group of subjects. This is referred to as common test equating and this procedure can be used to compare two instruments alleged to measure the same construct. The final objective was to compare three types of scores, mean scale scores from the test manuals, total mean scale scores and clinical mean scale scores (from this study's respondents), to determine whether or not they are significantly different. This will provide important evidence that is currently lacking about the construct validity of these four instruments and their respective subscales.
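A minimal Python sketch of how two instruments completed by the same children might be compared is given below. Each child's ability measure on one scale is set against the measure on the other, the mean difference is removed, and the share of children whose standardized difference lies within 95% control lines is reported. The function name, the 1.96 criterion and all data values are hypothetical; the study itself reports visual inspection of person ability logit plots rather than this numeric summary.

```python
import math
from statistics import mean

def share_within_control_lines(scale_a, scale_b, z_crit=1.96):
    """scale_a, scale_b: lists of (measure, standard_error) pairs for the same persons."""
    # Remove the constant offset between the two scales before comparing.
    shift = mean(a for a, _ in scale_a) - mean(b for b, _ in scale_b)
    inside = 0
    for (a, se_a), (b, se_b) in zip(scale_a, scale_b):
        z = (a - b - shift) / math.sqrt(se_a ** 2 + se_b ** 2)
        if abs(z) <= z_crit:
            inside += 1
    return inside / len(scale_a)

# Hypothetical person measures in logits, not data from the study.
scale_a = [(-1.2, 0.40), (-0.3, 0.35), (0.5, 0.35), (1.4, 0.45)]
scale_b = [(-0.9, 0.40), (0.1, 0.35), (0.7, 0.35), (1.9, 0.45)]

share = share_within_control_lines(scale_a, scale_b)
print(share)
```

When most persons fall within the control lines, the two scales plausibly locate children on the same dimension; a large share outside the lines suggests that the scales measure different constructs.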
The current versions of the four visual perceptual instruments published by the test authors did not meet RMM requirements. The overall results confirmed that, after scale items identified as misfitting by the RMM calibrations were discarded, the four revised versions of the visual perceptual instruments did exhibit adequate levels of scalability, unidimensionality and hierarchical ordering. However, when the four scales were considered individually, two (the DTVP-2 and TVPS-R) performed better and are, therefore, the most suitable for clinical use. In addition, the individual subscales of these two instruments exhibited better levels of scalability and construct validity than their overall motor-free visual performance scores, which are calculated by summing their respective subscale scores.
While all of the final versions of the visual perceptual instruments met the RMM requirements to some extent, the individual subscales of the DTVP-2, MVPT-R and TVPS-R exhibited better RMM fit. When subscales were combined into larger comprehensive scales, RMM fit was poor for all tests. In summary, the four DTVP-2 subscales and the seven TVPS-R subscales appear to be the best measures for occupational therapists to use when assessing paediatric clients. The MVPT-R and VMI did not perform as well as the DTVP-2 and the TVPS-R in terms of rigorous measurement properties. Combining the subscales to calculate perceptual quotients is not recommended. The results also suggest that visual perception is a complex multi-dimensional construct rather than one overall unidimensional construct. Clinicians are advised to consider the visual perceptual profile of individual children rather than regard visual perception as a singular cognitive ability.
Based on visual inspection of the person ability logit score plots, only three pairs of scales appear to measure the same dimension: (1) MVPT-R visual discrimination scale (version 1) and VMI visual discrimination part (version 6); (2) MVPT-R form constancy scale (version 1) and TVPS-R form constancy scale (version 2); and (3) DTVP-2 figure ground scale (version 2) and TVPS-R figure ground scale (version 2). This demonstrates that very few of the visual perceptual scales could be used in place of each other or to develop item banks. It also demonstrates that the majority of the scales that claim to measure the same motor-free visual perceptual construct appear to measure different constructs. When the mean clinical scale scores, mean test manual scale scores and mean total scale scores were compared, the majority of the scale scores were found to be statistically significantly different, which challenges the standardized administration instructions (in the DTVP-2, VMI and TVPS-R test manuals) requiring the termination of subscales when a ceiling score is reached. The clinical and theoretical implications of these findings are discussed.