Rating Scales for Clinical Studies in Neurology—Challenges and Opportunities

Published: US Neurology - Volume 4 - Issue I

Rating scales are increasingly used as primary or secondary outcome measures in clinical studies in neurology.1 They are therefore becoming the key dependent variables upon which decisions are made that influence patient care and guide future research. The adequacy of these decisions depends directly on the scientific quality of the rating scales, which is reflected by the increased application of rating scale science (psychometrics) in health outcomes measurement in neuroscience and increasing regulatory involvement by governing bodies such as the US Food and Drug Administration (FDA).2,3 However, the majority of clinical studies in neurology that use rating scales are currently inadequate. Two simple examples illustrate some of the key issues.

First, current ‘state-of-the-art’ clinical trials in neuroscience continue to use scales that have been proved to be scientifically poor. This is demonstrated through even the most superficial of literature reviews. For example, in a brief literature search in PubMed we identified randomized controlled trials (RCTs) in multiple sclerosis (MS) published over a 20-year period (1987–2007). Of the 68 relevant articles, we found that 59% had used a rating scale. However, only six (15%) of those articles had included scales that had any supporting psychometric evidence. This situation can be found throughout neurology and is further exemplified by the continued widespread use of the Rankin scale in stroke research, despite growing concerns,4 the Ashworth scale, despite its inherent weakness as a single-item scale (see below), and the Alzheimer’s Disease Assessment Scale Cognitive Behavior Section (ADAS-cog) in dementia, despite important limitations (further information available from authors).
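The percentages above use different denominators, which the following minimal sketch makes explicit. It assumes that "59% of 68" corresponds to 40 articles; the text does not state that count, so it is an inference from rounding. The variable names are illustrative only.

```python
# Figures quoted in the text; the count of scale-using articles is
# inferred from 59% of 68 and is an assumption, not a reported number.
total_rcts = 68
share_using_scale = 0.59
with_psychometric_evidence = 6

# Inferred number of RCTs that used a rating scale.
used_scale = round(total_rcts * share_using_scale)

# The quoted 15% is relative to scale-using articles, not to all RCTs.
share_of_scale_users = with_psychometric_evidence / used_scale
share_of_all_rcts = with_psychometric_evidence / total_rcts

print(f"Articles using a rating scale (inferred): {used_scale}")
print(f"With psychometric evidence, of scale users: {share_of_scale_users:.0%}")
print(f"With psychometric evidence, of all RCTs: {share_of_all_rcts:.0%}")
```

Under this assumption, the six articles are 15% of the trials that used a rating scale, and a smaller share of all 68 trials.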

Second, statistical adequacy does not automatically confirm clinical validity or interpretability. An example from our own research focused on probably the most widely used patient-reported fatigue rating scale (currently used in over 70 studies). We conducted two independent phases of research. In the first phase, we carried out qualitative evaluations of validity through expert opinion (n=30 neurologists, therapists, nurses, and clinical researchers). The second phase involved a standard quantitative psychometric evaluation (n=333 MS patients). The findings from the second phase implied that the fatigue measure in question was reliable and valid. However, the qualitative study in the first phase did not support either the content or face validity. In fact, expert opinion agreed with the scale placement of only 23 items (58%), and classified all of its 40 items as non-specific to fatigue (further information available from authors).

Our research findings support the need for stringent quantitative and qualitative requirements for rating scales used in neurology; such scales must also be proved to be clinically meaningful and scientifically rigorous for valid interpretations of clinical studies. So, why is this not happening right now? There are two key problems. First, the numbers generated by most rating scales do not satisfy the scientific definition for measurements. Second, we do not really know what variables most rating scales are measuring. This article addresses these two problems by introducing some of the key issues in current rating scale research methodology. For readers who would like to learn more, we expand on these ideas in a recent review1 and forthcoming monograph.5

Rating Scales as Measurement Instruments — Some Basic Principles
Before anything can be measured, the variable along which the measurements are to be made must be identified and marked out.6 Common examples are rulers and weighing scales, which mark out length in centimeters (or inches) and weight in grams (or ounces), respectively. They highlight three central features of all measurements, as illustrated in Figure 1: first, instruments are constructed to make measurements; second, the attribute being measured can be marked out as a line, or continuum, onto which the measurements can be located; and third, the markings on the continuum represent the units of measurement.

References:
  1. Hobart J, Cano S, Zajicek J, Thompson A, Rating scales as outcome measures for clinical trials in neurology: problems, solutions, and recommendations, Lancet Neurol, 2007;6:1094–1105.
  2. Food and Drug Administration, Patient reported outcome measures: use in medical product development to support labelling claims, 2006.
  3. Revicki D, FDA draft guidance and health-outcomes research, Lancet, 2007;369:540–42.
  4. Kasner SE, Clinical interpretation and use of stroke scales, Lancet Neurol, 2006;5(7):603–12.
  5. Hobart J, Cano S, Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods, Monograph for the UK Health Technology Assessment Programme, in press.
  6. Wright BD, Masters G, Rating scale analysis: Rasch measurement, Chicago: MESA, 1982.
  7. Hobart JC, Rating scales for neurologists, J Neurol Neurosurg Psychiatry, 2003;74(Suppl. IV):iv22–iv26.
  8. Kurtzke JF, Rating neurological impairment in multiple sclerosis: an expanded disability status scale (EDSS), Neurology, 1983;33:1444–52.
  9. Ashworth B, Preliminary trial of carisoprodol in multiple sclerosis, Practitioner, 1964;192:540–42.
  10. Rankin J, Cerebral vascular accidents in patients over the age of 60: II. Prognosis, Scott Med J, 1957;2:200–15.
  11. Hauser S, Dawson D, Lehrich J, Intensive immunosuppression in progressive multiple sclerosis: a randomised three-arm study of high dose intravenous cyclophosphamide, plasma exchange, and ACTH, N Engl J Med, 1983;308:173–80.
  12. Hoehn MM, Yahr MD, Parkinsonism: onset, progression, and mortality, Neurology, 1967;17:427–42.
  13. Mahoney FI, Barthel DW, Functional evaluation: the Barthel Index, Md State Med J, 1965;14:61–5.
  14. Granger CV, Hamilton BB, Sherwin FS, Guide for the use of the uniform data set for medical rehabilitation, Buffalo: Research Foundation of the State University of New York, 1986.
  15. Hobart JC, Riazi A, Lamping DL, et al., Measuring the impact of MS on walking ability: the 12-item MS Walking Scale (MSWS-12), Neurology, 2003;60:31–6.
  16. Collen FM, Wade DT, Robb GF, Bradshaw CM, Rivermead Mobility Index: a further development of the Rivermead Motor Assessment, Int Disabil Stud, 1991;13:50–54.
  17. Nunnally JC, Psychometric theory. 1st ed., New York: McGraw-Hill, 1967.
  18. Bridgman P, The logic of modern physics, New York: Macmillan, 1927.
  19. Michell J, Measurement: a beginner’s guide, J Appl Meas, 2003;4(4):298–308.
  20. Michell J, An introduction to the logic of psychological measurement, Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1990.
  21. Michell J, Measurement scales and statistics: A clash of paradigms, Psychol Bull, 1986;100(3):398–407.
  22. Wright BD, Linacre JM, Observations are always ordinal: measurements, however, must be interval, Arch Phys Med Rehabil, 1989;70:857–60.
  23. Thorndike EL, An introduction to the theory of mental and social measurements, New York: The Science Press, 1904.
  24. Thurstone LL, Theory of attitude measurement, Psychol Rev, 1929;36:222–41.
  25. Merbitz C, Morris J, Grip J, Ordinal scales and foundations of misinference, Arch Phys Med Rehabil, 1989;70:308–12.
  26. Traub R, Classical Test Theory in historical perspective, Educational Measurement: Issues and Practice, 1997(winter):8–14.
  27. Novick MR, The axioms and principal results of classical test theory, J Math Psychol, 1966;3:1–18.
  28. Lord FM, Applications of item response theory to practical testing problems, Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1980.
  29. Allen MJ, Yen WM, Introduction to measurement theory, Monterey, California: Brooks/Cole, 1979.
  30. Lord FM, Novick MR, Statistical theories of mental test scores, Reading, Massachusetts: Addison-Wesley, 1968.
  31. Massof R, The measurement of vision disability, Optom Vis Sci, 2002;79:516–52.
  32. Hambleton RK, Swaminathan H, Item response theory: principles and applications, Boston, Massachusetts: Kluwer-Nijhoff, 1985.
  33. Lord F, A theory of test scores, Psychometric Monographs, 1952; no. 7.
  34. Lord FM, The relation of the reliability of multiple-choice tests to the distribution of item difficulties, Psychometrika, 1952;17(2):181–94.
  35. Wright BD, Solving measurement problems with the Rasch model, Journal of Educational Measurement, 1977;14(2):97–116.
  36. Andrich D, A rating formulation for ordered response categories, Psychometrika, 1978;43:561–73.
  37. Wright BD, Stone MH, Best test design: Rasch measurement, Chicago: MESA, 1979.
  38. Cook K, Monahan P, McHorney C, Delicate balance between theory and practice, Med Care, 2003;41(5):571–4.
  39. Fisher W, The Rasch debate: Validity and revolution in education measurement. In: Wilson M (ed.), Objective measurement: Theory into practice, Norwood, NJ: Ablex, 1992.
  40. Andrich D, Controversy and the Rasch model: a characteristic of incompatible paradigms?, Med Care, 2004;42(1):I7–I16.
  41. Cronbach LJ, The two disciplines of scientific psychology, Am Psychol, 1957;12:671–84.
  42. Stenner AJ, Smith M, Testing construct theories, Percept Mot Skills, 1982;55:415–26.
  43. Popper K, The Logic of Scientific Discovery, London: Routledge, 1992.
  44. Kuhn TS, The structure of scientific revolutions, Chicago: University of Chicago Press, 1962.
  45. Nicholl L, Hobart JC, Cramp AFL, Lowe-Strong AS, Measuring quality of life in multiple sclerosis: not as simple as it sounds, Mult Scler, 2005;11:708–12.
  46. Andrich D, A framework relating outcomes based education and the taxonomy of educational objectives, Studies in Educational Evaluation, 2002;28:35–59.
  47. Andrich D, Implication and applications of modern test theory in the context of outcomes based research, Studies in Educational Evaluation, 2002;28:103–21.
  48. Hobart JC, Riazi A, Thompson AJ, et al., Getting the measure of spasticity in multiple sclerosis: the Multiple Sclerosis Spasticity Scale (MSSS-88), Brain, 2006;129(1):224–34.
  49. Streiner DL, Norman GR, Health measurement scales: a practical guide to their development and use. 2nd ed., Oxford: Oxford University Press, 1995.
  50. Nunnally JCJ, Introduction to psychological measurement, New York: McGraw-Hill, 1970.
  51. Guilford JP, Psychometric methods. 2nd ed., New York: McGraw-Hill, 1954.
  52. Bohrnstedt GW, Measurement. In: Rossi PH, Wright JD, Anderson AB (eds), Handbook of survey research, New York: Academic Press, 1983:69–121.
  53. Cronbach LJ, Meehl PE, Construct validity in psychological tests, Psychol Bull, 1955;52(4):281–302.
  54. Campbell DT, Fiske DW, Convergent and discriminant validation by the multitrait-multimethod matrix, Psychol Bull, 1959;56(2):81–105.
  55. Kerlinger FN, Foundations of behavioural research. 2nd ed., New York: Holt, Rinehart, and Winston, 1973.
  56. Stenner AJ, Smith M, Burdick D, Towards a theory of construct definition, Journal of Educational Measurement, 1983;20(4):305–16.
  57. Enright MK, Sheehan KM, Modelling the difficulty of quantitative reasoning items: implications for item generation. In: Irvine SH, Kyllonen PC (eds), Item generation for test development, Mahwah, NJ: Lawrence Erlbaum Associates, 2002.
  58. Embretson SE, A cognitive design system approach to generating valid tests: application to abstract reasoning, Psychol Methods, 1998;3(3):380–96.
  59. Stenner AJ, Burdick H, Sandford EE, Burdick DS, How accurate are lexile text measures?, J Appl Meas, 2006;7(3):307–22.
  60. Stone MH, Knox cube test – revised, Itasca, IL: Stoelting, 2002.
  61. Stone MH, Wright BD, Stenner AJ, Mapping variables, J Outcome Meas, 1999;3(4):308–22.
