Reliability and Validity
Reliability and Validity of the LIFO® Personal Style Survey
Ian Tibbles MA (Cantab.), MSc (London)
Introduction
| Orientations | Favourable | Unfavourable |
| Supporting / Giving-in | 0.54 | 0.54 |
| Controlling / Taking-over | 0.70 | 0.61 |
| Conserving / Holding-on | 0.63 | 0.46 |
| Adapting / Dealing-away | 0.61 | 0.37 |
The stability of test results over time is usually reported as part of the data on the performance of any psychological instrument. Test/retest data, however, has a less clear meaning with regard to test reliability than internal consistency data, because it cannot be determined whether the person has changed over time, has reported him or herself from two different standpoints (not test-related), or whether the survey evokes different kinds of reporting at different times. There is also the attenuation problem: on the second completion the survey is no longer really new, even though in the study reported below no interpretation of the results was given between the first and second administrations. All in all, one should expect some amount of stability if the test measures salient variables, though apparent shortcomings are very hard to interpret.
The Personal Style Survey was administered to 63 graduate students and then re-administered after five weeks. The subjects were not given their scores or any information about the meaning of the survey until after the second administration. The simple product-moment correlations are as follows:
| Orientations | Favourable | Unfavourable |
| Supporting / Giving-in | 0.49 | 0.53 |
| Controlling / Taking-over | 0.61 | 0.57 |
| Conserving / Holding-on | 0.62 | 0.60 |
| Adapting / Dealing-away | 0.69 | 0.39 |
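As an illustration of how a product-moment correlation of this kind is obtained, the following is a minimal sketch in Python. The function name `pearson_r` and the five pairs of scores are hypothetical and are not taken from the study; the sketch only shows the calculation applied to one scale measured on two occasions.

```python
from math import sqrt


def pearson_r(first, second):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    # Sums of squared deviations for each administration.
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five people on one scale, five weeks apart.
time_1 = [22, 30, 18, 25, 27]
time_2 = [24, 28, 20, 21, 29]
r = pearson_r(time_1, time_2)
print(round(r, 2))
```

A coefficient near 1 would indicate that people keep almost the same rank order on the scale across the two administrations; a coefficient near 0 would indicate little relationship between the two sets of scores.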
It is of interest to see whether the Life Orientations® Method style descriptions change from one administration to the next. Each pair of test profiles was analysed to note whether the basic descriptions changed. The results of this analysis are as follows:
No change (favourable): 38 of 63 (60%)
No change (unfavourable): 31 of 63 (49%)
No change (considering both): 19 of 63 (30%)
Although 30% of those tested kept the same style descriptions on both administrations, it was suspected that those who showed a clearly predominant style preference would be less likely to change, if the test really measures some genotype variables. Again, the test was considered in two parts, the “favourable” style and the “unfavourable” style. Twenty-one subjects showed a predominant style choice (5 points more than any other score) on the “favourable” scales and of those, 14, or 67%, showed the same style preference on the second administration. Twenty subjects showed a predominant “unfavourable” style, with 16, or 80%, showing no change on the second taking.
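The “predominant style” rule described above can be sketched in a few lines of Python. This is an illustration only: the function names, the example scores, and the reading of “5 points more than any other score” as “at least 5 points ahead of the next highest score” are assumptions, as is the choice to compare the predominant style at the first administration with the highest-scoring style at the second.

```python
def predominant_style(scores, margin=5):
    """Return the style leading every other style by at least `margin`, else None."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_name, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    if top_score - runner_up_score >= margin:
        return top_name
    return None


def kept_preference(first, second, margin=5):
    """Check whether a predominant style at time 1 is still the top style at time 2.

    Returns None when there was no predominant style at time 1.
    """
    style = predominant_style(first, margin)
    if style is None:
        return None
    # Assumption: at time 2 only the highest-scoring style is compared.
    return max(second, key=second.get) == style


# Hypothetical "favourable" scores for one person on two occasions.
first = {"Supporting": 31, "Controlling": 24, "Conserving": 19, "Adapting": 16}
second = {"Supporting": 28, "Controlling": 26, "Conserving": 20, "Adapting": 16}

print(predominant_style(first))
print(kept_preference(first, second))
```

Counting how many subjects return `True` out of those with a predominant style at the first administration would give a proportion of the kind reported above.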
These same data were also examined to pick out those subjects who had clear “favourable” and “unfavourable” styles that were the same, another gross measure of strength of preference. Of the 27 who showed such a pattern on the original administration 17, or 63%, showed no change with the second administration. The expectation that those who have clear style preferences are less likely to change over time is strongly supported.
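The proportions quoted in this section can be rechecked with simple arithmetic. The short Python sketch below uses only the counts given in the text; the dictionary labels and the helper name `percent` are illustrative.

```python
def percent(count, total):
    """Percentage of `total` represented by `count`, rounded to a whole number."""
    return round(100 * count / total)


# (unchanged, group size) pairs as reported in the text.
reported = {
    "no change, favourable": (38, 63),
    "no change, unfavourable": (31, 63),
    "no change, both": (19, 63),
    "predominant favourable style kept": (14, 21),
    "predominant unfavourable style kept": (16, 20),
    "same clear favourable and unfavourable style kept": (17, 27),
}

percentages = {label: percent(count, total) for label, (count, total) in reported.items()}
for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The rounded values agree with the percentages stated in the text (60%, 49%, 30%, 67%, 80% and 63%).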
Overall, it is evident that the Personal Style Survey measures much the same thing in people over time though, as stated earlier, the interpretation of less than perfect stability is difficult. Some anecdotal evidence suggests that changes in scores could be due to subjects focusing on different parts of their lives as they took the test at different times, or that they could respond differently according to mood. One person reported some progress in his personal therapy between the first and second administrations, and felt that the second set of results reflected more what he was aiming for and the first a rather pessimistic view of himself. But this sort of evidence only adds to the confidence in the survey’s reliability and usefulness.
In demonstrating why the survey should be considered to be reliable it is important to make the following points:
The technically minded will be aware that the transparent construction of the survey limits its performance in test/retest. Once the survey has been completed, completing the same survey at a later date can allow some unconscious manipulation of data: if the individual has had feedback on their profile (unlike in the study described above), they may answer on the second occasion as they think they should. Licensees may not be aware that we already have a Personal Style Survey – Version Two for use with individuals who wish to assess how their behaviours may have changed. During 1998 we will be making available for the first time a range of surveys where the sequence of the answers has been randomised. We will notify licensees in the quarterly newsletter when they are available to purchase.
Reliability is important because of its relationship to the validity of the survey. Whilst reliability is about the measurement, validity is about the relevance and usefulness of what is measured. It is possible for a survey to be reliable, i.e. to measure the same thing consistently and with precision, and yet for what it measures to be of no use or invalid for a given purpose. For example, knowledge of a person’s behavioural preferences is not a valid measure of their intellectual ability (the Personal Style Survey does not measure this). However, it is not possible for survey results to be valid if the data is not reliable.
We shall distinguish three types of non-technical validity which in a sense could be argued not to be validity at all:
...and four main types of technical validity:
Face validity
Face validity is concerned with whether an instrument appears to measure what it was designed to measure. Whilst face validity has no technical or statistical basis, it must not be overlooked if a survey is to be accepted by participants or (psychometrically) untrained managerial staff.
Content-analytic validity
One sometimes hears test users speak of content-analytic validity where the item content of a test has been analysed and related subjectively to abilities that are of assumed importance in the job. As an illustration, the argument might go:
This is often what untrained people call validity but it has obvious flaws in failing to define what the specific characteristics of a good salesman are and how the survey will measure these.
Faith validity
This is often the most difficult to deal with. It is a belief in the validity of an instrument without any objective data to back it up, and the evidence is not wanted!
The more empirically based concepts are:
Content validity
This is mainly in relation to attainment tests e.g. a spelling test containing only the names of politicians in America would be a poor test of general spelling in the United Kingdom. High content validity should always be checked with one of the empirical methods of validation described below when using any survey as a test.
Construct validity
Construct validity is more abstract than the other forms of validity and is the extent to which a test measures some theoretical construct or trait. Such constructs might be mechanical, verbal or spatial ability, emotional stability or intelligence. Building up a picture of the construct validity of a test can be a long process and involves any information that throws some light on the nature of the construct under investigation. Factor analysis is a complex statistical technique often met in construct validation; it goes beyond the simple visual inspection of inter-correlations between different tests.
Other information, which can lead to an understanding of the construct validity of a test, includes internal consistency and the effect of experimentally controlled variables and also variables such as age, sex and culture on test scores.
Concurrent validity
Concurrent validity is the relationship between test scores and some criterion of performance obtained at the same time. Thus, if we were to test a group of computer programmers and correlate the results with supervisors’ ratings of work performance, we would have undertaken a concurrent validity study.
Where we wish to know the current status of an individual, concurrent validity is the most appropriate form of validity. Some organisations, for example, use attainment tests of job knowledge at the end of training courses or in making decisions on staff promotion. However, although a test may be of high concurrent validity it does not necessarily mean that it will be useful in predicting later performance.
Predictive validity
This is the extent to which a test predicts some future outcome or criterion. This is of crucial importance in personnel selection and placement. Two difficulties in relation to this form of study are:
Statistical benchmarks for validity studies are set at much lower levels than those for reliability: usually a correlation of 0.2 to 0.3, as opposed to 0.6 to 0.7 for reliability, reflecting the difficulty of achieving secure findings in validity studies.
Of the non-empirical measures only face validity has any relevance – the other non-empirical measures are seriously flawed and therefore inapplicable.
The whole range of Personal Style Surveys has very high face validity according to feedback received from licensees and course participants over many years. The reasons for this are:
Faith and Content-Analytic validity are unsound measures and should be discounted.
The empirical measures all presuppose some form of testing as they all require some form of standard to measure the survey against:
The difficulty here is that the Personal Style Survey is not designed to measure performance or ability – only behavioural preferences. As it is not used in isolation as a test there is no basis for doing such studies. A number of studies do exist on the use of the survey in career development and assessment centres but these are measuring the overall effectiveness of the process i.e. the combination of instruments and exercises – not the Personal Style Survey on its own. Information from licensees consistently indicates that the survey is very useful in processes where other instruments and processes can validate its results. It provides a helpful focus, which can be explored in more depth with the other techniques.
The Personal Style Survey is one of the most widely used behavioural surveys in the world. Because of the open process which is employed it is one of the most reliable and meaningful insights an individual can have into their subconscious self-understanding. The individual completing the survey can validate the findings against their self-experience and against the knowledge of them that others have. This information can be used to amend and extend the analysis provided by the survey results, which ensures a refinement of measurement, which is subtler and more robust than a statistical coefficient in isolation.
The ability of the individual to understand, explore and check out the survey results against real life data creates a more meaningful and valid outcome than a validity study can provide: the understanding and ownership of the conclusions are with the client rather than the coach/counsellor. Statistically, the level of confidence achieved by validity studies is much lower than that derived from reliability studies, and there are numerous examples where difficulties in measuring with confidence and flawed study techniques have undermined the quality of the data generated.
Using a statistical framework to prove the reliability and validity of findings can (unintentionally) disempower clients. Many perceive it as an incomprehensible “black box”, which can create unnecessary threat and provoke caution and scepticism that are inhibiting and unhelpful in a development setting.
In contrast the Personal Style Survey and associated development exercises give the client ownership of the analysis using a client-centred process, promoting understanding and the confidence to consider new behavioural choices validated by their self-understanding and the feedback of friends and colleagues.