# Translation of the BSS-R: Psychometric approaches to validation

There are several ways to evaluate the psychometric properties of a newly translated version of the BSS-R. The approaches described below assume that the translation process itself was robust and followed established guidelines, including forward-backward translation.

## Sample Size

A primary goal of the psychometric validation of a translation is to establish equivalence with the original UK English-language version of the BSS-R (Hollins Martin & Martin, 2014). Some of the statistical procedures involved, particularly those that evaluate the measurement model of the BSS-R, require a minimum sample size. A balance also has to be struck between psychometric rigour, any publishable outputs envisaged from the translation, and adoption within a clinical context. Evaluating the measurement model of the BSS-R generally requires the largest sample, so the minimum for that analysis is a realistic minimum for a full psychometric evaluation involving a number of tests of validity and reliability.

### Evaluation of the measurement model

The BSS-R is underpinned conceptually by a tri-dimensional measurement model comprising (i) Stress experienced during childbirth, (ii) Women’s attributes and (iii) Quality of care. These domains comprise 4, 2 and 4 items respectively and form the sub-scales of the BSS-R. The measurement model assumes that the domains represented by the sub-scale items are correlated, an observation found consistently in validation studies of the BSS-R. The tri-dimensional measurement model is usually evaluated using confirmatory factor analysis (CFA), with the findings considered against threshold levels on established measures of ‘model fit’. A translation of the BSS-R that has followed due process will invariably offer a good fit to the data when the tri-dimensional measurement model is tested using CFA with a sufficient sample. We recommend a minimum sample size of N=200 for the CFA.

We have undertaken a simulation study which suggests that the minimum sample size could be a little lower (N=185). However, we would still recommend a minimum sample size of N=200, as this represents a realistic threshold and would also be considered acceptable by most journals if a validation paper were submitted for publication.

### Internal consistency

The generally accepted measure of internal consistency is Cronbach’s alpha (Cronbach, 1951). This should be 0.70 or above for the whole scale, and should be near to or exceed 0.70 for each of the Stress experienced during childbirth and Quality of care sub-scales. The two-item Women’s attributes sub-scale may be more appropriately evaluated for internal consistency using the inter-item correlation, with a minimum of 0.15 for sub-scale acceptability (Clark & Watson, 1995).
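
As an illustration of the alpha calculation, the following is a minimal sketch using only the Python standard library. The function name `cronbach_alpha` and the item scores are hypothetical and are included only to show the arithmetic; they are not BSS-R data, and in practice an established statistics package would normally be used.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of per-item score lists; each inner list holds one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score for each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances (sample variance, n - 1).
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for a four-item sub-scale from six respondents.
example_items = [
    [4, 3, 2, 4, 1, 3],
    [3, 3, 2, 4, 0, 2],
    [4, 2, 1, 3, 1, 3],
    [3, 4, 2, 4, 1, 2],
]
print(round(cronbach_alpha(example_items), 3))
```

The resulting value would then be compared against the 0.70 threshold described above.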

### Known-groups discriminant validity

Known-groups discriminant validity (KGDV) may be evaluated in a number of ways, with hypotheses about group differences at whole-scale or sub-scale level tested to confirm this validity domain. Examples are studies that have compared unassisted vaginal delivery with an intervention delivery to examine group differences (Romero-Gonzalez et al., 2019; Skvirsky, Taubman-Ben-Ari, Hollins Martin, & Martin, 2019). However, any hypothesis-driven and evidence-informed group comparison may be undertaken in the same way. Depending on the profile of the data, parametric or non-parametric tests may be used. For example, where two groups are compared and the data characteristics are suitable for a parametric test, the independent t-test would be appropriate.
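
The two-group comparison can be sketched as follows. This is a minimal illustration of the pooled-variance (Student’s) independent t statistic using only the Python standard library; the function name `independent_t` and the scores are hypothetical. The sketch returns the t statistic and degrees of freedom only. Obtaining a p-value requires the t distribution, for which a statistics package (for example `scipy.stats.ttest_ind`) would normally be used.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's independent t statistic (pooled variance) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance assumes roughly equal variances in the two groups.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical total scores for two delivery groups (not real data).
unassisted = [30, 33, 28, 35, 31, 29]
intervention = [24, 27, 22, 29, 26, 25]
t_value, df = independent_t(unassisted, intervention)
print(f"t = {t_value:.2f}, df = {df}")
```

If the equal-variance assumption is doubtful, or the data are not suitable for a parametric test, a different test would be needed than the one sketched here.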

### Divergent validity

Divergent validity assumes no relationship between BSS-R total or sub-scale scores and a measure of a domain that would not, on theoretical grounds, be expected to relate to BSS-R scores. The usual statistical approach is to calculate Pearson’s correlation coefficient (or the non-parametric equivalent) with the expectation that a statistically significant correlation will not be found.
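
A minimal sketch of this check, using only the Python standard library, is given below. It computes Pearson’s r and, as a swapped-in alternative to a significance test, an approximate 95% confidence interval based on the Fisher z transformation; an interval that includes zero is consistent with no relationship. The function names and the data are hypothetical, and the interval is an approximation that is unreliable for very small samples such as the one shown.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson's product-moment correlation between two score lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z transform."""
    z = atanh(r)                 # undefined for r = +1 or r = -1
    standard_error = 1 / sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return tanh(z - critical * standard_error), tanh(z + critical * standard_error)


# Hypothetical BSS-R totals and scores on an unrelated measure.
bssr_total = [28, 31, 25, 34, 30, 27, 33, 29]
unrelated = [3, 1, 4, 1, 5, 9, 2, 6]
r = pearson_r(bssr_total, unrelated)
lower, upper = fisher_ci(r, len(bssr_total))
print(f"r = {r:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```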

### Test-retest reliability

Test-retest reliability may be evaluated by re-administering the BSS-R to the same group of participants at a second observation point and testing for a statistically significant relationship between observation points using Pearson’s correlation coefficient (or the non-parametric equivalent). It is noteworthy that the retest intervals reported in research papers are often too short to evaluate test-retest reliability. We would suggest a minimum test-retest period of three months, consistent with the recommendations of Kline (2000).
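
Where the non-parametric route is taken, one common choice is Spearman’s rank correlation. The following is a minimal standard-library sketch, with tied scores given their average rank; the function names and the two sets of scores are hypothetical and do not come from any BSS-R study.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x, mean_y = mean(rank_x), mean(rank_y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    denominator = sqrt(
        sum((a - mean_x) ** 2 for a in rank_x)
        * sum((b - mean_y) ** 2 for b in rank_y)
    )
    return numerator / denominator


# Hypothetical total scores for the same participants at two time points.
time_1 = [28, 31, 25, 34, 30, 27, 33, 29]
time_2 = [27, 32, 24, 33, 31, 27, 34, 28]
print(round(spearman_rho(time_1, time_2), 3))
```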

The above are some suggestions for psychometric evaluation of a translated version of the BSS-R. We would recommend further reading to consider the range of statistical approaches that may be undertaken and the characteristics of the study that should be taken into account. A good paper covering a number of these issues is that of Martin and Savage-McGlynn (2013).

