Criticizing German Medical Fitness Tests for Fire-Fighters
Kritik an der Fahrrad-ergometrischen Leistungsuntersuchung bei Feuerwehrleuten
Background: The mandatory medical fitness examinations of firefighters and occupational divers include ergometry. These examinations are important, but in Germany they rest on two models that have drawn criticism: the Reiterer model and the PWC model.
Aim: We examine the weaknesses of these models and offer alternatives. We also discuss other problems of the fitness examinations.
Methods: In bicycle ergometry tests with 8583 firefighters, we collected data on age, body mass and end performance and developed mathematical models that estimate the median end performance from age and mass. Because our models are based on data from firefighters, they are better suited as a basis for medical fitness tests than the Reiterer model, which was conceived for another purpose.
Results: The Reiterer model is based on measurements of less-fit test persons. As expected, it underestimates performance on average. Its structure is coarse and clearly less precise than that of our models. The PWC model is even coarser and tends to overestimate performance. Our algorithm Pincremental describes the physical fitness of firefighters well and can be used to construct reference models based on gender, form of ergometry and failure rate.
Conclusion: We recommend the use of new models for medical fitness examinations. When a bicycle ergometer is used, a gradual-increase protocol with a steeper gradient should replace the present incremental protocol. The fitness examinations of firefighters should be based more on data “from the job” and should contain job-specific forms of ergometry, at least in part.
KEY WORDS: Bicycle Ergometry, Firefighters, Fitness Tests, Cardiopulmonary Performance
Hintergrund: Die Ergometrie bei den Eignungsuntersuchungen von Feuerwehrleuten und Berufstauchern ist wichtig, beruht aber auf zwei kritisierten Modellen, dem Reiterer- und dem PWC-Modell.
Ziel der Arbeit: Wir untersuchen die Schwächen dieser Modelle und bieten Alternativen an. Zudem diskutieren wir weitere Probleme der Eignungsuntersuchungen.
Material und Methoden: Aus Daten von 8583 Feuerwehrleuten hinsichtlich Alter, Masse und Endleistungen bei der Fahrrad-Ergometrie haben wir mathematische Modelle entwickelt, mit denen aus Alter und Masse der Median der Endleistung geschätzt werden kann. Von besonderer Relevanz ist, dass unsere Modelle auf Daten von Feuerwehrleuten beruhen und sich daher eher als Grundlage von Eignungstests eignen — im Gegensatz zum Reiterer-Modell, das für einen anderen Zweck konzipiert wurde.
Ergebnisse: Das Reiterer-Modell beruht auf Messungen mit leistungsschwächeren Probanden. Wie zu erwarten, unterschätzt es im Schnitt die Leistungen. Seine Struktur ist grob und deutlich ungenauer als ein von uns entwickeltes Modell. Noch gröber ist das PWC-Modell, das jedoch zur Überschätzung der Leistungen neigt. Wir haben mit dem Algorithmus PStufen eine Funktion gefunden, die die Leistung der Feuerwehrleute gut beschreibt und mit der man geeignetere Soll-Modelle konstruieren kann – ausgehend von Geschlecht, Belastungsart und Durchfallrate.
Schlussfolgerung: Wir empfehlen die Anwendung neuer Modelle für die Leistungsbestimmung und die Eignungsuntersuchung. Bei Nutzung eines Fahrradergometers sollte anstatt der bisherigen Stufen- eine Rampenbelastung mit größerer Steigung zur Anwendung kommen. Die Eignungstests der Feuerwehrleute müssten mehr auf Daten aus dem Einsatz beruhen und wenigstens in Teilen einsatzspezifische Ergometrieformen enthalten.
SCHLÜSSELWÖRTER: Fahrrad-Ergometrie, Feuerwehrleute, Leistungstests, kardiopulmonale Leistungsfähigkeit
The work of firefighters and professional divers is often extremely stressful and dangerous (6, 9, 11). Qualification tests for these professions must therefore be based on appropriate assumptions (2).
To assess qualification, performance is determined, among other things, on cycle ergometers and compared with the guidelines in the standard reference publication of the Deutsche Gesetzliche Unfallversicherung (DGUV, German Social Accident Insurance), “DGUV Grundsätze für arbeitsmedizinische Untersuchungen” (“DGUV Principles of Occupational Medicine Examinations”). These guidelines are based on the Reiterer model and the Physical Working Capacity (PWC) model (8, 14). In both models, a theoretical performance value for cycle ergometry is calculated from age, weight and gender.
The Reiterer model delivers the empirically-estimated anticipated performance value at the end of the test – not for firefighters, but for office workers. There is an affine estimation function for each gender. The Reiterer model dates from the 1970s and is based, as used in the DGUV principles, on linear regression with cycle ergometry data of 154 men and 82 women (3).
The PWC model specifies the minimum performance that a person must attain at a given heart rate to be considered qualified. The minimum performance depends linearly on body weight and step-wise on age. The origins of the PWC170 go back to the 1940s (17); despite research, we were unable to trace the origin of the PWC model currently used in the DGUV principles.
Problem and Objective
Both models are criticized. Occupational health physicians say that the models differ considerably from one another. Firefighters complain that the relevance of ergometry is somewhat unclear and that the evaluations are incomprehensible.
For this reason, we are analyzing both models and developing alternatives based on a representative database of more than 8500 firefighters. The main thesis of this article is that both the Reiterer and the PWC models are inadequate and should be replaced.
We substantiate the following theses: The Reiterer model underestimates performance. The PWC model places excessive demands on young, heavy persons. Affine models are generally unsuitable. Concrete alternative models describe our data more precisely.
Material and Methods
8583 firefighters (8119 men, 464 women) between 17 and 67 years of age were examined as part of occupational health qualification examinations or during internal fire-department performance tests. Between 2002 and 2017, the subjects performed seated cycle ergometry up to submaximal or maximal exercise, as incremental or gradual-increase exercise, according to the recommendations of the Deutsche Gesetzliche Unfallversicherung or the Deutsche Gesellschaft für Pneumologie und Beatmungsmedizin e.V. (German Respiratory Society) (8, 12). Details on the two exercise forms are given in Table 1. Data on gender, age, body weight and end performance were collected and used for the assessment.
Table 2 shows the distribution of weight, age and performance in the subject groups. The largest of the three groups consisted of 7859 men in incremental exercise; their data for weight and age are given in Figure 1.
We defined models based on cycle-ergometric data that predict the mean and variance of end performance from age, weight and gender using easy-to-interpret formulae. The quality of a model type is checked mainly by cross-validation.
The median estimates are based on polynomial regression of our data and on the assumption that, from age 30 onward, performance falls or stagnates but does not increase. We selected this assumption for three reasons: it is well established for the general public (15, 18); it fits the data for men in gradual-increase exercise as well as the data for women; and even under this assumption, an unexpected trend change among the men in incremental exercise is still detected, as a performance plateau at more advanced age.
After determining our own median estimate function for the men in incremental exercise, Pincremental for males (Pincremental for short), we examined the Reiterer model, the PWC model and Pincremental for males for their suitability to describe the data.
Comparison of Reiterer and Pincremental Model with the Data
For this, we divide the cohort by age and weight into nine groups of similar size. For the Reiterer model and for Pincremental for males, we analyze the following: for each subject, we determine the difference between the measured performance and the prognosis. A positive value indicates underestimation, i.e. the prognosis is below the measured value. A negative value indicates overestimation. If a model estimates the median correctly, each person is underestimated with a probability of 50%; that is our null hypothesis. To reject the null hypothesis for a model and a given group, this 50% value has to lie outside the calculated confidence interval. We use a high confidence level of 99.7% (three standard deviations in a normal distribution) because our large quantity of data gives us great statistical power and because we perform several comparisons and want to avoid false positives.
Further Comparison of the Reiterer Model and Pincremental for Males with the Data
For each of the two models, we calculate the difference between measured and estimated performance for each person and then take the arithmetic mean of all these differences. For each model, we thus obtain a mean deviation. A good model should have a small mean deviation.
The two comparisons differ in the type of average: the first uses the median, the second the arithmetic mean. The analysis was implemented in the programming language R.
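The two comparisons can be sketched in a few lines. The following is a minimal illustration in Python, not the R implementation used for the study. The function names and the sample numbers are hypothetical, and the normal-approximation (Wald) interval with z = 3 is an assumption, since the text does not state how the confidence interval was calculated.

```python
import math


def underestimate_interval(measured, predicted, z=3.0):
    """Share of underestimated subjects with a normal-approximation interval.

    z = 3 corresponds to the 99.7% confidence level named in the text;
    the Wald interval itself is an assumption of this sketch.
    """
    diffs = [m - p for m, p in zip(measured, predicted)]
    n = len(diffs)
    share = sum(d > 0 for d in diffs) / n  # positive difference = underestimate
    half_width = z * math.sqrt(share * (1.0 - share) / n)
    return share, max(0.0, share - half_width), min(1.0, share + half_width)


def rejects_median_hypothesis(measured, predicted, z=3.0):
    """True if 50% lies outside the interval, i.e. the null hypothesis is rejected."""
    _, low, high = underestimate_interval(measured, predicted, z)
    return not (low <= 0.5 <= high)


def mean_deviation(measured, predicted):
    """Arithmetic mean of measured minus predicted performance (second comparison)."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    return sum(diffs) / len(diffs)


# Hypothetical group: 90 subjects measured 10 W above and 10 subjects
# measured 10 W below a constant prognosis of 200 W.
measured = [210.0] * 90 + [190.0] * 10
predicted = [200.0] * 100
print(underestimate_interval(measured, predicted))
print(rejects_median_hypothesis(measured, predicted))
print(mean_deviation(measured, predicted))
```

In this made-up group, 90% of the subjects are underestimated, so the 50% value lies far outside the interval and the null hypothesis would be rejected. For small groups or shares close to 0 or 1, an exact binomial interval would be the more careful choice.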
Unlike the models described thus far, the PWC Model is not a descriptive, but a normative model, i.e. it uses target values. As with the incremental exercise, performance is measured at a predetermined heart rate.
Since we show that Pincremental estimates the performance median well, the Pincremental values should lie above the PWC values; otherwise most of the men would not be found qualified. Therefore, the difference Pincremental value minus PWC target value should be positive for each age and each weight.
According to the DGUV principles, “deviations by more than 20% of target value are no longer considered normal” (8). Many examiners rate subjects as qualified if they have achieved at least 80% of the PWC target value. Logically, at least the difference Pincremental value minus 80% of the PWC target value must be positive, which we check.
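The two sign checks described here amount to simple arithmetic. The sketch below is only an illustration: the function name `pwc_margins` and the input values of 240 W and 270 W are hypothetical and are not taken from the DGUV tables.

```python
def pwc_margins(median_watts, pwc_target_watts, tolerance=0.8):
    """Return (median - target, median - tolerance * target) in watts.

    A negative first value means the estimated median lies below the full
    PWC target; a negative second value means it lies even below the
    tolerated 80% of the target.
    """
    full_margin = median_watts - pwc_target_watts
    reduced_margin = median_watts - tolerance * pwc_target_watts
    return full_margin, reduced_margin


# Hypothetical values: estimated median 240 W, PWC target 270 W.
full_margin, reduced_margin = pwc_margins(240.0, 270.0)
print(full_margin, reduced_margin)
```

With these made-up inputs, the median misses the full target by 30 watts but clears the 80% threshold by about 24 watts, which is the kind of constellation the second check is meant to detect.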
Age, Weight and Performance of Men in Incremental Exercise
Data are available from 7859 men in incremental exercise. Figure 1 shows the distribution of age and performance side by side for light, moderate-weight and heavy men. The deeper the red in the image, the more men there are of the corresponding age and performance.
Moreover, grey-green graphs show the estimated performance medians as a function of age. More of the lighter men are younger than 35, and 80% of them achieve a performance between 190 and 270 watts. Age and performance of the heavy men are higher, but their distribution is more homogeneous. Every model obtained from our data, as well as the visualized raw data, reveals an unexpected increase in performance with age among the older, moderate-weight men in incremental exercise.
Model for Incremental Exercise in Men
The function Pincremental estimates the median of performance from age and weight:
\[ P_{\mathrm{incremental}}/\mathrm{W} = 214 + 0.86\,\Delta M - \frac{\Delta M^{2}}{68} - \frac{\Delta A^{3}}{164} - \frac{\Delta A^{4}}{6300} \pm 43\ (\mathrm{SD}) \]
with \(\Delta M = (M - M_{\mathrm{mean}})/\mathrm{kg}\) and \(\Delta A = (A - A_{\mathrm{plateau}})/\mathrm{years}\), where \(M_{\mathrm{mean}} = 85\) kg and \(A_{\mathrm{plateau}} = 52.5\) years. As an example, we look at men with A=38 years and M=80kg. For these, ΔA=38-52.5=-14.5 and ΔM=80-85=-5. Application gives Pincremental(A; M)≈221 watts ± 43 watts (SD). That means that ca. 50% of all firemen of this age and weight achieve more than 221 watts; the other 50% achieve less than 221 watts. Figure 2 gives a graphic presentation of this estimate function.
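The worked example can be checked numerically. The following short Python sketch evaluates the estimate function with the coefficients given above; the function and constant names are chosen here for illustration only.

```python
M_MEAN_KG = 85.0        # mean body mass used for centering
A_PLATEAU_YEARS = 52.5  # age at which the fitted curve is flat
SD_WATTS = 43.0         # scatter around the median reported with the formula


def p_incremental(age_years, mass_kg):
    """Estimated median end performance (watts) for men in incremental exercise."""
    d_m = mass_kg - M_MEAN_KG
    d_a = age_years - A_PLATEAU_YEARS
    return (214.0
            + 0.86 * d_m
            - d_m ** 2 / 68.0
            - d_a ** 3 / 164.0
            - d_a ** 4 / 6300.0)


# Example from the text: 38 years, 80 kg.
print(round(p_incremental(38, 80)))  # 221
```

At the reference point (52.5 years, 85 kg), all correction terms vanish and the function returns the constant term of 214 watts.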
Comparison of the Reiterer Model with the Data
In Table 3, the 99.7% confidence interval for the probability of underestimate is marked with an asterisk where the 50% does not lie within the interval, as for example for the older, light third of the subjects in the Reiterer model.
The Reiterer model underestimates the men in most of the groups. The higher the age and the lower the weight, the greater the underestimation.
That the Reiterer prognoses are sometimes reasonable, despite all weaknesses, is due to the great scattering of measured performances of about 47 watts. Because of this, the false prognoses carry less weight.
Comparison of our Model Pincremental with the Data
As with the Reiterer model, the Pincremental prognoses are too high by at most ca. 150 watts and too low by at most ca. 225 watts. Unlike with the Reiterer model, over- and underestimation occur with similar frequency.
Table 3 shows in two ways that the Pincremental model provides a better estimate of the median. First, one checks for each confidence interval whether the 50% value lies within the interval (main criterion); second, whether the confidence interval lies within a well-suited comparison interval [40%, 60%]. In the Pincremental model, both criteria are met for all nine groups. In the Reiterer model, the main criterion and the second criterion are both met for only three of the nine groups.
On average, Pincremental lies below the measured value by 2.6 watts – that is about a factor 5 less than for the Reiterer model with an average underestimate of ca. 13.5 watts.
With respect to our data for the women, PReiterer for females (PReiterer (f)) overestimates performance on average by ca. 29 watts, whereas Pincremental (f) overestimates performance on average by ca. 0.1 watts. Especially the light women are underestimated by the Reiterer model – due to its too-high weight coefficient.
Comparison of the PWC Model with the Data
Figure 3 shows the difference between the estimated median Pincremental and the PWC target value in men younger than 40. Figure 4 shows the difference between the estimated median Pincremental and 80% of the PWC target value in the same men. As an example, we take A=35 years and M=90kg. In Figure 3, we find a difference of ca. -30 watts, i.e., the median lies ca. 30 watts below the PWC target value. Unless the demands are reduced to 80% of the PWC target value, most of these active firemen would not qualify.
The differences vary greatly with the weight and less with the age, which is due to the high weight coefficient of the PWC model.
The left half of Figure 5 is based on our data for the 1587 men between 28 and 34 years of age in incremental exercise. For this age range, the estimated maximum heart rate is ca. 190 bpm (beats per minute); 90% of this is ca. 170 bpm, which fits the PWC model. These men are classified as light, moderate-weight and heavy; the weight limits were selected to give three groups of approximately equal size. There are two columns for each category. The red column shows how often the measured performance lies below the PWC target value, the orange column how often it lies below 80% of the PWC target value. The right half of Figure 5 is based on our data for the 972 men between 50 and 56 years of age in incremental exercise and shows a slightly lower failure rate than for the younger men. Among the women, too, the highest failure rate was found among the heavier young subjects.
Like many top athletes, firemen often have to achieve peak performance within a very short time (11). Unlike athletes before a 400-meter final, for example, firefighters wearing protective breathing apparatus can rarely warm up for the life-saving “competition”. Immediate readiness is expected as soon as the alarm goes off.
For high-performance athletes, performance tests are part of the standard program, especially when the decision is to be made about cadre inclusion or exclusion. If a fireman gets a poor performance rating, the visible “A” for “Atemschutzträger” (breathing apparatus capability) is taken from his helmet. He has to leave the prestigious team and suffers from the loss just like the athlete. So appropriate validity of the test form used becomes even more important (20).
Ergometric tests of athletes and of workers from whom high performance may be required can be performed for two reasons: for preventive-diagnostic indications, to rule out pathologies under stress, and for performance-physiological indications, to determine data for training management (19). While the S1 guideline of the DGSP (5) focuses on preventive examinations in sports, examinations under the DGUV principles for occupational medicine are intended less for preventive-diagnostic indications and place great value on the performance-physiological indication of ergometry. As evidence for the “determination of maximal performance”, tables modified from Reiterer are recommended. To prove qualification, the PWC model is used; it dates from the 1940s and has the least-refined structure of all models considered here (8, 14).
We were able to show that both models are of limited suitability for use in high-performance athletes and fire fighters using a self-contained breathing apparatus (SCBA). We therefore developed our own recommended model to describe the ergometric data of more than 8000 subjects.
Weaknesses of Reiterer Model
- Although the Reiterer model estimates average maximal performance, it underestimates the submaximal performance in incremental exercise. Since the performance in gradual-increase exercise is higher than in incremental exercise, the Reiterer model gives an even poorer estimate of maximal performance in gradual-increase exercise.
- The weight coefficient of the Reiterer model is too high for both genders by a factor of ca. 2.
- The Reiterer model is affine. It thus predicts a constant increase in performance with weight, whereas our analysis shows that the increase flattens with increasing weight.
- In each of the ergometry groups, there is a performance maximum at ca. 25 years of age. An affine function, however, has no such interior maximum.
- The original 25 weight classes of the model (14) are reduced in the DGUV principles (8) to nine or ten classes. For men, body weight is considered only from 60 to 94kg instead of 60 to 109kg; for women, from 40 to 78kg instead of 40 to 89kg.
- The sample on which the Reiterer model was based consists of factory workers, employees of laboratories and medical facilities, and office workers (14). Our sample consists exclusively of firemen and fire department trainees, whose physical work capacity is above average.
In Reiterer's study, the performance capacity of the subjects was estimated before the start of exercise, and it was then determined whether the load should be increased every 2 minutes by 25 or 50 watts (14). In the study by Arstila, the exercise level was increased continuously under heart-rate (HR) control (3). The Reiterer model in the modification by Arstila is based on data from 154 men and 82 women. We have data from 7859 men in incremental exercise, 260 men in gradual-increase exercise and 451 women in incremental exercise. Therefore, our models are more precise and show more details.
The weight categories used are no longer up to date (7); in particular, the cut-offs at 94kg (men) and 78kg (women) appear arbitrarily low. Given the special demands, among them an equipment weight of 25kg to 75kg, it is hard to imagine a woman weighing only 40kg working with SCBA, and men or women weighing 60kg are rarely employed.
Due to the high scattering of performances, the deficiencies of the Reiterer model are not very important.
Weaknesses of the PWC Model
- The demands made on young, heavy persons are too high.
- The PWC model is the least-refined of the models compared here: in each of the two age groups, there is no further dependency on age, but only a linear dependency on weight.
- The weight coefficients are much too high.
We do not know how the PWC model, with its discontinuity at 40 years and its exact pass criteria, was established and therefore cannot explain its weaknesses.
Limitations of our Models
We have data for only 451 women in incremental exercise and 260 men in gradual-increase exercise, and hardly any data for women in gradual-increase exercise. These are, in fact, almost six times as many female and nearly twice as many male subjects as Reiterer had. However, our models for these groups, like the Reiterer model, provide fewer details than Pincremental.
The Age Range of the Subjects
The age of our subjects is between 17 and 67 years. We selected 17 as the minimum age, since the qualification test for working with SCBA is possible starting at that age. As a maximum age, we selected the German retirement age, that is 67 years. The age distribution of our subjects, excepting those younger than 17 years, corresponds approximately to the sports-active population in Germany (1).
Thanks to the large sample of nearly 8000 firemen, our model Pincremental describes the work capacity of the male fireman collective validly and with great detail.
The Transition from a Descriptive Model to a Normative Model
A normative model can arise from a descriptive model and goals. For the fire department, there are two opposing goals that often rule each other out: (1) the desire for equal chances in safety and, at the same time, (2) the desire for equal chances in choosing one's occupation or volunteer work.
In sports, the wish of many athletes to be included in a cadre could compete with various restrictions for only accepting a limited number of promising athletes in a cadre (10).
To pursue goal (1), equal safety, the occupational stress would have to be recorded as exactly as possible. All firemen have to bear this stress, independent of gender, age and weight. A dangerous situation, or an unconscious heavy man who has to be carried out of a burning building, takes no account of gender, age and weight. For this goal, the ergometry form must be similar to real situations or at least represent a surrogate defined in consensus, like climbing an endless ladder (4).
Below, we show an example of a normative model which is arbitrarily oriented to goal (2), equal chances in choosing one's occupation. We do this because it reflects common practice to date. A normative model can be obtained from our Pincremental model by deciding on a failure rate. We arbitrarily select the failure rate of the 80%-PWC model, as calculated from our data. This selection leads to the model Pincremental, normative, presented as a formula and graphically (Fig. 6). The “Isowatts” make it possible to read off the suitability instead of calculating it, which increases usability.
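One way to picture the step from a descriptive median to a normative threshold is sketched below. This is not the authors' construction of Pincremental, normative. It assumes, purely for illustration, that end performance scatters normally around the estimated median with the standard deviation of 43 watts reported with the Pincremental formula. The function name and the failure rate of 10% are hypothetical.

```python
from statistics import NormalDist


def normative_threshold(median_watts, failure_rate, sd_watts=43.0):
    """Pass threshold (watts) below which the chosen share of subjects falls.

    Assumes a normal distribution around the median, which is an
    assumption of this sketch and not a statement from the study.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must lie strictly between 0 and 1")
    # inv_cdf gives the standard-normal quantile for the chosen failure rate.
    return median_watts + sd_watts * NormalDist().inv_cdf(failure_rate)


# Hypothetical choice: 10% failure rate for a group with a median of 221 W.
print(round(normative_threshold(221.0, 0.10), 1))
```

A failure rate of 50% returns the median itself; lower failure rates move the threshold further below the median. Choosing the failure rate is the normative decision; the descriptive model only supplies the median and the scatter.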
We want to motivate a discussion, so that consensus can be found for a new normative model which describes, as fittingly as possible, the performance demands on firefighters as a special group of high-performance athletes. Such models could thus be used both to determine the training status and to determine qualification, while at the same time, being easy to use.
Thanks to its simplicity and availability, cycle ergometry could still be used broadly. However, we recommend gradual-increase exercise (13) up to maximum load (16). In specialized centers, forms specific to the firefighters' “sport”, such as endless-ladder or stair ergometry, could be performed in addition. But this requires further investigation.
We express our thanks to all the fire fighters and trainers who made it possible for us to collect so much data.
In addition, we express our thanks for reviewing and giving valuable tips to Dr. Dirk Boysen, Rose Schramm, Dr. Christina Roschat, Dr. Nadine Hauptmann and especially to Leon Schramm and Dr. Verena Heidrich-Meisner, who both invested many hours in improving the comprehensibility and precision of this article.
Conflict of Interest
The authors have no conflict of interest.
- Sportkonsum in Deutschland. Springer Fachmedien Wiesbaden: Wiesbaden; 2014.
- Comparison of cardiocirculatory and thermal strain of male firefighters during fire suppression to exercise stress test and aerobic exercise testing. Am J Cardiol. 2008; 102: 1551-1556.
- Pulse-conducted triangular exercise-ECG test. A feed-back system regulating work during exercise. Acta medica Scandinavica. Supplementum. 1972; 529: 3.
- Feuerwehrdienstvorschrift 7 - Atemschutz: FwDV 7 2002.
- S1-Leitlinie Vorsorgeuntersuchung im Sport. Deutsche Gesellschaft für Sportmedizin und Prävention (DGSP). 2007.
- The physical demands upon (Dutch) fire-fighters in relation to the maximum acceptable energetic workload. Ergonomics. 2007; 47: 446-460.
- Körpermaße nach Alter und Geschlecht: Ergebnisse des Mikrozensus 2013; 2015.
- DGUV-Grundsätze für arbeitsmedizinische Vorsorgeuntersuchungen. 5., vollst. neubearb. Aufl. Gentner: Stuttgart; 2010.
- Physiological responses of firefighter instructors during training exercises. Ergonomics. 2004; 47: 483-494.
- Zur Selektion und Führung von Nationalmannschaften – einige Erfahrungen und Erkenntnisse. In: Rudern. Springer; 1988.
- Belastung von Atemschutzgeräteträgern. Zbl Arbeitsmed. 2015; 65: 87-91.
- Belastungsuntersuchungen in der Pneumologie. Pneumologie. 2013; 67: 16-34.
- Gradual Versus Continuous Increase of Load in Ergometric Tests: Are the Results Comparable? In: Pokorski M (ed): Body Metabolism and Exercise. Cham: Springer International Publishing 2015; 51-58.
- Methodik eines rektangulär-triangulären Belastungstestes. Herz-Kreislauf. 1975; 1975: 457-462.
- Development and validation of criterion-referenced clinically relevant fitness standards for maintaining physical independence in later years. Gerontologist. 2013; 53: 255-267.
- Positionspapier zur Durchführung von Qualitätskontrollen bei Ruhe-, Belastungs- und Langzeit-EKG. Zeitschrift für Kardiologie 2005; 94: 844-57.
- Determination of the Physical Working Capacity: A Physiological and Clinical Study with special reference to Standardization of Cardio-Pulmonary Functional Tests. Acta medica Scandinavica. Supplementum. 1948; 1948.
- Meta-analysis of the age-associated decline in maximal aerobic capacity in men: relation to training status. Am J Physiol Heart Circ Physiol. 2000; 278: H829-H834.
- Leistungsmedizinische Ergometrie im Kindes- und Jugendalter. Monatsschr Kinderheilkd. 2014; 162: 216-221.
- Kompendium der Sportmedizin: Physiologie, Innere Medizin und Pädiatrie. Springer Science and Business Media; Springer; 2016.