Sports Orthopedics
Validation and Comparison of Heart Rate Measuring Methods

Validation and Comparison of Three Different Heart Rate Measuring Methods during Treadmill Performance Diagnostics



Background: Different methods for heart rate (HR) determination are used in routine performance diagnostics. The aim of this study was to compare different HR measurement methods during treadmill performance diagnostics.

Methods: 76 athletes (28.6±14.7 years, 38% female) performed a treadmill lactate threshold test. HR during testing was simultaneously assessed by analysis of a 12-lead electrocardiogram (ECG), both automatically (aECG) and manually (mECG), and by a heart rate monitor (HRM). ECG and HRM measurements were analyzed by two diagnosticians. Finally, three different HR curves (aECG, mECG, HRM) were generated and compared at different time points.

Results: ECG-based HR detection revealed excellent reproducibility and reliability. For HRM/aECG, faulty measurements were detected in 14.5%/36.8% of all athletes. However, construction of HR/lactate curves was still possible in 84.6%/73.7% of the athletes with faulty measurements. HR at corresponding time points did not differ significantly between mECG and HRM/aECG (intraclass correlation coefficient >0.9/0.8 and coefficient of variation <5%/5%). In Bland-Altman analyses of HRM/mECG and aECG/mECG, mean differences were usually low (3-5 bpm), whereas limits of agreement were relatively wide (approx. ±10 bpm).

Conclusions: Training areas defined by mECG may be used for home training control with HRM. If HRM measurements are used for the athlete’s training recommendations, HRs determined should be checked for plausibility and comparability with corresponding ECG measurements by physicians with appropriate expertise. Due to comparably high error susceptibility, aECG HR detection should not be used in performance diagnostics.

KEY WORDS: Heart Rate Detection, Heart Rate Monitors, Lactate Curve, Performance Diagnosis




Introduction

Treadmill ergometer testing with lactate measurement is frequently used in routine performance diagnostics (16). Lactate threshold concepts are well established in performance diagnostics and training control (9, 10, 23). Determination of heart rate zones representing different endurance ranges is an important component of these concepts (e.g. according to Dickhuth (2)).

Basically, there are various options for heart rate (HR) determination. In addition to the expanding market for heart rate tools based on pulse oximetry, such as smart watches, commercially available heart rate monitors (HRM, e.g. POLAR®) with a chest strap are often used. Alternatively, HR can be derived from an electrocardiogram (ECG) by either automatic or manual measurement. The endurance ranges derived from these measurements are directly used for home training control (18) and also allow competition prognosis (19). Concerning the first issue, it is noteworthy that athletes naturally use HRMs for HR determination at home. In daily practice, depending on the situation, HRs determined by ECG measurements are either transferred to home training with HRMs or, alternatively, it has to be assumed that HRs determined by the “black box” HRM were measured with sufficient precision. However, to the best of our knowledge, the validity and comparability of different methods of HR measurement during performance diagnostics have not been evaluated systematically. Therefore, these issues were the aims of our present study.

Material and Methods

Eighty athletes who underwent performance diagnostics including exercise ECG as part of sports participation screening were included in our study. Ethics approval was obtained from the University’s Ethics Committee (protocol number 173/18). All participants signed a consent form to participate in this study. Resting heart rate was determined from a resting ECG after the athlete had been lying in a horizontal position for at least 5 min. Exercise testing was performed on a treadmill (h/p/cosmos venus 200/75, h/p/cosmos sports & medical GmbH, Nussdorf-Traunstein, Germany) according to recent recommendations (15, 22) using a standard step protocol (starting at 4 km∙h⁻¹, increase of 2 km∙h⁻¹ every 3 min, 30 sec breaks between the steps for lactate sampling). While the athletes were still running, HR for each velocity level was determined 10 sec before the end of the velocity step using three different methods: 1.) manual ECG analysis (mECG), 2.) automatic ECG analysis (aECG) and 3.) HRM. mECG was performed independently by two experienced internal medicine specialists; one of them evaluated all ECGs twice with a time lag of at least two weeks between the analyses. For HR determination, four consecutive heart cycles at the respective velocity step were averaged. For HR determination by aECG, a medical assistant documented the HR displayed by the aECG system (cardioPart 12 Blue T, Amedtec Medizintechnik Aue GmbH, Aue, Germany) without any plausibility check. Finally, another medical assistant read the HR from the display of the HRM wristwatch (Polar FT1, Steinhausen, Switzerland), likewise without any plausibility check, and recorded it on a load protocol.
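The timing of the step protocol and the averaging over four heart cycles can be illustrated with a minimal sketch. It is not the procedure used by the diagnosticians, who read the ECG manually. The function names are hypothetical, and two points are assumptions: each 30 sec break is taken to follow its step directly, and HR is derived from the mean R-R interval of the averaged cycles.

```python
# Sketch of the step protocol timing and an HR calculation from R-R intervals.
# Assumptions (not stated in the paper): each break follows its step directly,
# and HR is computed from the mean of the R-R intervals.

STEP_S = 180             # 3 min per velocity step
BREAK_S = 30             # break for lactate sampling
READ_BEFORE_END_S = 10   # HR documented 10 sec before the end of a step


def step_schedule(n_steps, start_kmh=4.0, increment_kmh=2.0):
    """Return (speed in km/h, time of HR reading in sec) for each step."""
    schedule = []
    for k in range(n_steps):
        step_start = k * (STEP_S + BREAK_S)
        reading_time = step_start + STEP_S - READ_BEFORE_END_S
        schedule.append((start_kmh + k * increment_kmh, reading_time))
    return schedule


def heart_rate_bpm(rr_intervals_s):
    """Mean heart rate in bpm from consecutive R-R intervals in seconds."""
    if not rr_intervals_s:
        raise ValueError("at least one R-R interval is required")
    mean_rr = sum(rr_intervals_s) / len(rr_intervals_s)
    return 60.0 / mean_rr
```

For example, four hypothetical R-R intervals of 0.5 sec each correspond to 120 bpm, and the first reading of a test falls at 170 sec at 4 km∙h⁻¹.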

Four of the athletes were excluded as they completed <4 velocity levels and therefore sufficient fitting of lactate curve was not possible (17). A total of 76 athletes were included in the final analysis. Individual anaerobic threshold (IAT) was determined from lactate sampling using Ergonizer software (Version 4.9.3 Build 103; ©1991-2016, Kai Roecker, Freiburg i. Brsg., Germany).

Reproducibility (inter- and intraobserver variability) of mECG was evaluated with the following statistical approaches: 1.) comparison of mean values (paired t-test), 2.) intraclass correlation coefficient (ICC, two-way mixed model, absolute agreement) (8), 3.) coefficient of variation (CV) (14), calculated as a percentage: standard deviation of the differences multiplied by 100 and divided by the mean value of the two measurements (20), and 4.) Bland-Altman analysis with limits of agreement (LoA) (5). All three measuring methods were compared with univariate ANOVA.
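Two of these measures can be sketched in a few lines. The sketch below follows the CV definition given above and assumes the conventional Bland-Altman limits of mean difference ± 1.96 standard deviations of the differences; the paper does not state the multiplier. The function names and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (mean difference, lower LoA, upper LoA) in the units of the input.

    Assumes the conventional limits: mean difference +/- 1.96 * SD of differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cv_percent(method_a, method_b):
    """CV in percent: SD of the differences * 100 / mean of both measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    grand_mean = mean(list(method_a) + list(method_b))
    return 100.0 * stdev(diffs) / grand_mean


# Hypothetical heart rates (bpm) of four athletes, measured with two methods.
mecg = [150, 160, 170, 180]
hrm = [148, 161, 169, 176]
bias, lower, upper = bland_altman(mecg, hrm)
cv = cv_percent(mecg, hrm)
```

A small mean difference can coexist with wide limits of agreement, because the limits depend on the spread of the individual differences rather than on their average.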

aECG is often problematic due to misinterpretation of movement artifacts, especially at higher running velocities. To identify erroneous measurements, the heart rate courses of each athlete assessed by automatic ECG detection and by heart rate monitor were evaluated independently by two experienced persons (S.S. [sports scientist], R.L. [cardiologist, sports physician]). For this purpose, the plausibility of each individual heart rate measurement was checked within the overall heart rate curve. If a heart rate measurement was independently categorized as erroneous by both evaluators, it was excluded from the final heart rate fitting. If the two evaluators disagreed, a third experienced evaluator (K.E. [specialist for internal medicine, sports physician]) was consulted, who made the final decision on inclusion or exclusion of the measurement. If, after the plausibility check, <4 measurements (17) remained for fitting of the heart rate course, the respective curve (e.g. the curve with HRM fitting) was excluded from further analysis, whereas all other curves of the athlete (e.g. fitting with aECG and mECG) were still used. In summary, a maximum of three different heart rate courses (aECG, mECG, HRM) were available for each athlete after the plausibility check. Using the heart rate fitting algorithm of the Ergonizer software (fitting “auto”), a maximum of three method-specific heart rates were therefore determined in each athlete at various time points during ergometry (individual anaerobic threshold (IAT), [lactate] 2 mmol/l, [lactate] 3 mmol/l, [lactate] 4 mmol/l, lactate threshold (LT), regenerative and long jog (LSD/RER), medium endurance run (MERmax), speed endurance run (SERmax), extensive interval training (EITmax)) and subsequently compared with each other.
The upper part of figure 1 (see supplement figure 1 online) (“methods”) gives an overview of the evaluation procedure.
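The decision rule of the plausibility check described above can be written down as a short sketch. The function names, the data layout (one flag per measurement, True meaning "erroneous") and the example values are hypothetical. The actual plausibility judgement was made by the evaluators and is not modelled here.

```python
MIN_POINTS = 4  # fewer than four remaining measurements rule out curve fitting


def resolve_flags(flags_rater1, flags_rater2, third_rater_decision):
    """Combine two independent ratings into one exclusion flag per measurement.

    Agreement is taken as final; on disagreement the third rater decides.
    third_rater_decision is a callable: measurement index -> bool (True = exclude).
    """
    final = []
    for i, (f1, f2) in enumerate(zip(flags_rater1, flags_rater2)):
        final.append(f1 if f1 == f2 else third_rater_decision(i))
    return final


def usable_points(hr_values, excluded):
    """Return the retained HR values, or None if too few remain for fitting."""
    kept = [hr for hr, bad in zip(hr_values, excluded) if not bad]
    return kept if len(kept) >= MIN_POINTS else None


# Hypothetical HR course (bpm) over six velocity steps with one implausible value.
hr_course = [100, 120, 140, 220, 170, 180]
rater1 = [False, False, False, True, False, False]
rater2 = [False, False, False, True, True, False]
excluded = resolve_flags(rater1, rater2, lambda i: False)  # third rater keeps step 5
kept = usable_points(hr_course, excluded)
```

In this example the value flagged by both raters is dropped, the disputed value is kept by the third rater, and five measurements remain, so the curve would still be fitted.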


Results

Table 1 gives an overview of the clinical characteristics of the athletes. In univariate ANOVA, no statistical difference between the three methods (mECG, aECG, HRM) was observed at any of the time points during ergometry (pIAT=0.905; p2mmol=0.961; p3mmol=0.863; p4mmol=0.662; pLT=0.965; pLSD/RER=0.449; pMERmax=0.636; pSERmax=0.638; pEITmax=0.478).

Reliability of Manual Heart Rate Detection
Reproducibility of manual heart rate detection was excellent at both low and high heart rates. Means of heart rates at various treadmill velocities did not differ between assessments 1 and 2 of observer 1 (intra-rater variability) or between observers 1 and 2 (inter-rater variability). ICCs were always >0.9 and CVs were always <2.5%. LoA of Bland-Altman analyses were in the range of 10 bpm. In summary, the results indicated good reliability of manual heart rate detection. Therefore, we chose this method as the reference method in our study.

Heart Rate Detection Using HRM
Heart rates obtained by the HRM were manually evaluated by two observers as described in the methods section. In the majority (82.9%) of the subjects, no HR measuring points of a respective fitting were evaluated as erroneous (see also supplement figure 1 online). In 14.5% of all subjects, faulty measurement points were identified, but in 84.6% the HR fitting was still possible after removal of erroneous values. Finally, in 2.6% of all subjects, removal of erroneous measurement points left <4 HR values, so that sufficient heart rate fitting was no longer possible. These subjects were therefore excluded from further analyses. Table 2 (see supplement table 2 online) shows the results of the comparison of values determined at various defined time points. With the exception of HR at [lactate] 4 mmol/l, the HR determined did not differ significantly between the two methods. ICCs were always excellent (>0.9) and CVs were always <5%.

In Bland-Altman analyses (Figure 2A, exemplary graphical illustration of Bland-Altman analyses of heart rates determined at IAT), the mean difference between the two measures was <2 bpm. However, LoA were nevertheless relatively high (≈10 bpm). Similarly, LoA were also relatively high at other time points during ergometry (see supplement table 2 online).

For example, at IAT, the relatively large LoA were caused by three outliers; figures 3, 4 and 5 illustrate their respective heart rate fittings. Differences in the determined HRs at IAT etc. were not caused by erroneous Ergonizer® HR fittings, which were acceptable for both mECG and HRM.


Automatic ECG-Based Heart Rate Detection
Two observers also evaluated the HR determined by aECG (see methods). In only 50.0% of all subjects were no HR measuring points of a respective aECG fitting evaluated as erroneous (see supplement figure 1 online). In 36.8% of all subjects, faulty measurement points were identified, but in 73.7% of them HR fitting was still possible after removal of erroneous values, as ≥4 heart rates were available. Finally, in 13.2% of all subjects, HR fitting was no longer possible after removal of erroneous measurements. These subjects were excluded from further analyses.

Table 3 (see supplement online) shows the HR determined at various defined time points, which did not differ significantly between the methods mECG (“gold standard”) and aECG. Good-to-excellent ICCs (0.8-0.9) were observed, and CVs were <6.2%. The mean difference between the two measures was <3 bpm, but, similar to HRM-based detection, LoA in Bland-Altman analyses at various time points during ergometry were high (≈18 bpm), as shown in table 3 (see supplement online) and, for example, at IAT in figure 2B.

Table 3, for example, compares heart rate detection at IAT by using either aECG or HRM. The latter method was generally accompanied by fewer artifacts, and artifact removal led to impossible HR fittings less often than with aECG (table 4). The percentage of athletes with under- or overestimated heart rate at IAT in comparison with the “gold standard” mECG did not differ between aECG and HRM (table 4). Table 5 shows the artifacts detected at each speed level.


Discussion

In our study, 76 athletes (28.6±14.7 years, 38% female; Vmax=14.5±1.9 km∙h⁻¹) performed a treadmill lactate threshold test. Heart rate during testing was simultaneously assessed by automatic and manual analysis of a 12-lead electrocardiogram and by a heart rate monitor. As expected, we could demonstrate that manual evaluation of the 12-lead ECG still seems to be the most valid heart rate measurement during exercise performance testing. In principle, HRM data can be used to create heart rate zones for training based on the lactate curve. However, the measured values must be checked for clinical plausibility. Due to many artifacts, automatic evaluation of the 12-lead ECG should not be used for heart rate determination.

To the best of our knowledge, there is currently no study that has investigated automatic HR detection systematically, most probably because different ECG systems use different determination algorithms. Therefore, our data have to be understood as manufacturer-specific (Amedtec). With aECG, we saw faulty measurement(s) in every second subject. Consequently, individual training heart rate zones deviated by up to 23 bpm. Most probably, problems of the computerized algorithm for HR determination are caused by the noisy original signal. Potential sources of error in heart rate monitoring with aECG and HRM include poor electrode performance, damaged electrodes (due to heat, bending, etc.), electromagnetic interference from media devices (e.g. mobile phones), transmitter units transmitting at 5 kHz or 2.4 GHz (e.g. high voltage lines), and electrostatic clothing. The main problem, however, is motion artifacts, which increase at higher speeds. To keep these artifacts as low as possible, the use of single-use uni-gel electrodes is recommended. In addition, the skin should be abraded, disinfected with a little alcohol and kept cream-free before the electrode is applied.

Studies using chest strap-based HR monitors, which detect electrical cardiac activity similarly to ECG, were performed as early as the 1980s. Most studies confirmed the accuracy of HR acquisition in inactive conditions such as supine and standing (12, 13, 21) and the validity of R-R interval measurements (1, 3, 4, 6, 24). There are also studies on the accuracy of chest strap-based monitors during exercise showing good comparability to mECG (12, 13, 21). However, the number of subjects (10 to 14) was very low and the exercise intensity, with a maximum of 10 km∙h⁻¹, was rather moderate (13, 21). Most studies performed exercise testing on a bicycle ergometer (11). In a recently published study by Gillinov, subjects were likewise loaded with a maximum of nearly 9.7 km∙h⁻¹ (7). However, for training advice and for control of different training intensities, accuracy of HR detection is important especially at higher and maximum loads.

In comparison with aECG, accuracy of HR determination by HRM was better, with lower LoAs, lower CV and higher ICC values. However, in Bland-Altman analysis, LoAs were comparably high for both HRM and aECG. On the individual level, single unidentified outliers of HR determined by HRM (statistical correlate: large LoAs in Bland-Altman analysis) sometimes resulted in totally different heart rate curves and training areas, as shown for example in figures 3, 4 and 5. Therefore, it is important that lactate curve construction be performed not by auxiliary staff but by trained employees with sufficient expertise in performance diagnostics, who validate the curve shape in the context of the clinical picture of the athlete and his or her performance capability.

Our study has several limitations. Our subjects were young, healthy volunteers who were tested under laboratory conditions. Results could be quite different in special subgroups, for example cardiac, obese or oncological patients. The results relate purely to the treadmill and may not be representative of outdoor running or of activities with less movement.


Conclusion

Training areas defined by mECG as the gold standard during performance testing may also be used by athletes for home training control with their own HRM, as HRs detected with these two methods were mostly comparable. If HRM measurements are used for the athlete’s final lactate/HR curves and training recommendations, HRs determined during the exercise performance test should always be checked by physicians with appropriate expertise for plausibility and comparability with corresponding ECG measurements, because relevant differences occur in some athletes. Due to comparably high error susceptibility, aECG HR detection should not be used in performance diagnostics.


Acknowledgements

Many thanks to all patients who participated and to the team at the Division of Sports and Rehabilitation Medicine at Ulm University.

Funding

None declared. The study was funded solely from the authors’ own resources.

Conflict of Interest
The authors have no conflict of interest.


  1. BARBOSA MP, DA SILVA NT, DE AZEVEDO FM, PASTRE CM, VANDERLEI LC. Comparison of Polar® RS800G3 heart rate monitor with Polar® S810i and electrocardiogram to obtain the series of RR intervals and analysis of heart rate variability at rest. Clin Physiol Funct Imaging. 2016; 36: 112-117.
  2. DICKHUTH H-H, HUONKER M, MÜNZEL T, DREXLER H, BERG A, KEUL J. Individual Anaerobic Threshold for Evaluation of Competitive Athletes and Patients with Left Ventricular Dysfunction, in Advances in Ergometry, Bachl N, Graham TE, Löllgen H, Editors. 1991, Springer Berlin Heidelberg: Berlin, Heidelberg. p. 173-179.
  3. GAMELIN FX, BAQUET G, BERTHOIN S, BOSQUET L. Validity of the polar S810 to measure R-R intervals in children. Int J Sports Med. 2008; 29: 134-138.
  4. GAMELIN FX, BERTHOIN S, BOSQUET L. Validity of the polar S810 heart rate monitor to measure R-R intervals at rest. Med Sci Sports Exerc. 2006; 38: 887-893.
  5. GIAVARINA D. Understanding Bland Altman analysis. Biochem Med (Zagreb). 2015; 25: 141-151.
  6. GILES D, DRAPER N, NEIL W. Validity of the Polar V800 heart rate monitor to measure RR intervals at rest. Eur J Appl Physiol. 2016; 116: 563-571.
  7. GILLINOV S, ETIWY M, WANG R, BLACKBURN G, PHELAN D, GILLINOV AM, HOUGHTALING P, JAVADIKASGARI H, DESAI MY. Variable Accuracy of Wearable Heart Rate Monitors during Aerobic Exercise. Med Sci Sports Exerc. 2017; 49: 1697-1703.
  8. GISEV N, BELL JS, CHEN TF. Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res Social Adm Pharm. 2013; 9: 330-338.
  9. HOLLMANN W. Die ärztliche Beurteilung der körperlichen Höchst- und Dauerleistungsfähigkeit. Umsch Wiss Tech. 1961; 22: 689-692.
  10. HOLLMANN W. The relationship between pH, lactic acid, potassium in the arterial and venous blood, the ventilation (PoW) and pulsfrequency during increasing spiroergometric work in endurance-trained and untrained person. Pan American Congress for Sports Medicine, 1959. Chicago.
  11. KINGSLEY M, LEWIS MJ, MARSON RE. Comparison of Polar 810s and an ambulatory ECG system for RR interval measurement during progressive exercise. Int J Sports Med. 2005; 26: 39-44.
  12. LAUKKANEN RM, VIRTANEN PK. Heart rate monitors: state of the art. J Sports Sci. 1998; 16: 3-7.
  13. LÉGER L, THIVIERGE M. Heart Rate Monitors: Validity, Stability, and Functionality. Phys Sportsmed. 1988; 16: 143-151.
  14. MARCK A, ANTERO J, BERTHELOT G, SAULIÈRE G, JANCOVICI JM, MASSON-DELMOTTE V, BOEUF G, SPEDDING M, LE BOURG É, TOUSSAINT JF. Are We Reaching the Limits of Homo sapiens? Front Physiol. 2017; 8: 812.
  15. MEZZANI A, AGOSTONI P, COHEN-SOLAL A, CORRÀ U, JEGIER A, KOUIDI E, MAZIC S, MEURIN P, PIEPOLI M, SIMON A, LAETHEM CV, VANHEES L. Standards for the use of cardiopulmonary exercise testing for the functional evaluation of cardiac patients: a report from the Exercise Physiology Section of the European Association for Cardiovascular Prevention and Rehabilitation. Eur J Cardiovasc Prev Rehabil. 2009; 16: 249-267.
  16. RÖCKER K. Die sportmedizinische Laktatdiagnostik: Technische Rahmenbedingungen und Einsatzbereiche. Dtsch Z Sportmed. 2013; 64: 367-371.
  17. RÖCKER K, DICKHUTH H-H. Praxis der Laktatmessung. Dtsch Z Sportmed. 2001; 52: 33-34.
  18. RÖCKER K, STRIEGEL H, FREUND T, DICKHUTH HH. Relative functional buffering capacity in 400-meter runners, long-distance runners and untrained individuals. Eur J Appl Physiol Occup Physiol. 1994; 68: 430-434.
  19. ROECKER K, SCHOTTE O, NIESS AM, HORSTMANN T, DICKHUTH HH. Predicting competition performance in long-distance running by means of a treadmill test. Med Sci Sports Exerc. 1998; 30: 1552-1557.
  20. SYNEK V. Evaluation of the standard deviation from duplicate results. Accred Qual Assur. 2008; 13: 335-337.
  21. TERBIZAN DJ, DOLEZAL BA, ALBANO C. Validity of Seven Commercially Available Heart Rate Monitors. Meas Phys Educ Exerc Sci. 2002; 6: 243-247.
  22. TRAPPE HJ, LÖLLGEN H. Leitlinien zur Ergometrie. Z Kardiol. 2000; 89: 821-837.
  23. WASSERMAN K, MCILROY MB. Detecting the Threshold of Anaerobic Metabolism in Cardiac Patients during Exercise. Am J Cardiol. 1964; 14: 844-852.
  24. WEIPPERT M, KUMAR M, KREUZFELD S, ARNDT D, RIEGER A, STOLL R. Comparison of three mobile devices for measuring R-R intervals and heart rate variability: Polar S810i, Suunto t6 and an ambulatory ECG system. Eur J Appl Physiol. 2010; 109: 779-786.
Sebastian V. W. Schulz
Division of Sports and Rehabilitation
Medicine, Department of Internal Medicine
University Hospital Ulm
Leimgrubenweg 14, 89075 Ulm, Germany