T. Kambic, MSH
Department of Population and Family Health Sciences
The Johns Hopkins School of Hygiene and Public Health
Hopkins Population Center Papers on Population
I would like to thank Drs. Sandro Girotto and Joseph Stanford, and Professor
Bernardo Colombo for their careful reading and comments on previous drafts
of this paper. Any errors or misstatements are the sole responsibility
of the author.
It is time for a comprehensive review of the effectiveness of natural family planning (NFP) used to space and avoid births. In 1991, I said that there was a ten-year time lag between NFP evaluation and the evaluation of other birth spacing methods (Kambic 1991). In the intervening years, a number of new NFP studies have been completed, and we now have sufficient published data for an overview comparable to those that have been done for other methods.
Who is interested in the effectiveness of NFP methods? First of all, the couples who use the methods. They have the right to know the pregnancy rate in order to make informed decisions about using a method, and they have the right to know whether one NFP method is more effective than another. Providers, who are the NFP experts, are also interested in pregnancy rates. They should be able to speak knowledgeably about the methods to their clients, to medical practitioners, and to the public, and they must be able to present information about pregnancy rates that is supported by the biomedical literature. Finally, governments and funding agencies are interested in the pregnancy rates of the various methods. Birth rates are vital indicators for economic and social welfare planning, and the effectiveness of birth spacing methods is directly linked to birth rates. Accurate information founded on good science is essential to convince governments to include NFP within their array of birth spacing services.
The methods used to evaluate NFP effectiveness are the same as those used to
evaluate other birth spacing methods such as the contraceptive pill, condoms,
and sterilization. This paper does not attempt to discuss evaluation methodology
in detail; the bibliography contains references to a number of papers which
provide thoughtful overviews of methodology. I will discuss methodology
topics as they relate to problems in NFP studies. The paper first covers
the measurement of pregnancy rates: the Pearl rate and the life table. It
then reviews the three methods of data collection (the survey, the
retrospective study, and the prospective study), along with problems and
sources of error in studies. Next it examines pregnancy analysis in NFP
clinics and discusses the reporting of results. Finally, it discusses the
sources of data for the analysis and the results of the analysis.
There are three ways commonly used in the literature to measure unplanned pregnancy in NFP. The first is the percentage of women using the method who become pregnant. The second is the Pearl rate, the number of pregnancies per 100 woman-years of use. The third is the life table, which gives the number of pregnancies per 100 women entering the study after a specific period of use. Let us look at each of these measures.
The percentage of women who become pregnant while using a method is used in surveys. It is useful for comparing birth spacing methods in populations. For example, a survey that compares the pill with Norplant can give us some indication of the relative effectiveness of each. However, this number tells us nothing about how long the women used the method before becoming pregnant, and it is not a rate: a rate measures something over a time period.
The measurement of pregnancy rates is one of the statistical techniques known as survival analysis. Survival analysis is the study of the length of time that elapses before some event occurs or how long something survives or lasts. Survival analysis is used in many different fields. In engineering, we want to know how long a particular part will last until it fails or breaks. In medical science, we want to know how long a group of people with cancer will live, or how long a group of women using a contraceptive will remain without a pregnancy. As time passes, more and more go from our initial state - not pregnant - to our final state - pregnant. The numbers used to demonstrate this process should reflect the change over time.
The Pearl rate is the oldest measure of a pregnancy rate. It was introduced by Raymond Pearl in 1933. The formula is:

Pearl rate = (unplanned pregnancies / months of use) × 1200

The denominator is the total number of months NFP was used by all women in the study, from the beginning of NFP use until the time they stop using NFP. The numerator is the number of unplanned pregnancies, and the ratio is multiplied by 1200 because there are 1200 months in 100 woman-years (100 women × 12 months). The result is the number of pregnancies per 100 woman-years.
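The formula translates directly into a small function. This is a minimal sketch; the function name and the figures in the example (3 pregnancies over 600 months of use) are illustrative only and do not come from any study.

```python
def pearl_rate(unplanned_pregnancies: int, months_of_use: float) -> float:
    """Unplanned pregnancies per 100 woman-years of use.

    1200 = 100 women x 12 months, the number of months in 100 woman-years.
    """
    if months_of_use <= 0:
        raise ValueError("months_of_use must be positive")
    return unplanned_pregnancies * 1200 / months_of_use


# Hypothetical study: 3 unplanned pregnancies over 600 months of use.
print(pearl_rate(3, 600))  # 6.0 pregnancies per 100 woman-years
```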
The Pearl rate has a serious flaw: the longer the observation time of the study, the lower the Pearl rate will be. Figure 1 shows the studies in Table 1 that report pregnancy rates as Pearl rates, plotted against the average months of observation. This average is calculated as the total number of months observed divided by the number of women in the study. You can see that the longer we observe women, the lower the Pearl rate. The results are a function of the calculation itself. Consider, for example, a study of 100 women. If all 100 become pregnant in the first month of the study, we have the calculation:
(100 pregnancies / 100 months of use) × 1200 = Pearl rate of 1200.
If instead all 100 become pregnant in the 12th month of the study, we have the calculation:
(100 pregnancies / 1200 months of use) × 1200 = Pearl rate of 100.
For any family planning method, the most fertile couples become pregnant first, while others using the method continue without a pregnancy. One can follow these less fertile couples for a long time and see the Pearl rate continue to decline. A second, less important aspect of NFP use is that couples who do not quickly become pregnant adjust to the method and become successful users.
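The "less fertile" bias can be reproduced with a small expected-value calculation. This is a sketch under stated assumptions: the cohort is hypothetical (50 couples with a 20% monthly pregnancy risk, 50 with a 2% risk), each group's risk is constant, and a woman who conceives in a month contributes that whole month of use, as in the worked example above. The function name is illustrative.

```python
def expected_pearl_rate(groups, months_observed):
    """Expected Pearl rate for a cohort observed for `months_observed` months.

    `groups` is a list of (number_of_women, monthly_pregnancy_probability).
    A woman still at risk at the start of a month contributes that whole
    month of use; if she conceives during it, she leaves the cohort.
    """
    pregnancies = 0.0
    months = 0.0
    for n_women, p in groups:
        still_at_risk = 1.0  # probability a woman has not yet conceived
        for _ in range(months_observed):
            months += n_women * still_at_risk
            pregnancies += n_women * still_at_risk * p
            still_at_risk *= 1 - p
    return pregnancies * 1200 / months


# Hypothetical cohort: 50 more fertile and 50 less fertile couples.
cohort = [(50, 0.20), (50, 0.02)]
for months in (6, 12, 24, 48):
    print(months, round(expected_pearl_rate(cohort, months), 1))
```

A single group with a constant risk p would give the same Pearl rate, 1200 × p, however long it was observed. In this sketch the decline comes entirely from the more fertile couples leaving early, so that later months of use are contributed mostly by less fertile couples. The sketch does not model the "successful user" learning effect.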
In any event, the Pearl rate has both a "less fertile" and a "successful user" time bias. It is an inappropriate measure of the effectiveness of birth spacing methods and should not be used. Because the Pearl rate is time dependent, it is impossible to compare studies which report Pearl rates. Pearl rates can be used only to estimate the average rate of unplanned pregnancy after one year of use, as I do below.
The appropriate measure of the effectiveness of birth spacing methods is the life table. The life table calculates the cumulative percentage of users who become pregnant, a percentage that increases over time. It avoids the calculation bias and the fertile-woman bias of the Pearl rate, and it was introduced as a measure of family planning effectiveness in 1963 by Robert Potter.
Let us look at a life table calculation. If 100 women begin a study and 10 become pregnant in the first time period, the pregnancy rate for the first period is 10/100 = 10%. Ninety women, or 90%, continue at the beginning of the second period. In the second period, ten more women become pregnant. The pregnancy rate for this period is 10/90 = 11.1%, and the continuation rate is 80/90 = 88.9%. Of the original 100 women, 80 are still using the method after two periods. We get the cumulative continuation rate at the end of two periods by multiplying the 90% continuation rate of the first period by the 88.9% continuation rate of the second period: 90% × 88.9% = 80%. The cumulative pregnancy rate is therefore 100% − 80% = 20%. This is a simplified example, but one can see that the cumulative continuation rate decreases and the cumulative pregnancy rate increases over time. In a 12-month life table, the time interval could be one week, one month, or three months.
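The arithmetic of this example can be written as a short loop. This is a minimal sketch that, like the example, assumes pregnancy is the only way to leave the study; the function name is illustrative.

```python
def life_table(entrants, pregnancies_per_period):
    """Life table in which pregnancy is the only way to leave the study.

    For each period, returns (pregnancy rate for that period,
    cumulative continuation rate, cumulative pregnancy rate).
    """
    at_risk = entrants
    continuation = 1.0
    rows = []
    for pregnancies in pregnancies_per_period:
        period_rate = pregnancies / at_risk      # e.g. 10/100, then 10/90
        continuation *= 1 - period_rate          # product of period continuation rates
        rows.append((period_rate, continuation, 1 - continuation))
        at_risk -= pregnancies
    return rows


for period, (rate, cont, cum) in enumerate(life_table(100, [10, 10]), start=1):
    print(f"period {period}: rate {rate:.1%}, continuing {cont:.1%}, "
          f"cumulative pregnancy {cum:.1%}")
```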
Of course, women may drop out of a study for reasons other than unplanned
pregnancy. They may stop using NFP, move away, become ill, or have a planned
pregnancy. Life tables can account for all dropouts and provide accurate
pregnancy rates. There are two ways to calculate a life table. In one,
all the reasons for leaving the study are taken into account: a
woman might become pregnant, be lost to follow-up, move away, or become ill.
A life table that shows all reasons for discontinuing is called a multiple
decrement life table, known in family planning studies as a net life table.
Alternatively, from a statistical viewpoint, we can treat unplanned pregnancy
as the only reason a woman can leave the study; women who leave for other
reasons are simply removed from the number at risk. This is a single
decrement life table, known in family planning as a gross life table. The
gross life table is preferred when comparing rates between studies (Chiang
1984, Farley 1986, Lee 1980, Potter 1981, 1987, 1990).
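The difference between the two kinds of table can be sketched on hypothetical counts. This is a simplified sketch, not Potter's published procedure: it assumes that women who leave for other reasons do so at the end of an interval, so everyone who starts an interval is exposed for all of it, and it omits the adjustments used in practice. The function name and the counts are illustrative.

```python
def gross_and_net_rates(entrants, intervals):
    """Cumulative unplanned-pregnancy rates from a gross and a net life table.

    `intervals` is a list of (unplanned_pregnancies, other_exits) per interval.
    Simplifying assumption: other exits happen at the end of an interval.
    """
    at_risk = entrants
    gross_continuation = 1.0       # single decrement: only pregnancy counts
    all_cause_continuation = 1.0   # multiple decrement: every exit counts
    net_rate = 0.0
    for pregnancies, other_exits in intervals:
        q_pregnancy = pregnancies / at_risk
        # Gross table: other exits are simply removed from those at risk.
        gross_continuation *= 1 - q_pregnancy
        # Net table: pregnancy competes with the other reasons for leaving.
        net_rate += all_cause_continuation * q_pregnancy
        all_cause_continuation *= 1 - (pregnancies + other_exits) / at_risk
        at_risk -= pregnancies + other_exits
    return 1 - gross_continuation, net_rate


# Hypothetical study: 100 entrants, two intervals of (pregnancies, other exits).
gross, net = gross_and_net_rates(100, [(10, 20), (7, 13)])
print(f"gross {gross:.1%}, net {net:.1%}")
```

Here the gross rate is 19% and the net rate 17%. In the net table, women who left for other reasons can no longer contribute pregnancies, so the net rate depends on how many women dropped out; the gross rate removes that dependence, which is one reason it suits comparisons between studies.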
Three epidemiological approaches can be applied to studies of NFP effectiveness: the population survey, the prospective study, and the retrospective study. In the population survey, a sample of the general population is asked about their use of birth spacing methods. In a prospective study, rather than interviewing a sample of the general population, we study the records of NFP users and interview them. A retrospective study falls between the two. I discuss these approaches in detail below and show that NFP effectiveness studies belong to a special type of prospective study, the observational study.
A survey interviews a random sample of the population at large to determine who is and who is not using a method. Suppose we want to determine how many women in the USA or Italy are using NFP and how well they use it. Of course, you cannot interview every woman in the country. You must take a random sample of the women of the country and let them represent the country as a whole; the larger the random sample, the more accurate your estimates will be. In the USA, we have the National Survey of Family Growth (NSFG) (Grady 1986). This survey has been conducted four times, in 1973, 1976, 1982, and 1988. In 1982, the NSFG interviewed 7,969 women, or about one in 7,000 women in the USA. Another way to think about this is that each woman who was interviewed represented the social characteristics and behavior of about 7,000 women. The NSFG asks questions about birth spacing and other matters. The benefit of a survey is that its results are more generalizable than those of other studies. After a survey, an investigator can speak with some assurance about "women in Italy" or "couples in the US."
The classic retrospective study is a case control study. In such a study, the investigator would interview women who became pregnant during NFP use and women who used NFP without a pregnancy. Differences between these two groups of women would highlight the characteristics of those likely to become pregnant using NFP. There are few NFP studies of this type, and they typically cannot report a pregnancy rate because the number of women who began to use NFP is unknown; we have no denominator.
Prospective studies are also known as clinical studies or clinical trials. In the strict sense, NFP effectiveness studies cannot be prospective clinical trials because in a clinical trial there is a treatment group and a control group where, "patients in both groups are enrolled, treated, and followed over the same time period" (Meinert 1986). NFP effectiveness studies fall into the category of observational studies. In an observational study, the study population "are selected by a means not chosen by the investigator" (Cochran 1983). Usually in NFP studies, there is only the observed group, and there is no comparison group.
In an observational study, women starting NFP use are observed over
a period of time. We may keep information on all those entering our clinics
and see who drops out, and who becomes pregnant. In a concurrent study,
data is collected as the women use the method. In a historical study, the
investigator examines the clinical records of the users at a later date.
Let us review the sources of error that can produce inaccurate and incorrect study results. These problems can affect all studies and it is useful to be aware of them when thinking about methodologies.
The category of "lost to follow-up" (LFU) is a major concern in studies. A woman is lost to follow-up when we do not know what happened to her after she began the study. Did she stop using the method? Is she pregnant? We must either find and interview her or classify her as lost to follow-up. A high percentage of women in the LFU category distorts the results, as we do not know whether these women are pregnant or are still using the method. An LFU rate above 15% is considered high.
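How much a high LFU share can distort the results shows up in a crude bound: assume none, then all, of the LFU women became pregnant. This is an illustrative sketch with hypothetical counts (200 entrants, 20 known pregnancies, 40 LFU); it uses simple proportions of entrants rather than life table rates, and the function name is hypothetical. The 15% threshold is the one given in the text.

```python
def lfu_bounds(entrants, known_pregnancies, lost_to_follow_up, threshold=0.15):
    """Crude bounds on the proportion pregnant when some women are LFU.

    These are simple proportions of entrants, not life table rates.
    """
    lfu_share = lost_to_follow_up / entrants
    lowest = known_pregnancies / entrants  # no LFU woman was pregnant
    highest = (known_pregnancies + lost_to_follow_up) / entrants  # every LFU woman was
    return lfu_share, (lowest, highest), lfu_share > threshold


share, (low, high), is_high = lfu_bounds(200, 20, 40)
print(f"LFU {share:.0%} (high: {is_high}); "
      f"proportion pregnant between {low:.0%} and {high:.0%}")
```

With 20% of entrants lost, the proportion pregnant could lie anywhere from 10% to 30%, which is why a high LFU share undermines a study's estimate.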
If we survey women aged 40 to 50 and find a low pregnancy rate, will the rates of women aged 20 to 30 be comparable? Obviously not. If a study is done in Italy, will the conclusions be true for the rest of Europe, for Africa, for North America? These questions concern the "generalizability" of the study: can the results of the study be generalized to a larger group than the one studied? Our study group may be so select that only women with similar characteristics can expect similar success. In particular, NFP studies may encounter selection bias, whereby couples more disposed to use NFP are less likely to drop out or to become pregnant.
Surveys and retrospective studies interview people about events that took place in the past. Subjects may not remember, or they may have selective memory where they remember only certain things. If a woman has an unplanned pregnancy and we interview her about it at a later time, she may try to rationalize it to herself and to the investigator to reduce her own concern. Inaccurate, incorrect, and selective memory are all elements of recall bias.
The NSFG was used to provide estimates of the number of women using "rhythm", and how often they became pregnant. The NSFG asked women if they were using contraception at the time they became pregnant with their most recent pregnancy, and what method they were using.
They were given 15 choices, including the pill, IUD, sterilization, and barrier methods. Two choices are of interest to us: women were asked whether they were using "Rhythm or safe period by calendar," or "Natural family planning, safe period by temperature or cervical mucus test." When the results of the survey were tabulated, out of 6,306 women using some form of birth spacing, 221 (3.5%) said they were using "rhythm" and 35 (0.6%) said they were using "Natural Family Planning." Thus 14% of natural method users were using NFP, and the rest rhythm. These numbers are too small to provide accurate separate estimates for NFP users and rhythm users, so NSFG analysts combined the two groups. This means that they combined the unplanned pregnancies of rhythm and NFP users and calculated that about 20% of women using "periodic abstinence" became pregnant. The problem is that these results include untrained women, who may or may not have used the method correctly, and there is no way to verify them. I conclude that these results are invalid. For prescription method users, one could go to a physician and confirm that the pill was prescribed. For barrier method users, one could ask which pharmacy was visited to purchase the barrier; but for natural method users, a woman can say, "I was using rhythm," and even her spouse might not be able to confirm it.
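The percentages in this paragraph can be re-derived from the counts it reports. A quick arithmetic check, using only the NSFG counts quoted above (the variable names are illustrative):

```python
total_users = 6306  # women using some form of birth spacing
rhythm = 221        # "Rhythm or safe period by calendar"
nfp = 35            # "Natural family planning, safe period by temperature or cervical mucus test"

rhythm_share = f"{rhythm / total_users:.1%}"        # share of all users
nfp_share = f"{nfp / total_users:.1%}"              # share of all users
nfp_among_natural = f"{nfp / (rhythm + nfp):.0%}"   # share of natural method users
print(rhythm_share, nfp_share, nfp_among_natural)
```

The 20% pregnancy figure for combined "periodic abstinence" users cannot be checked this way, because the corresponding pregnancy counts are not given here.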
Scientists or principal investigators of studies may have a hypothesis or idea which leads them to expect certain behavior, or a certain result; they may impose their views on the study and may neglect observations or results which do not fit their hypothesis. This is called observer bias.
Observer bias is the reason for "blind" or "masked" studies. In testing drugs to control cancer, for example, patients are randomly given either a placebo or the drug. Neither the patient nor the investigator knows who is taking which of the two treatments, so both groups are treated equally. Random assignment to a group using NFP or a group not using NFP would be difficult, and Trussell terms it unethical. Masking is obviously not possible, because users and investigators know who is using which method. One study (Wade 1979, 1981) compared the Ovulation Method and the sympto-thermal method by randomly assigning users to one or the other group.
Concurrent studies are also subject to observer bias. For example, an overzealous investigator might provide too much attention or follow-up and pressure users; as a result, users might either drop out of the study or have fewer pregnancies. The number and timing of the client's follow-up visits to the clinic have been shown to be related to NFP effectiveness (Kambic, Gray et al 1991).
Observer bias can be introduced in other places in the study, such as in the structure of the questions to be asked. To NFP advocates the following terms are all quite different: rhythm, calendar method, periodic abstinence, counting days, safe period, BBT, Ovulation Method, Billings, Basal Temperature, and Sympto-Thermal. An investigator who is not familiar with NFP may think all of these terms are equivalent and treat them as such. An NFP advocate would question the validity of a study which treated these terms interchangeably.
Natural family planning studies have their own particular problems that
may cause results to differ from study to study. When does a study begin
and end? Does it begin when the woman says she will use NFP, when she charts
her first day, her first month, or her third month? Some studies have permitted
a learning phase followed by an effectiveness phase, which begins some time
after the woman begins to chart (Wade 1981, WHO 1981). What if she used NFP
before the study began? Should she be included? Trussell (1991) recommends
that women be interviewed three months after the study closure date so that
all pregnancies occurring during the study dates are included, because a
woman may unknowingly be in the early stages of pregnancy on the last day
of the study.
The objective of an effectiveness study is to answer the question, "Of those women entering the study, how many have become pregnant after 12 months of use?" Study entry criteria should allow only women intending to avoid pregnancy to begin the study. We observe these women to an end point: pregnancy, discontinuation, or study closure. Women can leave the study at any time, whether to discontinue or to become pregnant. We therefore assume that any woman who remains in the study wants to avoid pregnancy. Certainly, women and couples can and do change their minds about the spacing and timing of conceptions; all we ask is that they notify the study before they attempt to become pregnant. We classify any pregnancy occurring to a woman while she is in the study as an unplanned pregnancy. This rule avoids several problems.
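The classification rule is mechanical enough to state as code: the first exit event in a woman's chronological record determines her endpoint, and a pregnancy that occurs before she has left the study is unplanned regardless of how she later describes it. This is a sketch; the event codes, the `Endpoint` names, and `classify` are hypothetical, and loss to follow-up is not modeled.

```python
from enum import Enum


class Endpoint(Enum):
    UNPLANNED_PREGNANCY = "unplanned pregnancy"
    LEFT_TO_CONCEIVE = "left the study to seek pregnancy"
    DISCONTINUED = "discontinued for another reason"
    STUDY_CLOSURE = "still in the study at closure"


# Hypothetical event codes that end a woman's participation.
EXIT_EVENTS = {
    "pregnancy": Endpoint.UNPLANNED_PREGNANCY,
    "notified_intent_to_conceive": Endpoint.LEFT_TO_CONCEIVE,
    "discontinued": Endpoint.DISCONTINUED,
}


def classify(events):
    """Return the endpoint given by the first exit event in a chronological record.

    Events after the first exit (for example, a later statement that the
    pregnancy was wanted) do not change the classification.
    """
    for event in events:
        if event in EXIT_EVENTS:
            return EXIT_EVENTS[event]
    return Endpoint.STUDY_CLOSURE


print(classify(["chart_started", "pregnancy", "said_pregnancy_was_planned"]).value)
```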
The first problem is user ambivalence. A woman may be ambivalent about pregnancy and decide, once she is pregnant, that she planned the pregnancy. She may be embarrassed that she either did not understand or did not follow the rules because of the circumstances of that particular day. The second problem is investigator bias in the classification of pregnancies. The most notable example of this is the Weissman et al (1972) study of the Ovulation Method in Tonga, where Weissman discounted 50 pregnancies. Her analysis was disputed by both NFP and public health experts (Marshall 1972, Mosley 1972, Rochat 1972). Another example is Roetzer’s 1968 study of 180 women, in which he discounted 26 pregnancies because they were wanted. This is an example of the problem of retrospective pregnancy analysis, where both recall and observer bias are possible.
In an attempt to deal with these issues, Brennan and Klaus in 1982 developed a terminology for categorizing pregnancies among NFP users. They said, "The proportion of pregnancies among those who use NFP to conceive is termed the planned pregnancy rate... The term pregnancy avoidance applies to couples using NFP to avoid or delay pregnancy." They go on to define four categories of pregnancies among those avoiding pregnancy: method-related, pregnancies that occur despite application of the rules; informed choice, pregnancies that result from intercourse on a fertile day or days without previous indication of planning a pregnancy; teaching-related, pregnancies that result from incorrect teaching or learning; and unresolved. This terminology is consistent with the definition of unplanned pregnancy given above.
In another approach, Hilgers argues that only conceptions resulting from intercourse during identified infertile days can be classified as unplanned, and that conceptions from intercourse during fertile days are achieving-related (Doud 1985, Hilgers et al 1980). Hilgers' definition yields almost no unplanned pregnancies, and pregnancy rates which use this definition cannot be compared with rates using the standard definition above (deBethune 1984). An alternative, conservative interpretation of studies using the Hilgers definition is to treat the total pregnancy rate of the study as the unplanned pregnancy rate. This is conservative in the sense that a stricter definition of unplanned pregnancy is applied to these studies.
In the past, NFP studies attempted to compare "user failures" with "method
failures." User failures were pregnancies resulting from intercourse in
the fertile time, and method failures were pregnancies resulting from
intercourse in the recorded infertile time. To calculate each of these rates,
NFP studies used all the cycles in the study as the denominator. Trussell and
Grummer-Strawn (1990, 1991) have pointed out that using all of the study
cycles as the denominator to calculate method and user failures was
incorrect. The actual probability of a method failure is the number of method
failures divided by the number of cycles in which a couple was subject to the
risk of a method failure. Likewise, the probability of a user failure is the
number of user failures divided by the number of cycles in which a couple was
subject to the risk of a user failure. Trussell and Grummer-Strawn have
suggested new terminology, perfect and imperfect use, rather than user and
method failure. To calculate perfect and imperfect use rates, a study has to
collect information on intercourse for every cycle in the study. This adds to
the data collection and follow-up requirements of the study. The overall
unplanned pregnancy rate is not changed by the Trussell and Grummer-Strawn observation.
It is important to note that we are still able to use data from previous
studies reported as total unplanned pregnancy rates. (Trussell discusses
perfect and imperfect use in detail in his paper in Statistics in Medicine
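The effect of the denominator can be shown with hypothetical cycle records. This sketch simplifies the definitions above: a cycle with no intercourse in the fertile time is counted as exposed only to the risk of a method failure (perfect use), and a cycle with fertile-time intercourse as exposed to the risk of a user failure (imperfect use). The data and the function name are illustrative, and the results are per-cycle probabilities, not 12-month rates.

```python
def failure_probabilities(cycles):
    """`cycles` is a list of (intercourse_in_fertile_time, pregnant) pairs."""
    perfect = [pregnant for fertile_sex, pregnant in cycles if not fertile_sex]
    imperfect = [pregnant for fertile_sex, pregnant in cycles if fertile_sex]
    n = len(cycles)
    return {
        # Older approach: every study cycle in the denominator.
        "method_failure_all_cycles": sum(perfect) / n,
        "user_failure_all_cycles": sum(imperfect) / n,
        # Trussell and Grummer-Strawn: only the cycles exposed to each risk.
        "perfect_use": sum(perfect) / len(perfect),
        "imperfect_use": sum(imperfect) / len(imperfect),
        # The overall per-cycle probability is the same under either approach.
        "overall": (sum(perfect) + sum(imperfect)) / n,
    }


# Hypothetical study: 900 perfect-use cycles with 3 pregnancies,
# 100 imperfect-use cycles with 20 pregnancies.
cycles = (
    [(False, True)] * 3 + [(False, False)] * 897
    + [(True, True)] * 20 + [(True, False)] * 80
)
for name, value in failure_probabilities(cycles).items():
    print(f"{name}: {value:.4f}")
```

Dividing by all 1000 cycles understates the imperfect-use probability (0.02 instead of 0.20), while the overall probability of 0.023 is unchanged, as the text notes.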
Studies are reported in two ways: in peer review journals or in monographs of meetings and conferences. Studies in peer review journals are held to a higher standard, as they must pass the judgment of expert referees to be published. They are also more exposed to public criticism from those who read the journals. In this report I classify the NFP literature as peer review, peer review questionable, and other. Studies in the peer review questionable category have been subject to published criticism of their findings; studies in the peer review category have been published without challenge. Other includes all other reports of NFP effectiveness.
Studies that have been challenged are Weissman et al (1972), Wade et al (1981), and Johnston et al (1978). The Wade study was criticized by Billings (1980) and Hilgers (1980), and the Johnston study by Hilgers (1978) and Santamaria (1978). Weissman, Wade, and Johnston are coded as peer review questionable in the analysis below.
However they are reported, NFP studies should report the method used by the clients, the number of women taught, and the number entering the study period. When analyzing the outcomes, the proportion lost to follow up should also be reported. If a study intends to report perfect and imperfect use rates, coitus during the fertile time has to be noted and coded for each study cycle and entered into the analysis data set.
To find all the studies of NFP effectiveness, I have searched the holdings of the Johns Hopkins Welch Medical Library, the Popline computerized abstracts of documents related to population, Index Medicus, and my own extensive collection of NFP related documents. The papers I have found are listed in the bibliography. Often studies are reported in more than one place. The same studies can be reported several times in peer review journals and at NFP meetings (Kambic 1981, 1988, Wade 1979, 1981). The multiple reports are in the bibliography. For data analysis, I have used the most recent papers (Kambic 1988, Wade 1981). When a study rate is corrected in the literature, I have used the corrected rate when appropriate; for example using Rochat’s (1972) more conservative estimate of Weissman’s (1972) results.
As a matter of public health judgment and of objectivity, I choose to use more conservative (higher) estimates of NFP pregnancy rates where controversy exists about a reported rate. Thus I have not used the Hilgers estimates of unplanned pregnancy rates in three studies of the Creighton method (Doud 1985, Fehring 1994, Hilgers et al 1980), but have used the reported total pregnancy rate to estimate the unplanned pregnancy rate. In any health measure it is preferable to provide a stricter estimate of risk rather than to make ambitious claims. This is especially true for NFP, where popular claims for NFP use-effectiveness are not supported by the data and have therefore harmed the credibility of NFP. Furthermore, a more conservative estimate of a pregnancy rate means that there is room for program improvement.
Fehring (1994) reports on the Creighton method but provides a standard
interpretation of unplanned pregnancies in addition to the Hilgers approach.
He reports a 3.2% pregnancy rate according to Hilgers, a 26.8% total pregnancy
rate, and a 12.8% unplanned pregnancy rate by standard definitions.
Table 1 presents the unduplicated NFP effectiveness studies found in the peer review literature and other sources. We use the results of these papers to conduct a meta-analysis (Dickersin and Berlin 1992). There are two reasons to include all studies in a meta-analysis. First, each study adds some information to our effort to get at the truth. Second, we should respect the efforts of those who collected data and reported their attempts to discover the effectiveness of NFP. We can learn from all studies, even from those containing errors.
The major distinction in the studies is in the method used to report unplanned pregnancies. Pearl rate studies cannot be compared either with life table studies or with each other. Pearl rates can only be statistically combined to provide an estimate of the mean 12 month unplanned pregnancy rate of all Pearl rate studies. This can be considered to approximate the 12 month life table rate. I also have made a distinction between calendar rhythm and all other NFP methods in the analysis. Although both calendar rhythm and NFP depend upon abstinence during the fertile time to avoid conception, the method used to identify the fertile time is quite different.
Life table pregnancy rates are comparable across all studies. In this analysis, life table unplanned pregnancy rates are analyzed by year of the study, by method, and by peer review status, the variables available for all studies. The papers were examined for other variables of interest, such as the mean age of the users, but such variables were missing from a number of reports. We estimate NFP effectiveness overall and for each method, sympto-thermal (ST) and Ovulation Method (OM), using multivariate techniques.
Besides meta analysis of the data, I also consider the best NFP studies, based on criteria from this paper. The question is, "Do the best designed NFP studies support the conclusions of the meta analysis?"
What are the criteria for a well designed NFP study? They are the same as for any observational study. The key is to be aware of bias and sources of variance, and to try to control for them. The study should have a clear objective, valid definitions, clear beginning and end dates, well-defined endpoints, and life table rates. The NFP methods should be standardized throughout the study period, although they can differ from center to center. Pregnancies should be analyzed by at least two objective observers, especially when calculating perfect and imperfect use rates. I would not choose any study that has been criticized in the peer review literature, or any study that does not follow generally accepted definitions in the NFP field. I think that at least three NFP studies meet these criteria: Rice et al (1981), the WHO Ovulation Method study (1981), and the ongoing Prospective European Multicenter study (1993). These three studies are multinational, having centers in more than one country. Each has more than one NFP expert among its authors, which helps balance observer bias. All three have the measurement of pregnancy rates as a main objective and therefore have a research focus.
In addition, Trussell and Grummer-Strawn (1990, 1991) reanalyzed the WHO Ovulation method study to illustrate the differences between perfect and imperfect use. In doing so they confirmed that the rules of NFP worked very well. They found, as NFP rules predict, that women were unlikely to become pregnant when they followed the rules for avoiding pregnancy. Conversely, with intercourse in the fertile time, pregnancy was likely.
Lamprecht and Trussell (1997) have identified a framework for evaluating
NFP effectiveness studies. They present a number of studies which they
identify as well designed, and the three studies above are in their list.
However, they also include the Wade (1981) and Johnston (1978) studies in
their list of well designed studies. The Wade study was criticized for
improper teaching of the Ovulation Method (Hilgers 1981, Billings 1981). The
Johnston study was heavily criticized by Santamaria (1978) and Hilgers (1978)
as deficient in the teaching of the Ovulation Method. There were also problems
in recruiting users in the Wade study (Spieler 1997), and its high dropout
rate for reasons other than pregnancy indicates strong dissatisfaction with
the NFP methods. I disagree with Lamprecht and Trussell’s inclusion of the
Wade and Johnston studies in their list of well designed studies.
Let us first examine calendar rhythm. Of the rhythm studies in the bibliography, only those of Tietze (1951) and Dicker (1989) are clearly reported observational studies. The remainder of the reports are subject to interpretation and are contestable. Latz (1942) reports neither rule nor pregnancies in a mail survey of 1000 rhythm users. The Population Council (1963) paper lacks the rule, the months of observation, and the number of pregnancies. Ryder (1973), using data from the National Fertility Survey of 1970, provides only an estimate of the rate with no supporting data. Jaramillo-Gomez (1968) dismisses the need for a cycle length history prior to using calendar rhythm, thereby calling into question his methodology. Finally, all the reports provide Pearl rates as a measure of contraceptive efficacy; but we know that Pearl rates have a time bias.
In spite of these problems, these are the only English language studies which report information on calendar efficacy. We use five of these reports, Fleck (1940), Tietze (1951), Dunn (1956), Jaramillo-Gomez (1968), and Dicker (1989), to estimate a 12-month Pearl rate for calendar rhythm. We did not use Latz's report because it is improbable that among 1000 women using rhythm there were no pregnancies; these women may constitute a biased sample. We did not use Ryder or the Population Council because they are population-based surveys that stated no calendar rule and did not report the months of observation or the number of pregnancies. Although Jaramillo-Gomez published no calendar rule, he did report the other required data.
We chose to estimate a 12-month Pearl rate because it is the Pearl rate most comparable to the standard 12-month life table rates now commonly used in efficacy studies. We estimated rates including the results of Jaramillo-Gomez for a conservative estimate, and excluding them for a less conservative estimate. With his data included, a model using the natural logarithm of the Pearl rate fit the data better than the untransformed model (R² = .79 vs. R² = .64) and gives an estimated 12-month Pearl rate of 18.5 ± 1.8 standard error (s.e.). For the less conservative estimate, excluding Jaramillo-Gomez, we fit an untransformed model (R² = .76; see the appendix for an explanation of R²), giving an estimated 12-month pregnancy rate of 15.0 ± 4.0. The loss of one data point and the use of untransformed data raise the standard error of the estimate considerably in the second case.
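The log-scale estimate can be sketched as an ordinary least-squares line fitted to ln(Pearl rate) against each study's average months of observation, evaluated at 12 months and transformed back with exp(). The Python sketch below assumes average months of observation as the predictor (the same axis used for the Pearl-rate scatterplot of modern methods) and reports R² on the scale of the fitted outcome. It is not the author's original computation, and the example data are synthetic, generated from an exact exponential decline rather than taken from the rhythm studies.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def pearl_at_12_months(avg_months, pearl_rates, log_scale=True):
    """Predict the Pearl rate at 12 months of observation.

    log_scale=True fits ln(Pearl) = a + b*months and back-transforms with exp();
    log_scale=False fits Pearl = a + b*months directly.
    Returns (predicted_rate, r_squared_on_fitted_scale).
    """
    y = [math.log(p) for p in pearl_rates] if log_scale else list(pearl_rates)
    a, b, r2 = fit_line(avg_months, y)
    prediction = a + b * 12
    return (math.exp(prediction) if log_scale else prediction), r2


# Synthetic example: rates that fall exponentially with study length.
months = [6, 12, 24, 36]
rates = [30 * math.exp(-0.05 * m) for m in months]
print(pearl_at_12_months(months, rates))
print(pearl_at_12_months(months, rates, log_scale=False))
```

Because the synthetic rates decline exponentially, the log model fits them exactly (R² = 1), while the straight-line model does not.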
Let us turn to modern NFP methods, first examining the pregnancy rates of the studies reporting Pearl rates. Figure 1 is a scatterplot of Pearl pregnancy rates versus average months of observation for all studies in Table 1 reporting Pearl rates. As with the calendar rhythm Pearl rates, the best fit to these data is obtained with a logarithmic transform of the Pearl rates. We estimate 12-month rates of 14.3 unplanned pregnancies for BBT, 10.0 for ST, and 11.9 for the OM. This is as far as we can go with Pearl rates, so we turn to the studies reporting life table rates.
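Unlike the Pearl rate, a life table rate is the cumulative probability of an unplanned pregnancy by a fixed duration, here 12 months, so studies of different lengths can be compared at the same point. A minimal Python sketch, assuming monthly intervals and treating women who leave the study for other reasons as censored (they simply drop out of later at-risk counts); published studies may use actuarial or multiple-decrement refinements (Chiang 1984; Farley 1986) that this sketch omits. The monthly counts in the example are hypothetical.

```python
def life_table_rate(at_risk, pregnancies, months=12):
    """Cumulative unplanned pregnancy rate per 100 women after `months` months.

    at_risk[i]: women in the study and at risk at the start of month i + 1
    pregnancies[i]: unplanned pregnancies observed during month i + 1
    """
    if len(at_risk) != len(pregnancies):
        raise ValueError("at_risk and pregnancies must have the same length")
    no_pregnancy = 1.0
    for n, k in zip(at_risk[:months], pregnancies[:months]):
        if n > 0:
            no_pregnancy *= 1.0 - k / n  # probability of avoiding pregnancy this month
    return 100.0 * (1.0 - no_pregnancy)


# Hypothetical study: 200 women, some discontinuing each month.
at_risk = [200, 190, 181, 172, 165, 158, 151, 145, 139, 134, 129, 124]
pregnancies = [3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 1]
print(round(life_table_rate(at_risk, pregnancies), 1))
```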
Figure 2 is a scatterplot of the life table rates of unplanned pregnancies versus the year the study was reported, with the regression trend line. There is no discernible pattern in the points (R² = .15), but the trend line shows NFP effectiveness improving significantly (p = .02) from earlier to more recent studies. Figure 3 is a boxplot of life table pregnancy rates by decade. The 1970s and 1980s are very similar, each with a median of about 15, and in each decade the central 50% of studies reported life table rates between 10 and 20 pregnancies per 100 women per year. There is dramatic improvement in the 1990s: the median reported rate is about eight, and 75% of studies report pregnancy rates below 10. (See the appendix for a discussion of how to interpret boxplots.)
Figure 4 is a boxplot of life table pregnancy rates by peer review status. Peer-reviewed articles have about the same level and spread as articles that were not peer reviewed. Articles whose peer review status is questioned have somewhat higher rates. However, the medians for all three kinds of articles are between 10 and 18.
Figure 5 is a boxplot of life table pregnancy rates by method. The three Creighton method studies have somewhat higher pregnancy rates, no doubt due to the method of reporting pregnancies. The two MM studies have much lower pregnancy rates. The 15 ST method studies show a lower median and tighter spread when compared with the 18 OM studies.
To adjust for the effects of year of study, peer review status, and NFP method on life table unplanned pregnancy rates, I used multivariate analysis of variance, with method (ST or OM) as a two-level factor. I grouped the Hilgers and MM data with the OM studies because all are mucus methods. Table 2 shows the results of this analysis. There is a significant difference (p < .02) between the pregnancy rates of the ST (10.2 ± 2.5) and the OM (16.0 ± 3.3). Because of missing values there are fewer data points for the comparison of discontinuation rates, but the mean rates are virtually identical, ST (37.2 ± 18.1) and OM (38.8 ± 9.2). User age is also similar between ST (27.4 ± 0.5) and OM (28.9 ± 1.8), although the OM studies have a much larger spread of user ages than the ST studies. Figure 6 shows the results of this analysis in a bar graph.
Among the studies identified as the best, the Rice study is a symptothermal study of experienced users, and the WHO OM study is of new users, with a learning phase and an effectiveness phase. The life table rates in the Rice study vary from 3.3 in Canada to 15.6 in Colombia, with an overall mean of 8.2. The WHO study rates range from 17.7 in Ireland to 33.2 in El Salvador, with an overall mean of 22.3. So far, from the European collaborative study, I could find only one life table pregnancy rate: 3.0, reported by Freundl (1997). This is one of the lowest life table pregnancy rates ever reported, and it is close to the perfect use rate for the WHO OM study reported by Trussell and Grummer-Strawn. Figure 7 compares the results for the combined data (labeled "data") with those of the best studies and of best use.
Considering calendar rhythm first, we reiterate a finding from a previous study: calendar rhythm may be in the same range of effectiveness as modern NFP. The projected rhythm pregnancy rates, 15.0 and 18.5, are within the range of the modern NFP methods shown here. Trussell and Kost (1987) found that typical pregnancy rates of barrier methods were 21 for spermicides, 18 for the diaphragm, cap, and sponge, and 12 for condoms. Our meta-analysis, based on the only calendar rhythm studies we could find, suggests that calendar rhythm can be as effective as these other methods. Calendar rhythm shares the advantages of other natural methods. It is low cost. After an initial learning period, a woman can use it without purchasing supplies or returning for medical follow-up. It has no medical contraindications. It can be taught by paraprofessionals, freeing medical personnel for other tasks. Calendar rhythm may also have some unexplored advantages. There is no need to chart temperature or mucus daily; a woman simply records her cycles on a calendar and uses the safe days for intercourse. It is essential that women wishing to use the calendar method have records of the lengths of their previous six cycles, and preferably of the previous twelve. Of course, a woman with irregular cycles is not a candidate for this method; this includes women whose cycle lengths varied by more than seven days over the prior six cycles.
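The eligibility criteria for calendar rhythm are simple enough to state as a check. The Python sketch below applies them (at least six recorded cycles, and no more than seven days' variation among the last six) and then computes a fertile window with one commonly cited Ogino-Knaus form of the calendar rule: the first fertile day is the shortest cycle minus 18, and the last fertile day is the longest cycle minus 11, counting from the first day of menstruation and using up to the twelve most recent cycles. That rule is an assumption for illustration; this paper does not state which calendar rule its source studies used, and the function name and cycle histories are hypothetical.

```python
def calendar_fertile_window(cycle_lengths, min_cycles=6, max_variation=7):
    """Return (first_fertile_day, last_fertile_day), counted from day 1 of menses.

    cycle_lengths: recorded cycle lengths in days, oldest first.
    Raises ValueError if the woman is not a candidate for calendar rhythm.
    """
    if len(cycle_lengths) < min_cycles:
        raise ValueError(f"need at least {min_cycles} recorded cycles")
    last_six = cycle_lengths[-min_cycles:]
    if max(last_six) - min(last_six) > max_variation:
        raise ValueError("cycle variation too large for calendar rhythm")
    history = cycle_lengths[-12:]  # use up to the twelve most recent cycles
    # ASSUMED rule (Ogino-Knaus form): shortest - 18 and longest - 11.
    return min(history) - 18, max(history) - 11


print(calendar_fertile_window([27, 28, 29, 28, 30, 27]))  # (9, 19)
```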
Turning to NFP methods reported as Pearl rates, Figure 1 clearly shows that the longer a study proceeds, the lower the Pearl rate. The estimated Pearl rates suggest a slight advantage in effectiveness of the ST over the OM.
Life table rates examined by year over the past 30 years show a significant decrease in more recent years. This trend has several likely causes: we are getting better at NFP, and better at NFP studies. The rules of NFP have been clarified, teaching is standardized, and teachers are better trained. We understand the necessity of follow-up for proper education and use of the method. Furthermore, we now understand how to conduct an NFP observational study: users are more carefully selected, and women are allowed to change their intention from avoiding to planning a pregnancy, or to leave the study. The peer review status of an article does not appear to influence the reported pregnancy rates; there is no evidence of a bias toward reporting lower or higher pregnancy rates in the peer-reviewed literature.
Using three separate approaches to examine the data (regression analysis of Pearl rates, multivariate analysis of life table rates, and selection of the "best" NFP studies), the ST method shows lower unplanned pregnancy rates than the OM. This is probably because the ST method requires more abstinence: it cross-checks more than one sign to identify the fertile time and relies on the most conservative of them.
How effective is NFP? It is not as effective as the pill, sterilization,
or implants, which have pregnancy rates of less than three. It is as effective
as the barrier methods of birth spacing, the condom, foam, and diaphragm,
which have average pregnancy rates between ten and twenty. We know that
in any group of couples using NFP to space pregnancies, there are those who
will take chances and have intercourse in the fertile time. There are also
couples who have a low conception threshold, or high fertility, and will
be more likely to become pregnant. The pregnancy rate of any group of NFP
user couples will depend on the proportions of risk takers and high-fertility
couples in the cohort. We also know from the studies of Trussell and others
that if the NFP rules are followed, that is, if couples are perfect users, the
probability of pregnancy is quite low, less than 5%. Couples who follow the
rules for abstaining when signs of fertility are apparent can use NFP with
confidence that their risk of pregnancy is low.
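The dependence of a group's pregnancy rate on its mix of couples can be made concrete with a share-weighted average. In the Python sketch below, each hypothetical subgroup (for example, careful rule-followers and risk takers) is given an assumed constant monthly pregnancy probability q, so its 12-month probability is 1 - (1 - q)^12; the cohort rate is the average of these weighted by each subgroup's share. The shares and probabilities are illustrative assumptions, not estimates from the studies reviewed.

```python
def cohort_12_month_rate(groups):
    """Expected unplanned pregnancies per 100 women in 12 months.

    groups: (share_of_cohort, monthly_pregnancy_probability) pairs; shares sum to 1.
    """
    if abs(sum(share for share, _ in groups) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    # Each subgroup's 12-month probability, weighted by its share of the cohort.
    return 100.0 * sum(share * (1.0 - (1.0 - q) ** 12) for share, q in groups)


# Hypothetical cohorts: the same two kinds of couples in different proportions.
mostly_careful = [(0.9, 0.002), (0.1, 0.03)]
more_risk_takers = [(0.6, 0.002), (0.4, 0.03)]
print(round(cohort_12_month_rate(mostly_careful), 1))
print(round(cohort_12_month_rate(more_risk_takers), 1))
```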
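se - Standard error. This number is the standard deviation of the sample mean (average): it describes how much the mean would vary from sample to sample, and it equals the sample standard deviation divided by the square root of the number of observations. It may be used to judge whether two samples are similar or different. For example, are the ages of the women in group one similar to the ages in group two? As a rough guide, the ages are considered similar if the ranges of one standard error on either side of the two means overlap, and different if they do not; this is a screening rule rather than a formal significance test.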
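The standard error and the overlap rule of thumb can be computed directly. A minimal Python sketch using the standard library's statistics module; the two samples of ages are hypothetical, and the overlap check is a rough screen rather than a formal significance test.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_intervals_overlap(a, b):
    """True if the intervals mean +/- 1 s.e. for samples a and b overlap."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    se_a, se_b = standard_error(a), standard_error(b)
    return mean_a - se_a <= mean_b + se_b and mean_b - se_b <= mean_a + se_a


# Hypothetical ages of women in two groups.
group_one = [24, 27, 29, 31, 26, 28]
group_two = [25, 30, 28, 27, 33, 29]
print(f"{statistics.mean(group_one):.1f} +/- {standard_error(group_one):.2f}")
print(se_intervals_overlap(group_one, group_two))
```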
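p - The p value is the probability, assuming there is no real difference between two groups, of observing a difference at least as large as the one found. The closer p is to 0, the stronger the evidence against the hypothesis of no difference.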
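Analysis of variance (ANOVA) is a statistical technique for testing whether the means of a continuous outcome differ among groups (for example, between ST and OM studies) by comparing the variation between groups with the variation within them.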
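For a single factor, the analysis of variance summarizes the comparison of between-group and within-group variation in the F statistic. The Python sketch below computes F for hypothetical groups; the paper's own analysis also adjusted for year of study and peer review status, which this one-factor sketch does not do, and converting F to a p value would additionally require the F distribution (for example, scipy.stats.f.sf). With two groups, F equals the square of the pooled two-sample t statistic.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical life table rates for two groups of studies (not the Table 2 data).
st_rates = [8.0, 10.5, 9.0, 12.0]
om_rates = [15.0, 18.5, 14.0, 17.0]
print(one_way_anova_f(st_rates, om_rates))
```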
R², which varies between 0 and 1, is a measure of how well the data used to create the regression equation fit that equation; it is the proportion of the variation in the outcome that the regression explains. Some data points used to create the equation may deviate far from it and others only a little. If all of the data points have small deviations, the data are a good fit to the model and R² is high, close to 1. If the data points have large deviations, the data are a poor fit to the model and R² is low, close to zero.
A boxplot graphically summarizes data. The box spans the central 50% of the data, from the first quartile to the third, and a line inside it marks the median, the value above and below which half of the values lie. Each part of the box, from the median to either edge, therefore contains 25% of the data. The tails or whiskers extending from each end of the box show the extent of the remaining data, about 25% on each side; in Tukey's version the whiskers extend at most 1.5 times the length of the box, and values beyond them are plotted individually as outliers. (Tukey)
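The numbers a boxplot draws can be computed with the Python standard library. The sketch below returns the five-number summary (minimum, first quartile, median, third quartile, maximum) and Tukey's fences at 1.5 box lengths beyond the quartiles, outside which points are drawn individually; the example rates are hypothetical. Quartile conventions differ slightly between software packages, so statistics.quantiles with method="inclusive" is one choice among several.

```python
import statistics


def box_summary(values):
    """Return the five-number summary and Tukey's outlier fences for a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                              # length of the box
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # points outside are plotted singly
    return {
        "min": min(values), "q1": q1, "median": median,
        "q3": q3, "max": max(values), "fences": fences,
    }


# Hypothetical life table pregnancy rates for one decade of studies.
print(box_summary([4.0, 6.5, 8.0, 8.5, 9.0, 11.0, 14.0, 25.0]))
```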
I have broken the bibliography into sections. This is intended to make it easier for readers to find the cited papers. There are more papers in the methods section than are cited in the work above. My intention is to provide citations for a library for those who may be interested in continuing this work.
Bibliography for Rhythm
Dicker D, Wachsman Y, Feldberg D, Ashkenazi J, Yeshaya A, Goldman JA. The Vaginal Contraceptive Diaphragm and the Condom - A Reevaluation and Comparison of Two Barrier Methods with the Rhythm Method. Contraception, 1989; 40:497-504.
Dunn HP. The Safe Period. The Lancet. 1956; Sept 1 p. 441-2.
Fleck S, Snedeker EF, Rock J. The Contraceptive Safe Period. N Engl J Med, 1940; 223:1005-9.
Latz LJ, Reiner E. Further Studies on the Sterile and Fertile Periods in Women. Am J Obstet Gynecol, 1942; 43: pp 74-9.
Jaramillo-Gomez M, Londono JB. Rhythm: A Hazardous Contraceptive Method. Demography, 1968; 5: pp 433-38.
Ryder NB. Contraceptive Failure in the United States. Family Planning Perspectives, 1973; 5: pp 133-42.
Tietze C, Poliakeff SR, Rock J. The Clinical Effectiveness of the Rhythm Method of Contraception. Fertility and Sterility, 1951; 2: pp 444-50.
The Population Council. India: The Singur Study. Studies in Family Planning, 1963; 1: pp 1-4.
Bartzen PJ. Effectiveness of the Temperature Rhythm System of Contraception. Fertility and Sterility. 1967:18; 5 pp 694- 06.
Döring GK. Über die Zuverlässigkeit der Temperaturmethode zur Empfängnisverhütung. Deutsche med. Wschr. 1967: pp 1055-1061.
Guy F, Guy M. L’île Maurice - résultats d’un sondage. Fiches documentaires du CLER. 1966; 34, pp 19-25.
Lanctot CA. La méthode sympto-thermique. Serena, Montreal, 1965.
Marshall J. A Field Trial of the Basal Body Temperature Method of Regulating Births. The Lancet. 1968, July 6, pp 8-10.
Mastroianni L. Present Status of Rhythm Techniques. Clinical Obstetrics and Gynecology 1964; 7: pp 868-875.
Rice F. The Sympto-thermic Method: Its Reliability and Acceptability. Coverline. 1968; 1; 12.
Rendu C, Rendu E. Premiers résultats de l’enquête sondage du CLER; première partie: valeur d’efficacité. Fiches documentaires du CLER. 1966; 37: pp 71-76.
Roetzer J. Erweiterte Basaltemperaturmessung und Empfängnisregelung. Archives of Gynaecology 1968; 206: 195.
Tietze C, Potter RG. Statistical Evaluation of the Rhythm Method. Am J Obstet Gynecol 1962; 5: pp 692-698.
Traissac R, Vincent B, Vincent A. Continence périodique et méthode des températures. La Revue de médecine. Jan 1963, pp 11-30.
Vincent B, Aymard A, Aymard M, Besancon G, Leboterf G, Perroy J, Vincent A. Méthodes thermiques et contraception: approche médicale et psycho-sociologique. Masson et Cie, Paris, 1967.
Ball M. A Prospective Field Trial of the Ovulation Method of Avoiding Conception. European J Obstetrics Gynecology and Reproductive Biology: 1976; 6: p 63.
Billings JJ 1972 cited in Perez A. Use-effectiveness of the Ovulation Method of Natural Family Planning. In Ramon C Ruiz, John Russell, Irene Osmund Ruiz. Proceedings of the International Seminar on Natural Family Planning and Family Life July 1988, Hong Kong. Hong Kong University Press 1990; pp 75-80.
Billings JJ. Natural Family Planning. The Lancet. 1976; September 11, pp 579.
Dunn HP. Natural family planning. NZ Medical Journal. 1975; 82: 407.
Durkan JP. Clinical Experience with Basal Temperature Rhythm. Fertility and Sterility. 1976; 21: 4 pp 322-324.
Johnston J, Roberts D, Spencer R. NFP: A Survey Evaluation of the Effectiveness and Efficiency of Natural Family Planning Service and Methods in Australia: Report of a Research Project. Sydney, Australia: St. Vincent’s Hospital, 1978.
Ek K. cited in Perez A. Use-effectiveness of the Ovulation Method of Natural Family Planning. In Ramon C Ruiz, John Russell, Irene Osmund Ruiz. Proceedings of the International Seminar on Natural Family Planning and Family Life July 1988, Hong Kong. Hong Kong University Press 1990 pp 75-80.
Hilgers TW. Analysis of an Australian NFP Survey. Advocate Press. Melbourne 1978.
Marshall J. Ovulation Method of Family Planning. The Lancet. 1972; November 11, pp 1027.
Marshall J. J Biosocial Science 1975; 7: 9 pp 49.
Marshall J. Cervical-Mucus and Basal Body-Temperature Method of Regulating Births: Field Trial. The Lancet. 1976; August 7 pp 282 - 283
Mosley WH. Ovulation Method of Family Planning. The Lancet. 1972; November 11, pp 1028.
Parenteau-Carreau S, Lanctot CA, Rick F. Effectiveness Study of the Sympto-Thermal approach to Family Planning: A Canadian Sample. Presented at the 25th Congress of the Federation of French speaking OBS-GYN Societies in Montreal, Sept.26 1974.
Rochat RW. Ovulation Method of Family Planning. The Lancet. 1972; November 11, pp 1027 - 1028.
Santamaria JN. A Commentary on A Survey Evaluation of the Efficacy and Efficiency of Natural Family Planning Services and Methods in Australia. Advocate Press, Melbourne 1978.
Vollman RF. Ovulation Method of Family Planning. The Lancet. 1972; November 18 pp 1085 - 1086.
Wade ME, McCarthy P, Harris GS, Danzer HC. Reply to Dr. Hilgers. Am J Obstet Gynecol 1980; 135: 5 pp 697.
Wade ME, McCarthy P, Harris GS, Danzer HC. Reply to Dr. Billings. Am J Obstet Gynecol 1980; 135: 5 pp 698.
Wade ME, McCarthy P, Abernathy JR, Harris GS, Danzer HC, Uricchio WA. A randomized prospective study of the use-effectiveness of two methods of natural family planning: An interim report. Am J Obstet Gynecol 1979; 134: 6 pp 628 - 631.
Weissmann SMC, Foliaki L, Billings EL, Billings JJ. A Trial of the Ovulation Method of Family Planning in Tonga. The Lancet. 1972; October 14, pp 813 - 815.
Barbato M, Bertolotti G. Natural methods for fertility control: A prospective study. Presented at the International Congress of the IFFLP, Ottawa Canada 1986. International Journal of Fertility Supplement. May 1988: pp 48-51.
Billings JJ. Two methods of natural family planning. Am J Obstet Gynecol 1980; 135: 5 pp 697-698.
Brennan JJ, Klaus H. Terminology and core curricula in natural family planning. Fertility and Sterility, 1982; 38:1 pp 117-118.
DeBethune AJ. On the Effectiveness of the Ovulation Method. International Review of Natural Family Planning 1984; 8 pp 150-161.
Dolack L. Study confirms values of Ovulation Method. Presented at the Congress for the Family of the Americas. Guatemala, July 1980, Knights of Columbus, New Haven Ct 1980.
Doud J. Use-Effectiveness of the Creighton Model of NFP. International Review of Natural Family Planning 1985; 9: pp 54-72.
Hilgers TW. Two methods of natural family planning. Am J Obstet Gynecol 1980; 135: 5 pp 696-697.
Hilgers TW, Prebil AM, Daly KD. The effectiveness of the Ovulation Method as a means of achieving and avoiding pregnancy. Presented at Conference for Natural Family Planning Practitioners, Omaha, Nebraska, July 1980.
Ghosh AK, Saha S, Chatterjee D. Symptothermia vis à vis Fertility Control. Journal of Indian Med Association 1980.
Kambic RT, Kambic M, Brixius AM, Miller S. A Thirty-Month Clinical Experience in Natural Family Planning. American Journal of Public Health 1981; 71: 11 pp 1255-1257.
Kambic RT, Martin M. Evaluating client autonomy in natural family planning. Adv Contracept 1988: 4 pp 221-231.
Klaus H, Goebel J, Maraski B et al. Use effectiveness and client satisfaction in six centers teaching the Billings Ovulation Method. Contraception. 1979; 19 pp 613-629.
Labbok MH, Klaus H, Barker D. Factors related to Ovulation Method Efficacy in three programs: Bangladesh, Kenya, and Korea. Contraception: 37; 6 pp 577-589.
Medina JE, Cifuentes A, Abernathy JR, Spieler JM, Wade ME. Comparative evaluation of two methods of natural family planning in Colombia. Am J Obstet Gynecol 1980; 138: 8 pp 1142-1147.
Meng KH, Cho KS. Profile of the Billings Ovulation Method Acceptors and Use-effectiveness of the Method in Korea. Journal of Korean Medical Science. 1989; 4: 1 pp 29-34.
Perez A, Labbok M, Barker D, Gray R. Use-effectiveness of the ovulation method initiated during postpartum breast-feeding. Contraception 1988; 38 pp 499-508.
Rice FJ, Lanctot CA, Garcia-Devesa C. The Effectiveness of the Sympto-thermal method of Natural Family Planning: An International Study. Int J Fertil 1981; 26: 3 p 222-230.
Schubarth, Braendli 1986. cited in Perez A. Use-effectiveness of the Ovulation Method of Natural Family Planning. In Ramon C Ruiz, John Russell, Irene Osmund Ruiz. Proceedings of the International seminar on Natural Family Planning and Family Life July 1988, Hong Kong. Hong Kong University Press 1990 pp 75-80.
Wade ME, McCarthy P, Braunstein GD, Abernathy JR, Suchindran CM, Harris GS, Danzer HC, Uricchio W. A randomized prospective study of the use-effectiveness of two methods of natural family planning. Am J Obstet Gynecol 1981; 141: 4 pp 368-376.
Weeks JR. An Evaluation of the Use-effectiveness of fertility awareness methods of family planning. Journal of Biosocial Science 1982; 14: 1 pp 25 - 32.
World Health Organization. A Prospective Multicentre Trial of the Ovulation Method of Natural Family Planning. II. The Effectiveness Phase. Fertility and Sterility. 1981; 36: 5 pp 591-598.
De Leizaola MA. Première phase d’une étude prospective d’efficacité du planning familial naturel réalisée en Belgique francophone. J Gynecol Obstet Biol Reprod 1994; 23: pp 359-364.
Fehring RJ, Lawrence D, Philpot C. Use Effectiveness of the Creighton Model Ovulation Method of Natural Family Planning. Journal of Obstetric, Gynecologic, and Neonatal Nursing 1994; 23: 4 pp 303-309.
Frank-Herrmann P, Freundl G, Gnoth C, Godehardt E, Kunert J, Baur S, Sottong U. Natural family planning with and without barrier method use in the fertile phase: Efficacy in relation to sexual behavior - A German prospective long-term study. Advances in Contraception 1997; 13: 2/3 pp 179-189.
Gray RH, Kambic RT, Lanctot CA, Martin MC, Wesley R, Cremins R. Evaluation of natural Family Planning Programs in Liberia and Zambia. J Biosoc Sci 1993: 25 pp 249-258.
Indian Council of Medical Research Task Force on Natural Family Planning. Field Trial of Billings Ovulation Method of Natural Family Planning. Contraception 1996: 53 pp 69-74.
Kambic RT, Gray RH, StMart R, Lanctot C, Martin MC. Use-Effectiveness Among Users of the Symptothermal Method of Family Planning. International Family Planning Perspectives. 1991; 17: 3 pp 96-99.
Kambic RT. Natural family planning use-effectiveness and continuation. Am J Obstet Gynecol 1991; 165: 6 pp 2046-2048.
Kambic RT, Gray RH, Lanctot CA, Martin MC, Wesley R, Cremins R. Factors related to autonomy and discontinuation of use of NFP for women in Liberia and Zambia. American Journal of Obstetrics and Gynecology 1991; 165: 6 p 2060 (supplement).
Kambic RT, Lanctot CA, Wesley R. Trial of a new method of natural family planning in Liberia. Advances in Contraception. 1994: 10 pp 111-119.
Kambic RT, Lamprecht V. Calendar rhythm efficacy: a review. Advances in Contraception. 1996: 12 pp 123-128.
Lamprecht VM, Trussell J. Natural Family Planning Effectiveness: evaluating published reports. Advances in Contraception. 1997; 13: pp 155-165.
Spieler J. Personal Communication. April 1997.
Thapa S, Wonga MV, Lampe PG, Pietojo H, Soejoenoes. Efficacy of Three Variations of Periodic Abstinence for Family Planning in Indonesia. Studies in Family Planning 1990; 21: 6 pp 327-334.
The European Natural Family Planning Study Groups. Prospective European multi-center study of natural family planning (1989-1992): interim results. Advances in Contraception 1993: 9 pp 269-283.
Trussell J, Grummer-Strawn L. Further analysis of contraceptive failure of the ovulation method. Am J Obstet Gynecol 1991; 165: 6 pp 2054-2059.
Trussell J, Grummer-Strawn L. Contraceptive Failure of the Ovulation Method of Periodic Abstinence. Family Planning Perspectives 1990; 16: 1 pp 5-16.
Xu JX, Yan JH, Fan DZ, Zhang DW. Billings natural family planning in Shanghai, China. Advances in Contraception 1994: 10 pp 195-204.
Bibliography for Research and Statistical Methods
Bongaarts J, Rodriguez G. A New Method for Estimating Contraceptive Failure Rates. Working Papers, The Population Council No. 6, 1989.
Chambers J, Cleveland WS, Kleiner B, Tukey PA. Graphical Methods for Data Analysis. Wadsworth International Group, Belmont, California 1983.
Chiang CL. The Life Table and Its Applications. Robert E Krieger Publishing company, Malabar Florida, 1984.
Cochran WG. Planning and Analysis of Observational Studies. John Wiley & Sons, New York 1983.
Dickersin K, Berlin JA. Meta-analysis: State-of-the-Science. Epidemiologic Reviews. 1992; 4: pp 154-175.
Farley TMM. Life-Table Methods for Contraceptive Research. Statistics in Medicine. 1986; 5 pp 475-489.
Grady WR, Hayward MD, Yagi J. Contraceptive Failure in the United States: Estimates from the 1982 National Survey of Family Growth. Family Planning Perspectives. 1986; 18: 5 pp 200-209.
Lee ET. Statistical Methods for Survival Data Analysis. Lifetime Learning Publications, Belmont California, 1980.
Meinert CL. Clinical Trials: Design, Conduct, and Analysis. Oxford University Press, New York, 1986.
Pearl R. Factors in human fertility and their statistical evaluation. The Lancet 1933; 2 pp 607-611.
Potter RG. Additional measures of Use-Effectiveness of Contraception. Milbank Memorial Fund Quarterly. 1963; 41: pp 400-418.
Schlesselman JJ. Case-Control Studies: Design, Conduct, and Analysis. Oxford University Press, New York, 1982.
Shelton JD, Taylor RN. The Pearl Pregnancy Index reexamined: Still useful for clinical trials of contraceptives. Am J Obstet Gynecol 1981; 139: 5 pp 592- 596.
Trussell J, Menken J. The calculation of gross rates of continuation for contraceptive methods: single and multiple increment life tables. In Contraceptive efficacy among married women aged 15-44 years. Vital and Health Statistics, Series 23, No 5 DHS Publication No PHS 80-1981 Appendix IV pp 49-57.
Trussell J, Kost K. Contraceptive Failure in the United States: A Critical Review of the Literature. Studies in Family Planning 1987; 18:5 pp237-283.
Trussell J, Hatcher RA, Cates W, Stewart FH, Kost K. A guide to interpreting contraceptive efficacy studies. Obstetrics and Gynecology 1990; 76: 3 pp 558-567.
Trussell J. Methodological pitfalls in the analysis of contraceptive failure. Statistics in medicine 1991: 10 pp 201-220.
Vessey M, Lawless M, Yeates D. Efficacy of Different Contraceptive Methods. Lancet, 1982; April 10, pp 841-842.
Table 1 columns: Author/Country | Year | Method / Rate | Reported Unplanned Pregnancy Rate | Peer
Method codes: o = Ovulation; s = Symptothermal; c = Calendar; h = Hilgers (Creighton); m = Modified mucus (Dorairaj)
Rate codes: p = Pearl; l = Lifetable
Peer review codes: 1 = Peer review published; 2 = Peer review questioned; 3 = Not peer review
WHO 1981 study countries: El Salvador, India, Ireland, New Zealand, Philippines
Table 2. NFP studies from 1974 to the present: lifetable unplanned pregnancy and discontinuation rates and mean age of the user, by ST or OM method.
NS = not significant