For several years the Indoor Environment Program of Lawrence Berkeley National Laboratory and the indoor radon group of the U.S. Geological Survey have been working on the most critical issue for an effective national program on indoor radon: how to identify high-radon areas more effectively and reliably (Nero, 1992). If this can be done, monitoring and remedial efforts can be mounted much more efficiently than at present. This joint project is now nearing completion, and is beginning to publish its key results for the scientific and user communities.
Several approaches may be taken to identify high-radon areas or, put another way, to estimate the variation of local concentrations by area. These include physical modeling, the development of radon "potentials" based on physical factors, monitoring in a sampling of homes, and combinations of these approaches (Nero, 1993). A principal example of the radon potential approach is the U.S. map developed earlier by the USGS (Gundersen et al., 1991). In other countries, methods have included use of dense sets of monitoring data over a wide geographic area (Miles, 1994), and correlation of monitoring data and geological factors (Voutilainen and Makelainen, 1993), to predict percentages of new homes exceeding a chosen level of concern. The approach taken in the present project is to examine the statistical correlation between available monitoring data and physical factors, such as soil, geological, housing, and meteorological characteristics, to predict local indoor concentrations.
Initial regression analysis of short-term "screening" data from Minnesota indicated the potential power of this approach: Surficial radium concentrations, as indicated by the data from the National Uranium Resource Evaluation (NURE), accounted for approximately 60% of the variance in county geometric mean (GM) "screening" radon concentrations across the state, as measured by the R^2 (Nero et al., 1994). Although we are ultimately interested in long-term, living-area concentrations for identifying high-radon areas, this analysis indicated that available data on physical factors could account for most of the local variation in mean radon concentrations.
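As a minimal sketch of the kind of ordinary regression described above, the snippet below fits the logarithm of county GM radon against a county-mean NURE radium value and reports R^2. It assumes the regression is done on log GMs. The data are synthetic and hypothetical, not the Minnesota data, and `r_squared` is an illustrative helper name.

```python
import numpy as np

def r_squared(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # With an intercept the residuals have mean zero, so this equals 1 - SS_res/SS_tot.
    return 1.0 - resid.var() / y.var()

# Hypothetical synthetic data for 80 counties (NOT the Minnesota data):
# county-mean NURE radium in arbitrary units, and log county GM radon.
rng = np.random.default_rng(0)
nure = rng.uniform(0.5, 2.5, size=80)
log_gm = np.log(40.0) + 0.6 * nure + rng.normal(0.0, 0.25, size=80)

r2 = r_squared(nure, log_gm)
print(f"R^2 = {r2:.2f}")
```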
The analysis, however, also indicated a fundamental problem with ordinary regression analyses applied for this purpose, which is that they do not properly handle cases where the monitoring data are sparse. In the initial Minnesota analysis, for example, the calculated R^2 decreased if we included counties where smaller numbers of homes were monitored. This occurs because, for such counties, the GMs are very poorly determined by the monitoring data, so that a substantial part of the difference between the predicted and "measured" GMs is simply random variation due to small sample numbers.
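The effect of sparse counties on R^2 can be reproduced with a small simulation. This is a hypothetical sketch: it assumes a within-county GSD of 2.1, a modest scatter of true county log GMs about the regression line, and invented sample sizes (30 homes in half the counties, 2 in the rest). It then compares R^2 for the well-sampled counties alone with R^2 for all counties.

```python
import numpy as np

def r_squared(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n_counties = 400
sigma_within = np.log(2.1)          # assumed within-county log SD (GSD 2.1)
nure = rng.uniform(0.5, 2.5, n_counties)
true_log_gm = 3.0 + 0.6 * nure + rng.normal(0.0, 0.15, n_counties)

# Hypothetical sample sizes: half the counties well sampled, half sparse.
n_homes = np.where(np.arange(n_counties) < n_counties // 2, 30, 2)
# Measured log GM = true value plus sampling error with SD sigma/sqrt(n),
# as for the mean log concentration of n lognormally distributed homes.
measured = true_log_gm + rng.normal(0.0, sigma_within / np.sqrt(n_homes))

well = n_homes >= 10
r2_well = r_squared(nure[well], measured[well])
r2_all = r_squared(nure, measured)
print(f"R^2, well-sampled counties only: {r2_well:.2f}")
print(f"R^2, all counties:               {r2_all:.2f}")
```

The regression relationship is identical in both fits; only the sampling noise in the "measured" GMs differs, which is why R^2 falls when sparse counties are added.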
On the other hand, the predictions can often have much less uncertainty than the "measured" GMs for individual counties. This is, of course, the whole point of developing a correlation model for identifying high-radon areas, i.e., to be able to provide predictions for counties in which there are insufficient representative monitoring data to do so reliably. And in contrast to ordinary regression analyses, a Bayesian approach (Gelman et al., 1995) can handle sparse data properly and thereby provide a self-consistent framework for estimating the power of the correlation model to predict the "true" county GMs and, for individual predictions, for estimating the associated uncertainties.
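One standard way a Bayesian model handles sparse data is normal-normal partial pooling on the log scale. The sketch below assumes that form; it is not necessarily the exact model of Price et al. The regression prediction acts as a prior with SD `tau`, and the county's own homes supply `n` observations with within-county SD `sigma_y`. The posterior mean is a precision-weighted average of the two. All parameter values in the example are hypothetical.

```python
import math

def posterior_log_gm(ybar, n, mu_pred, sigma_y, tau):
    """Normal-normal partial pooling for one county's log GM.

    ybar    : mean of log concentrations measured in the county (unused if n == 0)
    n       : number of homes measured
    mu_pred : log GM predicted by the regression on physical factors
    sigma_y : within-county SD of log concentrations
    tau     : SD of true county log GMs about the regression prediction
    Returns (posterior mean, posterior SD) of the county's true log GM.
    """
    prec_prior = 1.0 / tau**2          # precision of the regression prediction
    prec_data = n / sigma_y**2         # precision of the county's sample mean
    data_term = prec_data * ybar if n > 0 else 0.0
    post_prec = prec_prior + prec_data
    mean = (prec_prior * mu_pred + data_term) / post_prec
    return mean, math.sqrt(1.0 / post_prec)

sigma_y = math.log(2.1)   # within-county GSD of 2.1
tau = 0.15                # hypothetical residual SD of true log GMs
# Hypothetical county: 2 homes with GM 400 Bq/m^3; regression predicts 100 Bq/m^3.
mean, sd = posterior_log_gm(math.log(400.0), 2, math.log(100.0), sigma_y, tau)
print(f"posterior GM ~ {math.exp(mean):.0f} Bq/m^3 (log SD {sd:.2f})")
```

With only two homes, the county's own data receive little weight, so the estimate stays close to the regression prediction. For a county with no monitoring data (`n == 0`), the function simply returns the prediction and its uncertainty.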
Of course, even accurate prediction of the GMs does not give a direct measure of the percentage of homes with concentrations exceeding any particular level of concern, since the actual distribution in a county or other geographic unit may not correspond well to a lognormal function. However, in many cases, including Minnesota, the correspondence is good enough to support presuming lognormality for modeling purposes. Furthermore, calculating the percentage of homes with high levels from the GM presumes knowledge of the geometric standard deviation (GSD). In Minnesota, although the county GSDs calculated directly from the data vary substantially, most of this variation appears to be due to the small numbers of homes monitored in most counties; statistical modeling in Minnesota would therefore appear to be fairly soundly based if a GSD of 2.1 is presumed for every county in the state. On the other hand, in the U.S. as a whole, or in a region with more variability in causative factors such as meteorology or geology, presuming a constant GSD may serve moderately well for predicting the GMs, but estimating the percentage of homes in each area with high levels may be more difficult, since that percentage can depend sharply on the actual GSD. Even so, predictions of the GMs may be useful for an initial ranking of the areas on which intensive monitoring efforts might be focused.
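Under the lognormal presumption, the fraction of homes above a level of concern follows directly from the GM and GSD. The sketch below shows this calculation and its sensitivity to the GSD. The GM of 50 Bq/m^3 is hypothetical. The threshold of 148 Bq/m^3 (4 pCi/L, the U.S. EPA action level) is used only as an example level of concern.

```python
import math

def fraction_above(threshold, gm, gsd):
    """Fraction of homes above `threshold`, assuming a lognormal distribution."""
    z = (math.log(threshold) - math.log(gm)) / math.log(gsd)
    # Upper-tail probability of a standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))

level = 148.0  # Bq/m^3, example level of concern
for gsd in (1.8, 2.1, 2.5):
    print(gsd, round(fraction_above(level, gm=50.0, gsd=gsd), 3))
```

For a fixed GM, moving the GSD from 1.8 to 2.5 changes the estimated fraction above the threshold several-fold. This is why a constant-GSD assumption may predict GMs adequately while still misestimating the percentage of homes with high levels.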
When we developed a Bayesian model for the Minnesota data, we learned two things (Price et al., 1995). First, the variance in the true county GMs is considerably less than the variance in the "measured" GMs, i.e., those calculated directly from the monitoring data. This isn't surprising, since the approximately 900 homes monitored are spread over 87 counties and, because the survey was population based, the number of homes monitored in each county varies from 0 (for two counties) to 116. Thirty-six counties had between 1 and 4 homes monitored. The observed GMs would therefore be expected to vary substantially about the true values for the housing stock as a whole, and this would be reflected in an artificially broadened distribution of (measured) county GMs. As a result, the GSD of the true GMs as estimated from the model, i.e., 1.4, is only about two-thirds of the GSD of the GMs calculated from the monitoring data. And, although the "measured" GMs for 7 counties lay in the range 280-500 Bq/m^3 (all of these counties had 1 to 4 homes monitored), none of the model's "posterior estimates" (each a weighting of the "measured" GM and the GM estimated from the correlation with the county NURE value) exceeds 210 Bq/m^3, though, given the modest uncertainties in the posterior estimates, some of the "true" GMs could somewhat exceed 210 Bq/m^3.
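The broadening of the measured-GM distribution can be illustrated with a simulation. The GM of a county with n monitored homes scatters about its true value with a log-scale variance of roughly sigma^2/n, so the spread of measured GMs is inflated by the average of that term. The sketch assumes a within-county GSD of 2.1 and a true-GM GSD of 1.4. The sample sizes are hypothetical stand-ins for a population-based survey in which many counties have only 1 to 4 homes.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_within = np.log(2.1)    # assumed within-county log SD
sd_true = np.log(1.4)         # assumed SD of true county log GMs
n_counties = 87

# Hypothetical sample sizes; many counties have only 1-4 homes monitored.
n_homes = rng.choice([1, 2, 3, 4, 10, 25, 60], size=n_counties)
true_log_gm = rng.normal(np.log(80.0), sd_true, n_counties)
# Measured log GM = true value plus sampling error with SD sigma/sqrt(n).
measured_log_gm = true_log_gm + rng.normal(0.0, sigma_within / np.sqrt(n_homes))

gsd_true = np.exp(true_log_gm.std(ddof=1))
gsd_measured = np.exp(measured_log_gm.std(ddof=1))
print(f"GSD of true county GMs:     {gsd_true:.2f}")
print(f"GSD of measured county GMs: {gsd_measured:.2f}")
```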
The second important result from the model, aside from the individual county GMs (and uncertainties) that are predicted, is that the county-averaged NURE value accounts for approximately 80% of the variation in the logarithm of the true GM among counties. The model indicates that knowing the county NURE average is "worth" about 30 additional observations in the county - quite a significant contribution toward determining the county GM radon concentrations!
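A rough check of the "worth about 30 observations" figure comes from the normal-normal model. There, a prior with variance tau^2 carries the same information as sigma^2/tau^2 observations, each with variance sigma^2. Here sigma^2 is taken from the within-county GSD, and tau^2 is taken as the unexplained fraction of the variance of the true log GMs. This is a back-of-envelope sketch using the rounded figures quoted in this section, not the model's own calculation.

```python
import math

def prior_worth_in_observations(gsd_within, gsd_true_gms, frac_explained):
    """Equivalent number of home measurements carried by the regression prediction."""
    sigma2 = math.log(gsd_within) ** 2                               # per-home log variance
    tau2 = (1.0 - frac_explained) * math.log(gsd_true_gms) ** 2      # residual county variance
    return sigma2 / tau2

n_equiv = prior_worth_in_observations(2.1, 1.4, 0.80)
print(round(n_equiv))  # prints 24
```

With rounded inputs (GSD 2.1 within counties, GSD 1.4 for true GMs, 80% explained), the result is of the same order as the figure obtained from the full model.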
We have also used this approach in building an overall correlation model for the county GM screening concentrations for Pennsylvania, Maryland, Delaware, Virginia, and West Virginia (Price, 1996). The NURE data alone did not have as much predictive power as in Minnesota, but we also used a classification of geologic units based on the geologic radon potential units defined previously by USGS (Gundersen et al., 1991; Schumann, 1993). Jointly, this information, together with whether or not houses had basements, again accounted for approximately 80% of the variation in the logarithm of county screening GMs for this region. Further, we were able to convert the estimated screening GMs to estimated GMs of long-term living-area radon concentrations for EPA Region 3, based on a joint analysis of screening and long-term monitoring results for the region, which yielded conversion factors for different substructure types and screening monitoring locations (Price and Nero, 1995). The regional analysis thus yields estimates of long-term GMs by county in these states, information that is not otherwise available.
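As a minimal sketch of how conversion factors of this kind might be applied, each screening measurement below is multiplied by a factor keyed on substructure type and screening location, and the county GM is then taken in log space. The factor values and category names are hypothetical placeholders, not the factors from Price and Nero (1995).

```python
import math

# HYPOTHETICAL conversion factors (long-term living-area / screening),
# keyed by (substructure type, screening location). Placeholder values only.
CONVERSION = {
    ("basement", "basement"): 0.5,
    ("basement", "first_floor"): 0.9,
    ("slab", "first_floor"): 0.8,
}

def long_term_gm(screening):
    """Estimated long-term GM from (concentration, substructure, location) tuples."""
    logs = [math.log(c * CONVERSION[(sub, loc)]) for c, sub, loc in screening]
    return math.exp(sum(logs) / len(logs))

county_sample = [
    (180.0, "basement", "basement"),
    (90.0, "basement", "first_floor"),
    (60.0, "slab", "first_floor"),
]
print(f"estimated long-term GM ~ {long_term_gm(county_sample):.0f} Bq/m^3")
```

Converting each measurement before averaging lets a county's estimate reflect its own mix of house types and screening locations.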
Using a digitized database of geologic units completed recently by USGS researchers, we are now performing such analyses for all regions of the contiguous 48 states for which there are screening data from the EPA/state-survey program. Initial results suggest that, for many regions, this approach is as effective as it was for the mid-Atlantic states, though it is less effective for at least one region (New England). One outcome apparent from these regional analyses is that part of the predictive power added by including the geologic-unit variables arises because the units serve as surrogates for spatial correlations among nearby counties; such correlations are not presently included explicitly in the regional models.
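One way to check whether spatial correlation remains in a regional model is to compute a spatial-autocorrelation statistic on the county residuals. The sketch below uses Moran's I, a standard statistic that the text does not itself mention, so treat it as an illustrative substitute. The four-county adjacency matrix and residuals are hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for values on areal units with a spatial weights matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical: four counties in a row, each adjacent to its neighbours.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
residuals = [0.3, 0.3, -0.3, -0.3]   # neighbouring counties share the sign of their error
i_value = morans_i(residuals, adjacency)
print(f"Moran's I = {i_value:.3f}")
```

A clearly positive value means that nearby counties tend to be mis-predicted in the same direction. Geologic-unit variables could partly absorb that pattern even if they carry no additional physical information.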
We have also been investigating how well mean concentrations might be predicted for smaller areas, such as census tracts (about 4000 people each) or townships (which vary greatly in population). These efforts have focused on several states for which we or others have developed moderately detailed information. In Minnesota, we have augmented the initial data by conducting a new year-long survey designed primarily to test the influence of geologic and soil factors, to examine the construction of models for predicting census-tract GMs, and to normalize the earlier results to long-term living-area concentrations. Initial analyses suggest the following:
Examination of township data from the state of Washington has served as a basis for explicitly including spatial correlations in a Bayesian framework (Boscardin and Price, 1996). Preliminary results suggest that, in that case, the spatial correlation function provides roughly the same predictive power as the NURE data, but the two together do not provide much greater predictive power than either alone, indicating that they in effect serve as proxies for one another. (We note, however, that whenever spatial information increases predictive power beyond that provided by specific physical factors, the gain is attributable to the statistical model's failure to represent adequately all the physical factors affecting indoor concentrations.)
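A spatial correlation function can be used to predict the residual for an unmonitored township from nearby monitored ones. The sketch below does this as Gaussian-process (simple kriging) prediction with a zero-mean residual. It assumes an exponential correlation function plus a nugget term for sampling noise. That functional form, the coordinates, and all parameter values are hypothetical and are not taken from Boscardin and Price.

```python
import numpy as np

def krige_residual(coords, resid, target, sill, corr_range, nugget):
    """Simple-kriging prediction of a zero-mean residual at `target`.

    Covariance between locations: sill * exp(-distance / corr_range);
    `nugget` is added on the diagonal for noise in the observed residuals.
    Returns (predicted residual, prediction variance of the underlying field).
    """
    coords = np.asarray(coords, dtype=float)
    resid = np.asarray(resid, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K = sill * np.exp(-d / corr_range) + nugget * np.eye(len(resid))
    d0 = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=-1)
    k0 = sill * np.exp(-d0 / corr_range)
    weights = np.linalg.solve(K, k0)
    return weights @ resid, sill - k0 @ weights

# Hypothetical township centroids (arbitrary distance units) and log-GM residuals.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
resid = [0.4, 0.5, 0.3]
near_mean, near_var = krige_residual(coords, resid, (0.2, 0.2), 0.05, 2.0, 0.02)
far_mean, far_var = krige_residual(coords, resid, (100.0, 100.0), 0.05, 2.0, 0.02)
print(near_mean, near_var, far_mean, far_var)
```

Near the monitored townships, the prediction borrows their positive residuals and has reduced variance. Far away, it falls back to zero with the full prior variance, so only the physical-factor regression would inform the estimate.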
We have also performed an analysis at a "larger" scale, i.e., for the United States as a whole, using data from the EPA's National Residential Radon Survey, which measured year-long radon concentrations in more than 5000 ground-contact dwellings in 116 counties in the contiguous 48 states. This analysis found that the most important specific physical factors in accounting for the variability of county GMs were the NURE data and two variables from the LBNL meteorological database developed for U.S. counties as part of this project (Apte et al., 1996), but that the soil and geologic unit types from a database prepared for this analysis by the USGS also contributed significant predictive power. Overall, about 60% of the variability in the county log GMs is accounted for by the model variables (Revzan et al., 1996). We caution that this predictive power for individual counties is not nearly as good as that of the models mentioned above, which often account for 80% of the considerably smaller variability among counties within a state or region. Still, the moderately high national correlation offers guidance on what factors influence indoor concentrations and assists in the selection of model variables for other analyses (such as the regional analyses mentioned above).
With the success of our basic approach in developing state or regional models with high predictive power, we are already planning how the analytical techniques and specific programs developed in the course of these efforts will be made available to others. Of equal interest is that the physical data needed for making effective predictions be made available, as well as indications of how new data, such as representative long-term monitoring data, might be developed. Our interest is in making these methods and data available both to the scientific community, who might further improve them or apply them to other types of problems (as noted below), and to a user community, who would apply them to their own states or regions. There are of course substantial overlaps between these communities, both in terms of individuals or groups and because agencies (Federal, state, or local) wishing to employ these methods might in fact engage members of the scientific community to do so. We will also cooperate with the agencies that have supported this work, namely the U.S. Department of Energy and Environmental Protection Agency, in bringing it to the attention of these two communities.
Many of our specific analytical techniques and model results, of course, have been (and continue to be) prepared as scientific articles or reported at scientific meetings. But as more specific assistance to others, we are documenting the methods we have used in more detail in Lawrence Berkeley National Laboratory reports. A principal example is a report being prepared on the important programs we have used, together with exemplary detailed descriptions of how to use them for selected important purposes. These programs and the associated documentation are designed for use by a person or group with a working familiarity both with a fully capable statistical programming package (in particular, S-PLUS) and with an appropriate Geographical Information System (GIS) for handling the input data and analytical results. This report will also describe the data needed for such analyses and specifically characterize the important databases that are being made publicly available as part of this project. Another report describes the operational procedure for conducting a new representative survey of long-term living-area radon concentrations as a basis for normalizing analytical results to concentrations that correspond relatively closely to actual occupant exposures.
One means for making the project's methods and results available to the scientific and user communities will of course be paper copies of the reports and physical transfer of computer files on diskettes or other media. This, however, will be possible only for a relatively limited number of colleagues, and is in any case relatively cumbersome.
To make general information, as well as selected reports and files, more widely available, we will also use email, a World Wide Web page, and FTP (as necessary). We have already established an email address, "high-radon@lbl.gov", to which people may send a message and in response receive brief information both on the current status of the project and on how to register for continuing (or perhaps more detailed) information. Our web page, http://eetd.lbl.gov/IEP/high-radon/hr.html, will provide continuing information on the project, organized into topics the user can select, and also provides a registration form. Both of these addresses (as well as individual published papers) will indicate how files can be obtained directly from the web page (or via FTP). It is, by the way, not our present intention to make scientific articles available by these means, since they will be available in individual journals. However, bibliographic details and summaries will be available via the web page and, to some extent, via email (or FTP).
These methods have been developed specifically to identify high-radon areas of the United States in a quantitative, statistically based framework. However, we have also been highly interested in contributing to the improvement of analytical methods for other environmental problems, where data are sometimes not treated properly. Such problems might include any involving environmental parameters that have substantial spatial variation, where individual measurements have considerable uncertainty or are sparse, and where the measurements are linked to underlying geophysical factors whose predictive power is not known. Classes of examples that might benefit from the approach developed here are contaminant concentrations in soil or ground water around waste disposal sites, and a variety of parameters involved in characterizing or modeling global warming. (For example, in examining the ability of soil to absorb or release carbon dioxide, one could make spatially dispersed measurements of soil respiration and use the approach developed in the present work to estimate the distribution of respiration rates over a large area.) Even in simply interpreting important data fields, such as temperatures across a continent, the data often ought to be examined in a more statistically rigorous way than most scientists employ.
These methods may be applied directly to another problem in the area of indoor radon, i.e., for imputing missing data in epidemiological studies of indoor radon and lung cancer. In case-control studies (the strongest design available for eliciting any relationship between radon exposures and lung cancer incidence), it is not possible to perform monitoring in all previous residences of the study subjects. The result is that parts of the subjects' exposure histories are missing, which affects not only the uncertainties in their estimated exposures, but - depending on how the missing periods are treated - can introduce bias into the study.
The present methodology offers a way to impute data more properly. The homes for which data are actually available, and the characteristics and locations of those homes, provide the information from which a mapping of estimated indoor concentrations by locale and residential type may be developed, using exactly the methods developed in this project. Given information from the subjects (or their surrogates) about their residential histories, estimates of concentrations (and uncertainties) can be drawn from the mapping for homes where monitoring is not performed. These may then be used together with the data resulting from actual monitoring to provide more complete estimates of subjects' exposures. (Treating the uncertainties properly will require that the biostatistical analysis use a methodology that includes the effect of exposure uncertainties. Epidemiologists do not ordinarily use such methods, but they ought to do so for residential radon studies, where uncertainties in the exposures can have a substantial effect on the results.)
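As a minimal sketch of combining monitored and imputed residences into a single exposure estimate, each imputed residence below carries a lognormal (posterior) distribution for its concentration, as a mapping of the kind described above might supply. The sketch then forms a time-weighted average with a propagated standard deviation. Several assumptions are simplifications chosen for illustration: monitored values are treated as exact, and errors are treated as independent across residences. The residential history is hypothetical.

```python
import math

def exposure_estimate(residences):
    """Time-weighted average radon concentration over a residential history.

    Each residence is a dict with 'years' and either
      'measured'           : a monitored concentration (Bq/m^3), or
      'log_mean', 'log_sd' : an imputed distribution for the log concentration.
    Returns (mean, SD) of the time-weighted average concentration.
    """
    total_years = sum(r["years"] for r in residences)
    mean, var = 0.0, 0.0
    for r in residences:
        w = r["years"] / total_years
        if "measured" in r:
            # Simplification: monitored values treated as exact.
            m, v = r["measured"], 0.0
        else:
            mu, s = r["log_mean"], r["log_sd"]
            m = math.exp(mu + s**2 / 2.0)       # lognormal mean
            v = (math.exp(s**2) - 1.0) * m**2   # lognormal variance
        mean += w * m
        var += w**2 * v  # assumes independent errors across residences
    return mean, math.sqrt(var)

history = [
    {"years": 12, "measured": 95.0},
    {"years": 8, "log_mean": math.log(60.0), "log_sd": 0.4},
]
avg, sd = exposure_estimate(history)
print(f"time-weighted average ~ {avg:.0f} Bq/m^3, SD ~ {sd:.0f}")
```

The returned SD is the kind of exposure uncertainty that an errors-in-exposure biostatistical analysis would need as input.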
M.G. Apte, A.V. Nero, and K.L. Revzan (1996). "LBL meteorological database for the United States." Indoor Air, submitted.
J. Boscardin and P.N. Price (1996). "Spatial correlations in indoor radon concentrations in Washington state." In draft.
A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin (1995). Bayesian Data Analysis. Chapman & Hall, London.
L.C.S. Gundersen, R.R. Schumann, J.K. Otton, R.F. Dubiel, D.E. Owen, K.A. Dickinson, R.T. Peake, and S.J. Wirth (1991). "Preliminary radon potential map of the United States." In: Proc. 1991 Int. Symp. on Radon and Radon Reduction Technology (EPA/600/9-91/037B), v. 2. U.S. Environmental Protection Agency, Research Triangle Park, NC, pp. 9.13-9.32.
J.C.H. Miles (1994). "Mapping the proportion of the housing stock exceeding a radon reference level." Radiat. Prot. Dosim. 56, 207-210.
A.V. Nero (1992). "A national strategy for indoor radon." Issues in Science and Technology (Fall), 33-40.
A.V. Nero (1993). "Methodologies for identifying high-radon areas: A brief review." In: P. Kalliokoski, M. Jantunen, and O. Seppanen, Eds. Indoor Air '93 (Proc., 6th Int. Conf. on Indoor Air Quality and Climate), v. 4. Indoor Air '93, Helsinki, pp. 419-425.
A.V. Nero, S.M. Leiden, D.A. Nolan, P.N. Price, S. Rein, K.L. Revzan, H.R. Wollenberg, and A.J. Gadgil (1994). "Statistically based methodologies for mapping of radon 'actual' concentrations: The case of Minnesota." Radiat. Prot. Dosim. 56, 215-219 (Proc., Int. Workshop on Indoor Radon Remedial Action: The Scientific Basis and Practical Implications).
P.N. Price (1996). "Predictions and maps of county mean indoor radon concentrations in the mid-Atlantic states." Health Phys., submitted.
P.N. Price and A.V. Nero (1995). "Joint analysis of long- and short-term radon monitoring data from the northern U.S." Environ. Int., in press.
P.N. Price, A.V. Nero, and A. Gelman (1995). "Bayesian prediction of mean indoor radon concentrations for Minnesota counties." Health Phys., in press (LBL-35818).
P.N. Price et al. (1996). "A survey of long-term indoor radon concentrations in Minnesota homes." In draft.
K.L. Revzan, P.N. Price, A.V. Nero, L.C.S. Gundersen, and R.R. Schumann (1996). "Bayesian analysis of the relationship between indoor radon concentrations and predictive variables in U.S. homes." In draft.
R.R. Schumann, Ed. (1993). Geologic Radon Potential of EPA Region 3 (USGS Open-File Report 93-292-C). U.S. Geological Survey, Denver.
A. Voutilainen and I. Makelainen (1993). "Radon risk mapping using indoor monitoring data - a case study of the Lahti area, Finland." Indoor Air 3, 369-375.