A Geostatistical Analysis of Possible Spirochetal Involvement in Multiple
Sclerosis and Other Related Diseases
© Megan M. Blewett 2006
Megan.Blewett@att.net
Abstract
Zoonotic diseases, especially those with insect or
arthropod vectors, are recognized public health problems. This class of diseases includes West Nile
Virus, Human Granulocytic Ehrlichiosis (HGE), Babesiosis, Rocky Mountain
Spotted Fever, and Lyme Disease. This
study examines whether Multiple Sclerosis (MS), which is the most common
primary neurological disorder of young adults, also belongs in this
category. Visual and geostatistical
analyses of MS and Lyme reveal striking similarities between the two
diseases. Maps displaying each
disorder’s geographic distribution by county reveal this overlap visually. In addition, the statistical correlation
between MS and Lyme deaths (specifically all arthropod-borne disease deaths) is
significant at the state-level and highly significant at the county-level. MS incidence is known to vary with latitude;
the study’s statistical analysis reveals that Lyme Disease follows the same
trend. Discussion of possible
biological explanations of these geographical and statistical trends is
included in this article. Significant
correlations also exist with other diseases: on the state level, the
correlation between MS and breast cancer is 0.330, and between MS and ALS
(Motor Neuron Disease used in this study), the value is 0.618. The control, external accident/injury, did
not yield significant correlations.
Producing the maps and data required contacting all of the state
epidemiologists in the nation for Lyme incidence data. Compiling the data has resulted in one of
the most comprehensive Lyme databases available to researchers. The results of the visual, geostatistical,
and biochemical analyses suggest common spirochetal involvement in MS and
related diseases.
Introduction
Zoonotic diseases, especially those with insect or
arthropod vectors, are well-recognized public health concerns. Such diseases include West Nile Virus, Human
Granulocytic Ehrlichiosis (HGE), Babesiosis, Rocky Mountain Spotted Fever, and
Lyme Disease. Multiple Sclerosis (MS)
is the “most common primary neurological disorder of young adults” (Warren, 2001, page 1). The National Multiple Sclerosis Society
estimates that 400,000 people in the United States have MS (National Multiple
Sclerosis Society, 2005). The National
Institute for Neurological Disorders and Stroke (NINDS) reports that the cause
of MS is “linked to an unknown environmental trigger, perhaps a virus” (NINDS,
2006a). Although a viral cause of MS is
the prevailing view, some researchers believe MS is a zoonotic disease caused
by a spirochete and spread by an arthropod vector. This study examines the spirochete hypothesis.
Spirochetal involvement in MS was a hypothesis
gaining ground in Europe in the 1930s (Murray, 2005). Unfortunately, most of the research in support of this
hypothesis, as well as the researchers themselves, was lost during World War
II. A surviving researcher, Gabriel
Steiner, published work after World War II that identified a spirochete, Spirochaeta
myelophthora, as the causal agent of MS with an unknown vector (Steiner,
1952; Steiner, 1954). Some of those who
worked with Steiner in the United States as well as other researchers
hypothesize that MS and Lyme might be either: 1) the same disease; or 2)
different diseases caused by two different spirochetes carried by the same
arthropod vector (Mattman, 2001; Rubel, 2003; Fritzsche, 2005).

Figure 1. Normalized Count of MS Deaths by County (1998 Deaths Divided by 1990 Census Population)

Figure 2. Normalized Count of Other Specified Arthropod-Borne Diseases (OSABD) Deaths by County (1998 Deaths Divided by 1990 Census Population)
Geostatistical and biochemical analyses reveal many
similarities between MS and Lyme. Each
is influenced by geography, and MS and Lyme overlap in this geographic
distribution. The author began to
examine the relationship between MS and Lyme after being struck by the
similarity between the generated distribution maps of the two
diseases. See Figure 1 and Figure 2. There are also biochemical similarities. NINDS (2006a) defines MS as “An
unpredictable disease of the central nervous system … in which the body,
through its immune system, launches a defensive attack against its own tissues
… the nerve-insulating myelin.” NINDS
(2006b) also recognizes the neurological complications of Lyme, which usually
occur in the second stage, and include “numbness, pain, weakness,
Bell's palsy … visual disturbances, and meningitis symptoms … decreased
concentration, irritability, memory and sleep disorders, and nerve damage in
the arms and legs.”
Each of the disorders is
characterized by damage to the blood-brain barrier (BBB) endothelium and
subsequent increased barrier permeability (Pardridge, 1998). Degradation of the barrier in Lyme patients
involves bacterial breakdown of the collagen in the BBB basement membrane. The method of degradation in MS is not known
(Russell, 1997), though thickness of the collagen layer could be a factor for
prevalence among certain ethnic groups.
For example, African-Americans have high levels of collagen and low rates
of MS. Both diseases also involve demyelination
triggered by what can resemble an autoimmune attack against the myelin
sheath. Among MS patients, the
mysterious increase in lymphocyte movement across the BBB could be in response
to a bacterial invader. Lastly, MS and
Lyme disease share an inflammatory response, most likely the work of
proinflammatory chemokines and cytokines (Rothwell, 2002). The epidemiological and biochemical
similarities suggest, but do not confirm, a common bacterial basis for MS and Lyme.
The possibility of a common bacterial
basis for both MS and Lyme is examined in this study using geostatistical
analysis. Such analysis combines
descriptive and inferential statistical techniques with data visualization
(cartographics). The results have
proven useful in understanding the etiology of many diseases including cholera,
plague, malaria, smallpox, AIDS, and Lyme (Ormsby, 2001; Cliff, 2004; Koch,
2005). The hypothesis to be tested is
that MS and Lyme Disease are triggered or influenced by a similar zoonotic
spirochetal agent and spread by a tick-like vector. If a common etiology exists, then a geostatistical relationship
between Lyme and MS should be observed at either the state-level or the
county-level or both. The analysis can
be improved by using a control variable (disease) and at least one other
condition in which the causal agent or geographic distribution might be similar
to that of MS.
The control variable in this study
is accident/injury because this condition should be unrelated to a bacterial
distribution. The two diseases with a
suggested bacterial cause or geographic similarity to MS are Breast Cancer
(Cantwell, 1998) and Amyotrophic
Lateral Sclerosis (ALS, Lou Gehrig’s Disease) (Agency
for Toxic Substances and Disease Registry, 2003).
Methods
Comparing disease distributions requires a database
of the incidence of the diseases under examination and their associated
environmental variables. The data
collection process began with a search for an authoritative source of incidence
and prevalence data for Lyme, MS, Breast Cancer, ALS, and
accidents/injuries. Deaths recorded
with the Centers for Disease Control and Prevention (CDC) and other government
agencies provide an incidence measure of the given diseases. A useful dataset was found on TheDataWeb,
which is an online set of data libraries.
The dataset, “Mortality – Underlying Cause-of-Death – 1998” (United States
Bureau of the Census (Census Bureau), 2005b; CDC, 2005c), was accessed via
DataFerrett, a data mining
tool (Census Bureau, 2005a; CDC, 2005a).
The Census Bureau and the CDC make both TheDataWeb and DataFerrett
available to the public without charge.
This
“Mortality” dataset contains geographic, demographic, and cause-of-death
variables obtained from the death certificates of people who died in 1998. Geographic variables include: county and
state of residence, and county
and state population.
Cause-of-death-related variables include the underlying-cause-of-death
coded using the International Classification of Diseases (ICD) Code (9th
Revision).
The coding of
death certificate information is standardized across all states. Death certificates are completed and filed
at the state level (CDC, 2005b). The death certificate information is collected from the states at the
federal level by the National Center for Health Statistics (NCHS) and published
along with other vital statistics as part of the National Vital Statistics
System, “the oldest and most successful example of inter-governmental data
sharing in Public Health and the shared relationships, standards, and
procedures form the mechanism by which NCHS collects and disseminates the
Nation's official vital statistics.” (CDC, 2005d, Introduction section). “The vital statistics general mortality data are a fundamental
source of demographic, geographic, and cause-of-death information. This is one of the few sources of comparable
health-related data for small geographic areas and a long time period in the United States.” (Census Bureau, 2005c,
National Center for Health Statistics section).
DataFerrett returns information
from TheDataWeb in aggregate form only.
Upon submitting a DataFerrett query for data, the following use
restriction statement is displayed:
WARNING! DATA USE RESTRICTIONS. Read
Carefully Before Using
The Public Health Service Act (Section 308 (d) )
provides that the data collected by the National Center for Health Statistics
(NCHS), Centers for Disease Control and Prevention (CDC), may be used only for
the purpose of health statistical reporting and analysis. Any effort to determine the identity of any
reported case is prohibited by this law.
NCHS does all it can to ensure that the identity of data subjects cannot
be disclosed. All direct identifiers,
as well as any characteristics that might lead to identifications, are omitted
from the dataset. Any intentional
identification or disclosure of a person or establishment violates the
assurances of confidentiality given to the providers of the information. Therefore, users will:
● Use the data in this dataset for statistical reporting and analysis only.
● Make no use of the identity of any person or establishment discovered inadvertently and advise the Director, NCHS, of any such discovery.
● Not link this dataset with individually identifiable data from other NCHS or non-NCHS datasets.
By using the data you
signify your agreement to comply with the above-stated statutorily based
requirements.
Because
DataFerrett queries use ICD (9th Revision; ICD-9) codes as
selection criteria, the appropriate ICD-9 codes for each disease were
determined through review of an online version of this document available from
the National Center for Health Statistics (NCHS, 2005). See Table 1 for a list of the ICD-9 codes
used as selection criteria. The
Disease/Condition DataFerrett Selection Codes were then used to extract the
state of residence for those who died in the United States in 1998 from each of
the five diseases/conditions of interest.
Data was obtained for each of the fifty (50) states and the District of
Columbia (total N for the state-level analyses = 51). This data was downloaded into an Excel file.
Added
to this Excel file was the population of each state according to both the 1990
Census and the 2000 Census obtained from the Census Bureau American FactFinder,
Population Finder website/webtool (Census Bureau, n.d.). The total 1990 population from the Census
Bureau and the total 1998 deaths from DataFerrett for each state were used to
calculate the incidence variables used in the analyses. See Table 2. The completed Excel file was opened and saved in SPSS (SPSS,
2003), which was used to calculate the descriptive and inferential statistics. The SPSS file was saved as a Dbase IV file
and then opened and saved in ArcGIS for the cartographic analyses.
The
same general method was used to obtain data at the county level. However, in order to protect the privacy of
individuals, DataFerrett does not return data for counties with fewer than
100,000 people according to the 1990 Census.
Instead, all death data for a state from counties with fewer than 100,000
people is combined into one value.
Wyoming,
for example, has no counties with a population of more than 100,000 so the
county-level death data for Wyoming is returned as one statewide number.
| Disease/Condition | DataFerrett Selection Code | ICD-9 Categories and Code Descriptions |
|---|---|---|
| Multiple Sclerosis (MS) | | Diseases of the Nervous System and Sense Organs (VI: 320-389), Other Disorders of the Central Nervous System (340-349), Multiple Sclerosis (340) – Includes Disseminated or Multiple Sclerosis: Not Otherwise Specified (NOS), Brain Stem, Cord, Generalized |
| Lyme Disease | 088.8 | Infectious and Parasitic Diseases (I: 001-139), Rickettsioses and Other Arthropod-Borne Diseases (080-088), Other Arthropod-Borne Diseases (088), Other Specified Arthropod-Borne Diseases (088.8), Lyme Disease (088.81) – includes Erythema Chronicum Migrans, Babesiosis (088.82) – includes Babesiasis, Other (088.89). NOTE: Lyme could not be selected individually because DataFerrett does not allow more detail in selection than 088.8, so analyses were done with this dataset for the category Other Specified Arthropod-Borne Diseases (OSABD) rather than Lyme alone. |
| Breast Cancer | | Neoplasms (II: 140-239), Malignant Neoplasm of the Female Breast (174) – Includes Nipple and Areola (174.0), Central Portion (174.1), Upper-Inner Quadrant (174.2), Lower-Inner Quadrant (174.3), Upper-Outer Quadrant (174.4), Lower-Outer Quadrant (174.5), Axillary Tail (174.6), Other (174.8), and Breast, Unspecified (174.9) |
| Amyotrophic Lateral Sclerosis (ALS, Lou Gehrig’s Disease) | 335.2 | Diseases of the Nervous System and Sense Organs (VI: 320-389), Hereditary and Degenerative Diseases of the Central Nervous System (330-337), Anterior Horn Cell Disease (335), Motor Neuron Disease (335.2) – includes Amyotrophic Lateral Sclerosis, Progressive Muscular Atrophy (Pure), and Motor Neuron Disease (Bulbar) (Mixed Type). NOTE: ALS could not be selected individually because ALS does not have its own ICD-9 code. The code for Motor Neuron Disease, which includes ALS, was used for the analyses done with this dataset. |
| External Cause (CONTROL) | E800 - E999 | Supplementary Classification of External Causes of Injury and Poisoning (E800 - E999). NOTE: Used as the Control Variable in the analyses. |

Table 1. ICD-9 Codes Used as the DataFerrett Selection Criteria and Reasoning
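The note on Lyme Disease in Table 1 can be illustrated with a small sketch. It assumes, purely for illustration, that a selection code captures every more detailed ICD-9 code that begins with it. The function name `matches_selection` is hypothetical and is not part of DataFerrett.

```python
# Illustrative sketch only: assumes a selection code matches any more
# detailed ICD-9 code that starts with it. This is not DataFerrett's API.

# Subcodes of 088.8 as listed in Table 1.
OSABD_SUBCODES = {
    "088.81": "Lyme Disease",
    "088.82": "Babesiosis",
    "088.89": "Other",
}


def matches_selection(icd9_code: str, selection_code: str) -> bool:
    """Return True if icd9_code falls under selection_code (prefix match)."""
    return icd9_code.startswith(selection_code)


# Every subcode is captured by the 088.8 selection, so Lyme deaths cannot
# be separated from the other OSABD deaths at this level of detail.
selected = [name for code, name in OSABD_SUBCODES.items()
            if matches_selection(code, "088.8")]
print(selected)
```

Under this assumption, a count returned for 088.8 is an upper bound on Lyme deaths, which is why the study reports the category as OSABD.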
| Variables | Calculation of Variable |
|---|---|
| MS Death Incidence per 100,000 Live (1990) | Number of deaths from MS in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| MS Death Incidence per 100,000 Deaths (1998) | Number of deaths from MS in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the total number of 1998 deaths from all causes reported by DataFerrett for that geographic unit. |
| OSABD Death Incidence per 100,000 Live (1990) | Number of deaths from OSABD in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| OSABD Death Incidence per 100,000 Deaths (1998) | Number of deaths from OSABD in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the total number of 1998 deaths from all causes reported by DataFerrett for that geographic unit. |
| 1998 Lyme Incidence per 100,000 Live (1990) | Number of new Lyme cases reported by State Epidemiologists to the CDC for 1998 for that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| 1992-1998 Lyme Incidence per 100,000 Live (1990) | Total of the number of new Lyme cases reported by State Epidemiologists to the CDC for each of the years between 1992 and 1998 for that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| Breast Cancer Death Incidence per 100,000 Live (1990) | Number of deaths from Breast Cancer in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| Breast Cancer Death Incidence per 100,000 Deaths (1998) | Number of deaths from Breast Cancer in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the total number of 1998 deaths from all causes reported by DataFerrett for that geographic unit. |
| Motor Neuron Death Incidence per 100,000 Live (1990) | Number of deaths from Motor Neuron Disease in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| Motor Neuron Death Incidence per 100,000 Deaths (1998) | Number of deaths from Motor Neuron Disease in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the total number of 1998 deaths from all causes reported by DataFerrett for that geographic unit. |
| External Cause Death Incidence per 100,000 Live (1990) | Number of deaths from External Causes in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the 1990 Census population for that geographic unit. |
| External Cause Death Incidence per 100,000 Deaths (1998) | Number of deaths from External Causes in 1998 as reported by DataFerrett in that geographic unit (state, county) divided by the total number of 1998 deaths from all causes reported by DataFerrett for that geographic unit. |

Table 2. Calculation of Variables Used in the Dataset of Variables for
Data Analysis
Delaware’s
three counties each have a population over 100,000 so county-level data is
returned for all three Delaware counties.
New Jersey has twenty-one counties, but three of these counties have a
population less than 100,000. For New
Jersey, data is returned for each of eighteen individual counties and then one
number is returned for the three counties (combined) with a population of less
than 100,000.
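The reporting rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the text, not DataFerrett's actual implementation. The `County` record, the function name `aggregate_small_counties`, and the example figures are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

THRESHOLD = 100_000  # 1990 Census population cut-off described in the text


class County(NamedTuple):
    state: str
    name: str
    population_1990: int
    deaths: int


def aggregate_small_counties(counties):
    """Report large counties individually; pool each state's small counties."""
    reported = []
    pooled = defaultdict(lambda: [0, 0])  # state -> [population, deaths]
    for c in counties:
        if c.population_1990 >= THRESHOLD:
            reported.append(c)
        else:
            pooled[c.state][0] += c.population_1990
            pooled[c.state][1] += c.deaths
    # One combined record per state for all counties under the threshold.
    for state, (pop, deaths) in sorted(pooled.items()):
        reported.append(County(state, "combined <100,000", pop, deaths))
    return reported


# Made-up example: one large county and two small ones in a fictional state.
example = [
    County("XX", "Alpha", 250_000, 10),
    County("XX", "Beta", 40_000, 2),
    County("XX", "Gamma", 30_000, 1),
]
for row in aggregate_small_counties(example):
    print(row)
```

A state such as Wyoming, where no county reaches the threshold, would yield a single combined record under this sketch.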
There
are 3141 counties in the United States, but DataFerrett returns data on 504,
which includes the combined values for a state’s less-than-100,000 counties. At the county-level, the population data was
obtained from Census data available through the University of Virginia
(n.d.). County-level analyses were also
done using only those states generally considered to have a high Lyme incidence
(Lyme-State). These 123 Lyme-State
counties, which include those counties lumped together because of a
less-than-100,000 population, are in the following ten states: Connecticut, Delaware, Maine, Maryland,
Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, and
Vermont.
All
statistical calculations were done using SPSS.
Counts of disease deaths provided by the CDC were normalized by the 1990
Census population information, yielding number of deaths due to a certain
disease per 100,000 people in that state or county. See Table 2. However,
normalizing disease deaths by the number of living people in a state or county
introduced a confounding factor: that geographic unit’s demographics and
age structure. A second measurement was therefore
introduced: the number of deaths from each disease was divided by the total
deaths in each county or state (incidence of death due to a specific disease
per 100,000 deaths in that geographic unit).
See Table 2. Another confounding
factor was the exclusion of counties with fewer than 100,000 residents due to
CDC privacy policy. To account for
this, the total deaths from all of these smaller counties were distributed
proportionally across each county included in the set. This set of all the counties with fewer than
100,000 people was labeled a “super-county”.
The analysis could use these blocks in combination or independently.
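The two normalizations described above differ only in their denominator. A minimal sketch follows; the helper name `per_100k` and the example counts are made up for illustration and are not values from the study.

```python
def per_100k(count: float, denominator: float) -> float:
    """Rate per 100,000 units of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return count / denominator * 100_000


# Made-up figures for a single geographic unit.
ms_deaths_1998 = 12
population_1990 = 1_000_000
all_deaths_1998 = 9_000

# Deaths per 100,000 living residents (1990 Census population).
rate_per_living = per_100k(ms_deaths_1998, population_1990)

# Deaths per 100,000 deaths from all causes (1998), which is less
# sensitive to the age structure of the living population.
rate_per_deaths = per_100k(ms_deaths_1998, all_deaths_1998)

print(round(rate_per_living, 2), round(rate_per_deaths, 2))
```

Because the second rate compares one cause of death with all deaths in the same unit, two units with very different age profiles can still be compared on a similar footing.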
To
this data, in both the state and county files, was added the number of new Lyme
cases reported each year from 1992-1998, centroid latitude, centroid longitude,
and population elevation (the elevation of the county seat or the nearest
population center to the county seat for which there is elevation data). Centroid latitude and longitude were
averaged over all counties in a state to calculate the state value. The same method was used to calculate each
state’s population elevation. Centroid
latitude, centroid longitude, and most population elevation information were
obtained from the United States Geological Survey (USGS, n.d.). The Lyme case data was added because the death
data from DataFerrett includes more than Lyme (See Table 1). The DataFerrett category that includes Lyme
deaths is “Other Specified Arthropod Borne Diseases” in ICD-9. This category variable is named OSABD in
this study.
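The state-level geographic values described above are averages of county values. The sketch below assumes a simple unweighted mean, which is how the text reads; the function name `state_centroid` and the coordinates are hypothetical.

```python
from statistics import mean


def state_centroid(county_centroids):
    """Unweighted mean of county (latitude, longitude) pairs.

    Assumption: every county counts equally, regardless of area or
    population.
    """
    lats = [lat for lat, _ in county_centroids]
    lons = [lon for _, lon in county_centroids]
    return mean(lats), mean(lons)


# Made-up county centroids for a fictional two-county state.
print(state_centroid([(40.0, -75.0), (42.0, -73.0)]))
```

A population-weighted mean would pull the state value toward densely populated counties; the text does not indicate that any weighting was applied.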
The
number of Lyme cases in each state for the years 1992-1998 is available from
CDC publications (CDC, 2002). The
number of Lyme cases per year by county is not, however, available from the
CDC. Although the CDC publishes some
multi-year cartographic material by county, the CDC does not report
county-level, annual numerical data for a state to the public. County-level Lyme incidence data is only
available to the public by contacting each state’s department of health,
specifically, the state epidemiologist.
In this study, Lyme data available by county was subsequently compiled
to match the super-counties data available for DataFerrett death data.
The
process of obtaining Lyme incidence data by county for the years 1992 through
1998 was labor-intensive.
Each state’s Department of Health website was visited to see if the
needed Lyme data was available on the website.
If the data was not available, that state’s epidemiologist was emailed
using contact information from the Council of State and Territorial
Epidemiologists (n.d.) website provided by the CDC. Most epidemiologists contacted via email responded and provided
the necessary data. All of these
sources were recorded and the data compiled and added to the database. As of this writing, this appears to be the
most comprehensive database of Lyme in existence.
Results
Descriptive statistics
for the variables in each of the three basic datasets can be found in Table 3,
Table 4, and Table 5. As many
statistical tests assume that the data are normally distributed, each
variable’s skewness and kurtosis values and standard errors were examined. A normally distributed variable has a value
of 0 for both skewness (a measure of symmetry) and kurtosis (a measure of
clustering around a central point). If
the ratio of the skewness value to its standard error is between –2 and +2,
then the distribution is symmetrical (normal).
If the ratio of the kurtosis value to its standard error is between –2
and +2, then the data are normally distributed (SPSS, 2003; Norusis, 2003).
Few of the variables are normally distributed. In the State-Level variables, only MS
Death Incidence per 100,000 Live (1990), MS Death Incidence per 100,000 Deaths
(1998), Motor Neuron Death Incidence per 100,000 Live (1990), Motor Neuron
Death Incidence per 100,000 Deaths (1998), and External Cause Death Incidence
per 100,000 Live (1990) are normally distributed. In the Lyme-State County Level (Population >= 100,000)
variables, only MS Death Incidence per 100,000 Live (1990) and Breast Cancer
Death Incidence per 100,000 Deaths (1998) are normally distributed.
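The ratio rule used above can be written out as a short sketch. The standard-error formulas below are the commonly used small-sample expressions and are assumed here to correspond to what SPSS reports; for N = 51 they round to the 0.3 and 0.7 shown in Table 3. The function names are illustrative.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness (small-sample formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis (small-sample formula)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def looks_normal(skew: float, kurt: float, n: int, limit: float = 2.0) -> bool:
    """Apply the rule of thumb: both ratios must lie between -limit and +limit."""
    skew_ratio = skew / se_skewness(n)
    kurt_ratio = kurt / se_kurtosis(n)
    return abs(skew_ratio) < limit and abs(kurt_ratio) < limit


# State-level values taken from Table 3 (N = 51).
print(looks_normal(0.2, 0.5, 51))   # MS Death Incidence per 100,000 Live
print(looks_normal(0.8, -0.5, 51))  # OSABD Death Incidence per 100,000 Live
```

Because the published skewness and kurtosis values are rounded to one decimal place, ratios close to the ±2 boundary could fall on either side when recomputed from the raw data.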
The
next step in the analysis was a correlation analysis. Calculating a Pearson correlation coefficient (r) is appropriate
for variables that are normally distributed
(SPSS, 2003, page 379). Calculating
a Kendall’s tau-b or Spearman’s rho is appropriate when the data are not
normally distributed. Because Pearson’s r
assumes a linear relationship between the
variables (the rank-based coefficients require only a monotonic one), a scatterplot graph was constructed for each pair of variables to be
analyzed. Each scatterplot was linear
so a Pearson’s, Kendall’s, or Spearman’s coefficient was calculated as
appropriate for pairs of variables in each of the three datasets. The results can be seen in Table 6, Table 7,
and Table 8.
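The choice between a parametric and a rank-based coefficient can be sketched without any statistics package. The code below is an illustration written for this purpose, not the SPSS procedure used in the study; it implements Pearson's r directly and Spearman's rho as Pearson's r on average ranks. The example data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: perfectly monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

On skewed variables, the rank-based coefficient is less affected by a few extreme counties or states, which is why it is preferred when the normality check fails.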
Multiple regression was also used to find the model
that would best predict the MS Death Incidence per 100,000 Deaths. All variables contained in the dataset were
entered into the regression analysis using the stepwise feature. All variable values were converted to
z-scores for use in the regression analysis.
These results can be seen in Table 9.
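The z-score conversion mentioned above can be illustrated as follows. This sketch does not implement stepwise selection or reproduce the SPSS output; it only shows the standardization step, assuming the sample standard deviation (n − 1 denominator), and the slope of a one-predictor regression on standardized variables. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def standardized_slope(x, y):
    """Least-squares slope of z(y) on z(x) for a single predictor.

    With both variables standardized, this slope equals Pearson's r.
    """
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


# Made-up data with an exact linear relationship (y = 2x + 1).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
print(round(standardized_slope(x, y), 6))
```

Standardizing puts predictors measured in different units, such as latitude and incidence rates, on a common scale so that their regression coefficients can be compared directly.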
Lastly, cartographic analyses were completed. These can be seen in Figure 1, Figure 2, and Figure 3. They show the normalized distribution of MS
Deaths, OSABD Deaths, and External Causes Deaths, respectively.
| Dataset of State-Level Disease and Geographic Variables | N | Min | Max | Mean | Std. Dev. | Skewness Value | Skewness Std. Err. | Kurtosis Value | Kurtosis Std. Err. |
|---|---|---|---|---|---|---|---|---|---|
| MS Death Incidence per 100,000 Live (1990) | 51 | 0.1 | 2.0 | 1.1 | 0.4 | 0.2 | 0.3 | 0.5 | 0.7 |
| MS Death Incidence per 100,000 Deaths (1998) | 51 | 12.4 | 219.6 | 112.8 | 43.7 | 0.3 | 0.3 | -0.1 | 0.7 |
| OSABD Death Incidence per 100,000 Live (1990) | 51 | 1.5 | 7.2 | 3.6 | 1.6 | 0.8 | 0.3 | -0.5 | 0.7 |
| OSABD Death Incidence per 100,000 Deaths (1998) | 51 | 159.0 | 803.9 | 385.0 | 166.6 | 0.9 | 0.3 | 0.1 | 0.7 |
1998 Lyme
Incidence per 100,000 Live (1990)
|
51
|
|