Geospatial analysis of reported activity locations to identify TB testing sites

Study design

We conducted a case-control study comparing places where TB patients usually stayed before their diagnosis with places where people who live in the same community but have a low risk of TB spent time. The cases included adults with pulmonary TB, as these are the people most easily diagnosed through mobile TB screening units. The low-risk control group included individuals from the same communities with no known exposure to TB and no TB infection. Our goal was to recruit 110 patients and 80 control group participants to give us 80% power at an alpha level of 0.05 to detect a 20% difference in location-based exposure between groups.
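The sample size reasoning above can be illustrated with a rough power calculation. The sketch below uses the normal approximation for comparing two independent proportions; it is not necessarily the calculation the study team performed. It assumes the "20% difference" means 20 percentage points, and the baseline exposure proportions among controls are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def power_two_proportions(p_cases, p_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with unpooled variance and ignores the
    (negligible) probability of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = (p_cases * (1 - p_cases) / n_cases
          + p_controls * (1 - p_controls) / n_controls) ** 0.5
    return std_normal.cdf(abs(p_cases - p_controls) / se - z_crit)


# Hypothetical baseline exposure proportions among controls (assumption:
# the source does not report the baseline used in its calculation).
for p_controls in (0.2, 0.3, 0.4, 0.5):
    p_cases = p_controls + 0.20  # assumed absolute difference of 20 points
    power = power_two_proportions(p_cases, p_controls, n_cases=110, n_controls=80)
    print(f"controls {p_controls:.0%}, cases {p_cases:.0%}: power = {power:.2f}")
```

Because the variance of a proportion depends on its value, the achieved power varies with the assumed baseline, which is why the sketch loops over several values rather than reporting a single number.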


Study setting

Peru is a middle-income country with an estimated TB incidence of 116 per 100,000 population, the highest in Latin America [1]. The metropolitan area of the capital, Lima, has approximately 10 million inhabitants and is divided into five major regions: Central, North, East, and South Lima, and Callao (Supplementary Fig. S1) [14,15]. Indicators of urbanization and socioeconomic status are higher in Central Lima than in the other regions [14,15]. This study focuses on residents of the Carabayllo district in the Lima North region. Carabayllo is served by 12 public health facilities run by the Ministry of Health, which divide the district into defined catchment areas; TB patients receive treatment at the health facility in whose catchment area they reside.

This study was implemented during the COVID-19 pandemic in Peru. The first case of COVID-19 in Peru was diagnosed in March 2020, and a nationwide lockdown was imposed that month [16]. Overall population mobility decreased significantly during the initial lockdown and then gradually increased throughout the year, although various measures to reduce mobility and social contact (e.g., school closures, transport restrictions) continued to be implemented [17]. The first wave of COVID-19 peaked in August 2020 and the second wave peaked in March–April 2021 [18].

Recruitment of TB patients

We recruited adult patients (≥ 18 years old) with pulmonary TB from 5 centrally located health facilities with contiguous catchment areas between October 2020 and June 2021. Together, these catchment areas cover 17 km², hereinafter referred to as the study area. The estimated population of the study area was approximately 178,000 based on the 2017 census [19]. Patients were recruited within one month of starting treatment. There were no exclusion criteria related to whether the diagnosis was bacteriologically confirmed, the patient’s treatment history, or drug resistance. During the recruitment period, patients were mainly diagnosed by passive case finding, since all active community case-finding activities had been suspended during the pandemic. Although active case-finding activities gradually resumed during the last three months of recruitment, we did not distinguish between patients diagnosed by active and passive case finding.

Recruitment of the control group

Our main consideration when choosing a control group was that it should represent low-risk people who live in the same communities as TB patients, because these are the people whom community screening programs want to avoid screening. An ideal control group would have been a random sample of individuals from the same communities as TB patients who had never had TB and would never develop TB in their lifetime. Since it is impossible to know who will develop TB in the future, we attempted to select people at low risk of TB based on the absence of a history of TB or contact with TB, as well as the absence of Mycobacterium tuberculosis infection.

Adults who lived in the study area were eligible for recruitment into the control group. Enrollment took place from December 2020 to March 2021. Participants in the control group were recruited via IRB-approved posters and flyers distributed throughout the study area. As enrollment of the control group progressed, the study team targeted flyer distribution to address any gender and age imbalances. Those interested in enrolling were screened for a history of TB or previous close contact with a friend or family member with TB. Those who reported neither were enrolled, and a tuberculin skin test (TST, the only widely available test for TB infection in Peru) was administered. Participants with a negative TST result (induration

Data gathering

Study personnel familiar with the study area administered a structured survey to participants. The survey asked participants about the living, working, educational, and social places where they had spent substantial time during the 6 months preceding diagnosis (for patients) or preceding the survey (for control group participants). A 6-month recall period was used given the months-long diagnostic delays previously reported in Lima [20], and because responses reflecting this longer period would be less affected by periodic COVID-19-related lockdowns. Living locations included places where participants had officially resided as well as other residences where they had slept at night or spent most of the day (e.g., staying with family members). Social places were places outside homes or workplaces where a person spent more than 5 hours per week. For each reported location, study staff recorded enough information to identify its approximate position on a map; locations were described using combinations of streets and landmarks or using a local street and block system, as many places lack numerical addresses. Thus, mapped locations are likely to be accurate to the level of a city block.

Participants were also asked about their use of public transport, health facilities, and congregate settings, as these are the types of places that could be served by a mobile TB screening unit. We asked how often participants had used public transport in the past 6 months and which modalities they commonly used. Modalities included a government-operated rapid transit network, independently operated minibuses and buses that follow set routes and stops, and taxis or moto-taxis (auto-rickshaws) that are individually hired for transportation to the customer’s preferred destination. We asked which health facilities participants normally used, not limited to the past 6 months. We then classified the facilities into the following categories: public facilities under the Ministry of Health, public health facilities serving people with employer-based health insurance, public-private partnership hospitals run by the municipality of Lima, and private-sector facilities. Finally, we asked participants whether they had ever spent time in prisons or rehabilitation centers, or had lived in another place with communal living conditions, such as a dormitory.

We did not collect information on individual characteristics or risk factors, as our aim was to identify where it would be useful to place screening units, not to identify individual risk factors for TB. Individual risk factors for TB are well established and include medical comorbidities (e.g., HIV, diabetes) and social determinants (e.g., income, housing instability). We know that TB patients are likely to differ from their low-risk neighbors in these factors, and some of these factors may be correlated with, or lead to, differences in where people spend time. We did not want to control for these risk factors as potential confounders because the programmatically useful outcome for this analysis is the difference in activity locations between the two groups, regardless of the underlying cause.


Analysis

The study staff who conducted the surveys mapped the approximate locations of the reported living, working, educational, and social places in Google Maps. Coordinates were exported from Google Maps and overlaid on a study area shapefile using ArcGIS Pro Version 2.8.0 (Environmental Systems Research Institute, Redlands, CA, USA). We used hierarchical density-based spatial clustering of applications with noise (HDBSCAN), specifying a minimum cluster size of 5, to explore whether the locations that individuals reported formed consistent clusters. We chose HDBSCAN because it can identify clusters of arbitrary shape and can flag outliers. It requires a single input parameter, the minimum cluster size, which can help identify clusters missed by other density-based clustering methods that rely on a predefined distance threshold.
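To show why HDBSCAN does not need a predefined distance threshold, the sketch below computes the mutual reachability distance, the quantity on which the algorithm builds its cluster hierarchy. This is only the first step of HDBSCAN, not the full algorithm and not the ArcGIS Pro workflow used in the study. The coordinates are made up for illustration, and the neighbor count `k` is an assumed setting (implementations differ on whether a point counts itself as a neighbor; here it does not).

```python
import math


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(haversine_m(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mutual_reachability(points, k):
    """Pairwise max(core(a), core(b), d(a, b)); zero on the diagonal."""
    core = core_distances(points, k)
    n = len(points)
    return [[0.0 if i == j
             else max(core[i], core[j], haversine_m(points[i], points[j]))
             for j in range(n)] for i in range(n)]


# Hypothetical coordinates: five locations within about a block of each
# other and one isolated location a few kilometres away.
points = [
    (-11.9000, -77.0300), (-11.9001, -77.0301), (-11.9002, -77.0299),
    (-11.8999, -77.0302), (-11.9001, -77.0298),
    (-11.8800, -77.0100),
]
core = core_distances(points, k=4)
mr = mutual_reachability(points, k=4)
print([round(c) for c in core])
```

Points in dense areas have small core distances, while isolated points have large ones, so sparse points are pushed away from everything else and tend to end up labeled as noise. For a complete clustering, an existing implementation such as scikit-learn's `sklearn.cluster.HDBSCAN` with `min_cluster_size=5` would be the more practical choice.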

We used logistic regression to assess differences between location-based exposures reported by TB patients and control group participants, adjusting for gender. Analysis was performed using SAS v9.4 (SAS Institute, Cary, NC, USA).
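As an illustration of this type of model, the sketch below fits a logistic regression of case status on a binary location-based exposure and gender using Newton's method, written in plain Python rather than SAS. The counts are synthetic and were constructed so that the exposure odds ratio is 4 in both gender strata; they are not study data, and the function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(cells, iterations=25):
    """Fit logit P(case) = b0 + b . covariates by Newton's method.

    cells: list of (covariates, n_cases, n_controls) for each covariate pattern.
    Returns the coefficients with the intercept first.
    """
    k = len(cells[0][0]) + 1
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for covariates, cases, controls in cells:
            x = [1.0, *covariates]
            n = cases + controls
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += x[i] * (cases - n * p)
                for j in range(k):
                    hess[i][j] += n * p * (1 - p) * x[i] * x[j]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Synthetic counts (not study data): ((exposed, male), cases, controls).
cells = [
    ((1, 0), 20, 10), ((0, 0), 10, 20),  # women: OR = (20*20)/(10*10) = 4
    ((1, 1), 40, 10), ((0, 1), 20, 20),  # men:   OR = (40*20)/(10*20) = 4
]
beta = fit_logistic(cells)
adjusted_or = math.exp(beta[1])  # gender-adjusted odds ratio for the exposure
print(f"adjusted OR for exposure: {adjusted_or:.2f}")
```

In practice a statistical package would also report standard errors and confidence intervals, which can be obtained from the inverse of the Hessian at convergence.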

Ethics statement

This study was conducted in accordance with US Health and Human Services regulations for the protection of human subjects (45 CFR 46). All participants were enrolled with written informed consent. This study was approved by the Mass General Brigham Institutional Review Board (protocol 2019P003679) and the ethics committee of the Universidad Peruana Cayetano Heredia (protocol 19022).