
There has long been a demand for cancer incidence data at a fine geographic resolution for use in etiologic hypothesis generation and testing, methodological evaluation, and teaching. In this paper we describe a public-domain data set containing incidence data for cancers at 23 anatomic sites diagnosed in New York State between 2005 and 2009, reported at the level of the census block group. The data set includes 524,503 tumors distributed across 13,823 block groups, which have an average population of about 1,400. In addition, the data have been linked with race and ethnicity and with socioeconomic indicators such as income, educational attainment, and language proficiency. We demonstrate the application of the data set by confirming two well-established relationships: that between breast cancer and median household income, and that between stomach cancer and Asian race. We anticipate that this data set will support a wide range of spatial analyses and serve as a benchmark for evaluating spatial methods.


Pre-print version of an article scheduled to be published in Geospatial Health (May 2016)

Included in

Epidemiology Commons



Terms of Use

This article is made available under the Scholars Archive Terms of Use.