4.4: Established Data
Many studies take advantage of established datasets that are available either publicly or with permission from the owners of the data. In this way researchers can employ the resources of much larger organizations for their studies.
The United States federal government constantly gathers large amounts of data on the economic conditions, education, crime, and health of those living within the United States and its territories. Much of this data can be accessed quite easily, though some observations are blinded to protect the privacy of those included in the data. For example, names, Social Security numbers, and other specific information that would identify residents are not available. Nonetheless, these databases provide invaluable information about the individuals in the population. The data has been gathered by dedicated professionals using the vast resources and infrastructure available to the federal government. Its availability allows individuals and researchers to draw on the government's expertise and resources for their own research.
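To make the idea of blinding concrete, the following is a minimal Python sketch of removing direct identifiers from records before release. The field names (`name`, `ssn`, `state`, `household_size`), the helper functions, and the sample records are all hypothetical and made up for illustration; they are not taken from any actual federal dataset or agency procedure.

```python
# Hypothetical direct-identifier fields; real public-use files define their own.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def blind_record(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of one record with the direct-identifier fields removed."""
    return {key: value for key, value in record.items() if key not in identifiers}


def blind_dataset(records, identifiers=DIRECT_IDENTIFIERS):
    """Blind every record; the original records are left unchanged."""
    return [blind_record(record, identifiers) for record in records]


# Made-up observations, for illustration only.
sample = [
    {"name": "Resident A", "ssn": "000-00-0000", "state": "OH", "household_size": 3},
    {"name": "Resident B", "ssn": "000-00-0001", "state": "TX", "household_size": 1},
]

public_view = blind_dataset(sample)
for row in public_view:
    print(row)
```

Dropping names and numbers is only the simplest form of blinding. The remaining fields can sometimes identify a person when combined with other information, which is why agencies may also suppress or coarsen some observations.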
The United States Census Bureau holds a wealth of information collected every ten years for the constitutionally mandated census. Much of the collected data is available in summaries, reports, and dynamic data tools, which can be found at www.census.gov. Observation-level data is also available to academic and research organizations that request a special key and use specialized data extraction software. It is important to note that some of the data is censored so that individuals cannot be identified from the data that is supplied. As mandated by federal law, the Census Bureau website states the following:
Federal law [13 U.S.C. §§ 8 and 9] requires the Census Bureau to publish data in a manner that does not permit use of the data alone, or in combination with other available information, to identify any particular respondent to a Census Bureau survey. The identification of a person or establishment or disclosure of information collected from that person or establishment in the course of a Census Bureau survey violates the assurances of confidentiality provided by federal law.
These types of restrictions are rarely prohibitive in research studies, as researchers rarely need to identify specific individuals in such a population. Individual decennial census records are released to the public 72 years after they are collected.
Additionally, the United States government runs the data.gov website, which is a collection of data that is available to anyone. The website was launched in late May 2009 by the Federal Chief Information Officer of the United States, Vivek Kundra, with the purpose of improving public access to datasets generated by the executive branch of the federal government. By 2022, the site listed almost 350,000 datasets that could be accessed online, including data from the federal government, state governments, city governments, county governments, and universities. Among the federal government data, agencies such as NOAA (National Oceanic and Atmospheric Administration), NASA (National Aeronautics and Space Administration), the Department of Commerce, and the Department of the Interior have data accessible from the site.
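As a rough illustration of how a researcher might search such a catalog programmatically, the sketch below builds a search URL and pulls dataset titles out of a JSON response. It assumes that the data.gov catalog exposes a CKAN-style search endpoint at `catalog.data.gov/api/3/action/package_search` and that responses follow the CKAN layout of `result` containing `results`. Both are assumptions that should be checked against the site's current documentation. The function names and the query are hypothetical, and the code does not contact the network.

```python
from urllib.parse import urlencode

# Assumed endpoint (CKAN-style package search); verify before relying on it.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Build a catalog search URL for a free-text query (no request is sent)."""
    params = {"q": query, "rows": rows}
    return f"{CATALOG_SEARCH_URL}?{urlencode(params)}"


def dataset_titles(response):
    """Extract dataset titles from an already-parsed CKAN-style JSON response."""
    results = response.get("result", {}).get("results", [])
    return [item["title"] for item in results]


# Hypothetical query, for illustration only.
url = build_search_url("sea surface temperature")
print(url)
```

The URL could then be fetched with any HTTP client and the parsed JSON passed to `dataset_titles`. Keeping URL construction separate from the request makes the sketch easy to check without network access.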
Using government data of this sort has many advantages for researchers. The data is free to access and has been vetted by government agencies to ensure quality. For many studies, much of the data comes from surveys performed by the Department of Education, the Department of Justice, and the Census Bureau. The Department of Education has a specific section dedicated to civil rights data that includes district-level demographic data. Another section is dedicated to data on student outcomes. The National Center for Education Statistics was created as a part of the Department of Education by congressional mandate to “collect, collate, analyze, and report complete statistics on the condition of American education” (nces.ed.gov/about). The center's website contains additional educational data along with a tool that can link researchers from a dataset to published and peer-reviewed research articles that have analyzed that data.
The U.S. Department of Justice is a large government entity that maintains data through its agencies such as the Bureau of Justice Statistics, Federal Bureau of Investigation, Federal Bureau of Prisons, National Institute of Justice, Office of Juvenile Justice and Delinquency Prevention, and the U.S. Trustee Program. A complete list of agencies, many of which provide data, can be found at justice.gov. These resources are valuable to researchers studying issues related to crime and the judicial system.
On the Bureau of Justice Statistics website (bjs.ojp.gov), data is available by topic, including corrections, courts, the federal justice system, and victims of crime. For example, under the topic of victims of crime, the site includes a city-level survey of crime victimization and citizens’ attitudes, a national survey of victim service providers, and emergency room statistics on intentional violence. Reports can also be found on the site.
The Federal Bureau of Investigation (FBI) hosts the Uniform Crime Reporting (UCR) Program, which can be used to generate reliable crime statistics (www.fbi.gov/services/cjis/ucr). The UCR Program includes data from more than 18,000 city, university and college, county, state, tribal, and federal law enforcement agencies that voluntarily submit crime data. The databases include information on hate crimes, use of force, and suicide.
The Department of Commerce (data.commerce.gov) maintains a site for data from entities including the Bureau of Economic Analysis, the Bureau of Industry and Security, the Economic Development Administration, and the Minority Business Development Agency. The data on this site covers all types of economic activity.
Other established data may be available through universities or other nonprofit academic agencies. For example, journals increasingly make data availability a prerequisite for the publication of research articles. This is due in part to the idea that the research community should be able to verify the results of a research study, which helps maintain the integrity of the research area. Furthermore, there has been an effort to make publicly funded research data available to the public. As with census data, these datasets may censor some observations to protect the privacy of those who participated in a study or survey.

