Statistics LibreTexts

3.10: Working with Data Files


    When we are doing statistics, we often need to load in the data that we will analyze. Those data will live in a file on one’s computer or on the internet. For this example, let’s use a file that is hosted on the internet, which contains the gross domestic product (GDP) values for a number of countries around the world. This file is stored as comma-delimited text, meaning that the values for each of the variables in the dataset are separated by commas. There are three variables: the relative rank of the countries, the name of the country, and its GDP value. Here is what the first few lines of the file look like:


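    We can load a comma-delimited text file into R using the read.csv() function, which accepts either the location of a file on one’s computer or a URL for a file located on the web. In the command below, url is assumed to be a variable that already holds the web address of the file as a character string: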

    gdp_df <- read.csv(url)
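    As a minimal sketch of the same call on a local file, the example below writes a tiny comma-delimited file to a temporary location and reads it back with read.csv(). The rows and the column names Rank and Country are invented for illustration and are not taken from the GDP file described above; only the GDP column name follows the text.

```r
# Hypothetical three-row file, invented purely for illustration;
# these are NOT values from the GDP file described in the text.
csv_lines <- c(
  "Rank,Country,GDP",
  "1,Country A,300",
  "2,Country B,200",
  "3,Country C,100"
)

# Write the lines to a temporary file so the example is self-contained.
local_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, local_path)

# read.csv() takes the file location as its first argument;
# a URL string could be passed in the same position.
example_df <- read.csv(local_path)

print(example_df)
```

    The first line of the file is treated as the header, so each of the three variables becomes a named column in the resulting data frame.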

    Once you have done this, take a look at the data frame using the View() function, and make sure that it looks right — it should have a column for each of the three variables.
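    View() opens an interactive viewer, so it is mostly useful inside RStudio. As a rough sketch of how the same check could be done at the console, the functions below report the column names, the number of columns, and the first rows. The data frame here is a hypothetical stand-in with made-up values, not the real GDP data.

```r
# Hypothetical stand-in for the loaded data frame (made-up values).
example_df <- data.frame(
  Rank = 1:3,
  Country = c("Country A", "Country B", "Country C"),
  GDP = c(300, 200, 100)
)

# Names and number of columns: we expect one column per variable.
col_names <- names(example_df)
n_cols <- ncol(example_df)
print(col_names)
print(n_cols)

# First rows, and a compact summary of each column's type.
print(head(example_df, n = 2))
str(example_df)
```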

    Let’s say that we wanted to create a new file, which contained GDP values in Euros rather than US Dollars. We use today’s exchange rate, which is 1 USD = 0.90 Euros. To convert from Dollars to Euros, we simply multiply the GDP values by the exchange rate, and assign those values to a new variable within the data frame:

    > exchange_rate <- 0.9
    > gdp_df$GDP_euros <- gdp_df$GDP * exchange_rate
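    The multiplication works on the whole column at once, because arithmetic in R is applied element by element. A small sketch on a hypothetical two-row data frame (the country names and GDP values are made up) shows the effect:

```r
exchange_rate <- 0.9

# Hypothetical data frame with made-up values, for illustration only.
example_df <- data.frame(
  Country = c("Country A", "Country B"),
  GDP = c(100, 250)
)

# Each GDP value is multiplied by the single exchange rate,
# and the result is stored as a new column.
example_df$GDP_euros <- example_df$GDP * exchange_rate

print(example_df)
```

    Here a GDP of 100 dollars becomes 90 euros and a GDP of 250 dollars becomes 225 euros, while the original GDP column is left unchanged.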

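    You should now see a new variable within the data frame, called “GDP_euros”, which contains the new values. Now let’s save this to a comma-delimited text file on our computer called “gdp_euro.csv”. We do this using the write.table() command. Because write.table() separates values with spaces by default, we set sep=',' to get commas, and row.names=FALSE to leave out the column of row names that would otherwise be written.

    > write.table(gdp_df, file='gdp_euro.csv', sep=',', row.names=FALSE)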
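    One way to confirm that a saved file really is comma-delimited is to read it back in. The sketch below uses write.csv(), a wrapper around write.table() that sets the comma separator, on a hypothetical data frame with made-up values, and writes to a temporary directory rather than the working directory so that nothing is overwritten.

```r
# Hypothetical data frame with made-up values, for illustration only.
example_df <- data.frame(
  Country = c("Country A", "Country B"),
  GDP = c(100, 250),
  GDP_euros = c(90, 225)
)

# Save to a temporary location (an assumption made for this example).
out_path <- file.path(tempdir(), "gdp_euro_example.csv")
write.csv(example_df, file = out_path, row.names = FALSE)

# Look at the raw text: a header line plus one line per row.
raw_lines <- readLines(out_path)
print(raw_lines)

# Read the file back and compare it with what we saved.
check_df <- read.csv(out_path)
print(check_df)
```

    If the column names and values in check_df match the original data frame, the file was written in the expected format.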

    This file will be created in the working directory that RStudio is using. You can find this directory using the getwd() function:

    > getwd()
    [1] "/Users/me/MyClasses/Psych10/LearningR"
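    To see exactly where the saved file should end up, the working directory can be combined with the file name. This is a small sketch; whether the file exists depends on whether the write step has been run in the current working directory.

```r
# The directory R is currently working in.
working_dir <- getwd()

# Build the full path that a file named "gdp_euro.csv" would have there.
expected_path <- file.path(working_dir, "gdp_euro.csv")
print(expected_path)

# TRUE if the file has already been written to this directory, FALSE otherwise.
print(file.exists(expected_path))
```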

    3.10: Working with Data Files is shared under a CC BY-NC 2.0 license and was authored, remixed, and/or curated by Russell A. Poldrack via source content that was edited to conform to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.