2.3.1.2 How to Read CSV Files from Data Boutique

Updated by Andrea Squatrito

Data Boutique offers datasets for download in CSV format. These datasets, whether samples or full files, can be read in spreadsheet applications, text editors, and most programming environments. This guide walks you through the most common ways to open and use CSV files from Data Boutique.

What is a CSV File?

CSV (Comma Separated Values) is a straightforward file format for storing tabular data, such as spreadsheets or databases. Each line in the CSV file represents a data record, with fields separated by delimiters.

Important Note: Semicolon-Separated Values

Data Boutique uses semicolon-separated values (;) instead of commas as delimiters. This format is common in many European countries. If you typically work with comma-separated files, adjust your settings when importing to correctly read semicolon-separated values.
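
As a minimal sketch of what "adjusting the delimiter" means in code, the example below reads semicolon-separated text with Python's standard csv module. The sample data, its column names, and the helper name read_semicolon_csv are made up for illustration and are not taken from an actual Data Boutique file.

```python
import csv
import io

# Hypothetical sample in the semicolon-separated layout described above;
# the column names and values are invented for illustration only.
SAMPLE = (
    "product_code;title;price\n"
    'A100;"Bag; leather";129.90\n'
    "B200;Scarf;45.00\n"
)


def read_semicolon_csv(text_stream):
    """Return the records of a semicolon-separated stream as a list of dicts."""
    # delimiter=";" tells the parser to split fields on semicolons.
    # Quoted fields may still contain a semicolon without being split.
    reader = csv.DictReader(text_stream, delimiter=";")
    return list(reader)


rows = read_semicolon_csv(io.StringIO(SAMPLE))
for row in rows:
    print(row["product_code"], row["title"], row["price"])
```

For a file on disk, the same function should work with a handle opened as open(path, newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.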

Tools You Can Use to Read CSV Files

There are several tools and software applications you can use to read and analyze CSV files:

  1. Spreadsheet Applications: Microsoft Excel, Google Sheets, LibreOffice Calc.
  2. Text Editors: Notepad++, Sublime Text, Atom (useful for small files or quick checks).
  3. Programming Languages: Python, R, JavaScript, and more.

Opening Semicolon-Separated CSV Files in Spreadsheet Applications

Microsoft Excel

  1. Open Excel.
  2. Go to File > Open and select your downloaded CSV file.
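  3. If the Text Import Wizard appears, select Delimited, and choose Semicolon as the delimiter.
  4. Complete the import to load your file with the correct structure. If Excel opens the file directly and shows each row in a single column, go to Data > From Text/CSV instead, select the file, and set the delimiter to Semicolon.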

Google Sheets

  1. Open Google Sheets in your browser.
  2. Go to File > Import and upload your CSV file.
  3. In the import settings, choose Custom as the separator type and enter a semicolon (;), then click Import Data.

LibreOffice Calc

  1. Open LibreOffice Calc.
  2. Go to File > Open and select your CSV file.
  3. In the Import Options, check Separated by and select Semicolon as the separator. Click OK to import the data.

Reading Semicolon-Separated CSV Files with Programming Languages

Python (using pandas)

Using Python’s pandas library is efficient for reading large CSV files.

import pandas as pd

# Load the CSV file into a DataFrame
data = pd.read_csv('path/to/your/file.csv', delimiter=';')

# Display the first few rows of the DataFrame
print(data.head())

R

R also supports easy CSV reading, particularly for data analysis tasks.

# Load the CSV file into a data frame
data <- read.csv('path/to/your/file.csv', sep=';')

# Display the first few rows of the data frame
head(data)

JavaScript (using Node.js)

With Node.js, you can use the readline module to handle CSV files line by line.

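const fs = require('fs');
const readline = require('readline');

// Stream the file so it is never loaded into memory all at once
const fileStream = fs.createReadStream('path/to/your/file.csv');
const rl = readline.createInterface({
  input: fileStream,
  crlfDelay: Infinity // treat \r\n as a single line break
});

// Split each line on semicolons and print the resulting fields.
// Note: a plain split does not handle quoted fields that contain semicolons.
rl.on('line', (line) => {
  const fields = line.split(';');
  console.log(fields);
});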

Common Issues and Troubleshooting

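  1. Encoding Problems: If you notice unusual characters, the file is probably being read with the wrong encoding (for example, a UTF-8 file opened as Windows-1252). Re-import the file and select the correct encoding, or open it in a text editor and save it in the encoding your application expects.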
  2. Incorrect Delimiters: Ensure that you specify the semicolon (;) as the delimiter when importing the file.
  3. Large Files: Spreadsheet applications limit how many rows they can load (Excel stops at 1,048,576 rows per sheet). For very large CSV files, programming languages like Python or R are recommended, as they handle large data volumes more efficiently.
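
The encoding and large-file issues above can be sketched together in Python using only the standard library. This is an illustration under stated assumptions, not a definitive implementation: the helper names detect_encoding and iter_rows are hypothetical, and the candidate encodings (UTF-8 first, then Windows-1252) are a guess about what a file might use, not a statement about how Data Boutique encodes its files.

```python
import csv
import os
import tempfile


def detect_encoding(path, candidates=("utf-8-sig", "cp1252")):
    """Return the first candidate encoding that decodes the whole file."""
    for encoding in candidates:
        try:
            with open(path, encoding=encoding, newline="") as handle:
                for _ in handle:  # decode line by line to keep memory use low
                    pass
            return encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file; try the next
    raise ValueError(f"none of {candidates} could decode {path}")


def iter_rows(path, delimiter=";"):
    """Yield one dict per record without loading the whole file into memory."""
    encoding = detect_encoding(path)
    with open(path, encoding=encoding, newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


# Demo on a small made-up file saved as Windows-1252 (not real data).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "wb") as out:
        out.write("name;city\nJosé;Málaga\n".encode("cp1252"))
    print(detect_encoding(sample_path))
    for record in iter_rows(sample_path):
        print(record)
```

Because iter_rows is a generator, records are processed one at a time, so memory use stays roughly constant regardless of file size. A file that decodes as Windows-1252 may still be in another single-byte encoding, so it is worth checking a few accented values by eye.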

With these tools and tips, you’ll be able to read and work with CSV files from Data Boutique efficiently, ensuring your data is correctly formatted and ready for analysis.

