How to Hash Data in a CSV File Using Python and Pandas (with Excluded Columns)

Muhammad Umer Shaikh
June 13, 2023


Introduction

In today’s data-driven world, protecting sensitive data is of utmost importance. One commonly used technique is hashing. A hash function transforms input of any length into a fixed-length string of characters, and a cryptographic hash function such as SHA-256 is designed so that the original input cannot be computed back from that output. In this article, we’ll explore how to hash data in a CSV file using Python and Pandas while excluding specific columns. By hashing sensitive columns, we can hide their original values while keeping them useful: because identical inputs always produce identical hashes, the hashed columns can still be counted, grouped, and joined.
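As a quick illustration of the fixed-length property, the sketch below hashes inputs of very different lengths with SHA-256 from Python's standard hashlib module. The helper name sha256_hex is our own, and the digest checked for "abc" is the standard published SHA-256 test vector, so it doubles as a sanity check.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Inputs of very different lengths all produce 64-character digests
for text in ["a", "Jane Doe", "jane.doe@example.com" * 50]:
    digest = sha256_hex(text)
    print(f"input length {len(text):>4} -> digest length {len(digest)}")

# Standard SHA-256 test vector for the input "abc"
assert sha256_hex("abc") == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# The same input always yields the same digest; a one-character change does not
assert sha256_hex("Jane Doe") == sha256_hex("Jane Doe")
assert sha256_hex("Jane Doe") != sha256_hex("Jane Doe.")
```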

Prerequisites

Before we begin, make sure you have the following installed on your system:

  1. Python: The programming language we’ll use to implement the hashing process.
  2. Pandas: A powerful library for data manipulation and analysis in Python.

Step 1: Importing Libraries

Let’s start by importing the necessary libraries: pandas for data handling and hashlib, which is part of Python’s standard library, for performing the hashing operation.

import pandas as pd
import hashlib

Step 2: Reading the CSV File

Next, we need to read the CSV file into a DataFrame using Pandas. Assuming your CSV file is named “data.csv”, the following code accomplishes this:

# Path to your CSV file
csv_file_path = "data.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)
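One detail worth checking at this step: by default, read_csv infers column types, so a value such as "02134" is parsed as the integer 2134 and loses its leading zero. If such a column is hashed, the digest is computed from "2134" rather than the text in the file; if it is excluded, the saved file no longer matches the input. The sketch below uses a small made-up CSV held in memory (io.StringIO) to show the difference. Passing dtype=str keeps every value exactly as it appears in the file, while empty cells are still read as missing.

```python
import io

import pandas as pd

# A small in-memory stand-in for data.csv (hypothetical sample data)
sample_csv = io.StringIO(
    "name,email,zip,country\n"
    "Jane Doe,jane@example.com,02134,US\n"
    "Ali Khan,ali@example.com,75500,PK\n"
)

# Default parsing infers types: "02134" becomes the integer 2134
inferred = pd.read_csv(sample_csv)
print(inferred["zip"].tolist())  # [2134, 75500]

# Rewind the buffer and read every column as text instead
sample_csv.seek(0)
as_text = pd.read_csv(sample_csv, dtype=str)
print(as_text["zip"].tolist())  # ['02134', '75500']
```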

Step 3: Excluding Columns from Hashing

If there are specific columns that should remain readable (for example, non-sensitive fields you still need for filtering or grouping), list them so they are skipped during hashing. For instance, let’s exclude the “zip” and “country” columns:

# Define the list of columns to exclude from hashing
exclude_columns = ["zip", "country"]
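Because hashing cannot be undone, a typo in exclude_columns (for example "Zip" instead of "zip") would silently hash a column you meant to keep. A small guard such as the hypothetical columns_to_hash helper below checks the names against the DataFrame first and returns the remaining columns in their original order. The sample DataFrame is made up for illustration.

```python
import pandas as pd


def columns_to_hash(df: pd.DataFrame, exclude_columns: list[str]) -> list[str]:
    """Return the non-excluded columns of df, in their original order.

    Raises ValueError if any excluded name is missing from df, since a typo
    there would otherwise cause that column to be hashed irreversibly.
    """
    missing = [col for col in exclude_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Excluded columns not found in the data: {missing}")
    return [col for col in df.columns if col not in exclude_columns]


# Hypothetical sample data with the same column names as the example
df = pd.DataFrame(
    {
        "name": ["Jane Doe"],
        "email": ["jane@example.com"],
        "zip": ["02134"],
        "country": ["US"],
    }
)

print(columns_to_hash(df, ["zip", "country"]))  # ['name', 'email']

try:
    columns_to_hash(df, ["Zip", "country"])
except ValueError as error:
    print(error)  # Excluded columns not found in the data: ['Zip']
```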

Step 4: Hashing the Data

Now we can hash every value in the columns that are not listed in exclude_columns, using the SHA-256 algorithm from hashlib. Instead of updating the DataFrame one cell at a time with iterrows(), the code below processes one whole column at a time. This is faster, and it hashes each value with its column’s own type: iterrows() returns each row as a Series, which can turn integers into floats when a row mixes types. It also avoids assigning strings into numeric columns cell by cell, which recent pandas versions warn about.

# Hash every column except those in the exclusion list
for column in df.columns:
    if column in exclude_columns:
        continue

    # Convert each value to a string (a missing value becomes "nan"),
    # encode it as UTF-8, and replace it with its SHA-256 hex digest
    df[column] = df[column].map(
        lambda value: hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    )
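If you want to reuse this step, it can be wrapped in a function. The sketch below is one possible variant, not the article's original code: hash_columns is a hypothetical helper that works on a copy so the input DataFrame is not modified, and it leaves missing values empty instead of hashing them as the text "nan". The sample data is made up.

```python
import hashlib

import pandas as pd


def hash_columns(df: pd.DataFrame, exclude_columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with SHA-256 hex digests in every non-excluded column.

    Missing values (NaN/None) are kept as missing rather than hashed.
    """
    hashed = df.copy()

    def hash_value(value):
        # Leave gaps as gaps; hash everything else via its string form
        if pd.isna(value):
            return value
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

    for column in hashed.columns:
        if column not in exclude_columns:
            hashed[column] = hashed[column].map(hash_value)
    return hashed


# Hypothetical sample data
people = pd.DataFrame(
    {
        "name": ["Jane Doe", None],
        "zip": ["02134", "75500"],
        "country": ["US", "PK"],
    }
)
print(hash_columns(people, ["zip", "country"]))
```

Leaving missing values as missing keeps the gaps visible in the output. If you would rather not reveal which cells were empty, hashing a placeholder string for them is the alternative.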

Step 5: Saving the Hashed Data

Finally, we can save the modified DataFrame with the hashed values to a new CSV file.

# Path to save the hashed CSV file
hashed_file_path = "hashed_data.csv"

# Save the modified DataFrame to a new CSV file
df.to_csv(hashed_file_path, index=False)
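Because the original values cannot be recovered from the output, it is worth checking the saved file before relying on it. The sketch below uses made-up sample data and a temporary directory in place of the real hashed_data.csv. It reloads the file with dtype=str and asserts that the excluded columns are unchanged and that every hashed cell is a 64-character hexadecimal SHA-256 digest.

```python
import hashlib
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the original DataFrame
original = pd.DataFrame(
    {
        "name": ["Jane Doe", "Ali Khan"],
        "zip": ["02134", "75500"],
        "country": ["US", "PK"],
    }
)
exclude_columns = ["zip", "country"]

# Hash the non-excluded columns, as in Step 4
hashed = original.copy()
for column in hashed.columns:
    if column not in exclude_columns:
        hashed[column] = hashed[column].map(
            lambda value: hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        )

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hashed_data.csv")
    hashed.to_csv(path, index=False)
    # Read every column back as text so "02134" keeps its leading zero
    reloaded = pd.read_csv(path, dtype=str)

# Excluded columns come back exactly as they went in
assert (
    reloaded[exclude_columns].to_numpy().tolist()
    == original[exclude_columns].to_numpy().tolist()
)

# Every other cell is a 64-character lowercase hexadecimal SHA-256 digest
for column in reloaded.columns:
    if column not in exclude_columns:
        assert reloaded[column].str.fullmatch(r"[0-9a-f]{64}").all()

print("hashed_data.csv round trip looks correct")
```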

Conclusion

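In this article, we’ve explored how to hash data in a CSV file using Python and Pandas while excluding specific columns. Hashing replaces sensitive values with digests that cannot be decoded directly, while equal inputs still map to equal outputs, so the data stays useful for counting, grouping, and joining. Keep in mind that hashing pseudonymizes data rather than encrypting it or controlling access to it: anyone who can list the possible inputs can hash each one and compare, so short or predictable values such as IDs, phone numbers, or common names are only weakly protected by a plain SHA-256 hash. For such data, a keyed hash such as HMAC-SHA256, with the key stored separately from the output, is a stronger choice. Data security is a crucial aspect of any data-driven application, and hashing is one of several techniques you can combine to protect it.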
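To make the guessing risk concrete, the sketch below hashes a made-up 5-digit numeric ID and recovers it from its plain SHA-256 digest by trying all 100,000 candidates. It then shows that the same search fails against an HMAC-SHA256 digest computed with Python's standard hmac module and a random secret key. HMAC is a swapped-in technique here, not part of the steps above, and the helper names are our own.

```python
import hashlib
import hmac
import secrets


def sha256_hex(value: str) -> str:
    """Plain, unkeyed SHA-256 digest (the kind of digest produced in Step 4)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hmac_sha256_hex(value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256 digest; cannot be reproduced without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Every possible 5-digit ID, "00000" through "99999" (hypothetical data)
candidates = [f"{n:05d}" for n in range(100_000)]
secret_id = "02134"

# Plain SHA-256: hash each candidate and compare to recover the ID
plain_digest = sha256_hex(secret_id)
recovered = next(c for c in candidates if sha256_hex(c) == plain_digest)
print("recovered from plain SHA-256:", recovered)

# HMAC: the attacker's unkeyed guesses no longer match
key = secrets.token_bytes(32)
keyed_digest = hmac_sha256_hex(secret_id, key)
print("recovered from HMAC:", any(sha256_hex(c) == keyed_digest for c in candidates))

# Whoever holds the key can still reproduce the digest, so joins keep working
assert hmac_sha256_hex(secret_id, key) == keyed_digest
```

The key has to stay secret and be stored apart from the hashed file. Reusing the same key across runs keeps the digests consistent, so hashed columns from different files can still be matched.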

