Validate Imported Data
After importing data, it's crucial to validate it to ensure its accuracy and integrity. In this step, you will learn some basic SQL queries to check for common data issues after importing.
We have created a CSV file named employees_validation.csv
in your ~/project
directory that contains some potentially problematic data, including an invalid email format and a missing department. Let's examine its content:
cat ~/project/employees_validation.csv
You should see:
3,Invalid,Email,invalid_email,Sales
4,Missing,Department,[email protected],
Notice the row with ID 3 has an email invalid_email
which is not a valid format, and the row with ID 4 has an empty string for the department.
First, clear the existing data from the employees
table:
TRUNCATE TABLE employees;
Now, import the data from employees_validation.csv
:
LOAD DATA INFILE '/home/labex/project/employees_validation.csv'
INTO TABLE employees
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Now that the data is imported, let's perform some validation checks using SQL queries in the MySQL shell.
1. Check for Invalid Email Formats:
We can use the LIKE
operator to check for email addresses that don't contain both @
and .
characters, which are typically required for a valid email format.
SELECT * FROM employees WHERE email NOT LIKE '%@%.%';
This query should return the row with id = 3
because its email address invalid_email
does not contain @
and .
.
2. Check for Missing Departments:
We can check for rows where the department
column is an empty string.
SELECT * FROM employees WHERE department = '';
This query should return the row with id = 4
because its department is an empty string.
3. Check for Duplicate IDs:
Although our validation file doesn't have duplicate IDs, it's a common validation check. You can use GROUP BY
and HAVING
to find duplicate primary keys.
SELECT id, COUNT(*) FROM employees GROUP BY id HAVING COUNT(*) > 1;
This query would return any id
values that appear more than once in the table. In this case, it should return an empty result set.
These are just a few examples of basic data validation using SQL. Depending on your data and requirements, you might need more complex checks using regular expressions, constraints, or stored procedures.
Finally, you can exit the MySQL shell:
exit
You have successfully learned how to import and export data in MySQL, handle encoding issues, and perform basic data validation.