How to Delete Duplicate Rows in MySQL
Having duplicate data in your database can negatively impact performance and data accuracy. In this article, we'll explore methods to delete duplicate rows in MySQL.
1. Identifying Duplicate Rows
Before deleting duplicate rows, we should identify which rows are duplicates. This can be done using the following SQL command:
SELECT column1, column2, ..., COUNT(*)
FROM table_name
GROUP BY column1, column2, ...
HAVING COUNT(*) > 1;
Replace column1, column2, ...
with the columns you want to check for duplicates, and table_name
with your table name.
2. Methods to Delete Duplicate Rows
There are several methods to delete duplicate rows, depending on your needs and data structure:
2.1 Using a Subquery
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE t1.id > t2.id AND t1.column1 = t2.column1 AND t1.column2 = t2.column2;
This method will delete duplicate rows while keeping the row with the lowest ID.
2.2 Using a Temporary Table
CREATE TEMPORARY TABLE temp_table AS
SELECT MIN(id) AS id
FROM table_name
GROUP BY column1, column2, ...;
DELETE FROM table_name
WHERE id NOT IN (SELECT id FROM temp_table);
DROP TEMPORARY TABLE temp_table;
This method creates a temporary table to store the IDs of rows to keep, then deletes rows that are not in this temporary table.
2.3 Using ROW_NUMBER()
For MySQL 8.0 and above:
DELETE FROM table_name
WHERE id IN (
SELECT id FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY id) AS row_num
FROM table_name
) t
WHERE t.row_num > 1
);
This method uses the ROW_NUMBER() function to assign row numbers to duplicate data, then deletes rows with a number greater than 1.
Precautions
- Always backup your data before deleting duplicate rows.
- Check the results after deletion to ensure that the desired data is still intact.
- Consider using a UNIQUE constraint or INDEX to prevent duplicate data in the future.
Deleting duplicate rows is an important step in maintaining data quality. However, it should be done carefully and using the method most appropriate for your data structure.