How to Delete Duplicate Rows in MySQL

Having duplicate data in your database can negatively impact performance and data accuracy. In this article, we'll explore methods to delete duplicate rows in MySQL.

1. Identifying Duplicate Rows

Before deleting duplicate rows, we should identify which rows are duplicates. This can be done using the following SQL command:

SELECT column1, column2, ..., COUNT(*)

FROM table_name

GROUP BY column1, column2, ...

HAVING COUNT(*) > 1;

Replace column1, column2, ... with the columns you want to check for duplicates, and table_name with your table name.

2. Methods to Delete Duplicate Rows

There are several methods to delete duplicate rows, depending on your needs and data structure:

2.1 Using a Subquery

DELETE t1 FROM table_name t1

INNER JOIN table_name t2 

WHERE t1.id > t2.id AND t1.column1 = t2.column1 AND t1.column2 = t2.column2;

This method will delete duplicate rows while keeping the row with the lowest ID.

2.2 Using a Temporary Table

CREATE TEMPORARY TABLE temp_table AS

SELECT MIN(id) AS id

FROM table_name

GROUP BY column1, column2, ...;



DELETE FROM table_name

WHERE id NOT IN (SELECT id FROM temp_table);



DROP TEMPORARY TABLE temp_table;

This method creates a temporary table to store the IDs of rows to keep, then deletes rows that are not in this temporary table.

2.3 Using ROW_NUMBER()

For MySQL 8.0 and above:

DELETE FROM table_name

WHERE id IN (

    SELECT id FROM (

        SELECT id,

        ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY id) AS row_num

        FROM table_name

    ) t

    WHERE t.row_num > 1

);

This method uses the ROW_NUMBER() function to assign row numbers to duplicate data, then deletes rows with a number greater than 1.

Precautions

Always backup your data before deleting duplicate rows.
Check the results after deletion to ensure that the desired data is still intact.
Consider using a UNIQUE constraint or INDEX to prevent duplicate data in the future.

Deleting duplicate rows is an important step in maintaining data quality. However, it should be done carefully and using the method most appropriate for your data structure.