Database Research & Development

  • Home
  • NoSQL
    • NoSQL
    • Cassandra
  • Databases
    • Database Theory
    • Database Designing
    • SQL Server Coding Standards
    • SQL Server
    • PostgreSQL
    • MySQL
    • Greenplum
    • Linux
  • Interviews
    • SQL Server Interviews
    • MySQL Interviews
    • SQL Puzzles
  • DBA Scripts
    • SQL Server DBA Scripts
    • PostgreSQL DBA Scripts
    • MySQL DBA Scripts
    • Greenplum DBA Scripts
  • Home
  • Blog Archives !
  • (: Laugh@dbrnd :)
  • Contact Me !
sqlserverinterviews
Home 2015 September Delete all duplicate rows in MySQL

Delete all duplicate rows in MySQL

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

In this post, I am sharing a demonstration on how to remove all rows except for one in MySQL.

Before a few days ago, I was working in one of our production reports and found that there are a lot of duplicate records.

This table has more than 2,00,00,000 records and I removed duplicate based upon two varchar column and required to store one record for each group.

I prepared a demo with a different solution because I have also to take care of performance.

The Solutions are:

Using Self – Join to check every next row with current row and create a delete flag for duplicate rows.
Swapping of the table – copy unique records in the temp table and restore back to the original table after deleting of duplicate records.

But this didn’t work for me because this solution requires a massive amount of DML operation which will degrade performance.


After some research, I found that MySQL has a feature called ALTER IGNORE.
Using ALTER IGNORE, you can apply unique key on those columns which required for removing duplicates. Once you execute this DDL command, it creates a unique key constraint and removes all duplicate records from the database.

This is an excellent option in MySQL and also worked for me.

Let me demonstrate this:

First, create demo table and data for varchar datatype:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE DATABASE Employee;
 
CREATE TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
(
EmpID INTEGER
,EmpName VARCHAR(250)
);
 
INSERT INTO Employee.tbl_DuplicateEmployee_VarcharDatatype
VALUES
(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR');

SET old ALTER table behavior:

1
SET SESSION old_alter_table=1;

Execute ALTER IGNORE:

1
2
3
4
ALTER IGNORE
TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
(EmpID,EmpName);

Find result:

1
2
SELECT *FROM
Employee.tbl_DuplicateEmployee_VarcharDatatype;

MySQL Alter Ignore Statement

As you can see in the above result, all duplicate records removed.


If your column has a BLOB data type, you cannot directly insert any unique key constraint on it.
If you have a BLOB data type column like TEXT, so temporary, you have to create one MD5 column and use this MD5 column in ALTER IGNORE.

Below is a demonstration for deleting BLOB duplicate records.

First, create a demo table and data for TEXT datatype:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CREATE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType
(
EmpID INTEGER
,EmpName TEXT
);
 
INSERT INTO Employee.tbl_DuplicateEmployee_BLOBDataType
VALUES
(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR');

Create MD5 column for EmpID(Integer) + EmpName(Text):

1
2
3
4
ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD MD5Column VARCHAR(50);
 
UPDATE Employee.tbl_DuplicateEmployee_BLOBDataType
SET MD5Column = MD5(CONCAT(EmpID,EmpName));

Execute ALTER IGNORE:

1
ALTER IGNORE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_BLOBDataType(MD5Column);

Now check the result:

1
2
SELECT *FROM
Employee.tbl_DuplicateEmployee_BLOBDataType;

Remove MD5 column:

1
ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType DROP COLUMN MD5Column;
Sep 19, 2015Anvesh Patel
Find duplicate records in MySQLImport or Export a CSV File using PostgreSQL COPY Command
Comments: 4
  1. Hareesh
    December 27, 2016 at 4:17 pm

    Hi Anvesh,

    Please explain a bit more about below queries.

    what does this query :
    SET SESSION old_alter_table=1;

    ALTER IGNORE
    TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
    ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
    (EmpID,EmpName);

    Thanks

    • Anvesh Patel
      Anvesh Patel
      December 29, 2016 at 6:33 pm

      When this option is given, the server does not use the optimized method of processing an ALTER TABLE operation. It reverts to using a temporary table, copying over the data, and then renaming the temporary table to the original.

  2. Sudheer
    September 23, 2019 at 6:25 am

    Hiii Anvesh,

    How to delete the duplicate data in mysql workbench.There are 6 lakhs above data in the table. I have used below query for delete but it shows Lost connection to Mysql Server during query. can you please send the query or explain how to solve problem.

    SET SQL_SAFE_UPDATES=0;
    DELETE t1 FROM PumpErrorLog t1
    INNER JOIN PumpErrorLog t2
    WHERE
    t1.id < t2.id AND
    t1.device_id = t2.device_id AND
    t1.side = t2.side AND
    t1.event_date = t2.event_date AND
    t1.data_source = t2.data_source AND
    t1.code= t2.code AND
    t1.level= t2.level AND
    t1.description= t2.description AND
    t1.pump_number= t2.pump_number AND
    t1.TimeStamp= t2.TimeStamp AND
    t1.report_time= t2.report_time AND
    t1.s3logfile= t2.s3logfile AND
    t1.board_timestamp= t2.board_timestamp ;

  3. Shaik
    February 18, 2020 at 10:17 am

    I am getting below error while trying this

    ALTER IGNORE
    TABLE tbl_DuplicateEmployee_VarcharDatatype
    ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
    (EmpID,EmpName);

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘IGNORE
    TABLE tbl_DuplicateEmployee_VarcharDatatype
    ADD UNIQUE INDEX idx_tbl_Dup’ at line 1

Anvesh Patel
Anvesh Patel

Database Engineer

September 19, 2015 MySQL, MySQL DBA ScriptALTER IGNORE, Anvesh Patel, database, database research and development, dbrnd, duplicate records, MySQL, MySQL Command, MySQL Database Administrator, MySQL Database Designing, MySQL Database Programming, MySQL Error, MySQL Performance Tunning, MySQL Query, MySQL Tips and Tricks
About Me!

I'm Anvesh Patel, a Database Engineer certified by Oracle and IBM. I'm working as a Database Architect, Database Optimizer, Database Administrator, Database Developer. Providing the best articles and solutions for different problems in the best manner through my blogs is my passion. I have more than six years of experience with various RDBMS products like MSSQL Server, PostgreSQL, MySQL, Greenplum and currently learning and doing research on BIGData and NoSQL technology. -- Hyderabad, India.

About DBRND !

dbrnd

This is a personal blog (www.dbrnd.com).

Any views or opinions represented in this blog are personal and belong solely to the blog owner and do not represent those of people, institutions or organizations that the owner may or may not be associated with in professional or personal capacity, unless explicitly stated.

Feel free to challenge me, disagree with me, or tell me I’m completely nuts in the comments section of each blog entry, but I reserve the right to delete any comment for any reason whatsoever (abusive, profane, rude, or anonymous comments) - so keep it polite.

The content of this website is protected by copyright. No portion of this website may be copied or replicated in any form without the written consent of the website owner.

Recent Comments !
  • Anvesh Patel { Sure will do... } – May 27, 12:43 PM
  • Anvesh Patel { Great... } – May 27, 12:41 PM
  • Anvesh Patel { Great... } – May 27, 12:39 PM
  • Anvesh Patel { Great... } – May 27, 12:36 PM
  • Anvesh Patel { Great... } – May 27, 12:28 PM
  • Anvesh Patel { Great... } – May 27, 12:27 PM
  • Anvesh Patel { Great... } – May 27, 12:16 PM
  • Older »
Follow Me !
  • facebook
  • linkedin
  • twitter
  • youtube
  • google
  • flickr
© 2015 – 2019 All rights reserved. Database Research & Development (dbrnd.com)
Posting....