Excel - De-Duplicating Data

Stripping Out Duplicate Data From an Excel Worksheet

We often find that after converging data from a number of sources we have duplicate records in our worksheets. In this article I show you how to strip out those duplicate rows using the Excel “Remove Duplicates” option.

De-duplicating A Single Excel Column

In the following example we have duplicate company names in the data on our worksheet.

To de-duplicate the column of company names we highlight the data range (column A) containing the duplicate data and then click on the Remove Duplicates button under the Data tab (the screen shot is taken from Excel 2013 but you will find the same icons in more current versions).

We start off with 831 rows of data selected and the Remove Duplicates dialog opens to enable the columns that determine what constitutes a duplicaton to be selected:

In this example we only have the Ship Name column available as it is the only one in our selected cell range.

We click on the OK button and the following dialog displays:

741 duplicate records were identified. After clicking on the OPK button 89 records will remain.

 

De-duplicating an Entire Excel Worksheet

The followng example tackles a scenario where entire records need to be removed based on the duplication of values in only two of the column values in those records. We again use the Remove Duplicate Rows tool from the Data ribbon, having selected the entire data set:

The Order Date and Ship Name clumns have been selected in the dialog. The result will be that records that have both a duplicate Order Date and duplicate Ship Name values will be removed. Just 7 duplicate records are identfied in ths example:

There will be 823 records left after de-duplication.

 

If you would like to know more about working with Microsoft Excel take a look at our Excel Training Courses or email us at info@ptr.co.uk.

Share this post