Microsoft KB Archive/104332

From BetaArchive Wiki

Methods of Deleting Duplicate Records from a Database

ID: Q104332



The information in this article applies to:

  • Microsoft FoxPro for Windows, versions 2.5, 2.5a
  • Microsoft FoxPro for MS-DOS, versions 1.02, 2.0, 2.5, 2.5a
  • Microsoft FoxBASE+ for MS-DOS, version 2.01
  • Microsoft FoxBASE+ for Macintosh, version 2.01




SUMMARY

You can use a number of methods to eliminate duplicate records from a database. The method you use will be determined by the specific product being used, whether making a duplicate of the original table is feasible, and the amount of disk space available.

The three methods can be summarized as follows:


  • Process database sequentially, determining if records are duplicates and manually deleting duplicate records. This method requires writing a program.
  • Create a unique index and copy the records to a new file that contains only unique records. This method can be performed without writing a program.
  • In FoxPro 2.0 or later, perform a SELECT DISTINCT command, which will create a new table with only unique records. This method can be performed without writing a program.


MORE INFORMATION

Method 1 - Process Database Sequentially

This method can be used with FoxBASE+ as well as FoxPro. It requires the least amount of disk space to perform the deletions, although it requires a PACK command after all deletions have been completed. The PACK command can require 3-5 times the database size in disk space.

  1. Create a program named DUPE.PRG with the following commands: '

       * Memory variables -
       * counter = counter used to cycle through fields
       * firstrec = array containing first record to test
       * secrec   = array containing second record to test
       USE <filename>
    
       * The following command would be modified to include all fields
       * in the data table that make a record unique.
       * Some fields may have to be converted to character fields.
       INDEX ON field1 + field2 TO temp.idx
       SET INDEX TO temp.idx
       GO TOP
    
       * Use two arrays to see if data has changed.
       * This program requires that all fields in each record be
       * identical in order for a record to be considered a duplicate.
       * If fewer fields were required to define a record as a duplicate,
       * the arrays would be built with only those fields required to
       * define a record as a duplicate.
       DO WHILE .NOT. EOF()   && In FoxPro 2.0 and later, DO WHILE can be
          SCATTER TO firstrec && replaced by more efficient SCAN.ENDSCAN
          counter = 1
          SKIP
          SCATTER TO secrec
       * Compare each array element to see if other record differs.
          DO WHILE counter < FCOUNT()
             IF firstrec(counter) <> secrec(counter)
                EXIT
             ENDIF
              DELETE
              EXIT
          ENDDO
       ENDDO 
  2. Issue the following command: '

          DO dupe.prg 
  3. Pack the database if desired.

Method 2 - Create a Unique Index and Copy the Records to a New File

This method can be used with FoxBASE+ as well as FoxPro.

  1. Issue the following commands: '

          USE <database>
          SET UNIQUE ON 
  2. Create a temporary, unique index. Modify the following command as necessary to include all the fields that make a unique record. '

          INDEX ON field1 + field2 TO TEMP.IDX
          SET INDEX TO temp.idx 
  3. If you are not using FoxPro 2.0 or later, or if maintaining the index expressions in the current .CDX file is not required, create a new file as follows. Otherwise, go to step 4. '

          COPY TO newtable
          USE newtable 
  4. If FoxPro 2.0 or later is being used and it is desirable to maintain the structure of the original .CDX file, create a new file as follows. Otherwise, go to step 5. '

          COPY TO newtable WITH CDX
          USE newtable 
  5. Copy or rename the new table, memo file, and .CDX to the original name. If you are renaming, delete the original file(s) first.
  6. Issue the following command: '

          SET UNIQUE OFF 
  7. Delete TEMP.IDX if desired.

Method 3 - In FoxPro 2.0 or Later, Use SELECT

DISTINCT to Create New Table with Only Unique Records

  1. Issue the following command: '

          USE <filename> 
  2. Do one of the following:

    • Create a new query by choosing New Query from the Run menu, selecting the Fields check box, selecting the No Duplicates check box, and choosing OK. From the Output list box, choose Table/DBF and type "newtable" (without the quotation marks) in the Name box. Choose the Do Query button to execute the query.
    • Issue the following command in the Command window: '

               SELECT DISTINCT * from <filename> INTO TABLE newtable 
  3. If maintaining the index expressions in the current .CDX file is not required, create a new file as follows. Otherwise, go to step 4. '

          COPY TO newtable
          USE newtable 
  4. If you want to maintain the structure of the original .CDX file, create a new file as follows. Otherwise, go to step 5. '

          COPY TO newtab2 WITH CDX
          USE newtab2
          APPEND FROM newtable 
  5. Copy or rename the new table, memo file, and .CDX to the original name. If you are renaming, delete the original file(s) first.

Additional query words: FoxDos FoxWin remove sql

Keywords          : kbcode FxprgSql 
Version           : 2.50 2.50a | 1.02 2.00 2.50 2.50
Platform          : MS-DOS WINDOWS 
Issue type        : 

Last Reviewed: August 12, 1999
© 1999 Microsoft Corporation. All rights reserved. Terms of Use.