Microsoft KB Archive/246800

From BetaArchive Wiki
Knowledge Base


How to correctly parse quotation marks in Full Text Search queries

Article ID: 246800

Article Last Modified on 2/22/2007



APPLIES TO

  • Microsoft SQL Server 7.0 Standard Edition
  • Microsoft SQL Server 7.0 Enterprise Edition
  • Microsoft SQL Server 2000 Standard Edition
  • Microsoft SQL Server 2005 Standard Edition
  • Microsoft SQL Server 2005 Developer Edition
  • Microsoft SQL Server 2005 Enterprise Edition
  • Microsoft SQL Server 2005 Workgroup Edition



This article was previously published under Q246800

SUMMARY

SQL Server 7.0 or later Full Text Search (FTS) provides a rich set of search functionality. You can use a CONTAINS clause to perform a variety of searches, such as:

  • A word or phrase.
  • The prefix of a word or phrase.
  • A word that is near another word.
  • A word that is inflectionally generated from another. For example, the word "drive" is the inflectional stem of: drives, drove, driving, and driven.
  • A word that has a higher designated weighting than another word.

The CONTAINS syntax requires double quotation marks for prefix ("choc*") and for multiple word or phrase searches ("chocolate chip" or "oatmeal"). This can be problematic in terms of UI design, because users are not accustomed to having to properly quote their searches and are prone to making mistakes when they do. A simple function is needed that developers can use to properly quote a query string before passing it to SQL Server.

MORE INFORMATION

Following is a review of the basics of full-text query syntax with the CONTAINS keyword in the WHERE clause (for more complete syntax, see the SQL Server 7.0 Books Online):

  • Simple term:

    WHERE CONTAINS(QuantityPerUnit, 'bottles')
                        
  • Phrase in simple term:

    WHERE CONTAINS(ProductName, ' "Sasquatch ale" OR "steeleye stout" ')
                        
  • Prefix term:

    WHERE CONTAINS(ProductName, ' "choc*" ')
                        
  • OR with prefix term:

    WHERE CONTAINS(Description, '"sea*" OR "bread*"')
                        
  • Proximity term:

    WHERE CONTAINS(ProductName, 'spread NEAR Boysenberry')
                        
  • Generation term:

    WHERE CONTAINS(ProductName, ' FORMSOF (INFLECTIONAL, dry) ')
                        
  • Weighted term:

    WHERE CONTAINS(Description, 'ISABOUT (spread weight (.8), 1)
                        

Notice that proximity, generation, and weighted terms do not use double quotation marks. Also notice that:

  • Individual words and phrases can be surrounded with double quotation marks (except in proximity, generation, and weighted terms).
  • Prefix term searches must be surrounded with double quotation marks.
  • Multiple term and phrase searches must have each word and/or phrase surrounded with double quotation marks.

With the preceding simplified rules in mind, you can create a simple parser using regular expressions to correctly place quotation marks around search strings. The algorithm to use is:

Replace all double quotes (clears the text and any improper quotations)
   If the text string contains one of the key words "NEAR", "FORMSOF", or 
   "ISABOUT", the parsing is complete
   Else
      Surround any instances of 'and' or 'and not' with quotes
      Surround any instances of 'or' or 'or not' with quotes
      Surround the entire string with quotes
                

The following is a JavaScript version:

function fxnParseIt() {
    // Note: sInputString code for demo purposes only, and should be
    //     replaced with user's code for getting in string value.

    var sInputString = 'asp and database';

    sText = sInputString;
    sText = sText.replace(/"/g,"");
    if (sText.search(/(formsof|near|isabout)/i) == -1) {
        sText = sText.replace(/ (and not|and) /gi,'" $1 "');
        sText = sText.replace(/ (or not|or) /gi,'" $1 "');
        sText = '"' + sText + '"';
    }

    sInputString = sText;
}
                

The following is a VBScript version:

Sub ParseIt()
      '// Note: sInputString code for demo purposes only, and should be
      '/    replaced with user's code for getting in string value.
      '// Note: this code could also be easily re-written as a function,
      '/        and is written as a sub for demo purposes only.

      Dim strIn, RegEx, sInput

      sInputString = "asp and database"

      strIn = sInput
      Set RegEx = New RegExp    
    
      If Len(strIn) < 1 Then
            MsgBox ("You must enter a search string")
      Else
            strIn = Replace(strIn, Chr(34), "")
            If (InStr(strIn, "formsof") > 0) Or (InStr(strIn, "near") > 0) Or (InStr(strIn, "isabout") > 0) Then
                  ParseMe.value = strIn
            Else
                  RegEx.IgnoreCase = True
                  RegEx.Global = True
                  RegEx.Pattern = "( and not | and )"
                  strIn = RegEx.Replace(strIn, Chr(34) & "$1" & Chr(34))
                  RegEx.Pattern = "( or not | or )"
                  strIn = RegEx.Replace(strIn, Chr(34) & "$1" & Chr(34))
                  strIn = Chr(34) & strIn & Chr(34)
                  ParseMe.value = strIn
            End If
      End If

End Sub
                


Additional query words: quote quotes fuzzy search fulltext full text

Keywords: kbinfo KB246800