Saturday, February 11, 2012

"CONTAINS" ignores "fahrenheit"?

I've posted this before in microsoft.public.sqlserver.programming, and
someone suggested me to post to this newsgroup.
Anyway, I have articles table like this:
tblArticle
article_id
article_title
article_text
article_title and article_text are part of full-catalog.
And among those article records there is one record like this:
article_id: 999
article_title: "Michael Moore..."
article_text: "...Fahrenheit 9/11..."
When I do search like this:
SELECT *
FROM tblArticle
WHERE CONTAINS(article_text, ' "*Fahrenheit 9/11*" ')
it returned me no result.
However, if I do search using LIKE:
SELECT *
FROM tblArticle
WHERE article_text LIKE '%Fahrenheit 9/11%'
it returned me that article_id: 999
More interesting is if I run this SQL Statement:
SELECT *
FROM tblArticle
WHERE CONTAINS(article_text, ' "*Fahrenheit 9/11*" ')
it returned me records that contains "9/11" but not "Fahrenheit 9/11",
for example:
it returns -> article_text: ... in 9/11 event...
it doesn't return -> article_text: ... Michael Moore who wrote Fahrenheit
9/11 ...
At first, I thought probably full-text catalog were not populated.
So, I did repopulate the full-text catalog, yet it still didn't work.
Also, I thought "Fahrenheit" is part of "noise" words, but it's not listed
in the noise.enu.
Is there any way to resolve this "CONTAINS" issue?
Or is this SQL Server bug?
Thanks in advance,
Danny
Correction:

> More interesting is if I run this SQL Statement:
> SELECT *
> FROM tblArticle
> WHERE CONTAINS(article_text, ' "*9/11*" ')
> it returned me records that contains "9/11" but not "Fahrenheit 9/11",
> for example:
> it returns -> article_text: ... in 9/11 event...
> it doesn't return -> article_text: ... Michael Moore who wrote Fahrenheit
> 9/11 ...
>
|||Hi Danny,
Hmm... that someone would be me? <G>
Ok, you need to provide some additional info, specifically, run and post the
full output of the following SQL script:
use <your_database_name_here>
go
SELECT @.@.language
SELECT @.@.version
-- Note, you may need to set advance options on
sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help tblArticle
go
Depending upon the language (the FULLTEXT_LANGUAGE column from
sp_help_fulltext_columns) of the wordbreaker you are using, have you removed
all single digits from the noise.<language> (noise.enu = US_English) file
under the folder: \FTDATA\SQLServer\Config ? If not, then you should and
then run a Full Population. Note, you will need to stop the "Microsoft
Search" service first in order to save the changes to the noise.* files.
Additionally, the preceding asterisk "*" in your query ' "*9/11*" ' is
always ignored and therefore adds no value to your query. SQL FTS support
only "word prefix" wildcard searches with the asterisk and not "word
suffix", for example ' "*og" ' would find dog and log, but these "words"
are not related. However, using a "word prefix" search such as ' "9/11*" '
returns rows that contain "9/11", or ' "fish*" ' will return "fish",
"fishes" or "fishing" as these words are inflectionally related...
Regards,
John
"Danny" <daniel_c@.NOSPAMmyrealbox.com> wrote in message
news:uoE953peEHA.2440@.tk2msftngp13.phx.gbl...[vbcol=seagreen]
> Correction:
Fahrenheit
>
|||John, I am confused by this sentance:
"Additionally, the preceding asterisk "*" in your query ' "*9/11*" ' is
always ignored and therefore adds no value to your query. SQL FTS support
only "word prefix" wildcard searches with the asterisk and not "word
suffix", for example ' "*og" ' would find dog and log, but these "words"
are not related."
In the first part you seem to be saying that the * in the prefix is always
ingored - ie ""Additionally, the preceding asterisk "*" in your query '
"*9/11*" ' is always ignored and therefore adds no value to your query."
and then you go on to say "SQL FTS support only "word prefix" wildcard
searches with the asterisk and not "word suffix", for example ' "*og" '
would find dog and log, but these "words" are not related."
Prefix goes in front, suffix goes behind. Your example of *og matching with
dog and log does not work.
Don't you mean to say "SQL FTS support only "word suffix" wildcard searches
with the asterisk and not "word prefix", for example ' "wild*" ' would find
wildcard and wild, but these "words" are not related."?
Then you go on to say "However, using a "word prefix" search such as '
"9/11*" ' returns rows that contain "9/11", or ' "fish*" ' will return
"fish", "fishes" or "fishing" as these words are inflectionally related..."
I think you mean to say "However, using a "word suffix" search such as '
"9/11*" ' returns rows that contain "9/11", or ' "fish*" ' will return
"fish", "fishes" or "fishing" as these words are inflectionally related...""
And further more the wild card operator does not do stemming, it simply
returns hits to words that start with the letters in front of the *. You are
thinking on the Inflectional operator. To get an idea of what I am talking
add the words mouse to one row and mice to another
Then do this search:
select * from tablename where contains(*,'FormsOF(INFLECTIONAL, Mouse)')
you will get hits to mouse and mice as mouse and mice are inflectionally
related.
as opposed to select * from tablename where contains(*,'mi*')
which will only return hits to mice. So your statement "fish*" ' will
return "fish", "fishes" or "fishing" as these words are inflectionally
related..." seems to be incorrect, or will only hold true if the
inflectionally related terms have the same stems, like with fish, fishes,
and fishing, but not for many English words which do not have the same
stems, like mouse, mice/tooth, teeth/wake,woke/fight, fought/wear,
wore/teach, taught/win, won/sit, sat/write, wrote/take, took/sleep,
slept/run, ran/tell, told/hold, held, off the top of my head .
Daniel, 9/11 as a search phrase works fine for me. Is is possibly your
catalog had not completely built? After you removed 9 and 1 from your noise
word list, did you rebuild your catalog?
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"John Kane" <jt-kane@.comcast.net> wrote in message
news:uFZGJPreEHA.2804@.TK2MSFTNGP11.phx.gbl...
> Hi Danny,
> Hmm... that someone would be me? <G>
> Ok, you need to provide some additional info, specifically, run and post
the
> full output of the following SQL script:
> use <your_database_name_here>
> go
> SELECT @.@.language
> SELECT @.@.version
> -- Note, you may need to set advance options on
> sp_configure 'default full-text language'
> EXEC sp_help_fulltext_catalogs
> EXEC sp_help_fulltext_tables
> EXEC sp_help_fulltext_columns
> EXEC sp_help tblArticle
> go
> Depending upon the language (the FULLTEXT_LANGUAGE column from
> sp_help_fulltext_columns) of the wordbreaker you are using, have you
removed
> all single digits from the noise.<language> (noise.enu = US_English) file
> under the folder: \FTDATA\SQLServer\Config ? If not, then you should and
> then run a Full Population. Note, you will need to stop the "Microsoft
> Search" service first in order to save the changes to the noise.* files.
> Additionally, the preceding asterisk "*" in your query ' "*9/11*" ' is
> always ignored and therefore adds no value to your query. SQL FTS support
> only "word prefix" wildcard searches with the asterisk and not "word
> suffix", for example ' "*og" ' would find dog and log, but these "words"
> are not related. However, using a "word prefix" search such as ' "9/11*"
'
> returns rows that contain "9/11", or ' "fish*" ' will return "fish",
> "fishes" or "fishing" as these words are inflectionally related...
> Regards,
> John
>
>
> "Danny" <daniel_c@.NOSPAMmyrealbox.com> wrote in message
> news:uoE953peEHA.2440@.tk2msftngp13.phx.gbl...
> Fahrenheit
>
|||Hilary,
All I was trying to explain (late at night my time ;-0) was that the
preceding asterisk is ignored in SQL FTS as seems to be a consistent problem
for many people understanding FTS relative to T-SQL LIKE. We still need the
OS and SQL configuration info from Danny. If you want we can discuss this
off-line...
Danny, could you provide your server's configuration info and table info via
the following SQL script and post the full output?
use <your_database_name_here>
go
SELECT @.@.language
SELECT @.@.version
-- Note, you may need to set advance options on
sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help tblArticle
go
Thanks,
John
"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:ONGxDZueEHA.236@.tk2msftngp13.phx.gbl...
> John, I am confused by this sentance:
> "Additionally, the preceding asterisk "*" in your query ' "*9/11*" ' is
> always ignored and therefore adds no value to your query. SQL FTS support
> only "word prefix" wildcard searches with the asterisk and not "word
> suffix", for example ' "*og" ' would find dog and log, but these "words"
> are not related."
> In the first part you seem to be saying that the * in the prefix is always
> ingored - ie ""Additionally, the preceding asterisk "*" in your query '
> "*9/11*" ' is always ignored and therefore adds no value to your query."
> and then you go on to say "SQL FTS support only "word prefix" wildcard
> searches with the asterisk and not "word suffix", for example ' "*og" '
> would find dog and log, but these "words" are not related."
> Prefix goes in front, suffix goes behind. Your example of *og matching
with
> dog and log does not work.
> Don't you mean to say "SQL FTS support only "word suffix" wildcard
searches
> with the asterisk and not "word prefix", for example ' "wild*" ' would
find
> wildcard and wild, but these "words" are not related."?
> Then you go on to say "However, using a "word prefix" search such as '
> "9/11*" ' returns rows that contain "9/11", or ' "fish*" ' will return
> "fish", "fishes" or "fishing" as these words are inflectionally
related..."
> I think you mean to say "However, using a "word suffix" search such as '
> "9/11*" ' returns rows that contain "9/11", or ' "fish*" ' will return
> "fish", "fishes" or "fishing" as these words are inflectionally
related...""
> And further more the wild card operator does not do stemming, it simply
> returns hits to words that start with the letters in front of the *. You
are
> thinking on the Inflectional operator. To get an idea of what I am talking
> add the words mouse to one row and mice to another
> Then do this search:
> select * from tablename where contains(*,'FormsOF(INFLECTIONAL, Mouse)')
> you will get hits to mouse and mice as mouse and mice are inflectionally
> related.
> as opposed to select * from tablename where contains(*,'mi*')
> which will only return hits to mice. So your statement "fish*" ' will
> return "fish", "fishes" or "fishing" as these words are inflectionally
> related..." seems to be incorrect, or will only hold true if the
> inflectionally related terms have the same stems, like with fish, fishes,
> and fishing, but not for many English words which do not have the same
> stems, like mouse, mice/tooth, teeth/wake,woke/fight, fought/wear,
> wore/teach, taught/win, won/sit, sat/write, wrote/take, took/sleep,
> slept/run, ran/tell, told/hold, held, off the top of my head .
> Daniel, 9/11 as a search phrase works fine for me. Is is possibly your
> catalog had not completely built? After you removed 9 and 1 from your
noise[vbcol=seagreen]
> word list, did you rebuild your catalog?
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
>
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:uFZGJPreEHA.2804@.TK2MSFTNGP11.phx.gbl...
> the
> removed
file[vbcol=seagreen]
support[vbcol=seagreen]
"words"[vbcol=seagreen]
"9/11*"[vbcol=seagreen]
> '
9/11",
>
|||> Hi Danny,
> Hmm... that someone would be me? <G>
* Yes, it would be you John

> Ok, you need to provide some additional info, specifically, run and post the
> full output of the following SQL script:
> use <your_database_name_here>
> go
> SELECT @.@.language
Returned:
us_english

> SELECT @.@.version
Returned:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

> -- Note, you may need to set advance options on
> sp_configure 'default full-text language'
Returned:
name | minimum | maximum | config_value | run_value
--+--+--+--+--
default full-text language| 0 |2147483647 | 1033 | 1033
--+--+--+--+--

> EXEC sp_help_fulltext_catalogs
Returned:
ftcatid | Name | Path | Status | Number_Of_Full_Text_Tables
--+--+--+--+--
6 | FT_CSPDB | d:\SQL\MSSQL\FTDATA | 0 | 10
--+--+--+--+--

> EXEC sp_help_fulltext_tables
Returned several records:
Table_Owner | Table_Name | FullText_Key_Index_Name | FullText_Key_ColID | FullText_Index_Active | FullText_Catalog_Name
--+--+--+--+--+--
...
dbo | tblArticle | PK_tblArticle | 1 | 1 | FT_CSPDB
...

> EXEC sp_help_fulltext_columns
Returned many records:
Table_Owner | Table_ID | Table_Name | FullText_Column_Name | FullText_ColID | FullText_Blobtp_ColName | FullText_Bloptp_ColID | FullText_Language
--+--+--+--+--+--+--+--
...
dbo |1721877301| tblArticle | title | 4 | NULL | NULL | 1033
dbo |1721877301| tblArticle | description | 7 | NULL | NULL | 1033
...

> EXEC sp_help tblArticle
This returned many results. Any specific data you want to know John?

> Depending upon the language (the FULLTEXT_LANGUAGE column from
> sp_help_fulltext_columns) of the wordbreaker you are using, have you removed
> all single digits from the noise.<language> (noise.enu = US_English) file
> under the folder: \FTDATA\SQLServer\Config ? If not, then you should and
> then run a Full Population. Note, you will need to stop the "Microsoft
> Search" service first in order to save the changes to the noise.* files.
* Yes, I did

> Additionally, the preceding asterisk "*" in your query ' "*9/11*" ' is
> always ignored and therefore adds no value to your query. SQL FTS support
> only "word prefix" wildcard searches with the asterisk and not "word
> suffix", for example ' "*og" ' would find dog and log, but these "words"
> are not related. However, using a "word prefix" search such as ' "9/11*" '
> returns rows that contain "9/11", or ' "fish*" ' will return "fish",
> "fishes" or "fishing" as these words are inflectionally related...
So, logically if I use CONTAINS(description, ' "9/11" '), I should get record of "Fahrenheit 9/11" right?
(Since it contains 9/11). The fact is, I don't get that record.
Any more ideas John?
Thanks a lot for your help so far.
|||"John Kane" <jt-kane@.comcast.net> wrote in message
news:OvFUl7veEHA.3792@.TK2MSFTNGP09.phx.gbl...
> Yes, I have an idea as to why "Fahrenheit 9/11" is not returned as you are
> using SQL 2000 on Windows 2000 (Win2K), there is a bug relative to
searching
> for words that may have punctuation characters "touching" or in contact
with
> the search word or phrase. The workaround for this bug is to use the
Neutral
> "Language for Word Breaker" on your FT-enable column and then run a Full
> Population. Note, if you change the language for word breaker, be sure to
> remove single numbers from noise.dat (Neutral noise word file) prior to
> running the Full Population.
* Pardon my lack of understanding, could you elaborate more detail on how to
do this John?

> Could you provide the exact content from the FT-enable column where
> "Fahrenheit 9/11" (including any html tags, if present) as well as the
rows
> that contain only "9/11" ?
* The exact content of "Fahrenheit 9/11" is:
Filmmaker Michael Moore is attacking President George W. Bush and the war in
Iraq with <EM>Fahrenheit 9/11</EM>.
* The exact content of "9/11" is:
The 9/11 crisis has provided a dramatic opportunity for manifesting&nbsp;the
gradual strategic shift in the country's domestic and foreign policy
priorities.
|||"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:ONGxDZueEHA.236@.tk2msftngp13.phx.gbl...
> Daniel, 9/11 as a search phrase works fine for me. Is is possibly your
> catalog had not completely built? After you removed 9 and 1 from your
noise
> word list, did you rebuild your catalog?
> --
> Hilary Cotter
> Looking for a book on SQL Server replication?
> http://www.nwsu.com/0974973602.html
Thanks Hilary for your response.
First of all, my noise word list is completely blank (I have removed them
all).
Then I rebuilt my catalog, and restarted Microsoft Search agent.
Also, if I search 9/11, it returns all records contain 9/11, but NOT records
contain "Fahrenheit 9/11".
It's odd, and I can't figure it out.
- Danny
|||what happens if you remove the <EM>, </EM> tags?
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"DC" <dc@.office> wrote in message
news:%23OzB5GweEHA.140@.TK2MSFTNGP12.phx.gbl...[vbcol=seagreen]
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:OvFUl7veEHA.3792@.TK2MSFTNGP09.phx.gbl...
are[vbcol=seagreen]
> searching
> with
> Neutral
to
> * Pardon my lack of understanding, could you elaborate more detail on how
to
> do this John?
> rows
> * The exact content of "Fahrenheit 9/11" is:
> Filmmaker Michael Moore is attacking President George W. Bush and the war
in
> Iraq with <EM>Fahrenheit 9/11</EM>.
> * The exact content of "9/11" is:
> The 9/11 crisis has provided a dramatic opportunity for
manifesting&nbsp;the
> gradual strategic shift in the country's domestic and foreign policy
> priorities.
>
|||Stop mssearch. Place a single blank space in your noise word list. Restart
MSSearch and rebuild your catalog.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"DC" <dc@.office> wrote in message
news:OnHHDJweEHA.1652@.TK2MSFTNGP09.phx.gbl...
> "Hilary Cotter" <hilaryk@.att.net> wrote in message
> news:ONGxDZueEHA.236@.tk2msftngp13.phx.gbl...
> noise
> Thanks Hilary for your response.
> First of all, my noise word list is completely blank (I have removed them
> all).
> Then I rebuilt my catalog, and restarted Microsoft Search agent.
> Also, if I search 9/11, it returns all records contain 9/11, but NOT
records
> contain "Fahrenheit 9/11".
> It's odd, and I can't figure it out.
> - Danny
>

No comments:

Post a Comment