Full-text search missed results

> - There are three searches logged for each "1999" and "2000" while I
> only searched once.

As this is the live search, the search action isn't triggered when you
hit the enter key, but on a timer. Under certain circumstances a search
can be triggered more than once, which is why performance is crucial
for this search action. See below.

> Why do some searches get a "w10"?

The full text index stores multiple sets of data for every item. They
are put in different columns, which are later weighted by their
importance. w10 has the highest weight, w1 the lowest.

And here comes the lengthy explanation of what I've found investigating
this very special case. Most likely (depending on your music collection)
this really is a rather special case:

- the keyword is very precise: you know what you expect
- the keyword is very short
- the keyword likely is very popular

Now that popularity thing might be a bit confusing. You probably only
have one single track with this name. Why would it be popular? Because
we're dealing with a full text index, covering not only titles, but lots
of other pieces of information, too. E.g. years, file paths, comments,
even MusicBrainz IDs.

Digging into the 99 case in my collection, I found a lot of these:

Comment: ExactAudioCopy v0.99pb5

Yep. Or something like that:

UFID: [ http://musicbrainz.org, ebe13618-bbdd-4ef3-9a91-9981602e528f ]

That -9981602e528f at the end would match, too, as our search term is at
the start of that "word".
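
To make the prefix matching concrete, here is a minimal sketch. It assumes
the tokenizer simply splits on anything that is not a letter or digit; the
real tokenizer may behave differently, and the function names are made up
for illustration.

```python
import re

def tokenize(text):
    # Assumption: split on every non-alphanumeric character,
    # roughly what a simple full text tokenizer does.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]

def prefix_hits(term, text):
    # Return every token ("word") that starts with the search term.
    return [t for t in tokenize(text) if t.startswith(term.lower())]

fields = {
    "title": "99 Luftballons",
    "comment": "ExactAudioCopy v0.99pb5",
    "ufid": "http://musicbrainz.org, ebe13618-bbdd-4ef3-9a91-9981602e528f",
}

# Which "words" in each field would a search for "99" match?
hits = {name: prefix_hits("99", value) for name, value in fields.items()}
for name, matched in hits.items():
    print(name, matched)
```

All three fields produce a hit, although only the title is what you were
actually looking for.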

That would explain the popularity of the search term. But why would an
obvious hit not show up, but some obscure, hidden data would win?

Now this is getting complicated. Many factors play a role: optimization
for speed (which might penalize this particular case), the nature of
full text search indexing not only the obvious data, but anything. And
some poor, deliberate choices. And bugs. Wow. Searching for "99" brought
quite a few issues to the light of day :-).

So there's some optimization going on because the search needs to be
fast. One of these optimizations is to try to limit the result set when
we risk dealing with a large number of hits, e.g. with short search
terms or single terms. In this case we're limiting the results to hits
in the highest priority column only (which explains the "w10:99").

If we know that we are still dealing with a large result set (>500 items
found), the current implementation would only pick the top 500 items.
And that's where I would say there is/was a bug: we pick the top items
out of an unordered list... which means that even if the score of "99
Luftballons" was high, it would be cut off if it was far down the
"randomly" ordered result list.
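
The difference between cutting first and sorting first can be sketched as
follows. This is not the actual server code; the function names, the scores
and the 600 dummy hits are made up for illustration.

```python
def top_n_buggy(hits, n):
    # Cuts the unordered list first, then sorts what is left.
    # A high-scoring hit beyond position n is lost.
    return sorted(hits[:n], key=lambda h: h[1], reverse=True)

def top_n_fixed(hits, n):
    # Sorts by score first, then keeps the best n.
    return sorted(hits, key=lambda h: h[1], reverse=True)[:n]

# 600 low-scoring hits in "random" order, with the good hit far down.
hits = [(f"track {i}", 1) for i in range(600)]
hits.insert(550, ("99 Luftballons", 10))

buggy = top_n_buggy(hits, 500)
fixed = top_n_fixed(hits, 500)

print(("99 Luftballons", 10) in buggy)  # the good hit was cut off
print(fixed[0])                         # the good hit comes first
```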

When the search is run, it weighs the results based on the
aforementioned columns. If Nena's album had one track called "99
Luftballons", but another album had ten tracks with the EAC version
string in the comment, the latter might outweigh Nena, because the track
title on an album has weight 5, while the ten comments add up to
10x weight 1.
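
As a rough illustration of that arithmetic: the sketch below assumes an
album's score is simply the sum of the weights of all matching fields. The
real ranking is likely more involved; only the weights 5 and 1 are taken
from the explanation above.

```python
TITLE_WEIGHT = 5    # track title on an album
COMMENT_WEIGHT = 1  # comment

def album_score(match_weights):
    # Assumption: the score is the plain sum of the weights of all
    # matching fields on the album.
    return sum(match_weights)

nena = album_score([TITLE_WEIGHT])             # one matching track title
eac_album = album_score([COMMENT_WEIGHT] * 10) # ten matching comments

print(nena, eac_album)
```

Ten weak matches add up to more than one strong match, so the album full
of EAC comments ends up ranked above Nena.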

This is where a stupid decision kicks in: for whatever reason I decided
it was a good idea to put the MusicBrainz IDs in w10. Sure, it's a
unique value for every item. But that also means no other item should
contain it, right? Therefore an ID should always bring up exactly one
track, even if the value is stored in the lowest priority column.

New builds are due out in a bit. Unfortunately my shiny new build system
still isn't installed in a decent place. Therefore I have to upload from
behind this super slow 10Mb connection... So please be patient.

Thanks for an interesting test/edge case! :-)

--

Michael
