Search should return both accented and non-accented matches

heartbleeded · April 29, 2020, 1:54pm

Hi,

Do you think it would be possible to make Mopidy's search fuzzy with respect to accented metadata? If not, where in the code should I look to patch it myself?

For example, let’s say I search for “artist”: [“Bjork”]; I’d like Mopidy to also return songs with “artist”: [“Björk”].

kingosticks · April 29, 2020, 2:16pm

Search terms are passed to the backends which may or may not already do this. For example, Spotify and YouTube do. I expect other streaming providers also do.

I don’t think SQLite’s text search (used by Mopidy-Local) has any support for this type of fuzzy search. So you’d have to hack it to spot these characters in the search terms and then perform two searches, one for the original word and one with all the accented characters replaced with their non-accented versions, and then merge the results. This sounds like a mess to me. Alternatively, patch Mopidy-Local to replace all accented characters in its search table, and also hack it to replace all accented characters in the search queries, while still returning the original, unmodified metadata. I’m not sure how feasible that is.
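As a rough illustration of the "replace accented characters with the non-accented versions" step described above, here is a minimal Python sketch using only the standard library. The helper names `strip_accents` and `search_variants` are hypothetical and are not part of Mopidy or Mopidy-Local; the sketch only shows how the two search terms could be derived, not how they would be wired into a backend.

```python
import unicodedata
from typing import List


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents) removed.

    NFD normalization splits a character such as 'ö' into 'o' plus a
    combining diaeresis; the combining mark is then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_variants(term: str) -> List[str]:
    """Hypothetical helper: the terms to search for, original first."""
    folded = strip_accents(term)
    # Only issue a second search when folding actually changed the term.
    return [term] if folded == term else [term, folded]
```

Note that this only covers the query side: searching for "Björk" would also try "Bjork", but searching for "Bjork" would not find "Björk" unless the indexed text is folded in the same way.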

We have no plans to support this in Mopidy itself.

heartbleeded · April 29, 2020, 3:31pm

Thank you for the insight.

One can create a collation extension for SQLite, using regex, so that queries deal with accents. There are many solutions out there, and I found this one very helpful: https://stackoverflow.com/a/47631008
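For readers unfamiliar with custom collations, here is a minimal sketch of the general idea using Python's built-in `sqlite3` module. It is not the code from the linked answer, and it uses Unicode normalization rather than regex. The collation name `NOACCENT`, the `artist` table, and the helper names are assumptions made up for this example.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining marks and case so 'Björk' and 'bjork' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def noaccent_collation(a, b):
    """Collation callback: negative, zero, or positive, like a comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


def find_artists(names, query):
    """Hypothetical demo: accent-insensitive equality on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("NOACCENT", noaccent_collation)
    conn.execute("CREATE TABLE artist (name TEXT)")
    conn.executemany("INSERT INTO artist VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT name FROM artist WHERE name = ? COLLATE NOACCENT ORDER BY rowid",
        (query,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A collation like this affects ordinary comparisons and sorting only. It does not change how a full-text index tokenizes its content, so it would not by itself alter the results of an FTS `MATCH` query.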

Do you know a specific spot in Mopidy-Local that would be a good place to start applying that kind of patch? I think I'll need to maintain my own branch, since there’s no plan to support this in Mopidy itself.

kingosticks · April 29, 2020, 4:50pm

All the database stuff lives in schema.py. Take a look, it’s not a large codebase.

heartbleeded · April 29, 2020, 10:30pm

I think I’ve found a nice and clean solution:

If we use the unicode61 tokenizer for Mopidy-Local's fts (full-text search) table, it attempts to remove diacritics from Latin script characters. See: https://www.sqlite.org/fts3.html#tokenizer

I’ve patched the SQL schema to this:

CREATE VIRTUAL TABLE fts USING fts4(
    uri,
    track_name,
    album,
    artist,
    composer,
    performer,
    albumartist,
    genre,
    track_no,
    date,
    comment,
    tokenize=unicode61 "remove_diacritics=2");

Now it works exactly as I wanted: accented and non-accented queries both return the same results.
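To check the behaviour outside Mopidy, something like the following sketch can be run against an in-memory database. It assumes the SQLite library bundled with your Python was built with FTS4 and the unicode61 tokenizer, uses a cut-down single-column table rather than the real Mopidy-Local schema, and falls back to `remove_diacritics=1` on SQLite versions older than 3.27.0, where the value `2` is not available. The function name `fts_artist_matches` is made up for this example.

```python
import sqlite3


def fts_artist_matches(artists, query):
    """Hypothetical demo: index artists in an FTS4 table and run one MATCH."""
    # remove_diacritics=2 needs SQLite 3.27.0+; older versions only know 0 and 1.
    level = 2 if sqlite3.sqlite_version_info >= (3, 27, 0) else 1
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE fts USING fts4("
        f'artist, tokenize=unicode61 "remove_diacritics={level}")'
    )
    conn.executemany("INSERT INTO fts (artist) VALUES (?)", [(a,) for a in artists])
    # The query string goes through the same tokenizer, so it is folded too.
    rows = conn.execute(
        "SELECT artist FROM fts WHERE artist MATCH ? ORDER BY rowid", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Because the tokenizer folds both the indexed text and the query, the stored metadata is returned unchanged while "Bjork" and "Björk" are treated as the same search term. An existing FTS table keeps its original tokenizer, so a library indexed before the change would presumably need its search table rebuilt.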

What do you think @kingosticks?
