Search returning both accented and non-accented matches

Hi,

Do you think it’s possible to make Mopidy do fuzzy searching on accented metadata? If not, where in the code should I look to patch it?

For example, if I search for “artist”: [“Bjork”], I’d like Mopidy to also return songs with “artist”: [“Björk”].

Search terms are passed to the backends, which may or may not already do this. Spotify and YouTube, for example, do, and I expect other streaming providers do too.

I don’t think SQLite’s text search (used by Mopidy-Local) supports this type of fuzzy search. So you’d have to hack it to spot these characters in the search terms, then perform two searches (one for the original word and one with every accented character replaced by its non-accented equivalent) and merge the results. That sounds like a mess to me. Alternatively, you could patch Mopidy-Local to replace all accented characters in its search table and also in the search queries, while still returning the original, unmodified metadata. I’m not sure how feasible that is.
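The second idea (normalise both the indexed text and the query, but hand back the original metadata) can be sketched in plain Python with the standard `unicodedata` module. This is an in-memory illustration only, not Mopidy-Local code; `strip_accents`, `search` and the sample records are hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose characters (NFKD), then drop the combining marks,
    # e.g. "Björk" -> "Bjork". Letters with no decomposition, such as
    # "ø" or "ß", are left unchanged.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(tracks: list, field: str, query: str) -> list:
    # Normalise both the stored value and the query for the comparison,
    # but return the original, unmodified track records.
    needle = strip_accents(query).casefold()
    return [t for t in tracks if needle in strip_accents(t[field]).casefold()]


# Hypothetical sample records standing in for indexed metadata.
LIBRARY = [
    {"uri": "local:track:1", "artist": "Björk"},
    {"uri": "local:track:2", "artist": "Bjorn Again"},
]

print([t["artist"] for t in search(LIBRARY, "artist", "Bjork")])
```

Because only the comparison keys are normalised, the result still carries “Björk” with its accent, which is the “return the original non-hacked metadata” part.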

We have no plans to support this in Mopidy itself.

Thank you for the insight.

One could create a collation extension for SQLite queries that uses a regex to deal with accents. There are many solutions out there; I found this one very helpful: https://stackoverflow.com/a/47631008
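The linked answer isn’t reproduced here. As a different, simpler illustration of the collation idea, Python’s `sqlite3` module can register a custom collation with `Connection.create_collation`. A minimal sketch, assuming a hypothetical `NOACCENT` collation name and a throwaway `track` table (not Mopidy-Local’s real schema), that compares strings after stripping diacritics:

```python
import sqlite3
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD splits "ö" into "o" + U+0308; dropping combining marks leaves "o".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def noaccent(a: str, b: str) -> int:
    # Collation callbacks return a negative, zero or positive int, like a comparator.
    # Accents and letter case are both ignored here.
    ka = strip_accents(a).casefold()
    kb = strip_accents(b).casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("NOACCENT", noaccent)  # hypothetical collation name
# Throwaway table for illustration; not Mopidy-Local's real schema.
conn.execute("CREATE TABLE track (uri TEXT, artist TEXT)")
conn.executemany(
    "INSERT INTO track (uri, artist) VALUES (?, ?)",
    [("local:track:1", "Björk"), ("local:track:2", "Sigur Rós")],
)


def find_by_artist(name: str) -> list:
    # COLLATE on the column makes "=" use the accent-insensitive comparison.
    rows = conn.execute(
        "SELECT artist FROM track WHERE artist COLLATE NOACCENT = ?", (name,)
    )
    return [artist for (artist,) in rows]


print(find_by_artist("Bjork"))
```

A collation only affects comparisons such as `=` and `ORDER BY`; it would not by itself change how an fts table’s `MATCH` behaves.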

Do you know a specific spot in Mopidy-Local that would be a good place to start applying that kind of patch? I think I’ll need to make my own branch, since there’s no plan to support this in Mopidy itself.

All the database code lives in schema.py. Take a look; it’s not a large codebase.


I think I’ve found a nice and clean solution:

If we use the unicode61 tokenizer for Mopidy-Local’s fts (full-text search) table, it attempts to remove diacritics from Latin script characters. See: https://www.sqlite.org/fts3.html#tokenizer

I’ve patched the SQL schema to this:

```sql
CREATE VIRTUAL TABLE fts USING fts4(
    uri,
    track_name,
    album,
    artist,
    composer,
    performer,
    albumartist,
    genre,
    track_no,
    date,
    comment,
    tokenize=unicode61 "remove_diacritics=2");
```

Now it works exactly as I wanted: accented and non-accented queries both return the same results.
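To check the effect outside Mopidy, here is a small sketch using Python’s `sqlite3` module with two cut-down fts4 tables that differ only in tokenizer: `simple` (the fts4 default) and the `unicode61 "remove_diacritics=2"` tokenizer from the patch. The table names `fts_plain`/`fts_fold`, the two-column layout and the sample rows are hypothetical. It assumes the linked SQLite was built with FTS4 and, per the SQLite FTS docs, is 3.27.0 or newer, which `remove_diacritics=2` requires.

```python
import sqlite3

# Two sample rows standing in for indexed tracks (hypothetical URIs).
ROWS = [("local:track:1", "Björk"), ("local:track:2", "Sigur Rós")]

conn = sqlite3.connect(":memory:")

# Same data in two cut-down fts4 tables that differ only in tokenizer.
TOKENIZERS = {
    "fts_plain": "tokenize=simple",
    "fts_fold": 'tokenize=unicode61 "remove_diacritics=2"',
}
for table, tokenize in TOKENIZERS.items():
    conn.execute(f"CREATE VIRTUAL TABLE {table} USING fts4(uri, artist, {tokenize})")
    conn.executemany(f"INSERT INTO {table}(uri, artist) VALUES (?, ?)", ROWS)


def search_artist(table: str, term: str) -> list:
    # MATCH runs the query term through the table's tokenizer too, so the
    # indexed text and the search term are folded the same way.
    rows = conn.execute(f"SELECT artist FROM {table} WHERE artist MATCH ?", (term,))
    return [artist for (artist,) in rows]


for term in ("Bjork", "Björk", "ros"):
    print(term, search_artist("fts_plain", term), search_artist("fts_fold", term))
```

The stored values come back with their accents intact, since only the tokens are folded. The tokenizer is fixed when a virtual table is created, so an existing database would presumably need its fts table dropped and rebuilt (e.g. by rescanning) before the change applies.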

What do you think, @kingosticks?