Developing a Headless Client for Mopidy

I already mentioned the headless client I am developing in the announcement thread for the beetslocal extension.
Now I would like to ramble a bit about the project to get some new insights, hear about available options, discuss alternative routes, or at least structure my thought and development process.

Firstly, what is it good for?

If you feel that at times starting up a client, browsing and searching for music is just too inconvenient.
If you feel existing random play solutions are too simplistic.
If you do not (want to) always sit in front of a screen when listening to music.
If you feel different moods call for different music.
If you don’t want to hear the same stuff over and over again.
If you wish for a learning system to automagically play the right music at the right time.
Maybe the headless client is something for you.
Or maybe my requirements are very special.

After using various interfaces to operate my music systems, I found the good old remote control to be the most convenient option for me. The X10 Medion RF Remote Control is quite cheap and can be operated through a (thin) wall. I have a couple of these distributed at convenient places (at the entrance, at the kitchen table, next to the bed, etc.). I have a set of speakers in every room. They are operated by a central Mopidy server. When leaving the house, a single button stops the playback in all rooms. On returning home, a single button resumes playback, but I might be in a different mood, so the type of music should change.

Some of this is quite simple; other aspects become more complicated.

Secondly, design principles:

Ease of use is key.
Fewer buttons, more intelligence in the background.
Based around “more of this” and “skip that” requests.
Low system requirements (should run on an ARM platform).

Features and Implementation:

LIRC and irexec are used for translating button presses to actions.
For every button on the remote there is a corresponding shell script in the bin path.
This makes it easy to adapt a button's action.

Basic Operations

  • play
  • pause
  • stop
  • next (skip)
  • previous
  • jump forward
  • jump back

These are handled via calls to mpc, using Mopidy's MPD interface.

The more advanced actions are handled via HTTP requests using curl.

  • play random track (out of a selection defined by a query construct)
    example:
    curl --data '{"query": {"artist": ["Super Flu"]}, "uris": ["spotify:", "beetslocal:"], "options": {"track": 5, "album": 0}}' http://192.168.0.2:6688/randomplay/

  • play random album (out of a selection defined by a query construct)

    curl --data '{"query": {"artist": ["Solomun"]}, "options": {"track": 0, "album": 1}}' http://192.168.0.2:6688/randomplay/

  • play more_of_(album|artist|genre)

    curl http://192.168.0.2:6688/randomplay/?cmd=more_of_artist

  • find related artists on Last.fm

    todo

  • toggle continuous random play (radio mode)

    In continuous mode, when fewer than 2 tracks remain to be played, more_of_(genre|artist|album) or related_artist is called (see the sketch after this list). Continuous mode is a state that is stored in an SQLite table.

  • like a track

    Every like request increases the rating, up to 5 stars (a single-button action compatible with the usual 5-star rating). A liked track and its rating are written to an SQLite table.

  • track info (spoken)

    On request the current track's attributes are spoken. (todo)
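
Sketched roughly, the continuous mode check looks like this. Details are simplified: self.core is the Mopidy core actor proxy (assuming the newer getter API), and get_state() and more_of() are helpers of my extension, not Mopidy API.

# Rough sketch of the continuous-mode check, run from a Mopidy event listener.
def track_playback_started(self, tl_track):
    if self.library.get_state('continuous') != u'1':
        return
    tl_tracks = self.core.tracklist.get_tl_tracks().get()
    remaining = len(tl_tracks) - tl_tracks.index(tl_track) - 1
    if remaining < 2:
        self.more_of('related_artist')  # or more_of_(genre|artist|album)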

Internal Operations

  • maintain play count

    On track_playback_started, play count and last_played are written to an SQLite table (see the sketch after this list).
    Works for all backends and all clients.

  • register skips, maintain skip count

    work in progress

  • rating, play count, last_played and skip count define the likelihood of a track being chosen

    todo

  • save remaining tracklist and play on mopidy restart

    todo
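
A minimal sketch of the play count bookkeeping; the table name and schema here are just illustrative, not something fixed by Mopidy.

import sqlite3
import time

# Increment play_count and update last_played for a URI.
def register_play(db_path, uri):
    now = int(time.time())
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS play_history "
            "(uri TEXT PRIMARY KEY, play_count INTEGER DEFAULT 0, last_played INTEGER)")
        conn.execute(
            "INSERT OR IGNORE INTO play_history (uri, play_count, last_played) "
            "VALUES (?, 0, ?)", (uri, now))
        conn.execute(
            "UPDATE play_history SET play_count = play_count + 1, last_played = ? "
            "WHERE uri = ?", (now, uri))
    conn.close()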

Current Status

Proof of concept is operational.
Some small features have a big impact on real-life usage, while others have surprisingly little effect.
Despite all the remaining shortcomings, it is quite a pleasant experience already.

The Last.fm related artist feature is now working. The result is quite useful, except when Spotify decides to throw in a little surprise (https://github.com/mopidy/mopidy-spotify/issues/11).

I find the effect of related_artist to be better than more_of_genre; I can see this becoming the new default for continuous play.

A new issue has come up: when playback is stopped, there is no current_track available to base a related play on.
I am already writing title and URI along with play count and last_played.
Unfortunately it is not possible to make a reverse lookup on a URI.
So I came up with these options:

  1. store a MusicBrainz ID and get track, album and artist through the MusicBrainz API.
  2. store album and artist along with the track.
  3. play the URI, use current_track and go to the next track.

I don't really like any of them.
Option 1 introduces another dependency, overhead and lookup time.
Option 2 means I am building yet another repository that is more or less identical to the local SQLite extension.
Option 3 feels like a hack, but if the system is fast enough it might be tolerable.

Not sure if I understand you correctly, but won’t core.library.lookup() give you what you need?

It does indeed. That is exactly what I was looking for.
Seems rather obvious now :flushed:
Thank you, this helped a lot.
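
For reference, this is roughly what it boils down to. On the Mopidy version I am on, lookup(uri) returns a list of tracks; newer versions take a list of URIs and return a dict instead.

# Resolve a stored URI back to a full Track via core.library.lookup().
tracks = self.core.library.lookup(uri).get()
if tracks:
    track = tracks[0]
    logger.info("Resolved %s by %s", track.name,
                ', '.join(artist.name for artist in track.artists))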

Text-to-speech is almost there now. The first words have been played.
There are a few issues though.

I use no software mixer and digital passthrough, to have bit-perfect output and to be able to play DTS streams. The price of this is that for an info announcement I need to stop the player, make the announcement, restart the player, and jump back to the last position. Or is there a better way to do this?
This is what I have so far:

def info(self):
    # Announce the current track, then resume where we left off.
    current_track = self.playback.current_track.get()
    logger.info("Current Track %s" % current_track.name)
    pos = self.playback.get_time_position().get()
    self.playback.stop()
    self.track_info.speak_info(current_track)
    self.playback.play()
    # Without a short pause the seek does not succeed (see below).
    sleep(0.2)
    self.playback.seek(pos)

Without the sleep the seek did not succeed, but there probably is a better way to do this.
I am still trying to get my head around GStreamer.
Therefore my code in speak_info is still very experimental:

def speak_info(self, track):
    self.player.set_state(gst.STATE_NULL)
    params = {}
    params['tl'] = 'en'
    params['q'] = self.track_info(track)
    music_stream_uri = 'http://translate.google.com/translate_tts?' \
                       + urllib.urlencode(params)
    self.player.set_property('uri', music_stream_uri)
    output = gst.parse_bin_from_description(self.config['audio']['output'],
                                            ghost_unconnected_pads=True)
    self.player.set_property('audio-sink', output)
    bus = self.player.get_bus()
    self.player.set_state(gst.STATE_PLAYING)
    # Crude wait: poll the bus for ~5 seconds instead of detecting end of stream.
    i = 0
    while i < 50:
        logger.debug(i)
        #import pdb; pdb.set_trace()
        msg = bus.peek()
        sleep(0.1)
        logger.debug(msg)
        i += 1
    self.player.set_state(gst.STATE_NULL)

In the while loop I am still trying to find out how to detect that the end of the stream has been reached, and then use that to break the loop.
So far I am getting the same message at every iteration. I probably have to unref the message somehow, or use pop() or pop_filtered().
Or maybe I am on the wrong track altogether.

Another issue is that the voice is rather fast and high-pitched. I wonder if I could use GStreamer to change that. If that fails, there is still espeak.

Hi.
I have also made something similar to this TTS (I am also using Google Translate).
I do not know if the way I do it is the correct one, but I do not have any problems, even without stopping playback.
You can have a look here.

To know when the end of stream has been reached I used this code:

bus = self.player.get_bus()
bus.enable_sync_message_emission()
bus.add_signal_watch()
bus.connect('message::eos', self.end_of_stream)

This calls method end_of_stream().
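
For completeness, the callback can be as small as this; just a sketch, and stopping the pipeline there is only one option. The 'message::eos' signal hands the bus and the message to the handler.

def end_of_stream(self, bus, message):
    # Tear down the TTS pipeline once the announcement has finished playing.
    self.player.set_state(gst.STATE_NULL)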

Hope it helps!

Thank you, I had seen and bookmarked your extension already.
It has been a great inspiration and helped get me started.

The difference though is that, for the reasons mentioned above, I do not use software mixers.
Your solution of reducing the volume of the song and speaking over it sure sounds nicer, yet it requires a mixer. Without a mixer I do need to stop and restart the player.

I also noticed you use the default audio output. You might consider setting the GStreamer output to config['audio']['output'].

After some source code reading and experiments I made another little step.

1.) I added

import gobject
import pygst
pygst.require('0.10')

Not sure if this is needed.

2.) I switched to using bus.pop()

def speak_info(self, track):
    self.pipeline.set_state(gst.STATE_NULL)
    params = {}
    params['tl'] = 'en'
    params['q'] = self.track_info(track)
    music_stream_uri = 'http://translate.google.com/translate_tts?' \
                       + urllib.urlencode(params)
    self.pipeline.set_property('uri', music_stream_uri)
    self.pipeline.set_property('flags', PLAYBIN_FLAGS)
    # Can't claim that I know what I am doing here (yet),
    # copied from mopidy.audio.actor.
    self.pipeline.set_property('buffer-size', 2*1024*1024)
    self.pipeline.set_property('buffer-duration', 2*gst.SECOND)
    output = gst.parse_bin_from_description(self.config['audio']['output'],
                                            ghost_unconnected_pads=True)
    self.pipeline.set_property('audio-sink', output)
    bus = self.pipeline.get_bus()
    self.pipeline.set_state(gst.STATE_PLAYING)
    while 42:
        #import pdb; pdb.set_trace()
        msg = bus.pop()
        if msg:
            logger.debug(msg)
            if msg.type == gst.MESSAGE_EOS:
                break
        else:
            # Avoid busy-waiting while no message is pending.
            sleep(0.1)
    self.pipeline.set_state(gst.STATE_NULL)

bus.pop_filtered(gst.MESSAGE_EOS) probably would be a bit shorter, but for now I like to see the GStreamer messages. I still have a chipmunk voice issue to solve.
I am pondering using a software mixer just for the text announcements.

Thanks for the tip. I will look into the audio output.

I managed to get a more pleasant pitch and tempo:

self.pipeline = gst.Pipeline()
self.playbin = gst.element_factory_make("playbin2", "tts_pipeline")
self.playbin.set_property('flags', PLAYBIN_FLAGS)
self.playbin.set_property('buffer-size', 2*1024*1024)
self.playbin.set_property('buffer-duration', 2*gst.SECOND)
# self.playbin.set_property('delay', 50*gst.MSECOND)
self.pipeline.add(self.playbin)
# Custom sink bin: pitch/tempo correction -> audioconvert -> configured output.
bin = gst.Bin("speed-bin")
pitch = gst.element_factory_make("pitch", "tts_pitch")
pitch.set_property("tempo", 0.7)
pitch.set_property("pitch", 0.8)
bin.add(pitch)
audiosink = gst.parse_bin_from_description(self.config['audio']['output'],
                                           ghost_unconnected_pads=True)
bin.add(audiosink)
convert = gst.element_factory_make("audioconvert", "tts_convert")
bin.add(convert)
gst.element_link_many(pitch, convert)
gst.element_link_many(convert, audiosink)
# Expose the pitch element's sink pad so playbin2 can link to the bin.
sink_pad = gst.GhostPad("sink", pitch.get_pad("sink"))
bin.add_pad(sink_pad)
self.playbin.set_property('audio-sink', bin)

I suspect a sample rate mismatch between my audio device (set to 44100 Hz) and the stream (32000 Hz) to be at the core of the issue. Until I find out how to deal with that, this solution not only fixes the issue but also gives me some additional flexibility.
There are still huge areas of GStreamer to be explored.

Have you looked at using an external TTS program? I'd be a bit wary of getting so dependent on the GStreamer 0.10 API, which will, at some point, be replaced. Something maybe even like Jasper? I mention that as it also apparently supports voice commands, which could be useful in your project.

Thanks for pointing me to Jasper. Very interesting project.
Yet my problem is not so much the TTS program as the playback.
In Jasper they also use the Google TTS API (among other TTS engines like espeak and festival), save the result to a file and play it with aplay on the default device.
When I play the downloaded gtts.mp3

mplayer -ao alsa:device=isq_digital gtts.mp3

I hear a high-pitched chipmunk voice. This is, as I suspected, because my directly linked ALSA audio device expects a 44100 Hz sample rate.

mplayer  -af resample=44100  -ao alsa:device=isq_digital gtts.mp3

After resampling it sounds right. I still have not found out how to achieve this in GStreamer.
I can't connect a caps filter to playbin2, and uridecodebin does not work for me.
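
The direction I still want to try is doing the resampling inside the audio-sink bin instead of on playbin2 itself; untested, roughly like this:

# Untested idea: force 44100 Hz inside the audio-sink bin instead of on playbin2.
desc = ('audioconvert ! audioresample ! audio/x-raw-int,rate=44100 ! %s'
        % self.config['audio']['output'])
sink = gst.parse_bin_from_description(desc, ghost_unconnected_pads=True)
self.playbin.set_property('audio-sink', sink)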

So I am calling mplayer in a subprocess for now. This also takes care of waiting for the end of the stream, which makes it really easy.
I would prefer using GStreamer though, as it is less dependent on external programs and the device configuration would be more consistent.
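
The fallback is essentially just a blocking call (the device name is from my ALSA setup above; -really-quiet only silences mplayer's console output):

import subprocess

# mplayer blocks until playback finishes, so this also waits for the end of stream.
subprocess.call(['mplayer', '-really-quiet', '-af', 'resample=44100',
                 '-ao', 'alsa:device=isq_digital', 'gtts.mp3'])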

Voice commands are definitely something I will look into. I will need to drill more holes in the wall to get a mic next to my bed :wink:

There's also the option of a Bluetooth headset. You could even pin it to your shirt and have it so you tap it before you speak. Hopefully you see where I'm going with this…!

Yeah, basically I translate buttons on a remote into HTTP requests.
Jasper can do the same for spoken words.
ARM devices (Raspberry Pi etc.) distributed around the house could act as a Jasper interface and a remote range extender at the same time.
So I don't need to drill those holes after all. :sweat_smile:

For this to become a viable option, Jasper needs to allow for more than one-word commands and work reliably without a constantly open mic going to Google first.

Seems I overlooked this rather interesting page http://jasperproject.github.io/documentation/usage/#spotify-mode

Yes, this does look interesting. Unfortunately I am already spoiled by the endless options Mopidy offers. A simple player for Spotify playlists just doesn't cut it anymore. :wink:

While direct voice input sure is the most spectacular option, I suspect a couple of remotes will still be more practical in everyday usage, especially considering the many different languages in band names and song titles. Nevertheless, I am already on the lookout for a USB mic.

By now I have more functions implemented than there are sensible buttons that different remotes have in common. So I came up with a navigation concept based on the up, down, right, left, ok and number buttons.
Every command, except for the obvious fixed-button commands (play, stop, pause, next, previous etc.), can be reached via a two-level cross navigation.
Up and down will navigate through the main menu items like:
1 tracklist management
2 Rating
3 Settings
4 Info
…etc
Each item will be announced together with its numerical shortcut. The ok button activates the selection.
Right and left will navigate the second-level menu:
(1) 1 more_of_album
(1) 2 more_of_artist
(1) 3 more_of_genre
(1) 4 related_artist

Once a selection is made, the selected action is executed. Just pressing the ok button will redo the action. All actions can be accessed either by a single number inside the selected main menu or by pressing a double digit following the -/-- button.
The 0 button will announce the active selection.
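
Roughly, this is how I picture the navigation state; the menu labels and action names below are only placeholders, the real ones live in my extension.

# Sketch of the two-level menu navigation driven by remote buttons.
MENU = [
    ('tracklist management', ['more_of_album', 'more_of_artist',
                              'more_of_genre', 'related_artist']),
    ('rating', ['like']),
    ('settings', ['toggle_continuous', 'set_debugger']),
    ('info', ['track_info']),
]

class Navigator(object):
    def __init__(self, speak, execute):
        self.speak = speak      # tts callback
        self.execute = execute  # runs an action by name
        self.main = 0
        self.sub = 0

    def up(self):
        self._select_main(self.main - 1)

    def down(self):
        self._select_main(self.main + 1)

    def left(self):
        self.sub = (self.sub - 1) % len(MENU[self.main][1])
        self.speak(MENU[self.main][1][self.sub])

    def right(self):
        self.sub = (self.sub + 1) % len(MENU[self.main][1])
        self.speak(MENU[self.main][1][self.sub])

    def ok(self):
        self.execute(MENU[self.main][1][self.sub])

    def announce(self):  # the 0 button
        self.speak(MENU[self.main][1][self.sub])

    def _select_main(self, index):
        self.main = index % len(MENU)
        self.sub = 0
        self.speak('%d %s' % (self.main + 1, MENU[self.main][0]))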

This of course relies on TTS feedback and saved states, which luckily I already have in place.
I am using pico2wave and mplayer now; 80% less code than Google TTS and GStreamer.
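
The whole TTS path now boils down to something like this (the language and the ALSA device name are from my setup):

import subprocess
import tempfile

# pico2wave renders the text to a wav file, mplayer plays it and blocks until done.
def speak(text):
    with tempfile.NamedTemporaryFile(suffix='.wav') as wav:
        subprocess.call(['pico2wave', '-l', 'en-US', '-w', wav.name, text])
        subprocess.call(['mplayer', '-really-quiet',
                         '-ao', 'alsa:device=isq_digital', wav.name])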

I have introduced a debugger state that can be set dynamically at runtime.
For example:

curl http://192.168.0.2:6688/randomplay/\?cmd\=set_debugger\&arg\=track_playback_started

sets the Python debugger for the method track_playback_started in the listener module.

and there the debugger is then started:

if self.library.get_state('debugger') == u'track_playback_started':
    import pdb; pdb.set_trace()
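
For context, get_state/set_state are just thin wrappers around a small key/value table; a minimal sketch, where the schema is only illustrative:

import sqlite3

# State table used for flags like 'continuous' and 'debugger'.
def set_state(db_path, key, value):
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
        conn.execute(
            "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)", (key, value))
    conn.close()

def get_state(db_path, key):
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    conn.close()
    return row[0] if row else u'0'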

Setting the debugger state to '0' disables debugging.
This way I don't have to interrupt the music when issues come up that I need to look into.
Sweet.