Convert stereo to mono in mopidy.conf

I’m using Mopidy with Snapcast and a mono speaker. I got it working with the output setting in /etc/mopidy/mopidy.conf as described in the Snapcast setup instructions:

output = audioresample ! audioconvert ! audio/x-raw,rate=48000,channels=2,format=S16LE ! wavenc ! filesink location=/tmp/snapfifo

This works, but since my speaker is mono (connected to only one side of the amplifier), I don’t hear the second audio channel. I’d like to mix the stereo channels into a single mono channel using GStreamer on the Snapcast server before sending it to the client (speaker). Searching the web, I found a suggestion in this thread from 2015 to use this setting:

output=capsfilter caps=audio/x-raw-int,channels=1 ! alsasink

I adapted this for Snapcast’s output sink:

output = audioresample ! audioconvert ! capsfilter caps=audio/x-raw,rate=48000,channels=1,format=S16LE ! wavenc ! filesink location=/tmp/snapfifo

but this makes audio play way too fast on the speaker. No combination of settings I tried (sample rates, format, with or without audioresample and audioconvert, etc.) worked. I am kind of out of my depth here. I also found a GStreamer plugin, interleave, which seems to be able to do this, but I couldn’t get it to work with the other settings. Syslog says: “mopidy[1875]: ERROR [Audio-2] mopidy.audio.actor Failed to create audio output “audioresample ! audioconvert ! audio/x-raw,rate=48000,channels=2,format=S16LE ! interleave ! wavenc ! filesink location=/tmp/snapfifo”: gst_parse_error: could not link audioconvert0 to interleave0, interleave0 can’t handle caps audio/x-raw, rate=(int)48000, channels=(int)2, format=(string)S16LE (3)”.

Does anyone have a working solution for mixing stereo into mono for use with a Snapcast filesink?

I figured out one way: use ALSA on the client connected to the speaker to mix the stereo stream into mono. However, this still sends the full stereo stream over the network unnecessarily, so I’d still like to figure out how to do this on the server (Mopidy) side.
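
For reference, here is a minimal Python sketch of what "mixing stereo into mono" means at the sample level for the S16LE format used above. It simply averages each left/right pair. This is an illustration only: the function name is made up for this example, and it is not necessarily how GStreamer's audioconvert or ALSA perform the downmix internally.

```python
import struct


def downmix_s16le_stereo_to_mono(stereo_bytes: bytes) -> bytes:
    """Average each left/right pair of signed 16-bit little-endian samples.

    Hypothetical helper for illustration; input is interleaved stereo
    (L0, R0, L1, R1, ...), output is mono (M0, M1, ...).
    """
    if len(stereo_bytes) % 4 != 0:
        raise ValueError("expected whole stereo frames (4 bytes each)")
    sample_count = len(stereo_bytes) // 2
    samples = struct.unpack(f"<{sample_count}h", stereo_bytes)
    # Walk the interleaved samples two at a time: even index = left, odd = right.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, sample_count, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


# Three stereo frames in, three mono samples out (half as many bytes).
stereo = struct.pack("<6h", 100, 300, -200, 200, 32767, 32767)
mono = downmix_s16le_stereo_to_mono(stereo)
print(struct.unpack("<3h", mono))
```

The mono result is half the size of the stereo input, which is why whatever consumes the stream has to be told it is now mono.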

Forgive me, as I don’t use Snapcast and I’m not that familiar with how it works, but doesn’t Snapcast need to know you are providing only half the samples?

I’m not an expert either, but as far as I can tell, Snapcast is just the transporter for an audio stream. It sends whatever is written into a buffer on the server to the client, and pipes this into ALSA at the other side. To ALSA, it’s as if the audio were being produced by an application on the same client.

So, as long as ALSA on the client can handle what is produced by Mopidy on the server, I think it should work…

I don’t see how the client would know you’ve sent it something non-standard. If you plug in only one speaker but leave the client expecting stereo format (each speaker gets every other sample) while you send mono format (the speaker should get every sample), it will play back twice as fast.

And Snapcast is pulling data out of some buffer at a given rate in order to provide stereo sound. If you want to send only mono, then surely it needs to know about this.
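
The "twice as fast" arithmetic can be sketched in a few lines of Python. This assumes, as suggested above, that the receiving side keeps consuming data as 48000 Hz, 16-bit, 2-channel audio while the pipeline writes 1-channel audio; whether Snapcast is actually configured that way in this setup is an assumption, and the function names are made up for this example.

```python
def bytes_per_second(rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM data rate for a given sample format."""
    return rate_hz * (bits_per_sample // 8) * channels


def playback_speed(sent: tuple, expected: tuple) -> float:
    """Speed-up factor when data written as `sent` is consumed as `expected`.

    Each tuple is (rate_hz, bits_per_sample, channels).
    """
    return bytes_per_second(*expected) / bytes_per_second(*sent)


sent = (48000, 16, 1)      # what the mono pipeline writes
expected = (48000, 16, 2)  # what the consumer is assumed to read

print(bytes_per_second(*sent))         # 96000
print(bytes_per_second(*expected))     # 192000
print(playback_speed(sent, expected))  # 2.0
```

One second of mono audio is 96000 bytes, but a consumer reading stereo uses up 192000 bytes per second, so it gets through two seconds of audio in one second.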

I guess that mono configuration is in ALSA on the client (and Mopidy on the server?) though, and nothing to do with Snapcast…?

I thought you had to create a snapclient.conf file to tell the client where the stream is, and what the format is, but I can’t see anything about that in the docs now. I haven’t used Snapcast in a year or so. I think it’s changed since I last looked.