Stream Live Audio from a Microphone in Near Real Time in Ubuntu

I have been endeavouring over the past few months to hack my own baby monitor. I initially kind of cheated by using an IP webcam. However, that isn’t nearly as geeky as using a PC and USB webcam (plus, I also wanted night vision, and IP webcams with night vision are not cheap). I got myself a cheap USB webcam from eBay that has six IR lights for night vision and a built-in mic. I’ll post later about sorting out the video feed, which turned out to be relatively easy. Sorting out the live audio feed turned out to be much harder.

My basic set-up is: a PC (I am using a small thin client, so it is low-power and silent) running Ubuntu Lucid, a USB webcam with a built-in mic (with an audio output jack for the mic) and a wired connection to my LAN. I aim to listen to the audio on at least two separate PCs using VLC, so the format of the audio stream wasn’t much of an issue. Getting a real-time stream turned out to be much more difficult than I expected, as the latency with many options was terrible.

First, I needed to determine what the audio input was. As I had only installed a minimal install of Ubuntu, I needed to install ALSA, the Advanced Linux Sound Architecture, via sudo apt-get install alsa-base alsa-utils. Next, I had to set up the mixer levels. Since my PC is headless and I was doing this all via SSH, I used the ncurses-based alsamixer, which lets you set mixer levels from a terminal. Don’t forget to run alsactl store afterwards to save your settings. As a result of doing all this, /dev/dsp now pointed to my microphone input.

Next, I needed a way of streaming the audio over my network. As I am a big fan of FFmpeg, that was my first choice, as it comes with a rather neat little streaming server called FFserver. FFserver works in the following way: you set up various streams using an ffserver.conf file, run the server, and then run FFmpeg and direct FFmpeg’s output to FFserver. Here is the ffserver.conf file that I used.

After lots of trial and error, I eventually got this going using the FFmpeg command: ffmpeg -f oss -i /dev/dsp http://localhost:8090/feed1.ffm, but the latency was terrible, hitting almost 30 seconds. From what I can gather, FFserver doesn’t get nearly as much love as FFmpeg. The FAQ for FFserver freely admits that audio and video will drift out of sync, alongside other issues, so I realised that FFserver was not for me.

Next, I tried using icecast2. Icecast is a streaming music solution, primarily designed to let you stream music over a network, based on WinAmp’s Shoutcast technology, essentially making your own radio station. I used icecast2 in combination with darkice (since you need a program to send the audio to the icecast server, so it has something to stream). Darkice was perfect as it is designed to stream live audio from the audio input. Each program is configured via its own file. Here is my icecast.xml and my darkice.cfg. You’ll have to edit both these files somewhat, to make sure the log file location is correct, for example. Due to the bizarre way that ALSA sometimes works, /dev/dsp doesn’t always work, so in the case of darkice I used hw:0,0 instead. This refers to the same input in a different way: as far as I can tell, it names the first card and the first device on that card (the first being 0, the second being 1, etc.).
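To make the hw:0,0 naming a little more concrete, here is a minimal sketch that splits such a string into its card and device numbers. The helper name parse_hw is hypothetical and only exists for this illustration; it is not part of ALSA or darkice.

```shell
#!/bin/sh
# Split an ALSA device string such as "hw:0,0" into its card and device parts.
# parse_hw is a hypothetical helper name used only for this illustration.
parse_hw() {
    spec=${1#hw:}          # strip the leading "hw:"        -> "0,0"
    card=${spec%%,*}       # text before the first comma    -> card index
    device=${spec#*,}      # text after the first comma     -> device index
    echo "card=$card device=$device"
}

parse_hw "hw:0,0"   # first card, first device
parse_hw "hw:1,0"   # second card, first device
```

If alsa-utils is installed, running arecord -l should list the capture cards and devices on the machine, which is one way to check which numbers apply to a given microphone.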

I would then run the icecast2 server using: icecast2 -b -c ~/icecast.xml and then darkice using: darkice -c ~/darkice.cfg. I had to run the darkice command using sudo, as there were some permission problems with accessing the mic input. This worked much better than FFserver. The delay was now down to about five seconds, but I got constant buffering issues, meaning the audio kept cutting out for ages. Also, over time, the delay would gradually get worse and worse, until it was as bad as the thirty-second delay I got with FFserver. Once again, this wasn’t good enough.

My next effort was a complete hack. From the client PC that I wanted to listen to the audio on, I would run the following: ssh server 'cat /dev/dsp' > /dev/dsp. This would literally copy the input from the microphone on the server (my thin client) to the audio output of my local PC using SSH. Remarkably, it worked as well as using icecast, but once again there were buffering issues, as I was transferring raw uncompressed audio.
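As a rough illustration of why raw audio is heavier than a compressed stream, the bitrate of uncompressed PCM is simply sample rate × bits per sample × channels. The two sample formats below are assumptions chosen for the example; the format actually delivered by /dev/dsp depends on how the device is configured, which this post does not record.

```shell
#!/bin/sh
# Raw PCM bitrate in kbit/s = sample rate (Hz) * bits per sample * channels / 1000.
# pcm_kbps is a hypothetical helper; the formats below are assumed examples.
pcm_kbps() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

echo "8 kHz, 8-bit, mono:       $(pcm_kbps 8000 8 1) kbit/s"
echo "44.1 kHz, 16-bit, stereo: $(pcm_kbps 44100 16 2) kbit/s"
```

For comparison, the FFmpeg command later in this post encodes to MP3 at 32 kbit/s (-ab 32k).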

So, my next option was using VLC. From doing lots of googling, I had read many people recommending it. My issue with VLC is that it is not a lightweight option and seemed somewhat overkill to me.

The huge advantage with VLC is that it acts as both client and server, taking the audio input and streaming it (although it actually does this using the Live555 streaming media module).

VLC also allows you to stream audio using various protocols: http, rtp and rtsp, among others. Once again this wasn’t easy to get working, but after lots and lots and lots of trial and error, I finally came to the following command that worked for me: cvlc -vvv alsa://hw:0,0 --sout '#transcode{acodec=mp2,ab=32}:rtp{dst=,port=1234,sdp=rtsp://}'

This basically takes the microphone input, transcodes it into MP2 audio (despite having FFmpeg installed, VLC would refuse to transcode to MP3) and then streams it via rtp to the address rtsp://

I then simply entered rtsp:// into the network stream input in VLC on the client PC and I got live audio, in near real time!! FINALLY!! The delay was about 1.5 seconds, it required very little bandwidth and seemed to work well… for a while. Sadly, there were three problems with this. First, only one client could connect at a time. Second, the stream would fail after a while, sometimes after five minutes, sometimes after two hours, but it would always fail. Third, it would totally slaughter the CPU. Once again, this was no good for listening out for a crying baby.

Finally, I came to the perfect solution, funnily enough going full circle and using FFmpeg. From my googling of VLC and fixing stream dropouts with rtp, I found out that FFmpeg can stream via rtp natively, with no need to use FFserver. I thought this couldn’t possibly work, but I tried it out using the following command: ffmpeg -f oss -i /dev/dsp -acodec libmp3lame -ab 32k -ac 1 -re -f rtp rtp:// This is similar to the VLC command, except that this time I am converting into MP3 and streaming to the address: rtp:// Once again, all I need to do is enter rtp:// as the streaming source in VLC.

There are a few huge advantages with this method. First, the CPU usage is only about 25% on my naff little 750MHz Transmeta Crusoe. Second, it seems multiple clients can connect at once. Third, it seems rock solid: I have had this command running for three days straight with no problems. Memory usage seems to creep up over time, but that’s about all. I still get roughly 1.5 seconds of latency, and it never gets any worse than that over time. Finally, a solution that works. So now I use the following script, which is run at boot, to stream live audio in real time over my LAN:

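#!/bin/bash
echo "killing old ffmpegs"

# Collect the PIDs of any ffmpeg processes that are already running
PID=$(ps -e | grep ffmpeg | awk '{print $1}')
if [ -n "$PID" ]; then
    # $PID is left unquoted on purpose so several PIDs are passed separately
    echo "killing ffmpeg with PID" $PID
    kill $PID
fi

echo "starting ffmpeg"
ffmpeg -f oss -i /dev/dsp -acodec libmp3lame -ab 32k -ac 1 -re -f rtp rtp:// 2> ~/ffmpeg.log &
# Remember the PID of the background ffmpeg process that was just started
FF=$!

echo "ffmpeg started with PID $FF"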

This simply kills any other FFmpeg processes running and then starts a new FFmpeg process to stream the audio input (I kill old processes, so I can run this script if for some reason the stream fails and always be sure that only one instance of FFmpeg is running at any one time).
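The destination after rtp:// has to be filled in for each network. As a sketch only, the command line could be assembled from a host and port passed as parameters. The function name stream_cmd is hypothetical, 192.0.2.10 is a reserved documentation address rather than the address used in this post, and port 1234 is borrowed from the VLC command above. The function only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Print the ffmpeg streaming command for a given destination host and port.
# stream_cmd is a hypothetical helper name; the host used below is a
# documentation placeholder, not the address from the original setup.
stream_cmd() {
    dest_host=$1
    dest_port=$2
    echo "ffmpeg -f oss -i /dev/dsp -acodec libmp3lame -ab 32k -ac 1 -re -f rtp rtp://${dest_host}:${dest_port}"
}

stream_cmd 192.0.2.10 1234
```

Keeping the address in one place like this would make it easier to reuse the same boot script on a different LAN, assuming the rest of the FFmpeg options stay the same.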

So, it took me about a month to get this far, but I am finally happy. There are other options that I also tried but could never get to work, such as sharing audio inputs using PulseAudio (I got them to share, but it seemed to constantly crash my network), using the Live555MediaServer directly (the one that powers VLC), or MPEG4IP, but the last two were poorly documented and too complicated. I’ll write up the video portion of this next, to show how I hacked my own baby monitor.


13 thoughts on “Stream Live Audio from a Microphone in Near Real Time in Ubuntu”

  1. The IP address of my system is running Windows XP.
    The IP address of the system from which sound is to be streamed from is running Ubuntu 10.01
    The IP address of the wireless router to which the above two systems are connected to is
    The wireless router is in turn connected to the broadband modem

    Now in your ffmpeg argument which of the above IP addresses should be provided in place of rtp:// ?
    And can a port other than 1234 be used?

    Additional info (maybe irrelevant): I am able to view the other system’s desktop remotely using vnc4server and UltraVNCViewer.

    Please help thanks.

  2. This is a great article. Your one month of effort is going to save my one month of effort to do the same (probably even more) … 🙂 thanks a lot for sharing! I was searching a lot to do something very similar – using my old 800MHz Pentium3 system to stream live audio from mic input so that I can listen to it remotely, across the globe.

  3. What about combining ssh and ffmpeg mp3 encoding:

    ssh server 'ffmpeg -f oss -i /dev/dsp -acodec libmp3lame -ab 32k -ac 1 -' | ffplay -

    SSH encryption and a low-bandwidth MP3-encoded stream, achieved at the same time!

    I must say I could not have the rtp solution work on my machine.

  4. Looks promising! I already have webcam-server for video using a cheap mic-less camera and have that feed running in an HTML document as a java script on my web site. I would like to do something similar with the audio on a separate mic, also a cheap one, that just plugs into the motherboard’s built-in sound card via the 1/8″ input jack.

    What I need to know is how to write the html code on a web page like my cam uses to serve the live audio directly from that same web page as the cam so that not only they can watch but also hear. Just so when they open the web page the audio is available and perhaps a simple volume control right in the web page.

    So far Googling has gotten me some close (but not quite close enough) answers and plenty of very vague ones as well, using a number of various keywords. I figure once I install the right app to stream the audio in Ubuntu, I need the appropriate HTML code to make it work streaming straight from the web cam page (no pop-ups, or any other apps opening aside from the browser support add-ons). If you could point me in the right direction, that would be cool.

    Apologies for the long winded request in the comment box.


  5. Hi @prupert,

    Although I have little experience in programming with sound, I am looking into building a simple application on my Ubuntu machine that listens to the audio stream generated by the microphone to recognise a “clap of my hands”. If it does, I can use that to start a service of my choice (music, for example). I define a clap as the difference between the peak sound and the minimum sound in the last two seconds. If this difference is larger than X (will need some testing), it is recognised as a clap.

    In your opinion, which library would be best to get the maximum and minimum amount of sound of the last two seconds?

    All tips are welcome!

  6. Hi Andrew – it’ll likely be wifi that is the issue. RTP streaming is pretty bandwidth intensive, so you won’t get great quality, but it will be real time. If quality is important, upgrade all tech to the latest version of wifi around 802.11c? or go for another streaming approach that is better quality but not real time.
