CarPlay Karaoke: Playing Music Videos and Lyrics Through Cover Art

Can you play videos natively through CarPlay? That is, without jailbreaking your phone or buying expensive hardware. The short answer is, no.

The longer answer is, poorly.

Poorly as in, slice up a video’s audio track, treat each slice’s album art as a video frame, and play it back as a gapless album. How well can this simulate video? What follows is my experimentation with this hack.

Note: Apple restricts video from CarPlay for a reason. Keep your eyes on the road while driving.

From Music Videos to Slideshows

For my first test subject, I used the music video for A-Ha’s 1985 classic Take On Me. For my scalpel, I grabbed FFmpeg, the ultimate open source tool for manipulating audio and video.

I started ambitious, splitting the song’s audio track into 100-millisecond segments. This netted 2,271 MP3s and a theoretical frame rate of 10fps. Each MP3’s cover art was a still generated from its video segment. The MP3s were then loaded onto an Apple Music playlist and synced to CarPlay.
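The arithmetic behind that first attempt, and the kind of FFmpeg call that produces fixed-length slices, can be sketched roughly as below. This is an illustration under assumptions, not the exact commands used: the helper names `segment_count`, `frames_per_second`, and `split_fixed` are hypothetical, and the 227,100 ms duration is simply 2,271 slices times 100 ms, inferred from the figures above rather than measured.

```shell
# Number of fixed-length slices needed to cover a track (rounds up).
segment_count() {
    local duration_ms="$1" segment_ms="$2"
    echo $(( (duration_ms + segment_ms - 1) / segment_ms ))
}

# One still per slice, so the theoretical frame rate is slices per second.
frames_per_second() {
    local segment_ms="$1"
    echo $(( 1000 / segment_ms ))
}

# Hypothetical helper: slice the audio of input "$1" into MP3s of "$2" seconds
# each, written to directory "$3". Defined here but not called.
split_fixed() {
    local input="$1" seconds="$2" outdir="$3"
    mkdir -p "$outdir"
    ffmpeg -i "$input" -vn -c:a libmp3lame \
        -f segment -segment_time "$seconds" \
        "$outdir/%04d.mp3"
}

segment_count 227100 100   # assumed duration: 2,271 slices * 100 ms
frames_per_second 100
```

A call such as `split_fixed take-on-me.mp4 0.1 slices` would attempt the 100-millisecond split. The segment muxer can only cut between encoded audio packets, so slice lengths are approximate at this scale.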

Predictably, this playlist stalled. The Music app had trouble playing even a single MP3.

I bumped up the segments to 1 second. Lo and behold, Apple Music was able to start this playlist. Both the song and music video were recognizable now.

Yet, it struggled. The biggest hiccups were in the transition from one MP3 to the next. Tracks would start too soon, leaving a tiny but perceptible gap of silence, or too late, skipping a beat. Sometimes the player would freeze.

Next I tried 10 seconds. Much improved, but by now the experience was less a video and more a slideshow. And the transitions between tracks still weren’t perfectly smooth. Some tracks also suffered from pops and squelches at their start. This was the case on CarPlay, on iPhone, and on Mac.

Switching from a lossy format like MP3 to a lossless one, ALAC (Apple Lossless Audio Codec), fixed most of this. But it introduced even worse staticky noise at the end of a few tracks.
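For anyone repeating the comparison, the ALAC encode might look something like the sketch below. It is an assumption about how the conversion could be done, not the commands used for this test: `alac_path` and `encode_alac` are hypothetical helper names, and cover art is left out to keep the example small. It relies on FFmpeg's built-in `alac` encoder writing to an `.m4a` file.

```shell
# Derive the .m4a output path for a segment (pure string handling).
alac_path() {
    local src="$1" outdir="$2" base
    base=$(basename "$src")
    echo "$outdir/${base%.*}.m4a"
}

# Hypothetical helper: encode a segment's audio as ALAC, dropping the video
# stream with -vn. Defined here but not called.
encode_alac() {
    local src="$1" outdir="$2"
    mkdir -p "$outdir"
    ffmpeg -y -i "$src" -vn -c:a alac "$(alac_path "$src" "$outdir")"
}

alac_path "temp/mp4/12.mp4" "output"
```

Running `encode_alac temp/mp4/12.mp4 output` would then write `output/12.m4a`.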

Trying to diagnose all this — was it the encoder or the player? — I played the same MP3 and ALAC files through mpv. The MP3s were better, and the ALACs were flawless!

Apple Music, not the encoding process, appears to be the root of the problem. The part I have no control over.

My fears were confirmed when I tested Abbey Road, a gapless album purchased and downloaded from the iTunes Store. During the side 2 medley, I heard those same gaps.

This was about as good as I was gonna get.

Generating Lyrics from Subtitles

With full-motion video out of reach, I pivoted from music videos to lyrics. This is actually a feature I’ve wanted in CarPlay. Who doesn’t want to host their own carpool karaoke?

As a quick test, I sliced up one of those lyric videos with a cheesy Hawaiian sunset background from YouTube. The text was a tad small, but the results were promising.

I took it a step further and generated my own images, keeping the stills from the music videos, overlaid with lyrics plucked from the associated SRT (SubRip subtitle file). SRTs contain timestamps for each line of the lyrics, which I used to mark the segments.
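To make the timestamp step concrete, here is a small sketch that pulls cue start times out of an SRT. The two-cue file and its placeholder text are made up for illustration. The pipeline mirrors the one in the full script further down, with `[0-9]` in place of `\d` so it also works with GNU grep.

```shell
# Hypothetical two-cue SRT with placeholder text, written to a temp file.
srt=$(mktemp)
cat > "$srt" <<'EOF'
1
00:00:05,120 --> 00:00:08,400
first line

2
00:00:09,000 --> 00:00:12,250
second line
EOF

# Keep the HH:MM:SS part of each cue's start time, swap the trailing comma
# for ".999", and join everything into one comma-separated list.
timestamps=$(grep -oE '^[0-9]+:[0-9]+:[0-9]+,' "$srt" | sed -e 's/,/.999/' | paste -sd ',' -)
echo "$timestamps"

rm -f "$srt"
```

Note that the cue's own milliseconds are discarded, so each cut lands at the end of the second in which the lyric line starts.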

Success. But judge for yourself. Here are videos of it in action.

Judy Garland’s Somewhere Over the Rainbow:

And Encanto’s We Don’t Talk About Bruno:

Flexing FFmpeg

None of this would have been possible without FFmpeg. Using only FFmpeg, I was able to:

Burn lyrics to the music video using subtitles from an SRT file
Re-encode the video with new keyframes to force segments at precise frames
Segment the video at timestamps to produce audio segments
Attach metadata to each audio segment including a screenshot from the video as cover art

Here’s the full shell script if you want to try it out. To all the FFmpeg pros out there, please suggest any improvements to the code.


# settings
mp4="take-on-me.mp4"
srt="take-on-me.srt"
title="Take On Me"
artist="A-Ha"
album="Hunting High and Low"
date="1985"

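# get cue start times from srt (HH:MM:SS, milliseconds replaced by .999)
# [0-9] instead of \d so the pattern also works with GNU grep
timestamps=$(grep -oE "^[0-9]+:[0-9]+:[0-9]+," "$srt" | sed -e "s/,/.999/" | paste -sd "," -)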

# hardcode subtitles
mkdir -p temp
ffmpeg -i "$mp4" \
    -vf "subtitles=$srt:force_style='Fontsize=48,Alignment=10,PrimaryColour=&Hffffff&'" \
    -c:a copy \
    temp/subtitles.mp4

# add keyframes for exact segments
ffmpeg -i temp/subtitles.mp4 \
    -force_key_frames "$timestamps" \
    temp/keyframes.mp4

# segment mp4s
mkdir -p temp/mp4
ffmpeg -i temp/keyframes.mp4 \
    -c copy \
    -f segment \
    -segment_times "$timestamps" \
    -segment_start_number 1 \
    -reset_timestamps 1 \
    temp/mp4/%d.mp4

# encode each segment as an mp3 with metadata
# the mapped video stream supplies the cover art
mkdir -p output
files=(temp/mp4/*.mp4)
for ((i=1; i<=${#files[@]}; i++)); do
    ffmpeg -y -i "temp/mp4/$i.mp4" \
        -map 0:a:0 \
        -map 0:v:0 \
        -c:a libmp3lame \
        -metadata title="$title" \
        -metadata artist="$artist" \
        -metadata album="$album" \
        -metadata date="$date" \
        -metadata track="$i/${#files[@]}" \
        output/"$title $(printf '%04d' $i)".mp3
done

# delete temp files
rm -r temp
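
After a run, a quick sanity check is to compare the number of tracks produced with the number of cut points. The sketch below is not part of the script above. `count_cuts`, `expected_tracks`, and `check_output` are hypothetical helpers, and the check assumes every timestamp falls inside the video, so that N cuts yield N + 1 tracks.

```shell
# Count the comma-separated cut points in a timestamp list.
count_cuts() {
    [ -z "$1" ] && { echo 0; return; }
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
    echo $(( commas + 1 ))
}

# N cuts should yield N + 1 tracks (assumes every cut lies inside the video).
expected_tracks() {
    echo $(( $(count_cuts "$1") + 1 ))
}

# Compare the expected track count with the MP3s actually present in a directory.
check_output() {
    want=$(expected_tracks "$1")
    found=$(find "$2" -maxdepth 1 -name '*.mp3' | wc -l)
    found=$(( found ))   # normalize any whitespace padding from wc
    if [ "$found" -eq "$want" ]; then
        echo "ok"
    else
        echo "expected $want tracks, found $found"
    fi
}

expected_tracks "00:00:05.999,00:00:09.999"
```

In the script's context this would be called as `check_output "$timestamps" output` once the encoding loop has finished.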

Next Chapter

Two other avenues worth exploring are audiobooks and podcasts with chapter artwork. Wonder how far they could be pushed.
