Break on Reject

The one option that changed everything for me was --break-on-reject.

I need to download a large number of videos from YouTube every day. I call this need "Project No Thank You Mr. Advertiser", and it involves looping over a list of channels and asking youtube-dl (or in this case the new, improved, and maintained yt-dlp) to download all the videos on each channel for me.

There are problems, though. I won't go into them all because that would be just about as interesting as paint drying, but the primary two are:

  • I only want the latest videos I haven't downloaded yet
  • I want the process to stop trying to find videos once a single video does not meet the above criteria

Getting the latest videos involves doing two things:

  • Using --dateafter today-1day, so basically everything uploaded since yesterday
  • And using --download-archive to store a history of what has been downloaded in the past
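
To make that concrete, here's a minimal sketch of the two flags together. The build_cmd helper and the downloaded.txt file name are just for illustration; it only prints the command, so you can eyeball it before letting it loose on a channel.

```shell
#!/bin/bash
# Hypothetical helper: print the yt-dlp command for one channel
# without running it, so the flags can be checked first.
build_cmd() {
	local url="$1" archive="$2"
	printf 'yt-dlp --dateafter today-1day --download-archive %s %s\n' "$archive" "$url"
}

build_cmd "youtube.com/user/LinusTechTips" "downloaded.txt"
```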

That's that sorted. The second problem was a bit more tricky.

Essentially yt-dlp has to download the metadata for every video and then determine whether it meets my criterion of --dateafter today-1day. The problem is that Linus Tech Tips has about 5,450 videos on the channel at the time of writing, so the utility has to download metadata about every single one of them.

Now we can't stop it downloading the JSON listing of all the videos on a given channel; it has to do this. What we can do is tell it to stop processing any more videos after it finds one that doesn't meet our criteria: --break-on-reject.

Basically, after downloading a list of videos to check, which can take minutes on large channels, --break-on-reject will prevent yt-dlp from downloading the metadata for all 5,450 videos. Instead it stops the moment it finds a video that doesn't meet our needs.
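
If the behaviour is hard to picture, here's a toy simulation of it in plain bash. It never touches yt-dlp or the network: the break_on_reject function, the video ids, and the dates are all made up, and it assumes the list arrives newest first, which is how channel uploads appear to be listed.

```shell
#!/bin/bash
# Simulation only: no network and no yt-dlp involved.
# Input lines are "upload_date id" (YYYYMMDD), newest first.
# Prints the ids that pass the cutoff and stops at the first reject.
break_on_reject() {
	local cutoff="$1" date id
	while read -r date id
	do
		if [ "$date" -lt "$cutoff" ]
		then
			# First video older than the cutoff: give up on the rest.
			echo "stop at ${id}" >&2
			break
		fi
		echo "$id"
	done
}

printf '%s\n' "20240103 vid_c" "20240102 vid_b" "20231231 vid_a" "20231230 vid_z" |
	break_on_reject 20240102
```

Run as-is, it prints vid_c and vid_b, then stops at vid_a without ever looking at vid_z. That is also the catch: if the list were not newest first, anything sitting after the first reject would be missed.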

Perfect.

I've tested a simple script, below, on a selection of large channels and I can download the latest videos in under ten minutes.

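#!/bin/bash

# Each line of channels.txt is "name:url"; split on the first colon only.
while IFS=: read -r name url
do
	# Skip blank lines.
	[ -z "${name}" ] && continue
	mkdir -p "downloads/${name}"
	# One background job per channel; stop at the first video that fails the date filter.
	yt-dlp -f best -ciw -o "%(title)s.%(ext)s" -v "${url}" -P "./downloads/${name}/" --download-archive "downloads/${name}/downloaded.txt" --dateafter today-1day --max-downloads 10 --abort-on-error --break-on-reject &
done < channels.txt

# Wait for every background download before the script exits.
wait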

And the channels, from channels.txt:

russell_brand:youtube.com/channel/UCswH8ovgUp5Bdg-0_JTYFNw
linus_tech_tips:youtube.com/user/LinusTechTips
ravi_sharma:youtube.com/c/PersonalFinancewithRaviSharma
lego:youtube.com/c/LEGO
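
Each line is just name:url. Here's a minimal sketch of splitting one using bash parameter expansion. parse_channel is a hypothetical helper, and the https:// prefix in the example is my own addition to show that only the first colon counts.

```shell
#!/bin/bash
# Split one "name:url" line on the first colon only, so a URL that
# carries its own colon (https://...) survives intact.
# Sets the global variables "name" and "url".
parse_channel() {
	local line="$1"
	name="${line%%:*}"   # everything before the first colon
	url="${line#*:}"     # everything after the first colon
}

parse_channel "lego:https://youtube.com/c/LEGO"
echo "${name} -> ${url}"
```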

I believe this is working as expected, but I will have to keep an eye on this over time and make sure it's not skipping or missing videos.

So now I can finally begin to automate the downloading of videos before I get into the office, generate an HTML listing, and just watch them, instantly, free of any adverts or shit UIs. What's more, the new yt-dlp can even remove sections from videos that contain sponsored content, but I'm forgoing this as I think that's a step too far and it's not the most annoying content ever.

Adverts from KFC, though? They can fuck off.

Michael Crilly

A simple nerd.
Brisbane, Australia