...

METADATA AS A SECOND LANGUAGE


By Rick Bidlack, Wheatstone Development Engineer

There is no universal language for streaming metadata. Getting song titles, ad insertion triggers and other metadata from your studio out to the CDN and onto listener devices requires a translator proficient in a variety of metadata tags, types and formats. Fortunately, we speak metadata. Our development engineer Rick Bidlack explains. 

On one end is the CDN, which takes streams and metadata from the station studio and sends them out to listeners. On the other end is your studio, which includes your automation system, your routing system, and a streaming encoder such as our Wheatstream or Streamblade appliance, which performs all stream provisioning, audio processing, and metadata transformation and sends the result to the CDN.

To hand over the right metadata at the right time and in the right format, our Wheatstream and Streamblade encoders use transform filters written in Lua (see Fig. 1), an embedded scripting language that can parse, manipulate and reformat data based on specific field values, content, or patterns that would be difficult to define with conventional methods. Lua transform filters give us a way to map what’s coming in from the studio to what needs to go out, so that CDNs can pass the metadata on.

IT STARTS HERE

Metadata as Second Language - Figure 1

Fig. 1 shows the components in Wheatstream/Streamblade encoders with the metadata section, and the Lua transform filter in particular, in red.

Artist and song title metadata typically comes from the automation system and is often synced with the music. Metadata is received by the stream encoder on a TCP or UDP socket, and most commonly arrives formatted as XML. What happens after that depends to a large extent on the transport protocol being used by the CDN, the details of which differ because there are no universally accepted standards for handling metadata (See Fig. 2 and Fig. 3).
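As a rough illustration of the first step a transform filter performs, the sketch below parses an XML event and maps it onto a neutral set of fields. It is written in Python rather than Lua purely for illustration, and it is not the filter shipped in the encoders. The tag names (event, category, artist, title, duration) and the sample values are assumptions; every automation system uses its own layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical event layout; real automation systems each use their own tags.
SAMPLE_EVENT = """<event>
  <category>MUS</category>
  <artist>Example Artist</artist>
  <title>Example Song</title>
  <duration>297</duration>
</event>"""


def parse_event(xml_text):
    """Map an incoming XML event onto a neutral dictionary of fields."""
    root = ET.fromstring(xml_text)

    def text_of(tag, default=""):
        # Return the stripped text of a child element, or a default if absent.
        node = root.find(tag)
        if node is not None and node.text:
            return node.text.strip()
        return default

    return {
        "category": text_of("category", "OTHER").upper(),
        "artist": text_of("artist"),
        "title": text_of("title"),
        "duration": int(text_of("duration", "0")),  # seconds
    }


event = parse_event(SAMPLE_EVENT)
```

From a neutral structure like this, a filter can then emit whatever format the CDN's protocol calls for.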

Metadata as Second Language - Figure 2

Fig. 2 shows two metadata events from a TRE server. Top: a 30-second commercial. Bottom: a 297-second song. Not all of the data here is meaningful to anything other than the computer that produced it. Useful data, or what the Lua transform filters are looking for, is outlined in red boxes.

WHAT THE CDN SEES

CDNs use various protocols. Depending on the protocol, metadata is either injected into the audio stream, as with HLS, Triton MRV2, and RTMP, or sent as an independent stream separate from the audio, as with Icecast (see Fig. 4). HLS, Apple’s HTTP Live Streaming adaptive bitrate protocol, is common in contribution networks that feed into CDNs, and many CDNs have also adopted HLS for carrying metadata with audio in the same stream.

Metadata as Second Language - Figure 3

Fig. 3 shows a song event from an Enco system. Note the presence of the “ampersand” HTML entity in the Artist tag – this will need to be replaced with either an actual ampersand character or the equivalent URL encoding ‘%26’ (depending on how this event is transmitted to the CDN) in order to display in the listener’s player as an actual ampersand rather than the more obscure string ‘&amp;’. There are similar HTML entities for all special characters that have syntactical meaning within the transmission format. Dealing with all of the details of character encoding, decoding and transcoding is an important part of the job of the transform filter.
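The entity handling described for Fig. 3 can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the Lua code in the encoders; the function names and the sample artist string are made up for the example.

```python
import html
from urllib.parse import quote


def for_display(raw):
    """Decode HTML entities so '&amp;' becomes a literal '&'."""
    return html.unescape(raw)


def for_url(raw):
    """Decode entities, then percent-encode for use inside a URL query."""
    return quote(html.unescape(raw), safe="")


raw_artist = "Artist A &amp; Artist B"  # made-up sample value
display_artist = for_display(raw_artist)
url_artist = for_url(raw_artist)
```

Which of the two forms is needed depends on the transport: a literal ampersand where the text is carried as a plain string, the ‘%26’ form where it travels inside a URL.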

For example, in HLS, metadata such as artist, title, duration, album, album art, fan club URL, etc., is formatted as ID3v2 tags and inserted into the MPEG-2 TS segments between AAC frames. Metadata involved in switching between program content and ad insertion is commonly written to the manifest file (which is constantly updated with the addition of each new TS segment and ageing-out of the oldest) in the form of SCTE-35 splice points.
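To give a feel for what an ID3v2 tag looks like at the byte level, here is a minimal Python sketch that builds an ID3v2.4 tag with a title (TIT2) and artist (TPE1) text frame. It assumes version 2.4 with UTF-8 text and no header flags, and it only builds the tag; carrying it inside a TS segment is a separate step that is not shown. The helper names and sample values are hypothetical.

```python
def syncsafe(n):
    """Encode an integer as four bytes of 7 bits each (high bit clear)."""
    if not 0 <= n < (1 << 28):
        raise ValueError("value does not fit in a 28-bit syncsafe integer")
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def text_frame(frame_id, text):
    """Build one ID3v2.4 text frame: ID, size, flags, encoding byte, text."""
    body = b"\x03" + text.encode("utf-8")  # 0x03 marks UTF-8 text
    return frame_id.encode("ascii") + syncsafe(len(body)) + b"\x00\x00" + body


def id3_tag(title, artist):
    """Build a complete tag: 10-byte header followed by the frames."""
    frames = text_frame("TIT2", title) + text_frame("TPE1", artist)
    header = b"ID3\x04\x00\x00" + syncsafe(len(frames))  # v2.4.0, no flags
    return header + frames


tag = id3_tag("Example Song", "Example Artist")
```

The syncsafe sizes keep every size byte below 0x80, which is how ID3v2 avoids byte patterns that a decoder could mistake for audio frame sync.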

Meanwhile, RTMP metadata is encoded into a “setDataFrame” message using the Action Message Format (AMF) developed originally for Flash applications (see Fig. 5). (Despite the demise of Flash video, RTMP itself is still in active use for backhaul streams up to the CDN.) Metadata is represented in serialized structures called AMF arrays. The entire message is wrapped in a standard RTMP packet and inserted into the outbound stream along with the audio packets.
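The AMF serialization can be sketched as follows. This Python example assumes AMF0 encoding and the common convention of a "@setDataFrame" string followed by an "onMetaData" string and an ECMA array of properties; the field names and values are made up, and wrapping the payload in an RTMP packet is not shown.

```python
import struct


def amf_string(s):
    """AMF0 string: marker 0x02, 16-bit big-endian length, UTF-8 bytes."""
    data = s.encode("utf-8")
    return b"\x02" + struct.pack(">H", len(data)) + data


def amf_number(x):
    """AMF0 number: marker 0x00 followed by a big-endian 64-bit double."""
    return b"\x00" + struct.pack(">d", float(x))


def amf_ecma_array(props):
    """AMF0 ECMA array: marker 0x08, 32-bit count, name/value pairs, end marker."""
    out = b"\x08" + struct.pack(">I", len(props))
    for name, value in props.items():
        key = name.encode("utf-8")
        out += struct.pack(">H", len(key)) + key  # property name has no type marker
        if isinstance(value, (int, float)):
            out += amf_number(value)
        else:
            out += amf_string(str(value))
    return out + b"\x00\x00\x09"  # empty name plus object-end marker


def metadata_payload(fields):
    """Serialize one metadata event as the body of a data message."""
    return (amf_string("@setDataFrame")
            + amf_string("onMetaData")
            + amf_ecma_array(fields))


payload = metadata_payload(
    {"title": "Example Song", "artist": "Example Artist", "duration": 297}
)
```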

Metadata as Second Language - Figure 4

Fig. 4 shows four examples of HTTP metadata update messages transmitted to Icecast servers. They all start out more or less the same way: the station’s credentials, followed by the DNS address of the CDN’s server, followed by boilerplate signifying a metadata update, along with the mount (the endpoint we are sending our stream to) to which this update pertains. After this is where formats might diverge.
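The common opening of these messages can be sketched in Python as below. The sketch assumes the stock Icecast admin interface (the /admin/metadata path with mode=updinfo, mount and song parameters); the credentials, host name and mount are placeholders, and individual CDNs may expect additional or different parameters after this point.

```python
from urllib.parse import quote


def icecast_update_url(user, password, host, mount, artist, title):
    """Build a metadata update URL: credentials, host, boilerplate, mount, song."""
    credentials = quote(user, safe="") + ":" + quote(password, safe="")
    song = quote(artist + " - " + title, safe="")  # '&' becomes %26, space %20
    return (
        "http://" + credentials + "@" + host
        + "/admin/metadata?mode=updinfo"
        + "&mount=" + quote(mount, safe="/")
        + "&song=" + song
    )


# Placeholder values for illustration only.
url = icecast_update_url(
    "source", "secret", "cdn.example.com:8000", "/live",
    "Artist A & Artist B", "Example Song",
)
```

Percent-encoding the song text matters here because a bare ampersand would otherwise be read as the start of another query parameter.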

A CDN knows when to switch to ad insertion and when to switch back to normal programming based on the category of the metadata itself. Most CDNs expect incoming metadata events to be categorized into three types: songs, ads (spots), and everything else (sweepers, liners, station IDs, PSAs, etc.). Ad insertion begins the moment the CDN receives a COM or spot event and ends when the CDN receives any other event. Because these events drive the switching, metadata must be tightly synced with the audio so that what the CDN is told matches what is actually playing, for spots as well as music.
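The switching rule described above amounts to a small state machine, sketched here in Python. Apart from COM, the category codes (SPOT, MUS, SONG, LIN) are assumptions made for the example; actual codes vary by automation system and CDN.

```python
# Category codes other than COM are hypothetical examples.
SPOT_CATEGORIES = {"COM", "SPOT"}
SONG_CATEGORIES = {"MUS", "SONG"}


def classify(category):
    """Sort an event into one of three types: spot, song, or other."""
    code = category.strip().upper()
    if code in SPOT_CATEGORIES:
        return "spot"
    if code in SONG_CATEGORIES:
        return "song"
    return "other"


def ad_break_transitions(categories):
    """For each event, report whether an ad break starts, ends, or is unchanged."""
    in_break = False
    actions = []
    for category in categories:
        is_spot = classify(category) == "spot"
        if is_spot and not in_break:
            actions.append("start")   # first spot event opens the break
        elif not is_spot and in_break:
            actions.append("end")     # any non-spot event closes it
        else:
            actions.append("none")
        in_break = is_spot
    return actions


actions = ad_break_transitions(["MUS", "COM", "COM", "LIN", "MUS"])
```

Two back-to-back spots produce a single break, and the first non-spot event of any kind closes it.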

Metadata as Second Language - Figure 5

Fig. 5 shows a schematic representation of an RTMP packet carrying metadata for a single event. The labels STRING, PROPERTY, NAME and VALUE are not to be taken literally; they are just human-readable representations of specific byte values in the AMF structure.

There are many ways to customize and create special conditions that can be transmitted to the CDN with the proper signals, as long as both parties agree on what the signals are.

JOB ONE

Job one for our Streamblade and Wheatstream encoders is to hand off as much relevant and useful data as possible to the CDN, whose main function is to serve your stream to thousands or tens of thousands of listeners.

The twin facts that A) your program and all associated metadata pass through the CDN’s servers, and B) the CDN knows who is listening, from what location, and for how long, mean that your CDN provider can offer you a whole suite of add-on services.

A big one is ad insertion or replacement, which is usually geographically based, but could also be tailored to whatever can be deduced about the individual listener’s tastes and habits. Geo-blocking, logging, skimming, catch-up recording and playback, access to additional metadata (e.g. album art, fan club URLs), listener statistics and click-throughs, customized players, royalty tracking, redundant stream failover, transcoding from one format to another – these are some of the services that CDNs typically provide. Thus, the CDN basically controls the distribution of the stream to the listening public. It is the responsibility of stream encoders like our Wheatstream Duo and Streamblade – the origin server to the CDN’s ingest and distribution servers – to make sure that the CDN gets the right data at the right time and in the right format.

Especially with regard to metadata, the stream encoder is the mediator/translator between the automation system and the CDN that can open opportunities for ad revenue and more.

The above is an excerpt from Radio World’s latest ebook Streaming Best Practices.
