Playing in-memory audio streams on Windows 8
A customer I'd been working with recently came to me with a support request for a Windows 8 Store app they were building. They were using the HTML/CSS/JS stack and wanted to play audio streams entirely from memory instead of loading them from a file on the file system or from a network stream. They needed this because their service implemented a custom Digital Rights Management (DRM) system: the audio content was encrypted and had to be decrypted before playback (duh!). They wanted, however, to perform this decryption on the fly during playback instead of writing a decrypted copy of the content to the file system. In this post I talk about a little sample I put together for them showing how you can achieve this on Windows 8. If you prefer to jump directly into the code and take a look at things on your own, then here's where it's at:
Playing media streams from memory
The primary requirement proved to be fairly straightforward to accomplish. Turns out, there already exists an SDK sample showing exactly this. The sample shows how to achieve media playback from memory streams using the Windows.Media.Core.MediaStreamSource object. Briefly, here are the steps:
First you go fetch some metadata from the media stream. In case of audio content, this turns out to be the sample rate, encoding bit rate, duration and number of channels. For file based audio sources, the Windows.Storage.StorageFile object has the ability to extract this information from the file directly via Windows.Storage.StorageFile.Properties.RetrievePropertiesAsync. Here's an example function that accepts a StorageFile object as input and then extracts and returns the said metadata from it.
function loadProps(file) {
    var props = {
        fileName: "",
        sampleRate: 0,
        bitRate: 0,
        channelCount: 0,
        duration: 0
    };
    // save file name
    props.fileName = file.name;
    return file.properties.getMusicPropertiesAsync().then(
        function (musicProps) {
            // save duration
            props.duration = musicProps.duration;
            var encProps = [
                "System.Audio.SampleRate",
                "System.Audio.ChannelCount",
                "System.Audio.EncodingBitrate"
            ];
            return file.properties.retrievePropertiesAsync(encProps);
        }).then(function (encProps) {
            // save encoding properties
            props.sampleRate = encProps["System.Audio.SampleRate"];
            props.bitRate = encProps["System.Audio.EncodingBitrate"];
            props.channelCount = encProps["System.Audio.ChannelCount"];
            return props;
        });
}
Wrap the metadata gathered in step 1 in a Windows.Media.MediaProperties.AudioEncodingProperties object which in turn is then wrapped in a Windows.Media.Core.AudioStreamDescriptor object.
- Use the AudioStreamDescriptor object to initialize a MediaStreamSource instance and set up event handlers for the MediaStreamSource's Starting, SampleRequested and Closed events. As you might imagine, the idea is to respond to these events by handing out audio data to the MediaStreamSource, which then proceeds to play that content.
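To make steps 2 and 3 a little more concrete, here's a rough sketch of how the wiring might look for MP3 content. Treat it as an illustration rather than the code from the sample: createMp3Source and the handlers bag are names I made up for this example, props is assumed to have the shape returned by loadProps, and the duration is assumed to be in milliseconds.

```javascript
// Sketch of steps 2 and 3, assuming MP3 content. "createMp3Source" and
// "handlers" are hypothetical names used only for this illustration.
function createMp3Source(props, handlers) {
    var AudioEncodingProperties =
        Windows.Media.MediaProperties.AudioEncodingProperties;
    var Core = Windows.Media.Core;

    // step 2: wrap the metadata in encoding properties, then a descriptor
    var encoding = AudioEncodingProperties.createMp3(
        props.sampleRate, props.channelCount, props.bitRate);
    var descriptor = new Core.AudioStreamDescriptor(encoding);

    // step 3: create the media stream source and hook up its three events
    var source = new Core.MediaStreamSource(descriptor);
    source.duration = props.duration; // assumed to be in milliseconds
    source.addEventListener("starting", handlers.starting, false);
    source.addEventListener("samplerequested", handlers.sampleRequested, false);
    source.addEventListener("closed", handlers.closed, false);
    return source;
}
```

In an app you would then typically hand the returned object to an audio element, for instance via URL.createObjectURL(source, { oneTimeOnly: true }), and do the real work of supplying audio data inside the sampleRequested handler.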
This is all fine and dandy, but how do we get this to work when the audio content is stored in memory in a Windows.Storage.Streams.InMemoryRandomAccessStream object? The challenge, of course, is in extracting the metadata we need to set up a MediaStreamSource object.
StorageFile can read from arbitrary streams?
As it happens, the StorageFile object has direct support for having it powered by an arbitrary stream (or pretty much anything really). I figured I'd hook up a StorageFile with an InMemoryRandomAccessStream object and have it extract the metadata that I needed. Here's how you connect a StorageFile with data fetched from any arbitrary source, in this case just a string constant. You create a StorageFile object by calling StorageFile.CreateStreamedFileAsync. CreateStreamedFileAsync requires that you pass a reference to a callback routine which is expected to supply the data the StorageFile object needs when it is first accessed. Here's a brief example:
function init() {
    var reader;
    var size = 0;
    Windows.Storage.StorageFile.createStreamedFileAsync(
        "foo.txt", generateData, null).then(
        function (file) {
            // open a stream on the file and read the data;
            // this will cause the StorageFile object to
            // invoke the "generateData" function
            return file.openReadAsync();
        }).then(function (stream) {
            var inputStream = stream.getInputStreamAt(0);
            reader = new Windows.Storage.Streams.DataReader(inputStream);
            size = stream.size;
            return reader.loadAsync(size);
        }).then(function () {
            var str = reader.readString(size);
            console.log(str);
        });
}

function generateData(stream) {
    var writer = new Windows.Storage.Streams.DataWriter();
    writer.writeString("Some arbit random data.");
    var buffer = writer.detachBuffer();
    writer.close();
    stream.writeAsync(buffer).then(function () {
        return stream.flushAsync();
    }).done(function () {
        stream.close();
    });
}
The problem, however, as I ended up discovering, is that StorageFile objects that work off of a stream created in this fashion do not support retrieval of file properties via StorageFile.Properties.RetrievePropertiesAsync or, for that matter, StorageFile.Properties.GetMusicPropertiesAsync. So clearly, this approach is not going to work. Having said that, it's useful to know that this technique is possible at all with StorageFile objects, as it allows you to defer the actual work of producing the data represented by the StorageFile object till it is actually needed. And since it is a bona fide Windows Runtime object, you can confidently pass it around wherever a StorageFile object is accepted. For instance, when implementing a share source contract you might hand out a StorageFile object created in this manner via Windows.ApplicationModel.DataTransfer.DataPackage.SetStorageItems.
Reading music metadata using the Microsoft Media Foundation
After a bit of research I discovered that there is another API that can be used for fetching metadata from media streams (among other things) called the Microsoft Media Foundation. In particular, the API features an object called the source reader that can be used to get the data we are after. The trouble, though, is that this is a COM based API and therefore cannot be directly invoked from JavaScript. I decided to write a little wrapper Windows Runtime component in C++ and then use that from the JS app. After non-trivial help from my colleague Chris Guzak and others directly from the Media Foundation team at Microsoft (perks of working for Microsoft I guess!) we managed to put together a small component that allows us to read the required metadata from an InMemoryRandomAccessStream object. Here's the relevant snippet that does the main job (I've stripped out all the error handling code to de-clutter it):
MFAttributesHelper(InMemoryRandomAccessStream^ stream, String^ mimeType)
{
    MFStartup(MF_VERSION);

    // create an IMFByteStream from "stream"
    ComPtr<IMFByteStream> byteStream;
    MFCreateMFByteStreamOnStreamEx(
        reinterpret_cast<IUnknown*>(stream),
        &byteStream);

    // assign mime type to the attributes on this byte stream
    ComPtr<IMFAttributes> attributes;
    byteStream.As(&attributes);
    attributes->SetString(
        MF_BYTESTREAM_CONTENT_TYPE,
        mimeType->Data());

    // create a source reader from the byte stream
    ComPtr<IMFSourceReader> sourceReader;
    MFCreateSourceReaderFromByteStream(
        byteStream.Get(),
        nullptr,
        &sourceReader);

    // get current media type
    ComPtr<IMFMediaType> mediaType;
    sourceReader->GetCurrentMediaType(
        MF_SOURCE_READER_FIRST_AUDIO_STREAM,
        &mediaType);

    // get all the data we're looking for
    PROPVARIANT prop;
    sourceReader->GetPresentationAttribute(
        MF_SOURCE_READER_MEDIASOURCE,
        MF_PD_DURATION,
        &prop);
    Duration = prop.uhVal.QuadPart;

    UINT32 data;
    sourceReader->GetPresentationAttribute(
        MF_SOURCE_READER_MEDIASOURCE,
        MF_PD_AUDIO_ENCODING_BITRATE,
        &prop);
    BitRate = prop.ulVal;

    mediaType->GetUINT32(
        MF_MT_AUDIO_SAMPLES_PER_SECOND,
        &data);
    SampleRate = data;

    mediaType->GetUINT32(
        MF_MT_AUDIO_NUM_CHANNELS,
        &data);
    ChannelCount = data;
}
This is the implementation of the constructor on the MFAttributesHelper ref class. As you can tell, the constructor accepts a reference to an instance of an InMemoryRandomAccessStream object and the MIME type of the content in question and proceeds to extract the duration, encoding bitrate, sample rate and channel count from it. It does this by first creating an IMFByteStream object via the convenient MFCreateMFByteStreamOnStreamEx function which basically wraps an IRandomAccessStream object (which InMemoryRandomAccessStream implements) and returns an IMFByteStream instance. The object returned by MFCreateMFByteStreamOnStreamEx also implements IMFAttributes which we then QueryInterface for (via ComPtr::As) and assign the MIME type value to it. Next we instantiate an object that implements IMFSourceReader via MFCreateSourceReaderFromByteStream and use that instance to fetch the duration and encoding bitrate values via the GetPresentationAttribute method. And finally, we retrieve an object that implements the IMFMediaType interface via IMFSourceReader::GetCurrentMediaType and use that object to fetch the sample rate and the channel count values. Once you know how to do all this, it seems quite trivial of course but getting here, believe me, took some doing!
Now that we have this component, reading the metadata from JavaScript proves to be fairly straightforward. Here's an example. In the code below, memoryStream is an InMemoryRandomAccessStream instance and mimeType is a string with the MIME type of the content:
var helper = MFUtils.MFAttributesHelper.create(memoryStream, mimeType);
// now, helper's sampleRate, bitRate, duration and channelCount
// properties contain the data we are looking for
Now, with the metadata handy, we simply follow the steps outlined earlier in this post to commence playback. As mentioned before, the sample is hosted on GitHub here:
For the sake of the sample, I took a plain MP3 file and applied an XOR cipher to it, then loaded it up and played it back from memory, applying another XOR transform on the bits before playback. It all works rather well together and again, hat-tip to Chris Guzak for all his help in whittling the WinRT component down to its essence and really cleaning up its interface!
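In case the XOR trick is unfamiliar, here's a minimal sketch of the idea. It is not lifted from the sample: xorTransform and the two-byte key are made up for illustration, and the actual sample may well use a different key scheme. The point is simply that XOR-ing with the same key twice gives you back the original bytes, which is what makes it a handy stand-in for a real DRM scheme.

```javascript
// XOR every byte of "bytes" (a Uint8Array) with a repeating key.
// Applying the same key a second time restores the original data.
// "xorTransform" and the key below are hypothetical, for illustration only.
function xorTransform(bytes, key) {
    var out = new Uint8Array(bytes.length);
    for (var i = 0; i < bytes.length; i++) {
        out[i] = bytes[i] ^ key[i % key.length];
    }
    return out;
}

var key = new Uint8Array([0x5a, 0xc3]);          // made-up key
var plain = new Uint8Array([0x49, 0x44, 0x33]);  // the bytes of "ID3"
var encrypted = xorTransform(plain, key);        // what you'd store
var decrypted = xorTransform(encrypted, key);    // what you'd hand to playback
```

During playback you would run the same transform over each chunk of bytes before handing it out as a media sample, taking care to keep the key position aligned with the byte offset of the chunk in the stream.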