Integrate Microsoft AI in Sitecore media library

Sitecore’s media library does a good job of keeping the media that you want to use on your site, but on a large site the amount of media can become hard to navigate. Editors then rely on search to find what they are looking for. Wouldn’t it be nice to be able to search for the contents of the images instead of the metadata?

Microsoft AI

Microsoft AI to the rescue! The idea is to let Microsoft AI analyze the images in the Sitecore media library and store whatever the analyzer finds on the media items themselves. At this stage I will focus on getting something up and running. I am aware that hooking into the item save event might not be the best idea, but it does the trick for this blog post.

So where to start? First of all you will need an Azure account if you don’t already have one. You will also need to sign up for Microsoft AI. The computer-vision part of Microsoft AI can be found here. Sign up for a free trial to get started, and be sure to note the key and the region URL, because you will need those later. When the sign-up is done, install the API in your solution from NuGet; the API and installation instructions can be found here.

The code

Now let’s get started. The actual code to analyze an image is pretty simple: you just send the image via an HttpClient and get a JSON response back. Hopefully. I created an ImageAnalyzingService class that does the job at hand. The subscription key and the region URL are added to the Constants file.

    using System.Net.Http;
    using System.Net.Http.Headers;

    public class ImageAnalyzingService
    {
        public ImageAnalysisResponse MakeAnalysisRequest(byte[] byteData)
        {
            HttpClient client = new HttpClient();
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", Constants.SubscriptionKey);
            string requestParameters = "visualFeatures=Categories,Description,Color&language=en";
            string uri = Constants.UriBase + "?" + requestParameters;

            HttpResponseMessage response;
            using (ByteArrayContent content = new ByteArrayContent(byteData))
            {
                content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

                // .Result blocks until the request and the body read have completed.
                response = client.PostAsync(uri, content).Result;
                string contentString = response.Content.ReadAsStringAsync().Result;

                return ImageAnalysisResponseRepository.Get(contentString);
            }
        }
    }
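Since the service blocks on `.Result`, a non-blocking variant may be worth having once the call moves off the save event. The following is only a sketch under a few assumptions: the class name `AsyncImageAnalysisSketch` and its method names are hypothetical, the URI base and key are passed in instead of being read from the Constants file, and, unlike the service above, it throws on a non-success status code.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original solution.
public static class AsyncImageAnalysisSketch
{
    // A single shared HttpClient is reused for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Appends the same query string that the synchronous service uses.
    public static string BuildUri(string uriBase)
    {
        return uriBase + "?visualFeatures=Categories,Description,Color&language=en";
    }

    // Posts the image bytes and returns the raw JSON response body.
    public static async Task<string> AnalyzeAsync(string uriBase, string subscriptionKey, byte[] imageBytes)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, BuildUri(uriBase)))
        {
            // The key is set per request so the shared client carries no state.
            request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            request.Content = new ByteArrayContent(imageBytes);
            request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

            using (HttpResponseMessage response = await Client.SendAsync(request).ConfigureAwait(false))
            {
                // Assumption: failing loudly is preferable to parsing an error body.
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            }
        }
    }
}
```

The returned string can be handed to the same repository call that the synchronous version uses.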

The service should probably be called from an agent that handles the image analysis, and the call should probably be asynchronous, but to keep things simple we will wait for the response. For now, as mentioned earlier, the code runs on the save event of the item. The following handler will be hooked up.

    using System;
    using System.Linq;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;
    using Sitecore.Events;
    using Sitecore.SecurityModel;

    public class ImageRecognitionHandler
    {
        private ImageAnalyzingService _imageAnalyzingService;

        // Creates the service lazily on first use.
        private ImageAnalyzingService ImageAnalyzingService
        {
            get
            {
                if (_imageAnalyzingService == null)
                    _imageAnalyzingService = new ImageAnalyzingService();
                return _imageAnalyzingService;
            }
        }

        public void OnItemSaved(object sender, EventArgs args)
        {
            if (args == null)
                return;

            Item item = Event.ExtractParameter(args, 0) as Item;
            if (item == null)
                return;

            if (!item.Paths.IsMediaItem)
                return;

            // Only items that inherit the image recognition template are analyzed.
            var itemTemplate = TemplateManager.GetTemplate(item);
            if (!itemTemplate.GetBaseTemplates().Any(baseTemplate => baseTemplate.ID.Equals(Constants.ImageRecognition.TemplateId)))
                return;

            var mediaItem = new MediaItem(item);
            if (mediaItem.Size <= 0)
                return;

            var mediaStream = mediaItem.GetMediaStream();
            byte[] buffer = new byte[(int)mediaStream.Length];
            mediaStream.Read(buffer, 0, (int)mediaStream.Length);

            var imageAnalysisResponse = ImageAnalyzingService.MakeAnalysisRequest(buffer);
            if (imageAnalysisResponse == null || !imageAnalysisResponse.description.Captions.Any())
                return;

            using (new SecurityDisabler())
            {
                item.Fields[Constants.ImageRecognition.Fields.AccentColor].Value = imageAnalysisResponse.color.AccentColor;
                item.Fields[Constants.ImageRecognition.Fields.BackgroundColor].Value = imageAnalysisResponse.color.DominantColorBackground;
                item.Fields[Constants.ImageRecognition.Fields.Caption].Value = string.Join(",", imageAnalysisResponse.description.Captions.Select(c => c.Text));
                item.Fields[Constants.ImageRecognition.Fields.Categories].Value = string.Join(",", imageAnalysisResponse.categories.Select(c => c.name));
                item.Fields[Constants.ImageRecognition.Fields.DominantColors].Value = string.Join(",", imageAnalysisResponse.color.DominantColors);
                item.Fields[Constants.ImageRecognition.Fields.ForegroundColor].Value = imageAnalysisResponse.color.DominantColorForeground;
                item.Fields[Constants.ImageRecognition.Fields.Tags].Value = string.Join(",", imageAnalysisResponse.description.Tags);
                item.Fields[Constants.ImageRecognition.Fields.IsBlackAndWhite].Value = (imageAnalysisResponse.color.IsBWImg ? "1" : "0");
            }
        }
    }
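One detail in the handler is worth hedging against: it fills the buffer with a single `Stream.Read` call, and `Read` is allowed to return fewer bytes than requested. A small helper that copies the whole stream avoids depending on that. This is a sketch only, and the class name `StreamReadingSketch` is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original solution.
public static class StreamReadingSketch
{
    // Copies the whole stream into memory; CopyTo keeps reading until the end is reached.
    public static byte[] ReadAllBytes(Stream source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        using (var memory = new MemoryStream())
        {
            source.CopyTo(memory);
            return memory.ToArray();
        }
    }
}
```

In the handler, the three lines that create and fill `buffer` could then be replaced with one call that passes the media stream to `ReadAllBytes`.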



The handler is all pretty straightforward. The first if-statements are there to get out of the function as fast as possible if the item is not up for image analysis. I have also added a new data template to the inheritance of the Image template (/sitecore/templates/System/Media/Unversioned/Image) and named it __ImageRecognition. The ID of the template and all the field IDs were added to the Constants file. I will not bore you with explaining how that is done; instead, here is the template.

Image recognition template

Nothing out of the ordinary here. A few text fields and a checkbox. These are the fields that will be set by Microsoft AI if everything goes well.

On with the code. The ImageAnalyzingService calls an ImageAnalysisResponseRepository. The repository in turn calls a factory, which does the following.

    public static ImageAnalysisResponse Create(string jSon)
    {
        // The generic argument tells the serializer which type to build.
        var serializer = new JavaScriptSerializer();
        return serializer.Deserialize<ImageAnalysisResponse>(jSon);
    }

This code uses the JavaScriptSerializer to deserialize the JSON response into an ImageAnalysisResponse class.
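The repository that sits in front of the factory is not shown in this post. Since the handler checks the result for null, one plausible shape is a thin wrapper that returns null for empty or malformed JSON. The sketch below is an assumption, not the actual repository: it swaps in System.Text.Json for JavaScriptSerializer so that it runs outside System.Web, and the `...Sketch` type names are hypothetical stand-ins trimmed to two properties.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in types, trimmed down for illustration.
public class CategorySketch
{
    public string name { get; set; }
}

public class AnalysisResponseSketch
{
    public List<CategorySketch> categories { get; set; }
    public string requestId { get; set; }
}

public static class ImageAnalysisResponseRepositorySketch
{
    // Returns null for empty or malformed JSON so callers can bail out early.
    public static AnalysisResponseSketch Get(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
            return null;

        try
        {
            // Property names match the lower-case JSON keys exactly.
            return JsonSerializer.Deserialize<AnalysisResponseSketch>(json);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

Note that a well-formed error body from the service would still deserialize into an object with empty members, so callers may want to check those members as well as the null case.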

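    using System.Collections.Generic;
    using Microsoft.ProjectOxford.Vision.Contract;

    public class ImageAnalysisResponse
    {
        // Category is defined below; the other types come from the Vision contract.
        public List<Category> categories { get; set; }
        public Description description { get; set; }
        public string requestId { get; set; }
        public Metadata metadata { get; set; }
        public Color color { get; set; }
    }

    public class Category
    {
        public string name { get; set; }
        public string score { get; set; }
    }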

I am not sure why the categories are not a part of Microsoft.ProjectOxford.Vision.Contract. The list of Category will however do the trick. Maybe I wasn’t looking hard enough for the correct class. Now we are almost set. We just need to hook it up via an include file that could look something like this.

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="item:saved">
        <handler patch:after="*[@type='Sitecore.Analytics.Data.Items.ItemEventHandler, Sitecore.Analytics']" method="OnItemSaved" type="Feature.ImageRecognition.Infrastructure.ImageRecognitionHandler, Feature.ImageRecognition"/>
      </event>
    </events>
  </sitecore>
</configuration>

Let’s have a look in Sitecore and try to upload an image of a beer. The upload takes a while because of the new request that we added to the item:saved event. Again, this should probably be done in the background via an agent or something similar. However, when it is done we should see the following result.

Image analysis of beer

So the image belongs to the categories drink and dark_light. The most impressive part is the caption, which reads “a close up of a bottle and a glass of beer on a table”. Pretty accurate I would say, considering the image.

Beer and bottle

This becomes really useful when an editor wants to insert an image on a web page and uses the search to find an image.

Media library search

Starting this blog post I thought it would be a fun way to explore the Microsoft AI library, and I still do. More posts will come. What surprised me was that this would actually be quite useful. Maybe there will be another post about this in the future when I tidy things up and move the logic to the background and make it more stable. Please post your comments and thoughts.
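As a rough idea of what moving the logic to the background could look like: the save handler would only record which item needs analysis, and a scheduled job would work through the list later. The sketch below is an assumption about one possible shape, not a finished design. The class name `PendingImageAnalysisQueue` is hypothetical, the queue lives in memory only (so it is lost on an application restart), and the callback passed to `Drain` stands in for the real analysis call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper, not part of the original solution.
public static class PendingImageAnalysisQueue
{
    private static readonly ConcurrentQueue<Guid> Pending = new ConcurrentQueue<Guid>();

    // Called from the save handler: cheap, no HTTP call during the editor's upload.
    public static void Enqueue(Guid itemId)
    {
        Pending.Enqueue(itemId);
    }

    // Called from a background job: hands each queued ID to the analysis callback.
    // Returns the IDs whose analysis threw, so the caller can decide whether to retry.
    public static List<Guid> Drain(Action<Guid> analyze)
    {
        var failed = new List<Guid>();
        Guid itemId;
        while (Pending.TryDequeue(out itemId))
        {
            try
            {
                analyze(itemId);
            }
            catch (Exception)
            {
                failed.Add(itemId);
            }
        }
        return failed;
    }
}
```

With something like this in place, the upload would no longer wait for the image analysis request, at the cost of the fields being filled in a little later.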

Edit: I uploaded an image of myself

And I got the following response:

I especially like the caption “a man holding a fish” 🙂

Also, the tags indicate that it is a person, who may be a man or a woman. Whichever it is, the person is young (thank you Microsoft). The person is standing (?) and holding a fish, maybe on a boat, skiing and wearing a hat.




