
Guide to: The Model Template Library (Beta)

Overview

As part of the Pre-labeling (Beta) feature, Appen offers various machine learning models in the model template library to help provide initial "best-guess" hypotheses for an annotation project. This feature is helpful for specialized annotation projects, as providing a contributor with model-predicted annotation hypotheses can dramatically cut down annotation time while maintaining or even improving annotation quality. Below you can learn more about the models available in the model template library.


Fig. 1: Model Template Library

Important note: This feature is part of our managed services offering; contact your Customer Success Manager for access or more information.

Blur Faces in Images 

This model accepts URLs of images and returns URLs pointing to copies of the source images with any pictured faces blurred. 

  • The face encoding model uses Adam Geitgey’s face recognition library, which is built using dlib’s face recognition model.
  • It was trained on a dataset containing about 3 million images of faces grouped by individual. 
  • The model achieves an accuracy of 99.38% on the standard Labeled Faces in the Wild dataset, which means that given two images of faces, it correctly predicts if the images are of the same person 99.38% of the time. 
  • To learn more about the model, please read this blog post. 
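
For illustration, the sketch below shows the same approach using the open-source face_recognition library and Pillow; the file names are placeholders, and the hosted template performs this processing for you.

    # Minimal sketch: detect faces with the face_recognition library and blur them with Pillow.
    import face_recognition
    from PIL import Image, ImageFilter

    image = face_recognition.load_image_file("source_image.jpg")   # placeholder file; returns an RGB array
    face_locations = face_recognition.face_locations(image)        # [(top, right, bottom, left), ...]

    output = Image.fromarray(image)
    for top, right, bottom, left in face_locations:
        face = output.crop((left, top, right, bottom))             # Pillow crop uses (left, top, right, bottom)
        output.paste(face.filter(ImageFilter.GaussianBlur(radius=15)), (left, top))

    output.save("source_image_blurred.jpg")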

Box and Transcribe Words

This model is designed to be used as part of a document transcription workflow with the following steps: 

  1. A contributor draws bounding boxes around lines of text in an image 
  2. Given a bounding box around a line of text, the model predicts the bounding box coordinates and transcriptions corresponding to each word in the line of text 
  3. A contributor reviews the model’s predictions 

In addition to input columns for the images and the text-line bounding boxes, this model also takes an input specifying the language in which to make the transcription prediction. The accepted values are the following two-letter ISO 639-1 language codes: English ('en'), Spanish ('es'), German ('de'), French ('fr'), Italian ('it'), Portuguese ('pt'), Dutch ('nl'), Hebrew ('he'), Hungarian ('hu'), Swedish ('sv'), Norwegian Nynorsk ('nn'), Danish ('da'), Finnish ('fi'), Chinese ('zh'), Simplified Chinese ('zh-sim'), Traditional Chinese ('zh-tra'), Japanese ('jp'), Arabic ('ar'), Russian ('ru'). 

  • The underlying optical character recognition (OCR) model is the Tesseract Open-Source OCR Engine.
  • The model is designed to recognize printed text and is unlikely to work well on handwriting.
  • Please refer to the model documentation for further support.  
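
As an illustration only, the sketch below shows word-level OCR on a cropped text line using pytesseract, a common Python wrapper around the Tesseract engine. The crop coordinates and output field names are examples, and note that Tesseract itself expects three-letter language codes (e.g. 'eng'), whereas the template input uses the two-letter codes listed above.

    from PIL import Image
    import pytesseract

    # Crop a contributor-drawn line box out of the page image (coordinates are illustrative).
    page = Image.open("document_page.png")
    line = page.crop((40, 120, 820, 160))  # (left, top, right, bottom)

    # image_to_data returns word-level boxes and text; Tesseract uses three-letter codes such as 'eng'.
    data = pytesseract.image_to_data(line, lang="eng", output_type=pytesseract.Output.DICT)

    words = [
        {"text": t,
         "x": data["left"][i], "y": data["top"][i],
         "width": data["width"][i], "height": data["height"][i]}
        for i, t in enumerate(data["text"]) if t.strip()
    ]
    print(words)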

Classify Images

This trainable model classifies images into user-defined categories. Once trained, it accepts images and returns classification predictions, along with the confidence of each prediction as a value between 0 and 1. The model can be trained on class tags for the whole image by providing positive examples of each class. The training data can also include negative examples: images that should not be assigned any class value. For instructions on how to train and evaluate the model, refer to this article.

  • This model was developed and is hosted by the IBM Watson Visual Recognition service. 
  • For more information, please visit the IBM Watson Visual Recognition service webpage here. 

Detect Explicit Content

This model can be used to scan images for explicit or adult content and is helpful in monitoring chat, social media, and user-generated content. It accepts images and returns boolean values; if the value returned is “true,” the model has predicted that the image contains explicit content. The model also returns the confidence of its prediction as a value between 0 and 1 in the “explicit_confidence” output data column. 

  • This model was developed and is hosted by the IBM Watson Visual Recognition service.
  • For more information, please visit the IBM Watson Visual Recognition service webpage here. 
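
As a rough sketch of how these pre-labels might be used downstream, low-confidence predictions could be routed to contributors for review. Only the "explicit_confidence" column name comes from this template; the "url" and "explicit" keys and the threshold below are illustrative assumptions.

    # Hypothetical pre-label rows; only "explicit_confidence" is a documented output column.
    predictions = [
        {"url": "https://example.com/a.jpg", "explicit": True, "explicit_confidence": 0.97},
        {"url": "https://example.com/b.jpg", "explicit": False, "explicit_confidence": 0.55},
    ]

    REVIEW_THRESHOLD = 0.8  # illustrative cutoff

    needs_review = [p for p in predictions if p["explicit_confidence"] < REVIEW_THRESHOLD]
    auto_flagged = [p for p in predictions if p["explicit"] and p["explicit_confidence"] >= REVIEW_THRESHOLD]

    print(f"{len(auto_flagged)} auto-flagged, {len(needs_review)} routed to contributor review")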

Identify Face Landmarks

This model accepts URLs of images and returns JSON strings containing the coordinates of predicted facial landmarks.

  • The face encoding model uses Adam Geitgey’s face recognition library, which is built using dlib’s face recognition model.
  • It was trained on a dataset containing about 3 million images of faces grouped by individual.
  • The model achieves an accuracy of 99.38% on the standard Labeled Faces in the Wild dataset, which means that given two images of faces, it correctly predicts if the images are of the same person 99.38% of the time. 
  • To learn more about the model, please read this blog post.  
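
A minimal sketch using the same open-source face_recognition library is shown below; the file name is a placeholder, and the landmark key names (e.g. 'chin', 'nose_tip') come from that library rather than from this template's output format.

    import json
    import face_recognition

    image = face_recognition.load_image_file("portrait.jpg")   # placeholder file name
    landmarks = face_recognition.face_landmarks(image)         # one dict per detected face

    # Each dict maps a feature name (e.g. 'chin', 'left_eye', 'nose_tip') to a list of (x, y) points.
    print(json.dumps(landmarks, indent=2))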

Label Pixels in Street Scene Images 

This model generates pixel-level semantic segmentation masks for street scene images, such as images containing cars, trucks, buildings, pedestrians and signs. It can be useful for enhancing annotation efficiency and quality for autonomous vehicle and related use cases. 

Segment Audio

This model can be used to classify periods of time in audio according to the sound within them; pre-labeling with this model can streamline audio segmentation and transcription workflows. The model segments audio into the classes speech, music, noise, and silence. It accepts audio files and returns the class labels, start times, and end times of the identified segments.
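
As an illustrative sketch only (the segment field names below are assumptions, not the template's documented output columns), a downstream transcription workflow might keep just the speech segments for contributors:

    # Hypothetical model output: one entry per predicted segment.
    segments = [
        {"label": "music", "start_time": 0.0, "end_time": 12.4},
        {"label": "speech", "start_time": 12.4, "end_time": 47.9},
        {"label": "silence", "start_time": 47.9, "end_time": 50.0},
    ]

    # Keep only the speech segments, e.g. to queue them for transcription.
    speech_segments = [s for s in segments if s["label"] == "speech"]
    total_speech = sum(s["end_time"] - s["start_time"] for s in speech_segments)
    print(f"{len(speech_segments)} speech segment(s), {total_speech:.1f} seconds total")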

 

