
Item Entries

All content scraped from an external resource is categorized as either text or multimedia (images, audio, and video). Due to copyright restrictions, multimedia files are not stored in the Informfully database; the system stores only the URL of the media object. This reduces the load on the Informfully server, because the media item is streamed from its original host. The drawback is that if the source removes the file, Informfully users can no longer access it. For text items, however, the system creates and stores its own copy after applying data augmentation and pre-processing steps.

| Attribute | Type | Description |
| --- | --- | --- |
| _id | String | ID of the article. |
| articleType | String | One of three values: text, video, or podcast. Indicates whether the article contains a video, audio, or only text. |
| title | String | Title of the article. |
| lead | String | Lead of the article. |
| body | Array of Objects | The article text, split into paragraphs. Each paragraph is an object in the array with two properties: type (String) and text (String). |
| url | String | URL through which the article can be accessed. |
| image | String | Optional field; the URL of the article's cover image. |
| multimediaURL | String | Link to a video or audio file. Set to null if empty. Must be consistent with articleType (i.e., if articleType is text, multimediaURL is null). |
| multimediaDurationInMillis | Integer | Length of the multimedia file (video or audio) in ms. Set to 1 if articleType is text. |
| datePublished | Date | Time at which the article was published on the news outlet's website. |
| dateScraped | Date | Time at which the article was scraped. |
| dateUpdated | Date | Time at which the article was last updated. Outlets may update an article's contents; instead of creating a new article, the contents of the existing entry are updated. |
| dateDeleted | Date | Optional field. Outlets sometimes ask us to remove articles; instead of deleting them, we add a dateDeleted entry. Articles with this entry are not shown. |
| author | String | Can also be a press agency or sponsored content. If there are multiple authors, separate them with a comma (,). |
| outlet | String | Current options are BLICK, NZZ, TAGI, SRF, WOZ, and WW. |
| primaryCategory | String | The main category of the item. |
| subCategories | Array of Strings | The sub-categories of the article. This information is not always provided. |
| language | String | Language code of the article (e.g., en-US, de-CH). |
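The table implies a few consistency rules between fields. The sketch below assumes items are assembled as plain Python dicts before being written to the back end; it gives every field an explicit default and enforces the articleType, multimediaURL, and multimediaDurationInMillis rules. The function name make_item and the chosen defaults (empty strings, empty lists, and null for optional values) are hypothetical, not taken from Informfully's scraper code.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# The three articleType values from the table above.
ARTICLE_TYPES = {"text", "video", "podcast"}


def make_item(
    item_id: str,
    article_type: str,
    title: str,
    url: str,
    outlet: str,
    language: str,
    date_published: datetime,
    multimedia_url: Optional[str] = None,
    multimedia_duration_ms: int = 1,
) -> dict[str, Any]:
    """Hypothetical helper: build an item entry with an explicit default for every field."""
    if article_type not in ARTICLE_TYPES:
        raise ValueError(f"unknown articleType: {article_type!r}")
    if article_type == "text":
        # Text items must not point to media; their duration is fixed at 1.
        if multimedia_url is not None:
            raise ValueError("text items must have multimediaURL set to null")
        multimedia_duration_ms = 1
    elif multimedia_url is None:
        raise ValueError("video and podcast items need a multimediaURL")

    return {
        "_id": item_id,
        "articleType": article_type,
        "title": title,
        "lead": "",            # assumed default: empty string
        "body": [],            # filled later with {"type", "text"} paragraph objects
        "url": url,
        "image": None,         # optional cover image URL
        "multimediaURL": multimedia_url,
        "multimediaDurationInMillis": multimedia_duration_ms,
        "datePublished": date_published,
        "dateScraped": datetime.now(timezone.utc),
        "dateUpdated": None,   # assumed default: set when the outlet updates the article
        "dateDeleted": None,   # items with a dateDeleted value are hidden
        "author": "",
        "outlet": outlet,
        "primaryCategory": "",
        "subCategories": [],
        "language": language,
    }


item = make_item(
    "example-1", "text", "Example headline", "https://example.com/example-1",
    "NZZ", "de-CH", datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(item["multimediaURL"], item["multimediaDurationInMillis"])  # prints: None 1
```

With null defaults in place, a MongoDB query for visible items can filter on dateDeleted being null instead of testing for the field's existence with $exists.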

INFO

Be aware that Android devices can only handle websites secured by an SSL certificate (i.e., HTTPS websites, not HTTP websites). Data fields such as url or multimediaURL should therefore contain only HTTPS URLs. Please visit the Scraper Documentation page for sample code that scrapes items and adds them as entries to the back end.
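A scraper can reject non-HTTPS links before an entry is written. This is a minimal sketch using Python's standard urllib.parse; the helper names is_https_url and non_https_fields are hypothetical, and checking the image field as well is an assumption, since it also holds a URL.

```python
from urllib.parse import urlparse

# URL-bearing fields of an item entry (image is included by assumption).
URL_FIELDS = ("url", "image", "multimediaURL")


def is_https_url(value: str) -> bool:
    """True if value uses the https scheme and names a host."""
    parts = urlparse(value)
    return parts.scheme == "https" and bool(parts.netloc)


def non_https_fields(item: dict) -> list:
    """Hypothetical helper: names of URL fields whose non-null value is not HTTPS."""
    return [
        field
        for field in URL_FIELDS
        if item.get(field) is not None and not is_https_url(item[field])
    ]


print(non_https_fields({
    "url": "http://example.com/article",
    "image": None,
    "multimediaURL": "https://cdn.example.com/clip.mp4",
}))  # prints ['url']
```

Null values are skipped, so a text item with multimediaURL set to null passes the check.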

When creating item entries, we recommend setting a default value for every field. If absent attributes were signified by omitting the field, queries would need the $exists operator to distinguish between articles that do and do not have a given attribute. Such $exists queries cannot make efficient use of indexes, which lowers overall performance.

Inside the app, items are rendered as follows:

img/app_screenshots/app_2.png

For a text item, the top of the interface displays the image specified in the image attribute as a thumbnail preview. For a multimedia item (podcast or video), the multimedia player loads that image as its thumbnail.

The body consists of a list of elements. Three element types are currently supported:

  • text for unformatted text,
  • subtitle for starting a new paragraph with a subtitle, and
  • quote for an italic, indented quote block.

A sample `body` element of an item looks like this:

```json
{
  "body": [
    {
      "type": "text",
      "text": "This is the normal \"text\" option used to show text within the item view."
    },
    {
      "type": "subtitle",
      "text": "This is the \"subtitle\" option that creates a new paragraph with a larger font size compared to regular text."
    },
    {
      "type": "quote",
      "text": "This is the \"quote\" option that creates an indented paragraph with the same font size as regular text."
    }
  ]
}
```
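Because the app supports only the three element types listed above, it can help to validate a body array before storing it. The sketch below parses a body shaped like the sample with Python's json module and checks each element; validate_body is a hypothetical helper, and treating text as a required string is an assumption based on the sample.

```python
import json

# The three body element types supported by the app.
BODY_TYPES = {"text", "subtitle", "quote"}


def validate_body(body) -> int:
    """Hypothetical helper: raise ValueError on a malformed element, else return the count."""
    if not isinstance(body, list):
        raise ValueError("body must be a list of elements")
    for i, element in enumerate(body):
        if not isinstance(element, dict):
            raise ValueError(f"element {i} is not an object")
        if element.get("type") not in BODY_TYPES:
            raise ValueError(f"element {i} has unsupported type {element.get('type')!r}")
        if not isinstance(element.get("text"), str):
            raise ValueError(f"element {i} needs a string 'text' field")
    return len(body)


sample = json.loads("""
{
  "body": [
    {"type": "text", "text": "A regular paragraph."},
    {"type": "subtitle", "text": "A subtitle."},
    {"type": "quote", "text": "A quoted passage."}
  ]
}
""")
print(validate_body(sample["body"]))  # prints 3
```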