Classification of Web Pages into Google AdWords Industry Verticals

Industry: Media and Advertising

Specialization Or Business Function

Technical Function: Analytics (Text Analytics)

Technology & Tools: Programming Languages and Frameworks (R, Python)

CLOSED FOR BIDDING

Project Description

Swoop is a fast-growing search advertising startup founded by a team with many previous IPO and M&A exits. We run Google AdWords campaigns on Web pages with the same targeting precision Google uses on search result pages. Not infrequently, we outperform Google AdWords. To do this at scale, we are solving a number of both standard and unique text analysis, search, classification and optimization problems.

The goal of this project is to categorize Web page content features into Google AdWords industry verticals. The intended result is better matching between ads and pages.

Overview

Google AdWords allows advertising campaigns to be restricted to a set of verticals. For example, an advertiser may choose to show an ad for hiking boots only on content in the category “Hiking & Camping”. Categories are nodes in a category hierarchy managed by Google, e.g., “Hobbies & Leisure > Outdoors > Hiking & Camping”.
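
As an illustration of the hierarchy notation, a category path written in the form above can be split into its nodes and its ancestor paths. This is only a sketch: the function names are hypothetical, and the " > " separator is assumed from the example path rather than taken from any Google specification.

```python
from typing import List


def parse_category_path(path: str) -> List[str]:
    """Split a path such as 'A > B > C' into its node names."""
    return [node.strip() for node in path.split(">") if node.strip()]


def ancestor_paths(path: str) -> List[str]:
    """Return every prefix of the path, from the root down to the full path."""
    nodes = parse_category_path(path)
    return [" > ".join(nodes[: i + 1]) for i in range(len(nodes))]


example = "Hobbies & Leisure > Outdoors > Hiking & Camping"
print(parse_category_path(example))
print(ancestor_paths(example))
```

Treating a leaf membership as implying membership in each ancestor is one possible reading of the hierarchy; whether that holds for ad targeting is an assumption a proposal would need to confirm.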

Swoop targets ads to various content features of Web pages. Improving the association of content features to AdWords verticals improves the quality of ad matching.

Because they come from Web pages, content features are associated with a URL and page-level meta-data of varying quality, based on the publisher (title, META keywords, OpenGraph meta-data, etc.). In terms of granularity, content features may be one of the following:

  • The "clean" text of the Web page with most navigation, ads, widgets, comments and other "non-core" content removed.

  • A snippet of the clean text. We are interested in snippets because, often, the categories of individual snippets on the same page vary. Snippet length varies from a few words (a recipe ingredient, a heading) to a paragraph or a section of content.

  • A subset of the snippets on the page.

The content is overwhelmingly in English with a small portion in Spanish. Other languages are not important in the short run.

Goal

Swoop seeks an algorithm to determine the set of category memberships for a given content feature. The algorithm will be operationalized to work in an online system. Therefore, proposals must include descriptions of the process for both initial and ongoing training/tuning of the algorithm.
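
To make the shape of such an algorithm concrete, here is a minimal sketch of a multi-label scorer that supports both initial and ongoing training through the same update call. It uses one binary naive Bayes model per category, which is a stand-in technique chosen for brevity and not a method prescribed by this posting. The class name, the toy training texts and the "Finance" label are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class OnlineCategoryScorer:
    """One binary naive Bayes model per category, updated one example at a time."""

    def __init__(self, categories, smoothing=1.0):
        self.smoothing = smoothing
        self.vocabulary = set()
        # Per category: index 0 holds counts for non-members, index 1 for members.
        self.doc_counts = {c: [0, 0] for c in categories}
        self.token_counts = {c: [Counter(), Counter()] for c in categories}

    def update(self, text, member_of):
        """Fold one labeled content feature into the counts."""
        tokens = tokenize(text)
        self.vocabulary.update(tokens)
        for category in self.doc_counts:
            label = 1 if category in member_of else 0
            self.doc_counts[category][label] += 1
            self.token_counts[category][label].update(tokens)

    def log_odds(self, text, category):
        """Log-odds that the text belongs to the category (Laplace-smoothed)."""
        s = self.smoothing
        vocab_size = max(len(self.vocabulary), 1)
        neg_docs, pos_docs = self.doc_counts[category]
        neg_tokens, pos_tokens = self.token_counts[category]
        neg_total = sum(neg_tokens.values())
        pos_total = sum(pos_tokens.values())
        score = math.log((pos_docs + s) / (neg_docs + s))
        for token in tokenize(text):
            p_pos = (pos_tokens[token] + s) / (pos_total + s * vocab_size)
            p_neg = (neg_tokens[token] + s) / (neg_total + s * vocab_size)
            score += math.log(p_pos / p_neg)
        return score

    def predict(self, text, threshold=0.0):
        """Return the set of categories whose log-odds exceed the threshold."""
        return {c for c in self.doc_counts if self.log_odds(text, c) > threshold}


# Toy, made-up training examples for illustration only.
scorer = OnlineCategoryScorer(["Hiking & Camping", "Finance"])
scorer.update("lightweight hiking boots for mountain trails", {"Hiking & Camping"})
scorer.update("tent and sleeping bag for camping trips", {"Hiking & Camping"})
scorer.update("low interest mortgage rates and home loans", {"Finance"})
scorer.update("compare credit card interest rates", {"Finance"})
print(scorer.predict("waterproof boots for hiking trails"))
```

Because training only increments counts, newly labeled content features can be folded in while the system is live. The sketch assumes the category list is fixed up front; handling categories that appear later would need extra bookkeeping.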

Measuring Performance

Algorithm performance will be evaluated based on the following factors:

  • Performance (ROC AUC) against a validation set of content features.

  • Operational convenience, including but not limited to: setup, ongoing training & tuning costs, scalability characteristics and operating costs.
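
For reference, the ROC AUC criterion above can be computed for a single category as the probability that a randomly chosen member of the category is scored higher than a randomly chosen non-member, with ties counted as one half. The sketch below uses that pairwise definition directly. The function name and the sample numbers are illustrative, and how scores would be aggregated across categories is not specified in this posting.

```python
def roc_auc(labels, scores):
    """ROC AUC for one category: labels are 1 (member) or 0 (non-member)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # member ranked above non-member
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up validation labels and scores for one category.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formulation would be preferable for a large validation set.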

Bonus

As an optional bonus, we are also interested in algorithms with the capability to balance type I and type II errors based on the qualities of a particular category in the hierarchy. In categories where we see a lot more content features than we have ads to show, we may be interested in reducing type II errors. In categories where we have more ad opportunities than content features to target, we may be interested in reducing type I errors.
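
One simple way to realize such a balance is a per-category decision threshold chosen on validation data. The sketch below assumes that a type I error means a false positive (a content feature assigned to a category it does not belong to) and a type II error means a false negative; the posting does not define these, so that mapping, the function names and the sample numbers are all assumptions.

```python
def error_rates(labels, scores, threshold):
    """Return (type I rate, type II rate) when predicting positive for score >= threshold.

    Assumes labels contains at least one 0 and at least one 1.
    """
    false_pos = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    false_neg = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    return false_pos / labels.count(0), false_neg / labels.count(1)


def pick_threshold(labels, scores, max_type_i_rate):
    """Lowest threshold whose type I rate stays within the cap.

    The type I rate can only fall as the threshold rises, so the first
    threshold that satisfies the cap also gives the lowest type II rate.
    """
    for threshold in sorted(set(scores)):
        type_i, _ = error_rates(labels, scores, threshold)
        if type_i <= max_type_i_rate:
            return threshold
    return float("inf")  # no observed score meets the cap: predict nothing


# Made-up validation data for one category.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

lenient = pick_threshold(labels, scores, max_type_i_rate=0.25)
strict = pick_threshold(labels, scores, max_type_i_rate=0.0)
print(lenient, error_rates(labels, scores, lenient))
print(strict, error_rates(labels, scores, strict))
```

A category that tolerates some false positives gets the lower threshold and misses fewer members, while a category that tolerates none gets the higher threshold at the cost of more misses.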

Input Data

Web page information:

  • URL

  • Contents of TITLE tag

  • Attributes of all META tags

  • High-level vertical of the site the page is from, e.g., auto, health, finance, food, lifestyle, news, entertainment, etc.

Content feature information:

  • URL of Web page

  • Snippet text or HTML, in whatever representation is best-suited for processing
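
The two kinds of input listed above could be represented and joined on the page URL roughly as follows. This is a sketch of one possible in-memory layout; the class names, field names and sample values are hypothetical and do not describe Swoop's actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WebPage:
    url: str
    title: str                                           # contents of the TITLE tag
    meta: Dict[str, str] = field(default_factory=dict)   # META tag attributes
    site_vertical: str = ""                               # e.g. "auto", "health", "food"


@dataclass
class ContentFeature:
    page_url: str   # URL of the Web page the feature comes from
    snippet: str    # snippet text or HTML


def features_by_page(
    pages: List[WebPage], features: List[ContentFeature]
) -> Dict[str, Tuple[WebPage, List[ContentFeature]]]:
    """Group content features under the page they belong to, keyed by URL."""
    grouped = {page.url: (page, []) for page in pages}
    for feature in features:
        if feature.page_url in grouped:   # skip features with no known page
            grouped[feature.page_url][1].append(feature)
    return grouped


# Made-up sample records.
pages = [WebPage("http://example.com/trail-mix", "Trail Mix Recipe", {"keywords": "snack"}, "food")]
features = [
    ContentFeature("http://example.com/trail-mix", "1 cup roasted almonds"),
    ContentFeature("http://example.com/trail-mix", "Pack it for your next hike"),
]
grouped = features_by_page(pages, features)
print(len(grouped["http://example.com/trail-mix"][1]))
```

Keeping page-level meta-data next to each snippet lets a classifier use both, which matters when a snippet is only a few words long.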

If additional input data would be helpful, please describe it and the reasoning behind the need. We may have access to it already or we may be able to readily acquire it through third parties. An example would be referrer URLs (the pages from which users arrive at a given URL).

Training data

If the project solution requires the development of a training data set, describe the process by which it should be efficiently created, the size of the training data set required and the expected cost to create it.

Project is Hourly

This project seeks the highest-value work, not the lowest possible cost. Thus the hourly rate range we will consider is very broad, from $50 to $200. We will evaluate proposals based on experience and qualifications, and the hourly rate will be based on merit.

Proposals

Please don't merely tell us about your background since we will have access to your profile once you apply. Your proposal should outline your methodology and provide insights into how you would solve the problem.

Project Overview

  • Posted
    January 12, 2015
  • Planned Start
    January 26, 2015
  • Preferred Location
    From anywhere

Client Overview

  • S*****

  • Projects
    0% Awarded (0 of 1)

EXPERTISE REQUIRED
Text Classification
Classification
Online Advertising
Google Adwords