Intelligent System on top of a cloud infrastructure like AWS - Phase 2 (feature extraction from crawled content and data science modeling)

Industry Hi-Tech

Specialization Or Business Function

Technical Function Data Management (Data Modeling), Data Warehousing (Data Integration), Data Engineering, Analytics (Predictive Modeling, Data Mining), Marketing and Web Analytics

Technology & Tools Business Intelligence and Visualization, Big Data and Cloud (MongoDB, Amazon Web Services)

$10,000 - $50,000

FIXED PRICE

Project Description

You will help us build a Big Data analytics system designed and implemented on top of a cloud infrastructure service such as AWS. The goal of this system is to crawl websites and extract information from them, which is then used to build predictive models. The project has two phases.

In the first phase, you will integrate with third-party APIs (e.g., Alexa, Google Analytics) to accept a list of URLs that the system will crawl. You will parse the crawled content to extract HTML, CSS and images from each website. You will also build data storage infrastructure in the cloud to store the extracted content.

In the second phase, you will further process the extracted data to set up a feature engineering pipeline that will be used to build predictive models. In this phase, you will construct progressively richer representations of the extracted data sets in order to build increasingly sophisticated predictive models.
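To give a sense of what a first, simple representation in the feature engineering pipeline might look like, here is a minimal sketch that turns one crawled HTML page into a few structural counts. It uses only the Python standard library. The class name `PageFeatureParser`, the function `extract_features` and the chosen feature names are illustrative assumptions, not part of the project specification.

```python
from collections import Counter
from html.parser import HTMLParser


class PageFeatureParser(HTMLParser):
    """Collects simple structural counts from one crawled HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.stylesheets = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        self.tag_counts[tag] += 1
        # Record image sources so that later stages can fetch or analyse them.
        if tag == "img" and attr_map.get("src"):
            self.images.append(attr_map["src"])
        # Record linked stylesheets (<link rel="stylesheet" href="...">).
        rel = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attr_map.get("href"):
            self.stylesheets.append(attr_map["href"])


def extract_features(html):
    """Return a flat feature dictionary for one page (feature names are assumptions)."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return {
        "n_tags": sum(parser.tag_counts.values()),
        "n_links": parser.tag_counts["a"],
        "n_images": len(parser.images),
        "n_stylesheets": len(parser.stylesheets),
    }
```

Richer representations (text statistics, CSS properties, image descriptors) could be added as further keys in the same dictionary, so that each page still maps to one feature record.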

This request is for the second phase of the project, which includes:

  • Feature extraction from crawled content
  • Data science modeling
  • The system needs to minimise the transfer of information between S3, EC2 and MongoDB to reduce computing costs
  • Link classification algorithm
  • D3 Visualisation
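
The posting does not specify how the link classification algorithm should work. As a starting point for discussion only, the sketch below shows a rule-based baseline that labels each link found on a crawled page as internal, external, asset or other. The function name `classify_link`, the label set and the `ASSET_EXTENSIONS` list are assumptions made for illustration; a learned classifier could replace these rules later.

```python
from urllib.parse import urljoin, urlparse

# Assumed list of file extensions treated as static assets.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg")


def classify_link(page_url, href):
    """Label a link relative to the page it was found on (rule-based baseline)."""
    # Resolve relative links against the URL of the crawled page.
    absolute = urljoin(page_url, href)
    parsed = urlparse(absolute)
    # Non-web schemes such as mailto: or javascript: are grouped together.
    if parsed.scheme not in ("http", "https"):
        return "other"
    if parsed.path.lower().endswith(ASSET_EXTENSIONS):
        return "asset"
    if parsed.netloc.lower() == urlparse(page_url).netloc.lower():
        return "internal"
    return "external"
```

For example, `classify_link("https://example.com/a/", "../b.html")` resolves the link to the same host and labels it internal, while a `mailto:` link is labelled other.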

Project Overview

  • Posted
    October 01, 2014
  • Preferred Location
    From anywhere
