
Intelligent System on top of a cloud infrastructure like AWS - Phase 1 (Entity extraction and storage of extracted content)

Industry: Hi-Tech

Specialization Or Business Function

Technical Function: Data Management (Data Modeling), Business Intelligence, Analytics (Data Mining), Marketing and Web Analytics

Technology & Tools: Big Data and Cloud (MongoDB, Amazon Web Services), Data Analysis and AI Tools

COMPLETED Mar 04, 2015

Project Description

You will help us build a Big Data analytics system designed and implemented on top of a cloud infrastructure service like AWS. The goal of this system is to crawl and extract information from websites, which is then used to build predictive models. This project has two phases. In the first phase, you will integrate with third-party APIs (e.g., Alexa, Google Analytics) to accept a list of URLs that the system will crawl. You will parse the crawled content to extract HTML, CSS and images from each website. You will also build data store infrastructure in the cloud to store the extracted content. In the second phase, you will further process the extracted data to set up a feature engineering pipeline that will be used to build predictive models. In this phase, you will construct progressively richer representations of the extracted data sets, working towards increasingly sophisticated predictive models.
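As a rough illustration of the Phase 1 extraction step, the sketch below pulls image and stylesheet URLs out of one crawled HTML page using only the Python standard library. It is a minimal example under stated assumptions, not a required design: the names `AssetExtractor` and `extract_assets` are hypothetical, and a real crawler would also need fetching, error handling and handling of inline CSS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetExtractor(HTMLParser):
    """Collects image and stylesheet URLs found in one HTML page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative image paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only keep <link> tags that declare a stylesheet.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(urljoin(self.base_url, attrs["href"]))


def extract_assets(html, base_url):
    """Return absolute image and stylesheet URLs referenced by the page."""
    parser = AssetExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"images": parser.images, "stylesheets": parser.stylesheets}
```

Calling `extract_assets(page_html, page_url)` on each crawled page would give the list of assets to download and store alongside the raw HTML.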

This request is for Phase 1 of the project, which is application development involving the following:

  • User interfaces and a web app for interacting with users to capture input data and present results. (Front-end skills.)

  • Scraping infrastructure and backend development using cloud infrastructure. (Backend skills.)

Scope of Work for Phase 1

  • Put together UIs to accept user-provided data
  • Put together UIs to present modeling output to the user
  • Scraping infrastructure built on a cloud IaaS to crawl user-provided URLs
  • Set up a data store to preserve raw crawled content
  • Extraction and storage of HTML content, images, etc. from scraped output
  • Set up a data store to preserve data science work
  • The system needs to minimise the transfer of information between S3, EC2 and MongoDB to minimise compute overhead
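
One possible way to limit data transfer between S3, EC2 and MongoDB is to store each raw payload once under a content hash and keep only small metadata records that point to it. The sketch below illustrates that idea with in-memory stand-ins; it is an assumption for discussion, not a prescribed architecture. The class `CrawlStore`, the `raw/` key prefix and the record fields are all hypothetical, and the dictionary and list would be replaced by an S3 bucket and a MongoDB collection in a real system.

```python
import hashlib


class CrawlStore:
    """In-memory stand-in for a blob store plus a metadata store (hypothetical)."""

    def __init__(self):
        self.blobs = {}     # stand-in for an S3 bucket: key -> raw bytes
        self.metadata = []  # stand-in for a MongoDB collection: list of records

    def save(self, url, content_type, body):
        """Store body once per unique content; always record lightweight metadata."""
        digest = hashlib.sha256(body).hexdigest()
        key = "raw/" + digest
        uploaded = key not in self.blobs
        if uploaded:
            # Only new content is transferred to the blob store.
            self.blobs[key] = body
        # The metadata record holds a pointer to the blob, not the blob itself.
        self.metadata.append(
            {"url": url, "content_type": content_type, "blob_key": key, "size": len(body)}
        )
        return key, uploaded
```

With this layout, an image shared by many crawled pages is uploaded once, and later processing can query the metadata records without pulling the raw content.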

UPDATE: THIS PROJECT IS ON HOLD. PLEASE DO NOT SUBMIT BIDS UNTIL WE PROVIDE ANOTHER UPDATE TO MOVE FORWARD.

Project Overview

  • Posted
    October 01, 2014
  • Planned Start
    January 08, 2015
  • Preferred Location
    From anywhere
