Experfy Big Data, Analytics, and BI Projects

381 Projects that match your criteria

Ad hoc consultancy

Ongoing project for ad hoc work on our existing platform

Media and Advertising

Multi-Touch Attribution

Media and Advertising

$25/hr - $125/hr

Starts Jan 16, 2015

4 Proposals Status: IN PROGRESS

Client: A******* * ***

Posted: Jan 16, 2015

Auto Loan Credit Model Leveraging Real Time Risk Variables

We are building a credit model for auto loans that sees beyond traditional FICO scores. We want to use real-time, relevant metrics that measure past achievements and future potential. Variables such as GPA, university attended, SAT score, career path, and income would let us better underwrite risk and approve more thin-file consumers who are declined for lack of payment history.

Automotive

Credit Risk

Big Data

$175/hr

Starts Jan 17, 2015

9 Proposals Status: IN PROGRESS

Client: R********

Posted: Jan 12, 2015

Classification of Web Pages into Google AdWords Industry Verticals

Swoop is a fast-growing search advertising startup founded by a team with many previous IPO and M&A exits. We run Google AdWords campaigns on Web pages with the same targeting precision Google uses on search result pages. Not infrequently, we outperform Google AdWords. To do this at scale, we are solving a number of both standard and unique text analysis, search, classification and optimization problems.

The goal of this project is to categorize Web page content features into Google AdWords industry verticals. The end result of the project is to improve the matching between ads and pages.

Overview

Google AdWords allows advertising campaigns to be restricted to a set of verticals. For example, an advertiser may choose to show an ad for hiking boots only to a category “Hiking & Camping”. Categories are nodes in a category hierarchy managed by Google, e.g., “Hobbies & Leisure > Outdoors > Hiking & Camping”.

Swoop targets ads to various content features of Web pages. Improving the association of content features to AdWords verticals improves the quality of ad matching.

Because they come from Web pages, content features are associated with a URL and page-level meta-data of varying quality, based on the publisher (title, META keywords, OpenGraph meta-data, etc.). In terms of granularity, content features may be one of the following:

The "clean" text of the Web page with most navigation, ads, widgets, comments and other "non-core" content removed.
A snippet of the clean text. We are interested in snippets because, often, the categories of individual snippets on the same page vary. Snippet length varies from a few words (a recipe ingredient, a heading) to a paragraph or a section of content.
A subset of the snippets on the page.

The content is overwhelmingly in English with a small portion in Spanish. Other languages are not important in the short run.

Goal

Swoop seeks an algorithm to determine the set of category memberships for a given content feature. The algorithm will be operationalized to work in an online system. Therefore, proposals must include descriptions of the process for both initial and ongoing training/tuning of the algorithm.

Measuring Performance

Algorithm performance will be evaluated on the following factors:

Performance (ROC AUC) against a validation set of content features.
Operational convenience, including but not limited to: setup, ongoing training & tuning costs, scalability characteristics and operating costs.
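As a rough illustration of the first factor, ROC AUC for a single category can be computed directly from labels and scores. The sketch below uses the pairwise definition (the probability that a random positive example outscores a random negative one, with ties counted as half). The function name and the sample data are hypothetical, and a production evaluation would more likely rely on an established library.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical validation set for one category (1 = feature belongs to it).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(round(roc_auc(labels, scores), 4))  # 5 of 6 pairs ordered correctly
```

The pairwise loop is quadratic in the number of examples, so it suits small validation sets only; a rank-based computation would scale better.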

Bonus

As an optional bonus, we are also interested in algorithms with the capability to balance Type I and Type II errors based on the qualities of a particular category in the hierarchy. In categories where we see a lot more content features than we have ads to show, we may be interested in reducing Type II errors. In categories where we have more ad opportunities than content features to target, we may be interested in reducing Type I errors.
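One possible way to trade off the two error types per category is to choose a separate decision threshold for each category, weighting false positives (Type I) and false negatives (Type II) differently. The sketch below is only an illustration: the cost weights, function name, and sample data are assumptions, not part of the project description.

```python
def pick_threshold(labels, scores, fp_cost, fn_cost):
    """Return the threshold minimizing fp_cost * FP + fn_cost * FN.

    A feature is assigned to the category when its score >= threshold.
    """
    # Every observed score is a candidate; infinity means "never assign".
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < t and y == 1)
        cost = fp_cost * fp + fn_cost * fn
        if cost < best_cost:  # strict: keep the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t


# Hypothetical scores for one category.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

# Penalize Type II errors (misses) more heavily: a lower threshold wins.
print(pick_threshold(labels, scores, fp_cost=1, fn_cost=5))
# Penalize Type I errors (false alarms) more heavily: a higher threshold wins.
print(pick_threshold(labels, scores, fp_cost=5, fn_cost=1))
```

Keeping the classifier fixed and tuning only the thresholds means the per-category weights can change as ad inventory changes, without retraining.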

Input Data

Web page information:

URL
Contents of TITLE tag
Attributes of all META tags
High-level vertical of the site the page is from, e.g., auto, health, finance, food, lifestyle, news, entertainment, etc.

Content feature information:

URL of Web page
Snippet text or HTML, in whatever representation is best-suited for processing

If additional input data would be helpful, please describe it and the reasoning behind the need. We may have access to it already or we may be able to readily acquire it through third parties. An example would be referrer URLs (pages that users arrive from to a given URL).

Training data

If the project solution requires the development of a training data set, describe the process by which it should be efficiently created, the size of the training data set required and the expected cost to create it.

Project is Hourly

This project seeks the highest value work, not the lowest possible cost. Thus the hourly rate range we will consider is very broad, from $50 to $200. We will evaluate proposals based on experience and qualifications, and the hourly rate will be based on merit.

Proposals

Please don't merely tell us about your background since we will have access to your profile once you apply. Your proposal should outline your methodology and provide insights into how you would solve the problem.

Text Classification

Classification

Online Advertising

$200/hr

Starts Jan 26, 2015

24 Proposals Status: CLOSED

Client: S*****

Posted: Jan 12, 2015

Design and Develop Tableau Dashboards for National Education Project

The Digital Youth Network is a non-profit project that supports organizations, educators and researchers in learning best practices to help develop our youths’ technical, creative, and analytical skills. The DYN is seeking a Tableau consultant to work with its team to design and implement a suite of dashboards that provide timely reports for different stakeholders (i.e., funders, city leadership, program providers, mentors, and youth). The consultant needs to have experience creating highly customized data dashboards with Tableau, and be knowledgeable about working with Amazon Redshift, partitioning access to data based on user status, automating updates, creating reports for viewing in Tableau Reader, and creating interactive visualizations that link to URLs.

Education

Descriptive Analysis

Analytics

$55/hr

Starts Jan 16, 2015

4 Proposals Status: IN PROGRESS

Client: D****** **********

Posted: Jan 10, 2015

Matching Algorithm for High-Profile Executive Mastermind Group

Summary:

Broadly, we'd like to build an algorithm that takes in "Member Profile Data" and returns a list of the "Top 10 Members You Should Meet", prioritized by best match.

Background:

We run a mastermind group of 250 CEOs, executives, and investors. The group meets once a year, and we'd like to provide a list to each of them of the top 10 people they should make an effort to meet at the event.

We collect extensive profile information on each member.

Here are the fields:

Standard:

Name, Title, Company, Address/Location, Zipcode, Country, Gender, Area Code, Personal Website, LinkedIn Profile URL, Company Website, Number of Employees (Select a Range), Company Annual Revenue (Select a Range)

Non-Standard:

List of Crowd Tools Most Interested In: (Community Building, Crowdfunding, Crowdsourcing, Incentive Competitions)

List of Similar Organizations You Are Involved In: (Strategic Coach, Genius Network, Center RIng, Entrepreneur's Organization, YPO, Business Mastery, Maverick1000)

Number of Previous Events You've Attended: (0, 1, 2, or 3)

Which Global Grand Challenges Are You Interested In Solving: (Abundant Energy, Clean Water, Democracy, Global Healthcare, Hunger, Literacy, Poverty, Other)

Your Industry: (Select from Standard List of Industries)

Description of What Your Company Does In this Industry: (Comment box input)

Are You An Angel Investor: (Y/N)

How Many Companies Have You Started: (Number)

Top Two Exponential Technologies You Are Interested In: (AI, Internet of Things/Networks and Sensors, Robotics, 3D Printing, Cryptocurrency, Alternative Energy, Virtual Reality, Human Longevity)

How Did You Hear About This Event: (Comment box input)

Description of Where You Want Your Company to Be in 5 Years: (Comment box input)

Would You Consider Yourself an Early Adopter: (Y/N)

Have you Attended a Singularity University Event?: (Y/N)

If yes to the above, which ones?: (EP, IPP, GSP, Other)

Number of Investment Deals Per Year: (Number)

Average Investment Size: (Select a Range)

Deliverable

The deliverable would be an algorithm that would take in this profile information and "match" the members. Think of a "match" as representative of "most likely to do business together" -- proximity, similarity of business, similarity of business size/business revenues, similar interests in technology, similar grand challenges they want to solve, similar networks (i.e., heard about the event from the same person), similar titles, similar angel investment sizes and frequencies.

The output would be a ranked list of the top 5-10 best matches.

Note -- Perhaps there should be a randomization factor included here as well. For example, if we have a husband and a wife (or two business partners) attending the event and they both run a business together or work at the same place, we don't want the algorithm to list them as top matches. Perhaps, if they are too similar, they shouldn't be included in the output.

Note 2 -- We will also provide a basic "training set" or benchmark list of connections that we consider "valuable". In other words, we will provide a list of people and their top 5-10 connections (that we know of).

Note 3 -- Though it may be outside the scope of this project, plugging into LinkedIn's API or crawling individuals' personal websites for keywords ("Real-Estate", "Coach", etc.) could add interesting dimensions to their similarity indices.

Bonus -- Though not required, it would be nice if we could see the "Top Factors in Common" for each match (e.g., Person A and Person B both want to solve this problem, or A and B both run equal-sized businesses out of New York).
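To make the deliverable and the bonus concrete, here is a minimal sketch of one way a match score could be assembled: a weighted sum over shared profile fields, with members from the same company skipped and the contributing fields reported as "top factors in common". The field names, weights, and sample members are all hypothetical; a real solution would cover the full profile and could tune the weights against the benchmark list of known connections.

```python
# Hypothetical weights per profile field; not taken from the project brief.
WEIGHTS = {"industry": 3.0, "city": 2.0, "challenges": 2.0}


def common_factors(a, b):
    """Return {field: score contribution} for fields the two members share."""
    factors = {}
    for field, weight in WEIGHTS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:  # profiles may be incomplete
            continue
        if isinstance(va, set):
            shared = va & vb
            if shared:  # Jaccard overlap for multi-select fields
                factors[field] = weight * len(shared) / len(va | vb)
        elif va == vb:
            factors[field] = weight
    return factors


def top_matches(members, name, n=10):
    """Rank other members for `name`, skipping anyone at the same company."""
    me = members[name]
    ranked = []
    for other_name, other in members.items():
        if other_name == name:
            continue
        if me.get("company") and me.get("company") == other.get("company"):
            continue  # spouses or partners running the same business
        factors = common_factors(me, other)
        top = sorted(factors, key=factors.get, reverse=True)
        ranked.append((sum(factors.values()), other_name, top))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return ranked[:n]


members = {
    "John Smith": {"company": "Acme", "industry": "Energy", "city": "New York",
                   "challenges": {"Clean Water", "Abundant Energy"}},
    "Jane Smith": {"company": "Acme", "industry": "Energy", "city": "New York",
                   "challenges": {"Clean Water"}},
    "Jack Stone": {"company": "Stone LLC", "industry": "Energy", "city": "Boston",
                   "challenges": {"Abundant Energy"}},
    "Ann Lee": {"company": "Lee Labs", "industry": "Health", "city": "New York"},
}
for score, match, factors in top_matches(members, "John Smith"):
    print(match, score, factors)
```

In this sample, Jane Smith is excluded for sharing John Smith's company, which is one simple reading of the "too similar" rule in the first note.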

Language: Python is preferable, but not required.

User Interface -- We'd like to have a basic user-interface that would allow us to upload a file and see the output of matches.

Input File -- CSV or Excel Document (All fields not 100% complete)

Output: CSV or Excel Document -- List of Top 10 Matches for Each of the 250 Members

+ README.md with detailed instructions and explanation of algorithm

Proposed Output Format:

Member: John Smith

Top Matches:

1) Name

2) Name

3) Name

4) Name

5) Name

6) Name

7) Name

8) Name

9) Name

10) Name

Member: Jack Stone

Top Matches:

1) Name

2) Name

3) Name

4) Name

5) Name

6) Name

...

Education

Hi-Tech

Professional Services

$1,500 - $2,500

Starts Dec 15, 2014

17 Proposals Status: COMPLETED

Client: P*** ********

Posted: Nov 27, 2014

Word Frequency Algorithm

[updated to add more clarity on the problem statement]

We have a large set of sentences.

I'm looking for an algorithm that will produce the *minimum* set of words (in descending order of appearance) such that each sentence contains *at least one* of these words.

For example:

For the sentence set {"black white red", "black white"} the minimum set of words is

"black"

For the sentence set {"one two five", "three four six", "seven nine one"} the minimum set of words is

"one", "three"

(there are other correct answers, too, but none with fewer than 2 words)
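This problem is an instance of the minimum hitting set problem (equivalently, set cover), which is NP-hard in general. The sketch below shows two approaches under that reading: a greedy heuristic that repeatedly picks the word found in the most still-uncovered sentences, and a brute-force search that is exact but practical only for small vocabularies. Function names are hypothetical, and the greedy result is not guaranteed to be minimal on every input.

```python
from itertools import combinations


def greedy_cover(sentences):
    """Greedy heuristic: pick the word covering the most uncovered sentences."""
    remaining = [set(s.split()) for s in sentences if s.split()]
    chosen = []
    while remaining:
        counts = {}
        for words in remaining:
            for w in words:
                counts[w] = counts.get(w, 0) + 1
        # Highest count first; alphabetical order breaks ties deterministically.
        best = min(counts, key=lambda w: (-counts[w], w))
        chosen.append(best)
        remaining = [words for words in remaining if best not in words]
    return chosen


def exact_cover(sentences):
    """Exact answer by trying word subsets of increasing size (small inputs only)."""
    word_sets = [set(s.split()) for s in sentences if s.split()]
    vocab = sorted(set().union(*word_sets)) if word_sets else []
    for size in range(len(vocab) + 1):
        for combo in combinations(vocab, size):
            picked = set(combo)
            if all(picked & ws for ws in word_sets):
                return list(combo)
    return []


print(greedy_cover(["black white red", "black white"]))
print(greedy_cover(["one two five", "three four six", "seven nine one"]))
print(exact_cover(["one two five", "three four six", "seven nine one"]))
```

On the two example sets above, the greedy pass returns one word and two words respectively, matching the stated minimum sizes, and it emits words in the order they were chosen, from most to least coverage.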

Text Analytics

Data Mining

Machine Learning

$500 - $600

Starts Nov 18, 2014

11 Proposals Status: COMPLETED

Client: V************

Posted: Nov 07, 2014

Automate Data Ingestion and Build Reporting Dashboards in Tableau

We need to build an ingestion pipeline to bring data in from our advertising and editorial portals into Tableau; help create reports and dashboards; and explore options for sharing reports with our clients.

Milestone 1: Manually extract and assemble datasets from the portal; migrate some of the charts from the current POP reports and Quantitative reports into Tableau.

Milestone 2: Migrate any remaining charts from these reports, construct some new analytic charts/dashboards, and start prototyping a database store for serving content into Tableau.

Milestone 3: Automate data ingestion pipeline into Tableau

NOTE: The client has shortlisted candidates. PLEASE DO NOT SUBMIT NEW PROPOSALS.

Media and Advertising

Dashboards & Scorecards

Data Visualization

$5,000 - $10,000

Starts Nov 17, 2014

6 Proposals Status: CLOSED

Client: P**** ******** ***

Posted: Nov 05, 2014

Amazon Elastic MapReduce High-Level Consultation

Seeking a technical resource with experience in AWS EMR (Elastic MapReduce) to provide 10-15 hours of consultation. The consultant needs to be able to describe and explain how to perform a variety of functions in EMR for the purpose of a comparison document.

Professional Services

Amazon Elastic MapReduce

Big Data and Cloud

$150/hr

Starts Nov 04, 2014

5 Proposals Status: COMPLETED

Client: S******** ******

Posted: Nov 03, 2014

ETL data from AWS RDS to Redshift or from Delimited Text Files to Redshift

We have a new and sizeable sandbox environment leveraging Tableau, Redshift, and potentially Informatica Cloud. We need someone who can step in immediately and help us load data into Redshift.

Amazon Redshift

Amazon S3

Amazon RDS

$150/hr

Starts Oct 30, 2014

7 Proposals Status: COMPLETED

Client: M****** ************ ********

Posted: Oct 29, 2014

Design & Implementation of System to Query a massive dataset in realtime.

We need to design and implement a system that will enable us to query a massive dataset in realtime. We need a Big Data architect to help us design the system using appropriate technologies to scale to millions of users and implement the solution by working with our team. We have the luxury and advantage of organizing and structuring the data set. Here is the problem:

If a user wants to explore a node of interest, x1, with and without filters, we would like to show the incoming paths and outgoing paths with aggregations at each node.

x12 —> x9 —> x1 —> x2 —> x1 is a path

x12 —> x9 —> x1 —> x3 —> x8 is a path

We want to show counts next to each node with and without filters like: (for all x1 the filters f1, f2 and f3... and for x3: fa, fb, f2, f3… and so on)

an example of filter f45 is model=BMW

an example of x1 is action=TestDroveCar

The data could be thought of as stored with filter attributes

x1 f1 f2 f56 f45

x9 fx fy

x12 fa fx f2

//pseudo query

select count(x) from table where
(
(x == x1 and f1 == f1v and f2 == f2v) OR
(x == x2 and f45 == f45v) OR
(x == x3)
)
and
x1Timestamp < x2Timestamp < x3Timestamp

// alternatively we could use intersect
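The aggregation described above can be sketched in memory to pin down the intended semantics before choosing a storage technology. The sketch assumes each user's events are already grouped into a journey and sorted by timestamp, with filter attributes attached to each event. The data layout, function name, and the Audi attribute value are hypothetical, and this is not a design for the realtime system itself.

```python
from collections import Counter


def neighbor_counts(journeys, focus, filters=None):
    """Count the nodes immediately before and after each `focus` event.

    journeys: list of journeys; each is a list of (node, attributes) tuples
              already sorted by timestamp.
    filters:  optional {attribute: value} that the focus event must match.
    """
    filters = filters or {}
    incoming, outgoing = Counter(), Counter()
    for journey in journeys:
        for i, (node, attrs) in enumerate(journey):
            if node != focus:
                continue
            if any(attrs.get(k) != v for k, v in filters.items()):
                continue  # focus event fails at least one filter
            if i > 0:
                incoming[journey[i - 1][0]] += 1
            if i + 1 < len(journey):
                outgoing[journey[i + 1][0]] += 1
    return incoming, outgoing


# The two example paths from the description, with hypothetical attributes.
journeys = [
    [("x12", {}), ("x9", {}), ("x1", {"model": "BMW"}), ("x2", {}),
     ("x1", {"model": "BMW"})],
    [("x12", {}), ("x9", {}), ("x1", {"model": "Audi"}), ("x3", {}),
     ("x8", {})],
]

print(neighbor_counts(journeys, "x1"))                    # without filters
print(neighbor_counts(journeys, "x1", {"model": "BMW"}))  # with model=BMW
```

Applying the same function to each neighbor in turn extends the view one hop further along the incoming or outgoing path.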

Hi-Tech

Design Optimization

Performance Analysis

$100/hr

Starts Nov 03, 2014

9 Proposals Status: CLOSED

Client: P******** *****

Posted: Oct 28, 2014
