
381 Projects that match your criteria

Data Strategy Advisor for Financial Services Company

We need a data scientist to advise us on our data strategy. This includes our current approach to data capture (including what pieces of information we should be tracking) as well as a strategy / timeline for implementing a new automated pricing and underwriting tool to improve our underwriting process.

We are a small, fast-growing merchant cash advance company. We need someone who has deep expertise in data science, strong business acumen, and extensive experience in financial services, preferably including some experience with merchant cash advance companies.

We are currently rolling out a new internal application that will track all aspects of the underwriting process in a SQL database. Prior to the application launch, most of our underwriting data is stored in PDF forms on our internal network.

The deliverable we are looking for is advice on:

  • What data we should be tracking with the launch of the new application
  • Approach to converting previous PDF data into a format that is able to be analyzed
  • Timeline / strategy for implementing an automated underwriting model based on this data
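As an illustrative sketch of the second bullet (not part of the brief): assuming each legacy PDF form can first be reduced to a per-file dictionary of field names and values by whatever extraction library is chosen, the results can be normalized into a single analyzable CSV with a fixed schema. All field names below are hypothetical.

```python
import csv
import io

def flatten_forms(forms, columns):
    """Write per-PDF field dictionaries to CSV with a fixed schema.

    forms   -- list of dicts, one per extracted PDF form (assumed to
               come from a separate extraction step, not shown here).
    columns -- the fields the new application will track; fields missing
               from an older form are emitted as empty cells, not dropped.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    for form in forms:
        writer.writerow({k: form.get(k, "") for k in columns})
    return buf.getvalue()

# Hypothetical field names, for illustration only.
table = flatten_forms(
    [{"merchant": "Acme", "advance_amount": "50000"},
     {"merchant": "Beta", "factor_rate": "1.35"}],
    columns=["merchant", "advance_amount", "factor_rate"],
)
```

Emitting empty cells for absent fields keeps the historical rows column-for-column compatible with data captured by the new application.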
Financial Services
Fraud Identification and Prevention
Finance

$75/hr - $125/hr

Starts Mar 01, 2017

14 Proposals Status: IN PROGRESS

Client: G******** *******

Posted: Feb 05, 2017

Price Optimization of Condominium Units

We are a real estate developer that would like to develop a Proof of Concept (POC) predictive model to identify which factors influence the pricing and sales of a home and/or condominium units. The POC will utilize all available historical data for condominium projects. After the POC has demonstrated value with predictive power and an actionable implementation within the project, the scope may be expanded across a broad array of projects.

The POC will utilize the full history of condominium sales data, including (but not limited to):

  • Unit available for sale date
  • Unit list price
  • Unit sale date
  • Unit sale price
  • Agent incentive fee and characteristics
  • Unit characteristics
  • Building characteristics
  • Accessibility measures
  • Expected building transfer date
  • Advertising expenses
  • Brand
  • Project
  • Room Type
  • Room Size
  • Floor
  • Tower
  • Project Transfer Duration
  • Transfer Duration
  • Selling Duration
  • Price/Sqm
  • Price Increase/Decrease
  • Selling Discount
  • Transfer Discount
  • Resale Count
  • Resale Amount
  • Transfer Amount
  • SPA Amount
  • More to be determined

With the data organized into a structured format and validated for accuracy and completeness, the data scientist will build models to predict the optimal price for a unit given a collection of data points.

Restrictions (for example, that the sum of the revenue of all individual units must equal the set building revenue) will be incorporated as modeling options as required by the client.
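One simple way to honor such a sum restriction (a sketch assuming proportional rescaling, not necessarily the approach the contracted data scientist will take): scale the model's raw unit-price predictions so they total exactly the set building revenue.

```python
def scale_to_building_revenue(predicted_prices, building_revenue):
    """Rescale raw per-unit price predictions proportionally so that
    their sum equals the fixed building revenue."""
    total = sum(predicted_prices)
    if total <= 0:
        raise ValueError("predicted prices must have a positive sum")
    factor = building_revenue / total
    return [p * factor for p in predicted_prices]

# Toy predictions summing to 600, rescaled to a 660 building revenue;
# relative ordering between units is preserved.
adjusted = scale_to_building_revenue([100.0, 200.0, 300.0], 660.0)
```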

For new model predictions, the heads of business units will enter building attributes, unit attributes, and all other required characteristics into a simple-to-use front-end tool. The output generated will be a numeric suggested price. Additionally, work will be done to derive estimates for the effect of changes in each variable. Depending on the desired complexity of the model and the variation in the data across buildings, these per-variable impacts may or may not be individually attainable at a high level of accuracy.

The statistical model will be developed in R, accompanied by a spreadsheet and/or PowerPoint deliverable that details the functional form of the model, examples of how historical sales patterns fit the function, and measures of model accuracy.

The user tool for the POC will be published in Excel or another front-end alternative (e.g. a Shiny app) and will serve for testing purposes to evaluate the quality of results and overall value. Recommendations for expanding the tool beyond the POC will be given for future implementation.

The client will have full access to all code and tools used in this project, both for model development and model deployment.

Note: This project is being awarded to a data scientist with whom we have an existing relationship.

Real Estate
Sales
Machine Learning

$25,000

Starts Feb 02, 2017

1 Proposal Status: COMPLETED

Net 30

Client: E*******

Posted: Jan 31, 2017

k-NN Algorithm Addition in Cluster App

We are building a series of breakthrough visualizations for many analysis tasks on our platform, and are seeking a qualitatively improved way to view clusters of information compared to existing methods. Viewing data that naturally “clusters together” is valuable in many application domains, including data formatted as surveys, transactions, and text. In preparation for this project, we collaborated on a UI sketch, which we have rendered in a mockup image.
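A minimal, pure-Python sketch of the k-NN idea the app would add (the actual UI and platform integration are specified in the mockup, which is not reproduced here): assign a query point the majority label of its k nearest clustered neighbours.

```python
import math
from collections import Counter

def knn_label(points, labels, query, k=3):
    """Assign `query` the majority cluster label of its k nearest
    neighbours (Euclidean distance). A production version would use a
    spatial index instead of scanning every point."""
    dists = sorted((math.dist(p, query), lbl)
                   for p, lbl in zip(points, labels))
    nearest = [lbl for _, lbl in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Two toy clusters: near the origin ("A") and near (9, 9) ("B").
pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8), (8, 9)]
lbls = ["A", "A", "A", "B", "B", "B"]
cluster = knn_label(pts, lbls, (8.5, 8.5))
```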

Project is to be awarded to Expert already engaged in client's project work. 

Professional Services
Analytics
Customer Analytics

$41,000 - $42,000

Starts Jan 28, 2017

1 Proposal Status: IN PROGRESS

Net 7

Client: V********

Posted: Jan 26, 2017

Voice Analytics - Sentiment Analysis Predictive Model

The goal of this project is to take an existing sentiment analysis framework and pipeline and apply it to additional call center calls to determine its business value and predictive power. Note: this is the next phase of a completed project and will be awarded to the same data scientist who worked on the earlier phase.
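The existing framework is not described in this posting; purely as an illustration of the kind of per-call score such a pipeline emits, here is a toy lexicon-based sentiment scorer. The word lists are invented for the example.

```python
# Tiny invented lexicon; a real pipeline would use the existing trained
# framework, not a hand-picked word list.
POSITIVE = {"great", "thanks", "resolved", "helpful", "happy"}
NEGATIVE = {"angry", "cancel", "broken", "waiting", "refund"}

def call_sentiment(transcript):
    """Score a call transcript in [-1, 1]: the share of sentiment-bearing
    words that are positive minus the share that are negative."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

score = call_sentiment("Thanks, the issue was resolved quickly!")
```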

Professional Services
Call Center Analytics
Consumer Experience

$281/hr

Starts Jan 19, 2017

1 Proposal Status: COMPLETED

Client: T****

Posted: Jan 18, 2017

Training Data Generation for Ansible Build Time Prediction

Summary

We would like to build a continuous learning algorithm that will be able to predict execution times of Ansible builds (Playbooks) based on historical Ansible build data.  As a precursor to developing the algorithm we are seeking a technologist to develop the continuous learning environment using free versions of Ansible, Splunk and Elastic Search and to generate training data from which the algorithm can learn.

Proposal

As part of your proposal please answer the following questions:

  • What cloud environment will you use to develop the continuous learning environment?  Please describe or diagram the system and provide an estimated cost (e.g. EC2 instance costs or Heroku dyno costs) for maintaining the environment for data generation.
  • Please provide an estimate of hours required to build and configure the environment. Please provide an estimate of hours required to generate the training data.
  • What will be your strategy/approach for configuring Playbooks based on the specified Galaxy Roles?  How will you ensure the generated data provides a variety of Playbook structures (singletons, clusters, single and multi-target builds) for optimal machine learning?
  • How do you plan to structure the resulting training data?
  • Please describe your knowledge of and past experience with the technologies required for this project.

Scope of Work

The selected consultant will be responsible for:

  • Setting up a cloud environment for data generation (included in this project posting) and continuous learning (for the future project posting).
  • Setting up and configuring at least one Ansible instance.
  • Setting up and configuring Splunk.
  • Installing and configuring the Ansible App for Splunk (used to import Ansible data into Splunk).
  • Installing and configuring Elastic Search to access the Ansible data within Splunk.
  • Setting up multiple hosts on which Ansible builds can be executed.
  • Configuring a selected set of publicly available Ansible Roles (from galaxy.ansible.com) into both singleton (single-Role) and cluster (multiple-Role) Ansible Playbooks for the purpose of data generation.
  • Developing a script to execute the resulting Ansible Playbooks against single and multiple hosts in order to generate approximately 2,000 rows of test data.
  • Providing a method for extracting the training data for machine learning (extracted data must be in a flat-file format).
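A sketch of what the final two bullets might produce (the `ansible-playbook` invocation itself is stubbed out with a placeholder callable; the column names are hypothetical): each execution yields one flat-file row pairing Playbook features with the measured build duration.

```python
import csv
import io
import time

def timed_run(build):
    """Wall-clock a build callable; in the real environment this would
    wrap an `ansible-playbook` subprocess call instead."""
    start = time.perf_counter()
    build()
    return time.perf_counter() - start

def training_row(playbook, roles, hosts, structure, duration_s):
    # One flat-file row: structural features plus the prediction target.
    return {"playbook": playbook, "roles": roles, "hosts": hosts,
            "structure": structure, "duration_s": round(duration_s, 3)}

# Placeholder workload standing in for a real Playbook execution.
rows = [training_row("nginx.yml", 1, 1, "singleton",
                     timed_run(lambda: sum(range(10_000))))]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["playbook", "roles", "hosts", "structure", "duration_s"])
writer.writeheader()
writer.writerows(rows)
flat = buf.getvalue()  # flat-file format, as the brief requires
```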

The primary outputs of this project are both the test data and the environment for generating additional test data, which can be accessed by the continuous learning environment.

The attached presentation provides additional details around the environment and data requirements and gives additional context to the broader project scope (beyond the environment and data generation scope of this first project).  Details relevant to the scope of this Experfy project posting have been highlighted in yellow in the presentation for clarification.

When submitting your proposal please include an executive summary which describes key elements and numbers for your approach.  In your proposal, when estimating cost for the environment, please declare some assumptions regarding number and size of hosts and indicate how much of the proposed cost is due to environmental costs.

After the executive summary, the rest of the proposal should address all questions listed in the proposal section above. The proposal does not need to be explicitly in question-and-answer format, though it can be. The important thing is that all questions are clearly answered or, if they cannot be answered, an explanation is given as to why.

Finally, please make sure all proposals are client-ready.

Hi-Tech
Application Deployment
System Provisioning & Configuration

$13,000 - $20,000

Starts Jan 28, 2017

8 Proposals Status: COMPLETED

Net 60

Client: C*******

Posted: Jan 12, 2017

Data Scientist for Exploratory Analysis

This project will be awarded to the same data scientist as before. The details have already been discussed with the expert.

Consumer Goods and Retail

$100/hr - $150/hr

Starts Jan 18, 2017

3 Proposals Status: IN PROGRESS

Net 30

Client: M***

Posted: Jan 09, 2017

Proof of concept for a Web Page Classifier that identifies reader intent

Background:

Taboola is widely recognized as the world’s leading content discovery platform, reaching 1B unique visitors and serving over 360 billion recommendations every month. Recent ComScore data shows that Taboola is second only to Facebook in terms of reach (https://www.taboola.com/press-release/taboola-crosses-one-billion-user-mark-second-only-facebook-world%E2%80%99s-largest-discovery).

Publishers, marketers, and agencies leverage Taboola to retain users on their site, monetize their traffic and distribute their content to drive high quality audiences. Publishers using Taboola include USA Today, NYTimes, TMZ, Politico.com, BusinessInsider, CafeMom, Billboard.com, Fox Television, Weather.com, Examiner, and many more.

Taboola's operation is vast, with ~2,000 servers in 6 data centers processing big data about users and user behavior, content, pages, etc.

 

General:

The premise behind this project is that web pages can be used to identify a specific reader intent. For example, people who read about a store’s opening hours have an intention to visit that store, and people who read about “how to write a great CV” probably intend to seek employment.

 

What we are looking for:

Our goal is to have a reproducible methodology for building web page classifiers that identify specific user intents.

Given a specific user intent, we would like to build a binary classifier that determines whether the reader has the specific intent, and would like to be able to reproduce this methodology with different intents.

Once operational, the classifier should run efficiently and be able to scale to classifying millions of web pages in a short amount of time.
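As a toy illustration of one possible approach (not the methodology the proposals will define): hashed bag-of-words features keep memory constant regardless of vocabulary, which is one way to make per-page classification cheap at the scale described, and a simple perceptron gives a binary intent decision. The training pages below are invented.

```python
import re
import zlib

DIM = 1024  # fixed-size hashed feature space: no vocabulary to store

def features(text):
    """Hashed bag-of-words vector; constant memory per page helps keep
    classification cheap across millions of pages."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    return vec

def train_perceptron(pages, labels, epochs=20):
    """Plain perceptron; labels are +1 (has intent) or -1 (does not)."""
    w = [0.0] * DIM
    for _ in range(epochs):
        for text, y in zip(pages, labels):
            x = features(text)
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def has_intent(w, text):
    return sum(wi * xi for wi, xi in zip(w, features(text))) > 0

# Invented "intends to visit a store" truth set, for illustration only.
pages = ["store opening hours and directions",
         "our store location and opening hours",
         "history of the roman empire",
         "recipe for lemon cake"]
labels = [1, 1, -1, -1]
w = train_perceptron(pages, labels)
```

After training, `has_intent(w, page_text)` gives the binary decision; swapping in a different labeled truth set reproduces the same steps for another intent, which is the kind of reusability the brief asks for.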

 

Project scope:

The project deliverable should be a working classifier which will serve as a proof of concept for a reusable methodology for creating such classifiers.

In addition, the project should include ample documentation describing the general methodology used so it can be recreated for additional intents.

We will decide on the scope of the initial proof of concept with the selected candidate.

 

Your proposal:

Your proposal should outline your approach in general terms: which algorithms you intend to use, which features you would extract from each URL and how, how you would determine a truth set for the classifier, and how you would measure correctness and/or other KPIs.

We will share additional information with the selected expert and define the approach and scope in detail together.

 

(Image provided by Mimooh under the Creative Commons Attribution-Share Alike 3.0 Unported License - https://commons.wikimedia.org/wiki/File:Med_classifier3_by_mimooh.svg)

Consumer Goods and Retail
Financial Services
Healthcare

$10,000 - $15,000

Starts Jan 15, 2017

12 Proposals Status: CLOSED

Net 30

Client: T*******

Posted: Jan 08, 2017

Machine Learning pipelines for optimizing online marketing performance

We would like to create ML pipelines to improve the conversion performance (leads and sales) of the ads we manage on AdWords, Facebook Ads, Instagram, and Twitter Ads.

We're looking for a long-term engagement with someone who ideally has some experience with applied ML in digital advertising.

About us

We're a digital advertising management company for SMBs. We launched in April 2016 and currently have ~150 active customers.

Project

Overall, we're trying to improve the conversion performance of our customers' campaigns in an automated way. We believe that, in order to do this, we need to start by using ML pipelines to output suggested values for the percentage of budget allocated to the different channels (see above). We also believe there are other pieces to this puzzle, but we want to start with the channel allocation suggestions and then move on from there.

We already have a team of developers who will perform any of the DevOps work needed for this project.

So we're looking for someone to help us do the following:

  • develop models in R or python using past experience and our data
  • help us develop the proper techniques to utilize the model pipelines
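As a deliberately naive baseline for the channel-allocation output described above (invented numbers; a real pipeline would also model diminishing returns, attribution, and seasonality): allocate budget share proportional to each channel's observed conversions per dollar.

```python
def suggest_allocation(channel_stats):
    """Suggest a budget split (percent per channel) proportional to each
    channel's observed conversions per dollar spent. Deliberately naive:
    no diminishing returns, attribution, or seasonality modeling."""
    rates = {ch: s["conversions"] / s["spend"]
             for ch, s in channel_stats.items() if s["spend"] > 0}
    total = sum(rates.values())
    return {ch: round(100 * r / total, 1) for ch, r in rates.items()}

alloc = suggest_allocation({
    "adwords":  {"spend": 1000.0, "conversions": 50},
    "facebook": {"spend": 1000.0, "conversions": 30},
    "twitter":  {"spend":  500.0, "conversions": 10},
})
```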

We plan on using AzureML to construct our pipelines/APIs. You do not need to know AzureML; you can pick it up along the way as we work together to implement the pipeline(s) you construct.

Data

The data will be advertising performance data from AdWords, Facebook Ads, Instagram, and Twitter Ads, as well as website and conversion analytics data.

This data will be ETL'd by us and made available to you in DBs to conduct your work.

However, ideally you would also have a method of accessing the APIs directly (e.g. via Pentaho) to streamline the workflow, for example if you need access to something that we're not currently capturing from the APIs. That would be ideal, but it is not required.

Machine Learning
Online Advertising
Market Segmentation and Targeting

$100/hr - $200/hr

Starts Jan 19, 2017

19 Proposals Status: IN PROGRESS

Client: A*****

Posted: Jan 08, 2017

R package for media data validation and cleaning engine

Every day we receive media data featuring several media metrics with business logic that needs to be upheld. Some of these business rules are easy to uphold; others can be trickier. At Blackwood Seven we rely on massive quantities of data, and the correctness of this data is naturally crucial.

The following needs to be fully understood:

  • Online media metrics
  • Offline media metrics
  • CPM, CPC, CPA
  • ROI
  • Marginal cost analysis
  • Time domain filtering

For programming

  • R
  • R-Studio
  • Python
  • R6 Classes

Specifically, the R package to be developed should handle sanity checks of the data as well as methods for fixing issues on a best-effort basis. One approach could be to use deep learning on generated sane and insane data. The package needs to be structured and built using R6 classes; worst case, S3 can be used. Alternatively, all of it can be implemented in Python with an interface to R.

Sample sane and insane data are provided as CSVs, which is of course not enough to train a network, but serves to reveal some of the potential issues.

Explanation of the data

The data set here consists of Impressions, Clicks and Net as metrics. As dimensions we have Date, Channel and Supplier. 

  • Impressions: The number of times a banner has been shown to a user.
  • Clicks: The number of Impressions that users clicked on. Thus the following must always hold: Clicks < Impressions.
  • Net: The amount of money paid for the banner, usually reconciled as CPC (cost per click) or CPM (cost per thousand impressions). In other words, (Impressions > 0) => (Net > 0) and (Clicks > 0) => (Net > 0), while the reverse is not true: just because you paid does not mean you received any clicks, though that is very unlikely.
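Since the brief allows a Python implementation with an R interface, here is a minimal Python sketch of the stated sanity rules (the deep-learning repair step is out of scope for this illustration). The strict inequality is taken literally from the brief.

```python
def validate_row(row):
    """Apply the stated business rules to one row of media data.
    Returns the list of violated rules (empty means the row is sane)."""
    problems = []
    if row["Clicks"] >= row["Impressions"]:
        problems.append("Clicks must be strictly below Impressions")
    if row["Impressions"] > 0 and row["Net"] <= 0:
        problems.append("Impressions > 0 implies Net > 0")
    if row["Clicks"] > 0 and row["Net"] <= 0:
        problems.append("Clicks > 0 implies Net > 0")
    return problems

sane = validate_row({"Impressions": 1000, "Clicks": 12, "Net": 40.0})
insane = validate_row({"Impressions": 100, "Clicks": 250, "Net": 0.0})
```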
R-Project
Marketing Mix Modeling
Bayesian Inference

$100/hr - $200/hr

Starts Feb 24, 2017

12 Proposals Status: COMPLETED

Client: B********* *****

Posted: Dec 30, 2016

Risk Analysis of Fund Investments

We track a number of statistics for our fund investments. All of the numbers are derived from the monthly returns of a fund/index. We are limited to monthly data because each fund only reports on a monthly basis.

Attached are the most common statistical measures we track for each manager. Almost all of these numbers are derived from general finance industry practice. It would be helpful for us to understand how a data analyst would evaluate the risk of our fund investments given the constraints on the frequency of data points, taking an unbiased approach to the problem.

We would like to implement more sophisticated risk analytics based on the limited data at our disposal. Please provide your approach and how it may benefit us.
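For illustration (invented numbers, and standard industry conventions rather than the firm's attached measures): even with only monthly observations, annualized return, volatility, and a Sharpe ratio follow directly.

```python
import math
from statistics import mean, stdev

def annualized_stats(monthly_returns, annual_risk_free=0.0):
    """Annualize mean return and volatility from monthly fund returns
    (the only frequency available) using the standard conventions:
    multiply the mean by 12 and the standard deviation by sqrt(12)."""
    mu = mean(monthly_returns) * 12
    vol = stdev(monthly_returns) * math.sqrt(12)
    return {"annual_return": mu,
            "annual_vol": vol,
            "sharpe": (mu - annual_risk_free) / vol}

# Twelve invented monthly returns, for illustration only.
stats = annualized_stats([0.01, -0.02, 0.015, 0.005, 0.0, 0.02,
                          -0.01, 0.01, 0.005, -0.005, 0.02, 0.01])
```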

Financial Services
Finance
Risk and Compliance

$100/hr - $150/hr

15 Proposals Status: COMPLETED

Client: S****** ***** ***

Posted: Dec 26, 2016
