

Medical Images Classifier

We are a startup in the AI field. We need to predict binary outcomes from sequences of cell images taken by a microscope. Our two junior Python developers have run a few popular neural networks on this set, but we are only getting about 60% accuracy. We also have a description of features that could potentially affect the outcome. Building feature extractors might be one way to increase accuracy.
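The feature-extractor route mentioned above could be prototyped independently of the networks. A minimal sketch in NumPy (the specific features here are illustrative assumptions, not the client's actual feature list):

```python
import numpy as np

def extract_features(img):
    """Hand-crafted features for one grayscale cell image:
    a 16-bin intensity histogram plus simple summary statistics."""
    img = np.asarray(img, dtype=np.float64)
    hist, _ = np.histogram(img, bins=16, range=(0, 255), density=True)
    stats = np.array([img.mean(), img.std(), (img > img.mean()).mean()])
    return np.concatenate([hist, stats])

def extract_sequence_features(seq):
    """Each sample is a sequence of images, so aggregate per-frame
    features across frames (mean and std) into one fixed-length vector
    that a classifier -- or an extra network input branch -- could consume."""
    feats = np.stack([extract_features(frame) for frame in seq])
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
```

Vectors like these could be fed to a classical model as a baseline, or concatenated with CNN embeddings.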

We are looking to consult with a Neural Network expert in image processing and classification.

Data sources

6,000 sequences of 500x500 px JPG images, sorted into TRAIN (POS/NEG) and VAL sets. Source data is also available in XLS and imported into a MySQL database.

Current Technology stack

Nvidia DIGITS server, Keras, TensorFlow, Caffe, etc.

We have an Ubuntu server with one GTX 1080 Ti.

Deliverable

A neural network that reaches higher accuracy. Ideally you would work on our server; we can provide TeamViewer or SSH access.

Alternatively, you would advise our team on the best direction to take and we will do the legwork.

In your proposal please tell us more about your neural network experience in image processing and classification.

Healthcare
Hi-Tech
Pharmaceutical and Life Sciences

$70/hr - $150/hr

Starts Dec 22, 2017

9 Proposals Status: COMPLETED

Client: R*****

Posted: Dec 07, 2017

WorkFusion Development - Create Human Tasks to existing Business Process

We are creating credit investment data to help analysts automate the required data sets for proper underwriting of investments.

We use WorkFusion (WF) to read, parse and extract data from legal agreements that are received in pdf format.

We have built Phase 1 of a WF business process that breaks out the key sections of the legal agreement (i.e., Cover Page, Recitals, Table of Contents, Defined Terms, Sections) using parsing and regular expressions.
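For context, the regex-based sectioning described above might look something like the following Python sketch. The heading patterns are assumptions based on the section list in this posting; the real work would be implemented inside the WorkFusion platform:

```python
import re

# Assumed section headings; a real agreement would need a richer pattern set.
SECTION_RE = re.compile(
    r"^(COVER PAGE|RECITALS|TABLE OF CONTENTS|DEFINED TERMS|SECTION \d+)\b",
    re.MULTILINE | re.IGNORECASE,
)

def split_sections(text):
    """Return {heading: body} for each matched section heading,
    where a section's body runs until the next heading (or end of text)."""
    matches = list(SECTION_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).upper()] = text[m.end():end].strip()
    return sections
```

Each extracted section could then back one of the five human tasks, with automated extraction pre-filling fields where the regexes are confident.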

We now need to build 5 human tasks by segmenting specific sections and then providing automated extraction where possible.  

We have 500 total legal agreements that need to be run through this process. All work should be developed directly in the WorkFusion platform/Business Process.

Attached is a screenshot of the current Business Process that needs to be worked with.

In your proposal please answer the following questions.

  • Have you worked with WorkFusion in the past? Describe your experience using WorkFusion.
  • What is your comfort level with Java?
  • How many years of RPA experience do you have?

Regular Expressions
Parsing
Business Process Automation

$50/hr - $100/hr

Starts Jan 03, 2018

6 Proposals Status: COMPLETED

Client: K****** ************

Posted: Nov 29, 2017

Data Anomaly Detection and Suggestions

We are planning to develop an unsupervised learning solution to detect anomalies in structured data. Our data will be single tables only. We need a solution that detects anomalies across various data types (time series, configuration, etc.). The solution must identify normalcy patterns in individual columns and in intercolumn relationships, identify anomalies within those patterns, and suggest what a correct value could be (with some confidence interval).
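For the single-column numeric case, one common unsupervised baseline is a robust z-score detector. A minimal sketch (the threshold and the median-as-suggestion are illustrative choices, not requirements; intercolumn relationships and non-numeric types would need separate models):

```python
import numpy as np

def column_anomalies(values, z_thresh=3.5):
    """Flag anomalies in one numeric column using the robust z-score
    (median / MAD), which is not distorted by the anomalies themselves.
    Returns anomaly indices and the median as a naive suggested value."""
    x = np.asarray(values, dtype=np.float64)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1e-9   # guard against zero MAD
    z = 0.6745 * (x - med) / mad               # ~N(0,1)-scaled for normal data
    idx = np.where(np.abs(z) > z_thresh)[0]
    return idx, med
```
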

I realize the description above is a bit generic, but that's because we are looking to develop a base solution that works (to some level) in an unsupervised manner across a wide variety of data types. We expect further tuning will be needed to maximize signal from different data types.

Stage 1: Planning (current project)

For this stage we are interested in finding data scientists with relevant background and experience in data anomaly detection similar to what is described above. During this stage we will hire (pay) 1-3 data scientists to consult for one phone call on project planning and strategy. Please apply to this project and explain why we should consult with you.

Stage 2: Execution (future project)

A future project posting will provide additional details and ask for proposals. We will hire one or more data scientists to develop the solution.

Financial Services
Hi-Tech

$100/hr - $200/hr

14 Proposals Status: CLOSED

Net 30

Client: C*******

Posted: Nov 22, 2017

Schedule Reconciliation Problem

We have challenges reconciling data related to Subject Visits. It is difficult to aggregate this data accurately when the values used to represent a Visit vary by source. We currently have 3 sources of visit data:

  • Protocol Document
  • EDC
  • IRT

We are seeking a data scientist who can propose an AI technique to automate generating a concordance mapping between the systems. It would need to detect similarities between the names and, with the supporting actual-date data, determine the mapping with high confidence. Our efforts to manage this issue manually have never gotten any traction.

There is a business need not only to reconcile the naming, but also to enhance the data with known values about the visit: Was this a Dosing Visit, an Unplanned Visit, an End of Study Visit, a Payable Visit?
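As a rough illustration of the name-similarity half of such a concordance mapping (agreement on actual visit dates would be the second signal), here is a hypothetical sketch using only Python's standard library:

```python
from difflib import SequenceMatcher

def best_match(name, candidates):
    """Map a visit name from one system (e.g., the Protocol Document)
    to its most similar name in another (e.g., EDC or IRT).
    Returns (score, candidate); the score could gate 'high confidence'."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    return max(scored)
```

A production approach would likely combine several string-similarity measures with date proximity, and flag low-confidence pairs for human review.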

Pharmaceutical and Life Sciences

$100/hr - $150/hr

Starts Nov 27, 2017

8 Proposals Status: CLOSED

Net 30

Client: C*******

Posted: Nov 21, 2017

MVP - The Future of Litigation: Practicing Law With a Crystal Ball

Technology and the law currently overlap in meaningful, but largely incomplete, ways.  This is an opportunity to bridge the gap and change the legal landscape forever. 

  1. Understanding the problem.

The entire legal industry is premised on the notion that rules, statutes, and prior decisions by judges and courts (caselaw) govern the decision-making process for lawyers when representing clients.  Of course, a lawyer's education, perspective, and unique problem-solving abilities will affect the decision making process, but not as much as you might think.  Almost invariably this is how a lawyer makes a decision for a client:

  • What does the rule of procedure say we must do?
  • Is there a statute that controls this situation? 
  • How have prior judges and/or courts ruled on similar arguments under similar circumstances?

Stated another way, lawyers rely almost exclusively on the already-existing law (rules, statutes, caselaw) to create arguments and advance a certain strategy instead of another--nothing else.  This is how we are trained in law school; this is what other lawyers and judges expect; this is how the system has always worked.  But there's more data/information out there that already exists and is NOT being utilized.  Data that could prove to be far more valuable in terms of correct decision making than any rule, statute, or case.  

  2. Technology and the Law: Some Meaningful Overlap, but Something Is Missing

It wasn't that long ago that if a lawyer needed to research a rule, statute, or caselaw, he or she would have to physically go to a law library, locate and retrieve books, and read them. Then, with the advance of technology, came web-based databases that stored the information and allowed lawyers to browse it at lightning speed (e.g., Westlaw and LexisNexis). That marketplace exists, is controlled by major players, and lawyers' use of their products is ubiquitous.

Until recently, law firms were managed in paper-heavy, physically intensive environments. Physical files and endless documents made for inefficient management of legal operations. Needless to say, creating and managing task assignments and workflow was really challenging. Then came seemingly dozens of vendors all proclaiming to have the solution (e.g., PracticePanther, RocketMatter, FileVine, etc.). They all help in many ways, and lawyers are beginning to use them more and more.

In conclusion, systems exist to help lawyers research the law, communicate with their colleagues, organize and store information, and even automate certain processes (i.e., timekeeping, form document generation, etc.). And while that may seem like a complete solution, it's not.

  3. The Most Important Data Lawyers Should Consider Before Making a Decision Is Not Currently Available to Them: We Are Going to Create It!

Lawyers who handle litigation (that is, lawsuits or criminal prosecutions where cases are actually fought in courts) are working at a serious disadvantage--they just don't know it.  (Trust me, I am a litigation lawyer).  

Yes, whether a rule or statute exists and addresses a particular concern is important to know.  

Yes, a judge or court's prior decision in a written opinion is important, but doesn't give the full picture.  It's too surface level.

What if a criminal defense lawyer currently represents a white male, 26 years old, with no prior criminal record, in Miami, Florida, for DUI, before judge John Smith and prosecutor Jane Doe, and wants to know:

  • Does this particular prosecutor ever negotiate plea bargains to amend the charge to a lower level instead of just DUI?
  • What is the likelihood that this prosecutor will offer jail time in exchange for a guilty plea?
  • If my client pleads guilty, what is the likelihood that this judge will sentence him to jail?  What if we go to trial instead of a guilty plea but at trial the jury still finds my client guilty?  Will that affect the way this judge sentences my client?
  • There are literally hundreds if not thousands of other similar queries litigation lawyers always think about but can never answer...until now.

We are looking for one or more creative, innovative, hard-working people to help develop a web-based environment for litigation lawyers. It would pull data from available public records, but would also be driven by those same user-lawyers, who would input case- and client-specific information about judges, other lawyers, witnesses, insurance adjusters, jurors, etc. The goal is eventually to show statistical probabilities of certain outcomes for specific queries and, in some instances, demonstrate predictive outputs. This environment will be largely driven by users around the country who are prompted to input data repeatedly in a non-exhausting, inviting way.

At first, my own law firm can provide lots of data (and guidance) to help build this platform.  We can even beta test it.  The ultimate goal is to commercialize the platform.  We are looking for long term developers, not just a one and done.  

This project will require immense and particular knowledge of how litigation works. The nuance is so complex that only a lawyer would understand it. However, I am confident that I can translate it and work diligently to help whoever works on this project get it done.

Imagine if lawyers could predict the future. This could and would change the practice of law forever. It would completely disrupt the market.

  4. We Are Open-Minded About the Best Path to Reach Our Goal, but Here Is My Rough Idea of the Journey
  • Explore - Using my own law firm and its extensive data for personal injury and criminal defense cases in Florida, we can provide both data and tons of example queries and factors you should consider when developing the platform. This is exploration for you to gain deeper insight into how litigation works, what types of data will be needed, and what types of queries the system will need to process.
  • Test and Fix - We can first start beta testing with my own law firm.
  • Limited Launch - Launch the platform and begin to market it to other attorneys we know.
  • Fix the Bugs - Absorb the feedback and fix the problems that need to be addressed.
  • Major Launch - Develop a website to accompany the platform and to drive traffic/potential users to it. That website will also be where users log in, where info on our platform is stored, and where other basic company content lives. This is the first meaningful step in commercializing the product via a subscription or license model.
  • Ongoing Development and Support - Even after the launch, we will keep you on board to help continually develop, fine-tune, and improve the platform.

We are looking to first build an MVP. In your proposal please submit the milestones for the MVP.

Legal
Customer Behavior Analysis
Consumer Experience

$10,000 - $30,000

Starts Dec 05, 2017

5 Proposals Status: COMPLETED

Client: F******* ******* ****

Posted: Nov 21, 2017

Statistical Model to Establish Relative Strength of a Business Based on Online Ratings and Reviews

We have an existing statistical model that accomplishes the scope outlined below. We want to enhance it now that we have more data points. The model calculates an apartment community’s online reputation as compared to the entire population.

We collect ratings and number of reviews from various sites on over 70,000 apartment communities on a monthly basis.

As of now, there are about 19 sites that we collect data on, and the number of sites is growing. For each site, we gather two variables: the aggregate star rating of the property and the number of reviews that make up the star rating. Most sites have a 5-point scale. One site, apartmentratings.com, also lists the percentage of people recommending the property in addition to the star rating and the number of reviews. Please see the attached sample data.

All data and analysis will be in excel. The model should be in excel too.

The goal is to generate an overall rating that aggregates the websites’ reviews. The rating should be simple to implement, stable, and accurate. The end result should be a score for each property on a 100-point scale that serves as a relative ranking of the property’s online reputation.

We have data on which sites are more important to prospects looking for an apartment. We’ll work with the consultant to fine-tune the weighting of the various sites. The resulting scoring methodology needs to be tested using various methods, such as a sigmoid function, Bayesian weighting, and mean absolute error.
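Although the deliverable will live in Excel, the Bayesian-weighting idea can be illustrated in a short Python sketch; the prior weight and the 5-to-100 rescaling below are illustrative assumptions, and the same logic translates directly to Excel formulas:

```python
def bayesian_average(rating, n_reviews, prior_mean, prior_weight=10):
    """Shrink a site's star rating toward the population mean when the
    review count is small, so a single 5-star review can't dominate."""
    return (prior_weight * prior_mean + n_reviews * rating) / (prior_weight + n_reviews)

def reputation_score(site_data, site_weights, prior_mean):
    """Combine per-site Bayesian averages into a 0-100 score.
    site_data: {site: (avg_star_rating, n_reviews)}; site_weights are the
    importance-to-prospects weights described above (assumed to sum to 1)."""
    total = sum(site_weights[s] * bayesian_average(r, n, prior_mean)
                for s, (r, n) in site_data.items())
    return round(100 * total / 5, 1)  # rescale 5-point average to 100-point scale
```

For example, a property with a 5.0 rating but zero reviews on a site collapses to the population mean, which is the stabilizing behavior the brief asks for.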

Tools Used - Excel

Professional Services
Real Estate
Market Research

$3,000 - $7,500

Starts Dec 18, 2017

13 Proposals Status: IN PROGRESS

Client: J* ****** ********

Posted: Nov 08, 2017

Assessing the Value of User Data in Online Native Advertising

Background:

Taboola is the leading global recommendation platform, serving over 470 billion recommendations to over 1.3 billion people every month on some of the Web’s most innovative digital properties, including USA TODAY, Huffington Post, MSN, Business Insider, Chicago Tribune and The Weather Channel. Headquartered in New York City, Taboola also has offices in Los Angeles, London, Tel Aviv, New Delhi, Bangkok, São Paulo, Shanghai, Beijing, Seoul, Istanbul, Sydney and Tokyo. Taboola’s global reach is second only to Google’s, and currently 88% of America and 83% of the UK see a Taboola recommendation 2-3 times a day. Over 60% of the business is in mobile web and mobile apps.

Taboola collects and analyzes a vast amount and range of non-PII user data, most of which relates to users' online behavior and content consumption. In addition, Taboola operates the industry's most comprehensive data marketplace, allowing advertisers to utilize data and segments from numerous 3rd-party providers in addition to Taboola segments.

User data at Taboola creates value through 2 primary mechanisms: 

  • Targeting (used by clients)
  • Personalization (used by Taboola's recommendation engine and other products)

Both of these mechanisms ultimately relate to the Revenue per Thousand page views (RPM) that Taboola is able to generate for its supply partners (web publishers, apps, browsers, etc. that use Taboola's products to monetize traffic). Both are also linked to the Cost Per Action (CPA) Taboola advertisers effectively pay.

Objectives:

  • To estimate the value of user data in RPM and/or CPA terms
  • To estimate the impact of a change in the average persistence of user data on these metrics

Relevant Know-How and Experience:

Knowledge of real-life cases of data use in online advertising where insights were derived on the impact/value of data. Cases in other verticals (retail, healthcare...) may be relevant given a high degree of similarity in other dimensions.

Please do not submit a proposal if you don't have relevant data to share (i.e., a story through which interesting data points related to the research question can be learned). We aren't looking for amazing data scientists with a great idea as to how they would go about solving this (not yet :) ). We are looking for interesting information on relevant businesses (even anonymized). So ask yourself:

  1. Do you have a good case study with quantitative information?
  2. Can you share information about how companies in the advertising space have calculated the value of data?

Please note this will be a paid exploratory call for 1-2 hours. We are open to engaging with multiple experts if you do have relevant insight as per questions and information above.  There will be no interviews for this project. This could lead to further paid engagements depending on the call.

Media and Advertising
Customer Lifetime Value
Web Analytics

$100/hr - $300/hr

Starts Nov 13, 2017

9 Proposals Status: CLOSED

Net 30

Client: T*******

Posted: Nov 03, 2017

Quantitative Back Testing Developer

Founded in 2005, we are a small company in Santa Cruz, California that harnesses the collective wisdom of online investors to gain an edge in the stock market. Our Research Department works with one of the most extensive investor sentiment databases in the world.  As part of these activities, we use back-testing systems to evaluate investment strategies built from this proprietary data.

 

Skills required:

  • Computer Science or Software Engineering education or equivalent industry experience
  • 5+ years developing in Python, preferably in a quantitative equity trading environment.
  • 3+ years’ experience in SQL and relational databases.
  • Writes highly readable, maintainable, and testable code.
  • Knowledge of equity financial markets.

 

Additionally, we highly value the following:

  • Experience scaling systems and addressing the performance issues that come with scale.
  • Experience creating back-testing systems, both loop-based and event-based, which handle share quantities, deal with corporate actions, and produce daily reports with multiple statistics (e.g., descriptive stats and performance metrics).
  • Experience in moving from simulation to a production environment with multiple trading platforms.
  • Knowledge of APIs for Interactive Brokers, Wolverine Execution Services, and/or other brokers.
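For context, the "loop-based" style mentioned above can be reduced to a few lines. This is only a toy sketch (single symbol, whole shares, no costs, no corporate actions, trading at the close), far from the production systems described:

```python
def backtest(prices, signals, initial_cash=100_000.0):
    """Minimal loop-based daily backtest: go fully long on a +1 signal,
    flat on a -1 signal, marking equity to market each day."""
    cash, shares, equity_curve = initial_cash, 0, []
    for price, sig in zip(prices, signals):
        if sig > 0 and shares == 0:          # enter: buy whole shares
            shares = int(cash // price)
            cash -= shares * price
        elif sig < 0 and shares > 0:         # exit: liquidate position
            cash += shares * price
            shares = 0
        equity_curve.append(cash + shares * price)
    return equity_curve
```

An event-based system would instead dispatch market-data and order events through a queue, which scales better to multiple symbols and live trading.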

 

As a small team, we value accuracy and quality. You will report to the Head of Research and operate with a high degree of independence. You value tackling difficult problems and like to work on projects that make you proud. If you are at the intersection of Finance and Engineering, are smart, have a strong work ethic, are enthusiastic, driven and know how to get things done, we want to talk to you.

Back testing
Unmet Need Analysis
R&D

$50/hr - $75/hr

Starts Oct 25, 2017

8 Proposals Status: CLOSED

Client: i******

Posted: Oct 26, 2017

NLP Machine Learning Expert to Build Predictive Model on Unstructured Data

We require a highly experienced data scientist specializing in analyzing unstructured data and building predictive, self-learning algorithms.

Problem: Analyze unstructured documents (blogs, PDFs, etc.) to predict more than 300 pre-identified categories/key phrases/tags/topics, and then build a self-learning algorithm that learns from every human intervention/feedback. Sample data for building the model will be about 5,000 documents shared by the end client.
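As a toy illustration of the learn-from-feedback loop (a real deliverable would use proper NLP models, e.g. TF-IDF features with an online linear classifier; the class and its scoring rule here are assumptions):

```python
from collections import defaultdict

class FeedbackTagger:
    """Score each candidate tag by accumulated word evidence, and
    strengthen or weaken word->tag weights whenever a human confirms
    or rejects a predicted tag -- learning from every intervention."""
    def __init__(self, threshold=1.0):
        self.weights = defaultdict(float)   # (word, tag) -> weight
        self.threshold = threshold

    def predict(self, text, tags):
        words = text.lower().split()
        return [t for t in tags
                if sum(self.weights[(w, t)] for w in words) >= self.threshold]

    def feedback(self, text, tag, correct, lr=0.5):
        """Apply one human judgment: reinforce if correct, penalize if not."""
        delta = lr if correct else -lr
        for w in text.lower().split():
            self.weights[(w, tag)] += delta
```
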

Looking for a data scientist who can build the model, lead the entire project execution, and be the single point of contact with the client. Alternatively, we are open to a team that can provide 2-3 resources, i.e., a project manager and data scientists.

Candidates should have strong expertise in text analytics, NLP-based machine learning, and cognitive/AI applications leveraging open-source technologies like R, Python, etc.

The final deliverable will be a self-learning algorithm built in Python.

NLP
Text Analytics
cognitive analytics

$2,500

Starts Nov 06, 2017

19 Proposals Status: COMPLETED

Net 30

Client: G****** ********* ********** *******

Posted: Oct 26, 2017

Predict Rate of Recruitment for a Clinical Trial

Knowledgent is a precision-focused data intelligence firm with consistent, field-proven results across industries. Rather than follow the latest industry hype, we rise above the noise to craft innovative and reliable data and analytics solutions that help organizations use information as a strategic asset.

Problem:

We wish to develop a model capable of predicting the rate of recruitment (number of patients per site per month) for a clinical trial on a country-by-country basis and with a known confidence interval. Data include free-text elements and structured data, although it is expected that because of the limited time available, structured data will play a larger role in the analysis.

Expertise Required: 

We will need someone with data science expertise, preferably with some experience in the clinical trial space. Expertise in NLP may be beneficial, as much of the data available for the project that differentiates one clinical trial from another is in the form of free text inclusion/exclusion criteria.

Data sources:

Internal trial monitoring data from one company, public clinical trial data (e.g., clinicaltrials.gov), incidence/prevalence data by country (public and 3rd party licensed data). Data is available in Hive tables in an AWS environment, as well as in the original formats if needed/desired (varies by data source, but includes XML and xls files). It is known that this is not a comprehensive list of data that would influence the rate of recruitment. While any publicly available data may be incorporated to improve predictions, it is expected that a basic model can be produced using only the data provided.

We cannot provide a sample of the licensed data, but much of the relevant data is publicly available. Data from clinicaltrials.gov can be most easily accessed from http://aact.ctti-clinicaltrials.org/

Technology stack:

Data is available in Hive on AWS. A Python server runs on an AWS EC2 instance with Jupyter notebook. Note that any solution must be provided in Python.

Deliverable:

An algorithm capable of predicting the rate of recruitment (number of patients per site per month) for a clinical trial within a fixed ±0.05 range (typical recruitment rates can be expected to be in the 0.1 to 1.0 range) with 80% of predictions within the target range. Predictions must be made for each country for which incidence/prevalence data is provided (~7), as well as for the trial as a whole, although the success criteria stated above only apply to the trial-level prediction.
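The acceptance criterion above can be made concrete in a few lines of Python (a sketch of the evaluation only, not of the model itself):

```python
def within_target_share(predicted, actual, tol=0.05):
    """Fraction of trial-level recruitment-rate predictions
    (patients per site per month) falling within +/-tol of the actual
    rate; per the deliverable, success requires a share >= 0.80."""
    hits = sum(abs(p - a) <= tol for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

Note that with typical rates in the 0.1-1.0 range, a fixed ±0.05 band is a much tighter relative target at the low end than at the high end.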

Location Preference:

We have some preference for people located close to our Warren, NJ office, although this is not a strict requirement.

Healthcare
Pharmaceutical and Life Sciences
Biology, Health and Medicine

$75/hr - $150/hr

Starts Oct 23, 2017

10 Proposals Status: CLOSED

Client: K***********

Posted: Oct 17, 2017
