Industry: Hi-Tech
Specialization Or Business Function: Risk and Compliance (Fraud Identification and Prevention, Anomaly Detection, Credit Risk Analysis), Consumer Experience (Customer Behavior Analysis, Web Analytics)
Technical Function: Data Visualization (Statistical Graphics), Marketing and Web Analytics, Software and Web Development (E-commerce)
Technology & Tools
Background
We are a web development and online marketing e-commerce company with 1.5 million visitors and over 2 million users.
The problem
Although the vast majority of our users are happy with our free and paid services, there are always a few bad apples in the bunch. Like many e-commerce businesses, we have to actively manage the risk of friendly fraud (also known as chargeback fraud). Friendly fraud occurs when an individual makes an online purchase with a credit or debit card and then requests a chargeback from the bank once the goods or services have been consumed. A completed chargeback cancels the original transaction and refunds the individual, and the merchant is held accountable regardless of the measures taken to verify the transaction. This harms both our reputation and our bottom line.
Project requirements
We are looking for a set of analytical algorithms and data models to:
1) identify the traits, signals, and patterns of the customers who are most likely to request a chargeback
2) predict the customers who are most likely to request a chargeback and when
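As an illustration only, one possible way to frame the "who and when" question as a training target is to pick an observation point some days after the first purchase and ask whether a chargeback arrives within a fixed horizon after it. The function name, the parameters `observe_days` and `horizon_days`, and the sample dates below are all assumptions made for this sketch, not part of our data or a required approach.

```python
from datetime import datetime, timedelta

def label_at_cutoff(first_purchase, chargeback, observe_days, horizon_days):
    """Return True/False for 'chargeback within the horizon after the cutoff'.

    Returns None when the chargeback already happened on or before the
    cutoff, because there is nothing left to predict for that customer.
    """
    cutoff = first_purchase + timedelta(days=observe_days)
    if chargeback is not None and chargeback <= cutoff:
        return None
    deadline = cutoff + timedelta(days=horizon_days)
    return chargeback is not None and chargeback <= deadline

# Made-up dates, for illustration only.
first = datetime(2016, 1, 3)
cb = datetime(2016, 2, 25)

label_short = label_at_cutoff(first, cb, observe_days=7, horizon_days=30)
label_long = label_at_cutoff(first, cb, observe_days=7, horizon_days=60)
label_late = label_at_cutoff(first, cb, observe_days=60, horizon_days=30)
label_none = label_at_cutoff(first, None, observe_days=7, horizon_days=30)
```

Varying `observe_days` and `horizon_days` in a sketch like this is one way to explore how early a prediction can be made and how far ahead it looks.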
The analytic algorithms and data modeling will allow us to:
1) feed newly subscribed customer data and purchasing information into the data models and analytical algorithms to produce a prediction result for each new customer
2) show the prediction result as a report (CSV or Excel) on the likelihood of chargeback behavior based on:
a. subscriber information
b. date of the first purchase
c. chargeback timeline: date or days after the first purchase
d. confidence level
e. the identified traits, signals, and patterns for the chargeback behavior
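To make the report format concrete, here is a minimal sketch of how one row per customer could be written to CSV with the Python standard library. The column names and the sample row are placeholders we made up for illustration; the actual schema is open for discussion.

```python
import csv
import io

# Hypothetical column names; the real report schema is still to be agreed.
REPORT_COLUMNS = [
    "user_reference_id",
    "first_purchase_date",
    "predicted_days_to_chargeback",
    "chargeback_probability",
    "signals",
]

def write_report(predictions, out):
    """Write one row per customer to a CSV file-like object."""
    writer = csv.DictWriter(out, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    for p in predictions:
        row = dict(p)
        # Flatten the list of signals into a single cell.
        row["signals"] = "; ".join(p["signals"])
        writer.writerow(row)

# Made-up example row, for illustration only.
example = [{
    "user_reference_id": "U0001",
    "first_purchase_date": "2016-03-01",
    "predicted_days_to_chargeback": 21,
    "chargeback_probability": 0.83,
    "signals": ["no login after purchase", "single purchase"],
}]

buffer = io.StringIO()
write_report(example, buffer)
report_text = buffer.getvalue()
```

Passing an open file instead of `io.StringIO()` would write the same content to disk.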
Upon completion of the project, you will provide:
1) analytical algorithm software module(s) in Python
2) data modeling module(s) in Python
3) a simple software application for us to feed the new data into the algorithms and data models for generating the prediction results described above.
Datasets
1) subscriber information
a. user reference ID
b. registration timestamp
c. last login timestamp for our online service application
2) purchase information of subscriber
a. first purchase timestamp
b. number of purchases
c. last purchase timestamp
d. credit card type (VISA, MC, AMEX, etc.)
e. chargeback timestamp (for those who requested a chargeback)
f. service cancellation timestamp (for those who cancelled the subscription service)
3) we can also provide the geographical/location data of the subscriber, if needed
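As a rough sketch of how the fields above could be turned into model inputs, the snippet below derives a few elapsed-time features from one subscriber record. The dictionary keys, the ISO 8601 timestamp format, and the sample values are assumptions for illustration; the delivered data may be named and formatted differently.

```python
from datetime import datetime

def parse(ts):
    """Parse an ISO 8601 timestamp, or return None when the field is empty."""
    return datetime.fromisoformat(ts) if ts else None

def derive_features(record):
    """Turn raw timestamps into elapsed-day features for one subscriber."""
    registered = parse(record["registration_ts"])
    last_login = parse(record["last_login_ts"])
    first_purchase = parse(record["first_purchase_ts"])
    last_purchase = parse(record["last_purchase_ts"])
    chargeback = parse(record.get("chargeback_ts"))
    return {
        "days_registration_to_first_purchase": (first_purchase - registered).days,
        "days_first_to_last_purchase": (last_purchase - first_purchase).days,
        "days_last_purchase_to_last_login": (last_login - last_purchase).days,
        "num_purchases": record["num_purchases"],
        "card_type": record["card_type"],
        # Target columns: whether and when a chargeback occurred.
        "charged_back": chargeback is not None,
        "days_to_chargeback": (chargeback - first_purchase).days if chargeback else None,
    }

# Made-up record, for illustration only.
sample = {
    "registration_ts": "2016-01-01T10:00:00",
    "last_login_ts": "2016-02-20T09:00:00",
    "first_purchase_ts": "2016-01-03T12:00:00",
    "last_purchase_ts": "2016-02-03T12:00:00",
    "num_purchases": 2,
    "card_type": "VISA",
    "chargeback_ts": "2016-02-25T08:00:00",
}
features = derive_features(sample)
```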
Additional Information
1. The dataset provided is imbalanced: unhappy customers make up around 1% of total users. Incorrectly identifying a happy customer as an unhappy customer will negatively impact our revenue stream. The evaluation metrics for the machine learning models need to take this into account; we cannot focus only on the True Positive Rate, because false positives matter as well.
2. The subscription status of users changes over time, so we need to work out the best point at which we can be confident that a user will soon request a chargeback. However, if we wait too long to gather enough evidence to catch unhappy customers, it might be too late.
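To illustrate the imbalance concern in point 1, the sketch below compares accuracy, precision, recall (True Positive Rate), and false positive rate on a made-up population of 1,000 users with a 1% chargeback rate. The counts are invented for the example and are not taken from our data.

```python
def summarize(y_true, y_pred):
    """Compute basic binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # By convention here, report 0.0 when a ratio is undefined.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Invented population: 10 chargeback users out of 1,000 (1%).
y_true = [1] * 10 + [0] * 990

# A "model" that never flags anyone.
naive = summarize(y_true, [0] * 1000)

# A model that catches 8 of the 10 but also flags 20 happy customers.
y_pred = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970
model = summarize(y_true, y_pred)
```

In this made-up case the model that flags nobody scores 99% accuracy with zero recall, while the second model reaches 80% recall at roughly 29% precision, which is why accuracy or True Positive Rate alone would be misleading here.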
Please describe in your proposal your ideas for overcoming the challenges listed above, or any relevant projects you have done before, so that we can invite you to discuss further with our engineering team.