Healthcare Paid Claim Data Modeling and Analysis (US-based Candidates)

Industry: Healthcare, Insurance

Technical Function: Analytics (Predictive Modeling, Machine Learning)

COMPLETED Apr 03, 2017

Project Description

About the Project:  We would like one or more algorithms built that can use large healthcare paid claim data sets to identify the claims most likely to be part of an accident (motor vehicle, slip and fall, etc.), as well as those most likely NOT to be part of an accident.  Separate algorithms may be needed for the distinct accident types.

About Us:  We perform subrogation-related activities, including the identification of claims that should be paid by another liable party (auto-medical insurance, homeowner’s insurance, etc.). Our current identification methodology includes an ETL process that mines paid healthcare claims based on a number of criteria. Selected claims (or a single claim) are aggregated into a case, which is investigated to determine whether the claim relates to an accident or injury for which a third party is responsible.

About the Existing Process:  Currently, our process leverages a proprietary rule set that identifies both the cases we definitely want to open and the claims we know we want to reject.  We review each claim on its own merit, together with claims for the same patient within a reasonable time period, to determine whether the full episode of care appears to be part of an accident.  We augment this process with human review of claims that are classified as “possible” candidates for selection.
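
As an illustration of the episode-of-care idea, the sketch below groups a patient's claims into one episode whenever consecutive service dates fall within a fixed gap. The field names (`patient_id`, `date_of_service`, `claim_id`) and the 30-day gap are assumptions made for this example only; the posting does not define the "reasonable time period", and the proprietary rule set is not reproduced here.

```python
from datetime import date, timedelta

def group_episodes(claims, max_gap_days=30):
    """Group claims into per-patient episodes.

    A claim joins the previous episode when it belongs to the same patient
    and its service date is within max_gap_days of the previous claim.
    The 30-day default is a placeholder, not a value from the posting.
    """
    gap = timedelta(days=max_gap_days)
    ordered = sorted(claims, key=lambda c: (c["patient_id"], c["date_of_service"]))
    episodes = []
    for claim in ordered:
        previous = episodes[-1][-1] if episodes else None
        same_episode = (
            previous is not None
            and previous["patient_id"] == claim["patient_id"]
            and claim["date_of_service"] - previous["date_of_service"] <= gap
        )
        if same_episode:
            episodes[-1].append(claim)
        else:
            episodes.append([claim])
    return episodes

# Hypothetical claims used only to exercise the function.
episode_example_claims = [
    {"claim_id": "C1", "patient_id": "P1", "date_of_service": date(2016, 1, 5)},
    {"claim_id": "C2", "patient_id": "P1", "date_of_service": date(2016, 1, 20)},
    {"claim_id": "C3", "patient_id": "P1", "date_of_service": date(2016, 6, 1)},
    {"claim_id": "C4", "patient_id": "P2", "date_of_service": date(2016, 1, 6)},
]
episode_ids = [
    [c["claim_id"] for c in episode]
    for episode in group_episodes(episode_example_claims)
]
```

With this grouping, features and outcomes could be evaluated per episode rather than per claim, which may be closer to how cases are reviewed today.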

About the Data:  We have the claims data, our current rule sets, and our outcomes data, which would be included in this project.

Claims Data includes:

  • Health Plan information
  • Healthcare provider information
  • Patient Information
  • Diagnosis Codes
  • Procedure Codes
  • Occurrence Codes

Existing Case Outcomes Data includes:

  • The determination of whether a case should, or should not, have been paid by a different party
  • Settlements, which are agreements by other parties to repay part or all of the payment made by the health plan
  • Recoveries, which are actual payments made against settlements

We are going to provide an Excel header file which shows, in more detail, the data elements which would be made available during the project.

About the Model:  It is expected that the algorithm will leverage features, both from the claims, and potentially beyond the claims, to predict the likelihood of a claim being part of an accident.  Some potentially predictive features may include:

  • Diagnosis Codes (some ICD-9 and ICD-10 codes specify that the cause was an accident; others may be highly correlated to an accident; still others may help weed out claims that should not be included)
  • Procedure Codes 
  • Patient Age/Gender
  • Patient Zip Code (studies suggest that some states have a significantly higher prevalence of accidents than others)
  • Place of Service (Emergency Room, for example)
  • The use of an ambulance or air-ambulance 
  • Claims for multiple covered individuals under the same subscriber, for the same date of service (DOS), at the same provider
  • Other third-party data sets that could be incorporated to further augment the prediction (TBD)
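
A few of the candidate features above could be derived along the following lines. This is a minimal sketch under stated assumptions: every field name (`subscriber_id`, `provider_id`, `icd_version`, and so on) is hypothetical, the external-cause check is a coarse first-letter test (ICD-10-CM codes beginning with V, W, X, or Y; ICD-9-CM E-codes), and the emergency room flag assumes the CMS place-of-service code `23` is what the claims carry. The actual layout would come from the Excel header file.

```python
from collections import defaultdict

def has_external_cause_code(codes, icd_version):
    """Coarse check for an external-cause diagnosis code.

    The ICD version matters: 'E' marks external causes in ICD-9-CM but
    endocrine conditions in ICD-10-CM, and 'V' means different things too.
    """
    for code in codes:
        code = code.strip().upper()
        if not code:
            continue
        if icd_version == 10 and code[0] in "VWXY":
            return True
        if icd_version == 9 and code[0] == "E":
            return True
    return False

def build_features(claims):
    """Return one feature row per claim (hypothetical field names)."""
    # Distinct patients seen per subscriber, provider, and date of service.
    patients_per_visit = defaultdict(set)
    for c in claims:
        key = (c["subscriber_id"], c["provider_id"], c["date_of_service"])
        patients_per_visit[key].add(c["patient_id"])

    rows = []
    for c in claims:
        key = (c["subscriber_id"], c["provider_id"], c["date_of_service"])
        rows.append({
            "claim_id": c["claim_id"],
            "external_cause": has_external_cause_code(
                c["diagnosis_codes"], c["icd_version"]),
            # Assumes CMS place-of-service code 23 (emergency room, hospital).
            "emergency_room": c["place_of_service"] == "23",
            "multi_member_visit": len(patients_per_visit[key]) > 1,
        })
    return rows

# Hypothetical claims used only to exercise the functions.
feature_example_claims = [
    {"claim_id": "C1", "subscriber_id": "S1", "patient_id": "P1",
     "provider_id": "PR1", "date_of_service": "2016-01-05", "icd_version": 10,
     "diagnosis_codes": ["S52.501A", "V43.52XA"], "place_of_service": "23"},
    {"claim_id": "C2", "subscriber_id": "S1", "patient_id": "P2",
     "provider_id": "PR1", "date_of_service": "2016-01-05", "icd_version": 10,
     "diagnosis_codes": ["S06.0X0A"], "place_of_service": "23"},
    {"claim_id": "C3", "subscriber_id": "S2", "patient_id": "P3",
     "provider_id": "PR2", "date_of_service": "2016-02-01", "icd_version": 9,
     "diagnosis_codes": ["250.00"], "place_of_service": "11"},
]
feature_rows = build_features(feature_example_claims)
```

A production feature set would likely replace the first-letter test with explicit code ranges and add the "highly correlated" and exclusionary diagnosis codes the list mentions.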

About the Deliverable:  The expected outcome of this project is an algorithm that can be inserted into our existing ETL process and that provides a single key metric: the likelihood that a claim is part of an accident.  In addition, the data scientist is expected to identify the claims whose decision would change between the actual historical decision and the modeled decision, so that we can estimate the profitability lift and gain confidence in the model’s expected outcomes.
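
One way the historical-versus-modeled comparison might be reported is sketched below. It assumes each record carries a model score, a flag for whether the claim was historically selected, and any recovered amount; the field names, the 0.5 threshold, and the "recovery at risk" figure are illustrative choices, not requirements stated by the client.

```python
def compare_decisions(records, threshold=0.5):
    """List claims whose decision changes under the model.

    A claim is 'modeled as selected' when its score meets the threshold.
    recovery_at_risk sums historical recoveries on claims the model
    would no longer select.
    """
    newly_selected = []
    newly_rejected = []
    recovery_at_risk = 0.0
    for r in records:
        modeled_selected = r["score"] >= threshold
        if modeled_selected and not r["historical_selected"]:
            newly_selected.append(r["claim_id"])
        elif r["historical_selected"] and not modeled_selected:
            newly_rejected.append(r["claim_id"])
            recovery_at_risk += r.get("recovered_amount", 0.0)
    return {
        "newly_selected": newly_selected,
        "newly_rejected": newly_rejected,
        "recovery_at_risk": recovery_at_risk,
    }

# Hypothetical scored claims used only to exercise the function.
scored_example = [
    {"claim_id": "C1", "score": 0.9, "historical_selected": True,
     "recovered_amount": 1200.0},
    {"claim_id": "C2", "score": 0.7, "historical_selected": False},
    {"claim_id": "C3", "score": 0.2, "historical_selected": True,
     "recovered_amount": 300.0},
    {"claim_id": "C4", "score": 0.1, "historical_selected": False},
]
decision_report = compare_decisions(scored_example)
```

Newly selected claims have no historical outcome, so any estimate of their value would need a separate assumption, for example the average recovery on similar historical cases.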

Other Notes:  

  • Our technology team would provide access to a secure server location either in AWS or Azure where the data must remain.  We will assist in installation of any and all software needed to perform the model development.
  • The general expectation is that the model will be called during the claim ETL process, but the proposal should describe the data scientist’s recommended approach to operationalizing the algorithm(s).
  • Only U.S.-based Data Scientists will be considered.

Project Overview

  • Posted
    September 27, 2016
  • Planned Start
    November 10, 2016
  • Preferred Location
    United States
  • Payment Due
    Net 30
