Industry: Chemical, Oil and Gas, Energy and Utility
Specialization or Business Function: Strategic Business Planning (Competitive Intelligence)
Technical Function: Analytics (Natural Language Processing, Text Analytics)
Technology & Tools: Programming Languages and Frameworks (R, Python)
We have two distinct projects with aggressive deadlines. Together, these projects will serve as a proof of concept for a research platform that we would like to build. We intend to hire two data scientists, one for each project.
Project 1: Academic Network Formation Detection
Objectives
Project Description
The finished project will include scripts/software to scrape/pull the dataset from one or more sources (e.g. ResearchGate) and
You will also deliver the actual database/files produced by the above scripts.
We are open to recommendations on techniques, approaches, and strategy based on your expertise and experience.
We have no particular requirements for tools, programming language, or methodology. We will ultimately need to refresh our dataset at least monthly; however, our immediate need is historical data for a proof of concept.
We expect the output to eventually be imported into a relational database, and merged/deduped with people, institutions, and relationships from other data sources. While we currently don’t have a full working taxonomy, we also expect to combine the categorization scheme extracted from this data source with other data sources (e.g. company profiles/descriptions, patents, press releases).
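As one possible illustration of the merge/dedupe step described above, the following minimal Python sketch collapses person records from two sources onto a normalized (name, institution) key. The record fields (`name`, `institution`, `source`) and the helper names `normalize` and `dedupe_people` are hypothetical, chosen only for illustration; exact-key matching is an assumption, and a real pipeline would likely need fuzzy matching or stronger identifiers.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def dedupe_people(records):
    """Group records that share a normalized (name, institution) key.

    Each record is a dict with hypothetical keys: name, institution, source.
    Returns one merged dict per key, listing every contributing source.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["institution"]))
        groups[key].append(rec)

    merged = []
    for recs in groups.values():
        merged.append({
            # Keep the spelling from the first record seen for this key.
            "name": recs[0]["name"],
            "institution": recs[0]["institution"],
            "sources": sorted({r["source"] for r in recs}),
        })
    return merged


# Made-up sample records, for illustration only.
records = [
    {"name": "José García", "institution": "ETH Zürich", "source": "publications"},
    {"name": "Jose Garcia", "institution": "ETH Zurich", "source": "patents"},
    {"name": "Ann Lee", "institution": "Stanford University", "source": "publications"},
]
people = dedupe_people(records)
```

Here the two spellings of the same person collapse into one entry that records both sources, which is the shape a relational import would need before assigning a shared person ID.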
Boundaries
To narrow the universe of publications and people for this dataset, we are only interested in publications, people, and institutions in any STEM discipline, within the date range of 2000-present.
Project 2: Entity extraction and classification from patent filing databases
Objectives
Project Description
The finished project will include:
We are open to recommendations on techniques, approaches, and strategy based on your expertise and experience.
We have no particular requirements for tools, programming language, or methodology.
We expect the output to eventually be imported into a relational database, and merged/deduped with people, institutions, and relationships from other data sources. While we currently don’t have a full working taxonomy, we also expect to combine the categorization scheme extracted from this data source with other data sources (e.g. company profiles/descriptions, patents, press releases).
Boundaries
To narrow the universe of patents for this dataset, we are only interested in patents, people, and institutions in any STEM discipline, within the date range of 2000-present.
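The boundary above could be applied as a simple filter over extracted records. The sketch below is only an illustration under stated assumptions: the field names `filing_date` and `discipline`, the function `in_scope`, and the contents of `STEM_DISCIPLINES` are all hypothetical placeholders, since no working taxonomy exists yet.

```python
from datetime import date

# Illustrative placeholder only; not a real or complete STEM taxonomy.
STEM_DISCIPLINES = {"chemistry", "physics", "materials science", "computer science"}
START_DATE = date(2000, 1, 1)  # lower bound of the "2000-present" range


def in_scope(record, today=None):
    """Return True if a record falls inside the stated boundaries."""
    today = today or date.today()
    filed = date.fromisoformat(record["filing_date"])  # expects YYYY-MM-DD
    is_stem = record["discipline"].strip().lower() in STEM_DISCIPLINES
    return is_stem and START_DATE <= filed <= today


# Made-up sample records, for illustration only.
patents = [
    {"id": "A", "filing_date": "1998-05-01", "discipline": "Chemistry"},
    {"id": "B", "filing_date": "2012-03-15", "discipline": "Chemistry"},
    {"id": "C", "filing_date": "2015-07-09", "discipline": "Art History"},
]
kept = [p["id"] for p in patents if in_scope(p)]
```

Record A is dropped for its date and record C for its discipline, leaving only B. Keeping the filter as a separate step would let the same check run on each monthly refresh.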
Proposal
Please specify which of the two projects you are most interested in and how much time you can dedicate each week. An estimate of how long it would take to create a proof of concept would be very helpful. We would also like to understand the specific methodology you would use to tackle the challenges described above.