Named entity recognition (NER) is a classic task in text analysis. Companies often want to identify the named entities they care about in social media posts so that they can follow online discussions about their technology field and about their products specifically. A news organization, for example, can apply named entity recognition to see which institutions are being discussed online and to discover whether a new conversation about a newsworthy event is emerging. How can we automatically find these named entities within social media posts?
Solution:
I developed an automated, machine-learning-based natural language processing pipeline that cleanses text data sets, featurizes them, and applies a classification model to determine whether any given sequence of one to five tokens is an entity of interest.
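The candidate-generation step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: it assumes that candidates are contiguous runs of whitespace-separated tokens, and the function names `candidate_spans` and `span_features`, as well as the specific features, are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def candidate_spans(tokens: List[str], max_len: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every contiguous run of 1..max_len tokens.

    Assumption: candidates are contiguous token spans; `end` is exclusive.
    """
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break  # span would run past the end of the post
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


def span_features(tokens: List[str], start: int, end: int) -> Dict[str, object]:
    """Build a small, hypothetical feature dictionary for one candidate span."""
    span = tokens[start:end]
    return {
        "n_tokens": len(span),
        "all_capitalized": all(t[:1].isupper() for t in span),
        "starts_sentence": start == 0,
        # Lowercased token just before the span, or a start marker.
        "prev_token": tokens[start - 1].lower() if start > 0 else "<s>",
    }


tokens = "Reuters says the World Health Organization met today".split()
spans = candidate_spans(tokens)
features = span_features(tokens, 3, 6)
```

Each feature dictionary would then be vectorized and passed to a binary classifier that scores the span as an entity of interest or not. Capping the span length at five tokens keeps the number of candidates linear in the length of the post.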
Methods:
Classical machine learning methods such as SVMs, logistic regression, and random forests, along with a variety of exploratory data analysis (EDA) and dimensionality reduction tasks.
Frameworks and Platforms:
Python, scikit-learn, etc…
Outcomes:
Developed an automated NER pipeline with a precision of over 90% and a recall of over 70%.
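For readers unfamiliar with the two metrics, here is a small sketch of how span-level precision and recall can be computed. It assumes exact-match scoring, where a predicted span counts as correct only if the same (post, start, end) triple appears in the annotated set. The function name `precision_recall` and the toy data are illustrative only and are not the project's evaluation code or results.

```python
from typing import Set, Tuple

Span = Tuple[int, int, int]  # (post_id, start, end), a hypothetical span key


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over sets of spans."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted spans that are real entities.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real entities that the model found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, made up purely to exercise the function.
gold = {(0, 3, 6), (0, 0, 1), (1, 2, 4), (2, 0, 2)}
predicted = {(0, 3, 6), (0, 0, 1), (1, 5, 6)}

precision, recall = precision_recall(predicted, gold)
```

A pipeline with high precision and lower recall, as reported above, rarely flags a span that is not an entity but misses some of the entities that are present.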