PDF Data Annotation for Researchers

I took over the project and was responsible for building features and scaling it from 0 to 1,000 users.

Software Engineer
ML Research
Front End Development
Back End Development

The Problem

Researchers were trying to fine-tune models for better performance on PDF attribution. Tools like Labelbox existed, but nothing was quite good enough for their specific use case. So we built our own PDF data collection tool, and the paper is released here.

The Solution

We built a tool that took JSON files from researchers, allowing them to customize their studies, and then output JSON files with the results from all of the Upwork users. We also collected user behavior data, such as clicks and scrolls. Ultimately this gave us good coarse data that we could filter and then use to train a model.
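
To make the JSON-in, JSON-out flow concrete, here is a minimal sketch in Python. It is not the tool's actual schema or code: the field names (study_id, pdf_url, questions), the event kinds, and the helpers load_study, filter_results, and export_results are all hypothetical, chosen only to illustrate how a study config, per-user results, and behavior events could fit together.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BehaviorEvent:
    kind: str          # hypothetical event name, e.g. "click" or "scroll"
    timestamp_ms: int  # milliseconds since the task was opened
    page: int          # PDF page the event happened on


@dataclass
class AnnotationResult:
    study_id: str
    worker_id: str
    answers: dict
    events: list = field(default_factory=list)


def load_study(config_text: str) -> dict:
    """Parse a researcher-supplied study config and check the assumed fields."""
    config = json.loads(config_text)
    missing = {"study_id", "pdf_url", "questions"} - config.keys()
    if missing:
        raise ValueError(f"study config is missing fields: {sorted(missing)}")
    return config


def filter_results(results: list, min_events: int = 1) -> list:
    """Coarse filter: drop submissions with fewer than min_events interactions."""
    return [r for r in results if len(r.events) >= min_events]


def export_results(results: list) -> str:
    """Serialize results (including nested behavior events) to JSON."""
    return json.dumps([asdict(r) for r in results], indent=2)
```

A researcher's config would pass through load_study, each annotator's submission would become an AnnotationResult, and filter_results stands in for the kind of coarse filtering on behavior data described above. The threshold of one event is an arbitrary placeholder, not a value from the project.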