
PDF Data Annotation for Researchers

I took over the project and was responsible for building features and scaling it from 0 to 1,000 users.

  • Software Engineer
  • ML Research
  • Front End Development
  • Back End Development

The Problem

Researchers were trying to fine-tune models for better performance on PDF attribution. Tools like Labelbox existed, but nothing was quite good enough for their specific use case. So we built our own PDF data collection tool, and the paper is released here.

The Solution

We built a tool that took JSON files from researchers, allowing them to customize their studies, and then output JSON files with the results from all the users recruited through Upwork. We also collected user behavior data, such as clicks and scrolls. Ultimately this gave us good coarse data that we could filter and use to train a model.
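
To make that flow concrete, here is a rough sketch of the JSON-in, JSON-out idea in Python. It is not the tool's actual code. The field names (study_id, pdf_ids, question), the event labels, and the min_events filter are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BehaviorEvent:
    kind: str          # hypothetical labels, e.g. "click" or "scroll"
    timestamp_ms: int  # milliseconds since the task started


@dataclass
class TaskResult:
    worker_id: str
    pdf_id: str
    answer: str
    events: list = field(default_factory=list)  # list of BehaviorEvent


def load_study(config_text: str) -> dict:
    """Parse a researcher-supplied study config (hypothetical schema)."""
    config = json.loads(config_text)
    for key in ("study_id", "pdf_ids", "question"):
        if key not in config:
            raise ValueError(f"study config is missing '{key}'")
    return config


def filter_low_effort(results: list, min_events: int = 3) -> list:
    """Keep results with at least min_events behavior events.

    This is only one possible coarse filter, chosen for illustration.
    """
    return [r for r in results if len(r.events) >= min_events]


def export_results(study_id: str, results: list) -> str:
    """Serialize all results for one study into a single JSON document."""
    payload = {"study_id": study_id, "results": [asdict(r) for r in results]}
    return json.dumps(payload, indent=2)
```

A researcher's config would go through load_study, each annotator's submission would become a TaskResult, and export_results would write the file handed back to the researcher, optionally after a pass through filter_low_effort.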