Day Twenty-Five - Introducing my language data pipeline project
It’s about time for me to introduce the main project that I’ve been working on at the Recurse Center. This is a pipeline for collecting annotated text data for AI training and benchmarking. I’m working on this because it’s a fairly complex real-world process, it involves an area of my expertise (language data), and I’ll get to learn hands-on about orchestrators and other data engineering tools.