Sloan Grant – Historic Wine Prices
Data extraction methodologies for historic documents
Further Develop Tools for the Distributed Transcription and Classification of Data from Historic Sources. Grant No. G-2017-10047. Peter Brantley, UC Davis Library, 2019-12-31
With support from the Sloan Foundation, the Library at the University of California, Davis developed methodologies for extracting data from historic wine-related documents. The main aim of the project was to develop methodologies using machine techniques that extract structured information from the documents, going beyond simple text extraction. In addition, the goals of the project served as the basis for two immersive learning datafests hosted at the library.
Key Project Outcomes
- Discovery and digitization of Sherry-Lehmann catalogs not previously included, resulting in a more complete corpus;
- Two multi-day datafests (2018 with 26 participants, 2019 with 10) hosting UC Davis students in computer science, statistics, and other disciplines;
- Hiring a graduate student and completing an initial price data recovery scheme;
- Application of the scheme to the Sherry-Lehmann corpus, resulting in about 250K prices being extracted from the catalogs;
- A presentation of the methodology, results, and concerns at the 2019 American Association of Wine Economists meeting;
- Strategies for moving forward, both in expanding our price extraction routines and in applying learned strategies to new domains.