Type of session : TEACH
Title : Workshop : Introduction to OpenRefine for data wrangling
Name of session facilitator(s) : Geraldine / Nora ?
Approximate duration : 1h
Skill level : beginner (no coding)
OpenRefine (OR) is a free, open-source tool for cleaning and exploring tabular datasets.
An OpenRefine project consists of a table with rows of data.
OR makes it possible to locate specific values in a very large file and to modify them if need be. This is useful for correcting mistakes, spotting empty cells, merging data…
In particular, its ‘clustering’ function is valuable for normalizing data: it groups values that are likely variants of the same entry so that they can be merged into a single form.
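To give an idea of what clustering does behind the scenes, here is a minimal Python sketch of a "fingerprint" key-collision approach, similar in spirit to one of the methods OR offers. It is an illustration only, not OR's actual code: the function names and the sample city values are made up for this example, and workshop participants will not need to write any code.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key so that variants collide."""
    text = value.strip().lower()
    # Decompose accented characters and drop the accent marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Split into words, drop duplicates, sort, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample column with inconsistent spellings.
cities = ["Brighton", "brighton ", "BRIGHTON.", "London",
          "Hove, Brighton", "Brighton Hove"]
clusters = cluster(cities)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that could be merged into one value. In OR the user reviews the suggested groups and chooses which ones to merge.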
Users can also choose which rows to display by using facets, which define filtering criteria. Facets can be textual, numeric…
For example, if you have a file with customer information, you can keep only the customers who live in Brighton, whose company has over 50 employees and whose boss is female. Or those whose turnover is over a certain amount and who haven’t ordered anything in the last two years. If you’re doing research on a file with information on works by various authors, you can keep only those whose title contains the word ‘love’ and which were published between 1858 and 1954 in Germany.
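For readers who like to see the logic spelled out, the last example can be sketched in Python as three filters applied one after another. This is only an illustration of the idea of combining facets, not how OR is implemented: the rows, column names and helper functions are hypothetical, and the year range is assumed to include both 1858 and 1954.

```python
import re

# Hypothetical sample rows; a real file would be loaded into OR instead.
works = [
    {"title": "A Love Story", "year": 1900, "country": "Germany"},
    {"title": "Love in Winter", "year": 1850, "country": "Germany"},
    {"title": "The Harvest", "year": 1900, "country": "Germany"},
    {"title": "Lost Love", "year": 1920, "country": "France"},
    {"title": "Gloves of Silk", "year": 1930, "country": "Germany"},
    {"title": "Letters of Love", "year": 1954, "country": "Germany"},
]


def word_filter(rows, column, word):
    """Keep rows whose column contains `word` as a whole word, any case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [row for row in rows if pattern.search(row[column])]


def range_filter(rows, column, low, high):
    """Keep rows whose column value lies between low and high, inclusive."""
    return [row for row in rows if low <= row[column] <= high]


def value_filter(rows, column, value):
    """Keep rows whose column equals the given value exactly."""
    return [row for row in rows if row[column] == value]


# Combine the three criteria, as one would by stacking facets.
selected = word_filter(works, "title", "love")
selected = range_filter(selected, "year", 1858, 1954)
selected = value_filter(selected, "country", "Germany")

for row in selected:
    print(row["title"], row["year"], row["country"])
```

Note that "Gloves of Silk" is excluded because the filter looks for ‘love’ as a whole word rather than as part of a longer word.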
The workshop could consist of a brief presentation of the software’s capabilities followed by a hands-on exercise using its main functions.
Prerequisite : A laptop with OR installed and the sample file downloaded.