A little scraping goes a long way

Photo of us hacking on scrapers together.

Photo by Nick Evershed

Last night, about 10 of us got together in Sydney for a fun night of scraping and learning about morph.io. I organised the get-together because I’m just really excited about writing scrapers and using data from morph.io at the moment. I’ve only been writing scrapers for the last few months, as Matthew and I have been working on adding new features and evolving morph.io’s interfaces.

I think writing scrapers is a great candidate for a hack event activity. Scrapers don’t take long to write, and a lot of the code and techniques are common between scrapers, so people can share and teach each other. Once a scraper is written, it just keeps paying you back with useful data because morph.io takes care of all the grunt work. Each new scraper collects unstructured data that was previously published in obscure places and opens it up for new research, reporting and civic tech possibilities. Henare tells me that the PlanningAlerts scraper hackfest in 2011 was one of the OpenAustralia Foundation’s most productive events, adding PlanningAlerts coverage for over 1,823,124 Australians. A little scraping can go a long way.

So what did we do?

Matthew fixed up and reviewed PlanningAlerts scrapers. The fine people of Yarra Ranges Shire, Victoria can now be informed and have their say on changes to their local area.

Henare wanted to open up data from the NSW Environment Protection Authority and remembered he’d written something for ScraperWiki Classic a few years back. His scraper to collect all prosecutions under the NSW Protection of the Environment Operations Act 1997 (POEO Act) is now running on morph.io with over 1000 new records.

Nick published one of his Australian Electoral Commission scrapers, which collects political donation records. It ran for 17 hours overnight and collected 29k records. He says he’s got a few more like this to put up too. Nick took his Guardian colleague Todd Moore through morph.io for the first time. Maybe we’ll see some more scrapers emerging from there: Todd seems to have an interest in collecting news about Cyclist Demons. Pat and Nick also explored Bureau of Meteorology data.

Chris worked on his scraper that tracks documents tabled in NSW Parliament.

Rosie and I started making a scraper to collect Australian government contract award notices. Matthew helpfully showed us how to create and write to files in Ruby to help with reading .xlsx spreadsheets.

Erietta sketched out a big pad full of observations and ideas about how non-programmer journalists approach scraping and the questions they have.

Jack explored the data in morph.io with a focus on records of domestic violence in Australia. He also found that Alex Sadlier’s site disclosurelo.gs, which tracks documents released by Australian Government agencies through Freedom of Information requests, wasn’t updating, prompting Alex to get that working again (thanks Alex!).

A huge thanks to Nick and the Guardian for providing us with a lovely space to hack in.

New useful data was added to the public domain and people discovered why morph.io is so awesome. I’d call that a success. Thanks for coming along, everyone. I had a great time catching up and ranting on about morph.io. See you next time!

