This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.
In the last post we dealt with the site’s pagination and started scraping a complete dataset. In this final post we work out how to save our data and publish our scraper to morph.io.
Scrapers on morph.io use the handy ScraperWiki library to save data to an SQLite database. This is how all data in morph.io is stored. Each scraper page provides options to download the SQLite database, a CSV file of each table, or access the data via an API.
You might remember seeing the ScraperWiki library listed as a dependency in your Gemfile earlier:
ruby "2.0.0"
gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
gem "mechanize"
To use this library in your scraper, you need to declare that it is required at the top of your scraper.rb in the same way you have for the Mechanize library:
require 'mechanize'
require 'scraperwiki'
You can save data using the ScraperWiki.save_sqlite() method. This method takes care of the messy business of creating a database and handling duplicates for you. There are two arguments you need to pass it: an array of the record’s unique keys, so it knows when to overwrite or update a record rather than create a new one, and the data that you want to save.
A member’s full name is unique to them so you can use that as your unique key (we’ve called the field “title”). The data you want to save is your member object. After your p member statement is a good place to save your data.
p member
ScraperWiki.save_sqlite([:title], member)
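To see why the unique keys matter, here’s a toy, in-memory sketch of that “insert or update” behaviour. This is illustrative only, not the real ScraperWiki implementation — the gem does this with an SQLite database rather than a Ruby Hash:

```ruby
# Illustrative sketch of save-with-unique-keys semantics -- NOT the real
# ScraperWiki code. Records sharing the same unique-key values overwrite
# each other instead of creating duplicate rows.
store = {}

def save_record(store, unique_keys, record)
  key = unique_keys.map { |k| record[k] }  # e.g. ["Hon Tony Abbott MP"]
  store[key] = record                      # insert new, or update existing
end

save_record(store, [:title], { title: "Hon Tony Abbott MP", party: "Liberal" })
save_record(store, [:title], { title: "Hon Tony Abbott MP", party: "Liberal Party" })

puts store.size  # prints 1 -- the second save updated the first record
```

Because the member’s title matched, the second save updated the existing record instead of adding a duplicate — which is exactly what you want when your scraper runs over the same pages again.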
Your scraper.rb should now look like this:
require 'mechanize'
require 'scraperwiki'

agent = Mechanize.new
url = 'https://morph.io/documentation/examples/australian_members_of_parliament'

["1", "2", "3"].each do |page_number|
  page = agent.get(url + "?page=" + page_number)
  page.at('.search-filter-results').search('li').each do |li|
    member = {
      title: li.at('.title').inner_text.strip,
      electorate: li.search('dd')[0].inner_text,
      party: li.search('dd')[1].inner_text,
      url: li.at('.title a').attr('href')
    }
    p member
    ScraperWiki.save_sqlite([:title], member)
  end
end
Save and run your file. The command line output should be unchanged, but if you view the files in your project directory you’ll see a new file: data.sqlite.
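If you have the sqlite3 command-line tool installed (an assumption — it isn’t needed for the rest of this tutorial), you can peek at the saved rows directly. ScraperWiki stores records in a table named data. The first two lines below just simulate a scraped row in a demo database so the example is self-contained; against your real data.sqlite you’d only run the SELECT:

```shell
# Simulate one scraped row so this example runs standalone
# (with your real data.sqlite, skip straight to the SELECT).
sqlite3 demo.sqlite "CREATE TABLE IF NOT EXISTS data (title TEXT, electorate TEXT, party TEXT);"
sqlite3 demo.sqlite "INSERT INTO data VALUES ('Hon Tony Abbott MP', 'Warringah', 'Liberal Party');"

# Peek at the saved rows -- ScraperWiki stores them in the "data" table.
sqlite3 demo.sqlite "SELECT title, party FROM data LIMIT 3"
```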
Great job. You’ve now written a scraper to collect data and save it to a database. It’s time to put your new scraper code on morph.io so you can show the world how cool you are—and so it can take care of running the thing, storing your data, and providing you easy access to it.
Running your scraper on morph.io
morph.io runs scraper code that is stored in public GitHub repositories. To run your scraper on morph.io, you’ll first have to push it back up to the GitHub repository you originally cloned it from.
Start off with another git commit to save any outstanding changes, for example:

> git add scraper.rb
> git commit -m "Save member data with ScraperWiki"
Push your changes up to your remote GitHub repository with:
> git push origin master
Now go view your scraper’s page on GitHub (the url will be something like github.com/yourusername/the_name_of_this_scraper). Navigate to your scraper.rb file on GitHub to confirm it has all your local changes.
You can now go over to your scraper’s page on morph.io and click the “Run scraper” button near the top of the page. The moment of truth is upon us.
As your scraper runs you’ll see your console output printing the data for the members you’re scraping. A few seconds later, underneath the heading “Data”, you’ll find a table showing a representative ten rows of data, and buttons to download your data in a range of formats.
Take a moment to explore the download options and check that the data looks as you expected.
That’s all folks
Well done my friend, you’ve just written a web scraper.
With just a few lines of code you’ve collected information from a website and saved it in a structured format you can play with. You’ve published your work for all to see on morph.io and set it to run, store and provide access to your data.
If you want to get really fancy you can set your scraper to run automatically each day on your scraper’s settings page, so it stays up to date with any changes to the members list.
Before you go mad with power, go and explore some of the scrapers on morph.io. Try searching for topics you find interesting and domains you know. Get ideas for what to scrape next and learn from other people’s scraper code.
Remember to post questions to the help forums if you get blocked by tricky problems.
If you have any feedback on this tutorial we’d love to hear it.
Now go forth with your new powers and scrape all the things!

Who comments in PlanningAlerts and how could it work better?
In our last two quarterly planning posts (see Q3 2015 and Q4 2015), we’ve talked about helping people write to their elected local councillors about planning applications through PlanningAlerts. As Matthew wrote in June, “The aim is to strengthen the connection between citizens and local councillors around one of the most important things that local government does which is planning”. We’re also trying to improve the whole commenting flow in PlanningAlerts.
I’ve been working on this new system for a while now, prototyping and iterating on the new comment options and folding improvements back into the general comment form so everybody benefits.
About a month ago I ran a survey with people who had made a comment on PlanningAlerts in the last few months. The survey went out to just over 500 people and we had 36 responders–about the same percentage turn-out as our PlanningAlerts survey at the beginning of the year (6% from 20,000). As you can see, the vast majority of PlanningAlerts users don’t currently comment.
We’ve never asked users about the commenting process before, so I was initially trying to find out some quite general things:
The responses include some clear patterns and have raised a bunch of questions to follow up with short structured interviews. I’m also going to have these people use the new form prototype, to weed out usability problems before we launch this new feature to some areas of PlanningAlerts.
Here are some of the observations from the survey responses:
Older people are more likely to comment in PlanningAlerts
We’ve now run two surveys of PlanningAlerts users asking them roughly how old they are. The first survey was sent to all users; this recent one went just to people who had recently commented on a planning application through the site.
Compared to the first survey to all users, responders to the recent commenters survey were relatively older. There were fewer people in their 30s and 40s and more in their 60s and 70s. Older people may be more likely to respond to these surveys generally, but we can still see from the different results that commenters are relatively older.
Knowing this can help us better empathise with the people using PlanningAlerts and make it more usable. For example, there is currently a lot of very small, grey text on the site that is likely not noticeable or comfortable to read for people with diminished eye sight—almost everybody’s eye sight gets at least a little worse with age. Knowing that this could be an issue for lots of PlanningAlerts users makes improving the readability of text a higher priority.
There’s a good understanding that comments go to planning authorities, but not that they go to neighbours signed up to PlanningAlerts
Asked “Who do you think receives your comments made on PlanningAlerts?”, 86% (32) of responders checked “Local council staff”. Only 35% (13) checked “Neighbours who are signed up to PlanningAlerts”. Only one person thought their comments also went to elected councillors.
There seems to be a good understanding amongst these commenters that their comments are sent to the planning authority for the application. But not that they go to other people in the area signed up to PlanningAlerts. They were also very clear that their comments did not go to elected councillors.
In the interviews I want to follow up on this and find out if people are positive or negative about their comments going to other locals. I personally think it’s an important part of PlanningAlerts that people in an area can learn about local development, local history and how to impact the planning process from their neighbours. It seems like an efficient way to share knowledge, a way to strengthen connections between people and to demonstrate how easy it is to comment. If people are negative about this then what are their concerns?
“I have no idea if the comments will be listened to or what impact they will have if any”
There’s a clear pattern in the responses that people don’t think their comments are being listened to by planning authorities. They also don’t know how they could find out if they are. One person noted this as a reason why they don’t make more comments.
Giving people simple access to their elected local representatives, and a way to have a public exchange with them, will hopefully provide a lever to increase their impact.
“I would only comment on applications that really affect me”
There was a strong pattern of people saying they only comment on applications that will affect them or that are interesting to them:
How do people decide if an application is relevant to them? Is there a common criteria?
Why don’t you comment on more applications? “It takes too much time”
A number of people mentioned that commenting was a time consuming process, and that this prevented them from commenting on more applications:
What are people’s basic processes for commenting in PlanningAlerts? What are the most time consuming components of this? Can we save people time?
“I have only commented on applications where I have a knowledge of the property or street amenities.”
A few people mentioned that they feel you should have a certain amount of knowledge of an application or area to comment on it, and that they only comment on applications they are knowledgeable about.
How does someone become knowledgeable about an application? What is the most important and useful information about applications?
Comment in private
A small number of people mentioned that they would like to be able to comment without it being made public.
Suggestions & improvements
There were a few suggestions for changes to PlanningAlerts:
Summing up PlanningAlerts
We also had a few comments that are just nice summaries of what is good about PlanningAlerts. It’s great to see that there are people who understand and can articulate what PlanningAlerts does well:
Next steps
If we want to make using PlanningAlerts an intuitive and enjoyable experience we need to understand the humans at the centre of its design. This is a small step to improve our understanding of the type of people who comment in PlanningAlerts, some of their concerns, and some of the barriers to commenting.
We’ve already drawn on the responses to this survey in updating wording and information surrounding the commenting process to make it better fit people’s mental model and address their concerns.
I’m now lining up interviews with a handful of the people who responded to try and answer some of the questions raised above and get to know them more. They’ll also show us how they use PlanningAlerts and test out the new comment form. This will highlight current usability problems and hopefully suggest ways to make commenting easier for everyone.
Design research is still very new to the OpenAustralia Foundation. Like all our work, we’re always open to advice and contributions to help us improve our projects. If you’re experienced in user research and want to make a contribution to our open source projects to transform democracy, please drop us a line or come down to our monthly pub meet. We’d love to hear your ideas.