This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.
In the previous post we set up our scraper. Now we’re going to start writing it.
It can be really helpful to start out writing your scraper in an interactive shell. In the shell you’ll get quick feedback as you explore the page you’re trying to scrape, instead of having to run your scraper file to see what your code does.
The interactive shell for Ruby is called irb. Start an irb session on the command line with:
> bundle exec irb
The bundle exec command executes your irb command in the context of your project’s Gemfile. This means that your specified gems will be available.
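If you followed the previous post, your Gemfile is what declares those gems for Bundler. A minimal sketch of what it might contain (your actual file may differ):

```ruby
# Gemfile: declares the gems that bundle exec makes available
source 'https://rubygems.org'

gem 'mechanize'
```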
The first command you need to run in irb is:
>> require 'mechanize'
This loads in the Mechanize library. Mechanize is a helpful library for requesting and interacting with webpages.
Now you can create an instance of Mechanize that will be your agent to do things like ‘get’ pages and ‘click’ on links:
>> agent = Mechanize.new
You want to get information for all the members you can. Looking at your target page it turns out the members are spread across several pages. You’ll have to scrape all 3 pages to get all the members. Rather than worry about this now, let’s start small. Start by just collecting the information you want for the first member on the first page. Reducing the complexity as you start to write your code will make it easier to debug as you go along.
In your irb session, use the Mechanize get method to get the first page with members listed on it.
>> url = "https://morph.io/documentation/examples/australian_members_of_parliament"
>> page = agent.get(url)
This returns the source of your page as a Mechanize Page object. You’ll be pulling the information you want out of this object using the handy Nokogiri XML searching methods that Mechanize loads in for you.
Let’s review some of these methods.
at()
The at() method returns the first element that matches the selector provided. For example, page.at('ul') returns the first <ul> element in the page as a Nokogiri XML Element that you can parse. There are a number of ways to target elements using the at() method. We’re using a CSS-style selector in this example because many people are familiar with this style from writing CSS or jQuery. You can also target elements by class, e.g. page.at('.search-filter-results'); or by id, e.g. page.at('#content').
search()
The search() method works like the at() method, but returns an Array of every element that matches the target instead of just the first. Running page.search('li') returns an Array of every <li> element in page.
You can chain these methods together to find specific elements. page.at('.search-filter-results').at('li').search('p') will return an Array of all <p> elements found within the first <li> element found within the first element with the class search-filter-results on the page.
You can use the at() and search() methods to get the first member list item on the page:
>> page.at('.search-filter-results').at('li')
This returns a big blob of code that can be hard to read. You can use the inner_text() method to help work out if you’ve got the element you’re looking for: the first member in the list.
>> page.at('.search-filter-results').at('li').inner_text
=> "\n\nThe Hon Ian Macfarlane MP\n\n\n\n\n\nMember for\nGroom,Queensland\nParty\nLiberal Party of Australia\nConnect\n\nEmail\n\n\n"
Victory!
Now that you’ve found your first member, you want to collect their title, electorate, party, and the url for their individual page. Let’s start with the title.
If you view the page source in your browser and look at the first member list item, you can see that the title of the member, “The Hon Ian Macfarlane MP”, is the text inside the link in the <p> with the class ‘title’.
<li>
<p class='title'>
<a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
The Hon Ian Macfarlane MP
</a>
</p>
<p class='thumbnail'>
<a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
<img alt="Photo of The Hon Ian Macfarlane MP" src="http://parlinfo.aph.gov.au/parlInfo/download/handbook/allmps/WN6/upload_ref_binary/WN6.JPG" width="80" />
</a>
</p>
<dl>
<dt>Member for</dt>
<dd>Groom, Queensland</dd>
<dt>Party</dt>
<dd>Liberal Party of Australia</dd>
<dt>Connect</dt>
<dd>
<a class="social mail" href="mailto:Ian.Macfarlane.MP@aph.gov.au"
target="_blank">Email</a>
</dd>
</dl>
</li>
You can use the .inner_text method here.
>> page.at('.search-filter-results').at('li').at('.title').inner_text
=> "\nThe Hon Ian Macfarlane MP\n"
There it is: the title of the first member. It’s got messy \n whitespace characters around it though. Never fear, you can clean it up with the Ruby method strip.
>> page.at('.search-filter-results').at('li').at('.title').inner_text.strip
=> "The Hon Ian Macfarlane MP"
You’ve successfully scraped the first bit of information you want.
Now that you’ve got some code for your scraper, let’s add it to your scraper.rb file and make your first commit.
You’ll want to come back to your irb session, so leave it running and open your scraper.rb file in your code editor. Replace the commented-out template code with the working code from your irb session.
Your scraper.rb should look like this:
require 'mechanize'
agent = Mechanize.new
url = 'https://morph.io/documentation/examples/australian_members_of_parliament'
page = agent.get(url)
page.at('.search-filter-results').at('li').at('.title').inner_text.strip
You actually want to collect members with this scraper, so create a member object and assign the text you’ve collected as its title:
require 'mechanize'
agent = Mechanize.new
url = 'https://morph.io/documentation/examples/australian_members_of_parliament'
page = agent.get(url)
member = {
  title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip
}
Add a final line to the file to help confirm that everything is working as expected.
p member
You can now, back on the command line in the folder for your project, run this file with Ruby:
> bundle exec ruby scraper.rb
The scraper runs and the p command prints your member:
> bundle exec ruby scraper.rb
{:title=>"The Hon Ian Macfarlane MP"}
This is a good time to make your first git commit for this project. Yay!
In our next post we’ll work out how to scrape more bits of information from the page.
Who comments in PlanningAlerts and how could it work better?
In our last two quarterly planning posts (see Q3 2015 and Q4 2015), we’ve talked about helping people write to their elected local councillors about planning applications through PlanningAlerts. As Matthew wrote in June, “The aim is to strengthen the connection between citizens and local councillors around one of the most important things that local government does which is planning”. We’re also trying to improve the whole commenting flow in PlanningAlerts.
I’ve been working on this new system for a while now, prototyping and iterating on the new comment options and folding improvements back into the general comment form so everybody benefits.
About a month ago I ran a survey with people who had made a comment on PlanningAlerts in the last few months. The survey went out to just over 500 people and we had 36 responses, about the same percentage turnout as our PlanningAlerts survey at the beginning of the year (6% of 20,000). As you can see, the vast majority of PlanningAlerts users don’t currently comment.
We’ve never asked users about the commenting process before, so I was initially trying to find out some quite general things:
The responses include some clear patterns and have raised a bunch of questions to follow up on in short structured interviews. I’m also going to have these people use the new form prototype, to weed out usability problems before we launch this new feature to some areas of PlanningAlerts.
Here are some of the observations from the survey responses:
Older people are more likely to comment in PlanningAlerts
We’ve now run two surveys of PlanningAlerts users asking them roughly how old they are. The first survey was sent to all users; this recent one went just to people who had recently commented on a planning application through the site.
Compared to the first survey to all users, responders to the recent commenters survey were relatively older. There were fewer people in their 30s and 40s and more in their 60s and 70s. Older people may be more likely to respond to these surveys generally, but we can still see from the different results that commenters are relatively older.
Knowing this can help us better empathise with the people using PlanningAlerts and make it more usable. For example, there is currently a lot of very small, grey text on the site that is likely not noticeable or comfortable to read for people with diminished eye sight—almost everybody’s eye sight gets at least a little worse with age. Knowing that this could be an issue for lots of PlanningAlerts users makes improving the readability of text a higher priority.
There’s a good understanding that comments go to planning authorities, but not that they go to neighbours signed up to PlanningAlerts
To “Who do you think receives your comments made on PlanningAlerts?” 86% (32) of responders checked “Local council staff”. Only 35% (13) checked “Neighbours who are signed up to PlanningAlerts”. Only one person thought their comments also went to elected councillors.
There seems to be a good understanding amongst these commenters that their comments are sent to the planning authority for the application. But not that they go to other people in the area signed up to PlanningAlerts. They were also very clear that their comments did not go to elected councillors.
In the interviews I want to follow up on this and find out if people are positive or negative about their comments going to other locals. I personally think it’s an important part of PlanningAlerts that people in an area can learn about local development, local history and how to impact the planning process from their neighbours. It seems like an efficient way to share knowledge, a way to strengthen connections between people and to demonstrate how easy it is to comment. If people are negative about this then what are their concerns?
“I have no idea if the comments will be listened to or what impact they will have if any”
There’s a clear pattern in the responses that people don’t think their comments are being listened to by planning authorities. They also don’t know how they could find out if they are. One person noted this as a reason why they don’t make more comments.
Giving people simple access to their elected local representatives, and a way to have a public exchange with them, will hopefully provide a lever to increase their impact.
“I would only comment on applications that really affect me”
There was a strong pattern of people saying they only comment on applications that will affect them or that are interesting to them:
How do people decide if an application is relevant to them? Are there common criteria?
Why don’t you comment on more applications? “It takes too much time”
A number of people mentioned that commenting was a time consuming process, and that this prevented them from commenting on more applications:
What are people’s basic processes for commenting in PlanningAlerts? What are the most time consuming components of this? Can we save people time?
“I have only commented on applications where I have a knowledge of the property or street amenities.”
A few people mentioned that they feel you should have a certain amount of knowledge of an application or area to comment on it, and that they only comment on applications they are knowledgeable about.
How does someone become knowledgeable about an application? What is the most important and useful information about applications?
Comment in private
A small number of people mentioned that they would like to be able to comment without it being made public.
Suggestions & improvements
There were a few suggestions for changes to PlanningAlerts:
Summing up PlanningAlerts
We also had a few comments that are just nice summaries of what is good about PlanningAlerts. It’s great to see that there are people who understand and can articulate what PlanningAlerts does well:
Next steps
If we want to make using PlanningAlerts an intuitive and enjoyable experience we need to understand the humans at the centre of its design. This is a small step to improve our understanding of the type of people who comment in PlanningAlerts, some of their concerns, and some of the barriers to commenting.
We’ve already drawn on the responses to this survey in updating wording and information surrounding the commenting process to make it better fit people’s mental model and address their concerns.
I’m now lining up interviews with a handful of the people who responded to try and answer some of the questions raised above and get to know them more. They’ll also show us how they use PlanningAlerts and test out the new comment form. This will highlight current usability problems and hopefully suggest ways to make commenting easier for everyone.
Design research is still very new to the OpenAustralia Foundation. Like all our work, we’re always open to advice and contributions to help us improve our projects. If you’re experienced in user research and want to make a contribution to our open source projects to transform democracy, please drop us a line or come down to our monthly pub meet. We’d love to hear your ideas.