Maybe you have faced this problem at some point: you want to show some data in your application, but there is no web service or any other structured way to get it. You know the information is available on some website, and you ask yourself: how can I get it?
Waves is the most complete Brazilian website for the surf community: it has a huge amount of information about wave conditions at surf spots all around the country, and that is my target. My idea is to scrape this information and redistribute it as a REST web service.
Show me the code
The first thing I did was create a class that represents the report itself.
The class constructor receives the beach URL and the endpoint of the Waves website, which defaults to the
WAVES_URL constant defined earlier. As you can see, I used the awesome Nokogiri gem to parse the HTML.
Now let's create a method that returns a hash with all the information we need.
In this method I used duck typing to simplify the parsing of each condition. In other words, for each parser I call its extract method, which returns the information we need, and I store the result in the hash.
ConditionParser keeps a list of classes, one for each piece of information that should be extracted from the page, and every parser must implement the class method
self.extract(html). So, imagine you want to extract the full name of the beach from the page:
Inside each parser we can query the HTML with a CSS selector to find the content we want. In this case, I search for the
#content h1 selector, take its first occurrence, and extract its inner text.
Finally, extracting data from websites is a pretty simple task when you are using Ruby, but keep in mind that every single HTML change on the page will force you to adapt your parser to the new markup.
You can find this code on my GitHub as a Rails plugin, and you can access the REST web service here.