Whether you want to save a web page, an attendee list, or Google search results, scraping can handle it.
Web scraping refers to the extraction of data from a webpage. It may sound technical, but you may be surprised by how easy it can be.
If you want to be a bit nerdy and go into technical details on how you can build a web scraper on your own, here is an excellent resource from www.parsehub.com: What is web scraping?
Otherwise, let’s get hands-on and check the available tools that don’t require any prior knowledge.
First of all, have a look at the page to spot how the information is presented. Secondly, think of how you would like to store it: should it be plain text or a spreadsheet?
Scenario 1
You want to save plain text to read later at your convenience. My go-to tool for this purpose is the Evernote clipper.
Evernote is a note-taking app that deserves a separate post to describe all its advantages, and its 225 million users are a good sign of appreciation. With its clipper, you can save a piece of web content in plenty of ways.
At the same time, you can file the saved content under a folder of your choice and add a tag or a comment.
Scenario 2
You would like to save the web content as a table. In this case, Instant Data Scraper will automatically identify the most relevant information on the page, scrape it, and offer to save it as a CSV or XLSX file.
The only shortcoming is that if it doesn’t capture the information the way you want, it doesn’t offer much customization. Still, you can use the ‘Try another table’ button to change which fields it scrapes.
Often this helps, but if it doesn’t, there is nothing more you can do about it. In addition, Instant Data Scraper doesn’t support web scraping on LinkedIn.
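For readers curious about what a table scraper does behind the scenes, here is a minimal sketch in Python. It is not how Instant Data Scraper is implemented; it only illustrates the general idea of reading an HTML table and writing it out as CSV. The sample page, the names in it, and the helper names (TableParser, table_to_csv) are made up for illustration, and only the Python standard library is used.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample page standing in for a real attendee list.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Company</th></tr>
  <tr><td>Ada Lovelace</td><td>Analytical Engines Ltd</td></tr>
  <tr><td>Grace Hopper</td><td>Navy Systems</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

The resulting text can be saved to a .csv file and opened in any spreadsheet app. Real pages are rarely this tidy, which is exactly the work the browser extension takes off your hands.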
Scenario 3
If you need a more complex tool, Data Miner offers a solution for most scraping needs.
Every scraping algorithm in Data Scraper (previously known as Data Miner) is called a recipe.
In a recipe, the row is the area of the page you select for scraping, while the columns arrange the content of that area into a table.
It took me a while to get familiar with the tool, as it isn’t self-explanatory for non-technical users. But once you have a good grasp of it, you won’t want to use anything else.
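If the row-and-column idea still feels abstract, the sketch below may help. It is not the tool's actual recipe format, just an assumed, simplified model in Python: one selector picks the repeating rows on the page, and one selector per column picks a field inside each row. The RECIPE dictionary, the sample page, and apply_recipe are all hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe: one selector for the rows, one selector per column.
RECIPE = {
    "row": ".//div[@class='attendee']",
    "columns": {
        "name": "./h3",
        "role": "./p[@class='role']",
    },
}

# Invented, well-formed sample page. Real pages usually need a more
# forgiving HTML parser than ElementTree.
SAMPLE_PAGE = """
<html><body>
  <div class="attendee"><h3>Ada</h3><p class="role">Engineer</p></div>
  <div class="attendee"><h3>Grace</h3><p class="role">Admiral</p></div>
  <div class="sponsor"><h3>Not an attendee</h3></div>
</body></html>
"""


def apply_recipe(page, recipe):
    """Return one dict per matched row, with one key per recipe column."""
    root = ET.fromstring(page.strip())
    table = []
    for row in root.findall(recipe["row"]):
        record = {}
        for column, path in recipe["columns"].items():
            cell = row.find(path)
            # A missing field becomes an empty string instead of an error.
            record[column] = (cell.text or "").strip() if cell is not None else ""
        table.append(record)
    return table


if __name__ == "__main__":
    for record in apply_recipe(SAMPLE_PAGE, RECIPE):
        print(record)
```

Here the row selector matches the two attendee blocks and skips the sponsor block, and each column selector pulls one field out of every matched row, which is the same mental model you use when picking rows and columns in the extension.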
If you know of better or easier ways of scraping, share your experience in the comment section. I’d love to test-drive any tool.