Data Aggregation
The main challenge ahead of us was the volume of data we needed to aggregate from various sources. The goal of the aggregation was to spare scouts the work of navigating to each source to find data, and instead give them a dedicated search system where they could find most of the results they needed. This wasn't an easy task at all. We had to address the following scenarios:
- Sites with different markup structure
- Sites with frequently changing markup
- Sites with data toggled using JavaScript
- Sites with different ways to navigate across the site
Our solution
After a lot of research, we developed a system that scraped data from all the major research-oriented sites. The major challenge was making these scrapers fail-safe, which we achieved only over time. We first built a mechanism to store the links that failed during scraping. We then gradually built an interactive tool that would re-scrape all the failed links. Later, we added a way to scrape an individual link as well, which made our scraper user-friendly as well as robust.
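
To make the failed-link idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not our production code: the class name ScrapeRun, its method names, and the injected fetch function are all hypothetical, and the fetch function stands in for whatever site-specific scraping logic is used. The sketch shows the three pieces described above: recording links that fail, re-scraping all failed links, and scraping a single link on demand.

```python
from typing import Callable, Dict, Iterable


class ScrapeRun:
    """Hypothetical helper: tracks scraped pages and the links that failed."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # fetch is an assumed callable: it returns the scraped content for a
        # URL, or raises an exception if the page could not be scraped.
        self.fetch = fetch
        self.results: Dict[str, str] = {}
        self.failed: Dict[str, str] = {}  # url -> description of the error

    def scrape_one(self, url: str) -> bool:
        """Scrape a single link; record it as failed instead of crashing."""
        try:
            self.results[url] = self.fetch(url)
        except Exception as exc:
            self.failed[url] = repr(exc)
            return False
        # A success clears any earlier failure recorded for this link.
        self.failed.pop(url, None)
        return True

    def scrape_all(self, urls: Iterable[str]) -> None:
        """Scrape every link; one bad link does not stop the run."""
        for url in urls:
            self.scrape_one(url)

    def retry_failed(self) -> int:
        """Re-scrape all stored failed links; return how many recovered."""
        recovered = 0
        # Iterate over a copy because scrape_one mutates self.failed.
        for url in list(self.failed):
            if self.scrape_one(url):
                recovered += 1
        return recovered
```

In this sketch, a full run would call scrape_all, an operator would later call retry_failed once the underlying problem is fixed, and scrape_one covers the case of re-scraping an individual link. Persisting the failed links to disk or a database is left out to keep the example short.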