Crawling the Web


Use a crawler (HTTrack) to download more than 500 Web pages, giving the URL of a popular Web page of your choice as the seed. (HTTrack is available for download from its project website.)
  1. Construct the adjacency matrix A of the graph of the pages you downloaded (Aij = 1 if there is a link from node i to node j; otherwise Aij = 0).
  2. Give a visual printout of your matrix A, using Matlab.
  3. Discover a "strongly connected" group of Web pages.
  4. Calculate the in-degree of every node in the graph. Sort the Web pages by in-degree and plot your results.
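
The steps above can be sketched on a toy graph as follows. This is only an illustrative sketch, written in Python rather than Matlab, and it assumes the crawl has already been reduced to a list of page URLs and a list of (source, target) link pairs. The function names, the toy URLs, and the choice of Kosaraju's algorithm for finding strongly connected groups are assumptions made for illustration, not part of the assignment.

```python
def adjacency_matrix(pages, links):
    """Return A with A[i][j] = 1 if page i links to page j, else 0."""
    index = {url: i for i, url in enumerate(pages)}
    n = len(pages)
    A = [[0] * n for _ in range(n)]
    for src, dst in links:
        # Ignore links that point outside the downloaded set.
        if src in index and dst in index:
            A[index[src]][index[dst]] = 1
    return A


def render(A):
    """Text picture of A: '#' marks a link, '.' marks no link."""
    return "\n".join("".join("#" if x else "." for x in row) for row in A)


def in_degrees(A):
    """In-degree of node j is the sum of column j."""
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]


def strongly_connected_components(A):
    """Kosaraju's algorithm (iterative); returns lists of node indices."""
    n = len(A)
    visited = [False] * n
    order = []  # nodes in order of DFS completion
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]  # (node, next column to inspect)
        while stack:
            v, j = stack.pop()
            while j < n and not (A[v][j] and not visited[j]):
                j += 1
            if j < n:
                stack.append((v, j + 1))
                visited[j] = True
                stack.append((j, 0))
            else:
                order.append(v)
    # Second pass: follow links backwards, in reverse completion order.
    comp = [-1] * n
    components = []
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = len(components)
        members, stack = [s], [s]
        while stack:
            v = stack.pop()
            for u in range(n):
                if A[u][v] and comp[u] == -1:
                    comp[u] = comp[s]
                    members.append(u)
                    stack.append(u)
        components.append(sorted(members))
    return components


# Hypothetical toy data standing in for a real crawl.
pages = ["a.html", "b.html", "c.html", "d.html"]
links = [("a.html", "b.html"), ("b.html", "c.html"), ("c.html", "a.html"),
         ("c.html", "d.html"), ("a.html", "d.html")]

A = adjacency_matrix(pages, links)
print(render(A))

degrees = in_degrees(A)
ranking = sorted(zip(pages, degrees), key=lambda pair: pair[1], reverse=True)
print(ranking)

largest = max(strongly_connected_components(A), key=len)
print([pages[i] for i in largest])
```

For a crawl of 500 or more pages a sparse representation would likely be preferable to a dense list of lists. In Matlab itself, spy(A) plots the nonzero pattern of a matrix, which is one way to produce the visual printout asked for in step 2.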