
Is it possible for me to see all the links of pages on a website?

Options
  • 29-07-2020 2:54pm
    #1
    Registered Users Posts: 1,881 ✭✭✭


    I am looking at a company website which shares specific links with certain people or organisations.

    The link addresses shared don't follow any obvious pattern.

    Is there a program I can use to crawl this website and build a list of all the links on it?

    Some of these pages also embed content from an external webpage, and I am wondering if it is possible to find those webpages as well.


Comments

  • Registered Users Posts: 2,660 ✭✭✭Baz_


    Your question seems shadily unspecific, nevertheless, this should help https://scrapy.org/

    Note, though, that this would have been easily found with a web search...
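    To give a feel for what Scrapy involves: a spider that collects every link on a site is only a few lines. This is a hedged sketch, not a production crawler; the class name, start URL, and output field are placeholders, not part of Scrapy itself:

    ```python
    import scrapy

    class LinkSpider(scrapy.Spider):
        """Follows every <a href> it finds and records each link."""
        name = "links"
        start_urls = ["https://www.example.com/"]  # placeholder start page

        def parse(self, response):
            for href in response.css("a::attr(href)").getall():
                # record the absolute form of the link...
                yield {"link": response.urljoin(href)}
                # ...and queue the linked page itself for crawling
                yield response.follow(href, callback=self.parse)
    ```

    Run it with `scrapy runspider spider.py -o links.json` to get the collected links as JSON.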


  • Registered Users Posts: 1,881 ✭✭✭kala85


    Baz_ wrote: »
    Your question seems shadily unspecific, nevertheless, this should help https://scrapy.org/

    Note, though, that this would have been easily found with a web search...

    Did a search but didn't come up with that website.

    Do I have to write code, or is the code there already?


  • Registered Users Posts: 10,460 ✭✭✭✭28064212


    kala85 wrote: »
    Did a search but didn't come up with that website.

    Do I have to write code, or is the code there already?
    You'd have to write some code. However, it likely wouldn't help with what you're looking for. Web spiders work by picking one page on a website, following all the links on that page to new pages, then following all the links on those pages to more new pages, and so on. That doesn't help if there are no links to your pages from any other pages on the site. If that's the case, there isn't really any practical way to build a list of the pages you're looking for.

    Boardsie Enhancement Suite - a browser extension to make using Boards on desktop a better experience (includes full-width display, keyboard shortcuts, and dark mode). Now available through the extension stores

    Firefox: https://addons.mozilla.org/addon/boardsie-enhancement-suite/

    Chrome/Edge/Opera: https://chromewebstore.google.com/detail/boardsie-enhancement-suit/bbgnmnfagihoohjkofdnofcfmkpdmmce
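    The spider behaviour described above can be sketched with nothing but the Python standard library. This is an illustrative minimal version (the function names, example URL, and page limit are all made up for the sketch), not a hardened crawler:

    ```python
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects the absolute href of every <a> tag fed to it."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = set()

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.add(urljoin(self.base_url, value))

    def extract_links(html, base_url):
        parser = LinkExtractor(base_url)
        parser.feed(html)
        return parser.links

    def crawl(start_url, max_pages=50):
        """Breadth-first crawl that stays on the starting host and
        returns every URL discovered by following links."""
        host = urlparse(start_url).netloc
        seen, queue = {start_url}, [start_url]
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except OSError:
                continue  # unreachable page: skip it
            for link in extract_links(html, url):
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return seen
    ```

    As the post says, this can only find pages that something else links to; pages with no inbound links on the site stay invisible to any crawler.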



  • Registered Users Posts: 2,426 ✭✭✭ressem


    Depends on the website, and whether it needs authentication or makes life difficult for bots.

    You could try wget (available on Linux, or on Windows via Cygwin).

    wget -r https://www.test.mysite
    will try to create a local mirror of the website's HTML content.

    wget -r --spider https://www.test.mysite
    will crawl through the website, noting the links it finds on the console.
    (handy for finding broken links if you have no better tool available)

    and
    wget -r --spider --span-hosts https://www.test.mysite
    should also follow external links.

