
Is it possible for me to see all the links of pages on a website?

  • 29-07-2020 2:54pm
    #1
    Registered Users, Registered Users 2 Posts: 1,907 ✭✭✭


    I am looking at a company website, which shares specific links with certain people or organisations.

    The link addresses shared don't follow a certain pattern.

    Is there a program that I can use to scrape the server of this website and get a list of the links on it?

    Some of these pages also have content embedded from an external webpage, and I am just wondering if it is possible to find these webpages.


Comments

  • Registered Users, Registered Users 2 Posts: 2,660 ✭✭✭Baz_


    kala85 wrote: »

    Your question seems shadily unspecific, nevertheless, this should help https://scrapy.org/

    Note, though, that this would have been easily found with a web search...


  • Registered Users, Registered Users 2 Posts: 1,907 ✭✭✭kala85


    Baz_ wrote: »
    Your question seems shadily unspecific, nevertheless, this should help https://scrapy.org/

    Note, though, that this would have been easily found with a web search...

    Did a search but didn't come up with that website.

    Do I have to write code, or is the code there already?


  • Registered Users, Registered Users 2 Posts: 10,817 ✭✭✭✭28064212


    kala85 wrote: »
    Did a search but didn't come up with that website.

    Do I have to write code or is the code there already.

    You'd have to write some code. However, it likely wouldn't help with what you're looking for. Web spiders work by picking one page on a website, following all the links on that page to new pages, then following the links on those pages to more new pages, and so on. That doesn't help if no other page on the site links to the pages you're after. If that's the case, there isn't really any practical way to build a list of the pages you're looking for.
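
    To make the spidering idea concrete, here is a rough sketch in plain Python (standard library only). The names LinkParser, extract_links, fetch and crawl are made up for illustration, and it assumes the site serves ordinary HTML with <a href> links, needs no login, and doesn't block bots. It is not how Scrapy does it internally, just the same breadth-first idea in miniature.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (fragments stripped) linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def fetch(url):
    """Download a page as text (hypothetical helper, no error handling)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl that only follows links on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        for link in extract_links(fetch_page(url), url):
            # Skip external hosts and anything already queued or visited.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

    Calling crawl("https://www.test.mysite/") would return every page reachable by following links from the home page. A page that nothing links to never enters the queue, which is exactly why a spider can't find unlinked pages.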




  • Registered Users, Registered Users 2 Posts: 2,426 ✭✭✭ressem


    Depends on the website and whether it needs authentication, or makes life difficult for bots.

    You could try wget (available on Linux, or on Windows using Cygwin).

    wget -r https://www.test.mysite
    will try to create a mirror of the website's HTML content.

    wget -r --spider https://www.test.mysite
    will crawl through the website, noting the links it finds on the console
    (handy for finding broken links if you have no better tool available),

    and
    wget -r --spider --span-hosts https://www.test.mysite
    should also follow links to external hosts.
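
    If the console output gets too long to read, something like the following Python sketch could pull just the URLs out of a saved log. It assumes wget starts each request with a line of the form "--2020-07-29 14:54:00--  https://www.test.mysite/page", which is what I'd expect from typical wget output but may differ by version or locale. The function name urls_from_wget_log is made up for the example.

```python
import re

# Assumed log format: each request begins with a timestamp wrapped in
# double dashes, followed by whitespace and the URL being requested.
REQUEST_LINE = re.compile(
    r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)", re.MULTILINE
)


def urls_from_wget_log(log_text):
    """Return the unique URLs wget requested, in first-seen order."""
    return list(dict.fromkeys(REQUEST_LINE.findall(log_text)))
```

    To get a log file to feed it, wget's -o option writes the messages to a file instead of the console, e.g. wget -r --spider -o spider.log https://www.test.mysite, after which you can read spider.log and pass its text to urls_from_wget_log.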

