
Html > Txt

  • 15-01-2003 9:24pm
    #1
    Closed Accounts Posts: 10,921 ✭✭✭✭


    Hi there!
    Does anyone know of a program that can take an HTML page and boil it down to a TXT file, keeping only the 'what you see in the browser' text and getting rid of all the source tags, JavaScript, etc.?

    I know copy+paste could achieve this, but I have about 2000 pages to convert, so I'd like to avoid that if possible.

    Thanks!


Comments

  • Closed Accounts Posts: 16,339 ✭✭✭✭tman


    Windows Commander has a pretty handy view feature: just select the file you want and press F3. Sounds kind of like what you're looking for.
    You should be able to find it on download.com


  • Registered Users, Registered Users 2 Posts: 1,186 ✭✭✭davej


    check this app out.

    davej


  • Registered Users, Registered Users 2 Posts: 453 ✭✭Ant


    I'm more used to using this browser (Lynx) for testing web pages under Linux. However, I have also used it in Windows 2000, and AFAIK the same executable can be used on other 32-bit versions of Windows.

    You can configure how it displays files by using "O" to change the Options, e.g. you can get it to ignore images or to display the "alt" text for the images.

    To convert an HTML file to text, use the command line and type:
    lynx -dump -nolist foo.html > foo.txt

    To convert many files to text, you'd have to write a batch file. However, it's been years since I've had to use batch files so I can't help you there.


    Lynx for Win32 is downloadable from http://jim.spath.com/lynx_win32/
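
For the "many files" case, a short script can stand in for the batch file. The sketch below is only one possible approach and is not the Lynx method above: it swaps in Python's standard-library `html.parser` so it runs without Lynx installed, and its output will be plainer than what `lynx -dump` produces (no layout, no "alt" text). The names `TextExtractor`, `html_to_text` and `convert_folder` are made up for illustration, and it assumes the pages sit in one folder with a `.htm` or `.html` extension.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Tags that should start a new line in the text output (an assumed, short list).
    BLOCK = {"p", "div", "br", "li", "tr", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Only keep text that is outside <script>/<style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    lines = (" ".join(line.split()) for line in raw_lines)  # squeeze whitespace
    return "\n".join(line for line in lines if line) + "\n"


def convert_folder(folder):
    """Write a .txt file next to every .htm/.html file in folder; return the count."""
    count = 0
    for src in sorted(Path(folder).glob("*.htm*")):
        html = src.read_text(encoding="utf-8", errors="replace")
        src.with_suffix(".txt").write_text(html_to_text(html), encoding="utf-8")
        count += 1
    return count
```

Calling `convert_folder("pages")` (with `pages` being whatever folder holds the saved files) would leave a `foo.txt` beside each `foo.html`. The UTF-8 decoding is a guess; older pages may need a different encoding passed to `read_text`.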


  • Closed Accounts Posts: 16,339 ✭✭✭✭tman


    For some reason Windows Commander, which I mentioned in my last post, is now known as Total Commander :confused:
    Find it here:
    http://www.ghisler.com/

    it's a great file manager, pisses all over windows explorer!


  • Registered Users, Registered Users 2 Posts: 12,309 ✭✭✭✭Bard


    You just want something simple to convert HTML files to TXT files?

    Right you are...

    HTML2TXT from Yang Bo


  • Registered Users, Registered Users 2 Posts: 1,237 ✭✭✭GUI


    Did anyone think of the plain old File > Save As dialog in IE?
    Set "Save as type" to text.

    Works a treat.


  • Registered Users, Registered Users 2 Posts: 1,237 ✭✭✭GUI


    He still has to save the pages as HTML for those parser applications to work :-)

