
Wget with url with hash - how to process it in python

  • 27-01-2020 12:57PM
    #1
    Registered Users, Registered Users 2 Posts: 5,755 ✭✭✭


    I'm calling wget from Python for the URL http://pypi.org/project/pip/#files:
    self.run_command('("wget http://pypi.org/project/pip/\#files -O index1.html")')
    

    My log shows the command running without anything from the hash onward:
    2020-01-27 11:37:23,128 020776:084 INFO:  wget http://pypi.org/project/pip/
    

    I've tried it without the quotes, brackets and escape characters but get the same result. Does anyone have any idea?


Comments

  • Registered Users, Registered Users 2 Posts: 6,176 ✭✭✭Idleater


    Have a look at urlencode
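
A minimal sketch of what percent-encoding does to this URL, assuming "urlencode" here means the quoting helpers in Python's urllib.parse. Splitting the URL into a base and '#files' is only for illustration. Note that encoding changes the meaning: '%23files' becomes part of the path sent to the server, rather than a fragment that stays on the client.

```python
from urllib.parse import quote, urlsplit

url = "http://pypi.org/project/pip/#files"

# Percent-encode the '#' so it is no longer treated as the fragment delimiter.
encoded = "http://pypi.org/project/pip/" + quote("#files", safe="")
print(encoded)                     # http://pypi.org/project/pip/%23files

# In the original URL, 'files' is a fragment and is not part of the path.
print(urlsplit(url).path)          # /project/pip/
print(urlsplit(url).fragment)      # files

# In the encoded URL, '%23files' is part of the path and there is no fragment.
print(urlsplit(encoded).path)      # /project/pip/%23files
print(urlsplit(encoded).fragment)  # (empty string)
```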


  • Registered Users, Registered Users 2 Posts: 885 ✭✭✭clearz


    http://pypi.org/project/pip/%23files
    
    should work. If not, try curl instead of wget, if it's installed.

    I don't know much about the Python standard library, but I'm sure it has classes for downloading data from the web. That would be a safer and cleaner bet than calling system apps like wget.

    The hash symbol is usually handled on the client side, often by a JavaScript app, so even if you get it to work, what downloads might not be what you expected.
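
For what it's worth, a rough sketch of the standard-library route, using urllib.request and urllib.parse. The function name fetch_page and the timeout value are made up for illustration. It drops the fragment before making the request, since the part after '#' is not sent to the server anyway.

```python
from urllib.parse import urldefrag
from urllib.request import urlopen


def fetch_page(url, out_path, timeout=30):
    """Download url to out_path and return the fragment that was left out.

    fetch_page is a hypothetical helper, not part of any library.
    """
    # Split off anything after '#'; only base_url is requested.
    base_url, fragment = urldefrag(url)
    with urlopen(base_url, timeout=timeout) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())
    return fragment
```

Calling fetch_page("http://pypi.org/project/pip/#files", "index1.html") would request http://pypi.org/project/pip/ and return "files". If the content you want is filled in by JavaScript after the page loads, the saved HTML may still not contain it.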


  • Registered Users, Registered Users 2 Posts: 7,157 ✭✭✭srsly78


    OP just use a raw string.

    rawstring = r"whatever"

    self.run_command(r"wget http://pypi.org/project/pip/\#files -O index1.html")


  • Registered Users, Registered Users 2 Posts: 885 ✭✭✭clearz


    srsly78 wrote: »
    OP just use a raw string.

    rawstring = r"whatever"

    self.run_command(r"wget http://pypi.org/project/pip/\#files -O index1.html")


    Won't make a difference. This is not an 'issue' with Python but with the wget application.

    EDIT:

    Everything related to this can be found here in the source for wget
    http://git.savannah.gnu.org/cgit/wget.git/tree/src/url.c
    To get started, anywhere the string 'fragment' appears in that code is of interest.

    This led me to search Google for "wget fragment", which turns up plenty of relevant information.
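
A small sketch of why the escaping doesn't matter, assuming run_command hands the string to a POSIX-style shell (the thread doesn't say how run_command works). shlex.split follows the same basic quoting rules, so it shows roughly what wget would receive as arguments, and urlsplit shows which part of that URL counts as the fragment.

```python
import shlex
from urllib.parse import urlsplit

# The command string from the thread, written as a raw string.
command = r"wget http://pypi.org/project/pip/\#files -O index1.html"

# Shell-style splitting consumes the backslash, so the URL argument keeps a plain '#'.
argv = shlex.split(command)
print(argv)
# ['wget', 'http://pypi.org/project/pip/#files', '-O', 'index1.html']

# Everything after '#' is the fragment, which is not part of the request path.
parts = urlsplit(argv[1])
print(parts.path)      # /project/pip/
print(parts.fragment)  # files
```

So with or without the backslash, the argument is a URL with a fragment, and the log line in the first post shows that URL with the fragment left off.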

