
.net, Word and the Irish Language

  • 18-02-2008 11:51am
    #1
    Moderators, Science, Health & Environment Moderators Posts: 9,035 Mod ✭✭✭✭


    So I have an ASP.NET (VB.NET) app that lets our users create content either by pasting or typing it into a scripted iframe editor (à la FCKeditor; it's actually an enhanced widgEditor, for those interested) or by browsing to one of their Word documents and uploading it. In the background I save the Word doc as HTML, open it, clean it up using regular expressions and then insert the content into the iframe. This clean-up has evolved and does the job pretty well.

    Anyway, some users are uploading Irish-language content in Word. If they paste it into the editable iframe I can catch every letter with an acute accent and convert it to an HTML entity, but with Word, forget about it. I can open the HTML file saved from Word myself and see á, for example, but when I open the same HTML in code the regex replace doesn't recognise it. So this must be an encoding issue, but I'm at a loss as I've tried different encodings. I'm pretty much going to give up and tell users to just paste Irish content in, but maybe someone here has dealt with this problem.
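
One possible cause, offered as an assumption rather than a diagnosis: Word's "Save as Web Page" often writes the file as windows-1252 and declares that in a meta tag, so reading the bytes as UTF-8 turns á into a replacement character that no regex will match. The sketch below is in Python rather than VB.NET, purely to illustrate the idea; the helper names (`sniff_charset`, `to_entities`, `clean_word_html`) are hypothetical, and the windows-1252 fallback is a guess for Western-locale installs.

```python
import re

def sniff_charset(raw, default="windows-1252"):
    """Return the charset declared in the first few KB, or a fallback guess."""
    match = re.search(rb"charset=[\"']?([A-Za-z0-9_\-]+)", raw[:4096])
    return match.group(1).decode("ascii") if match else default

def to_entities(text):
    """Replace every non-ASCII character with a numeric HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def clean_word_html(raw):
    """Decode with the declared charset first, then convert accents to entities."""
    text = raw.decode(sniff_charset(raw), errors="replace")
    return to_entities(text)

# Hypothetical sample standing in for a file saved from Word.
sample = (
    '<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
    "<p>Dia dhuit, a chára</p>"
).encode("windows-1252")

# Reading the same bytes as UTF-8 loses the accented letter entirely.
print("\ufffd" in sample.decode("utf-8", errors="replace"))

# Decoding with the declared charset keeps it, so á becomes &#225;
print(clean_word_html(sample))
```

In .NET the equivalent step would be to pass an explicit encoding when reading the file, for example `New StreamReader(path, Encoding.GetEncoding(1252))`, before running any regex over the text.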


Comments

  • Registered Users, Registered Users 2 Posts: 2,931 ✭✭✭Ginger


    Just a couple of thoughts

    One would be to set the CultureInfo to Irish (Ireland) before opening the file, to see if it then opens correctly. The file should be encoded as UTF-8.

    That should allow the strings to be recognised correctly.

    If it's a Docx file you can just rename it to .zip, open the XML file inside and parse away.


  • Moderators, Science, Health & Environment Moderators Posts: 9,035 Mod ✭✭✭✭mewso


    Never heard of Docx, but I'll try the culture angle. It may provide an answer, thanks.


  • Registered Users, Registered Users 2 Posts: 2,931 ✭✭✭Ginger


    Docx is the Word 2007 format.

