Further unicode issues #35

Open
andrewleech opened this issue Apr 16, 2015 · 6 comments

Comments

@andrewleech
Owner

A dash ("-") in a movie description is displayed as "u2013", and for some reason the "0" is rendered in a thinner typeface.

@insertnamehere1

The Norwegian text supplied by Netflix is:
\n\n\n\n\n\n\n\n\nEn filmskaper vender tilbake til det mystiske stedet der faren døde i villmarken for å avdekke galskapen \u2013 eller ondskapen \u2013 som tok livet hans.<div class="info">

While the English text from the Netflix query is:
A filmmaker returns to the scene of his father's mysterious death in the wilderness to uncover the madness -- or the evil -- that claimed his life.

Netflix is serving up the \u2013 in its https://www.netflix.com/JSON/BOB?movieid= response.
I can modify listVideo() to replace "\u2013" in the description text with "--" for this special case.
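
A minimal sketch of that special-case replacement, assuming Python and assuming the description arrives as a string containing the literal six characters `\u2013`. The helper name `replace_escaped_dash` is hypothetical; the real listVideo() is not shown in this thread.

```python
def replace_escaped_dash(description):
    """Replace the literal text backslash-u-2-0-1-3 with a double hyphen.

    Hypothetical helper for illustration; not the add-on's actual code.
    """
    # "\\u2013" in source is six characters: a backslash followed by u2013.
    return description.replace("\\u2013", "--")


sample = "galskapen \\u2013 eller ondskapen \\u2013 som tok livet hans."
print(replace_escaped_dash(sample))
```

This only touches the one sequence named above; any other escaped code point would pass through unchanged.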

@andrewleech
Owner Author

One thing that might be worth trying: I've seen some Unicode on Hulu that I had to encode with 'latin1' rather than 'utf-8' to get it displaying correctly. I'm not sure why, since it comes from XML that explicitly declares 'utf-8'.

@insertnamehere1

Here is a bit more information on this problem.
The Unicode Netflix is returning contains \u005c\u0075\u0032\u0030\u0031\u0033, which is Unicode for the ASCII characters '\' 'u' '2' '0' '1' '3'. So when the Netflix response is decoded with decode("utf-8"), we get the literal string "\u2013". It does this for the Norwegian-language version of the movie description. Possibly it got mangled in translation?
In this case the only solution I can see is to replace the string "\u2013" with "--". Correct me if I'm wrong, but would decoding with latin1 sort this out?
Also, I gotta admit I'm wavering on fixing this. It's a dirty fix for a minor issue, and I think it's a Netflix problem anyway.
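
To make the double escaping concrete, here is a small sketch. It assumes Python 3 and assumes the response is parsed as JSON (the thread only mentions decode("utf-8"), so the `json.loads` step is an assumption). It shows that the escaped sequence comes out as six plain ASCII characters, and that those bytes decode identically as UTF-8 and as latin1, so switching codecs would probably not help here.

```python
import json

# JSON text as described above: each character of "\u2013" is itself escaped.
response_text = r'"\u005c\u0075\u0032\u0030\u0031\u0033"'

description = json.loads(response_text)
print(len(description))  # 6

# The result is the literal text backslash-u-2-0-1-3, not an en dash.
is_literal_escape = description == "\\u2013"
contains_en_dash = "\u2013" in description

# All six characters are ASCII, so both codecs give the same string.
raw_bytes = description.encode("ascii")
as_utf8 = raw_bytes.decode("utf-8")
as_latin1 = raw_bytes.decode("latin-1")
same_either_way = as_utf8 == as_latin1
```

Because the data is already pure ASCII at this point, the fix has to happen at the string level rather than in the choice of decoder.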

@mantheman

I've only seen this with u2013 and u2026 (a triple period), and your patch worked fine, so I at least think it's worthy of a PR =) But didn't you just replace the Unicode-in-Unicode with the real Unicode instead of "--"? u2013 is apparently an "en dash" and not strictly a double dash. http://en.wikipedia.org/wiki/Dash#En_dash

@insertnamehere1

Yeah, I replaced it with the correct Unicode. Netflix replaces both u2013 and u2026 with "--" (if you compare the English and Norwegian movie descriptions). I was tempted to just do what Netflix does, but then, what the hell, let's use the correct Unicode.
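
A rough sketch of that approach, assuming Python and covering only the two sequences reported in this thread (u2013 and u2026). The names `KNOWN_ESCAPES` and `fix_description` are hypothetical and are not taken from the actual patch.

```python
# Map each literal escape (six ASCII characters) to the real character.
# Hypothetical table for illustration; only the two observed cases are listed.
KNOWN_ESCAPES = {
    "\\u2013": u"\u2013",  # en dash
    "\\u2026": u"\u2026",  # horizontal ellipsis
}


def fix_description(text):
    """Swap known literal escape sequences for their real Unicode characters."""
    for literal, char in KNOWN_ESCAPES.items():
        text = text.replace(literal, char)
    return text


fixed = fix_description("galskapen \\u2013 eller ondskapen\\u2026")
```

An explicit table keeps the change narrow: text that merely happens to contain a backslash followed by other characters is left alone.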

@mantheman

Well, Netflix shows u2026 correctly as a horizontal ellipsis on both my iPad and PC. Anyway, I think you did the right thing.
