Encodings in a Multilingual Web Application

Tools: MySQL as the database; Ruby, Python or PHP as the language

Problem: If you are working on a multilingual web application and need to store its text in a database, you will surely run into encoding issues. In Ruby 1.8.6 I haven't found anything promising that clearly states the encoding of a String or of some data. Other languages make this easy and explain it very clearly; I feel Python's support for encodings is the best: very clean and self-explanatory.

Things to remember:

  • Create the MySQL database and each table with the UTF-8 character set. By default it is latin1, and changing it at a later stage, after realizing the problem, is very annoying.
CREATE DATABASE <database name> DEFAULT CHARACTER SET utf8;
  • Make sure all data being stored in the database and its tables is in UTF-8; if it is not, convert it (I will talk about that in a while). An existing table can be converted with:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
  • If you are struggling to display or store characters in the proper encoding, make sure you have set encoding: utf8 in your database.{yml/php}, e.g. in a Rails config/database.yml:
development:
  adapter: mysql
  database: db_name
  username: username
  password: password
  host: localhost
  encoding: utf8
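One caveat worth checking before settling on utf8: in MySQL, the utf8 character set is an alias for utf8mb3, which stores at most three bytes per character. Characters outside the Basic Multilingual Plane (most emoji, some rare CJK ideographs) need four bytes in UTF-8, so a utf8 column rejects them in strict SQL mode or truncates them otherwise; utf8mb4, available since MySQL 5.5.3, stores full UTF-8. The Python sketch below flags text that a legacy utf8 column cannot hold; the helper names fits_mysql_utf8 and max_utf8_bytes_per_char are hypothetical.

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character of `text` fits MySQL's 3-byte utf8 (utf8mb3).

    utf8mb3 covers only the Basic Multilingual Plane (U+0000..U+FFFF); any code
    point above U+FFFF takes 4 bytes in UTF-8 and therefore needs utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Longest UTF-8 encoding of any single character in `text` (0 for empty text)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


if __name__ == "__main__":
    for sample in ["Héllo", "Привет", "日本語", "party 🎉"]:
        # The last sample contains U+1F389, which needs 4 bytes and utf8mb4.
        print(sample, fits_mysql_utf8(sample), max_utf8_bytes_per_char(sample))
```

If the check fails for real user data, declaring the database, tables and connection as utf8mb4 instead of utf8 is the usual remedy.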

Unicode is not UTF-8

I will try to explain succinctly why UTF-8 is not Unicode, and Unicode is not UTF-8. I don't remember where I read it, but this expert advice helped me a lot to tell Unicode and UTF-8 apart. This post explains the Unicode philosophy.

When the computer reads characters from user input, think of them as Unicode: each character is a unique number, independent of how it is stored. Once you store the text in a variable or in the database, encodings come into the picture, and the result depends on which encoding you save it in. If it is UTF-8, things work as expected; if it is not, you may run into trouble.

Unicode is a system that provides a unique number (a code point) for every character, no matter what the language.

How these numbers are turned into bytes, and how bytes are read back as characters, is called an encoding. Depending on the encoding, the same byte 0xE9 can be the letter "é" (ISO-8859-1, used for many Western European languages), the Cyrillic "й" (Windows-1251) or the Greek "ι" (ISO-8859-7).
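A short Python 3 sketch makes the distinction concrete: a character has exactly one Unicode code point, but which bytes it becomes depends on the encoding; conversely, one and the same byte means different characters under different encodings.

```python
# One character, one Unicode code point, but different bytes per encoding.
ch = "é"
print(hex(ord(ch)))            # 0xe9 (the code point U+00E9)
print(ch.encode("utf-8"))      # b'\xc3\xa9' (two bytes in UTF-8)
print(ch.encode("utf-16-le"))  # b'\xe9\x00' (two different bytes in UTF-16 LE)
print(ch.encode("latin-1"))    # b'\xe9' (one byte in ISO-8859-1)

# One byte, different characters depending on which encoding you assume.
raw = bytes([0xE9])
decoded = {enc: raw.decode(enc) for enc in ("latin-1", "cp1251", "iso8859_7")}
print(decoded)                 # {'latin-1': 'é', 'cp1251': 'й', 'iso8859_7': 'ι'}
```

This is exactly why text written in one encoding and read back in another comes out garbled: the bytes are fine, the assumed mapping is wrong.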

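The Python way is the easiest and preferred one (Python 3 shown, where every str is already Unicode text):
t = "Héllo"               # a str holds Unicode code points
data = t.encode("utf-8")  # the UTF-8 bytes: b'H\xc3\xa9llo'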

Detecting String/Text Encoding in Ruby

Where was I stuck?

Characters (Cyrillic, Latin and other funny ones) were stored wrongly in the database. They needed to be converted and stored as UTF-8, after first working out the current encoding of the stored text.
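One frequent way text ends up stored wrongly is double encoding: UTF-8 bytes get read as if they were Latin-1 (MySQL's latin1 is, in practice, Windows-1252), so "é" turns into "Ã©". That cause is an assumption here, so check your own data first. When it is what happened, the damage can often be reversed by re-encoding as cp1252 or Latin-1 and decoding as UTF-8. The Python sketch below does this with a hypothetical helper, repair_double_encoded, and leaves the text unchanged when the round trip fails.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF-8 -> cp1252/Latin-1 mis-decoding ("mojibake").

    Heuristic: if the text re-encodes as cp1252 (or Latin-1) AND the resulting
    bytes are valid UTF-8, assume it was double-encoded and return the decoded
    text; otherwise return it unchanged.
    """
    for wrong_encoding in ("cp1252", "latin-1"):
        try:
            return text.encode(wrong_encoding).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this guess does not explain the garbling; try the next
    return text


if __name__ == "__main__":
    garbled = "Déjà vu".encode("utf-8").decode("latin-1")  # simulate the bug
    print(repr(garbled), "->", repair_double_encoded(garbled))
```

The heuristic can misfire on text that legitimately contains sequences like "Ã©", so run it on a copy and review the results before writing them back.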

How to do it:

There are certainly ways to solve this in MySQL itself, but none of them worked in my case (or perhaps I need to learn more MySQL). At the same time, I was more interested in doing it the Ruby way!

So here is the approach I tried. It worked very well, and it helps me any time I need to know the encoding of a text/string or convert it to another encoding.

First, install the chardet gem by issuing the following command:

 $ sudo gem install chardet

Then in irb:

 require 'rubygems'
 require 'UniversalDetector'
 p UniversalDetector::chardet('Ascii text')
 p UniversalDetector::chardet('åäö')
 p UniversalDetector::chardet("Déjà vu")

The respective output from this example is:

{"encoding"=>"ascii", "confidence"=>1.0}
{"encoding"=>"utf-8", "confidence"=>0.87625}
{"encoding"=>"utf-8", "confidence"=>0.7525}
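chardet guesses statistically and reports a confidence. If pulling in a detector is not an option, a much cruder stdlib-only heuristic is sketched below in Python; it is a different, simpler technique, not chardet's algorithm. Only ASCII and UTF-8 can be recognized reliably by trial decoding, because single-byte legacy encodings (cp1252, cp1251, Latin-1 and so on) accept almost any byte sequence, so the sketch falls back to an encoding the caller chooses. The function name guess_encoding and the cp1252 default are assumptions.

```python
def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Crude, stdlib-only encoding guess (NOT chardet's statistical detection).

    - pure 7-bit bytes are reported as "ascii";
    - bytes that decode strictly as UTF-8 are reported as "utf-8";
    - anything else gets `fallback`, because single-byte legacy encodings
      cannot be told apart by trial decoding.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


if __name__ == "__main__":
    print(guess_encoding(b"Ascii text"))
    print(guess_encoding("Déjà vu".encode("utf-8")))
    print(guess_encoding("Привет".encode("cp1251"), fallback="cp1251"))
```

Choose the fallback from what you know about your data, e.g. cp1251 for legacy Russian content.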

Now, to convert the string to the desired encoding (UTF-8 here):

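 require 'rubygems'             # must come first so gems can be loaded (Ruby 1.8)
 require 'UniversalDetector'    # provided by the chardet gem
 require 'iconv'                # Iconv ships with Ruby 1.8
 encoding = UniversalDetector::chardet(str)["encoding"]  # detect the encoding of str (e.g. text read from the DB)
 utf8_str = Iconv.conv("UTF-8", encoding, str)            # convert str from that encoding to UTF-8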
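For comparison, the same convert step in Python 3 is a decode followed by an encode. In this sketch the source encoding is passed in explicitly (for example, whatever your detector reported); the helper name to_utf8 is hypothetical.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` from `source_encoding` to UTF-8.

    Plays the same role as Iconv.conv("UTF-8", encoding, str) in the Ruby code.
    """
    text = data.decode(source_encoding)  # bytes -> Unicode text; raises on invalid input
    return text.encode("utf-8")          # Unicode text -> UTF-8 bytes


if __name__ == "__main__":
    legacy = "Привет".encode("cp1251")  # Cyrillic stored as Windows-1251
    print(legacy, "->", to_utf8(legacy, "cp1251"))
```

Decoding strictly (the default) means a wrong guess for the source encoding fails loudly instead of silently producing garbage.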

I would love to hear your suggestions or feedback, whether it doesn't work out for you or it helps you and saves you a night of researching how to handle encodings.

Auto Date: JavaScript-based Autocomplete Date Selector

Known Date Input Interfaces

Among the date input interfaces on the web, only a couple are easy to customize for a given use case:

  • Calendar JS plugin

    Calendar-based JS plugin: http://www.dynarch.com/projects/calendar/

  • Separate DD MM YYYY fields
  • Full calendar

You may have used some decent date interfaces in desktop clients; OmniFocus (Mac), for example, has a powerful date/time selector with a sleek interface. To be honest, after using it we at EnTrip felt like designing something for Web 2.0: a sleek, basic, fast and responsive date selector instead of a calendar date selector or other conventional interfaces.

Download

Auto-Date is hosted on Google Code, and the code can be downloaded from here. Unzip the files, go through the README file and open index.html.

Demo

Enter the date in any format. Currently the year is entered as YY, so to enter 2030 you type only 30.

Here is a demo of the plugin to try out.

About Auto-Date

Auto Date v1.0.2 is an amazing JS-based autocomplete date selector, released under the MIT License, that uses the Prototype framework. It is an alternative to the calendar interface in the browser, which becomes annoying when you need to enter future or past dates.

It validates the date before displaying the possible options, and it prompts you to pick the date you actually have in mind. For example, if you type "01/05", it offers both 5 Jan and 1 May. Sounds great!

It also understands certain literals that people commonly use in day-to-day communication, e.g. today, yesterday, tomorrow, coming weekend, new year, christmas, last month, saturday etc.
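To illustrate how such literals can be resolved against the current date, here is a Python sketch; Auto Date itself is JavaScript on top of Prototype, and this is not its code. The helper name resolve_literal and the exact rules are assumptions: weekday names resolve to the next occurrence strictly after the reference day, while "new year" and "christmas" use the present year, as the format list in this post describes.

```python
import datetime as dt
from typing import Optional

# Index in this list matches date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_literal(word: str, today: dt.date) -> Optional[dt.date]:
    """Resolve a date literal relative to `today`; return None if it is unknown."""
    word = word.strip().lower()
    fixed = {
        "today": today,
        "tomorrow": today + dt.timedelta(days=1),
        "yesterday": today - dt.timedelta(days=1),
        "new year": dt.date(today.year, 1, 1),     # 1 Jan of the present year
        "christmas": dt.date(today.year, 12, 25),  # 25 Dec of the present year
    }
    if word in fixed:
        return fixed[word]
    if word in WEEKDAYS:
        # Coming occurrence: 1..7 days ahead, never today itself (assumption).
        days_ahead = (WEEKDAYS.index(word) - today.weekday() - 1) % 7 + 1
        return today + dt.timedelta(days=days_ahead)
    return None


if __name__ == "__main__":
    ref = dt.date(2024, 1, 3)  # a Wednesday
    for literal in ["today", "Tomorrow", "saturday", "wednesday", "christmas"]:
        print(literal, resolve_literal(literal, ref))
```

Relative phrases such as "last month" or "coming weekend" would need further rules in the same style.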

Acceptable Date Formats

  • DD MM YY (separator can be space " ", comma ",", period "." or forward slash "/")
  • MM DD YY (separator can be space " ", comma ",", period "." or forward slash "/")
  • DD MON YY (MON can be jan/feb/mar etc.; separator can be space " ", comma ",", period "." or forward slash "/")
  • MON DD YY (MON can be jan/feb/mar etc.; separator can be space " ", comma ",", period "." or forward slash "/")
  • New year: 1 Jan <present year>
  • Christmas: 25 Dec <present year>
  • Monday/Tuesday/Wednesday/Thursday/Friday/Saturday/Sunday (always the coming one, not the one earlier in the present week)
  • Weekend/Week
  • Next month/Last month/Last week/Last weekend/Last Sunday/Last Monday etc.
  • Today/Tomorrow/Yesterday
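The day/month ambiguity mentioned earlier ("01/05" could be 1 May or 5 Jan) can be handled by trying both day-first and month-first readings and keeping only valid dates. The Python sketch below does that for the numeric formats and the MON variants with the separators listed here; it is an illustration, not Auto Date's JavaScript. It expands a two-digit year into 1970 to 2069, matching the plugin's supported range, and assumes the present year when the year is omitted; the pivot rule and the names candidate_dates and expand_year are assumptions.

```python
import datetime as dt
import re
from typing import List

SEPARATORS = r"[ ,./]+"  # space, comma, period, forward slash
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def expand_year(yy: int) -> int:
    """Map a two-digit year to 1970..2069 (00-69 -> 2000s, 70-99 -> 1900s)."""
    return 2000 + yy if yy < 70 else 1900 + yy


def _safe_date(year: int, month: int, day: int) -> List[dt.date]:
    """Return [date] if it exists, else [] (e.g. 31 Feb or month 13)."""
    try:
        return [dt.date(year, month, day)]
    except (ValueError, OverflowError):
        return []


def candidate_dates(text: str, today: dt.date) -> List[dt.date]:
    """Return every valid date `text` could mean, without duplicates."""
    parts = [p for p in re.split(SEPARATORS, text.strip().lower()) if p]
    if len(parts) == 3:
        if not (parts[2].isdecimal() and len(parts[2]) == 2):
            return []  # only YY years are accepted
        year = expand_year(int(parts[2]))
    elif len(parts) == 2:
        year = today.year  # no year typed: assume the present year
    else:
        return []
    a, b = parts[0], parts[1]
    results: List[dt.date] = []
    if a.isdecimal() and b.isdecimal():
        results += _safe_date(year, int(b), int(a))  # DD MM
        results += _safe_date(year, int(a), int(b))  # MM DD
    elif a[:3] in MONTHS and b.isdecimal():
        results += _safe_date(year, MONTHS.index(a[:3]) + 1, int(b))  # MON DD
    elif b[:3] in MONTHS and a.isdecimal():
        results += _safe_date(year, MONTHS.index(b[:3]) + 1, int(a))  # DD MON
    unique: List[dt.date] = []
    for d in results:  # e.g. 05/05 reads the same both ways
        if d not in unique:
            unique.append(d)
    return unique


if __name__ == "__main__":
    ref = dt.date(2024, 1, 3)
    for typed in ["01/05", "25.12.30", "mar 3, 70", "31/02/24"]:
        print(typed, candidate_dates(typed, ref))
```

Returning a list rather than a single date is what lets the interface show both options for "01/05" and just one for "25.12.30".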

So just start typing, and its fast autocomplete will win your heart. Best of all, it is all client-side, so there is no extra load on your server.

Features


  1. Based on the JavaScript Prototype library
  2. Multiple date format support
  3. Support for multiple date separators
  4. ONLY_FUTURE_DATE option: just set it to true when calling the plugin
  5. Literal support, just by typing a few characters
  6. Autocomplete: it completes as you type and shows the possible options
  7. Supports dates from 1970 to 2069 only
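The ONLY_FUTURE_DATE option can be pictured as a filter over the candidate dates the parser produces. In this Python sketch the flag mirrors the option's name, but the function name filter_candidates and the choice to treat today itself as a valid "future" date are assumptions.

```python
import datetime as dt
from typing import Iterable, List


def filter_candidates(candidates: Iterable[dt.date], today: dt.date,
                      only_future_date: bool = False) -> List[dt.date]:
    """Keep all candidates, or only those on or after `today` when the flag is set."""
    if not only_future_date:
        return list(candidates)
    return [d for d in candidates if d >= today]


if __name__ == "__main__":
    ref = dt.date(2024, 1, 3)
    options = [dt.date(2024, 5, 1), dt.date(2023, 12, 31), dt.date(2024, 1, 3)]
    print(filter_candidates(options, ref, only_future_date=True))
```

Filtering after parsing keeps the date-format logic unchanged; only the final list shown to the user depends on the flag.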

For feedback, suggestions, source code or contributions, check out auto-date on Google Code.