posted by kevin on October 21, 2009

Microsoft Smart Quotes & PHP

It seems like I run into this issue on over half of the websites I work on. The user wants to copy and paste an article, document, or whatever from MS Word into a textarea and save it. The problem is that Word uses "smart" quotes, dashes, and other special characters. Once the form is submitted, PHP receives those characters in an encoding the page doesn't expect, and they display incorrectly.
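To see why these characters get mangled, it helps to look at their bytes. The sketch below assumes a browser or Node.js environment with the standard `TextEncoder`, and `utf8Bytes` is a hypothetical helper name. It prints the UTF-8 bytes for each character the function below handles. Each one takes three bytes in UTF-8, while a plain ASCII quote takes one.

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as hex strings.
function utf8Bytes(ch) {
    return Array.from(new TextEncoder().encode(ch)).map(function (b) {
        return b.toString(16).toUpperCase().padStart(2, "0");
    });
}

// Word's smart quotes and em dash: each prints a three-byte sequence.
var smartChars = ["\u2018", "\u2019", "\u201C", "\u201D", "\u2014"];
smartChars.forEach(function (ch) {
    var codePoint = "U+" + ch.charCodeAt(0).toString(16).toUpperCase();
    console.log(ch, codePoint, utf8Bytes(ch).join(" "));
});
```

For example, the right single quote (U+2019) is E2 80 99 in UTF-8. If those bytes are later read as Windows-1252, they show up as `â€™`, which is the kind of garbage that appears when the encodings don't match.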

We've tried several methods to eliminate the problem, but whenever I searched for a fix, I never found anything that worked well.

My boss suggested that maybe we could fix it on the client side, so I dove in and found what seems to be a promising fix. It's clean and simple. For now, this function only replaces the single and double smart quotes and the em dash (—) that MS Word uses. Feel free to submit more character conversion codes and I'll add them to the function. I'll also update this as I come across more problems.

Javascript function to replace Microsoft Smart Quotes with regular quotes.

 
function removeMSWordChars(str) {
    // Map Word's "smart" character codes to plain ASCII character codes.
    var myReplacements = {
        8216: 39, // left single quote  -> '
        8217: 39, // right single quote -> '
        8220: 34, // left double quote  -> "
        8221: 34, // right double quote -> "
        8212: 45  // em dash            -> -
    };
    var result = "";
    // Declare the loop counter with var so it doesn't leak into the global scope.
    for (var c = 0; c < str.length; c++) {
        var myCode = str.charCodeAt(c);
        if (myReplacements.hasOwnProperty(myCode)) {
            result += String.fromCharCode(myReplacements[myCode]);
        } else {
            result += str.charAt(c);
        }
    }
    return result;
}
 
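The loop above works because every replacement is exactly one character. To map a character to more than one (for example, an ellipsis to three periods), a single regular expression with a lookup table is simpler. This is a sketch, not the original post's code: `replaceWordChars` is a hypothetical name, and the extra entries (en dash U+2013, ellipsis U+2026, non-breaking space U+00A0) are an assumption about what else tends to appear when pasting from Word.

```javascript
// Hypothetical variant: map Word characters to plain-text replacements.
// The first five entries match removeMSWordChars; the last three are
// assumed additions.
var WORD_CHAR_MAP = {
    "\u2018": "'",   // left single quote
    "\u2019": "'",   // right single quote / apostrophe
    "\u201C": '"',   // left double quote
    "\u201D": '"',   // right double quote
    "\u2014": "-",   // em dash
    "\u2013": "-",   // en dash (assumed addition)
    "\u2026": "...", // ellipsis (assumed addition)
    "\u00A0": " "    // non-breaking space (assumed addition)
};

// One character class built from the map's keys, applied globally.
// This only works while no key is a regex-special character like "]" or "\".
var WORD_CHAR_PATTERN = new RegExp("[" + Object.keys(WORD_CHAR_MAP).join("") + "]", "g");

function replaceWordChars(str) {
    return str.replace(WORD_CHAR_PATTERN, function (ch) {
        return WORD_CHAR_MAP[ch];
    });
}
```

To use it, call `replaceWordChars(this.value)` in the handlers below in place of `removeMSWordChars(this.value)`.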

The following jQuery runs the filter on every textarea on your page whenever a textarea loses focus, for example when you tab away from it. (This assumes jQuery is installed and loaded on the page.)

 
$(function(){
    $("textarea").blur(function(){
        $(this).val(removeMSWordChars(this.value));
    });
});
 
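Note that `$("textarea").blur(...)` only binds to the textareas that exist when the code runs. Without jQuery, a similar site-wide binding can be done with standard DOM APIs through event delegation. In this sketch, `bindTextareaFilter` is a hypothetical helper, not part of the original post. It listens for `focusout` on a root element. Unlike `blur`, `focusout` bubbles, so one listener also covers textareas added to the page later.

```javascript
// Hypothetical helper: run `filter` on any textarea inside `root`
// when that textarea loses focus. A single delegated listener on `root`
// covers every current and future textarea.
function bindTextareaFilter(root, filter) {
    root.addEventListener("focusout", function (event) {
        var el = event.target;
        // tagName is upper-case for HTML elements in an HTML document.
        if (el && el.tagName === "TEXTAREA") {
            el.value = filter(el.value);
        }
    });
}

// Usage in a page where removeMSWordChars is already defined:
// bindTextareaFilter(document, removeMSWordChars);
```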

[Image: Removing smart quotes JavaScript example]

Or, if you don't use jQuery and you're a little green with JavaScript, you can do this:

 
<textarea onBlur="this.value=removeMSWordChars(this.value);" name="a" rows=5 cols=10></textarea>
 

6 comments to "Replace Microsoft Word smart quotes and other characters with Javascript"

#78
November 11, 2009 at 11:02 pm
Consider using this: str = str.replace(new RegExp(String.fromCharCode(8216), 'g'), "'");
#83
gusti says:
November 19, 2009 at 03:22 am
You can also try this one: http://www.dancrintea.ro/doc-to-pdf/
#84
November 19, 2009 at 08:54 am
Gusti, how would a Document-to-PDF generator written in Java work for this purpose?
#100
colin says:
January 4, 2010 at 12:16 pm
Just a big-picture question: why use the jQuery code when the "green" version is preferable? Neither form allows for code reuse, but the green version is shorter, simpler, and more self-documenting.
#101
January 4, 2010 at 12:32 pm
The difference is that I can put this in an app.js file (or something similar) that is loaded on every page by our header or template PHP file. When I wrote this, our application had over 120 forms with probably around 300 textareas in total, and this function is safe to bind to every one of them. So that jQuery code binds the action to each textarea site-wide. Otherwise I'd have to add the onblur attribute to every single textarea I wanted the function to run on.

jQuery syntax is a bit different if you're not used to using it, but it's very powerful. Hope that answers your question.
#107
Phil says:
February 25, 2010 at 10:54 am
Great code!! I have been looking for something like this for a long time... Thanks!