This is slightly reinventing the wheel, but I have released a new package called Dumbquotes. The idea is to replace simple typographic shortcuts with their more correct forms, such as replacing a ' with ‘ or ’. It also gave me the chance to try writing a package, which meant making sure it is PSR-0 compliant and has associated unit tests that run with PHPUnit.
The package deals with apostrophes, quotes, dashes, and ellipses. There are certain limitations. It is designed for plain text, such as a Markdown document, and it does not work with HTML. Trying to parse HTML with regex will bring the return of Cthulhu. However, once you deal with HTML directly, things get a little complicated.
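For the plain-text case, the kind of substitution involved can be sketched roughly as follows. This is an illustration in Python, not the Dumbquotes source (which is a PHP package). The function name `smarten` and the exact rules are assumptions, in particular the mapping of `--` to an en dash and `---` to an em dash, which follows a common SmartyPants-style convention rather than anything stated here.

```python
import re


def smarten(text: str) -> str:
    """Replace plain ASCII punctuation with typographic forms (illustrative only)."""
    # Ellipsis: three dots become a single ellipsis character.
    text = text.replace("...", "\u2026")
    # Dashes: handle the triple hyphen first so it is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014")  # em dash (assumed convention)
    text = text.replace("--", "\u2013")   # en dash (assumed convention)
    # Double quotes: a non-greedy match pairs each opening quote with the next one.
    text = re.sub(r'"(.*?)"', "\u201c\\1\u201d", text)
    # Apostrophes inside words (it's, don't) become right single quotes.
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    # Any remaining paired single quotes become left/right single quotes.
    text = re.sub(r"'(.*?)'", "\u2018\\1\u2019", text)
    return text


print(smarten('She said "it\'s fine"...'))
```

The order of the rules matters: in-word apostrophes are converted before paired single quotes so that a word like "it's" is not mistaken for the start of a quotation. A simple sketch like this will still get edge cases wrong, such as a leading apostrophe in "'tis".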
Consider the following sentence that could appear in some HTML: `<p>Mary said "How <em>did</em> she do that?"</p>`. We want to turn this into `<p>Mary said “How <em>did</em> she do that?”</p>`. This is complicated by the fact that we can't just search for a string of text containing two double quotes with something like `/"(.*?)"/`. The sentence doesn't actually appear as a single string in the HTML DOM. We actually have three blocks of text:
- `Mary said "How`
- `did`
- `she do that?"`
Concatenating these into a single string and then putting the tags back in the right place seems a very difficult task. So I have decided that the Dumbquotes parser should be applied before the Markdown transform.
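The split into three text blocks described above can be seen with a small sketch using Python's standard `html.parser` module. This is only a demonstration of the problem, not part of Dumbquotes, and the `TextCollector` class is a hypothetical helper written for this example.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect each run of text between tags as a separate chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once for each run of text found between tags.
        self.chunks.append(data)


markup = '<p>Mary said "How <em>did</em> she do that?"</p>'
collector = TextCollector()
collector.feed(markup)
collector.close()

# The same quote-pair pattern as in the post, written as a Python regex.
quote_pair = re.compile(r'"(.*?)"')

print(collector.chunks)
# No single chunk contains both the opening and the closing quote.
print([bool(quote_pair.search(chunk)) for chunk in collector.chunks])
# The pattern only matches once the chunks are joined, by which point
# the position of the <em> tags has been lost.
print(bool(quote_pair.search("".join(collector.chunks))))
```

Each text chunk holds at most one double quote, so the pattern never matches a chunk on its own. Running the replacement on the plain text before it is converted to HTML avoids the problem entirely.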
*[HTML]: HyperText Markup Language
*[DOM]: Document Object Model