In principle, I think you are making the whole thing too complicated, given that at the moment nothing is internationalized yet.
Why do you want to write a parser if you still have to find all the strings by hand before you can use it? I think that's just a lot of unnecessary overhead.
My idea would be much simpler :-)
- In code and templates, just use _($namespace, $entity, $arg1, $arg2, ...)
- When introducing a new string to localize, just add it to an XML file (one per locale) with all the needed information: namespace, entity, context, priority, string, and maybe also a boolean indicating whether the string is already escaped (HTML) or not.
That way you can then write something like a compiler that generates your PHP array file containing only namespace, entity and string (and maybe the boolean). If someone needs to translate into another language, he has all the needed data in one XML file per module, and it also becomes quick and easy to write desktop programs that simplify the translation process.
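A rough sketch of such a compiler, again in Python and assuming the same hypothetical XML layout (`locale`/`entity`/`string`, with `namespace`, `name` and `escaped` attributes). The shape of the generated PHP array (`$lang[locale][namespace][entity] = array(string, escaped)`) is also just an assumption, not necessarily what Silverstripe would expect. Context and priority are dropped because they are only useful to translators, not at runtime.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical locale source file, same layout as the translators' XML.
SOURCE = """<locale code="it_IT">
  <entity namespace="Form" name="SUBMIT" priority="high" escaped="false">
    <context>Label of the submit button</context>
    <string>Invia</string>
  </entity>
  <entity namespace="Page" name="FOOTER" priority="low" escaped="true">
    <context>Footer text, already contains HTML</context>
    <string>&lt;b&gt;Grazie&lt;/b&gt; per l'attenzione</string>
  </entity>
</locale>"""


def php_str(value):
    """Quote a Python string as a single-quoted PHP string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def compile_locale(xml_text):
    """Turn one locale XML file into PHP source, keeping only runtime fields."""
    root = ET.fromstring(xml_text)
    locale = root.get("code")
    grouped = defaultdict(list)  # namespace -> [(entity, string, escaped)]
    for node in root.iter("entity"):
        grouped[node.get("namespace")].append((
            node.get("name"),
            node.findtext("string"),
            node.get("escaped") == "true",
        ))
    lines = ["<?php", f"$lang[{php_str(locale)}] = array("]
    for namespace in sorted(grouped):
        lines.append(f"  {php_str(namespace)} => array(")
        for entity, text, escaped in grouped[namespace]:
            flag = "true" if escaped else "false"
            lines.append(f"    {php_str(entity)} => array({php_str(text)}, {flag}),")
        lines.append("  ),")
    lines.append(");")
    return "\n".join(lines) + "\n"


print(compile_locale(SOURCE))
```

Because the translators only ever touch the XML, the generated PHP file can be thrown away and rebuilt whenever a translation changes.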
In my opinion, the biggest problem with your underscore function is that, used that way, everyone has to write their modules in en_AU or whatever the default is for Silverstripe. But say someone creates a module for an Italian site: obviously he doesn't care much about the English translation and just puts Italian strings everywhere.
Translating that module is then a big pain.
But if he just uses entities in the code/templates and creates the XML file with all the information in Italian, someone else can simply take that XML file, translate it, and everything will work.
Maybe you could write a simple tool (in PHP or anything else that is cross-platform) to make creating those XML files easy. Or you could simply keep the strings in a database table and write an exporter.
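For the database variant, an exporter could look roughly like this Python sketch. The `translations` table and its columns are hypothetical, and an in-memory SQLite database stands in for whatever database the site actually uses; the output follows the same assumed XML layout as above (`locale`/`entity`/`context`/`string`).

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_locale(conn, locale):
    """Write every row for one locale into the per-locale XML format."""
    root = ET.Element("locale", code=locale)
    rows = conn.execute(
        "SELECT namespace, entity, context, priority, string, escaped "
        "FROM translations WHERE locale = ? ORDER BY namespace, entity",
        (locale,),
    )
    for namespace, entity, context, priority, text, escaped in rows:
        node = ET.SubElement(
            root, "entity", namespace=namespace, name=entity,
            priority=priority, escaped="true" if escaped else "false",
        )
        ET.SubElement(node, "context").text = context
        ET.SubElement(node, "string").text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical table holding all strings for all locales.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations (locale TEXT, namespace TEXT, entity TEXT,"
    " context TEXT, priority TEXT, string TEXT, escaped INTEGER)"
)
conn.executemany(
    "INSERT INTO translations VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("it_IT", "Form", "SUBMIT", "Label of the submit button", "high", "Invia", 0),
        ("it_IT", "Member", "GREETING", "%s is the first name", "medium", "Ciao %s!", 0),
        ("de_DE", "Form", "SUBMIT", "Label of the submit button", "high", "Senden", 0),
    ],
)
print(export_locale(conn, "it_IT"))
```

The nice side effect is that the exported file is exactly what a translator would receive, so the database and the XML workflow can coexist.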
I think that way the implementation is much easier to program, and also easier to use for non-native English speakers.
Tell me what you think about that. I hope I've helped you and not confused you :-)