IPC Blog

The PHPUnuhi Framework at a Glance

Translations for pipelines, OpenAI, and more

Jul 14, 2023

While pipelines, tests, and automation positively influence many aspects of our daily work, there are still topics where manual work makes developers yawn. The platform-independent open source framework PHPUnuhi is trying to revamp the topic of “translations”, enhancing it with possibilities in the areas of CI/CD, storage formats, and even OpenAI.

Who hasn’t had the following situation? You’re working on an application, a plug-in, or something similar and suddenly discover that translations in some language are missing. Depending on the software’s application area, this can either make the user smile slightly, or it can have far-reaching consequences. But one thing is always the same. The non-functional requirement “trust in the software” is harmed.

It’s a pity that mistakes happen here again and again. Meanwhile, there are tools like PHPUnit, PHPStan, and many others that help create high-quality applications. But what about translations? Wouldn’t it be wonderful if the pull request pipeline failed right when a colleague forgot a translation? Or when individual localizations drift out of sync and end up with different, invalid structures? This is exactly PHPUnuhi’s approach. But let’s start at the beginning.

Among other things, I’m the developer of the official Mollie payment plug-ins for Shopware. These plug-ins serve as central and optionally installable modules in online stores, based on Shopware 5 or Shopware 6 [1]. Merchants can install these plug-ins in no time and offer a wide range of payment methods from Mollie in their Shopware store [2]. Anyone who’s ever had to do anything in this area knows that payment is a serious sector. In short, it’s about money. There aren’t many excuses when a mistake happens. It just has to work!

Because of this, we’ve already spent a lot of time building pipelines. These range from the usual unit tests and static analysis to many E2E tests based on Cypress. But despite these precautions, it happens again and again that translations for multilingual plug-ins are forgotten. Every developer and tester knows it’s difficult to verify all areas in all languages, especially as a small team. But for the product’s end user, it simply looks embarrassing and untested.

So one day I decided to integrate a small script that would do at least some rough checks. Lo and behold, soon after, the first pipeline failed when I forgot a translation.

From then on, there were always one or two ideas for further tests and features. And so I decided to completely rebuild the previously small script from the Mollie plug-ins and publish it in combination with many other requirements as a platform-independent open source framework. After all, the world only benefits when more developers get something out of it.

But before we begin our first application, why “unuhi”? Quite simply, it means “translate” or “translation” in Hawaiian. Do I speak Hawaiian? No.

First steps

Before we get into the possibilities and basic concepts of PHPUnuhi, I’d like to start directly with its usage, because after just a few steps the tests are ready to be integrated into a pipeline. Let’s imagine we’re developers of a piece of software that has several translations based on JSON files. These already exist and are located in the project or source code of the application.

You can easily install PHPUnuhi with Composer. The recommendation is to do so as a dev dependency:

composer require --dev boxblinkracer/phpunuhi

After installation, all that’s needed is to create an XML-based configuration, and the framework is ready for use.

In our configuration (phpunuhi.xml), we define one or more translation sets. These sets are freely definable bundles of localizations. Depending on the format, each localization is mapped to a file, a section, or something similar. You can create either one large set or several topic-based sets, depending on the platform and application requirements (Listing 1).

<phpunuhi>
  <translations>
    <set name="App">
      <format>
        <json/>
      </format>
      <locales>
        <locale name="de">./snippets/de.json</locale>
        <locale name="en">./snippets/en.json</locale>
      </locales>
    </set>
  </translations>
</phpunuhi>

With that, we’re already finished with basic installation and configuration. Now we can start our tests and check how the translations are doing.

php vendor/bin/phpunuhi validate

Who could see it coming? Unfortunately, the tests fail. We received information that a wrong structure was found, and that a translation exists but doesn’t contain a value.

PHPUnuhi identifies each individual translation by a unique key within a localization. In our case, there’s an issue with the key card.btnCancel in both the German and the English version (Fig. 1).

(Editor’s note: This article was originally published in German and has been translated into English. Therefore, the translation example in PHPUnuhi is working from German to English.)

Fig. 1: Example of error output during validation
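To make these two findings more tangible, here is a minimal Python sketch of such a check. It is only an illustration and not PHPUnuhi’s actual implementation (which is written in PHP); the function names and the sample file contents are assumptions.

```python
def flatten(data, prefix=""):
    """Turn nested dicts into flat pairs such as {"card.btnCancel": "..."}."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def validate(locales):
    """Return a sorted list of (locale, key, problem) tuples."""
    flat = {name: flatten(data) for name, data in locales.items()}
    # Every key that appears in at least one localization
    all_keys = set().union(*(entries.keys() for entries in flat.values()))
    problems = []
    for name, entries in flat.items():
        for key in all_keys:
            if key not in entries:
                problems.append((name, key, "missing"))
            elif str(entries[key]).strip() == "":
                problems.append((name, key, "empty"))
    return sorted(problems)


# Hypothetical contents of de.json and en.json
locales = {
    "de": {"card": {"title": "Karte"}},
    "en": {"card": {"title": "Card", "btnCancel": ""}},
}

for problem in validate(locales):
    print(problem)
```

With this sample data, the sketch reports card.btnCancel as missing in de and as empty in en.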

To solve this problem, we have the option of manually entering the missing entry in the de.json file, or we can use a prepared command to automatically repair the structures:

php vendor/bin/phpunuhi fix:structure

This will give us a uniform structure in both files. Now we can run the following command and automatically correct our empty translation too.

php vendor/bin/phpunuhi translate --service=googleweb

With Google’s support, our empty entry has now been automatically translated and entered into the corresponding JSON file. Besides Google [3], DeepL [4] and OpenAI [5] can also be used for this. But before we delve deeper into this topic, it’s time to get to know the basic framework better.

PHPUnuhi’s basic structure

PHPUnuhi consists of a combination of different abstraction layers. This makes it possible to guarantee basic functionality while still being flexible in choosing formats and services. What does this mean?

In the current version, there are three basic pillars: storage formats, exchange formats, and translation services. These are in constant interaction and can be combined with each other as you wish (Fig. 2).

Fig. 2: Basic structure of the PHPUnuhi abstraction layers

Storage formats

Storage formats define how data is persisted. Translations can be stored in JSON files, INI files, PHP (array) files, or directly in a database (Shopware 6). Therefore, the focus of Storages is on reading, converting, and writing translations.

Different formats can also be equipped with individual settings. For instance, the JSON and PHP formats let you specify the indentation width and alphabetical sorting. In the case of the Shopware 6 storage, the entity of the database entries can (and must) be specified. Listing 2 shows two examples for the INI and Shopware 6 formats.

<set name="Storefront">
  <format>
    <ini indent="4" sort="true"/>
  </format>
  ...
</set>
 
<set name="Products">
  <format>
    <shopware6 entity="product"/>
  </format>
  ...
</set>

While simpler formats like JSON, INI, and PHP are based on simple data structures, there are also formats that divide translations into groups, like Shopware 6. The Shopware 6 format directly connects to the database, so a corresponding connection to the database must be established first. The parameters needed for this connection can be stored easily with an env area in the XML configuration or specified directly via env export (Listing 3).

<phpunuhi>
  <php>
    <env name="DB_HOST" value="127.0.0.1"/>
    <env name="DB_PORT" value="3306"/>
    <env name="DB_USER" value=""/>
    <env name="DB_PASSWD" value=""/>
    <env name="DB_DBNAME" value="shopware"/>
  </php>
</phpunuhi>

But back to our groups. Shopware 6 works as a storage with entities in the database. These are things like products, payment types, currencies, and more. Here, translations don’t refer to the general names of properties, but to product data or user data in the system.

This means that each entry of these entities (for instance, a single product) has multiple properties (name, description, etc.) that can be translated into different languages. The resulting additional dimension in our matrix is solved in PHPUnuhi using groups. Each entity (each product) receives a unique group ID with all associated translations. Table 1 shows an example of this.

Key          Group      DE                    EN
name         product-1  PHP Magazin           PHP Magazine
description  product-1  ein tolles Heft       a great magazine
name         product-2  Entwickler Magazin    Developer Magazine
description  product-2  auch ein tolles Heft  also a great magazine

Table 1: Example of generated translation structures based on groups
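The way entities turn into grouped rows can be sketched in a few lines of Python. This is only an illustration of the idea, not PHPUnuhi’s implementation; the data layout and the function name to_rows are assumptions.

```python
def to_rows(entities, locales):
    """Flatten translatable entities into (key, group, value per locale) rows."""
    rows = []
    for group_id, properties in entities.items():
        for key, translations in properties.items():
            # A missing localization shows up as an empty string
            values = [translations.get(locale, "") for locale in locales]
            rows.append((key, group_id, *values))
    return rows


# Hypothetical product entities, taken from Table 1
entities = {
    "product-1": {
        "name": {"de": "PHP Magazin", "en": "PHP Magazine"},
        "description": {"de": "ein tolles Heft", "en": "a great magazine"},
    },
    "product-2": {
        "name": {"de": "Entwickler Magazin", "en": "Developer Magazine"},
        "description": {"de": "auch ein tolles Heft", "en": "also a great magazine"},
    },
}

for row in to_rows(entities, ["de", "en"]):
    print(row)
```

Each entity contributes one row per property, and the group ID keeps rows with the same key (such as name) apart.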

Considering that products in particular can have many properties, this list can get very long. There’s also a high chance that only a part of the properties should even be translated at all. This is where another storage format feature comes into play: the filters.

With include or exclude filters, you can include or exclude certain translations. Wildcard placeholders can also be used for this. The configuration in Listing 4 removes the custom_fields property and all properties beginning with meta_ from the translation list.

<set>
  ...
  <filter>
    <exclude>
      <key>custom_fields</key>
      <key>meta_*</key>
    </exclude>
  </filter>
  ...
</set>
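The effect of such an exclude filter can be sketched in Python with shell-style wildcard matching. This is an illustration only: the function apply_exclude and the sample keys are assumptions, and PHPUnuhi’s own wildcard semantics may differ in detail from Python’s fnmatch module.

```python
from fnmatch import fnmatchcase


def apply_exclude(keys, patterns):
    """Keep only the keys that match none of the exclude patterns."""
    return [
        key for key in keys
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    ]


# Hypothetical product properties
keys = ["name", "description", "custom_fields", "meta_title", "meta_description"]

print(apply_exclude(keys, ["custom_fields", "meta_*"]))
# -> ['name', 'description']
```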

Exchange formats

This type of format or abstraction layer is used for exchange with other systems. It focuses on data preparation suitable for the format and the storage (export), as well as reading certain file types for conversion back into PHPUnuhi compatible translations (import).

Of course, classic CSV can’t be left out. It supports the export and import of both simple and extended (grouped) storage formats.

In other words, no matter what your storage format is, you will receive a CSV file. If the storage you use supports writing translations, then the CSV file can be automatically imported again.
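The round trip between a storage and a CSV file can be sketched as follows. The column layout used here (a key column followed by one column per locale) and the function names are assumptions for illustration; the files PHPUnuhi actually writes may be structured differently.

```python
import csv
import io


def export_csv(translations, locales):
    """Serialize {key: {locale: value}} to CSV text, one row per key."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", *locales])
    for key in sorted(translations):
        writer.writerow([key] + [translations[key].get(loc, "") for loc in locales])
    return buffer.getvalue()


def import_csv(text):
    """Read the CSV text back into {key: {locale: value}}."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row.pop("key")
        result[key] = dict(row)
    return result


# Hypothetical translations
translations = {
    "card.title": {"de": "Karte", "en": "Card"},
    "card.btnCancel": {"de": "Abbrechen", "en": ""},
}

text = export_csv(translations, ["de", "en"])
print(text)
```

Importing the exported text again yields the same translations, which is what makes the format usable for handing files to an agency and taking them back.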

Besides CSV, there’s also an integrated HTML format. This format solves several problems at once. The export creates a single index.html file that can be easily opened in any browser. This file contains an HTML-based spreadsheet with integrated editing options and a way to save the adjustments. CSS and JavaScript are directly integrated. This is a great plug-and-play approach, especially for colleagues who tend to send back .xls files instead of the needed CSV files.

However, more than just local processing is possible. There is another variant that’s just as exciting, for instance for staging systems. Since the export path can be selected individually, it’s possible to store this file in a public directory on the web server. This way, a certain URL on the staging system can output an overview of all currently available translations. Thanks to the integrated form, these can also be edited directly. The resulting output can be downloaded and imported into the software with the import command for the next iteration. To add even more automation, this export can be generated either by a post-deployment job in the pipeline or at a fixed interval via a cron job or something similar.

The HTML format also supports storage formats with groups. In this case, grouped translations are displayed visually so that translation can be done intuitively. Figures 3 and 4 show examples of HTML and CSV exports.

Fig. 3: Example of HTML export with integrated form

Fig. 4: Example of a CSV export with three languages

Translation services

The last abstraction area in the current version is connecting to different translation providers. Currently, it supports Google, DeepL, and OpenAI. This makes it possible for missing translations to be automatically added with an integrated translate command. Thanks to the framework’s basic concept, this means that all kinds of storage formats that support writing translations can also be combined with translation services at the same time.

PHPUnuhi only needs an existing value in another language as a basis for this automation. If this is the case, the translation can be requested from the external service. The result is automatically persisted with the configured storage.

Further individual configurations are provided when integrating different providers. For instance, with DeepL you can use the --deepl-formal argument to specify whether the translation should be formal or informal. This affects, for example, the German forms of address “du” and “Sie”.

The googleweb service can be used for a quick start. It sends a simple query to the familiar Google Translate website:

php vendor/bin/phpunuhi translate --service=googleweb

Although this isn’t recommended for continuous mass queries, it usually works quite well and is fine for targeted use.

If you want to take a more professional approach, you can also connect to Google Cloud Translate and, as previously mentioned, to DeepL, which is becoming increasingly successful. For AI enthusiasts, there is now also an OpenAI integration. It currently uses the text-davinci-003 model, which isn’t perfect yet but already delivers surprisingly good results. OpenAI can be used with the following command by specifying the corresponding service along with the API key:

php vendor/bin/phpunuhi translate --service=openai --openai-key=(my-api-key)

What functions are available?

Now that we understand the basic framework and some of its possibilities, we can take a closer look at the framework’s extended functionality.

With the help of a few commands, you can perform much more than simple translation testing. State analysis, listings, reporting, imports and exports offer a multitude of possibilities for your project.

Translation coverage

With the status command, you can output the translation coverage. Values are provided on the level of localizations, translation sets, and as an overall view:

php vendor/bin/phpunuhi status
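How such coverage values can be derived is shown in the following Python sketch for a single translation set. It is an illustration under assumptions: the data layout, the function names, and the rounding are hypothetical, and the numbers PHPUnuhi prints may be calculated differently.

```python
def percent(part, whole):
    """Percentage rounded to two decimals; an empty set counts as fully covered."""
    return round(100 * part / whole, 2) if whole else 100.0


def coverage(translations, locales):
    """Return (coverage per locale, overall coverage) for {key: {locale: value}}."""
    filled = {
        locale: sum(
            1 for values in translations.values()
            if values.get(locale, "").strip()
        )
        for locale in locales
    }
    total = len(translations)
    per_locale = {locale: percent(filled[locale], total) for locale in locales}
    overall = percent(sum(filled.values()), total * len(locales))
    return per_locale, overall


# Hypothetical translation set with one empty English value
translations = {
    "card.title": {"de": "Karte", "en": "Card"},
    "card.btnCancel": {"de": "Abbrechen", "en": ""},
    "card.btnSave": {"de": "Speichern", "en": "Save"},
    "card.hint": {"de": "Hinweis", "en": "Hint"},
}

per_locale, overall = coverage(translations, ["de", "en"])
print(per_locale, overall)
# -> {'de': 100.0, 'en': 75.0} 87.5
```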

Validation

One of the framework’s core functions is the validate command. As I previously mentioned, you can test translations for completeness. But the command also has some other useful features.

A problem that occurs frequently during further software development is an unplanned variation in the spelling of translation keys. While code style gets plenty of attention, little consideration is given to the fact that text modules should also have a consistent structure. Using case style validation, you can maintain the consistency of keys over the project’s lifecycle. PHPUnuhi offers a list of potential options, like the well-known variants Pascal, Camel, Kebab, and more.

Therefore, a translation set can consist of several potential case styles. If no styles are specified, the whole test is skipped. The actual test based on this list works for simple storage formats and for multi-nested storages like JSON and PHP. Here, all hierarchy levels are checked for the specified styles.

Optionally, you can also pin different styles to certain levels. For a nested structure like JSON, Pascal Case can be defined at the root level, while Kebab Case must be used at all other levels (Listing 5).

<set>
  <styles>
    <style level="0">pascal</style>
    <style>kebab</style>
  </styles>
</set>
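The level-based check from Listing 5 can be sketched in Python as follows. This is a simplified illustration, not PHPUnuhi’s implementation: the regular expressions, the function check_styles, and the rule that a level-specific style replaces the general one are all assumptions.

```python
import re

# Simplified patterns for three of the case styles (assumed definitions)
STYLES = {
    "pascal": re.compile(r"^[A-Z][a-zA-Z0-9]*$"),
    "camel": re.compile(r"^[a-z][a-zA-Z0-9]*$"),
    "kebab": re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$"),
}


def check_styles(data, level_styles, default_styles, level=0, path=""):
    """Return the paths of all keys that match none of the allowed styles."""
    errors = []
    # Styles pinned to this level win over the general list
    allowed = level_styles.get(level, default_styles)
    for key, value in data.items():
        full = f"{path}.{key}" if path else key
        # An empty style list means the check is skipped
        if allowed and not any(STYLES[style].match(key) for style in allowed):
            errors.append(full)
        if isinstance(value, dict):
            errors.extend(
                check_styles(value, level_styles, default_styles, level + 1, full)
            )
    return errors


# Hypothetical nested translation file
data = {
    "Card": {"btn-cancel": "Abbrechen", "btnSave": "Speichern"},
    "checkout": {"title": "Kasse"},
}

# Mirrors Listing 5: Pascal Case at level 0, Kebab Case everywhere else
print(check_styles(data, {0: ["pascal"]}, ["kebab"]))
# -> ['Card.btnSave', 'checkout']
```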

Friends of JUnit reports will also get their money’s worth with PHPUnuhi. With the report-format argument, you can generate a JUnit compliant XML file:

php vendor/bin/phpunuhi validate --report-format=junit --report-output=junit.xml

This file contains all tests performed along with the corresponding error reports. It can be used in the familiar way and processed by CI tools.
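For readers who haven’t looked inside such a report, the following Python sketch builds a small XML document in the commonly used JUnit shape (testsuite, testcase, failure). The test names and messages are invented, and the exact elements and attributes PHPUnuhi writes are not shown in this article, so treat the layout as an assumption.

```python
import xml.etree.ElementTree as ET


def junit_report(results, suite_name="translations"):
    """Build JUnit-style XML from a list of (test_name, error_or_None) pairs."""
    failures = sum(1 for _, error in results if error)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for name, error in results:
        case = ET.SubElement(suite, "testcase", name=name)
        if error:
            # A failed test carries a <failure> child with the message
            ET.SubElement(case, "failure", message=error)
    return ET.tostring(suite, encoding="unicode")


# Hypothetical validation results
results = [
    ("en: card.title", None),
    ("en: card.btnCancel", "translation is empty"),
]

report = junit_report(results)
print(report)
```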

Fix structure

With large file-based translations like JSON and INI, manually fixing different structures can be extremely time-consuming, even more so if they span several hierarchies or levels. This can be automated and simplified using the integrated fix:structure command.

In the process, PHPUnuhi verifies individual structures and ensures that each localization also receives all of the entries. As a little bonus, the storage formats also rewrite values with previously configured indentations or even in alphabetical order, depending on the type:

php vendor/bin/phpunuhi fix:structure

I should mention that this is only a matter of repairing structures. The values are stored with an empty string, so a validation still fails.
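The repair itself boils down to a simple idea, sketched here in Python on flattened keys. It is an illustration only; the function fix_structure and the always-on alphabetical sorting are assumptions, not PHPUnuhi’s actual behavior.

```python
def fix_structure(locales):
    """Give every localization the same keys; missing values become ""."""
    all_keys = set().union(*(entries.keys() for entries in locales.values()))
    keys = sorted(all_keys)  # alphabetical order, as a sorting option would do
    return {
        locale: {key: entries.get(key, "") for key in keys}
        for locale, entries in locales.items()
    }


# Hypothetical flattened contents of de.json and en.json
locales = {
    "de": {"card.title": "Karte"},
    "en": {"card.title": "Card", "card.btnCancel": "Cancel"},
}

fixed = fix_structure(locales)
print(fixed["de"])
# -> {'card.btnCancel': '', 'card.title': 'Karte'}
```

The German file now has the same keys as the English one, but card.btnCancel is still empty, which is why a subsequent validation keeps failing until the value is filled in.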

Export/Import

Exports and imports provide a simple way of working with external agencies and systems. Using a simple export command and selecting a format, you can quickly create files that can be passed on to systems or people:

php vendor/bin/phpunuhi export ... --format=csv
php vendor/bin/phpunuhi export ... --format=html

If no special translation set is specified, then all sets will be exported to separate files. However, as with many commands, you can also select a set by argument and have only this set processed. After the customized results have been returned, they can be imported back into the system with an import command:

php vendor/bin/phpunuhi import --set=storefront --file=storefront.csv

It should be noted here that version control using Git or something similar is strongly recommended, especially when working with file-based storage formats. For storage formats using a database, an appropriate back-up should also be made before the import.

Translate

The translate command is one of the more exciting features along with the validate command. As already described in the “Translation Services” section, an external service can be used to automatically translate values. A service is simply selected with the service argument.

PHPUnuhi now goes through all existing entries and tries to fill empty translations using the specified service. The existing value of another language serves as the basis. Only one such value needs to exist; if there is none, the entry cannot be translated.

php vendor/bin/phpunuhi translate --service=googleweb
php vendor/bin/phpunuhi translate --service=deepl --deepl-key=xyz
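The selection logic described above can be sketched in Python. The translator is passed in as a plain function so that no real service is called; translate_missing, fake_translator, and the data layout are assumptions for illustration and not PHPUnuhi’s API.

```python
def translate_missing(translations, translator, force_locale=None):
    """Fill empty values in {key: {locale: value}} from another locale.

    translator(text, source_locale, target_locale) returns the translated text.
    If force_locale is given, that locale is retranslated even when filled.
    """
    for values in translations.values():
        for locale, value in values.items():
            if value.strip() and locale != force_locale:
                continue  # already translated
            # Use the first non-empty value of another locale as the basis
            source = next(
                ((loc, val) for loc, val in values.items()
                 if loc != locale and val.strip()),
                None,
            )
            if source is None:
                continue  # no basis available, entry cannot be translated
            values[locale] = translator(source[1], source[0], locale)
    return translations


def fake_translator(text, source, target):
    """Stand-in for a real service; only marks what would be translated."""
    return f"[{source}->{target}] {text}"


# Hypothetical translations: one fillable entry, one without any basis
translations = {
    "card.btnCancel": {"de": "Abbrechen", "en": ""},
    "card.hint": {"de": "", "en": ""},
}

print(translate_missing(translations, fake_translator))
```

In this run, card.btnCancel receives an English value derived from the German one, while card.hint stays empty because no language provides a basis.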

If you want to completely retranslate an existing localization, you can use the force argument for this. You must specify the locale that will be retranslated.

php vendor/bin/phpunuhi translate --service=googleweb --force=en-GB

But with automated services, it’s important to always remember that translations should be generated depending on the application’s context. Automatically generated results fit most cases, but manual, human verification is still recommended.

Conclusion

As a platform-independent open source framework, PHPUnuhi tries to simplify translation work for developers and teams, while also increasing the possibilities of quality assurance measures. With its simple configuration options, it can be quickly integrated into existing projects and used efficiently after just a few minutes. PHPUnuhi’s possibilities are far from exhausted. So if you feel like joining or just have some ideas, you can participate via the GitHub repository [6].


Links & Literature

[1] https://www.shopware.com

[2] https://www.mollie.com

[3] https://translate.google.com

[4] https://www.deepl.com

[5] https://openai.com

[6] https://github.com/boxblinkracer/phpunuhi
