Contributors: gitlost, zodiac1978
Tags: Unicode, Normalization, Form C, Unicode Normalization Form C, Normalize, Normalizer, UTF-8, NFC
Requires at least: 3.9.13
Tested up to: 5.2.2
Stable tag: 1.0.7
License: GPLv2 or later
License URI: http://www.gnu.org/licenses/gpl-2.0.html
Normalizes UTF-8 input to Normalization Form C.
This is a souped-up version of the Normalizer plugin by Torsten Landsiedel.
It adds WP filters to normalize UTF-8 data coming into the system to the de facto web standard Normalization Form C. The Unicode Consortium report is at Unicode Normalization Forms.
For best performance install (if possible) the PHP Internationalization extension Intl
, which includes the PHP class Normalizer
.
However the plugin works without the PHP Intl
extension being installed, as it uses (a modified version of) the Symfony Normalizer
polyfill.
Also text pasted into (most) inputs is normalized immediately using the JavaScript normalize()
method. For browsers without normalization support, the unorm polyfill is used.
For further info, see the WP Trac ticket #30130 Normalize characters with combining marks to precomposed characters and this Make WP Core comment.
The plugin also works around the Safari-specific issues raised by the WP Trac ticket #22363 Accents in attachment filenames should be sanitized.
For existing data, the plugin includes an administration tool to scan and normalize the database. Important: before using this tool to normalize, please backup your database. This is especially important if your database contains non-normalized serialized data, as this plugin uses the same suck-and-see technique as interconnect/it's Database Search and Replace Script in PHP to deal with serialized data, and is fallible.
A google-cheating schoolboy French translation is supplied.
The plugin should run on PHP 5.2.17 to 8.0, and on WP 3.9.13 to 5.2.2.
The project is on github.
Install in the standard way via the 'Plugins' menu in WordPress and then activate. No further action is needed.
You can add normalization to anything that passes its content through a filter. The canonical way is to use the unfc_extra_filters
filter which returns an array of filter names - for instance, in your theme's functions.php
file, add:
function mytheme_unfc_extra_filters( $extra_filters ) {
$extra_filters[] = 'myfilter';
return $extra_filters;
}
add_filter( 'unfc_extra_filters', 'mytheme_unfc_extra_filters' );
Note that the unfc_extra_filters
filter is only called in the administration backend. You can also add a filter directly, to be called in the frontend or backend, by referencing the global PHP variable unfc_normalize
, but you should ensure that the Normalizer
polyfill is loaded if you don't have the PHP Intl
extension installed:
global $unfc_normalize;
if ( $unfc_normalize ) {
if ( ! function_exists( 'normalizer_is_normalized' ) ) { // If the "Intl" extension is not installed...
$unfc_normalize->load_unfc_normalizer_class(); // ...load the polyfill.
}
add_filter( 'myfilter', array( $unfc_normalize, 'normalize' ), 6 /* Or whatever priority you choose */ );
}
No, it doesn't. Alas, such files would have to re-uploaded. However if (enough - or more like any) interest were expressed...
- Update qunit; run in travis.
- WP 5.2.1 compatible.
- Unicode 12.1.0
- Require explicit define of UNFC_DEBUG. Move load of debug file to 'init'.
- Safari submit workaround.
- WP 4.7.1 compatible.
- Do load of normalizer in slugs list constructor not items list. Fixes slugs listing crash when Intl extension not installed.
- Ignore case when checking charset (UNFC_Normalize::is_blog_utf8()).
- WP coding standards (yoda style in unfc_utf8_4range()).
- Add screenshots.
- Some extra unittests.
- WP 4.7.0 compatible.
- Add _wp_old_slug on normalizing slugs.
- Escape title in screen reader label in db check.
- For PHP 5 performance do preliminary preg_match on isNormalized.
- Improve comments in Normalizer.php, tabs -> 4 spaces, UNFC_REGEX_IS_INVALID_UTF8_XXX invert & rename.
- Fix untested admin notice and adjust tests. Fix some test bleed ($wp_scripts).
- Move all .php files bar main to includes subdir. Remove unused ajax.
- Fix single-byte trie. Move regex alts to tools/functions.php.
- Fix untested admin notice. Adjust tests for untested admin_notice change. Fix some test bleed ($wp_scripts).
- WP 4.6 compatible.
- First release for wordpress.org repository.
- Add assets.
- Include "class-unfc-list_table.php" in .pot file (for forward/backward compat).
- Remove unused variable in UNFC_DB_Check_Slugs_List_Table.
- Initial release.
- Fix bad serialized data corruption on db check.
- Fix text domain tag.
- Initial version after renaming from tl-normalize.
Tested with WordPress 4.8.0.
Less noisy when WP_DEBUG set.
Fixes fatal error on check slugs listing in DB check tool when Intl extension not available.
Now adds _wp_old_slug on normalizing slugs so old links will work.
Improved PHP 5 performance on isNormalized() check.