wp_kses_normalize_entities › WordPress Function

Since: 1.0.0
Deprecated: n/a
wp_kses_normalize_entities ( $content, $context = 'html' )
Parameters: (2)
  • (string) $content Content to normalize entities.
    Required: Yes
  • (string) $context Context for normalization. Can be either 'html' or 'xml'. Default 'html'.
    Required: No
    Default: 'html'
Returns:
  • (string) Content with normalized entities.
Defined at:
Codex:
Change Log:
  • 5.5.0

Converts and fixes HTML entities.

This function normalizes HTML entities. It will convert AT&T to the correct AT&amp;T, &#00058; to &#058;, &#XYZZY; to &amp;#XYZZY; and so on. When $context is set to 'xml', HTML entities are converted to their code points. For example, AT&T&hellip;&#XYZZY; is converted to AT&amp;T…&amp;#XYZZY;.
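The conversions described above can be tried without a WordPress install using the minimal sketch below. It is not the WordPress implementation: `sketch_normalize_entities` and its tiny allowlist of named entities are hypothetical stand-ins, and the sketch skips hexadecimal references, code point validation, and the 'xml' context. It only illustrates the "escape every ampersand, then restore what is recognized" idea.

```php
<?php
// Hypothetical, simplified stand-in for illustration only (not WordPress code).
function sketch_normalize_entities( string $content ): string {
	// Step 1: disarm everything by escaping every ampersand.
	$content = str_replace( '&', '&amp;', $content );

	// Step 2: restore decimal references, trimming leading zeros and
	// padding to three digits (no code point validation in this sketch).
	$content = preg_replace_callback(
		'/&amp;#(0*[0-9]{1,7});/',
		static function ( array $m ): string {
			$digits = ltrim( $m[1], '0' );
			return '&#' . str_pad( $digits, 3, '0', STR_PAD_LEFT ) . ';';
		},
		$content
	);

	// Step 3: restore named references found in a small, assumed allowlist.
	$allowed = array( 'amp', 'lt', 'gt', 'quot', 'hellip' );
	return preg_replace_callback(
		'/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
		static function ( array $m ) use ( $allowed ): string {
			return in_array( $m[1], $allowed, true ) ? '&' . $m[1] . ';' : $m[0];
		},
		$content
	);
}

echo sketch_normalize_entities( 'AT&T' ), "\n";     // AT&amp;T
echo sketch_normalize_entities( '&#00058;' ), "\n"; // &#058;
echo sketch_normalize_entities( '&#XYZZY;' ), "\n"; // &amp;#XYZZY;
```

Anything that does not look like a recognized reference, such as the bare ampersand in AT&T or the malformed &#XYZZY;, simply stays escaped.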


Source

function wp_kses_normalize_entities( $content, $context = 'html' ) {
	// Disarm all entities by converting & to &amp;
	$content = str_replace( '&', '&amp;', $content );

	/*
	 * Decode any character references that are now double-encoded.
	 *
	 * It's important that the following normalizations happen in the correct order.
	 *
	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
	 *
	 * First, numeric (decimal and hexadecimal) character references must be handled so that
	 * `&amp;#09;` becomes `&#09;`. If the named character references were handled first, there
	 * would be no way to know whether the double-encoded character reference had been produced
	 * in this function or was the original input.
	 *
	 * Consider the two examples, first with named entity decoding followed by numeric
	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
	 * string processing from left to right:
	 *
	 * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
	 * | ------------ | ---------------- | ------------------------- | -------------------------- |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
	 *
	 * Notice in the example above that different inputs result in the same result. The second case
	 * was not normalized and produced HTML that is semantically different from the input.
	 *
	 * | Input        | &-encoded        |  Numeric ref double-decoded | Named ref double-decoded |
	 * | ------------ | ---------------- | --------------------------- | ------------------------ |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
	 *
	 * Here, each input is normalized to an appropriate output.
	 */
	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
	if ( 'xml' === $context ) {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
	} else {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
	}

	return $content;
}
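
The ordering argument made in the source comment can be checked with a small standalone sketch. The helper names below are hypothetical and heavily simplified: each one only strips a single level of double-encoding, and `amp` is the only named reference handled. The real callbacks (`wp_kses_normalize_entities2`, `wp_kses_normalize_entities3`, `wp_kses_named_entities`) do more, so treat this as an illustration of the ordering issue rather than of their behavior.

```php
<?php
// Hypothetical helpers for illustration only (not WordPress code).

// Escape every ampersand, as the first step of the function does.
function sketch_disarm( string $s ): string {
	return str_replace( '&', '&amp;', $s );
}

// Turn "&amp;#NNN;" or "&amp;#xHHH;" back into a numeric reference.
function sketch_decode_numeric( string $s ): string {
	return preg_replace( '/&amp;(#[Xx][0-9A-Fa-f]{1,6};|#[0-9]{1,7};)/', '&$1', $s );
}

// Turn "&amp;amp;" back into "&amp;" (only "amp" is handled in this sketch).
function sketch_decode_named( string $s ): string {
	return preg_replace( '/&amp;(amp);/', '&$1;', $s );
}

// Run both orders on the same input so the results can be compared.
function sketch_compare_orders( string $input ): array {
	$disarmed = sketch_disarm( $input );
	return array(
		'named_first'   => sketch_decode_numeric( sketch_decode_named( $disarmed ) ),
		'numeric_first' => sketch_decode_named( sketch_decode_numeric( $disarmed ) ),
	);
}

foreach ( array( '&#x2E;', '&amp;#x2E;' ) as $input ) {
	$r = sketch_compare_orders( $input );
	printf( "%-12s named-first: %-12s numeric-first: %s\n", $input, $r['named_first'], $r['numeric_first'] );
}
// &#x2E;       named-first: &#x2E;       numeric-first: &#x2E;
// &amp;#x2E;   named-first: &#x2E;       numeric-first: &amp;#x2E;
```

With named references decoded first, the already-escaped input `&amp;#x2E;` collapses into the live reference `&#x2E;`, which changes its meaning. Decoding numeric references first keeps the two inputs distinct, matching the second table in the comment.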