libxml2
encoding.h File Reference

Character encoding conversion functions.

Typedefs

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters to UTF-8.
 
typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters from UTF-8.
 
typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)
 Convert between character encodings.
 
typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)
 Free a conversion context.
 
typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)
 If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.
 

Enumerations

enum  xmlCharEncError
 Encoding conversion errors.
 
enum  xmlCharEncoding
 Predefined values for some standard encodings.
 
enum  xmlCharEncFlags
 Encoding conversion flags.
 

Functions

void xmlInitCharEncodingHandlers (void)
 
void xmlCleanupCharEncodingHandlers (void)
 Clean up the memory allocated for char encoding support. This unregisters all encoding handlers and aliases.
 
void xmlRegisterCharEncodingHandler (xmlCharEncodingHandlerPtr handler)
 Register the char encoding handler.
 
xmlCharEncodingHandlerPtr xmlGetCharEncodingHandler (xmlCharEncoding enc)
 
xmlCharEncodingHandlerPtr xmlFindCharEncodingHandler (const char *name)
 If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.
 
xmlCharEncodingHandlerPtr xmlNewCharEncodingHandler (const char *name, xmlCharEncodingInputFunc input, xmlCharEncodingOutputFunc output)
 Creates and registers an xmlCharEncodingHandler.
 
xmlParserErrors xmlCharEncNewCustomHandler (const char *name, xmlCharEncConvFunc input, xmlCharEncConvFunc output, xmlCharEncConvCtxtDtor ctxtDtor, void *inputCtxt, void *outputCtxt, xmlCharEncodingHandler **out)
 Create a custom xmlCharEncodingHandler.
 
int xmlAddEncodingAlias (const char *name, const char *alias)
 Registers alias as an alias for the encoding named name.
 
int xmlDelEncodingAlias (const char *alias)
 Unregisters an encoding alias.
 
const char * xmlGetEncodingAlias (const char *alias)
 Lookup an encoding name for the given alias.
 
void xmlCleanupEncodingAliases (void)
 Unregisters all aliases.
 
xmlCharEncoding xmlParseCharEncoding (const char *name)
 Compare the string to the encoding schemes already known.
 
const char * xmlGetCharEncodingName (xmlCharEncoding enc)
 The "canonical" name for XML encoding.
 
xmlCharEncoding xmlDetectCharEncoding (const unsigned char *in, int len)
 Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.
 
int xmlCharEncCloseFunc (xmlCharEncodingHandler *handler)
 Releases an xmlCharEncodingHandler.
 
int xmlUTF8ToIsolat1 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1 block of chars out.
 
int xmlIsolat1ToUTF8 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8 block of chars out.
 

Detailed Description

Character encoding conversion functions.

Author
Daniel Veillard

Typedef Documentation

◆ xmlCharEncConvCtxtDtor

typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)

Free a conversion context.

Parameters
vctxt  conversion context

◆ xmlCharEncConvFunc

typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)

Convert between character encodings.

The value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

If the converter can consume partial multi-byte sequences, the flush flag can be used to detect truncated sequences at EOF. Otherwise, the flag can be ignored.

Parameters
vctxt  conversion context
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of input bytes
inlen  the length of in
flush  end of input
Returns
an xmlCharEncError code.
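
The following is a minimal sketch of a stateful converter with the same parameter list as xmlCharEncConvFunc, written so that it compiles without libxml2. It converts UCS-2 big endian to UTF-8, keeps a dangling first byte in the context between calls, and uses flush to report a truncated sequence at end of input. All demo_ names and the DEMO_ENC_ status values are hypothetical stand-ins; a real callback would return xmlCharEncError values.

```c
/* Hypothetical stand-ins for xmlCharEncError values; real code would
 * include <libxml/encoding.h> and return XML_ENC_ERR_* codes instead. */
typedef enum {
    DEMO_ENC_OK = 0,
    DEMO_ENC_ERR_INPUT = -1,
    DEMO_ENC_ERR_SPACE = -2
} demo_enc_status;

/* Conversion context: holds a dangling first byte between calls. */
typedef struct {
    int have_pending;
    unsigned char pending;
} demo_ucs2be_ctxt;

/* UCS-2 big endian -> UTF-8, shaped like xmlCharEncConvFunc. */
demo_enc_status
demo_ucs2be_to_utf8(void *vctxt, unsigned char *out, int *outlen,
                    const unsigned char *in, int *inlen, int flush)
{
    demo_ucs2be_ctxt *ctxt = vctxt;
    demo_enc_status ret = DEMO_ENC_OK;
    int i = 0, o = 0;

    while (i < *inlen) {
        unsigned hi, lo, c;
        int used, need;

        if (ctxt->have_pending) {
            hi = ctxt->pending; lo = in[i]; used = 1;
        } else if (i + 1 < *inlen) {
            hi = in[i]; lo = in[i + 1]; used = 2;
        } else {
            /* Lone trailing byte: consume it and remember it. */
            ctxt->pending = in[i++];
            ctxt->have_pending = 1;
            break;
        }

        c = (hi << 8) | lo;
        if (c >= 0xD800 && c <= 0xDFFF) { ret = DEMO_ENC_ERR_INPUT; break; }
        need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
        if (o + need > *outlen) { ret = DEMO_ENC_ERR_SPACE; break; }

        if (need == 1) {
            out[o++] = (unsigned char) c;
        } else if (need == 2) {
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char) (0xE0 | (c >> 12));
            out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        ctxt->have_pending = 0;
        i += used;
    }

    /* At end of input, a dangling byte means the stream was truncated. */
    if (ret == DEMO_ENC_OK && flush && ctxt->have_pending)
        ret = DEMO_ENC_ERR_INPUT;

    *inlen = i;   /* bytes consumed */
    *outlen = o;  /* bytes produced */
    return ret;
}
```

Because the lone byte is consumed and stored in the context, the caller never sees it again. Only the final call with flush set can tell a chunk boundary from a truncated stream.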

◆ xmlCharEncConvImpl

typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)

If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.

The handler can be obtained from xmlCharEncNewCustomHandler().

flags can contain XML_ENC_INPUT, XML_ENC_OUTPUT or both.

Parameters
vctxt  user data
name  encoding name
flags  bit mask of flags
out  pointer to resulting handler
Returns
an xmlParserErrors code.
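
Since flags is a bit mask, an implementation usually tests each direction separately. The sketch below shows only that decision step, using local stand-ins so it compiles without libxml2. The DEMO_ENC_ bit values, the demo_conv_plan type and the encoding name "x-demo" are all assumptions made for illustration. A real callback would go on to build the handler, for example with xmlCharEncNewCustomHandler().

```c
#include <string.h>

/* Hypothetical stand-ins for xmlCharEncFlags; the bit values are assumed. */
enum {
    DEMO_ENC_INPUT  = 1 << 0,
    DEMO_ENC_OUTPUT = 1 << 1
};

/* Hypothetical record of which converters the caller asked for. */
typedef struct {
    int need_input;   /* converter to UTF-8 requested */
    int need_output;  /* converter from UTF-8 requested */
} demo_conv_plan;

/* Returns 0 and fills plan if name is supported, -1 otherwise. */
int demo_plan_converters(const char *name, int flags, demo_conv_plan *plan)
{
    if (strcmp(name, "x-demo") != 0)
        return -1;
    plan->need_input  = (flags & DEMO_ENC_INPUT)  != 0;
    plan->need_output = (flags & DEMO_ENC_OUTPUT) != 0;
    return 0;
}
```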

◆ xmlCharEncodingInputFunc

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters to UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the UTF-8 result
outlen  the length of out
in  a pointer to an array of chars in the original encoding
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.
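
As an illustration of this calling convention, here is a small sketch of a function with the same parameter list as xmlCharEncodingInputFunc that passes 7-bit ASCII through as UTF-8. The name demo_ascii_to_utf8 and the value of DEMO_ASCII_ERR_INPUT are hypothetical; a real handler would return a negative xmlCharEncError code on failure. When the output buffer is too small, the sketch simply stops and reports what it converted, which is an assumption and not documented behaviour.

```c
/* Hypothetical stand-in for a negative xmlCharEncError value. */
#define DEMO_ASCII_ERR_INPUT (-2)

/* ASCII -> UTF-8. Both length arguments are in/out: on entry they hold
 * the buffer sizes, on return the bytes consumed and produced. */
int demo_ascii_to_utf8(unsigned char *out, int *outlen,
                       const unsigned char *in, int *inlen)
{
    int limit = (*inlen < *outlen) ? *inlen : *outlen;
    int i;

    for (i = 0; i < limit; i++) {
        if (in[i] >= 0x80) {
            /* Not ASCII: report how far we got, then fail. */
            *inlen = i;
            *outlen = i;
            return DEMO_ASCII_ERR_INPUT;
        }
        out[i] = in[i];
    }

    *inlen = i;   /* bytes consumed */
    *outlen = i;  /* bytes produced */
    return i;     /* number of bytes written */
}
```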

◆ xmlCharEncodingOutputFunc

typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters from UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of UTF-8 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.

Enumeration Type Documentation

◆ xmlCharEncError

Encoding conversion errors.

Enumerator
XML_ENC_ERR_SUCCESS 

Success.

XML_ENC_ERR_INTERNAL 

Internal or unclassified error.

XML_ENC_ERR_INPUT 

Invalid or untranslatable input sequence.

XML_ENC_ERR_SPACE 

Not enough space in output buffer.

XML_ENC_ERR_MEMORY 

Out-of-memory error.

◆ xmlCharEncFlags

Encoding conversion flags.

Enumerator
XML_ENC_INPUT 

Create converter for input (conversion to UTF-8)

XML_ENC_OUTPUT 

Create converter for output (conversion from UTF-8)

XML_ENC_HTML 

Use HTML5 mappings.

◆ xmlCharEncoding

Predefined values for some standard encodings.

Enumerator
XML_CHAR_ENCODING_ERROR 

No char encoding detected.

XML_CHAR_ENCODING_NONE 

No char encoding detected.

XML_CHAR_ENCODING_UTF8 

UTF-8.

XML_CHAR_ENCODING_UTF16LE 

UTF-16 little endian.

XML_CHAR_ENCODING_UTF16BE 

UTF-16 big endian.

XML_CHAR_ENCODING_UCS4LE 

UCS-4 little endian.

XML_CHAR_ENCODING_UCS4BE 

UCS-4 big endian.

XML_CHAR_ENCODING_EBCDIC 

EBCDIC uh!

XML_CHAR_ENCODING_UCS4_2143 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS4_3412 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS2 

UCS-2.

XML_CHAR_ENCODING_8859_1 

ISO-8859-1 ISO Latin 1.

XML_CHAR_ENCODING_8859_2 

ISO-8859-2 ISO Latin 2.

XML_CHAR_ENCODING_8859_3 

ISO-8859-3.

XML_CHAR_ENCODING_8859_4 

ISO-8859-4.

XML_CHAR_ENCODING_8859_5 

ISO-8859-5.

XML_CHAR_ENCODING_8859_6 

ISO-8859-6.

XML_CHAR_ENCODING_8859_7 

ISO-8859-7.

XML_CHAR_ENCODING_8859_8 

ISO-8859-8.

XML_CHAR_ENCODING_8859_9 

ISO-8859-9.

XML_CHAR_ENCODING_2022_JP 

ISO-2022-JP.

XML_CHAR_ENCODING_SHIFT_JIS 

Shift_JIS.

XML_CHAR_ENCODING_EUC_JP 

EUC-JP.

XML_CHAR_ENCODING_ASCII 

pure ASCII

XML_CHAR_ENCODING_UTF16 

UTF-16 native, available since 2.14.

XML_CHAR_ENCODING_HTML 

HTML (output only), available since 2.14.

XML_CHAR_ENCODING_8859_10 

ISO-8859-10, available since 2.14.

XML_CHAR_ENCODING_8859_11 

ISO-8859-11, available since 2.14.

XML_CHAR_ENCODING_8859_13 

ISO-8859-13, available since 2.14.

XML_CHAR_ENCODING_8859_14 

ISO-8859-14, available since 2.14.

XML_CHAR_ENCODING_8859_15 

ISO-8859-15, available since 2.14.

XML_CHAR_ENCODING_8859_16 

ISO-8859-16, available since 2.14.

XML_CHAR_ENCODING_WINDOWS_1252 

windows-1252, available since 2.15

Function Documentation

◆ xmlAddEncodingAlias()

int xmlAddEncodingAlias ( const char *  name,
const char *  alias 
)

Registers alias as an alias for the encoding named name.

Existing aliases will be overwritten.

Deprecated:
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl() for an alternative.
Parameters
name  the encoding name as parsed, in UTF-8 format (ASCII actually)
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
0 in case of success, -1 in case of error.

◆ xmlCharEncCloseFunc()

int xmlCharEncCloseFunc ( xmlCharEncodingHandler *  handler)

Releases an xmlCharEncodingHandler.

Must be called after a handler is no longer in use.

Parameters
handler  encoding handler
Returns
0.

◆ xmlCharEncNewCustomHandler()

xmlParserErrors xmlCharEncNewCustomHandler ( const char *  name,
xmlCharEncConvFunc  input,
xmlCharEncConvFunc  output,
xmlCharEncConvCtxtDtor  ctxtDtor,
void *  inputCtxt,
void *  outputCtxt,
xmlCharEncodingHandler **  out 
)

Create a custom xmlCharEncodingHandler.

Parameters
name  the encoding name
input  input callback which converts to UTF-8
output  output callback which converts from UTF-8
ctxtDtor  context destructor
inputCtxt  context for input callback
outputCtxt  context for output callback
out  pointer to resulting handler
Returns
an xmlParserErrors code.

◆ xmlCleanupCharEncodingHandlers()

void xmlCleanupCharEncodingHandlers ( void  )

Clean up the memory allocated for char encoding support. This unregisters all encoding handlers and aliases.

Deprecated:
This function will be made private. Call xmlCleanupParser() to free global state but see the warnings there. xmlCleanupParser() should only be called once at program exit. In most cases, you don't have to call cleanup functions at all.

◆ xmlCleanupEncodingAliases()

void xmlCleanupEncodingAliases ( void  )

Unregisters all aliases.

Deprecated:
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl() for an alternative.

◆ xmlDelEncodingAlias()

int xmlDelEncodingAlias ( const char *  alias)

Unregisters an encoding alias.

Deprecated:
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl() for an alternative.
Parameters
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
0 in case of success, -1 in case of error.

◆ xmlDetectCharEncoding()

xmlCharEncoding xmlDetectCharEncoding ( const unsigned char *  in,
int  len 
)

Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.

Parameters
in  a pointer to the first bytes of the XML entity, must be at least 2 bytes long (at least 4 if the encoding is a UCS-4 variant).
len  the length of the buffer
Returns
an xmlCharEncoding value.
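
To make the "first bytes" heuristic concrete, the sketch below checks a handful of the byte patterns listed in appendix F of the XML 1.0 recommendation. It is a simplified subset written for illustration, not libxml2's implementation, and the demo_detected type and demo_detect() function are hypothetical names. It ignores the unusual UCS-4 orderings, EBCDIC and the UCS-4 byte order marks.

```c
#include <stddef.h>

/* Hypothetical result type, a small subset of what xmlCharEncoding covers. */
typedef enum {
    DEMO_DETECT_NONE = 0,
    DEMO_DETECT_UTF8,
    DEMO_DETECT_UTF16LE,
    DEMO_DETECT_UTF16BE,
    DEMO_DETECT_UCS4LE,
    DEMO_DETECT_UCS4BE
} demo_detected;

demo_detected demo_detect(const unsigned char *in, int len)
{
    if (in == NULL)
        return DEMO_DETECT_NONE;

    if (len >= 4) {
        /* "<" encoded as UCS-4 without a byte order mark */
        if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x3C)
            return DEMO_DETECT_UCS4BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x00)
            return DEMO_DETECT_UCS4LE;
        /* "<?" encoded as UTF-16 without a byte order mark */
        if (in[0] == 0x00 && in[1] == 0x3C && in[2] == 0x00 && in[3] == 0x3F)
            return DEMO_DETECT_UTF16BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x3F && in[3] == 0x00)
            return DEMO_DETECT_UTF16LE;
    }

    /* UTF-8 byte order mark */
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return DEMO_DETECT_UTF8;

    /* UTF-16 byte order marks */
    if (len >= 2) {
        if (in[0] == 0xFE && in[1] == 0xFF)
            return DEMO_DETECT_UTF16BE;
        if (in[0] == 0xFF && in[1] == 0xFE)
            return DEMO_DETECT_UTF16LE;
    }

    return DEMO_DETECT_NONE;
}
```

The length checks come first because the four-byte patterns cannot be distinguished from shorter ones without enough input, which is why the parameter description asks for at least 2 or 4 bytes.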

◆ xmlFindCharEncodingHandler()

xmlCharEncodingHandlerPtr xmlFindCharEncodingHandler ( const char *  name)

If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.

Deprecated:
Use xmlOpenCharEncodingHandler() which has better error reporting.
Parameters
name  a string describing the char encoding.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingHandler()

xmlCharEncodingHandlerPtr xmlGetCharEncodingHandler ( xmlCharEncoding  enc)
Deprecated:
Use xmlLookupCharEncodingHandler() which has better error reporting.
Parameters
enc  an xmlCharEncoding value.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingName()

const char * xmlGetCharEncodingName ( xmlCharEncoding  enc)

The "canonical" name for XML encoding.

Cf. http://www.w3.org/TR/REC-xml#charencoding Section 4.3.3 Character Encoding in Entities

Parameters
enc  the encoding
Returns
the canonical name for the given encoding.

◆ xmlGetEncodingAlias()

const char * xmlGetEncodingAlias ( const char *  alias)

Lookup an encoding name for the given alias.

Deprecated:
This function is not thread-safe.
Parameters
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
NULL if not found, otherwise the original name.

◆ xmlInitCharEncodingHandlers()

void xmlInitCharEncodingHandlers ( void  )

◆ xmlIsolat1ToUTF8()

int xmlIsolat1ToUTF8 ( unsigned char *  out,
int *  outlen,
const unsigned char *  in,
int *  inlen 
)

Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8 block of chars out.

The value of inlen after return is the number of bytes consumed. The value of outlen after return is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of ISO Latin 1 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.
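
The in/out length convention matters most when the output buffer may be too small. The sketch below is a stand-alone stand-in with the same parameter list, not the libxml2 implementation, and both function names are hypothetical. It shows how a caller could size the output (each Latin 1 byte needs one or two UTF-8 bytes) and how the updated lengths reveal a partial conversion. Stopping early when the buffer is full is this sketch's choice.

```c
/* Exact number of UTF-8 bytes needed for a Latin 1 buffer. */
int demo_latin1_utf8_len(const unsigned char *in, int inlen)
{
    int i, n = 0;

    for (i = 0; i < inlen; i++)
        n += (in[i] < 0x80) ? 1 : 2;
    return n;
}

/* Latin 1 -> UTF-8 with in/out length arguments. */
int demo_latin1_to_utf8(unsigned char *out, int *outlen,
                        const unsigned char *in, int *inlen)
{
    int i = 0, o = 0;

    while (i < *inlen) {
        unsigned c = in[i];
        int need = (c < 0x80) ? 1 : 2;

        if (o + need > *outlen)
            break;  /* output full: stop before a partial character */
        if (need == 1) {
            out[o++] = (unsigned char) c;
        } else {
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        i++;
    }

    *inlen = i;   /* bytes consumed */
    *outlen = o;  /* bytes produced */
    return o;
}
```

If the returned inlen is smaller than the value passed in, the caller can enlarge or drain the output buffer and call again starting at in + inlen.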

◆ xmlNewCharEncodingHandler()

xmlCharEncodingHandlerPtr xmlNewCharEncodingHandler ( const char *  name,
xmlCharEncodingInputFunc  input,
xmlCharEncodingOutputFunc  output 
)

Creates and registers an xmlCharEncodingHandler.

Deprecated:
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl() for an alternative.
Parameters
name  the encoding name, in UTF-8 format (ASCII actually)
input  the xmlCharEncodingInputFunc to read that encoding
output  the xmlCharEncodingOutputFunc to write that encoding
Returns
the xmlCharEncodingHandlerPtr created (or NULL in case of error).

◆ xmlParseCharEncoding()

xmlCharEncoding xmlParseCharEncoding ( const char *  name)

Compare the string to the encoding schemes already known.

Note that the comparison is case-insensitive, in accordance with section [XML] 4.3.3 Character Encoding in Entities.

Parameters
name  the encoding name as parsed, in UTF-8 format (ASCII actually)
Returns
one of the xmlCharEncoding values or XML_CHAR_ENCODING_NONE if not recognized.
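
A case-insensitive lookup of this kind can be sketched as follows. The two-entry table, the demo_name_enc type and the function names are hypothetical and only illustrate the comparison rule. The set of names that libxml2 actually recognizes is not listed here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type standing in for xmlCharEncoding. */
typedef enum {
    DEMO_NAME_NONE = 0,
    DEMO_NAME_UTF8,
    DEMO_NAME_LATIN1
} demo_name_enc;

/* Case-insensitive equality for ASCII encoding names. */
static int demo_name_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* both strings must end at the same position */
}

/* Map a name to a known encoding, or DEMO_NAME_NONE if unrecognized. */
demo_name_enc demo_parse_name(const char *name)
{
    static const struct {
        const char *name;
        demo_name_enc enc;
    } table[] = {
        { "UTF-8", DEMO_NAME_UTF8 },
        { "ISO-8859-1", DEMO_NAME_LATIN1 }
    };
    size_t i;

    if (name == NULL)
        return DEMO_NAME_NONE;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (demo_name_equal(name, table[i].name))
            return table[i].enc;
    }
    return DEMO_NAME_NONE;
}
```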

◆ xmlRegisterCharEncodingHandler()

void xmlRegisterCharEncodingHandler ( xmlCharEncodingHandlerPtr  handler)

Register the char encoding handler.

Deprecated:
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl() for an alternative.
Parameters
handler  the xmlCharEncodingHandlerPtr handler block

◆ xmlUTF8ToIsolat1()

int xmlUTF8ToIsolat1 ( unsigned char *  out,
int *  outlen,
const unsigned char *  in,
int *  inlen 
)

Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1 block of chars out.

The value of inlen after return is the number of bytes consumed. The value of outlen after return is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of UTF-8 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.
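
The reverse direction can fail, because most UTF-8 input has no ISO Latin 1 equivalent. The sketch below is a stand-alone stand-in with the same parameter list, not the libxml2 implementation. The name demo_utf8_to_latin1 and the value of DEMO_U8_ERR_INPUT are hypothetical. It accepts only ASCII and two-byte sequences starting with 0xC2 or 0xC3 (U+0080 to U+00FF) and rejects everything else. Leaving an incomplete trailing sequence unconsumed is this sketch's choice.

```c
/* Hypothetical stand-in for a negative xmlCharEncError value. */
#define DEMO_U8_ERR_INPUT (-2)

/* UTF-8 -> Latin 1 with in/out length arguments. */
int demo_utf8_to_latin1(unsigned char *out, int *outlen,
                        const unsigned char *in, int *inlen)
{
    int i = 0, o = 0, ret = 0;

    while (i < *inlen && o < *outlen) {
        unsigned c = in[i];

        if (c < 0x80) {
            out[o++] = (unsigned char) c;
            i++;
        } else if (c == 0xC2 || c == 0xC3) {
            if (i + 1 >= *inlen)
                break;  /* incomplete sequence: leave it unconsumed */
            if ((in[i + 1] & 0xC0) != 0x80) {
                ret = DEMO_U8_ERR_INPUT;  /* malformed continuation byte */
                break;
            }
            out[o++] = (unsigned char) (((c & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            ret = DEMO_U8_ERR_INPUT;  /* malformed or above U+00FF */
            break;
        }
    }

    *inlen = i;   /* bytes consumed */
    *outlen = o;  /* bytes produced */
    return (ret < 0) ? ret : o;
}
```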