Internationalized Resource Identifiers (IRIs)

Internationalized Resource Identifiers (IRIs) are a form of resource identifier for the Internet that permits the use of characters and formats that are suitable for national languages other than English. IRIs can be used in place of URIs or URLs where the applications involved with the request and response support them.

IRIs are described by RFC 3987, Internationalized Resource Identifiers (IRIs), which is available from https://www.ietf.org/rfc/rfc3987.txt. CICS supports the use of IRIs in URIMAP resources for inbound web client requests to CICS as an HTTP server, and in Atom feed documents.

Host name

To accommodate the requirements of domain name servers, web clients convert the host name in an IRI into a format called Punycode. Punycode is described by RFC 3492, Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA), which is available from https://www.ietf.org/rfc/rfc3492.txt. The Punycode algorithm encodes the host name into a string composed only of alphanumeric characters, hyphens, and periods.

If you want to use an IRI as the link for a web resource or Atom feed that is served by CICS, you must specify the host name in Punycode in the URIMAP resource definition that defines the web client's request to CICS. CICS does not provide a tool to carry out this conversion, but free applications that convert Unicode to Punycode are available on the Internet. If you use a single asterisk in place of the host name, so that the URIMAP resource matches any host name, you do not need to use Punycode.
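
As a minimal sketch of this conversion, assuming Python is available on a workstation (it is not part of CICS), the standard library "idna" codec can produce the Punycode form of a host name. The function name to_punycode_host and the host name bücher.example are illustrative only. The codec follows the older IDNA 2003 rules, so check the result against your web clients for unusual names.

```python
def to_punycode_host(host: str) -> str:
    """Return the ASCII (Punycode) form of a Unicode host name."""
    # The built-in "idna" codec encodes each dot-separated label separately
    # and adds the "xn--" prefix to labels that contain non-ASCII characters.
    return host.encode("idna").decode("ascii")


# Hypothetical host name, used only to show the shape of the result.
print(to_punycode_host("bücher.example"))  # xn--bcher-kva.example
```

The resulting ASCII string is the value that would be entered in the HOST attribute of the URIMAP resource definition.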

Path component

Web clients do not convert the path component of an IRI into Punycode, but they do escape, or percent-encode, Unicode characters in the path.

If you are using an IRI for a web resource that is served by CICS, you must percent-encode any Unicode characters in the path that you specify in the URIMAP resource definition. If you do not have an application that can convert Unicode characters to percent-encoded representations, free applications are available on the Internet to perform this task.

The limits on URL length listed in URLs for CICS web support also apply to URLs for Atom feeds, which means that the part of the path component of the URL that you specify in the URIMAP resource definition must be 255 characters or fewer. A character in this context means a single ASCII character, not the original Unicode character. For example, the Cyrillic character that has the percent-encoded representation %D0%B4 counts as 6 characters toward the 255-character limit.
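
The percent-encoding and the length rule can be sketched as follows, assuming Python on a workstation. The names encode_path, fits_limit, and MAX_PATH_LENGTH are hypothetical helpers for illustration, not CICS facilities.

```python
from urllib.parse import quote

MAX_PATH_LENGTH = 255  # limit on the URIMAP path, as stated in the text


def encode_path(path: str) -> str:
    """Percent-encode non-ASCII characters in a path as UTF-8 bytes."""
    # "/" and "*" are left unchanged so that path separators and the
    # trailing wildcard used for path matching survive the encoding.
    return quote(path, safe="/*")


def fits_limit(encoded_path: str) -> bool:
    """Check the length in ASCII characters, not original Unicode characters."""
    return len(encoded_path) <= MAX_PATH_LENGTH


encoded = encode_path("\u0434")  # the Cyrillic character from the example above
print(encoded, len(encoded))     # %D0%B4 6
print(fits_limit(encoded))       # True
```

Because each Cyrillic character expands to 6 ASCII characters, a path made up only of such characters can hold at most 42 of them within the limit.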

When CICS installs the URIMAP resource definition, CICS stores the path in the canonical form recommended for URIs and unescapes some of the characters, but the path that is displayed when you view the URIMAP resource remains as you entered it.

When you use an IRI as a link for an Atom feed or entry document, you specify the IRI in the Atom configuration file as well as in the URIMAP resource definition. You must percent-encode any Unicode characters in the IRI in the Atom configuration file.

When CICS issues an Atom document containing the IRI, CICS converts the percent-encoded characters to XML character references, so that the XML is valid. To use the resulting link in a web client request, you must convert the XML character references back into percent-encoded characters.
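
A client-side sketch of that reverse conversion, assuming Python and a link value taken as raw text from the Atom document (an XML parser would normally resolve the character references already). The function name link_to_request_url is hypothetical.

```python
import html
from urllib.parse import quote, urlsplit, urlunsplit


def link_to_request_url(href: str) -> str:
    """Turn an href containing XML character references into a request URL."""
    # Resolve numeric character references such as &#x0434; to Unicode characters.
    iri = html.unescape(href)
    parts = urlsplit(iri)
    # Percent-encode the path as UTF-8; "/" and existing "%" escapes are kept.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(link_to_request_url("http://example.com:5050/&#x0434;/000100"))
# http://example.com:5050/%D0%B4/000100
```

Only the path is re-encoded here; a host name that contains Unicode characters would still need the separate Punycode conversion.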

This example URIMAP resource contains a path that uses percent-encoded Unicode characters to specify the beginning of an IRI for an Atom feed, with an asterisk at the end to indicate that path matching is used for the remainder of the IRI:
  Urimap         : ALEXANDR                                                  
  Group          : IRIMAPS                                                   
  DEScription    :                                                           
  STatus         : Enabled            Enabled | Disabled                     
  USAge          : Atom               Server | Client | Pipeline | Atom      
 UNIVERSAL RESOURCE IDENTIFIER                                               
  SCheme         : HTTP               HTTP | HTTPS                           
  POrt           : No                 No | 1-65535                           
  HOST           : *                                                         
  (Mixed Case)   :                                                           
  PAth           : %D0%90%D0%BB%D0%B5%D0%BA%D1%81%D0%B0%D0%BD%D0%B4%D1%80%D0%
  (Mixed Case)   : A1%D0%BE%D0%BB%D0%B6%D0%B5%D0%BD%D0%B8%D1%86%D1%8B%D0%BD* 
This example Atom entry contains an IRI using the equivalent XML character references for the Unicode characters that are represented in the example URIMAP resource:
<entry>
<link rel="self" href="http://example.com:5050/&#x0410;&#x043B;&#x0435;
&#x043A;&#x0441;&#x0430;&#x043D;&#x0434;&#x0440;&#x0421;&#x043E;&#x043B;&#x0436;
&#x0435;&#x043D;&#x0438;&#x0446;&#x044B;&#x043D;/000100"/>
<id>tag:example.com,2009-02-13:file:FILEA:000100</id>
<title>FILEA item 000100</title>
<rights>Copyright (c) 2009, Joe Bloggs</rights>
<published>2008-11-06T12:35:00.000Z</published>
<author>
	<name>Joe Bloggs</name>
	<email>JBloggs@example.com</email>
</author>
<app:edited>2009-03-11T14:42:38+00:00</app:edited>
<updated>2009-03-11T14:42:38+00:00</updated>

<content type="text/xml">
  <DFH0CFIL xmlns="http://www.ibm.com/xmlns/prod/cics/atom/filea">
  <filerec>
  <numb>000100</numb><name>S. D. BORMAN</name><amount>$0100.11</amount>
  </filerec>
  </DFH0CFIL>
</content>

</entry>