Percent Encoding in Queries

Some characters, such as commas and curly braces, are query syntax: they separate the different parts of a query. For example, to search your Content component text index you might use action=query with the following parameters:

Text=*&FieldText=MATCH{one,two,three}:MyField

This query would search for all documents that have a field named "MyField" that contains the value "one", "two", or "three".

Strings in a query should be percent-encoded (also known as URL-encoded). This ensures that any commas or curly braces that are part of a string are not interpreted as query syntax. For example, to search for all documents that have a field named "MyField" that contains the value "four,five,six", you could use the following parameters:

Text=*&FieldText=MATCH{four%2cfive%2csix}:MyField

By percent-encoding the string "four,five,six", you ensure that the commas (percent-encoded as %2c) are interpreted as part of a single query string and not as separators between multiple strings.
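
The following Python sketch illustrates this first level of encoding. The helper names build_match and split_match_values are hypothetical and are not part of any Knowledge Discovery API; the second helper only imitates how a receiver might split on literal commas before decoding each string, and is not a description of how the Content component is implemented. Note that Python emits uppercase hexadecimal digits (%2C), which is equivalent to the %2c used in this topic.

```python
from urllib.parse import quote, unquote


def build_match(values, field):
    """Build a MATCH FieldText expression, percent-encoding each query string."""
    # safe="" makes quote() encode every reserved character,
    # including commas and curly braces.
    encoded = [quote(value, safe="") for value in values]
    return "MATCH{" + ",".join(encoded) + "}:" + field


def split_match_values(expression):
    """Illustrative only: recover the query strings from a MATCH expression."""
    inner = expression[expression.index("{") + 1 : expression.index("}")]
    # Split on the literal (unencoded) commas first, then decode each piece.
    return [unquote(piece) for piece in inner.split(",")]


three_values = build_match(["one", "two", "three"], "MyField")
one_value = build_match(["four,five,six"], "MyField")

print(three_values)  # MATCH{one,two,three}:MyField
print(one_value)     # MATCH{four%2Cfive%2Csix}:MyField
print(split_match_values(one_value))  # ['four,five,six']
```

Because the commas inside "four,five,six" are encoded before the expression is assembled, the only literal commas left are the ones that act as separators.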

You might need additional percent-encoding when sending requests to a Knowledge Discovery component. When you send an HTTP request using the content-type application/x-www-form-urlencoded, you should percent-encode all parameter values. This means that any commas, curly braces, or other special characters that are part of a string are percent-encoded twice: once as part of the query string, and again as part of the parameter value. For example, a comma is first encoded as %2c, and the percent sign in that sequence is then encoded as %25, so the comma is represented as %252c.

NOTE: The application or library that you use to send the HTTP request might be able to percent-encode the parameter values for you, and in some cases might do this automatically.

The following examples demonstrate how to send requests using cURL, a command-line tool.

To find documents that have a field named "MyField" that contains the value "one", "two", or "three", you could use the following command. This sends a request using the application/x-www-form-urlencoded content-type. The query strings "one", "two", and "three" are percent-encoded (they contain no special characters, so encoding leaves them unchanged), and the option --data-urlencode is used to percent-encode each parameter value:

curl http://localhost:9100/action=query --data-urlencode "Text=*" --data-urlencode "FieldText=MATCH{one,two,three}:MyField"

To find documents that have a field named "MyField" that contains the value "four,five,six", you could use the following command. The query string "four,five,six" is percent-encoded, and as before the option --data-urlencode is used to percent-encode each parameter value:

curl http://localhost:9100/action=query --data-urlencode "Text=*" --data-urlencode "FieldText=MATCH{four%2cfive%2csix}:MyField"

When the Content component receives this request, commas and other special characters in the query string will have been percent-encoded twice (such that a comma is represented by the sequence %252c):

Text=%2A&FieldText=MATCH%7Bfour%252cfive%252csix%7D%3AMyField
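
As a rough illustration of the two levels of encoding, the following Python sketch builds the same request body with the standard library and then decodes it once, as a form parser would. It assumes nothing about the Content component itself and does not send a request. Python writes hexadecimal digits in uppercase (%252C rather than %252c); the two forms are equivalent.

```python
from urllib.parse import parse_qs, quote, urlencode

# Level 1: percent-encode the query string so its commas are not separators.
query_string = quote("four,five,six", safe="")
field_text = "MATCH{" + query_string + "}:MyField"

# Level 2: percent-encode each parameter value for
# application/x-www-form-urlencoded. The "%" from level 1 becomes "%25".
body = urlencode({"Text": "*", "FieldText": field_text})
print(body)
# Text=%2A&FieldText=MATCH%7Bfour%252Cfive%252Csix%7D%3AMyField

# Decoding the form body once undoes level 2 only; the query string
# inside the braces is still percent-encoded.
received = parse_qs(body)
print(received["FieldText"][0])
# MATCH{four%2Cfive%2Csix}:MyField
```

Decoding the body removes only the outer layer, which is why the inner %2C survives and can still be told apart from a separator comma.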

This double percent-encoding is not necessary with all content-types. For example, you could send the same queries using the multipart/form-data content-type. In both of the following commands, the query strings have been percent-encoded.

In this request, the query strings are "one", "two", and "three":

curl http://localhost:9100/action=query -F "Text=*" -F "FieldText=MATCH{one,two,three}:MyField"

In this request, there is a single query string "four,five,six":

curl http://localhost:9100/action=query -F "Text=*" -F "FieldText=MATCH{four%2cfive%2csix}:MyField"

With this content-type, the Content component receives the Text and FieldText parameter values in separate parts of the HTTP request body. The query strings that you supply are percent-encoded, but there is no need to percent-encode the parameter values. For example, when you send the second request, the Content component receives the following data (the multipart boundary lines are omitted here):

Content-Disposition: form-data; name="Text"

*
Content-Disposition: form-data; name="FieldText"

MATCH{four%2cfive%2csix}:MyField
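
To show why no second level of encoding appears here, the following Python sketch assembles a multipart/form-data body by hand. The function name build_multipart_body and the boundary string "example-boundary" are hypothetical; a real HTTP client normally generates its own boundary and builds the body for you. The sketch writes each parameter value into its own part exactly as supplied.

```python
def build_multipart_body(fields, boundary):
    """Assemble a multipart/form-data body; values are written as-is."""
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)  # delimiter that starts each part
        lines.append('Content-Disposition: form-data; name="' + name + '"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)  # no percent-encoding is applied to the value
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    return "\r\n".join(lines)


body = build_multipart_body(
    {"Text": "*", "FieldText": "MATCH{four%2cfive%2csix}:MyField"},
    boundary="example-boundary",  # hypothetical boundary for illustration
)
print(body)
```

The query string keeps its single level of encoding (%2c), and no %25 sequences are introduced because the parameter values are delimited by the boundary rather than by characters such as & and =.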