Tutorials

The tutorials in this section show how to use REST API functions in the AlchemyLanguage API to retrieve and analyze information from content that is identified by a URL or supplied as an argument to an API call. As a common basis, the tutorials use a web site that provides the annotated text of the Constitution of the United States. Using this annotated copy enables the tutorials to illustrate some of the more advanced capabilities of the AlchemyLanguage API, such as selecting specific portions of a web page by using XPath expressions.

In all of the example commands in these tutorials, replace {apikey} with the API key that you received when satisfying the prerequisites for using the AlchemyLanguage API.

All of the tutorials in this section use the curl command-line tool to call the AlchemyLanguage service's REST API. If you cut and paste any of the sample curl commands in this section to try them out, make sure that you replace the HTML entity for the ampersand (&amp;) with an actual ampersand (&) each time that it appears.

Satisfy the prerequisites

Before you can do any of the tutorials in this section, you must have satisfied the prerequisites for using the AlchemyLanguage service and running curl commands. If you have already satisfied these, you do not need to do so again.

Extracting web page text and titles

To extract the title from a web page, use the curl command to call the AlchemyLanguage /url/URLGetTitle function to display the title of this web page:

curl 'https://gateway-a.watsonplatform.net/calls/url/URLGetTitle?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}'

You should see the following output from this command:

{
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://context.montpelier.org/document/175",
    "title": "United States Constitution"
}

The status element reports whether the call succeeded (OK in this case), and the usage element contains the terms-of-use notice for the service.

To improve readability, the output of some of the remaining curl examples in this section does not include the full text of the usage message.

The url element in this output identifies the URL of the page from which the title was extracted, for use in applications that programmatically crawl web pages or sites.

The title element shows the title that was extracted from the specified web page.
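
The same request can also be issued from a script. The following is a minimal Python sketch rather than an official client: the helper names build_title_url, parse_title, and fetch_title are hypothetical, and only the endpoint, query parameters, and response elements shown above are taken from this tutorial. It uses the Python standard library only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://gateway-a.watsonplatform.net/calls"


def build_title_url(page_url, apikey):
    """Build the URLGetTitle request URL (hypothetical helper)."""
    # urlencode percent-encodes each value, including the page URL.
    query = urlencode({"url": page_url, "outputMode": "json", "apikey": apikey})
    return f"{BASE}/url/URLGetTitle?{query}"


def parse_title(body):
    """Return the title element from a JSON response body."""
    data = json.loads(body)
    if data.get("status") != "OK":
        raise RuntimeError(f"request failed with status {data.get('status')!r}")
    return data["title"]


def fetch_title(page_url, apikey):
    """Call the service over the network and return the page title."""
    with urlopen(build_title_url(page_url, apikey)) as resp:
        return parse_title(resp.read().decode("utf-8"))
```

Calling fetch_title("http://context.montpelier.org/document/175", apikey) with a valid API key would be expected to return the same title as the curl command, assuming that the service accepts a percent-encoded url value.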

To extract the text from a web page, use the curl command to call the AlchemyLanguage /url/URLGetText method to display the text of this web page. This method automatically focuses on the primary page or article content of the specified web page, ignoring page navigation, advertisements, and other typically undesirable page content:

curl 'https://gateway-a.watsonplatform.net/calls/url/URLGetText?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}'

You should see output that begins with the following from this command (only the first few lines of output are presented here):

{
    "status": "OK",
    "warningMessage": "truncated-oversized-text-content",
    "usage": "...",
    "url": "http://context.montpelier.org/document/175",
    "language": "english",
    "text": "1 We the People of the United States, in Order to form a
      more perfect Union, establish Justice, insure domestic
      Tranquility, provide for the common defense, promote the general
      Welfare, and secure the Blessings of Liberty to ourselves and
      our Posterity, do ordain and establish this Constitution for the
      United States of America. 6\n2 Article. I. 1\n3 Section. 1. 1\n4
      All legislative Powers herein granted shall be vested in a
      Congress of the United States,..."
}

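The status, usage, and url elements have the same meaning as in the previous example. The warningMessage value, truncated-oversized-text-content, indicates that the page text was too large to return in full and was truncated.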

The text element contains the text that was extracted from the specified web page.
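
When you process this response in a script, it can be useful to check for the warning before you rely on the text. The following Python sketch parses an abbreviated copy of the response shown above. The helper name extract_text is hypothetical, and treating this warning value as a sign of truncation is an assumption based on its name.

```python
import json

# Abbreviated copy of the response shown above; the text value is shortened.
SAMPLE = '''{
    "status": "OK",
    "warningMessage": "truncated-oversized-text-content",
    "url": "http://context.montpelier.org/document/175",
    "language": "english",
    "text": "1 We the People of the United States, ..."
}'''


def extract_text(body):
    """Return (text, truncated) from a URLGetText JSON response body."""
    data = json.loads(body)
    if data.get("status") != "OK":
        raise RuntimeError(f"request failed with status {data.get('status')!r}")
    # Assumption: this warning value means the service cut the page text short.
    truncated = data.get("warningMessage") == "truncated-oversized-text-content"
    return data["text"], truncated


text, truncated = extract_text(SAMPLE)
if truncated:
    print("Warning: only part of the page text was returned")
print(text)
```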

Specifying different output formats

The example calls to the /url/URLGetTitle and /url/URLGetText methods in the previous section included the outputMode=json parameter so that the methods return output in JSON format. This parameter is also specified in the other tutorials in this section because JSON is the most commonly requested output format. The default output format for these and other AlchemyLanguage API functions is XML. In other words, XML output is returned if the outputMode parameter is not specified, as in the following example:

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetTitle?url=http://context.montpelier.org/document/175&apikey={apikey}"

You should see the following output from this command:

<?xml version="1.0" encoding="UTF-8"?>
<results>
    <status>OK</status>
    <usage>...</usage>
    <url>http://context.montpelier.org/document/175</url>
    <title>United States Constitution</title>
</results>
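
If your application consumes the default XML output, you can read the same elements with an XML parser. The following Python sketch uses the standard library xml.etree.ElementTree module on a copy of the output shown above. The helper name parse_results is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the XML output shown above (usage text abbreviated as in the tutorial).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<results>
    <status>OK</status>
    <usage>...</usage>
    <url>http://context.montpelier.org/document/175</url>
    <title>United States Constitution</title>
</results>
"""


def parse_results(xml_bytes):
    """Map each child element of <results> to its text content."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "") for child in root}


fields = parse_results(SAMPLE_XML)
print(fields["status"], fields["title"])
```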

Extracting selected portions of web page text

As mentioned in the initial tutorial on extracting titles and text from a web page, the AlchemyLanguage API calls enable you to refine the content that is extracted from a web page. This enables you to clean up the text that is returned by applying filters such as:

  • specifying cleaned as the value of the sourceText query parameter. Specifying this value strips out non-printable characters and formatting strings such as '\n'.

  • specifying xpath as the value of the sourceText query parameter and providing an XPath expression that identifies the specific elements that you want to return from a web page.

For example, you would use the following command to return only the paragraph (<p>) elements whose class attribute is set to "passage":

curl 'https://gateway-a.watsonplatform.net/calls/url/URLGetText?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}&sourceText=xpath&xpath=//p\[@class="passage"\]'
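
XPath expressions contain characters such as brackets and quotation marks that need escaping on the command line. When you build the request in a script, you can percent-encode the expression instead. The following Python sketch shows one way to do that with the standard library. The helper name build_xpath_text_url and the YOUR_API_KEY placeholder are hypothetical, and the sketch assumes that the service accepts percent-encoded parameter values.

```python
from urllib.parse import urlencode


def build_xpath_text_url(page_url, xpath, apikey):
    """Build a URLGetText request that returns only the nodes matched by xpath."""
    query = urlencode({
        "url": page_url,
        "outputMode": "json",
        "apikey": apikey,
        "sourceText": "xpath",   # select content by XPath expression
        "xpath": xpath,          # brackets and quotation marks are percent-encoded
    })
    return f"https://gateway-a.watsonplatform.net/calls/url/URLGetText?{query}"


url = build_xpath_text_url(
    "http://context.montpelier.org/document/175",
    '//p[@class="passage"]',
    "YOUR_API_KEY",
)
print(url)
```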

Extracting keywords

Keywords are phrases that can be identified as being important to the content of a web page by things like their frequency within that page, the elements that enclose them (such as <title> or heading elements), and so on.

As an example of identifying keywords within a specified web page, use the curl command to call the AlchemyLanguage /url/URLGetRankedKeywords function to identify keywords that were detected in this web page, along with their ranking:

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetRankedKeywords?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}"

The output from this command should begin with the following:

{
    "status": "OK",
    "warningMessage": "truncated-oversized-text-content",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://context.montpelier.org/document/175",
    "totalTransactions": "1",
    "language": "english",
    "keywords": [
        {
            "relevance": "0.989856",
            "text": "United States"
        },
        {
            "relevance": "0.491358",
            "text": "president"
        },
        {
            "relevance": "0.473375",
            "text": "Congress"
        },...

In the default output from this command, the keywords are listed in descending order of their relevance to the overall text.
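
Note that the relevance values in the response are JSON strings, not numbers, so a script needs to convert them before comparing them. The following Python sketch filters the keywords from the output shown above by a relevance threshold. The helper name top_keywords and the threshold value are hypothetical.

```python
import json

# The first three keywords from the output shown above.
SAMPLE = '''{
    "status": "OK",
    "keywords": [
        {"relevance": "0.989856", "text": "United States"},
        {"relevance": "0.491358", "text": "president"},
        {"relevance": "0.473375", "text": "Congress"}
    ]
}'''


def top_keywords(body, min_relevance=0.0):
    """Return (text, relevance) pairs at or above min_relevance, highest first."""
    data = json.loads(body)
    # Relevance arrives as a string, so convert it to float before comparing.
    scored = [(kw["text"], float(kw["relevance"])) for kw in data.get("keywords", [])]
    kept = [item for item in scored if item[1] >= min_relevance]
    return sorted(kept, key=lambda item: item[1], reverse=True)


for text, relevance in top_keywords(SAMPLE, min_relevance=0.48):
    print(f"{relevance:.3f}  {text}")
```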

Extracting entities

Entities - people, places, and things - are the mechanism that semantic search uses to identify fundamental blocks of information on web pages. Entities are derived from schemas that provide the underpinnings of understanding what web content is about. The AlchemyLanguage API provides multiple methods that make it easy to extract entities from web pages or POSTed content.

As an example of identifying entities within a specified web page, use the curl command to call the AlchemyLanguage /url/URLGetRankedNamedEntities function to identify ranked entities that were detected in this web page:

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetRankedNamedEntities?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}"

The output from this command should begin with the following:

{
    "status": "OK",
    "warningMessage": "truncated-oversized-text-content",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://context.montpelier.org/document/175",
    "language": "english",
    "entities": [
        {
            "type": "Country",
            "relevance": "0.842782",
            "count": "70",
            "text": "United States",
            "disambiguated": {
                "subType": [
                    "Location",
                    "Region",
                    "AdministrativeDivision",
                    "GovernmentalJurisdiction",
                    "FilmEditor"
                ],
                "name": "United States",
                "website": "http://www.usa.gov/",
                "dbpedia": "http://dbpedia.org/resource/United_States",
                "freebase": "http://rdf.freebase.com/ns/m.09c7w0",
                "ciaFactbook": "http://www4.wiwiss.fu-berlin.de/factbook/resource/United_States",
                "opencyc": "http://sw.opencyc.org/concept/Mx4rvVikKpwpEbGdrcN5Y29ycA",
                "yago": "http://yago-knowledge.org/resource/United_States"
            }
        },...

Note that each named entity also includes a significant amount of information that can be used to differentiate between similar entities. For example, the disambiguated element provides subType information to identify the context in which the entity was encountered, and also provides web-based information to help look up basic information about that entity.
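
The following Python sketch shows one way to pull the disambiguation details out of an entity. It works on an abbreviated copy of the entity shown above. The helper name summarize_entities is hypothetical, and treating every string value in the disambiguated element that starts with "http" as a reference link is a simplifying assumption.

```python
import json

# Abbreviated copy of the first entity from the output shown above.
SAMPLE = '''{
    "status": "OK",
    "entities": [
        {
            "type": "Country",
            "relevance": "0.842782",
            "count": "70",
            "text": "United States",
            "disambiguated": {
                "subType": ["Location", "Region"],
                "name": "United States",
                "website": "http://www.usa.gov/",
                "dbpedia": "http://dbpedia.org/resource/United_States"
            }
        }
    ]
}'''


def summarize_entities(body):
    """Return one summary dict per entity, including any reference links."""
    summaries = []
    for ent in json.loads(body).get("entities", []):
        # Not every entity has a disambiguated element, so default to empty.
        dis = ent.get("disambiguated", {})
        links = {key: value for key, value in dis.items()
                 if isinstance(value, str) and value.startswith("http")}
        summaries.append({
            "text": ent["text"],
            "type": ent["type"],
            "count": int(ent["count"]),
            "subtypes": dis.get("subType", []),
            "links": links,
        })
    return summaries


for summary in summarize_entities(SAMPLE):
    print(summary["text"], summary["type"], summary["count"], sorted(summary["links"]))
```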

Extracting concepts

As an example of identifying concepts within a specified web page, use the curl command to call the AlchemyLanguage /url/URLGetRankedConcepts function to identify ranked concepts that were detected in this web page:

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetRankedConcepts?url=http://context.montpelier.org/document/175&outputMode=json&apikey={apikey}"

The output from this command should begin with the following:

{
    "status": "OK",
    "warningMessage": "truncated-oversized-text-content",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://context.montpelier.org/document/175",
    "language": "english",
    "concepts": [
        {
            "text": "United States Constitution",
            "relevance": "0.932265",
            "website": "http://www.gpoaccess.gov/constitution/index.html",
            "dbpedia": "http://dbpedia.org/resource/United_States_Constitution",
            "freebase": "http://rdf.freebase.com/ns/m.07sdd",
            "opencyc": "http://sw.opencyc.org/concept/Mx4rvy_ifZwpEbGdrcN5Y29ycA",
            "yago": "http://yago-knowledge.org/resource/United_States_Constitution"
        },
        {
            "text": "United States Senate",
            "relevance": "0.751644",
            "website": "http://www.senate.gov",
            "dbpedia": "http://dbpedia.org/resource/United_States_Senate",
            "freebase": "http://rdf.freebase.com/ns/m.07t58",
            "opencyc": "http://sw.opencyc.org/concept/Mx4rG3mmKHXaQdeYnaZimuB2gw",
            "yago": "http://yago-knowledge.org/resource/United_States_Senate"
        },...

Note that, like entities, each concept also returns information that can be used to help differentiate between similar concepts by providing other references to that concept.
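
As a sketch of how these references might be used, the following Python example builds a lookup from concept text to one linked data source. It works on an abbreviated copy of the output shown above. The helper name concept_links and the relevance threshold are hypothetical.

```python
import json

# Abbreviated copy of the two concepts from the output shown above.
SAMPLE = '''{
    "status": "OK",
    "concepts": [
        {
            "text": "United States Constitution",
            "relevance": "0.932265",
            "dbpedia": "http://dbpedia.org/resource/United_States_Constitution"
        },
        {
            "text": "United States Senate",
            "relevance": "0.751644",
            "dbpedia": "http://dbpedia.org/resource/United_States_Senate"
        }
    ]
}'''


def concept_links(body, source="dbpedia", min_relevance=0.0):
    """Map concept text to its link for one source, skipping concepts without it."""
    links = {}
    for concept in json.loads(body).get("concepts", []):
        if float(concept["relevance"]) >= min_relevance and source in concept:
            links[concept["text"]] = concept[source]
    return links


print(concept_links(SAMPLE, min_relevance=0.8))
```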

Combining multiple methods in a single call

Instead of making separate API requests to get different features from the same file or website, you can use the AlchemyLanguage GetCombinedData function to retrieve most information in one API call. This makes organizing your data a little bit easier: you can make one call per document and parse a single response for all of the results.

For example, the following call extracts keywords, entities, and concepts from a web page and returns sentiment information for each extracted keyword and entity.

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetCombinedData?url=http://www.cnbc.com/2016/05/16/buffetts-berkshire-hathaway-takes-new-stake-in-apple.html&outputMode=json&extract=keywords,entities,concepts&sentiment=1&maxRetrieve=3&apikey={apikey}"

Output:

{
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://www.cnbc.com/2016/05/16/buffetts-berkshire-hathaway-takes-new-stake-in-apple.html",
    "totalTransactions": "5",
    "language": "english",
    "keywords": [
        {
            "text": "new stake",
            "relevance": "0.932782",
            "sentiment": {
                "type": "positive",
                "score": "0.218276",
                "mixed": "1"
            }
        },
        {
            "text": "entire Apple stake",
            "relevance": "0.851343",
            "sentiment": {
                "type": "negative",
                "score": "-0.541126"
            }
        },
        {
            "text": "Berkshire portfolio managers",
            "relevance": "0.764123",
            "sentiment": {
                "type": "positive",
                "score": "0.396507"
            }
        }
    ],
    "concepts": [
        {
            "text": "Berkshire Hathaway",
            "relevance": "0.97522",
            "website": "http://www.berkshirehathaway.com/",
            "dbpedia": "http://dbpedia.org/resource/Berkshire_Hathaway",
            "freebase": "http://rdf.freebase.com/ns/m.01tmng",
            "opencyc": "http://sw.opencyc.org/concept/Mx4rvfhl15wpEbGdrcN5Y29ycA",
            "yago": "http://yago-knowledge.org/resource/Berkshire_Hathaway"
        },
        {
            "text": "Warren Buffett",
            "relevance": "0.885789",
            "website": "http://www.berkshirehathaway.com/",
            "dbpedia": "http://dbpedia.org/resource/Warren_Buffett",
            "freebase": "http://rdf.freebase.com/ns/m.01d_ys",
            "yago": "http://yago-knowledge.org/resource/Warren_Buffett"
        },
        {
            "text": "Howard Graham Buffett",
            "relevance": "0.725814",
            "dbpedia": "http://dbpedia.org/resource/Howard_Graham_Buffett",
            "freebase": "http://rdf.freebase.com/ns/m.09g944",
            "yago": "http://yago-knowledge.org/resource/Howard_Graham_Buffett"
        }
    ],
    "entities": [
        {
            "type": "Company",
            "relevance": "0.848399",
            "sentiment": {
                "type": "negative",
                "score": "-0.408676",
                "mixed": "1"
            },
            "count": "7",
            "text": "Apple"
        },
        {
            "type": "Country",
            "relevance": "0.734529",
            "sentiment": {
                "type": "negative",
                "score": "-0.274002",
                "mixed": "1"
            },
            "count": "6",
            "text": "Berkshire"
        },
        {
            "type": "Person",
            "relevance": "0.390246",
            "sentiment": {
                "type": "positive",
                "score": "0.503473"
            },
            "count": "1",
            "text": "Berkshire Hathaway"
        }
    ]
}

The extract=keywords,entities,concepts parameter specifies the methods to use in the combined call, and the sentiment=1 parameter enables sentiment information for each result that supports it (in this case, entities and keywords). Any parameter from the individual API functions can be passed in a similar manner. For example, you can include the knowledgeGraph parameter from the GetRankedConcepts function. See the API reference for the complete lists of parameters.

Using a combined call instead of individual method calls does not save you money. Each extract feature that you add increases the cost of your combined call. If you pass a parameter like sentiment that incurs an additional transaction charge, one charge is added for each method that it applies to. You can view the number of transactions used in the totalTransactions attribute of the response. In this case, three functions were used in total and sentiment information was added to two of them, so there were 5 billable transactions.
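
The arithmetic in this example can be sketched in a few lines of Python. The function name expected_transactions is hypothetical, and the sketch assumes one transaction per extract feature plus one per feature that returns sentiment, which matches this example only. The totalTransactions value in the response remains the authoritative count.

```python
# Features that return sentiment in the example above.
# Assumption: only these two features incur the additional sentiment charge.
SENTIMENT_FEATURES = {"keywords", "entities"}


def expected_transactions(extract, sentiment=False):
    """Estimate billable transactions for a combined call (illustrative only)."""
    features = [name.strip() for name in extract.split(",") if name.strip()]
    total = len(features)  # one transaction per extract feature
    if sentiment:
        # one extra transaction per feature that sentiment applies to
        total += sum(1 for name in features if name in SENTIMENT_FEATURES)
    return total


print(expected_transactions("keywords,entities,concepts", sentiment=True))  # prints 5
```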

Possible values for the extract parameter

Value          Description
authors        Returns the authors of the document
concepts       Identifies general concepts that aren't necessarily directly referenced in the text
dates          Identifies datetime information in the text, and returns it in yyyymmddThhmmss format
doc-emotion    Detects whether the overall emotion conveyed by the document is anger, disgust, fear, joy, or sadness
entities       Returns named people, places, and things found in the document
feeds          Returns RSS and ATOM feeds found in the document
keywords       Returns a list of topic keywords from the document
pub-date       Returns the publication date of the document
relations      Identifies Subject-Action-Object relations in the text
doc-sentiment  Analyzes the overall sentiment of the document
taxonomy       Categorizes the document into a 5-level taxonomy
title          Returns the document title
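
If you assemble combined calls in a script, you can check the extract values against this list before you send the request. The following Python sketch does that and builds the request URL. The helper name build_combined_url is hypothetical, and the sketch assumes that the service accepts a percent-encoded comma in the extract value.

```python
from urllib.parse import urlencode

# The extract values listed above.
EXTRACT_VALUES = {
    "authors", "concepts", "dates", "doc-emotion", "entities", "feeds",
    "keywords", "pub-date", "relations", "doc-sentiment", "taxonomy", "title",
}


def build_combined_url(page_url, extract, apikey, **params):
    """Build a URLGetCombinedData request URL after validating extract values."""
    unknown = sorted(set(extract) - EXTRACT_VALUES)
    if unknown:
        raise ValueError(f"unsupported extract values: {', '.join(unknown)}")
    query = {
        "url": page_url,
        "outputMode": "json",
        "extract": ",".join(extract),
        "apikey": apikey,
    }
    query.update(params)  # for example sentiment=1 or maxRetrieve=3
    return ("https://gateway-a.watsonplatform.net/calls/url/URLGetCombinedData?"
            + urlencode(query))


print(build_combined_url(
    "http://www.cnbc.com/2016/05/16/buffetts-berkshire-hathaway-takes-new-stake-in-apple.html",
    ["keywords", "entities", "concepts"],
    "YOUR_API_KEY",
    sentiment=1,
    maxRetrieve=3,
))
```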

Using custom models for entity and relation extraction

If you have access to Watson Knowledge Studio, you can create custom models for entity and relation extraction and use them with AlchemyLanguage. Each model you create in Watson Knowledge Studio is assigned a unique model identifier. To use the model in AlchemyLanguage, pass the query parameter model={your_model_id} when using text, html, or url variations of the following endpoints:

  • GetTypedRelations
  • GetRankedNamedEntities

You can also specify publicly available default models with the model parameter. The following request uses the public English news model (en-news) to extract entities from a web page:

curl "https://gateway-a.watsonplatform.net/calls/url/URLGetRankedNamedEntities?model=en-news&url=www.ibm.com/us-en&outputMode=json&apikey={apikey}"

Output:


  {
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "http://www.ibm.com/us-en/",
    "language": "english",
    "entities": [
      {
        "count": 4,
        "text": "IBM Cloud\nSee",
        "type": "Organization"
      },
      {
        "count": 2,
        "text": "Twitter \t\t\t\t\t\t\t\t\t\t\t\tJoin us",
        "type": "Organization"
      },
      {
        "count": 1,
        "text": "leaders",
        "type": "Person"
      },
      {
        "count": 1,
        "text": "people",
        "type": "Person"
      },
      {
        "count": 1,
        "text": "public",
        "type": "Person"
      },
      {
        "count": 1,
        "text": "cancer",
        "type": "HealthCondition"
      },
      {
        "count": 3,
        "text": "patients",
        "type": "Person"
      },
      {
        "count": 1,
        "disambiguated": {
          "dbpedia": "http://dbpedia.org/resource/Business",
          "freebase": "http://rdf.freebase.com/ns/m.09s1f",
          "name": "Business",
          "opencyc": "http://sw.opencyc.org/concept/Mx4rvVjQNpwpEbGdrcN5Y29ycA",
          "subType": [
            "Location",
            "FieldOfStudy",
            "OrganizationSector",
            "AwardDiscipline",
            "Profession"
          ]
        },
        "text": "businesses",
        "type": "Organization"
      },
      {
        "count": 1,
        "text": "first",
        "type": "Ordinal"
      },
      {
        "count": 1,
        "text": "2016",
        "type": "Date"
      },
      {
        "count": 1,
        "text": "one",
        "type": "Cardinal"
      },
      {
        "count": 1,
        "text": "respond",
        "type": "EventCommunication"
      },
      {
        "count": 1,
        "text": "Linkedin \t\t\t\t\t\t\t\t\t\t\t\tVisit our Facebook",
        "type": "Organization"
      },
      {
        "count": 1,
        "text": "today",
        "type": "Date"
      },
      {
        "count": 7,
        "text": "your",
        "type": "Person"
      },
      {
        "count": 3,
        "text": "us",
        "type": "Person"
      },
      {
        "count": 1,
        "text": "2.7x\nSoftLayer",
        "type": "Measure"
      },
      {
        "count": 1,
        "text": "YouTube",
        "type": "Product"
      },
      {
        "count": 1,
        "text": "Apache Spark",
        "type": "Organization"
      },
      {
        "count": 1,
        "text": "Watson for Genomics",
        "type": "Organization"
      },
      {
        "count": 2,
        "text": "Ajay Royyuru",
        "type": "Person"
      },
      {
        "count": 1,
        "text": "Java",
        "type": "Organization"
      },
      {
        "count": 1,
        "text": "Grand Slam tennis",
        "type": "SportingEvent"
      },
      {
        "count": 2,
        "text": "Watson APIs\nOur",
        "type": "Organization"
      }
    ]
  }

Note that in this example, AlchemyLanguage returns disambiguated entity information for some results. When the detected entity corresponds to an entity in the default model, AlchemyLanguage can return linked data information with the result.
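
Several text values in this output contain tab and newline characters from the page markup. The following Python sketch shows one way to normalize that whitespace and to separate the entities that carry linked data from those that do not. The helper name split_by_linked_data is hypothetical, and the sample is a shortened selection of the entities shown above.

```python
# A shortened selection of the entities shown above (fewer tabs in the text).
SAMPLE_ENTITIES = [
    {"count": 2, "text": "Twitter \t\t\tJoin us", "type": "Organization"},
    {"count": 4, "text": "IBM Cloud\nSee", "type": "Organization"},
    {
        "count": 1,
        "disambiguated": {
            "dbpedia": "http://dbpedia.org/resource/Business",
            "name": "Business",
        },
        "text": "businesses",
        "type": "Organization",
    },
]


def split_by_linked_data(entities):
    """Return (linked, unlinked) lists with whitespace in text collapsed."""
    linked, unlinked = [], []
    for ent in entities:
        # Collapse runs of tabs, newlines, and spaces into single spaces.
        cleaned = dict(ent, text=" ".join(ent["text"].split()))
        if "disambiguated" in ent:
            linked.append(cleaned)
        else:
            unlinked.append(cleaned)
    return linked, unlinked


linked, unlinked = split_by_linked_data(SAMPLE_ENTITIES)
print([ent["text"] for ent in linked])
print([ent["text"] for ent in unlinked])
```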