In my research, I plan to use a web-based content mining methodology to obtain the frequency of keyword occurrence on target websites, and then combine the results with factor analysis.
This approach requires a tool for automated website searches, the Inventive Firms API (IFAPI, built on Google's SOAP API), which was introduced in the following article:
- Diana Hicks, Dirk Libaers, Alan Porter, and David Schoeneck. 2006. Identification of the Technology Commercialization Strategies of High-tech Small Firms. No. 289.
http://www.sba.gov/advo/research/rs289tot.pdf
As described in this article (pages 7-9), the basic process of using IFAPI to obtain the frequency of keyword occurrence on the target websites is as follows:
- Locate targeted web addresses;
- Identify certain keywords;
- Use IFAPI to search for each keyword on each target website, and obtain the hit count for each term on each firm's website as well as the total number of pages on that website (so that hit counts can be normalized by website size).
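To make the normalization step concrete, here is a minimal sketch in Python of the three steps above. It is not IFAPI's actual code: the two lookup functions `count_hits` and `count_pages` are hypothetical placeholders for whatever search tool ends up being used, and the site name and numbers in the stub are made up purely for illustration.

```python
from typing import Callable, Dict, List


def normalized_keyword_frequencies(
    sites: List[str],
    keywords: List[str],
    count_hits: Callable[[str, str], int],   # hypothetical: hits for a keyword on a site
    count_pages: Callable[[str], int],       # hypothetical: total pages on a site
) -> Dict[str, Dict[str, float]]:
    """Return hits / total pages for every (site, keyword) pair."""
    table: Dict[str, Dict[str, float]] = {}
    for site in sites:
        total_pages = count_pages(site)
        row: Dict[str, float] = {}
        for keyword in keywords:
            hits = count_hits(site, keyword)
            # Guard against sites for which no pages were found.
            row[keyword] = hits / total_pages if total_pages > 0 else 0.0
        table[site] = row
    return table


# Stub data standing in for a real search tool (illustrative numbers only).
FAKE_HITS = {
    ("example-firm.com", "patent"): 12,
    ("example-firm.com", "license"): 3,
}
FAKE_PAGES = {"example-firm.com": 60}


def fake_count_hits(site: str, keyword: str) -> int:
    return FAKE_HITS.get((site, keyword), 0)


def fake_count_pages(site: str) -> int:
    return FAKE_PAGES.get(site, 0)


result = normalized_keyword_frequencies(
    ["example-firm.com"], ["patent", "license"], fake_count_hits, fake_count_pages
)
print(result)
```

The resulting site-by-keyword table is the kind of matrix that could then be fed into the factor analysis; only the two lookup functions would need to be swapped for a real batch search tool.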
However, two obstacles currently prevent me from using IFAPI:
1) Google has enacted a new policy: "As of December 5, 2006, we are no longer issuing new API keys for the SOAP Search API. Developers with existing SOAP Search API keys will not be affected." (http://code.google.com/apis/soapsearch/)
2) There is no plan to port IFAPI to Google's new AJAX Search API in the foreseeable future.
Therefore, in order to continue the research, I seem to have two options:
- Find another search tool with similar functions;
- Or run the searches one by one in Google by hand, which would be a tremendous workload.
So, could you please tell me:
- Is there a similar web-based content search and mining tool that can run automated batch searches?
- Does the IBM Unstructured Information Modeler have similar functions?
Thanks in advance.