Peggy Zagelow - a view from the lab
Too many blog posts start with "It's been a while since I've posted here", so I'm gonna skip that part and pretend that we've been talking regularly all along, OK?
Firstly, yes, I'll be at IOD this year! I've missed a couple due to being completely heads-down on helping get this cool new product out, so it will be great on a personal and professional networking level to reconnect with many of you. I'm there to help represent an IBM product called IBM Content Analytics, affectionately known as "ICA". It's part of our Enterprise Content Management suite. ICA integrates very well in an enterprise, but at its core it is a standalone product - no prereqs. I hope we get a chance to chat about it if we run into each other at IOD; I promise to keep it fun and friendly. :-)
The value proposition for ICA is one of unlocking insight with business value from your text content. Of course there is amazing technical capability behind it, and that's where my time is spent. Remember those old ads for the Evelyn Wood speed reading courses (kinda funny that it's still around, huh!)? Conceptually, ICA knows how to do that: it can just go and "read" anything and everything you've got, and then tell you what the important things are across the whole smash of documents. You aren't searching; you're being presented the hot terms, words, and phrases, including words that come from a list you provide, and how they are trending over time. You get to choose the date for analyzing trends - received date? incident date? date extracted from text? hire date? birth date? -- get the picture? ICA also shows you how other terms, phrases, and words correlate to what you've already identified. Right out of the box, ICA will show you insight -- and it connects to all different enterprise sources. We've got a great suite of built-in capability for text analytics and intuitive visualizations for your insights. And on top of that, ICA is completely customizable, too - make it find the insights and concepts you are interested in, display them how you need to, get instant results for a document, export analytics results to your favorite reporting tools, and oh, so much more!
My technical lead focus in 2010 has been on the ICA tooling and customization capabilities - how to add new custom text analytics, how to use the LanguageWare Resource Workbench to customize ICA, and how to make sure the sources of text are represented the way you want them. In that vein, I led a team of brilliant, capable students in an Extreme Blue project this summer who really helped us show how you can unlock the value in your data using this platform. And I've been having a lot of geeky fun designing and giving demos to lots of different clients.
Come learn more if you'll be at IOD. It's easy to search this ECM roadmap for Content Analytics sessions, and that's where you'll find me hanging around, plus I'll be at the Expo pedestals. Or let's catch up over a beverage - it's been too long!
Also among the 'recommended practices' that I often present on DB2 for z/OS stored procedures is this one:
Many invocations of DB2 for z/OS stored procedures come from a Java(TM) or a CLI application. These programs access DB2 for z/OS through a "driver" program. The drivers have SQL packages bound to DB2 for z/OS, and when the application invokes a stored procedure, a fair amount of code executes in the driver.
A CLI program (the term CLI is often used interchangeably with ODBC) is usually something running from a Microsoft(TM) application accessing DB2 for z/OS. The DB2 Connect software that includes the driver for DB2 for z/OS has some smarts in it: if the application is coded using incorrect data types for the stored procedure being invoked, the driver recovers by invoking the SQLPROCEDURECOLS metadata stored procedure on DB2 for z/OS to find out what the data types are, and then re-sends the stored procedure call to DB2 for z/OS. Yes, you got it right: this means that a poorly coded application can drive three stored procedure calls for every SQL CALL it's trying to do -- one to the original SP, one to SYSIBM.SQLPROCEDURECOLS, and then one to the original SP again with the correct parm types!

How do you recognize this? Well, you could run a client-side DRDA trace, and it will show up there. Or you can look at statistics at the server. Or you can set DESCRIBEPARAM=0 in the db2cli.ini file on the client and let the applications get the error SQLCODE -301, because now the driver won't make the metadata SP call and instead will let the application fail for using the wrong data type. You get the same result if you issue a -STOP PROCEDURE (SYSIBM.SQLPROCEDURECOLS) ACTION(REJECT) command on the DB2 for z/OS server.
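For reference, the client-side setting goes in the data source section of db2cli.ini; the [MYDB] section name below is just a placeholder for your own data source alias:

    [MYDB]
    DESCRIBEPARAM=0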
For a Java(TM) program, the current driver is the DB2 Universal JDBC Driver, and it will not invoke the metadata stored procedure. That's an excellent reason to switch to the current driver: the older version of the driver went through the CLI code path and had the same problem described above.
Note that if you invoke a stored procedure from the command line (the CLP), that code will always invoke the SQLPROCEDURECOLS stored procedure, since the command line doesn't convey the data types of the arguments.
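For example, a CLP invocation like the one below passes only literal text, so there's nothing to tell the driver what the parameter types are (MYSCHEMA.UPDATE_SALARY and its arguments are invented for illustration):

    db2 "CALL MYSCHEMA.UPDATE_SALARY('000010', 1.05)"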
Now, if you are stuck with a CLI program that you can't modify, what can you do to improve the performance of SQLPROCEDURECOLS? Well, APAR PK57017 just shipped, which reduces the size of the package for this stored procedure, so you can free up some EDM pool usage and get a small CPU usage improvement. You can also be sure you run RUNSTATS so that the data access for this SP is as efficient as it can be. I have also heard rumors of some customers creating additional indexes on the tables used by SQLPROCEDURECOLS, but I don't have any specifics on that, sorry.
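As a sketch, a RUNSTATS for the catalog objects behind SQLPROCEDURECOLS might look like the statement below. I'm assuming here that the relevant catalog tables (SYSIBM.SYSROUTINES and SYSIBM.SYSPARMS) live in table space DSNDB06.SYSOBJ; verify that against your own catalog before running it:

    RUNSTATS TABLESPACE DSNDB06.SYSOBJ TABLE(ALL) INDEX(ALL)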
When I describe native SQL procedures in DB2 9 for z/OS, I often hear variations of these types of questions:
In order to answer this, I have to explain a little bit about how DB2 handles native SQL procedures. They are simply packages, with "runtime structures" for the SQL statements to be executed. So, when you invoke a native SQL procedure, DB2 finds and loads the package and executes the statements.
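For reference, here's a minimal sketch of a native SQL procedure (the table and all the names are invented). Notice there's no EXTERNAL clause and no WLM environment in the definition; DB2 turns the body itself into the package with its runtime structures:

    CREATE PROCEDURE UPDATE_SALARY
        (IN P_EMPNO CHAR(6), IN P_RATE DECIMAL(5,2))
      LANGUAGE SQL
      BEGIN
        -- the body is compiled into a package; no external load module
        UPDATE EMP
           SET SALARY = SALARY * P_RATE
         WHERE EMPNO = P_EMPNO;
      END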
In contrast, an external stored procedure with SQL needs a complete language environment for the user program, and then that external program comes back to DBM1 to get its package loaded and SQL statements executed. That's what needs to be "throttled" - the external program execution environments and their associated TCBs. When an incoming stored procedure request is queued for WLM, the DB2 thread is suspended in DBM1. Many customers have experienced delays and DBM1 storage problems when their WLM goals weren't adjusted properly and the queued requests built up. The solution is to either adjust the WLM goals, or else adjust the limit on DB2 threads (local and/or distributed).
With native SQL procedures, the thread will just switch packages when the call statement is processed and run the procedure - no queuing. The storage used for the local variables is above the bar and managed with efficient algorithms. The maximum concurrent first-level native SQL procedures is effectively the same as your setting for maximum DB2 threads. (What I mean by first-level is that a native SQL procedure may have a nested call to another native SQL procedure, so the actual number of concurrent native SQL procedures may be even higher).
So, I guess the way I'd answer the questions is:
Of course I recommend that you test your native SQL procedures in your environment and measure for yourself, and do capacity planning based on the results of your testing. Native SQL procedures will use some DBM1 storage, after all, and how much depends on what statements and what variables are used in the program.
I've received a few questions about text search in DB2 9 for z/OS lately, so I thought I'd share the basic information here.
Prior to DB2 9, there was an offering called "DB2 Text Extender". This was an early attempt at text indexing that runs on z/OS, and it depends on some z/OS code called "Text Search". Later on, the team that worked on the extenders also released something called "DB2 Net Search Extender", aka NSE. DB2 for Linux, UNIX, and Windows got a significant upgrade with NSE, but that same upgrade was not shipped for DB2 for z/OS. So DB2 for z/OS customers have not had a significant upgrade to text search since "DB2 Text Extender" in DB2 V7.
In DB2 9, there is a completely new text search solution. Text search is provided by a built-in engine function called CONTAINS. This solution requires an external text search server that runs on a Windows or Linux operating system, which is provided as part of the DB2 Accessories Suite. It is not a direct replacement of Text Extender function, so applications and administration policies need to be updated for this change. The best source for detailed information is the information center topic Administering IBM text search for DB2 for z/OS. And I have an earlier blog entry about the announcement.
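As a quick sketch of what a CONTAINS query looks like (the table, column, and search term here are invented, and a text search index on the column, built through the external search server, has to exist first):

    SELECT CLAIM_ID, DESCRIPTION
      FROM CLAIMS
     WHERE CONTAINS(DESCRIPTION, 'water damage') = 1;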
What about the DB2 Text Extender? It's not available in DB2 9 for z/OS. That means you won't see an equivalent of the DB2 V8 FMID JDB881C. And you won't need the IBM Text Search FMID HIMN230 in DB2 9 for z/OS either, because that was a prerequisite for the DB2 Text Extender.
The above is all specific to DB2 for z/OS. DB2 for LUW 9.5 FP1 ships the same OmniFind text search server, and DB2 LUW still ships support for its NSE as well.
Now that we've got that all cleared up :-), next we'll evaluate some use cases for text search applicability.
Among the 'recommended practices' that I often present on DB2 for z/OS stored procedures is this one:
Let me explain why I recommend this. It's actually at the bottom of the list, because it doesn't come up that often. But it has come up, and when it does, it can cost you in I/O. DB2 has a Language Environment table of load modules in each stored procedure address space. For stored procedures defined STAY RESIDENT YES, we only have room for 512 load modules in that table. A load module has to be in the table in order for DB2 to invoke it. So, starting with the 512th, we'll delete it from the table after we call it, even if it's STAY RESIDENT YES. And come to think of it, we have separate tables for TYPE MAIN and TYPE SUB.
So to be completely accurate, the recommendation could actually say something like this:
For that last bit, remember that different invokers of a stored procedure that end up classified in different WLM enclaves will not have their SPs run in the same instance of a WLM-SPAS.
What's a WLM-SPAS? It's what I use to abbreviate a "WLM-established stored procedures address space".
And this post has motivated me to get a more recent copy of my stored procedures recommended practices presentation out online!
Even though I don't work daily on DB2 any more, I thought this might be a helpful post.
Last night I was online, shopping coincidentally on Cyber Monday, for presents for both my Dad's birthday and my nephew's birthday.
A work colleague happened to catch me online to ask some questions. Now, these answers are not from a book, just what I said right out of my head -- JUST as I would if you and I had this conversation. Here is our Lotus Sametime conversation, verbatim, no editing. Updated: one minor edit is marked -- OK, and I also put in the html tags and made the requester anonymous, of course. So -- sure, I could have said more about the virtues of data sharing and WLM. They are many! Forgive me, it was nighttime and I wasn't in a "marketing" mood.
In any case, please enjoy the truthiness of this exchange.
Dec 1, 2008
As many of you know, I changed jobs early in 2008 to switch my developer focus from DB2 for z/OS over to unstructured text technologies. Since many DB2 folks are heads-down in structured data, that whole "content" side is a bit of a mystery. Sure, you know about search, and you likely have a vague understanding of what it means to have a text index supporting keyword search. But really, there's so much more...
I've been learning a ton about all of this, which is to be expected after over a year (!) in this job. I have some terrific colleagues in text analytics, in research, development and services, who continually amaze me with their breadth and depth of knowledge, as well as their passion for the topic and their eagerness to help customers.
I happen to love linguistics, so this job is a great fit for me. I love to read, and the turn of a phrase in a book or a song lyric brings me joy. I like to think about the best way to phrase things and ways to interpret sentences. The more I interact with non-native English speakers, the more I appreciate both the beauty and the limitations of language, and the inherent difficulties in both generating and understanding sentences. I truly enjoyed all my study of French in school, too. It has always seemed like intellectual snobbery to say something like "the French have a word for that", but anyone who knows more than one language knows it to be true -- translation is never exact, and concepts cannot always be expressed well, even in one's native language. Bottom line: it's so interesting for me to dig into and help shape the technology and rules around extracting meaning from unstructured text.
Last month I was talking with a long-time friend and colleague who was here with her company at the IBM Silicon Valley Lab for a technology briefing. She and I have had several conversations at conferences over the years on topics like O-O databases, Java, and XML as they were emerging toward the mainstream. In the briefing, we talked a bit about unstructured data in the context of the Information Agenda, and one of their company's thought leaders said that unstructured data inclusion is implied. Cool, but, um, how exactly? Their (very reasonable) response when I probed a little further was that they needed to hook up with the business guys on that. YES! That's where I think we all absolutely have to start -- what is the unstructured data, and what questions do you need answered from it for business value? Then we go into more of the logistics around that.
Specifically, just this week I've worked on a couple of items that can help me bring some meaning to what text analytics is all about for folks who haven't been exposed to it deeply. The first was working on a report for a well-known analyst group, where we describe our information access technologies and offerings for unstructured data. And the second is a new offering that I hope will become available soon, to help quantify needs and specific business value that can be derived from unstructured data. If you are curious about any specifics, a great place to start digging into and even playing with some text analytics technology is the LanguageWare capabilities, SystemT, and UIMA.
If you're interested in this kind of stuff, please let me know, or contact your local IBM rep and ask them (and tell them to ask me if they want a starting contact!). I'm passionate and eager to help! :-)
I often get asked about an architecture where a stored procedure is used for a single SQL statement. This is one of the most common errors in designs using stored procedures. It's always best to amortize the CALL overhead by including several SQL statements and even some business logic in every stored procedure. But folks who come to DB2 for z/OS from other DBMSs still do this.
A new twist on this comes with our DB2 9 for z/OS support for native SQL procedures. Folks ask me: well, now is it OK to have a single SQL statement in a native SQL procedure?
The thing is, the native SQL procedure is still a package for DB2 to load or at least switch to. So there really still is overhead, not to mention the network time to get over to DB2, as most apps will have to invoke several stored procedures to accomplish their logic. And guess what? Those applications typically invoke the same stored procedures in the same order with the same application logic in between. So... why not make that whole set a single stored procedure?
Bottom line: even with native SQL procedures executing in the DB2 engine rather than the WLM-managed address space, I still recommend an architecture with multiple SQL statements and some business logic rather than a single SQL statement per procedure.
If you do have an architecture with single SQL statements in stored procedures, that's not to say it won't work - it will work, just not as efficiently as it would if you follow the above advice. And even if the single-threaded app performs OK, it won't scale as well as having more SQL statements and logic in the stored procedure, due to locks held across network round trips as well as some database engine serialization required for external SQL procedures (that part, at least, is gone in DB2 9).
SQL procedures are absolutely strategic, and I do recommend an architecture based on them. The advice I've given other customers is to determine the ones that are invoked most often, analyze the patterns of when they are invoked together from the same application(s), and work to at least combine those into a single stored procedure with several SQL statements and some logic, as in the sketch below. If you implement SQL procedures on V8, the switch to DB2 9 native SQL procedures is a snap - drop and create and go, with no change to the code.
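Here's a sketch of what that combining looks like (all the names here are invented). Instead of separate CALLs from the application with logic in between, one native SQL procedure carries the statements and the logic:

    CREATE PROCEDURE PROCESS_ORDER
        (IN P_ORDERNO INTEGER, OUT P_STATUS VARCHAR(10))
      LANGUAGE SQL
      BEGIN
        DECLARE V_QTY INTEGER DEFAULT 0;
        DECLARE V_ITEMNO INTEGER DEFAULT 0;
        -- several SQL statements plus the business logic that used to
        -- live in the application, all behind one CALL
        SELECT QUANTITY, ITEMNO INTO V_QTY, V_ITEMNO
          FROM ORDERS WHERE ORDERNO = P_ORDERNO;
        IF V_QTY > 0 THEN
          UPDATE INVENTORY SET ONHAND = ONHAND - V_QTY
            WHERE ITEMNO = V_ITEMNO;
          SET P_STATUS = 'SHIPPED';
        ELSE
          SET P_STATUS = 'BACKORDER';
        END IF;
      END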
So, go, create, reuse, and be happy. :-)
I was pointed to this interesting article from the New York Times, about a new technology invented by two software engineers, Jonathan Lindo and Jeffrey Daudel, that can "replay" the events that led up to a system crash. Not that I really want to see my "blue screen of death" from yesterday again, but if it would help identify the problem and get a fix, I could probably live through it a couple more times.
Reading the article, I was struck by a couple of points. They quote Lindo as saying that the inspiration came to them as "Wouldn't it be great if we could just TiVo this and replay it?" And then it says this:
Innovation by analogy is a powerful concept, says Giovanni Gavetti, an associate professor at the Harvard Business School who, with his colleague Jan W. Rivkin, has published research on how businesses can use analogic reasoning as a strategic tool. Human beings are analogy machines, he notes, dealing with new information by comparing it to things they already know something about.
That's true, I often try out analogies when I'm trying to understand or explain something. And I can really see how that could lead to innovations, as well as to some odd product evolutions. For a consumer example, I love how the iPhone lets me listen to my voicemail messages in any order, instead of sequentially, which must have been a leftover paradigm from when messages were stored on an analog tape. I can picture someone saying - "why can't I access my messages like I read my email?" - and voila - innovation.
Then I started wondering just how much you could tinker with the crash replay. Could you start eliminating concurrently-running applications, for example, to see if any of them contributed to the crash? And could you test a fix with the replay to see if it fixes the crash?
I also wonder whether IBM's customers would voluntarily seek out software like this to help them narrow down problems. It's not from IBM, and I really don't know any more about it than is in the article above. It's from a company called Replay Solutions, and it runs on several versions of the Microsoft Windows operating system. So, no mainframe support yet (grin). But you could ask them about it!
I heard an interesting story on the news last week, about how the individual states of the U.S. were graded on how they use information. The state I live in, California, got a C+. How can this be, with our advanced technology centers in Silicon Valley?
I found the article online here, and noticed some interesting things, although nothing specific about California.
The article says:
Well, that sounds a lot like stuff that I have talked about when describing IBM's Information on Demand strategy. Is your organization good at doing this? I particularly noted the last point in the article, because some of the states complain that their legislators just aren't interested in using the data! Maybe we information professionals have to make that easy (and fun?) to do.
What about the highest-graded states? The article had this to say about one of them:
Yep... this is what everyone wants to know. What did we say we'd do? Did it make a difference? In fact, I've been trying to get this type of information from my financial analyst for some time!
What about states that were graded worse than California?
OK, well, perhaps that is an example of too much information! :-)