Thursday, December 12, 2013

Creating Datastage Routines for hash files


I created some datastage routines to help manipulate hash files more easily. Instead of trying to work with the Admin console, I created reusable routines that will:

  • Count rows in a hash file
  • Remove rows from a hash file
  • Select rows from a hash file
I'm not exactly an expert, so my code is fairly simple and straightforward.

To create a routine, go up to File, and select New and Server Routine.

Give the routine a name and description, and choose the type you want it to be.



Most of my routines are 'Transform Function' type, because I use them within Transformers. They take data as arguments, and return results.

Next, you add what arguments your function will be using (if any). Then your code. Compile it. Test it. Save it. Done.

 Below is the code I use for my different hash file routines:


1. Count rows in a hashed file

I use one argument, the name of the hashed file I want to count.




The Code:


**This creates a pointer called myhash that points to your hashfile (Arg1). I hard code the hashfile location.
Command1 = "SETFILE C:\IBM\InformationServer\Server\Projects\<YourProjectName>\<YourHashDir>\":Arg1:" myhash"
Call DSExecute('TCL', Command1, Output, ReturnCode)

ErrorCode = ReturnCode ;* set this to non-zero to stop the stage/job


**Next you simply execute the COUNT command on your hashfile pointer. The result gets returned in Ans.

Command2 = "COUNT myhash"
Call DSExecute('TCL', Command2, Output, ReturnCode)
** The Ans output has crazy formatting because it returns as '*38454 records counted.*' And I want it to return 38454.
Ans = (Output [' ',1,1]) [2,10]

ErrorCode = ReturnCode ;* set this to non-zero to stop the stage/job

** Remove Pointer when done **
Command3 = "DELETE VOC myhash"
Call DSExecute('TCL', Command3, Output, ReturnCode)

ErrorCode = ReturnCode ;* set this to non-zero to stop the stage/job

*******


Here is an example of how I then use the routine in a Transform.

The routine is called HSUCountHashRows().  I use a job parameter (vHASH) to capture the hashed file name to be counted.  The name gets passed to the routine, as the argument.  I add the "+ 0" to convert the answer to an integer.  The transform then returns the file name and the resulting count.


2. Delete Hashed row by Primary key

This routine takes 3 arguments. The hashed file name, the column name that holds the primary key, and the primary key itself.

The Code

** Set file Pointer **
**This creates a pointer called myhash that points to your hashfile (Arg1). I hard code the hashfile location.
Command1 = "SETFILE C:\IBM\InformationServer\Server\Projects\<YourProjectName>\<YourHashDir>\":Arg1:" myhash"Call DSExecute('TCL', Command1, Output, ReturnCode)

Command2 = "DELETE FROM myhash WHERE ":Arg2:" = '":Arg3:"'"
Call DSExecute('TCL', Command2, Output, ReturnCode)
Ans = Output

ErrorCode = ReturnCode ;* set this to non-zero to stop the stage/job

*******


3. Delete Hashed row by Date

This routine deletes rows within a certain date range. I normally use this when I need to redo a load for the day. I delete from the hashed file those records with a load date (CREATE_EW_DTTM) of today.


The Code

**This creates a pointer called myhash that points to your hashfile (Arg1). I hard code the hashfile location.
Command1 = "SETFILE C:\IBM\InformationServer\Server\Projects\<YourProjectName>\<YourHashDir>\":Arg1:" myhash"Call DSExecute('TCL', Command1, Output, ReturnCode)

Command2 = "DELETE FROM myhash WHERE CREATE_EW_DTTM > '":Arg2:"' AND CREATE_EW_DTTM < '":Arg3:"'"
Call DSExecute('TCL', Command2, Output, ReturnCode)
Ans = Output

** Remove Pointer when done **
Command3 = "DELETE VOC myhash"
Call DSExecute('TCL', Command3, Output, ReturnCode)


4. Select a Row with a given Primary Key

My hashed files all use the same column for the primary key, so I didn't need to pull in the column name as an argument.

The Code

**This creates a pointer called myhash that points to your hashfile (Arg1). I hard code the hashfile location.
Command1 = "SETFILE C:\IBM\InformationServer\Server\Projects\<YourProjectName>\<YourHashDir>\":Arg1:" myhash"

Call DSExecute('TCL', Command1, Output, ReturnCode)

Command2 = "SELECT * FROM myhash WHERE PRIM_KEY = '":Arg2:"'"
Call DSExecute('TCL', Command2, Output, ReturnCode)
Ans = Output

** Remove Pointer when done **
Command3 = "DELETE VOC myhash"
Call DSExecute('TCL', Command3, Output, ReturnCode)

ErrorCode = ReturnCode ;* set this to non-zero to stop the stage/job


***

Hope this has been helpful. Happy coding!



No comments:

Post a Comment