TP2700
Introduction to Talend Training
This 3-day course provides an introduction to Talend.
Course Details
Duration
3 days
Prerequisites
- Experience with a programming language such as Java
- Familiarity with relational databases (RDBMS) and SQL is required
Target Audience
- Developers
- Data Engineers
- Integration Engineers
- Architects
- Data Stewards
Skills Gained
- Build Talend jobs that read, validate, map, and write data
- Manage metadata, schemas, and context variables across jobs
- Integrate databases, files, XML, queues, web services, and Big Data platforms such as HDFS, Hive, Spark, and Kafka
- Debug, test, deploy, and schedule Talend jobs
Course Outline
OVERVIEW (Theory)
- Introduction to Talend
- Why Talend?
- Talend vs. other tools
- Logical Architecture
- More on Data Integration Aspects
- Talend Big Data Integration
- Talend Open Studio Walkthrough
- Key components in the Palette
- Conclusion
INTRODUCTION AND GENERAL PRINCIPLES
- Before you begin
- Installing the software
- Enabling tHashInput and tHashOutput
METADATA AND SCHEMAS
- Introduction
- Hand-cranking a built-in schema
- Propagating schema changes
- Creating a generic schema from the existing metadata
- Cutting and pasting schema information
- Dropping schemas to empty components
- Creating schemas from lists
VALIDATING DATA
- Introduction
- Enabling and disabling reject flows
- Gathering all rejects prior to killing a job
- Validating against the schema
- Rejecting rows using tMap
- Checking a column against a list of allowed values
- Checking a column against a lookup
- Creating validation rules for more complex requirements
- Creating binary error codes to store multiple test results
MAPPING DATA
- Introduction
- Simple mapping and tMap time savers
- Creating tMap expressions
- Using the ternary operator for conditional logic
- Using intermediate variables in tMap
- Filtering input rows
- Splitting an input row into multiple outputs based on input conditions
- Joining data using tMap
- Hierarchical joins using tMap
- Using reload at each row to process real-time or near-real-time data
USING JAVA IN TALEND
- Introduction
- Performing one-off pieces of logic using tJava
- Setting the context and globalMap variables using tJava
- Adding complex logic into a flow using tJavaRow
- Creating pseudo components using tJavaFlex
- Creating custom functions using code routines
- Importing JAR files to allow use of external Java classes
MANAGING CONTEXT VARIABLES
- Introduction
- Creating a context group
- Adding a context group to your job
- Adding contexts to a context group
- Using tContextLoad to load contexts
- Using implicit context loading to load contexts
- Turning implicit context loading on and off in a job
- Setting the context file location in the operating system
WORKING WITH DATABASES
- Introduction
- Setting up a database connection
- Importing the table schemas
- Reading from database tables
- Using context and globalMap variables in SQL queries
- Printing your input query
- Writing to a database table
- Printing your output query
- Managing database sessions
- Passing a session to a child job
- Selecting different fields and keys for insert, update, and delete
- Capturing individual rejects and errors
- Database and table management
- Managing surrogate keys for parent and child tables
- Rewritable lookups using an in-process database
MANAGING FILES
- Introduction
- Appending records to a file
- Reading rows using a regular expression
- Using temporary files
- Storing intermediate data in memory using tHashMap
- Reading headers and trailers using tMap
- Reading headers and trailers with no identifiers
- Using the information in the header and trailer
- Adding a header and trailer to a file
- Moving, copying, renaming, and deleting files and folders
- Capturing file information
- Processing multiple files at once
- Processing control/validation files
- Creating and writing files depending on the input data
WORKING WITH XML, QUEUES, AND WEB SERVICES
- Introduction
- Using tXMLMap to read XML
- Using tXMLMap to create an XML document
- Reading complex hierarchical XML
- Writing complex XML
- Calling a SOAP web service
- Calling a RESTful web service
- Reading and writing to a queue
- Ensuring lossless queues using sessions
DEBUGGING, LOGGING, AND TESTING
- Introduction
- Finding the location of compilation errors using the Problems tab
- Locating execution errors from the console output
- Using the Talend debug mode – row-by-row execution
- Using the Java debugger to debug Talend jobs
- Using tLogRow to show data in a row
- Using tJavaRow to display row information
- Using tJava to display status messages and variables
- Printing out the context
- Dumping the console output to a file from within a job
- Creating simple test data using tRowGenerator
- Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
- Creating random test data using lookups
- Creating test data using Excel
- Testing logic – the most-used pattern
- Killing a job from within tJavaRow
DEPLOYING AND SCHEDULING TALEND CODE
- Introduction
- Creating compiled executables
- Using a different context
- Adding command-line context parameters
- Managing job dependencies
- Capturing and acting on different return codes
- Returning codes from a child job without tDie
- Passing parameters to a child job
- Executing non-Talend objects and operating system commands
COMMON MISTAKES AND OTHER USEFUL HINTS AND TIPS
- Introduction
- My tab is missing
- Finding the code routine
- Finding a new context variable
- Missing global variables when using reload at each row
- Dragging component globalMap variables
- Some complex date formats
- Capturing tMap rejects
- Adding job name, project name, and other job-specific information
- Printing tMap variables
- Preventing memory errors in Talend
SOFTWARE DEVELOPMENT LIFECYCLE (Theory and Hands-on)
- Working with Git and Talend
- Performing CI/CD with Jenkins and Talend
- Job Monitoring using Resource Manager UI
- Unit Testing
- Best Practices
- Joblets
- Parallelization
- Reusing Jobs (Child Jobs)
- Context Variables
- Repository
GETTING STARTED WITH A BASIC BIG DATA JOB
- Creating a Job
- Adding components to the Job
- Connecting the components together
- Configuring the components
- Executing the Job
- Various types of Big Data Jobs
- Pig Workflow
- Reading and Writing to Hive on Hadoop
- Working with HDFS
- Transferring data with Sqoop
- Using Spark in Talend
- Working with Kafka
CARPARTS PROJECT
- Creating a Spark Batch Job
- Use cases
- Scenario: Carparts_demoprep
- Scenario: Carparts_ETL
- Scenario: Carparts01_Spark
- Scenario: LoadCarPartsinHDFS