TP2700

Introduction to Talend Training

This 3-day course provides an introduction to Talend.

Course Details

Duration

3 days

Prerequisites

  • Experience with a programming language such as Java
  • Familiarity with RDBMS concepts and SQL is required

Target Audience

  • Developers
  • Data Engineers
  • Integration Engineers
  • Architects
  • Data Stewards

Skills Gained

  • Identify and define clear business goals and objectives for machine learning projects
  • Create a comprehensive requirements document that aligns business requirements with technical specifications
  • Define measurable acceptance criteria for machine learning projects
  • Communicate the requirements and acceptance criteria to stakeholders in a clear and concise manner

Course Outline

OVERVIEW (Theory)

  • Introduction to Talend
  • Why Talend?
  • Talend vs Other tools
  • Logical Architecture
  • More on Data Integration Aspects
  • Talend Big Data Integration
  • Talend Open Studio Walkthrough
  • Key components in Palette
  • Conclusion

INTRODUCTION AND GENERAL PRINCIPLES

  • Before you begin
  • Installing the software
  • Enabling tHashInput and tHashOutput

METADATA AND SCHEMAS

  • Introduction
  • Hand-cranking a built-in schema
  • Propagating schema changes
  • Creating a generic schema from the existing metadata
  • Cutting and pasting schema information
  • Dropping schemas to empty components
  • Creating schemas from lists

VALIDATING DATA

  • Introduction
  • Enabling and disabling reject flows
  • Gathering all rejects prior to killing a job
  • Validating against the schema
  • Rejecting rows using tMap
  • Checking a column against a list of allowed values
  • Checking a column against a lookup
  • Creating validation rules for more complex requirements
  • Creating binary error codes to store multiple test results

MAPPING DATA

  • Introduction
  • Simple mapping and tMap time savers
  • Creating tMap expressions
  • Using the ternary operator for conditional logic
  • Using intermediate variables in tMap
  • Filtering input rows
  • Splitting an input row into multiple outputs based on input conditions
  • Joining data using tMap
  • Hierarchical joins using tMap
  • Using reload at each row to process real-time / near real-time data

USING JAVA IN TALEND

  • Introduction
  • Performing one-off pieces of logic using tJava
  • Setting the context and globalMap variables using tJava
  • Adding complex logic into a flow using tJavaRow
  • Creating pseudo components using tJavaFlex
  • Creating custom functions using code routines
  • Importing JAR files to allow use of external Java classes

MANAGING CONTEXT VARIABLES

  • Introduction
  • Creating a context group
  • Adding a context group to your job
  • Adding contexts to a context group
  • Using tContextLoad to load contexts
  • Using implicit context loading to load contexts
  • Turning implicit context loading on and off in a job
  • Setting the context file location in the operating system

WORKING WITH DATABASES

  • Introduction
  • Setting up a database connection
  • Importing the table schemas
  • Reading from database tables
  • Using context and globalMap variables in SQL queries
  • Printing your input query
  • Writing to a database table
  • Printing your output query
  • Managing database sessions
  • Passing a session to a child job
  • Selecting different fields and keys for insert, update, and delete
  • Capturing individual rejects and errors
  • Database and table management
  • Managing surrogate keys for parent and child tables
  • Rewritable lookups using an in-process database

MANAGING FILES

  • Introduction
  • Appending records to a file
  • Reading rows using a regular expression
  • Using temporary files
  • Storing intermediate data in memory using tHashMap
  • Reading headers and trailers using tMap
  • Reading headers and trailers with no identifiers
  • Using the information in the header and trailer
  • Adding a header and trailer to a file
  • Moving, copying, renaming, and deleting files and folders
  • Capturing file information
  • Processing multiple files at once
  • Processing control/validation files
  • Creating and writing files depending on the input data

WORKING WITH XML, QUEUES, AND WEB SERVICES

  • Introduction
  • Using tXMLMap to read XML
  • Using tXMLMap to create an XML document
  • Reading complex hierarchical XML
  • Writing complex XML
  • Calling a SOAP web service
  • Calling a RESTful web service
  • Reading and writing to a queue
  • Ensuring lossless queues using sessions

DEBUGGING, LOGGING, AND TESTING

  • Introduction
  • Finding the location of compilation errors using the Problems tab
  • Locating execution errors from the console output
  • Using the Talend debug mode – row-by-row execution
  • Using the Java debugger to debug Talend jobs
  • Using tLogRow to show data in a row
  • Using tJavaRow to display row information
  • Using tJava to display status messages and variables
  • Printing out the context
  • Dumping the console output to a file from within a job
  • Creating simple test data using tRowGenerator
  • Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
  • Creating random test data using lookups
  • Creating test data using Excel
  • Testing logic – the most-used pattern
  • Killing a job from within tJavaRow

DEPLOYING AND SCHEDULING TALEND CODE

  • Introduction
  • Creating compiled executables
  • Using a different context
  • Adding command-line context parameters
  • Managing job dependencies
  • Capturing and acting on different return codes
  • Returning codes from a child job without tDie
  • Passing parameters to a child job
  • Executing non-Talend objects and operating system commands

COMMON MISTAKES AND OTHER USEFUL HINTS AND TIPS

  • Introduction
  • My tab is missing
  • Finding the code routine
  • Finding a new context variable
  • Missing global variables when using reload at each row
  • Dragging component globalMap variables
  • Some complex date formats
  • Capturing tMap rejects
  • Adding job name, project name, and other job specific information
  • Printing tMap variables
  • Stopping memory errors in Talend

Software Development Lifecycle (Theory) (Hands-on)

  • Working with Git and Talend
  • Performing CI/CD with Jenkins and Talend
  • Job Monitoring using Resource Manager UI
  • Unit Testing
  • Best Practices
    • Joblets
    • Parallelization
    • Reusing Jobs (Child Jobs)
    • Context Variables
    • Repository

Getting started with a basic Big Data Job

  • Creating a Job
  • Adding components to the Job
  • Connecting the components together
  • Configuring the components
  • Executing the Job
  • Various types of Big Data Jobs
    • Pig Workflow
    • Reading and Writing to Hive on Hadoop
    • Working with HDFS
    • Performing Sqoop
    • Using Spark in Talend
    • Kafka

CarParts Project

  • Creating a Spark Batch Job
  • Use cases
    • Scenario: Carparts_demoprep
    • Scenario: Carparts_ETL
    • Scenario: Carparts01_Spark
    • Scenario: LoadCarPartsinHDFS