Talend Data Integration

Introduction

Talend’s data integration solution helps companies manage growing system complexity by supporting ETL for both analytics and operational integration, and by offering industrialization features and extended monitoring capabilities. This page describes the components that make up the Talend Platform, the Logical Architecture, the Physical Architecture, and Best Practices & Guidelines based on our current support and guidelines.

Architecture

The Talend Data Integration product consists of Platform Components that are designed and integrated to provide the ETL functionality needed to serve many business requirements. The Logical Architecture gives a high-level view of the components that make up the platform, and the Physical Architecture provides guidelines for the various deployment environments.

Talend Platform

Oracle Java JDK – a Talend Pre-requisite

Server-side and client-side components all require an Oracle JDK (Java Development Kit) to run. Each component has its own specific pre-requisites, but the need for a JDK is consistent across all Talend components.

  • As of Talend v6, only JDK 8 is supported by Talend, because Oracle no longer supports Java 7. Exception: IBM AIX has only IBM Java 7 support, and only for specific Talend server components
  • Talend recommends that you use the same JDK version on all clients and servers
  • Compatible platforms / Java versions can be found in the product documentation.
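
The JDK requirement above can be checked on each client and server with a few lines of Java. The following is a minimal sketch, not a Talend utility: the class name JdkCheck and the method isJava8 are hypothetical, and the check relies only on the standard system properties java.specification.version, java.version and java.vendor. It cannot tell a JDK from a JRE, so treat it as a quick sanity check rather than a full pre-requisite validation.

```java
// Hypothetical helper (not part of Talend): checks the running JVM against the JDK 8 requirement.
class JdkCheck {

    // "java.specification.version" reads "1.8" on Java 8 and "9", "11", ... on later releases.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        String full = System.getProperty("java.version");
        String vendor = System.getProperty("java.vendor");

        // Print the full version so it can be compared across clients and servers.
        System.out.println("Java " + full + " (" + vendor + ")");

        if (!isJava8(spec)) {
            System.out.println("Warning: expected JDK 8, found specification version " + spec);
        }
    }
}
```

Running the same class on every host and comparing the printed version line is one simple way to confirm that clients and servers use the same JDK version.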

The components that make up Talend Data Integration (and thus support the requirements of Kimball’s ETL Subsystems) are:

Logical Reference Architecture

The Talend Data Integration functional architecture is an architectural model that identifies Talend Data Integration functions, interactions and corresponding IT needs. The overall architecture is described by isolating specific functionalities into functional blocks. For more details on the individual components of the Logical Architecture, please refer to the Talend Help documentation.

The following chart illustrates the main architectural functional blocks.

[Image: chart of the main architectural functional blocks]

Physical Reference Architecture

Guiding Principles

  • We will have 4 environments, as per our best practices
  • SDLC with CI is an optional but recommended add-on
  • A DR environment is an add-on, depending on customer requirements
  • Data Prep and Data Stewardship are optional components.

Environments

| Workstation/Server Role | OS | CPU | RAM | SSD Disk Size |
|---|---|---|---|---|
| Client PC | Windows/Linux/Mac | 4 Cores, i7 Processor or equivalent | 16 GB | 500 GB |
| Talend Administration Center | Windows/Linux | 4 Cores | 8 GB Minimum, 32 GB Recommended for 1000s of Jobs | 300+ GB Minimum (for software & logs) |
| Job Server(s) | Windows/Linux | 4 Cores Minimum, 8+ Cores Recommended | 16 GB Minimum, 128 GB Recommended | 300+ GB |
| Centralized Log Server | Windows/Linux | 4 Cores Minimum | 16 GB | 300+ GB |
| Data Prep & Data Stewardship Server | Windows/Linux | 4 Cores Minimum | 32 GB | 300+ GB |
| Shared Nexus Server | Windows/Linux | 4 Cores Minimum | 8 GB Minimum | 300+ GB |
| Git Server (Better in SaaS Mode) | Windows/Linux | 4 Cores Minimum | 8 GB Minimum | 50+ GB |
| CI Server | Windows/Linux | 4 Cores Minimum, 8 Cores Recommended | 16 GB | 300+ GB |
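
The sizing figures above can also be kept as data and checked against a candidate host. The sketch below is illustrative only: the names SizingCheck, ServerRole and meetsMinimum are hypothetical, the numbers are transcribed from the table, and every figure is treated as a minimum even where the table does not label it as one. The core count is read from the JVM, while the RAM and disk values in main are placeholder inputs that would have to come from the actual host.

```java
// Illustrative sketch: compares a host's resources with the minimums from the sizing table.
class SizingCheck {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int ramGb = 16;   // placeholder: RAM of the host, in GB
        int diskGb = 300; // placeholder: SSD capacity of the host, in GB

        for (ServerRole role : ServerRole.values()) {
            String verdict = role.meetsMinimum(cores, ramGb, diskGb) ? "meets minimum" : "below minimum";
            System.out.println(role + ": " + verdict);
        }
    }
}

// Minimum cores, RAM (GB) and disk (GB) per role, as listed in the table (assumed to be minimums).
enum ServerRole {
    CLIENT_PC(4, 16, 500),
    TALEND_ADMINISTRATION_CENTER(4, 8, 300),
    JOB_SERVER(4, 16, 300),
    CENTRALIZED_LOG_SERVER(4, 16, 300),
    DATA_PREP_AND_STEWARDSHIP(4, 32, 300),
    SHARED_NEXUS_SERVER(4, 8, 300),
    GIT_SERVER(4, 8, 50),
    CI_SERVER(4, 16, 300);

    final int minCores;
    final int minRamGb;
    final int minDiskGb;

    ServerRole(int minCores, int minRamGb, int minDiskGb) {
        this.minCores = minCores;
        this.minRamGb = minRamGb;
        this.minDiskGb = minDiskGb;
    }

    // True when the host is at or above every minimum for this role.
    boolean meetsMinimum(int cores, int ramGb, int diskGb) {
        return cores >= minCores && ramGb >= minRamGb && diskGb >= minDiskGb;
    }
}
```

Recommended values (for example 8+ cores and 128 GB RAM for Job Servers) are deliberately left out of this sketch; they could be added as a second set of fields if a stricter check is wanted.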