Hands-On Microsoft SQL Server 2008 Integration Services part 50

Deploy and manage high-performance data transformation solutions across your enterprise using the step-by-step techniques in this fully revised guide. Hands-On Microsoft SQL Server 2008 Integration Services, Second Edition, explains the tools and methods necessary to extract conclusive business intelligence from disparate corporate data. Learn how to build and secure packages, load and cleanse data, establish workflow, and optimize performance. Real-world examples, detailed illustrations, and hands-on exercises are included throughout this practical resource.

Hands-On: Removing Duplicates from Owners Data

The Fuzzy Lookup and Fuzzy Grouping transformations are the data flow components you can use for data cleansing. One of the main data quality problems is duplication, whether it arises at loading time or already exists in the data you want to cleanse. You have removed duplicates a couple of times in earlier Hands-On exercises, but those cases involved exact duplicates. In this exercise you will deal with fuzzy duplication, using these components to remove both exact and fuzzy duplicates from the input data.

The scenario is that you maintain an Owner table that contains contact details for the owners of your products. You regularly receive a data feed of owner records that sometimes contains duplicates. These duplicates are not consistent, because people tend to provide their details differently on different occasions. You need to make sure that no duplicate record is added to the Owner table.

Method

You have the Owner table in the Campaign database and regularly receive Excel files that contain owner records. Each file can contain duplicate records for the same person.
The complication, however, is that these records may not be exact duplicates, because people provide their contact details differently on different occasions. Our sample file contains 13 records: 3 are unique, and the other 2 records have five variants each, with different name spellings and address details (3 + 2 × 5 = 13). One of these 2 records already exists in the Owner table. Open the OwnersFeed.xls file to look at the incoming data (see Figure 10-25).

Figure 10-25: Incoming data contains variants of duplicate records

Chapter 10 Data Flow Transformations

The owner with first name Johnathon already exists in the table and has five different variants of contact …
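The exercise builds this logic from SSIS data flow components, but the idea behind it can be illustrated in a few lines of code. The Python sketch below is only an illustration and is not how SSIS implements it. The Fuzzy Grouping and Fuzzy Lookup transformations use their own similarity scoring, and this sketch substitutes the standard library's `difflib.SequenceMatcher` ratio. Everything named here is hypothetical: `Owner`, `normalize`, `similarity`, `fuzzy_group` and `rows_to_insert`, the 0.8 threshold, and the rule that a group's first row is its canonical row. The sample rows are also made up and are not the contents of OwnersFeed.xls. The sketch groups exact and fuzzy duplicates within the feed (the Fuzzy Grouping role) and then drops any group whose canonical row already matches an existing owner (the Fuzzy Lookup role).

```python
import re
from difflib import SequenceMatcher
from typing import NamedTuple


class Owner(NamedTuple):
    """Hypothetical owner record: only the fields this sketch compares."""
    first: str
    last: str
    address: str


def normalize(owner: Owner) -> str:
    """Build one comparison key: lowercase, punctuation removed, spaces collapsed."""
    text = f"{owner.first} {owner.last} {owner.address}".lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def similarity(a: Owner, b: Owner) -> float:
    """Score in [0, 1]; 1.0 means identical keys.
    Assumption: difflib's ratio stands in for SSIS's own similarity scoring."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_group(rows: list[Owner], threshold: float = 0.8) -> list[list[Owner]]:
    """Greedy grouping: each row joins the group whose first (canonical) row is
    most similar, provided the score reaches `threshold`; otherwise it starts a
    new group. Exact duplicates score 1.0, so they always group together."""
    groups: list[list[Owner]] = []
    for row in rows:
        best_group, best_score = None, threshold
        for group in groups:
            score = similarity(row, group[0])
            if score >= best_score:
                best_group, best_score = group, score
        if best_group is None:
            groups.append([row])
        else:
            best_group.append(row)
    return groups


def rows_to_insert(feed: list[Owner], existing: list[Owner],
                   threshold: float = 0.8) -> list[Owner]:
    """Keep one canonical row per group, then skip any canonical row that
    fuzzily matches an owner already in the table (a lookup-style check)."""
    new_rows = []
    for group in fuzzy_group(feed, threshold):
        canonical = group[0]
        if all(similarity(canonical, owner) < threshold for owner in existing):
            new_rows.append(canonical)
    return new_rows


# Hypothetical sample rows (made up; not the contents of OwnersFeed.xls).
feed = [
    Owner("Jonathan", "Reed", "14 Mill Lane"),
    Owner("Johnathon", "Reed", "14 Mill Lane"),   # spelling variant
    Owner("Jonathan", "Reid", "14 Mill Lane"),    # spelling variant
    Owner("Maria", "Lopez", "3 Park Road"),
    Owner("Maria", "Lopez", "3 Park Road"),       # exact duplicate
    Owner("Maria", "Lopes", "3 Park Rd."),        # name and address variant
    Owner("Peter", "Chan", "90 River View"),      # unique
]
existing = [Owner("Jonathan", "Reed", "14 Mill Lane")]  # already in the table

groups = fuzzy_group(feed)
print([len(g) for g in groups])  # [3, 3, 1]
for row in rows_to_insert(feed, existing):
    print(row.first, row.last)   # Maria Lopez, then Peter Chan
```

Comparing each row only against a group's first row keeps the sketch short, but it makes the result depend on input order and on the single threshold chosen. The SSIS transformations handle scoring, canonical-row selection and thresholds through their own configurable settings.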
