Sunday, 11 August 2019 18:39

Commerce Data Hub: A Deep Look. Better Late Than Never


SAP Commerce Data Hub is an ETL integration platform introduced by SAP in September 2014 as part of Hybris 5.3. Today it is being decommissioned in favor of better alternatives.


I must say that nobody liked Data Hub despite all the efforts made by SAP. However, it was the only SAP-aware integration platform that came with Hybris, and later SAP Commerce, and it was recommended by the vendor for master and transactional data replication.

I think we’ve hummed it too long.

Hmm… It is starting to look like an obituary.

I hesitated long, but the day has come. Better late than never.

In this article, I am going to dissect Data Hub and turn it inside out. For those who are still using Data Hub, it may serve as a good addition to the official documentation.

A Quick Overview of Key Concepts

As I mentioned above, a product called Hybris Data Hub was introduced five years ago as a complementary tool for inbound and outbound data replication. The product was designed specifically for Hybris Commerce to easily import and export data between Hybris and external data storage systems, but architecturally it was a platform-agnostic tool with a set of plugins. These plugins, or extensions, were aware of SAP Commerce and ERP integration interfaces, protocols, data models, and data formats. These pre-built integrations come as part of the Commerce Data Hub distribution or as a standalone integration package (for example, SAP for Retail). There were also extensions from third-party vendors (ItemSense IoT Connector). In technical terms, these extensions are separate apps deployed on the Tomcat server along with the Data Hub core.

Data Hub is designed only for asynchronous data integration. It is a so-called fire-and-forget model, where the data does not have to be moved immediately but can be moved at a later point in time. Data Hub registers all the data packets received from the partner systems and processes them asynchronously.

Data Hub loads data through an inbound adapter into so-called raw items, the exact representation of the data coming from the source system. The system manages the mapping of data models using a canonical model decoupled from the raw and target models. The items are processed using composition and grouping.

For example, the IDoc adapter saves all IDocs as raw items; only the relevant data are transformed into the canonical model. A separate process converts the canonical items into target items. Another separate process publishes the target items to SAP Commerce via the ImpEx adapter.

The Data Hub is based on the good old Spring Integration Framework. On the one hand, this is very convenient for developers: SAP Commerce is also built with Java Spring and is XML-driven. On the other hand, the biggest disadvantage here is the freedom of coding. Even in the out-of-the-box extensions, the transformation rules live both in the XML files and in the code, which makes the configuration hard to read and extend.

There are three processing phases:

  • LOAD, or data import phase
  • COMPOSITION, or data processing/transformation phase
  • PUBLISH, or data export phase

Data Hub doesn’t remove any items during these phases. For cleanup, there are dedicated procedures outside the Data Hub data processing flow.

Integration with SAP ERP

In this article, I will focus on SAP ERP integration only.

For asynchronous ERP integration, Data Hub uses an IDoc connector. For SAP Commerce integration, it uses a combination of an ImpEx connector (Data Hub side) and the datahubadapter extension (SAP Commerce Cloud side).

Particularly noteworthy is the way generated ImpEx scripts are fed into SAP Commerce. This happens at the publish phase, where the system creates ImpEx scripts from the target items and pushes these scripts to SAP Commerce. Out of the box, SAP Commerce doesn’t provide a platform-agnostic integration interface for running ImpEx scripts; however, there is a special module for this purpose for Data Hub. This module introduces a special “macro” to separate commands from data. Normally, an ImpEx script has one or more headers (“UPDATE Item;uid[unique=true];name”) followed by a data block (“;1;name1”, “;2;name2”). For the Data Hub integration, the integration extension introduces a “$URL:..” macro which means “make a call to pull the data”.

INSERT_UPDATE B2BUnit;;Name;description;active;uid[unique=true];buyer;locName
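Conceptually, a generated script replaces the inline data block with a reference that tells SAP Commerce where to pull the rows from. The following is a sketch only; the literal macro syntax and endpoint produced by Data Hub are not documented, so the placeholder below is illustrative:

```impex
# the header is generated as usual
INSERT_UPDATE B2BUnit;;Name;description;active;uid[unique=true];buyer;locName
# instead of inline ";unit1;…" data lines, a macro points back to Data Hub,
# and SAP Commerce calls it asynchronously to fetch the actual rows
"$URL: <data-hub-publication-endpoint>"
```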

In other words, Commerce Data Hub sends data to SAP Commerce using the following process:

SAP Commerce Data Hub sends a sort of “there is data” message along with the structure of this data, and SAP Commerce requests the data asynchronously to build a valid ImpEx script. From my perspective, this approach is controversial and not reliable. For example, Data Hub can be overloaded, and after three attempts SAP Commerce will stop trying to request the data. Such a situation is hard to detect, because Data Hub will mark the data replication as successful even though it is not. The only monitoring console is available on the Data Hub side.

The above process is under-documented.


IDocs are the central import and export format of SAP ERP systems, used for asynchronous transactions. Each generated IDoc exists as a self-contained structure (it can be represented as a text file) that can be transmitted to the requesting system without connecting to the central database. One of the representations of an IDoc is XML.
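To give a feel for the format, here is a minimal, hand-written fragment of a MATMAS-style IDoc in its XML representation. It is illustrative only: real IDocs carry a control record and many more segments and fields than shown, and the MEINH field is an assumption not discussed in this article.

```xml
<MATMAS05>
  <IDOC BEGIN="1">
    <!-- one data record: a hierarchy of segments -->
    <E1MARAM SEGMENT="1">
      <MATNR>TKCNF01</MATNR>                     <!-- product SKU -->
      <E1MAKTM SEGMENT="1">                      <!-- short texts (MAKT) -->
        <SPRAS_ISO>EN</SPRAS_ISO>
        <MAKTX>Toolkit (configurable)</MAKTX>
      </E1MAKTM>
      <E1MARMM SEGMENT="1">                      <!-- units of measure (MARM) -->
        <MEINH>PCE</MEINH>
        <UMREN>1</UMREN>
      </E1MARMM>
    </E1MARAM>
  </IDOC>
</MATMAS05>
```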

For replicating master data, SAP ERP sends IDocs over an HTTP connection to the Data Hub server. The IDocs are received by the Data Hub IDoc adapter, which creates Spring Integration messages (idocXmlInboundChannel).

A Spring Integration router routes the messages to specific channels, such as MATMAS or ORDERS05. These channels are read by service activators, which use mapping services provided by the Data Hub integration extensions, such as sapproduct or sapcustomer.
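In Spring Integration terms, this routing looks roughly like the sketch below. The channel, header, and bean names are illustrative assumptions, not the actual Data Hub configuration:

```xml
<!-- inbound channel fed by the IDoc adapter -->
<int:channel id="idocXmlInboundChannel"/>
<int:channel id="matmasChannel"/>

<!-- route each message by its IDoc type (header name is hypothetical) -->
<int:header-value-router input-channel="idocXmlInboundChannel"
                         header-name="idocType"
                         default-output-channel="unknownIdocChannel">
    <int:mapping value="MATMAS"   channel="matmasChannel"/>
    <int:mapping value="ORDERS05" channel="ordersChannel"/>
</int:header-value-router>

<!-- a service activator hands MATMAS messages to a mapping service -->
<int:service-activator input-channel="matmasChannel"
                       ref="sapProductMappingService"
                       method="map"/>
```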

The IDoc client interfaces can vary in complexity. The one used in Data Hub is the simplest: HTTP-based, with an XML payload. It treats the IDoc payload as XML-formatted text with a particular set of segments and attributes.

What is important here:

  1. There is no documentation saying which components (segments, attributes) of the IDocs are processed and which are ignored. This can be extracted from the Data Hub extensions and Java code, but you will find it pretty messy.
  2. The order of the IDocs can be important for processing. The order Data Hub expects is not guaranteed because of the asynchronous nature of IDocs and Data Hub, which may create issues with data replication.
  3. If something goes wrong, the items will be left unprocessed without a clear explanation of why. Over time this may clutter the staging area and cause issues with wrong or untimely data replication.
  4. There is no documentation saying which attributes should be mapped to which target items, canonical items, or SAP Commerce Cloud attributes, and which data transformations are to be applied. One may say that the configuration (XML, Data Hub + Spring) is self-explanatory, but it is not, because part of this logic lives outside the XML, in the Java code.
  5. The typical length of text fields in an IDoc is 40 characters. Additionally, there is a limit for the whole segment (1000 characters). Because of these limitations, business people tend to shorten the product name and description as best they can.
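The length limits from point 5 can be illustrated with a small sketch. The 40- and 1000-character values are the typical limits mentioned above; real ERP field handling is more involved, and the class and method names are mine:

```java
// Sketch: SAP IDoc character fields are fixed-width, so an overlong value
// would be cut off at the declared field length (typically 40 characters).
public class IdocFieldLimits {
    static final int TEXT_FIELD_LENGTH = 40;   // typical text field width
    static final int SEGMENT_LENGTH = 1000;    // typical whole-segment limit

    // Fit a value into the fixed field width, dropping the overflow.
    static String fitToField(String value) {
        return value.length() <= TEXT_FIELD_LENGTH
                ? value
                : value.substring(0, TEXT_FIELD_LENGTH);
    }

    public static void main(String[] args) {
        String name = "Heavy-Duty Professional 128-Piece Mechanics Toolkit, Chrome";
        // whatever the business enters, at most 40 characters survive
        System.out.println(fitToField(name).length());
    }
}
```

This is why marketing-friendly names tend to arrive in Commerce already abbreviated.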

Every IDoc has one data record with multiple segments organized in a hierarchy. The segments often represent a referenced table in the data model. For example, the MATMAS IDoc, used for product master data (MATerial MASter), has an “E1MARMM” segment which represents a relevant extract from the MARM table, where units of measure are stored. “E1MVKEM” represents the “MVKE” table, which is used for material sales data, and so forth.

IDoc data should be carefully screened. It is important to keep only those IDocs and segments which require Data Hub processing. Extraneous IDocs or segments can have a serious effect on Data Hub performance.

Commerce Data Hub doesn’t ship with any sample IDocs, and it is not easy to find them on the Internet.

Material Data Model in SAP ERP

In SAP ERP, the following tables are considered together as a material master set:

  • MARC (Material Master: Plant Data)
  • MAKT (Material description)
  • MARA (Material Master General data)
  • MLAN (Material Master Tax Classification)
  • MARM (Unit of Measure)
  • MBEW (Material Valuation)
  • MVKE (Sales Data for Material)
  • EINA, EINE, A017, and some others.

Additionally, there are tables used for the classification and characteristics:

  • CABN (Characteristic information)
  • KLAH (Class Header Data information)
  • AUSP (Characteristic Values).
  • KSSK (Allocation Table: Object to Class) and some others.

Product Master Data Replication

For product replication, there is the sapproduct extension. It defines the replication rules, mappings, and the canonical and target structures.

The following IDocs are involved in this integration scenario:

  • Material Core Data:
    • MATMAS IDoc

While importing MATMAS, for example, only some attributes of the IDoc are considered. In the table below, I listed all MATMAS segments and marked those supported by Commerce Data Hub.

Segment           | Description                                                                           | Supported
E1MARAM           | Master material general data (MARA)                                                   | NO
E1MARA1           | Additional fields for E1MARAM                                                         | NO
E1MAKTM           | Master material short texts (MAKT)                                                    | YES
E1MARCM           | Master material C segment (MARC)                                                      | NO
E1MARC1           | Additional fields for E1MARCM                                                         | NO
E1MARDM           | Master material warehouse/batch segment (MARD)                                        | NO
E1MFHMM           | Master material production resource/tool (MFHM)                                       | NO
E1MPGDM           | Master material product group                                                         | NO
E1MPOPM           | Master material forecast parameter                                                    | NO
E1MPRWM           | Master material forecast value                                                        | NO
E1MVEGM           | Master material total consumption                                                     | NO
E1MVEUM           | Master material unplanned consumption                                                 | NO
E1MKALM           | Master material production version                                                    | NO
E1MARMM           | Master material units of measure (MARM)                                               | YES
E1MEANM           | Master material European Article Number (MEAN)                                        | NO
E1MBEWM           | Master material valuation (MBEW)                                                      | NO
E1MLGNM           | Master material data per warehouse number (MLGN)                                      | NO
E1MLGTM           | Master material data for each storage type (MLGT)                                     | NO
E1MVKEM           | Master material sales data (MVKE)                                                     | NO
E1MLANM           | Master material tax classification (MLAN)                                             | YES
E1MTXHM           | Master material long text header                                                      | NO
E1MTXLM           | Master material long text line                                                        | NO
E1CUCFG → E1CUVAL | General configuration data for the configurable material (not available in MATMAS05)  | YES

As we can see from this table, the material long text line is not involved in the replication by design. Also, only a small portion of MATMAS is processed, and which segments are in scope is not documented.

The conceptual diagram from SAP says that MATMAS is translated to Canonical Product, ProductSales, and ProductUnit structures.

However, the configuration says that some data from MATMAS are also involved in CanonicalProductVariantAttributeValue, for example. This canonical object is not used in the configuration, but it is used in the Java classes. Another example is CanonicalProductTax, which is populated from MATMAS as well, and there are also no rules in the XML configuration showing how CanonicalProductTax records are supposed to be transformed into any objects in Hybris. This split between configuration-driven and Java-code-driven data transformation creates enormous difficulties in understanding how it all works.

The SAP Commerce product data model and local data configuration require an extension and initial setup to get Commerce ready for Data Hub. First, there are additional types and attributes. For example, the Product type is extended with the following attributes:

  • sapBlocked: True if the product is blocked for sales. This value merges the cross and specific sales area flag.
  • sapBlockedDate: The date at which the blocked status takes effect.
  • sapConfigurable: True if the product is configurable.
  • sapEAN: The EAN of the sales unit of measure (UOM).
  • sapBaseUnitConversion: The conversion factor to multiply the current quantity (in sales UOM) to be translated into base UOM quantity.
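The sapBaseUnitConversion attribute is simply a multiplier. A minimal sketch (class, method, and numbers are illustrative, not Commerce code):

```java
// Sketch: converting a quantity in sales UOM to base UOM using the
// sapBaseUnitConversion factor stored on the product.
public class BaseUomConversion {
    static double toBaseUom(double salesQty, double sapBaseUnitConversion) {
        return salesQty * sapBaseUnitConversion;
    }

    public static void main(String[] args) {
        // e.g. a sales unit (box) that equals 12 base units (pieces)
        System.out.println(toBaseUom(3, 12)); // 3 boxes -> 36.0 pieces
    }
}
```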

Three other IDocs involved in the material master data replication are CHRMAS, CLSMAS, and CLFMAS.

Load phase

At the load phase, the information from the IDoc is loaded into the database. The names of the XML tags are concatenated and persisted as the keys of the raw item attributes.
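The key-building rule can be sketched as follows; the naming scheme is inferred from the raw item examples later in this article, and the helper itself is mine:

```java
import java.util.List;

// Sketch: a raw attribute key is the raw type name plus the XML tag path
// joined with '-', e.g. RawMATMAS_E1MARAM-E1MAKTM-MAKTX.
public class RawKeyBuilder {
    static String rawKey(String rawType, List<String> tagPath) {
        return rawType + "_" + String.join("-", tagPath);
    }

    public static void main(String[] args) {
        System.out.println(rawKey("RawMATMAS",
                List.of("E1MARAM", "E1MAKTM", "MAKTX")));
        // RawMATMAS_E1MARAM-E1MAKTM-MAKTX
    }
}
```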

As I mentioned above, only some IDoc attributes are configured to be imported. All others are ignored.

The following list shows the attributes from the IDocs that Data Hub is aware of, as well as their source (IDoc) and purpose. The symbol “→” represents nesting in the IDoc XML: a→b means “b is a sub-element of a”.

  • E1CABNM → ATEIN. ‘X’ means Single valued. From CHRMAS. Used for ClassAttributeAssignment.multiValued = true if singleValued or interval is ‘X’ (see ATINT below). Example: “X” or “”.
  • E1CABNM → ATFOR. Data type of the characteristic (for example, “NUM”). From CHRMAS. Used for ClassAttributeAssignment.attributeType:
    • ‘enum’ if attributeValueId <> “”
    • ‘date’ if attributeType = ‘DATE’
    • ‘number’ if attributeType = ‘NUM’
    • ‘string’ otherwise.
  • E1CABNM → ATINT. “X” means Interval Values Allowed. From CHRMAS. Used for
    • ClassAttributeAssignment.multiValued = true if singleValued or interval is ‘X’ (see ATEIN above)
    • ClassAttributeAssignment.range if interval is ‘X’
  • E1CABNM → ATNAM. A name of the characteristic (for example, “weight”). From CHRMAS. Used for
    • ClassificationAttributeValue.code (attributeId+”_”+attributeValueId)
    • ClassificationAttribute.code (=attributeId)
    • ClassAttributeAssignment.classificationAttribute
    • Product.attributes (via determineAttributeValue()) – @attributeId[…]
    • ERPVariantProduct.attributes (–“–)
  • E1CABNM → ATBEZ. Language-dependent description of characteristic (“Country of origin”).
  • E1CABNM → SPRAZ_ISO. Language (“EN”).
  • E1CABNM → ATWRT. Characteristic Value ID (for example, “12”). From CHRMAS and CLSMAS. Used for
    • ClassificationAttributeValue.code (attributeId+”_”+attributeValueId)
    • ClassificationAttribute.code (=attributeId)
    • ClassAttributeAssignment.classificationAttribute
    • Product.attributes (via determineAttributeValue()) – @attributeId[…]
    • ERPVariantProduct.attributes (–“–)
  • MSEHI. Unit of measurements. From CHRMAS. Used for
    • ClassAttributeAssignment.unit → unit(code,systemVersion(catalog(id),version),unitType)[unique=true]
    • Product.unit → unit(code)
    • ERPVariantProduct.unit → unit(code)
    • Product.sapBaseUnitConversion (formula)
  • CLASS. Class number/name. From CLSMAS. Used for
    • ClassificationClass.categoryID
  • E1KSMLM → ATNAM. Attribute ID
  • KLART. Class Type (for example, “001” – a material class). From CLFMAS if E10CLFM-MAFID = 0.
  • E1MARAM → E1CUCFG → E1CUVAL → AUTHOR. Attribute author.  From MATMAS.
  • E1MARAM → E1CUCFG → E1CUVAL → CHARC. Attribute ID.  From MATMAS.
  • E1MARAM → E1CUCFG → E1CUVAL → VALUE. Attribute Value.  From MATMAS.
  • E1MARAM → E1MAKTM → MAKTX. Description.  From MATMAS.
  • E1MARAM → E1MAKTM → SPRAS_ISO. Language.  From MATMAS.
  • E1MARAM → E1MARMM → UMREN. Denominator for conversion to the base unit of measure (for example, with the measure piece (PC): 5 kg correspond to 3 pieces of the base unit of measure). From MATMAS.
  • E1MARAM → E1MLANM → ALAND. Tax Country (For example, “GB”). From MATMAS.
  • E1MARAM → E1MLANM → TATY1. Tax Class. From MATMAS.
  • E1MARAM → E1MLANM → TAXM1. Tax Value. From MATMAS.
  • E1MARAM → MATNR. The product SKU#. From MATMAS.
  • E1MARAM → MATNR_LONG. Long product name (40 characters). From MATMAS.
  • E10CLFM → E1AUSPM → ATFLB. Attribute Value (Numeric). From CLFMAS.
  • E10CLFM → E1AUSPM → ATFLV. Internal Floating Point From (Numeric). From CLFMAS.
  • E10CLFM → E1AUSPM → ATNAM. Characteristic Name. From CLFMAS if E10CLFM-MAFID is zero.
  • E10CLFM → E1AUSPM → ATWRT. Characteristic Value. From CHRMAS.
  • E10CLFM → E1KSSKM → CLASS. Category Parent. From CLFMAS.
  • E10CLFM → KLART. Category Type. It is used for creating a catalog name (“ERP_CLASSIFICATION_” + category type). From CLFMAS.
  • E10CLFM → OBJEK_LONG. ID, Classification. From CLFMAS.
  • E10CLFM → SNDPRN. Creation system. From CLFMAS.
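The attributeType rule for ATFOR described above can be condensed into a small sketch. It is reconstructed from the mapping rules listed here, not taken from the actual Data Hub code:

```java
// Sketch: deriving ClassAttributeAssignment.attributeType from the CHRMAS
// data type (ATFOR) and the presence of an attribute value id (from ATWRT).
public class AttributeTypeResolver {
    static String resolve(String atfor, String attributeValueId) {
        if (attributeValueId != null && !attributeValueId.isEmpty()) {
            return "enum";            // a fixed value list exists
        }
        if ("DATE".equals(atfor)) return "date";
        if ("NUM".equals(atfor))  return "number";
        return "string";              // the fallback
    }

    public static void main(String[] args) {
        System.out.println(resolve("NUM", ""));     // number
        System.out.println(resolve("CHAR", "12"));  // enum
    }
}
```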

Internally, Data Hub stores all data in BLOB fields in the database, as a binary stream. This poses certain challenges when debugging, because a normal database client can’t display BLOB fields.

There is a documented way of getting access to the BLOB fields using the Data Hub REST API: /raw-items/{itemId} or /pools/{poolName}/items/{itemType}. Make sure the user you are accessing the data with has the content_owner role, which is assigned via the Data Hub configuration properties.

If the source IDoc has more than one segment of the same type, Data Hub creates as many raw items as there are segments.

For example, consider an IDoc that contains two E1CUVAL segments.


As a result, Data Hub creates two raw items during the load phase:

Raw Item #1

  • RawMATMAS_E1MARAM-E1MAKTM-MAKTX=Toolkit (configurable) 1

Raw Item #2

  • RawMATMAS_E1MARAM-E1MAKTM-MAKTX=Toolkit (configurable) 1

Some tasks at the composition phase require particular data to be present in raw, and these data can come from different IDocs. So the idea is to collect enough information in raw to be pushed to the COMPOSITION phase, and enough information at the COMPOSITION phase to be pushed to the PUBLISH phase.

For the sample variant MATMAS IDoc, the following items are created in the raw items:

I highlighted the differences in red.

After adding a base product, MATMAS-BASE.xml, the system adds a fifth raw item:

After importing CLSMAS and CLFMAS, our raw item set was extended with three new records, one for CLFMAS and two for CLSMAS:

One CLSMAS IDoc led to two raw items because there are two segments with the same name.

Composition phase

At the composition phase, the elements from the raw items are filtered and grouped according to the business logic of the particular transformation flow.

The goal of the grouping mechanism is to split raw items into groups that will later be handled by the composition logic. There are two default grouping handlers: splitting by the canonical item type and splitting by primary key. The primary key is a combination of attributes marked in the configuration as “part of a primary key”.
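Primary-key grouping can be sketched like this, with raw items modeled as plain maps and the attribute names purely illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: group raw items by the compound key built from the attributes
// marked in the configuration as "part of a primary key".
public class PrimaryKeyGrouping {
    static Map<String, List<Map<String, String>>> group(
            List<Map<String, String>> rawItems, List<String> keyAttributes) {
        return rawItems.stream().collect(Collectors.groupingBy(
                item -> keyAttributes.stream()
                        .map(k -> String.valueOf(item.get(k)))
                        .collect(Collectors.joining("|"))));
    }

    public static void main(String[] args) {
        List<Map<String, String>> raws = List.of(
                Map.of("MATNR", "TKCNF01", "SPRAS_ISO", "EN"),
                Map.of("MATNR", "TKCNF01", "SPRAS_ISO", "EN"),
                Map.of("MATNR", "TKCNF02", "SPRAS_ISO", "EN"));
        // two groups: "TKCNF01|EN" (2 items) and "TKCNF02|EN" (1 item)
        System.out.println(group(raws, List.of("MATNR", "SPRAS_ISO")).size());
    }
}
```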

For the example above, the eight raw items of the RawMATMAS, RawCLSMAS, and RawCLFMAS types were converted into five canonical items of the CanonicalProductVariantAttributeValue, CanonicalProductTax, CanonicalProductUnit, and CanonicalProduct types:

According to the configuration, the CanonicalProductVariantAttributeValue items are built by per-attribute mapping rules (attribute, source type, source, and example values).
Four source RawMATMAS items were grouped by a compound key (4 attributes), which resulted in two groups.

At this phase, the target items are created as well. The data model of the target items is very close to (or the same as) the data model of the items in the target system, in our case SAP Commerce Cloud.

So, for the variant-MATMAS-only case, our five canonical items (CanonicalProductVariantAttributeValue, CanonicalProductTax, CanonicalProductUnit, and CanonicalProduct) were converted into the following target items:

  • BaseVariant
  • BaseVariantAttributes
  • CleanBaseVariantAttributes

After adding a base product MATMAS, we see a new item among the target items:

  • BaseProduct

After adding CLSMAS and CLFMAS, the following additional items are created among the target items:

  • ClassificationClass
  • ClassAttributeAssignment
  • ClassificationAttribute

Some of these types play the role of “commands” for the target system. For example, all types starting with “Clean” remove data in the target system. The names may or may not be the same as the names of the types in SAP Commerce.

Publish phase

At this phase, Data Hub sends the target items to the target system (SAP Commerce in our case) via an adapter (the SAP Commerce Cloud adapter in our case; it uses ImpEx and assumes that datahubadapter is installed on the SAP Commerce side).

Let’s have a look at the publishing process for the case when only a MATMAS is loaded. After processing the MATMAS IDoc, we got four raw items and five canonical items (CanonicalProductVariantAttributeValue, CanonicalProductTax, CanonicalProductUnit, CanonicalProduct).

The resulting ImpEx for the “variant MATMAS only” case is:

#% impex.setLocale( Locale.ENGLISH )

INSERT_UPDATE SAPInboundVariant;;@SAPInboundVariant[];baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]

#% impex.setLocale( Locale.ENGLISH )

INSERT_UPDATE ERPVariantProduct;;sapEAN;sapBlockedDate[dateformat='yyyyMMdd'];sapBlocked;variantType(code);sapConfigurable;unit(code);baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true];supercategories(code,catalogVersion(catalog(id),version));sapBaseUnitConversion

INSERT_UPDATE ERPVariantProduct;;unit(code);name[lang=en];baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]
;5;PCE;Toolkit (configurable) 1;TKCNF01:Default:Staged;Default:Staged;TKCNF01_BLACK_5

After adding a base product, another set of ImpEx scripts is generated:

INSERT_UPDATE Product;;unit(code);name[lang=en];catalogVersion(Catalog(id),version)[unique=true];code[unique=true]
;504;PCE;Toolkit (configurable);Default:Staged;TKCNF01

INSERT_UPDATE SAPInboundVariant;;@SAPInboundVariant[];baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]

Note that here a product-variant link has been created.

After adding the CLSMAS and CLFMAS, the system generates classification catalogs and entries:

INSERT_UPDATE ClassificationClass;;supercategories(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]

INSERT_UPDATE ClassificationClass;;name[lang=en];catalogVersion(Catalog(id),version)[unique=true];code[unique=true]

INSERT_UPDATE SAPInboundVariant;;@SAPInboundVariant[];baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]

#% impex.setLocale( Locale.ENGLISH )

INSERT_UPDATE ClassificationAttribute;;defaultAttributeValues(systemVersion(catalog(id),version),code);systemVersion(catalog(id),version)[unique=true];code[unique=true]

INSERT_UPDATE ClassAttributeAssignment;;range;formatDefinition;unit(code,systemVersion(catalog(id),version),unitType)[unique=true];multiValued;attributeType(code);classificationAttribute(systemVersion(catalog(id),version),code)[unique=true];classificationClass(catalogVersion(catalog(id),version),code)[unique=true]


INSERT_UPDATE ERPVariantProduct;;sapEAN;sapBlockedDate[dateformat='yyyyMMdd'];sapBlocked;variantType(code);sapConfigurable;unit(code);baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true];supercategories(code,catalogVersion(catalog(id),version));sapBaseUnitConversion

INSERT_UPDATE ERPVariantProduct;;unit(code);name[lang=en];baseProduct(code,catalogVersion(catalog(id),version));catalogVersion(Catalog(id),version)[unique=true];code[unique=true]
;1021;PCE;Toolkit (configurable);TKCNF01:Default:Staged;Default:Staged;TKCNF01_BLACK_5

INSERT_UPDATE Product;;sapConfigurable;unit(code);catalogVersion(Catalog(id),version)[unique=true];code[unique=true];supercategories(code,catalogVersion(catalog(id),version));sapBaseUnitConversion;sapEAN;sapBlockedDate[dateformat='yyyyMMdd'];sapBlocked;variantType(code)

INSERT_UPDATE Product;;unit(code);name[lang=en];catalogVersion(Catalog(id),version)[unique=true];code[unique=true]
;1022;PCE;Toolkit (configurable);Default:Staged;TKCNF01

INSERT_UPDATE ERPVariantProduct;;catalogVersion(Catalog(id),version)[unique=true];code[unique=true];baseProduct(code,catalogVersion(catalog(id),version));@CHR_SIZE[system=ERP_CLASSIFICATION_300,version=ERP_IMPORT,]
;1024;Default:Staged;TKCNF01_BLACK_5;TKCNF01:Default:Staged;5 TOOLS#8

INSERT_UPDATE ERPVariantProduct;;catalogVersion(Catalog(id),version)[unique=true];code[unique=true];baseProduct(code,catalogVersion(catalog(id),version));@CHR_COLOR[system=ERP_CLASSIFICATION_300,version=ERP_IMPORT,]

Data Hub sends this large ImpEx script in five calls. For each call, the data were delivered via the $URL notation explained above, one per INSERT_UPDATE: five calls from Data Hub to Commerce, and 11 calls from Commerce back to Data Hub.

Also, you may find that some catalog names won’t be known to Hybris unless it is initialized properly. For example, ERP_CLASSIFICATION_300 is a catalog for ERP classification, which is system-specific.

For that, there is an ABAP script to run on the ERP side which generates the initial data load ImpEx script. This script contains configuration for:

  • Languages
  • Currency
  • Units
  • Classification System
  • ClassificationSystemVersion
  • ClassificationAttributeUnit
  • Countries
  • Regions
  • ProductTaxGroup
  • Vendor
  • Warehouse
  • Discount
  • SAPProductIDDataConversion
  • ProductPriceGroup
  • ProductDiscountGroup
  • UserPriceGroup
  • DiscountUserGroup
  • Title
  • ReferenceDistributionChannelMapping
  • ReferenceDivisionMapping