Advice and Optimizations

 

Reconciliation

Because you will load data from at least two different flows (Events and Visit), you will need to reconcile the data using a common key. Both APIs return mandatory columns, including the Visit ID and the Unique Visitor ID. Concatenating these two values gives a unique key that lets you map visits to events.

As Visit IDs are reset every day at midnight, we recommend that you also include the timestamp* (date only) in the reconciliation key.

* Unix time describes an instant as the number of seconds that have elapsed since midnight UTC on 1 January 1970. The timestamp in Data Flow is already adjusted so that it returns the date and time in server local time.

Example:
Website ABC’s server is located in France (GMT+1). An employee of Website ABC calls the Data Flow event API, which returns an event with a timestamp of 1511967867 in Unix time. Because Data Flow provides timestamps in server local time, the event happened on Wednesday, 29 November 2017 at 15:04:27 GMT+1, that is, 14:04:27 GMT.

Common key = Unique Visitor ID + Visit ID + Timestamp (date)
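As an illustration, the common key could be built as in the minimal Python sketch below. The function name, the "_" separator and the YYYYMMDD date format are assumptions made for this example; only the three components of the key come from the description above. Since the Data Flow timestamp is already expressed in server local time, the sketch decodes it without applying any further time zone offset.

```python
from datetime import datetime, timezone


def reconciliation_key(visitor_id: str, visit_id: str, timestamp: int) -> str:
    """Build Unique Visitor ID + Visit ID + date (hypothetical format)."""
    # The timestamp already encodes server local time, so decoding it with a
    # zero offset yields the server-local wall-clock date.
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y%m%d")
    # The "_" separator is an assumption; any separator that cannot appear
    # in the IDs would work.
    return f"{visitor_id}_{visit_id}_{day}"


# Timestamp taken from the example above; the IDs are made up.
key = reconciliation_key("v123", "7", 1511967867)
print(key)  # v123_7_20171129
```

Computing the key the same way for the Events rows and the Visit rows gives a single column on which the two data sets can be joined.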

 

Data Quality

 

Info:end

To let you verify that a downloaded file contains all the data, each CSV file starts with the row #info:start and ends with the row #info:end.

We recommend that you set up a process to check for the #info:end row at the end of the file. If it is missing, the file is incomplete, possibly because of a delay in the data calculation.
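A check of this kind could look like the following minimal Python sketch. The function name is hypothetical, and the sketch assumes that the marker rows begin with the literal text #info:start and #info:end, and that blank lines can be ignored.

```python
def is_complete(lines) -> bool:
    """Return True if the first row is #info:start and the last is #info:end."""
    first = last = None
    for line in lines:
        row = line.strip()
        if not row:
            continue  # assumption: blank lines are not significant
        if first is None:
            first = row
        last = row
    return (
        first is not None
        and first.startswith("#info:start")
        and last.startswith("#info:end")
    )


print(is_complete(["#info:start", "a;b;c", "#info:end"]))  # True
print(is_complete(["#info:start", "a;b;c"]))               # False
```

The function accepts any iterable of lines, so an open file object can be passed directly and the file is read line by line rather than loaded into memory.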

To avoid incomplete files, we recommend that you wait at least 10 minutes after the end of an hour before requesting that hour.
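One way to apply this waiting period in a scheduler is sketched below in Python. The function name and the idea of requesting data hour by hour are assumptions made for this example; only the 10-minute delay comes from the recommendation above.

```python
from datetime import datetime, timedelta


def latest_safe_hour(now: datetime, delay_minutes: int = 10) -> datetime:
    """Start of the most recent full hour that ended at least delay_minutes ago."""
    cutoff = now - timedelta(minutes=delay_minutes)
    # The latest hour boundary at or before the cutoff is the end of the
    # newest hour that is safe to request.
    end_of_hour = cutoff.replace(minute=0, second=0, microsecond=0)
    return end_of_hour - timedelta(hours=1)


# At 15:05 the 14:00 hour ended only 5 minutes ago, so 13:00 is returned.
print(latest_safe_hour(datetime(2017, 11, 29, 15, 5)))   # 2017-11-29 13:00:00
# At 15:10 the 14:00 hour ended 10 minutes ago and can be requested.
print(latest_safe_hour(datetime(2017, 11, 29, 15, 10)))  # 2017-11-29 14:00:00
```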

 

Maxdate

In addition, you can check the last minute of data available using our maxdate request. Please use the following URL:

Replace #format# with the retrieval format (xml, html or json) and #siteID# with your own site ID.

 

Compression

You can enable compression to reduce download time and file size. This is highly recommended if you use the JSON format, which is heavier. To enable it, add the following header to your request:

Accept-encoding: gzip
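The header can be sent with any HTTP client. The sketch below uses only the Python standard library; the function names are hypothetical and the URL passed to them must be your own Data Flow URL. It assumes the server signals a compressed response through the standard Content-Encoding header, which is general HTTP behaviour rather than something stated in this article.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Create a request that asks the server for a gzip-compressed response."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only if the server actually compressed it."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Download a URL with compression enabled and return the raw bytes."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Checking Content-Encoding before decompressing keeps the code working if the server returns an uncompressed response, for example an error message.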

 

Simultaneous Calls

Data Flow’s API has a limit of 5 simultaneous calls per user, so a user cannot submit more than 5 calls at the same time. Each call is processed one by one; if the limit of 5 simultaneous calls is exceeded, an error message is returned.
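A client can stay under this limit by capping its own concurrency. The following Python sketch does so with a thread pool of 5 workers; run_calls and the fetch argument are hypothetical names, and fetch stands for whatever function performs a single Data Flow call.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SIMULTANEOUS_CALLS = 5  # limit stated above


def run_calls(urls, fetch):
    """Run fetch(url) for every URL with at most 5 calls in flight.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=MAX_SIMULTANEOUS_CALLS) as pool:
        return list(pool.map(fetch, urls))
```

Because the limit applies per user, this cap only helps if no other process is calling the API with the same account at the same time.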

 

Implementing “retry” scripts

We recommend implementing “retry” scripts that handle errors 1 and 3003 to make the ingestion of your flows fail-safe. This helps ensure that you do not have gaps in your data.

There are various ways to do this. We suggest either of the following two methods, or a combination of both:

  1. Retry calling the URL every x minutes until the error is no longer received.
  2. Verify the maxdate value (see above). If it has changed since the error was received, you can try making the call again.
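
The first method could be sketched in Python as follows. This is an illustration under assumptions, not an official script: DataFlowError is a hypothetical exception that your own calling code would raise after reading the error code from the API response, and the number of attempts and the waiting time are arbitrary defaults.

```python
import time

RETRYABLE_CODES = {1, 3003}  # error codes named above


class DataFlowError(Exception):
    """Hypothetical exception carrying the error code returned by the API."""

    def __init__(self, code: int):
        super().__init__(f"Data Flow error {code}")
        self.code = code


def call_with_retry(call, max_attempts=5, wait_seconds=300, sleep=time.sleep):
    """Run call() and retry after a pause while it raises a retryable error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except DataFlowError as err:
            # Give up on other error codes, or once the attempts are used up.
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

Capping the number of attempts prevents the script from looping forever if the error persists; the sleep parameter can be replaced in tests so that no real waiting takes place.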
 

Script examples

Coming soon.

Last update: 14/05/2018