Neo4j Snowplow Integration

Neo4j is the world's leading open-source graph database. Graph databases are widely used in enterprise fraud detection, real-time recommendations, social networking, marketing attribution, identity resolution, and more. Neo4j is democratizing access to graph databases with AuraDB, an affordable cloud-hosted Neo4j service.

At SnowcatCloud, we believe in the rise of graph databases, as they provide insights into the relationships between entities in a way that relational databases cannot.

We created a Snowplow Neo4j integration that streams Snowplow behavioral event data into your Neo4j instance, so you can create and maintain a graph database of your behavioral event data in minutes.

How does it work?

Events tracked in your Snowplow data stream are transformed and replicated into your Neo4j instance in real time. You can then query the graph using Bloom or Cypher.

Supported event types

Event             Event Name        Vendor
page_view         page_view         com.snowplowanalytics.snowplow
transaction       transaction       com.snowplowanalytics.snowplow
transaction_item  transaction_item  com.snowplowanalytics.snowplow
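
As a hedged illustration of how the event types listed above are typically produced, the sketch below uses the Snowplow JavaScript Tracker v3 calls `trackPageView`, `addTrans`, `addItem`, and `trackTrans`. It assumes the tracker is already loaded on the page and that its ecommerce plugin is available. The `trackExampleOrder` helper and all order values are hypothetical.

```javascript
// Sketch only: `snowplow` is the tracker function passed in by the caller
// (in a browser this is normally the global created by the tracker snippet).
function trackExampleOrder(snowplow) {
  // Emits a page_view event.
  snowplow("trackPageView");

  // Queues one transaction and one line item (hypothetical values).
  snowplow("addTrans", { orderId: "order-1", total: 20.0, currency: "USD" });
  snowplow("addItem", {
    orderId: "order-1",
    sku: "sku-1",
    name: "Example item",
    price: 20.0,
    quantity: 1,
    currency: "USD",
  });

  // Sends the queued transaction and transaction_item events.
  snowplow("trackTrans");
}
```

In a page, this would be called as `trackExampleOrder(window.snowplow)` once the tracker has been initialized.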

Out of the box, all supported event types are transformed and sent to Neo4j in real time. This integration lets you create and maintain a behavioral identity graph in Neo4j with minimal effort.

Real-time Neo4j Graph Update

You can create and update your graph by sending self-describing events with the com.snowcatcloud.iceberg schema to your Snowplow collector.

As the events pass through your SnowcatCloud account, we transform and forward them to Neo4j in real time.

Identifier

The identifier id property is unique per node and defined by the fields listed below:

  • FingerprintJS visitorId
  • Cookies, if available: _gaexp, cart, ajs_user_id, ajs_anonymous_id
  • Snowplow cookies: domain_userid, network_userid
  • Snowplow user_id
<script>
  // Snowplow JS Tracker v3.x
  // Look up the node with the existing identifier id; every device linked
  // to it is given the additional new identifier id.
  snowplow("trackSelfDescribingEvent", {
    event: {
      schema: "iglu:com.snowcatcloud.iceberg/identifier/jsonschema/1-0-0",
      data: {
        lookup_id: "existing identifier id",
        source: "new identifier id source",
        name: "new identifier id name",
        id: "new identifier id",
      },
    },
  });
</script>
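
The payload above can also be assembled by a small helper so that incomplete identifiers are caught before they are sent. This is a minimal sketch: the `buildIdentifierEvent` name and its camelCase parameters are hypothetical, and treating all four fields as required for real-time events is an assumption carried over from the bulk upload rules.

```javascript
const IDENTIFIER_SCHEMA =
  "iglu:com.snowcatcloud.iceberg/identifier/jsonschema/1-0-0";

// Hypothetical helper: builds the argument for trackSelfDescribingEvent and
// rejects missing or empty fields before anything is sent.
function buildIdentifierEvent({ lookupId, source, name, id }) {
  const data = { lookup_id: lookupId, source, name, id };
  for (const [key, value] of Object.entries(data)) {
    // Assumption: every field must be a non-empty string.
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`identifier field "${key}" must be a non-empty string`);
    }
  }
  return { event: { schema: IDENTIFIER_SCHEMA, data } };
}

// Usage in the browser (hypothetical values):
// snowplow("trackSelfDescribingEvent", buildIdentifierEvent({
//   lookupId: "existing identifier id",
//   source: "salesforce",
//   name: "phonenumber",
//   id: "new identifier id",
// }));
```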

Bulk Neo4j Graph Update

You can also enrich your behavioral identity graph with terabytes of offline data, either your own or from third-party providers (Tapad, Experian, Verizon, etc.), by uploading CSV files to an S3 bucket.

Identifier

The bulk identifier update lets you enrich existing identifiers: it looks up an existing identifier and adds a new identifier to all devices connected to it.
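
To make the lookup-and-attach behavior concrete, here is a minimal in-memory sketch. It models each device as a set of identifier ids and is only an illustration of the behavior described above, not the actual Neo4j data model or SnowcatCloud's implementation. The `addIdentifier` function and the device names are hypothetical.

```javascript
// In-memory stand-in for the graph: each device holds a set of identifier ids.
function addIdentifier(devices, lookupId, newId) {
  let updated = 0;
  for (const identifiers of devices.values()) {
    // A device is "connected" if it already carries the lookup identifier.
    if (identifiers.has(lookupId)) {
      identifiers.add(newId);
      updated += 1;
    }
  }
  return updated; // number of devices that received the new identifier
}

const devices = new Map([
  ["laptop", new Set(["A"])],
  ["phone", new Set(["A", "cookie-2"])],
  ["tablet", new Set(["cookie-3"])],
]);

// Attach identifier "B" to every device that is linked to identifier "A".
const updatedCount = addIdentifier(devices, "A", "B");
```

Here the laptop and phone both gain identifier `B`, while the tablet, which was never linked to `A`, is left unchanged.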

The identifier id property is unique per node and defined by the fields listed below:

  • FingerprintJS visitorId
  • Cookies, if available: _gaexp, cart, ajs_user_id, ajs_anonymous_id
  • Snowplow cookies: domain_userid, network_userid
  • Snowplow user_id

For example, when a user submits a form, the resulting data entry associates their cookie with personal data. This enrichment can happen in real time, in bulk, or both. The goal is to tie together as many identifiers as possible so that customer behavior can be aggregated across devices.

File Upload Requirements

SnowcatCloud customers are provided with a dedicated, encrypted S3 bucket for uploading data files. Data is processed in real time or in batch mode.

  • No headers
  • CSV file(s), gzipped
  • All columns are mandatory

Lookup Identifier Id  Source      Name         New Identifier Id
A                     salesforce  phonenumber  B

Example:

52147316-857b-489b-affd-b40dc7aead94,tapad.email.hash,email,2238fe6d9aa0a9de
b0bffd39-c6fc-46ab-9c75-659886f2bb31,tapad.email.hash,email,73f5f793711859cf
...
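
A file that meets the requirements above (no header, four mandatory columns, gzipped) could be produced with a short Node.js script. This is a sketch under stated assumptions: the `buildIdentifierCsvGz` helper is hypothetical, values are assumed to contain no commas, quotes, or line breaks (so no CSV quoting is applied), and the trailing newline is a choice made here rather than a documented requirement.

```javascript
const zlib = require("zlib");

// Hypothetical helper: turns rows of
// [lookupIdentifierId, source, name, newIdentifierId]
// into a gzipped CSV buffer with no header row.
function buildIdentifierCsvGz(rows) {
  const lines = rows.map((row, index) => {
    if (!Array.isArray(row) || row.length !== 4) {
      throw new Error(`row ${index}: expected exactly 4 columns`);
    }
    for (const value of row) {
      // All columns are mandatory; plain values only (assumption, see above).
      if (typeof value !== "string" || value === "" || /[",\r\n]/.test(value)) {
        throw new Error(`row ${index}: invalid or empty column value`);
      }
    }
    return row.join(",");
  });
  return zlib.gzipSync(lines.join("\n") + "\n");
}

const gz = buildIdentifierCsvGz([
  ["52147316-857b-489b-affd-b40dc7aead94", "tapad.email.hash", "email", "2238fe6d9aa0a9de"],
  ["b0bffd39-c6fc-46ab-9c75-659886f2bb31", "tapad.email.hash", "email", "73f5f793711859cf"],
]);
```

The resulting buffer can be written to a `.csv.gz` file (for example with `fs.writeFileSync`) and uploaded to your bucket with whatever S3 tooling you already use.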