mixpanel-node icon indicating copy to clipboard operation
mixpanel-node copied to clipboard

increment is creating anonymous users when distinct_id is not a string

Open fabriziomoscon opened this issue 11 years ago • 6 comments

the type of distinct_id at this point: https://github.com/mixpanel/mixpanel-node/blob/master/lib/mixpanel-node.js#L296 should be checked like so:

if (typeof distinct_id != 'string') {
  return callback(new TypeError('distinct_id must be a string'));
}

I just discovered I created 1000 users because I was passing an object as distinct_id rather than string I would say that the same issue is also present in any function that expects a distinct_id as parameter.

I can write a pull request when I can see the other pull requests I sent are merged.

cc @joeatwork cc @carlsverre

fabriziomoscon avatar Jan 07 '15 16:01 fabriziomoscon

Thanks for this request!

We may want to consider accepting integer (and maybe all scalar?) distinct_ids (or at least coerce them to strings) but it's true that an object or list distinct id is probably always an error.

Also, cc @tdumitrescu

joeatwork avatar Jan 07 '15 16:01 joeatwork

AFAIK we do allow object and list distinct IDs in other server-side and client-side tracking libs, and they end up getting serialized in a deterministic format by the backend. mixpanel-node should probably allow the same behavior. Since metrics.send_request() simply JSONifies all the data, that seems to be the intent here as well. Will have to investigate why this would end up creating multiple users. @fabriziomoscon if you can pass along a simple code snippet to reproduce the issue that would be helpful.

tdumitrescu avatar Jan 07 '15 17:01 tdumitrescu

Further researches shown me that the users created are distinct but have a meaningless distinct_id:

// after url decoded
#user?distinct_id={u'_commission': true, u'_locale': u'en_US', u'_promocode_enabled': True, u'_subcategories': [], u'_travel_mode': u'TRANSIT', u'areas': [], u'avatar': u'https://graph.facebook.com/<HIDDEN>/picture?type=large&width=400', u'braintreeCustomerId': u'5

It doesn't seem useful - unless I am missing something - to use a type distinct_id other than string, I can partially see an argument to allow numbers, but definitely not objects. Event if an object is serialized by JSON.stringify() that is clearly a developer mistake which the library should help prevent with an Error.

You can easily reproduce this using something similar to the following script:

var mixpanel = require('mixpanel');
// initialize mixpanel library
// ..
var user = new User('fabrizio', 'moscon');
mixpanel.people.increment(user, 'arbitrary_count', 1);

this creates an anonymous user (since that distinct_id has no other property) and the distinct_id is the stringified object.

I will not make this mistake in the future ( hopefully =) ) as I have learn this trick, but for new user this might cause unexpected behavior and possibly wasting useful time debugging.

fabriziomoscon avatar Jan 07 '15 18:01 fabriziomoscon

The main obstacle to implementing this suggestion is that we do have customers who legitimately send objects and other non-string/int types as distinct IDs. Although it's not the standard use case, it's one that we do support and it simply doesn't make sense for the Node lib to exhibit different behavior than our other SDKs in that regard.

If, however, you're seeing behavior where sending the exact same JSON is resulting in different distinct_ids being registered, that's an issue which we'll need to fix (probably on the backend). I'm guessing that your 'user' object is changing internally between your different tracking calls, which is what's leading to multiple distinct_ids.

tdumitrescu avatar Jan 08 '15 15:01 tdumitrescu

Javascript has a "particular" way of handling types, so some basic type checking must be put in place.

If you could provide a case for using an object as distinct_id that would definitely help the discussion. If that is legacy behavior and it should not be encouraged anymore, enforcing string/int by code and a major version update number will suffice, otherwise at least some basic object validation rules should be applied

In case using an object is still considering a legitimate case the library should at least still check for:

  • null
  • undefined
  • function
  • boolean
  • Array
  • Date
  • any not accepted object (checking the length of JSON.stringify() result for example)

for example


// use a node.js module to validate
// https://github.com/hapijs/joi or
// https://github.com/philbooth/check-types.js or
// some basic type check

if (
  typeof distinct_id == "undefined" ||
  distinct_id === null ||
  typeof distinct_id == "function" ||
  typeof distinct_id == "boolean" ||
  Array.isArray(distinct_id) ||
  ) {
  return callback(new TypeError('Invalid distinct_id type'));
}

if (typeof distinct_id == 'Object') {
  // check other constrains
}

fabriziomoscon avatar Jan 08 '15 16:01 fabriziomoscon

Validation in mixpanel.people calls to eliminate falsey values and pure functions (if that's even possible) as distinct_id sounds great! We'd welcome a pull request with it. Arrays and arbitrary objects need to be allowed. This isn't about 'legacy' behavior, it's behavior which some customers currently use (e.g., distinct_id = {name: 'Fabrizio', email: '[email protected]'}, much like a multi-column primary key in SQL), and we just can't deviate from it for this one lib, or tell them that the Node lib won't work for their projects because it artificially restricts distinct_ids.

tdumitrescu avatar Jan 08 '15 17:01 tdumitrescu