
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POC...

Results: 81 `dbldatagen` issues

## Expected Behavior
Generation of a column named "ID".
## Current Behavior
Exception: `AnalysisException: Reference 'ID' is ambiguous, could be: ID, ID.`
## Steps to Reproduce (for bugs)
`import...`

bug
enhancement
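Since Spark's analyzer is case-insensitive by default, a user-defined "ID" column can collide with the generator's implicit lower-case `id` column. A minimal, hypothetical pre-flight check could flag such collisions before the DataFrame is built; `find_ambiguous` is an illustrative name, not part of dbldatagen:

```python
def find_ambiguous(names, case_sensitive=False):
    """Return the set of (folded) column names that clash.

    With case_sensitive=False this mimics Spark's default analyzer
    behaviour, where "ID" and "id" resolve to the same reference.
    """
    seen = {}    # folded name -> first original spelling
    dupes = set()
    for n in names:
        key = n if case_sensitive else n.lower()
        if key in seen and seen[key] != n:
            dupes.add(key)
        seen.setdefault(key, n)
    return dupes
```

For example, `find_ambiguous(["ID", "name", "id"])` reports the clash, while the same list passes under case-sensitive resolution.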

## Expected Behavior
When generating any data, if `baseColumn` is set to a reference column in `withColumn`, then the data generated for the new column should be the same when...

inconsistency
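The determinism users expect from `baseColumn` can be illustrated outside Spark: a value derived from a base column should be a pure function of the base value, so regenerating the same base input always yields the same output. The sketch below is hypothetical (`derived_value` is an illustrative name, not dbldatagen code):

```python
import hashlib

def derived_value(base, seed=42):
    """Derive a 32-bit value purely from (seed, base).

    Because the hash depends only on its inputs, the same base
    value always maps to the same derived value across runs.
    """
    digest = hashlib.sha256(f"{seed}:{base}".encode()).hexdigest()
    return int(digest[:8], 16)
```

Any implementation with this property would satisfy the expectation in the issue: repeated builds of a column derived from the same base column produce identical data.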

## Expected Behavior
When creating a column, from an existing schema or a new one, that is of a composite type such as an array of integers, the expected behaviour is to have...

bug

## Expected Behavior
Time intervals can be specified as "12 minutes, 2 seconds". You can also specify "1 minute, 2 seconds". You should be able to specify "1 minute 1...
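A parser that treats the comma between unit terms as optional would accept all of these spellings. The sketch below is illustrative only, assuming the desired behaviour rather than reproducing dbldatagen's actual interval-parsing code:

```python
import re

# Hypothetical unit table; extend with days/weeks as needed.
UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
}

def parse_interval(text):
    """Return the total seconds described by `text`.

    Matches "<number> <unit>" pairs anywhere in the string, so
    commas between pairs are optional: "1 minute, 2 seconds" and
    "1 minute 2 seconds" parse identically.
    """
    total = 0
    for count, unit in re.findall(r"(\d+)\s*([A-Za-z]+)", text):
        total += int(count) * UNIT_SECONDS[unit.lower()]
    return total
```

Because the regex only anchors on number/unit pairs, punctuation between terms is simply ignored, which is the behaviour the issue requests.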

## Expected Behavior
## Current Behavior
## Steps to Reproduce (for bugs)
## Context
## Your Environment
* `dbldatagen` version used:
* Databricks Runtime version:
* Cloud environment used:

## Expected Behavior
Add explicit support for Spark 3.2 (included in Databricks Runtime 9.1).
## Current Behavior
The current versions of the framework work in Databricks 9.1 (which is based...

If a named cluster specification in your Databricks environment had the current or a previous build of the data generator installed, then when you uninstall the library and...

bug

## Enhancement: Generate standard data sets
It would be useful to be able to generate standard data sets, without having to define columns etc., for quick demos and benchmarking of different...

enhancement
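One way such a feature could be structured is a registry of named dataset specifications. The sketch below is purely illustrative: `register_dataset`, `get_dataset`, and the `"users"` spec are hypothetical names, not dbldatagen API, and a real implementation would return a configured `DataGenerator` rather than a plain dict:

```python
# Registry mapping a dataset name to a builder function.
STANDARD_DATASETS = {}

def register_dataset(name):
    """Decorator that registers a builder under a well-known name."""
    def wrap(builder_fn):
        STANDARD_DATASETS[name] = builder_fn
        return builder_fn
    return wrap

@register_dataset("users")
def users_spec():
    # A real version would configure and return a DataGenerator;
    # a dict stands in for the spec here.
    return {"columns": ["id", "name", "email"], "rows": 1000}

def get_dataset(name):
    """Build the standard dataset registered under `name`."""
    return STANDARD_DATASETS[name]()
```

With this shape, demos and benchmarks could call `get_dataset("users")` instead of hand-defining every column.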

While using the data generator, I have had to use trial and error to get the table size I require. Not sure if this is feasible, but...

enhancement

Some users have requested the ability to generate 100 GB or 1 TB of data without specifying the number of rows.

enhancement
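Supporting a size target rather than a row count reduces to estimating rows from the target size and an average row width. A sketch of that arithmetic, assuming a hypothetical helper name (`estimate_rows_for_size` is not dbldatagen API):

```python
def estimate_rows_for_size(target_bytes, avg_row_bytes):
    """Rows needed so rows * avg_row_bytes is roughly target_bytes.

    avg_row_bytes would in practice come from sampling a small
    generated batch and measuring its serialized size.
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return max(1, target_bytes // avg_row_bytes)

# e.g. a 100 GB target at ~200 bytes per row
rows_for_100gb = estimate_rows_for_size(100 * 1024**3, 200)
```

The estimate is approximate since compression and encoding affect the on-disk size, so a real feature would likely iterate: generate, measure, and adjust the row count.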