6

Is there a simple way to choose a previous delta table version as the current/working version?

Instead of creating another version by overwrite/truncate, can we just designate a version as the "current/latest" version?

This operation is more like an undo: it completely removes some steps and returns the data to a previous state. Then, when running select * from MYTABLE, MYTABLE points to the restored version.

Either SQL or PySpark would be appreciated!

3 Answers

17

You can view the history of a Delta table with:

DESCRIBE HISTORY yourTblName

It returns the table's history, including Version, Timestamp, UserId/UserName, and Operation.

To get the previous version, you can take a few steps:

SELECT max(version) -1 as previousVersion  FROM (DESCRIBE HISTORY yourTblName)

This gives you the previous version number (you can save it in a variable), which you can then use with VERSION AS OF:

select * from yourTblName Version as of 7

This returns the records as of that version.
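For PySpark, the same lookup can be sketched as below. This is a minimal sketch, not a definitive implementation: `previous_version` and `read_previous_version` are hypothetical helper names, `spark` is assumed to be an active SparkSession with Delta enabled, and the table name is assumed to be trusted because it is interpolated straight into the SQL text. It picks the second-highest version actually present in the history instead of computing max - 1.

```python
def previous_version(versions):
    """Return the version just before the latest one in a list of version numbers."""
    ordered = sorted(set(versions))
    if len(ordered) < 2:
        raise ValueError("table has no earlier version to go back to")
    return ordered[-2]


def read_previous_version(spark, table_name):
    """Read the table as it was one commit before the latest (hypothetical helper).

    Assumes `spark` is an active SparkSession with Delta Lake available.
    """
    # DESCRIBE HISTORY returns a DataFrame with one row per commit.
    history = spark.sql(f"DESCRIBE HISTORY {table_name}")
    versions = [row["version"] for row in history.select("version").collect()]
    target = previous_version(versions)
    # Time travel: query the table as of the chosen version.
    return spark.sql(f"SELECT * FROM {table_name} VERSION AS OF {target}")
```

Usage would look like `df = read_previous_version(spark, "yourTblName")`, after which `df` holds the records of the previous version without changing the table itself.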

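To make an earlier version the current one, restore the table. RESTORE does not delete the later versions; it adds a new commit whose contents match the chosen version, so a plain select * from yourTblName then returns that data: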

RESTORE TABLE yourTblName  TO VERSION AS OF 7

Databricks Documentation : https://docs.databricks.com/delta/delta-utility.html#restore-a-delta-table-to-an-earlier-state

2
  • How do we drop the later version? E.g., I want version 7 only and want to drop version 8. Then I want to be able to select * from yourTblName and have it automatically point to version 7
    – QPeiran
    Commented Nov 19, 2020 at 21:25
  • Read the Databricks documentation on this: docs.databricks.com/delta/… Check the updated answer. Commented Nov 20, 2020 at 13:10
0

You can restore the table to an earlier version and then use the VACUUM command.

Restore can be done using

    RESTORE TABLE Table_name TO VERSION AS OF Version_no

VACUUM can be run with:

    VACUUM Table_name RETAIN 0 HOURS

RETAIN 0 HOURS removes every data file that the current version no longer references, so older versions can no longer be read afterwards. By default, VACUUM refuses a retention period shorter than 7 days, so you need to set a Spark config before running it:

    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", False)

The setting above disables the safety check that otherwise blocks short retention periods.
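The restore-then-vacuum sequence described in this answer can be sketched in PySpark as follows. This is only an illustration under stated assumptions: `restore_and_vacuum_statements` and `run_statements` are hypothetical helpers, the 168-hour figure assumes the default 7-day retention threshold has not been changed on the table, and the table name is assumed to be trusted because it is interpolated into the SQL text.

```python
DEFAULT_RETENTION_HOURS = 168  # 7 days, assumed to be the table's retention threshold


def restore_and_vacuum_statements(table_name, version, retain_hours=DEFAULT_RETENTION_HOURS):
    """Build the SQL statements for a restore followed by a vacuum (hypothetical helper)."""
    if version < 0 or retain_hours < 0:
        raise ValueError("version and retain_hours must be non-negative")
    statements = [f"RESTORE TABLE {table_name} TO VERSION AS OF {version}"]
    if retain_hours < DEFAULT_RETENTION_HOURS:
        # Short retention periods are rejected unless the safety check is disabled.
        statements.append(
            "SET spark.databricks.delta.retentionDurationCheck.enabled = false"
        )
    statements.append(f"VACUUM {table_name} RETAIN {retain_hours} HOURS")
    return statements


def run_statements(spark, statements):
    """Execute each statement in order on an active SparkSession."""
    for statement in statements:
        spark.sql(statement)
```

Calling `run_statements(spark, restore_and_vacuum_statements("Table_name", 7, 0))` would restore to version 7 and then delete the files of every other version, which cannot be undone.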

0

A PySpark way to restore a table to an earlier version. It works for a Delta table that is addressed by path rather than by name.

  1. Load the table

    from delta.tables import DeltaTable
    deltaTable = DeltaTable.forPath(spark, "/path/to/delta/table/")
    
  2. Restore to a certain version

    deltaTable.restoreToVersion(5)
    

Reference: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-utility
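The two steps above can be wrapped in a plain Python function. The sketch below assumes the `delta-spark` package is installed and `spark` is an active SparkSession; `restore_delta_table` is a hypothetical name, and the `name` option relies on `DeltaTable.forName` for tables registered in the metastore.

```python
def restore_delta_table(spark, version, path=None, name=None):
    """Restore a Delta table, given by path or by name, to `version` (hypothetical helper)."""
    if (path is None) == (name is None):
        raise ValueError("pass exactly one of path or name")
    if isinstance(version, bool) or not isinstance(version, int) or version < 0:
        raise ValueError("version must be a non-negative integer")

    # Imported here so the argument checks above do not require Delta to be installed.
    from delta.tables import DeltaTable

    if path is not None:
        table = DeltaTable.forPath(spark, path)
    else:
        table = DeltaTable.forName(spark, name)
    # restoreToVersion commits a new version whose contents match `version`
    # and returns a DataFrame of restore metrics.
    return table.restoreToVersion(version)
```

For example, `restore_delta_table(spark, 5, path="/path/to/delta/table/")` performs the same restore as the two steps above. A regular function like this is enough; a Spark UDF runs per row on executors and is not the right tool for a table-level operation.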

1
  • Can we create a function for the same in Python? Can anyone share an idea for doing this with a UDF? Commented Sep 27, 2022 at 9:19