
SQL Server Linux Container running as StatefulSet shows status as Recovery Pending

Mahesh-MSFT opened this issue on Aug 14, 2017 · 11 comments

Is this a request for help?:

YES

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}

What happened: I created a custom SQL Server Linux container by following the process described below.

  1. Create a standard SQL Server container by running docker run -d -p 1433:1433 -e "SA_PASSWORD=<password>" -e "ACCEPT_EULA=Y" microsoft/mssql-server-linux
  2. Connect to this container on 127.0.0.1, 1433 from SQL Server Management Studio and create a custom database by running the following script
GO
IF(db_id(N'snoopyshoppingcart') IS NULL)
BEGIN
	CREATE DATABASE snoopyshoppingcart 
		ON
			(
				NAME = ssc_dat,
				FILENAME = N'/ssc/ssc.mdf'
			)
		LOG ON  
			( 
				NAME = ssc_log,  
				FILENAME = N'/ssc/ssc.ldf'
			)
END;

GO
USE snoopyshoppingcart

GO
IF NOT EXISTS (SELECT * FROM sysobjects WHERE NAME='shopping' AND XTYPE='U')
BEGIN
	CREATE TABLE shopping
	(
		AddedOn datetime,
		ConnectionID nvarchar(100),
		IP nvarchar(20),
		CartItem nvarchar(100)
	)
END

Note that the database files are created under the path /ssc. This is a deviation from the standard SQL Server data path /var/opt/mssql/data. I want to use the custom path /ssc as a mount point and mount it on an Azure Disk when creating a StatefulSet for this container.

  3. Run SHUTDOWN WITH NOWAIT.
  4. Stop the SQL Server container.
  5. Commit the changes by running docker commit 8f maksh/snoopyshoppingcartdb, creating a new image.
  6. Push this custom image by running docker push maksh/snoopyshoppingcartdb.

The full command sequence is sketched below.
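A rough sketch of that sequence (8f is the container ID from docker ps; <password> stays a placeholder):

# 1. start a standard SQL Server Linux container
docker run -d -p 1433:1433 -e "SA_PASSWORD=<password>" -e "ACCEPT_EULA=Y" microsoft/mssql-server-linux

# 2. connect from SSMS on 127.0.0.1, 1433 and run the database-creation script above

# 3. in a query window, shut SQL Server down cleanly: SHUTDOWN WITH NOWAIT

# 4. stop the container
docker stop 8f

# 5. commit the stopped container as a new image
docker commit 8f maksh/snoopyshoppingcartdb

# 6. push the custom image
docker push maksh/snoopyshoppingcartdb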

When I create a container from this custom image and run it locally (127.0.0.1, 1433), I can see that my custom database is available.

However, when I run this container as a StatefulSet on Kubernetes in Azure, the database shows status Recovery Pending. My k8s manifests are below.

Secret (For sa password):

apiVersion: v1
kind: Secret
metadata:
  name: sqlsecret
type: Opaque
data:
  sapassword: UGFzc3dvcmQxMjM0
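(The sapassword value is the base64 encoding of the plain-text SA password; the value above decodes to Password1234. As a sketch, it can be produced, or the secret created directly, with:)

echo -n 'Password1234' | base64
# or let kubectl do the encoding:
kubectl create secret generic sqlsecret --from-literal=sapassword='Password1234'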

Storage Class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurestorageclass
provisioner: kubernetes.io/azure-disk
parameters:
  skuName: Standard_LRS
  location: southeastasia
  storageAccount: <my-storage-account-in-same-rg-as-k8s>

Service:

apiVersion: v1
kind: Service
metadata:
  name: sqlservice
  labels:
    app: sqlservice
spec:
  type: LoadBalancer
  ports:
  - port: 1433
    targetPort: 1433
  selector:
    app: sqlinux

StatefulSet:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: sqlserverstatefulset
spec:
  serviceName: "sqlservice"
  replicas: 1
  template:
    metadata:
      labels:
        app: sqlinux
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: sqlinux
          image: maksh/snoopyshoppingcartdb
          env:
            - name: SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: sqlsecret
                  key: sapassword
            - name: ACCEPT_EULA
              value: "Y"
          ports:
            - containerPort: 1433
          volumeMounts:
            - name: sql-persistent-storage
              mountPath: "/ssc"
  volumeClaimTemplates:
  - metadata:
      name: sql-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "azurestorageclass"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 16Gi
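For completeness, the manifests above are applied and checked with standard kubectl commands; a sketch (the .yaml file names are placeholders):

kubectl apply -f sqlsecret.yaml
kubectl apply -f azurestorageclass.yaml
kubectl apply -f sqlservice.yaml
kubectl apply -f sqlserverstatefulset.yaml

# watch the pod and the dynamically provisioned claim come up
kubectl get pods -w
kubectl get pvc

# external IP used later to connect from SSMS
kubectl get svc sqlservice

# SQL Server errorlog (a one-replica StatefulSet names its pod <name>-0)
kubectl logs sqlserverstatefulset-0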

The pod logs the following messages. They appear only on Kubernetes; the local container doesn't log them.

2017-08-14 07:20:48.76 spid24s     Starting up database 'snoopyshoppingcart'.
2017-08-14 07:20:48.77 spid24s     Error: 17204, Severity: 16, State: 1.
2017-08-14 07:20:48.77 spid24s     FCB::Open failed: Could not open file /ssc/ssc.mdf for file number 1.  OS error: 2(The system cannot find the file specified.).
2017-08-14 07:20:48.78 spid9s      The resource database build version is 14.00.800. This is an informational message only. No user action is required.
2017-08-14 07:20:48.79 spid24s     Error: 5120, Severity: 16, State: 101.
2017-08-14 07:20:48.79 spid24s     Unable to open the physical file "/ssc/ssc.mdf". Operating system error 2: "2(The system cannot find the file specified.)".
2017-08-14 07:20:48.81 spid24s     Error: 17207, Severity: 16, State: 1.
2017-08-14 07:20:48.81 spid24s     FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file '/ssc/ssc.ldf'. Diagnose and correct the operating system error, and retry the operation.
2017-08-14 07:20:48.81 spid9s      Starting up database 'model'.
2017-08-14 07:20:48.83 spid24s     File activation failure. The physical file name "/ssc/ssc.ldf" may be incorrect.

When I connect to SQL Server using the external IP of the service, my custom database is shown with status Recovery Pending.

What you expected to happen: I expect the custom SQL Server database files to live on the Azure Disk and the database to be in an operational state.

How to reproduce it (as minimally and precisely as possible): Follow the steps above.

Anything else we need to know: A VHD disk gets created in the storage account. It matches the PersistentVolumeClaim.
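Two checks that make this state visible (a sketch; sqlserverstatefulset-0 is the pod name Kubernetes derives for replica 0 of the StatefulSet above):

# confirm the claim is bound and the Azure Disk PV exists
kubectl get pvc,pv

# the freshly provisioned disk mounted at /ssc starts out empty,
# which lines up with the "OS error: 2" entries in the errorlog above
kubectl exec sqlserverstatefulset-0 -- ls -la /ssc
kubectl exec sqlserverstatefulset-0 -- df -h /ssc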

Mahesh-MSFT (Aug 14 '17)

This is probably the NFS/remote storage problem that is present in the RC1 and RC2 images. To confirm, can you please pull/run the image with the :ctp2-1 tag? The bug was not present in that release. This issue is fixed in development now and will be part of the next release after RC2.

twright-msft (Aug 14 '17)

Created a new image from the microsoft/mssql-server-linux:ctp2-1 base image. Same result :-( The database still shows "Recovery Pending" status.

I also tried another approach, as follows.

Dockerfile:

FROM microsoft/mssql-server-linux:ctp2-1

# Create app directory
RUN mkdir -p /ssc
WORKDIR /ssc

# Bundle app source
COPY . /ssc

# Make the setup script executable
RUN chmod +x /ssc/setup.sh

CMD /bin/bash /ssc/entrypoint.sh

Setup.sh:

#wait for the SQL Server to come up
sleep 90s

#run the setup script to create the DB and the schema in the DB
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P <my-password> -d master -i setup.sql

Setup.sql:

GO
IF(db_id(N'snoopyshoppingcart') IS NULL)
BEGIN
	CREATE DATABASE snoopyshoppingcart 
		ON
			(
				NAME = ssc_dat,
				FILENAME = N'/ssc/ssc.mdf'
			)
		LOG ON  
			( 
				NAME = ssc_log,  
				FILENAME = N'/ssc/ssc.ldf'
			)
END;

GO
USE snoopyshoppingcart

GO
IF NOT EXISTS (SELECT * FROM sysobjects WHERE NAME='shopping' AND XTYPE='U')
BEGIN
	CREATE TABLE shopping
	(
		AddedOn datetime,
		ConnectionID nvarchar(100),
		IP nvarchar(20),
		CartItem nvarchar(100)
	)
END

entrypoint.sh:

#start SQL Server, start the script to create the DB
/opt/mssql/bin/sqlservr & /ssc/setup.sh

docker logs <contID> shows the following.

2017-08-15 05:05:20.46 spid52      Starting up database 'snoopyshoppingcart'.
2017-08-15 05:05:20.55 spid52      Parallel redo is started for database 'snoopyshoppingcart' with worker pool size [1].
2017-08-15 05:05:20.57 spid52      Parallel redo is shutdown for database 'snoopyshoppingcart' with worker pool size [1].
Changed database context to 'snoopyshoppingcart'.

After this, the container becomes unresponsive.
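One observation on the entrypoint (plain shell semantics, not something confirmed in this thread): because sqlservr is backgrounded and setup.sh runs in the foreground, bash exits as soon as setup.sh returns, taking the container with it, which may be why it stops responding. A common pattern is to invert the order so sqlservr stays the foreground process, roughly:

#!/bin/bash
# run the one-time setup in the background; keep sqlservr as the foreground process
/ssc/setup.sh &
exec /opt/mssql/bin/sqlservr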

Mahesh-MSFT (Aug 15 '17)

It looks like you placed ssc.mdf and ssc.ldf in the directory /ssc inside the container image. That works fine on the local machine because no data disk is attached. But when you create a pod with a persistent volume, the volume /dev/sdc is mounted at the mount point /ssc, and that volume does not contain ssc.mdf or ssc.ldf.

For instance, take a new mount point /mntdavid: while nothing is mounted there, files written into it (such as "davidtestabcd20170815") are saved on the OS disk /dev/sda1.

root@davidubuntu1604lts:/mntdavid# ls -la
total 815672
drwxr-xr-x  2 root root      4096 Aug 15 05:03 .
drwxr-xr-x 24 root root      4096 May 18 05:11 ..
-rw-r--r--  1 root root 835232768 Aug 15 05:04 davidtestabcd20170815
root@davidubuntu1604lts:/mntdavid# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             1740984       0   1740984   0% /dev
tmpfs             352176   36220    315956  11% /run
/dev/sda1       30428648 8283944  22128320  28% /
tmpfs            1760860       0   1760860   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            1760860       0   1760860   0% /sys/fs/cgroup
tmpfs             352176       0    352176   0% /run/user/0
tmpfs             352176       0    352176   0% /run/user/999
/dev/sdb1        7092664   16120   6693216   1% /mnt
tmpfs             352176       0    352176   0% /run/user/1000

If a data disk (for instance /dev/sdc1) is then mounted at that mount point, it will only show the contents of /dev/sdc1 and you will not be able to see the files that live on /dev/sda1.

root@davidubuntu1604lts:/mntdavid# mount /dev/sdc1 /mntdavid
root@davidubuntu1604lts:/mntdavid# ls -la
total 116
drwxr-xr-x 23 root root  4096 May 17 23:58 .
drwxr-xr-x 24 root root  4096 May 18 05:11 ..
drwxr-xr-x  2 root root  4096 May 17 23:54 bin
drwxr-xr-x  3 root root  4096 May 17 23:58 boot
drwxr-xr-x  5 root root  4096 Jun 27  2016 dev
drwxr-xr-x 99 root root  4096 May 18 06:55 etc
drwxr-xr-x  3 root root  4096 May 17 11:05 home
lrwxrwxrwx  1 root root    32 May 17 23:58 initrd.img -> boot/initrd.img-4.4.0-78-generic
lrwxrwxrwx  1 root root    32 Jun 27  2016 initrd.img.old -> boot/initrd.img-4.4.0-28-generic
root@davidubuntu1604lts:/mntdavid# ls -la | grep -i davidtestabcd20170815

Is there a way to set the default data directory as an environment variable for SQL Server on Linux, instead of placing any data in it?

windswordsgd (Aug 15 '17)

FWIW, I have a SQL Linux container running on ACS and I used the default directory as below.

    volumeMounts:
    - mountPath: /var/opt/mssql
      name: sqldb-home

chzbrgr71 (Aug 15 '17)

@chzbrgr71 can you share your k8s manifests/yaml?

Mahesh-MSFT (Aug 15 '17)

I have it posted here: https://github.com/chzbrgr71/sql-guestbook/blob/master/kube-db.yaml

chzbrgr71 (Aug 15 '17)

Thanks @chzbrgr71. I see that you are using the base image microsoft/mssql-server-linux to create the container and then running a script to create the custom database.

In my case, I have the custom database baked into a custom image.

@twright-msft Do you think Dockerfile for my custom database is correct?

Mahesh-MSFT (Aug 16 '17)

The only thing about putting your database in the image is that it is ephemeral. Any changes you make to the data will be lost if the container is deleted. For example, when you want to upgrade the SQL Server binaries, you will need to deploy a new container. That may be fine in some scenarios, like dev/test, if that is what you are using it for. Otherwise, you'll want to do something more like what @chzbrgr71 has done.

twright-msft (Aug 16 '17)

That's why I am using a StatefulSet with a persistent volume.

Even if I do not put my database in the image and instead try to create it with the DB creation script at container startup, it still fails.

I am simply trying to run the DB creation script at container startup, via entrypoint.sh (@chzbrgr71 runs his script manually!).

Is this not a valid approach?

Mahesh-MSFT (Aug 16 '17)

My planned approach for this is to use a Kubernetes initContainer. This would allow a .sql script to be run to pre-stage the database and any seed data. I'm working on a sample here: https://github.com/chzbrgr71/guestbook-web/blob/master/sql-initcontainer/db.yaml

Still just a draft, but I see this working well. It needs another initContainer to wait for SQL Server to start as well.
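For the "wait for SQL Server to start" part, a minimal sketch of the loop such an initContainer (or the setup script) could run; the sqlcmd path and the sqlservice name come from earlier in this thread, and SA_PASSWORD is assumed to be injected as in the StatefulSet above:

#!/bin/bash
# poll until SQL Server accepts logins, then exit so the next step can run
until /opt/mssql-tools/bin/sqlcmd -S sqlservice -U sa -P "$SA_PASSWORD" -Q "SELECT 1" > /dev/null 2>&1; do
  echo "SQL Server not ready yet, retrying in 5s..."
  sleep 5
done
echo "SQL Server is up."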

chzbrgr71 (Aug 16 '17)

If you are trying to use remote storage for this, you are hitting the RC1/RC2 bug that prevents DBs from being created on NFS/Samba shares. Only local storage will work for now. This is already fixed and will be in the next release. Until then you can go back to using the :CTP1-4 tagged image, which doesn't have this bug.
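(A minimal way to try that, assuming the tag is published in lowercase like the ctp2-1 tag used earlier, and reusing the Dockerfile above with its FROM line pointed at that tag:)

docker pull microsoft/mssql-server-linux:ctp1-4
# rebuild and push the custom image against the older base
docker build -t maksh/snoopyshoppingcartdb .
docker push maksh/snoopyshoppingcartdb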

twright-msft (Aug 16 '17)