Yann Neuhaus

dbi services technical blog

Documentum – Custom facets not showing up after full reindex?

Fri, 2020-11-27 16:19

At the beginning of the year, while performing a migration from a Documentum 7.3 environment on VM to Documentum 16.4 on Kubernetes, a customer had an issue where their custom facets weren’t showing up in D2 after a full reindex. At the end of the migration, since xPlore had been upgraded as well (from xPlore 1.5 to 16.4, from VM to K8s), a full reindex was executed so that all the documents would be indexed. In this case, several millions of documents were indexed and it took a few days. Unfortunately, at the end of the full reindex, the customer saw that the facets weren’t working…

Why is that exactly? Well, while configuring custom facets, you need to add a sub-path configuration for the facet computing, and that is a schema change inside the index. Each and every schema change requires, at the very least, an online rebuild of the index so that the change of the schema is propagated into each and every node of the index. Unless you perform this online rebuild, the xPlore index schema will NOT be refreshed and the indexing of documents will therefore use the old schema. In case you are wondering what the “online rebuild” is that I’m talking about, it’s the action behind the “Rebuild Index” button that you can find in the Dsearch Admin UI under “Home >> Data Management >> <DOMAIN_NAME> (usually Repo name) >> <COLLECTION_NAME> (e.g.: default or Node1_CPS1 or Node4_CPS2 …)“:

This action will not index any new content; it will, however, create a new index based on the refreshed schema and then copy all the nodes from the current index to the new one. At the end, it will replace the current index with the new one, and this can be done online without downtime. This button was initially present for both Data collections (where your documents are) as well as ApplicationInfo collections (ACLs, Groups). However, in recent versions of xPlore (at least since 16.4), the feature has been removed for the ApplicationInfo collections.
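
For reference, the sub-path configuration mentioned above is done in the indexserverconfig.xml of the Dsearch. A minimal sketch of such a definition could look like the following; the attribute name (my_custom_attr) is purely illustrative and the exact set of XML attributes depends on your xPlore version, so treat it as an example and not as a copy/paste template:

<sub-path path="dmftmetadata//my_custom_attr" type="string"
          compress="false" returning-contents="true"
          full-text-search="true" value-comparison="true"/>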

 

So, what is the minimum required to configure custom facets? The answer is that it depends… :). Here are some examples:

  • If the xPlore has never been started, the index doesn’t exist yet and therefore configuring the facets inside the indexserverconfig.xml file would take effect immediately at the first startup. In this case, an online rebuild wouldn’t even be needed. However, it might not always be easy to modify the indexserverconfig.xml file before xPlore even starts; it depends on how you are deploying the components…
  • If the xPlore has been started at least once but indexing hasn’t started yet (0 content inside the Data collections), then you can just login to the Dsearch Admin UI and perform the online rebuild on the empty collections. This will be almost instantaneous so you will most probably not even see it happen though.
    • If this is a new environment, then make sure the IndexAgent is started in normal mode after that so that it will process incoming indexing requests and that’s it
    • If this is an existing environment, then you will need to execute a full reindex operation using your preferred choice (IndexAgent full reindex action, through some select queries, through the ids.txt)
  • If the xPlore has been started at least once and the indexing has been completed, then you will need to perform the online rebuild as well. However, this time, it will probably take quite some time because, as I mentioned earlier, it needs to copy all the indexed nodes to a new index. This process is normally faster than a full reindex because it involves only xPlore-internal communications, because it only duplicates the existing index (and applies the schema change) and because there is no exchange with the Content Server. Once the online rebuild has been performed, the facets should be available.

 

Even if an online rebuild is faster than a full reindex, based on the size of the index, it might still take from hours to days to complete. It is therefore quite important to plan this properly in advance in case of a migration or upgrade, so that you can start with an online rebuild on an empty index (therefore done instantaneously) and then perform the needed full reindex afterwards, instead of the opposite. This might save you several days of pain with your users and considerably reduce the load on the Dsearch/CPS.

This behavior wasn’t really well documented before. I had some exchanges with OpenText on this topic and they created KB15765485 based on these exchanges and on what is described in this blog. I’m not sure if the documentation is really better now, but at least there is a little bit more information.

 


PostgreSQL 14: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Fri, 2020-11-27 09:52

It is a common misunderstanding that VACUUM FULL saves you from running out of disk space if you already have space pressure. Running a VACUUM FULL temporarily requires at least double the space, as the table (and the indexes on the table) get completely re-written. PostgreSQL 14 will probably come with a solution for that as this patch introduces the possibility to move relations from one tablespace to another, when either CLUSTER, VACUUM FULL or REINDEX is executed.

As this is about moving relations from one tablespace to another we obviously need at least two tablespaces to play with:

postgres=# \! mkdir /var/tmp/tbs1
postgres=# \! mkdir /var/tmp/tbs2
postgres=# create tablespace tbs1 location '/var/tmp/tbs1';
CREATE TABLESPACE
postgres=# create tablespace tbs2 location '/var/tmp/tbs2';
CREATE TABLESPACE
postgres=# \db
          List of tablespaces
    Name    |  Owner   |   Location    
------------+----------+---------------
 pg_default | postgres | 
 pg_global  | postgres | 
 tbs1       | postgres | /var/tmp/tbs1
 tbs2       | postgres | /var/tmp/tbs2
(4 rows)

Let’s assume we have a table in the first tablespace and we face space pressure on that file system:

postgres=# create table t1 ( a int, b date ) tablespace tbs1;
CREATE TABLE
postgres=# insert into t1 select x, now() from generate_series(1,1000000) x;
INSERT 0 1000000

Without that patch there is not much you can do, except for this (which blocks for the duration of the operation):

postgres=# alter table t1 set tablespace tbs2;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs2"

This will move the files of that table to the new tablespace (but not the indexes). Indexes would need to be moved separately with the long-standing ALTER INDEX syntax, which also blocks for the duration of the move; a quick sketch follows, with the index name my_index being hypothetical:
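
-- move one index (hypothetical name my_index)
alter index my_index set tablespace tbs2;
-- or move every index of a given tablespace at once
alter index all in tablespace tbs1 set tablespace tbs2;

If you really want to get back the space on disk with “vacuum full”, you can now do that: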

postgres=# vacuum (tablespace tbs1, full true)  t1;
VACUUM
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs1"

The very same is possible with reindex:

postgres=# create index i1 on t1 (a);
CREATE INDEX
postgres=# reindex (tablespace tbs2) index i1;
REINDEX

… and cluster:

postgres=# cluster (tablespace tbs1, index_tablespace tbs1) t1 using i1;
CLUSTER
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Indexes:
    "i1" btree (a) CLUSTER, tablespace "tbs1"
Tablespace: "tbs1"

postgres=# 

Nice.


Documentum DM_SYSOBJECT_E_NO_BROWSE_ACCESS No browse access for sysobject with ID

Thu, 2020-11-26 06:50

When upgrading, you may get this specific issue if you are trying to edit content or create new documents. We got it when creating a new document in a specific folder: D2 told us that the folder didn’t exist and that we had no browse access.

Foolishness! The folder does exist and all accesses are set properly, so what happened? Here is the full log:

Caused by: com.documentum.fc.client.DfIdNotFoundException: [DM_API_E_EXIST]error:  "Folder specified by 0b010ba18005277d does not exist."
        at com.documentum.fc.client.impl.session.Session.getFolderBySpecification(Session.java:1289)
        at com.documentum.fc.client.impl.session.SessionHandle.getFolderBySpecification(SessionHandle.java:835)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.printParentFolder(D2CreationServicePlugin.java:151)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.formerOnAfter(D2CreationServicePlugin.java:187)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.createProperties(D2CreationServicePlugin.java:133)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.callPlugin(InjectSessionAspect.java:487)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.ajc$inlineAccessMethod$com_emc_d2fs_dctm_aspects_InjectSessionAspect$com_emc_d2fs_dctm_aspects_InjectSessionAspect$callPlugin(InjectSessionAspect.java:1)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.process(InjectSessionAspect.java:130)
        at com.emc.d2fs.dctm.web.services.create.D2CreationService.createProperties(D2CreationService.java:97)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.callPlugin(InjectSessionAspect.java:487)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.ajc$inlineAccessMethod$com_emc_d2fs_dctm_aspects_InjectSessionAspect$com_emc_d2fs_dctm_aspects_InjectSessionAspect$callPlugin(InjectSessionAspect.java:1)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.process(InjectSessionAspect.java:236)
        ... 55 common frames omitted
Caused by: com.documentum.fc.common.DfException: [DM_SYSOBJECT_E_NO_BROWSE_ACCESS]error:  "No browse access for sysobject with ID '0b010ba18005277d'."
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:329)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:672)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1382)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1141)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1374)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
        at com.documentum.fc.client.impl.session.Session.getObject(Session.java:965)
        at com.documentum.fc.client.impl.session.Session.getFolderBySpecification(Session.java:1281)
        ... 74 common frames omitted

The issue resolution is quite simple, it’s just a cache problem.

You only need to force the clients to clear their caches, and there is a simple way to do this. The Content Server keeps a reference to the “client version” in the dm_docbase_config object. We only need to increment its value to force all clients to reload and clear their caches. So here is how you do it:

Using API commands (e.g. in an iapi session):

retrieve,c,dm_docbase_config
set,c,l,client_pcaching_change
7
save,c,l
reinit,c

Just increase the client_pcaching_change number by 1.
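
If you want to check the current value before incrementing it, a simple DQL query should do (just a sketch, using the attribute name from the API commands above):

select r_object_id, client_pcaching_change from dm_docbase_config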


DynamoDB PartiQL – part II: SELECT

Wed, 2020-11-25 14:29
By Franck Pachot

In the previous post I inserted a few rows in a Demo table using the new SQL-like API on DynamoDB. I checked my items with a SELECT but was limited in the ORDER BY clause. Here is the most important thing to understand: there is no additional data processing engine here. PartiQL (pronounce it like ‘particle’ and it helps to avoid any kind of dyslexia) parses a statement with INSERT/UPDATE/DELETE/SELECT and calls the NoSQL API you already know (Put/Get/Update/Delete Item, Query and Scan). It looks like SQL, but SQL is a language that declares operations on a set of rows, like relational tables or views, which are a logical layer above the physical model. In an RDBMS, you build your SQL queries according to your business needs, not the physical layout. Of course, the physical layout (like indexing, partitioning) is also optimized for this access, but this is done independently. With PartiQL on DynamoDB you must know which operation will happen when you write your statement, because all the simplicity and scalability of DynamoDB resides on the bounded API that matches the physical layout:

  • GetItem does partition pruning + unique hash/index access to retrieve one item
  • Query does partition pruning + index access to retrieve a sorted range of items
  • Scan reads all partitions, possibly in parallel, to retrieve all items

Of course, the cost is different. With the DynamoDB API you know which one you are doing because you call a different operation. With PartiQL you should know what you are doing, but you execute the same statement (SELECT) and the data access will depend on the columns referenced in the WHERE clause. Basically, if you don’t have an equality predicate on the partition (HASH) key, you have to read all partitions (Scan). If you have an equality predicate on the partition (HASH) key and an inequality on the sort (RANGE) key, you benefit from partition pruning (Query). This is obvious when you know what a hash function is, but error-prone if you don’t know the data model. The DynamoDB API helps you to prevent that because your fingers should hurt when typing “scan” for a large table.

Scan

So, if what you want is actually to get all items, because you need all of them, or maybe to filter out only a small part of them, you want a scan. Yes, it reads everything, but it is the most efficient access to read a large portion of your table, because with one RCU you can get many items. Doing the same (getting all items) with GetItem would cost one RCU per item (assuming strong consistency and small items here). To put it simply, for an OLTP workload (many users reading few items) you avoid scans on large tables. DynamoDB is a key-value store: the goal is to access by the key. And for some reporting or to export data, you may scan, which is expensive (in time and RCU) but not done frequently.

As seen in the previous post, scanning the whole table to get all items with all attributes is a simple SELECT * FROM:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo"

{"Items":[
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"a"},"MyKeySort":{"N":"1"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"use parameters when embedding SQL in programs"},"MyKeySort":{"N":"2"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"c"},"MyKeySort":{"N":"3"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"d"},"MyKeySort":{"N":"4"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"e"},"MyKeySort":{"N":"5"}},
{"MyKeyPart":{"N":"1"},"MyUnstructuredData":{"S":"here is my first insert :)"},"MyKeySort":{"N":"1"}}
]}

As long as there is no equality predicate on the primary key (or the hash part of it in case of a composite hash/sort key) the SELECT will do a scan. I mentioned “equality”; we will see later what happens when there are many equality predicates or a list of values to be equal to, and, probably in a further post, what happens with secondary indexes. Anyway, this is not an RDBMS. When you query the table, there is no query planner to optimize the access to read from an index. If you want to access by a secondary index, the index name must be mentioned in the FROM clause.

Another thing that we have seen in the previous post is that, as it is a scan, you cannot have the partition key in the ORDER BY, because DynamoDB does not sort the rows when they are retrieved from multiple partitions, and PartiQL does not do further data processing on the result. So, basically, there is no possible ORDER BY when you don’t have a WHERE clause on the partition key:



[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo order by MyKeyPart"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo order by MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

Query

Then, if we query for one partition only, this is a Query rather than a Scan. Here is an example where I select only the items where MyKeyPart = 2 which, with the HASH function, maps to only one partition:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

The items are ordered by MyKeySort even in the absence of ORDER BY because this is how they are stored and retrieved physically within each partition. But, as SQL is a declarative language, I prefer not to rely on the order without an ORDER BY clause.

Here is the correct way to do it, with no additional cost:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2 order by MyKeySort"
{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2 order by MyKeySort desc"
{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}}]}

Here because there is only one value for MyKeyPart I didn’t need to put MyKeyPart in the ORDER BY, but with multiple values you need to:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have hash key in ORDER BY clause when more than one hash key condition specified in WHERE clause.

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeyPart,MyKeySort desc"

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}}]}

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeyPart desc,MyKeySort desc"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}}]}

You might be surprised to see this query with multiple values run as a Query rather than a Scan. What if they come from multiple partitions?
This is possible because the number of values is well known in advance (“1” and “2” here), so they can be sorted first and a Query run for each of them. Of course, this multiplies the cost. For example, because I know that I inserted values 1 to 5, I can get all my items with:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart in [1,2,3,4,5] order by MyKeyPart,MyKeySort"

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

So I’m able to get all items sorted now? Yes, but for a higher cost than a scan because it will query them one by one. It would be cheaper to Scan here, but there is no optimizer to estimate the cost of both operations and choose the cheaper one. At least the cost is predictable, as it is proportional to the number of key values in the list.

I cannot use inequalities, or BETWEEN, because they work on a range and this Query access can be done only on known values.


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart between 2 and 2 order by MyKeyPart,MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have at least one non-optional hash key condition in WHERE clause when using ORDER BY clause.

Here, even if maths tells me that it is equivalent to an equality (“MyKeyPart between 2 and 2” is the same as “MyKeyPart = 2”), there is no optimizer to do those transformations. The rule is basic: a set of values can be sorted and queried individually, but anything else is considered as a range of values that cannot be accessed with a hash function.

How can I be sure about this behaviour? I have a small table where the response time difference is not significant. The best proof is to see what happens when a full table scan is impossible. There is an IAM policy condition to deny scans:

PartiQL documentation (https://t.co/ezzXfsyADB) explains the scenarios when a SELECT can become a SCAN. You can also explicitly deny Scan via IAM policy (https://t.co/elVs4HpvtM) to avoid full table scans.

— DynamoDB (@dynamodb) November 24, 2020


I have created a user with a deny on the “dynamodb:PartiQLSelect” action with the condition “dynamodb:FullTableScan”=”True”.
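
As a sketch, an IAM policy doing that could look like the following (the ARN reuses the account and region visible in the error message below; the action and condition key are the ones just mentioned, adapt everything else to your setup):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "dynamodb:PartiQLSelect",
      "Resource": "arn:aws:dynamodb:eu-west-1:802756008554:table/Demo",
      "Condition": { "Bool": { "dynamodb:FullTableScan": "true" } }
    }
  ]
}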

With this user profile I execute the following:


[opc@a aws]$ aws --profile noscan dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo"

An error occurred (AccessDeniedException) when calling the ExecuteStatement operation: User: arn:aws:iam::802756008554:user/ddb-noscan is not authorized to perform: dynamodb:PartiQLSelect on resource: arn:aws:dynamodb:eu-west-1:802756008554:table/Demo with an explicit deny

[opc@a aws]$ aws --profile noscan dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart in [1,2,3,4,5]"                

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

It is clear that when a full table scan is denied, the WHERE on a list of 5 values is still possible, because it runs 5 Query calls instead of a Scan.

Pagination

I have additionally inserted many rows with MyKeyPart=10 and large attributes, and queried them:


$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10"

{"Items":[{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"5"}}],"NextToken":"CS4sUIPi4Efg7eSg4sGJZHJ09C/m8JWMwLXB+DF5n54EIBl6yuPZNAHfoRUFg7qgGg872qXswoXSZEI/XAIfvUPNisWSYGrPiquxLFakMecd6aF/ggaexxpKlhPS+ridkOXu8HoWIuWgSXFRBa32QmIXITRhrSMwuT1Q54+6Li6emcxvtpJfmxvxWf/yQkece5nqQIwH/EC3vAr1SZ4Pd537qexKejVHJ+2QrXALwG283UR/obWc53A2HTQ+G3cNeL4xOvVwp9gsOhlKxhsRrS+GqHRF0IHlGrpsdc0LkbMS1hISuagp/KZ0dqP/v7ejB6HsEHhFYZeKYZBoysTYTzhpB02NF3F4MSKp8QF4nO4vcq4="}

I get a few items and a “Next Token” that is quite large.

I can query the next pages with the --next-token option:


$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10" \
--next-token CS4sUIPi4Efg7eSg4sGJZHJ09C/m8JWMwLXB+DF5n54EIBl6yuPZNAHfoRUFg7qgGg872qXswoXSZEI/XAIfvUPNisWSYGrPiquxLFakMecd6aF/ggaexxpKlhPS+ridkOXu8HoWIuWgSXFRBa32QmIXITRhrSMwuT1Q54+6Li6emcxvtpJfmxvxWf/yQkece5nqQIwH/EC3vAr1SZ4Pd537qexKejVHJ+2QrXALwG283UR/obWc53A2HTQ+G3cNeL4xOvVwp9gsOhlKxhsRrS+GqHRF0IHlGrpsdc0LkbMS1hISuagp/KZ0dqP/v7ejB6HsEHhFYZeKYZBoysTYTzhpB02NF3F4MSKp8QF4nO4vcq4=

{"Items":[{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"6"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"7"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"8"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"9"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"10"}}],"NextToken":"FjHEA2wnIK74SlGaS6TiPSv2fEwfiZhJNHyxvJ+qG750oeKlqSNyx9IDdCUD+m2rSpodPIFJhYYQHXBM9sJed3k6qaA/aUk4s4DUlPvZHl7WAJ4rTY0AmNDUYBPqWyCV8FliSsGPtFTfj1A9T4zD1TU6uuvNIORY/zKHtsAjWzT4Jsg5y32MFcVOmOsDBhyWsQotFqxy1ErMGhJy3cQnEvy1P1KpQak6sflzp3sWLWzUgOXQB/xF1PXRtT8w/E1lPk26LnA/L2bA91nucuohN63hP3MVojPH0GkPCjZsx08wJTn4MEpqDArEREWO2XCkL/GI7vTtYw6GXRenKZoatSG55yKCVDkFRuw7cbK749mEIb6r6Xs="}

Again, this is completely different from SQL databases where you have cursors, but it is adapted to DynamoDB queries that read ranges of items in small chunks.
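
If you want to script this, a small shell loop can follow the NextToken until the last page. This is only a sketch, assuming the plain aws binary (not the jq alias defined in part I) and jq installed; adapt the statement to your table:

token=""
while true ; do
  if [ -z "$token" ]; then
    out=$(aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10")
  else
    out=$(aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10" --next-token "$token")
  fi
  echo "$out" | jq -c '.Items[]'                  # process the items of this page
  token=$(echo "$out" | jq -r '.NextToken // empty')
  [ -z "$token" ] && break                        # no NextToken returned: last page reached
done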

Projection

I used SELECT with * to get the whole item key and attributes (like ALL_ATTRIBUTES), and with a list of attributes to do a query projection (like SPECIFIC_ATTRIBUTES). There’s no aggregation and I don’t think we can do the equivalent of COUNT. Here is how I would do it if it were possible:


[opc@a aws]$ aws dynamodb execute-statement --statement "select count(*) from Demo"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Unexpected path component at 1:8

This is clearly not supported (yet).

According to the documentation, expressions should be allowed, like this:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort,size(MyUnstructuredData) from Demo where MyKeyPart=2 and size(MyUnstructuredData)>10"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Unexpected path component at 1:28

Apparently, this size() function is allowed only in the WHERE clause:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=2 and size(MyUnstructuredData)>10"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}}]}

In summary, be careful. You just need to know which attributes you filter on in the WHERE clause. An equality on the partition key allows single hash partition access. Without it, it is a scan, which can take time and a lot of RCUs.


Documentum Administrator plugins installation not possible

Wed, 2020-11-25 07:27

In some customers’ contexts you cannot install whatever you want, for policy or security reasons. And Documentum components really love to install client software to interact with them. In this blog we will discuss the Documentum Administrator tool and the plugins it asks to install in order to gain full access to its functionalities.

Documentum Administrator’s plugins

DA requires at least 3 things to be installed on client machines in order to have the file transfer working:

  • The browser extension (IE, Chrome, Firefox, aso)
  • The NativeSetup.exe
  • You will have to change some settings in the browsers as well, in order to allow the execution of some scripts for these sites, whitelist them, and so on

These two programs allow transferring and viewing files from the Documentum Administrator web application. I think it should be the same for Webtop as it’s based on the same framework. Unfortunately, some customers cannot allow users, even administrators, to run specific exe files on their machines. So we tried to install them manually with some policy exceptions and elevated rights, but it didn’t work very well as our policies were too restrictive.

In addition, we were not allowed to change some parameters in the browsers as they were managed by the security team. So what can we do?

Setup Documentum Administrator in HTTP mode

Depending on what you need to do with DA, you can enable the HTTP mode. This mode uses the default browser/HTTP transfer mechanisms, hence it does not require anything to be installed on the end user machine. However, this mode has some drawbacks and limitations.

Here is how to enable the HTTP mode:

  1. Login to the web server and navigate to the exploded war file folder
  2. Locate the file: da/wdk/app.xml
  3. Change <default-mechanism> in <contentxfer> to http
  4. Save the file and restart the web application server
  5. Allow popups in the browser

Now the content transfer has been set to http mode.
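
For reference, after step 3 the relevant fragment of da/wdk/app.xml should look roughly like this (only the element mentioned above is shown; everything else in the file stays untouched):

<contentxfer>
   <default-mechanism>http</default-mechanism>
   <!-- the other existing settings of this section remain unchanged -->
</contentxfer>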

Limitations

HTTP content transfer is supported for XML files but only for a single file used with the Default XML Application. For virtual documents, only the root (parent) file is transferred. The browser handles the launch of viewing and editing applications.

The checkout directory is not configurable. To check out a document, users must select Edit, then either save the document or open and save it. On checkin, the user must navigate to the location in which the document was saved. User preferences are not supported in this mode.

Here is the feature comparison between UCF and HTTP content transfer:

  • Download to client: UCF = UCF deployment at client side; HTTP = no client-side deployment
  • Drag and drop: UCF = supported on IE browser; HTTP = not supported
  • Viewing or editing application: UCF = configurable; HTTP = controlled by browser
  • ACS support: UCF = supported; HTTP = limited support for export or edit, not supported for view with relative links
  • Progress display: UCF = supported; HTTP = supported only by certain browsers
  • Preferences: UCF = supported; HTTP = not supported
  • Restart interrupted operation: UCF = supported; HTTP = not supported
  • Checkout: UCF = supported; HTTP = limited support, user must save and select checkout location
  • Edit: UCF = supported; HTTP = limited support, user must save and select checkout location
  • Checkin: UCF = supported; HTTP = limited support, user must navigate to saved document
  • View: UCF = supported, does not transfer content if file is up to date on client; HTTP = supported, always transfers content
  • Export: UCF = supported; HTTP = supported
  • Import: UCF = supported; HTTP = limited support, single file selection at a time, no folder import
  • Client xfer tracing: UCF = supported; HTTP = not supported
  • Server xfer tracing: UCF = supported; HTTP = supported
  • File compression: UCF = supported, with configurable exceptions; HTTP = turn on HTTP compression in web.xml
  • XML application: UCF = supported; HTTP = import single file against Default XML Application
  • Virtual document: UCF = supported; HTTP = root only
  • Temp working directory for upload/download: UCF = temp.working.dir; HTTP = none
  • Checked out files: UCF = checkout.dir; HTTP = user must save file, then check in from file
  • Exported files: UCF = export.dir or user choice; HTTP = browser-specific UI to select a download location
  • Viewed files: UCF = viewed.dir; HTTP = browser temporary internet files location, for example, $java{user.home}\Local Settings\Temporary Internet Files
  • User location (base for other locations): UCF = user.dir (defined in config file); HTTP = none
  • Registry file location: UCF = registry.file; HTTP = none
  • Registry mode: UCF = registry.mode; HTTP = none
  • Log file location: UCF = logs.dir; HTTP = none
  • Tracing/debug: UCF = tracing.enabled; HTTP = none
  • File polling: UCF = file.poll.interval; HTTP = none


Documentum CTS reinstallation failing Null Pointer

Wed, 2020-11-25 03:37

I ran into an issue at a customer where I had to upgrade the CTS in place. When you have to upgrade a CTS on the same server, you first have to uninstall the old version in order to install the new one. Unfortunately, the uninstallation didn’t end well and some folders and configurations weren’t removed properly.

Removing CTS manually

If the uninstallation failed, you will not be able to run the uninstaller again to remove everything, as it thinks that it has already been uninstalled. So you’ll have to remove the components manually.

  1. Remove the 3 services for CTS: you will find 3 services with names beginning with “Documentum”, remove them (see the command sketch after this list).
  2. Remove the installation folder: By default the installation folder is located in C:\Documentum, you can remove it or at least change its name.
  3. There are some keys in the registry; normally you can keep them as they don’t prevent the software from being reinstalled. But note that you can search for “Documentum” in the registry to find all concerned keys.
  4. Remove the DFC if it is still there (Add or Remove Programs).
  5. In the targeted repository, remove the folders System/Media Server, System/Applications/Transformation and System/Applications/Rich_Media_services. This is where the transformation profiles are located. Note that you may keep them as the latest installers can dump them in order to replicate your configurations.
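
For step 1, the services can be listed and removed from an elevated command prompt; the service name below is only an example, use the actual SERVICE_NAME returned by the query:

rem list all services and keep only the Documentum ones
sc query state= all | findstr /i "Documentum"
rem delete a service by its service name (not the display name)
sc delete "Documentum Content Transformation Services"
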
Running the installer fails with a null pointer

I had followed all the recommendations above, and when I ran the installer it failed with a null pointer. The GUI even showed it trying to install to the path null/Documentum.

It is not even possible to change it manually as it’s a path generated by the installer itself. I thought it was due to the registry keys, but it wasn’t: I changed a lot of them manually and nothing changed.

The issue came from a file that wasn’t deleted by the uninstallation. This file holds information about the previous installation and is located here: C:\Program Files\Zero G Registry\.com.zerog.registry.xml

You’ll have to display hidden files and folders. Delete this file and the installation will now go smoothly.


Ansible Custom Interactive Dynamic Inventory

Wed, 2020-11-25 01:35

When working with an Ansible dynamic inventory, you may want to update it from your playbook on the fly. For instance, you may want to create a server and then install an application on it without calling a second playbook or developing specific code to create a runtime group (e.g. the “add_host” module). Or you may want to update the inventory on the fly without having to wait for an asynchronous mechanism to be triggered. That is what this project is about: providing you a simple, fully encrypted, interactive dynamic inventory made to measure for your environment.

Back-end setup
1) Install the database software:

This will serve as a data storage for your Ansible inventory data (More details here)

wget https://siodb.io/packages/InstallSiodb.sh
chmod u+x ./InstallSiodb.sh
sudo ./InstallSiodb.sh
2) Get the latest version of the dynamic inventory:
git clone https://github.com/siodb/ansible-dynamic-inventory.git
cd ansible-dynamic-inventory
3) Create the inventory data model and the database user:
sudo -i -u siodb siocli --user root < sio_inv_data_model.sql
sudo -i -u siodb siocli --user root < sio_inv_user.sql
4) Create your groups, groupvars, hosts and hostvars:

In this step, you can add your groups, groupvars, hosts and hostvars by inserting them into the correct tables. You can also link them together using the following columns:

  • groups_variables.group_id: to link variables to a group from the group table.
  • hosts.group_id: to link a group from the group table.
  • hosts_variables.host_id: to link variables to a host from the hosts table.

Here is an overview of the data model used for this inventory (the red numbers indicate the order in which to insert data):

So, first connect to the database freshly installed from step 1):

sudo -i -u siodb siocli --user root

Then insert your data. For instance:

use database sioinv ;
insert into groups
values
    ( 'production' ),
    ( 'test' ),
    ( 'development' )
;
insert into groups_variables
values
    ( 1, 'domain_name', 'company.com' ),
    ( 2, 'environment_name', 'production' ),
    ( 2, 'subnet', '10.10.0.0/16' ),
    ( 3, 'environment_name', 'test' ),
    ( 3, 'subnet', '10.20.0.0/16' ),
    ( 4, 'environment_name', 'development' ),
    ( 4, 'subnet', '10.30.0.0/16' )
;

insert into hosts
values
    ( 2, 'server-01', CURRENT_TIMESTAMP ),
    ( 3, 'server-02', CURRENT_TIMESTAMP ),
    ( 4, 'server-03', CURRENT_TIMESTAMP )
;

insert into hosts_variables
values
    ( 1, 'public_ip', '0.1.2.3' ),
    ( 1, 'application', 'app01' ),
    ( 2, 'public_ip', '0.1.2.4' ),
    ( 2, 'application', 'app02' ),
    ( 3, 'public_ip', '0.1.2.4' ),
    ( 3, 'application', 'app02' )
;
Configure the inventory

You must tell the inventory script where to look up its data. That is done by configuring the file sio_inv.ini:

[sio_inv]
siodb_rest_ip = localhost
siodb_rest_port = 50443
siodb_rest_user = sioinv
siodb_rest_token = [token-generated-from-script-sio_inv_user.sql]
siodb_rest_tls_verify_certificate = no
Validate the inventory

Use the command ansible-inventory to validate your inventory. For instance:

$ ansible-inventory -i ./sio_inv.py  --graph --vars
@all:
  |--@development:
  |  |--server-03
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = development}
  |  |  |--{subnet = 10.30.0.0/16}
  |  |--{environment_name = development}
  |  |--{subnet = 10.30.0.0/16}
  |--@production:
  |  |--hypervisor-01
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--hypervisor-02
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--server-01
  |  |  |--{application = app01}
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{public_ip = 0.1.2.3}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--server-04
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--{environment_name = production}
  |  |--{subnet = 10.10.0.0/16}
  |--@test:
  |  |--server-02
  |  |  |--{application = app02}
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = test}
  |  |  |--{public_ip = 0.1.2.4}
  |  |  |--{subnet = 10.20.0.0/16}
  |  |--server-05
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = test}
  |  |  |--{subnet = 10.20.0.0/16}
  |  |--{environment_name = test}
  |  |--{subnet = 10.20.0.0/16}
  |--@ungrouped:
  |--{domain_name = company.com}
Example of playbook

Here is an example of how to interactively deal with the dynamic inventory while the playbook is running
(Set the variable {{ sioinv_user_token }} where it makes sense for you and store the token in Ansible Vault):


---
- name: Full Stack Service creation
  hosts: hypervisor-01
  tasks:

    - name: Create machine
      shell: >
        echo "Module or command to create a new Virtual Machine {{ vm_name }}"

    - name: Update the custom dynamic inventory on the fly with the just-created vm
      uri:
        url: https://localhost:50443/databases/sioinv/tables/hosts/rows
        validate_certs: false
        user: sioinv
        password: {{ sioinv_user_token }}
        method: POST
        body: >
          [{
            "name": "{{ vm_name }}"
          }]
        force_basic_auth: yes
        status_code: 201
        body_format: json
      register: r_vm_post

    - name: Add hostvars on the fly to the just-created vm
      uri:
        url: https://localhost:50443/databases/sioinv/tables/hosts_variables/rows
        validate_certs: false
        user: sioinv
        password: {{ sioinv_user_token }}
        method: POST
        body: >
          [{
            "host_id": {{ r_vm_post.json.trids[0] }},
            "name": "ip_address",
            "value": "10.10.0.0"
          }]
        force_basic_auth: yes
        status_code: 201
        body_format: json

    - name: Refresh the inventory in current run to include the new VM and its variables before moving to the next play
      meta: refresh_inventory

- name: Full Stack Service creation
  hosts: "{{ vm_name }}"
  tasks:

    - name: Create the new database on the new VM
      shell: >
        echo "Module or command to create a new database on {{ vm_name }}"

Last words before you go

The inventory is the foundation of your Ansible strategy. It must be as dynamic as possible and avoid complexity, to ensure smooth maintenance in the future. Modifying the inventory may lead you to change all your roles and playbooks, which is something you may want to avoid. With the simple inventory described here, you’ll have a strong foundation for your Ansible automation.

I hope this helps you, and please do contact us or comment below if you need more details.


PostgreSQL 14: Add the number of de-allocations to pg_stat_statements?

Wed, 2020-11-25 01:21

In the last post we had a look at an interesting patch (schema variables) that is currently waiting to be committed for PostgreSQL 14. Another patch that is currently in the same state is named [PATCH] Add features to pg_stat_statements. As this does not tell much about what the patch does, here is a short summary: when you reach the maximum number of statements that can be tracked, pg_stat_statements will de-allocate entries to free space for other statements. As de-allocations use resources as well, and frequent de-allocations might also mean that pg_stat_statements is not configured the right way for your workload, this patch provides the following information: how many de-allocations happened.
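
As a reminder, in case pg_stat_statements is not yet enabled in your instance, the usual setup looks like this (a restart is required because the library must be preloaded):

-- preload the library; this overwrites an existing shared_preload_libraries setting,
-- so adjust the list if you already preload other libraries
alter system set shared_preload_libraries = 'pg_stat_statements';
-- once the instance is restarted, create the extension in the database you want to monitor
create extension if not exists pg_stat_statements;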

In the default configuration pg_stat_statements tracks a maximum of 5000 statements:

postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 5000
(1 row)

The lowest value allowed for pg_stat_statements.max is 100:

postgres=# alter system set pg_stat_statements.max=5;
ERROR:  5 is outside the valid range for parameter "pg_stat_statements.max" (100 .. 2147483647)
postgres=# alter system set pg_stat_statements.max=100;
ALTER SYSTEM

As this change requires a restart it should be set carefully:

postgres=# select context from pg_settings where name = 'pg_stat_statements.max';
  context   
------------
 postmaster
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-20 01:00:54.656 CET - 1 - 102610 -  - @ LOG:  redirecting log output to logging collector process
2020-11-20 01:00:54.656 CET - 2 - 102610 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show pg_stat_statements.max;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 100
(1 row)

Let’s generate some random statements and then check what additional information we get with this patch:

postgres=# select 'create table t'||i||'( a int, b int, c int)' from generate_series(1,101) i; \gexec
                ?column?                 
-----------------------------------------
 create table t1( a int, b int, c int)
 create table t2( a int, b int, c int)
 create table t3( a int, b int, c int)
 create table t4( a int, b int, c int)
 create table t5( a int, b int, c int)
 create table t6( a int, b int, c int)
 create table t7( a int, b int, c int)
 create table t8( a int, b int, c int)
 create table t9( a int, b int, c int)
...
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
postgres=# 

This created 101 tables and should have forced pg_stat_statements to de-allocate some older statements. There is a new, very simple view which provides this information:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

Nice. This will give you an idea if pg_stat_statements.max is not properly configured for your environment. If you manually reset pg_stat_statements, the de-allocation counter will also be reset:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

postgres=# select pg_stat_statements_reset();
 pg_stat_statements_reset 
--------------------------
 
(1 row)

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       0
(1 row)


DynamoDB PartiQL – part I: INSERT

Tue, 2020-11-24 12:51
By Franck Pachot

This sounds paradoxical, executing SQL statements on a NoSQL database, but we now have a new API to interact with DynamoDB which looks like SQL. AWS data services are a collection of purpose-built database services that each have their own API. The relational databases in RDS all share a similar API thanks to the SQL standard. However, for non-relational databases there is no standard: the NoSQL databases, even if they have similar get/put APIs, do not share a common language. In order to build a converged API on top of that, Amazon is using the open source PartiQL to provide SQL-like access. And this has been recently implemented for DynamoDB. Here are my first tests; this will be a multi-part blog post. I’ll mention the most important point immediately: you are still in a NoSQL database. The “SQL” there has very little in common with the SQL used in relational databases. Introducing the INSERT, UPDATE, SELECT, DELETE keywords does not transform a key-value datastore into a relational database: you still put and get items, not sets of rows, you do not join multiple items in the database, you are not ACID,…

AWS CLI

I’ll run everything from the AWS CLI here and, for the moment, PartiQL is not available in Version 2:


[opc@a tmp]$ aws --version
aws-cli/2.1.3 Python/3.7.3 Linux/4.14.35-2025.400.9.el7uek.x86_64 exe/x86_64.oracle.7

[opc@a tmp]$ aws dynamodb execute-statement

usage: aws [options]   [ ...] [parameters]

aws: error: argument operation: Invalid choice, valid choices are:

However, it is available in Version 1 (even if Version 2 is “now stable and recommended for general use”), so I temporarily installed Version 1:


cd /var/tmp
wget -c https://s3.amazonaws.com/aws-cli/awscli-bundle.zip
unzip -o awscli-bundle.zip
sudo ./awscli-bundle/install -i /var/tmp/aws-cli/awscli1
alias aws='/var/tmp/aws-cli/awscli1/bin/aws "$@" --output json | jq --compact-output'

You can see that I like the output as compact json, as this is easier to put in the blog post, but of course you don’t have to. And you can also run the same from the console.

For the moment, this feature is not deployed in all regions, so if you don’t see it, check another region. The announcement about it is here: You now can use a SQL-compatible query language to query, insert, update, and delete table data in Amazon DynamoDB. By the time you read this blog post, all the above may be outdated and you may be able to use it in all regions and with the latest AWS CLI. While I’m there, the documentation for this feature is in the developer guide: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.html

DDL

There is no DDL like CREATE TABLE, so you need to create your DynamoDB table with the usual means: Console, CLI or API


aws dynamodb create-table --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

This is the same table I used in a previous post on GSI and LSI. This table has a composite key, partitioned by hash on the MyKeyPart attribute and by range on MyKeySort, locally indexed on the MyKeySort attribute by default.

INSERT

I can insert new items there with a simple INSERT statement:


[opc@a aws]$ aws dynamodb execute-statement --statement \
"insert into Demo value {'MyKeyPart':1,'MyKeySort':1,'MyUnstructuredData':'here is my first insert :)'}"

{"Items":[]}

You can already see that, besides the INSERT keyword, nothing is similar to RDBMS SQL. In SQL, the INSERT reads from a set of rows, and the VALUES keyword is a table value constructor to provide this set of rows rather than querying it from a table. Here we have a singular VALUE keyword to input a single JSON document.

Mixing the code (SQL keywords, table and attribute identifiers) with data (literal values, like the numbers or character strings I used there) is handy in an interactive query console, but it is not the right way to code. As with any API, we must code with variables and pass them as parameters at execution time. Here is how to do it:


[opc@a aws]$ aws dynamodb execute-statement --statement \
"insert into Demo value {'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}" \
  --parameters '[{"N":"2"},{"N":"2"},{"S":"use parameters when embedding SQL in programs"}]'

{"Items":[]}

This is the correct way to use SQL in programs: the SQL text contains parameters, as placeholders for values, also called bind variables. They are simple question marks here (?) which are referenced in order in the parameters list. Some SQL databases also support named parameters (:variable) but not here.

You should remember that these are bind variables, not substitution text. For strings, you don’t put the quotes around them in the SQL text. The data type is declared in the parameter list.
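
For comparison, the same kind of insert with the native API would be a PutItem call; here is a quick sketch (the item values are just an example):

aws dynamodb put-item --table-name Demo \
  --item '{"MyKeyPart":{"N":"3"},"MyKeySort":{"N":"1"},"MyUnstructuredData":{"S":"native PutItem for comparison"}}'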

When you have data to ingest, you don’t want a network roundtrip for each item. Here is how to send batched inserts:


[opc@a aws]$ aws dynamodb batch-execute-statement --statements \
'[ {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"1"},{"S":"a"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"2"},{"S":"b"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"3"},{"S":"c"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"4"},{"S":"d"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"5"},{"S":"e"}]} ]'

{"Responses":[{"TableName":"Demo"},{"TableName":"Demo","Error":{"Message":"Duplicate primary key exists in table","Code":"DuplicateItem"}},{"TableName":"Demo"},{"TableName":"Demo"},{"TableName":"Demo"}]}

With batched statements, some statements may fail, and the response holds a list of the errors encountered. Here I inserted multiple MyKeySort values for the same MyKeyPart. One of them (MyKeyPart=2, MyKeySort=2) was already there and was rejected as a duplicate key. The others were inserted.

Finally, here is what I have in my Demo table:

Note that if there are already some duplicates in the Statements list, the whole call is rejected:


[opc@a aws]$ aws dynamodb batch-execute-statement --statements '[ {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"9"},{"S":"a"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"9"},{"S":"b"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"1"},{"S":"c"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"2"},{"S":"d"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"3"},{"S":"e"}]} ]'

An error occurred (ValidationException) when calling the BatchExecuteStatement operation: Provided list of item keys contains duplicates

Because of the duplicate [{“N”:”9″},{“N”:”9″}] here, no items have been inserted.

Unfortunately, I have to repeat each statement there. I would prefer an API where I prepare one parameterized statement that is then executed with many values, like the array interface in SQL databases. But we are in NoSQL, and each item has its own structure that may differ from the others.

SELECT

Let’s have a look at my items with a SELECT statement:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo"

{"Items":[
 {"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"a"},"MyKeySort":{"N":"1"}}
,{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"use parameters when embedding SQL in programs"}
,"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"c"}
,"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"d"}
,"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"e"}
,"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"1"},"MyUnstructuredData":{"S":"here is my first insert :)"}
,"MyKeySort":{"N":"1"}}]}

Of course I would like to see them sorted:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo order by MyKeyPart,MyKeySort"     
                                                
An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

This is a reminder that PartiQL is not a database engine. It cannot provide a sorted result set if the underlying database does not support it. DynamoDB is a distributed database, partitioned by hash, which means that a Scan (the operation behind this SELECT, as I have no WHERE clause to select a partition) does not return the results in order. This is a fundamental concept in DynamoDB: in order to be scalable and predictable, there are no cross-partition operations. And a SQL language on top of it will not change that.


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=2 order by MyKeyPart,MyKeySort"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

Here the ORDER BY is accepted because I explicitly restricted the operation to a single partition with my WHERE clause (an equality predicate on the HASH key). And I request a sort on the RANGE key, which is locally indexed in DynamoDB and can therefore be scanned in this order. Again, you must keep in mind that PartiQL is not a database engine. The engine is DynamoDB, and PartiQL maps a SQL-like language on top of the DynamoDB NoSQL API. And of course, you have the choice: all data manipulation can be done with the usual API. But there’s more to say on SELECT, and this will be in Part II…

Cet article DynamoDB PartiQL – part I: INSERT est apparu en premier sur Blog dbi services.

Installing EDB Advanced Server without EDB Repository

Tue, 2020-11-24 01:38

Almost all of us who have already installed EnterpriseDB tools on a Linux server know the procedure: you need to add the EDB repository to your server or your Red Hat Satellite configuration and after that you can easily install the tools you need. But what happens if you are not able to add the repository?
The answer is really simple: download the rpms and install them directly on the server.

Download the EDB tar ball

On the EDB repository page you can find the option to download the tar ball which contains all the packages needed to install EDB products. The tar ball is about 2 GB in size, so make sure there is enough space for it somewhere.

Just scroll a bit down on the repo page

In case you don't want to download the whole tar ball, there is also the possibility to download individual rpms. But as I want to have everything, I go on with the tar ball.

Downloading the tar ball directly. Of course you need your subscription credentials here as well.

 wget https://username:password@yum.enterprisedb.com/edb/redhat/edb_redhat_rhel-7-x86_64.tar.gz
--2020-11-19 09:52:34--  https://username:*password*@yum.enterprisedb.com/edb/redhat/edb_redhat_rhel-7-x86_64.tar.gz
Resolving yum.enterprisedb.com (yum.enterprisedb.com)... 54.165.250.135
Connecting to yum.enterprisedb.com (yum.enterprisedb.com)|54.165.250.135|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to yum.enterprisedb.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 2052730909 (1.9G) [application/x-gzip]
Saving to: ‘edb_redhat_rhel-7-x86_64.tar.gz’

10% [==============                                                               ] 215,687,168 2.40MB/s  eta 21m 51s
Extract the needed packages

As we want to install EDB Advanced Server and EFM, we need to find out which packages are available in the tar ball. As the output is quite long, let's write it into a file for better readability.

tar -tf edb_redhat_rhel-7-x86_64.tar.gz > edb_rhel_packages.txt

After that we can search for the files we need using vi and afterwards extract them. Find below the complete list of packages you need for the installation of EDB Advanced Server.

tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-13.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-edbplus-39.0.0-1.rhel7.x86_64.rpm  
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-libicu-66.1-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgagent-4.2.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgpool41-extensions-4.1.2-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgsnmpd-1.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-client-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-cloneschema-1.14-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-contrib-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-core-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-devel-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-docs-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-edb-modules-1.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-indexadvisor-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-libs-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-llvmjit-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-parallel-clone-1.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-pldebugger-1.1-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-plperl-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-plpython3-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-pltcl-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sqlprofiler-4.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sqlprotect-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sslutils-1.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-core-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-docs-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-tools-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-jdbc-42.2.12.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-pgpool41-libs-4.1.2-1.rhel7.x86_64.rpm
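
Alternatively, instead of extracting the packages one by one, you can filter the listing created above and extract everything in one go. A small sketch, where the grep pattern is just an example to adapt to the products you need:

# check which package names match
grep -E '^edb-(as13|jdbc|pgpool41)' edb_rhel_packages.txt
# extract all of them in a single tar invocation
tar -xf edb_redhat_rhel-7-x86_64.tar.gz $(grep -E '^edb-(as13|jdbc|pgpool41)' edb_rhel_packages.txt)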
Install EDB Advanced Server

Now that all needed packages are extracted, we can go on with the installation. First we have to install the dependencies coming from the RHEL and EPEL repositories.

sudo yum install -y unzip xorg-x11-xauth screen boost-atomic boost-chrono boost-date-time boost-filesystem boost-system boost-regex boost-thread tcl uuid python3 python3-libs python3-pip python3-setuptools libtirpc libicu libxslt llvm5.0 llvm5.0-libs libtirpc lm_sensors-libs net-snmp-agent-libs net-snmp-libs perl-Compress-Raw-Bzip2 perl-Compress-Raw-Zlib perl-DBD-Pg perl-DBI perl-Data-Dumper perl-IO-Compress perl-Net-Daemon perl-PlRPC perl-version perl-Switch

Once we have installed all the needed dependencies, we can finally install the Advanced Server using the rpms.

sudo yum install /tmp/edb-*.rpm      

In case one rpm is missing, you will get a hint that one rpm needs another one as a dependency. You can just extract this rpm from the tar ball as well and retry the installation command.
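
If it is not obvious which package pulls in the missing dependency, you can also inspect the requirements of an rpm before installing it, for example:

rpm -qp --requires edb-as13-server-core-13.0.3-1.rhel7.x86_64.rpm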

Finished. That's all you have to do, quite simple, isn't it?
Now you can go on with the setup of the database. In case you need help with that, you can find some hints here.

Cet article Installing EDB Advanced Server without EDB Repository est apparu en premier sur Blog dbi services.

AWS burstable EC2 and CloudWatch metrics

Mon, 2020-11-23 16:24
By Franck Pachot

.
Your application workload is usually not constant and maybe not predictable. If you undersize the capacity, you will save money, but in case of a peak of activity you will have severe performance issues. Thus, you need to size the capacity for peak workloads: you pay for idle CPU when outside of those peaks. This is one major reason for virtualization and cloud: elasticity. Because the hardware is shared by multiple applications, which probably don't have their workload peak at the same time, you don't have to size the physical capacity for the sum of the peak activity, as you would have to do if each application ran on dedicated hardware. AWS has many features to choose between reserved capacity and auto-scaling ones. Here is a very simple one that is even available in the Free Tier: burstable compute.

I have created an EC2 t2.micro instance and I'm displaying 3 metrics in a CloudWatch dashboard:

I started with 32 minutes of "CPU Credit Balance". This means that I can run at 100% CPU for 32 minutes. This credit stays and even increases when I'm not using the CPU. At 16:20 I started a program running fully in CPU ( `yes > /dev/null` ). I used 100% of CPU until approximately 17:00 and then the "CPU Utilization (%)" decreased to around 20%, which is the baseline of this t2.micro when credits are exhausted. The "CPU Credit Usage" was at 5 during all this period because it is gathered every 5 minutes: during 5 minutes running at 100% CPU you use 5 minutes of credit. And during that time you see the "CPU Credit Balance" decreasing down to zero. I stopped my `yes` process at 17:45, where "CPU Utilization" is at 0% as nothing is running, and the "CPU Credit Balance" starts to grow again slowly.

From this CloudWatch "CPU Utilization" you cannot say whether, between 17:05 and 17:35, the CPU usage is only 20% because the process doesn't need more (being idle) or because it is throttled by lack of CPU credit. But you can guess it when you look at the credit balance and usage. Let's have a look at the default SAR statistics gathered every 10 minutes:


[ec2-user@ip-172-31-42-243 ~]$ sar

04:04:59 PM       LINUX RESTART                                                                                                        

04:10:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:20:01 PM     all      0.02      0.00      0.02      0.00      0.00     99.95
04:30:01 PM     all     45.88      0.00      0.58      0.09      0.01     53.44
04:40:01 PM     all     99.02      0.00      0.96      0.00      0.02      0.00
04:50:01 PM     all     99.03      0.00      0.95      0.00      0.01      0.00
05:00:01 PM     all     99.04      0.00      0.95      0.00      0.01      0.00
05:10:01 PM     all     28.03      0.00      0.23      0.00     71.74      0.00
05:20:02 PM     all     17.50      0.00      0.12      0.00     82.39      0.00
05:30:01 PM     all     18.90      0.00      0.11      0.00     80.99      0.00
05:40:02 PM     all     18.83      0.02      0.16      0.00     81.00      0.00
05:50:01 PM     all      3.45      1.07      1.10      4.60     33.31     56.47
06:00:01 PM     all      0.02      0.00      0.02      0.00      0.00     99.96
06:10:01 PM     all      0.02      0.00      0.02      0.00      0.02     99.94
06:20:01 PM     all      0.03      0.00      0.01      0.00      0.01     99.95
06:30:01 PM     all      0.01      0.00      0.01      0.00      0.00     99.98

Before starting my program, the CPU was nearly 100% idle, until 16:20. Then, while the program was running, the CPU was used at nearly 100% in user space until 17:00. Then, the CPU usage is nearly 20% until I stopped the process at 17:45. This is exactly the same as what CloudWatch displays. But SAR has an additional metric here: "%steal", and this is how I know that the instance was not idle but throttled by the hypervisor. My process was 100% in running state but this shows up as 20% CPU usage and 80% stolen. Then I stopped the process and %user and %steal are down to zero: all the time is idle.

And once the CPU is idle, my CPU credits start to increase again:

24 hours later I have earned 144 minutes of CPU. I'll not earn more; this is the maximum. Of course, all that is documented in the instance type description (https://aws.amazon.com/ec2/instance-types/) and the burstable CPU documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html): an idle t2.micro gets 6 minutes of credits every hour, with a maximum of 144. The only thing that differs from the documentation in my example is the baseline, which should be 10% when out of credits, but I am above that here. No idea why.
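
If you prefer the command line to the CloudWatch console, you can pull the same credit metric with the AWS CLI. A minimal sketch, where the instance id and the time range are placeholders to adapt:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2020-11-23T00:00:00Z --end-time 2020-11-24T00:00:00Z \
  --period 3600 --statistics Average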

I’ve started again a process that never sleeps:

It can run at 100% for 144 minutes but I stop it there; this is the same pattern as above.
Just one thing: CloudWatch doesn't tell you, but "CPU Utilization (%)" includes the time running in kernel space (%system). Instead of running `yes > /dev/null` I have run `dd if=/dev/urandom of=/dev/null bs=1M`, which runs mostly in kernel space:

03:20:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:00:01 PM     all      0.02      0.00     99.96      0.00      0.02      0.00
Average:        all      7.73      0.00      1.26      0.00      0.01     91.00
Amazon RDS

This can be interesting for a database, and some RDS instance types are burstable (db.t3.*). Databases rarely have a homogeneous activity, but they must have CPU resources available at peak time. Because databases share data, they protect data structures with latches and maybe spinlocks: in case of CPU starvation, a process scheduled out of the CPU may hold an exclusive resource while another process comes to the running state just to spin on the CPU trying to acquire this resource. This is a waste of CPU cycles, as the resource will not be available while the other process is in the runqueue.
Aurora can run on db.t3.medium and db.t3.large. But you may also look at auto-scaling and maybe serverless. Oracle Database does not benefit a lot from burstable instances because the software license is paid on the processor metric, which counts full vCPUs rather than the baseline, so you probably want to use 100% of it as long as you need it.

Cet article AWS burstable EC2 and CloudWatch metrics est apparu en premier sur Blog dbi services.

Documentum DQMan for repositories 16.7 and after

Mon, 2020-11-23 08:04

You may have noticed when upgrading your Documentum infrastructure that after 7.3 the DQMan utility from FME is no longer usable (you get a red cross without any warning on the screen). In this blog I'll show you how to get it working again. I tested it on 16.7 but it should also work on later versions.

Install Java 32bit

DQMan will only work with 32-bit Java as it is not compiled (for the moment) for x64.

In my case I installed Java 8u192 in C:\jre-1.8.192_x32.
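
To double-check that the JRE you installed is really a 32-bit one, you can run it with -version (the path is the one from my installation); the output should not mention "64-Bit":

"C:\jre-1.8.192_x32\bin\java.exe" -version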

Install DFC

Install the DFC with the version you need; I have a repository in 16.7 so I installed the DFC in 16.7. It's not mandatory to install the DFC with the 32-bit Java, it doesn't prevent DQMan from starting. Fill in the dfc.properties so that it connects to the repository.

Install DQMan

You can install it wherever you want; in my case I've put it in C:\Users\Public\dqman. I have the latest dqman version, which is currently 6.0.1.34 (you can verify it in the About dialog).

Setup DQMan

Everything is now installed and we can set up dqman to work with our repository.

First you need a dmcl.dll from an earlier version; put it into the root folder of dqman, e.g. C:\Users\Public\dqman. Mine is 156KB.

Now copy the dfc.properties from your DFC installation to the config folder of dqman: C:\Users\Public\dqman\config\dfc.properties

If you have a dmcl40.dll in the root folder delete it.

Then create a java.ini file in the root folder with the following inside:

java_library_path="C:\jre-1.8.192_x32\bin\client\jvm.dll"
JAVA_OPTIONS=" -Xcheck:jni -XX:+RestoreMXCSROnJNICalls -Xmx256m"
java_classpath = "C:\Documentum\dctm.jar;C:\Documentum\config;C:\Documentum\Shared\dfc.jar;C:\Users\Public\dqman\config"

Of course update the path depending on your installation.

You should now be able to launch it; you can verify the version of the DFC used in the About dialog:

Cet article Documentum DQMan for repositories 16.7 and after est apparu en premier sur Blog dbi services.

Will PostgreSQL14 finally come with schema variables?

Sun, 2020-11-22 13:40

One of the bits you need to solve when you migrate from Oracle to PostgreSQL is this: in the Oracle database there are PL/SQL packages and some of those have package variables defined. PostgreSQL does not know the concept of a package, but you can use schemas to group your PL/pgSQL functions and procedures. When it comes to package variables there is no easy solution currently.

There is a trick to emulate that: you can define your own custom parameters in PostgreSQL (see this tip from Daniel Westermann: pic.twitter.com/sdaOMStqqU, August 7, 2020), but it is nasty and not the intended use case for this. Schema variables have been under discussion for quite some time and the good news is that the patch is now "Ready for Committer". This still is not a guarantee that PostgreSQL 14 will come with schema variables, but at least it is close to that.

As always, let's do some simple demos to understand what is going on and how the feature might help, once it is committed. The most simple example for a schema variable is this:

postgres=# create variable var1 as int;
CREATE VARIABLE

As the name of the feature (schema variables) implies, a variable is created in a specific schema. As I have not modified the default search_path the variable got created in the public schema. You can easily check this in the new catalog table pg_variable:

postgres=# select varname, varnamespace::regnamespace from pg_variable;
 varname | varnamespace 
---------+--------------
 var1    | public
(1 row)
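
Since the whole point is to replace package variables, you would typically create the variable in a dedicated schema and reference it with a qualified name. A minimal sketch, assuming schema-qualified names behave as described in the patch discussion (the schema name pkg is just an example):

postgres=# create schema pkg;
CREATE SCHEMA
postgres=# create variable pkg.counter as int default 0;
CREATE VARIABLE
postgres=# let pkg.counter = pkg.counter + 1;
LET
postgres=# select pkg.counter;
 counter 
---------
       1
(1 row)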

By default, once a variable is created, it is persistent and available again after the instance is restarted:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:29:48.825 CET - 1 - 80179 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:29:48.825 CET - 2 - 80179 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

Until now the variable does not contain any value, as we did not assign anything. To do that you can use “let”:

postgres=# \h let
Command:     LET
Description: change a schema variable's value
Syntax:
LET schema_variable = sql_expression
LET schema_variable = DEFAULT

URL: https://www.postgresql.org/docs/devel/sql-let.html

Assigning a value can be as simple as this:

postgres=# let var1 = 1;
LET
postgres=# select var1;
 var1 
------
    1
(1 row)

.. or you can calculate the new value:

postgres=# let var1 = var1 * 2;
LET
postgres=# select var1;
 var1 
------
    2
(1 row)

The value assigned to a variable is not persistent; it only lives for the duration of the session:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:44:42.837 CET - 1 - 80305 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:44:42.837 CET - 2 - 80305 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

If you want the value of a variable to be persistent, you need to make it immutable:

postgres=# create immutable variable var2 as int default 2;
CREATE VARIABLE
postgres=# select var2;
 var2 
------
    2
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:58:53.365 CET - 1 - 80414 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:58:53.365 CET - 2 - 80414 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var2;
 var2 
------
    2
(1 row)

It is important to understand that variables are not transaction-safe by default:

postgres=# create variable var3 as int default 3;
CREATE VARIABLE 
postgres=# select var3;
 var3 
------
    3
(1 row)

postgres=# begin;
BEGIN
postgres=*# let var3=4;
LET
postgres=*# rollback;
ROLLBACK
postgres=# select var3;
 var3 
------
    4
(1 row)

But you can get that behavior if you want:

postgres=# create variable var4 as int default 5 on transaction end reset;
CREATE VARIABLE
postgres=# begin;
BEGIN
postgres=*# let var4 = 10;
LET
postgres=*# select var4;
 var4 
------
   10
(1 row)

postgres=*# rollback;
ROLLBACK
postgres=# select var4;
 var4 
------
    5
(1 row)

Like tables, variables can also be temporary:

postgres=# create temporary variable var6 as int default -1;
CREATE VARIABLE
postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 02:22:22.308 CET - 1 - 80611 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 02:22:22.308 CET - 2 - 80611 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var6;
ERROR:  column "var6" does not exist
LINE 1: select var6;
               ^
postgres=# 

… and you can also specify to drop the variable at commit time:

postgres=# begin;
BEGIN
postgres=*# create temporary variable var7 as int default -1 on commit drop;
CREATE VARIABLE
postgres=*# let var7 = -9;
LET
postgres=*# commit;
COMMIT
postgres=# select var7;
ERROR:  column "var7" does not exist
LINE 1: select var7;
               ^
postgres=# 

Variables can be referenced in procedures, functions and in SQL:

postgres=# create variable var8 as int default 100;
CREATE VARIABLE
postgres=# create function f1() returns int as $$select var8;$$ language SQL;
CREATE FUNCTION
postgres=# select f1();
 f1  
-----
 100
(1 row)

postgres=# create procedure p1() as $$
postgres$# declare
postgres$# begin
postgres$#   let var8 = 101;
postgres$#   raise notice '%', var8;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1();
NOTICE:  101
CALL

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# select var8;
 var8 
------
  101
(1 row)

postgres=# insert into t1 values (101,'aaa');
INSERT 0 1
postgres=# select * from t1 where a = var8;
  a  |  b  
-----+-----
 101 | aaa
(1 row)

This is really a great feature and I do hope it finally gets committed for PostgreSQL 14.

Cet article Will PostgreSQL14 finally come with schema variables? est apparu en premier sur Blog dbi services.

Solr Sharding – Shard creation

Sun, 2020-11-22 06:11

I wrote this blog in conjunction with the one about the Solr Sharding – Concepts & Methods a year ago and it was 95% completed, but then I had other stuff to do and simply forgot to publish it. It might be a little bit late but well, better late than never… It will complete the overview I wanted to share around the Alfresco Clustering and in particular around the Solr Sharding. In this blog, I will talk about the creation of Shards. This includes the distribution of the different Instances, explaining the global parameters that can be used and showing some examples of what can be done, with the concrete commands to achieve that.

I. Shard distribution

One of the strong points of using the Solr Sharding is that you can do pretty much what you want with the distribution of the Shards. It means that you can choose the best distribution of Shards based on your requirements, based on the resources available (Number of Solr Servers, RAM, I/O, …) and then Alfresco can just work with that seamlessly. Let’s consider some situations:

  • 1 Solr Server – 2 Shards – No HA (1 Shard Instance per Shard)
  • 2 Solr Servers – 2 Shards – No HA (1 Shard Instance per Shard)
  • 2 Solr Servers – 2 Shards – HA (2+ Shard Instances per Shard)
  • 6 Solr Servers – 3 Shards – HA (2+ Shard Instances per Shard)

For the above examples, there are several possible representations of the Shard distribution across the different Solr Servers. If we consider all Shards in the same way (same number of Shard Instances for each), then you can represent the above examples as follows:

Sharding distribution 1

If the Shard N°1 is the most important one and you want this one to be available on all Solr Servers with a maximum of 2 Shards per Solr Server, then this is another possible solution:

Sharding distribution 2

There are several ways to distribute the Shards and the good thing is that it is not really linked to the Sharding Method used. What I mean is that you can define the distribution that you want and it will be possible whatever the Sharding Method is. However, in some cases, the Method used might have an impact on whether or not the chosen distribution makes sense. If I take the above example again, where Shard N°1 would be considered the most important one, then it means that you should know exactly what is inside this Shard N°1; but if DB_ID is used, then the content of this Shard will be random… Therefore, even if there is no hard link, some choices might not make a lot of sense if both aspects aren't considered ;).

Alfresco (ACS only, not community) provides a graphical representation of the Solr Shards in the Alfresco Administration Console (https://alfresco-dns.domain/alfresco/s/enterprise/admin/admin-flocs). This is very useful to quickly see what is set up, how it is organized and what the status of each and every Shard is: whether they are up-to-date, lagging behind or not running at all. Here is an example of 1 Shard Group composed of 5 Shards with 2 Instances each:

Shard Groups

Shard Instances

In addition to that, I always find it very useful to look at the Solr reports (https://solr-dns.domain/solr/admin/cores?action=REPORT and https://solr-dns.domain/solr/admin/cores?action=SUMMARY) since they give an overview of exactly how many ACLs and nodes are indexed. These are well-known reports for Alfresco but in case you have never seen them, I would advise you to take a look.
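
If you prefer the command line, the same reports can be fetched with curl (same hostname as in the URLs above; wt=json simply asks Solr for JSON output, and you may need to add your authentication options if the admin endpoints are secured):

curl -s "https://solr-dns.domain/solr/admin/cores?action=SUMMARY&wt=json"
curl -s "https://solr-dns.domain/solr/admin/cores?action=REPORT&wt=json"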

II. Shard creation parameters

As mentioned in the previous blog related to the Sharding Methods, when creating a Shard Instance, you will need to define some parameters. Here, I will only define them in the URL/Command because it is easier to have everything there. In addition to these parameters specific to each Method, there are some that are global. Before diving into the commands, let’s take a look at a few parameters and what they represent:

  • storeRef: Which Alfresco Store should this Shard index
  • numShards: Maximum number of different Shards that can be created
    • For DB_ID_RANGE, even if you need only 2 Shards at the moment, put more here (like 10/20) to be able to create new Shards in the future as the index grows. Once this limit is defined, you cannot exceed it so keep some margin
    • For all other Methods, set that to the number of Shards needed
  • numNodes: Maximum number of Solr Servers that can be used
    • For DB_ID_RANGE, even if you have only 2 Solr Servers at the moment, put more here (like 10/20) to be able to install new Solr Servers and create Shard Instances in the future. Once this limit is defined, you cannot exceed it so keep some margin
    • For all other Methods, even if you have only 2 Solr Servers at the moment, put more here (like 10/20). Maybe in the future you will want to have the same Shards (fix number) on more Solr Servers. Once this limit is defined, you cannot exceed it so keep some margin
  • nodeInstance: The “ID/Number” of the Solr Server on which the command is going to be executed, from 1 to numNodes, included
  • template: Which template should be used as base to create the new Core. The templates can be found under <SOLR_HOME>/solrhome/templates/
  • coreName: The base name of the Solr Core that this Shard should be part of. If you do not specify one, it will be generated from the Alfresco storeRef (E.g.: “workspace-SpacesStore”). Usual names can be “alfresco” or “archive” for example (Name of the default non-sharded cores)
  • shardIds: An ID that controls the generated Core Name, Base URL and Solr Data storage. This is to differentiate instances of Shards on a single Solr Server. You might think that this is the “ID/Number” of the Shard but not necessarily: with the DB_ID_RANGE for example, you can have the same shardIds used on two different Solr Servers for different Shards because the important thing is the range, it’s not the shardIds parameter in this case. So be careful here, this might be confusing. A best practice is however to have this shardIds set to the real “ID/Number” of the Shard so that it avoids confusion in all cases
  • replicationFactor: Technically speaking, the replication of a Shard Instance is another Shard Instance indexing the same set of nodes (so from the same Shard). You can normally mention the “replicationFactor” parameter in Solr but with the way it was implemented for Alfresco, you don’t need to care about that actually. You will see in the example below that to have several Shard Instances for the same Shard, you only need to execute the same commands on different servers
  • property.XXX: All the properties that can be set in the template’s solrcore.properties as well as the Sharding Method specific ones like:
    • property.data.dir.root: The place where the index will be put
    • property.shard.method: The Sharding Method to be used to create this Shard Instance
    • property.shard.instance: This is supposed to be the real “ID/Number” of the Shard for which a Shard Instance should be created (from 0 to numShards-1)

In summary, with the above parameters:

  • The name of the Core will be: ${coreName}-${shardIds}
  • The Solr Data will be put under: ${data.dir.root}/${coreName}-${shardIds}
  • The Solr Home will be put under: ${solr.solr.home}/${template}--${coreName}--shards--${numShards}-x-${replicationFactor}--node--${nodeInstance}-of-${numNodes}/${coreName}-${shardIds}
  • The Base URL will be: ${solr.baseurl}/${coreName}-${shardIds}
  • The Shard that will be created (unless otherwise): ${property.shard.instance}
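
To make this concrete, the first DB_ID_RANGE command of the examples below (template=rerank, coreName=alfresco, shardIds=0, numShards=20, numNodes=12, nodeInstance=1), with an assumed replication factor of 1 since the parameter is not set explicitly, would give:

  • Core Name: alfresco-0
  • Solr Data: ${data.dir.root}/alfresco-0
  • Solr Home: ${solr.solr.home}/rerank--alfresco--shards--20-x-1--node--1-of-12/alfresco-0
  • Base URL: ${solr.baseurl}/alfresco-0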

One final note about the parameters: you need to be extra careful with these. I will say it again: in case of DB_ID_RANGE for example, Solr will not care about the shardIds or the property.shard.instance, it will only care about what it needs to index and that is based on the range… If you are using the Dynamic Registration, Alfresco will (and this is where the confusion can start) show you wrong information (in the Admin Console > Search Services Sharding) because it bases itself on the "property.shard.instance" to know to which Shard this Instance belongs (or at least is supposed to belong…). This is just a graphical representation of the Shard Instances (so normally no impact on searches) but still, if you want an efficient and correct graphical view, keep things consistent!

III. Examples a. DB_ID_RANGE

I will use the "6 Solr Servers – 3 Shards – HA" distribution (2nd solution) as the working example to show which commands can be used to create it: 3 different Shards, with 4 Shard Instances each, spread over the 6 Solr Servers. I will use the DB_ID_RANGE Method so it is easy to see the differences between the commands and to identify for which Shard an Instance will be created. I will use a range of 50 000 000 and assume that we can go up to 20 Shards (so 1 000 000 000 at maximum) on up to 12 Solr Servers.

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
b. DATE

In this section, I will use the same setup but with a DATE Sharding Method. I will keep the same details as well (3 Shards) and therefore, each Shard Instance will contain 4 of the 12 months. There are several ways to obtain this:

  • No grouping option (default 1): Shard N°1 will contain the months 1,4,7,10 // Shard N°2 will contain the months 2,5,8,11 // Shard N°3 will contain the months 3,6,9,12
  • Grouping of 2: Shard N°1 will contain the months 1,2,7,8 // Shard N°2 will contain the months 3,4,9,10 // Shard N°3 will contain the months 5,6,11,12
  • Grouping of 3: Shard N°1 will contain the months 1,2,3,10 // Shard N°2 will contain the months 4,5,6,11 // Shard N°3 will contain the months 7,8,9,12
  • Grouping of 4: Shard N°1 will contain the months 1,2,3,4 // Shard N°2 will contain the months 5,6,7,8 // Shard N°3 will contain the months 9,10,11,12

Here, I will group the months by 4. Since this is not the DB_ID_RANGE Method, all the Shard Instances for a Solr Server can be created with a single command. However, when doing so, you cannot specify the "property.shard.instance" because it would contain several values (0,1,2 for example) and that is, as far as I know, not supported. Therefore, the Shards will be created based on the "shardIds" parameter, which has no problem with a comma-separated list:

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
c. ACL_ID (ACL v2)

In this section, I will use the same setup but with an ACL_ID Sharding Method. I will keep the same details as well (3 Shards). Each of the three Shards will get assigned some of the ACLs of the Alfresco Repository and it will index all the Alfresco nodes that have these specific ACLs. Therefore, it will be a random assignment that might or might not be evenly distributed (the more ACLs you have, the more evenly it is supposed to be in theory):

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=ACL_ID&property.shard.count=3"
IV. Conclusion & final remarks:

At first, it might be a little confusing to work with Solr Sharding because there are a lot of terms that overlap each other a little bit, but once you get it, it is such a pleasure to so easily create this kind of indexing architecture. Some final remarks:

  • As mentioned previously, you can create all the Instances on a Solr Server using a single command, except if you are using the DB_ID_RANGE because that contains the one variable parameter that needs to change between two different Shards. If the Instances are part of the same Shard (with the same range), then you can create them all in a single command.
  • If you want to have different Shards on the same Solr Server, then you need to change “property.shard.instance” (because it is a different Shard) and “shardIds” (because the Core & Base URL needs to be unique per Solr Server). The Shard Instance will belong to the Shard specified in “property.shard.instance” by default. If not present, it will use “shardIds” (as always, unless there is a range specified for the DB_ID_RANGE Method because this takes precedence over anything else).
  • If you want to have several Instances of the same Shard on the same Solr Server, then you need to change “shardIds” and keep “property.shard.instance” the same. If you can avoid that, then you should, for simplicity and consistency, but if you have a requirement to have two Instances of the same Shard on the same Solr Server, it is still possible.

If you are interested in playing with the Solr Sharding but you don’t have an existing infrastructure in place, you might want to take a look at this GitHub project from Angel Borroy which should help you to go further. Have fun!

Cet article Solr Sharding – Shard creation est apparu en premier sur Blog dbi services.

odacli create-database -io doesn’t work for older databases in 19.9

Fri, 2020-11-20 13:10
Introduction

On ODA, you create your databases with odacli create-database. You can create a complete database, which takes about 10 minutes, or you can create an instance-only database. The latter will simply and quickly create a record in the ODA registry, start the instance with a basic pfile, and that's it. Then you're supposed to do the next step yourself: populate the database with a manual restore or an RMAN duplicate.

Creating a 12.2 instance-only database is not OK on ODA 19.9

Unfortunately, creating an instance-only database doesn't work for me with 19.9. I tried to create a 12.2 instance and it failed immediately:

odacli list-dbhomes

ID                                       Name                 DB Version                               Home Location                                 Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
1de83958-c38e-4b9b-9bd3-6c510ce2d3c3     OraDB19000_home1     19.9.0.0.201020                          /u01/app/oracle/product/19.0.0.0/dbhome_1     Configured
c191668f-10de-4184-b22e-b953648bf4a7     OraDB12201_home1     12.2.0.1.201020                          /u01/app/oracle/product/12.2.0.1/dbhome_1     Configured


odacli create-database -dh 'c191668f-10de-4184-b22e-b953648bf4a7' -u ADM001BD_765D -n ADM001BD -r asm -io

odacli describe-job -i b86b28be-0d71-49db-a926-59b98d0c0d6c

Job details
----------------------------------------------------------------
                     ID:  b86b28be-0d71-49db-a926-59b98d0c0d6c
            Description:  Database service creation with db name: ADM001BD
                 Status:  Failure
                Created:  November 19, 2020 9:06:16 AM CET
                Message:  DCS-10001:Internal error encountered: Failed to run sql in method : createTemporaryInstance.Unable to startup standby instance in nomount mode as output conatins ora- :

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Database Service creation                November 19, 2020 9:06:19 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Database Service creation                November 19, 2020 9:06:19 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Setting up ssh equivalance               November 19, 2020 9:06:19 AM CET    November 19, 2020 9:06:19 AM CET    Success
Creating volume datADM001BD              November 19, 2020 9:06:19 AM CET    November 19, 2020 9:06:31 AM CET    Success
Creating volume rdoADM001BD              November 19, 2020 9:06:31 AM CET    November 19, 2020 9:06:43 AM CET    Success
Creating ACFS filesystem for DATA        November 19, 2020 9:06:43 AM CET    November 19, 2020 9:06:54 AM CET    Success
Creating ACFS filesystem for RECO        November 19, 2020 9:06:54 AM CET    November 19, 2020 9:07:05 AM CET    Success
Database Service creation                November 19, 2020 9:07:05 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Auxiliary Instance Creation              November 19, 2020 9:07:05 AM CET    November 19, 2020 9:07:06 AM CET    Failure

Looking in the dcscli.log and dcs-agent.log (under /opt/oracle/dcs/log/) didn’t help. But I found a new init file in the database home:

cat /u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora
db_name=ADM001BD
db_unique_name=ADM001BD_765D
wallet_root=+DATA/ADM001BD_765D,tde_configuration='KEYSTORE_CONFIGURATION=FILE'
db_domain=intranet.ch

Is an instance really able to start on this kind of pfile? Let’s try.

cp /u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora /tmp/

su - oracle

export ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1
export ORACLE_SID=ADM001BD
$ORACLE_HOME/bin/sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu Nov 19 10:01:20 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initADM001BD.ora';
LRM-00101: unknown parameter name 'wallet_root'
LRM-00101: unknown parameter name 'tde_configuration'
ORA-01078: failure in processing system parameters
exit

No, for sure it can't work. These wallet_root and tde_configuration parameters are for 18c and later databases. Don't expect it to work on 11.2, 12.1 or 12.2.

Let’s also try to start a 19c instance with this pfile:

export ORACLE_HOME=/u01/app/oracle/product/19.0.0.0/dbhome_1
$ORACLE_HOME/bin/sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 19 09:58:59 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initADM001BD.ora';
ORACLE instance started.

Total System Global Area  436204272 bytes
Fixed Size                  8897264 bytes
Variable Size             285212672 bytes
Database Buffers          134217728 bytes
Redo Buffers                7876608 bytes

Yes a 19c instance can start on it.

I also tested an instance-only creation with odacli for a 19c and it worked fine, the init file being the same.

Workaround

As my database is in Failed status, my registry is not very clean:

odacli list-databases

ID                                       DB Name    DB Type  DB Version           CDB        Class    Shape    Storage    Status        DbHomeID      
---------------------------------------- ---------- -------- -------------------- ---------- -------- -------- ---------- ------------ ----------------------------------------
8ad09a72-fbcf-410f-854b-cd8c945fb3de     DBTEST     Si       19.9.0.0.201020      false      Oltp     Odb1s    Asm        Configured   1de83958-c38e-4b9b-9bd3-6c510ce2d3c3
1d0de144-c93c-4f26-b00a-6cbbfafda6dd     ADM001BD   Si       12.2.0.1.201020      false      Oltp     Odb1     Asm        Failed       c191668f-10de-4184-b22e-b953648bf4a7

What you can do is delete this database (it will be deleted correctly) and create a complete database instead: it takes longer but it will succeed whatever version you use.
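
The failed database can be removed with odacli, using its id from the listing above:

odacli delete-database -i 1d0de144-c93c-4f26-b00a-6cbbfafda6dd

Then create the complete database: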

odacli create-database  -cs WE8ISO8859P15 -no-c -dh 'c191668f-10de-4184-b22e-b953648bf4a7' -u ADM001BD_765D -n ADM001BD -r asm -s odb1s -no-co

When the database creation is complete, you need to remove all the files belonging to it; let's generate a script for that purpose:

su - oracle
. oraenv <<< ADM001BD
sqlplus -s / as sysdba

set pages 0
set lines 200
set serveroutput off
set feedback off
spool /tmp/delete_files.txt
select 'asmcmd rm -rf '||file_name  from dba_data_files;
select 'asmcmd rm -rf '||file_name  from dba_temp_files;
select 'asmcmd rm -rf '||member from v$logfile;
select 'asmcmd rm -rf '||value from v$spparameter where name='control_files';
select 'asmcmd rm -rf '||value from v$parameter where name='spfile';
spool off
exit

su - grid
cat /tmp/delete_files.txt
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/undotbs1.298.1056884325                                                                                                                                
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/system.306.1056884289                                                                                                                                  
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/sysaux.304.1056884317                                                                                                                                  
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/users.309.1056884715                                                                                                                                   
asmcmd rm -rf +DATA/ADM001BD_765D/TEMPFILE/temp.310.1056884383                                                                                                                                    
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_3.2390.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_2.2389.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_1.2386.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/CONTROLFILE/current.2385.1056884373                                                                                                                             
asmcmd rm -rf +DATA/ADM001BD_765D/PARAMETERFILE/spfile.308.1056884469   
sh /tmp/delete_files.txt
exit

su - oracle
. oraenv <<< ADM001BD
startup force nomount;

startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora'

starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started

Total System Global Area    1073741824 bytes

Fixed Size                     8628936 bytes
Variable Size                293602616 bytes
Database Buffers             763363328 bytes
Redo Buffers                   8146944 bytes
Conclusion

This little bug is a bit disappointing, but there is a workaround. I opened an SR in MOS and will keep you informed if a better workaround is proposed by Oracle.

Cet article odacli create-database -io doesn’t work for older databases in 19.9 est apparu en premier sur Blog dbi services.

PostgreSQL 14 will support OUT parameters for procedures

Fri, 2020-11-20 06:17

Before PostgreSQL 11 there was no possibility to work with procedures in PostgreSQL, only functions were supported. Since PostgreSQL 11, procedures are supported and many waited for that because procedures also brought transaction control (commit/rollback), which is not possible with functions. Next year, when PostgreSQL 14 is released, there will also be support for OUT parameters. Currently only IN, INOUT and VARIADIC are supported. This makes PostgreSQL's procedures more compatible with Oracle's implementation of procedures, so let's have a look.

The only way to return something from a procedure currently is to use an INOUT parameter:

postgres=# select version();
                                                          version                                                          
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text);
CREATE TABLE
postgres=# insert into t1 values (1,'aa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val inout int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

This simple procedure has one INOUT parameter and the parameter will contain the result of the select once the procedure is executed:

postgres=# call p1(5);
 p_val 
-------
     1
(1 row)

You can, of course, use multiple INOUT parameters as well:

postgres=# create or replace procedure p1 ( p_val inout int 
postgres(#                                , p_val2 inout text)
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a, b
postgres$#     into p_val, p_val2
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1 (5,'ccc');
 p_val | p_val2 
-------+--------
     1 | aa
(1 row)

But if you try to use an OUT parameter this will not work:

postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
ERROR:  procedures cannot have OUT arguments
HINT:  INOUT arguments are permitted.
postgres=# 

This will change with PostgreSQL 14:

postgres=# select version();
                                                  version                                                  
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'aaa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

For testing that, you somehow need to declare a variable to hold the value which will be returned:

postgres=# do
postgres-# $$
postgres$# declare 
postgres$#   n int;
postgres$# begin
postgres$#   call p1(n);
postgres$#   raise notice '%', n;
postgres$# end;
postgres$# $$;
NOTICE:  1
DO
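
You should also be able to call the procedure directly with CALL; an argument has to be supplied for the OUT parameter but it is not evaluated, so the convention is to simply pass NULL. A quick sketch of what that looks like:

postgres=# call p1(null);
 p_val 
-------
     1
(1 row)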

Nice.

Cet article PostgreSQL 14 will support OUT parameters for procedures est apparu en premier sur Blog dbi services.

Configure Data Guard with odacli on ODA

Wed, 2020-11-18 14:09
Introduction

I always thought the integration of Data Guard was a missing feature on Oracle Database Appliance. For sure, you can configure Data Guard manually as you do on a classic server, but as an appliance, ODA should have offered an easier way to configure it. After years of waiting for this feature, it's now available, and I just tested it. And I'm here to give you my feedback.

Disclaimer

You'll need 2 ODAs to build such a configuration. Data Guard on a single ODA is not supported and wouldn't make sense.

I did this test on 2 brand new X8-2M ODAs deployed with the latest version 19.9. It should also work starting from 19.8, as it was the first version to include this feature.

The DEMO database used in this test is also version 19.9. I don’t know if it should work with previous versions, it’s probably OK for 18c and 12.2, but I’m not sure for 12.1 and 11.2.

This configuration through odacli is linked to your ODA system configuration, especially regarding the domain. If you do not want to use the db_domain on your primary database and you removed this parameter, you should know that the standby will nevertheless be configured with the db_domain inherited from the system's domain, and the Data Guard configuration will then fail at step 7.

Another piece of advice regarding the use of this feature: the db_create_online_log_dest_1 parameter needs to be set. I usually remove this parameter to allow multiplexing of the redo logs between db_create_file_dest and db_recovery_file_dest (the default behavior when no db_create_online_log_dest_X is configured), but it's better to use both db_create_online_log_dest_1 and db_create_online_log_dest_2 when using odacli's Data Guard feature.

You should also be aware that an existing backup configuration set up without odacli can be a problem. Make sure you understand that creating a backup configuration for a database with odacli will interfere with your existing backup configuration.

Data Guard configuration is related to a database: you’ll need to apply this procedure for each database for which you want a disaster recovery option.

Data Guard is free of charge with Enterprise Edition, but make sure to never open your standby database read-only while redo is being applied: opening a standby database read-only while applying the changes from the primary requires the Active Data Guard option. Standard Edition does not support Data Guard, but you can use Dbvisit Standby as a third-party disaster recovery solution.
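
If you want to double-check that your standby does not silently run in Active Data Guard mode, here is a quick sketch (to be run as SYSDBA on the standby, not an odacli feature): an OPEN_MODE of "READ ONLY WITH APPLY" would mean the option is in use, while "MOUNTED" is what you want for a license-free physical standby.

SQL> select name, open_mode, database_role from v$database;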

Finally, I would recommend testing this feature before applying it to your production databases, so that you can decide whether it fits your needs. You can still configure Data Guard manually as you always did.

Create a backup configuration for your database

If you’re not familiar with backup configuration on ODA, you just need to know that you can manage your backup from odacli. It’s not a new feature, it’s not used very often but it can simplify your job and it relies on RMAN, meaning that it’s reliable.

For configuring Data Guard on ODA you first need to create a backup configuration for your primary. You can back up to OCI (the Oracle public cloud) or to NFS; most of us will probably use a classic NFS share. Let's consider an NFS share /Rback/DEMO for our DEMO database, and a 10-day retention:

odacli create-backupconfig -n nfsRbackDEMO -w 10 -d NFS -c /Rback/DEMO

Backupconfig creation is fast:

odacli describe-job -i "15066dd7-485c-4bd9-9cdb-4259252b9ede"

Job details
----------------------------------------------------------------
                     ID:  15066dd7-485c-4bd9-9cdb-4259252b9ede
            Description:  create backup config:nfsRbackDEMO
                 Status:  Success
                Created:  November 18, 2020 9:06:46 AM CET
                Message:  backup config creation

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Backup config metadata persist           November 18, 2020 9:06:47 AM CET    November 18, 2020 9:06:47 AM CET    Success

Associate the backup config to the database

This backup configuration now needs to be linked to your DEMO database:

odacli modify-database -in DEMO -bin nfsRbackDEMO

odacli describe-job -i "b8f70d11-bc2f-49e2-81e3-5072305302ef"

Job details
----------------------------------------------------------------
                     ID:  b8f70d11-bc2f-49e2-81e3-5072305302ef
            Description:  Modify database : DEMO
                 Status:  Success
                Created:  November 18, 2020 9:14:29 AM CET
                Message:  Modify database

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Validate OMF parameter values            November 18, 2020 9:14:31 AM CET    November 18, 2020 9:14:32 AM CET    Success
update db with backupconfig attributes   November 18, 2020 9:14:39 AM CET    November 18, 2020 9:14:42 AM CET    Success
Enable Database Autobackup               November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:42 AM CET    Success
Enable Archivelog Autobackup             November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:42 AM CET    Success
Configure Control file Auto Backup Format November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:44 AM CET    Success
Backup Current Control file              November 18, 2020 9:14:44 AM CET    November 18, 2020 9:14:50 AM CET    Success
Update metadata for database:DEMO        November 18, 2020 9:14:50 AM CET    November 18, 2020 9:14:50 AM CET    Success

As you can see, applying this backup configuration changes some RMAN parameters, for example the path for the autobackup of the controlfile.
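
If you are curious about what exactly has been changed, a quick way to check (just a sketch, to be run as SYSDBA; you can also simply run "show all" in RMAN) is to list the non-default RMAN settings recorded in the controlfile:

SQL> select name, value from v$rman_configuration order by name;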

Create a first backup of your primary with odacli

Now you can easily take an incremental level 0 backup of your database (an incremental level 0 is actually a full backup). The -ka option keeps the archivelogs on disk:

odacli create-backup --backupType Regular-L0 -in DEMO -ka

odacli describe-job -i 56a941f7-2f26-4d87-b159-1e02de183db6

Job details
----------------------------------------------------------------
                     ID:  56a941f7-2f26-4d87-b159-1e02de183db6
            Description:  Create Regular-L0 Backup with TAG-auto for Db:DEMO in NFS:/Rback/DEMO/orabackups/oda00770p/database/3800776698/DEMO_770P
                 Status:  Success
                Created:  November 18, 2020 9:18:07 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Validate backup config                   November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:09 AM CET    Success
NFS location existence validation        November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:09 AM CET    Success
Backup Validations                       November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:11 AM CET    Success
Recovery Window validation               November 18, 2020 9:18:11 AM CET    November 18, 2020 9:18:13 AM CET    Success
Archivelog deletion policy configuration November 18, 2020 9:18:13 AM CET    November 18, 2020 9:18:14 AM CET    Success
Database backup                          November 18, 2020 9:18:14 AM CET    November 18, 2020 9:19:00 AM CET    Success

The backup can take a few minutes to a few hours depending on your database size.

Export the backup report to a json file

Each backup done with odacli generates a backup report. You'll need to save the backup report's description in a json file that will be used on the standby server. In this example, I put the json file on the nfs share to avoid having to copy the file from the primary to the standby server.

odacli list-backupreports | tail -n 2 | grep -v ^$ | awk '{print $1}'
1b64bcc5-dec7-4373-bf95-f8bdd18bdf08

odacli describe-backupreport -i 1b64bcc5-dec7-4373-bf95-f8bdd18bdf08 > /Rback/DEMO/br_DEMO_`date +"%Y%m%d_%H%M%S"`.json

Restore the database on standby server

Assuming your standby server also has the same nfs share mounted, you can restore the database on it. The irestore-database feature will create the database with odacli, and as you may know, if you don't specify a dbhome, a new dbhome will be created. So don't forget to pick the dbhome corresponding to your needs if you don't want a new one. You also need to give odacli a new db_unique_name for your database (having different db_unique_names for primary and standby is a requirement for Data Guard) and specify the database role: STANDBY.

odacli list-dbhomes

odacli irestore-database -dh 82559355-db72-4616-b968-13b8509466fc -r /Rback/DEMO/br_DEMO_20201118_092635.json -u DEMO_769P -ro STANDBY
Password for SYS user:
{
  "jobId" : "775b58d4-8296-41fa-9c89-7d8761c0b69b",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "November 18, 2020 11:35:20 AM CET",
  "resourceList" : [ ],
  "description" : "Database service recovery with db name: DEMO",
  "updatedTime" : "November 18, 2020 11:35:20 AM CET"
}

The restore can take minutes to hours depending on the size of the database.

odacli describe-job -i "775b58d4-8296-41fa-9c89-7d8761c0b69b"

Job details
----------------------------------------------------------------
                     ID:  775b58d4-8296-41fa-9c89-7d8761c0b69b
            Description:  Database service recovery with db name: DEMO
                 Status:  Success
                Created:  November 18, 2020 11:35:20 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Check if cluster ware is running         November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Creating DbStorage for DbRestore         November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Validating DiskSpace for DATA            November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Generating SSH key                       November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
SSH key                                  November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
SSH key scan                             November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Create TDE And Audit Dir Locations       November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Create pfile for Auxiliary Instance      November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Deleting FRA                             November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:22 AM CET   Success
Rman duplicate                           November 18, 2020 11:35:22 AM CET   November 18, 2020 11:37:49 AM CET   Success
Delete RECO FileGroup DEMO_769P          November 18, 2020 11:37:49 AM CET   November 18, 2020 11:37:50 AM CET   Success
Create RECO FileGroup DEMO_769P          November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:50 AM CET   Success
Delete RECO FileGroup DEMO_769P_9999     November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:50 AM CET   Success
Creating pfile from spfile               November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:51 AM CET   Success
Set PFile Ownership                      November 18, 2020 11:37:51 AM CET   November 18, 2020 11:37:51 AM CET   Success
Customize Db Parameters                  November 18, 2020 11:37:51 AM CET   November 18, 2020 11:37:52 AM CET   Success
Shutdown And Start database              November 18, 2020 11:37:52 AM CET   November 18, 2020 11:38:33 AM CET   Success
Create spfile for restore db             November 18, 2020 11:38:33 AM CET   November 18, 2020 11:38:34 AM CET   Success
Set PFile Ownership                      November 18, 2020 11:38:34 AM CET   November 18, 2020 11:38:34 AM CET   Success
Shutdown And Mount database              November 18, 2020 11:38:34 AM CET   November 18, 2020 11:39:19 AM CET   Success
Enable block change tracking             November 18, 2020 11:39:19 AM CET   November 18, 2020 11:39:27 AM CET   Success
Register Database taskflow               November 18, 2020 11:39:37 AM CET   November 18, 2020 11:42:11 AM CET   Success
Create SPFile in shared loc              November 18, 2020 11:39:37 AM CET   November 18, 2020 11:39:52 AM CET   Success
Delete Local Spfile                      November 18, 2020 11:39:52 AM CET   November 18, 2020 11:39:52 AM CET   Success
Register DB with clusterware             November 18, 2020 11:39:52 AM CET   November 18, 2020 11:40:36 AM CET   Success
Set SysPassword and Create PwFile        November 18, 2020 11:40:36 AM CET   November 18, 2020 11:40:38 AM CET   Success
Creating pfile                           November 18, 2020 11:40:38 AM CET   November 18, 2020 11:40:38 AM CET   Success
Updating db env                          November 18, 2020 11:40:38 AM CET   November 18, 2020 11:40:39 AM CET   Success
Enable DbSizing Template                 November 18, 2020 11:40:39 AM CET   November 18, 2020 11:41:25 AM CET   Success
Create tns entry                         November 18, 2020 11:41:25 AM CET   November 18, 2020 11:41:26 AM CET   Success
Running datapatch                        November 18, 2020 11:41:26 AM CET   November 18, 2020 11:41:27 AM CET   Success
Set CPU pool                             November 18, 2020 11:41:27 AM CET   November 18, 2020 11:41:27 AM CET   Success
Reset Associated Networks                November 18, 2020 11:42:11 AM CET   November 18, 2020 11:42:13 AM CET   Success
Copy Pwfile to Shared Storage            November 18, 2020 11:42:13 AM CET   November 18, 2020 11:42:17 AM CET   Success

Copy the TrustStore file from standby server to primary

You need to copy a file from standby server to primary server. This file will only be used during initial Data Guard configuration:

scp /opt/oracle/dcs/dcscli/cacerts root@oda00770p.intranet.ch:/opt/oracle/dcs/dcscli/cacerts.oda00769p

Create a json file for Data Guard configuration

You can configure Data Guard in interactive mode, but it’s better to prepare a json file with all the parameters. You can later duplicate this json file for your other databases.

vi /home/oracle/scripts/DEMO_DTG.json

{
  "name": "DEMO_DTG",
  "protectionMode": "MAX_AVAILABILITY",
  "enableFlashback": true,
  "enableActiveDg": false,
  "replicationGroups": [
    {
      "sourceEndPoints": [
        {
          "endpointType": "PRIMARY",
          "hostName": "oda00770p.intranet.ch",
          "listenerPort": 1521,
          "databaseUniqueName": "DEMO_770P",
          "ipAddress": "125.150.226.239"
        }
      ],
      "targetEndPoints": [
        {
          "endpointType": "STANDBY",
          "hostName": "oda00769p.intranet.ch",
          "listenerPort": 1521,
          "databaseUniqueName": "DEMO_769P",
          "ipAddress": "125.150.226.237"
        }
      ],
      "transportType": "SYNC"
    }
  ]
}

Configure Data Guard with odacli

Now you can configure Data Guard with your json file as an input. You will be asked to provide the standby server's name, the password of the ODA web console, the path of the file you copied from the standby to the primary (the TrustStore file), the password of this TrustStore, and the primary SYS password. The TrustStore password can be found on the standby server here:

cat /opt/oracle/dcs/dcscli/dcscli.conf  | grep TrustStorePassword
TrustStorePassword=changeit


odacli configure-dataguard -r /home/oracle/scripts/DEMO_DTG.json
Standby site address: oda00769p.intranet.ch
BUI username for Standby site (default: oda-admin):
BUI password for Standby site:
TrustStore path for Standby DCS server: /opt/oracle/dcs/dcscli/cacerts.oda00769p
TrustStore password for Standby DCS server:
Primary database syspassword:
The TrustStore of the Standby site should be removed unless it is necessary for some other purpose. Do you want to delete it? (Y/N, default:Y): N
As a requirement, Standby database’s SYS password will be set to Primary database’s after Data Guard configuration. Are you sure you want to proceed with Data Guard configuration? (Y/N): Y
*******************************************************************************************
Configure Data Guard DEMO_DTG started
*******************************************************************************************
Step 1: Validate Data Guard configuration request (Primary site)
Description: Validate DG Config Creation
Job ID: b666ee0c-225a-4228-aadf-64eb14ef2dff
Started November 18, 2020 15:29:54 PM CET
Validate create Data Guard configuration request
Finished November 18, 2020 15:29:58 PM CET
*******************************************************************************************
Step 2: Validate Data Guard configuration request (Standby site)
Description: Validate DG Config Creation
Job ID: 5bca9b36-9920-4991-ba23-b45043af639c
Started November 18, 2020 15:29:58 PM CET
Validate create Data Guard configuration request
Finished November 18, 2020 15:30:00 PM CET
*******************************************************************************************
Step 3: Download password file from Primary database (Primary site)
Description: Download orapwd file from Primary database
Started November 18, 2020 15:30:00 PM CET
Prepare orapwd file for Primary database DEMO
Finished November 18, 2020 15:30:04 PM CET
*******************************************************************************************
Step 4: Upload password file to Standby database (Standby site)
Description: Upload orapwd file to Standby database
Started November 18, 2020 15:30:04 PM CET
Write orapwd file to Standby database DEMO
Finished November 18, 2020 15:30:10 PM CET
*******************************************************************************************
Step 5: Configure Primary database (Primary site)
Description: DG Config service creation - ConfigurePrimary
Job ID: de288bc9-579f-4a1c-8db4-0328d3cfcc1e
Started November 18, 2020 15:30:10 PM CET
Configure host DNS on primary env
Configure Data Guard Tns on primary env
Enable Data Guard related Db parameters for primary env
Enable force logging and archivelog mode in primary env
Enable FlashBack
Configure network parameters for local listener on primary env
Restart listener on primary env
Create services for primary db
Finished November 18, 2020 15:30:24 PM CET
*******************************************************************************************
Step 6: Configure Standby database (Standby site)
Description: DG Config service creation - ConfigureStandby
Job ID: 0b3dbf45-5023-4b5f-a978-a8a512e58c5f
Started November 18, 2020 15:30:24 PM CET
Configure Data Guard Tns on standby env
Configure host DNS on standby env
Clear Data Guard related Db parameters for standby env
Enable Data Guard related Db parameters for standby env
Enable force logging and archivelog mode in standby env
Populate standby database metadata
Configure network parameters for local listener on standby env
Reset Db sizing and hidden parameters for ODA best practice
Restart Listener on standby env
Create services for standby db
Finished November 18, 2020 15:31:42 PM CET
*******************************************************************************************
Step 7: Configure and enable Data Guard (Primary site)
Description: DG Config service creation - ConfigureDg
Job ID: c89f51ef-7dc5-410b-8421-f7892353babc
Started November 18, 2020 15:31:42 PM CET
Config and enable Data Guard
Post check Data Guard configuration
Finished November 18, 2020 15:33:09 PM CET
*******************************************************************************************
Step 8: Enable Flashback (Standby site)
Description: DG Config service creation - EnableFlashback
Job ID: cdd1899c-f7be-40e9-8bc1-0396acd62c20
Started November 18, 2020 15:33:09 PM CET
Enable FlashBack
Finished November 18, 2020 15:33:19 PM CET
*******************************************************************************************
Step 9: Re-enable Data Guard (Primary site)
Description: DG Config service creation - ReenableDg
Job ID: 391fe101-b286-4664-85a7-09a9679a5027
Started November 18, 2020 15:33:19 PM CET
Post check Data Guard configuration
Finished November 18, 2020 15:33:21 PM CET
*******************************************************************************************
Step 10: Create Data Guard status (Primary site)
Description: DG Status service creation - NewDgconfig
Job ID: 587f7f95-8a4f-4be6-aaa4-a1b6f4fdffa3
Started November 18, 2020 15:33:21 PM CET
Create Data Guard status
Finished November 18, 2020 15:33:23 PM CET
*******************************************************************************************
Step 11: Create Data Guard status (Standby site)
Description: DG Status service creation - NewDgconfig
Job ID: d1cee908-3333-4f1b-b91d-70ae9f13f7dd
Started November 18, 2020 15:33:23 PM CET
Create Data Guard status
Finished November 18, 2020 15:33:25 PM CET
*******************************************************************************************
Configure Data Guard DEMO_DTG completed
*******************************************************************************************

The configuration is now done. Note that it is not a single job but an interactive process that spawns multiple jobs on your 2 ODAs.

Is it really working?

That was my first question. Let's check:

oracle@oda00770p:/home/oracle/ [DEMO] dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Nov 18 15:54:40 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "DEMO_770P"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DEMO_DTG

  Protection Mode: MaxAvailability
  Members:
  DEMO_770P - Primary database
    DEMO_769P - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 19 seconds ago)




DGMGRL> show database 'DEMO_769P';

Database - DEMO_769P

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    DEMO

Database Status:
SUCCESS

Yes it’s working fine.

Switchover test

To make sure it’s working, let’s do a switchover from primary server. You first need to pick up the dataguardstatus id:

odacli list-dataguardstatus

ID                                       Name                             Database Name        Role       Protection Mode    Apply Lag       Transport Lag   Apply Rate      Status
---------------------------------------- -------------------------------- -------------------- ---------- ------------------ --------------- --------------- --------------- ----------
d859288f-29fd-45a6-b1b3-c4bb8b52dfd7     DEMO_DTG                         DEMO                 Primary    MaxAvailability    12 seconds      0 seconds       528.00 KByte/s  Configured

odacli describe-dataguardstatus -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
Dataguard Status details
----------------------------------------------------------------
                     ID: d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
                   Name: DEMO_DTG
          Database Name: DEMO
                   Role: Primary
        Protection Mode: MaxAvailability
              Apply Lag: 0 seconds
          Transport Lag: 0 seconds
             Apply Rate: 0 Byte/s
                 Status: Configured
           Updated Time: November 18, 2020 3:56:37 PM CET


odacli switchover-dataguard -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7 -u DEMO_769P


odacli describe-job -i "11d03cf3-d9a1-467b-8226-4363732161d5"

Job details
----------------------------------------------------------------
                     ID:  11d03cf3-d9a1-467b-8226-4363732161d5
            Description:  Dataguard Lifecycle operation service - SwitchoverDg
                 Status:  Success
                Created:  November 18, 2020 3:59:18 PM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Precheck switchover DataGuard            November 18, 2020 3:59:18 PM CET    November 18, 2020 3:59:21 PM CET    Success
Switchover DataGuard                     November 18, 2020 3:59:21 PM CET    November 18, 2020 4:00:15 PM CET    Success
Postcheck switchover DataGuard           November 18, 2020 4:00:15 PM CET    November 18, 2020 4:00:16 PM CET    Success
Check if DataGuard config is updated     November 18, 2020 4:00:16 PM CET    November 18, 2020 4:00:36 PM CET    Success

Switchover is OK. Let’s check with dgmgrl:

oracle@oda00769p:/home/oracle/ [DEMO] dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Nov 18 16:16:14 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "DEMO_769P"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DEMO_DTG

  Protection Mode: MaxAvailability
  Members:
  DEMO_769P - Primary database
    DEMO_770P - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 37 seconds ago)

DGMGRL> show database 'DEMO_770P';

Database - DEMO_770P

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 14.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    DEMO

Database Status:
SUCCESS

Nice!

Switchback test

Now I’m quite confident for the switchback. Let’s try from standby server:

[root@oda00769p ~]# odacli list-dataguardstatus

ID                                       Name                             Database Name        Role       Protection Mode    Apply Lag       Transport Lag   Apply Rate      Status
---------------------------------------- -------------------------------- -------------------- ---------- ------------------ --------------- --------------- --------------- ----------
d859288f-29fd-45a6-b1b3-c4bb8b52dfd7     DEMO_DTG                         DEMO                 Standby    MaxAvailability    0 seconds       0 seconds       2.00 KByte/s    Configured

[root@oda00769p ~]# odacli describe-dataguardstatus -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
Dataguard Status details
----------------------------------------------------------------
                     ID: d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
                   Name: DEMO_DTG
          Database Name: DEMO
                   Role: Primary
        Protection Mode: MaxAvailability
              Apply Lag: 0 seconds
          Transport Lag: 0 seconds
             Apply Rate: 14.00 KByte/s
                 Status: Configured
           Updated Time: November 18, 2020 4:17:12 PM CET




[root@oda00769p ~]#  odacli switchover-dataguard -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7 -u DEMO_770P
Password for target database:
{
  "jobId" : "99bbbaff-4716-4d84-ada6-1da256a93242",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "November 18, 2020 16:19:15 PM CET",
  "resourceList" : [ ],
  "description" : "Dataguard Lifecycle operation service - SwitchoverDg",
  "updatedTime" : "November 18, 2020 16:19:15 PM CET"
}
You have new mail in /var/spool/mail/root
[root@oda00769p ~]# odacli describe-job -i "99bbbaff-4716-4d84-ada6-1da256a93242"


Job details
----------------------------------------------------------------
                     ID:  99bbbaff-4716-4d84-ada6-1da256a93242
            Description:  Dataguard Lifecycle operation service - SwitchoverDg
                 Status:  Success
                Created:  November 18, 2020 4:19:15 PM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Precheck switchover DataGuard            November 18, 2020 4:19:15 PM CET    November 18, 2020 4:19:18 PM CET    Success
Switchover DataGuard                     November 18, 2020 4:19:18 PM CET    November 18, 2020 4:20:06 PM CET    Success
Postcheck switchover DataGuard           November 18, 2020 4:20:06 PM CET    November 18, 2020 4:20:07 PM CET    Success
Check if DataGuard config is updated     November 18, 2020 4:20:07 PM CET    November 18, 2020 4:20:27 PM CET    Success

Yes, it worked. Brilliant. You can also check with dgmgrl or by querying current_scn on the standby database. Everything is fine regarding this feature.
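
For the current_scn check, a minimal sketch (to be run as SYSDBA on both sites) is simply to compare the values; the SCN on the standby should keep increasing and stay close to the one on the primary:

SQL> -- run on the primary, then on the standby, and compare
SQL> select current_scn from v$database;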

Conclusion

The Data Guard configuration with odacli is a game changer. Compared to a manual configuration, it's much easier to do, it's faster to deploy on all your databases, and it's just as reliable because it's simply an interface on top of DGMGRL and RMAN. I highly recommend using this feature! Good job, Oracle ODA team! You just made ODA even better.

Cet article Configure Data Guard with odacli on ODA est apparu en premier sur Blog dbi services.

Recovery in the ☁ with Oracle Autonomous Database

Wed, 2020-11-18 09:49
By Franck Pachot

.
I’ll start this series with the Oracle Autonomous database but my goal is to cover the Point In Time recovery for many managed databases in the major cloud providers. Because I’ve seen lot of confusion about database backups (see What is a database backup (back to the basics)). On one side, in a managed database, we should not have to care about backups, but only about recovery. The way it is done (backups, redo archiving, snapshots, replication…) is not in our responsibility on a managed service. What we want is an easy way to recover the database to a past Point In Time within a recovery window (the recovery point – RPO) and in a predictable time (the recovery time – RTO). However, I’m certain that it is important to know how it is implemented behind this “recovery” service.

Clone from a backup

What I like most about the Oracle Autonomous Database is that there is no "restore" or "recovery" button on the database service actions. There is one when you go to the list of backups, in order to restore the database in-place from a previous backup. But this is not the common case of recovery. When you need recovery, you probably want to create a new database. The reason is that when you recover to a Point In Time in the past, you probably want to connect to it, check it, read or export data from it, without discarding the transactions that happened since then in the original database. And you are in the cloud: it is easy to provision a new clone and keep the original database until you are sure about what you need to do (merging recent transactions into the clone or fixing the original database from the clone data). In the cloud, a Point-In-Time Recovery (PITR) is actually a "clone from backup" or "clone from timestamp". And this is exactly how you do it in the Oracle Autonomous Database: you create a new clone, and rather than cloning from the current state you choose to clone from a previous state, by mentioning a timestamp or a backup.

Point In Time (PIT)


Actually, I think that the "clone from backup" should not even be there. It gives a false impression that you restore a backup like a snapshot of the database. But that's wrong: the "restore from backup" is just a "restore from timestamp" where the timestamp is the one from the end of the copy of datafiles to the backup set. It is not a special Recovery Point. It is just a point where the Recovery Time is minimal. From a user point of view, you decide to clone from a timestamp. The timestamp is probably the point just before the failure (like an erroneous update for example). Additionally, I hope that one day more utilities will be provided to define this point. Log Miner can search the journal of changes for a specific point. Imagine a recovery service where you mention a time window, the name of a table, an operation (like UPDATE),… and find the exact point-in-time to recover. Oracle has Log Miner and Flashback Transaction and a DBA can use them. A managed service could provide the same. But currently, you need to know the timestamp you want to recover to. The console shows times in half-hour steps but you can enter a finer granularity: seconds in the console, milliseconds with the CLI:

I have a PL/SQL loop constantly updating a timestamp in a DEMO table:


SQL> exec loop update demo set ts=sys_extract_utc(current_timestamp); commit; end loop;

I just kept that running.

Later I created a clone with 21:21:21.000 as the recovery point in time:


[opc@a demo]$ oci db autonomous-database create-from-backup-timestamp --clone-type FULL --timestamp 2020-11-14T21:21:21.000Z --db-name clone --cpu-core-count 1 --data-storage-size-in-tbs 1 --admin-password COVID-19-Go-Home --autonomous-database-id ocid1.autonomousdatabase.oc1.iad.abuw...

and once opened, I checked the state of my table:


SQL> select * from demo;

TS                       
------------------------ 
2020-11-14T21:21:19.368Z 

This is less than 2 seconds before the mentioned recovery point-in-time. RPO is minimal.

Recovery window

With the Oracle Autonomous Database, you always have the possibility to recover to any point in time during the past 2 months. Always: without any additional configuration, without additional cost, and even in the always free tier. You can run additional manual backups but you cannot remove those automatic ones, and you cannot reduce this recovery window.
Here is the documentation:
About Backup and Recovery on Autonomous Data Warehouse
Autonomous Data Warehouse automatically backs up your database for you. The retention period for backups is 60 days. You can restore and recover your database to any point-in-time in this retention period.

This is great. You don’t have to worry about anything in advance. And if something happens (can be a user error dropping a table, updating wrong rows, an application release that messes up everything, a legal request to look at past data,…) you can create a clone of the database at any state within this 60 days timeframe. And these 60 days is the minimum guaranteed. I have a small free-tier database where I can still see the backups from the past 6 months:

I have another database that has been stopped for 2 months:

There is only one backup there because the database has not been opened in the past two months (my current timestamp is 2020-09-17 15:45:50). This is important to know. Some cloud providers will start your managed database to back it up and apply updates. For example, in AWS RDS a stopped database stays stopped only for seven days and will be automatically started afterwards. But for the Oracle Autonomous Database, the database is actually a PDB, and stopped means closed while the instance is still running. Still, at least one backup remains!

Multiple backups

This is a very important point. If you rely only on restoring a backup, without the possibility to recover to a further Point In Time, you run the risk that any problem in the backup compromises your recovery. Here, even if the last backup has a problem (you should always design for failure, even if its probability is low), I can restore from the previous one and recover. It will take longer (24 hours of activity to recover) but the data will be there, consistent, up to the requested point in time.

When you know how Oracle Database backups work, you know that any backup configuration mentions either a number of backups (please, always more than one) or a recovery window (which hopefully will contain more than one), but not both:


RMAN> show all;

using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name CDB1A_IAD154 are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default

RMAN> configure retention policy to recovery window of 60 days;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 60 DAYS;
new RMAN configuration parameters are successfully stored

In a managed database you don’t have to configure this. But knowing how it works will give you confidence and trust in the data protection. Here for my closed database I can see only one backup.

Limitation in RTO (Recovery Time Objective)

How long does it take to restore the database? This is proportional to the size. But when you understand that a restore is not sufficient and recovery must apply the redo logs to bring all database files to a consistent state at the recovery point-in-time, you know it can take longer. As automatic backups are taken every day, you may have up to 24 hours of activity to recover. And understanding this may help. If you have the choice to select a point-in-time not too far after the end of a backup, then you can minimize the RTO. This is where you may choose "select the backup from the list" rather than "Point In Time clone". But then, you may increase the RPO.

My main message is: even in a managed service where you are not the DBA, it is important to understand how it works. There is no such thing as "restoring a backup". Recovery always happens, and your goal when you select the recovery point-in-time is to minimize the RTO for the required RPO. Recovery has to restore the archived logs, read them and apply the relevant redo vectors. And with the Oracle Autonomous Database, this redo stream is common to all databases (redo is at CDB level, if you know the multitenant architecture). This can take time for your database even if you didn't have a lot of updates in your database (which is a PDB). Most of the redo will be discarded, but it has to be restored and read sequentially. In order to verify that, I've selected a point-in-time from this morning for the database that was stopped for 2 months. The last backup is 2 months before the Point In Time to recover to. If you take this as a black box, you may think that this will be very fast: no updates to recover because the database was closed most of the time. But actually, this has to go through 2 months of redo for this database shared by many users. While writing this, the progress bar has been showing "18%" for three hours… This could be avoided if the whole CDB was backed up every day, because a closed autonomous database is just a closed PDB – files are still accessible for backup. Is it the case? I don't know.

Finally, this recovery took nearly 5 hours. And I am not sure whether it really restored the datafiles from 2 months ago and then restored this amount of archived logs really fast, or whether it used a more recent backup taken while the database was closed, not visible in the list of backups. Because I see 74 terabytes of archived logs generated during those 60 days:


select dbms_xplan.format_size(sum(blocks*block_size))
from gv$archived_log where dest_id=1 and first_time>sysdate-60;

DBMS_XPLAN.FORMAT_SIZE(SUM(BLOCKS*BLOCK_SIZE)) 
---------------------------------------------- 
74T                                            

And I would be very surprised if they could restore and recover this amount in less than 5 hours… But who knows? We don't have the logs in a managed database.

Note that this is a managed database on shared infrastructure. You also have the choice to provision a dedicated autonomous database when you need better control and isolation. What I'm saying is that if you don't know how it works, you will make mistakes. Like this one: as the database was stopped for 2 months, I should have selected a point-in-time closer to the last backup. It could have taken 1 hour instead of 5 hours (I tested it on the same database). I've been working on production databases for 20 years, in many companies. I can tell you that when a problem happens in production and you have to recover a database, you will be surrounded by managers asking, every five minutes, how long this recovery will take. The better you understand the recovery process, the more comfortable you will be.

Just imagine that you are in this situation and the only information you have is this 18% progress bar that is there for two hours:

When I understand what happens (2 months of shared archived logs to restore), I can take another action, or at least explain why it takes so long. Here, even if I have no access to the recovery logs in this managed database, I can guess that it restores and recovers all archived logs from the past 60 days. It cannot be fast, and it is hard to predict on a shared infrastructure, as it depends on the neighbours' activity.

Limitation in RPO (Recovery Point Objective)

Managed services always have more limitations because the automation behind them must standardize everything. When trying to recover to a point in time that was just 30 minutes ago, I hit a limitation. I've not seen it documented, but the message is clear: Cloning operation failed because the timestamp specified is not at least 2 hours in the past.

I don't exactly understand the reason. Point-In-Time recovery should be possible even if redo logs were archived every 2 hours, which is not the case anyway:


DEMO@atp1_tp> select * from v$archive_dest where status='VALID';

   DEST_ID             DEST_NAME    STATUS      BINDING    NAME_SPACE     TARGET    ARCHIVER    SCHEDULE                  DESTINATION    LOG_SEQUENCE    REOPEN_SECS    DELAY_MINS    MAX_CONNECTIONS    NET_TIMEOUT    PROCESS    REGISTER    FAIL_DATE    FAIL_SEQUENCE    FAIL_BLOCK    FAILURE_COUNT    MAX_FAILURE    ERROR    ALTERNATE    DEPENDENCY    REMOTE_TEMPLATE    QUOTA_SIZE    QUOTA_USED    MOUNTID    TRANSMIT_MODE    ASYNC_BLOCKS    AFFIRM      TYPE    VALID_NOW      VALID_TYPE    VALID_ROLE    DB_UNIQUE_NAME    VERIFY    COMPRESSION    APPLIED_SCN    CON_ID    ENCRYPTION
__________ _____________________ _________ ____________ _____________ __________ ___________ ___________ ____________________________ _______________ ______________ _____________ __________________ ______________ __________ ___________ ____________ ________________ _____________ ________________ ______________ ________ ____________ _____________ __________________ _____________ _____________ __________ ________________ _______________ _________ _________ ____________ _______________ _____________ _________________ _________ ______________ ______________ _________ _____________
         1 LOG_ARCHIVE_DEST_1    VALID     MANDATORY    SYSTEM        PRIMARY    ARCH        ACTIVE      USE_DB_RECOVERY_FILE_DEST              3,256            300             0                  1              0 ARCH       YES                                     0             0                0              0          NONE         NONE          NONE                           0             0          0 SYNCHRONOUS                    0 NO        PUBLIC    YES          ALL_LOGFILES    ALL_ROLES     NONE              NO        DISABLE                     0         0 DISABLE


DEMO@atp1_tp> select stamp,dest_id,thread#,sequence#,next_time,backup_count,name from v$archived_log where next_time > sysdate-2/24;

           STAMP    DEST_ID    THREAD#    SEQUENCE#              NEXT_TIME    BACKUP_COUNT                                                                     NAME
________________ __________ __________ ____________ ______________________ _______________ ________________________________________________________________________
   1,056,468,560          1          1        3,255 2020-11-14 15:29:14                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_1_seq_3255.350.1056468555
   1,056,468,562          1          2        3,212 2020-11-14 15:29:16                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_2_seq_3212.376.1056468557
   1,056,472,195          1          2        3,213 2020-11-14 16:29:51                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_2_seq_3213.334.1056472191
   1,056,472,199          1          1        3,256 2020-11-14 16:29:52                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_1_seq_3256.335.1056472193


DEMO@atp1_tp> select * from v$logfile;

   GROUP#    STATUS      TYPE                                             MEMBER    IS_RECOVERY_DEST_FILE    CON_ID
_________ _________ _________ __________________________________________________ ________________________ _________
        2           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_2.276.1041244447    NO                               0
        2           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_2.262.1041244471    YES                              0
        1           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_1.275.1041244447    NO                               0
        1           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_1.261.1041244471    YES                              0
        5           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_5.266.1041245263    NO                               0
        5           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_5.266.1041245277    YES                              0
        6           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_6.268.1041245295    NO                               0
        6           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_6.267.1041245307    YES                              0
        3           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_3.281.1041245045    NO                               0
        3           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_3.263.1041245057    YES                              0
        4           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_4.282.1041245077    NO                               0
        4           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_4.264.1041245089    YES                              0
        7           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_7.267.1041245327    NO                               0
        7           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_7.268.1041245339    YES                              0
        8           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_8.279.1041245359    NO                               0
        8           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_8.269.1041245371    YES                              0

As you can see, the Autonomous Database has two threads (two RAC nodes) with 4 online redo log groups and two members each (the Autonomous Database is not protected by Data Guard even if you enable Autonomous Data Guard… but that's another story). And the online logs are archived and backed up. Everything is there to be able to recover within the past 2 hours. But probably those archived logs are shipped to a dedicated destination to be recovered by the PITR feature "clone from timestamp".

Anyway, this is a managed service and you have to live with it. You don't want the DBA responsibility, and so you lack some control. Again, if you understand it, everything is fine. In case of failure, you can start by creating a clone from 2 hours ago and look at what you can do to repair the initial mistake. And two hours later, you know that you can create another clone which is recovered to the last second before the failure.

In summary

The Oracle Autonomous Database does not provide all the possibilities available when you manage it yourself, but it is still at the top of the main cloud-managed database services: the RPO is at the second level for any point between 2 hours ago and 60 days ago. The RTO is a few hours, even when dealing with terabytes. All this without anything to configure or pay in addition to the database service, and it includes the always free database. If the database is stopped during the backup window, the backup is skipped, but at least one backup remains even if it is out of the retention window.

Cet article Recovery in the ☁ with Oracle Autonomous Database est apparu en premier sur Blog dbi services.

Loading data from S3 to AWS RDS for PostgreSQL

Wed, 2020-11-11 13:44

AWS RDS for PostgreSQL comes with an extension that allows you to fetch data from AWS S3 and to write back data to AWS S3. The use case for this is obvious: Either you use other AWS services that write data to S3 and you want to further process that data in PostgreSQL, or you want other AWS services to consume data from PostgreSQL by providing that data in S3. Let’s have a look at how that works.

The extension AWS is providing for working with S3 from inside PostgreSQL is called “aws_s3”:

postgres=> select * from pg_available_extensions where name like '%aws%';
    name     | default_version | installed_version |                   comment                   
-------------+-----------------+-------------------+---------------------------------------------
 aws_commons | 1.0             |                   | Common data types across AWS services
 aws_s3      | 1.0             |                   | AWS S3 extension for importing data from S3
(2 rows)

If you try to install the extension you’ll notice that there is a dependency on the “aws_commons” extension:

postgres=> create extension aws_s3;
ERROR:  required extension "aws_commons" is not installed
HINT:  Use CREATE EXTENSION ... CASCADE to install required extensions too.

You can install both extensions in one step using the “CASCADE” option:

postgres=> create extension aws_s3 cascade;
NOTICE:  installing required extension "aws_commons"
CREATE EXTENSION

These extensions provide a couple of helper functions (aws_commons) and the function to import a file from S3 (aws_s3):

postgres=> \dx+ aws_commons
             Objects in extension "aws_commons"
                     Object description                      
-------------------------------------------------------------
 function aws_commons.create_aws_credentials(text,text,text)
 function aws_commons.create_s3_uri(text,text,text)
 schema aws_commons
 type aws_commons._aws_credentials_1
 type aws_commons._s3_uri_1
(5 rows)

postgres=> \dx+ aws_s3
                                       Objects in extension "aws_s3"
                                            Object description                                             
-----------------------------------------------------------------------------------------------------------
 function aws_s3.table_import_from_s3(text,text,text,aws_commons._s3_uri_1,aws_commons._aws_credentials_1)
 function aws_s3.table_import_from_s3(text,text,text,text,text,text,text,text,text)
 schema aws_s3
(3 rows)

Having the extension ready, we need a file we can import, so let's create one (exactly the same file as in the previous post, but with a bit fewer rows):

dwe@dwe:~/Downloads$ cat gen_data.sh 
#!/bin/bash
 
FILE="/home/dwe/Downloads/sample.csv"
rm -rf ${FILE}
 
for i in {1..1000000}; do
    echo "${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}" >> ${FILE}
done

dwe@dwe:~/Downloads$ chmod +x gen_data.sh 
dwe@dwe:~/Downloads$ ./gen_data.sh 
dwe@dwe:~/Downloads$ head -5 sample.csv 
1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1
2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2
3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3
4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4
5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5
dwe@dwe:~/Downloads$ ls -lha sample.csv 
-rw-rw-r-- 1 dwe dwe 96M Nov 10 11:11 sample.csv

We’ll be using a new bucket for this demo, so lets create one and then upload the file we just generated:

dwe@dwe:~/Downloads$ aws s3 mb s3://s3-rds-demo --region eu-central-1
make_bucket: s3-rds-demo
dwe@dwe:~/Downloads$ aws s3 cp sample.csv s3://s3-rds-demo/
upload: ./sample.csv to s3://s3-rds-demo/sample.csv         

Before we can do anything against S3 from RDS for PostgreSQL we need to set up the required permissions. You can use security credentials for this, but it is recommended to use IAM roles and policies. The first step is to create a policy that allows listing the bucket, and reading and writing objects (write is required for writing data to S3 later on):

dwe@dwe:~$ aws iam create-policy \
>    --policy-name rds-s3-policy \
>    --policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Sid": "s3import",
>          "Action": [
>            "s3:GetObject",
>            "s3:ListBucket",
>            "S3:PutObject"
>          ],
>          "Effect": "Allow",
>          "Resource": [
>            "arn:aws:s3:::s3-rds-demo", 
>            "arn:aws:s3:::s3-rds-demo/*"
>          ] 
>        }
>      ] 
>    }' 
{
    "Policy": {
        "PolicyName": "rds-s3-policy",
        "PolicyId": "ANPA2U57KX3NFH4HU4COG",
        "Arn": "arn:aws:iam::xxxxxxxx:policy/rds-s3-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-11-10T12:04:34+00:00",
        "UpdateDate": "2020-11-10T12:04:34+00:00"
    }
}

Once the policy is in place we create an IAM role which gets the policy just created attached to:

dwe@dwe:~$ aws iam create-role \
>    --role-name rds-s3-role \
>    --assume-role-policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Effect": "Allow",
>          "Principal": {
>             "Service": "rds.amazonaws.com"
>           },
>          "Action": "sts:AssumeRole"
>        }
>      ] 
>    }'
{
    "Role": {
        "Path": "/",
        "RoleName": "rds-s3-role",
        "RoleId": "AROA2U57KX3NP2XWVCELI",
        "Arn": "arn:aws:iam::xxxxxxxxxx:role/rds-s3-role",
        "CreateDate": "2020-11-10T12:07:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "rds.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}

Attaching the policy to the role (you will need the ARN of the policy from above):

dwe@dwe:~$ aws iam attach-role-policy \
>    --policy-arn arn:aws:iam::xxxxxxxxxx:policy/rds-s3-policy \
>    --role-name rds-s3-role

Finally you need to attach the IAM role to the RDS instance by providing the ARN of the role and the identifier of your RDS instance:

aws rds add-role-to-db-instance \
   --db-instance-identifier dwe-postgres-helvetia \
   --feature-name s3Import \
   --role-arn arn:aws:iam::xxxxxxxx:role/rds-s3-role   \
   --region eu-central-1

Your RDS instance needs to be running to do that, otherwise you’ll get this:

An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The status for the dwe-postgres DB instance is stopped. The DB instance is not available for s3Import feature.

Having the IAM role attached to the RDS instance, we can load the csv, but first the S3 URI needs to be defined (we do not want to use access keys and credentials):

postgres=> SELECT aws_commons.create_s3_uri('s3-rds-demo'
postgres(>                                 ,'sample.csv'
postgres(>                                 ,'eu-central-1'
postgres(>                                 ) AS s3_uri \gset
postgres=> select :'s3_uri';
               ?column?                
---------------------------------------
 (s3-rds-demo,sample.csv,eu-central-1)
(1 row)

Now we are ready to load the file:

postgres=> create table sample ( id int primary key
postgres(>                              , firstname varchar(20)
postgres(>                              , lastname varchar(20)
postgres(>                              , email varchar(20)
postgres(>                              , street varchar(20)
postgres(>                              , country varchar(20)
postgres(>                              , description varchar(20)
postgres(>                              );
CREATE TABLE
postgres=> SELECT aws_s3.table_import_from_s3 ( 'sample'
                                   , ''
                                   , '(format csv)'
                                   , :'s3_uri'
                                   );
                                 table_import_from_s3                                 
--------------------------------------------------------------------------------------
 1000000 rows imported into relation "sample" from file sample.csv of 100222272 bytes
(1 row)
postgres=> select * from sample limit 5;
 id |  firstname  |  lastname  |     email     |  street  |  country  |  description  
----+-------------+------------+---------------+----------+-----------+---------------
 77 | firstname77 | lastname77 | xxx77@xxx.com | street77 | country77 | description77
 78 | firstname78 | lastname78 | xxx78@xxx.com | street78 | country78 | description78
 79 | firstname79 | lastname79 | xxx79@xxx.com | street79 | country79 | description79
  1 | firstname1  | lastname1  | xxx1@xxx.com  | street1  | country1  | description1
  2 | firstname2  | lastname2  | xxx2@xxx.com  | street2  | country2  | description2
(5 rows)

And we’re done. The follow up post will show the opposite: Writing back to to S3 from RDS for PostgreSQL.

Cet article Loading data from S3 to AWS RDS for PostgreSQL est apparu en premier sur Blog dbi services.

Oracle Grid Infrastructure on Windows With 2 Nodes

Wed, 2020-11-11 10:53

Oracle Grid Infrastructure can also be installed on Windows servers. In this blog I am explaining how this installation can be done. I am going to install an environment with two nodes, using Oracle 19c. We have two servers
winrac1 :
winrac2 :
with the same characteristics


I would like to note that it is just a test environment on virtual machines on VirtualBox. I did not have any DNS server and I just have one SCAN address instead of the three recommended. Below are my hosts files with the IPs used:

PS C:\Windows\System32\drivers\etc> Get-Content .\hosts | findstr 192
192.168.168.100         winrac1
192.168.168.101         winrac2
192.168.1.100           winrac1-priv
192.168.1.101           winrac2-priv
192.168.168.110         winrac1-vip
192.168.168.111         winrac2-vip
192.168.168.120         winrac-scan
PS C:\Windows\System32\drivers\etc>

The installation user can be a local user or a domain user. If it is a local user, it must (see the sketch after this list):
-be a member of the Administrators group
-exist on both nodes
-have the same password on both nodes
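
If such a local user does not exist yet, it could be created like this on both nodes (grid and the password are just placeholders for illustration):

C:\Users\Administrator>net user grid "MyStrongPassword1!" /add
C:\Users\Administrator>net localgroup Administrators grid /add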

In my case the Administrator user was used.
The Oracle Grid software WINDOWS.X64_193000_grid_home is already downloaded and unpacked to the GRID_HOME:

C:\app\19.0.0.0\grid

The shared disks Disk1 and Disk2 are already presented to both nodes, winrac1 and winrac2.

We have to disable write caching on each shared disk if supported by the system. For this, right-click on Disk1, for example, and uncheck "Enable write caching on the device" on both nodes.

The next step is to create a volume on the shared disks. On the first node, do the following for all shared disks: right-click on the shared disk to create a New Simple Volume.

Once done, do a rescan of the disks on all other nodes.


We now have to create logical partitions on the shared disks.
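
DiskPart is used for that; it is launched from an elevated command prompt on the first node, e.g.:

C:\Users\Administrator>diskpart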

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online           60 GB      0 B
* Disk 1    Online         5120 MB  5118 MB
  Disk 2    Online           32 GB    31 GB

DISKPART> select disk 1

Disk 1 is now the selected disk.

DISKPART> create partition extended

DiskPart succeeded in creating the specified partition.

DISKPART> create partition logical

DiskPart succeeded in creating the specified partition.

DISKPART>

DISKPART> select disk 2

Disk 2 is now the selected disk.

DISKPART> create partition extended

DiskPart succeeded in creating the specified partition.

DISKPART> create partition logical

DiskPart succeeded in creating the specified partition.

Do a rescan on all other nodes.
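
The rescan can be done in Disk Management or directly in DiskPart, for example:

DISKPART> rescan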

Now we are going to prepare our disks to be used with ASM. For this we use the tool asmtoolg.exe. Just launch it on the first node

c:\app\19.0.0.0\grid\bin>c:\app\19.0.0.0\grid\bin\asmtoolg.exe





Repeat these steps for all disks you will use with ASM. You can list your labelled disks with the following command:

c:\app\19.0.0.0\grid\bin>asmtool.exe  -list
NTFS                             \Device\Harddisk0\Partition1              549M
NTFS                             \Device\Harddisk0\Partition2            60889M
ORCLDISKVOTOCR0                  \Device\Harddisk1\Partition1             5117M
ORCLDISKDATA0                    \Device\Harddisk2\Partition1            32765M

Before launching the installation, we have to set these registry values on both nodes

Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config
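
As a sketch, such a registry value can be set from an elevated command prompt on both nodes; the value name and data below are placeholders for illustration only, the ones to set are those required for your setup:

C:\Users\Administrator>reg add "HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config" /v MaxAllowedPhaseOffset /t REG_DWORD /d 600 /f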



And we stop the firewall on both nodes as Administrator:

C:\Users\Administrator>netsh advfirewall set allprofiles state off
Ok.
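
You can verify the state afterwards, for example:

C:\Users\Administrator>netsh advfirewall show allprofiles state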

Log in with the grid user, open a terminal as Administrator and launch the install command:

c:\app\19.0.0.0\grid>gridSetup.bat

As it is just a test environment, I decided to ignore the errors and to continue.

The verification failed because of the errors I ignored, but the installation itself is fine.

At the end of the installation I can validate the cluster

C:\Users\Administrator>crsctl check cluster -all
**************************************************************
winrac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
winrac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

C:\Users\Administrator>
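
The active clusterware version can be queried as well, for example:

C:\Users\Administrator>crsctl query crs activeversion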

The status of the different resources

C:\Users\Administrator>crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       winrac1                  STABLE
               ONLINE  ONLINE       winrac2                  STABLE
ora.net1.network
               ONLINE  ONLINE       winrac1                  STABLE
               ONLINE  ONLINE       winrac2                  STABLE
ora.ons
               ONLINE  ONLINE       winrac1                  STABLE
               ONLINE  ONLINE       winrac2                  STABLE
ora.proxy_advm
               OFFLINE OFFLINE      winrac1                  STABLE
               OFFLINE OFFLINE      winrac2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       winrac1                  STABLE
      2        ONLINE  ONLINE       winrac2                  STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       winrac1                  STABLE
ora.VOTOCR.dg(ora.asmgroup)
      1        ONLINE  ONLINE       winrac1                  STABLE
      2        ONLINE  ONLINE       winrac2                  STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       winrac1                  Started,STABLE
      2        ONLINE  ONLINE       winrac2                  Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       winrac1                  STABLE
      2        ONLINE  ONLINE       winrac2                  STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       winrac1                  STABLE
ora.qosmserver
      1        ONLINE  ONLINE       winrac1                  STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       winrac1                  STABLE
ora.winrac1.vip
      1        ONLINE  ONLINE       winrac1                  STABLE
ora.winrac2.vip
      1        ONLINE  ONLINE       winrac2                  STABLE
--------------------------------------------------------------------------------

C:\Users\Administrator>

The voting disk

C:\Users\Administrator>crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   01909d45161e4f74bfad14dff099dcc0 (\\.\ORCLDISKVOTOCR0) [VOTOCR]
Located 1 voting disk(s).

The OCR

C:\Users\Administrator>ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     491684
         Used space (kbytes)      :      84300
         Available space (kbytes) :     407384
         ID                       :  461265202
         Device/File Name         :    +VOTOCR
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded


C:\Users\Administrator>
Conclusion

We have just seen that Oracle Clusterware can be configured in a Windows environment. In coming blogs we will show how to configure a database on it, but it will be the same as on a Linux environment.

This article Oracle Grid Infrastructure on Windows With 2 Nodes first appeared on Blog dbi services.
